[Chore]: Monitor presidio-analyzer releases #1054

Open
aponcedeleonch opened this issue Feb 14, 2025 · 2 comments

aponcedeleonch commented Feb 14, 2025

Description

We use presidio-analyzer==2.2.357 (the latest release) for the PII step of our pipeline. There is a known bug when presidio-analyzer runs with numpy>=2.0.0, so as a workaround we keep numpy pinned to 1.26.4. The presidio-analyzer bug appears to stem from a bug in thinc that has since been fixed. To bump numpy, we need to either contribute a patch upstream to presidio-analyzer or monitor its releases for a fix.
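
One way to do the monitoring is a small script that asks PyPI's JSON API for the latest presidio-analyzer version and compares it with our pin. This is only a minimal sketch: the helper names are made up for illustration, and it assumes plain numeric versions such as 2.2.357 (pre-release suffixes are not handled).

```python
import json
import urllib.request

PINNED = "2.2.357"  # version named in this issue
PYPI_URL = "https://pypi.org/pypi/presidio-analyzer/json"


def parse_version(version: str) -> tuple:
    """Turn '2.2.357' into (2, 2, 357). Assumes purely numeric segments."""
    return tuple(int(part) for part in version.split("."))


def has_newer_release(latest: str, pinned: str = PINNED) -> bool:
    """Return True when `latest` is strictly newer than `pinned`."""
    return parse_version(latest) > parse_version(pinned)


def fetch_latest_version(url: str = PYPI_URL) -> str:
    """Read the latest release version from PyPI's JSON API (needs network)."""
    with urllib.request.urlopen(url, timeout=10) as response:
        return json.load(response)["info"]["version"]
```

Calling `has_newer_release(fetch_latest_version())` from a scheduled CI job could be enough to flag when it is time to retest with a newer numpy.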

Additional Context

presidio-analyzer==2.2.357 dependency tree.

presidio-analyzer 2.2.357 Presidio Analyzer package
├── phonenumbers >=8.12,<9.0.0
├── pyyaml *
├── regex *
├── spacy >=3.4.4,<3.7.0 || >3.7.0,<4.0.0
│   ├── catalogue >=2.0.6,<2.1.0
│   ├── cymem >=2.0.2,<2.1.0
│   ├── jinja2 *
│   │   └── markupsafe >=2.0
│   ├── langcodes >=3.2.0,<4.0.0
│   │   └── language-data >=1.2
│   │       └── marisa-trie >=1.1.0
│   │           └── setuptools *
│   ├── murmurhash >=0.28.0,<1.1.0
│   ├── numpy >=1.19.0
│   ├── packaging >=20.0
│   ├── preshed >=3.0.2,<3.1.0
│   │   ├── cymem >=2.0.2,<2.1.0 (circular dependency aborted here)
│   │   └── murmurhash >=0.28.0,<1.1.0 (circular dependency aborted here)
│   ├── pydantic >=1.7.4,<1.8 || >1.8,<1.8.1 || >1.8.1,<3.0.0
│   │   ├── annotated-types >=0.6.0
│   │   ├── pydantic-core 2.27.2
│   │   │   └── typing-extensions >=4.6.0,<4.7.0 || >4.7.0
│   │   └── typing-extensions >=4.12.2 (circular dependency aborted here)
│   ├── requests >=2.13.0,<3.0.0
│   │   ├── certifi >=2017.4.17
│   │   ├── charset-normalizer >=2,<4
│   │   ├── idna >=2.5,<4
│   │   └── urllib3 >=1.21.1,<3
│   ├── setuptools * (circular dependency aborted here)
│   ├── spacy-legacy >=3.0.11,<3.1.0
│   ├── spacy-loggers >=1.0.0,<2.0.0
│   ├── srsly >=2.4.3,<3.0.0
│   │   └── catalogue >=2.0.3,<2.1.0 (circular dependency aborted here)
│   ├── thinc >=8.2.2,<8.3.0
│   │   ├── blis >=0.7.8,<0.8.0
│   │   │   └── numpy >=1.19.0 (circular dependency aborted here)
│   │   ├── catalogue >=2.0.4,<2.1.0 (circular dependency aborted here)
│   │   ├── confection >=0.0.1,<1.0.0
│   │   │   ├── pydantic >=1.7.4,<1.8 || >1.8,<1.8.1 || >1.8.1,<3.0.0 (circular dependency aborted here)
│   │   │   └── srsly >=2.4.0,<3.0.0 (circular dependency aborted here)
│   │   ├── cymem >=2.0.2,<2.1.0 (circular dependency aborted here)
│   │   ├── murmurhash >=1.0.2,<1.1.0 (circular dependency aborted here)
│   │   ├── numpy >=1.19.0,<2.0.0 (circular dependency aborted here)
│   │   ├── packaging >=20.0 (circular dependency aborted here)
│   │   ├── preshed >=3.0.2,<3.1.0 (circular dependency aborted here)
│   │   ├── pydantic >=1.7.4,<1.8 || >1.8,<1.8.1 || >1.8.1,<3.0.0 (circular dependency aborted here)
│   │   ├── setuptools * (circular dependency aborted here)
│   │   ├── srsly >=2.4.0,<3.0.0 (circular dependency aborted here)
│   │   └── wasabi >=0.8.1,<1.2.0
│   │       └── colorama >=0.4.6
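
To see which package actually caps numpy, the tree can be scanned for every numpy entry together with its parent. The sketch below works on an excerpt copied from the tree above; the function name and the crude `"<2"` substring test are assumptions for illustration, not a real specifier parser.

```python
import re

# Excerpt of the tree above, trimmed to the lines that lead to numpy.
TREE = """\
presidio-analyzer 2.2.357 Presidio Analyzer package
├── spacy >=3.4.4,<3.7.0 || >3.7.0,<4.0.0
│   ├── numpy >=1.19.0
│   ├── thinc >=8.2.2,<8.3.0
│   │   ├── blis >=0.7.8,<0.8.0
│   │   │   └── numpy >=1.19.0 (circular dependency aborted here)
│   │   ├── numpy >=1.19.0,<2.0.0 (circular dependency aborted here)
"""

# Each nesting level is drawn as four characters before the branch marker.
BRANCH = re.compile(r"^(?P<indent>[│ ]*)[├└]── (?P<name>\S+) (?P<spec>.*)$")
NOTE = "(circular dependency aborted here)"


def find_constraints(tree: str, package: str) -> list:
    """Return (parent, constraint) for every occurrence of `package`."""
    stack = []  # stack[d] holds the package name currently open at depth d
    results = []
    for line in tree.splitlines():
        match = BRANCH.match(line)
        if not match:
            if line.strip():
                stack = [line.split()[0]]  # root line: the package itself
            continue
        depth = len(match["indent"]) // 4 + 1
        spec = match["spec"].replace(NOTE, "").strip()
        del stack[depth:]
        stack.append(match["name"])
        if match["name"] == package:
            results.append((stack[depth - 1], spec))
    return results


numpy_constraints = find_constraints(TREE, "numpy")
# Crude check: an upper bound below 2 shows up as "<2" in the constraint.
blockers = [parent for parent, spec in numpy_constraints if "<2" in spec]
print(blockers)
```

On this excerpt the only numpy constraint with an upper bound comes from thinc (`>=1.19.0,<2.0.0`), which matches the description above.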

aponcedeleonch commented Feb 14, 2025

Incidentally, spacy is also brought in by presidio-analyzer. spacy brings in thinc, which brings in blis (see the dependency tree above). There is a bug in blis==1.2.0 when building on ARM, which we hit and solved in #1047. The workaround is capping spacy<3.8.0. Whenever we bump presidio-analyzer, we need to check its sub-dependencies carefully to make sure nothing breaks. In the meantime, we can't bump spacy, because doing so would also bump blis.
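
A guard like the following could run in CI to catch an accidental bump of any of these packages. It is a rough sketch under stated assumptions: the function names are hypothetical, versions are assumed to be purely numeric, and the rules simply encode the workarounds described in this issue (numpy==1.26.4, spacy<3.8.0, avoid blis==1.2.0).

```python
from importlib import metadata


def parse_version(version: str) -> tuple:
    """Turn '1.26.4' into (1, 26, 4). Assumes purely numeric segments."""
    return tuple(int(part) for part in version.split("."))


def check_pins(installed: dict) -> list:
    """Return a message for each installed version that breaks a workaround."""
    problems = []
    numpy_v = installed.get("numpy")
    if numpy_v is not None and parse_version(numpy_v) != (1, 26, 4):
        problems.append(f"numpy {numpy_v}: expected the 1.26.4 pin")
    spacy_v = installed.get("spacy")
    if spacy_v is not None and parse_version(spacy_v) >= (3, 8, 0):
        problems.append(f"spacy {spacy_v}: expected <3.8.0")
    blis_v = installed.get("blis")
    if blis_v is not None and parse_version(blis_v) == (1, 2, 0):
        problems.append(f"blis {blis_v}: known ARM build bug")
    return problems


def installed_versions(names=("numpy", "spacy", "blis")) -> dict:
    """Look up installed versions; packages that are missing are skipped."""
    found = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            pass
    return found
```

Running `check_pins(installed_versions())` after dependency resolution returns an empty list when all workarounds are still in place.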

@aponcedeleonch

presidio-analyzer is also preventing us from supporting Python 3.13. We pinned Python to 3.12 in #1009.
