Update Unstructured with NLTK changes #1852

Merged: 1 commit into main from Nolan/Unstructured, Jan 21, 2025
Conversation

NolanTrem (Collaborator) commented Jan 21, 2025

Fixes the Unstructured Docker image by applying the changes required by Unstructured-IO/unstructured#3853.


Important

Updates Docker setup for Unstructured service by removing an outdated Dockerfile and adding NLTK data setup.

  • Dockerfile Changes:
    • Deletes py/Dockerfile.unstructured.
    • Updates services/unstructured/Dockerfile.unstructured to include NLTK data setup.
  • Environment Setup:
    • Sets NLTK_DATA environment variable in services/unstructured/Dockerfile.unstructured.
    • Downloads NLTK data (punkt_tab, averaged_perceptron_tagger_eng) to the specified directory.
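
One way to sanity-check the setup described above is to confirm that both packages ended up under the NLTK data directory. The sketch below is illustrative and not part of this PR: the helper name `missing_nltk_packages` is hypothetical, and the subfolder layout (`tokenizers/` and `taggers/`) is an assumption about where NLTK unpacks these two packages. It uses only the standard library, so it can run in the image without importing NLTK.

```python
import os
from pathlib import Path

# Assumed layout: NLTK unpacks each package into a category subfolder
# of the data directory. Adjust if the image stores them elsewhere.
EXPECTED_PACKAGES = {
    "punkt_tab": "tokenizers/punkt_tab",
    "averaged_perceptron_tagger_eng": "taggers/averaged_perceptron_tagger_eng",
}


def missing_nltk_packages(data_dir=None):
    """Return the names of expected NLTK packages not found under data_dir.

    Falls back to the NLTK_DATA environment variable when data_dir is None.
    """
    root = data_dir or os.environ.get("NLTK_DATA")
    if not root:
        raise ValueError("No data directory given and NLTK_DATA is not set")
    root = Path(root)
    return [
        name
        for name, relative_path in EXPECTED_PACKAGES.items()
        if not (root / relative_path).is_dir()
    ]


if __name__ == "__main__":
    try:
        missing = missing_nltk_packages()
    except ValueError as err:
        raise SystemExit(str(err))
    if missing:
        raise SystemExit("Missing NLTK data: " + ", ".join(missing))
    print("All expected NLTK data packages are present.")
```

Run as a script inside the container, it exits non-zero when a package is missing, which could serve as a build-time or startup check.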

This description was created by Ellipsis for 1cf92f5. It will automatically update as commits are pushed.

ellipsis-dev bot (Contributor) left a comment

👍 Looks good to me! Reviewed everything up to 1cf92f5 in 16 seconds

More details
  • Looked at 50 lines of code in 2 files
  • Skipped 0 files when reviewing.
  • Skipped posting 1 drafted comment based on config settings.
1. services/unstructured/Dockerfile.unstructured:21
  • Draft comment:
    Ensure that the NLTK data is correctly downloaded and available in the specified directory. This is crucial for the application to function correctly if it relies on NLTK for natural language processing tasks.
  • Reason this comment was not posted:
    Confidence changes required: 50%
    The NLTK data download command is correct, but the comment should be placed on the relevant lines.

Workflow ID: wflow_9wPdygLFRr3UANJz



NolanTrem merged commit 197e8b8 into main on Jan 21, 2025
13 of 14 checks passed
NolanTrem deleted the Nolan/Unstructured branch on January 21, 2025 21:25