Make Wikipedia corpus metadata accessible #3007

Open
wants to merge 9 commits into
base: develop
from

Commits on Nov 26, 2020

  1. Update wikicorpus.py

    Let users access metadata (e.g. the article title) when they need it. Adds an argument to WikiCorpus __init__() that specifies whether metadata is returned. Previously, metadata was hard-coded to False and could not be toggled.
    kumarneelabh13 authored Nov 26, 2020
    a48ec39
  2. Update wikicorpus.py

    Make Wikipedia corpus metadata accessible.
    kumarneelabh13 authored Nov 26, 2020
    a133ea7
  3. Update wikicorpus.py

    Allow users to access metadata by letting a parameter set self.metadata in WikiCorpus. However, Dictionary() raises "TypeError: decoding to str: need a bytes-like object, list found" when get_texts() returns metadata. This commit therefore introduces a dictionary_mode parameter in get_texts() so that metadata bypasses the dictionary and goes directly to the user.
    kumarneelabh13 authored Nov 26, 2020
    52b8ffc
  4. Update wikicorpus.py

    kumarneelabh13 authored Nov 26, 2020
    c2a35c3
  5. Update wikicorpus.py

    kumarneelabh13 authored Nov 26, 2020
    1aaefb6
  6. Update wikicorpus.py

    kumarneelabh13 authored Nov 26, 2020
    79ee20e
  7. Update wikicorpus.py

    kumarneelabh13 authored Nov 26, 2020
    151ce19
  8. Update wikicorpus.py

    kumarneelabh13 authored Nov 26, 2020
    ce6ebfa
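
The third commit message above describes two pieces: a constructor argument that turns metadata on, and a dictionary_mode flag on get_texts() that keeps metadata away from the vocabulary-building step. The following is a minimal, self-contained sketch of that idea, not the actual diff from this pull request. ToyWikiCorpus and build_vocabulary are hypothetical stand-ins for gensim's WikiCorpus and Dictionary, and the (page_id, title) metadata shape is an assumption made for illustration.

```python
from collections import Counter


class ToyWikiCorpus:
    """Hypothetical stand-in for WikiCorpus; NOT gensim's real class."""

    def __init__(self, articles, metadata=False):
        # articles: list of (page_id, title, text) tuples (assumed shape)
        self.articles = articles
        # The flag the commits describe: callers can now toggle it.
        self.metadata = metadata

    def get_texts(self, dictionary_mode=False):
        """Yield token lists, or (tokens, (page_id, title)) if metadata is on.

        dictionary_mode=True forces plain token lists so that a
        vocabulary builder never receives the metadata tuple.
        """
        for page_id, title, text in self.articles:
            tokens = text.lower().split()
            if self.metadata and not dictionary_mode:
                yield tokens, (page_id, title)
            else:
                yield tokens


def build_vocabulary(corpus):
    """Hypothetical stand-in for Dictionary: counts tokens only."""
    counts = Counter()
    for tokens in corpus.get_texts(dictionary_mode=True):
        counts.update(tokens)
    return counts


corpus = ToyWikiCorpus(
    [
        (1, "Anarchism", "Anarchism is a political philosophy"),
        (2, "Autism", "Autism is a developmental condition"),
    ],
    metadata=True,
)

vocab = build_vocabulary(corpus)   # sees token lists only
docs = list(corpus.get_texts())    # user sees (tokens, metadata) pairs
```

With this split, the vocabulary step keeps working on plain token lists while user code iterating over get_texts() still receives the title and page id alongside each document.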

Commits on Jun 29, 2021

  1. 92d64e8