Releases: juliasilge/tidytext

tidytext 0.2.3

04 Mar 15:59

juliasilge

Wrapper tokenization functions for n-grams, characters, sentences, tweets, and more, thanks to @ColinFay (#137).
Simplify get_sentiments() thanks to @jennybc (#151).
Fix flaky tests for corpus tidiers.
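
As a rough illustration of the wrapper functions mentioned above, the sketch below compares the general unnest_tokens() call with the n-gram wrapper unnest_ngrams(). The one-row data frame and its column names are made up for the example; the assumption is that the wrapper simply fixes the tokenizer and otherwise behaves like the general call.

```r
library(tidytext)

# Hypothetical one-row corpus, for illustration only
docs <- data.frame(
  id = 1L,
  text = "tidy text mining with R",
  stringsAsFactors = FALSE
)

# General form: choose the tokenizer through the `token` argument
bigrams_general <- unnest_tokens(docs, bigram, text, token = "ngrams", n = 2)

# Wrapper form: the tokenizer is fixed to n-grams, so only `n` is needed
bigrams_wrapper <- unnest_ngrams(docs, bigram, text, n = 2)

# Compare the two results; they are expected to agree
same_bigrams <- identical(bigrams_general$bigram, bigrams_wrapper$bigram)
```

The wrapper mainly saves typing and makes the intent of a pipeline step easier to read; the general form remains useful when the tokenizer is chosen programmatically.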

tidytext 0.2.2

30 Jul 14:05

juliasilge

Access the NRC lexicon via the textdata package.

tidytext 0.2.1

14 Jun 17:00

juliasilge

Fix bug in augment() function for stm topic model.
Warn when tf-idf is negative, thanks to @EmilHvitfeldt (#112).
Switch from importing broom to importing generics, for lighter dependencies (#133).
Add functions for reordering factors (such as for ggplot2 bar plots) thanks to @tmastny (#110).
Update to tibble() where appropriate, thanks to @luisdza (#136).
Clarify documentation about impact of lowercase conversion on URLs (#139).
Change how sentiment lexicons are accessed from the package: the NRC lexicon is removed entirely, and the AFINN and Loughran lexicons are now accessed via the textdata package, so they are no longer bundled with tidytext.
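
The negative tf-idf warning listed above can be motivated with a little arithmetic. The base R sketch below assumes idf is the natural log of the number of documents divided by the number of rows containing the term (an assumption about how bind_tf_idf() computes it, not a quote of its source). The counts and the helper term_idf() are hypothetical.

```r
# Hypothetical counts: the pair ("a", "cat") appears on two rows instead of one
counts <- data.frame(
  document = c("a", "a", "b"),
  term = c("cat", "cat", "cat"),
  n = c(1, 2, 1),
  stringsAsFactors = FALSE
)

# Assumed idf definition: ln(number of documents / number of rows per term)
term_idf <- function(df) {
  n_docs <- length(unique(df$document))
  rows_per_term <- table(df$term)
  log(n_docs / rows_per_term)
}

# Three rows over two documents puts the ratio below 1, so the log is negative
idf_raw <- term_idf(counts)

# Collapse to one row per document-term pair before computing idf
tidy_counts <- aggregate(n ~ document + term, data = counts, FUN = sum)
idf_tidy <- term_idf(tidy_counts)
```

Under this assumption, a negative value signals that the input was not aggregated to one row per document-term combination, which is why a warning is more helpful than silently returning the number.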

tidytext 0.2.0

18 Oct 19:27

juliasilge

Improvements to documentation (#117).
Fix for NSE thanks to @lepennec (#122).
Tidier for estimated regressions from stm package thanks to @jefferickson (#115).
Tidier for correlated topic model from topicmodels package (#123).

tidytext 0.1.9

29 May 20:46

juliasilge

Updates to documentation (#109) thanks to Emil Hvitfeldt.
Add new tokenizers for tweets, Penn Treebank to unnest_tokens().
Better error message (#111) and code styling.
Declare dependency for tests.

tidytext 0.1.8

25 Mar 23:27

juliasilge

Updates to documentation (#102), README, and vignettes.
Add tokenizing by character shingles thanks to Kanishka Misra (#105).
Fix tests for skip grams thanks to Lincoln Mullen (#106).

tidytext 0.1.7

20 Feb 03:22

juliasilge

unnest_tokens can now unnest a data frame with a list column (which formerly threw the error unnest_tokens expects all columns of input to be atomic vectors (not lists)). In the unnested result, the object in each list element is repeated for every token produced from its row. (This is still not possible when collapse = TRUE, in which case tokens can span multiple lines.)
Add get_tidy_stopwords() to obtain stopword lexicons in multiple languages in a tidy format.
Add a dataset nma_words of negators, modals, and adverbs that affect sentiment analysis (#55).
Updated various vignettes/docs/tests so package can build on R-oldrel.
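
The list-column behavior described in the first item can be sketched as follows. The tibble, its column names, and the tags values are hypothetical; the example assumes the tibble package is installed alongside tidytext.

```r
library(tidytext)

# Hypothetical input: `tags` is a list column, `text` is the column to tokenize
docs <- tibble::tibble(
  id = 1:2,
  tags = list(c("a", "b"), "c"),
  text = c("first document here", "second one")
)

# One row per word; the list element of each source row is carried along
words <- unnest_tokens(docs, word, text)

# Inspect which tags ended up next to each token
words$tags
```

Each token keeps the full list element of the row it came from, so per-document metadata stored as a list survives tokenization without an extra join.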

tidytext 0.1.5

18 Nov 14:27

juliasilge

Change how NA values are handled in unnest_tokens so they no longer cause other columns to become NA (#82).
Update tidiers and casters to align with quanteda v1.0 (#87).
Handle input/output object classes (such as data.table) consistently (#88).

tidytext 0.1.4

30 Sep 17:20

juliasilge

Fix tidier for quanteda dictionary for correct class (#71).
Add a pkgdown site.
Convert NSE from underscored function to tidyeval (unnest_tokens, bind_tf_idf, all sparse casters) (#67, #74).
Added tidiers for topic models from the stm package (#51).

tidytext 0.1.3

19 Jun 16:42

juliasilge

get_sentiments now works regardless of whether tidytext has been loaded or not (#50).
unnest_tokens now supports data.table objects (#37).
Fixed to_lower parameter in unnest_tokens to work properly for all tokenizing options.
Updated tidy.corpus, glance.corpus, tests, and vignette for changes to quanteda API.
Removed the deprecated pair_count function, which is now in the in-development widyr package.
Added tidiers for LDA models from the mallet package.
Added the Loughran and McDonald dictionary of sentiment words specific to financial reports.
unnest_tokens preserves custom attributes of data frames and data.tables.
