SENG474 Project

Data Processing

  • To run the data processing script, you need to have newspaper3k and nltk installed. It takes about an hour to run, but it outputs a CSV file with the processed data, so we won't need to run it each time we train our models.
  • The cleaned and processed data can be found in data/processed-dataset.csv (a minimal loading sketch is included after this list).
  • Each row of the CSV file contains the number of occurrences of each English word in an article, excluding stop words.
  • The last column is the label: 1 if the article is real, 0 if the article is fake.
  • I was having problems with stemming the words, but we can take another crack at this if we need to reduce our number of features.
  • We could also try using sklearn's TfidfVectorizer instead of CountVectorizer later to see if that improves our accuracy (see the second sketch below).
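As a starting point for model training, here is a minimal sketch of loading the processed CSV and separating the features from the label column. The file path and the label-is-last-column convention come from the notes above; the train/test split parameters are just placeholder choices.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Load the word-count features produced by the data processing script
df = pd.read_csv("data/processed-dataset.csv")

# As described above, every column except the last holds word counts,
# and the last column is the label (1 = real, 0 = fake)
X = df.iloc[:, :-1]
y = df.iloc[:, -1]

# Hold out a test split for whichever model we end up training
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
```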
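If we revisit stemming or try TF-IDF, both changes would live in the feature-extraction step of the data processing script. This is only a sketch of what that could look like, assuming the raw article texts are available as a list of strings; the `articles` variable is a hypothetical stand-in, not something from the actual script.

```python
from nltk.stem import PorterStemmer
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

# Hypothetical stand-in for the raw article texts the processing script collects
articles = ["example article text one", "example article text two"]

# One way to stem before vectorizing: run each word through nltk's PorterStemmer
stemmer = PorterStemmer()
stemmed_articles = [
    " ".join(stemmer.stem(word) for word in text.split()) for text in articles
]

# Current approach: raw word counts with English stop words removed
count_vec = CountVectorizer(stop_words="english")
X_counts = count_vec.fit_transform(stemmed_articles)

# Possible alternative: TF-IDF weighting with the same stop-word handling
tfidf_vec = TfidfVectorizer(stop_words="english")
X_tfidf = tfidf_vec.fit_transform(stemmed_articles)
```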

Attribution
