This repository has been archived by the owner on Apr 10, 2019. It is now read-only.

chakki-works/elephant_sense

elephant-sense

Evaluates the quality of the content itself using machine learning.

[Image: top.PNG]

You can try it from here.

Setup

Get a Qiita API token and set it as an environment variable.

$ export QiitaToken=xxx

(Only the read_qiita scope is required.)

Then build and run the application using the Dockerfile.
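
To check the token outside Docker, a minimal sketch like the following can build an authenticated request against the Qiita API v2. The helper name build_item_request is hypothetical and not part of this repository, and how the application itself reads QiitaToken is not shown here, so treat this only as an illustration.

```python
import os
import urllib.request

QIITA_API_BASE = "https://qiita.com/api/v2"  # Qiita API v2 base URL


def build_item_request(item_id, token=None):
    """Build an authenticated request for a single Qiita post.

    Hypothetical helper for illustration; not part of elephant_sense.
    """
    # Fall back to the QiitaToken environment variable from the setup step.
    token = token or os.environ.get("QiitaToken")
    if not token:
        raise RuntimeError("QiitaToken is not set")
    return urllib.request.Request(
        f"{QIITA_API_BASE}/items/{item_id}",
        headers={"Authorization": f"Bearer {token}"},
    )
```

Passing the returned request to urllib.request.urlopen would fetch the post as JSON, provided the token is valid.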

For Training the Model

Data Preparation

  • Place the Qiita posts in data/raw/items.
    • You can get Qiita posts through the Qiita API.
    • Each post is one JSON file named after its post ID (like 0a0000aa0a0000a00aa0.json).
  • Place the annotated file labeled_qiita_posts.csv in data/raw.
    • Its format is No, url, Title, followed by annotator1, annotator2, ... (the column names can be whatever you like).
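
The layout above can be sketched as follows. Both helpers (save_post and read_labels) are hypothetical and not part of this repository. The sketch assumes that each post dictionary carries an "id" field, as Qiita API v2 items do, and that the CSV has a header row with the first three columns being No, url, and Title in that order.

```python
import csv
import json
from pathlib import Path


def save_post(post, items_dir="data/raw/items"):
    """Write one post to <items_dir>/<post id>.json and return the path."""
    items_dir = Path(items_dir)
    items_dir.mkdir(parents=True, exist_ok=True)
    path = items_dir / f"{post['id']}.json"
    path.write_text(json.dumps(post, ensure_ascii=False), encoding="utf-8")
    return path


def read_labels(csv_path="data/raw/labeled_qiita_posts.csv"):
    """Read the annotated CSV by column position.

    Columns 0-2 are taken as No, url, Title; every remaining column is
    treated as one annotator, keyed by whatever the header calls it.
    """
    with open(csv_path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader)  # assumes a header row is present
        annotators = header[3:]
        return [
            {
                "no": row[0],
                "url": row[1],
                "title": row[2],
                "labels": dict(zip(annotators, row[3:])),
            }
            for row in reader
            if row  # skip blank lines
        ]
```

Reading by position rather than by name keeps the sketch independent of how the annotator columns are named.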

Data Preprocessing

Run the following script.

python scripts/data/make_data.py

Then, the labeled JSON files are stored in data/processed/items.

Next, run the preprocessing script.

python scripts/data/preprocessing.py

posts.json will be created in data/processed/.
posts.json contains the split tokens of each post. You can use it to get the words in the posts.
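
As a rough sketch of using the tokens, the snippet below counts word frequencies across all posts. The internal layout of posts.json is not described here, so the code assumes a JSON list of post objects, each holding its tokens as a list under a key named "tokens". Both the layout and the key name are assumptions; adjust tokens_key to match the actual file.

```python
import json
from collections import Counter


def count_words(posts_path="data/processed/posts.json", tokens_key="tokens"):
    """Count token frequencies over all posts.

    Assumes posts.json is a list of objects with a list of tokens under
    `tokens_key`; this layout is an assumption, not documented behavior.
    """
    with open(posts_path, encoding="utf-8") as f:
        posts = json.load(f)
    counts = Counter()
    for post in posts:
        counts.update(post[tokens_key])
    return counts
```

Calling count_words().most_common(20) would then list the twenty most frequent tokens.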