This project aims to predict the most frequently used Facebook reaction for a given text.
Virtualenv can be used to simplify the process.
- Python 3
- Clone the repository
- Run the setup script:
python3 setup.py
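For example, a minimal virtualenv setup before running the setup script could look like this (the environment name env is an arbitrary choice):
python3 -m venv env
source env/bin/activate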
Imagine you want to train your model on Facebook posts from CNN. The standard procedure is as follows:
- Find the id of the page you want to crawl. The fastest way to retrieve a page id is https://findmyfbid.com/ (e.g. for CNN it is 5550296508).
- Get yourself a Facebook Graph API access token using the Graph API Explorer https://developers.facebook.com/tools/explorer/.
- Or crawl pages (don't forget to create the /data/datasets folder first):
python3 crawlpagepreferences.py YOUR_FB_ACCESS_TOKEN -c 500
- Crawl posts using the crawl script.
python3 crawl.py -i 5550296508 YOUR_FB_ACCESS_TOKEN
- Filter the crawled data using the filter script.
python3 filter.py cnn.json
- Normalize the filtered data using the normalize script.
python3 normalize.py cnn_filtered.json
- Train the model using the train script.
python3 train.py cnn_filtered_normalized.json
- Query the trained model using the requestmodel script.
python3 requestmodel.py "Your newest FB post!"
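Putting the CNN example together, the complete pipeline from crawling to querying comes down to (the intermediate file names follow the pattern used in the steps above):
python3 crawl.py -i 5550296508 YOUR_FB_ACCESS_TOKEN
python3 filter.py cnn.json
python3 normalize.py cnn_filtered.json
python3 train.py cnn_filtered_normalized.json
python3 requestmodel.py "Your newest FB post!"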
The usage of the script crawlpagepreferences.py is as follows:
usage: crawlpagepreferences.py [-h] [-c, --count page_count]
                               [-l, --limit rate_limit] [-e, --erase erase]
                               [-f, --file FILE] [-sp, --specific SPECIFIC]
                               [-s, --skip] [-nj, --nojoy] [-v, --value value]
                               access_token
Crawl Facebook and represent page preferences.
positional arguments:
  access_token          a Facebook access token
optional arguments:
  -h, --help            show this help message and exit
  -c, --count page_count
                        amount of pages to be fetched
  -l, --limit rate_limit
                        limit of API requests per hour
  -e, --erase erase     overwrite existing files
  -f, --file FILE       a json file [{"id": xxxx, "name": "page_name"}, ...]
  -sp, --specific SPECIFIC
                        only crawl specific pages from the category list
  -s, --skip            skip steps 0-4 (crawling data)
  -nj, --nojoy          whether to show the joy reaction
  -v, --value value     difference between the main and the other reactions,
                        in percent
You can also provide a file in order to only plot the preference figures. To only crawl specific pages from the category list, provide a list of categories (one per line) in an external file, as shown below.
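For example, a hypothetical categories.txt (the file name and the categories are placeholders, one category per line) might contain:
News
Sport
and would then be passed via the -sp option, assuming it expects the path to that file:
python3 crawlpagepreferences.py YOUR_FB_ACCESS_TOKEN -sp categories.txt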
The usage of the script crawl.py is as follows:
usage: crawl.py [-h] [-c, --count post_count] [-l, --limit rate_limit]
                (-f, --file FILE | -i, --id PAGE_ID)
                access_token
Crawl Facebook reactions from pages.
positional arguments:
  access_token            a Facebook access token
optional arguments:
  -h, --help               show this help message and exit
  -c, --count post_count   amount of posts to be fetched from each page
  -l, --limit rate_limit   limit of API requests per hour
  -f, --file FILE          a json file [{"id": xxxx, "name": "page_name"}]
  -i, --id PAGE_ID         a Facebook page id
You have to provide your Facebook access token, as well as either a page id or a file containing ids.
If you choose to provide a file, e.g. for crawling multiple pages at once, use the following schema:
[{
    "id": 12345678,
    "name": "a facebook page"
}, {
    "id": 87654321,
    "name": "another facebook page"
}]
The output is written to a file for each provided page individually.
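For example, assuming the schema above is saved as pages.json (the file name is arbitrary), you would run:
python3 crawl.py -f pages.json YOUR_FB_ACCESS_TOKEN -c 500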
The usage of the script filter.py is as follows:
usage: filter.py [-h] [-u, --filter-urls filter_urls]
                 [-c, --min-char min_char] [-r, --min-reactions min_reactions]
                 [-g, --reaction-gap reaction_gap]
                 filename
Filter crawled Facebook reactions.
positional arguments:
  filename                            a crawled json file
optional arguments:
  -h, --help                          show this help message and exit
  -u, --filter-urls filter_urls       whether to filter URLs
  -c, --min-char min_char             a minimal character count
  -r, --min-reactions min_reactions   a minimal reaction count
  -g, --reaction-gap reaction_gap     a percentage value the dominant reaction
                                      has to be above the secondary reaction
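An illustrative call (the threshold values are arbitrary examples, not recommendations) that keeps only posts with at least 100 characters, at least 50 reactions, and a dominant reaction at least 30 percent above the secondary one:
python3 filter.py cnn.json -c 100 -r 50 -g 30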
The usage of the script normalize.py is as follows:
usage: normalize.py [-h] filename
Normalize crawled and filtered Facebook reactions.
positional arguments:
filename a filtered json file
optional arguments:
-h, --help show this help message and exit
The usage of the script train.py is as follows:
usage: train.py [-h] filename
Train model based on normalized Facebook reactions.
positional arguments:
filename a normalized json file
optional arguments:
-h, --help show this help message and exit
The usage of the script requestmodel.py is as follows:
usage: requestmodel.py [-h]
Load a trained model and place requests.
optional arguments:
-h, --help show this help message and exit
How do I get a Facebook access token?
Go to the Graph API Explorer and request an access token with your Facebook user in the top right corner.
How do I get a page id?
Go to Facebook and navigate to the desired page. Now open the page source (e.g. with Ctrl + U) and search for "uid":. You just found your page id!
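If you already have an access token, you can also look up the id through the Graph API, since a page's vanity name works as the node id; the exact version prefix in the URL (if any) depends on your setup:
curl "https://graph.facebook.com/cnn?access_token=YOUR_FB_ACCESS_TOKEN"
The JSON response contains the page's id.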