This crawler scrapes the trending videos page once a day and the comments posted to those videos every 30 minutes.
Crawled comments are stored in comments.json; each line of the file is a JSON object output by youtube-comment-scraper. See that project's page for more information about the format.
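Since each line is an independent JSON object, the file can be inspected line by line with standard tools. For example, the following prints the first stored comment in readable form (assuming comments.json is in the current directory and jq is installed):
$ head -n 1 comments.json | jq .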
After cloning this repository, install the required modules via npm:
$ git clone https://github.com/itslab-kyushu/youtube-comment-crawler.git
$ cd youtube-comment-crawler
$ npm install
To start the crawling service and store database files in ./data, run:
$ npm start --dir ./data
By default, it crawls the English pages; to crawl pages in another language, pass the language via the --lang option.
For example, the following command starts crawling Japanese pages:
$ npm start --dir ./data --lang JP
Youtube Comment Crawler is also provided as a Docker image, itslabq/youtube-comment-crawler. The image stores database files in /data, so you don't need to give the --dir option.
To run a container and mount ./data so that the database files are stored there:
$ docker run -d --name crawler -v $(pwd)/data:/data:Z itslabq/youtube-comment-crawler
To crawl pages in another language, pass the language via the --lang option. The following example starts crawling Japanese pages:
$ docker run -d --name crawler -v $(pwd)/data:/data:Z itslabq/youtube-comment-crawler --lang JP
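The container runs in the background (-d); the standard Docker commands can be used to follow the crawler's log output or to stop it:
$ docker logs -f crawler
$ docker stop crawler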
This software is released under the MIT License; see LICENSE.