This repository has been archived by the owner on Aug 29, 2023. It is now read-only.

itslab-kyushu/youtube-comment-crawler


Youtube Comment Crawler


Scrapes the YouTube trending videos page once a day, and the comments posted to those videos every 30 minutes.

Crawled comments are stored in comments.json. Each line of the file is a JSON object output by youtube-comment-scraper; see that project's page for more information about the format.
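Because comments.json holds one JSON object per line rather than a single JSON array, it has to be parsed line by line. The following is a minimal sketch of one way to do that in Node.js. The helper names `parseJsonLines` and `loadComments` are hypothetical and not part of this project, and the path `./data/comments.json` in the usage comment assumes the crawler was started with `--dir ./data`.

```javascript
// Minimal sketch, not part of the crawler itself.
// Assumption: every non-empty line of the file is one standalone JSON object.
const fs = require("fs");

// Split text into lines, skip blank ones, and parse each line as JSON.
function parseJsonLines(text) {
  return text
    .split(/\r?\n/)
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line));
}

// Read a whole file synchronously and parse it as JSON Lines.
function loadComments(path) {
  return parseJsonLines(fs.readFileSync(path, "utf8"));
}

// Usage (assumed path; adjust to the directory given via --dir):
// const comments = loadComments("./data/comments.json");
// console.log(`${comments.length} records loaded`);
```

The fields available on each parsed object are defined by youtube-comment-scraper, so they are not assumed here. For very large files, a streaming reader such as Node's `readline` module would avoid loading the whole file into memory.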

Run via npm

Prepare

Clone this repository, then install the required modules via npm:

$ git clone https://github.com/itslab-kyushu/youtube-comment-crawler.git
$ cd youtube-comment-crawler
$ npm install

Start

To start the crawling service and store database files in ./data, run:

$ npm start --dir ./data

By default, it crawls the English pages. To crawl pages in another language, pass the language with the --lang option. For example, the following command crawls Japanese pages:

$ npm start --dir ./data --lang JP

Run as a docker container

Youtube Comment Crawler is also provided as a Docker image, itslabq/youtube-comment-crawler. The container stores database files in /data, so do not pass the --dir option.

To run a container with ./data mounted at /data, so that database files are stored in ./data on the host:

$ docker run -d --name crawler -v $(pwd)/data:/data:Z itslabq/youtube-comment-crawler

To crawl pages in another language, pass the language with the --lang option. The following example crawls Japanese pages:

$ docker run -d --name crawler -v $(pwd)/data:/data:Z itslabq/youtube-comment-crawler --lang JP

License

This software is released under the MIT License; see LICENSE.