From bf88d4f6cd5c50f463325f37196df4aa5fc50950 Mon Sep 17 00:00:00 2001
From: Larry Davis
Date: Fri, 13 Mar 2020 20:24:20 -0700
Subject: [PATCH] docs: add details to README, LICENSE

---
 LICENSE   |  9 +++++++++
 README.md | 46 +++++++++++++++++++++++++++++++++++++++++++---
 2 files changed, 52 insertions(+), 3 deletions(-)
 create mode 100644 LICENSE

diff --git a/LICENSE b/LICENSE
new file mode 100644
index 000000000..4f476ffb3
--- /dev/null
+++ b/LICENSE
@@ -0,0 +1,9 @@
+Copyright (c) 2020, Lawrence Davis
+All rights reserved.
+
+Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
+
+1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
+2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
+
+THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
diff --git a/README.md b/README.md
index 4ae9a4246..c5de6abe3 100644
--- a/README.md
+++ b/README.md
@@ -1,10 +1,50 @@
-# corona-scraper
+# coronadatascraper
+> A scraper that pulls coronavirus case data from verified sources.

-Scrape case data from goverment websites.
+## Running the scraper

-## Usage
+Before following these instructions, install [yarn](https://classic.yarnpkg.com/en/docs/install/).

 ```
 yarn install
 yarn start
 ```
+
+## Contributing
+
+Contributions for any place in the world are welcome. Write clean, clear code, and follow the criteria for sources below.
+
+Send a pull request with your scraper, and be sure to run the scraper first with the instructions above to make sure the data is valid.
+
+It's a tough challenge to write scrapers that will work when websites are inevitably updated. Here are some tips:
+
+* Write your scraper so it handles aggregate data with a single scraper entry (i.e. find a table, process the table)
+* Try not to hardcode county or city names; instead, let the data on the page populate them
+* Try to make your scraper less brittle by avoiding generated class names (e.g. CSS modules)
+* When targeting elements, don't assume order will be the same (e.g. if there are multiple `.count` elements, don't assume the second one is deaths; verify it by parsing the label)
+
+## Criteria for sources
+
+Any source added to the scraper must meet the following criteria:
+
+### 1. Sources must be government or health organizations
+
+No news articles, no aggregated sources.
+
+### 2. Sources must provide the number of cases at a bare minimum
+
+Additional data is welcome.
+
+### 3. Presumptive cases are not considered confirmed
+
+As of now, presumptive cases should not be considered.
+
+## License
+
+This project is licensed under the permissive [BSD 2-clause license](LICENSE).
+
+The data produced by this project is public domain.
+
+## Attribution
+
+Please cite this project if you use it in your visualization or reporting.
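
The last tip in the README's Contributing section (verify a count by parsing its label rather than trusting element order) can be sketched roughly as below. This is a minimal illustration, not code from coronadatascraper: it assumes the scraper has already extracted label/value text pairs from the page, and `LABEL_PATTERNS`, `parseNumber`, `parseCounts`, and the sample input are all hypothetical names chosen for the example.

```javascript
// Hypothetical helper: map scraped label/value pairs to named fields by
// matching the label text instead of relying on element position.
const LABEL_PATTERNS = {
  cases: /confirmed|cases/i,
  deaths: /deaths?|fatalit/i,
  recovered: /recover/i
};

function parseNumber(text) {
  // Strip thousands separators and whitespace before parsing.
  const value = parseInt(String(text).replace(/[,\s]/g, ''), 10);
  if (Number.isNaN(value)) {
    throw new Error(`Could not parse a number from "${text}"`);
  }
  return value;
}

function parseCounts(pairs) {
  const result = {};
  for (const { label, value } of pairs) {
    // Criterion 3 in the README: presumptive cases are not counted.
    if (/presumptive/i.test(label)) continue;
    const field = Object.keys(LABEL_PATTERNS).find(key =>
      LABEL_PATTERNS[key].test(label)
    );
    if (field === undefined) continue; // Ignore labels we do not recognize.
    result[field] = parseNumber(value);
  }
  if (result.cases === undefined) {
    // Fail loudly so a changed page layout does not produce silent bad data.
    throw new Error('No case count found; the page layout may have changed');
  }
  return result;
}

// Sample input where deaths appear before cases on the page.
const counts = parseCounts([
  { label: 'Deaths', value: '12' },
  { label: 'Presumptive cases', value: '40' },
  { label: 'Confirmed cases', value: '1,204' }
]);
console.log(counts);
```

Because fields are matched by label, reordering the elements on the source page would not swap cases and deaths, and a page that no longer exposes a recognizable case label raises an error instead of yielding wrong numbers. The label patterns here are deliberately simple and would need tuning for each real source.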