Data Cleaning with OpenRefine for Ecologists Lesson for Data Carpentry
The current version has been tested with OpenRefine 3.7.2 in May 2023.
- This data set is derived from the Portal Project long-term desert ecology data. The data file was downloaded and then modified specifically for use with OpenRefine.
- Taxon names were put back into the file.
- The number of rows was reduced to simplify the reconciliation and URL parsing exercises.
- These modifications were made to illustrate some features of OpenRefine.
- Errors were added to the taxon names (`scientificName` field) to demonstrate OpenRefine's ability to find likely mis-entered data.
  - These errors can be found by running clustering algorithms on the `scientificName` column, which shows how quickly the algorithms surface discrepancies and how simple it is to fix every issue they find.
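The idea behind clustering can be illustrated outside OpenRefine. The Python sketch below approximates the key-collision "fingerprint" method offered in OpenRefine's Cluster & edit dialog: each value is reduced to a normalized key, and values that share a key are grouped as likely variants of one name. This is a simplified reimplementation for illustration, not OpenRefine's own code, and the helper names (`fingerprint`, `cluster`) and sample names are hypothetical; they are not the errors actually seeded in the lesson data file.

```python
import re
import unicodedata
from collections import defaultdict


def fingerprint(value: str) -> str:
    """Approximate a fingerprint key: ignore case, spacing, punctuation and token order."""
    # Fold accented characters to their closest ASCII form.
    value = unicodedata.normalize("NFKD", value).encode("ascii", "ignore").decode("ascii")
    # Trim, lowercase, and drop punctuation.
    value = re.sub(r"[^\w\s]", "", value.strip().lower())
    # Split on whitespace, de-duplicate, and sort the tokens.
    tokens = sorted(set(value.split()))
    return " ".join(tokens)


def cluster(values):
    """Group values sharing a fingerprint; keep only groups with more than one spelling."""
    groups = defaultdict(set)
    for v in values:
        groups[fingerprint(v)].add(v)
    return {key: sorted(members) for key, members in groups.items() if len(members) > 1}


# Hypothetical scientificName entries (not taken from the lesson data file).
names = [
    "Dipodomys merriami",
    "Dipodomys  merriami",
    "dipodomys merriami",
    "Dipodomys merriami.",
    "Chaetodipus penicillatus",
    "Onychomys torridus",
]

for key, members in cluster(names).items():
    print(key, "->", members)
```

A fingerprint key only catches differences in case, whitespace, punctuation, and word order. Genuine misspellings such as a swapped letter need OpenRefine's nearest-neighbor methods (for example, Levenshtein distance), which compare values by edit distance rather than by an exact key.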
We welcome all contributions to improve the lesson! Maintainers will do their best to help you if you have any questions, concerns, or experience any difficulties along the way. We'd like to ask you to familiarize yourself with our Contribution Guide.
Please see the current list of issues for ideas for contributing to this repository. For making your contribution, we use the GitHub flow, which is nicely explained in the chapter Contributing to a Project in Pro Git by Scott Chacon.
Look for the tag . This indicates that the maintainers will welcome a pull request fixing this issue.
- Luis J. Villanueva
- Abigail Cabunoc
- Aleksandra Nenadic
- April M. Wright
- Betty Rozum
- Bill Mills
- Brian Yandell
- C. Titus Brown
- Cam Macdonell
- Dan Mazur
- Debbie Paul
- Erin Becker
- Francois Michonneau
- Gabriel A. Devenyi
- Greg Wilson
- Hilmar Lapp
- Hugo Tavares
- Ian Carroll
- James Allen
- James Mickley
- Jeffrey W. Hollister
- Jon Pipitone
- Jonah Duckles
- Kari L. Jordan
- Lisa Zilinski
- Maxim Belkin
- Michael Hansen
- Nick Young
- Piotr Banaszkiewicz
- Raniere Silva
- Ross Dickson
- Ryan E. Johnson
- Rémi Emonet
- Timothée Poisot
- Tracy Teal
- W. Trevor King
- Zack Brym
- dlstrong
- evanwill
- trelogan
See the Authors page for details.