HerbariaOCR

Smriti Suresh, Dima Kazlouski, Douglas Moy - 2023-10-06 v1.0.0-dev

Note: The file README.md is generated from nbs/index.ipynb. Do not edit README.md directly, but rather edit nbs/index.ipynb. It is also rendered as the front page to the project documentation.

See project documentation for the rendered documentation output.

Overview

Climate change increases the stressors that weaken plant resilience, disrupting forest structure and ecosystem services. Rising temperatures bring more frequent droughts, wildfires, and invasive pest outbreaks, which in turn cause the loss of plant species. The consequences include lowered productivity, the spread of invasive plants, greater vulnerability to pests, and altered ecosystem structure. This project aims to help climate scientists capture patterns in plant life as the climate changes.

Herbarium specimens are pressed plant samples stored on paper. Their labels are handwritten and date back to the early 1900s. Each label records the curator’s name, their institution, the species and genus, and the date the specimen was collected. Because the labels are handwritten, their contents are not machine-readable, so at present the data cannot readily be analyzed to study the impact of climate on plant life.

Digitized specimens are an invaluable source of information for climate change scientists and are providing key insights into biodiversity change over the last century. Digitizing the labels will make this information easier to disseminate and give more people access to the data. If successful, the project would enable researchers across environmental science to further study climate change and its effects on flora and even fauna.

Install

pip install HerbariaOCR

How to use

Refer to the EDA tab to explore the Herbaria OCR data sources.

Refer to the Azure Vision tab to explore the implementation of the OCR pipeline and obtain results from the models.

Refer to the LLM Evaluations tabs for the evaluation metrics used and the accuracy results for English, Cyrillic, and Chinese samples.
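
The project's actual evaluation metrics are documented in the LLM Evaluations tabs. As a rough illustration only, the sketch below shows one way a per-field accuracy could be computed for the label fields named in the Overview. Everything in it is an assumption rather than part of the HerbariaOCR package: the `LabelRecord` class, the `normalize` and `field_accuracy` helpers, the exact-match criterion, and the sample values are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, fields


@dataclass
class LabelRecord:
    """Hypothetical container for the label fields listed in the Overview."""
    curator: str
    institution: str
    species: str
    genus: str
    collection_date: str


def normalize(text: str) -> str:
    """Case-fold and collapse whitespace so trivial differences do not count as errors."""
    return " ".join(text.casefold().split())


def field_accuracy(predicted: list[LabelRecord], truth: list[LabelRecord]) -> dict[str, float]:
    """Return the fraction of records whose value matches exactly, per field (assumed metric)."""
    if len(predicted) != len(truth):
        raise ValueError("predicted and truth must have the same length")
    if not truth:
        raise ValueError("at least one record is required")
    scores = {}
    for f in fields(LabelRecord):
        matches = sum(
            normalize(getattr(p, f.name)) == normalize(getattr(t, f.name))
            for p, t in zip(predicted, truth)
        )
        scores[f.name] = matches / len(truth)
    return scores


if __name__ == "__main__":
    # Made-up sample values, for illustration only.
    truth = [LabelRecord("J. Doe", "Example Herbarium", "alba", "Quercus", "1902-05-14")]
    predicted = [LabelRecord("j. doe", "Example Herbarium", "alba", "Quercus", "1902-05-14")]
    print(field_accuracy(predicted, truth))
```

Exact match after normalization is a deliberately strict choice. A character-level or fuzzy comparison may suit handwritten labels better, and nothing here implies which approach the project uses.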

Contributing

If you want to contribute to this project, there are primarily two ways:

  1. Contribute bug reports and feature requests by submitting issues to the GitHub repo.
  2. If you want to create pull requests with code changes, read the contributing guide on the GitHub repo.