ZNO-Dataset is a repository containing scripts, notebooks, and documentation for processing and analyzing Ukrainian standardized testing data. The project covers data loading, cleaning, transformation into normalized tables, database setup using Docker, and analysis of External Independent Evaluation (EIE, known in Ukrainian as ZNO) exams.
- `data_loader/`
  Contains raw data directories (2016–2024) and the main script for loading data: `load_zno.py`.
- `db_info/`
  Contains database-related files, including the PostgreSQL schema (`db_schema.sql`) and a Jupyter notebook to test the connection (`test_connection.ipynb`). See also the database setup guide in `db_info/README.md`.
- `notebooks/`
  Includes Jupyter notebooks for data analysis and table creation. For more details, see the tables creation README.
- `datasheet.md`
  A comprehensive datasheet outlining the dataset's motivation, composition, collection process, and maintenance. Refer to `datasheet.md` for details.
- `Dockerfile`
  Used to build a Docker image for setting up the PostgreSQL database.
- Additional directories include logs, images, and supporting files.
- Clone the Repository:
  ```bash
  git clone https://github.com/<username>/ZNO-Dataset.git
  cd ZNO-Dataset
  ```
- Obtain the Data:
  You have two options:
  - Option 1 (Recommended): Load Pre-Cleaned Data
    Download the pre-cleaned data from the Hugging Face Hub. Load example:
    ```python
    from datasets import load_dataset

    # Load the pre-cleaned data from the Hugging Face Hub
    dataset = load_dataset('DSRL/student')
    # Preview the first rows of the train split as a pandas DataFrame
    dataset['train'].to_pandas().head()
    ```
  - Option 2: Load and Process Data Manually
    Run the data loader script to process raw data:
    ```bash
    python data_loader/load_zno.py
    ```
    Then, run all notebooks in the `notebooks/tables_creation` folder.
- Set Up the Database Using Docker:
  Follow the instructions in `db_info/README.md` or run:
  ```bash
  docker build -t my_db .
  docker run -p 5432:5432 my_db
  # In a separate terminal, once the container is running:
  psql --host=127.0.0.1 --port=5432 --username=myuser --dbname=EIE
  ```
- Explore the Notebooks:
  Open the Jupyter notebooks in the `notebooks` directory for data analysis.
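For analysis in a notebook, it can help to connect to the Dockerized database from Python rather than through `psql`. The following is a minimal sketch, not part of the repository: `build_postgres_url` and `connect` are hypothetical helpers, the host, port, username, and database name are taken from the `psql` command above, and the use of `psycopg2` and of a password is an assumption (check `db_info/README.md` for the actual credentials).

```python
from urllib.parse import quote


def build_postgres_url(user, host="127.0.0.1", port=5432, dbname="EIE", password=None):
    """Build a libpq-style connection URI (hypothetical helper, not part of the repository)."""
    auth = quote(user, safe="")
    if password is not None:
        # Percent-encode the password so characters like '@' or '/' do not break the URI
        auth += ":" + quote(password, safe="")
    return f"postgresql://{auth}@{host}:{port}/{quote(dbname, safe='')}"


def connect(url):
    """Open a connection with psycopg2 (assumption: psycopg2 is installed)."""
    import psycopg2  # imported lazily so the URL helper works without the driver

    return psycopg2.connect(url)


if __name__ == "__main__":
    # Defaults mirror the psql command above; "myuser" is the username shown there
    print(build_postgres_url("myuser"))
```

Keeping the URL construction separate from the connection call makes the settings easy to check without a running database, and the same URI can be handed to other tools that accept libpq-style connection strings.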
For detailed information on the dataset’s composition, collection, preprocessing, and usage, refer to `datasheet.md`.
If you need any help, please contact us at [email protected] or open an issue on GitHub.
This project is licensed under the MIT License.