Lacuna Project Data Engineering and Data Pipelines

Overview

This repository is dedicated to managing the Lacuna project's data engineering process and data pipelines. The project aims to build a comprehensive machine-learning dataset on "Household Electricity Consumption" in Sri Lanka. The data is stored in our manifold machine within the directory LIRNEasia/projects/lacuna/data/raw. Access to this data is restricted to users within the LIRNEasia unix user group, which includes all members of our DAP team.

Data Storage

The raw data for the Lacuna project is stored in the manifold machine at the following location:

LIRNEasia/projects/lacuna/data/raw

Access to this directory is limited to users within the LIRNEasia unix user group.

This is an overview of the data files located in the "raw" folder.

1. LA_lacuna_sample_18092023.xlsx, LA_lacuna_sample.csv

Description: These files contain details obtained from LECO regarding approximately 8000 households, which were utilized to draw a sample of 4500 households for the Lacuna project survey. Both files contain identical data, differing only in file format.
Data Provider: LECO

2. LA_LECO_Main_Member_Combined 12122023.csv

Description: This file contains survey data related to social safety net projects.
Data Provider: SRL, LA_core_stat_team

3. LA_social_safety_nets_sample.csv

Description: Contains electricity consumption data about social safety nets projects.
Data Provider: LECO

4. LOAD_PROFILE_HISTORICAL_READINGS_INVENTORY_20_12_2023 - 50 meters (1).csv, LOAD_PROFILE_HISTORICAL_READINGS_INVENTORY_20_12_2023 - 50 meters.csv

Description: These files contain the electricity consumption records of 100 smart-metered households. These households are part of the initial 8000 household sample used to draw the 4500 household sample for the study.
Data Provider: LECO

5. Variable properties.xlsx

Description: This file serves as the data dictionary for the survey dataset of the social safety nets project.

Database

We utilize a PostgreSQL server to store the data in SQL tables. This allows for efficient querying and manipulation of the dataset. For database management and interaction, we recommend using DBeaver, a database tool for developers, administrators, and analysts. DBeaver provides an intuitive interface for managing PostgreSQL databases and executing SQL queries.

Data Pipelines

To streamline and automate the process of getting data from raw files to the PostgreSQL server, we use the pipeline.sh script provided in this repository. The pipeline.sh script has the necessary steps to extract data from raw files, transform it as needed, and load it into the PostgreSQL database.

Getting Started

To begin using the data engineering process and data pipelines, follow these steps:

Ensure that you have access to the LIRNEasia unix user group to access the raw data directory.
Log into the PostgreSQL server, where the database is hosted.
Install a database client on your local machine to connect to the server. EX: DBeaver, but you can use others, like SQuirreL, or DBVisualizer. When connecting, remember that the user is "Postgres" and get the password from DAP members.
The files stored in this repository will be in the
```
/lirneasia/projects/lacuna/data
```
directory
Customize the pipeline.sh script to match data processing requirements.
Execute the pipeline.sh script to start the data pipeline process.

Contributors

@xuxoramos
@KasunAmare
@merlsyrex
@chanukaravishan

Name		Name	Last commit message	Last commit date
Latest commit History 21 Commits
.venv		.venv
code		code
notebooks		notebooks
.gitignore		.gitignore
FlowDiagram.png		FlowDiagram.png
README.md		README.md
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Lacuna Project Data Engineering and Data Pipelines

Overview

Data Storage

1. LA_lacuna_sample_18092023.xlsx, LA_lacuna_sample.csv

2. LA_LECO_Main_Member_Combined 12122023.csv

3. LA_social_safety_nets_sample.csv

4. LOAD_PROFILE_HISTORICAL_READINGS_INVENTORY_20_12_2023 - 50 meters (1).csv, LOAD_PROFILE_HISTORICAL_READINGS_INVENTORY_20_12_2023 - 50 meters.csv

5. Variable properties.xlsx

Database

Data Pipelines

Getting Started

Contributors

About

Releases

Packages

Contributors 2

Languages

LIRNEasia/lacuna

Folders and files

Latest commit

History

Repository files navigation

Lacuna Project Data Engineering and Data Pipelines

Overview

Data Storage

1. LA_lacuna_sample_18092023.xlsx, LA_lacuna_sample.csv

2. LA_LECO_Main_Member_Combined 12122023.csv

3. LA_social_safety_nets_sample.csv

4. LOAD_PROFILE_HISTORICAL_READINGS_INVENTORY_20_12_2023 - 50 meters (1).csv, LOAD_PROFILE_HISTORICAL_READINGS_INVENTORY_20_12_2023 - 50 meters.csv

5. Variable properties.xlsx

Database

Data Pipelines

Getting Started

Contributors

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Languages

Packages