This program was created for the "Team project - development of data analysis system" course run by @pbiecek.
The program generates a summary report of a given database. Depending on the selected format, it extracts basic or extended information such as:
- table, column names
- data stats (top values, mean, quartiles etc.)
- types of data and number of rows
The generated report has a search feature that lets you find tables or columns by name and by exact value.
The full specification and a more detailed description of the summarization features (in Polish) can be found in this file.
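To make the listed statistics concrete, here is a minimal Python sketch of a per-column summary (row count, distinct count, top values, mean, quartiles). It is illustrative only and is not taken from the dbexplorer sources, which may well compute these values in SQL on the database side. The function name `summarize_column` and the keys of the returned dictionary are hypothetical.

```python
from collections import Counter
import statistics


def summarize_column(values, top_number=5):
    """Summarise one column given as a list of Python values (hypothetical helper)."""
    non_null = [v for v in values if v is not None]
    summary = {
        "rows": len(values),
        "nulls": len(values) - len(non_null),
        "distinct": len(set(non_null)),
        # Most frequent values together with their counts.
        "top_values": Counter(non_null).most_common(top_number),
    }
    # Mean and quartiles only make sense for purely numeric columns.
    numeric = [
        v for v in non_null
        if isinstance(v, (int, float)) and not isinstance(v, bool)
    ]
    if len(numeric) >= 2 and len(numeric) == len(non_null):
        summary["mean"] = statistics.mean(numeric)
        # Three cut points: first quartile, median, third quartile.
        summary["quartiles"] = statistics.quantiles(numeric, n=4)
    return summary


example = summarize_column([1, 2, 2, 3, None, 4], top_number=1)
print(example)
```

Text columns get only the counts and top values in this sketch; the numeric statistics are skipped for them.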
- Clone repository:
git clone https://github.com/mi2-warsaw/dbexplorer.git
- Change to directory:
cd dbexplorer
- Install package:
pip install .
- Run the program with the appropriate arguments:
dbexplorer -s 192.2.3.4 -p 5432 -n dvdrental -u dbadmin -pass password -t postgres -o out.html
The program can be run from the command line with the following arguments:
- -e (--extended) — generating report in extended format (default: basic format), parameterless,
- -s (--server) — address of the database host,
- -p (--port) — port of the database host,
- -n (--database_name) — name of the database,
- -u (--user) — user name on behalf of whom the extraction will be done,
- -pass (--password) — password for the user name,
- -t (--database_type) — type of database (currently supported: Redshift, Postgres, MySQL and Teradata),
- -o (--output) — output file path,
- -sc (--schema) — schema name (Postgres only, default: public),
- -d (--odbc_driver) — ODBC driver name for the Teradata connection (Teradata only),
- -top (--top_number) — number of most frequent values to report (default: 5),
- -m (--max_text_length) — maximum length of text in a column for which top values and distinct count are still summarised (default: 100).
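The argument list above can be mirrored with Python's standard `argparse` module. The sketch below is an assumption about how such an interface could be declared, not the program's actual source. The option names and the defaults (`public`, `5`, `100`) come from the list above, while the `int` types and the function name `build_parser` are assumptions made for illustration.

```python
import argparse


def build_parser():
    """Declare the documented options (hypothetical reconstruction)."""
    parser = argparse.ArgumentParser(prog="dbexplorer")
    # Parameterless switch: present means extended report, absent means basic.
    parser.add_argument("-e", "--extended", action="store_true")
    parser.add_argument("-s", "--server")
    parser.add_argument("-p", "--port", type=int)  # int type is an assumption
    parser.add_argument("-n", "--database_name")
    parser.add_argument("-u", "--user")
    parser.add_argument("-pass", "--password")
    parser.add_argument("-t", "--database_type")
    parser.add_argument("-o", "--output")
    parser.add_argument("-sc", "--schema", default="public")
    parser.add_argument("-d", "--odbc_driver")
    parser.add_argument("-top", "--top_number", type=int, default=5)
    parser.add_argument("-m", "--max_text_length", type=int, default=100)
    return parser


args = build_parser().parse_args(
    ["-s", "192.2.3.4", "-p", "5432", "-n", "dvdrental", "-u", "dbadmin",
     "-pass", "password", "-t", "postgres", "-o", "out.html"]
)
print(args.database_type, args.schema, args.top_number, args.extended)
```

Options that are not passed fall back to the documented defaults, so the schema is `public` and the top-value count is 5 in this call.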
- Postgres:
dbexplorer -s 192.2.3.4 -p 5432 -n dvdrental -u dbadmin -pass password -t postgres -o out.html
- Teradata:
dbexplorer -e -t teradata -s 192.168.44.128 -u dbc -n sample1 -pass dbc -o test.html -d 'Teradata Database ODBC Driver 16.20'
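When the command is assembled from a script, the ODBC driver name needs care because it contains spaces and must reach the program as a single argument. The following sketch builds the argument list in Python and prints a shell-safe command line using the standard `shlex` module. The helper `build_command` is hypothetical, and the values are the ones from the Teradata example above.

```python
import shlex


def build_command(options, flags=()):
    """Return an argv list for a dbexplorer call (hypothetical helper)."""
    argv = ["dbexplorer", *flags]
    for flag, value in options.items():
        argv += [flag, str(value)]
    return argv


teradata_argv = build_command(
    {
        "-t": "teradata",
        "-s": "192.168.44.128",
        "-u": "dbc",
        "-n": "sample1",
        "-pass": "dbc",
        "-o": "test.html",
        # Contains spaces, so it must stay one list element.
        "-d": "Teradata Database ODBC Driver 16.20",
    },
    flags=["-e"],
)
# shlex.join quotes only the elements that need it.
print(shlex.join(teradata_argv))
```

Passing `teradata_argv` as a list to `subprocess.run` would avoid shell quoting altogether. The printed form is mainly useful for logging or for pasting into a terminal.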
Examples of generated reports can be found here.
Live example is here
- Karol Prusinowski
- Paweł Pollak
- Karol Szczawiński