add WriteFile command #89
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,3 +1,9 @@ | ||
[build-system] | ||
requires = ["setuptools >= 40.6.0", "wheel >= 0.31"] | ||
build-backend = "setuptools.build_meta" | ||
|
||
[tool.pytest.ini_options] | ||
markers = [ | ||
"postgres_db", | ||
"mssql_db", | ||
] |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -31,3 +31,4 @@ test = | |
pytest-docker | ||
pytest-dependency | ||
mara_app>=1.5.2 | ||
mara-db[postgres,mssql] |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,8 @@ | ||
Test notes | ||
========== | ||
|
||
There are several types of tests: | ||
* tests run without docker | ||
* tests run with docker | ||
|
||
The tests running in docker are marked with their execution setup. For example, the mark `postgres_db` is used for a setup where PostgreSQL is the data warehouse database, and `mssql_db` for a setup where SQL Server is the data warehouse database. Docker tests are executed sequentially, because otherwise they would override each other's mara configuration. |
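As a rough sketch of how these markers can be used to run one setup at a time, the helper below builds the argument list for pytest's standard `-m` marker-expression option. The name `marker_args` is hypothetical and not part of this PR.

```python
from typing import List


def marker_args(*markers: str) -> List[str]:
    """Build pytest CLI arguments that select tests carrying any of the given markers."""
    if not markers:
        # No marker given: no filter, so pytest would collect every test.
        return []
    # "a or b" is a marker expression matching tests marked with a or with b.
    return ["-m", " or ".join(markers)]
```

Assuming pytest is installed, `pytest.main(marker_args("postgres_db"))` would then run only the PostgreSQL setup, and a second call with `"mssql_db"` would run the SQL Server setup afterwards.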
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,24 @@ | ||
import sqlalchemy | ||
from mara_db import dbs | ||
|
||
|
||
def db_is_responsive(db: dbs.DB) -> bool: | ||
"""Returns True when the DB is available on the given port, otherwise False""" | ||
engine = sqlalchemy.create_engine(db.sqlalchemy_url, pool_pre_ping=True) | ||
|
||
try: | ||
with engine.connect(): | ||
return True | ||
except Exception: | ||
return False | ||
|
||
|
||
def db_replace_placeholders(db: dbs.DB, docker_ip: str, docker_port: int, database: str = None) -> dbs.DB: | ||
"""Replaces the internal placeholders with the docker ip and docker port""" | ||
if db.host == 'DOCKER_IP': | ||
db.host = docker_ip | ||
if db.port == -1: | ||
db.port = docker_port | ||
if database: | ||
db.database = database | ||
return db |
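Note that `db_replace_placeholders` modifies the object it receives, so two calls with the same module-level constant return the same instance. A minimal sketch of a non-mutating variant, assuming a shallow copy is sufficient for the DB object, could look like this. `FakeDB` and `replace_placeholders_copy` are hypothetical names used only for illustration; `FakeDB` stands in for a `dbs.DB` object.

```python
import copy
from typing import Optional


class FakeDB:
    """Hypothetical stand-in for a mara_db `dbs.DB` object (host, port, database only)."""

    def __init__(self, host: str, port: int, database: str):
        self.host = host
        self.port = port
        self.database = database


def replace_placeholders_copy(db, docker_ip: str, docker_port: int,
                              database: Optional[str] = None):
    """Return a shallow copy of `db` with the placeholders filled in; `db` is left untouched."""
    result = copy.copy(db)
    if result.host == 'DOCKER_IP':
        result.host = docker_ip
    if result.port == -1:
        result.port = docker_port
    if database:
        result.database = database
    return result


# The template keeps its placeholders, so it can be reused for several databases.
template = FakeDB(host='DOCKER_IP', port=-1, database='master')
master = replace_placeholders_copy(template, '127.0.0.1', 1433)
dwh = replace_placeholders_copy(template, '127.0.0.1', 1433, database='dwh')
```

With this variant, `master` and `dwh` are independent objects, which avoids one configuration silently changing when the other is created.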
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,10 @@ | ||
# This file contains secrets used by the tests | ||
|
||
from mara_db import dbs | ||
|
||
# supported placeholders | ||
# host='DOCKER_IP' will be replaced with the ip address given from pytest-docker | ||
# port=-1 will be replaced with the port given from pytest-docker | ||
|
||
POSTGRES_DB = dbs.PostgreSQLDB(host='DOCKER_IP', port=-1, user="mara", password="mara", database="mara") | ||
Why not pick that from an env var and add a fixture to set the env var? Or just create this in the fixture, where you have all the information (and return the DB)?
Yee... I am not 100% sure about that. This is actually a copy from the test suite of the The main purpose of defining it in a separate config file is that you have the option to simply activate / disable the tests for specific database engines. This is especially handy for tests against cloud services. I don't want to share the credentials for my cloud with everybody, but at the same time I want to share the option with those who want to test their changes against the cloud ;-) Removing this and integrating it into the fixture makes sense for now, but maybe not in the future... |
||
MSSQL_SQLCMD_DB = dbs.SqlcmdSQLServerDB(host='DOCKER_IP', port=-1, user='sa', password='YourStrong@Passw0rd', database='master', trust_server_certificate=True) |
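The review suggestion of reading the connection settings from environment variables could be sketched roughly as follows. The variable names (`MARA_TEST_POSTGRES_HOST` and so on) and the function name are assumptions made for this example; they are not defined anywhere in this PR.

```python
import os
from typing import Dict, Mapping, Optional


def postgres_settings_from_env(env: Optional[Mapping[str, str]] = None) -> Optional[Dict[str, object]]:
    """Read PostgreSQL test settings from environment variables.

    Returns None when MARA_TEST_POSTGRES_HOST (a hypothetical name) is unset,
    so a test module can skip itself instead of failing.
    """
    env = os.environ if env is None else env
    host = env.get("MARA_TEST_POSTGRES_HOST")
    if not host:
        return None
    return {
        "host": host,
        "port": int(env.get("MARA_TEST_POSTGRES_PORT", "5432")),
        "user": env.get("MARA_TEST_POSTGRES_USER", "mara"),
        "password": env.get("MARA_TEST_POSTGRES_PASSWORD", "mara"),
        "database": env.get("MARA_TEST_POSTGRES_DATABASE", "mara"),
    }
```

The returned keys match the keyword arguments used for `dbs.PostgreSQLDB` above, so the result could be passed on as `dbs.PostgreSQLDB(**settings)`. This keeps cloud credentials out of the repository while still letting contributors enable or disable an engine per machine.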
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,10 @@ | ||
1,Elinor Meklit | ||
2,Triana Mahalah | ||
3,Eugraphios Esmae | ||
4,Agustín Alvilda | ||
5,Behruz Hathor | ||
6,Mathilde Tola | ||
7,Kapel Tupaq | ||
8,Shet Badulf | ||
9,Ruslan Vančo | ||
10,Madhavi Traian |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,5 @@ | ||
CREATE TABLE names | ||
( | ||
id INT, | ||
name nvarchar(max) | ||
); |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
DROP TABLE IF EXISTS names; |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,137 @@ | ||
import pathlib | ||
import pytest | ||
from typing import Tuple, Iterator | ||
|
||
from mara_app.monkey_patch import patch | ||
from mara_db import dbs, formats | ||
from mara_pipelines.commands.sql import ExecuteSQL | ||
from mara_pipelines.commands.files import ReadFile, Compression | ||
|
||
from tests.command_helper import run_command | ||
from tests.db_test_helper import db_is_responsive, db_replace_placeholders | ||
from tests.local_config import POSTGRES_DB, MSSQL_SQLCMD_DB | ||
|
||
import mara_pipelines.config | ||
patch(mara_pipelines.config.data_dir)(lambda: pathlib.Path(__file__).parent) | ||
|
||
FILE_PATH = pathlib.Path(__file__).parent | ||
|
||
|
||
if not POSTGRES_DB: | ||
pytest.skip("skipping MSSQL tests: variable POSTGRES_DB not set", allow_module_level=True) | ||
if not MSSQL_SQLCMD_DB: | ||
pytest.skip("skipping MSSQL tests: variable MSSQL_SQLCMD_DB not set", allow_module_level=True) | ||
|
||
|
||
@pytest.fixture(scope="session") | ||
def mssql_db(docker_ip, docker_services) -> dbs.DB: | ||
"""Ensures that MSSQL server is running on docker.""" | ||
|
||
postgres_docker_port = docker_services.port_for("postgres", 5432) | ||
_mara_db = db_replace_placeholders(POSTGRES_DB, docker_ip, postgres_docker_port) | ||
|
||
# here we need to wait until the PostgreSQL port is available. | ||
docker_services.wait_until_responsive( | ||
timeout=30.0, pause=0.1, check=lambda: db_is_responsive(_mara_db) | ||
) | ||
|
||
mssql_docker_port = docker_services.port_for("mssql", 1433) | ||
master_db = db_replace_placeholders(MSSQL_SQLCMD_DB, docker_ip, mssql_docker_port) | ||
|
||
# here we need to wait until the MSSQL port is available. | ||
docker_services.wait_until_responsive( | ||
timeout=30.0, pause=0.1, check=lambda: db_is_responsive(master_db) | ||
) | ||
|
||
# create the dwh database | ||
conn = None | ||
cur = None | ||
try: | ||
conn = dbs.connect(master_db) # dbs.cursor_context cannot be used here because | ||
# CREATE DATABASE cannot run inside a | ||
# transaction block | ||
try: | ||
cur = conn.cursor() | ||
conn.autocommit = True | ||
cur.execute('CREATE DATABASE [dwh]') | ||
finally: | ||
if cur: | ||
cur.close() | ||
finally: | ||
if conn: | ||
conn.close() | ||
|
||
dwh_db = db_replace_placeholders(MSSQL_SQLCMD_DB, docker_ip, mssql_docker_port, database='dwh') | ||
|
||
import mara_db.config | ||
patch(mara_db.config.databases)(lambda: { | ||
'mara': _mara_db, | ||
'dwh': dwh_db | ||
}) | ||
patch(mara_pipelines.config.default_db_alias)(lambda: 'dwh') | ||
|
||
return dwh_db | ||
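The fixture above waits for each database with `docker_services.wait_until_responsive(timeout=..., pause=..., check=...)`. The general retry-until-deadline pattern behind such a call can be sketched as follows. This is not pytest-docker's implementation; `wait_until` is a hypothetical helper written only for illustration.

```python
import time
from typing import Callable


def wait_until(check: Callable[[], bool], timeout: float, pause: float) -> None:
    """Call `check` repeatedly until it returns True or `timeout` seconds have elapsed."""
    deadline = time.monotonic() + timeout
    while True:
        if check():
            return
        if time.monotonic() >= deadline:
            raise TimeoutError(f"check did not succeed within {timeout} seconds")
        # Sleep between attempts so the service is not hammered while it starts up.
        time.sleep(pause)
```

A short `pause` keeps the test start-up fast once the container is ready, while `timeout` bounds how long a broken container can block the session.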
|
||
|
||
@pytest.fixture | ||
def names_table(mssql_db) -> Iterator[str]: | ||
""" | ||
Provides a 'names' table for tests. | ||
""" | ||
ddl_file_path = str((pathlib.Path(__file__).parent / 'names_dll_create.sql').absolute()) | ||
assert run_command( | ||
ExecuteSQL(sql_file_name=ddl_file_path), | ||
|
||
base_path=FILE_PATH | ||
) | ||
|
||
yield "names" | ||
|
||
ddl_file_path = str((pathlib.Path(__file__).parent / 'names_dll_drop.sql').absolute()) | ||
assert run_command( | ||
ExecuteSQL(sql_file_name=ddl_file_path), | ||
|
||
base_path=FILE_PATH | ||
) | ||
|
||
|
||
@pytest.mark.mssql_db | ||
def test_read_file(names_table): | ||
"""Tests command ReadFile""" | ||
assert run_command( | ||
ReadFile(file_name='names.csv', | ||
compression=Compression.NONE, | ||
target_table=names_table, | ||
file_format=formats.CsvFormat()), | ||
|
||
base_path=FILE_PATH | ||
) | ||
|
||
with dbs.cursor_context('dwh') as cur: | ||
try: | ||
result = cur.execute(f'SELECT COUNT(*) FROM "{names_table}";') | ||
assert result.fetchone()[0] == 10 | ||
|
||
finally: | ||
cur.execute(f'DELETE FROM "{names_table}";') | ||
|
||
|
||
@pytest.mark.mssql_db | ||
def test_read_file_old_parameters(names_table): | ||
"""Tests command ReadFile with the old csv_format parameter""" | ||
assert run_command( | ||
ReadFile(file_name='names.csv', | ||
compression=Compression.NONE, | ||
target_table=names_table, | ||
csv_format=True), | ||
|
||
base_path=FILE_PATH | ||
) | ||
|
||
with dbs.cursor_context('dwh') as cur: | ||
try: | ||
result = cur.execute(f'SELECT COUNT(*) FROM "{names_table}";') | ||
assert result.fetchone()[0] == 10 | ||
|
||
finally: | ||
cur.execute(f'DELETE FROM "{names_table}";') |
... on the shell on the machine which runs mara (e.g. in the docker container!).

That is true! I am not happy that everything in Mara runs through a shell command. For example, when I run mara in a Jupyter Notebook, it is pretty stupid that I have to run commands to make sure that the shell tools are installed. It would be much smarter to use SQLAlchemy or a DB API instead. I already had in mind to add a `SqlExecutionContext` which doesn't use the shell but the DB API or SQLAlchemy, but ... I didn't really have a use case where I needed it...
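To illustrate the idea of running SQL through the DB API instead of a shell command, here is a minimal sketch using the standard library's `sqlite3` module. It is not mara's API and not the proposed `SqlExecutionContext`; `execute_statements` is a hypothetical helper, and SQLite merely stands in for a real warehouse database.

```python
import sqlite3
from typing import Iterable


def execute_statements(connection: sqlite3.Connection, statements: Iterable[str]) -> None:
    """Run each SQL statement through a DB API cursor, committing once at the end."""
    cursor = connection.cursor()
    try:
        for statement in statements:
            cursor.execute(statement)
        connection.commit()
    except Exception:
        # Undo any pending changes of the failed batch before re-raising.
        connection.rollback()
        raise
    finally:
        cursor.close()


# No shell tooling is needed; everything happens inside the Python process.
connection = sqlite3.connect(":memory:")
execute_statements(connection, [
    "CREATE TABLE names (id INT, name TEXT)",
    "INSERT INTO names VALUES (1, 'Elinor Meklit')",
    "INSERT INTO names VALUES (2, 'Triana Mahalah')",
])
row_count = connection.execute("SELECT COUNT(*) FROM names").fetchone()[0]
```

Because nothing is spawned in a shell, a sketch like this would also work in environments such as a Jupyter Notebook where the database command-line clients are not installed.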