
Executors for Database read / write #789

Open
accforgithubtest opened this issue Jan 13, 2025 · 9 comments

Comments

@accforgithubtest

Hello, I just discovered dagu, and it seems like a neat, lightweight tool - thank you for creating it!

Is there any way for me to save the results of an http executor to a postgres / mariadb database?
I have seen examples using the http executor, but there are no examples of saving any data to a database.

Is this currently possible? If yes, may I please request an example using postgres / mariadb in the docs?
Cheers!

@yohamta
Collaborator

yohamta commented Jan 13, 2025

Hi, thanks for your interest and kind words!

While there isn't a direct function to interact with the database, there are a few ways to accomplish what you want to do.

Here are some example DAGs:

1. Run a Python script

steps:
  - name: make_http_request
    executor:
      type: http
      config:
        timeout: 10
        silent: true
    command: GET https://api.example.com/data
    output: HTTP_RESULT

  - name: save_to_database
    command: python
    depends: make_http_request
    script: |
      import json
      import os
      import psycopg2

      # HTTP_RESULT from the previous step is exposed as an env variable
      data = json.loads(os.environ["HTTP_RESULT"])

      # psycopg2 reads connection details from the standard PG* env vars
      with psycopg2.connect() as conn, conn.cursor() as cur:
          cur.execute("INSERT INTO some_table (response_data) VALUES (%s)",
                      (json.dumps(data),))

2. Use the Docker executor to run database operations

steps:
  - name: make_http_request
    executor:
      type: http
      config:
        silent: true
    command: GET https://api.example.com/data
    output: HTTP_RESULT

  - name: save_to_database
    depends: make_http_request
    executor:
      type: docker
      config:
        image: postgres:17-alpine
        autoRemove: true
        container:
          env:
            - PGHOST=$DB_HOST
            - PGPORT=$DB_PORT
            - PGDATABASE=$DB_NAME
            - PGUSER=$DB_USER
            - PGPASSWORD=$DB_PASS
    command: psql -c "INSERT INTO some_table (response_data) VALUES ('$HTTP_RESULT');"
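
By the way, if psql is available on the host, the Docker executor can be skipped and a plain command step used instead. A minimal sketch, assuming a DATABASE_URL env var is defined for the DAG:

env:
  - DATABASE_URL: "postgres://user:password@host:port/dbname"

steps:
  - name: save_to_database
    depends: make_http_request
    command: psql "$DATABASE_URL" -c "INSERT INTO some_table (response_data) VALUES ('$HTTP_RESULT');"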

Hope that helps!

@accforgithubtest
Author

accforgithubtest commented Jan 13, 2025

Thank you for the reply @yohamta! I guess writing a few lines of Python code is a way of trying this out.

Is a proper database executor on the roadmap? Can I make this a feature request for a database executor, please?

Ideally, a named database connection could be created via environment variables in docker compose, with an executor that can do something like this:

in docker-compose:

services:
...
  environment:
    - DATABASE_TESTDBNAME1=postgres://user:password@host:port
    - DATABASE_TESTDBNAME2=mariadb://user:password@host:port

and in the DAG:
- name: execute_sql_1
  depends: make_http_request
  executor:
    type: sql
    config:
      connection: DATABASE_TESTDBNAME1
      sql: |
        insert into table ..... 
- name: execute_sql_2
  depends: make_http_request
  executor:
    type: sql
    config:
      connection: DATABASE_TESTDBNAME2
      sql: |
        select * from ......

@yohamta
Collaborator

yohamta commented Jan 13, 2025

This looks useful, so I would be grateful if you could write up a feature request.
While reading environment variables from docker-compose.yaml is convenient, it could potentially be a footgun.

Would it be acceptable to define commonly used environment variables in ~/.config/dagu/base.yaml like this?

env:
  - DATABASE_TESTDBNAME1: "postgres://user:password@host:port"
  - DATABASE_TESTDBNAME2: "mariadb://user:password@host:port"

Also, regarding the SQL execution executor, what are your thoughts on this format?

- name: execute_sql
  depends: make_http_request
  executor:
    type: sql
    conn: ${DATABASE_TESTDBNAME1}
  script: |
    insert into table ...

I would appreciate your feedback on these suggestions! I'm also open to alternative ideas or recommendations. :)

@accforgithubtest
Author

accforgithubtest commented Jan 13, 2025

Thanks for the reply @yohamta. Happy to write up a feature request for this.

Your suggested format for the sql executor looks great, as does the suggestion for database configuration in ~/.config/dagu/base.yaml. They are both good to go.

I have a question: what would passing output to another step look like for sql? Would the results be in a json / csv format?
Can we do multiple SQL steps one after another, chained with a depends condition like this?

- name: execute_sql_1
  depends: make_http_request
  executor:
    type: sql
    conn: ${DATABASE_TESTDBNAME1}
  script: |
    select * from .... where id in (<data from http request>)
  output: SQL_OUT_1

- name: execute_sql_2
  depends: execute_sql_1
  executor:
    type: sql
    # conn: ${DATABASE_TESTDBNAME1} ## NO DB CONNECTION since this sql runs on the output from previous step
  script: |
    select * from (SQL_OUT_1) [where <some conditions on data here>] [group by ...] [order by ...]
  output: SQL_OUT_2

This is just an example to see how a SQL chaining feature, if possible, would work in dagu.
Perhaps this is not a necessary feature initially, as multiple SQL statements could be written so that everything runs in a single step. I am not sure about this yet, so I want to know what your thoughts are.

@yohamta
Collaborator

yohamta commented Jan 13, 2025

Great point. I was thinking we could introduce a format field for the sql step output:

- name: execute_sql_1
  depends: make_http_request
  executor:
    type: sql
    config:
      driver: postgres
      dsn: ${DATABASE_TESTDBNAME1}
      format: json # options: json, csv, or raw
  script: |
    select * from .... where id in (<data from http request>)
  output: SQL_OUT_1

(sorry! I've updated the config format a bit)

This is just an example to see how a SQL chaining feature, if possible, would work in dagu.

Yes, that's a really interesting idea! While I think the select * from (SQL_OUT_1) approach is useful and feasible, we'd need to figure out how to handle SQL properly. We could use SQLite under the hood or possibly integrate with dbt? Just thinking out loud here, I'd love to hear your thoughts on this.
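
To make the SQLite idea a little more concrete, here is a rough sketch of what the executor could do internally for a step that has no connection of its own (purely illustrative, nothing here is implemented):

import json
import sqlite3

# SQL_OUT_1 would be the JSON output of the previous sql step,
# e.g. a list of row objects
rows = json.loads('[{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]')

# load the rows into a temporary in-memory table named after the variable
conn = sqlite3.connect(":memory:")
cols = list(rows[0].keys())
conn.execute(f"CREATE TABLE SQL_OUT_1 ({', '.join(cols)})")
conn.executemany(
    f"INSERT INTO SQL_OUT_1 VALUES ({', '.join('?' for _ in cols)})",
    [tuple(r[c] for c in cols) for r in rows],
)

# then run the step's script against that table
for row in conn.execute("SELECT * FROM SQL_OUT_1 WHERE id > 1"):
    print(row)  # -> (2, 'b')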

@accforgithubtest
Author

Adding an output format is a neat and necessary idea. The updated config looks good.

If SQLite is used, would it interfere in any way with database-specific data types and SQL functions? Or would it be possible to handle this properly, since SQLite only temporarily stores the results for the next query to use?
But then, would that mean the second query would need to use SQLite-specific data types and syntax instead of Postgres (assuming we start with Postgres as an example)?

Apologies, but I am not sure what dbt is in this context.

@accforgithubtest
Author

I have been reading a bit more, and it turns out that to implement SQL chaining properly, a more complex engine like DataFusion, which supports dataframes etc., would need to be integrated into dagu.

So maybe the initial feature request for an SQL executor could leave out the SQL chaining features and focus on supporting the basic SQL operations across postgres, mariadb, and sqlite? Keen to hear your opinion on this.

@yohamta
Collaborator

yohamta commented Jan 14, 2025

Thank you for sharing your thoughts! I completely agree. Initially, we could focus on implementing a basic SQL executor that supports postgres, mysql (MariaDB), and sqlite.

@accforgithubtest
Author

Hi @yohamta, apologies for the delay.

I am going through the Passing Output to the Next Step and Variable Passing docs to write up some examples for the sql feature request.

Can you please help me come up with a real-life example?
Let's assume we are fetching a list of users from this example url.
We want to insert only the id, firstName, lastName, address.address, address.city, address.stateCode, and address.postalCode fields.

Does this example below look right to you?
Will this insert all users from the json response?

steps:
  - name: make_http_request
    executor:
      type: http
      config:
        silent: true
    command: GET https://dummyjson.com/users
    output: HTTP_RESULT

  - name: execute_sql_1
    depends: make_http_request
    executor:
      type: sql
      config:
        driver: postgres
        dsn: ${DATABASE_TESTDBNAME1}
        format: json # options: json, csv, or raw
    script: |
      INSERT INTO user_details (id, firstName, lastName, address.address, address.city, address.stateCode, postalCode) VALUES ('${HTTP_RESULT.id}, ${HTTP_RESULT.firstName}, ${HTTP_RESULT.lastName}, ${HTTP_RESULT.address.address}, ${HTTP_RESULT.address.city}, ${HTTP_RESULT.address.stateCode}, ${HTTP_RESULT.address.postalCode} ');
    output: SQL_OUT_1
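
In case that kind of ${HTTP_RESULT...} interpolation only resolves single values rather than the whole array, my fallback idea would be a Python step that loops over the users list (the table and column names below are made up):

  - name: save_users
    depends: make_http_request
    command: python
    script: |
      import json
      import os
      import psycopg2

      # dummyjson wraps the user list in a "users" key
      users = json.loads(os.environ["HTTP_RESULT"])["users"]

      # connect using the DSN from the named connection env var
      with psycopg2.connect(os.environ["DATABASE_TESTDBNAME1"]) as conn:
          with conn.cursor() as cur:
              cur.executemany(
                  """INSERT INTO user_details
                     (id, first_name, last_name, street, city, state_code, postal_code)
                     VALUES (%s, %s, %s, %s, %s, %s, %s)""",
                  [(u["id"], u["firstName"], u["lastName"],
                    u["address"]["address"], u["address"]["city"],
                    u["address"]["stateCode"], u["address"]["postalCode"])
                   for u in users],
              )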
