Help with use of SQLQueryDataSet #1169
meaningfromdata started this conversation in Idea
-
Kedro itself doesn't include the SQLAlchemy library; you need to add it to your requirements definition and install it explicitly.
See our documentation on installing optional dependencies.
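A quick way to confirm which packages are actually missing from the active conda environment is to ask Python directly. The sketch below is only an illustration: the helper name `missing_modules` is made up for this example, and it assumes that `pandas` and `sqlalchemy` are the import names the SQL datasets rely on. If `sqlalchemy` shows up as missing, installing the matching Kedro extra (for example `pip install "kedro[pandas.SQLQueryDataSet]"`) or SQLAlchemy itself should resolve the error.

```python
import importlib.util


def missing_modules(module_names):
    """Return the top-level module names that cannot be imported here."""
    # find_spec returns None when a top-level module is not installed.
    return [name for name in module_names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Assumption: the pandas SQL datasets need both of these to be importable.
    missing = missing_modules(["pandas", "sqlalchemy"])
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("pandas and sqlalchemy are both importable")
```

Run it with the same interpreter that runs `kedro run`, otherwise the result may describe a different environment.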
-
I have a SQL table that I am trying to load into a pandas DataFrame as the first step of my Kedro data_processing pipeline. All I am trying to do at this point is read the table from the database and save it as a CSV, just to be sure I can read data from the database in Kedro and understand how to use the catalog and credentials. I can do this in pandas using pd.read_sql without issue, but I can't seem to get it right using SQLQueryDataSet in Kedro. I have added my connection string for the database to my credentials.yml and then created an entry for the SQL table/query in the catalog.yml as follows:
cohort_sql_query:
  type: pandas.SQLQueryDataSet
  credentials: my_credentials_for_db
  sql: SELECT * FROM ..<table_with_cohort>
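For reference, pandas.SQLQueryDataSet expects the credentials entry to be a mapping with a `con` key that holds the SQLAlchemy connection string, rather than a bare string. The snippet below is a rough sketch for sanity-checking the shape of such an entry once it has been loaded into a Python dict. The function name `check_sql_credentials` and the example connection string are hypothetical and not part of Kedro.

```python
def check_sql_credentials(credentials):
    """Return a list of problems found in a credentials entry for a SQL dataset."""
    problems = []
    if not isinstance(credentials, dict):
        problems.append("entry should be a mapping, not a bare string")
        return problems
    con = credentials.get("con")
    if not isinstance(con, str) or not con.strip():
        problems.append("missing or empty 'con' key")
    elif "://" not in con:
        # SQLAlchemy URLs have the form dialect[+driver]://user:password@host/dbname
        problems.append("'con' does not look like a SQLAlchemy URL")
    return problems


if __name__ == "__main__":
    # Hypothetical entry, shaped like my_credentials_for_db in credentials.yml.
    example = {"con": "postgresql://user:password@host:5432/dbname"}
    print(check_sql_credentials(example))
```

An empty list means the entry has the expected shape; it says nothing about whether the database is reachable.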
I have added a node to my data_processing pipeline that takes cohort_sql_query as the input and attempts to output a csv of the cohort, which is registered in the catalog.yml.
The catalog entry is:
cohort_csv:
  type: pandas.CSVDataSet
  filepath: data/02_intermediate/patient_cohort.csv
The node is simply:
def get_cohort_from_db(query_dataset):
    cohort_df = query_dataset
    return cohort_df
The entry in the pipeline is as follows:
node(
    func=get_cohort_from_db,
    inputs="cohort_sql_query",
    outputs="cohort_csv",
    name="get_cohort_from_db_node",
),
However, when I try to run the pipeline I am getting a DataSetError that, in brief, states:
"Object 'SQLTableDataSet' cannot be loaded from 'kedro.extras.datasets.pandas' Please see the documentation on how to install relevant dependencies for kedro.extras.datasets.pandas.SQLTableDataSet"
I took a look at the documentation but it wasn't clear to me what more I needed to install. I am using a conda environment with kedro 0.17.6 and kedro-telemetry 0.1.3 installed.
Any help getting this very basic step working would be appreciated.