Consider using --to-nodes instead of --to-outputs for kedro runs #4

Open
Nrbassili opened this issue Dec 5, 2024 · 2 comments

@Nrbassili

For LAR and MLAR, we have a node that validates the MLAR and LAR data by checking that the row counts are the same.

The current job and cron job templates generate jobs with the --to-outputs kedro run parameter. Because the validation node runs after the LAR and MLAR outputs are produced, it is never run for these datasets.

If we switch the job and cron job templates to the --to-nodes kedro run parameter, we can set the final validation node as the target node. Both the MLAR and LAR files would then be generated in the same kedro run, which will take longer but lets us validate the counts at the end. Otherwise, we should consider removing this node, since it is never used.
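To make the difference concrete, here is a minimal sketch of the two selection modes using a toy dependency graph. It is a simplified model, not Kedro's actual implementation. The node create_lar_flat_file and all dataset names are hypothetical; only create_mlar_flat_file and validate_lar_and_mlar_row_counts come from this thread.

```python
# Toy model of node selection; not Kedro's actual implementation.
# Each node maps to (input datasets, output datasets).
NODES = {
    # hypothetical node and dataset names
    "create_lar_flat_file": (["lar_raw"], ["lar_flat_file"]),
    # node name from this thread, dataset names hypothetical
    "create_mlar_flat_file": (["mlar_raw"], ["mlar_flat_file"]),
    "validate_lar_and_mlar_row_counts": (
        ["lar_flat_file", "mlar_flat_file"],
        ["row_count_report"],  # hypothetical output
    ),
}


def upstream_of_nodes(targets):
    """Return the target nodes plus every node they depend on (like --to-nodes)."""
    producers = {out: name for name, (_, outs) in NODES.items() for out in outs}
    selected, stack = set(), list(targets)
    while stack:
        name = stack.pop()
        if name in selected:
            continue
        selected.add(name)
        # Walk back through each input that is produced by another node.
        for dataset in NODES[name][0]:
            if dataset in producers:
                stack.append(producers[dataset])
    return selected


def upstream_of_outputs(outputs):
    """Return the nodes needed to produce the given datasets (like --to-outputs)."""
    producers = {out: name for name, (_, outs) in NODES.items() for out in outs}
    return upstream_of_nodes([producers[o] for o in outputs if o in producers])


# Targeting the MLAR output stops before the validation node.
to_outputs = upstream_of_outputs(["mlar_flat_file"])
# Targeting the validation node pulls in both file-creation nodes.
to_nodes = upstream_of_nodes(["validate_lar_and_mlar_row_counts"])
```

In this model, targeting the MLAR output selects only the MLAR creation node, while targeting the validation node selects all three nodes, which is the longer combined run described above.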

@Nrbassili
Author

Nrbassili commented Dec 5, 2024

N/A

@rkovalik-raft rkovalik-raft self-assigned this Dec 11, 2024
@aharjatiRaft
Contributor

We can use tags to group these nodes together. Tags let us organize nodes by business logic.
For example, we can add a tag named public_modified_lar_flat_file_{year} to the create_mlar_flat_file and validate_lar_and_mlar_row_counts nodes, then change our job and cron templates to pass --tags=public_modified_lar_flat_file_{year}.
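A rough sketch of how the tag grouping and the templated argument could fit together is shown below. It models the node-to-tag mapping as a plain dictionary rather than real Kedro node definitions. The year 2023, the untagged create_lar_flat_file node, and the helper names are assumptions for illustration only.

```python
# Hypothetical stand-in for the pipeline's node-to-tags mapping; in a real
# Kedro project the tags would be set on the node definitions themselves.
YEAR = 2023  # example year, an assumption


def tag_for(year):
    """Build the tag name proposed in the comment above."""
    return f"public_modified_lar_flat_file_{year}"


NODE_TAGS = {
    "create_lar_flat_file": set(),  # hypothetical node, left untagged here
    "create_mlar_flat_file": {tag_for(YEAR)},
    "validate_lar_and_mlar_row_counts": {tag_for(YEAR)},
}


def nodes_with_tag(tag):
    """Return the names of nodes carrying the tag, in sorted order."""
    return sorted(name for name, tags in NODE_TAGS.items() if tag in tags)


def run_command(year):
    """Argument list a job template could render for the given year."""
    return ["kedro", "run", f"--tags={tag_for(year)}"]


selected = nodes_with_tag(tag_for(YEAR))
command = run_command(YEAR)
```

One thing to check before adopting this: as far as I understand, filtering by tag selects only the tagged nodes and not their untagged upstream dependencies. If so, the node that produces the LAR data would need the same tag, or its output would need to be already persisted, for the row count validation to have both inputs.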


3 participants