Blog post: The story of kedro-telemetry - from start to now (#125)
@idanov had a great point about reading more into GDPR to understand the design. There's one important thing, though: we follow opt-in consent because of GDPR. Here are the differences: https://termly.io/resources/articles/opt-in-vs-opt-out/ And here's a nice table comparing both: https://seersco.com/articles/opt-in-vs-opt-out-consent/
In the context of data collection, an opt-in flow means that users have to explicitly give their consent before we start collecting any data. An opt-out flow means that users have the right to withdraw their consent at any time, but we might still start collecting data by default, even without their initial consent.
GDPR requires that users be given the option to enable cookies of their own free will. Since there are various types of cookies serving different purposes, such as advertising cookies and analytics cookies, users must have a separate opt-in checkbox for each cookie category. In short, GDPR requires consent to be opt-in. It defines consent as “freely given, specific, informed and unambiguous”, given by a “clear affirmative action.” It is not acceptable to infer consent from the data subject’s silence or to rely on “pre-ticked boxes.”
Introduction
I thought it'd be cool to share a detailed story of our kedro-telemetry journey: where we started, the challenges we faced, and how far we've come. It's been quite a ride, and I think it's essential for all of us to understand the backstory, especially as we keep improving this tool.
What were the early days?
Remember how nervous we were about starting telemetry? We all saw those threads on Reddit and Hacker News about other open-source projects getting heat for how they handled user data. Plus, being privacy nerds ourselves, we wanted to make sure we were doing right by our users. And, of course, there was the added pressure of Kedro being an enterprise-owned open-source project back then: we didn't want any missteps to affect our reputation.
So we held serious brainstorming sessions with our InfoSec and Legal teams to make sure we were GDPR compliant. This was a challenge because we had to interpret the law and work out how it applied to us. The legal team we work with now is part of LF AI & Data.
What design and architectural decisions did we make?
Each user's consent decision was stored in a `.telemetry` file that was not committed to git. Users were asked to opt in to kedro-telemetry, and if they said yes, the decision applied only to the project where `.telemetry` was present. Within that project, it covered the Kedro CLI, the Kedro-Viz CLI and the Kedro-Viz UI.
The username was used only to identify internal users, because we could hash internal usernames. This approach is no longer in use; talk to @datajoely for details.
[Figure: Opt-in/opt-out workflow of kedro-telemetry]
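The per-project opt-in flow described above can be sketched in a few lines of Python. This is a hypothetical illustration, not kedro-telemetry's actual code. Several details are assumptions: the file format (a single `consent: true` or `consent: false` line), the prompt wording, and the helper names `read_consent` and `ensure_consent`. What the sketch does show is the opt-in default: until a decision is stored, the user is asked, and anything short of an explicit yes is recorded as a no.

```python
import tempfile
from pathlib import Path
from typing import Callable, Optional

# Hypothetical per-project consent file; the real plugin's format may differ.
CONSENT_FILE = ".telemetry"


def read_consent(project_dir: Path) -> Optional[bool]:
    """Return the stored decision, or None if no decision has been recorded yet."""
    path = project_dir / CONSENT_FILE
    if not path.is_file():
        return None
    for line in path.read_text(encoding="utf-8").splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "consent":
            return value.strip().lower() == "true"
    return None  # the file exists but holds no recognisable decision


def ensure_consent(project_dir: Path, ask: Callable[[str], str]) -> bool:
    """Ask only when no decision is stored; only an explicit yes enables collection."""
    decision = read_consent(project_dir)
    if decision is not None:
        return decision  # respect the earlier answer, yes or no
    answer = ask("Allow kedro-telemetry to collect anonymised usage data? [y/N] ")
    decision = answer.strip().lower() in {"y", "yes"}  # silence or Enter means no
    (project_dir / CONSENT_FILE).write_text(
        f"consent: {str(decision).lower()}\n", encoding="utf-8"
    )
    return decision


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        project = Path(tmp)
        print(ensure_consent(project, ask=lambda prompt: ""))  # False: pressing Enter is not consent
        print(ensure_consent(project, ask=lambda prompt: "y"))  # False: the stored "no" is reused
```

The decision lives in an untracked file inside the project, so every project (and every fresh clone of one) asks again. Listing `.telemetry` in `.gitignore` is one way to make sure it never gets committed.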
What data do we collect?
kedro-telemetry has evolved to collect more data as we've had more questions about our users. When users opt in, kedro-telemetry collects project and user metadata, records usage of the Kedro Framework CLI and the Kedro-Viz CLI, and tracks all feature usage in the Kedro-Viz UI. Identifying metadata about the project (project name and package name) and the user (computer name and username) is hashed to meet anonymity requirements. Command arguments are masked as well: for example, `kedro run --pipeline=ds --env=test` is recorded as `kedro run --pipeline ***** --env *****`.
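The hashing and masking described above can be sketched as follows. This is a simplified illustration rather than kedro-telemetry's implementation. The helper names are hypothetical, and SHA-256 is our own choice of hash. The masking rule is a heuristic picked to reproduce the `kedro run` example: keep the leading command words and the option names, and replace every later token that is not itself an option.

```python
import hashlib
from typing import Dict, List

MASK = "*****"


def hash_identifier(value: str) -> str:
    """One-way hash for identifying metadata (SHA-256 chosen here for illustration)."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


def sanitise_metadata(identifying: Dict[str, str]) -> Dict[str, str]:
    """Hash every identifying field, e.g. project name, package name, computer name, username."""
    return {key: hash_identifier(value) for key, value in identifying.items()}


def mask_cli_args(argv: List[str]) -> str:
    """Keep command words and option names; replace every argument value with MASK."""
    masked: List[str] = []
    seen_option = False
    for token in argv:
        if token.startswith("-"):
            seen_option = True
            name, sep, _ = token.partition("=")
            masked.append(name)
            if sep:
                masked.append(MASK)  # "--env=test" becomes "--env *****"
        elif seen_option:
            masked.append(MASK)  # a value passed as its own token, or a later positional
        else:
            masked.append(token)  # leading command words such as "kedro run"
    return " ".join(masked)


if __name__ == "__main__":
    print(mask_cli_args(["kedro", "run", "--pipeline=ds", "--env=test"]))
    # kedro run --pipeline ***** --env *****
    print(sanitise_metadata({"username": "alice"}))
```

The heuristic errs on the side of masking. A positional argument that follows an option is hidden too, at the cost of losing some harmless detail.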
We also collect the kedro project version and the kedro-telemetry version.
What was the original data collection strategy for kedro-telemetry?
Here's what the first version of kedro-telemetry proposed doing:
What analytics tools does kedro-telemetry integrate with?
To support in-depth data analysis, kedro-telemetry uses Heap Analytics and Snowflake as data stores. This integration lets us process complex datasets and understand how users interact with Kedro, which informs our development strategy.
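To make the hand-off to Heap more concrete, here is a sketch of how an already-sanitised event could be packaged for Heap's server-side track API. It assumes the endpoint `https://heapanalytics.com/api/track` and a JSON body with `app_id`, `identity`, `event` and `properties` fields; check Heap's current API reference before relying on this. The app ID, event name and property names are placeholders. The request is built but never sent, and the Snowflake side is not shown.

```python
import json
import urllib.request
from typing import Dict

# Assumed Heap server-side track endpoint; verify against Heap's current docs.
HEAP_TRACK_URL = "https://heapanalytics.com/api/track"


def build_heap_request(
    app_id: str, hashed_identity: str, event: str, properties: Dict[str, str]
) -> urllib.request.Request:
    """Package one already-sanitised event as a POST request without sending it."""
    body = {
        "app_id": app_id,
        "identity": hashed_identity,  # a hashed username, never the raw value
        "event": event,
        "properties": properties,  # e.g. the masked CLI command and version strings
    }
    return urllib.request.Request(
        HEAP_TRACK_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    request = build_heap_request(
        app_id="YOUR_HEAP_APP_ID",  # placeholder
        hashed_identity="<hashed username>",  # placeholder
        event="CLI command",  # hypothetical event name
        properties={"command": "kedro run --pipeline ***** --env *****"},
    )
    print(request.get_method(), request.full_url)
    print(request.data.decode("utf-8"))
    # Sending would be urllib.request.urlopen(request, timeout=10),
    # and only after the user has opted in.
```

Keeping the request builder separate from the network call makes it easy to check, before anything leaves the machine, that only hashed and masked values end up in the payload.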