Replies: 3 comments
-
Hi, yes, Eurybia works on structured data. To understand your 3 cases, can you provide us the code used to generate your data?
Let's name the model features X, the target Y, and their joint distribution P(X, Y). P(X, Y) can be decomposed as P(X, Y) = P(Y|X)P(X), with P(Y|X) the conditional probability of the output given the model features, and P(X) the probability density of the model features.
Data drift: evolution of the production data over time compared to the training or test data from before deployment. In formulas: compare P(X_training) to P(X_production).
Concept drift: a change in P(Y|X_production) compared to P(Y|X_training).
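To make the two formulas concrete, here is a minimal synthetic sketch (variable names, distributions and the labelling rule are illustrative, not taken from this discussion): data drift changes P(X) while keeping P(Y|X) fixed, whereas concept drift keeps P(X) and changes P(Y|X).

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000

# Training period: X ~ N(0, 1), and Y follows a fixed rule P(Y|X)
x_train = rng.normal(loc=0.0, scale=1.0, size=n)
y_train = (x_train + rng.normal(scale=0.5, size=n) > 0).astype(int)

# Data drift: P(X) shifts in production, but the labelling rule P(Y|X) is unchanged
x_prod_data_drift = rng.normal(loc=1.5, scale=1.0, size=n)
y_prod_data_drift = (x_prod_data_drift + rng.normal(scale=0.5, size=n) > 0).astype(int)

# Concept drift: P(X) is unchanged in production, but the labelling rule P(Y|X) flips
x_prod_concept_drift = rng.normal(loc=0.0, scale=1.0, size=n)
y_prod_concept_drift = (-x_prod_concept_drift + rng.normal(scale=0.5, size=n) > 0).astype(int)
```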
-
Hi Thomas, thank you for your update. I am using our internal data for testing Eurybia, hence I cannot share it, but this is what I am considering.
Thank you for your explanation of the drift concepts. Can you please let me know how Eurybia works in the backend with respect to these different drifts? When the AUC score is generated, it does not show anywhere what kind of drift it is; we can only observe whether there is drift or not based on the value. Is my understanding correct? Also, does Eurybia primarily work on "concept drift"? Thanks and Regards
-
Hi, if you generate data like this, does that correspond to your first two cases?
For data drift, you can read the section "How Eurybia detect data drift" of the README (https://github.com/MAIF/eurybia/blob/master/README.md). For the moment, Eurybia does not deal with concept drift; the AUC score of Eurybia is only for data drift. If you want to detect data drift on review texts (as Eurybia does not yet have specific features for text), you have to do some preprocessing to monitor the changes you want.
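The README section referenced above describes a "datadrift classifier": baseline and current rows are internally labelled 0 and 1, a classifier is trained to tell them apart, and its AUC is the drift score (around 0.5 means the two datasets are indistinguishable, close to 1 means strong drift). Below is a minimal sketch of that general idea using scikit-learn, not Eurybia's actual implementation, together with an example of the kind of preprocessing one might apply to raw review texts; the feature choices and function names are assumptions for illustration only.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import cross_val_predict


def datadrift_auc(df_baseline: pd.DataFrame, df_current: pd.DataFrame) -> float:
    """Adversarial-validation style drift score: AUC of a classifier that
    tries to separate baseline rows (label 0) from current rows (label 1)."""
    X = pd.concat([df_baseline, df_current], ignore_index=True)
    y = [0] * len(df_baseline) + [1] * len(df_current)  # internal labels only
    proba = cross_val_predict(
        RandomForestClassifier(n_estimators=100, random_state=0),
        X, y, cv=5, method="predict_proba",
    )[:, 1]
    return roc_auc_score(y, proba)


def text_features(df: pd.DataFrame, text_col: str = "review") -> pd.DataFrame:
    """Example preprocessing for raw review texts: derive simple structured
    features (lengths here are only one possible choice)."""
    out = pd.DataFrame()
    out["n_chars"] = df[text_col].str.len()
    out["n_words"] = df[text_col].str.split().str.len()
    return out


# Example usage: drift score between text-derived features of two dataframes
# auc = datadrift_auc(text_features(df_baseline), text_features(df_current))
```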
-
Hi Team,
I am working on drift detection for structured and unstructured data and recently got to know about Eurybia.
I observed that Eurybia works on structured data and not on unstructured data; is my understanding correct?
When we want to test this on a dataset and then later use it in production, let us consider an example:
Suppose a piece of text is present in the production data with label 1. The model is trained and identifies it as being under drift, so we then have to re-train our model on the production data, to be used for the next drift computation. Now if the same text occurs again, it is again labeled 1, but in fact there shouldn't be a drift.
Aren't we inducing bias by forcing the value onto the input and, as a result, introducing an error?
Can you explain how this requirement can be solved using a classification model?
I explained the above scenario based on the following observations:
Case 1:
Data: a dataframe with just 1 column containing review texts of variable length. df_baseline and df_current contain 80% and 20% of the total data, respectively.
Case 2:
Data: the same data as in Case 1, with an additional column where I labeled the data as 0 and 1 based on the 80:20 ratio, so that the first 80% of the data is training/df_baseline with label '0' and the remaining 20% is testing/df_current with label '1'.
Case 3:
Data: the same data as in Case 2, the dataframe having 2 columns (text, label). Here I used the first 80% of the data for both df_baseline and df_current, so both have the same text and the same label ('0'), as sketched below.
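Roughly, the three cases are constructed like this (using an illustrative dataframe in place of our internal data; the column name and sample texts are placeholders):

```python
import pandas as pd

# Placeholder for the internal review data (single text column)
df = pd.DataFrame({
    "review": ["great product", "terrible service", "okay overall",
               "would buy again", "not as described"] * 20
})
split = int(len(df) * 0.8)

# Case 1: one text column, 80/20 split between baseline and current
df_baseline_c1 = df.iloc[:split][["review"]]
df_current_c1 = df.iloc[split:][["review"]]

# Case 2: same split, with an added label: 0 for baseline rows, 1 for current rows
df_baseline_c2 = df_baseline_c1.assign(label=0)
df_current_c2 = df_current_c1.assign(label=1)

# Case 3: baseline and current are both the same first 80% of rows, both labelled 0
df_baseline_c3 = df_baseline_c1.assign(label=0)
df_current_c3 = df_baseline_c1.assign(label=0)
```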
When the data is the same as in Case 3, why is there a drift? How is the model actually working, and aren't we introducing an error by adding the label?
Thanks and Regards
Hrudhay