Look at data upload/curation software from students #14

Open
PGijsbers opened this issue Sep 27, 2024 · 2 comments

@PGijsbers

Have a look at the dashboards the students made, and see what aspects we would like to keep - and if the code is salvageable.

@SubhadityaMukherjee

SubhadityaMukherjee commented Oct 28, 2024

  1. https://github.com/IwkooO/Dataset-Uploader-OpenML
    TL;DR: I do think it's worth looking at, especially the UI and perhaps the type extraction.

Summary: a better multi-step interface, a dataset viewer, automatic feature type extraction using the OpenAI API, a feature editor, and this prompt: "You are the creator of a dataset. You want to upload the dataset to an online repository. You are requested to provide a dataset description. Knowing the column names and their sample values you will write a concise and informative description within 250 words limit without use only ASCII standard characters."
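To make the prompt idea concrete, here is a minimal sketch of how column names and sample values could be attached to that instruction text. This is not the students' code: the function name `build_description_prompt`, the `max_samples` parameter, and the example data are all hypothetical, and no API call is made.

```python
# Instruction text copied verbatim from the prompt quoted above.
INSTRUCTION = (
    "You are the creator of a dataset. You want to upload the dataset to an "
    "online repository. You are requested to provide a dataset description. "
    "Knowing the column names and their sample values you will write a concise "
    "and informative description within 250 words limit without use only ASCII "
    "standard characters."
)


def build_description_prompt(columns, sample_rows, max_samples=3):
    """Assemble a prompt string from column names and a few sample values.

    Hypothetical helper: `columns` is a list of column names and
    `sample_rows` is a list of dicts mapping column name to value.
    """
    lines = [INSTRUCTION, "", "Columns and sample values:"]
    for name in columns:
        # Take at most `max_samples` example values for this column.
        samples = [str(row[name]) for row in sample_rows[:max_samples] if name in row]
        lines.append(f"- {name}: {', '.join(samples)}")
    return "\n".join(lines)


# Made-up example data, only to show the shape of the resulting prompt.
columns = ["sepal_length", "species"]
rows = [
    {"sepal_length": 5.1, "species": "setosa"},
    {"sepal_length": 7.0, "species": "versicolor"},
]
prompt = build_description_prompt(columns, rows)
print(prompt)
```

Building the prompt as a plain string keeps this step testable without an API key; whatever client sends it to a model would be a separate piece.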

I could not test much of anything (except the UI) because the entire codebase depends on the OpenAI API to run, and that now seems to require me to put money on it. (I could modify it if that is of interest.)

My opinion: I do think the UI looks a lot more user-friendly than what we have now. Automatic feature type extraction is based on a different approach (from a previous OpenML paper?), and that seems fine (sorting).
The code needs a fair amount of work, and I am not certain about the OpenAI part.

@SubhadityaMukherjee

SubhadityaMukherjee commented Oct 28, 2024

  1. https://github.com/Sanderror/OpenML_Data_Cleaner
    TL;DR - Nice as a separate tool
    image
    Summary: performs these actions (from the image). I believe the "cryptic attribute name" action uses the OpenAI API, and I ran into the same problem as with the previous tool: it needs me to put money on it.

My opinion: I think the tool by itself is a very nice idea. It does take a very long time to run, though, and I am not entirely sure it is a good idea to integrate it with OpenML. It might be nice as a separate data processing library. As for the code, a lot needs to be done to make it maintainable, and I am unsure how to speed it up without digging very deep.
Something useful would be the feature type check, but perhaps the previous tool is more user-friendly in doing that.
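For reference, a feature type check does not necessarily need an LLM. The sketch below is a simple rule-based guess over raw string values; it is not taken from either student repository, and the function name `infer_feature_type`, the `max_categories` threshold, and the type labels are assumptions chosen for illustration.

```python
def infer_feature_type(values, max_categories=10):
    """Guess a coarse feature type for one column of raw string values.

    Hypothetical heuristic: returns "numeric" if every non-empty value
    parses as a float, "nominal" if there are few distinct values,
    "string" otherwise, and "unknown" for an all-empty column.
    """
    # Ignore missing entries (None or blank strings).
    present = [v.strip() for v in values if v is not None and v.strip() != ""]
    if not present:
        return "unknown"

    def is_number(text):
        try:
            float(text)
            return True
        except ValueError:
            return False

    if all(is_number(v) for v in present):
        return "numeric"
    if len(set(present)) <= max_categories:
        return "nominal"
    return "string"


# Made-up columns to show the three outcomes.
print(infer_feature_type(["1", "2.5", "", "3"]))
print(infer_feature_type(["red", "blue", "red"]))
print(infer_feature_type(["a", "b", "c"], max_categories=2))
```

A rule like this is fast and deterministic, but it will misclassify cases such as integer-coded categories, so a feature editor for manual correction would still be needed.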
