This datathon platform is fully developed in Python using Streamlit, in only 115 lines of code!
As the title says, it is designed for small datathons, and the scripts are easy to understand.
Clone the repo onto your server:

    git clone mini_datathon; cd mini_datathon
You need 5 simple steps to set up your mini hackathon:
- modify the password of the admin user in users.csv
- add the participants to users.csv
- modify the `load_target` and `evaluate` functions in main.py according to your needs (see Example)
- edit templates.py to change the content of the different pages (`markdown` format)
- run the command `streamlit run main.py`
Please do not forget to tell the participants that the submission file needs to be a csv whose rows are in the same order as the test set, and that it must contain a column named `predictions`.
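The submission rules above can be checked before scoring. The following is a minimal sketch, not the platform's actual code: the function name `validate_submission` and its parameters are hypothetical, and it uses only the standard library so it can be run in isolation. Note that row order cannot be verified from the file alone, so it only checks the column name, the row count, and that each value is numeric.

```python
import csv
import io


def validate_submission(csv_text, n_test_rows, column="predictions"):
    """Return the predictions as floats, or raise ValueError with a readable message.

    Hypothetical helper: `n_test_rows` is the number of rows in the test set.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    if reader.fieldnames is None or column not in reader.fieldnames:
        raise ValueError(f"missing required column '{column}'")

    predictions = []
    # Line 1 is the header, so data rows start at line 2
    for line_no, row in enumerate(reader, start=2):
        try:
            predictions.append(float(row[column]))
        except (TypeError, ValueError):
            raise ValueError(f"line {line_no}: {row[column]!r} is not a number")

    if len(predictions) != n_test_rows:
        raise ValueError(f"expected {n_test_rows} rows, got {len(predictions)}")
    return predictions
```

Calling `validate_submission(uploaded_text, len(y_test))` before `evaluate` would let the app show a clear error message instead of failing inside the metric.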
An example version of the code is deployed on Heroku here: web app
In the current version, the functions from step 3 (`load_target` and `evaluate`) are implemented using the UCI SECOM imbalanced dataset (binary classification), and submissions are evaluated by the PR-AUC score:
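
    import pandas as pd
    import streamlit as st
    from sklearn.metrics import average_precision_score

    # `train_test_sampling` is not defined in this excerpt; it is presumably a dict of
    # keyword arguments for DataFrame.sample that is set elsewhere in the project.

    @st.cache
    def load_target():
        # Labels of the UCI SECOM dataset: one target and one timestamp per row
        labels = pd.read_csv('https://archive.ics.uci.edu/ml/machine-learning-databases/secom/secom_labels.data',
                             header=None, sep=' ', names=['target', 'time'])
        # Rows sampled for training; every remaining row belongs to the test set
        y_train = labels.sample(**train_test_sampling)
        y_test = labels.loc[~labels.index.isin(y_train.index), 'target']
        return y_test

    def evaluate(y_true, y_pred):
        # PR-AUC (average precision) of the submitted predictions against the true labels
        return average_precision_score(y_true, y_pred, average='micro')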
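
Adapting `evaluate` to another task mostly means swapping the metric. Because the leaderboard keeps each user's maximum score, the returned value should be "higher is better". As an illustration only, here is a sketch of a hypothetical regression variant that returns the negated RMSE; it is not part of the project and uses only the standard library.

```python
import math


def evaluate(y_true, y_pred):
    """Hypothetical regression metric: negated RMSE, so that higher is still better."""
    y_true, y_pred = list(y_true), list(y_pred)
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    # Mean of the squared errors over all test rows
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    # Negate so a perfect submission scores 0 and worse ones score below 0
    return -math.sqrt(mse)
```

With a lower-is-better metric returned as-is, keeping the maximum per user would reward the worst submission, which is why the sign is flipped here.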
The platform needs only 2 components to be saved: the leaderboard and the list of users.
The leaderboard is simply a csv file that is updated every time a user submits predictions. The csv file contains 2 columns:
- id: the login of the user
- score: the maximum score of the user
There is only 1 row per user, since only each user's maximum score is saved.
By default, a benchmark score is pushed to the leaderboard:
id | score |
---|---|
benchmark | 0.6 |
For more details, please refer to the script leaderboard.
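
The update rule described above (one row per user, only the maximum score kept) can be sketched as follows. This is an assumption-laden illustration rather than the project's leaderboard script: the function name `update_leaderboard` is hypothetical, and it uses the standard `csv` module instead of whatever the script actually uses.

```python
import csv
import os


def update_leaderboard(path, user_id, score):
    """Record `score` for `user_id`, keeping only each user's best score."""
    scores = {}
    # Load the existing leaderboard, if one has already been saved
    if os.path.exists(path):
        with open(path, newline="") as f:
            for row in csv.DictReader(f):
                scores[row["id"]] = float(row["score"])

    # A new submission only replaces the stored score when it is higher
    scores[user_id] = max(score, scores.get(user_id, float("-inf")))

    # Rewrite the file with the best score first
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["id", "score"])
        for uid, best in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
            writer.writerow([uid, best])
    return scores
```

For example, calling it with `("benchmark", 0.6)`, then `("alice", 0.7)`, then `("alice", 0.5)` leaves two rows in the file, with alice still at 0.7.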
Like the leaderboard, the list of users is a csv file (users.csv). It must be defined by the admin of the competition. It contains 2 columns:
- login
- password
A default user is created initially so that you can start experimenting with the platform:
login | password |
---|---|
admin | password |
In order to add new participants, simply add rows to the current users.csv file.
For more details, please refer to the script users.
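
To make the file layout concrete, here is a minimal sketch of how a login could be checked against such a file. It is not the project's users script: `load_users` and `is_valid_login` are hypothetical names, and the comma-separated layout with a `login,password` header is an assumption based on the two columns described above.

```python
import csv
import hmac
import io


def load_users(csv_text):
    """Parse the users csv (columns: login, password) into a dict."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["login"]: row["password"] for row in reader}


def is_valid_login(users, login, password):
    """Return True only when the login exists and the password matches."""
    stored = users.get(login)
    if stored is None:
        return False
    # compare_digest compares in constant time instead of stopping at the first mismatch
    return hmac.compare_digest(stored.encode("utf-8"), password.encode("utf-8"))


# Usage: the default admin user plus one participant added as a new row
users = load_users("login,password\nadmin,password\nalice,s3cret\n")
print(is_valid_login(users, "alice", "s3cret"))
```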
Possible future improvements:
- support separate private and public leaderboards, as on kaggle.com
- store encrypted passwords in users.csv
- allow logging in with OAuth
- define user permissions
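
For the password item in the list above, one common approach is salted hashing rather than reversible encryption. The sketch below swaps in PBKDF2 from the standard library as an illustration. The function names, the `salt$digest` storage format, and the iteration count are all assumptions, not something the project defines.

```python
import hashlib
import hmac
import secrets

ITERATIONS = 200_000  # assumed work factor; tune it for your server


def hash_password(password, salt=None):
    """Return 'salt_hex$digest_hex', suitable for the password column."""
    if salt is None:
        salt = secrets.token_bytes(16)  # fresh random salt per user
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return f"{salt.hex()}${digest.hex()}"


def verify_password(password, stored):
    """Re-hash `password` with the stored salt and compare in constant time."""
    salt_hex, _ = stored.split("$")
    expected = hash_password(password, bytes.fromhex(salt_hex))
    return hmac.compare_digest(expected, stored)
```

With this scheme users.csv would hold the output of `hash_password` instead of the plain password, and the login check would call `verify_password` instead of comparing strings directly.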
This project is released under the MIT License.
We could not find an easy implementation for our yearly internal hackathon at Intel. The idea originally came from my dear devops coworker Elhay Efrat, and I took on the responsibility of developing it.
This version is not the one used at Intel.
If you like this project, let me know by buying me a coffee :)