The solution is based on a finetuning of BERT-type language models and the use of its embeddings in classification with CatBoost.
Our results:
🥇The 1st place in leaderboard of metric (ROC_AUC ovr) in both tasks.
🥉The 4th place in the quality of EDA and visualization.
P.S We love Yandex.DataSphere! ^_^
notebooks - Jupyter notebooks
reports - Generated analysis as HTML, PDF, LaTeX, etc.
requirements.txt - The requirements file for reproducing the analysis environmen
scripts - Scripts for this project (finetuning models, extracting feutures, training models, using trained models)