- Resampling to obtain similar distributions (see the Earth Mover's Distance: http://homepages.inf.ed.ac.uk/rbf/CVonline/LOCAL_COPIES/RUBNER/emd.htm)
- Kaggle submission ensembling that takes into account both the correlations between submissions and each submission's performance
- Random forest imputation
- Model performance boxplot (like Airbnb)
- Optimize the parameters of the exponential smoothing methods through train/test splitting
- Top-terms classifier should accept sparse matrices
- Every class should pass scikit-learn tests
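
As a rough illustration of the exponential smoothing item above, here is a minimal sketch assuming simple exponential smoothing with a single `alpha` parameter, chosen by grid search on a chronological train/test split. The function names (`ses_level`, `holdout_error`, `best_alpha`), the 70% split, and the candidate grid are hypothetical and not part of this project's API.

```python
def ses_level(series, alpha):
    """Run simple exponential smoothing over `series` and return the final level."""
    level = series[0]
    for value in series[1:]:
        level = alpha * value + (1 - alpha) * level
    return level


def holdout_error(train, test, alpha):
    """Mean squared one-step-ahead forecast error on `test`.

    The level is fitted on `train`, then updated with each test
    observation after that observation has been forecast.
    """
    level = ses_level(train, alpha)
    squared_errors = []
    for actual in test:
        squared_errors.append((actual - level) ** 2)
        level = alpha * actual + (1 - alpha) * level
    return sum(squared_errors) / len(squared_errors)


def best_alpha(series, train_fraction=0.7, candidates=None):
    """Pick the candidate alpha with the lowest holdout error (assumed grid search)."""
    if candidates is None:
        # Assumed grid: 0.05, 0.10, ..., 0.95
        candidates = [i / 20 for i in range(1, 20)]
    split = int(len(series) * train_fraction)
    if split < 1 or split >= len(series):
        raise ValueError("series is too short for the requested split")
    # Chronological split: no shuffling, since order matters for time series.
    train, test = series[:split], series[split:]
    return min(candidates, key=lambda a: holdout_error(train, test, a))


if __name__ == "__main__":
    trend = [float(i) for i in range(20)]
    print(best_alpha(trend))
```

The split is kept chronological because shuffling would leak future values into the training portion. Methods with more parameters (trend, seasonality) would need a larger grid or a numerical optimizer in place of the one-dimensional search.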