Data analysis assignment for the Trustworthy ML PhD position candidates

mmc-tudelft/trustworthy-ml-interview-tasks

IMPORTANT: PLEASE DO NOT WATCH, STAR OR FORK THIS REPOSITORY, TO PRESERVE YOUR ANONYMITY AS A CANDIDATE!!!

Data analysis assignment

This repository contains a notebook with the data analysis assignment for the Trustworthy ML PhD position.

The notebook uses data synthesized from the bank telemarketing dataset. A detailed description of the included variables (columns) can be found in the UCI repository.

Because the data is synthetic, it may not share all characteristics of the 'official' bank telemarketing dataset. Candidates should assume that the synthesized dataset is the only data they have at hand.

We are interested in classifying the variable y, which indicates whether a client subscribed to a term deposit. For this, we want to use several basic classifiers from the sklearn (scikit-learn) package, evaluated with a cross-validation procedure.
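As a rough illustration of that setup, here is a minimal sketch that compares a few basic sklearn classifiers under stratified k-fold cross-validation. It assumes a purely numeric, imbalanced stand-in generated with make_classification instead of the notebook's data; the chosen classifiers, the balanced-accuracy metric, five folds, and the random seeds are illustrative assumptions, not the assignment's setup.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

# Hypothetical stand-in for the synthesized data: numeric features and an
# imbalanced binary target, where 1 plays the role of "subscribed".
X, y = make_classification(
    n_samples=2000, n_features=10, weights=[0.88, 0.12], random_state=0
)

# Illustrative set of basic classifiers; the notebook may use different ones.
classifiers = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "naive_bayes": GaussianNB(),
}

# Stratified folds keep the class ratio of y roughly equal in every fold.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

results = {}
for name, clf in classifiers.items():
    # One score per fold; report both the mean and the spread across folds.
    scores = cross_val_score(clf, X, y, cv=cv, scoring="balanced_accuracy")
    results[name] = scores
    print(f"{name}: {scores.mean():.3f} +/- {scores.std():.3f}")
```

Passing the same cv object to every classifier means all models are scored on identical folds, which keeps the comparison between them fair.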

We provide starter code for this in a notebook that deliberately contains some problematic aspects. Candidates are asked to transform the notebook into a more trustworthy and more appropriate implementation.

More specifically, we ask candidates to:

  • investigate and justify pre-processing steps to be performed on the data;
  • correct the implementation into a proper cross-validation procedure;
  • choose and justify evaluation strategies for the given problem.
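These three tasks interact: pre-processing that is fitted on the full dataset before splitting leaks information from the test folds into training, and a single accuracy figure can hide poor performance on the minority class. One common scikit-learn pattern, shown here only as a hedged sketch and not as the expected solution, is to place pre-processing and classifier in one Pipeline and score it with several metrics. The column names mirror a few UCI bank-marketing variables, but the values, the target, the choice of scaler and encoder, the classifier, and the metric list are all hypothetical placeholders.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

rng = np.random.default_rng(0)
n = 1000

# Hypothetical stand-in: column names mirror some UCI bank-marketing variables,
# but the values are random placeholders, not the synthesized dataset.
df = pd.DataFrame({
    "age": rng.integers(18, 90, n),
    "balance": rng.normal(1000, 3000, n),
    "job": rng.choice(["admin.", "technician", "services", "retired"], n),
    "marital": rng.choice(["married", "single", "divorced"], n),
})
# Imbalanced placeholder target, loosely tied to age so there is a signal.
p = 0.05 + 0.2 * (df["age"].to_numpy() > 60)
y = (rng.random(n) < p).astype(int)

numeric = ["age", "balance"]
categorical = ["job", "marital"]

# Pre-processing lives inside the pipeline, so scaling statistics and category
# vocabularies are learned on each training fold only.
preprocess = ColumnTransformer([
    ("num", StandardScaler(), numeric),
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical),
])
model = Pipeline([
    ("preprocess", preprocess),
    ("clf", LogisticRegression(max_iter=1000)),
])

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
# Several metrics, since accuracy alone can look good on an imbalanced target
# even for a model that rarely or never predicts the minority class.
scoring = ["accuracy", "balanced_accuracy", "f1", "roc_auc"]
scores = cross_validate(model, df, y, cv=cv, scoring=scoring)

for metric in scoring:
    fold_scores = scores[f"test_{metric}"]
    print(f"{metric}: {fold_scores.mean():.3f} +/- {fold_scores.std():.3f}")
```

Fitting the scaler or encoder on all of df before calling cross_validate would let test-fold statistics influence training; keeping them inside the Pipeline means each fold refits them on its own training portion.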

We will discuss the candidate's solution during the interview.

We would appreciate it if candidates could share their solution with us beforehand, through a private repository to which Cynthia Liem (@informusica) is invited.
