Thank you for deciding to honour us with your participation in the Prescient Coding Challenge. We trust you've brought your thinking caps and are ready to tackle an interesting problem. Most of all, we hope this stimulates you mentally and that you have a lot of fun. We think the grand prize is worth it.
You are given a set of stocks and their monthly returns (return per stock per month) spanning from January 2010 up to last month. We challenge you to construct a portfolio that yields a higher total return than an equally weighted index (the benchmark index). In other words, we want you to programmatically come up with a weighted portfolio index of stock returns each month that does better than if all stock returns in this index were just combined in equal weights through time. We will measure the outperformance on the test data only.
You are given a training data set of dimensions
The total return,
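The challenge is to calculate a set of portfolio weights $w_{i,t}$, one for each stock $i$ in each month $t$, such that $0 \le w_{i,t} \le 0.10$ and $\sum_i w_{i,t} = 1$ for every month $t$, where the total return of the portfolio at the final month $T$ exceeds that of the benchmark: $TR^{\text{portfolio}}_T > TR^{\text{benchmark}}_T$.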
That is, the latest value in the portfolio's total return series is strictly greater than the corresponding value in the benchmark index's total return series.
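As a rough illustration of this comparison, the sketch below assumes that a total return series is the compounded product of (1 + monthly portfolio return), with the monthly portfolio return being the weighted sum of stock returns. The function name `total_return` and the toy numbers are hypothetical; the calculation in the solution template remains the authoritative one.

```python
import pandas as pd


def total_return(returns: pd.DataFrame, weights: pd.DataFrame) -> pd.Series:
    """Compound weighted monthly returns into a total return series.

    Assumption: total return is the cumulative product of (1 + r_p), where
    r_p is the weighted sum of the stock returns in each month.
    """
    monthly = (returns * weights).sum(axis=1)  # portfolio return per month
    return (1 + monthly).cumprod()


stocks = [f"Stock{i}" for i in range(1, 21)]

# Toy returns over three months: the first ten stocks earn 2% a month,
# the other ten earn nothing (illustrative numbers only).
returns = pd.DataFrame([[0.02] * 10 + [0.0] * 10] * 3, columns=stocks)

# Benchmark: every stock gets the same weight (1/20 = 5%).
equal_w = pd.DataFrame(1 / 20, index=returns.index, columns=stocks)

# Toy portfolio: 10% in each of the first ten stocks, zero elsewhere.
# This uses hindsight and is only here to show the arithmetic.
tilted_w = pd.DataFrame([[0.10] * 10 + [0.0] * 10] * 3, columns=stocks)

benchmark_tr = total_return(returns, equal_w)
portfolio_tr = total_return(returns, tilted_w)

# Success criterion: the latest portfolio value strictly exceeds the benchmark's.
beats_benchmark = portfolio_tr.iloc[-1] > benchmark_tr.iloc[-1]
print(benchmark_tr.iloc[-1], portfolio_tr.iloc[-1], beats_benchmark)
```

Only the last value of each series matters for the comparison, so a portfolio may lag the benchmark in some months and still succeed.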
- As stated in the constraints above, no stock may have a weight higher than 10% (it can have a zero weight) at any point in time. Also, all weights must sum to 1 for any given month.
- You are welcome and encouraged to use any external resources at your disposal (ChatGPT, Google, Stack Overflow, etc.).
- You can use any methodology, machine learning, optimisation or algorithm of your choice. Make sure you have comments that explain what you are doing and why. The better we understand what you are attempting, the higher your chances of success.
- You can use any IDE of your choice (VS Code, Spyder, RStudio, etc.). It is the solution template that will matter at the end of the day.
- We will evaluate your submission based on the following criteria:
- Your methodology's performance against the equally weighted benchmark index.
- How well you are able to articulate and explain your own answer. (Hint: Don't just throw all sorts of technical jargon onto a page. You don't have a lot of time. Be concise, show us how much of your answer you understand well.)
- Generative AI is useful, but we will bias our evaluation towards more original answers. If you do use AI, use it to help you work smart. Don't try to have it do all the work for you.
- We will also be keeping an eye out for look-ahead bias, hence the separation between test and training sets.
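Before submitting, it may help to check the weight constraints programmatically. The following is a minimal sketch, assuming the weights dataframe has a `month_end` column followed by one column per stock (as in the sample code). The helper `check_weights` is hypothetical and not part of the solution template, and the non-negativity check is an assumption that the text implies but does not state outright.

```python
import numpy as np
import pandas as pd

MAX_WEIGHT = 0.10  # per-stock cap stated in the challenge


def check_weights(df_weights: pd.DataFrame, date_col: str = "month_end") -> bool:
    """Return True if every month's weights respect the cap and sum to 1."""
    w = df_weights.drop(columns=[date_col])
    # Assumption: weights are non-negative (no short positions).
    within_bounds = ((w >= 0) & (w <= MAX_WEIGHT + 1e-12)).all().all()
    sums_to_one = np.allclose(w.sum(axis=1), 1.0)
    return bool(within_bounds and sums_to_one)


stocks = [f"Stock{i}" for i in range(1, 21)]
month = pd.Timestamp("2023-01-31")

# Twenty stocks at 5% each: within the cap and summing to 1.
good = pd.DataFrame({"month_end": [month], **{s: [0.05] for s in stocks}})

# Two stocks at 50% each: sums to 1 but breaks the 10% cap.
bad = pd.DataFrame({"month_end": [month], "Stock1": [0.5], "Stock2": [0.5]})

print(check_weights(good), check_weights(bad))
```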
You will find the data split into two `.csv` files, `returns_train.csv` and `returns_test.csv`:
- The data sets are not necessarily ordered by month end.
- The training set is the first part of the data, and the test set the second.
- There is a return for each stock at each month end.
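Because the rows are not necessarily ordered, it is worth sorting by month end before doing any point-in-time calculation. The sketch below uses a small in-memory stand-in for `returns_train.csv`; the column layout (`month_end` plus one column per stock) is an assumption based on the sample code, and the values are made up.

```python
import io

import pandas as pd

# Hypothetical stand-in for returns_train.csv; the real file may differ.
csv_text = """month_end,Stock1,Stock2
2010-03-31,0.01,0.02
2010-01-31,0.03,-0.01
2010-02-28,-0.02,0.00
"""

# Parse the dates so that sorting and comparisons are chronological.
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["month_end"])
df = df.sort_values("month_end").reset_index(drop=True)

# Sanity checks: rows are in date order and no return is missing.
is_sorted = df["month_end"].is_monotonic_increasing
has_gaps = df.isna().any().any()
print(is_sorted, has_gaps)
```

For the real files, replacing `io.StringIO(csv_text)` with the file path should be enough, provided the date column is named as assumed here.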
You are provided with a solution template to submit your answer in. If you are using R, work in `solution_skeleton.R`. For Python, use `solution_skeleton.py`. The solution template contains three functions:
- `equalise_weights()`: Generates a dataframe of equal weights for the index.
- `generate_portfolio()`: The skeleton or sample code to generate the portfolio weights.
- `plot_total_return()`: Plots the total return series of both the index and portfolio.
We recommend modifying the weight generation section in the `generate_portfolio()` function (see below).
For Python:
# <<--------------------- YOUR CODE GOES BELOW THIS LINE --------------------->>
# This is your playground. Delete/modify any of the code here and replace with
# your methodology. Below we provide a simple, naive estimation to illustrate
# how we think you should go about structuring your submission and your comments:
# We use a static Inverse Volatility Weighting (https://en.wikipedia.org/wiki/Inverse-variance_weighting)
# strategy to generate portfolio weights.
# Use the latest available data at that point in time
for i in range(len(df_test)):
    # latest data at this point
    df_latest = df_returns[(df_returns['month_end'] < df_test.loc[i, 'month_end'])]
    # vol calc
    df_w = pd.DataFrame()
    df_w['vol'] = df_latest.std(numeric_only=True) # calculate stock volatility
    df_w['inv_vol'] = 1/df_w['vol'] # calculate the inverse volatility
    df_w['tot_inv_vol'] = df_w['inv_vol'].sum() # calculate the total inverse volatility
    df_w['weight'] = df_w['inv_vol']/df_w['tot_inv_vol'] # calculate weight based on inverse volatility
    df_w.reset_index(inplace=True, names='name')
    # add to all weights
    df_this = pd.DataFrame(data=[[df_test.loc[i, 'month_end']] + df_w['weight'].to_list()], columns=df_latest.columns)
    df_weights = pd.concat(objs=[df_weights, df_this], ignore_index=True)
# <<--------------------- YOUR CODE GOES ABOVE THIS LINE --------------------->>
For R:
# YOUR CODE GOES BELOW THIS LINE ------------------------------------------
# This is your playground. Delete/modify any of the code here and replace
# with your methodology. Below we provide a simple, naive estimation to
# illustrate how we think you should go about structuring your submission
# and your comments:
data <- bind_rows(training_data, test_data)
# We use a static Inverse Volatility Weighting (https://en.wikipedia.org/wiki/Inverse-variance_weighting)
# strategy to generate portfolio weights.
data_long <-
  data |>
  pivot_longer(
    contains("Stock"),
    names_to = "stock",
    values_to = "value"
  ) # pivot data to long format (columns: month_end, stock, value)
# To avoid look-ahead bias we calculate the weight for each stock in each
# month using only returns going up until the date in question. We split
# this into two steps: 1) We create a function to calculate portfolio weights
# and then 2) iterate this function over the test data with purrr::map_df() using
# the test data's month end dates to guide it. See it as a vectorised loop. Read more
# on purrr here: https://purrr.tidyverse.org/
calculate_weight <- function(date) {
  data_long |>
    filter(month_end < date) |>
    summarise(vol = sd(value), .by = stock) |> # calculate standard deviation (volatility) for each stock's returns
    mutate(
      inv_vol = 1/vol, # calculate the inverse of the volatility
      tot_inv_vol = sum(inv_vol), # create a new column = the sum of the inverse volatilities
      weight = inv_vol/tot_inv_vol # create a weight column
    ) |>
    mutate(month_end = date) |> # add back the date
    select(month_end, name = stock, value = weight) |>
    pivot_wider()
}
df_weights <- map_df(
  .x = test_data$month_end,
  .f = calculate_weight
)
# YOUR CODE GOES ABOVE THIS LINE ------------------------------------------
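The naive inverse-volatility samples above do not explicitly enforce the 10% cap; they rely on there being enough stocks for each weight to stay small. One possible way to enforce it (a sketch, not the organisers' method) is to clip any weight above the cap and redistribute the excess proportionally among the remaining stocks, repeating until nothing exceeds the cap. The helper `cap_weights` and the example numbers are hypothetical.

```python
import pandas as pd


def cap_weights(weights: pd.Series, cap: float = 0.10) -> pd.Series:
    """Clip weights at `cap` and redistribute the excess proportionally."""
    if cap * len(weights) < 1 - 1e-12:
        raise ValueError("Too few stocks for the weights to sum to 1 under the cap")

    w = weights / weights.sum()  # normalise so the weights sum to 1
    capped = pd.Series(False, index=w.index)

    for _ in range(len(w)):
        over = (w > cap + 1e-12) & ~capped
        if not over.any():
            break
        capped |= over
        w[capped] = cap
        # Weight still to be shared among the stocks that are not capped.
        remaining = 1.0 - cap * capped.sum()
        free_total = w[~capped].sum()
        if free_total > 0:
            w[~capped] *= remaining / free_total
        elif (~capped).any():
            # All uncapped stocks had zero weight: share the remainder equally.
            w[~capped] = remaining / (~capped).sum()
    return w


# Example: one stock at 45% and eleven stocks at 5% each.
raw = pd.Series([0.45] + [0.05] * 11, index=[f"Stock{i}" for i in range(1, 13)])
capped_w = cap_weights(raw)
print(capped_w.round(4).to_dict())
```

Proportional redistribution keeps the relative ordering of the uncapped stocks, which is one reasonable choice among several; an optimiser with explicit bounds would be another.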
Once you have the function generating the weights, you can use `plot_total_return()` to compare the index and your portfolio output. Please keep the output consistent so that you don't have to recode the plotting function.
Assuming you have a GitHub account and git installed on your computer, follow these steps:
- Log in to GitHub
- Go to https://github.com/PIM-Data-Science/prescient-coding-challenge-2023
- Fork the repository
- On your version of the project, click the `Code` button and copy the link under the HTTPS header.
- On your computer, go to the location where you would like to store this project
- In this location, right click and open Git Bash if on Windows, or Terminal if on Mac.
- In terminal, type and run `git clone your-copied-link`
- Open the project in your preferred IDE and start coding!
- Once everything has been saved, open Git Bash (or terminal) again in that same location
- Run `git status`. You should see your file listed as modified.
- Run `git commit -am "Insert your team members' names here."`
- Run `git push`
- Go to your GitHub profile
- Open the project online, you should see your changes
- Open Pull Requests and then click on `New Pull Request`
- Add your team's names to the pull request description, and finish creating the pull request.