Understanding IndexKernel for Hadamard Multitask model #1474
-
How should I construct the training index array? For 2 tasks and 1-dimensional inputs, the example notebook uses the following code:

```python
train_x1 = torch.rand(50)
train_x2 = torch.rand(50)

train_i_task1 = torch.full_like(train_x1, dtype=torch.long, fill_value=0)
train_i_task2 = torch.full_like(train_x2, dtype=torch.long, fill_value=1)

full_train_x = torch.cat([train_x1, train_x2])
full_train_i = torch.cat([train_i_task1, train_i_task2])
```

For example, if I had 3 tasks and my inputs were two-dimensional, how should I construct the index tensor?
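For reference, this quick check (my own illustration, not from the notebook) shows what that construction produces in the 2-task, 1-D case:

```python
import torch

train_x1 = torch.rand(50)
# full_like copies the shape of train_x1, so the index tensor is also 1-D
train_i_task1 = torch.full_like(train_x1, dtype=torch.long, fill_value=0)

print(train_i_task1.shape)  # torch.Size([50])
print(train_i_task1[:5])    # tensor([0, 0, 0, 0, 0])
```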
-
I managed.
-
I think that the documentation for the IndexKernel is still confusing. Since the Hadamard Multi-task notebook is, as far as I know, the only example that tells the user to use this kernel explicitly, and it covers a specific case (1-dimensional inputs and only 2 tasks), the code in the example might confuse people. The problem here might be my lack of knowledge, but please consider this:

The Hadamard Multi-task notebook builds the index tensors with

```python
train_i_task1 = torch.full_like(train_x1, dtype=torch.long, fill_value=0)
```

which, when the inputs are multidimensional, creates a tensor that repeats the index in every dimension (see the quick shape check after the model definition below). If it is supposed to be used like this, I don't really understand why: it makes me wonder why a single column of indices isn't enough. So I decided to try to build the indices tensor the way that makes sense to me. In this example I have 3 tasks and the inputs have 2 dimensions, but the indices tensor is 1-dimensional:

```python
import math
import torch
import gpytorch


class MultitaskGPModel(gpytorch.models.ExactGP):
    def __init__(self, train_x, train_y, likelihood):
        super(MultitaskGPModel, self).__init__(train_x, train_y, likelihood)
        self.mean_module = gpytorch.means.ConstantMean()
        self.covar_module = gpytorch.kernels.RBFKernel()
        self.task_covar_module = gpytorch.kernels.IndexKernel(num_tasks=3, rank=1)

    def forward(self, x, i):
        mean_x = self.mean_module(x)
        # Get input-input covariance
        covar_x = self.covar_module(x)
        # Get task-task covariance
        covar_i = self.task_covar_module(i)
        # Multiply the two together to get the covariance we want
        covar = covar_x.mul(covar_i)
        return gpytorch.distributions.MultivariateNormal(mean_x, covar)
```
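To make the shape issue concrete, here is the quick check mentioned above (my own illustration, not from the notebook), contrasting the two constructions for 2-D inputs:

```python
import torch

x = torch.rand(5, 2)  # 5 observations, 2 input dimensions

# The notebook's approach: the index tensor inherits the (5, 2) shape,
# repeating the task index across both input dimensions
i_full_like = torch.full_like(x, dtype=torch.long, fill_value=0)
print(i_full_like.shape)  # torch.Size([5, 2])

# A single column of indices: one task index per observation
i_full = torch.full((5,), dtype=torch.long, fill_value=0)
print(i_full.shape)       # torch.Size([5])
```

With that in mind, here is the rest of the example.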
```python
# Three toy functions, one per task
def f1(v):
    return torch.sin(torch.sum(v) * 2 * math.pi) + torch.randn(1) * 0.2

def f2(v):
    return torch.cos(torch.sum(v) * 2 * math.pi) + torch.randn(1) * 0.2

def f3(v):
    return torch.exp(torch.sum(v)) + torch.randn(1) * 0.2

num_obs = 5
train_x1 = torch.rand(num_obs, 2)
train_x2 = torch.rand(num_obs, 2)
train_x3 = torch.rand(num_obs, 2)

train_y1 = torch.tensor([f1(v) for v in train_x1])
train_y2 = torch.tensor([f2(v) for v in train_x2])
train_y3 = torch.tensor([f3(v) for v in train_x3])

train_i_task1 = torch.full((num_obs,), dtype=torch.long, fill_value=0)
train_i_task2 = torch.full((num_obs,), dtype=torch.long, fill_value=1)
train_i_task3 = torch.full((num_obs,), dtype=torch.long, fill_value=2)

full_train_x = torch.cat([train_x1, train_x2, train_x3])
full_train_i = torch.cat([train_i_task1, train_i_task2, train_i_task3])
full_train_y = torch.cat([train_y1, train_y2, train_y3])

likelihood = gpytorch.likelihoods.GaussianLikelihood()
model = MultitaskGPModel((full_train_x, full_train_i), full_train_y, likelihood)
```

Notice that I used

```python
train_i_task1 = torch.full((num_obs,), dtype=torch.long, fill_value=0)
# I could also have used
# train_i_task1 = torch.full((num_obs, 1), dtype=torch.long, fill_value=0)
# because the __init__ of gpytorch.models.ExactGP
# reshapes to this shape anyway.
```

This way the indices tensor ends up like this:

```python
>>> full_train_i
tensor([0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2])
>>> full_train_i.shape
torch.Size([15])
```

And if I try running the whole thing:
```python
likelihood = gpytorch.likelihoods.GaussianLikelihood()
# Here we have two items that we're passing in as train_inputs
model = MultitaskGPModel((full_train_x, full_train_i), full_train_y, likelihood)

# this is for running the notebook in our testing framework
import os
smoke_test = ('CI' in os.environ)
training_iterations = 2 if smoke_test else 50

# Find optimal model hyperparameters
model.train()
likelihood.train()

# Use the adam optimizer
optimizer = torch.optim.Adam(model.parameters(), lr=0.1)  # Includes GaussianLikelihood parameters

# "Loss" for GPs - the marginal log likelihood
mll = gpytorch.mlls.ExactMarginalLogLikelihood(likelihood, model)

for i in range(training_iterations):
    optimizer.zero_grad()
    output = model(full_train_x, full_train_i)
    loss = -mll(output, full_train_y)
    loss.backward()
    print('Iter %d/%d - Loss: %.3f' % (i + 1, training_iterations, loss.item()))
    optimizer.step()
```

It runs. Thank you in advance.
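As a follow-up (not part of the original post), here is a minimal sketch of how the same 1-D index convention would carry over to prediction, assuming the `model` and `likelihood` trained above:

```python
# Switch to evaluation mode for posterior predictions
model.eval()
likelihood.eval()

# Hypothetical test points: 10 new 2-D inputs, all queried for task 0
test_x = torch.rand(10, 2)
test_i = torch.full((10,), dtype=torch.long, fill_value=0)

with torch.no_grad(), gpytorch.settings.fast_pred_var():
    pred = likelihood(model(test_x, test_i))
    mean = pred.mean
    lower, upper = pred.confidence_region()
```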