Hello, I am relatively new to GPyTorch and PyTorch in general. I have trained a relatively simple GP model and can evaluate it on a test dataset that was split off from the original data. However, whenever I try to evaluate it on a new dataset that was not part of the original split, the model returns all-NaN predictions.
Some things I have considered:
Missing values: NaNs in the training targets are handled by GaussianLikelihoodWithMissingObs()
Input size: the train, test, and new datasets all have the same number of columns, as confirmed with .size(). I have also tried padding the new dataset with rows of all ones to match the shape of the test data, but that didn't work either. Unsqueezing didn't help, since the dimensions are already correct.
Tensor compatibility issues: the model, likelihood, and all tensors are on the CPU, and the datatypes have all been manually cast to float. Both tensors are sparse. (A diagnostic sketch covering these checks follows this list.)
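For reference, here is a minimal sketch of the checks listed above; the check_tensor helper is hypothetical and just prints the properties in question:

import torch

def check_tensor(name, t, expected_cols):
    # Hypothetical helper: report the properties discussed above.
    if t.is_sparse:
        t = t.to_dense()  # ExactGP prediction expects dense inputs
    print(name, "shape:", tuple(t.shape), "dtype:", t.dtype, "device:", t.device)
    print(name, "column count matches:", t.size(-1) == expected_cols)
    print(name, "contains NaN:", torch.isnan(t).any().item())
    print(name, "contains Inf:", torch.isinf(t).any().item())

# e.g. check_tensor("new_x", new_x, train_x.size(-1))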
Example Code:
import torch
import gpytorch
import pandas as pd
# Load the data and split it using the boolean 'train' column
df = pd.read_csv('exampledata.csv')
train_df = df[df['train'] == True]
test_df = df[df['train'] == False]
# Separate the target from the features
train_y = torch.tensor(train_df['target_feature'].values)
test_y = torch.tensor(test_df['target_feature'].values)
train_x = train_df.drop(['target_feature'], axis=1)
test_x = test_df.drop(['target_feature'], axis=1)
# Convert these to PyTorch tensors
train_x = torch.tensor(train_x.values)
test_x = torch.tensor(test_x.values)
train_x = train_x.float()
train_y = train_y.float()
test_x = test_x.float()
# test_x.size is torch.Size([3437, 164])
test_y = test_y.float()
class ExactGPModel(gpytorch.models.ExactGP):
    def __init__(self, x, y, likelihood):
        super(ExactGPModel, self).__init__(x, y, likelihood)
        self.mean_module = gpytorch.means.ConstantMean()
        self.covar_module = gpytorch.kernels.ScaleKernel(gpytorch.kernels.keops.RBFKernel())

    def forward(self, x):
        mean_x = self.mean_module(x)
        covar_x = self.covar_module(x)
        return gpytorch.distributions.MultivariateNormal(mean_x, covar_x)
# initialize likelihood and model
likelihood = gpytorch.likelihoods.GaussianLikelihoodWithMissingObs()
model = ExactGPModel(train_x, train_y, likelihood)
state_dict = torch.load('model.pth')
model.load_state_dict(state_dict)
model.eval()
likelihood.eval()
# This part works with no issues
with torch.no_grad(), gpytorch.settings.fast_pred_var():
    # Make predictions on the test data
    test_pred = likelihood(model(test_x))
    # Get the predicted mean
    predicted_means = test_pred.mean
    print("Predicted Means:", predicted_means)
df2 = pd.read_csv('newdata.csv')
new_x = torch.tensor(df2.values)
new_x = new_x.float()
# new_x.size is torch.Size([332, 164])
# The predicted means here come back as all NaN values
with torch.no_grad(), gpytorch.settings.fast_pred_var():
    # Make predictions on the new data
    new_pred = likelihood(model(new_x))
    # Get the predicted mean
    predicted_means = new_pred.mean
    print("Predicted Means:", predicted_means)