Training a Gaussian Process model in MOOSE #26267

charlieowen3065 · 2023-12-06T18:28:41Z

charlieowen3065
Dec 6, 2023

Hello!

I am currently trying to use the stochastic tools module to train a Gaussian process model, but am having issues with the hyperparameters.

My basic setup is this: I load my training data from CSV files (using CSVSampler for X and a VectorPostprocessor for Y) and specify the "Signal Variance", "Noise Variance", and "Length Factor" for a Matern covariance function with p = 1 (nu = 3/2). As a reference point, I also run the same model (same inputs, hyperparameters, and covariance function) in Python using sklearn. I can get a good, general model working in Python, but when I use those same hyperparameters in MOOSE and evaluate the model on my testing data, it only predicts a vector of ~0's (the values are ~4*10^-301, which is effectively zero). I have checked the hyperparameters actually used by both codes for these evaluations, and they match.

I believe the issue has to do with how I am training the model, but I am not sure where. One possible culprit is the length factor hyperparameter. MOOSE requires one value per feature, but since I have 102 features, I added a few lines that take the single input length factor and copy it into a 1x102 vector, which is then used as the input. I am not sure whether this causes an unforeseen consequence somewhere, so I thought I'd mention it.

I have also tried the 'adam' tuning algorithm, but it does not give any better results. (I have tried batch sizes from 1-50, iterations from 10-1000, and learning rates from 0.001-10.)

Here is the training input file:

# =+=+=+=+=+=+=+=+=+ USER INPUTS =+=+=+=+=+=+=+=+=+=+=+=+=

X_filename = "X_train.csv"

# Hyperparameters
signal_variance = 0.01
noise_variance = 0.01
length_factor = 0.01
p = 1

# =+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+

[StochasticTools]
[]

[Samplers]
  [X_train]
    type = CSVSampler
    samples_file = ${X_filename}
    execute_on = 'initial'
  []
[]

[VectorPostprocessors]
  [Y_train]
    type = CSVReader
    csv_file = 'Y_train.csv'
  []
[]

[Covariance]
  [Matern_32]
    type = MaternHalfIntCovariance
    constant_length_factor = True
    signal_variance = ${signal_variance}
    noise_variance = ${noise_variance}
    length_factor = "${length_factor}"
    p = ${p}
  []
[]

[Trainers]
  [train]
    type = GaussianProcessTrainer
    execute_on = timestep_end
    covariance_function = 'Matern_32'
    standardize_params = 'false' #Center and scale the training params
    standardize_data = 'false' #Center and scale the training data
    sampler = X_train
    response = Y_train/${column_name}
    # Tuning
    tuning_algorithm = 'adam'
    tune_parameters = 'signal_variance length_factor'
    # tune_parameters = 'signal_variance'
    iter_adam = 100
    batch_size = 20
    learning_rate_adam = 10
  []
[]

[Outputs]
  [out]
    type = SurrogateTrainerOutput
    trainers = 'train'
    file_base = "Trained_Model"
    execute_on = FINAL
  []
[]

To make my life simpler, I already standardized my X, which is why those options are set to false. I also run this through a shell script that supplies my hyperparameters, so you can ignore the values at the top of the file. Those are just there to initialize the variables.

Any advice on this? I have tried a large range of hyperparameters for the MOOSE GP model, but nothing has given results better than a vector of zeros (my actual Y's are in the ~2000 range).

I am pretty stuck, so I am willing to try just about anything.

Thanks!

Charlie

GiudGiud · 2023-12-06T20:25:41Z

GiudGiud
Dec 6, 2023
Collaborator

@zachmprince @somu15

1 reply

somu15 Dec 7, 2023

Hi. Did you modify the source code to take in 102 length factors? If so, what did you change in the code?

Also, a GP performs poorly in such a high dimensional space. Even if the training is successful, it'd require a lot of training data. I recommend two things:

For your verification, benchmark MOOSE and Python on a lower-dimensional problem, maybe 10 dimensions.
Use dimensionality reduction to reduce the number of length factors. A simple PCA should help you.
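
As a rough illustration of the PCA suggestion above, here is a minimal sketch in plain Python. It is not part of MOOSE or sklearn: the function names (`center`, `covariance`, `top_eigenpair`, `pca_project`) are hypothetical, and power iteration with deflation is used only to keep the example dependency-free. For real data sets a library routine would normally be the better choice.

```python
import math
import random


def center(rows):
    """Subtract the column means from every row; return (centered_rows, means)."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows], means


def covariance(centered_rows):
    """Sample covariance matrix of rows that are already centered."""
    n, d = len(centered_rows), len(centered_rows[0])
    return [[sum(r[i] * r[j] for r in centered_rows) / (n - 1) for j in range(d)]
            for i in range(d)]


def top_eigenpair(mat, iters=500, seed=0):
    """Dominant eigenvalue/eigenvector of a symmetric PSD matrix (power iteration)."""
    rng = random.Random(seed)
    d = len(mat)
    v = [rng.random() + 0.1 for _ in range(d)]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iters):
        w = [sum(mat[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm < 1e-300:  # no variance left along this direction
            break
        v = [x / norm for x in w]
    eigval = sum(v[i] * sum(mat[i][j] * v[j] for j in range(d)) for i in range(d))
    return eigval, v


def pca_project(rows, n_components):
    """Project rows onto their first n_components principal directions."""
    centered, _ = center(rows)
    cov = covariance(centered)
    d = len(cov)
    components = []
    for _ in range(n_components):
        lam, v = top_eigenpair(cov)
        components.append(v)
        # Deflation: remove the direction just found before searching for the next.
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    return [[sum(r[j] * c[j] for j in range(d)) for c in components]
            for r in centered]
```

The reduced columns returned by `pca_project` would then take the place of the original features as GP inputs, so only `n_components` length factors remain to be set or tuned. The same centering and projection have to be applied to the test inputs.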

Reach out for any further questions.

Som

charlieowen3065 · 2023-12-07T21:49:18Z

charlieowen3065
Dec 7, 2023
Author

Hi Som!

To alter the length factor, I modified the code in MaternHalfIntCovariance.C to be:

void
MaternHalfIntCovariance::computeCovarianceMatrix(RealEigenMatrix & K,
                                                 const RealEigenMatrix & x,
                                                 const RealEigenMatrix & xp,
                                                 const bool is_self_covariance) const
{
  
  if ((unsigned)x.cols() != _length_factor.size()){

    if (_constant_length_factor == false){
      mooseError("length_factor size does not match dimension of trainer input.");
    } else{
      std::vector<Real> _new_length_factor;
      _new_length_factor.resize(x.cols());
      for (int i=0; i<x.cols(); i++){
        _new_length_factor[i] = _length_factor[0];
      }

      std::cout << "WARNING: length_factor size (" << _length_factor.size() << ") does not match dimension of trainer input (" << x.cols() << "). \n The input value has been copied (" << x.cols() << ") times. (MaternHalfIntCovariance)" << std::endl;

      // casting away const
      std::vector<Real> *ptr;
      ptr = (std::vector<Real>*)( &_length_factor );
      *ptr = _new_length_factor;
      //_length_factor = _new_length_factor;

      std::cout << "_length_factor.size(): " << _length_factor.size() << std::endl;

    }

  }

(Before, it just took _length_factor to be the input value.) Is this done correctly? In the evaluation stage, I print all the HPs that are loaded, and this one appears to be loaded correctly as a vector of 102 identical values.

My feature size is high, but we do plan on keeping it at this scale. However, we do have a large number of training points (~2000).

I also used a much smaller dataset (8 features, ~1000 points), and it was still predicting zeros, so the error persists.

The output of both is in the form

gp_surroagte, gp_surrogate_std
1.39e-313, 1.39e-313
4.66e-313, 1.39e-313
4.66e-313, 1.39e-313
4.66e-313, 1.39e-313
..., ...

and really just repeats like that for all points.

So for some reason, it is only predicting zeros with zero std. I assume it has to do with the code I added (as this is the only difference between my code and the original), but I am not sure where the error would arise.

Thanks!

Charlie

7 replies

charlieowen3065 Dec 12, 2023
Author

I no longer think that my added code here is the issue, as I ran the case with only 8 features again, but deleted my code and placed it into the input properly, and I am still seeing no predictions (all 0's).

The only other code I have added was in

void
LoadCovarianceDataAction::load(GaussianProcess & model)
{
  const std::string & covar_type = model.getGPHandler().getCovarType();
  const std::unordered_map<std::string, Real> & map = model.getGPHandler().getHyperParamMap();
  const std::unordered_map<std::string, std::vector<Real>> & vec_map =
      model.getGPHandler().getHyperParamVectorMap();
  const UserObjectName & covar_name = model.name() + "_covar_func";

  InputParameters covar_params = _factory.getValidParams(covar_type);

  for (auto & p : map){

    std::cout << p.first << ": " << p.second << std::endl;

    // Added functionality -- was trying to assign "p" (an int) to a double ----------------------------------------------------------------- (Charlie Owen, 11/30/2023)
    if (covar_params.type(p.first) == "unsigned int"){
      std::cout << "WARNING: Parameter [" << p.first << "] is an 'unsigned int', and could therefore not be moved into \n a double. Functionality added to account for this. ~CO, 11/30/2023. (LoadCovarianceDataAction)" << std::endl;
      covar_params.set<unsigned int>(p.first) = p.second;
    }
    else {
      covar_params.set<Real>(p.first) = p.second;
    }
  }
  for (auto & p : vec_map){
    std::cout << "Here 2" << std::endl;
    // std::cout << p.first << ": " << p.second << std::endl;
    std::cout << p.first << ":" << std::endl;
    for (int i = 0; i < p.second.size(); i++){
      std::cout << i << ": " << p.second[i] << std::endl;
    }

    covar_params.set<std::vector<Real>>(p.first) = p.second;
  }
  _problem->addObject<CovarianceFunctionBase>(
      covar_type, covar_name, covar_params, /* threaded = */ false);

  model.setupCovariance(covar_name);
}

I added a branch there because the loader was trying to set "p" as a double, even though it was declared as an 'unsigned int'. This was a weird bug, so maybe I am misinterpreting something here? What is the proper way to load in a Matern covariance function?

Without the piece I added, I was unable to do so, so maybe it is training fine, but I am loading it in wrong? The p should just be an int and represents nu=p+0.5 in the matern equation (or just p depending on the equation).
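
For reference, the relation nu = p + 0.5 mentioned above can be written out using the standard textbook form of the half-integer Matern kernel. The sketch below is plain Python and makes no claim to match MOOSE's internal implementation. The function name `matern_half_int` is hypothetical, and a single scalar length factor is assumed, whereas MOOSE takes one per feature.

```python
import math


def matern_half_int(r, length_factor, signal_variance, p):
    """Matern covariance with nu = p + 1/2 for a non-negative integer p.

    r is the Euclidean distance between two points. A single scalar
    length factor is assumed here (an illustration, not MOOSE's API).
    """
    if p < 0 or int(p) != p:
        raise ValueError("p must be a non-negative integer")
    p = int(p)
    scaled = math.sqrt(2 * p + 1) * r / length_factor
    # Polynomial part: sum_i (p+i)! / (i! (p-i)!) * (2 * scaled)^(p-i)
    poly = sum(
        math.factorial(p + i) / (math.factorial(i) * math.factorial(p - i))
        * (2.0 * scaled) ** (p - i)
        for i in range(p + 1)
    )
    prefactor = math.factorial(p) / math.factorial(2 * p)
    return signal_variance * math.exp(-scaled) * prefactor * poly
```

With p = 1 this reduces to signal_variance * (1 + sqrt(3) r / l) * exp(-sqrt(3) r / l), the nu = 3/2 kernel used in this thread, and p = 0 gives the plain exponential kernel. Since p enters only through factorials and integer powers, it has to be stored and loaded as an integer rather than a double.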

somu15 Dec 12, 2023

I was on personal leave and hence my responses have been delayed so far.

For the Matern Half Int covariance, p should be a positive integer. See the documentation here: https://mooseframework.inl.gov/source/surrogates/MaternHalfIntCovariance.html

As far as the zero predictions go, this typically relates to a problem with your training data. Some problems could be:

Not standardizing the training data inputs and outputs. This is very important for GPs. You can standardize by simply setting standardize_params = 'true' and standardize_data = 'true'
The training data doesn't make physical sense.
Or some other issue.

Standardize the data and params and see how it goes. Also, when doing predictions, focus on both the mean and the std of the GP.

Why are you using the Matern Half Int kernel and not the Squared Exponential?

Answer selected by charlieowen3065

charlieowen3065 Dec 12, 2023
Author

Hi Som!

All fine - I appreciate the help!

I think I have it working now - it looks like without standardize_data = 'true', the model doesn't make very good predictions. Why is that? I've done ML before, but I've never had to standardize my Y, so I suppose I hadn't thought about doing it. It just seems like it shouldn't be necessary?

Either way, thanks for the help!

Charlie

somu15 Dec 12, 2023

We are dealing with a zero-mean GP implementation here. Therefore, the Y's should also be standardized. See https://cs229.stanford.edu/section/cs229-gaussian_processes.pdf for the basics of a GP.
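
One way to see the effect of the zero-mean assumption is a toy 1-D example. The sketch below is plain Python and is not MOOSE code. The helper names (`matern32`, `solve`, `gp_mean`, `gp_mean_standardized`) and the data are made up for illustration, and unit signal variance and length factor are assumed. Away from the training points, the posterior mean of a zero-mean GP falls back to the prior mean of 0, which is far from responses near 2000 unless Y is centered and scaled first.

```python
import math


def matern32(a, b, length_factor=1.0, signal_variance=1.0):
    """Matern nu = 3/2 kernel between two scalar inputs."""
    s = math.sqrt(3.0) * abs(a - b) / length_factor
    return signal_variance * (1.0 + s) * math.exp(-s)


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def gp_mean(x_train, y_train, x_test, noise_variance=1e-6):
    """Posterior mean of a zero-mean GP: k_*^T (K + noise * I)^-1 y."""
    n = len(x_train)
    K = [[matern32(x_train[i], x_train[j]) + (noise_variance if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    alpha = solve(K, y_train)
    return [sum(matern32(xs, x_train[i]) * alpha[i] for i in range(n))
            for xs in x_test]


def gp_mean_standardized(x_train, y_train, x_test, noise_variance=1e-6):
    """Same GP, but Y is centered and scaled first and un-scaled afterwards."""
    n = len(y_train)
    mu = sum(y_train) / n
    sd = math.sqrt(sum((y - mu) ** 2 for y in y_train) / (n - 1))
    z = [(y - mu) / sd for y in y_train]
    return [mu + sd * m for m in gp_mean(x_train, z, x_test, noise_variance)]
```

For example, with `x_train = [0, 1, 2, 3]` and responses around 2000, `gp_mean` evaluated at a far-away point such as `x = 50` returns a value that is essentially 0, while `gp_mean_standardized` returns a value close to the training mean. Both versions still reproduce the training responses at the training points.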

And, do you know why you're using the Matern kernel?

charlieowen3065 Dec 12, 2023
Author

Som,

Great, thanks!

For the Matern kernel, we are still in a preliminary stage in the process, so I don't have a better reason for picking it other than it seems to work better than other kernels for our specific data.

Is there a good method I should use to determine the best kernel for a dataset? My plan was to do offline hyperparameter tuning using various kernels and use the best model for our project.

Does this seem like a bad method?

somu15 Dec 12, 2023

Yes, you can start with manually determining the best kernel for a given training data set. See this link on kernel selection: https://www.cs.toronto.edu/~duvenaud/cookbook/

A Matern kernel works better than the squared exponential if the training data is discontinuous. The squared exponential assumes smoothness and the Matern kernel does not. But you have to train an additional parameter for the Matern. See this: https://andrewcharlesjones.github.io/journal/matern-kernels.html
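
For the offline kernel comparison discussed above, one commonly used score is the log marginal likelihood of the training data under each kernel. This is a suggestion rather than something the thread prescribes. The sketch below is plain Python with hypothetical names (`log_marginal_likelihood`, `rank_kernels`). It assumes 1-D inputs, a zero-mean GP on standardized Y, unit signal variance, and a length factor shared by all candidates.

```python
import math


def squared_exponential(r, ell):
    """Squared exponential kernel as a function of distance r."""
    return math.exp(-0.5 * (r / ell) ** 2)


def matern32(r, ell):
    """Matern nu = 3/2 kernel as a function of distance r."""
    s = math.sqrt(3.0) * r / ell
    return (1.0 + s) * math.exp(-s)


def cholesky(A):
    """Lower-triangular L with L L^T = A, for symmetric positive-definite A."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def log_marginal_likelihood(kernel, x, y, ell, noise_variance):
    """log p(y | x) for a zero-mean GP with the given kernel."""
    n = len(x)
    K = [[kernel(abs(x[i] - x[j]), ell) + (noise_variance if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    L = cholesky(K)
    # Forward substitution L z = y, so that z^T z = y^T K^-1 y.
    z = []
    for i in range(n):
        z.append((y[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i])
    quad = sum(v * v for v in z)
    logdet = 2.0 * sum(math.log(L[i][i]) for i in range(n))
    return -0.5 * quad - 0.5 * logdet - 0.5 * n * math.log(2.0 * math.pi)


def rank_kernels(kernels, x, y, ell, noise_variance):
    """Kernel names sorted from highest to lowest log marginal likelihood."""
    scores = {name: log_marginal_likelihood(k, x, y, ell, noise_variance)
              for name, k in kernels.items()}
    return sorted(scores, key=scores.get, reverse=True)
```

Calling `rank_kernels({"se": squared_exponential, "matern32": matern32}, x, y, ell, noise_variance)` on standardized training data returns the candidates ordered by this score. Comparing prediction error on held-out test points is an equally reasonable alternative, and in practice each kernel's hyperparameters would be tuned before the comparison.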

But I'm glad the STM implementation is working now. Reach out if you have further questions.
