Resolve SMT Method Changes / Support for JENN Models #1217

Merged
merged 6 commits into from
Apr 30, 2024

bpaul4
Contributor

@bpaul4 bpaul4 commented Apr 10, 2024

Fixes/Addresses:

Resolves #1205 by updating the SMT GENN syntax to the latest supported version and by adding support for JENN, a new dependency of SMT GENN.

Legal Acknowledgement

By contributing to this software project, I agree to the following terms and conditions for my contribution:

  1. I agree my contributions are submitted under the copyright and license terms described in the LICENSE.md file at the top level of this directory.
  2. I represent I am authorized to make the contributions and grant the license. If my employer has rights to intellectual property that includes these contributions, I represent that I have received permission to make contributions and grant the required license on behalf of that employer.

@bpaul4 bpaul4 self-assigned this Apr 10, 2024
@bpaul4 bpaul4 requested a review from lbianchi-lbl April 10, 2024 23:05
@ksbeattie ksbeattie added the Priority:High High Priority Issue or PR label Apr 16, 2024
lbianchi-lbl
lbianchi-lbl previously approved these changes Apr 23, 2024
Contributor

@lbianchi-lbl lbianchi-lbl left a comment

A minor question about the possibility of using if __name__ == "__main__": for scripts; otherwise, this looks good as far as I can tell.

Comment on lines +112 to +151
# Main code

# import data
data = pd.read_csv(r"MEA_carbon_capture_dataset_mimo.csv")
grad0_data = pd.read_csv(r"gradients_output0.csv", index_col=0) # ignore 1st col
grad1_data = pd.read_csv(r"gradients_output1.csv", index_col=0) # ignore 1st col

xdata = data.iloc[:, :6] # there are 6 input variables/columns
ydata = data.iloc[:, 6:] # the rest are output variables/columns
xlabels = xdata.columns.tolist() # set labels as a list (default) from pandas
zlabels = ydata.columns.tolist() # is a set of IndexedDataSeries objects
xdata_bounds = {i: (xdata[i].min(), xdata[i].max()) for i in xdata} # x bounds
ydata_bounds = {j: (ydata[j].min(), ydata[j].max()) for j in ydata} # y bounds

xmax, xmin = xdata.max(axis=0), xdata.min(axis=0)
ymax, ymin = ydata.max(axis=0), ydata.min(axis=0)
xdata, ydata = np.array(xdata), np.array(ydata) # (n_m, n_x) and (n_m, n_y)
gdata = np.stack([np.array(grad0_data), np.array(grad1_data)]) # (2, n_m, n_x)

model_data = np.concatenate(
    (xdata, ydata), axis=1
) # JENN requires a Numpy array as input

# define x and y data, not used but will add to variable dictionary
xdata = model_data[:, :-2]
ydata = model_data[:, -2:]

# create model
model, y_pred, SSE = create_model(x_train=xdata, y_train=ydata, grad_train=gdata)

with open("mea_column_model_jenn.pkl", "wb") as file:
    pickle.dump(model, file)

# load model as pickle format
with open("mea_column_model_jenn.pkl", "rb") as file:
    loaded_model = pickle.load(file)


print(y_pred)
print("SSE = ", SSE)
Contributor

Would it be possible to wrap this code in a main() function, so that it's not run on import?

def main():
    # move code here


if __name__ == "__main__":
    main()

Please disregard if the current structure is required (e.g. the code needs to be at the module scope for the dynamic module import to work correctly).

Contributor Author

That should be possible for the code that calls the training method, since the training examples are only used to generate the model files. The training methods that create the models may need to stay at the module level; Keras certainly needs this to register and load the model class object.

Some of the Keras example files are vehicles for the class objects. For example, mea_column_model_customnormform.py must be copied to the working directory during testing so that the plugin code can load the registered model class, and those non-training files don't have any "main" code to execute.

I can update that here, or in another PR.
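
A minimal sketch of the layout discussed in this thread, not the PR's actual code. It assumes a stand-in create_model (a trivial mean predictor), placeholder data, and a hypothetical output file name. The training function stays at module scope so it remains importable, and only the script steps move into main():

```python
import pickle
from pathlib import Path


def create_model(x_train, y_train):
    """Hypothetical stand-in for the training function in the PR.

    It stays at module scope so that other modules can import it.
    It "trains" a mean predictor and returns (model, y_pred, SSE).
    """
    mean = sum(y_train) / len(y_train)
    model = {"mean": mean}
    y_pred = [mean for _ in x_train]
    sse = sum((p - t) ** 2 for p, t in zip(y_pred, y_train))
    return model, y_pred, sse


def main(out_dir="."):
    # Script-only steps: build data, train, save, reload, report.
    x_train = [0.0, 1.0, 2.0]  # placeholder data, not the MEA dataset
    y_train = [1.0, 2.0, 3.0]
    model, y_pred, sse = create_model(x_train, y_train)

    path = Path(out_dir) / "example_model.pkl"  # hypothetical file name
    with open(path, "wb") as file:
        pickle.dump(model, file)

    # Load the model back from the pickle file.
    with open(path, "rb") as file:
        loaded_model = pickle.load(file)

    print(y_pred)
    print("SSE = ", sse)
    return loaded_model


if __name__ == "__main__":
    main()
```

With this layout, importing the module defines create_model but does not read data, train, or write files; those steps run only when the file is executed as a script.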

Comment on lines 104 to +140
 # Main code

 # import data
 data = pd.read_csv(r"MEA_carbon_capture_dataset_mimo.csv")
-grad0_data = pd.read_csv(r"gradients_output0.csv", index_col=0) # ignore 1st col
-grad1_data = pd.read_csv(r"gradients_output1.csv", index_col=0) # ignore 1st col

 xdata = data.iloc[:, :6] # there are 6 input variables/columns
-zdata = data.iloc[:, 6:] # the rest are output variables/columns
+ydata = data.iloc[:, 6:] # the rest are output variables/columns
 xlabels = xdata.columns.tolist() # set labels as a list (default) from pandas
-zlabels = zdata.columns.tolist() # is a set of IndexedDataSeries objects
+ylabels = ydata.columns.tolist() # is a set of IndexedDataSeries objects
 xdata_bounds = {i: (xdata[i].min(), xdata[i].max()) for i in xdata} # x bounds
-zdata_bounds = {j: (zdata[j].min(), zdata[j].max()) for j in zdata} # z bounds
+ydata_bounds = {j: (ydata[j].min(), ydata[j].max()) for j in ydata} # y bounds

 xmax, xmin = xdata.max(axis=0), xdata.min(axis=0)
-zmax, zmin = zdata.max(axis=0), zdata.min(axis=0)
-xdata, zdata = np.array(xdata), np.array(zdata) # (n_m, n_x) and (n_m, n_y)
-gdata = np.stack([np.array(grad0_data), np.array(grad1_data)]) # (2, n_m, n_x)
+ymax, ymin = ydata.max(axis=0), ydata.min(axis=0)
+xdata, ydata = np.array(xdata), np.array(ydata) # (n_m, n_x) and (n_m, n_y)

 model_data = np.concatenate(
-    (xdata, zdata), axis=1
+    (xdata, ydata), axis=1
 ) # Surrogate Modeling Toolbox requires a Numpy array as input

-# define x and z data, not used but will add to variable dictionary
+# define x and y data, not used but will add to variable dictionary
 xdata = model_data[:, :-2]
-zdata = model_data[:, -2:]
+ydata = model_data[:, -2:]

 # create model
-model = create_model(x_train=xdata, z_train=zdata, grad_train=gdata)
+model, y_pred, SSE = create_model(x_train=xdata, y_train=ydata)

-with open("mea_column_model_smt.pkl", "wb") as file:
+with open("mea_column_model_smtgenn.pkl", "wb") as file:
     pickle.dump(model, file)

 # load model as pickle format
-with open("mea_column_model_smt.pkl", "rb") as file:
+with open("mea_column_model_smtgenn.pkl", "rb") as file:
     loaded_model = pickle.load(file)


+print(y_pred)
+print("SSE = ", SSE)
Contributor

See previous comment for using if __name__ == "__main__":, if possible.

codecov bot commented Apr 24, 2024

Codecov Report

Attention: Patch coverage is 64.10256%, with 14 lines in your changes missing coverage. Please review.

Project coverage is 38.72%. Comparing base (8da0631) to head (f17afdd).

Files                                Patch %   Lines
foqus_lib/framework/graph/node.py    64.10%    10 Missing and 4 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1217      +/-   ##
==========================================
+ Coverage   38.54%   38.72%   +0.17%     
==========================================
  Files         164      164              
  Lines       37032    37067      +35     
  Branches     6132     6140       +8     
==========================================
+ Hits        14274    14354      +80     
+ Misses      21619    21569      -50     
- Partials     1139     1144       +5     
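
The percentages in this report can be reproduced from the counts in the table. The sketch below assumes that coverage is hits divided by lines, that the patch touched 39 coverable lines (inferred from 64.10256% with 14 lines not fully covered), and that the two-decimal figures are truncated rather than rounded. These are inferences from the numbers, not documented Codecov behavior.

```python
import math

# Counts copied from the coverage table above.
base_hits, base_lines = 14274, 37032
head_hits, head_lines = 14354, 37067

base_cov = 100 * base_hits / base_lines
head_cov = 100 * head_hits / head_lines

# Assumption: 39 coverable lines in the patch, of which
# 10 are missing and 4 are only partially covered.
patch_lines = 39
patch_uncovered = 10 + 4
patch_cov = 100 * (patch_lines - patch_uncovered) / patch_lines


def trunc2(value):
    """Truncate (not round) a positive value to two decimals."""
    return math.floor(value * 100) / 100


print("base  coverage:", trunc2(base_cov))
print("head  coverage:", trunc2(head_cov))
print("difference:    ", trunc2(head_cov - base_cov))
print("patch coverage:", patch_cov)
```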

@lbianchi-lbl lbianchi-lbl self-requested a review April 30, 2024 19:58
@lbianchi-lbl lbianchi-lbl merged commit 0b7f3c8 into CCSI-Toolset:master Apr 30, 2024
31 checks passed
@bpaul4 bpaul4 deleted the fix-smt branch October 1, 2024 14:29
Development

Successfully merging this pull request may close these issues.

Resolve SMT Method Changes
3 participants