-
Notifications
You must be signed in to change notification settings - Fork 75
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix #1130 - Can't run resstock in parallel #1131
Conversation
def checksum_dir_content(directory_path) | ||
files = Dir.glob('**/*', base: directory_path).select { |fn| File.file?(fn) } | ||
dir_checksum = Zlib::crc32(files.map { |rel_path| | ||
[ | ||
rel_path, | ||
File.mtime(File.join(directory_path, rel_path)), # mtime is affected by the copy, but we passed preserve = true | ||
File.size(File.join(directory_path, rel_path)) | ||
] | ||
}.to_s) | ||
return dir_checksum | ||
end | ||
|
||
def create_lib_folder(lib_dir, resources_dir, housing_characteristics_dir, debug: false) | ||
redo_needed = true | ||
if File.directory?(lib_dir) | ||
lib_resources_dir = File.join(lib_dir, File.basename(resources_dir)) | ||
resource_matches = checksum_dir_content(resources_dir) == checksum_dir_content(lib_resources_dir) | ||
if resource_matches | ||
lib_housing_characteristics_dir = File.join(lib_dir, File.basename(housing_characteristics_dir)) | ||
housing_matches = checksum_dir_content(housing_characteristics_dir) == checksum_dir_content(lib_housing_characteristics_dir) | ||
if housing_matches | ||
redo_needed = false | ||
elsif debug | ||
puts "Housing directory is outdated: #{lib_housing_characteristics_dir}" | ||
end | ||
elsif debug | ||
puts "Resources directory is outdated: #{lib_resources_dir}" | ||
end | ||
elsif debug | ||
puts "Creating lib_dir" | ||
end | ||
|
||
if !redo_needed | ||
if debug | ||
puts "Lib folder is up to date" | ||
end | ||
return | ||
end | ||
FileUtils.rm_rf(lib_dir) | ||
Dir.mkdir(lib_dir) | ||
FileUtils.cp_r(resources_dir, lib_dir) | ||
FileUtils.cp_r(housing_characteristics_dir, lib_dir) | ||
|
||
# Preserve object’s group, user and **modification time** on copying | ||
FileUtils.cp_r(resources_dir, lib_dir, preserve: true) | ||
FileUtils.cp_r(housing_characteristics_dir, lib_dir, preserve: true) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Detect if the resstock/lib folder has to be created in a "smart" way:
- If it's missing...
- If it's there, but the content is outdated
- To check whether that's outdated, I compute a checksum for all [filename, file modification time, file size].
- This is much faster than using OpenStudio::checksum (implementation given below for posterity)
def checksum_dir_content_os(directory_path)
files = Dir.glob(File.join(directory_path, '**/*')).select { |fn| File.file?(fn) }
dir_checksum = OpenStudio::checksum(files.map { |abs_path|
OpenStudio::checksum(OpenStudio::toPath(abs_path))
}.to_s)
return dir_checksum
end
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that one has to pass --debug
to run_analysis.rb to avoid wiping the lib folder at the end of the run as well.
arg = OpenStudio::Ruleset::OSArgument.makeStringArgument('buildstock_csv_path', true) | ||
arg.setDisplayName('Buildstock CSV File Path') | ||
arg.setDescription('Absolute/relative path of the buildstock CSV file, relative is compared to the lib/housing_characteristics_dir.') | ||
args << arg |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
New BuildExistingModel argument: buildstock_csv_path
.
Maybe that could be made optional and still default to lib/housing_characteristics/buildstock.csv
by setting the default value to buildstock.csv
.
buildstock_csv_path = args['buildstock_csv_path'] | ||
unless (Pathname.new buildstock_csv_path).absolute? | ||
buildstock_csv_path = File.absolute_path(File.join(characteristics_dir, buildstock_csv_path)) # Should have been generated by the Worker Initialization Script (run_sampling.rb) or provided by the project | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If buildstock_csv_path is absolute, use that. Otherwise get it relative to the lib/housing_characteristics dir
# TODO: this should write directly to the results_dir... | ||
# run_sampling_lib::write_csv should not take a relative path relative to | ||
# the resources/run_sampling_lib.rb but an absolute path | ||
create_buildstock_csv(project_directory, n_datapoints, outfile) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
side note.
buildstock_csv_path = File.join(results_dir, "buildstock.csv") | ||
FileUtils.cp(src, buildstock_csv_path) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If sampled, after copying it, use the copy in the results_dir instead of the lib/housing_characteristics/buildstock.csv
# If buildingstock_csv is absolute, just use that | ||
# If relative: relative to yml | ||
buildstock_csv_path = cfg['sampler']['args']['sample_file'] | ||
unless (Pathname.new buildstock_csv_path).absolute? | ||
buildstock_csv_path = File.expand_path(File.join(File.dirname(yml), buildstock_csv_path)) | ||
end | ||
|
||
buildstock_csv = CSV.read(des, headers: true) | ||
buildstock_csv = CSV.read(buildstock_csv_path, headers: true) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If precomputed, use the sample > args > sample_file.
- If absolute, all good
- If not, find it relative to the workflow YAML, as per before.
@@ -116,6 +123,7 @@ def run_workflow(yml, n_threads, measures_only, debug_arg, overwrite, building_i | |||
} | |||
|
|||
bld_exist_model_args = { | |||
'buildstock_csv_path': buildstock_csv_path, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pass that buildstock_csv_path
argument to BuildExistingModel
@jmarrec So the basic idea here is making the path of the buildstock.csv file variable (instead of fixed at lib/housing_characteristics) so that you can use Couple of questions. |
…f it's already up to date Compute a checksum by checking file name, file modified time, and file size. It checksums match, you are guaranteed to have a resstock/lib folder that's fully up to date.
…csv_path instead of expecting it at lib/housing_characteristics/buildstock.csv
c346196
to
193a959
Compare
Yes, and don't wipe the lib folder if it's already there and up to date.
Your CI rules seem to protect against running on a branch from a fork... That's a bad setting IMHO. You should set it so workflows coming from a fork don't run automatically but required a maintainer to approve the run. Anyways, I'm cloning this PR onto my forked repo so it runs there: jmarrec#1 (2) If we did, I'd assume we'd see problems with the integration-tests job that uses buildstockbatch (where you haven't yet updated its workflow generator for the BuildExistingModel update). Is there a way to make the new Yes we can. We'll see if it's necessary. |
@jmarrec Should we still consider getting this PR into a state to be merged? |
@joseph-robertson I think the change makes sense, but as a maintainer you're always entitled to reject a pull request as something you don't want to maintain or don't see the point of. Up to you really. (I won't be mad if that's the case, just close it) The conflicts are trivial to fix and I can resolve them if needed. |
Superseded by #1230. |
Pull Request Description
I know the BuildExistingModel is in the OpenStudio-HPXML repo and changes should probably be made there, but I would like to make sure this has a chance of being included in ResStock before I make the changes over there as well.
Having the ability to run ResStock in parallel is something I actively need.
Running 125 "baseline" simulations on my 8 CPU / 16 cores machine:
I ran 3125 simulations, in paralle, in 2h30 minutes. And I plan to run a lot more.
This would have taken somewhere between 17 and 24 hours serially, rendering it completely unpractical
I'd very much like if this could be upstreamed in ResStock, as it may help other people and would reduce the burden of keeping my fork up to date.
Checklist
Not all may apply:
openstudio tasks.rb update_measures
has been run