Allow rinclude in collect_results to only select files which satisfy all conditions #413
Comments
It's a valid use case. I am not sure either. Are you sure that there is no way to achieve this already with the given two keywords, using some advanced regex syntax? Could you create a regex that only matches when both conditions hold?
There definitely is a way to put together a regex that matches both conditions, but it gets more complicated if you have several parameters. Imagine I have 5 parameters put into the filename with I guess I'm envisioning trying to get the same ease I enjoy with functions like
Check this out first: https://github.com/jkrumbiegel/ReadableRegex.jl and see if it already provides the function you need.
Okay, you were right, there is a regex that can match all the conditions (without worrying about order). If I wanted to match all files where
r"(?=.*dt=0\.001.*)(?=.*dx=0\.01.*)(?=.*ic=(flat|round).*)^.*$"
It uses "lookaheads" (which

using DrWatson  # provides @strdict

dt = 0.001
dx = 0.01
ic = ["flat", "round"]
desc = @strdict dt dx ic

# Escape periods so they match literally in the regex
format_v(v) = replace(string(v), "." => "\\.")

# Make "or" statements for vectors in dictionary
function format_v(v::Vector{T}) where T
    formatted_v = [format_v(v_i) for v_i in v]
    return "($(join(formatted_v, "|")))"
end

# Generate the regex string: one lookahead per key, so key order does not matter
function strdict_regex(d)
    query = ""
    for (k, v) in d
        query *= "(?=.*$k=$(format_v(v)))"
    end
    query *= "^.*\$"
    return Regex(query)
end

strdict_regex(desc)

For me, the spirit of DrWatson is that these kinds of convenience functions are built in to make processing easier. So, this regex generator could be a provided utility or a dictionary from
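To make the order-independence claim concrete, here is a small sketch that applies the lookahead regex quoted above to a handful of filenames. The filenames are made up for illustration and are not taken from any real project; only Base Julia is used.

```julia
# Lookahead-based regex quoted in the comment above. Each (?=...) group is
# checked from the start of the string, so all three must succeed for a match.
all_conditions = r"(?=.*dt=0\.001.*)(?=.*dx=0\.01.*)(?=.*ic=(flat|round).*)^.*$"

# Hypothetical filenames, invented for this example only.
filenames = [
    "my_simulation_dt=0.001_dx=0.01_ic=flat.jld2",   # all three conditions hold
    "my_simulation_ic=round_dx=0.01_dt=0.001.jld2",  # same conditions, different order
    "my_simulation_dt=0.001_dx=0.1_ic=flat.jld2",    # dx differs
    "my_simulation_dt=0.01_dx=0.01_ic=square.jld2",  # dt and ic differ
]

# Keep only the names for which every lookahead succeeds.
selected = filter(f -> occursin(all_conditions, f), filenames)

foreach(println, selected)
```

Only the first two names are kept. One caveat worth noting: each condition is a plain substring match, so `dx=0\.01` would also accept a name containing `dx=0.015`.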
This is definitely within scope. However, I believe I do not understand what you mean by "a dictionary from Perhaps the simplest would be to add a new keyword,
Ah yes. Sorry for the confusion. What I meant is that you could have a version of That seems easier to me because I can use the same procedure to save and retrieve the data. For example, to generate data, I have something like the following:
dt = 0.001
dx = 0.01
ic = ["flat", "round"]
params = @strdict dt dx ic
# Run simulation
results = simulation(params)
# Save
sim_name = savename("my_simulation", params)
@tagsave(datadir(sim_name*".jld2"), results)
Then to retrieve it, I could have:
dt = 0.001
dx = 0.01
ic = ["flat", "round"]
params = @strdict dt dx ic
# Collect data
sim_name = savename("my_simulation", params)
results = collect_results(datadir(); rinclude=[r"my_simulation"], params=params)
To me that seems so clean. What I am currently doing to circumvent the issue is:
dt = 0.001
dx = 0.01
ic = ["flat", "round"]
params = @strdict dt dx ic
# Collect data
sim_name = savename("my_simulation", params)
params_regex = Regex("(?=.*my_simulation.*)" * strdict_regex(params).pattern)
results = collect_results(datadir(); rinclude=[params_regex])
Right, but what would happen inside
Yes, I agree that increasing the communication between savename and collect_results would be great. If you can put together a PR that would be nice, because there we can talk about a concrete code implementation, instead of the current situation where only the input is specified.
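As one possible way to picture what such a keyword could do internally, the sketch below filters filenames against a parameter dictionary before anything is loaded. `has_param` and `matches_params` are hypothetical helpers invented here, not DrWatson functions, and the sketch assumes each value appears in the filename exactly as `string(v)` prints it, which may not hold for every number format.

```julia
# Hypothetical helpers, not part of DrWatson.
# A scalar value matches if the literal text "key=value" occurs in the name.
has_param(name, k, v) = occursin("$k=$v", name)
# A vector of values matches if any one of its entries matches.
has_param(name, k, v::AbstractVector) = any(x -> has_param(name, k, x), v)

# A filename is accepted only if every key in `params` matches (AND behavior).
matches_params(name, params) = all(has_param(name, k, v) for (k, v) in params)

params = Dict("dt" => 0.001, "dx" => 0.01, "ic" => ["flat", "round"])

# Hypothetical filenames, invented for this example only.
filenames = [
    "my_simulation_dt=0.001_dx=0.01_ic=flat.jld2",
    "my_simulation_dt=0.001_dx=0.01_ic=square.jld2",
    "my_simulation_dt=0.002_dx=0.01_ic=round.jld2",
]

# Only files that pass the check would go on to be loaded.
to_load = filter(f -> matches_params(f, params), filenames)

foreach(println, to_load)
```

Filtering on names first means files that fail the check never need to be read from disk, which is the cost the original request is trying to avoid.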
Is your feature request related to a problem? Please describe.
Currently, in the collect_results function, the rinclude argument allows for specifying filename conditions (OR behavior) and rexclude allows for ignoring filename conditions (NOT behavior). I'm looking for AND behavior. I want the loaded files to satisfy ALL conditions in rinclude rather than satisfy any of them.
As an example of where this may be useful, I'm running simulations with both different time and spatial steps and I only want to select those that have a specific combination of the two.
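The difference between the OR behavior described here and the requested AND behavior can be sketched with plain Base Julia. The filenames and the two conditions below are invented for illustration, and this is not how collect_results is implemented; it only contrasts `any` with `all` over a list of regexes.

```julia
# Hypothetical filenames, invented for this example only.
filenames = [
    "sim_dt=0.001_dx=0.01.jld2",
    "sim_dt=0.001_dx=0.1.jld2",
    "sim_dt=0.01_dx=0.01.jld2",
]

# Two filename conditions: one on the time step, one on the spatial step.
conditions = [r"dt=0\.001", r"dx=0\.01"]

# OR behavior: a file is kept if at least one condition matches.
matches_any(f) = any(r -> occursin(r, f), conditions)
# AND behavior: a file is kept only if every condition matches.
matches_all(f) = all(r -> occursin(r, f), conditions)

or_selected = filter(matches_any, filenames)
and_selected = filter(matches_all, filenames)

println("OR keeps:  ", or_selected)
println("AND keeps: ", and_selected)
```

With OR, all three files are kept because each one satisfies at least one condition. With AND, only the first file remains.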
Describe the solution you'd like
I'm not sure exactly the best way to implement this, but it would probably look like adding a kwarg to collect_results either to modify how it handles the rinclude list or to specify an alternative list to rinclude and rexclude.
Describe alternatives you've considered
Currently, the undesirable results can be removed in a postprocessing step using tools from DataFrames.jl, but that becomes prohibitive if you have large data because you need to load all of the results and then remove the unnecessary ones.