-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[WIP] First draft of the API of WF generators for atomate v2.0. #1
base: master
Are you sure you want to change the base?
Conversation
Example (incomplete) for ScfWFGenerator. Firework generator "scf_fw" (incomplete and to be discussed...) for SCF firework (to be used in the ScfWFGenerator). Tasks (incomplete) needed for scf_fw : see in atomate/abinit/firetasks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Basic question I guess: what is the logger in atomate ?
CopyAbinitOutputs: it might be related to how fireworks works, but could we not work with links ? Why do we have to copy output files in another directory ? If it is to use the output of a run as the input of a new run, in abinit v9, there is the possiblity to give a path to the file in the input file, could it be used instead ? (for instance, getwfk_filepath)
I'm just thinking about the type of calculations I run: electron-phonon stuff. I often have > 100G WFK files to handle, so linking is much better in this case.
Maybe we have to copy only because one could want to run different jobs on different clusters ?
NB: I saw the comments in fireworks/core.py, I guess this will be subject to discussion.
CreateAbinitFolders: I don't think it's worth to allow changing the folders
How are the pseudos going to be handled ?
I will probably have new comments/questions during the discussion :-)
Hi @gbrunin , Thank you for your comments. Here are already a few answers. The current pull request is more into the overall architecture of the workflows, so some of your comments (which are very good ones ;-)) may be postponed to a later discussion (see below, in that case its good to open an issue).
The get_logger method allows to set a logger (from the logging package, see https://docs.python.org/3/howto/logging.html) with the given name. The name will be the module, e.g. "atomate.abinit.firetasks.write_input_tasks". Then when you use the logging reporting functions, e.g. logging.info('I give some information') or logging.debug('This is a debug message'), the log messages contain the module in which they were "sent". If you look at the get_logger function, it will give you a message with the time at which the log message was sent, the "level" of the message (e.g. INFO, DEBUG, CRITICAL, ...), the name (i.e. the module), and then the message. Probably what is not clear is that there is no such reporting function that is used here. These will come with the implementation. I hope it is more clear.
The name is indeed not chosen appropriately. Linking is what is actually done currently in abiflows. We actually had a discussion about all the file management with @gpetretto yesterday, so you are completely right. Using the getwfk_filepath and the like is indeed one possible solution. I think at this stage it is not supported by abipy though but that's not a big deal. Actually, a similar question arises for pseudopotentials (and the .files file which is meant to disappear).
Sure, as stated above, you are right.
Its also something we were wondering with @gpetretto yesterday. Usually we work on the same cluster, but who knows... There is a similar functionality in atomate/vasp currently. We will ask them if they are using it or if they just implemented the thing and then never used it. Depending on that we may consider supporting copies between different clusters.
Ok
We've also talked about this with @gpetretto, there are probably two things there. The first one is how are they handled at the python level. The other one is how the actual abinit input file is written. For the latter, we have decided with Gian that we won't support the versions of abinit before 9.2 and only support running abinit without the .files file. That means that the pseudos will be handled with the pseudos variable (https://docs.abinit.org/variables/files/#pseudos), possibly together with the pp_dirpath. At the python level, we have different options. For pseudos from the pseudo-dojo, we could use a string identifier corresponding to a specific PD table/xc/version etc ... for a given atom. This will need a few changes in the pseudo_dojo package as well but it is worth it (there are some drawbacks with the current situation but I will not get into details here). We also need to allow for custom pseudos. You can open an issue (edit: I just opened it : #2 ) about this specific topic (e.g. "How to handle pseudopotentials"), in which we discuss options etc. This will be easier to keep track of exchanges about each topic instead of having them possibly split in many pull requests. As a simple rule, if a problem is not solved in a given pull request, an issue should probably be opened with the discussions.
Sure :) |
from atomate.abinit.firetasks.glue_tasks import CreateAbinitFolders | ||
from atomate.abinit.firetasks.glue_tasks import ResolveDeps | ||
from atomate.abinit.firetasks.glue_tasks import CopyAbinitOutputs | ||
from atomate.abinit.firetasks.write_input_tasks import WriteInput | ||
from atomate.abinit.firetasks.run_tasks import RunAbinit | ||
from atomate.abinit.firetasks.control_tasks import CheckTask |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be a little bit nicer if you use relative import path (https://realpython.com/absolute-vs-relative-python-imports/) like:
from ..firetasks.glue_tasks import CreateAbinitFolders
from ..firetasks.glue_tasks import ResolveDeps
from ..firetasks.glue_tasks import CopyAbinitOutputs
from ..firetasks.write_input_tasks import WriteInput
from ..firetasks.run_tasks import RunAbinit
from ..firetasks.control_tasks import CheckTask
sometimes the code might look cleaner when you import the module itself. Of course, you need to use prefixes but that is actually a good thing. You can immediately see from where the eg WriteInput
(write_input_tasks.WriteInput
) coming...
from ..firetasks import glue_tasks
from ..firetasks import write_input_tasks
from ..firetasks import run_tasks
from ..firetasks import control_tasks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In a similar vein, it can be nice get into the practice of defining an __all__
for each submodule so that other imports in that submodule do not get mixed with the public API.
For the case of theatomate.abinit.fireworks.core
submodule this would just be
__all__ = ("scf_fw", )
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm usually not a fan of relative imports (its not always easy from where something is coming from) except when there are a lot of subsubsubpackages. I would say that we are somewhere in between here. I propose we see along the way what to do ? For the others, do you have a strong opinion on that ?
For the __all__
, I fully agree.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be honest I also prefer the absolute path for the imports. It was just an idea ... :)
You can still group them together but if you prefer this way that is completely fine as well...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't have a strong opinion on this. I propose to open an issue just to keep it in mind and in any case if we want to change, that's easy. For the __all__
, I will add it to the different modules.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest we just leave formatting up to black
and isort --profile=black
. This will automatically handle grouping etc.
@@ -0,0 +1 @@ | |||
__author__ = 'David Waroquiers <david.waroquiers@uclouvain.be>' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can use __init__.py
files to the group and simplify the importing of the classes. https://docs.python.org/3/tutorial/modules.html#importing-from-a-package
https://docs.python.org/3/tutorial/modules.html#intra-package-references
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, that's something we can use to simplify the importing of the relevant classes. I see a big advantage for the actual objects that users will use, e.g. in the atomate/abinit/__init__.py
, we could specify the workflow generators so that the user can do:
from atomate.abinit import ScfWFGenerator
instead of
from atomate.abinit.workflows.core import ScfWFGenerator
Would you also do such definitions for "developer"-focused objects (a user of atomate is not likely to import a task) ? This is related to my comment about relative imports somehow. My point is that if we do that, when a developer reads such an import, it is not directly clear from where it is coming from. What do you think ?
class CreateAbinitFolders(FiretaskBase): | ||
""" | ||
Firetask to create in, out and tmp folders for abinit. | ||
TODO: is it worth to allow changing them ?? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a nice way to specify the default values of the optional_params
list? I can only find fireworks where run_task
has to go through and say e.g. self.get("in_dir", INDIR_NAME)
which doesn't seem very scalable/reusable.
If it's clear what the defaults are then I don't see why you wouldn't allow people to change it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a good point. This is currently not implemented in Fireworks as far as I know but it would be very beneficial. Strictly speaking, it's not part of this project but I propose we raise the point in fireworks itself. I'll contact Anubhav about it and maybe open an issue on the fireworks github.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm happy to contribute something to Fireworks for this, would be good to be a bit hands-on so I can be more effective at reviewing here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed this was a long standing point of discussion. Everything originates from the fact that the FireTask is basically a dict and dropping this entirely would be problematic for backward compatibility reasons. There is this PR to make the FireTasks become dataclasses in python which will be way better materialsproject/fireworks#393 . It is currently closed, I suppose since Alex did not have time to finalize it. We can propose to pick it up from where he left and complete it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I need to revisit that PR. I the last thing I need to do is convert some core Fireworks to the dataclass format and try running some atomate workflows. Having clear defaults and type hinting will be really useful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just FYI: I reopened the dataclass support PR in materialsproject/fireworks#418
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One kind of important thing here is that the CI is not running on pull requests to this fork (as we discussed previously). Of course, this isn't so important for this PR, but certainly going forward...
It may just be as easy as someone with admin privileges enabling CircleCI for this repo by logging into CircleCI via GH (https://circleci.com/vcs-authorize/).
Atomate seems to suggest that flake8/black to be run with pre-commit, it would be helpful to do that here so we don't argue about purely style changes, but I can't see pre-commit being run in the CI (we may need to ask them about this). In any case, you can install it locally when working on this repo by running pre-commit install
(assuming you have the dev reqs installed).
Indeed, I will take care of that, as you say, for this PR/discussion it is not really important but it will be afterwards. |
Proposition for the common base workflow generator API.
Example (incomplete) for ScfWFGenerator.
Example Firework generator "scf_fw" (incomplete and to be discussed...) for SCF firework (to be used in the ScfWFGenerator).
Example tasks (incomplete) needed for scf_fw : see in atomate/abinit/firetasks.
Edit: I have added the following image in which I have broken down the current monolithic ScfFWTask that is present in abiflows (could be useful to understand how things are working). Note that there are questions and technical details in the image that are most probably not relevant for this "Architecture/API" pull request/discussion.
![ScfFWTask_BreakDown](https://user-images.githubusercontent.com/3625296/95966046-f783bd80-0e0a-11eb-8122-b27489cd57e5.png)