[WIP] First draft of the API of WF generators for atomate v2.0. #1

davidwaroquiers · 2020-10-12T13:10:47Z

Proposition for the common base workflow generator API.
Example (incomplete) for ScfWFGenerator.
Example Firework generator "scf_fw" (incomplete and to be discussed...) for SCF firework (to be used in the ScfWFGenerator).
Example tasks (incomplete) needed for scf_fw : see in atomate/abinit/firetasks.

Edit: I have added the following image in which I have broken down the current monolithic ScfFWTask that is present in abiflows (could be useful to understand how things are working). Note that there are questions and technical details in the image that are most probably not relevant for this "Architecture/API" pull request/discussion.

Example (incomplete) for ScfWFGenerator. Firework generator "scf_fw" (incomplete and to be discussed...) for SCF firework (to be used in the ScfWFGenerator). Tasks (incomplete) needed for scf_fw : see in atomate/abinit/firetasks

gbrunin

Basic question I guess: what is the logger in atomate ?

CopyAbinitOutputs: it might be related to how fireworks works, but could we not work with links ? Why do we have to copy output files in another directory ? If it is to use the output of a run as the input of a new run, in abinit v9, there is the possiblity to give a path to the file in the input file, could it be used instead ? (for instance, getwfk_filepath)
I'm just thinking about the type of calculations I run: electron-phonon stuff. I often have > 100G WFK files to handle, so linking is much better in this case.
Maybe we have to copy only because one could want to run different jobs on different clusters ?
NB: I saw the comments in fireworks/core.py, I guess this will be subject to discussion.

CreateAbinitFolders: I don't think it's worth to allow changing the folders

How are the pseudos going to be handled ?

I will probably have new comments/questions during the discussion :-)

davidwaroquiers · 2020-10-14T08:25:23Z

Hi @gbrunin ,

Thank you for your comments. Here are already a few answers. The current pull request is more into the overall architecture of the workflows, so some of your comments (which are very good ones ;-)) may be postponed to a later discussion (see below, in that case its good to open an issue).

Basic question I guess: what is the logger in atomate ?

The get_logger method allows to set a logger (from the logging package, see https://docs.python.org/3/howto/logging.html) with the given name. The name will be the module, e.g. "atomate.abinit.firetasks.write_input_tasks". Then when you use the logging reporting functions, e.g. logging.info('I give some information') or logging.debug('This is a debug message'), the log messages contain the module in which they were "sent". If you look at the get_logger function, it will give you a message with the time at which the log message was sent, the "level" of the message (e.g. INFO, DEBUG, CRITICAL, ...), the name (i.e. the module), and then the message. Probably what is not clear is that there is no such reporting function that is used here. These will come with the implementation. I hope it is more clear.

CopyAbinitOutputs: it might be related to how fireworks works, but could we not work with links ? Why do we have to copy output files in another directory ? If it is to use the output of a run as the input of a new run, in abinit v9, there is the possiblity to give a path to the file in the input file, could it be used instead ? (for instance, getwfk_filepath)

The name is indeed not chosen appropriately. Linking is what is actually done currently in abiflows. We actually had a discussion about all the file management with @gpetretto yesterday, so you are completely right. Using the getwfk_filepath and the like is indeed one possible solution. I think at this stage it is not supported by abipy though but that's not a big deal. Actually, a similar question arises for pseudopotentials (and the .files file which is meant to disappear).

I'm just thinking about the type of calculations I run: electron-phonon stuff. I often have > 100G WFK files to handle, so linking is much better in this case.

Sure, as stated above, you are right.

Maybe we have to copy only because one could want to run different jobs on different clusters ?

Its also something we were wondering with @gpetretto yesterday. Usually we work on the same cluster, but who knows... There is a similar functionality in atomate/vasp currently. We will ask them if they are using it or if they just implemented the thing and then never used it. Depending on that we may consider supporting copies between different clusters.

NB: I saw the comments in fireworks/core.py, I guess this will be subject to discussion.

CreateAbinitFolders: I don't think it's worth to allow changing the folders

Ok

How are the pseudos going to be handled ?

We've also talked about this with @gpetretto, there are probably two things there. The first one is how are they handled at the python level. The other one is how the actual abinit input file is written. For the latter, we have decided with Gian that we won't support the versions of abinit before 9.2 and only support running abinit without the .files file. That means that the pseudos will be handled with the pseudos variable (https://docs.abinit.org/variables/files/#pseudos), possibly together with the pp_dirpath. At the python level, we have different options. For pseudos from the pseudo-dojo, we could use a string identifier corresponding to a specific PD table/xc/version etc ... for a given atom. This will need a few changes in the pseudo_dojo package as well but it is worth it (there are some drawbacks with the current situation but I will not get into details here). We also need to allow for custom pseudos. You can open an issue (edit: I just opened it : #2 ) about this specific topic (e.g. "How to handle pseudopotentials"), in which we discuss options etc. This will be easier to keep track of exchanges about each topic instead of having them possibly split in many pull requests. As a simple rule, if a problem is not solved in a given pull request, an issue should probably be opened with the discussions.

I will probably have new comments/questions during the discussion :-)

Sure :)

fekad · 2020-10-14T11:41:37Z

atomate/abinit/fireworks/core.py

+from atomate.abinit.firetasks.glue_tasks import CreateAbinitFolders
+from atomate.abinit.firetasks.glue_tasks import ResolveDeps
+from atomate.abinit.firetasks.glue_tasks import CopyAbinitOutputs
+from atomate.abinit.firetasks.write_input_tasks import WriteInput
+from atomate.abinit.firetasks.run_tasks import RunAbinit
+from atomate.abinit.firetasks.control_tasks import CheckTask


It might be a little bit nicer if you use relative import path (https://realpython.com/absolute-vs-relative-python-imports/) like:

from ..firetasks.glue_tasks import CreateAbinitFolders from ..firetasks.glue_tasks import ResolveDeps from ..firetasks.glue_tasks import CopyAbinitOutputs from ..firetasks.write_input_tasks import WriteInput from ..firetasks.run_tasks import RunAbinit from ..firetasks.control_tasks import CheckTask

sometimes the code might look cleaner when you import the module itself. Of course, you need to use prefixes but that is actually a good thing. You can immediately see from where the eg WriteInput (write_input_tasks.WriteInput) coming...

from ..firetasks import glue_tasks from ..firetasks import write_input_tasks from ..firetasks import run_tasks from ..firetasks import control_tasks

In a similar vein, it can be nice get into the practice of defining an __all__ for each submodule so that other imports in that submodule do not get mixed with the public API.

For the case of theatomate.abinit.fireworks.core submodule this would just be

__all__ = ("scf_fw", )

I'm usually not a fan of relative imports (its not always easy from where something is coming from) except when there are a lot of subsubsubpackages. I would say that we are somewhere in between here. I propose we see along the way what to do ? For the others, do you have a strong opinion on that ?

For the __all__, I fully agree.

To be honest I also prefer the absolute path for the imports. It was just an idea ... :)
You can still group them together but if you prefer this way that is completely fine as well...

I don't have a strong opinion on this. I propose to open an issue just to keep it in mind and in any case if we want to change, that's easy. For the __all__, I will add it to the different modules.

I suggest we just leave formatting up to black and isort --profile=black. This will automatically handle grouping etc.

atomate/abinit/fireworks/core.py

atomate/common/workflows/base.py

fekad · 2020-10-14T12:25:26Z

atomate/abinit/firetasks/__init__.py

@@ -0,0 +1 @@
+__author__ = 'David Waroquiers <david.waroquiers@uclouvain.be>'


you can use __init__.py files to the group and simplify the importing of the classes. https://docs.python.org/3/tutorial/modules.html#importing-from-a-package
https://docs.python.org/3/tutorial/modules.html#intra-package-references

Sure, that's something we can use to simplify the importing of the relevant classes. I see a big advantage for the actual objects that users will use, e.g. in the atomate/abinit/__init__.py, we could specify the workflow generators so that the user can do:

from atomate.abinit import ScfWFGenerator

instead of

from atomate.abinit.workflows.core import ScfWFGenerator

Would you also do such definitions for "developer"-focused objects (a user of atomate is not likely to import a task) ? This is related to my comment about relative imports somehow. My point is that if we do that, when a developer reads such an import, it is not directly clear from where it is coming from. What do you think ?

ml-evs · 2020-10-14T12:34:20Z

atomate/abinit/firetasks/glue_tasks.py

+class CreateAbinitFolders(FiretaskBase):
+    """
+    Firetask to create in, out and tmp folders for abinit.
+    TODO: is it worth to allow changing them ??


Is there a nice way to specify the default values of the optional_params list? I can only find fireworks where run_task has to go through and say e.g. self.get("in_dir", INDIR_NAME) which doesn't seem very scalable/reusable.

If it's clear what the defaults are then I don't see why you wouldn't allow people to change it.

That's a good point. This is currently not implemented in Fireworks as far as I know but it would be very beneficial. Strictly speaking, it's not part of this project but I propose we raise the point in fireworks itself. I'll contact Anubhav about it and maybe open an issue on the fireworks github.

I'm happy to contribute something to Fireworks for this, would be good to be a bit hands-on so I can be more effective at reviewing here

Indeed this was a long standing point of discussion. Everything originates from the fact that the FireTask is basically a dict and dropping this entirely would be problematic for backward compatibility reasons. There is this PR to make the FireTasks become dataclasses in python which will be way better materialsproject/fireworks#393 . It is currently closed, I suppose since Alex did not have time to finalize it. We can propose to pick it up from where he left and complete it.

I need to revisit that PR. I the last thing I need to do is convert some core Fireworks to the dataclass format and try running some atomate workflows. Having clear defaults and type hinting will be really useful.

Just FYI: I reopened the dataclass support PR in materialsproject/fireworks#418

ml-evs

One kind of important thing here is that the CI is not running on pull requests to this fork (as we discussed previously). Of course, this isn't so important for this PR, but certainly going forward...

It may just be as easy as someone with admin privileges enabling CircleCI for this repo by logging into CircleCI via GH (https://circleci.com/vcs-authorize/).

Atomate seems to suggest that flake8/black to be run with pre-commit, it would be helpful to do that here so we don't argue about purely style changes, but I can't see pre-commit being run in the CI (we may need to ask them about this). In any case, you can install it locally when working on this repo by running pre-commit install (assuming you have the dev reqs installed).

davidwaroquiers · 2020-10-15T06:52:55Z

One kind of important thing here is that the CI is not running on pull requests to this fork (as we discussed previously). Of course, this isn't so important for this PR, but certainly going forward...

It may just be as easy as someone with admin privileges enabling CircleCI for this repo by logging into CircleCI via GH (https://circleci.com/vcs-authorize/).

Atomate seems to suggest that flake8/black to be run with pre-commit, it would be helpful to do that here so we don't argue about purely style changes, but I can't see pre-commit being run in the CI (we may need to ask them about this). In any case, you can install it locally when working on this repo by running pre-commit install (assuming you have the dev reqs installed).

Indeed, I will take care of that, as you say, for this PR/discussion it is not really important but it will be afterwards.

First draft of the API of WF generators for atomate v2.0.

575ce48

Example (incomplete) for ScfWFGenerator. Firework generator "scf_fw" (incomplete and to be discussed...) for SCF firework (to be used in the ScfWFGenerator). Tasks (incomplete) needed for scf_fw : see in atomate/abinit/firetasks

davidwaroquiers assigned gpetretto Oct 12, 2020

davidwaroquiers requested review from gbrunin and ml-evs October 13, 2020 13:54

gbrunin reviewed Oct 14, 2020

View reviewed changes

davidwaroquiers requested review from fekad, gpetretto and ghoshtk October 14, 2020 09:16

fekad reviewed Oct 14, 2020

View reviewed changes

atomate/abinit/fireworks/core.py Show resolved Hide resolved

fekad reviewed Oct 14, 2020

View reviewed changes

atomate/common/workflows/base.py Show resolved Hide resolved

fekad reviewed Oct 14, 2020

View reviewed changes

ml-evs reviewed Oct 14, 2020

View reviewed changes

Added __all__ to the different packages.

4851f4f

ml-evs requested changes Oct 14, 2020

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[WIP] First draft of the API of WF generators for atomate v2.0. #1

[WIP] First draft of the API of WF generators for atomate v2.0. #1

davidwaroquiers commented Oct 12, 2020 •

edited

Loading

gbrunin left a comment

davidwaroquiers commented Oct 14, 2020 •

edited

Loading

fekad Oct 14, 2020

ml-evs Oct 14, 2020

davidwaroquiers Oct 14, 2020

fekad Oct 14, 2020

davidwaroquiers Oct 14, 2020 •

edited

Loading

utf Oct 16, 2020

fekad Oct 14, 2020 •

edited

Loading

davidwaroquiers Oct 14, 2020

ml-evs Oct 14, 2020

davidwaroquiers Oct 14, 2020

ml-evs Oct 14, 2020

gpetretto Oct 14, 2020 •

edited

Loading

utf Oct 15, 2020

utf Oct 16, 2020

ml-evs left a comment

davidwaroquiers commented Oct 15, 2020

		@@ -0,0 +1 @@
		__author__ = 'David Waroquiers <david.waroquiers@uclouvain.be>'

[WIP] First draft of the API of WF generators for atomate v2.0. #1

Are you sure you want to change the base?

[WIP] First draft of the API of WF generators for atomate v2.0. #1

Conversation

davidwaroquiers commented Oct 12, 2020 • edited Loading

gbrunin left a comment

Choose a reason for hiding this comment

davidwaroquiers commented Oct 14, 2020 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

davidwaroquiers Oct 14, 2020 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

fekad Oct 14, 2020 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

gpetretto Oct 14, 2020 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ml-evs left a comment

Choose a reason for hiding this comment

davidwaroquiers commented Oct 15, 2020

davidwaroquiers commented Oct 12, 2020 •

edited

Loading

davidwaroquiers commented Oct 14, 2020 •

edited

Loading

davidwaroquiers Oct 14, 2020 •

edited

Loading

fekad Oct 14, 2020 •

edited

Loading

gpetretto Oct 14, 2020 •

edited

Loading