bug with statement functions again #459

Closed
ecossevin opened this issue Dec 12, 2024 · 14 comments · Fixed by #470
Labels
bug Something isn't working

Comments

@ecossevin

ecossevin commented Dec 12, 2024

What happened?

I have a bug in routines with statement functions: FMINJ in this case.

The code behaves strangely and the bug is difficult to reproduce, which is why my reproducer is so big; sorry for that. Let me know if you can't reproduce the bug.

Using the version from #439 I have no bug. Using the latest version of loki from main, the bug is back, and the commit fixing the bug is in loki latest main (git log | grep 97d0943).

What are the steps to reproduce the bug?

Download loki: pip install "loki @ git+https://github.com/ecmwf-ifs/loki.git"
Run the test with python3 -m pdb test.py and then continue through the breakpoint.

bug_statement.tar.gz

Version

latest, from today

Platform (OS and architecture)

Linux taranislogin1.taranishpc.meteo.fr 3.10.0-1160.102.1.el7.x86_64 #1 SMP Mon Sep 25 05:00:52 EDT 2023 x86_64 x86_64 x86_64 GNU/Linux

Relevant log output

(Pdb) c
Traceback (most recent call last):
  File "/opt/softs/anaconda3/envs/Python310/lib/python3.10/pdb.py", line 1723, in main
    pdb._runscript(mainpyfile)
  File "/opt/softs/anaconda3/envs/Python310/lib/python3.10/pdb.py", line 1583, in _runscript
    self.run(statement)
  File "/opt/softs/anaconda3/envs/Python310/lib/python3.10/bdb.py", line 598, in run
    exec(cmd, globals, locals)
  File "<string>", line 1, in <module>
  File "/home/gmap/mrpm/cossevine/tmp/dbg/test.py", line 92, in <module>
    horizontal_idx=get_horizontal_idx(routine, lst_horizontal_idx)
  File "/home/gmap/mrpm/cossevine/tmp/dbg/test.py", line 62, in get_horizontal_idx
    routine.body=SubstituteExpressions(rename_map).visit(routine.body)
  File "/home/gmap/mrpm/cossevine/tmp/test/venv3/lib/python3.10/site-packages/loki/ir/transformer.py", line 248, in visit
    obj = super().visit(o, *args, **kwargs)
  File "/home/gmap/mrpm/cossevine/tmp/test/venv3/lib/python3.10/site-packages/loki/ir/visitor.py", line 124, in visit
    return meth(o, *args, **kwargs)
  File "/home/gmap/mrpm/cossevine/tmp/test/venv3/lib/python3.10/site-packages/loki/ir/transformer.py", line 185, in visit_Node
    rebuilt = tuple(self.visit(i, **kwargs) for i in o.children)
  File "/home/gmap/mrpm/cossevine/tmp/test/venv3/lib/python3.10/site-packages/loki/ir/transformer.py", line 185, in <genexpr>
    rebuilt = tuple(self.visit(i, **kwargs) for i in o.children)
  File "/home/gmap/mrpm/cossevine/tmp/test/venv3/lib/python3.10/site-packages/loki/ir/transformer.py", line 248, in visit
    obj = super().visit(o, *args, **kwargs)
  File "/home/gmap/mrpm/cossevine/tmp/test/venv3/lib/python3.10/site-packages/loki/ir/visitor.py", line 124, in visit
    return meth(o, *args, **kwargs)
  File "/home/gmap/mrpm/cossevine/tmp/test/venv3/lib/python3.10/site-packages/loki/ir/transformer.py", line 160, in visit_tuple
    visited = tuple(self.visit(i, **kwargs) for i in o)
  File "/home/gmap/mrpm/cossevine/tmp/test/venv3/lib/python3.10/site-packages/loki/ir/transformer.py", line 160, in <genexpr>
    visited = tuple(self.visit(i, **kwargs) for i in o)
  File "/home/gmap/mrpm/cossevine/tmp/test/venv3/lib/python3.10/site-packages/loki/ir/transformer.py", line 248, in visit
    obj = super().visit(o, *args, **kwargs)
  File "/home/gmap/mrpm/cossevine/tmp/test/venv3/lib/python3.10/site-packages/loki/ir/visitor.py", line 124, in visit
    return meth(o, *args, **kwargs)
  File "/home/gmap/mrpm/cossevine/tmp/test/venv3/lib/python3.10/site-packages/loki/ir/transformer.py", line 185, in visit_Node
    rebuilt = tuple(self.visit(i, **kwargs) for i in o.children)
  File "/home/gmap/mrpm/cossevine/tmp/test/venv3/lib/python3.10/site-packages/loki/ir/transformer.py", line 185, in <genexpr>
    rebuilt = tuple(self.visit(i, **kwargs) for i in o.children)
  File "/home/gmap/mrpm/cossevine/tmp/test/venv3/lib/python3.10/site-packages/loki/ir/transformer.py", line 248, in visit
    obj = super().visit(o, *args, **kwargs)
  File "/home/gmap/mrpm/cossevine/tmp/test/venv3/lib/python3.10/site-packages/loki/ir/visitor.py", line 124, in visit
    return meth(o, *args, **kwargs)
  File "/home/gmap/mrpm/cossevine/tmp/test/venv3/lib/python3.10/site-packages/loki/ir/transformer.py", line 160, in visit_tuple
    visited = tuple(self.visit(i, **kwargs) for i in o)
  File "/home/gmap/mrpm/cossevine/tmp/test/venv3/lib/python3.10/site-packages/loki/ir/transformer.py", line 160, in <genexpr>
    visited = tuple(self.visit(i, **kwargs) for i in o)
  File "/home/gmap/mrpm/cossevine/tmp/test/venv3/lib/python3.10/site-packages/loki/ir/transformer.py", line 248, in visit
    obj = super().visit(o, *args, **kwargs)
  File "/home/gmap/mrpm/cossevine/tmp/test/venv3/lib/python3.10/site-packages/loki/ir/visitor.py", line 124, in visit
    return meth(o, *args, **kwargs)
  File "/home/gmap/mrpm/cossevine/tmp/test/venv3/lib/python3.10/site-packages/loki/ir/transformer.py", line 185, in visit_Node
    rebuilt = tuple(self.visit(i, **kwargs) for i in o.children)
  File "/home/gmap/mrpm/cossevine/tmp/test/venv3/lib/python3.10/site-packages/loki/ir/transformer.py", line 185, in <genexpr>
    rebuilt = tuple(self.visit(i, **kwargs) for i in o.children)
  File "/home/gmap/mrpm/cossevine/tmp/test/venv3/lib/python3.10/site-packages/loki/ir/transformer.py", line 248, in visit
    obj = super().visit(o, *args, **kwargs)
  File "/home/gmap/mrpm/cossevine/tmp/test/venv3/lib/python3.10/site-packages/loki/ir/visitor.py", line 124, in visit
    return meth(o, *args, **kwargs)
  File "/home/gmap/mrpm/cossevine/tmp/test/venv3/lib/python3.10/site-packages/loki/ir/transformer.py", line 160, in visit_tuple
    visited = tuple(self.visit(i, **kwargs) for i in o)
  File "/home/gmap/mrpm/cossevine/tmp/test/venv3/lib/python3.10/site-packages/loki/ir/transformer.py", line 160, in <genexpr>
    visited = tuple(self.visit(i, **kwargs) for i in o)
  File "/home/gmap/mrpm/cossevine/tmp/test/venv3/lib/python3.10/site-packages/loki/ir/transformer.py", line 248, in visit
    obj = super().visit(o, *args, **kwargs)
  File "/home/gmap/mrpm/cossevine/tmp/test/venv3/lib/python3.10/site-packages/loki/ir/visitor.py", line 124, in visit
    return meth(o, *args, **kwargs)
  File "/home/gmap/mrpm/cossevine/tmp/test/venv3/lib/python3.10/site-packages/loki/ir/transformer.py", line 185, in visit_Node
    rebuilt = tuple(self.visit(i, **kwargs) for i in o.children)
  File "/home/gmap/mrpm/cossevine/tmp/test/venv3/lib/python3.10/site-packages/loki/ir/transformer.py", line 185, in <genexpr>
    rebuilt = tuple(self.visit(i, **kwargs) for i in o.children)
  File "/home/gmap/mrpm/cossevine/tmp/test/venv3/lib/python3.10/site-packages/loki/ir/transformer.py", line 248, in visit
    obj = super().visit(o, *args, **kwargs)
  File "/home/gmap/mrpm/cossevine/tmp/test/venv3/lib/python3.10/site-packages/loki/ir/visitor.py", line 124, in visit
    return meth(o, *args, **kwargs)
  File "/home/gmap/mrpm/cossevine/tmp/test/venv3/lib/python3.10/site-packages/loki/ir/transformer.py", line 160, in visit_tuple
    visited = tuple(self.visit(i, **kwargs) for i in o)
  File "/home/gmap/mrpm/cossevine/tmp/test/venv3/lib/python3.10/site-packages/loki/ir/transformer.py", line 160, in <genexpr>
    visited = tuple(self.visit(i, **kwargs) for i in o)
  File "/home/gmap/mrpm/cossevine/tmp/test/venv3/lib/python3.10/site-packages/loki/ir/transformer.py", line 248, in visit
    obj = super().visit(o, *args, **kwargs)
  File "/home/gmap/mrpm/cossevine/tmp/test/venv3/lib/python3.10/site-packages/loki/ir/visitor.py", line 124, in visit
    return meth(o, *args, **kwargs)
  File "/home/gmap/mrpm/cossevine/tmp/test/venv3/lib/python3.10/site-packages/loki/ir/transformer.py", line 185, in visit_Node
    rebuilt = tuple(self.visit(i, **kwargs) for i in o.children)
  File "/home/gmap/mrpm/cossevine/tmp/test/venv3/lib/python3.10/site-packages/loki/ir/transformer.py", line 185, in <genexpr>
    rebuilt = tuple(self.visit(i, **kwargs) for i in o.children)
  File "/home/gmap/mrpm/cossevine/tmp/test/venv3/lib/python3.10/site-packages/loki/ir/transformer.py", line 248, in visit
    obj = super().visit(o, *args, **kwargs)
  File "/home/gmap/mrpm/cossevine/tmp/test/venv3/lib/python3.10/site-packages/loki/ir/visitor.py", line 124, in visit
    return meth(o, *args, **kwargs)
  File "/home/gmap/mrpm/cossevine/tmp/test/venv3/lib/python3.10/site-packages/loki/ir/transformer.py", line 160, in visit_tuple
    visited = tuple(self.visit(i, **kwargs) for i in o)
  File "/home/gmap/mrpm/cossevine/tmp/test/venv3/lib/python3.10/site-packages/loki/ir/transformer.py", line 160, in <genexpr>
    visited = tuple(self.visit(i, **kwargs) for i in o)
  File "/home/gmap/mrpm/cossevine/tmp/test/venv3/lib/python3.10/site-packages/loki/ir/transformer.py", line 248, in visit
    obj = super().visit(o, *args, **kwargs)
  File "/home/gmap/mrpm/cossevine/tmp/test/venv3/lib/python3.10/site-packages/loki/ir/visitor.py", line 124, in visit
    return meth(o, *args, **kwargs)
  File "/home/gmap/mrpm/cossevine/tmp/test/venv3/lib/python3.10/site-packages/loki/ir/transformer.py", line 185, in visit_Node
    rebuilt = tuple(self.visit(i, **kwargs) for i in o.children)
  File "/home/gmap/mrpm/cossevine/tmp/test/venv3/lib/python3.10/site-packages/loki/ir/transformer.py", line 185, in <genexpr>
    rebuilt = tuple(self.visit(i, **kwargs) for i in o.children)
  File "/home/gmap/mrpm/cossevine/tmp/test/venv3/lib/python3.10/site-packages/loki/ir/transformer.py", line 248, in visit
    obj = super().visit(o, *args, **kwargs)
  File "/home/gmap/mrpm/cossevine/tmp/test/venv3/lib/python3.10/site-packages/loki/ir/visitor.py", line 124, in visit
    return meth(o, *args, **kwargs)
  File "/home/gmap/mrpm/cossevine/tmp/test/venv3/lib/python3.10/site-packages/loki/ir/transformer.py", line 160, in visit_tuple
    visited = tuple(self.visit(i, **kwargs) for i in o)
  File "/home/gmap/mrpm/cossevine/tmp/test/venv3/lib/python3.10/site-packages/loki/ir/transformer.py", line 160, in <genexpr>
    visited = tuple(self.visit(i, **kwargs) for i in o)
  File "/home/gmap/mrpm/cossevine/tmp/test/venv3/lib/python3.10/site-packages/loki/ir/transformer.py", line 248, in visit
    obj = super().visit(o, *args, **kwargs)
  File "/home/gmap/mrpm/cossevine/tmp/test/venv3/lib/python3.10/site-packages/loki/ir/visitor.py", line 124, in visit
    return meth(o, *args, **kwargs)
  File "/home/gmap/mrpm/cossevine/tmp/test/venv3/lib/python3.10/site-packages/loki/ir/transformer.py", line 185, in visit_Node
    rebuilt = tuple(self.visit(i, **kwargs) for i in o.children)
  File "/home/gmap/mrpm/cossevine/tmp/test/venv3/lib/python3.10/site-packages/loki/ir/transformer.py", line 185, in <genexpr>
    rebuilt = tuple(self.visit(i, **kwargs) for i in o.children)
  File "/home/gmap/mrpm/cossevine/tmp/test/venv3/lib/python3.10/site-packages/loki/ir/transformer.py", line 248, in visit
    obj = super().visit(o, *args, **kwargs)
  File "/home/gmap/mrpm/cossevine/tmp/test/venv3/lib/python3.10/site-packages/loki/ir/visitor.py", line 124, in visit
    return meth(o, *args, **kwargs)
  File "/home/gmap/mrpm/cossevine/tmp/test/venv3/lib/python3.10/site-packages/loki/ir/transformer.py", line 160, in visit_tuple
    visited = tuple(self.visit(i, **kwargs) for i in o)
  File "/home/gmap/mrpm/cossevine/tmp/test/venv3/lib/python3.10/site-packages/loki/ir/transformer.py", line 160, in <genexpr>
    visited = tuple(self.visit(i, **kwargs) for i in o)
  File "/home/gmap/mrpm/cossevine/tmp/test/venv3/lib/python3.10/site-packages/loki/ir/transformer.py", line 248, in visit
    obj = super().visit(o, *args, **kwargs)
  File "/home/gmap/mrpm/cossevine/tmp/test/venv3/lib/python3.10/site-packages/loki/ir/visitor.py", line 124, in visit
    return meth(o, *args, **kwargs)
  File "/home/gmap/mrpm/cossevine/tmp/test/venv3/lib/python3.10/site-packages/loki/ir/transformer.py", line 185, in visit_Node
    rebuilt = tuple(self.visit(i, **kwargs) for i in o.children)
  File "/home/gmap/mrpm/cossevine/tmp/test/venv3/lib/python3.10/site-packages/loki/ir/transformer.py", line 185, in <genexpr>
    rebuilt = tuple(self.visit(i, **kwargs) for i in o.children)
  File "/home/gmap/mrpm/cossevine/tmp/test/venv3/lib/python3.10/site-packages/loki/ir/transformer.py", line 248, in visit
    obj = super().visit(o, *args, **kwargs)
  File "/home/gmap/mrpm/cossevine/tmp/test/venv3/lib/python3.10/site-packages/loki/ir/visitor.py", line 124, in visit
    return meth(o, *args, **kwargs)
  File "/home/gmap/mrpm/cossevine/tmp/test/venv3/lib/python3.10/site-packages/loki/ir/transformer.py", line 160, in visit_tuple
    visited = tuple(self.visit(i, **kwargs) for i in o)
  File "/home/gmap/mrpm/cossevine/tmp/test/venv3/lib/python3.10/site-packages/loki/ir/transformer.py", line 160, in <genexpr>
    visited = tuple(self.visit(i, **kwargs) for i in o)
  File "/home/gmap/mrpm/cossevine/tmp/test/venv3/lib/python3.10/site-packages/loki/ir/transformer.py", line 248, in visit
    obj = super().visit(o, *args, **kwargs)
  File "/home/gmap/mrpm/cossevine/tmp/test/venv3/lib/python3.10/site-packages/loki/ir/visitor.py", line 124, in visit
    return meth(o, *args, **kwargs)
  File "/home/gmap/mrpm/cossevine/tmp/test/venv3/lib/python3.10/site-packages/loki/ir/transformer.py", line 185, in visit_Node
    rebuilt = tuple(self.visit(i, **kwargs) for i in o.children)
  File "/home/gmap/mrpm/cossevine/tmp/test/venv3/lib/python3.10/site-packages/loki/ir/transformer.py", line 185, in <genexpr>
    rebuilt = tuple(self.visit(i, **kwargs) for i in o.children)
  File "/home/gmap/mrpm/cossevine/tmp/test/venv3/lib/python3.10/site-packages/loki/ir/transformer.py", line 248, in visit
    obj = super().visit(o, *args, **kwargs)
  File "/home/gmap/mrpm/cossevine/tmp/test/venv3/lib/python3.10/site-packages/loki/ir/visitor.py", line 124, in visit
    return meth(o, *args, **kwargs)
  File "/home/gmap/mrpm/cossevine/tmp/test/venv3/lib/python3.10/site-packages/loki/ir/transformer.py", line 160, in visit_tuple
    visited = tuple(self.visit(i, **kwargs) for i in o)
  File "/home/gmap/mrpm/cossevine/tmp/test/venv3/lib/python3.10/site-packages/loki/ir/transformer.py", line 160, in <genexpr>
    visited = tuple(self.visit(i, **kwargs) for i in o)
  File "/home/gmap/mrpm/cossevine/tmp/test/venv3/lib/python3.10/site-packages/loki/ir/transformer.py", line 248, in visit
    obj = super().visit(o, *args, **kwargs)
  File "/home/gmap/mrpm/cossevine/tmp/test/venv3/lib/python3.10/site-packages/loki/ir/visitor.py", line 124, in visit
    return meth(o, *args, **kwargs)
  File "/home/gmap/mrpm/cossevine/tmp/test/venv3/lib/python3.10/site-packages/loki/ir/transformer.py", line 185, in visit_Node
    rebuilt = tuple(self.visit(i, **kwargs) for i in o.children)
  File "/home/gmap/mrpm/cossevine/tmp/test/venv3/lib/python3.10/site-packages/loki/ir/transformer.py", line 185, in <genexpr>
    rebuilt = tuple(self.visit(i, **kwargs) for i in o.children)
  File "/home/gmap/mrpm/cossevine/tmp/test/venv3/lib/python3.10/site-packages/loki/ir/transformer.py", line 248, in visit
    obj = super().visit(o, *args, **kwargs)
  File "/home/gmap/mrpm/cossevine/tmp/test/venv3/lib/python3.10/site-packages/loki/ir/visitor.py", line 124, in visit
    return meth(o, *args, **kwargs)
  File "/home/gmap/mrpm/cossevine/tmp/test/venv3/lib/python3.10/site-packages/loki/ir/expr_visitors.py", line 246, in visit_Expression
    return self.expr_mapper(o)
  File "/home/gmap/mrpm/cossevine/tmp/test/venv3/lib/python3.10/site-packages/loki/expression/mappers.py", line 536, in __call__
    return super().__call__(expr, *args, **kwargs)
  File "/home/gmap/mrpm/cossevine/tmp/test/venv3/lib/python3.10/site-packages/pymbolic/mapper/__init__.py", line 141, in __call__
    result = method(expr, *args, **kwargs)
  File "/home/gmap/mrpm/cossevine/tmp/test/venv3/lib/python3.10/site-packages/pymbolic/mapper/__init__.py", line 495, in map_call_with_kwargs
    function = self.rec(expr.function, *args, **kwargs)
  File "/home/gmap/mrpm/cossevine/tmp/test/venv3/lib/python3.10/site-packages/loki/expression/mappers.py", line 536, in __call__
    return super().__call__(expr, *args, **kwargs)
  File "/home/gmap/mrpm/cossevine/tmp/test/venv3/lib/python3.10/site-packages/pymbolic/mapper/__init__.py", line 141, in __call__
    result = method(expr, *args, **kwargs)
  File "/home/gmap/mrpm/cossevine/tmp/test/venv3/lib/python3.10/site-packages/loki/expression/mappers.py", line 564, in map_variable_symbol
    assert expr.type is not None
AssertionError
Uncaught exception. Entering post mortem debugging
Running 'cont' or 'step' will restart the program
> /home/gmap/mrpm/cossevine/tmp/test/venv3/lib/python3.10/site-packages/loki/expression/mappers.py(564)map_variable_symbol()
-> assert expr.type is not None

Accompanying data

No response

Organisation

météo france

@ecossevin ecossevin added the bug Something isn't working label Dec 12, 2024
@mlange05
Collaborator

Hi @ecossevin ,

Unfortunately, I am still not able to reproduce the backtrace you showed above. However, I think you are using a different test.py to generate the backtrace. The one in the tarball you attached has effectively 19 lines (pasting content below), but your backtrace indicates problems at lines 62 and 92, which I don't have:

  File "/home/gmap/mrpm/cossevine/tmp/dbg/test.py", line 92, in <module>
    horizontal_idx=get_horizontal_idx(routine, lst_horizontal_idx)
  File "/home/gmap/mrpm/cossevine/tmp/dbg/test.py", line 62, in get_horizontal_idx
    routine.body=SubstituteExpressions(rename_map).visit(routine.body)

which indicates to me that you are running something other than the reproducer posted - unless I'm missing something?

The test that you posted runs fine for me:

(loki_env) $ python test.py
1
1

(loki_env) $ cat test.py
from loki import Sourcefile
from loki import fgen
from loki import do_resolve_associates
import logical
import sys
print(len(sys.argv))
print(len(sys.argv))

true_symbols = []
false_symbols = ['LHOOK']

#file_name = "ppwetpoint_bug_test.F90"
file_name = "ppwetpoint_no_bug.F90"
#file_name = "ppwetpoint_bug.F90"
source=Sourcefile.from_file(file_name)
src_name = "PPWETPOINT"
routine=source[src_name]
do_resolve_associates(routine)
routine_t1 = routine.clone()

@reuterbal
Collaborator

Hi @ecossevin,

fwiw, I also can't reproduce the issue. Changing the file_name in test.py to the ppwetpoint_bug.F90 also doesn't seem to make a difference for me.

While looking at this, I've noticed the logical.py module that you packaged. It doesn't seem to be used in your reproducer, so I can only speculate about your use case, but it looks to me like this functionality should be mostly provided by internal Loki utilities already. I recommend taking a look at https://github.com/ecmwf-ifs/loki/blob/main/loki/transformations/remove_code.py, in particular the remove_dead_code option to the RemoveCodeTransformation. This will eliminate provably unreachable code.

@ecossevin
Author

ecossevin commented Dec 23, 2024

Hi, sorry, it was indeed the wrong reproducer. I have updated it:
bug_statement.tar.gz

If you run python3 test.py there is no bug; the bug appears when you run python3 -m pdb test.py.
The bug is still there with the latest version of loki; I tried on my computer and on the supercomputer.

@mlange05
Collaborator

Hi @ecossevin, thanks for the updated test, I can see the relevant lines now. However, I'm still not able to reproduce the failure you claim, as the code "just works" for me. In particular, at the breakpoint you inserted in test.py, I get:

(loki_env) $ python3 test.py
dbg 60 <SymbolAttributes BasicType.DEFERRED>
> /lus/h2resw01/hpcperm/naml/bug_statement/test.py(62)get_horizontal_idx()
-> routine.body=SubstituteExpressions(rename_map).visit(routine.body)
(Pdb) fminj
DeferredTypeSymbol('FMINJ', None, None, None, False)
(Pdb) rename_map
{Scalar('JL', None, None, None, False): Scalar('JLON', None, None, <SymbolAttributes BasicType.INTEGER, kind=    JPIM>, False)}

If I then use dead code elimination before inspecting the generated code (example below), I also get a very sensible looking version of the kernel that seems to have inlined all the relevant internal subroutines.

(Pdb) from loki import do_remove_dead_code
(Pdb) do_remove_dead_code(routine)
(Pdb) print(routine.to_fortran())

So, I'm not sure where to go from here, but this all seems fine to me. For reference, I'm using latest main.

@ecossevin
Author

Hi, thanks for your answer.

Indeed, running python3 test.py with latest main works. The bug occurs when running python3 -m pdb test.py.

@ecossevin
Author

Hi, please, is there any progress on this issue?
Thanks a lot.
Erwan

@mlange05
Collaborator

mlange05 commented Jan 7, 2025

Hi, sorry, no, I don't see how -m pdb makes any difference here. It's still not triggering for me:

(loki_env) $ tar -xvf bug_statement_new.tar.gz
./
./test.py
./cuascn.F90
(loki_env) $ python3 -m pdb test.py
> /lus/h2resw01/hpcperm/naml/bug_statement/test.py(4)<module>()
-> from inspect import currentframe, getframeinfo
(Pdb) c
dbg 60 <SymbolAttributes BasicType.DEFERRED>
> /lus/h2resw01/hpcperm/naml/bug_statement/test.py(62)get_horizontal_idx()
-> routine.body=SubstituteExpressions(rename_map).visit(routine.body)
(Pdb) c
The program finished and will be restarted

and, as before, removing the inspect tools and breakpoint() call means it trivially runs to completion for me.

Is there any way you could narrow down the issue without using full scale source files? Or could you possibly find out which symbol is losing its scope attribute (which is the likely cause for that particular last line in your backtrace to trigger)?

@mlange05
Collaborator

mlange05 commented Jan 7, 2025

As I cannot actually reproduce this locally, the only other "crystal ball advice" I can give (purely from looking at the given traceback) is to check which symbol expr is actually missing a scope/type in "loki/expression/mappers.py", line 564, in map_variable_symbol (something like if expr.type is None: breakpoint() might help).

Once that symbol is known, I would check on the line just before the call to SubstituteExpressions.visit() in line 62 if that symbol does indeed have an orphaned .scope attribute. If so, the issue is likely caused by inline_member_routine(), as this is the only piece of code in this test that would remove Subroutine/Scope objects. If that is indeed the case, you should be able to trim down the test to something manageable and repeatable (ideally a pared down MFE).

Sorry for not being more helpful, but without anything failing in a reproducible manner, there is precious little I can do here.
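
The check suggested above (find the symbol whose scope has been orphaned before the substitution runs) can be sketched without Loki. This is only an illustration under assumptions: Scope, Symbol and find_orphaned are hypothetical stand-ins, and the sketch assumes that each symbol keeps its scope in a weak reference stored in a _scope attribute. It also relies on CPython freeing an object as soon as its last strong reference disappears.

```python
import weakref


class Scope:
    """Hypothetical stand-in for a scope-carrying IR node."""

    def __init__(self, name):
        self.name = name


class Symbol:
    """Hypothetical stand-in for a symbol that weakly references its scope."""

    def __init__(self, name, scope=None):
        self.name = name
        self._scope = None if scope is None else weakref.ref(scope)


def find_orphaned(symbols):
    """Return the names of symbols whose scope reference exists but is dead."""
    return sorted({
        sym.name for sym in symbols
        if sym._scope is not None and sym._scope() is None
    })


routine = Scope('routine')
inlined = Scope('inlined_member')
symbols = [Symbol('JLON', routine), Symbol('FMINJ', inlined), Symbol('ZTMP')]

# Dropping the only strong reference frees the scope at once on CPython,
# so the weak reference held by FMINJ goes dead.
del inlined
orphans = find_orphaned(symbols)
print(orphans)
```

With real code, the list of symbols would come from whatever traversal collects the variables of the routine body, and the check would run on the line just before the failing substitution.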

@ecossevin
Author

ecossevin commented Jan 9, 2025

The difference could be the way we install loki. I use this command: pip install "loki @ git+https://github.com/ecmwf-ifs/loki.git" --trusted-host pypi.org --trusted-host files.pythonhosted.org

This is the expr leading to the error:

(Pdb) expr
DeferredTypeSymbol('FMINJ', None, None, None, False)

Indeed, I can try to look at where the variable loses its type, but the problem is that adding a print can make the bug disappear, so it's difficult.

@reuterbal
Collaborator

Sorry, I still can't reproduce the problem:

(venv) [nabr@ac6-101 erwan_bug]$ vim test.py  # remove the breakpoint()
(venv) [nabr@ac6-101 erwan_bug]$ python -m pdb test.py
> /etc/ecmwf/nfs/dh1_home_a/nabr/loki/erwan_bug/test.py(4)<module>()
-> from inspect import currentframe, getframeinfo
(Pdb) c
dbg 60 <SymbolAttributes BasicType.DEFERRED>
The program finished and will be restarted
> /etc/ecmwf/nfs/dh1_home_a/nabr/loki/erwan_bug/test.py(4)<module>()
-> from inspect import currentframe, getframeinfo
(Pdb) q

This is my command history:

(venv) [nabr@ac6-101 erwan_bug]$ history | tail -n 10
 1004  python3 -m venv venv
 1005  ml unload python3
 1006  venv/bin/python3 -m ensurepip --upgrade
 1007  venv/bin/python3 -m pip install "loki @ git+https://github.com/ecmwf-ifs/loki.git"
 1008  source venv/bin/activate
 1009  ls
 1010  python test.py
 1011  vim test.py
 1012  python -m pdb test.py
 1013  history | tail -n 10
(venv) [nabr@ac6-101 erwan_bug]$ python3 --version
Python 3.11.10

@ecossevin
Author

ecossevin commented Jan 9, 2025

I'm going to try with your python version. Mine is Python 3.10.12.

Eric tried and said he didn't have the bug the first two times he ran the test case; maybe if you run it 2 or 3 times the bug will appear?

@reuterbal
Collaborator

Still no luck unfortunately:

(venv) [nabr@ac6-101 erwan_bug]$ for _ in {1..10}; do python test.py; done
dbg 60 <SymbolAttributes BasicType.DEFERRED>
dbg 60 <SymbolAttributes BasicType.DEFERRED>
dbg 60 <SymbolAttributes BasicType.DEFERRED>
dbg 60 <SymbolAttributes BasicType.DEFERRED>
dbg 60 <SymbolAttributes BasicType.DEFERRED>
dbg 60 <SymbolAttributes BasicType.DEFERRED>
dbg 60 <SymbolAttributes BasicType.DEFERRED>
dbg 60 <SymbolAttributes BasicType.DEFERRED>
dbg 60 <SymbolAttributes BasicType.DEFERRED>
dbg 60 <SymbolAttributes BasicType.DEFERRED>
(venv) [nabr@ac6-101 erwan_bug]$ for _ in {1..10}; do echo "quit" | python -m pdb -c continue test.py; done
dbg 60 <SymbolAttributes BasicType.DEFERRED>
The program finished and will be restarted
> /etc/ecmwf/nfs/dh1_home_a/nabr/loki/erwan_bug/test.py(4)<module>()
-> from inspect import currentframe, getframeinfo
(Pdb) dbg 60 <SymbolAttributes BasicType.DEFERRED>
The program finished and will be restarted
> /etc/ecmwf/nfs/dh1_home_a/nabr/loki/erwan_bug/test.py(4)<module>()
-> from inspect import currentframe, getframeinfo
(Pdb) dbg 60 <SymbolAttributes BasicType.DEFERRED>
The program finished and will be restarted
> /etc/ecmwf/nfs/dh1_home_a/nabr/loki/erwan_bug/test.py(4)<module>()
-> from inspect import currentframe, getframeinfo
(Pdb) dbg 60 <SymbolAttributes BasicType.DEFERRED>
The program finished and will be restarted
> /etc/ecmwf/nfs/dh1_home_a/nabr/loki/erwan_bug/test.py(4)<module>()
-> from inspect import currentframe, getframeinfo
(Pdb) dbg 60 <SymbolAttributes BasicType.DEFERRED>
The program finished and will be restarted
> /etc/ecmwf/nfs/dh1_home_a/nabr/loki/erwan_bug/test.py(4)<module>()
-> from inspect import currentframe, getframeinfo
(Pdb) dbg 60 <SymbolAttributes BasicType.DEFERRED>
The program finished and will be restarted
> /etc/ecmwf/nfs/dh1_home_a/nabr/loki/erwan_bug/test.py(4)<module>()
-> from inspect import currentframe, getframeinfo
(Pdb) dbg 60 <SymbolAttributes BasicType.DEFERRED>
The program finished and will be restarted
> /etc/ecmwf/nfs/dh1_home_a/nabr/loki/erwan_bug/test.py(4)<module>()
-> from inspect import currentframe, getframeinfo
(Pdb) dbg 60 <SymbolAttributes BasicType.DEFERRED>
The program finished and will be restarted
> /etc/ecmwf/nfs/dh1_home_a/nabr/loki/erwan_bug/test.py(4)<module>()
-> from inspect import currentframe, getframeinfo
(Pdb) dbg 60 <SymbolAttributes BasicType.DEFERRED>
The program finished and will be restarted
> /etc/ecmwf/nfs/dh1_home_a/nabr/loki/erwan_bug/test.py(4)<module>()
-> from inspect import currentframe, getframeinfo
(Pdb) dbg 60 <SymbolAttributes BasicType.DEFERRED>
The program finished and will be restarted
> /etc/ecmwf/nfs/dh1_home_a/nabr/loki/erwan_bug/test.py(4)<module>()
-> from inspect import currentframe, getframeinfo

Note that you should specify scope=routine when creating the JLON variable in l. 45 but it's unlikely that this is related to your problem.

When you hit the problem in a pdb session, can you please share what

expr._scope

is?

@ecossevin
Author

ecossevin commented Jan 9, 2025

pers.py(564)map_variable_symbol()
-> assert expr.type is not None
(Pdb) expr._scope
<weakref at 0x75bb3274e700; dead>

@reuterbal
Collaborator

Thanks for this, that helps. I think we're hitting some garbage collector race condition here due to an outdated scope reference.
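
One plausible reason why running under pdb or adding a print changes the outcome (an assumption, not something verified in this thread): CPython starts its cyclic garbage collector based on how many container objects have been allocated, so any extra allocations shift the moment at which a collection runs. The sketch below only demonstrates that allocation volume drives automatic collections; count_collections and allocate are hypothetical helper names.

```python
import gc


def count_collections(workload):
    """Run workload() and return how many automatic GC passes started."""
    started = []

    def on_gc(phase, info):
        # gc invokes callbacks with phase "start" and "stop" for every pass.
        if phase == "start":
            started.append(info["generation"])

    was_enabled = gc.isenabled()
    gc.enable()
    gc.callbacks.append(on_gc)
    try:
        workload()
    finally:
        gc.callbacks.remove(on_gc)
        if not was_enabled:
            gc.disable()
    return len(started)


def allocate(n):
    # Lists are container objects and are tracked by the cyclic collector.
    return [[] for _ in range(n)]


few = count_collections(lambda: allocate(10))
many = count_collections(lambda: allocate(100_000))
print(few, many)
```

The exact counts depend on the Python version and on what was allocated earlier in the process, which is the point: whether a collection has already happened when a given line executes is not stable across runs or environments.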

I can reproduce this now. At the top of the test.py file, insert

import gc
gc.disable()

And I've put a breakpoint() just before the call to get_horizontal_idx at the end of the file.

With this, I can see the scope of fminj being an Associate which no longer exists in the routine. The Associate's parent is the routine, though, so this is likely a problem in the associate resolver.
Explicit garbage collection lets the scope reference become dead.

 python test.py
> .../erwan_bug/test.py(96)<module>()
-> gc.collect()
(Pdb) {var.name.lower(): var for var in FindVariables().visit(routine.body)}['fminj']
DeferredTypeSymbol('FMINJ', None, None, None, False)
(Pdb) {var.name.lower(): var for var in FindVariables().visit(routine.body)}['fminj'].scope
Associate:: YGFL%NACTAERO=NACTAERO, YDECLDP%LAERLIQAUTOCP=LAERLIQAUTOCP, YDECLDP%LAERLIQAUTOCPB=LAERLIQAUTOCPB, YDECLDP%RLMIN=RLMIN, YDCST%RCPD=RCPD, YDCST%RETV=RETV, YDCST%RG=RG, YDCST%RTT=RTT, YDTHF%RALFDCP=RALFDCP, YDTHF%RTBERCU=RTBERCU, YDTHF%RTICECU=RTICECU, YDECUMF%ENTRORG=ENTRORG, YDECUMF%ENTR_RH=ENTR_RH, YDECUMF%ENTSHALP=ENTSHALP, YDECUMF%RMFCFL=RMFCFL, YDECUMF%RMFCMIN=RMFCMIN, YDECUMF%RPRCON=RPRCON, YDECUMF%LMFGLAC=LMFGLAC, YDECUMF%LSCVLIQ=LSCVLIQ, YDEPHLI%LPHYLIN=LPHYLIN, YDEPHLI%RLPTRC=RLPTRC
(Pdb) {var.name.lower(): var for var in FindVariables().visit(routine.body)}['fminj'].scope.parent is routine
True
(Pdb) from loki import Associate, FindNodes
(Pdb) FindNodes(Associate).visit(routine.body)
[]
(Pdb) gc.collect()
115541
(Pdb) {var.name.lower(): var for var in FindVariables().visit(routine.body)}['fminj']._scope
<weakref at 0x14f6801721b0; dead>
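
The behaviour in the session above can be imitated with the standard library alone. This is a minimal sketch under assumptions, not Loki's implementation: SymbolTable, Scope and Symbol are hypothetical stand-ins, and the reference cycle is created artificially through a back-pointer from the symbol table to its owner. A scope that has been dropped from the tree stays reachable through the symbol's weak reference until the cyclic collector runs, after which the reference is dead.

```python
import gc
import weakref


class SymbolTable(dict):
    """Hypothetical symbol table that points back to its owning scope."""

    def __init__(self, owner):
        super().__init__()
        self.owner = owner  # strong back-reference: forms a reference cycle


class Scope:
    """Hypothetical stand-in for a scope node such as an Associate."""

    def __init__(self, name):
        self.name = name
        self.symbols = SymbolTable(self)


class Symbol:
    """Hypothetical stand-in for a symbol that weakly references its scope."""

    def __init__(self, name, scope):
        self.name = name
        self._scope = weakref.ref(scope)

    @property
    def scope(self):
        return self._scope()


def run_demo():
    """Return (scope alive before gc.collect(), scope alive after it)."""
    gc_was_enabled = gc.isenabled()
    gc.disable()  # keep automatic collection from interfering with the demo
    try:
        associate = Scope('associate')
        fminj = Symbol('FMINJ', associate)
        del associate  # scope leaves the "tree"; only the cycle keeps it alive
        alive_before = fminj.scope is not None
        gc.collect()   # the cycle is reclaimed, the weak reference goes dead
        alive_after = fminj.scope is not None
    finally:
        if gc_was_enabled:
            gc.enable()
    return alive_before, alive_after


print(run_demo())  # on CPython: (True, False)
```

In a normal run the collector is enabled, so whether the symbol still sees its old scope at a given line depends on when the last automatic collection happened.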


3 participants