
RL Core tests fail sporadically #1010

Closed
joelreymont opened this issue Mar 1, 2024 · 15 comments

@joelreymont (Contributor)

I have to re-run the tests a few times to get this failure, and it's always the same test.


     Testing Running tests...
WARNING: Method definition timeit_debug_enabled() in module ReinforcementLearningCore at /Users/joelr/.julia/packages/TimerOutputs/RsWnF/src/TimerOutput.jl:180 overwritten at /Users/joelr/.julia/packages/TimerOutputs/RsWnF/src/TimerOutput.jl:188.
OfflineAgent: Test Failed at /Users/joelr/Work/Julia/ReinforcementLearning.jl/src/ReinforcementLearningCore/test/policies/agent.jl:69
  Expression: length(a_2.trajectory.container) == 5
   Evaluated: 6 == 5

Stacktrace:
 [1] macro expansion
   @ ~/.julia/juliaup/julia-1.10.1+0.aarch64.apple.darwin14/share/julia/stdlib/v1.10/Test/src/Test.jl:672 [inlined]
 [2] macro expansion
   @ ~/Work/Julia/ReinforcementLearning.jl/src/ReinforcementLearningCore/test/policies/agent.jl:69 [inlined]
 [3] macro expansion
   @ ~/.julia/juliaup/julia-1.10.1+0.aarch64.apple.darwin14/share/julia/stdlib/v1.10/Test/src/Test.jl:1577 [inlined]
 [4] macro expansion
   @ ~/Work/Julia/ReinforcementLearning.jl/src/ReinforcementLearningCore/test/policies/agent.jl:43 [inlined]
 [5] macro expansion
   @ ~/.julia/juliaup/julia-1.10.1+0.aarch64.apple.darwin14/share/julia/stdlib/v1.10/Test/src/Test.jl:1577 [inlined]
 [6] top-level scope
   @ ~/Work/Julia/ReinforcementLearning.jl/src/ReinforcementLearningCore/test/policies/agent.jl:5
OfflineAgent: Test Failed at /Users/joelr/Work/Julia/ReinforcementLearning.jl/src/ReinforcementLearningCore/test/policies/agent.jl:76
  Expression: length(agent.trajectory.container) in (0, 5)
   Evaluated: 6 in (0, 5)

Stacktrace:
 [1] macro expansion
   @ ~/.julia/juliaup/julia-1.10.1+0.aarch64.apple.darwin14/share/julia/stdlib/v1.10/Test/src/Test.jl:672 [inlined]
 [2] macro expansion
   @ ~/Work/Julia/ReinforcementLearning.jl/src/ReinforcementLearningCore/test/policies/agent.jl:76 [inlined]
 [3] macro expansion
   @ ~/.julia/juliaup/julia-1.10.1+0.aarch64.apple.darwin14/share/julia/stdlib/v1.10/Test/src/Test.jl:1577 [inlined]
 [4] macro expansion
   @ ~/Work/Julia/ReinforcementLearning.jl/src/ReinforcementLearningCore/test/policies/agent.jl:43 [inlined]
 [5] macro expansion
   @ ~/.julia/juliaup/julia-1.10.1+0.aarch64.apple.darwin14/share/julia/stdlib/v1.10/Test/src/Test.jl:1577 [inlined]
 [6] top-level scope
   @ ~/Work/Julia/ReinforcementLearning.jl/src/ReinforcementLearningCore/test/policies/agent.jl:5
[The same failure at agent.jl:76 repeats three more times with identical stack traces.]
[ Info: initializing tictactoe state info cache...
[ Info: finished initializing tictactoe state info cache in 0.670085458 seconds
Test Summary:                                                   | Pass  Fail  Total   Time
ReinforcementLearningCore.jl                                    |  656     5    661  53.7s
  core                                                          |    3            3   1.1s
  TotalRewardPerEpisode                                         |  105          105   0.7s
  DoEveryNStep                                                  |   68           68   0.1s
  TimePerStep                                                   |   42           42   1.0s
  StepsPerEpisode                                               |   16           16   0.1s
  RewardsPerEpisode                                             |   33           33   0.0s
  DoOnExit                                                      |    1            1   0.0s
  DoEveryNEpisode                                               |   84           84   0.1s
  StopAfterStep                                                 |    2            2   0.0s
  ComposedStopCondition                                         |    1            1   0.0s
  StopAfterEpisode                                              |    6            6   0.0s
  StopAfterNoImprovement                                        |   12           12   0.2s
  agent.jl                                                      |   20     5     25   0.6s
    Agent Tests                                                 |   12           12   0.3s
    OfflineAgent                                                |    8     5     13   0.3s
  MultiAgentPolicy                                              |    1            1   0.0s
  MultiAgentHook                                                |    1            1   0.1s
  CurrentPlayerIterator                                         |    1            1   0.0s
  Basic TicTacToeEnv (Sequential) env checks                    |   15           15   1.3s
  next_player!                                                  |    1            1   0.0s
  Basic RockPaperScissors (simultaneous) env checks             |   22           22   0.5s
  Sequential Environments correctly ended by termination signal |    1            1   0.2s
  approximators.jl                                              |   10           10   4.3s
  base                                                          |   44           44   1.3s
  device                                                        |    4            4   0.3s
  StackFrames                                                   |    5            5   0.7s
  Approximators                                                 |  136          136  32.9s
  utils/distributions                                           |   22           22   6.4s
ERROR: LoadError: Some tests did not pass: 656 passed, 5 failed, 0 errored, 0 broken.
in expression starting at /Users/joelr/Work/Julia/ReinforcementLearning.jl/src/ReinforcementLearningCore/test/runtests.jl:13
ERROR: Package ReinforcementLearningCore errored during testing
@joelreymont (Contributor Author)

Git bisect, together with lots of manual test runs, points to commit e1d9e9e as the bad one:

❯ git bisect good
e1d9e9e21a0a3955667a1276b1140b3b72bf9d4b is the first bad commit
commit e1d9e9e21a0a3955667a1276b1140b3b72bf9d4b
Author: Henri Dehaybe <[email protected]>
Date:   Thu Oct 26 10:11:22 2023 +0200

    Conservative Q-Learning (#995)

    * divide sac into functions

    * bump version

    * implement CQL

    * create OfflineAgent (does not collect online data)

    * working state

    * experiments working

    * typo

    * Tests pass

    * add finetuning

    * write doc

    * Update src/ReinforcementLearningCore/src/policies/agent/agent_base.jl

    * Update src/ReinforcementLearningZoo/src/algorithms/offline_rl/CQL_SAC.jl

    * Apply suggestions from code review

    * add review suggestions

    * remove finetuning

    * fix a ProgressMeter deprecation warning

    ---------

    Co-authored-by: Jeremiah <[email protected]>

 src/ReinforcementLearningCore/Project.toml         |  5 +-
 .../src/core/stop_conditions.jl                    |  4 +-
 .../src/policies/agent/agent.jl                    |  1 +
 .../src/policies/agent/agent_base.jl               | 13 +--
 .../src/policies/agent/offline_agent.jl            | 76 +++++++++++++++++
 .../test/policies/agent.jl                         | 38 +++++++++
 src/ReinforcementLearningExperiments/Project.toml  |  2 +-
 .../experiments/Offline/JuliaRL_CQLSAC_Pendulum.jl | 98 ++++++++++++++++++++++
 .../Policy Gradient/JuliaRL_SAC_Pendulum.jl        |  2 +-
 .../src/ReinforcementLearningExperiments.jl        |  1 +
 .../test/runtests.jl                               |  1 +
 src/ReinforcementLearningZoo/Project.toml          |  5 +-
 .../src/ReinforcementLearningZoo.jl                |  1 +
 .../src/algorithms/algorithms.jl                   |  2 +-
 .../src/algorithms/offline_rl/CQL_SAC.jl           | 93 ++++++++++++++++++++
 .../src/algorithms/offline_rl/offline_rl.jl        |  4 +-
 .../src/algorithms/policy_gradient/sac.jl          | 45 ++++++----
 17 files changed, 357 insertions(+), 34 deletions(-)
 create mode 100644 src/ReinforcementLearningCore/src/policies/agent/offline_agent.jl
 create mode 100644 src/ReinforcementLearningExperiments/deps/experiments/experiments/Offline/JuliaRL_CQLSAC_Pendulum.jl
 create mode 100644 src/ReinforcementLearningZoo/src/algorithms/offline_rl/CQL_SAC.jl

@jeremiahpslewis (Member)

OfflineAgent seems to be the culprit...

@joelreymont (Contributor Author)

I'm trying to figure this out...

@joelreymont changed the title from "Test failing sporadically" to "RL Core tests fail sporadically" on Mar 1, 2024
@joelreymont (Contributor Author)

I've spent 2-3 days digging into this already and it's time to ask for help!

I have figured out what's going on with this function, but I can't figure out why:

Base.push!(::OfflineAgent{P,T, <: OfflineBehavior{Nothing}}, ::PreExperimentStage, env::AbstractEnv) where {P,T} = nothing
# fills the trajectory with interactions generated with the behavior_agent at the PreExperimentStage.
function Base.push!(agent::OfflineAgent{P,T, <: OfflineBehavior{<:Agent}}, ::PreExperimentStage, env::AbstractEnv) where {P,T}
    is_stop = false
    policy = agent.offline_behavior.agent
    steps = 0
    while !is_stop
        reset!(env)
        push!(policy, PreEpisodeStage(), env)
        while !agent.offline_behavior.reset_condition(policy, env) # one episode
            steps += 1
            push!(policy, PreActStage(), env)
            action = RLBase.plan!(policy, env)
            act!(env, action)
            push!(policy, PostActStage(), env, action)
            if steps >= agent.offline_behavior.steps
                is_stop = true
                break
            end
        end # end of an episode
        push!(policy, PostEpisodeStage(), env)
    end
end

If agent.offline_behavior.reset_condition is never triggered, the test completes just fine. Otherwise we get an extra item in agent.trajectory.container. The extra item appears because we reset the environment at the top of the outer loop and then call push!(policy, PreEpisodeStage(), env). That push inserts nothing on the first pass, when steps is 0, but does insert an item on every later pass through the loop.
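
To make the counting concrete, here is a minimal toy sketch (my own illustration, not the actual ReinforcementLearningTrajectories implementation; ToyTraces and the push helpers are made-up names) of why a container that multiplexes state and next_state over one state vector reports one row per state after the first, so the reset at the start of the second episode contributes a row of its own:

# Toy model: state and next_state share one vector, so the container
# reports length(states) - 1 full (state, next_state, ...) rows.
struct ToyTraces
    states::Vector{Int}
    actions::Vector{Int}
end
ToyTraces() = ToyTraces(Int[], Int[])

push_state!(t::ToyTraces, s) = push!(t.states, s)
push_action!(t::ToyTraces, a) = push!(t.actions, a)
Base.length(t::ToyTraces) = max(0, length(t.states) - 1)

t = ToyTraces()
push_state!(t, 4)                       # PreEpisodeStage, episode 1: no row yet
for (a, s) in ((2, 5), (2, 6), (2, 7))  # three steps of the episode
    push_action!(t, a)
    push_state!(t, s)
end
@show length(t)                         # 3 rows, as expected

push_state!(t, 4)                       # PreEpisodeStage, episode 2 (after reset!)
@show length(t)                         # 4: the reset push itself became a row

In this toy, the second PreEpisodeStage push is exactly what materializes the dummy row (state = 7, next_state = 4, action = 0, ...) seen in the printouts below.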

I inserted printouts after each push into the trajectory container and can see this behavior clearly. I also tried to dig down into the trajectory push! method and further down the call chain. For the life of me, I can't figure out why the container length does not increase at the beginning of the first iteration!

>>> iterating with step 0 and container length 0
container length 0 after env reset

XXX nothing is inserted here  by "push!(policy, PreEpisodeStage(), env)"

starting episode loop with step 0 and container length 0
container = []
steps = 1
container before pushing PreActStage = []
container after pushing PreActStage = []
container after acting = []
container after pushing PostActStage = [(state = 4, next_state = 5, action = 2, reward = 0.0f0, terminal = false)]
steps = 2
container before pushing PreActStage = [(state = 4, next_state = 5, action = 2, reward = 0.0f0, terminal = false)]
container after pushing PreActStage = [(state = 4, next_state = 5, action = 2, reward = 0.0f0, terminal = false)]
container after acting = [(state = 4, next_state = 5, action = 2, reward = 0.0f0, terminal = false)]
container after pushing PostActStage = [(state = 4, next_state = 5, action = 2, reward = 0.0f0, terminal = false), (state = 5, next_state = 6, action = 2, reward = 0.0f0, terminal = false)]
steps = 3
container before pushing PreActStage = [(state = 4, next_state = 5, action = 2, reward = 0.0f0, terminal = false), (state = 5, next_state = 6, action = 2, reward = 0.0f0, terminal = false)]
container after pushing PreActStage = [(state = 4, next_state = 5, action = 2, reward = 0.0f0, terminal = false), (state = 5, next_state = 6, action = 2, reward = 0.0f0, terminal = false)]
container after acting = [(state = 4, next_state = 5, action = 2, reward = 0.0f0, terminal = false), (state = 5, next_state = 6, action = 2, reward = 0.0f0, terminal = false)]
container after pushing PostActStage = [(state = 4, next_state = 5, action = 2, reward = 0.0f0, terminal = false), (state = 5, next_state = 6, action = 2, reward = 0.0f0, terminal = false), (state = 6, next_state = 7, action = 2, reward = 1.0f0, terminal = true)]
ending episode. steps = 3, ended = true
container length 3 after pushing PostEpisodeStage
container after episode = [(state = 4, next_state = 5, action = 2, reward = 0.0f0, terminal = false), (state = 5, next_state = 6, action = 2, reward = 0.0f0, terminal = false), (state = 6, next_state = 7, action = 2, reward = 1.0f0, terminal = true)]

>>> iterating with step 3 and container length 3
container length 3 after env reset

XXX one item is inserted here  by "push!(policy, PreEpisodeStage(), env)"

starting episode loop with step 3 and container length 4
container = [(state = 4, next_state = 5, action = 2, reward = 0.0f0, terminal = false), (state = 5, next_state = 6, action = 2, reward = 0.0f0, terminal = false), (state = 6, next_state = 7, action = 2, reward = 1.0f0, terminal = true), (state = 7, next_state = 4, action = 0, reward = 0.0f0, terminal = false)]
steps = 4
container before pushing PreActStage = [(state = 4, next_state = 5, action = 2, reward = 0.0f0, terminal = false), (state = 5, next_state = 6, action = 2, reward = 0.0f0, terminal = false), (state = 6, next_state = 7, action = 2, reward = 1.0f0, terminal = true), (state = 7, next_state = 4, action = 0, reward = 0.0f0, terminal = false)]
container after pushing PreActStage = [(state = 4, next_state = 5, action = 2, reward = 0.0f0, terminal = false), (state = 5, next_state = 6, action = 2, reward = 0.0f0, terminal = false), (state = 6, next_state = 7, action = 2, reward = 1.0f0, terminal = true), (state = 7, next_state = 4, action = 0, reward = 0.0f0, terminal = false)]
container after acting = [(state = 4, next_state = 5, action = 2, reward = 0.0f0, terminal = false), (state = 5, next_state = 6, action = 2, reward = 0.0f0, terminal = false), (state = 6, next_state = 7, action = 2, reward = 1.0f0, terminal = true), (state = 7, next_state = 4, action = 0, reward = 0.0f0, terminal = false)]
container after pushing PostActStage = [(state = 4, next_state = 5, action = 2, reward = 0.0f0, terminal = false), (state = 5, next_state = 6, action = 2, reward = 0.0f0, terminal = false), (state = 6, next_state = 7, action = 2, reward = 1.0f0, terminal = true), (state = 7, next_state = 4, action = 0, reward = 0.0f0, terminal = false), (state = 4, next_state = 5, action = 2, reward = 0.0f0, terminal = false)]
steps = 5
container before pushing PreActStage = [(state = 4, next_state = 5, action = 2, reward = 0.0f0, terminal = false), (state = 5, next_state = 6, action = 2, reward = 0.0f0, terminal = false), (state = 6, next_state = 7, action = 2, reward = 1.0f0, terminal = true), (state = 7, next_state = 4, action = 0, reward = 0.0f0, terminal = false), (state = 4, next_state = 5, action = 2, reward = 0.0f0, terminal = false)]
container after pushing PreActStage = [(state = 4, next_state = 5, action = 2, reward = 0.0f0, terminal = false), (state = 5, next_state = 6, action = 2, reward = 0.0f0, terminal = false), (state = 6, next_state = 7, action = 2, reward = 1.0f0, terminal = true), (state = 7, next_state = 4, action = 0, reward = 0.0f0, terminal = false), (state = 4, next_state = 5, action = 2, reward = 0.0f0, terminal = false)]
container after acting = [(state = 4, next_state = 5, action = 2, reward = 0.0f0, terminal = false), (state = 5, next_state = 6, action = 2, reward = 0.0f0, terminal = false), (state = 6, next_state = 7, action = 2, reward = 1.0f0, terminal = true), (state = 7, next_state = 4, action = 0, reward = 0.0f0, terminal = false), (state = 4, next_state = 5, action = 2, reward = 0.0f0, terminal = false)]
container after pushing PostActStage = [(state = 4, next_state = 5, action = 2, reward = 0.0f0, terminal = false), (state = 5, next_state = 6, action = 2, reward = 0.0f0, terminal = false), (state = 6, next_state = 7, action = 2, reward = 1.0f0, terminal = true), (state = 7, next_state = 4, action = 0, reward = 0.0f0, terminal = false), (state = 4, next_state = 5, action = 2, reward = 0.0f0, terminal = false), (state = 5, next_state = 4, action = 1, reward = 0.0f0, terminal = false)]
stopping at 5 steps!
ending episode. steps = 5, ended = false
container length 6 after pushing PostEpisodeStage
container after episode = [(state = 4, next_state = 5, action = 2, reward = 0.0f0, terminal = false), (state = 5, next_state = 6, action = 2, reward = 0.0f0, terminal = false), (state = 6, next_state = 7, action = 2, reward = 1.0f0, terminal = true), (state = 7, next_state = 4, action = 0, reward = 0.0f0, terminal = false), (state = 4, next_state = 5, action = 2, reward = 0.0f0, terminal = false), (state = 5, next_state = 4, action = 1, reward = 0.0f0, terminal = false)]
final container = [(state = 4, next_state = 5, action = 2, reward = 0.0f0, terminal = false), (state = 5, next_state = 6, action = 2, reward = 0.0f0, terminal = false), (state = 6, next_state = 7, action = 2, reward = 1.0f0, terminal = true), (state = 7, next_state = 4, action = 0, reward = 0.0f0, terminal = false), (state = 4, next_state = 5, action = 2, reward = 0.0f0, terminal = false), (state = 5, next_state = 4, action = 1, reward = 0.0f0, terminal = false)]

@jeremiahpslewis (Member)

Thanks for looking into this!!! I’ll dive into it tomorrow. :)

@joelreymont (Contributor Author)

I wish I could set breakpoints in tests (I'm using VS Code), but that seems to be impossible. I read the existing Discourse threads and experimented with TestItemRunner, to no avail.

FYI, we first hit the AbstractAgent push method and then jump over to the Trajectory push method.

@joelreymont (Contributor Author)

I feel stupid now, but this is the classic case of solving a problem by talking to a rubber duck. Asking for help works just as well :-). I missed the EpisodesBuffer push! method, which is likely the one eating a push. Digging deeper!

@joelreymont (Contributor Author) commented Mar 4, 2024

This seems to allow eb.traces to be empty after an insert. How can traces be empty after an insert, though?

@joelreymont (Contributor Author)

What I would like to figure out is what decides whether the first trace is counted. If the first trace is not inserted at all, then how does it show up when I print the traces at the end of the test?

@joelreymont (Contributor Author)

I printed out the value of partial and it's true on the first PreEpisodeStage insert as well as on the second (when the reset condition is triggered mid-episode).
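
The asymmetry is easy to model in a few lines. This is a conceptual sketch only (the real logic lives in the EpisodesBuffer push! method; toy_partial_push! is a made-up name): partial can be true on both PreEpisodeStage pushes, yet only the second one creates a visible row, because row creation depends on whether a state is already stored, not on the flag itself.

# Conceptual sketch: a state-only ("partial") push materializes a row
# only when a previous state is already in the multiplexed storage.
function toy_partial_push!(states::Vector{Int}, xs::NamedTuple)
    partial = keys(xs) == (:state,)   # no action/reward attached yet
    had_state = !isempty(states)
    push!(states, xs.state)
    return partial && had_state       # did this push add a visible row?
end

states = Int[]
@show toy_partial_push!(states, (state = 4,))  # false: first state, no row
append!(states, [5, 6, 7])                     # pretend an episode ran
@show toy_partial_push!(states, (state = 4,))  # true: the reset push adds a row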

@joelreymont (Contributor Author)

There seems to be nothing wrong with the code; the test itself is buggy. I'm fixing the test instead.
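
For context, one hedged sketch of what fixing the test can look like here (the numbers mirror the failure above; the change that actually landed may differ): accept the extra bookkeeping row that a mid-run reset adds, rather than asserting the raw length exactly.

# Hypothetical relaxed assertion (inside the existing @testset, so `agent`
# and the Test stdlib are in scope): the raw container length may include
# one extra row when the reset condition fires before the step budget.
@test length(agent.trajectory.container) in (5, 6)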

@jeremiahpslewis (Member)

Awesome. Merged. Thanks!

@HenriDeh (Member) commented Mar 7, 2024

Oof, sorry, I was on vacation; I could have helped. Brave of you to dig into all that.

@joelreymont (Contributor Author)

Thank you Henri! I learned a lot about Julia in the process.

Trying to get into reinforcement learning now, pun intended!
