
[REG-2183] Initial QLearning Bot Segment #356

Merged: 3 commits into main, Nov 21, 2024

Conversation

nAmKcAz (Collaborator) commented Nov 18, 2024

  • Updates Q Learning logic to allow it to be used as a bot segment

  • We can merge this, but I wouldn't publicize it yet. We have some research to do here on how best to utilize/train this in the context of bot sequences. Training is currently 'ick' as we don't have a systemic way to restart the whole game and resume the same sequence for training.

  • See also https://github.com/Regression-Games/RGBossRoom/pull/80


Find the pull request instructions here

Every reviewer and the owner of the PR should consider these points in their review (feel free to copy this checklist so you can fill it out yourself in the overall PR comment):

  • The code is extensible and backward compatible
  • New public interfaces are extensible and open to backward compatibility in the future
  • If preparing to remove a field in the future (e.g. this PR removes an argument), the argument stays but is no longer functional and carries a deprecation warning. A Linear task is also created to track the deletion.
  • Non-critical or potentially modifiable arguments are optional
  • Breaking changes and the approach to handling them have been verified with the team (in the Linear task, design doc, or PR itself)
  • The code is easy to read
  • Unit tests are added for expected and edge cases
  • Integration tests are added for expected and edge cases
  • Functions and classes are documented
  • Migrations for both up and down operations are completed
  • A documentation PR is created and being reviewed for anything in this PR that requires knowledge to use
  • Implications for other dependent code (e.g. sample games and sample bots) are considered, mentioned, and properly handled
  • Style changes and other non-blocking changes are marked as non-blocking from reviewers

nAmKcAz (Collaborator, Author) commented Nov 18, 2024

@vontell This 'might' be ready for review as an 'experiment', but it's certainly not ready for release. How do you want to handle review/discussion on this?

I don't really want to keep it in a side branch: the SDK moves so fast that keeping one up to date and working would be tedious and realistically wouldn't happen.

batu (Contributor) commented Nov 19, 2024

I would love to integrate the eval pipeline into this, so we can see which reward approaches work better.

My vote would be to review/merge this soon. Would it be too nonsensical to put it under gg.regression.unity.experimental.q_learning? We could maybe move runespawn out of the gg.regression.unity.experimental.runespawn namespace and just keep gg.regression.unity.experimental as the home for all experimental work?

nAmKcAz (Collaborator, Author) commented Nov 19, 2024

@batu, currently this is pretty much useless for training and evaluation, since resetting the game for a whole bot sequence isn't really a thing.

I could allow resetting similar to how qlearningbot.cs does it, where it force-reloads the starting scene for each episode, but this doesn't work well for many games (including bossroom and all customer games so far).

nAmKcAz requested review from batu and vontell, November 20, 2024 13:55
nAmKcAz marked this pull request as ready for review, November 20, 2024 13:56
nAmKcAz (Collaborator, Author) commented Nov 20, 2024

As for 'experimental'... to me our entire SDK is experimental, so I don't really fancy using package/project names to try to hide things. I'm not going to die on this hill, though.

To me they are all hidden unless we document them (as evidenced by our customers not being able to understand the product).

vontell (Collaborator) left a comment

Excited to get this in and try it out myself at some point. @RG-nAmKcAz, do you have any preference on what to tackle next here? Also, I see the boss room sample, but it would be good to have a quick message, instructions, or a video on how to start this in boss room (even though the training process is bad, as you mention). I just personally want to try it! I'm guessing I can run the sequence in the other PR to use a trained model, but I wasn't sure how to train one myself.

public Dictionary<RewardType, int> rewardTypeRatios;

// training options
// TODO: Should we add an option that causes a 'game restart' AND restarts the sequence from the beginning when 'learning = true' ?
vontell (Collaborator):

I'd be curious to see what the Unity ML-Agents tutorials do for this. I suspect that maybe they always use special environments, but they might have some tricks. From what I can see, it looks like they train using a build of the game rather than the editor... maybe their harness in Python just repeatedly starts and quits that executable?

nAmKcAz (Collaborator, Author):

Yep, they 'assume' you're starting a runtime that IS the whole episode... that's why I say this thing is done, but not 'ready', as training is very tedious currently:

you have to start up the game and get it to the right spot, then run your training segment episode

then manually reset the environment and do it again..

until you have your model 'trained'

then you can use the 'real' segment pointing to the same model file, but with training disabled

nAmKcAz (Collaborator, Author):

@vontell .. you will DEFINITELY need to train a model yourself... I got pissed after about an hour of babysitting the restarts for training and seeing no meaningful progress in the results.

Hence my comment that we need a better way to 'train' using sequences.

vontell (Collaborator):

That could be a big selling point. This is just an assumption, but the fact that you need to train an agent in a special environment rather than the real game (due to the issues you ran into) is probably a big blocker... if I'm a game developer, it's bad enough that I need to learn how to use the ML tool, and now I also need to go and make a whole new scene to train in that isn't even my full game? Sequences that get agents into the game could be a cool selling point.

nAmKcAz (Collaborator, Author):

That's actually what I'm prototyping today, per my Slack message.

I'm adding an option to the restart segment to allow restarting the same sequence after the restart.

So I should be able to make a sequence for bossroom that goes through the menus into the game, runs the qlearning for X time, restarts the game... and restarts the same sequence, effectively looping until the user stops it.

nAmKcAz (Collaborator, Author):

@vontell see the new updates, functionally here and practically in https://github.com/Regression-Games/RGBossRoom/pull/80.

The big change is that restart segments can now tell the game to restart that same sequence from the beginning after the restart. For example:

{
  "name": "Restart Game and Restart the Sequence after",
  "keyFrameCriteria": [
    {"type":"ActionComplete","transient":true,"data":{}}
  ],
  "botAction":{
    "type":"RestartGame",
    "data":{
      "restartSequenceAfterRestart": true
    }
  }
}

vontell (Collaborator) commented Nov 20, 2024

Also, sincerest apologies for not seeing your comment from a few days ago that tagged me; it looks like my GitHub notifications got cleared... Yes, getting it in sooner rather than later is preferred. Happy to treat it as experimental and not publicly documented while we play around with it and decide on next steps.

nAmKcAz requested a review from vontell, November 20, 2024 18:30
batu (Contributor) commented Nov 20, 2024

I am approving this PR, as the segment-side parts seem good.

However, after looking through the details, I am confident that we will not be able to train a model to do useful work with this specific state/action representation.

Here is an example state/action entry:

"Cinemachine.CinemachineBrain,Cinemachine.CinemachineComposer,Cinemachine.CinemachineFreeLook,Cinemachine.CinemachineOrbitalTransposer,Cinemachine.CinemachinePipeline,Cinemachine.CinemachineTargetGroup,Cinemachine.CinemachineVirtualCamera,EditorChildSceneLoader,Gameplay.RegressionGames.RGBossRoom.RGAttackObjectAction,RegressionGames.RGIconPulse,RegressionGames.RGTextPulse,RegressionGames.StateRecorder.BotSegments.BotSegmentsPlaybackController,RegressionGames.StateRecorder.KeyboardInputActionObserver,RegressionGames.StateRecorder.LoggingObserver,RegressionGames.StateRecorder.MouseInputActionObserver,RegressionGames.StateRecorder.ProfilerObserver,RegressionGames.StateRecorder.ReplayToolbarManager,RegressionGames.StateRecorder.ScreenRecorder,RegressionGames.StateRecorder.ScreenshotCapture,RegressionGames.StateRecorder.TransformObjectFinder,RegressionGames.StateRecorder.Types.RGExcludeFromState,RegressionGames.StateRecorder.VirtualMouseCursor,RGBotManager,RGBreakableObjectState,RGEnemyState,RGFollowObjectAction,RGPlayerState,RGSequenceManager,TMPro.TextMeshProUGUI,Unity.BossRoom.ApplicationLifecycle.ApplicationController,Unity.BossRoom.Audio.AudioMixerConfigurator,Unity.BossRoom.Audio.ClientMusicPlayer,Unity.BossRoom.CameraUtils.CameraController,Unity.BossRoom.ConnectionManagement.ConnectionManager,Unity.BossRoom.DebugCheats.DebugCheatsManager,Unity.BossRoom.Gameplay.GameplayObjects.AnimationCallbacks.AnimatorFootstepSounds,Unity.BossRoom.Gameplay.GameplayObjects.AnimationCallbacks.AnimatorTriggeredSpecialFX,Unity.BossRoom.Gameplay.GameplayObjects.Breakable,Unity.BossRoom.Gameplay.GameplayObjects.Character.CharacterSwap,Unity.BossRoom.Gameplay.GameplayObjects.Character.ClientAvatarGuidHandler,Unity.BossRoom.Gameplay.GameplayObjects.Character.ClientCharacter,Unity.BossRoom.Gameplay.GameplayObjects.Character.ClientPlayerAvatar,Unity.BossRoom.Gameplay.GameplayObjects.Character.NetworkAvatarGuidState,Unity.BossRoom.Gameplay.GameplayObjects.Character.PhysicsWrapper,Unity.BossRoom.Gameplay.GameplayObjects.Character.PlayerServerCharacter,Unity.BossRoom.Gameplay.GameplayObjects.Character.ServerAnimationHandler,Unity.BossRoom.Gameplay.GameplayObjects.Character.ServerCharacter,Unity.BossRoom.Gameplay.GameplayObjects.Character.ServerCharacterMovement,Unity.BossRoom.Gameplay.GameplayObjects.DamageReceiver,Unity.BossRoom.Gameplay.GameplayObjects.EnemyPortal,Unity.BossRoom.Gameplay.GameplayObjects.GameDataSource,Unity.BossRoom.Gameplay.GameplayObjects.NetworkHealthState,Unity.BossRoom.Gameplay.GameplayObjects.NetworkLifeState,Unity.BossRoom.Gameplay.GameplayObjects.PersistentPlayer,Unity.BossRoom.Gameplay.GameplayObjects.PickUpState,Unity.BossRoom.Gameplay.GameplayObjects.PublishMessageOnLifeChange,Unity.BossRoom.Gameplay.GameplayObjects.ServerDisplacerOnParentChange,Unity.BossRoom.Gameplay.GameplayObjects.ServerWaveSpawner,Unity.BossRoom.Gameplay.GameState.ServerBossRoomState,Unity.BossRoom.Gameplay.RegressionGames.RGBossRoom.RGPerformSkillAction,Unity.BossRoom.Gameplay.UI.ClientBossRoomLoadingScreen,Unity.BossRoom.Gameplay.UI.ClientClickFeedback,Unity.BossRoom.Gameplay.UI.ConnectionAnimation,Unity.BossRoom.Gameplay.UI.ConnectionStatusMessageUIManager,Unity.BossRoom.Gameplay.UI.HeroActionBar,Unity.BossRoom.Gameplay.UI.PartyHUD,Unity.BossRoom.Gameplay.UI.PopupManager,Unity.BossRoom.Gameplay.UI.UIHUDButton,Unity.BossRoom.Gameplay.UI.UIMessageFeed,Unity.BossRoom.Gameplay.UI.UIMessageSlot,Unity.BossRoom.Gameplay.UI.UIName,Unity.BossRoom.Gameplay.UI.UISettingsCanvas,Unity.BossRoom.Gameplay.UI.UIStateDisplay,Unit
y.BossRoom.Gameplay.UI.UIStateDisplayHandler,Unity.BossRoom.Gameplay.UI.UITooltipDetector,Unity.BossRoom.Gameplay.UI.UnityServicesUIHandler,Unity.BossRoom.Gameplay.UserInput.ClientInputSender,Unity.BossRoom.Infrastructure.NetworkObjectPool,Unity.BossRoom.Infrastructure.UpdateRunner,Unity.BossRoom.Navigation.NavigationSystem,Unity.BossRoom.Utils.Editor.NetworkLatencyWarning,Unity.BossRoom.Utils.Editor.NetworkOverlay,Unity.BossRoom.Utils.EnableOrDisableColliderOnAwake,Unity.BossRoom.Utils.NetworkNameState,Unity.BossRoom.Utils.NetworkStats,Unity.BossRoom.VisualEffects.RandomizedLight,Unity.BossRoom.VisualEffects.ScrollingMaterialUVs,Unity.BossRoom.VisualEffects.SpecialFXGraphic,Unity.Multiplayer.Samples.BossRoom.Client.ClientPickUpPotEffects,Unity.Multiplayer.Samples.Utilities.DontDestroyOnLoad,Unity.Multiplayer.Samples.Utilities.LoadingProgressManager,Unity.Multiplayer.Samples.Utilities.NetcodeHooks,Unity.Multiplayer.Samples.Utilities.NetStatsMonitorCustomization,Unity.Multiplayer.Samples.Utilities.NetworkedLoadingProgressTracker,Unity.Multiplayer.Samples.Utilities.SceneLoaderWrapper,Unity.Multiplayer.Samples.Utilities.ServerAdditiveSceneLoader,Unity.Multiplayer.Tools.NetStatsMonitor.RuntimeNetStatsMonitor,Unity.Netcode.Components.NetworkAnimator,Unity.Netcode.Components.NetworkTransform,Unity.Netcode.NetworkManager,Unity.Netcode.NetworkObject,Unity.Netcode.Transports.UTP.UnityTransport,UnityEngine.AI.NavMeshAgent,UnityEngine.AI.NavMeshModifier,UnityEngine.AI.NavMeshSurface,UnityEngine.Animations.PositionConstraint,UnityEngine.Animator,UnityEngine.AudioListener,UnityEngine.AudioSource,UnityEngine.BoxCollider,UnityEngine.Camera,UnityEngine.Canvas,UnityEngine.CanvasGroup,UnityEngine.CanvasRenderer,UnityEngine.CapsuleCollider,UnityEngine.EventSystems.EventSystem,UnityEngine.InputSystem.UI.InputSystemUIInputModule,UnityEngine.Light,UnityEngine.LightProbeGroup,UnityEngine.LODGroup,UnityEngine.MeshCollider,UnityEngine.MeshFilter,UnityEngine.MeshRenderer,UnityEngine.ParticleSystem,UnityEngine.ParticleSystemRenderer,UnityEngine.RectTransform,UnityEngine.ReflectionProbe,UnityEngine.Rendering.DebugUpdater,UnityEngine.Rendering.Universal.UniversalAdditionalCameraData,UnityEngine.Rendering.Universal.UniversalAdditionalLightData,UnityEngine.Rendering.Volume,UnityEngine.Rigidbody,UnityEngine.SkinnedMeshRenderer,UnityEngine.Transform,UnityEngine.UI.Button,UnityEngine.UI.CanvasScaler,UnityEngine.UI.ContentSizeFitter,UnityEngine.UI.GraphicRaycaster,UnityEngine.UI.GridLayoutGroup,UnityEngine.UI.Image,UnityEngine.UI.Mask,UnityEngine.UI.ScrollRect,UnityEngine.UI.Slider,UnityEngine.UI.Text,UnityEngine.UI.VerticalLayoutGroup,UnityEngine.UIElements.PanelEventHandler,UnityEngine.UIElements.PanelRaycaster,VContainer.Unity.LifetimeScope:Digit4,Digit7": {
      "{\n\"actionTypeName\":\"RegressionGames.ActionManager.Actions.InputSystemKeyAction, RegressionGames, Version=0.0.0.0, Culture=neutral, PublicKeyToken=null\",\n\"paths\":[\"Unity.BossRoom.Gameplay.UserInput.ClientInputSender/ClientInputSender.cs/Keyboard.current.digit2Key\"],\n\"objectTypeName\":\"Unity.BossRoom.Gameplay.UserInput.ClientInputSender, Unity.BossRoom.Gameplay, Version=0.0.0.0, Culture=neutral, PublicKeyToken=null\",\n\"parameterRange\":{\"type\":\"RANGE_BOOL\",\"minValue\":0,\"maxValue\":1},\n\"keyFunc\":{\"funcType\":\"TYPE_CONSTANT\",\"data\":\"Digit2\"}\n}False:PlayerAvatar0": 0.0413901322,
      "{\n\"actionTypeName\":\"RegressionGames.ActionManager.Actions.MousePositionAction, RegressionGames, Version=0.0.0.0, Culture=neutral, PublicKeyToken=null\",\n\"paths\":[\"Unity.BossRoom.Gameplay.UI.UITooltipDetector/UITooltipDetector.cs/Mouse.current.position\"],\n\"objectTypeName\":\"Unity.BossRoom.Gameplay.UI.UITooltipDetector, Unity.BossRoom.Gameplay, Version=0.0.0.0, Culture=neutral, PublicKeyToken=null\",\n\"parameterRange\":{\"type\":\"RANGE_VECTOR2\",\"minValueX\":0,\"minValueY\":0,\"maxValueX\":1,\"maxValueY\":1},\n\"positionType\":\"NON_UI\"\n}(0.75, 0.75):Button0": 0.00573281571,
      "{\n\"actionTypeName\":\"RegressionGames.ActionManager.Actions.MouseButtonAction, RegressionGames, Version=0.0.0.0, Culture=neutral, PublicKeyToken=null\",\n\"paths\":[\"Unity.BossRoom.Gameplay.UserInput.ClientInputSender/ClientInputSender.cs/Mouse.current.leftButton\"],\n\"objectTypeName\":\"Unity.BossRoom.Gameplay.UserInput.ClientInputSender, Unity.BossRoom.Gameplay, Version=0.0.0.0, Culture=neutral, PublicKeyToken=null\",\n\"parameterRange\":{\"type\":\"RANGE_BOOL\",\"minValue\":0,\"maxValue\":1},\n\"mouseButtonFunc\":{\"funcType\":\"TYPE_CONSTANT\",\"data\":\"0\"}\n}True:PlayerAvatar0": 0.004450281,
      "{\n\"actionTypeName\":\"RegressionGames.ActionManager.Actions.MousePositionAction, RegressionGames, Version=0.0.0.0, Culture=neutral, PublicKeyToken=null\",\n\"paths\":[\"Unity.BossRoom.Gameplay.UserInput.ClientInputSender/ClientInputSender.cs/Mouse.current.position/Unity.BossRoom.Gameplay.UserInput.ClientInputSender/ClientInputSender.cs/Physics.RaycastNonAlloc(ray, k_CachedHit, k_MouseInputRaycastDistance, m_ActionLayerMask)\"],\n\"objectTypeName\":\"Unity.BossRoom.Gameplay.UserInput.ClientInputSender, Unity.BossRoom.Gameplay, Version=0.0.0.0, Culture=neutral, PublicKeyToken=null\",\n\"parameterRange\":{\"type\":\"RANGE_VECTOR2\",\"minValueX\":0,\"minValueY\":0,\"maxValueX\":1,\"maxValueY\":1},\n\"positionType\":\"COLLIDER_3D\",\n\"layerMasks\":[\n{\"funcType\":\"TYPE_MEMBER_ACCESS\",\"data\":\"{\\\"MemberAccesses\\\":[{\\\"MemberType\\\":4,\\\"DeclaringType\\\":\\\"Unity.BossRoom.Gameplay.UserInput.ClientInputSender, Unity.BossRoom.Gameplay, Version=0.0.0.0, Culture=neutral, PublicKeyToken=null\\\",\\\"MemberName\\\":\\\"m_ActionLayerMask\\\"}]}\"}\n]\n}(0.25, 0.75):PlayerAvatar0": 0.00634076959,
      "{\n\"actionTypeName\":\"RegressionGames.ActionManager.Actions.MousePositionAction, RegressionGames, Version=0.0.0.0, Culture=neutral, PublicKeyToken=null\",\n\"paths\":[\"Unity.BossRoom.Gameplay.UserInput.ClientInputSender/ClientInputSender.cs/Mouse.current.position/Unity.BossRoom.Gameplay.UserInput.ClientInputSender/ClientInputSender.cs/Physics.RaycastNonAlloc(ray, k_CachedHit, k_MouseInputRaycastDistance, m_ActionLayerMask)\"],\n\"objectTypeName\":\"Unity.BossRoom.Gameplay.UserInput.ClientInputSender, Unity.BossRoom.Gameplay, Version=0.0.0.0, Culture=neutral, PublicKeyToken=null\",\n\"parameterRange\":{\"type\":\"RANGE_VECTOR2\",\"minValueX\":0,\"minValueY\":0,\"maxValueX\":1,\"maxValueY\":1},\n\"positionType\":\"COLLIDER_3D\",\n\"layerMasks\":[\n{\"funcType\":\"TYPE_MEMBER_ACCESS\",\"data\":\"{\\\"MemberAccesses\\\":[{\\\"MemberType\\\":4,\\\"DeclaringType\\\":\\\"Unity.BossRoom.Gameplay.UserInput.ClientInputSender, Unity.BossRoom.Gameplay, Version=0.0.0.0, Culture=neutral, PublicKeyToken=null\\\",\\\"MemberName\\\":\\\"m_ActionLayerMask\\\"}]}\"}\n]\n}(0.75, 0.75):PlayerAvatar0": 0.00193079631,
      "{\n\"actionTypeName\":\"RegressionGames.ActionManager.Actions.InputSystemKeyAction, RegressionGames, Version=0.0.0.0, Culture=neutral, PublicKeyToken=null\",\n\"paths\":[\"Unity.BossRoom.Gameplay.UserInput.ClientInputSender/ClientInputSender.cs/Keyboard.current.digit6Key\"],\n\"objectTypeName\":\"Unity.BossRoom.Gameplay.UserInput.ClientInputSender, Unity.BossRoom.Gameplay, Version=0.0.0.0, Culture=neutral, PublicKeyToken=null\",\n\"parameterRange\":{\"type\":\"RANGE_BOOL\",\"minValue\":0,\"maxValue\":1},\n\"keyFunc\":{\"funcType\":\"TYPE_CONSTANT\",\"data\":\"Digit6\"}\n}True:PlayerAvatar0": 0.000657624158
    }

The keys are verbose, but more importantly, our state space is just the existence of components, without the actual values associated with them. As it stands, this state representation isn't useful for learning how to explore.
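For contrast, a purely hypothetical sketch of a state key that folds in coarsely quantized component values; none of these helpers exist in the PR:

using System.Collections.Generic;
using UnityEngine;

public static class StateKeyIdeas
{
    // Hypothetical: include a quantized snapshot of gameplay-relevant values
    // (position cell, health bucket) so distinct situations map to distinct
    // states instead of collapsing to "the same component types exist".
    public static string BuildStateKey(
        IEnumerable<string> componentTypeNames, Vector3 playerPos, float healthFraction)
    {
        int cellX = Mathf.FloorToInt(playerPos.x / 2f); // 2-unit grid cells
        int cellZ = Mathf.FloorToInt(playerPos.z / 2f);
        int hpBucket = Mathf.Clamp((int)(healthFraction * 4f), 0, 3); // 4 health buckets
        return $"{string.Join(",", componentTypeNames)}|cell:{cellX},{cellZ}|hp:{hpBucket}";
    }
}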

batu (Contributor) commented Nov 20, 2024

Without access to any kind of memory, I am skeptical that our current rewards will converge to a model that explores generally. However, I need to do a bit more step-by-step hand calculation to see whether that is the case.

The intuition is that our training doesn't produce a general explorer, but rather an explorer that reacts to its past exploration policy: given two corridors, our agent doesn't learn to explore both; it learns to pick, at test time, the corridor that was explored least during training.
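For reference, the update being iterated here is standard tabular Q-learning (general background, not something introduced in this PR):

Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))

If the reward r is itself a function of visit counts accumulated during training (as with the exploration reward module reviewed below), the converged Q-values encode the training-time visitation pattern rather than a general exploration strategy, which is exactly the corridor failure mode described above.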

public class QAction : IEquatable<QAction>
{
public RGGameAction Action;
public object ParamValue;
batu (Contributor):

What is this ParamValue? Without the type it is hard to understand; can you add a comment?
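A sketch of the kind of comment that could answer this; the description is inferred from the RANGE_BOOL / RANGE_VECTOR2 entries in the state dump above, not taken from the code:

public class QAction : IEquatable<QAction>
{
    public RGGameAction Action;

    /// <summary>
    /// The concrete (discretized) parameter value chosen for this instance of
    /// the action, e.g. a bool for a key/button press or a Vector2 for a mouse
    /// position; the runtime type depends on Action.ParameterRange.
    /// </summary>
    public object ParamValue;
}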

/// The default implementation concatenates all the active component type names
/// and includes the current keyboard/mouse button state as well.
/// </summary>
protected virtual string GetCurrentState()
batu (Contributor):

Is my understanding correct that the current state (for the Q-table) is the names of all components plus action names?

This suggests to me that we are not including the values of those components (such as position, health, etc.), just the existence of Transform/Health?

batu (Contributor):

Expanded on this below

}
} else if (act.ParameterRange is RGContinuousValueRange contRange)
{
var ranges = contRange.Discretize(4);
batu (Contributor):

This should be made configurable so we can tweak it; the way we discretize actions will affect things drastically.

I would also go for a default value that is odd (5?). A common action range is [-1, 1], and that would get discretized to [[-1.0 to -0.5], [-0.5 to 0.0], [0.0 to 0.5], [0.5 to 1.0]]. Depending on how we do the comparison, 0 and (-)0.5 will fall into the same bucket, which they shouldn't. (See the sketch below.)
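To make the bucket-boundary concern concrete, a minimal sketch (assuming only that Discretize splits a continuous range into equal-width buckets):

using UnityEngine;

public static class DiscretizeDemo
{
    // An odd bucket count keeps 0 off the bucket edges for symmetric ranges.
    public static void PrintBuckets(float min, float max, int n)
    {
        float width = (max - min) / n;
        for (int i = 0; i < n; i++)
        {
            Debug.Log($"bucket {i}: [{min + i * width}, {min + (i + 1) * width}]");
        }
        // n = 4 over [-1, 1]: edges at -1, -0.5, 0, 0.5, 1 (0 sits on a boundary)
        // n = 5 over [-1, 1]: middle bucket [-0.2, 0.2] straddles 0
    }
}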

actionSpace.Add(new QAction(act, param));
}
}
return actionSpace;
batu (Contributor):

It would be useful to log the size of the action space, as this is another metric that affects the convergence of the algorithm.
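A sketch of what that logging could look like just before the existing return (Debug.Log as a stand-in for whatever logger the SDK prefers):

// Hypothetical: surface the action space size once it is built; a very
// large discrete action space is a warning sign for tabular Q-learning.
Debug.Log($"QLearning action space size: {actionSpace.Count}");
return actionSpace;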

private string _lastActionKey;
private List<RGActionInput> _lastInputs = new();
private float? _lastActionTime;
private float _epsilon;
batu (Contributor):

How do we use the epsilon value? Is this for epsilon-greedy decay?
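For context, epsilon-greedy selection with decay typically looks like the following; this is a generic sketch with hypothetical names, not the PR's implementation:

using System;
using System.Collections.Generic;
using System.Linq;

public static class EpsilonGreedy
{
    // qRow maps action key -> Q-value for the current state. With probability
    // epsilon pick a uniformly random action (explore), otherwise the argmax
    // action (exploit); then decay epsilon toward a floor.
    public static string SelectAction(
        Dictionary<string, float> qRow, Random rng, ref float epsilon,
        float epsilonMin = 0.05f, float epsilonDecay = 0.999f)
    {
        string choice = rng.NextDouble() < epsilon
            ? qRow.Keys.ElementAt(rng.Next(qRow.Count))
            : qRow.OrderByDescending(kv => kv.Value).First().Key;
        epsilon = Math.Max(epsilonMin, epsilon * epsilonDecay);
        return choice;
    }
}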

}

// Update Q-table with experience
foreach (var exp in _experienceBuf)
batu (Contributor):

I saw that we default to 64 steps for the size of the experience buffer; that is pretty small. However, I recognize that increasing the size can have performance implications. To combat that, we could update our Q-table not every frame but once every n frames.

Currently, with the defaults of ActionInterval = 0.05f and a buffer size of 64, we are effectively limiting ourselves to updating from 3.2 seconds' worth of data. This might make convergence really difficult. (Will circle back to this idea after reading the rewards code.)
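A self-contained sketch of the "update once every n frames" idea; every type and member name here is hypothetical, and the buffered transitions are assumed to carry (state, action, reward, next state):

using System.Collections.Generic;
using System.Linq;

class Experience
{
    public string State, ActionKey, NextState;
    public float Reward;
}

class ThrottledQUpdater
{
    public float Alpha = 0.1f;      // hypothetical learning rate
    public float Gamma = 0.99f;     // hypothetical discount factor
    public int UpdateInterval = 16; // decision steps between Q-table flushes

    private int _stepsSinceUpdate;
    private readonly List<Experience> _experienceBuf = new();
    private readonly Dictionary<string, Dictionary<string, float>> _qTable = new();

    // Called once per decision step; flushes the buffer only every
    // UpdateInterval steps so a larger buffer stays cheap per frame.
    public void MaybeUpdate()
    {
        if (++_stepsSinceUpdate < UpdateInterval) return;
        _stepsSinceUpdate = 0;
        foreach (var exp in _experienceBuf)
        {
            // assumes the state/action rows were seeded when first visited
            float best = _qTable.TryGetValue(exp.NextState, out var row) && row.Count > 0
                ? row.Values.Max()
                : 0f;
            var stateRow = _qTable[exp.State];
            float old = stateRow[exp.ActionKey];
            stateRow[exp.ActionKey] = old + Alpha * (exp.Reward + Gamma * best - old);
        }
        _experienceBuf.Clear();
    }
}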

batu (Contributor):

Expanded on this below

@@ -5,46 +5,37 @@ namespace RegressionGames.GenericBots.Experimental.Rewards
{
/// <summary>
/// Generic exploration reward module.
batu (Contributor):

I think this is a good start; we just need to be mindful about genre/reward-style matching in the future. For example, with a third-person camera, spinning the camera around while moving only a little will generate a good amount of reward. But yeah, this is why we will try different rewards.

{
numVisits = 0;
return 0.0f; // no main camera, don't use this reward
batu (Contributor):

I wouldn't expect this to return 0 silently. If there is no camera and I am using a camera-based reward, I should be notified somehow; otherwise I will just get a random agent without any notice.
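A minimal sketch of the fail-loudly behavior being asked for, assuming a hypothetical _warnedNoCamera field so the warning fires once rather than every step:

if (Camera.main == null)
{
    if (!_warnedNoCamera)
    {
        _warnedNoCamera = true;
        Debug.LogWarning("CameraPositionRewardModule: no main camera found; " +
                         "this reward is always 0 and the agent will act randomly.");
    }
    numVisits = 0;
    return 0.0f; // no main camera, don't use this reward
}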

public IRewardModule RewardModule = new CameraPositionRewardModule();

private List<QAction> _actionSpace = new();
private Dictionary<string, Dictionary<string, float>> _qTable = new();
batu (Contributor):

It would be good to document what the keys/values are.
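Based on the state/action dump earlier in the thread, documentation along these lines would fit (the wording is an inference, not the author's):

/// <summary>
/// Q-table mapping state key -> (action key -> Q-value). The state key is the
/// concatenated component-type/input-state string produced by GetCurrentState();
/// the action key is the serialized QAction (action definition plus the chosen
/// parameter value).
/// </summary>
private Dictionary<string, Dictionary<string, float>> _qTable = new();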

batu (Contributor) commented Nov 20, 2024

Oh, and this is for the future, but if we end up productionizing this, I think a similar "save the latest bot in the project" approach would be smart (over saving it in AppData).

batu (Contributor) commented Nov 20, 2024

If this is something we are going to be pushing forward, we CANNOT make meaningful progress on bossroom. We would need dedicated toy test environments that are incredibly simple but validate the approach.

I now wear a wig because I used RL in my PhD; pulled all my hair out training that stuff.

nAmKcAz (Collaborator, Author) commented Nov 20, 2024

@batu These are reasonable questions, but not really about this PR. This PR takes the pre-existing QLearning code and makes it work as a bot segment; that's it, with no new thoughts or ideas about how we implement QLearning. For deeper analysis/review of the original QLearning bot code, please refer back to Sasha's PR #245.

batu (Contributor) commented Nov 20, 2024

@RG-nAmKcAz, yes, for sure. That's why I approved it, as the relevant parts look solid.

However, if we are going to be investing more energy into this approach, we need to answer these questions. I can move them to some other relevant page.

nAmKcAz (Collaborator, Author) commented Nov 21, 2024

Merging this as-is to get a starting framework for learning/reward-model segments in place. Iteration on methods/models/etc. will continue in future tasks.

nAmKcAz merged commit e0ef3fe into main on Nov 21, 2024 (2 checks passed)
nAmKcAz deleted the zack/q-learning-segment branch, November 21, 2024 13:11