[REG-2183] Initial QLearning Bot Segment #356
Conversation
@vontell This 'might' be ready for review as an 'experiment', but it is certainly not ready for release. How do you want to handle review/discussion on this? I don't really want to keep it in a side branch, as the SDK moves so fast that keeping that up to date and working would be tedious and probably wouldn't happen.
I would love to integrate the eval pipeline into this, so we can see which reward approach works better. My vote would be to review/merge this soon. Would it be too nonsensical to put it under…
@batu, currently this is pretty much useless for training and evaluation, since resetting the game for a whole bot sequence isn't really a thing. I could allow resetting similar to how qlearningbot.cs does it, where it force-reloads the starting scene for each episode, but this doesn't work well for many games (including BossRoom and all customer games so far).
As for 'experimental'... to me our entire SDK is experimental, so I don't really fancy package/project names that try to hide things. I'm not going to die on this hill either, though. To me they are all hidden unless we document them (as noted by our customers not being able to understand the product).
Excited to get this in and try it out myself at some point. @RG-nAmKcAz do you have any preference on what to tackle next here? Also, I see the Boss Room sample, but it would be good to have a quick message, instruction, or video on how to start this in Boss Room (even though the training process is bad, as you mention). I just personally want to try it! I am guessing I can run that sequence in the other PR to use a trained model, but I wasn't sure how to train one myself.
public Dictionary<RewardType, int> rewardTypeRatios;

// training options
// TODO: Should we add an option that causes a 'game restart' AND restarts the sequence from the beginning when 'learning = true' ?
I'd be curious to see what the Unity ML-Agents tutorials do for this. I suspect that they always use special environments, but they might have some tricks. From what I can see, it looks like they train using a build of the game rather than the editor... maybe their Python harness just repeatedly starts and quits that executable?
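For what it's worth, a driver of that shape is easy to sketch. This is a hypothetical harness loop (not ML-Agents' actual implementation): launch the game build once per episode and treat process exit as the episode boundary. The function name and arguments are made up for illustration.

```python
import subprocess

def run_episodes(launch_args, num_episodes, timeout_s=300):
    """Hypothetical training driver: launch the game build once per episode,
    treating process exit as the end of that episode."""
    completed = 0
    for _ in range(num_episodes):
        # check=True surfaces a crashed episode; timeout kills hung ones.
        subprocess.run(launch_args, timeout=timeout_s, check=True)
        completed += 1
    return completed
```

The actual model/replay-buffer plumbing would have to live in shared files or a socket, which is roughly what ML-Agents' trainer process does.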
Yep, they 'assume' you're starting a runtime that IS the whole episode. That's why I say this thing is done but not 'ready': training is very tedious currently. You have to start up the game and get it to the right spot, then run your training segment episode, then manually reset the environment and do it again, until you have your model 'trained'. Then you can use the 'real' segment pointing to the same model file, but with training disabled.
@vontell .. you will DEFINITELY need to train a model yourself. I got pissed after about an hour of babysitting the restarts for training and seeing no meaningful progress in the results, hence my comment that we need a better way to 'train' using sequences.
That could be a big selling point. This is just an assumption, but the fact that you need to train an agent in a special environment rather than the real game (due to the issues you ran into) is probably a big blocker. If I am a game developer, it's bad enough that I need to learn how to use the ML tool; now I also need to go and make a whole new scene to train in that isn't even my full game? Using sequences to get agents into the game could be a cool selling point.
That's actually what I'm prototyping today, per my Slack message. I'm adding an option to the restart segment that allows restarting the same sequence after the restart. So I should be able to make a sequence for BossRoom that goes through the menus into the game, runs the Q-learning for X time, restarts the game, and restarts the same sequence, effectively looping until the user stops it.
@vontell see the new updates, functionally here and practically in https://github.com/Regression-Games/RGBossRoom/pull/80
The big change is that restart segments can now tell the game to restart that same sequence from the beginning after the restart.
{
  "name": "Restart Game and Restart the Sequence after",
  "keyFrameCriteria": [
    {"type": "ActionComplete", "transient": true, "data": {}}
  ],
  "botAction": {
    "type": "RestartGame",
    "data": {
      "restartSequenceAfterRestart": true
    }
  }
}
Also, my sincerest apologies for not seeing your comment from a few days ago that tagged me; it looks like my GitHub notifications got cleared. Yes, getting it in sooner rather than later is preferred. Happy to treat it as experimental and not publicly documented while we play around with it and decide on next steps.
I am approving this PR as the parts that are segment side seem good. However, after looking through the details I am confident that we will not be able to train a model to do useful work with this specific State/Action representation Here is an example state/action entry. "Cinemachine.CinemachineBrain,Cinemachine.CinemachineComposer,Cinemachine.CinemachineFreeLook,Cinemachine.CinemachineOrbitalTransposer,Cinemachine.CinemachinePipeline,Cinemachine.CinemachineTargetGroup,Cinemachine.CinemachineVirtualCamera,EditorChildSceneLoader,Gameplay.RegressionGames.RGBossRoom.RGAttackObjectAction,RegressionGames.RGIconPulse,RegressionGames.RGTextPulse,RegressionGames.StateRecorder.BotSegments.BotSegmentsPlaybackController,RegressionGames.StateRecorder.KeyboardInputActionObserver,RegressionGames.StateRecorder.LoggingObserver,RegressionGames.StateRecorder.MouseInputActionObserver,RegressionGames.StateRecorder.ProfilerObserver,RegressionGames.StateRecorder.ReplayToolbarManager,RegressionGames.StateRecorder.ScreenRecorder,RegressionGames.StateRecorder.ScreenshotCapture,RegressionGames.StateRecorder.TransformObjectFinder,RegressionGames.StateRecorder.Types.RGExcludeFromState,RegressionGames.StateRecorder.VirtualMouseCursor,RGBotManager,RGBreakableObjectState,RGEnemyState,RGFollowObjectAction,RGPlayerState,RGSequenceManager,TMPro.TextMeshProUGUI,Unity.BossRoom.ApplicationLifecycle.ApplicationController,Unity.BossRoom.Audio.AudioMixerConfigurator,Unity.BossRoom.Audio.ClientMusicPlayer,Unity.BossRoom.CameraUtils.CameraController,Unity.BossRoom.ConnectionManagement.ConnectionManager,Unity.BossRoom.DebugCheats.DebugCheatsManager,Unity.BossRoom.Gameplay.GameplayObjects.AnimationCallbacks.AnimatorFootstepSounds,Unity.BossRoom.Gameplay.GameplayObjects.AnimationCallbacks.AnimatorTriggeredSpecialFX,Unity.BossRoom.Gameplay.GameplayObjects.Breakable,Unity.BossRoom.Gameplay.GameplayObjects.Character.CharacterSwap,Unity.BossRoom.Gameplay.GameplayObjects.Character.ClientAvatarGuidHandler,Unity.Bos
sRoom.Gameplay.GameplayObjects.Character.ClientCharacter,Unity.BossRoom.Gameplay.GameplayObjects.Character.ClientPlayerAvatar,Unity.BossRoom.Gameplay.GameplayObjects.Character.NetworkAvatarGuidState,Unity.BossRoom.Gameplay.GameplayObjects.Character.PhysicsWrapper,Unity.BossRoom.Gameplay.GameplayObjects.Character.PlayerServerCharacter,Unity.BossRoom.Gameplay.GameplayObjects.Character.ServerAnimationHandler,Unity.BossRoom.Gameplay.GameplayObjects.Character.ServerCharacter,Unity.BossRoom.Gameplay.GameplayObjects.Character.ServerCharacterMovement,Unity.BossRoom.Gameplay.GameplayObjects.DamageReceiver,Unity.BossRoom.Gameplay.GameplayObjects.EnemyPortal,Unity.BossRoom.Gameplay.GameplayObjects.GameDataSource,Unity.BossRoom.Gameplay.GameplayObjects.NetworkHealthState,Unity.BossRoom.Gameplay.GameplayObjects.NetworkLifeState,Unity.BossRoom.Gameplay.GameplayObjects.PersistentPlayer,Unity.BossRoom.Gameplay.GameplayObjects.PickUpState,Unity.BossRoom.Gameplay.GameplayObjects.PublishMessageOnLifeChange,Unity.BossRoom.Gameplay.GameplayObjects.ServerDisplacerOnParentChange,Unity.BossRoom.Gameplay.GameplayObjects.ServerWaveSpawner,Unity.BossRoom.Gameplay.GameState.ServerBossRoomState,Unity.BossRoom.Gameplay.RegressionGames.RGBossRoom.RGPerformSkillAction,Unity.BossRoom.Gameplay.UI.ClientBossRoomLoadingScreen,Unity.BossRoom.Gameplay.UI.ClientClickFeedback,Unity.BossRoom.Gameplay.UI.ConnectionAnimation,Unity.BossRoom.Gameplay.UI.ConnectionStatusMessageUIManager,Unity.BossRoom.Gameplay.UI.HeroActionBar,Unity.BossRoom.Gameplay.UI.PartyHUD,Unity.BossRoom.Gameplay.UI.PopupManager,Unity.BossRoom.Gameplay.UI.UIHUDButton,Unity.BossRoom.Gameplay.UI.UIMessageFeed,Unity.BossRoom.Gameplay.UI.UIMessageSlot,Unity.BossRoom.Gameplay.UI.UIName,Unity.BossRoom.Gameplay.UI.UISettingsCanvas,Unity.BossRoom.Gameplay.UI.UIStateDisplay,Unity.BossRoom.Gameplay.UI.UIStateDisplayHandler,Unity.BossRoom.Gameplay.UI.UITooltipDetector,Unity.BossRoom.Gameplay.UI.UnityServicesUIHandler,Unity.BossRoom.Gameplay.UserInpu
t.ClientInputSender,Unity.BossRoom.Infrastructure.NetworkObjectPool,Unity.BossRoom.Infrastructure.UpdateRunner,Unity.BossRoom.Navigation.NavigationSystem,Unity.BossRoom.Utils.Editor.NetworkLatencyWarning,Unity.BossRoom.Utils.Editor.NetworkOverlay,Unity.BossRoom.Utils.EnableOrDisableColliderOnAwake,Unity.BossRoom.Utils.NetworkNameState,Unity.BossRoom.Utils.NetworkStats,Unity.BossRoom.VisualEffects.RandomizedLight,Unity.BossRoom.VisualEffects.ScrollingMaterialUVs,Unity.BossRoom.VisualEffects.SpecialFXGraphic,Unity.Multiplayer.Samples.BossRoom.Client.ClientPickUpPotEffects,Unity.Multiplayer.Samples.Utilities.DontDestroyOnLoad,Unity.Multiplayer.Samples.Utilities.LoadingProgressManager,Unity.Multiplayer.Samples.Utilities.NetcodeHooks,Unity.Multiplayer.Samples.Utilities.NetStatsMonitorCustomization,Unity.Multiplayer.Samples.Utilities.NetworkedLoadingProgressTracker,Unity.Multiplayer.Samples.Utilities.SceneLoaderWrapper,Unity.Multiplayer.Samples.Utilities.ServerAdditiveSceneLoader,Unity.Multiplayer.Tools.NetStatsMonitor.RuntimeNetStatsMonitor,Unity.Netcode.Components.NetworkAnimator,Unity.Netcode.Components.NetworkTransform,Unity.Netcode.NetworkManager,Unity.Netcode.NetworkObject,Unity.Netcode.Transports.UTP.UnityTransport,UnityEngine.AI.NavMeshAgent,UnityEngine.AI.NavMeshModifier,UnityEngine.AI.NavMeshSurface,UnityEngine.Animations.PositionConstraint,UnityEngine.Animator,UnityEngine.AudioListener,UnityEngine.AudioSource,UnityEngine.BoxCollider,UnityEngine.Camera,UnityEngine.Canvas,UnityEngine.CanvasGroup,UnityEngine.CanvasRenderer,UnityEngine.CapsuleCollider,UnityEngine.EventSystems.EventSystem,UnityEngine.InputSystem.UI.InputSystemUIInputModule,UnityEngine.Light,UnityEngine.LightProbeGroup,UnityEngine.LODGroup,UnityEngine.MeshCollider,UnityEngine.MeshFilter,UnityEngine.MeshRenderer,UnityEngine.ParticleSystem,UnityEngine.ParticleSystemRenderer,UnityEngine.RectTransform,UnityEngine.ReflectionProbe,UnityEngine.Rendering.DebugUpdater,UnityEngine.Rendering.Universal.Universal
AdditionalCameraData,UnityEngine.Rendering.Universal.UniversalAdditionalLightData,UnityEngine.Rendering.Volume,UnityEngine.Rigidbody,UnityEngine.SkinnedMeshRenderer,UnityEngine.Transform,UnityEngine.UI.Button,UnityEngine.UI.CanvasScaler,UnityEngine.UI.ContentSizeFitter,UnityEngine.UI.GraphicRaycaster,UnityEngine.UI.GridLayoutGroup,UnityEngine.UI.Image,UnityEngine.UI.Mask,UnityEngine.UI.ScrollRect,UnityEngine.UI.Slider,UnityEngine.UI.Text,UnityEngine.UI.VerticalLayoutGroup,UnityEngine.UIElements.PanelEventHandler,UnityEngine.UIElements.PanelRaycaster,VContainer.Unity.LifetimeScope:Digit4,Digit7": {
"{\n\"actionTypeName\":\"RegressionGames.ActionManager.Actions.InputSystemKeyAction, RegressionGames, Version=0.0.0.0, Culture=neutral, PublicKeyToken=null\",\n\"paths\":[\"Unity.BossRoom.Gameplay.UserInput.ClientInputSender/ClientInputSender.cs/Keyboard.current.digit2Key\"],\n\"objectTypeName\":\"Unity.BossRoom.Gameplay.UserInput.ClientInputSender, Unity.BossRoom.Gameplay, Version=0.0.0.0, Culture=neutral, PublicKeyToken=null\",\n\"parameterRange\":{\"type\":\"RANGE_BOOL\",\"minValue\":0,\"maxValue\":1},\n\"keyFunc\":{\"funcType\":\"TYPE_CONSTANT\",\"data\":\"Digit2\"}\n}False:PlayerAvatar0": 0.0413901322,
"{\n\"actionTypeName\":\"RegressionGames.ActionManager.Actions.MousePositionAction, RegressionGames, Version=0.0.0.0, Culture=neutral, PublicKeyToken=null\",\n\"paths\":[\"Unity.BossRoom.Gameplay.UI.UITooltipDetector/UITooltipDetector.cs/Mouse.current.position\"],\n\"objectTypeName\":\"Unity.BossRoom.Gameplay.UI.UITooltipDetector, Unity.BossRoom.Gameplay, Version=0.0.0.0, Culture=neutral, PublicKeyToken=null\",\n\"parameterRange\":{\"type\":\"RANGE_VECTOR2\",\"minValueX\":0,\"minValueY\":0,\"maxValueX\":1,\"maxValueY\":1},\n\"positionType\":\"NON_UI\"\n}(0.75, 0.75):Button0": 0.00573281571,
"{\n\"actionTypeName\":\"RegressionGames.ActionManager.Actions.MouseButtonAction, RegressionGames, Version=0.0.0.0, Culture=neutral, PublicKeyToken=null\",\n\"paths\":[\"Unity.BossRoom.Gameplay.UserInput.ClientInputSender/ClientInputSender.cs/Mouse.current.leftButton\"],\n\"objectTypeName\":\"Unity.BossRoom.Gameplay.UserInput.ClientInputSender, Unity.BossRoom.Gameplay, Version=0.0.0.0, Culture=neutral, PublicKeyToken=null\",\n\"parameterRange\":{\"type\":\"RANGE_BOOL\",\"minValue\":0,\"maxValue\":1},\n\"mouseButtonFunc\":{\"funcType\":\"TYPE_CONSTANT\",\"data\":\"0\"}\n}True:PlayerAvatar0": 0.004450281,
"{\n\"actionTypeName\":\"RegressionGames.ActionManager.Actions.MousePositionAction, RegressionGames, Version=0.0.0.0, Culture=neutral, PublicKeyToken=null\",\n\"paths\":[\"Unity.BossRoom.Gameplay.UserInput.ClientInputSender/ClientInputSender.cs/Mouse.current.position/Unity.BossRoom.Gameplay.UserInput.ClientInputSender/ClientInputSender.cs/Physics.RaycastNonAlloc(ray, k_CachedHit, k_MouseInputRaycastDistance, m_ActionLayerMask)\"],\n\"objectTypeName\":\"Unity.BossRoom.Gameplay.UserInput.ClientInputSender, Unity.BossRoom.Gameplay, Version=0.0.0.0, Culture=neutral, PublicKeyToken=null\",\n\"parameterRange\":{\"type\":\"RANGE_VECTOR2\",\"minValueX\":0,\"minValueY\":0,\"maxValueX\":1,\"maxValueY\":1},\n\"positionType\":\"COLLIDER_3D\",\n\"layerMasks\":[\n{\"funcType\":\"TYPE_MEMBER_ACCESS\",\"data\":\"{\\\"MemberAccesses\\\":[{\\\"MemberType\\\":4,\\\"DeclaringType\\\":\\\"Unity.BossRoom.Gameplay.UserInput.ClientInputSender, Unity.BossRoom.Gameplay, Version=0.0.0.0, Culture=neutral, PublicKeyToken=null\\\",\\\"MemberName\\\":\\\"m_ActionLayerMask\\\"}]}\"}\n]\n}(0.25, 0.75):PlayerAvatar0": 0.00634076959,
"{\n\"actionTypeName\":\"RegressionGames.ActionManager.Actions.MousePositionAction, RegressionGames, Version=0.0.0.0, Culture=neutral, PublicKeyToken=null\",\n\"paths\":[\"Unity.BossRoom.Gameplay.UserInput.ClientInputSender/ClientInputSender.cs/Mouse.current.position/Unity.BossRoom.Gameplay.UserInput.ClientInputSender/ClientInputSender.cs/Physics.RaycastNonAlloc(ray, k_CachedHit, k_MouseInputRaycastDistance, m_ActionLayerMask)\"],\n\"objectTypeName\":\"Unity.BossRoom.Gameplay.UserInput.ClientInputSender, Unity.BossRoom.Gameplay, Version=0.0.0.0, Culture=neutral, PublicKeyToken=null\",\n\"parameterRange\":{\"type\":\"RANGE_VECTOR2\",\"minValueX\":0,\"minValueY\":0,\"maxValueX\":1,\"maxValueY\":1},\n\"positionType\":\"COLLIDER_3D\",\n\"layerMasks\":[\n{\"funcType\":\"TYPE_MEMBER_ACCESS\",\"data\":\"{\\\"MemberAccesses\\\":[{\\\"MemberType\\\":4,\\\"DeclaringType\\\":\\\"Unity.BossRoom.Gameplay.UserInput.ClientInputSender, Unity.BossRoom.Gameplay, Version=0.0.0.0, Culture=neutral, PublicKeyToken=null\\\",\\\"MemberName\\\":\\\"m_ActionLayerMask\\\"}]}\"}\n]\n}(0.75, 0.75):PlayerAvatar0": 0.00193079631,
"{\n\"actionTypeName\":\"RegressionGames.ActionManager.Actions.InputSystemKeyAction, RegressionGames, Version=0.0.0.0, Culture=neutral, PublicKeyToken=null\",\n\"paths\":[\"Unity.BossRoom.Gameplay.UserInput.ClientInputSender/ClientInputSender.cs/Keyboard.current.digit6Key\"],\n\"objectTypeName\":\"Unity.BossRoom.Gameplay.UserInput.ClientInputSender, Unity.BossRoom.Gameplay, Version=0.0.0.0, Culture=neutral, PublicKeyToken=null\",\n\"parameterRange\":{\"type\":\"RANGE_BOOL\",\"minValue\":0,\"maxValue\":1},\n\"keyFunc\":{\"funcType\":\"TYPE_CONSTANT\",\"data\":\"Digit6\"}\n}True:PlayerAvatar0": 0.000657624158
}

The keys are verbose, but more importantly, our state space is just the existence of components, without the actual values associated with them. This state representation isn't useful for learning how to explore as it stands.
Without access to any type of memory, I am skeptical our current rewards will converge to a model that explores generally. However, I need to do a bit more step-by-step hand calculation to see whether that is the case. The intuition is that our training doesn't result in a general explorer, but rather an explorer that reacts to its past exploration policy: given two corridors, our agent doesn't learn to explore both, but learns to pick, at test time, the corridor that was explored least during training.
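For anyone following along with that hand calculation, the standard tabular update is small enough to sketch here. This is a generic Python illustration of textbook Q-learning, not the SDK's actual C# code:

```python
def q_update(q_table, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """One tabular Q-learning step:
    Q(s,a) <- (1-alpha)*Q(s,a) + alpha*(r + gamma * max_a' Q(s',a'))."""
    best_next = max(q_table.get(next_state, {}).values(), default=0.0)
    row = q_table.setdefault(state, {})
    row[action] = (1 - alpha) * row.get(action, 0.0) + alpha * (reward + gamma * best_next)
```

Because the learned values depend entirely on which (state, action) pairs were visited during training, a visit-count style reward does bake the training-time exploration history into the policy, which is exactly the concern above.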
public class QAction : IEquatable<QAction>
{
    public RGGameAction Action;
    public object ParamValue;
What is this ParamValue? Without the type it is hard to understand; can you add a comment?
/// The default implementation concatenates all the active component type names
/// and includes the current keyboard/mouse button state as well.
/// </summary>
protected virtual string GetCurrentState()
Is my understanding correct that the current state (for the Q-table) is the names of all components plus action names?
This suggests to me that we are not including the values of those components, such as position, health, etc., but just the existence of Transform/Health?
Expanded on this below
    }
} else if (act.ParameterRange is RGContinuousValueRange contRange)
{
    var ranges = contRange.Discretize(4);
This should be made configurable so we can tweak it; the way we discretize actions will affect things drastically.
I would also go for a default value that is odd (5?). A common action range is [-1, 1], and with 4 buckets that gets discretized to [[-1.0 to -0.5], [-0.5 to 0.0], [0.0 to 0.5], [0.5 to 1.0]]. The values 0 and ±0.5 land exactly on bucket edges, so depending on how we do the comparison they can fall into the same bucket as their neighbors, which they shouldn't.
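To make the edge issue concrete, here is a minimal equal-width discretization sketch; `discretize` below is a hypothetical stand-in for `RGContinuousValueRange.Discretize`, not its actual implementation:

```python
def discretize(lo, hi, n):
    """Split [lo, hi] into n equal-width (min, max) buckets."""
    step = (hi - lo) / n
    return [(lo + i * step, lo + (i + 1) * step) for i in range(n)]

# n=4 on [-1, 1] puts bucket edges at -0.5, 0.0, and 0.5, so the neutral
# value 0 sits exactly on a boundary:
#   [(-1.0, -0.5), (-0.5, 0.0), (0.0, 0.5), (0.5, 1.0)]
# An odd n (e.g. 5) instead yields a middle bucket centered on 0: (-0.2, 0.2).
```

With an odd bucket count, "do nothing" maps unambiguously to the center bucket rather than depending on whether the comparison is `<` or `<=`.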
        actionSpace.Add(new QAction(act, param));
    }
}
return actionSpace;
It would be useful to log the size of the action space, as this is another metric that affects the convergence of the algorithm.
private string _lastActionKey;
private List<RGActionInput> _lastInputs = new();
private float? _lastActionTime;
private float _epsilon;
How do we use the epsilon value? Is this for epsilon-greedy decay?
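If it is, the usual pattern looks like the following. This is a generic sketch assuming a multiplicatively decayed epsilon-greedy policy; the SDK may handle it differently:

```python
import random

def choose_action(q_row, actions, epsilon, rng=random):
    """Epsilon-greedy: with probability epsilon take a random action,
    otherwise take the best-known action for this state."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: q_row.get(a, 0.0))

def decay_epsilon(epsilon, decay=0.995, floor=0.05):
    """Multiplicative decay per episode, floored so exploration never fully stops."""
    return max(floor, epsilon * decay)
```

The floor matters for this use case: with a purely decaying epsilon, a long-running exploration bot would eventually stop trying new actions at all.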
}

// Update Q-table with experience
foreach (var exp in _experienceBuf)
I saw that we default to 64 steps for the size of the experience buffer; that is pretty small. However, I recognize that increasing the size can have performance implications. To combat that, we could update our Q-table not every frame but once every n frames.
Currently, with the defaults of ActionInterval = 0.05f and a buffer size of 64, we are effectively limiting ourselves to updating from 3.2 seconds' worth of data. This might make convergence really difficult. (Will circle back to this idea after reading the rewards code.)
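The 3.2 s figure is just buffer_size × action_interval; a tiny helper (hypothetical names) makes it easy to sanity-check alternative settings:

```python
def buffer_window_seconds(buffer_size, action_interval_s):
    """Seconds of gameplay a full experience buffer covers."""
    return buffer_size * action_interval_s

# Current defaults: 64 steps at 0.05 s per action -> 3.2 s of experience.
# A 1024-step buffer at the same interval would cover 51.2 s per update.
```

Whatever values get chosen, logging this window alongside the action-space size would make it much easier to reason about why a run did or didn't converge.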
Expanded on this below
namespace RegressionGames.GenericBots.Experimental.Rewards
{
    /// <summary>
    /// Generic exploration reward module.
I think this is a good start; we just need to be mindful about genre/reward-style matching in the future. For example, in a third-person camera perspective, spinning the camera around and moving a little bit will generate a good amount of reward. But yeah, this is why we will try different rewards.
{
    numVisits = 0;
    return 0.0f; // no main camera, don't use this reward
I wouldn't expect this to return 0 silently. If there is no camera and I am using a camera-based reward, I should be notified somehow; otherwise I will just get a random agent without any other notice.
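One low-effort option: keep the 0 return so the bot doesn't crash, but warn loudly (once) when the camera is missing. This is a hypothetical sketch in Python; the real module is C#, and the class/method names here are illustrative only:

```python
import logging

class CameraPositionReward:
    """Sketch of a camera-based reward module that warns instead of failing silently."""

    def __init__(self):
        self._warned_no_camera = False

    def evaluate(self, camera_position):
        if camera_position is None:
            if not self._warned_no_camera:
                logging.warning(
                    "CameraPositionReward: no main camera found; reward is "
                    "always 0, so the agent will behave randomly.")
                self._warned_no_camera = True
            return 0.0
        # ...the normal visit-count based reward would go here;
        # a fixed value stands in for it in this sketch...
        return 1.0
```

The warn-once flag avoids spamming the log every action tick while still surfacing the misconfiguration.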
public IRewardModule RewardModule = new CameraPositionRewardModule();

private List<QAction> _actionSpace = new();
private Dictionary<string, Dictionary<string, float>> _qTable = new();
It would be good to document what the keys/values are.
Oh, and this is for the future, but if we end up productionizing this, I think a similar "save the latest bot in the project" approach would be smart (over saving it in AppData).
If this is something we are going to be pushing forward, we CANNOT make meaningful progress on BossRoom. We would need dedicated toy test environments that are incredibly simple but validate the approach. I now wear a wig because I used RL in my PhD: pulled all my hair out to train that stuff.
@batu These are reasonable questions, but not really about this PR. This PR takes the pre-existing Q-learning code and makes it work as a bot segment; that's it, no new thoughts or ideas about how we implement Q-learning. For deeper analysis/review of the original Q-learning bot code, please refer back to this PR from Sasha: #245
@RG-nAmKcAz, yes, for sure. That's why I approved it, as the relevant parts look solid. However, if we are going to be investing more energy into this approach, we need to answer these questions. I can move them to some other relevant page.
Merging this as-is to get a starting framework for learning/reward-model segments in place. Iteration on which methods/models/etc. to use will continue in future tasks.
Updates Q Learning logic to allow it to be used as a bot segment
We can merge this, but I wouldn't publicize it yet. We have some research to do here on how best to utilize/train this in the context of bot sequences. Training is currently 'ick', as we don't have a systematic way to restart the whole game and resume the same sequence for training.
See also https://github.com/Regression-Games/RGBossRoom/pull/80