
[REG-2183] Initial QLearning Bot Segment #356

Merged: 3 commits into main, Nov 21, 2024

Conversation

nAmKcAz (Collaborator) commented Nov 18, 2024

  • Updates Q Learning logic to allow it to be used as a bot segment

  • We can merge this, but I wouldn't publicize it yet. We have some research to do here on how best to utilize/train this in the context of bot sequences. Training is currently 'ick' as we don't have a systemic way to restart the whole game and resume the same sequence for training.

  • See also https://github.com/Regression-Games/RGBossRoom/pull/80


Find the pull request instructions here

Every reviewer and the owner of the PR should consider these points in their review (feel free to copy this checklist so you can fill it out yourself in the overall PR comment):

  • The code is extensible and backward compatible
  • New public interfaces are extensible and open to backward compatibility in the future
  • If preparing to remove a field in the future (e.g. this PR removes an argument), the argument stays but is no longer functional and carries a deprecation warning. A Linear task is also created to track the deletion.
  • Non-critical or potentially modifiable arguments are optional
  • Breaking changes and the approach to handling them have been verified with the team (in the Linear task, design doc, or PR itself)
  • The code is easy to read
  • Unit tests are added for expected and edge cases
  • Integration tests are added for expected and edge cases
  • Functions and classes are documented
  • Migrations for both up and down operations are completed
  • A documentation PR is created and being reviewed for anything in this PR that requires knowledge to use
  • Implications for other dependent code (e.g. sample games and sample bots) are considered, mentioned, and properly handled
  • Style changes and other non-blocking changes are marked as non-blocking from reviewers

nAmKcAz (Collaborator, Author) commented Nov 18, 2024

@vontell This 'might' be ready for review as an 'experiment', but it's certainly not ready for release. How do you want to handle review/discussion on this?

I don't really want to keep it in a side branch: the SDK moves so fast that keeping one up to date and working would be tedious and realistically wouldn't happen.

batu (Contributor) commented Nov 19, 2024

I would love to integrate the eval pipeline into this, so we can see which reward approaches work better.

My vote would be to review/merge this soon. Would it be too nonsensical to put it under gg.regression.unity.experimental.q_learning? We could maybe move runespawn out of the gg.regression.unity.experimental.runespawn namespace and just keep gg.regression.unity.experimental as the home for all experimental work?

nAmKcAz (Collaborator, Author) commented Nov 19, 2024

@batu, currently this is pretty much useless for training and evaluation, since resetting the game for a whole bot sequence isn't really a thing.

I could allow resetting similar to how qlearningbot.cs does it, where it force-reloads the starting scene for each episode, but this doesn't work well for many games (including bossroom and all customer games so far).

nAmKcAz requested review from batu and vontell, November 20, 2024 13:55
nAmKcAz marked this pull request as ready for review, November 20, 2024 13:56
nAmKcAz (Collaborator, Author) commented Nov 20, 2024

As for 'experimental'... to me our entire SDK is experimental, so I don't really fancy using package/project names to try to hide things. I'm not going to die on this hill, though.

To me they are all hidden unless we document them (as evidenced by our customers not being able to understand the product).

vontell (Collaborator) left a comment

Excited to get this in and try it out myself at some point. @RG-nAmKcAz, do you have any preference on what to tackle next here? Also, I see the boss room sample, but it would be good to have a quick message, instructions, or a video on how to start this in boss room (even though the training process is bad, as you mention). I just personally want to try it! I'm guessing I can run the sequence in the other PR to use a trained model, but I wasn't sure how to train one myself.

public Dictionary<RewardType, int> rewardTypeRatios;

// training options
// TODO: Should we add an option that causes a 'game restart' AND restarts the sequence from the beginning when 'learning = true' ?
vontell (Collaborator):

I'd be curious to see what the Unity ML-Agents tutorials do for this. I suspect that maybe they always use special environments, but they might have some tricks. From what I can see, it looks like they train using a build of the game rather than the editor... maybe their harness in Python just repeatedly starts and quits that executable?

nAmKcAz (Collaborator, Author):

Yep, they 'assume' you're starting a runtime that IS the whole episode... that's why I say this thing is done, but not 'ready', as training is very tedious currently:

you have to start up the game and get it to the right spot, then run your training segment episode

then manually reset the environment and do it again..

until you have your model 'trained'

then you can use the 'real' segment pointing to the same model file, but with training disabled

nAmKcAz (Collaborator, Author):

@vontell .. you will DEFINITELY need to train a model yourself... I got pissed after about an hour of babysitting the restarts for training and seeing no meaningful progress in the results.

Hence my comment that we need a better way to 'train' using sequences.

vontell (Collaborator):

That could be a big selling point. This is just an assumption, but the fact that you need to train an agent in a special environment rather than the real game (due to the issues you ran into) is probably a big blocker... if I'm a game developer, it's bad enough that I need to learn how to use the ML tool, and now I also need to go and make a whole new scene to train in that isn't even my full game? Sequences that get agents into the game could be a cool selling point.

nAmKcAz (Collaborator, Author):

That's actually what I'm prototyping today, per my Slack message.

I'm adding an option to the restart segment to allow restarting the same sequence after the restart.

So I should be able to make a sequence for bossroom that goes through the menus into the game, runs the qlearning for X time, restarts the game... and restarts the same sequence, effectively looping until the user stops it.

nAmKcAz (Collaborator, Author):

@vontell see the new updates, functionally here and practically in https://github.com/Regression-Games/RGBossRoom/pull/80.

The big change is that restart segments can now tell the game to restart that same sequence from the beginning after the restart. For example:

{
  "name": "Restart Game and Restart the Sequence after",
  "keyFrameCriteria": [
    {"type":"ActionComplete","transient":true,"data":{}}
  ],
  "botAction":{
    "type":"RestartGame",
    "data":{
      "restartSequenceAfterRestart": true
    }
  }
}

vontell (Collaborator) commented Nov 20, 2024

Also, sincerest apologies for not seeing your comment from a few days ago that tagged me; it looks like my GitHub notifications got cleared... Yes, getting it in sooner rather than later is preferred. Happy to treat it as experimental and not publicly documented while we play around with it and decide on next steps.

nAmKcAz requested a review from vontell, November 20, 2024 18:30
batu (Contributor) commented Nov 20, 2024

I am approving this PR, as the segment-side parts seem good.

However, after looking through the details, I am confident that we will not be able to train a model to do useful work with this specific state/action representation.

Here is an example state/action entry:

"Cinemachine.CinemachineBrain,Cinemachine.CinemachineComposer,Cinemachine.CinemachineFreeLook,Cinemachine.CinemachineOrbitalTransposer,Cinemachine.CinemachinePipeline,Cinemachine.CinemachineTargetGroup,Cinemachine.CinemachineVirtualCamera,EditorChildSceneLoader,Gameplay.RegressionGames.RGBossRoom.RGAttackObjectAction,RegressionGames.RGIconPulse,RegressionGames.RGTextPulse,RegressionGames.StateRecorder.BotSegments.BotSegmentsPlaybackController,RegressionGames.StateRecorder.KeyboardInputActionObserver,RegressionGames.StateRecorder.LoggingObserver,RegressionGames.StateRecorder.MouseInputActionObserver,RegressionGames.StateRecorder.ProfilerObserver,RegressionGames.StateRecorder.ReplayToolbarManager,RegressionGames.StateRecorder.ScreenRecorder,RegressionGames.StateRecorder.ScreenshotCapture,RegressionGames.StateRecorder.TransformObjectFinder,RegressionGames.StateRecorder.Types.RGExcludeFromState,RegressionGames.StateRecorder.VirtualMouseCursor,RGBotManager,RGBreakableObjectState,RGEnemyState,RGFollowObjectAction,RGPlayerState,RGSequenceManager,TMPro.TextMeshProUGUI,Unity.BossRoom.ApplicationLifecycle.ApplicationController,Unity.BossRoom.Audio.AudioMixerConfigurator,Unity.BossRoom.Audio.ClientMusicPlayer,Unity.BossRoom.CameraUtils.CameraController,Unity.BossRoom.ConnectionManagement.ConnectionManager,Unity.BossRoom.DebugCheats.DebugCheatsManager,Unity.BossRoom.Gameplay.GameplayObjects.AnimationCallbacks.AnimatorFootstepSounds,Unity.BossRoom.Gameplay.GameplayObjects.AnimationCallbacks.AnimatorTriggeredSpecialFX,Unity.BossRoom.Gameplay.GameplayObjects.Breakable,Unity.BossRoom.Gameplay.GameplayObjects.Character.CharacterSwap,Unity.BossRoom.Gameplay.GameplayObjects.Character.ClientAvatarGuidHandler,Unity.BossRoom.Gameplay.GameplayObjects.Character.ClientCharacter,Unity.BossRoom.Gameplay.GameplayObjects.Character.ClientPlayerAvatar,Unity.BossRoom.Gameplay.GameplayObjects.Character.NetworkAvatarGuidState,Unity.BossRoom.Gameplay.GameplayObjects.Character.PhysicsWrapper,Unity.BossRoom.Gameplay.GameplayObjects.Character.PlayerServerCharacter,Unity.BossRoom.Gameplay.GameplayObjects.Character.ServerAnimationHandler,Unity.BossRoom.Gameplay.GameplayObjects.Character.ServerCharacter,Unity.BossRoom.Gameplay.GameplayObjects.Character.ServerCharacterMovement,Unity.BossRoom.Gameplay.GameplayObjects.DamageReceiver,Unity.BossRoom.Gameplay.GameplayObjects.EnemyPortal,Unity.BossRoom.Gameplay.GameplayObjects.GameDataSource,Unity.BossRoom.Gameplay.GameplayObjects.NetworkHealthState,Unity.BossRoom.Gameplay.GameplayObjects.NetworkLifeState,Unity.BossRoom.Gameplay.GameplayObjects.PersistentPlayer,Unity.BossRoom.Gameplay.GameplayObjects.PickUpState,Unity.BossRoom.Gameplay.GameplayObjects.PublishMessageOnLifeChange,Unity.BossRoom.Gameplay.GameplayObjects.ServerDisplacerOnParentChange,Unity.BossRoom.Gameplay.GameplayObjects.ServerWaveSpawner,Unity.BossRoom.Gameplay.GameState.ServerBossRoomState,Unity.BossRoom.Gameplay.RegressionGames.RGBossRoom.RGPerformSkillAction,Unity.BossRoom.Gameplay.UI.ClientBossRoomLoadingScreen,Unity.BossRoom.Gameplay.UI.ClientClickFeedback,Unity.BossRoom.Gameplay.UI.ConnectionAnimation,Unity.BossRoom.Gameplay.UI.ConnectionStatusMessageUIManager,Unity.BossRoom.Gameplay.UI.HeroActionBar,Unity.BossRoom.Gameplay.UI.PartyHUD,Unity.BossRoom.Gameplay.UI.PopupManager,Unity.BossRoom.Gameplay.UI.UIHUDButton,Unity.BossRoom.Gameplay.UI.UIMessageFeed,Unity.BossRoom.Gameplay.UI.UIMessageSlot,Unity.BossRoom.Gameplay.UI.UIName,Unity.BossRoom.Gameplay.UI.UISettingsCanvas,Unity.BossRoom.Gameplay.UI.UIStateDisplay,Unit
y.BossRoom.Gameplay.UI.UIStateDisplayHandler,Unity.BossRoom.Gameplay.UI.UITooltipDetector,Unity.BossRoom.Gameplay.UI.UnityServicesUIHandler,Unity.BossRoom.Gameplay.UserInput.ClientInputSender,Unity.BossRoom.Infrastructure.NetworkObjectPool,Unity.BossRoom.Infrastructure.UpdateRunner,Unity.BossRoom.Navigation.NavigationSystem,Unity.BossRoom.Utils.Editor.NetworkLatencyWarning,Unity.BossRoom.Utils.Editor.NetworkOverlay,Unity.BossRoom.Utils.EnableOrDisableColliderOnAwake,Unity.BossRoom.Utils.NetworkNameState,Unity.BossRoom.Utils.NetworkStats,Unity.BossRoom.VisualEffects.RandomizedLight,Unity.BossRoom.VisualEffects.ScrollingMaterialUVs,Unity.BossRoom.VisualEffects.SpecialFXGraphic,Unity.Multiplayer.Samples.BossRoom.Client.ClientPickUpPotEffects,Unity.Multiplayer.Samples.Utilities.DontDestroyOnLoad,Unity.Multiplayer.Samples.Utilities.LoadingProgressManager,Unity.Multiplayer.Samples.Utilities.NetcodeHooks,Unity.Multiplayer.Samples.Utilities.NetStatsMonitorCustomization,Unity.Multiplayer.Samples.Utilities.NetworkedLoadingProgressTracker,Unity.Multiplayer.Samples.Utilities.SceneLoaderWrapper,Unity.Multiplayer.Samples.Utilities.ServerAdditiveSceneLoader,Unity.Multiplayer.Tools.NetStatsMonitor.RuntimeNetStatsMonitor,Unity.Netcode.Components.NetworkAnimator,Unity.Netcode.Components.NetworkTransform,Unity.Netcode.NetworkManager,Unity.Netcode.NetworkObject,Unity.Netcode.Transports.UTP.UnityTransport,UnityEngine.AI.NavMeshAgent,UnityEngine.AI.NavMeshModifier,UnityEngine.AI.NavMeshSurface,UnityEngine.Animations.PositionConstraint,UnityEngine.Animator,UnityEngine.AudioListener,UnityEngine.AudioSource,UnityEngine.BoxCollider,UnityEngine.Camera,UnityEngine.Canvas,UnityEngine.CanvasGroup,UnityEngine.CanvasRenderer,UnityEngine.CapsuleCollider,UnityEngine.EventSystems.EventSystem,UnityEngine.InputSystem.UI.InputSystemUIInputModule,UnityEngine.Light,UnityEngine.LightProbeGroup,UnityEngine.LODGroup,UnityEngine.MeshCollider,UnityEngine.MeshFilter,UnityEngine.MeshRenderer,UnityEngine.ParticleSystem,UnityEngine.ParticleSystemRenderer,UnityEngine.RectTransform,UnityEngine.ReflectionProbe,UnityEngine.Rendering.DebugUpdater,UnityEngine.Rendering.Universal.UniversalAdditionalCameraData,UnityEngine.Rendering.Universal.UniversalAdditionalLightData,UnityEngine.Rendering.Volume,UnityEngine.Rigidbody,UnityEngine.SkinnedMeshRenderer,UnityEngine.Transform,UnityEngine.UI.Button,UnityEngine.UI.CanvasScaler,UnityEngine.UI.ContentSizeFitter,UnityEngine.UI.GraphicRaycaster,UnityEngine.UI.GridLayoutGroup,UnityEngine.UI.Image,UnityEngine.UI.Mask,UnityEngine.UI.ScrollRect,UnityEngine.UI.Slider,UnityEngine.UI.Text,UnityEngine.UI.VerticalLayoutGroup,UnityEngine.UIElements.PanelEventHandler,UnityEngine.UIElements.PanelRaycaster,VContainer.Unity.LifetimeScope:Digit4,Digit7": {
      "{\n\"actionTypeName\":\"RegressionGames.ActionManager.Actions.InputSystemKeyAction, RegressionGames, Version=0.0.0.0, Culture=neutral, PublicKeyToken=null\",\n\"paths\":[\"Unity.BossRoom.Gameplay.UserInput.ClientInputSender/ClientInputSender.cs/Keyboard.current.digit2Key\"],\n\"objectTypeName\":\"Unity.BossRoom.Gameplay.UserInput.ClientInputSender, Unity.BossRoom.Gameplay, Version=0.0.0.0, Culture=neutral, PublicKeyToken=null\",\n\"parameterRange\":{\"type\":\"RANGE_BOOL\",\"minValue\":0,\"maxValue\":1},\n\"keyFunc\":{\"funcType\":\"TYPE_CONSTANT\",\"data\":\"Digit2\"}\n}False:PlayerAvatar0": 0.0413901322,
      "{\n\"actionTypeName\":\"RegressionGames.ActionManager.Actions.MousePositionAction, RegressionGames, Version=0.0.0.0, Culture=neutral, PublicKeyToken=null\",\n\"paths\":[\"Unity.BossRoom.Gameplay.UI.UITooltipDetector/UITooltipDetector.cs/Mouse.current.position\"],\n\"objectTypeName\":\"Unity.BossRoom.Gameplay.UI.UITooltipDetector, Unity.BossRoom.Gameplay, Version=0.0.0.0, Culture=neutral, PublicKeyToken=null\",\n\"parameterRange\":{\"type\":\"RANGE_VECTOR2\",\"minValueX\":0,\"minValueY\":0,\"maxValueX\":1,\"maxValueY\":1},\n\"positionType\":\"NON_UI\"\n}(0.75, 0.75):Button0": 0.00573281571,
      "{\n\"actionTypeName\":\"RegressionGames.ActionManager.Actions.MouseButtonAction, RegressionGames, Version=0.0.0.0, Culture=neutral, PublicKeyToken=null\",\n\"paths\":[\"Unity.BossRoom.Gameplay.UserInput.ClientInputSender/ClientInputSender.cs/Mouse.current.leftButton\"],\n\"objectTypeName\":\"Unity.BossRoom.Gameplay.UserInput.ClientInputSender, Unity.BossRoom.Gameplay, Version=0.0.0.0, Culture=neutral, PublicKeyToken=null\",\n\"parameterRange\":{\"type\":\"RANGE_BOOL\",\"minValue\":0,\"maxValue\":1},\n\"mouseButtonFunc\":{\"funcType\":\"TYPE_CONSTANT\",\"data\":\"0\"}\n}True:PlayerAvatar0": 0.004450281,
      "{\n\"actionTypeName\":\"RegressionGames.ActionManager.Actions.MousePositionAction, RegressionGames, Version=0.0.0.0, Culture=neutral, PublicKeyToken=null\",\n\"paths\":[\"Unity.BossRoom.Gameplay.UserInput.ClientInputSender/ClientInputSender.cs/Mouse.current.position/Unity.BossRoom.Gameplay.UserInput.ClientInputSender/ClientInputSender.cs/Physics.RaycastNonAlloc(ray, k_CachedHit, k_MouseInputRaycastDistance, m_ActionLayerMask)\"],\n\"objectTypeName\":\"Unity.BossRoom.Gameplay.UserInput.ClientInputSender, Unity.BossRoom.Gameplay, Version=0.0.0.0, Culture=neutral, PublicKeyToken=null\",\n\"parameterRange\":{\"type\":\"RANGE_VECTOR2\",\"minValueX\":0,\"minValueY\":0,\"maxValueX\":1,\"maxValueY\":1},\n\"positionType\":\"COLLIDER_3D\",\n\"layerMasks\":[\n{\"funcType\":\"TYPE_MEMBER_ACCESS\",\"data\":\"{\\\"MemberAccesses\\\":[{\\\"MemberType\\\":4,\\\"DeclaringType\\\":\\\"Unity.BossRoom.Gameplay.UserInput.ClientInputSender, Unity.BossRoom.Gameplay, Version=0.0.0.0, Culture=neutral, PublicKeyToken=null\\\",\\\"MemberName\\\":\\\"m_ActionLayerMask\\\"}]}\"}\n]\n}(0.25, 0.75):PlayerAvatar0": 0.00634076959,
      "{\n\"actionTypeName\":\"RegressionGames.ActionManager.Actions.MousePositionAction, RegressionGames, Version=0.0.0.0, Culture=neutral, PublicKeyToken=null\",\n\"paths\":[\"Unity.BossRoom.Gameplay.UserInput.ClientInputSender/ClientInputSender.cs/Mouse.current.position/Unity.BossRoom.Gameplay.UserInput.ClientInputSender/ClientInputSender.cs/Physics.RaycastNonAlloc(ray, k_CachedHit, k_MouseInputRaycastDistance, m_ActionLayerMask)\"],\n\"objectTypeName\":\"Unity.BossRoom.Gameplay.UserInput.ClientInputSender, Unity.BossRoom.Gameplay, Version=0.0.0.0, Culture=neutral, PublicKeyToken=null\",\n\"parameterRange\":{\"type\":\"RANGE_VECTOR2\",\"minValueX\":0,\"minValueY\":0,\"maxValueX\":1,\"maxValueY\":1},\n\"positionType\":\"COLLIDER_3D\",\n\"layerMasks\":[\n{\"funcType\":\"TYPE_MEMBER_ACCESS\",\"data\":\"{\\\"MemberAccesses\\\":[{\\\"MemberType\\\":4,\\\"DeclaringType\\\":\\\"Unity.BossRoom.Gameplay.UserInput.ClientInputSender, Unity.BossRoom.Gameplay, Version=0.0.0.0, Culture=neutral, PublicKeyToken=null\\\",\\\"MemberName\\\":\\\"m_ActionLayerMask\\\"}]}\"}\n]\n}(0.75, 0.75):PlayerAvatar0": 0.00193079631,
      "{\n\"actionTypeName\":\"RegressionGames.ActionManager.Actions.InputSystemKeyAction, RegressionGames, Version=0.0.0.0, Culture=neutral, PublicKeyToken=null\",\n\"paths\":[\"Unity.BossRoom.Gameplay.UserInput.ClientInputSender/ClientInputSender.cs/Keyboard.current.digit6Key\"],\n\"objectTypeName\":\"Unity.BossRoom.Gameplay.UserInput.ClientInputSender, Unity.BossRoom.Gameplay, Version=0.0.0.0, Culture=neutral, PublicKeyToken=null\",\n\"parameterRange\":{\"type\":\"RANGE_BOOL\",\"minValue\":0,\"maxValue\":1},\n\"keyFunc\":{\"funcType\":\"TYPE_CONSTANT\",\"data\":\"Digit6\"}\n}True:PlayerAvatar0": 0.000657624158
    }

The keys are verbose, but more importantly, our state space is just the existence of components, without the actual values associated with them. As it stands, this state representation isn't useful for learning how to explore.
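For contrast, a purely hypothetical sketch of a state key that folds in coarsely quantized component values; none of these helpers exist in the PR:

using System.Collections.Generic;
using UnityEngine;

public static class StateKeyIdeas
{
    // Hypothetical: include a quantized snapshot of gameplay-relevant values
    // (position cell, health bucket) so distinct situations map to distinct
    // states instead of collapsing to "the same component types exist".
    public static string BuildStateKey(
        IEnumerable<string> componentTypeNames, Vector3 playerPos, float healthFraction)
    {
        int cellX = Mathf.FloorToInt(playerPos.x / 2f); // 2-unit grid cells
        int cellZ = Mathf.FloorToInt(playerPos.z / 2f);
        int hpBucket = Mathf.Clamp((int)(healthFraction * 4f), 0, 3); // 4 health buckets
        return $"{string.Join(",", componentTypeNames)}|cell:{cellX},{cellZ}|hp:{hpBucket}";
    }
}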

batu (Contributor) commented Nov 20, 2024

Without access to any kind of memory, I am skeptical that our current rewards will converge to a model that explores generally. However, I need to do a bit more step-by-step hand calculation to see whether that is the case.

The intuition is that our training doesn't produce a general explorer, but rather an explorer that reacts to its past exploration policy: given two corridors, our agent doesn't learn to explore both; it learns to pick, at test time, the corridor that was explored least during training.
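For reference, the update being iterated here is standard tabular Q-learning (general background, not something introduced in this PR):

Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))

If the reward r is itself a function of visit counts accumulated during training (as with the exploration reward module reviewed below), the converged Q-values encode the training-time visitation pattern rather than a general exploration strategy, which is exactly the corridor failure mode described above.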

public class QAction : IEquatable<QAction>
{
public RGGameAction Action;
public object ParamValue;
batu (Contributor):

What is this ParamValue? Without the type it is hard to understand; can you add a comment?
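A sketch of the kind of comment that could answer this; the description is inferred from the RANGE_BOOL / RANGE_VECTOR2 entries in the state dump above, not taken from the code:

public class QAction : IEquatable<QAction>
{
    public RGGameAction Action;

    /// <summary>
    /// The concrete (discretized) parameter value chosen for this instance of
    /// the action, e.g. a bool for a key/button press or a Vector2 for a mouse
    /// position; the runtime type depends on Action.ParameterRange.
    /// </summary>
    public object ParamValue;
}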

/// The default implementation concatenates all the active component type names
/// and includes the current keyboard/mouse button state as well.
/// </summary>
protected virtual string GetCurrentState()
batu (Contributor):

Is my understanding correct that the current state (for the Q-table) is the names of all components plus action names?

This suggests to me that we are not including the values of those components (such as position, health, etc.), just the existence of Transform/Health?

batu (Contributor):

Expanded on this below

}
} else if (act.ParameterRange is RGContinuousValueRange contRange)
{
var ranges = contRange.Discretize(4);
batu (Contributor):

This should be made configurable so we can tweak it; the way we discretize actions will affect things drastically.

I would also go for a default value that is odd (5?). A common action range is [-1, 1], and that would get discretized to [[-1.0 to -0.5], [-0.5 to 0.0], [0.0 to 0.5], [0.5 to 1.0]]. Depending on how we do the comparison, 0 and (-)0.5 will fall into the same bucket, which they shouldn't. (See the sketch below.)
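To make the bucket-boundary concern concrete, a minimal sketch (assuming only that Discretize splits a continuous range into equal-width buckets):

using UnityEngine;

public static class DiscretizeDemo
{
    // An odd bucket count keeps 0 off the bucket edges for symmetric ranges.
    public static void PrintBuckets(float min, float max, int n)
    {
        float width = (max - min) / n;
        for (int i = 0; i < n; i++)
        {
            Debug.Log($"bucket {i}: [{min + i * width}, {min + (i + 1) * width}]");
        }
        // n = 4 over [-1, 1]: edges at -1, -0.5, 0, 0.5, 1 (0 sits on a boundary)
        // n = 5 over [-1, 1]: middle bucket [-0.2, 0.2] straddles 0
    }
}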

actionSpace.Add(new QAction(act, param));
}
}
return actionSpace;
batu (Contributor):

It would be useful to log the size of the action space, as this is another metric that affects the convergence of the algorithm.
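A sketch of what that logging could look like just before the existing return (Debug.Log as a stand-in for whatever logger the SDK prefers):

// Hypothetical: surface the action space size once it is built; a very
// large discrete action space is a warning sign for tabular Q-learning.
Debug.Log($"QLearning action space size: {actionSpace.Count}");
return actionSpace;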

private string _lastActionKey;
private List<RGActionInput> _lastInputs = new();
private float? _lastActionTime;
private float _epsilon;
batu (Contributor):

How do we use the epsilon value? Is this for epsilon-greedy decay?
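For context, epsilon-greedy selection with decay typically looks like the following; this is a generic sketch with hypothetical names, not the PR's implementation:

using System;
using System.Collections.Generic;
using System.Linq;

public static class EpsilonGreedy
{
    // qRow maps action key -> Q-value for the current state. With probability
    // epsilon pick a uniformly random action (explore), otherwise the argmax
    // action (exploit); then decay epsilon toward a floor.
    public static string SelectAction(
        Dictionary<string, float> qRow, Random rng, ref float epsilon,
        float epsilonMin = 0.05f, float epsilonDecay = 0.999f)
    {
        string choice = rng.NextDouble() < epsilon
            ? qRow.Keys.ElementAt(rng.Next(qRow.Count))
            : qRow.OrderByDescending(kv => kv.Value).First().Key;
        epsilon = Math.Max(epsilonMin, epsilon * epsilonDecay);
        return choice;
    }
}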

}

// Update Q-table with experience
foreach (var exp in _experienceBuf)
batu (Contributor):

I saw that we default to 64 steps for the size of the experience buffer; that is pretty small. However, I recognize that increasing the size can have performance implications. To combat that, we could update our Q-table not every frame but once every n frames.

Currently, with the defaults of ActionInterval = 0.05f and a buffer size of 64, we are effectively limiting ourselves to updating from 3.2 seconds' worth of data. This might make convergence really difficult. (Will circle back to this idea after reading the rewards code.)
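A self-contained sketch of the "update once every n frames" idea; every type and member name here is hypothetical, and the buffered transitions are assumed to carry (state, action, reward, next state):

using System.Collections.Generic;
using System.Linq;

class Experience
{
    public string State, ActionKey, NextState;
    public float Reward;
}

class ThrottledQUpdater
{
    public float Alpha = 0.1f;      // hypothetical learning rate
    public float Gamma = 0.99f;     // hypothetical discount factor
    public int UpdateInterval = 16; // decision steps between Q-table flushes

    private int _stepsSinceUpdate;
    private readonly List<Experience> _experienceBuf = new();
    private readonly Dictionary<string, Dictionary<string, float>> _qTable = new();

    // Called once per decision step; flushes the buffer only every
    // UpdateInterval steps so a larger buffer stays cheap per frame.
    public void MaybeUpdate()
    {
        if (++_stepsSinceUpdate < UpdateInterval) return;
        _stepsSinceUpdate = 0;
        foreach (var exp in _experienceBuf)
        {
            // assumes the state/action rows were seeded when first visited
            float best = _qTable.TryGetValue(exp.NextState, out var row) && row.Count > 0
                ? row.Values.Max()
                : 0f;
            var stateRow = _qTable[exp.State];
            float old = stateRow[exp.ActionKey];
            stateRow[exp.ActionKey] = old + Alpha * (exp.Reward + Gamma * best - old);
        }
        _experienceBuf.Clear();
    }
}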

batu (Contributor):

Expanded on this below

@@ -5,46 +5,37 @@ namespace RegressionGames.GenericBots.Experimental.Rewards
{
/// <summary>
/// Generic exploration reward module.
batu (Contributor):

I think this is a good start; we just need to be mindful about genre/reward-style matching in the future. For example, with a third-person camera, spinning the camera around while moving only a little will generate a good amount of reward. But yeah, this is why we will try different rewards.

{
numVisits = 0;
return 0.0f; // no main camera, don't use this reward
batu (Contributor):

I wouldn't expect this to return 0 silently. If there is no camera and I am using a camera-based reward, I should be notified somehow; otherwise I will just get a random agent without any notice.
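A minimal sketch of the fail-loudly behavior being asked for, assuming a hypothetical _warnedNoCamera field so the warning fires once rather than every step:

if (Camera.main == null)
{
    if (!_warnedNoCamera)
    {
        _warnedNoCamera = true;
        Debug.LogWarning("CameraPositionRewardModule: no main camera found; " +
                         "this reward is always 0 and the agent will act randomly.");
    }
    numVisits = 0;
    return 0.0f; // no main camera, don't use this reward
}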

public IRewardModule RewardModule = new CameraPositionRewardModule();

private List<QAction> _actionSpace = new();
private Dictionary<string, Dictionary<string, float>> _qTable = new();
batu (Contributor):

It would be good to document what the keys/values are.
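Based on the state/action dump earlier in the thread, documentation along these lines would fit (the wording is an inference, not the author's):

/// <summary>
/// Q-table mapping state key -> (action key -> Q-value). The state key is the
/// concatenated component-type/input-state string produced by GetCurrentState();
/// the action key is the serialized QAction (action definition plus the chosen
/// parameter value).
/// </summary>
private Dictionary<string, Dictionary<string, float>> _qTable = new();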

batu (Contributor) commented Nov 20, 2024

Oh, and this is for the future, but if we end up productionizing this, I think a similar "save the latest bot in the project" approach would be smart (over saving it in AppData).

batu (Contributor) commented Nov 20, 2024

If this is something we are going to be pushing forward, we CANNOT make meaningful progress on bossroom. We would need dedicated toy test environments that are incredibly simple but validate the approach.

I now wear a wig because I used RL in my PhD; pulled all my hair out training that stuff.

nAmKcAz (Collaborator, Author) commented Nov 20, 2024

@batu These are reasonable questions, but not really about this PR. This PR takes the pre-existing QLearning code and makes it work as a bot segment; that's it, with no new thoughts or ideas about how we implement QLearning. For deeper analysis/review of the original QLearning bot code, please refer back to Sasha's PR #245.

batu (Contributor) commented Nov 20, 2024

@RG-nAmKcAz, yes, for sure. That's why I approved it, as the relevant parts look solid.

However, if we are going to be investing more energy into this approach, we need to answer these questions. I can move them to some other relevant page.

nAmKcAz (Collaborator, Author) commented Nov 21, 2024

Merging this as-is to get a starting framework for learning/reward-model segments in place. Iteration on methods/models/etc. will continue in future tasks.

nAmKcAz merged commit e0ef3fe into main on Nov 21, 2024 (2 checks passed)
nAmKcAz deleted the zack/q-learning-segment branch, November 21, 2024 13:11