
Translation style and terminology

- Agent
- Policy
- Exploration
- Action value
- On-policy
- Environment
- Action
- Exploitation
- Discount rate
- Off-policy
- State
- Reward
- Value
- Discount factor
- T-horizon
- Observation
- Return
- State value
- MDP
- Epoch
- Bellman equation
- Bellman optimality equation
- Multi-armed bandit problem
- Dynamic programming
- Offline reinforcement learning
- Backup
- Episode
- History
- Trajectory
- Model-based
- Planning
- Prediction
- Control
- Actor-Critic
- Model-free
- Rollout
- Policy evaluation
- Policy iteration (improvement)
- Value iteration
- Temporal difference
- Monte-Carlo
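As a quick illustration of how several of these terms fit together (state, action, reward, discount factor, Bellman optimality equation, value iteration), here is a minimal sketch on a hypothetical toy MDP. The MDP itself (4 states in a chain, a terminal state, a `step` transition model) is an assumption made up for this example, not something from these notes.

```python
# Toy deterministic MDP: states 0..3 in a chain, state 3 is terminal.
# Action 0 = stay (reward 0); action 1 = move right (reward 1 on entering state 3).
N_STATES = 4
GAMMA = 0.9  # discount factor


def step(s, a):
    """Transition model: returns (next_state, reward)."""
    if s == 3:  # terminal state: self-loop, no further reward
        return s, 0.0
    if a == 1:  # move right
        s2 = s + 1
        return s2, (1.0 if s2 == 3 else 0.0)
    return s, 0.0  # stay


def value_iteration(tol=1e-8):
    """Repeatedly apply the Bellman optimality backup
    V(s) <- max_a [ r(s, a) + GAMMA * V(s') ] until convergence."""
    V = [0.0] * N_STATES
    while True:
        delta = 0.0
        for s in range(N_STATES):
            best = max(
                r + GAMMA * V[s2]
                for a in (0, 1)
                for s2, r in (step(s, a),)
            )
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            break
    return V


V = value_iteration()
print(V)  # optimal state values: [0.81, 0.9, 1.0, 0.0]
```

Here "prediction" would mean evaluating a fixed policy's values, while this loop does "control": it directly computes the optimal value function via dynamic programming.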