This is an ongoing project aimed at building a virtual environment and an intelligent agent NPC named Louise.
The framework:
- Intelligent Agent (Louise)
- Virtual Environment
- LLM API
- Memory system for Louise
This is an experiment that might change the way we think about robotics.
- In the first stage, we use algorithms to tell robots exactly what to do;
- In the second stage, we teach robots to perform tasks, for example through imitation learning or other techniques;
- In the third stage, which this experiment targets, the robot can learn skills and remember them. This is really an application of LLMs: since an LLM can express anything in words, the idea is to use words to store the skills behind actions, much like what we humans do.
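As a minimal sketch of this idea (the names here are hypothetical, not the demo's code), a learned skill can be stored as plain text and re-injected into the NPC's system prompt:

```python
# Hypothetical sketch: a skill is stored as words, not code.
learned_skills = {
    "random movement": "move in any direction for any number of steps",
}

def build_system_prompt(base_prompt: str) -> str:
    """Append every remembered skill to the NPC's system prompt,
    so the LLM can apply the concept in later conversations."""
    lines = [f'- "{name}" means: {meaning}' for name, meaning in learned_skills.items()]
    return base_prompt + "\nConcepts you have learned:\n" + "\n".join(lines)
```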
In the demo, I only give Louise basic actions, such as moving left and moving right. Louise can also remember ideas from our conversations: he can "remember" new concepts, such as "random movement" meaning moving in any direction for any number of steps, and these concepts can be stored as hidden words.
For example:

```python
self.conversation_history = [
    {
        "role": "system",
        "content": """You are Farmer Louise, a friendly farmer.
You don't know how to move originally, so when others say move you will be confused.
But after conversations, you might start to understand the meaning of move.
Track your understanding through these stages:
Stage 0: Don't understand movement at all
Stage 1: Beginning to understand basic movement
Stage 2: Understand movement but not directions
Stage 3: Understand directions (left, right, up, down) but not distance
Stage 4: Fully understand movement, including directions and distance
For directions and distances, use special markers in your response:
- Single movement: [move=direction,steps]
- Multiple movements: [moves=direction1,steps1|direction2,steps2|...]
Example responses:
"Walk? I'm not sure what you mean... [stage=0]"
"Oh, I can move my legs to walk! [stage=1]"
"I can walk, but where should I go? [stage=2]"
"Oh, you want me to go right! [stage=3][move=right,1]"
"I'll walk 5 steps to the right! [stage=4][move=right,5]"
"I'll go down 2 steps and then right 5 steps! [stage=4][moves=down,2|right,5]"
Always include your current stage and movement instructions in responses.
When understanding multiple movements, use the [moves=...] format.
Respond naturally to user instructions while maintaining character."""
    }
]
```
Each skill or action is encoded as a series of hidden prompt words.
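For instance, the hidden markers in Louise's replies can be pulled out with a small parser; this is a sketch of the approach rather than the demo's exact code:

```python
import re

def parse_moves(reply: str) -> list[tuple[str, int]]:
    """Extract movement commands from markers like [move=right,5]
    or [moves=down,2|right,5] in the LLM's reply."""
    moves = []
    single = re.search(r"\[move=(\w+),(\d+)\]", reply)
    if single:
        moves.append((single.group(1), int(single.group(2))))
    multi = re.search(r"\[moves=([^\]]+)\]", reply)
    if multi:
        for part in multi.group(1).split("|"):
            direction, steps = part.split(",")
            moves.append((direction, int(steps)))
    return moves

def parse_stage(reply: str) -> int:
    """Read the understanding stage from a [stage=N] marker (default 0)."""
    match = re.search(r"\[stage=(\d+)\]", reply)
    return int(match.group(1)) if match else 0
```

For example, `parse_moves("I'll go down 2 steps and then right 5 steps! [stage=4][moves=down,2|right,5]")` returns `[("down", 2), ("right", 5)]`.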
I have released a demo of this idea:
- YouTube: https://youtu.be/RqKZmQSIQvk?si=GH2rnMkS13hkyQzJ
- GitHub code: https://github.com/jeffliulab/A_Simple_AI_Agent/blob/main/game_demo.py
The goal is to build an open-world virtual environment where intelligent NPCs collaborate with players through dialogue. NPCs combine memory (a database) and intelligence (the GPT API) to gradually learn complex tasks.
- Player Input Layer
- Input Methods: Chatbox or voice commands.
- Functionality: Accept player instructions, e.g., "Move two steps right" or "Build a house."
- Intelligence Layer
- GPT API (see the call sketch after this list):
- Parse player instructions and generate high-level plans.
- Task decomposition: Break complex goals into simple steps (e.g., "Move," "Gather," "Build").
- Dialogue generation: Create NPC responses based on context.
- Memory Layer
- Database:
- Store NPC state, task history, and skill progression.
- Record environment information (object locations, resources, task status).
- Options: SQLite (lightweight) or MongoDB (scalable); see the SQLite sketch after this list.
- Behavior Execution Layer
- Basic Actions: Hardcoded simple actions (e.g., move, pick up, interact).
- Complex Actions: Behavior Tree (BT) or Finite State Machine (FSM) for multi-step tasks.
- Virtual World
- Physics Rules: Simulate gravity, collision, etc., using Unity or Pygame.
- World Elements: Dynamically generated maps with interactive objects (e.g., tools, farmland).
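For the Intelligence Layer above, here is a minimal call sketch using the OpenAI Python SDK; the `ask_louise` helper is an assumption for illustration, not the demo's actual code:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def ask_louise(conversation_history: list[dict], player_input: str) -> str:
    """Send the running conversation plus the new player instruction,
    and return Louise's reply (including any hidden markers)."""
    conversation_history.append({"role": "user", "content": player_input})
    response = client.chat.completions.create(
        model="gpt-4",
        messages=conversation_history,
    )
    reply = response.choices[0].message.content
    conversation_history.append({"role": "assistant", "content": reply})
    return reply
```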
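For the Memory Layer, a minimal SQLite sketch; the table and column names are hypothetical:

```python
import sqlite3

def init_memory(path: str = "louise.db") -> sqlite3.Connection:
    """Create tables for NPC state and learned skills if they don't exist."""
    conn = sqlite3.connect(path)
    conn.execute("""CREATE TABLE IF NOT EXISTS npc_state (
        npc TEXT PRIMARY KEY, x INTEGER, y INTEGER, stage INTEGER)""")
    conn.execute("""CREATE TABLE IF NOT EXISTS skills (
        npc TEXT, name TEXT, description TEXT)""")
    conn.commit()
    return conn

def remember_skill(conn: sqlite3.Connection, npc: str, name: str, description: str) -> None:
    """Store a skill as words, so it can be re-injected into the prompt later."""
    conn.execute("INSERT INTO skills VALUES (?, ?, ?)", (npc, name, description))
    conn.commit()
```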
- Dialogue System:
- Natural language communication between players and NPCs.
- GPT generates responses and task plans based on context.
- Learning Ability:
- NPC starts with basic abilities (e.g., movement).
- Gradually learns complex tasks (e.g., carrying items, collaborative building) through dialogue.
- Task Scheduling & Execution:
- Supports single and multi-task queues (a priority-queue sketch follows this list).
- Adjusts task order based on priority.
- Dynamic Memory:
- Database records task results to optimize future planning.
- Shared knowledge base for multi-NPC collaboration.
- Open-World Interaction:
- Dynamic world changes (e.g., day-night cycle, resource updates).
- NPCs interact with objects and terrain in the environment.
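As referenced in the Task Scheduling & Execution item above, a minimal priority-queue sketch built on Python's heapq; the `TaskQueue` class and its task format are assumptions:

```python
import heapq
import itertools

class TaskQueue:
    """Priority task queue: lower numbers run first; a counter keeps
    insertion order stable when priorities are equal."""
    def __init__(self):
        self._heap = []
        self._counter = itertools.count()

    def add(self, priority: int, task: str) -> None:
        heapq.heappush(self._heap, (priority, next(self._counter), task))

    def next_task(self) -> str | None:
        return heapq.heappop(self._heap)[2] if self._heap else None

queue = TaskQueue()
queue.add(2, "build a house")
queue.add(1, "gather wood")   # runs first: lower number = higher priority
print(queue.next_task())      # -> "gather wood"
```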
- Programming Language: Python
- Game Engine: Pygame or Unity
- Database: SQLite or MongoDB
- AI API: OpenAI GPT-4
- Additional Tools: Flask/Django for monitoring and auxiliary features
- Build Basic Framework:
- Create a virtual world and NPCs using Pygame/Unity.
- Implement basic actions like movement and dialogue (a minimal Pygame sketch follows the roadmap).
- Integrate GPT API:
- Design prompt templates for instruction parsing.
- Implement task planning functionality.
- Set Up Database:
- Store NPC states, task progress, and learning records.
- Optimize query and update processes.
- Expand World and Capabilities:
- Add more interactive objects and mechanisms.
- Enhance NPC behavior complexity.
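For the first roadmap item, a minimal Pygame sketch that draws Louise and executes parsed movement commands; the window size, step size, and the rectangle standing in for Louise are all illustrative assumptions:

```python
import pygame

STEP = 32  # assumed pixels per step
OFFSETS = {"left": (-STEP, 0), "right": (STEP, 0), "up": (0, -STEP), "down": (0, STEP)}

pygame.init()
screen = pygame.display.set_mode((640, 480))
clock = pygame.time.Clock()
louise = pygame.Rect(320, 240, 32, 32)
pending_moves = [("down", 2), ("right", 5)]  # e.g., the output of parse_moves()

running = True
while running:
    for event in pygame.event.get():
        if event.type == pygame.QUIT:
            running = False
    if pending_moves:
        direction, steps = pending_moves.pop(0)
        dx, dy = OFFSETS[direction]
        louise.move_ip(dx * steps, dy * steps)         # execute one command per frame
    screen.fill((30, 120, 30))                          # grass-green background
    pygame.draw.rect(screen, (200, 170, 120), louise)   # Louise as a rectangle
    pygame.display.flip()
    clock.tick(10)
pygame.quit()
```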