Ontology Overview
This page introduces the key concepts and terminology used throughout AgentOdyssey. If you are looking for how these concepts map onto actual source files, see Game File Hierarchy.
The diagram below provides a visual overview of the core ontology.

POMDP Formulation
AgentOdyssey models the environment as a partially observable Markov decision process (POMDP). The agent cannot see the full world state; it only receives a natural-language observation of its immediate surroundings at each step. Time advances in fixed increments of 10 simulated minutes per step, and an in-game clock is included in every observation.
The five core elements of the POMDP map onto the ontology as follows:
| POMDP element | What it means in AgentOdyssey |
|---|---|
| State | The world graph at step , plus per-agent status (location, health, inventory, etc.) |
| Observation | A natural-language rendering of the agent's local state and any feedback from the previous step |
| Action | A parameterized textual command from a fixed verb set, e.g. pick up coin |
| Dynamics | Deterministic or stochastic updates to the world graph, driven by action rules and step rules |
| Reward | A multi-component signal reflecting quest progress, exploration, crafting, combat, and more |
Game Entities
The world is populated by three types of entities which are defined declaratively in the world definition.
Locations
Locations have a two-level hierarchy: places and areas. A place (e.g. Greenwood Forest, Old Castle) is a named region that groups one or more areas. An area (e.g. plain, armory, river) is the atomic unit of space that the agent can occupy. Areas within the same place or across places are connected by paths, which can be locked or unlocked. Each area also has a level that influences which objects and NPCs appear there.
Objects
Objects cover everything the agent can interact with: raw materials, crafting stations, tools, weapons, armor, containers, currency, and consumables. Each object has attributes like category, usage, value, size, and optional combat stats (attack, defense). Objects can also have crafting recipes defined by a set of ingredients (other objects consumed during crafting) and dependencies (objects that must have been crafted first before this recipe unlocks).
NPCs
Non-playable characters are either enemies or friendly. Enemy NPCs have a combat_pattern (a repeating sequence of attack, defend, and wait actions) and level-scaled stats (slope_hp, slope_attack_power) that make them stronger in higher-level areas. Friendly NPCs fill roles such as merchants (who buy and sell items) and quest givers (who assign side quests).
World Graph
Together, the entities and their spatial relationships form the world graph. Each node in the graph is an area instance, containing the object instances and NPC instances currently present in that area. Each edge is a path connecting two areas, optionally locked behind a key or quest requirement.
The world graph at a given step is the state . The initial world graph is sampled from the world definition: higher-level NPCs and objects are more likely to spawn in higher-level areas, creating a natural difficulty progression across the map.
Observations
At every step the agent receives a natural-language observation that includes:
- Current in-game time
- Current location (place and area)
- Feedback from the previous action and any triggered step rules
- Items in hand and equipped items
- Objects visible in the current area
- NPCs and other agents in the current area
- Agent stats: level, attack, defense, health, experience
- Names of neighboring areas
The observation is always partial. The agent can only see what is in its current area, so it must build and maintain an internal belief about the rest of the world from memory.
World Dynamics
The world evolves through a modular two-stage rule system: action rules and step rules. Both rule types follow the same pattern: check preconditions against the current state, apply state transitions, and emit feedback to the observation.
Action Rules
Action rules are triggered when the agent performs a specific verb. They capture instantaneous, player-invoked operations like picking up an object, entering a neighboring area, attacking an NPC, crafting an item, or buying from a merchant. Each action rule validates its parameters, checks whether the action is possible given the current state, and then mutates the world graph accordingly.
Actions can form long-range dependencies over time. For example, an object dropped by a defeated NPC might become a crafting ingredient many steps later. Playing well therefore requires the agent to maintain episodic memory across extended sequences of actions.
Step Rules
Step rules run automatically at the end of every environment step, regardless of what action the agent took. They encode persistent, stateful processes such as:
- Combat rhythm (NPCs follow their attack pattern during active combat)
- NPC active attack
- Death and respawn
- Tutorial progression
- Main quest and side quest advancement
- Dynamic world expansion (generating new areas on the fly)
Many step rules test the agent's memory through indirect, under-specified environmental cues. For instance, enemy NPCs may become stronger during certain in-game hours. These patterns are never explicitly told to the agent; they have to be inferred from accumulated experience.
Deterministic and Stochastic Transitions
Unlike environments that rely purely on deterministic rules, AgentOdyssey supports both deterministic and stochastic state transitions. Some action rules succeed with a defined probability (e.g. lock-picking), and some step rules introduce random events (e.g. spawning an NPC near the agent at midnight with a 50% chance). This means agents cannot simply memorize outcomes; they need to reason under uncertainty.
Goals
Goals in AgentOdyssey are formulated as quests, each providing textual cues to guide the agent and delivering feedback and rewards upon completion.
Main Quests
Main quests form a linear chain of objectives with temporal dependencies: each stage can only be completed after the previous one is done, creating a coherent storyline. They are implemented as a step rule that tracks progress through the quest stages.
Side Quests
Side quests are independent tasks with no preconditions. They can be picked up and completed in any order at any time. Quest-giver NPCs assign these tasks to the agent during dialogue. They are also implemented as a step rule.
Rewards
The reward signal is multi-component. Each component captures a different aspect of the agent's progress:
| Component | What it measures |
|---|---|
| Quest | Number of completed main quest stages |
| Side quest | Number of completed side quests |
| Exploration | Number of newly visited areas |
| Craft | Number of unique object types crafted |
| Defeat | Number of unique NPCs defeated; two NPCs with the same type but various levels are two unique NPCs |
| Trade | Number of unique object types traded |
| Death | Agent death count (penalty) |
Each component also contributes to an XP total that drives the agent's leveling system. Leveling up increases the agent's max HP and base attack power.