
Getting Started



AgentOdyssey is a lightweight interactive environment that supports both novel game generation and a unified agent interface. It is designed to evaluate test-time continual learning across five key abilities: exploration, world knowledge acquisition, episodic memory, skill learning, and long-horizon planning.

This guide walks you through setting up AgentOdyssey and running your first game.

Prerequisites

  • Python 3.12+
  • Conda (recommended for environment management)
  • CUDA-capable GPU(s) (required for local models; not needed if using only API-based LLMs such as OpenAI)

1. Environment Setup

Create a Conda environment and install the dependencies. Which installation you need depends on which agents you plan to use.

Minimal install — enough if you only want to run HumanAgent (play the game yourself) or run the RandomAgent (which randomly samples an available action each turn):

conda create -n agentodyssey python=3.12 && conda activate agentodyssey
pip install gymnasium termcolor psutil

Full install — required for LLM-based agents (RAG, Long-Context, Parametric, etc.):

conda create -n agentodyssey python=3.12 && conda activate agentodyssey
pip install -r requirements.txt # core + LLM dependencies
conda install pytorch::faiss-gpu # GPU-accelerated vector search (needed by RAG agents)
pip install flash-attn --no-build-isolation

Note: If you do not have a GPU, you can replace faiss-gpu with faiss-cpu (pip install faiss-cpu).
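After installing, a quick check of which packages are importable can confirm whether the environment matches the minimal or the full install. The snippet below is a minimal sketch, not part of AgentOdyssey. The import names are assumptions derived from the package names above (for instance, both faiss-gpu and faiss-cpu are assumed to be imported as faiss, and flash-attn as flash_attn), and the helper missing_modules is hypothetical.

```python
import importlib.util

# Import names assumed from the install commands above; adjust if they differ.
MINIMAL = ["gymnasium", "termcolor", "psutil"]
LLM_EXTRAS = ["faiss", "flash_attn"]


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


print("missing (minimal install):", missing_modules(MINIMAL))
print("missing (LLM extras):", missing_modules(LLM_EXTRAS))
```

An empty list for the minimal set should be enough for HumanAgent and RandomAgent. The extras only matter for the LLM-based agents.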

To use proprietary LLMs, you need to export API keys. For example, for OpenAI:

export OPENAI_API_KEY="your-api-key-here"
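If you launch runs from your own scripts, it can help to fail early when the key is missing rather than partway through a run. The following is a small sketch under that assumption. require_env is a hypothetical helper, not an AgentOdyssey function, and only the variable name OPENAI_API_KEY comes from this guide.

```python
import os


def require_env(name, env=None):
    """Return the value of an environment variable, or raise a clear error if unset or blank."""
    env = os.environ if env is None else env
    value = env.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set; export it before running an API-based agent")
    return value
```

Calling require_env("OPENAI_API_KEY") at the top of a script raises immediately if the export step above was skipped.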

2. Quick Start with Game Remnant

The fastest way to try AgentOdyssey is through eval.py, which runs an agent in a game and logs the results.

Play the game yourself

python eval.py --game_name remnant --agent HumanAgent

You will be dropped into an interactive text-world session where you type actions each turn.

Run with an LLM agent

Below are a few common configurations. Every LLM agent requires both the --llm_provider flag (one of "openai", "huggingface", "vllm", "azure", "azure_openai", "claude", "gemini") and the --llm_name flag.

Long-Context Agent with GPT-5:

python eval.py --game_name remnant --agent LongContextAgent --llm_provider openai --llm_name gpt-5

Long-Context Agent with Qwen3-4B (local model via HuggingFace):

python eval.py --game_name remnant --agent LongContextAgent --llm_provider huggingface --llm_name Qwen/Qwen3-4B 

RAG Agent with GPT-5:

python eval.py --game_name remnant --agent VanillaRAGAgent --llm_provider openai --llm_name gpt-5

RAG Agent with Qwen3-4B:

python eval.py --game_name remnant --agent VanillaRAGAgent --llm_provider huggingface --llm_name Qwen/Qwen3-4B 
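The commands above all follow the same pattern, so they can be assembled programmatically when you script many runs. This is a sketch only. eval_command is a hypothetical helper, the provider list is copied from this guide, and it assumes that boolean options are passed as bare switches (an assumption about eval.py, not something confirmed here).

```python
import shlex

# Provider names copied from the list in this guide.
PROVIDERS = {"openai", "huggingface", "vllm", "azure", "azure_openai", "claude", "gemini"}


def eval_command(game_name, agent, llm_provider=None, llm_name=None, **flags):
    """Build an argv list for eval.py; extra keyword arguments become --flags."""
    argv = ["python", "eval.py", "--game_name", game_name, "--agent", agent]
    if llm_provider is not None:
        if llm_provider not in PROVIDERS:
            raise ValueError(f"unknown llm_provider: {llm_provider!r}")
        if llm_name is None:
            raise ValueError("llm_name is required when llm_provider is given")
        argv += ["--llm_provider", llm_provider, "--llm_name", llm_name]
    for key, value in flags.items():
        if value is True:
            argv.append(f"--{key}")  # assumed to be a bare switch
        elif value is not None and value is not False:
            argv += [f"--{key}", str(value)]
    return argv


cmd = eval_command("remnant", "LongContextAgent", llm_provider="openai",
                   llm_name="gpt-5", max_steps=300, seed=42)
print(shlex.join(cmd))
```

The resulting list can be passed to subprocess.run(cmd, check=True) from the repository root, where eval.py lives.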

Useful flags

For a more detailed list of flags, visit Running Evaluations. Some commonly used flags include:

| Flag | Description |
| --- | --- |
| --max_steps N | Maximum number of environment steps (default 300) |
| --seed N | Random seed for reproducibility (default 42) |
| --output_dir DIR | Directory where run outputs are saved (default output/) |
| --overwrite | If set, always create a new game session. If not set, resume the existing run with its saved game config when the run directory exists; otherwise create a new game session |
| --cumulative_config_save | Save one environment config entry per step (default False) |
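The --overwrite behavior is easiest to read as a small decision rule. The function below is an illustrative model of the rule as described in this guide, not the code eval.py actually runs. session_mode and its return values are hypothetical names.

```python
from pathlib import Path


def session_mode(run_dir, overwrite):
    """Model the documented rule: 'new' starts a fresh session, 'resume' continues a run."""
    if overwrite:
        return "new"      # --overwrite always creates a new game session
    if Path(run_dir).is_dir():
        return "resume"   # run directory exists: continue with its game config
    return "new"          # nothing to resume, so a new session is created
```

In other words, leaving --overwrite off is safe for interrupted runs, while setting it discards the chance to resume.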

3. Using the Python API

If you want to integrate AgentOdyssey into your own Python scripts, for example, to benchmark a custom agent or sweep over hyper-parameters, use the AgentOdyssey.run() wrapper. It accepts the same options as eval.py but as Python keyword arguments.

from agentodyssey import AgentOdyssey

# Play the game remnant yourself
AgentOdyssey.run(game_name="remnant", agent="HumanAgent")

# Evaluate a Long-Context agent on game remnant
AgentOdyssey.run(
    game_name="remnant",
    agent="LongContextAgent",
    llm_provider="openai",
    llm_name="gpt-5",
    max_steps=300,
    seed=42,
)

# Evaluate a RAG agent with reflection enabled
AgentOdyssey.run(
    game_name="remnant",
    agent="VanillaRAGAgent",
    llm_provider="huggingface",
    llm_name="Qwen/Qwen3-4B",
    enable_reflection=True,
    overwrite=True,
)
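For the hyper-parameter sweep use case mentioned above, one approach is to expand a grid into a list of keyword-argument dictionaries and pass each to AgentOdyssey.run(). The sketch below only builds the dictionaries. sweep_configs is a hypothetical helper, and the keyword names are the ones shown in this guide.

```python
from itertools import product


def sweep_configs(base, grid):
    """Expand a grid of option lists into one kwargs dict per combination."""
    keys = list(grid)
    combos = product(*(grid[key] for key in keys))
    return [{**base, **dict(zip(keys, values))} for values in combos]


base = {
    "game_name": "remnant",
    "agent": "LongContextAgent",
    "llm_provider": "openai",
    "llm_name": "gpt-5",
}
configs = sweep_configs(base, {"seed": [0, 1, 2], "max_steps": [300]})

for cfg in configs:
    print(cfg)
    # With the package importable, each run would be launched as:
    # AgentOdyssey.run(**cfg)
```

Keeping the grid expansion separate from the run call makes it easy to inspect or log the planned configurations before spending any compute.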

You can also generate a new game world and immediately run it (visit Game Generation for more details):

from agentodyssey import AgentOdyssey

# Generate a themed game and get a handle to it
game = AgentOdyssey.generate("a pirate-themed island adventure")

# Play it yourself
game.run()

# Or evaluate an LLM agent on it
game.run(agent="LongContextAgent", llm_provider="openai", llm_name="gpt-5")

4. Installing as a Package

Installing AgentOdyssey as a package unlocks the agentodyssey CLI and lets you import agentodyssey from anywhere to access the API.

pip install -e .

CLI usage

Once installed, use the agentodyssey command directly:

Play a game yourself:

agentodyssey run --game-name remnant --agent HumanAgent

Evaluate an LLM agent:

agentodyssey run --game-name remnant --agent LongContextAgent --llm-provider openai --llm-name gpt-5

Evaluate a RAG agent with Qwen:

agentodyssey run --game-name remnant --agent VanillaRAGAgent --llm-provider huggingface --llm-name Qwen/Qwen3-4B --max-steps 500

Generate a new game world and run it:

agentodyssey generate "a haunted castle mystery"
agentodyssey run --game-name a_haunted_castle_mystery --agent LongContextAgent --llm-provider openai --llm-name gpt-5
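The two commands above suggest that the generation prompt is turned into a game name by lowercasing it and joining words with underscores. That rule is inferred from this single example and is not confirmed by this guide, so treat the following as a guess for scripting convenience. guess_game_name is a hypothetical helper, and its handling of punctuation in particular is an assumption.

```python
import re


def guess_game_name(prompt):
    """Guess the game name for a prompt: lowercase, non-alphanumerics collapsed to '_'."""
    return re.sub(r"[^a-z0-9]+", "_", prompt.lower()).strip("_")


print(guess_game_name("a haunted castle mystery"))
```

If the guessed name does not match, check the output of agentodyssey generate for the actual name before running.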

Tip: The CLI uses hyphens in flag names (--game-name), while eval.py uses underscores (--game_name) and the Python API uses the matching keyword arguments (game_name).
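When porting an eval.py command to the CLI, only the flag names change, not their values (a game name such as a_haunted_castle_mystery keeps its underscores). The sketch below shows that conversion. to_cli_args is a hypothetical helper, and it assumes every flag follows the same underscore-to-hyphen pattern as the examples in this guide.

```python
def to_cli_args(eval_args):
    """Rewrite --flag_name tokens to --flag-name, leaving values untouched."""
    return [
        "--" + arg[2:].replace("_", "-") if arg.startswith("--") else arg
        for arg in eval_args
    ]


eval_args = ["--game_name", "a_haunted_castle_mystery", "--llm_name", "gpt-5"]
print(["agentodyssey", "run", *to_cli_args(eval_args)])
```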