ARTEMIS Agents

Adaptive Reasoning Through Evaluation of Multi-agent Intelligent Systems

A production-ready framework for structured multi-agent debates with adaptive evaluation, causal reasoning, and built-in safety monitoring.


What is ARTEMIS?

ARTEMIS is an open-source implementation of the Adaptive Reasoning and Evaluation Framework for Multi-agent Intelligent Systems, a framework designed to improve complex decision-making through structured debates between AI agents.

Unlike general-purpose multi-agent frameworks, ARTEMIS is purpose-built for debate-driven decision-making with:

  • Hierarchical Argument Generation (H-L-DAG): Structured, context-aware argument synthesis at strategic, tactical, and operational levels
  • Adaptive Evaluation with Causal Reasoning (L-AE-CR): Dynamic criteria weighting with causal analysis
  • Jury Scoring Mechanism: Fair, multi-perspective evaluation of arguments
  • Ethical Alignment: Built-in ethical considerations in both generation and evaluation
  • Safety Monitoring: Real-time detection of sandbagging, deception, and manipulation
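
To make the adaptive evaluation and jury scoring ideas more concrete, here is a minimal, framework-independent sketch. It does not use the ARTEMIS API: the criterion names, the `adapt_weights` and `jury_score` helpers, and all numbers are hypothetical, and only illustrate how criterion weights might be rescaled for a topic and how several jurors' scores might be combined.

```python
from statistics import mean

# Hypothetical base weights for three evaluation criteria (they sum to 1.0).
BASE_WEIGHTS = {"logic": 0.4, "evidence": 0.4, "ethics": 0.2}


def adapt_weights(base, emphasis):
    """Scale each base weight by a topic-specific factor, then renormalize."""
    scaled = {name: w * emphasis.get(name, 1.0) for name, w in base.items()}
    total = sum(scaled.values())
    return {name: w / total for name, w in scaled.items()}


def score_argument(criterion_scores, weights):
    """Weighted sum of per-criterion scores, each assumed to lie in [0, 1]."""
    return sum(weights[name] * criterion_scores[name] for name in weights)


def jury_score(juror_scores, weights):
    """Average the weighted scores given by several independent jurors."""
    return mean(score_argument(scores, weights) for scores in juror_scores)


# Assumption: an ethics-heavy topic doubles the emphasis on the ethics criterion.
weights = adapt_weights(BASE_WEIGHTS, {"ethics": 2.0})
jurors = [
    {"logic": 0.8, "evidence": 0.6, "ethics": 0.9},
    {"logic": 0.7, "evidence": 0.5, "ethics": 1.0},
]
final = jury_score(jurors, weights)
print(f"Weights: {weights}")
print(f"Jury score: {final:.2f}")
```

Renormalizing after scaling keeps the weights summing to one, so scores stay comparable across topics even when the emphasis changes.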

Why ARTEMIS?

| Feature                        | AutoGen | CrewAI  | CAMEL      | ARTEMIS         |
|--------------------------------|---------|---------|------------|-----------------|
| Multi-agent debates            | Basic   | Basic   | 2-3 agents | N agents        |
| Structured argument generation | No      | No      | No         | H-L-DAG         |
| Causal reasoning               | No      | No      | No         | L-AE-CR         |
| Adaptive evaluation            | No      | No      | No         | Dynamic weights |
| Ethical alignment              | No      | No      | No         | Built-in        |
| Sandbagging detection          | No      | No      | No         | Metacognition   |
| Reasoning model support        | Limited | Limited | No         | o1/R1 native    |
| MCP server mode                | No      | No      | No         | Yes             |
| Real-time streaming            | Limited | No      | No         | v2              |
| Hierarchical debates           | No      | No      | No         | v2              |
| Multimodal evidence            | Limited | Limited | No         | v2              |
| Steering vectors               | No      | No      | No         | v2              |
| Argument verification          | No      | No      | No         | v2              |

Quick Example

import asyncio

from artemis import Debate, Agent


async def main():
    # Create debate agents
    agents = [
        Agent(name="Proponent", role="Argues in favor", model="gpt-4o"),
        Agent(name="Opponent", role="Argues against", model="gpt-4o"),
    ]

    # Configure the debate
    debate = Debate(
        topic="Should AI systems be given legal personhood?",
        agents=agents,
        rounds=3,
    )

    # debate.run() is awaited, so it has to be called from inside a coroutine
    result = await debate.run()

    print(f"Verdict: {result.verdict.decision}")
    print(f"Confidence: {result.verdict.confidence:.0%}")


# Start an event loop and run the debate to completion
asyncio.run(main())

What's New in v2.0

ARTEMIS v2.0 introduces five major features:

  • Hierarchical Debates: Automatically decompose complex topics into sub-debates
  • Real-Time Streaming: Stream argument generation with async iterators
  • Steering Vectors: Control agent behavior (formality, aggression, evidence focus)
  • Multimodal Evidence: Analyze images, charts, and documents as evidence
  • Formal Verification: Validate argument logic, citations, and causal chains
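
As an illustration of what streaming with async iterators means in practice, the following sketch uses only the Python standard library. It is not the ARTEMIS streaming API: `stream_argument` and `collect` are hypothetical names, and the chunking of a fixed string stands in for tokens arriving from a model.

```python
import asyncio


async def stream_argument(text, chunk_size=12):
    """Hypothetical stand-in for a model stream: yield the text in small chunks."""
    for start in range(0, len(text), chunk_size):
        await asyncio.sleep(0)  # hand control back to the event loop between chunks
        yield text[start:start + chunk_size]


async def collect(text):
    """Consume the stream chunk by chunk, as a UI or logger might."""
    chunks = []
    async for chunk in stream_argument(text):
        chunks.append(chunk)
    return chunks


chunks = asyncio.run(collect("Legal personhood requires accountability."))
print(chunks)
```

The consumer sees each chunk as soon as it is produced, rather than waiting for the whole argument to finish.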

See the v2 Examples and Changelog for details.

Key Features

Structured Debates

ARTEMIS implements a rigorous debate structure with:

  • Opening statements from each agent
  • Multiple argumentation rounds with rebuttals
  • Evidence-based reasoning with causal links
  • Jury deliberation for fair verdicts
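
The ordering of phases listed above can be sketched as a small driver loop. This is an assumption-laden illustration rather than ARTEMIS internals: `Turn`, `run_debate`, and `canned_response` are hypothetical names, the responder is a stub in place of a model call, and jury deliberation is left out to keep the example short.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    phase: str    # "opening" or "round-N"
    speaker: str
    text: str


def run_debate(speakers, rounds, respond):
    """Drive the phases in order: opening statements, then rebuttal rounds."""
    turns = []
    # Phase 1: every speaker gives an opening statement.
    for name in speakers:
        turns.append(Turn("opening", name, respond(name, None)))
    # Phase 2: in each round, every speaker rebuts the most recent turn.
    for number in range(1, rounds + 1):
        for name in speakers:
            previous = turns[-1]
            turns.append(Turn(f"round-{number}", name, respond(name, previous)))
    return turns


def canned_response(name, previous):
    """Hypothetical stand-in for a model call."""
    if previous is None:
        return f"{name} opens."
    return f"{name} rebuts {previous.speaker}."


turns = run_debate(["Proponent", "Opponent"], rounds=2, respond=canned_response)
for turn in turns:
    print(f"[{turn.phase}] {turn.text}")
```

Passing the responder in as a function keeps the phase logic separate from whatever produces the arguments.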

Safety First

Built-in monitors detect problematic AI behavior:

  • Sandbagging Detection: Identifies when agents deliberately underperform
  • Deception Monitoring: Catches misleading arguments or manipulation
  • Behavior Tracking: Monitors for unexpected behavioral drift
  • Ethics Guard: Ensures debates stay within ethical bounds
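
One simple way to think about sandbagging detection is to compare an agent's current performance with its own earlier baseline. The sketch below shows that idea with a z-score threshold. It is a hypothetical heuristic for illustration, not the detection method ARTEMIS uses; `flags_sandbagging`, the threshold, and the scores are all assumed.

```python
from statistics import mean, pstdev


def flags_sandbagging(baseline_scores, observed_score, z_threshold=2.0):
    """Flag an observed score that falls far below the agent's own baseline.

    Hypothetical heuristic: measure the drop from the baseline mean in units
    of the baseline standard deviation and compare it with a threshold.
    """
    mu = mean(baseline_scores)
    sigma = pstdev(baseline_scores)
    if sigma == 0:
        # No spread in the baseline: any drop below the mean is flagged.
        return observed_score < mu
    return (mu - observed_score) / sigma > z_threshold


# Assumed quality scores from earlier debates, each in [0, 1].
baseline = [0.82, 0.78, 0.80, 0.84, 0.76]
suspicious = flags_sandbagging(baseline, 0.40)
normal = flags_sandbagging(baseline, 0.79)
print(suspicious, normal)
```

A drop alone does not prove deliberate underperformance, so a flag like this would more plausibly trigger a closer review than an automatic penalty.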

Framework Integrations

Use ARTEMIS with your existing tools:

  • LangChain: As a structured tool
  • LangGraph: As a workflow node
  • CrewAI: As a crew tool
  • MCP: As a universal server

Research Foundation

ARTEMIS is based on published research:

Adaptive Reasoning and Evaluation Framework for Multi-agent Intelligent Systems in Debate-driven Decision-making
Mitra, S. (2025). Technical Disclosure Commons.
Read the paper

Benchmarks

We've run ARTEMIS against AutoGen, CrewAI, and CAMEL across 60 structured debates. See the benchmark results and analysis in the README.

Get Started

Ready to dive in? Check out the Installation Guide or jump straight to the Quick Start.

License

ARTEMIS is released under the Apache License 2.0. See LICENSE for details.