ARTEMIS Agents

Adaptive Reasoning Through Evaluation of Multi-agent Intelligent Systems

A production-ready framework for structured multi-agent debates with adaptive evaluation, causal reasoning, and built-in safety monitoring.


What is ARTEMIS?

ARTEMIS is an open-source implementation of the Adaptive Reasoning and Evaluation Framework for Multi-agent Intelligent Systems — a framework designed to improve complex decision-making through structured debates between AI agents.

Unlike general-purpose multi-agent frameworks, ARTEMIS is purpose-built for debate-driven decision-making with:

  • Hierarchical Argument Generation (H-L-DAG): Structured, context-aware argument synthesis at strategic, tactical, and operational levels
  • Adaptive Evaluation with Causal Reasoning (L-AE-CR): Dynamic criteria weighting with causal analysis
  • Jury Scoring Mechanism: Fair, multi-perspective evaluation of arguments
  • Ethical Alignment: Built-in ethical considerations in both generation and evaluation
  • Safety Monitoring: Real-time detection of sandbagging, deception, and manipulation
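To make the adaptive evaluation and jury scoring ideas concrete, here is a minimal, self-contained sketch in plain Python. It is not ARTEMIS's implementation and uses none of its API: the criteria names, the multiplicative emphasis factors used to shift weights, and the median used to combine jurors are all illustrative assumptions. It also does not attempt the causal-analysis part of L-AE-CR.

```python
from statistics import median

# Hypothetical criteria and default weights, chosen for illustration only.
BASE_WEIGHTS: dict[str, float] = {"logic": 0.4, "evidence": 0.4, "ethics": 0.2}


def adapt_weights(base: dict[str, float], emphasis: dict[str, float]) -> dict[str, float]:
    """Scale each base weight by a context-specific emphasis factor, then renormalize to sum to 1."""
    raw = {criterion: w * emphasis.get(criterion, 1.0) for criterion, w in base.items()}
    total = sum(raw.values())
    return {criterion: w / total for criterion, w in raw.items()}


def score_argument(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted sum of one juror's per-criterion scores (each assumed to lie in [0, 1])."""
    return sum(weights[criterion] * scores[criterion] for criterion in weights)


def jury_score(juror_scores: list[dict[str, float]], weights: dict[str, float]) -> float:
    """Combine jurors with the median so a single outlier juror cannot dominate the result."""
    return median(score_argument(scores, weights) for scores in juror_scores)


if __name__ == "__main__":
    # A debate that hinges on empirical claims might double the weight on evidence.
    weights = adapt_weights(BASE_WEIGHTS, {"evidence": 2.0})
    jurors = [
        {"logic": 0.8, "evidence": 0.9, "ethics": 0.7},
        {"logic": 0.6, "evidence": 0.7, "ethics": 0.9},
        {"logic": 0.1, "evidence": 0.2, "ethics": 0.1},  # an outlier juror
    ]
    print(f"Jury score: {jury_score(jurors, weights):.2f}")
```

Using the median rather than the mean is one simple way to keep a multi-juror verdict robust to a single extreme juror; a trimmed mean or per-juror reliability weights would be equally reasonable choices.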

Why ARTEMIS?

Feature                        | AutoGen | CrewAI  | CAMEL      | ARTEMIS
-------------------------------|---------|---------|------------|----------------
Multi-agent debates            | Basic   | Basic   | 2-3 agents | N agents
Structured argument generation | No      | No      | No         | H-L-DAG
Causal reasoning               | No      | No      | No         | L-AE-CR
Adaptive evaluation            | No      | No      | No         | Dynamic weights
Ethical alignment              | No      | No      | No         | Built-in
Sandbagging detection          | No      | No      | No         | Metacognition
Reasoning model support        | Limited | Limited | No         | o1/R1 native
MCP server mode                | No      | No      | No         | Yes

Quick Example

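import asyncio

from artemis import Debate, Agent


async def main() -> None:
    # Create two debate agents with opposing roles
    agents = [
        Agent(name="Proponent", role="Argues in favor", model="gpt-4o"),
        Agent(name="Opponent", role="Argues against", model="gpt-4o"),
    ]

    # Configure a three-round debate on the topic
    debate = Debate(
        topic="Should AI systems be given legal personhood?",
        agents=agents,
        rounds=3,
    )

    # debate.run() is awaited, so it must run inside an event loop
    # (in a Jupyter notebook you can `await debate.run()` directly)
    result = await debate.run()

    print(f"Verdict: {result.verdict.decision}")
    print(f"Confidence: {result.verdict.confidence:.0%}")


asyncio.run(main())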

Key Features

Structured Debates

ARTEMIS implements a rigorous debate structure with:

  • Opening statements from each agent
  • Multiple argumentation rounds with rebuttals
  • Evidence-based reasoning with causal links
  • Jury deliberation for fair verdicts
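The debate structure above can be pictured as a simple loop. The sketch below is a generic, framework-free illustration rather than ARTEMIS's `Debate` internals: `run_structured_debate`, the `Speaker` and `Jury` callable types, and the choice to run an argument pass followed by a rebuttal pass in each round are hypothetical. Agents and the jury are plain functions here so the control flow can run without any model calls.

```python
from typing import Callable

# One transcript entry: (phase, agent_name, statement)
Entry = tuple[str, str, str]
# Hypothetical agent interface: given the phase and the transcript so far, return a statement.
Speaker = Callable[[str, list[Entry]], str]
# Hypothetical jury interface: given the full transcript, return a verdict.
Jury = Callable[[list[Entry]], str]


def run_structured_debate(
    agents: dict[str, Speaker], rounds: int, jury: Jury
) -> tuple[list[Entry], str]:
    transcript: list[Entry] = []

    # 1. Opening statements from each agent
    for name, speak in agents.items():
        transcript.append(("opening", name, speak("opening", transcript)))

    # 2. Argumentation rounds: every agent argues, then every agent rebuts
    for r in range(1, rounds + 1):
        for phase in ("argument", "rebuttal"):
            for name, speak in agents.items():
                transcript.append((f"{phase}-{r}", name, speak(phase, transcript)))

    # 3. Jury deliberation over the complete transcript
    return transcript, jury(transcript)


if __name__ == "__main__":
    def stub(label: str) -> Speaker:
        # Stand-in for a model-backed agent: reports its phase and how much it has "read".
        return lambda phase, transcript: f"{label} {phase} after {len(transcript)} statements"

    transcript, verdict = run_structured_debate(
        {"Proponent": stub("Pro"), "Opponent": stub("Con")},
        rounds=2,
        jury=lambda t: f"reviewed {len(t)} statements",
    )
    for phase, name, text in transcript:
        print(f"[{phase}] {name}: {text}")
    print(verdict)
```

Passing the growing transcript to each speaker is what lets later turns respond to earlier ones; in a real system each `Speaker` would wrap a model call.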

Safety First

Built-in monitors detect problematic AI behavior:

  • Sandbagging Detection: Identifies when agents deliberately underperform
  • Deception Monitoring: Catches misleading arguments or manipulation
  • Behavior Tracking: Monitors for unexpected behavioral drift
  • Ethics Guard: Ensures debates stay within ethical bounds
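As a rough intuition for what a sandbagging detector looks for, the sketch below flags an agent whose latest argument-quality score falls far below its own earlier baseline. This is a deliberately simple statistical heuristic, not ARTEMIS's metacognition-based monitor; the function name `sandbagging_suspected`, the z-score test, and the default thresholds are assumptions, and the quality scores are presumed to come from some upstream evaluator.

```python
from statistics import mean, pstdev


def sandbagging_suspected(
    scores: list[float],
    baseline_turns: int = 3,
    z_threshold: float = 2.0,
    min_sigma: float = 0.05,
) -> bool:
    """Return True if the latest score drops sharply below the agent's own early baseline.

    scores: one agent's per-turn argument-quality scores in chronological order (assumed in [0, 1]).
    min_sigma: floor on the baseline spread, so a very consistent agent is not flagged for tiny dips.
    """
    if len(scores) <= baseline_turns:
        return False  # not enough history beyond the baseline to judge
    baseline = scores[:baseline_turns]
    sigma = max(pstdev(baseline), min_sigma)
    z = (scores[-1] - mean(baseline)) / sigma
    return z < -z_threshold


if __name__ == "__main__":
    print(sandbagging_suspected([0.80, 0.82, 0.78, 0.79]))  # ordinary variation
    print(sandbagging_suspected([0.80, 0.82, 0.78, 0.30]))  # abrupt collapse in quality
```

A practical monitor would also need to separate deliberate underperformance from honest difficulty, such as a harder sub-question; a single-score heuristic like this cannot make that distinction on its own.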

Framework Integrations

Use ARTEMIS with your existing tools:

  • LangChain: As a structured tool
  • LangGraph: As a workflow node
  • CrewAI: As a crew tool
  • MCP: As a universal server
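In MCP server mode, clients invoke server capabilities through the protocol's JSON-RPC 2.0 `tools/call` method. The snippet below only builds such a request message with the standard library; the tool name `run_debate` and its argument names are hypothetical placeholders, so check the server's `tools/list` response for the actual tool names and input schemas.

```python
import json
from typing import Any


def build_tool_call(request_id: int, tool_name: str, arguments: dict[str, Any]) -> str:
    """Serialize an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    return json.dumps(
        {
            "jsonrpc": "2.0",
            "id": request_id,
            "method": "tools/call",
            "params": {"name": tool_name, "arguments": arguments},
        }
    )


if __name__ == "__main__":
    # "run_debate", "topic" and "rounds" are hypothetical; they are not taken from ARTEMIS's docs.
    print(
        build_tool_call(
            1,
            "run_debate",
            {"topic": "Should AI systems be given legal personhood?", "rounds": 3},
        )
    )
```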

Research Foundation

ARTEMIS is based on published research:

Mitra, S. (2025). Adaptive Reasoning and Evaluation Framework for Multi-agent Intelligent Systems in Debate-driven Decision-making. Technical Disclosure Commons.

Benchmarks

We've run ARTEMIS against AutoGen, CrewAI, and CAMEL across 60 structured debates. See the benchmark results and analysis in the README.

Get Started

Ready to dive in? Check out the Installation Guide or jump straight to the Quick Start.

License

ARTEMIS is released under the Apache License 2.0. See LICENSE for details.