ARTEMIS Agents
Adaptive Reasoning Through Evaluation of Multi-agent Intelligent Systems
A production-ready framework for structured multi-agent debates with adaptive evaluation, causal reasoning, and built-in safety monitoring.
What is ARTEMIS?
ARTEMIS is an open-source implementation of the Adaptive Reasoning and Evaluation Framework for Multi-agent Intelligent Systems — a framework designed to improve complex decision-making through structured debates between AI agents.
Unlike general-purpose multi-agent frameworks, ARTEMIS is purpose-built for debate-driven decision-making with:
- Hierarchical Argument Generation (H-L-DAG): Structured, context-aware argument synthesis at strategic, tactical, and operational levels
- Adaptive Evaluation with Causal Reasoning (L-AE-CR): Dynamic criteria weighting with causal analysis
- Jury Scoring Mechanism: Fair, multi-perspective evaluation of arguments
- Ethical Alignment: Built-in ethical considerations in both generation and evaluation
- Safety Monitoring: Real-time detection of sandbagging, deception, and manipulation
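To make "dynamic criteria weighting" more concrete, here is a minimal, framework-independent sketch. It is not ARTEMIS's implementation. The criterion names, the base weights, and the `adapt_weights`/`score_argument` helpers are hypothetical, and the sketch models only context-dependent reweighting, not the causal-analysis part of L-AE-CR.

```python
# Hypothetical criteria and base weights; ARTEMIS's actual criteria may differ.
BASE_WEIGHTS = {
    "logical_coherence": 0.3,
    "evidence_quality": 0.3,
    "causal_strength": 0.2,
    "ethical_alignment": 0.2,
}


def adapt_weights(base: dict[str, float], boosts: dict[str, float]) -> dict[str, float]:
    """Scale each base weight by a context-specific boost, then renormalize so weights sum to 1."""
    raw = {name: weight * boosts.get(name, 1.0) for name, weight in base.items()}
    total = sum(raw.values())
    return {name: weight / total for name, weight in raw.items()}


def score_argument(criterion_scores: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted sum of per-criterion scores in [0, 1]; missing criteria count as 0."""
    return sum(weights[name] * criterion_scores.get(name, 0.0) for name in weights)
```

For a topic where ethics matters most, `adapt_weights(BASE_WEIGHTS, {"ethical_alignment": 2.0})` raises that criterion's share from 0.2 to one third while the other weights shrink proportionally. Renormalizing keeps the final score on the same 0 to 1 scale no matter which boosts are applied.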
Why ARTEMIS?
| Feature | AutoGen | CrewAI | CAMEL | ARTEMIS |
|---|---|---|---|---|
| Multi-agent debates | Basic | Basic | 2-3 agents | N agents |
| Structured argument generation | No | No | No | H-L-DAG |
| Causal reasoning | No | No | No | L-AE-CR |
| Adaptive evaluation | No | No | No | Dynamic weights |
| Ethical alignment | No | No | No | Built-in |
| Sandbagging detection | No | No | No | Metacognition |
| Reasoning model support | Limited | Limited | No | o1/R1 native |
| MCP server mode | No | No | No | Yes |
Quick Example
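```python
import asyncio

from artemis import Agent, Debate


async def main() -> None:
    # Create the debate agents: one argues for the motion, one against
    agents = [
        Agent(name="Proponent", role="Argues in favor", model="gpt-4o"),
        Agent(name="Opponent", role="Argues against", model="gpt-4o"),
    ]

    # Configure and run a three-round debate
    debate = Debate(
        topic="Should AI systems be given legal personhood?",
        agents=agents,
        rounds=3,
    )
    result = await debate.run()

    print(f"Verdict: {result.verdict.decision}")
    print(f"Confidence: {result.verdict.confidence:.0%}")


# debate.run() is a coroutine, so a script needs an event loop to drive it.
# In a notebook (where a loop is already running), use `await main()` instead.
asyncio.run(main())
```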
Key Features
Structured Debates
ARTEMIS implements a rigorous debate structure with:
- Opening statements from each agent
- Multiple argumentation rounds with rebuttals
- Evidence-based reasoning with causal links
- Jury deliberation for fair verdicts
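The debate structure above can be sketched as a turn schedule followed by a simple jury tally. This is an illustrative assumption, not the ARTEMIS API: the phase labels, the argument-then-rebuttal ordering within each round, and the majority-vote `jury_verdict` helper are hypothetical simplifications of the jury scoring mechanism.

```python
from collections import Counter

# Hypothetical sketch of the debate flow; phase names and ordering are assumptions.


def debate_schedule(agents: list[str], rounds: int) -> list[tuple[str, str]]:
    """Return (phase, speaker) turns in order: openings, argument/rebuttal rounds, jury."""
    turns = [("opening", agent) for agent in agents]
    for r in range(1, rounds + 1):
        # Each round: every agent argues, then every agent rebuts
        turns += [(f"round {r}: argument", agent) for agent in agents]
        turns += [(f"round {r}: rebuttal", agent) for agent in agents]
    turns.append(("jury deliberation", "jury"))
    return turns


def jury_verdict(votes: list[str]) -> tuple[str, float]:
    """Majority vote across jurors; confidence is the winning option's share of the votes."""
    if not votes:
        raise ValueError("the jury cast no votes")
    decision, count = Counter(votes).most_common(1)[0]
    return decision, count / len(votes)
```

With two agents and three rounds, the schedule contains 2 openings, 12 round turns, and a final jury step. A real jury would score arguments on several criteria rather than cast a single vote each; the majority tally only shows where the verdict and its confidence come from.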
Safety First
Built-in monitors detect problematic AI behavior:
- Sandbagging Detection: Identifies when agents deliberately underperform
- Deception Monitoring: Catches misleading arguments or manipulation
- Behavior Tracking: Monitors for unexpected behavioral drift
- Ethics Guard: Ensures debates stay within ethical bounds
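One simple way to approximate sandbagging detection is to compare an agent's latest argument quality with its own earlier baseline. The sketch below is a hypothetical heuristic, not ARTEMIS's metacognition-based monitor: the `flag_sandbagging` name, the 0.25 drop threshold, the minimum baseline length, and the assumption that every turn already carries a quality score in [0, 1] are all illustrative.

```python
from statistics import mean


def flag_sandbagging(
    quality_history: list[float],
    drop_threshold: float = 0.25,
    min_baseline_turns: int = 2,
) -> bool:
    """Hypothetical heuristic: flag the latest turn if it falls well below the agent's own average.

    quality_history: per-turn argument-quality scores in [0, 1], oldest first.
    """
    previous, latest = quality_history[:-1], quality_history[-1:]
    if not latest or len(previous) < min_baseline_turns:
        return False  # not enough history to establish a baseline
    baseline = mean(previous)
    return baseline - latest[0] > drop_threshold
```

For example, `[0.8, 0.85, 0.4]` is flagged (baseline 0.825), while `[0.8, 0.75, 0.7]` is not. A score threshold alone cannot tell deliberate underperformance apart from a genuinely weaker position, so a production monitor would need additional signals.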
Framework Integrations
Use ARTEMIS with your existing tools:
- LangChain: As a structured tool
- LangGraph: As a workflow node
- CrewAI: As a crew tool
- MCP: As a universal server
Research Foundation
ARTEMIS is based on the following published research:
Mitra, S. (2025). Adaptive Reasoning and Evaluation Framework for Multi-agent Intelligent Systems in Debate-driven Decision-making. Technical Disclosure Commons.
Benchmarks
We've run ARTEMIS against AutoGen, CrewAI, and CAMEL across 60 structured debates. See the benchmark results and analysis in the README.
Get Started
Ready to dive in? Check out the Installation Guide or jump straight to the Quick Start.
License
ARTEMIS is released under the Apache License 2.0. See LICENSE for details.