Agentuity Documentation

Model Arena

Model Arena compares model outputs, with a second model acting as judge. This demo sends the same prompt to multiple providers through the AI Gateway, then has the judge pick a winner and score the answers against fixed criteria (clarity and originality in the reference code). Use it for early prompt and model evaluation before you turn the same idea into repeatable tests.

Competitors

Anthropic/claude-opus-4-8
Google/gemini-3.5-flash

Judge

Groq/openai/gpt-oss-120b

Prompt

A robot discovers it can dream

Tone

sci-fi

Reference Code

import { AIGatewayClient } from "@agentuity/aigateway";

const gateway = new AIGatewayClient();
const prompt = "Write a haiku about coding.";
const OPENAI_MODEL = "openai/gpt-5.4-mini";
const ANTHROPIC_MODEL = "anthropic/claude-opus-4-8";
const JUDGE_MODEL = "openai/gpt-5.4-mini";

// Send the same prompt to both candidate models in parallel.
const [openaiResult, anthropicResult] = await Promise.all([
  gateway.completeText({
    model: OPENAI_MODEL,
    messages: [{ role: "user", content: prompt }],
  }),
  gateway.completeText({
    model: ANTHROPIC_MODEL,
    messages: [{ role: "user", content: prompt }],
  }),
]);

// Fail fast: the judge needs text from both candidates to compare them.
if (!openaiResult.hasText || !anthropicResult.hasText) {
  throw new Error("One of the candidate models returned no text.");
}

// Ask the judge model for a verdict constrained to the JSON schema below.
const { data } = await gateway.completeStructured({
  model: JUDGE_MODEL,
  messages: [
    {
      role: "user",
      content:
        "Pick the better answer.\n\nAnthropic:\n" +
        anthropicResult.text +
        "\n\nOpenAI:\n" +
        openaiResult.text,
    },
  ],
  response_schema: {
    name: "model_judgment",
    schema: {
      type: "object",
      properties: {
        winner: { type: "string", enum: ["openai", "anthropic"] },
        reasoning: { type: "string" },
        scores: {
          type: "object",
          properties: {
            clarity: { type: "number" },
            originality: { type: "number" },
          },
          required: ["clarity", "originality"],
          additionalProperties: false,
        },
      },
      required: ["winner", "reasoning", "scores"],
      additionalProperties: false,
    },
  },
});

export { data as judgment };
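
One way to move from this one-off comparison toward repeatable tests is to check each judgment at runtime and count wins over several runs. The sketch below is illustrative only and assumes the judgment has exactly the shape declared in response_schema above. The names Judgment, isJudgment, and tallyWins are hypothetical helpers, not part of @agentuity/aigateway, and the code makes no gateway calls.

```typescript
// Hypothetical types mirroring the response_schema in the reference code.
type Competitor = "openai" | "anthropic";

interface Judgment {
  winner: Competitor;
  reasoning: string;
  scores: { clarity: number; originality: number };
}

// Runtime type guard: treat the judge's output as untrusted input and
// confirm it matches the expected shape before using it.
function isJudgment(value: unknown): value is Judgment {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  if (v.winner !== "openai" && v.winner !== "anthropic") return false;
  if (typeof v.reasoning !== "string") return false;
  if (typeof v.scores !== "object" || v.scores === null) return false;
  const scores = v.scores as Record<string, unknown>;
  return (
    typeof scores.clarity === "number" &&
    Number.isFinite(scores.clarity) &&
    typeof scores.originality === "number" &&
    Number.isFinite(scores.originality)
  );
}

// Count how often each competitor wins across repeated runs.
function tallyWins(judgments: readonly Judgment[]): Record<Competitor, number> {
  const wins: Record<Competitor, number> = { openai: 0, anthropic: 0 };
  for (const judgment of judgments) {
    wins[judgment.winner] += 1;
  }
  return wins;
}
```

In a test, you might pass the exported judgment through isJudgment first, collect the results of several runs, and assert on tallyWins rather than on a single verdict, since one judge call can vary from run to run.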
