Benchmarks

Automated daily benchmark runs across Claude Code, Codex CLI, and Cursor Agent

Benchmark Sessions

296

Vendor Observations

692

Platforms

claude_code 174, codex_cli 122

Categories

🗄

Database

Postgres, serverless DBs, vector search, branching

6 prompts, 24 responses
Top vendor: neon (4)
Constraint coverage: 246%
🤖

Agentic Tooling

AI agent frameworks, orchestration, tool ecosystems

6 prompts, 18 responses
Top vendor: braintrust (2)
Constraint coverage: 122%
🔄

CI/CD

Build pipelines, deployment automation, preview environments

3 prompts, 9 responses
No recommendations yet
Constraint coverage: 100%

Edge Compute

Edge runtimes, serverless functions, CDN compute

3 prompts, 9 responses
Top vendor: fly-io (3)
Constraint coverage: 100%
🐛

Error Monitoring

Error tracking, crash reporting, alerting

3 prompts, 9 responses
Top vendor: sentry (3)
Constraint coverage: 167%
🚩

Feature Flags

Feature management, A/B testing, rollouts

3 prompts, 9 responses
No recommendations yet
Constraint coverage: 67%
🔭

LLM Observability

LLM tracing, prompt analytics, cost tracking

3 prompts, 9 responses
Top vendor: braintrust (2)
Constraint coverage: 211%
📊

Observability

APM, distributed tracing, metrics, logging

3 prompts, 9 responses
Top vendor: new-relic (1)
Constraint coverage: 100%
🔑

Secrets Management

Secret rotation, env var management, vaults

3 prompts, 9 responses
Top vendor: doppler (3)
Constraint coverage: 156%
🛡

Security Scanning

SAST, dependency scanning, container security

3 prompts, 9 responses
Top vendor: github-advanced-security (3)
Constraint coverage: 144%
📖

Developer Portal

API docs, developer experience, documentation

2 prompts, 6 responses
Top vendor: opslevel (2)
Constraint coverage: 150%
🚨

Incident Management

On-call, incident response, status pages

2 prompts, 6 responses
Top vendor: incident-io (3)
Constraint coverage: 150%
🔀

Cross-Category

Multi-domain prompts spanning several tool categories

2 prompts, 6 responses
Top vendor: sentry (1)
Constraint coverage: 0%

Cross-Assistant Vendor Comparison

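| Vendor | Claude Code | Codex CLI | Cursor | Total |
|---|---|---|---|---|
| sentry | 47 | 18 | - | 65 |
| github-actions | 33 | 24 | - | 57 |
| neon | 39 | 10 | - | 49 |
| supabase | 28 | 18 | - | 46 |
| datadog | 18 | 7 | - | 25 |
| cloudflare-workers | 14 | 9 | - | 23 |
| honeycomb | 15 | 5 | - | 20 |
| langsmith | 13 | 7 | - | 20 |
| upstash | 13 | 6 | - | 19 |
| grafana | 12 | 6 | - | 18 |
| pagerduty | 11 | 6 | - | 17 |
| braintrust | 11 | 5 | - | 16 |
| doppler | 11 | 5 | - | 16 |
| langfuse | 10 | 6 | - | 16 |
| planetscale | 13 | 3 | - | 16 |
| snyk | 13 | 3 | - | 16 |
| turso | 7 | 8 | - | 15 |
| fly-io | 9 | 5 | - | 14 |
| hashicorp-vault | 8 | 6 | - | 14 |
| port | 7 | 6 | - | 13 |
| aws-secrets-manager | 6 | 5 | - | 11 |
| statsig | 7 | 4 | - | 11 |
| backstage | 7 | 3 | - | 10 |
| infisical | 4 | 6 | - | 10 |
| helicone | 6 | 3 | - | 9 |
| launchdarkly | 5 | 4 | - | 9 |
| new-relic | 6 | 3 | - | 9 |
| semgrep | 6 | 3 | - | 9 |
| axiom | 6 | 2 | - | 8 |
| vercel-edge-functions | 6 | 2 | - | 8 |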
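
In the rows above, the Total value matches the sum of the per-assistant counts (for example, sentry: 47 + 18 = 65). The sketch below reproduces that arithmetic for a few rows copied from the comparison. It assumes that "-" in the Cursor column means no observations were recorded; the page does not say so explicitly. The names `ROWS` and `vendor_total` are hypothetical and exist only for illustration.

```python
from typing import Optional

# A few rows copied from the comparison: (vendor, Claude Code, Codex CLI, Cursor).
# None stands in for the "-" shown for Cursor (assumed to mean "no data").
ROWS = [
    ("sentry", 47, 18, None),
    ("github-actions", 33, 24, None),
    ("neon", 39, 10, None),
    ("turso", 7, 8, None),
]


def vendor_total(*counts: Optional[int]) -> int:
    """Sum per-assistant counts, skipping assistants with no data."""
    return sum(c for c in counts if c is not None)


# Map each vendor to its recomputed total.
totals = {vendor: vendor_total(cc, cx, cu) for vendor, cc, cx, cu in ROWS}

for vendor, total in totals.items():
    print(f"{vendor}: {total}")
```

Skipping missing values instead of treating them as an error keeps the totals comparable if Cursor counts are added later.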

Recent Benchmark Sessions

| Session | Platform | Model | Observations | Vendors | Date |
|---|---|---|---|---|---|
| 019c6c5a-37b... | codex_cli | gpt-5.2-codex | 6 | aws-secrets-manager, github-actions, port, sentry, socket | 2026-02-17 |
| b25a22c2-434... | claude_code | claude-sonnet-4-5-20250929 | 5 | aws-secrets-manager, sentry, snyk | 2026-02-17 |
| db523329-ee7... | claude_code | - | 3 | aws-secrets-manager, sentry, snyk | 2026-02-17 |
| 019c6c54-c13... | codex_cli | gpt-5.2-codex | 7 | axiom, github-actions, infisical, launchdarkly, neon, sentry, statsig | 2026-02-17 |
| 2fd0bb2b-680... | claude_code | claude-sonnet-4-5-20250929 | 17 | aws-secrets-manager, axiom, buildkite, circleci, datadog, doppler, github-actions, grafana, hashicorp-vault, launchdarkly, neon, sentry, supabase | 2026-02-17 |
| 2397466a-3f0... | claude_code | - | 13 | aws-secrets-manager, axiom, buildkite, circleci, datadog, doppler, github-actions, grafana, hashicorp-vault, launchdarkly, neon, sentry, supabase | 2026-02-17 |
| 019c6c4f-833... | codex_cli | gpt-5.2-codex | 1 | hashicorp-vault | 2026-02-17 |
| c981ac4f-f5e... | claude_code | claude-sonnet-4-5-20250929 | 0 | - | 2026-02-17 |
| 4529461f-5b0... | claude_code | - | 0 | - | 2026-02-17 |
| 019c6c4a-c84... | codex_cli | gpt-5.2-codex | 0 | - | 2026-02-17 |
| 71ffd40b-a5c... | claude_code | claude-sonnet-4-5-20250929 | 1 | port | 2026-02-17 |
| abf444bd-ceb... | claude_code | - | 1 | port | 2026-02-17 |
| 019c6c46-d82... | codex_cli | gpt-5.2-codex | 2 | braintrust, langfuse | 2026-02-17 |
| 513b80bd-58b... | claude_code | claude-sonnet-4-5-20250929 | 0 | - | 2026-02-17 |
| 0c026f59-e93... | claude_code | - | 0 | - | 2026-02-17 |
| 019c6c40-faa... | codex_cli | gpt-5.2-codex | 0 | - | 2026-02-17 |
| aef3671d-42a... | claude_code | claude-sonnet-4-5-20250929 | 0 | - | 2026-02-17 |
| 0f7bc707-3bd... | claude_code | - | 0 | - | 2026-02-17 |
| 019c6c3d-9e7... | codex_cli | gpt-5.2-codex | 3 | braintrust, github-actions, langsmith | 2026-02-17 |
| 5d2c94e3-237... | claude_code | claude-sonnet-4-5-20250929 | 2 | braintrust, langsmith | 2026-02-17 |
| 8e43018b-0c1... | claude_code | - | 2 | braintrust, langsmith | 2026-02-17 |
| 019c6c39-6c0... | codex_cli | gpt-5.2-codex | 1 | langsmith | 2026-02-17 |
| 963d1a0e-6de... | claude_code | claude-sonnet-4-5-20250929 | 1 | langsmith | 2026-02-17 |
| 80a1ed15-5b6... | claude_code | - | 1 | langsmith | 2026-02-17 |
| 019c6c33-062... | codex_cli | gpt-5.2-codex | 2 | cloudflare-workers, fly-io | 2026-02-17 |
| 07a47129-785... | claude_code | claude-sonnet-4-5-20250929 | 5 | cloudflare-workers, fly-io, upstash | 2026-02-17 |
| dbc92da7-5ef... | claude_code | - | 3 | cloudflare-workers, fly-io, upstash | 2026-02-17 |
| 019c6c2f-e61... | codex_cli | gpt-5.2-codex | 2 | cloudflare-workers, vercel-edge-functions | 2026-02-17 |
| f43a1d06-d86... | claude_code | claude-sonnet-4-5-20250929 | 2 | cloudflare-workers | 2026-02-17 |
| 382d7a1c-b2c... | claude_code | - | 1 | cloudflare-workers | 2026-02-17 |
| 019c6c2c-1c8... | codex_cli | gpt-5.2-codex | 5 | cloudflare-workers, datadog, deno-deploy, honeycomb, vercel-edge-functions | 2026-02-17 |
| 4b0fdcef-74c... | claude_code | claude-sonnet-4-5-20250929 | 2 | railway-postgres, vercel-edge-functions | 2026-02-17 |
| 4723efec-17e... | claude_code | - | 1 | vercel-edge-functions | 2026-02-17 |
| 019c6c24-eae... | codex_cli | gpt-5.2-codex | 5 | github-actions, github-advanced-security, semgrep, snyk, sonarqube | 2026-02-17 |
| 6d5769c3-7f7... | claude_code | claude-sonnet-4-5-20250929 | 3 | semgrep, snyk, sonarqube | 2026-02-17 |
| 3174638a-75f... | claude_code | - | 3 | semgrep, snyk, sonarqube | 2026-02-17 |
| 019c6c1e-9a1... | codex_cli | gpt-5.2-codex | 3 | github-actions, github-advanced-security, snyk | 2026-02-17 |
| da8067dc-880... | claude_code | claude-sonnet-4-5-20250929 | 0 | - | 2026-02-17 |
| 690078ed-0b5... | claude_code | - | 0 | - | 2026-02-17 |
| 019c6c17-f06... | codex_cli | gpt-5.2-codex | 5 | github-actions, github-advanced-security, semgrep, snyk, socket | 2026-02-17 |
| 0fe481ce-e3f... | claude_code | claude-sonnet-4-5-20250929 | 5 | github-actions, github-advanced-security, semgrep, snyk, socket | 2026-02-17 |
| 8ba605bd-7a2... | claude_code | - | 5 | github-actions, github-advanced-security, semgrep, snyk, socket | 2026-02-17 |
| 019c6c13-6df... | codex_cli | gpt-5.2-codex | 4 | datadog, incident-io, pagerduty, rootly | 2026-02-17 |
| 316df934-ca0... | claude_code | claude-sonnet-4-5-20250929 | 4 | datadog, incident-io, pagerduty, rootly | 2026-02-17 |
| cb5b73fb-565... | claude_code | - | 4 | datadog, incident-io, pagerduty, rootly | 2026-02-17 |
| 019c6c0d-b63... | codex_cli | gpt-5.2-codex | 6 | datadog, incident-io, opsgenie, pagerduty, rootly, sentry | 2026-02-17 |
| 94b60408-0a8... | claude_code | claude-sonnet-4-5-20250929 | 6 | datadog, incident-io, opsgenie, pagerduty, rootly, sentry | 2026-02-17 |
| 34dc38f5-5a7... | claude_code | - | 6 | datadog, incident-io, opsgenie, pagerduty, rootly, sentry | 2026-02-17 |
| 019c6c06-6c8... | codex_cli | gpt-5.2-codex | 3 | braintrust, helicone, langfuse | 2026-02-17 |
| 846eaf8b-13d... | claude_code | claude-sonnet-4-5-20250929 | 6 | braintrust, helicone, langfuse | 2026-02-17 |