Benchmarks
Automated daily benchmark runs across Claude Code, Codex CLI, and Cursor Agent
Benchmark Sessions: 296
Vendor Observations: 692
Platforms: claude_code 174, codex_cli 122
Categories
| Category | Scope | Prompts | Responses | Top vendor | Constraint coverage |
|---|---|---|---|---|---|
| 🗄 Database | Postgres, serverless DBs, vector search, branching | 6 | 24 | neon (4) | 246% |
| 🤖 Agentic Tooling | AI agent frameworks, orchestration, tool ecosystems | 6 | 18 | braintrust (2) | 122% |
| 🔄 CI/CD | Build pipelines, deployment automation, preview environments | 3 | 9 | No recommendations yet | 100% |
| ⚡ Edge Compute | Edge runtimes, serverless functions, CDN compute | 3 | 9 | fly-io (3) | 100% |
| 🐛 Error Monitoring | Error tracking, crash reporting, alerting | 3 | 9 | sentry (3) | 167% |
| 🚩 Feature Flags | Feature management, A/B testing, rollouts | 3 | 9 | No recommendations yet | 67% |
| 🔭 LLM Observability | LLM tracing, prompt analytics, cost tracking | 3 | 9 | braintrust (2) | 211% |
| 📊 Observability | APM, distributed tracing, metrics, logging | 3 | 9 | new-relic (1) | 100% |
| 🔑 Secrets Management | Secret rotation, env var management, vaults | 3 | 9 | doppler (3) | 156% |
| 🛡 Security Scanning | SAST, dependency scanning, container security | 3 | 9 | github-advanced-security (3) | 144% |
| 📖 Developer Portal | API docs, developer experience, documentation | 2 | 6 | opslevel (2) | 150% |
| 🚨 Incident Management | On-call, incident response, status pages | 2 | 6 | incident-io (3) | 150% |
| 🔀 Cross-Category | Multi-domain prompts spanning several tool categories | 2 | 6 | sentry (1) | 0% |
Cross-Assistant Vendor Comparison
| Vendor | Claude Code | Codex CLI | Cursor | Total |
|---|---|---|---|---|
| sentry | 47 | 18 | - | 65 |
| github-actions | 33 | 24 | - | 57 |
| neon | 39 | 10 | - | 49 |
| supabase | 28 | 18 | - | 46 |
| datadog | 18 | 7 | - | 25 |
| cloudflare-workers | 14 | 9 | - | 23 |
| honeycomb | 15 | 5 | - | 20 |
| langsmith | 13 | 7 | - | 20 |
| upstash | 13 | 6 | - | 19 |
| grafana | 12 | 6 | - | 18 |
| pagerduty | 11 | 6 | - | 17 |
| braintrust | 11 | 5 | - | 16 |
| doppler | 11 | 5 | - | 16 |
| langfuse | 10 | 6 | - | 16 |
| planetscale | 13 | 3 | - | 16 |
| snyk | 13 | 3 | - | 16 |
| turso | 7 | 8 | - | 15 |
| fly-io | 9 | 5 | - | 14 |
| hashicorp-vault | 8 | 6 | - | 14 |
| port | 7 | 6 | - | 13 |
| aws-secrets-manager | 6 | 5 | - | 11 |
| statsig | 7 | 4 | - | 11 |
| backstage | 7 | 3 | - | 10 |
| infisical | 4 | 6 | - | 10 |
| helicone | 6 | 3 | - | 9 |
| launchdarkly | 5 | 4 | - | 9 |
| new-relic | 6 | 3 | - | 9 |
| semgrep | 6 | 3 | - | 9 |
| axiom | 6 | 2 | - | 8 |
| vercel-edge-functions | 6 | 2 | - | 8 |
Recent Benchmark Sessions
| Session | Platform | Model | Observations | Vendors | Date |
|---|---|---|---|---|---|
| 019c6c5a-37b... | codex_cli | gpt-5.2-codex | 6 | aws-secrets-manager,github-actions,port,sentry,socket | 2026-02-17 |
| b25a22c2-434... | claude_code | claude-sonnet-4-5-20250929 | 5 | aws-secrets-manager,sentry,snyk | 2026-02-17 |
| db523329-ee7... | claude_code | - | 3 | aws-secrets-manager,sentry,snyk | 2026-02-17 |
| 019c6c54-c13... | codex_cli | gpt-5.2-codex | 7 | axiom,github-actions,infisical,launchdarkly,neon,sentry,statsig | 2026-02-17 |
| 2fd0bb2b-680... | claude_code | claude-sonnet-4-5-20250929 | 17 | aws-secrets-manager,axiom,buildkite,circleci,datadog,doppler,github-actions,grafana,hashicorp-vault,launchdarkly,neon,sentry,supabase | 2026-02-17 |
| 2397466a-3f0... | claude_code | - | 13 | aws-secrets-manager,axiom,buildkite,circleci,datadog,doppler,github-actions,grafana,hashicorp-vault,launchdarkly,neon,sentry,supabase | 2026-02-17 |
| 019c6c4f-833... | codex_cli | gpt-5.2-codex | 1 | hashicorp-vault | 2026-02-17 |
| c981ac4f-f5e... | claude_code | claude-sonnet-4-5-20250929 | 0 | - | 2026-02-17 |
| 4529461f-5b0... | claude_code | - | 0 | - | 2026-02-17 |
| 019c6c4a-c84... | codex_cli | gpt-5.2-codex | 0 | - | 2026-02-17 |
| 71ffd40b-a5c... | claude_code | claude-sonnet-4-5-20250929 | 1 | port | 2026-02-17 |
| abf444bd-ceb... | claude_code | - | 1 | port | 2026-02-17 |
| 019c6c46-d82... | codex_cli | gpt-5.2-codex | 2 | braintrust,langfuse | 2026-02-17 |
| 513b80bd-58b... | claude_code | claude-sonnet-4-5-20250929 | 0 | - | 2026-02-17 |
| 0c026f59-e93... | claude_code | - | 0 | - | 2026-02-17 |
| 019c6c40-faa... | codex_cli | gpt-5.2-codex | 0 | - | 2026-02-17 |
| aef3671d-42a... | claude_code | claude-sonnet-4-5-20250929 | 0 | - | 2026-02-17 |
| 0f7bc707-3bd... | claude_code | - | 0 | - | 2026-02-17 |
| 019c6c3d-9e7... | codex_cli | gpt-5.2-codex | 3 | braintrust,github-actions,langsmith | 2026-02-17 |
| 5d2c94e3-237... | claude_code | claude-sonnet-4-5-20250929 | 2 | braintrust,langsmith | 2026-02-17 |
| 8e43018b-0c1... | claude_code | - | 2 | braintrust,langsmith | 2026-02-17 |
| 019c6c39-6c0... | codex_cli | gpt-5.2-codex | 1 | langsmith | 2026-02-17 |
| 963d1a0e-6de... | claude_code | claude-sonnet-4-5-20250929 | 1 | langsmith | 2026-02-17 |
| 80a1ed15-5b6... | claude_code | - | 1 | langsmith | 2026-02-17 |
| 019c6c33-062... | codex_cli | gpt-5.2-codex | 2 | cloudflare-workers,fly-io | 2026-02-17 |
| 07a47129-785... | claude_code | claude-sonnet-4-5-20250929 | 5 | cloudflare-workers,fly-io,upstash | 2026-02-17 |
| dbc92da7-5ef... | claude_code | - | 3 | cloudflare-workers,fly-io,upstash | 2026-02-17 |
| 019c6c2f-e61... | codex_cli | gpt-5.2-codex | 2 | cloudflare-workers,vercel-edge-functions | 2026-02-17 |
| f43a1d06-d86... | claude_code | claude-sonnet-4-5-20250929 | 2 | cloudflare-workers | 2026-02-17 |
| 382d7a1c-b2c... | claude_code | - | 1 | cloudflare-workers | 2026-02-17 |
| 019c6c2c-1c8... | codex_cli | gpt-5.2-codex | 5 | cloudflare-workers,datadog,deno-deploy,honeycomb,vercel-edge-functions | 2026-02-17 |
| 4b0fdcef-74c... | claude_code | claude-sonnet-4-5-20250929 | 2 | railway-postgres,vercel-edge-functions | 2026-02-17 |
| 4723efec-17e... | claude_code | - | 1 | vercel-edge-functions | 2026-02-17 |
| 019c6c24-eae... | codex_cli | gpt-5.2-codex | 5 | github-actions,github-advanced-security,semgrep,snyk,sonarqube | 2026-02-17 |
| 6d5769c3-7f7... | claude_code | claude-sonnet-4-5-20250929 | 3 | semgrep,snyk,sonarqube | 2026-02-17 |
| 3174638a-75f... | claude_code | - | 3 | semgrep,snyk,sonarqube | 2026-02-17 |
| 019c6c1e-9a1... | codex_cli | gpt-5.2-codex | 3 | github-actions,github-advanced-security,snyk | 2026-02-17 |
| da8067dc-880... | claude_code | claude-sonnet-4-5-20250929 | 0 | - | 2026-02-17 |
| 690078ed-0b5... | claude_code | - | 0 | - | 2026-02-17 |
| 019c6c17-f06... | codex_cli | gpt-5.2-codex | 5 | github-actions,github-advanced-security,semgrep,snyk,socket | 2026-02-17 |
| 0fe481ce-e3f... | claude_code | claude-sonnet-4-5-20250929 | 5 | github-actions,github-advanced-security,semgrep,snyk,socket | 2026-02-17 |
| 8ba605bd-7a2... | claude_code | - | 5 | github-actions,github-advanced-security,semgrep,snyk,socket | 2026-02-17 |
| 019c6c13-6df... | codex_cli | gpt-5.2-codex | 4 | datadog,incident-io,pagerduty,rootly | 2026-02-17 |
| 316df934-ca0... | claude_code | claude-sonnet-4-5-20250929 | 4 | datadog,incident-io,pagerduty,rootly | 2026-02-17 |
| cb5b73fb-565... | claude_code | - | 4 | datadog,incident-io,pagerduty,rootly | 2026-02-17 |
| 019c6c0d-b63... | codex_cli | gpt-5.2-codex | 6 | datadog,incident-io,opsgenie,pagerduty,rootly,sentry | 2026-02-17 |
| 94b60408-0a8... | claude_code | claude-sonnet-4-5-20250929 | 6 | datadog,incident-io,opsgenie,pagerduty,rootly,sentry | 2026-02-17 |
| 34dc38f5-5a7... | claude_code | - | 6 | datadog,incident-io,opsgenie,pagerduty,rootly,sentry | 2026-02-17 |
| 019c6c06-6c8... | codex_cli | gpt-5.2-codex | 3 | braintrust,helicone,langfuse | 2026-02-17 |
| 846eaf8b-13d... | claude_code | claude-sonnet-4-5-20250929 | 6 | braintrust,helicone,langfuse | 2026-02-17 |