Scale AI

company

Verified

https://scale.com/

scale_ai

Papers

Agentic Rubrics as Contextual Verifiers for SWE Agents

ResearchRubrics: A Benchmark of Prompts and Rubrics For Evaluating Deep Research Agents

anisha2102

authored 2 papers about 5 hours ago

Detecting and Preventing Hallucinations in Large Vision Language Models

Paper • 2308.06394 • Published Aug 11, 2023

Rubrics as Rewards: Reinforcement Learning Beyond Verifiable Domains

Paper • 2507.17746 • Published Jul 23, 2025 • 3

taesiri

submitted a paper to Daily Papers 1 day ago

Agentic Rubrics as Contextual Verifiers for SWE Agents

Paper • 2601.04171 • Published 2 days ago • 7

bhertz

updated a dataset 21 days ago

ScaleAI/MCP-Atlas

Viewer • Updated 21 days ago • 500 • 254 • 6

agosai

updated a dataset 22 days ago

ScaleAI/audiomc

Viewer • Updated 22 days ago • 452 • 162 • 2

agosai

updated a collection 22 days ago

AudioMultiChallenge

1 item • Updated 22 days ago

bhertz

published a dataset 23 days ago

ScaleAI/MCP-Atlas

Viewer • Updated 21 days ago • 500 • 254 • 6

pmannam

published a dataset 23 days ago

ScaleAI/robotics-meerkat

Updated 23 days ago • 3

manasisharma

authored 4 papers about 2 months ago

BEHAVIOR-1K: A Human-Centered, Embodied AI Benchmark with 1,000 Everyday Activities and Realistic Simulation

Paper • 2403.09227 • Published Mar 14, 2024 • 1

Adaptive Guidance Accelerates Reinforcement Learning of Reasoning Models

Paper • 2506.13923 • Published Jun 16, 2025

Remote Labor Index: Measuring AI Automation of Remote Work

Paper • 2510.26787 • Published Oct 30, 2025 • 5

ResearchRubrics: A Benchmark of Prompts and Rubrics For Evaluating Deep Research Agents

Paper • 2511.07685 • Published Nov 10, 2025 • 9

mhr2004

authored a paper 3 months ago

Online Rubrics Elicitation from Pairwise Comparisons

Paper • 2510.07284 • Published Oct 8, 2025 • 1

jda

authored a paper 4 months ago

SWE-Bench Pro: Can AI Agents Solve Long-Horizon Software Engineering Tasks?

Paper • 2509.16941 • Published Sep 21, 2025 • 21

calvincbzhang

authored a paper 4 months ago

SWE-Bench Pro: Can AI Agents Solve Long-Horizon Software Engineering Tasks?

Paper • 2509.16941 • Published Sep 21, 2025 • 21

rssatscale

authored a paper 4 months ago

CWCL: Cross-Modal Transfer with Continuously Weighted Contrastive Loss

Paper • 2309.14580 • Published Sep 26, 2023 • 1

kaus-scale

authored a paper 7 months ago

FORTRESS: Frontier Risk Evaluation for National Security and Public Safety

Paper • 2506.14922 • Published Jun 17, 2025

seanshi-scale

authored a paper 12 months ago

Humanity's Last Exam

Paper • 2501.14249 • Published Jan 24, 2025 • 77

hugh-scale

authored a paper 12 months ago

Humanity's Last Exam

Paper • 2501.14249 • Published Jan 24, 2025 • 77

ZifanScale

authored a paper about 1 year ago

Representation Engineering: A Top-Down Approach to AI Transparency

Paper • 2310.01405 • Published Oct 2, 2023 • 7