Salesforce/GiftEvalParquet
LoCoBench-Agent: An Interactive Benchmark for LLM Agents in Long-Context Software Engineering
MMPersuade: A Dataset and Evaluation Framework for Multimodal Persuasion