Beyond Benchmark Maxxing: Measuring Open Source Models as Real-World Agents (ultravox.ai)
1 point by zkoch 12 hours ago