Hundreds of contractors on a Meta project posed as teenagers to test how ChatGPT, Gemini and Character.AI handle suicide, drugs and sex, WIRED found.
AI benchmark cheating has been theorized as an inevitable consequence of training capable optimizers against fixed metrics. With OpenAI's GPT-5.6 Sol, the theory arrived in full view. The nonprofit ...