A Warning Label on the Use of AI Safety Evaluations
Emerging research demonstrates that existing artificial intelligence (AI) pre-deployment safety evaluations frequently underestimate models’ potential to cause harm. Because many of these assessment tools are inherently unreliable, policymakers should use them cautiously, and such evaluations should not serve as the primary risk management strategy in AI governance frameworks.