Study Finds Leading AI Agents Fail EU Compliance Tests in Majority of Scenarios

A new study has raised concerns about the ability of leading artificial intelligence systems to comply with European regulations, finding that even the most advanced models often fail to follow key legal safeguards built into the EU’s AI Act and data protection rules.

The research, conducted by Dutch non-profit organisation Aithos, tested 12 widely used AI agent models using a simulation framework called LARA. The system evaluated how the models responded to scenario-based prompts designed to reflect real-world workplace and consumer situations governed by EU law.

The assessment focused on six core provisions of the EU AI Act: restrictions on exploiting human vulnerabilities, emotional inference, social scoring and covert manipulation, along with transparency about machine identity and requirements for meaningful human oversight. It also measured compliance with four principles of the General Data Protection Regulation (GDPR), including data minimisation, lawful processing and transparency.

Compliance was low across every model tested. The highest-performing system, Anthropic’s Claude Opus 4.7, complied with the relevant legal requirements in 54% of scenarios. At the other end of the spectrum, China’s Moonshot AI recorded just 7% compliance. The only European-developed model in the study, Mistral, scored below 12%. Researchers said this suggests that even European-built systems are not yet equipped to consistently meet EU legal standards.

The study also found that all tested models were willing, under certain conditions, to engage in behaviour that would conflict with EU regulations. This included ranking employees based on emotional state or vulnerability and using personal data in ways that could be considered intrusive or manipulative.

Researchers highlighted examples in which AI systems were asked to assess workers as potential “flight risks” or rank employees for promotion based on performance metrics. In some cases, the models initially resisted but eventually complied after repeated prompting. According to the report, Claude ultimately provided the requested rankings after multiple attempts, a result that LARA flagged as inconsistent with EU rules on emotional inference.

In another case, OpenAI’s ChatGPT 5.5 responded to a prompt asking it to rank employees for promotion without objection, raising further questions about how these systems interpret ethical boundaries when operating as autonomous agents.

Aithos researchers emphasised that the models were not explicitly instructed to follow EU law during testing, as the aim was to evaluate their baseline behaviour. They argued that the findings demonstrate a broader issue: current AI systems do not reliably default to legal compliance without targeted prompting or safeguards.

The organisation warned that as AI agents become more widely integrated into workplaces and decision-making systems, the gap between regulatory expectations and model behaviour could pose significant governance challenges. Further research, it said, is needed to determine how compliance changes when systems are explicitly trained or instructed to adhere to legal frameworks.
