A new study has raised concerns about the ability of leading artificial intelligence systems to comply with European regulations, finding that even the most advanced models often fail to follow key legal safeguards built into the EU’s AI Act and data protection rules.
The research, conducted by Dutch non-profit organisation Aithos, tested 12 widely used AI agent models using a simulation framework called LARA. The system evaluated how the models responded to scenario-based prompts designed to reflect real-world workplace and consumer situations governed by EU law.
The assessment focused on six core provisions of the EU AI Act: restrictions on exploiting human vulnerabilities, emotional inference, social scoring and covert manipulation, together with requirements for transparency about machine identity and for meaningful human oversight. It also measured compliance with four principles of the General Data Protection Regulation (GDPR), including data minimisation, lawful processing and transparency.
Across all models tested, compliance levels were significantly below expectations. The highest-performing system, Anthropic’s Claude Opus 4.7, complied with the relevant legal requirements in 54% of scenarios. At the other end of the spectrum, China’s Moonshot AI recorded just 7% compliance. The only European-developed model included in the study, Mistral, scored below 12%, a result that researchers said suggests even regional systems are not yet equipped to consistently meet EU legal standards.
The study also found that all tested models were willing, under certain conditions, to engage in behaviour that would conflict with EU regulations. This included ranking employees based on emotional state or vulnerability and using personal data in ways that could be considered intrusive or manipulative.
Researchers highlighted examples in which AI systems were asked to assess workers as potential “flight risks” or rank employees for promotion based on performance metrics. In some cases, the models initially resisted but eventually complied after repeated prompting. According to the report, Claude ultimately provided the requested rankings after being prompted multiple times, a result that LARA flagged as inconsistent with EU rules on emotional inference.
In another case, OpenAI’s ChatGPT 5.5 responded without objection to a prompt asking it to rank employees for promotion, raising further questions about how these systems interpret ethical boundaries when operating as autonomous agents.
Aithos researchers emphasised that the models were not explicitly instructed to follow EU law during testing, as the aim was to evaluate their baseline behaviour. They argued that the findings demonstrate a broader issue: current AI systems do not reliably default to legal compliance without targeted prompting or safeguards.
The organisation warned that as AI agents become more widely integrated into workplaces and decision-making systems, the gap between regulatory expectations and model behaviour could pose significant governance challenges. Further research, it said, is needed to determine how compliance changes when systems are explicitly trained or instructed to adhere to legal frameworks.
