17/03/2026
https://fastdatascience.com/generative-ai/openai-vs-claude-vs-qwen/
How well do the newest large language models perform? How do they compare to 2025's offerings?
We evaluated the newest large language models on a law test prepared by Eugenio Vaccari. The Chinese entrants DeepSeek and Qwen have challenged the dominance of GPT, although most UK users use GPT, Gemini, and Claude. Scores are heading towards 80% on the law test when the bots are combined with a RAG system (a database of English insolvency statutes and case law), up from only around 30% two years ago. What is fascinating is that the Chinese models are delivering similar performance to the American juggernauts for a fraction of the cost.
We've plotted the AI models' performance on the law exam with model release date on the x-axis, so you can see at a glance how rapidly the field is advancing. I posted an earlier version of this graph a year ago, but now we have data going back to 2024 and the early days of GPT 3.5, which is very exciting.
Meanwhile, the House of Lords has put out a report recommending that the UK government implement some protections for creative industries and force AI companies to be transparent about where their training data came from. https://fastdatascience.com/legal-ai/ai-copyright/
https://fastdatascience.com