In 2011, Apple introduced Siri. This voice-recognition system was pitched as a ubiquitous digital assistant that could help you with anything, anytime, anywhere. In 2014, Amazon introduced Alexa, which was designed to serve a similar purpose. Nearly a decade later, neither product has lived up to its potential. They are mostly niche tools, used for very discrete purposes. Today the New York Times explains how Siri, Alexa, and Google Assistant lost the AI race to tools like GPT. Now we have another notch in the belt of OpenAI's breakthrough technology.
Yesterday OpenAI released GPT-4. To demonstrate the power of this tool, the company allowed a number of experts to try out the system. In the legal corner were Daniel Martin Katz, Mike Bommarito, Shang Gao, and Pablo Arredondo. In January 2023, Katz and Bommarito studied whether GPT-3.5 could pass the bar. At that time, the AI technology achieved an overall accuracy rate of about 50%.
In their paper, the authors concluded that GPT could pass the bar "within the next 0 to 18 months." The low end of their estimate turned out to be accurate.
Fast forward to today. Beware the Ides of March. Katz, Bommarito, Gao, and Arredondo published a new article on SSRN, titled "GPT-4 Passes the Bar Exam." Here is the abstract:
In this paper, we experimentally evaluate the zero-shot performance of a pre-release version of GPT-4 against prior generations of GPT on the entire Uniform Bar Examination (UBE), including not only the multiple-choice Multistate Bar Examination (MBE), but also the open-ended Multistate Essay Exam (MEE) and Multistate Performance Test (MPT) components. On the MBE, GPT-4 significantly outperforms both human test takers and prior models, demonstrating a 26% increase over ChatGPT and beating humans in five of seven subject areas. On the MEE and MPT, which have not previously been evaluated by researchers, GPT-4 scores an average of 4.2/6.0 compared to much lower scores for ChatGPT. Graded across the UBE components, in the manner a human test taker would be, GPT-4 scores approximately 297 points, well above the passing threshold for all UBE jurisdictions. These findings document not only the rapid and remarkable advance in the performance of large language models generally, but also the potential of these models to support the delivery of legal services in society.
Figure 1 puts this revolution in perspective:
Two months ago, an earlier version of GPT was at 50%. Now, GPT-4 has passed the 75% mark and exceeds the average performance of test takers nationwide. GPT-4 would rank in the 90th percentile of bar takers nationwide!
And GPT performed well across the board. Evidence is north of 85%, and GPT-4 scored nearly 70% in Con Law!
We should all think very carefully about how this tool will affect the future of legal services and what we teach our students.