April 20, 2024

Revisiting Numi: Testing The Latest GPT-4 Update

As a coin collector and technology nerd, I developed Numi in late 2023: an AI-powered chatbot that uses OpenAI's GPT-4 vision model to help coin collectors identify and grade their coins. It's been fascinating to watch the rapid growth of Artificial Intelligence, so I created Numi to test whether AI can tackle one of the biggest barriers facing new collectors. Over the course of Numi's development, I became more and more convinced that AI is going to fundamentally change the future of the hobby.

Testing Numi With OpenAI's Latest GPT-4 Update

I had a hypothesis that more data given to the AI would yield more accurate grading results. In December 2023 I ran a series of tests on each grade using 2 to 10 photos per coin. Following OpenAI's recent April 2024 update to their GPT-4 model, which powers Numi's AI capabilities, I conducted another series of tests on Numi's grading accuracy.

I then ran statistical analyses to assess the impact on Numi's performance and compared its grading accuracy between the December 2023 and April 2024 test results.

Determining the Optimal Number of Photos for Accurate Grading

A key aspect of my analysis focused on identifying the optimal number of coin photos users should upload to achieve the most accurate grading results. In December 2023, my tests indicated that uploading 10 photos yielded the best accuracy across all coin grades. This aligned with my hypothesis that more data = better. However, after the GPT-4 update in April 2024, that number had changed, with just 4 photos now providing the most precise grading outcomes.
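For readers who want to run a similar comparison, here is a minimal Python sketch of one way to pick the best photo count. It assumes accuracy is scored as the average absolute gap between the predicted grade and the expert grade for each photo count; the article does not spell out exactly how the per-photo-count comparison was done, so treat this as an illustration, not Numi's actual pipeline. The function names and the toy records are hypothetical and are not Numi's real test data.

```python
from collections import defaultdict


def mad_by_photo_count(results):
    """Average absolute grading error for each photo count.

    `results` is an iterable of (photo_count, predicted_grade, actual_grade)
    tuples. This record layout is an assumption made for illustration.
    """
    deviations = defaultdict(list)
    for photo_count, predicted, actual in results:
        deviations[photo_count].append(abs(predicted - actual))
    return {n: sum(d) / len(d) for n, d in deviations.items()}


def best_photo_count(results):
    """Photo count with the lowest average error; ties go to fewer photos."""
    mads = mad_by_photo_count(results)
    return min(mads, key=lambda n: (mads[n], n))


# Hypothetical toy records, NOT real Numi test results.
toy_results = [
    (2, 35, 45), (2, 50, 60),
    (4, 43, 45), (4, 62, 60),
    (10, 40, 45), (10, 63, 60),
]

print(mad_by_photo_count(toy_results))
print(best_photo_count(toy_results))
```

Breaking ties in favor of fewer photos is a design choice: if two photo counts grade equally well, asking users to upload less is the friendlier option.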

Just How Much Did Numi Improve?

To measure Numi's accuracy and any improvements, I calculated the Mean Absolute Deviation (MAD), a metric that represents the average absolute difference between Numi's predicted grades and the actual, expert-assigned grades. In December 2023, Numi's MAD was 5.39, meaning that, on average, its predictions were off by approximately 5 grade points. By April 2024, following the GPT-4 update, Numi's MAD had decreased to 3.64, a substantial 32.47% reduction in average grading error.
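The 32.47% figure follows directly from the two MAD values: (5.39 - 3.64) / 5.39 ≈ 0.3247. Here is a small Python sketch of both calculations. The function names are my own for illustration, and the three-coin example uses made-up grades, not actual test data.

```python
def mean_absolute_deviation(predicted, actual):
    """Average absolute difference between predicted and expert grades."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("need two non-empty lists of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)


def relative_error_reduction(old_mad, new_mad):
    """Percentage drop in MAD going from old_mad to new_mad."""
    return (old_mad - new_mad) / old_mad * 100


# Made-up grades for three coins, only to show the MAD calculation.
example_mad = mean_absolute_deviation([40, 58, 12], [45, 60, 8])
print(f"Example MAD: {example_mad:.2f}")

# The two MAD values reported in this article.
reduction = relative_error_reduction(5.39, 3.64)
print(f"Reduction in average error: {reduction:.2f}%")
```

Because MAD measures error, a lower number is better, so the improvement is a drop in error relative to the December 2023 baseline.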

I expected Numi to be more accurate after the update, but not by this much. While the GPT-4 vision model still struggles considerably with medium-graded coins (around XF-40), Numi performed exceptionally well on very low- and very high-graded coins, with the biggest improvements seen for very low-graded coins.

The Future of AI in Numismatics

After seeing these results, I am even more convinced that Artificial Intelligence will revolutionize the field of coin collecting. As models like GPT-4 continue to improve, AI tools will become increasingly valuable for collectors seeking to expand their knowledge and make informed decisions about their collections. While Numi itself will most likely not end up being the go-to tool for collectors in the future, it serves as powerful evidence of where the hobby is heading.

The progress Numi has made in a short time is encouraging, and I look forward to testing its capabilities as AI models advance. By making coin grading more accessible and user-friendly, AI has the potential to attract new enthusiasts to the hobby and help experienced collectors deepen their understanding and appreciation for numismatics.

Update: After I published this article, a few members of the coin community reached out with feedback on my analysis. With their help I updated my methodology. Check out part two of my revisit to Numi.