April 20, 2024
3 Minutes

Revisiting Numi: Testing The Latest GPT-4 Update

As a coin collector and technology nerd, I developed Numi in late 2023. This AI-powered chatbot uses OpenAI's GPT-4 vision model to help coin collectors identify and grade their coins. The rapid growth of artificial intelligence has fascinated me, so I created Numi to test AI's abilities and tackle one of the biggest barriers facing new coin collectors. Throughout Numi's development, I became more and more convinced that AI will fundamentally change the hobby's future.

Testing Numi With OpenAI's Latest GPT-4 Update

I hypothesized that giving the AI more data would yield more accurate grading results. In December 2023, I ran a series of tests at each grade using 2 to 10 photos per coin. Following OpenAI's April 2024 update to the GPT-4 model that powers Numi, I ran another series of tests on Numi's grading accuracy.

I then ran statistical analyses to assess the impact on Numi's performance and compared its grading accuracy between the December 2023 and April 2024 test results.

Determining the Optimal Number of Photos for Accurate Grading

A key part of my analysis was identifying how many coin photos users should upload to get the most accurate grading results. In December 2023, my tests indicated that uploading ten photos yielded the best accuracy across all coin grades. This aligned with my hypothesis that more data means better results. However, after the GPT-4 update in April 2024, that number had changed: just four photos now produced the most accurate grades.
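One way to pick the best photo count is to score each count by its average grading error and take the lowest. The sketch below is illustrative only: the function names and every grade pair in `hypothetical_results` are invented for the example and are not Numi's test data or code.

```python
def average_error(pairs):
    """Mean absolute gap between (predicted, actual) grade pairs."""
    return sum(abs(predicted - actual) for predicted, actual in pairs) / len(pairs)


def best_photo_count(results):
    """Return the photo count with the lowest average error, plus all scores.

    `results` maps a photo count to a list of (predicted, actual) grade pairs.
    """
    scores = {count: average_error(pairs) for count, pairs in results.items()}
    return min(scores, key=scores.get), scores


# Hypothetical grade pairs (invented for illustration, not real test results).
hypothetical_results = {
    2: [(35, 45), (60, 64)],   # errors 10 and 4
    4: [(43, 45), (63, 64)],   # errors 2 and 1
    10: [(40, 45), (61, 64)],  # errors 5 and 3
}

best, scores = best_photo_count(hypothetical_results)
print(f"Lowest average error with {best} photos: {scores[best]}")
```

With real test data, the same comparison could be run separately for each grade to see whether the best photo count differs between low-, medium-, and high-graded coins.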

Just How Much Did Numi Improve?

To measure Numi's accuracy and any improvement, I calculated the Mean Absolute Deviation (MAD): the average absolute difference between Numi's predicted grades and the actual, expert-assigned grades. In December 2023, Numi's MAD was 5.39, meaning its predictions were off by about 5.4 grade points on average. By April 2024, following the GPT-4 update, Numi's MAD had dropped to 3.64, a substantial 32.47% reduction in average error.
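The arithmetic behind these figures can be sketched in a few lines of Python. This is a minimal sketch rather than the actual analysis script: the function names and the sample grades are assumptions made up for illustration, and only the two MAD scores (5.39 and 3.64) come from the tests described above.

```python
from statistics import mean


def mean_absolute_deviation(predicted, actual):
    """Average absolute gap between predicted and expert-assigned grades."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    return mean(abs(p - a) for p, a in zip(predicted, actual))


def percent_reduction(before, after):
    """Relative drop from `before` to `after`, expressed as a percentage."""
    return (before - after) / before * 100


# Hypothetical sample grades (invented for illustration, not real test data).
sample_predicted = [40, 58, 12]
sample_actual = [45, 63, 10]
sample_mad = mean_absolute_deviation(sample_predicted, sample_actual)

# The two MAD scores reported in this article.
improvement = percent_reduction(5.39, 3.64)

print(f"Sample MAD: {sample_mad}")
print(f"Reduction in MAD: {improvement:.2f}%")
```

Because MAD measures error, a lower score is better, so the 32.47% figure is the relative drop in average error between the two test rounds.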

Given the updates, I suspected Numi would be more accurate, but I was not expecting this much of a change. While the GPT-4 vision model struggles immensely with medium-graded coins (around XF-40), Numi performed exceptionally well on low- and high-graded coins. The most significant improvements were for very low-graded coins.

The Future of AI in Numismatics

After seeing these results, I am even more convinced that artificial intelligence will revolutionize the field of coin collecting. As models like GPT-4 continue to improve, AI tools will become increasingly valuable for collectors seeking to expand their knowledge and make informed decisions about their collections. While Numi will most likely not become the go-to tool for collectors in the future, it provides strong evidence of where the hobby is heading.

The progress Numi has made in a short time is encouraging, and I look forward to testing its capabilities as AI models advance. By making coin grading more accessible and user-friendly, AI can attract new enthusiasts to the hobby and help experienced collectors deepen their understanding and appreciation for numismatics.

Update: After I published this article, a few members of the coin community reached out with feedback on my analysis. With their help, I updated my methodology. Check out part two of my revisit to Numi.