What does AI do for the working mathematician?

Pushback on recent AI-and-math discourse
and three examples of using AI in everyday work.

The Zulip of the semi-automatic Lean learning seminar at my university has, via shared articles and discussion, developed a general vibe of “is this the end of math?”, with a few tasting notes of “do we have a Claude addiction”1. I’ve been constantly pushing back on nearly every article — not just because they evoke an image of Gollum, clutching a laptop, hissing “ooh my precious Claude”, but because they don’t address our main question: what does AI do for the working mathematician?

So, now, I’m called to write the article that I keep wanting to read. We will proceed as follows: first, all of my pushback on the doom-takes; then — to ground us in specifics, in the now — three specific things I’ve done with AI; and last, the déjà vu, as an early-ish user of SageMath in my immediate academic community2.

Pushing back on recent AI articles

Why, look you, I am nettled and stung with pismires when I read of “the end of math”. Thus spake Shakespeare’s Hotspur, more or less.

This is how it’s always been

The “theorem economy” described by Bessis3 is the setup where mathematical careers are built on proving new theorems. In this context, we have Hardy’s infamous quote “Exposition, criticism, appreciation, is work for second-rate minds”. I’m always lecturing my students about providing mathematical context in the chatty bits of their thesis, so let me provide the context here. Hardy wrote those words in his book about why he does math4, in which he laments not being able to prove theorems, due to age and health; the book itself is a beautiful exemplar of exposition, criticism, and appreciation for mathematics. I loved reading this while choosing math as a career. If only Hardy could have heeded the words of his fellow Cambridge person Milton — “They also serve who only stand and wait” — we would not now be saddled with this elitist quote.

Bessis writes:

This is how the system worked for millennia. Mathematicians created value by introducing new concepts, but the rule was that only theorems could put bread on the table.

as part of his argument that AI threatens this structure. But the theorem-as-currency, priority-claiming enchilada is a feature of modern professional mathematics. Our current system of doing math — with journals and the academic structure of research universities — only dates back a few hundred years, to around the time people discovered that integration undoes differentiation. Before that, math was the same Pythagorean, Euclidean business for thousands of years5. I routinely tell this origin story to my undergraduate students — read all about it in Edwards’s book on the history of calculus6.

Since this system is only a few centuries old — still in its infancy, in the grand scheme of things — it’s completely reasonable to expect that it is still in flux. Which it is. For example, the Declaration on Research Assessment (DORA) argued that metrics like impact factor are poor proxies for the quality of a paper. One trickle-down outcome of this movement is that the tenure-track assessment procedures at my university were changed to reflect valuing impact over time. What could be more natural than another shift, towards valuing what Terence Tao calls the digestion7 of mathematical theorems?

On going the way of Chess and Go

In a January 2026 interview with science journalist Alok Jha, Geoffrey Hinton singled out mathematics as the domain where AI would make progress fastest, giving a specific reason:

There’s one area in which that’s particularly easy, which is mathematics because mathematics is a closed system. […] I think AI will get much better at mathematics than people, maybe in the next 10 years or so.

He then drew the explicit analogy to Go and chess, which has been taken up by various people, including Bessis and Kirov8. The upshot is that math will be “solved” by AI and then quoth the raven “nevermore” and so on.

Setting aside whether or not this is the right analogy, let’s just push back on one factual premise: AlphaZero and AlphaGo didn’t destroy chess and go. Professional go is arguably bigger than before; it now features, for example, AI explanations to help audiences follow the matches. Chess has had a huge resurgence, though that could partly be due to The Queen’s Gambit. Maybe instead of speculating about the end, we should start making gripping visualizations of math — like Anya Taylor-Joy’s character moving chess pieces on her ceiling — in preparation.

Most math theorems don’t get their own movies

Instead of sustained efforts at building a theory, or anything else reasonable but boring, most articles I’ve read identify mathematics with problem-solving at the highest level. Kirov writes about math as a “sport” and worries that AI is “the ultimate doping”. Benchmarks like First Proof and the Erdős problem leaderboards treat mathematical capability as something measurable by whether AI can crack canonical hard problems. Quanta’s recent “The AI Revolution in Math Has Arrived” frames its story around “proving new results at a rapid pace”. In Scientific American’s piece9, an amateur solving an Erdős problem with ChatGPT is a David-versus-Goliath story. The shared assumption is that mathematics is what happens at the top. We’re talking Fields Medals and the proofs that make it into the glossy magazines, and how AI threatens these endeavours. But are we really talking about math?

“For the growing good of the world is partly dependent on unhistoric acts,” wrote George Eliot at the end of Middlemarch. The same is true of mathematics; the state of mathematics is only epsilon-dependent on Fields-Medal-worthy results and the rest, asymptotically almost surely, comes from ordinary working mathematicians.

The real question is not whether AI can solve famous open problems faster than humans. The real question is what AI does do for the actual work of doing math. Specifically, I mean tasks like convincing students that there is something to be proven, writing out proof details, checking whether this theorem really is subsumed by that other theorem (as referee 2 claims), and other things that mathematicians actually do. None of these recent news pieces engage with that. To balance our Zulip, here are specific examples of how I’ve used AI, as a working mathematician, that I now proffer as ordinary data points.

Examples of my Claude usage

Teaching students how to use AI by example

When I asked my graduate class if they had used AI, I was met with reticence. The University of Amsterdam has provided our students and staff with UvA AI Chat, together with guidelines about plagiarism and inappropriate usage, but very few instructions about how to use it. So, I decided we’d use it together in a review class where the purpose was to go through some problems together and discuss what constitutes a good solution.

I had Claude generate a solution to one of my assignment problems. It did a good job; it set up the intricate case analysis with case labels that were arguably better than mine. However, crucially, it missed a case. We found the missing case because we read the solution line by line, and, for each sentence, we discussed whether it was a true statement and whether it followed from the previous statements. Reading line by line in a group is, by the way, a method of polishing the final draft of a paper — though sometimes you have to bribe your co-authors with baked goods to get through it.

I heard some murmurs of “this is more work than doing it ourselves”. Several students said that no one ever showed them how to use AI. Definitely the highest interactivity and engagement I’ve had all term.

Using AI as a safety net in teaching a standard theorem

Attendance has been low across many courses lately, so I told my undergraduate class that, since they showed up, we should do something they couldn’t get from reading notes on their own: a “choose your own adventure” lecture. I told students the statement of Menger’s theorem and let them suggest how to prove it, following their ideas to develop the proof. I prepared by reading the three proofs in Diestel’s book and the proof from our own notes. Since this is a standard theorem in undergraduate graph theory, I felt confident Claude could help me with the details, if I got stuck.

I was hoping my students would stumble onto one of the proofs from Diestel, but no such luck. Many roads lead to Menger’s theorem, but not every road. We had lots of discussion about what has to be proven, and many students engaged with the proof process, raising their hands to explain their ideas. During the 15-minute break, I put the partial proof into Claude and asked it how to complete it. It said, as I had suspected, that the proof couldn’t be completed as it stood, and gave me an idea to change the setup a bit and finish the proof, which I did.

Without this safety net, I would have had to shepherd the class into one of three familiar proofs. Adventure games are not without risk; let’s not forget how often Oregon Trail, a text-based game of my childhood, ended with dysentery or snake bite at the river crossing. The students really enjoyed having their ideas formalized into mathematical statements. Many students came to ask me questions about other proof ideas and one came to thank me at the end of the class, which is not an everyday occurrence.

Computations: previously I had thoughts and now I have code

Though it’s not in the mainline of my research, I have a slow-burn project to find a new strongly regular graph because most of these graphs, despite being highly regular, do not have symmetries, while most of the known constructions use symmetry. It’s like the drunkard looking for their keys under the streetlamp who, upon inquiry, says “no, I lost them over there, but it’s dark over there”.

I’d been nursing an idea for a while. A strongly regular graph corresponds to a 0–1 matrix (its adjacency matrix) that satisfies a specific equation. I wanted to search for such matrices by starting with a matrix that has the right number of 1s in each row, then toggling entries to try to make the equation closer to being satisfied, using a method called simulated annealing (which sometimes accepts flips that make things worse, to avoid getting stuck). I’ve written several simulated annealing programs in SageMath myself. I wrote a description in LaTeX of what I wanted to do and gave it to ChatGPT, along with some of my previous code, and asked ChatGPT to implement the new version.
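To make the idea concrete: for a strongly regular graph with parameters (n, k, λ, μ), the standard equation is A² = kI + λA + μ(J − I − A), where J is the all-ones matrix. Below is a minimal sketch of the annealing loop in plain Python — not my actual ChatGPT-generated code, and for brevity it starts from the empty graph rather than a k-regular one:

```python
import math
import random

def energy(A, k, lam, mu):
    """Total squared deviation of A from the strongly-regular equation
    A^2 = k*I + lam*A + mu*(J - I - A), entry by entry."""
    n = len(A)
    total = 0
    for i in range(n):
        for j in range(n):
            # (A^2)[i][j] counts walks of length 2, i.e. common neighbours
            walks = sum(A[i][t] * A[t][j] for t in range(n))
            if i == j:
                target = k          # diagonal: degree must be k
            elif A[i][j] == 1:
                target = lam        # adjacent pairs share lam neighbours
            else:
                target = mu         # non-adjacent pairs share mu neighbours
            total += (walks - target) ** 2
    return total

def anneal(n, k, lam, mu, steps=20000, temp=2.0, cooling=0.9995, seed=0):
    """Toggle off-diagonal entries (keeping A symmetric); accept uphill
    moves with probability exp(-delta/temp), cooling as we go."""
    rng = random.Random(seed)
    A = [[0] * n for _ in range(n)]
    e = energy(A, k, lam, mu)
    for _ in range(steps):
        i, j = rng.sample(range(n), 2)
        A[i][j] ^= 1
        A[j][i] ^= 1
        e2 = energy(A, k, lam, mu)
        if e2 <= e or rng.random() < math.exp((e - e2) / temp):
            e = e2                  # accept the flip
        else:
            A[i][j] ^= 1            # reject: undo the flip
            A[j][i] ^= 1
        temp *= cooling
    return A, e
```

An energy of 0 means A is the adjacency matrix of a strongly regular graph with those parameters; the variants I iterated on differ in the move set, the energy function and the cooling schedule.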

It was like the joy of the cows being released to pasture in spring (koeiendans in Dutch), because now I could implement all the variations that popped into my head. Previously, it would have taken me an hour to code up each variant; now I could go through a handful in an hour. And it’s hard for academics to find five consecutive hours uninterrupted by meetings, teaching or emails, so five hours of programming realistically means working sporadically over several weeks. None of these variants produced a new graph. Knowing what doesn’t work is useful, but if I’d spent weeks implementing this, I’d have been deeply disappointed.

Gordon Royle had once mentioned to me that he’d tried searching for new strongly regular graphs by encoding the problem as a constraint satisfaction problem with binary edge variables. I had just paid for a month of Claude Max, so I decided to have a go at it. Claude Code produced five different implementations: a CP-SAT encoding using Google OR-Tools, a custom backtracking solver in C++, a row-by-row solver that fixes rows incrementally and backtracks on infeasibility, a Gecode-based version, and a Chuffed version in MiniZinc. All five found the Petersen and Shrikhande graphs quickly and some of them handled the (25,12,5,6) and (26,10,3,4) cases in seconds. Nothing terminated on (35,16,6,8) within ten minutes.
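The encoding itself is short: one binary variable per potential edge, a degree constraint per vertex, and a common-neighbour constraint per pair. As a toy sketch of those constraints — pure-Python brute force rather than CP-SAT or any of the solvers above, so feasible only for tiny n — here it is on parameters small enough to enumerate completely:

```python
from itertools import combinations, product

def count_srgs(n, k, lam, mu):
    """Enumerate every assignment of binary edge variables and count those
    satisfying the strongly-regular constraints. Exponential in n(n-1)/2,
    so purely illustrative; real searches use CP-SAT or backtracking."""
    pairs = list(combinations(range(n), 2))
    count = 0
    for bits in product((0, 1), repeat=len(pairs)):
        A = [[0] * n for _ in range(n)]
        for (i, j), b in zip(pairs, bits):
            A[i][j] = A[j][i] = b
        # degree constraint: every vertex has exactly k neighbours
        if any(sum(row) != k for row in A):
            continue
        # common-neighbour constraint: lam if adjacent, mu if not
        count += all(
            sum(A[i][t] and A[j][t] for t in range(n))
            == (lam if A[i][j] else mu)
            for i, j in pairs
        )
    return count
```

For (5, 2, 0, 1) this counts 12 solutions: the twelve labelings of the pentagon, the unique strongly regular graph with those parameters. A real solver prunes this space instead of enumerating it, which is exactly where the scaling wall described below comes from.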

On my laptop, this type of approach just doesn’t scale past about thirty vertices; CP-SAT looked like it would not terminate, even on a supercomputer. This is the kind of negative result you usually don’t have time to produce. But the time investment was an afternoon, and now I know not to assign this as a master’s thesis project. In fact, armed with this knowledge, I can now come up with much more interesting project ideas.

Both the simulated annealing and the constraint satisfaction are ideas I’ve been carrying around for years, which suddenly became accessible.

Maybe it’s wanting to know that makes us matter?

About 10–15 years ago, during my PhD, the mathematicians around me were starting to use SageMath10 for the first time, whereas I’d used it extensively during my master’s degree. Some mathematicians were proud that they didn’t write code. Though my published research does not contain computational results, computation forms a key part of my research process. I was invited to give a mini-course11 at CMS to tell mathematicians how I use SageMath in algebraic graph theory, and a talk12 at SageDays 109, to share much the same with Sage experts.

The types of computations I did then are now unremarkable; of course you can construct the objects in SageMath and look at them. Gordon Royle and I did a great deal of computation to formulate our theorem that all cubic graphs have an eigenvalue in the interval (−1, 1), except two infinite families and some small, sporadic examples13. This is the sort of theorem where the computation is a thought artifact, but not a publishable result. This is a theorem of the form “everything has this property, except a small list of exceptions”; computations are needed to discover that the exceptions exist and — more importantly — that they stop existing past a certain point. Without these computations, there would have been no reason to think our proof strategy would close.
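The computational step behind a theorem like this is elementary in isolation: build adjacency matrices of small cubic graphs and test whether any eigenvalue lands strictly inside the interval. A minimal sketch of that check (not our actual search code, which ran over catalogues of cubic graphs), using two small cubic graphs whose spectra are classical facts — K4 has spectrum {3, −1, −1, −1}, so nothing strictly inside (−1, 1), while K_{3,3} has eigenvalue 0:

```python
import numpy as np

def has_eigenvalue_in_open_unit_interval(A, tol=1e-9):
    """Check whether the symmetric 0-1 matrix A has an eigenvalue
    strictly inside (-1, 1), with a tolerance for floating-point noise."""
    eigs = np.linalg.eigvalsh(np.array(A, dtype=float))
    return bool(np.any(np.abs(eigs) < 1 - tol))

# K4: complete graph on 4 vertices, cubic, spectrum {3, -1, -1, -1}
K4 = [[0 if i == j else 1 for j in range(4)] for i in range(4)]

# K_{3,3}: complete bipartite graph, cubic, spectrum {3, 0, 0, 0, 0, -3}
K33 = [[1 if (i < 3) != (j < 3) else 0 for j in range(6)] for i in range(6)]
```

The check returns False for K4 and True for K_{3,3}; the research-level work is running it at scale and recognizing when the exceptions stop appearing.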

SageMath is a great tool. Some people (me) find it a powerful tool and others continue to do research without it. Maybe some people will find AI to be a powerful tool in research and others will continue to do research without it.

It could be argued that SageMath was a tool and AI could be different, in that it could do parts of the thinking and not just the computing. So far, though, the anecdotal evidence doesn’t show this.

To illustrate: the title of a recent Quanta article announces that an AI revolution in math has arrived, and the article claims that AI is doing math. But every specific example documents AI doing an exhaustive computational search or generating unreliable candidate solutions, while humans provide all of the mathematical insight and interpretation.

In the article, ChatGPT generated candidate proofs for Ryu, who then discarded the wrong ones and verified the correct one. AlphaEvolve generated computations and, out of all that data, Ellenberg et al. spotted the hypercube structure. Vakil sketched a proof and asked the model to fill in details. These are undoubtedly remarkable accelerations of mathematical work, but what is accelerated is exactly what SageMath accelerated: generation of candidates, which have to be checked by a human, and the production of computational artifacts whose mathematical meaning needs to be extracted by a human mathematician. The AI is agentic, but the users have all of the agency.

I don’t know how large the effect of AI on math will turn out to be. It’s entirely possible the price of Claude Code, ChatGPT Codex and their ilk will exacerbate inequality in access. PhD students could find themselves replaced by Claude as their supervisor’s thinking partner — which is an even bigger problem for aspiring PhD students who have yet to find a position. AI is a powerful tool and we should have more discussion about how to actually use it. What are good questions to ask it? Mostly, I want us to remember that what makes us researchers is the wanting to know, independent of the potential to win Fields Medals and be depicted in A Beautiful Mind. And now, to close us out, here’s another quote from Tom Stoppard’s Arcadia.

SEPTIMUS: When we have found all the mysteries and lost all the meaning, we will be alone, on an empty shore.

THOMASINA: Then we will dance. Is this a waltz?

  1. Rachel Thomas on how vibe coding mimics gambling and Armin Ronacher on agent-coding addiction.
  2. The algebraic graph theory community.
  3. Bessis on AI and the “theorem economy.”
  4. G. H. Hardy, A Mathematician’s Apology (Cambridge University Press, 1940).
  5. In the West, at least. Islamic, Indian, and Chinese mathematics each had substantial independent traditions in this period. But none of them had the modern system of journals, peer review, or tenure track, as we understand them, so my point about Bessis’s “millennia” claim stands.
  6. C. H. Edwards Jr., The Historical Development of the Calculus, Springer-Verlag, 1979.
  7. Terence Tao, “New mathematical workflows” (keynote, Future of Mathematics Symposium, Stanford University, May 2, 2026). Video.
  8. Rado Kirov on math as sport, May 8, 2026.
  9. Mark Buchanan, “Amateur Armed with ChatGPT ‘Vibe Maths’ Solves a 60-Year-Old Problem,” Scientific American, April 2026.
  10. SageMath is open-source mathematics software that can be used instead of anything starting with “M”: Mathematica, Maple, Matlab, Magma, etc.
  11. Mini-course: Using the Sage Mathematics Software System in algebra and discrete math. 2019 Canadian Mathematical Society Summer Meeting, University of Regina, June 2019.
  12. Using SageMath in Algebraic Graph Theory. Global Virtual SageDays 109. May 2020.
  13. K. Guo and G. F. Royle, “Cubic graphs with no eigenvalues in (−1, 1),” J. Combin. Theory Ser. B 176 (2026), 561–583. Open access version.