On prose in math papers

I had an exchange about writing styles in mathematics yesterday with Anna-Laura Sattelberger. Having in mind my latest paper on Selfadhesivity in Gaussian conditional independence structures and its Preliminaries section without any exposed definition environment (but with plenty of definitions), I took the side in favor of prose in math articles. However, I found her conviction against prose and for a short and precise presentation with clearly demarcated definitions hard to disagree with. After all, visual appeal and readability are important considerations in writing for me. So I had to do some soul-searching, the result of which is the present article.

The number one stylistic rule of composing a scientific article is to make “every word tell”. The Elements of Style says, under rule 17 in the edition that I have:

Vigorous writing is concise. A sentence should contain no unnecessary words, a paragraph no unnecessary sentences, for the same reason that a drawing should have no unnecessary lines and a machine no unnecessary parts. This requires not that the writer make all sentences short, or avoid all detail and treat subjects only in outline, but that every word tell.

In mathematics, reading papers is a task with a high baseline cognitive load, so clarity is all the more important. The use of flowery language, complicated sentence structure or unnecessary tangents only makes it more difficult to understand the contents. Unnecessary words distract and mislead. (For that matter, bad typography distracts, too, and must be avoided.)

I don’t believe that Selfadhesivity in Gaussian conditional independence structures contains any of those distractions. What I have been criticized for instead, on multiple occasions, is writing too densely and too linearly. The undesirable result of this style is that readers are forced to digest the material from beginning to end in the order I have chosen.

The reason, I believe, for writing like this lies six years in the past. I struggled a lot in my first 1½ to 2 years with the theory of conditional independence structures. (This was the time of my M.Sc. studies and the early part of my PhD.) Almost all I know about the topic came from reading papers by Milan Studený and, to a greater extent, Fero Matúš. A very much on-point characterization of Matúš’s writing style was given by Tarik Kaced in the Editorial of the special volume of Kybernetika dedicated to Fero’s memory:

I first got acquainted with Matúš through his papers that stood out from the usual easy-to-read, and sometimes shallow, works from the literature. It took me sometimes days (and many nights) to decipher his condensed and cryptic ideas, which, in hindsight, hinted that I found a hidden hard gem. […] Following his works and papers has been rewarding, I always find his minimalist talks very accessible while his papers contains not-to-be-overlooked notes and remarks. Indeed, a short footnote question of his lead to a journal paper of mine. I am sure more pearls are waiting to be discovered.

Unfortunately, I’m a dense and slow person. It took me a lot of time to see past the acute and precise (yet unmotivated!) definitions and constructions in Matúš’s many papers. In the beginning, I thought my PhD was going to be about an unusual branch of combinatorics which had long ago departed from its statistical origins. Just as Kaced says, deciphering Matúš’s ideas one after the other and watching the big picture gradually fall into place is extremely rewarding. In my opinion, Matúš’s works are timeless pieces of mathematics, undisturbed by unnecessary words. Over time, I did recognize the connections between his papers and the inspiration behind their definitions, axioms, constructions and proofs. Most importantly, I discovered the beautiful geometry behind it all and ended up in algebraic statistics.

I think the intention behind writing the Preliminaries of Selfadhesivity in Gaussian conditional independence structures the way I did was to develop the required part of CI theory for the reader as vividly and as concisely as possible, with the statistical and geometric motivation and interconnections revealed instead of suppressed. If you read it, you may find that it is still dense and avoids unnecessary words. This is what I mean by “prose” in mathematics. When I wrote those Preliminaries, I thought about including more definition environments, but I made a conscious decision against it. My reasoning was that a definition environment is visually a paragraph break and hence concludes a mathematical unit of thought. This did not align with my intention of writing a smooth exposition.

But I’m looking forward to writing a forthcoming paper somewhat differently from what I’m used to.