
The joy of maths

Why Machines Learn: The Elegant Math Behind Modern AI; By Anil Ananthaswamy; Published by Dutton; 480 pages; $32

An enchanting journey into the mathematical heart and the history of machine learning.

The challenge in writing about science is to distill and 'humanise' dense subjects, turning them into an enjoyable read. In his earlier books, award-winning science writer Anil Ananthaswamy tackled complex subjects creatively and with a light touch. In his latest, he does much the same while taking on the question uppermost in the minds of curious people today: how does artificial intelligence (AI) work?

Given that machine learning (ML) is already making life-altering decisions for and about us, Ananthaswamy reckons that it is crucial to get under its "mathematical skin". Only this way can we understand both the power of the technology and its limitations, he reasons.

As one might expect, there's a fair bit of math underlying this book, but it's all presented in simple, digestible portions. The reader who puts in the effort to work it out will be well rewarded. And for the mathematically challenged reader, there's plenty of history, rich in delightful anecdote, to hold interest and learn from. There are also exit points for those keen to take a detour around the math.

Indicatively, in one of the chapters – 'In All Probability', an exposition of how probability and statistics are used in ML – the author handholds the reader through a gamut of concepts: from an introduction to the non-intuitive ways of statistics, to the usefulness of the Bayesian approach, to the first algorithmic use of this method, its generalisation to higher dimensions, and, finally, a discussion of how all this is used in AI.

Starting with the Monty Hall Problem, named after the host of the American TV show Let's Make a Deal (bit.ly/make-deal), Ananthaswamy illustrates how an intuitive argument can lead one away from the correct answer. The actual solution, long debated by mathematicians, follows a subtler line of reasoning that introduces the Bayesian method in probability theory. The chapter then delves into supervised learning, one of the dominant modes of training neural networks.
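The switch-or-stay answer is easy to check empirically. Below is a quick simulation (an illustrative sketch, not code from the book): switching doors wins about two-thirds of the time, while staying wins only about one-third.

```python
import random

def monty_hall(trials=100_000, switch=True):
    """Simulate the Monty Hall game and return the contestant's win rate."""
    wins = 0
    for _ in range(trials):
        car = random.randrange(3)    # door hiding the car
        pick = random.randrange(3)   # contestant's first pick
        # Host opens a door that is neither the pick nor the car
        opened = next(d for d in range(3) if d != pick and d != car)
        if switch:
            # Switch to the one remaining unopened door
            pick = next(d for d in range(3) if d != pick and d != opened)
        wins += (pick == car)
    return wins / trials

print(monty_hall(switch=True))   # ≈ 0.667
print(monty_hall(switch=False))  # ≈ 0.333
```

The intuition-defying part is that the host's choice is informative: he never opens the car door, so switching inherits the 2/3 probability that the first pick was wrong.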

To offset this theory overload, the next few pages delve into history, highlighting the first large-scale demonstration of using Bayesian techniques for ML. This is the story of two statisticians, Frederick Mosteller and David Wallace, who established the disputed authorship of the Federalist Papers. These were essays anonymously published in U.S. newspapers from 1787 under the pseudonym Publius; they were intended to convince New Yorkers to ratify the Constitution. While the authorship of most of these essays was later established, that of about 15 was disputed. Tossing out curious facts, Ananthaswamy describes how Mosteller and Wallace's quest led to groundbreaking work that was a seminal moment for statisticians.


From there, Ananthaswamy wades into the problem of identifying penguin species by their bodily markers, taking the reader through analyses that establish the multidimensional aspect of, and the approximations in, the math that machines do to learn their ways. The concluding section ties together these concepts to account for how ML algorithms make their predictions. Chances are that, by then, the reader will have figured out why the definite, discrete outputs of a computer are tied up with distributions and probability.
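The penguin example turns on probabilistic classification in several dimensions. A toy Gaussian naive-Bayes sketch, using made-up measurements rather than the book's data, shows the flavour: each species is modelled by per-feature distributions, and a new bird is assigned to whichever species makes its measurements most likely.

```python
import math

# Hypothetical (bill length mm, flipper length mm) samples for two species
data = {
    "Adelie": [(38.8, 190.0), (39.5, 186.0), (40.3, 195.0)],
    "Gentoo": [(47.5, 217.0), (48.7, 222.0), (46.1, 211.0)],
}

def mean_var(samples):
    n = len(samples)
    mean = sum(samples) / n
    var = sum((x - mean) ** 2 for x in samples) / n + 1e-6  # avoid zero var
    return mean, var

def log_gauss(x, mean, var):
    """Log-density of a normal distribution at x."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def classify(point):
    """Pick the species maximising the likelihood (equal priors assumed)."""
    best, best_score = None, -math.inf
    for species, rows in data.items():
        score = 0.0
        for i, x in enumerate(point):
            mean, var = mean_var([r[i] for r in rows])
            score += log_gauss(x, mean, var)   # features assumed independent
        if score > best_score:
            best, best_score = species, score
    return best

print(classify((39.0, 190.0)))  # → Adelie
```

The definite answer the classifier prints is, underneath, a comparison of probability distributions — exactly the point the chapter builds towards.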

Whether Ananthaswamy is talking about ML algorithms or the manipulation of matrices, he maintains a lightness of language and invokes historical accounts to advance a compelling narrative. In the section 'It doesn't get simpler', he explains how the separability problem, introduced early in the book, is tackled by the algorithmic approach. In the chapter 'There's Magic in Them Matrices', he concisely outlines the power of matrix methods in building up classifiers – but also details their limitations. In a later chapter, 'With a Little Help from Physics', the reader learns the story behind John Hopfield's fundamental contribution to computational neuroscience, and to a revival of interest in neural networks. The chapter ends with a convergence proof for the Hopfield network.
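The convergence result rests on the Hopfield network's energy function, which never increases under asynchronous updates, so the network must settle into a stable state. A minimal sketch (illustrative only; the pattern and helper names are my own, not the book's):

```python
import random

def hopfield_recall(pattern, corrupted, steps=200, seed=0):
    """Asynchronously update a Hopfield net storing one +/-1 pattern;
    return the final state and the (non-increasing) energy trace."""
    n = len(pattern)
    # Hebbian weights for the stored pattern, zero diagonal
    W = [[0 if i == j else pattern[i] * pattern[j] for j in range(n)]
         for i in range(n)]

    def energy(s):
        return -0.5 * sum(W[i][j] * s[i] * s[j]
                          for i in range(n) for j in range(n))

    s = list(corrupted)
    rng = random.Random(seed)
    energies = [energy(s)]
    for _ in range(steps):
        i = rng.randrange(n)
        field = sum(W[i][j] * s[j] for j in range(n))
        s[i] = 1 if field >= 0 else -1   # flip towards the local field
        energies.append(energy(s))
    return s, energies

stored = [1, -1, 1, 1, -1, -1, 1, -1]
noisy  = [1, -1, 1, -1, -1, -1, 1, 1]   # two bits flipped
state, energies = hopfield_recall(stored, noisy)
print(state == stored)                                          # recovered
print(all(b <= a for a, b in zip(energies, energies[1:])))      # monotone
```

Each update can only lower (or preserve) the energy, and a finite network has finitely many states — which is the heart of the convergence argument the chapter closes with.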

The book is a must-read for anyone who is curious to understand "the elegant math behind modern AI". It is also an inspirational guide for teachers of math and mathematical sciences who can adopt these techniques and methods to make classrooms lively.

In sum, the merits of this book can be encapsulated in a mathematical equation:
Math + history + humour = A joyous read

© 2024 IIT MADRAS - All rights reserved
