Markov Chains: Modelling Matters

Square

Much of our modern society depends, in whole or in part, on our ability to predict the future. We predict markets, the weather, and – especially salient in times like these – the spread of pandemics. Without these prophetic powers, much of what we take for granted in everyday life would not be possible. At the heart of many of these processes are deep and complex mathematical tools developed over many generations. One of the most elegant and widely applicable of these tools is the Markov chain.

body of water between trees under cloudy sky
A highly complex dynamical system.

Markov chains were originally developed by Andrey Markov as he contemplated poetry in his homeland of Russia1. He developed them well before the dawn of the Information Age, when they would truly come into their own. When he discovered them, in 1913 they weren’t much more than a mathematical curiosity. It was his love for mathematics for its own sake, and his burning curiosity, that drove him to create one of our most significant implements for probabilistic modelling.

Markov chains are an integral component of models of probabilistic and discrete phenomena. In the most abstract sense, they are a sequence of random variables, X_1,X_2,…,X_n where each variable’s state depends entirely upon the state of the previous variable(the so-called Markov property). A variable may take its value from a preexisting set of states, which we may designate by {1,2,3,…n} Typically, when modelling we consider each X_i to be a given time, so that perhaps the index represents a given day, hour or so on. In this way, our principle that each X_{n+1} depends only on X_n can be said to reflect a system such as the earth’s atmosphere, whose change is only affected by its state immediately earlier2.

We choose to describe the process of a change between X_m and X_{m+1} by what we call a transition matrix. This matrix is constructed as follows:

P=\begin{bmatrix} p_{11}(m)&p_{12}(m)&\dots&p_{1n}(m)\\p_{21}(m)&p_{22}(m)&\dots&p_{2n}(m)\\ \vdots&\vdots&\ddots&\vdots\\p_{n1}(m)&p_{n2}(m)&\dots&p_{nn}(m) \end{bmatrix}

Each element p_{ij}(m) represents the probability of a random variable X_m being transformed from the state i to the state j. These probabilities may be functions of m, so that the way a system changes evolves over time, or we may choose to keep them constant, producing what is known as a homogeneous Markov chain.

If \textbf{x}_m=\begin{bmatrix} p_1&p_2&\dots&p_n \end{bmatrix} is then the set of probabilities that a certain variable X_m is in one of the states we designate by 1 through n, we then have that \textbf{x}P gives us similar probabilites for our variable X_{m+1}.

This may seem like a fairly abstract and dense piece of mathematics, but in this case the age old question of “when is this used in the real world?” has a clearcut answer. This simple construction provides economists and epidemiologists the world over a tool with which to analyse the evolution of complex systems that often directly impact our daily lives.

grascale photo of people standing on ground
A person is unpredictable; people are not.

Now of course, this is rather a lot of hype, but we’ve yet to explicitly outline a place these powerful modelling tools are used. They were (and still are) critical to the rise of one of this millennium’s giants: Google. The algorithm used by Google during their early days in order to rank websites in their engine, creatively named PageRank, depends upon a method fundamentally based upon the same mathematics Andrey Markov first discovered more than a century ago3. It is truly a testament both to the elegance and the efficiency of his method that it remains in use essentially unchanged even in the modern day.

The initial description of the utility of the Markov chain technique might have seemed perhaps a little narrow. However, it has been well over a hundred years since the development of the approach, and in the intervening time many robust extensions have been developed: we now know both how to model processes in continuous time using Markov chains4, and the method by which problems that do not satisfy the Markov property may nonetheless be transformed into ones we can solve. With these tricks of the trade in hand, we may model even incredibly complex and dynamic systems to an amazing degree of accuracy. Thanks to the singular utility of Markov’s discoveries, his name will likely live on for as long as those great natural cycles we seek to model.


Notes

  1. Andrey Markov(1913) “An example of statistical investigation of the text Eugene Onegin concerning the connection of samples in chains.” Bulletin of the Imperial Academy of Sciences of St. Petersburg 7(3):153–162.
  2. Mind you, the atmosphere is not strictly a discrete system, and so may not be modelled by the type of Markov chain we talk about here.
  3. Ryan Tibshirani(2012) “PageRank” http://statweb.stanford.edu/~tibs/sta306bfiles/pagerank/ryan/01-24-pr.pdf
  4. Note that Poisson processes(essentially analogous to continuous Markov chains), which you may read about here, were developed before Markov chains.

Author

Leave a Comment