Suddenly everyone seems to have opinions about viruses and how to deal with them. If you’ve been following this blog for a while, you may already know my opinion about opinions – briefly, do some (real) research or shut up.1 But what if research doesn’t help because the experts don’t have the answers to your questions either? The problem is that that isn’t necessarily uncommon for a number of reasons. Most importantly, the questions you and I might be interested in – questions like What is going to happen? or What should we do? – are not really the kind of questions scientists try to (or even can) answer.2 Science compartmentalizes – it cuts up the world into different systems that are studied by different scientific disciplines – and (almost?) no one can reintegrate everything into a single model that takes everything into account. And scientists are – for good reasons – extremely hesitant to answer questions if they don’t have sufficient evidence to be very sure about the answer.3

Sometimes, if scientists don’t have the answer to the question you’re pondering, you might be able to get close to an answer yourself. You can read scientists’ answers to closely related questions, and those answers become pieces of the puzzle. If you have enough pieces, you might be able to fit them together and see the bigger picture. Sometimes… because you need enough pieces, and you need the ability to find (and understand) those pieces, and to put them together. And that may require a lot of work, a lot of thinking, and at least some acquaintance with the scientific disciplines you’re dealing with.

It’s the last part of the previous sentence where I have a problem. I don’t know anything about viruses (or medical stuff in general), although I have been catching up. (I am – or was – in fact so ignorant that three months ago I didn’t even know that antibiotics are ineffective against viruses.) And that makes me very hesitant to write about coronavirus-related matters. It took me a month to finally write down and publish my previous article about the topic, for example. But the topic is – obviously! – important, so I want to understand what is happening and what is likely going to happen. And I want to know what would work and what wouldn’t in dealing with this crisis. Not because I think anyone would (or should) listen to me – just because I want to know. I’m like that – I want to know and understand things, regardless of whether that knowledge and understanding serve a purpose. But there is much in this crisis that I don’t understand. For example, I remember that in January some experts – off the books! – remarked that, if left unchecked, the new coronavirus (then still nameless) could infect approximately two-thirds of the/a (?) population. But what I didn’t understand is why it would stop at that percentage, or even why it would stop spreading at all. This was (and is) one of the questions that puzzled me, but I was (and am) equally curious about the effects of different policies: social distancing, lockdown, test-and-trace, and so forth, and I couldn’t find clear answers to those questions either.

So I did a bit of reading, and soon I stumbled upon the notion of a SIR model. SIR models divide the population into susceptible, infected (and infectious), and recovered sub-populations and simulate the flows between those populations. But modeling population flows like this is – basically – demography, and that is a discipline I’m not unfamiliar with. On the contrary, I have a degree in geography and in the Netherlands (where I got that degree) demography is part of geography. Furthermore, early after graduating I did some research in a branch of demography and in that context even studied some epidemiology. But that was almost two decades ago,4 so while the notion of a SIR model rang a bell, the ringing was rather faint. Regardless, building a simple computer model of population dynamics is something I can do, and the more I thought about it, the more I realized that I don’t really need to understand viruses, because such a model might be all I need to get a better understanding – at least for myself – of the main questions that puzzle me.

The model I built is not a SIR model, however – it is a SEIR model. It adds one more key sub-population: people who are exposed (i.e. infected) but who are not infectious themselves yet. If there is no pre-symptomatic infection, then during the incubation period of the virus, people are in population E (exposed), but not yet in I (infectious).5 Only people in I spread the virus, and they only spread it to S. Recovered people (in R) are immune – at least for now, but more about that below.

Furthermore, the model is not an implementation of existing SEIR models. Rather, I built it from scratch. The reason for that is simple: its purpose was (and is) to help me understand the human population dynamics of this virus, and the only way to improve that understanding is to contemplate and model all the processes and transitions myself. But that has two obvious implications. Firstly, much of the following might be much less useful to you than it was to me. And secondly, you should probably take my model and what it seems to imply with a grain of salt. Again, this is not my area of expertise.

The model is also relatively simple, which is the consequence of another limitation. Right now, the only computer I can use is my old (and dying) laptop, and every time I change one parameter of the model, it takes much groaning and about 15 seconds of processing to make the roughly 20,000 calculations involved.6 And consequently, making the model any more complicated than it is right now is not really feasible with my current hardware.

the model

A population P consists of a number of sub-populations: susceptible people S who haven’t been infected (yet), but who can be infected; exposed people E who are infected, but aren’t infectious yet; infectious people I; and recovered people R, who are immune (but see below). Additionally, there are dead people D who are no longer part of the population, and people who are infected, but are quarantined or isolated Q and thus are unable to infect others. The model assumes that P+D is constant, or in other words, that in the (short) modeling period (of 201 days) natural births and deaths cancel each other out – the only change in population size is due to virus-related deaths.

There are five transitions in the model. Susceptible people get exposed (i.e. infected), exposed people become infectious, some infectious people are isolated/quarantined, and all infectious people either die or recover. Let’s call these five transitions exposure, onset, isolation, death, and recovery.

Exposure determines the number of new infections (i.e. the growth of E). This number depends on three things. Firstly, it depends on the infectivity or transmissibility of the virus. Neither term seems to have an exact and universally agreed upon meaning, but in the model it is the chance of getting infected due to a certain kind of risky contact with an infectious person. Secondly, it depends on how many such “risky” contacts take place. These are two of the parameters of the model: average daily number of “risky” interactions, and chance of infection in such an interaction. And thirdly, it depends on the spread of the virus. The more infectious people there are in a population (i.e. the higher I relative to S) the greater the chance of encountering – and thus interacting with – someone who is infectious. The model also takes into account that if very many people are sick, the number of interactions drops because you would meet fewer people.
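The three factors above can be read as a single calculation. What follows is only a minimal sketch of one plausible reading, not the code of the model itself: the function name, the argument names, and the simple proportional form are assumptions, and the sketch leaves out the drop in interactions that occurs when very many people are sick.

```python
def new_exposures(s, i, q, p, contacts_per_day, infection_chance):
    """Expected number of new infections in one day (hypothetical helper).

    s: susceptible people, i: infectious people, q: isolated or quarantined
    infectious people (a subset of i), p: living population.
    """
    # Chance that a given risky contact is with a freely moving infectious person.
    share_infectious = (i - q) / p
    # Daily infection risk per susceptible person:
    # contacts x share of infectious contacts x chance of infection per contact.
    daily_risk = contacts_per_day * share_infectious * infection_chance
    # A risk cannot exceed 1, so cap it before scaling up to the whole of S.
    return s * min(daily_risk, 1.0)


# Illustrative numbers only: 8.75 contacts a day and a 10% chance per contact
# are made-up values, not settings reported in the text.
example = new_exposures(s=9_999_990, i=5, q=0, p=10_000_000,
                        contacts_per_day=8.75, infection_chance=0.1)
```

Note that the two parameters only appear as a product here, so doubling the number of contacts and halving the chance of infection gives the same result.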

Onset of infectivity, recovery, and death all depend on average periods. The period from exposure (or infection) to onset (i.e. becoming infectious) is the latency period. If there is no pre-symptomatic infection, then this is the same as the incubation period. Estimates of the incubation period of SARS-CoV2 range from approximately 3 days to over 5 days (with most estimates around 5 days), and there is uncertainty about the extent of pre-symptomatic infection. The effect of pre-symptomatic infection would be that the latency period is shorter than the incubation period, so I set the latency period in the model at 4 days. (But this parameter, like many other model parameters, can be easily changed.) Of course, this is an average, and the actual period differs from case to case. These individual latency periods are assumed to be more or less normally distributed, but with a fat tail on the far end, like this:

And consequently, the average period is longer than the most common (i.e. modal) period. The flatter this curve, the more spread-out the new onsets (i.e. E to I transitions). Available data suggests that onset is typically between 1 and 14 days, so the curve isn’t very flat. (And the distribution in the model is determined by these values.)
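To make the idea of a skewed latency distribution concrete, here is a small sketch. The exact curve used in the model is not specified above, so the gamma-like shape, the function names, and the `shape` and `scale` values are assumptions chosen only to give a 1 to 14 day range with an average close to 4 days.

```python
import math


def latency_weights(max_days=14, shape=4.0, scale=1.0):
    """Share of an exposed cohort becoming infectious d days later, d = 1..max_days."""
    # Gamma-like curve: rises quickly, then falls off slowly (the fat tail).
    raw = [d ** (shape - 1) * math.exp(-d / scale) for d in range(1, max_days + 1)]
    total = sum(raw)
    return [w / total for w in raw]  # normalize so the shares sum to 1


def onsets_on_day(exposures_by_day, day, weights):
    """New onsets on `day`: earlier exposure cohorts weighted by their latency share."""
    return sum(
        exposures_by_day[day - d] * w
        for d, w in enumerate(weights, start=1)
        if day - d >= 0
    )


weights = latency_weights()
mean_latency = sum(d * w for d, w in enumerate(weights, start=1))
modal_latency = 1 + weights.index(max(weights))
```

With these assumed values the most common latency is shorter than the average one, which is the effect of the tail described above. Recovery and death could be handled with the same kind of weighting, only with wider curves.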

Recovery and death work the same way, but the average time from onset to death is much longer, and there is huge variation as well (and consequently, that curve is much flatter). To limit the number of computations required, I set the limit at 30 days in the model (with an average of 14 days), which really is a bit too short. Because of this, the number of deaths seems to respond a bit faster to changes in the number of infectious people in the model than in reality. More problematic than the period between onset and death is that between onset and recovery, however. Some people seem to recover almost instantly, while others take six weeks or more. What matters here, however, is that the cases that require a long recovery are all (or almost all) hospitalized and thus do not (normally) infect others. Hence, for the model these long recovery periods don’t matter. What matters for the model are the average recovery periods of people who are not hospitalized and isolated, but who go about their ordinary lives, or something sufficiently close to that to infect others. That period is very short, because this mostly involves mild cases with very short recovery periods, but I haven’t seen any reliable data that tells me how short exactly. Fortunately, there are workarounds.

The basic reproduction number R₀ is not itself a parameter of the model, but it can be easily derived from other parameters, and this can be used to calibrate the model. R₀ is defined as the expected average number of people infected by an infectious person (in the “natural” situation). That number is equal to the average number of risky interactions between infectious and susceptible people during the infectious period (i.e. from onset to recovery) multiplied by the transmission risk. And that average number of risky interactions is the length of the infectious period multiplied by the average number of daily interactions. Thus, R₀ = infectious period × avg. interactions × transmission risk, and all three of the latter numbers are parameters of the model. R₀ of SARS-CoV2 is estimated at between 1.4 and 3.9, but the first number seems much too low considering how fast it spreads. I chose a target value of 3.5, and further calibration (see below) did not result in a change of that number. There are multiple ways to set the three parameters mentioned such that the result is R₀=3.5, but the second and third only occur together in the model so those two can almost be regarded as a single parameter. The setting of the infectious period relative to the other two makes a difference for the speed of spread, so I ended up using the case of Wuhan (more below) as a test case. This resulted in an estimate of the average infectious period (i.e. from onset to recovery) of 4 days, which doesn’t seem implausible given that estimates of recovery for mild cases that I have seen vary from 2 to 7 days.
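The calibration arithmetic is simple enough to write down. The sketch below only restates the formula R₀ = infectious period × avg. interactions × transmission risk; the function names are hypothetical, and the split of the combined value into contacts and risk is purely illustrative, because the text does not report the two values separately.

```python
def basic_reproduction_number(infectious_days, contacts_per_day, infection_chance):
    """R0 = infectious period x average daily risky interactions x transmission risk."""
    return infectious_days * contacts_per_day * infection_chance


def required_contact_risk(target_r0, infectious_days):
    """Combined value (interactions x transmission risk) needed to hit a target R0."""
    return target_r0 / infectious_days


# With the target R0 of 3.5 and the 4-day infectious period from the text:
combined = required_contact_risk(target_r0=3.5, infectious_days=4)

# Two made-up splits of that combined value; both yield the same R0.
r0_many_contacts = basic_reproduction_number(4, 17.5, 0.05)
r0_few_contacts = basic_reproduction_number(4, 8.75, 0.10)
```

Because interactions and transmission risk only occur as a product, any split of the combined value 0.875 per day gives the same R₀ of 3.5.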

As mentioned, the average time from onset to death in the model is 14 days, with more spread than in case of the other two periods. The percentage of people that die depends on circumstances. There is a base mortality, which can be estimated on the basis of data from South Korea, the country with the most extensive testing. That data suggests that between 0.7% and 1% of infected people die. You may have seen higher percentages, but those are (probably) case fatality rates (CFRs). CFR is the number of deaths divided by identified cases, and not by the total number of infections. Because South Korea tested so much, it is likely that they found most of the infections and consequently, that their CFR is (or was) close to the mortality rate of the virus. Mortality isn’t fixed, however. If there are many more patients than hospitals can handle, then mortality will rise. The percentage of severe cases seems to be somewhere between roughly 5% and 10%. Many of those severe cases are at risk of dying if they cannot get the necessary care, but how many is hard to say. Perhaps, maximum mortality is 3%; perhaps it rises to 5% or even more. This maximum mortality is another parameter of the model (in addition to base mortality, which is between 0.7% and 1% as mentioned). Its effect is added by means of a logistic curve that starts climbing as soon as the number of severe cases equals the number of intensive care beds (another parameter of the model) and hits the maximum at roughly 40 times that number.
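A logistic curve with those two anchor points could look like the sketch below. The midpoint, the steepness, and the function name are assumptions picked so that the curve has covered about 1% of the rise when severe cases equal intensive care beds and about 99% at 40 times that number; the actual curve in the model may be shaped differently.

```python
import math


def mortality_rate(severe_cases, icu_beds, base=0.01, maximum=0.03):
    """Mortality as a function of how far severe cases exceed intensive care capacity."""
    load = severe_cases / icu_beds
    if load <= 1.0:
        return base  # hospitals can cope: base mortality applies
    midpoint = 20.5                      # halfway between a load of 1 and of 40
    steepness = math.log(99) / 19.5      # gives 1% of the rise at load 1, 99% at load 40
    rise = 1.0 / (1.0 + math.exp(-steepness * (load - midpoint)))
    return base + (maximum - base) * rise


# Example loads for a hypothetical region with 100 intensive care beds.
rate_within_capacity = mortality_rate(severe_cases=50, icu_beds=100)
rate_overloaded = mortality_rate(severe_cases=2000, icu_beds=100)
rate_collapsed = mortality_rate(severe_cases=4000, icu_beds=100)
```

The `base` and `maximum` defaults of 1% and 3% are taken from the ranges mentioned above and can be swapped for 0.7% or 5% to see how sensitive the result is.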

Finally, the model adds a sub-population of people who are isolated or quarantined Q. These people are part of the infectious sub-population I, but cannot infect other people. Hence, they play no role in exposure (see above). Without any testing and active isolation or quarantine policy, this number is probably between 5% and 10% because of sick people who effectively “self-isolate” (i.e. stay at home sick) or are hospitalized.

In the model, the testing and isolation policy can change once on a given date. Hence, in addition to the starting level, a second level for the Q/I ratio can be set, as well as a date when that second level goes into effect. The same applies to the interaction parameter. A change in the latter would reflect the implementation of a social distancing or lockdown policy.
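Put together, the compartments, the transitions, and the single policy switch can be sketched as a short daily-step simulation. This is a strongly simplified stand-in, not the model used for the graphs and numbers in this article: it replaces the latency and recovery distributions with constant daily rates, uses a fixed mortality, and all names and default values are assumptions (`contact_risk` stands for daily risky interactions times chance of infection).

```python
def simulate(days=201, population=10_000_000, seed_exposed=5, seed_infectious=5,
             contact_risk=0.875, latency_days=4.0, infectious_days=4.0,
             mortality=0.01, isolated_share=0.10, switch_day=None,
             contact_risk_after=None, isolated_share_after=None):
    """Daily-step SEIR sketch; returns one (S, E, I, R, D) tuple per day."""
    s = float(population - seed_exposed - seed_infectious)
    e, i, r, d = float(seed_exposed), float(seed_infectious), 0.0, 0.0
    history = []
    for day in range(days):
        risk, q_share = contact_risk, isolated_share
        if switch_day is not None and day >= switch_day:
            # The single policy change: from this day on the second levels apply.
            if contact_risk_after is not None:
                risk = contact_risk_after
            if isolated_share_after is not None:
                q_share = isolated_share_after
        living = s + e + i + r
        free_infectious = i * (1.0 - q_share)  # isolated people (Q) infect no one
        exposure = min(s, s * risk * free_infectious / living)
        onset = e / latency_days        # simplification: constant daily rate
        leaving = i / infectious_days   # simplification: constant daily rate
        death = leaving * mortality
        recovery = leaving - death
        s -= exposure
        e += exposure - onset
        i += onset - leaving
        r += recovery
        d += death
        history.append((s, e, i, r, d))
    return history


def share_ever_infected(history, population=10_000_000):
    """Fraction of the starting population that has left S by the last day."""
    return 1.0 - history[-1][0] / population


baseline = simulate()
# Hypothetical lockdown on day 50 that cuts risky interactions to one tenth.
lockdown = simulate(switch_day=50, contact_risk_after=0.0875)
```

Comparing `share_ever_infected(baseline)` with `share_ever_infected(lockdown)` shows the direction of the effect, but the exact figures from this sketch should not be expected to match the numbers reported in this article.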

Wuhan and Italy

After building this model, the next step was “testing” it to see whether what it predicts is at least somewhat close to reality. The best test case is Wuhan in China because that is the only location with long time series data and a significant number of cases. A major problem, however, is that the number of infections is unknown. All we know is the number of identified infections (i.e. “cases”) and the number of deaths. The latter number may be the most reliable data available, but it is quite possible that there have been deaths that were misidentified as deaths due to other causes. Furthermore, it is also a question how reliable the Chinese data really is. China doesn’t exactly have the best track record when it comes to reliability, so there is reason to be suspicious. On the other hand, China probably has more to lose from lying and being exposed later than from telling the truth in this case, and it seems that the Chinese government is aware of that.

Spillover (probably from a pangolin to a human) seems to have taken place some time in November 2019. To test the model, I assumed that there were 2 cases in the beginning of the last week of November. The following graph shows the number of infections (red line, left Y axis) and deaths (continuous black line, right Y axis) according to the model from early January to halfway March. The kink in the red line is caused by the lockdown, which went into effect on January 23.

Wuhan. Red line (left Y axis): infections. Black lines (right Y axis): deaths, according to the model (continuous line) and real data (dotted line). See text for further details.

According to the model, there must already have been 40 or so serious cases around January 1, and perhaps a quarter of that number roughly one week earlier. Considering that there was some communication between doctors around that time about SARS-like cases, this seems about right. The model further suggests that when the lockdown went into effect, there already were 200,000 people infected (exposed, infectious, or recovered), while the official number of cases at that point was just 770. This is considerably more than I expected. As mentioned, numbers of (identified!) cases are not a reliable proxy for the number of infections, however.

The dotted black line in the graph is the reported number of deaths, the most reliable statistic available. The model shows a steeper line for deaths, and this is at least partially due to the aforementioned limitation. If the normal distribution for deaths becomes narrower, the slope becomes steeper; if it becomes wider (or flatter), then the slope becomes more shallow, and because my old laptop already had difficulty with the model as it is, I couldn’t really make the normal distribution of deaths any wider/flatter to approximate real deaths any closer.

I must emphasize here that this “testing” of the model does not just mean that I input some data and then found that the results resembled the situation in Wuhan. The initial parameter settings were based on guesses and/or value ranges found in papers and/or the internet, but after that I fine-tuned settings to see whether the model could approximate the Wuhan data. So testing didn’t mean seeing whether it fit already, but whether I could make it fit. (While keeping all parameters within plausible ranges, of course!)

It seems to me that much of what the model predicts after a little bit of tweaking with parameters is reasonably accurate. The mortality curve follows reality quite closely. The number of severe cases at various points in time seems plausible as well, and the model predicts that a lockdown implemented in the end of January would result in a decline of new infections to near zero in the middle of March (and zero before the end of March). If we can believe Chinese data, the lockdown policy in Wuhan indeed appears to have had these effects (and around those times). The model also suggests that 246,000 people got infected in Wuhan. Many more than the 80,000 identified cases, but fewer than I expected before building this model. (Previously, I guessed that only about 20% of infections were identified, but if this model is right, it was almost a third.)

A single test is hardly enough, but the second best case for testing, Italy, poses some problems. Firstly, the epidemic in Italy is not nearly under control yet. And secondly, Italy didn’t make one big policy change once, but very many small changes, slowly restricting interactions (and thereby reducing R₀) further and further. This is most likely what caused the fact that Italy’s deaths curve is almost linear rather than exponential. But this is also what makes fitting the model to the case of Italy impossible. In the model, I can make only one change in the interaction level (simulating a lockdown or social distancing policy), but not very many small changes.

The following graph shows the best fit between the (original) model and real data from the last week of February until the first weekend of April (i.e. the time of writing of this article). The model suggests a much higher number of deaths, which may be due to all the small incremental changes in the period before the kink in the red line. The date of that kink is the lockdown of parts of Italy on March 8.

Italy – model 1. Red line (left Y axis): infections. Black lines (right Y axis): deaths, according to the model (continuous line) and real data (dotted line). See text for further details.

Because of all those incremental changes, it is almost certain that not only the death statistics, but also the number of infections grew much slower than the model predicts, so the fit of the model to Italy is actually quite bad. Again, this is likely due to limitations of the model. The model predicts that the number of deaths in Italy will increase to almost 24,000, but also that the number of new infections is already sharply declining and that there will be no more new infections in about two-and-a-half months from now. However, the mortality predictions of the model are based on the mortality rates found in Wuhan and South Korea, and a quick glance at the population pyramids of those countries and that of Italy reveals that Italy has a significantly older population and – because older people are much more likely to die from COVID-19 – must have a significantly higher mortality rate as well. If this is taken into account, the modeled number of deaths deviates from reality even more.

The bad fit of the model to the case of Italy left me deeply unsatisfied, so I made a special version of the model with small daily decreases of the interaction parameter and a few bigger changes following the implementation of various regional and national policies in Italy. (And with a higher mortality rate, based on the difference in age composition.) This, as the following graph shows, results in a much better fit:

Italy – model 2. Red line (left Y axis): infections. Black lines (right Y axis): deaths, according to the model (continuous line) and real data (dotted line). See text for further details.

However, this can hardly be construed as support for the original model. That – with some creativity – the model can be made to fit to any data set is not really surprising and doesn’t prove anything. On the other hand, the extent of manipulation of the model is not that great – all I really did is lower interaction incrementally, more or less following the implementation of policies that should be expected to have such effects. Still, these manipulations only succeed in somewhat accurately simulating part of the upward slope, and it is quite possible that Italy will start deviating from the projection from now on. As mentioned before, Italy is not a good test case because – unlike Wuhan – it is still in the growth phase. In case of Wuhan, the model correctly predicted the decline phase as well, but that phase hasn’t started yet in Italy. If this second model is roughly right, on the other hand, then what it predicts for Italy gives little reason for optimism – the peak of daily new infections would be in the end of May, more than half of Italy’s population would get infected, and between 1.7% and 2% would die.

As mentioned, Italy is a problematic test case, and these last paragraphs should be taken with a big grain of salt, but I think it is fair to say that the case of Italy does not disprove the model. It is harder to fit the model to actual data from Italy, but it can be made to fit, and most importantly, it can be made to fit in a plausible way – that is, by using parameters and parameter changes that are based on real data and real differences.

herd immunity and mutation

So, now that I have a model that seems at least somewhat plausible, I can finally use it to see whether I can get some clarity with regards to the things that puzzled (and continue to puzzle) me. One of those things is (related to) the idea of “herd immunity” suggested by some. That idea is to let the virus rage through the population while protecting the most vulnerable so most people become immune and the virus goes extinct. The idea makes sense in theory – if enough people have recovered and have become immune, then the virus cannot infect any people (or not enough people) anymore and will die out (provided that it dies out in all other populations as well!). But how many people must be infected for that to happen? If nothing is done to stop or slow the virus, how many people will get infected? And how many people will die? As mentioned above, I have seen estimates of roughly two-thirds, but not in academic writings, and such estimates really just appear to be guesses. With this model, is it possible to come up with a better guess?

I don’t know. With the same starting settings as those used to model Wuhan and no policy changes, within a bit over three months 96% of the population is infected. That, however, I find rather hard to believe. If 10% (rather than 5%) of infectious people self-isolate (because they are too sick to leave the house or are hospitalized), it is still 95%. In either case, the model suggests that approximately 4% of people would die. (Probably less in countries with better health care.)

What the model doesn’t take into account, however, is that even if no policies are implemented, people change their behavior in response to circumstances, and that would have an effect on the spread of the virus. If people by themselves start practicing something like social distancing, for example, between 45% and 50% gets infected,7 but it takes close to a year before the virus goes extinct. This is a huge margin – according to the model between roughly 45% and 95% of people get infected if nothing is done.8 If anything can be concluded from this, it is that the model is probably not reliable – or even far from reliable – in case a virus spreads very widely.

Nevertheless, regardless of whether 45% or 95% of people get infected, very many people will die if the virus spreads very widely. Depending on various other circumstances, such as the age of the population, the model suggests that COVID-19 would kill between roughly 2% and 4% (but probably closer to the former). That seems a rather high price for herd immunity.

What is even worse is that herd immunity might be a dangerous illusion. The idea depends on the assumption that recovered people are immune. The model explained above makes that same assumption – it is a SEIR model. Not all viruses follow that pattern, however. Influenza and the common cold are better modeled by SEIS models,9 for example, because those viruses have many strains and mutate all the time, so no one builds up immunity or at least not for all of them. You might build up immunity for the last strain, but that won’t help you when the next (mutated) strain spreads. Like the influenza viruses and the viruses that cause the common cold, SARS-CoV2 is an RNA virus, and RNA viruses mutate quickly. How quickly SARS-CoV2 mutates is still unknown, but the more widespread it is, the higher the chance that there are successful mutations.10 If a thousand times more people get infected, there is a thousand times more chance of a successful mutation.11 There is no way of knowing (yet) whether and when this will happen, but it is possible that if SARS-CoV2 becomes very widespread, it starts resembling influenza and the common cold. Then, no one will be immune and there will just be wave after wave, pandemic after pandemic, of different strains or mutations of the virus. Here, I’m moving far outside my “comfort zone”, however. While I kind of understand the population dynamics of epidemics (i.e. what is modeled here), how and how fast viruses mutate and what the implications thereof are, I really do not know.

supplemental note (April 15)

The foregoing just discusses the maximum spread of the epidemic, and thus the process towards achieving herd immunity, but it does not look into the question of what percentage of immune people there need to be in a population for herd immunity to work. The model can easily answer that question, however. (With the usual caveats about the model’s reliability.) Using the values for infectivity et cetera mentioned above, it can be modeled what would happen if new infections occur within a population with various levels of immunity – that is, different percentages of the total population in R (recovered and immune). It turns out that if R/P is just over 60% the virus does not spread. If it is lower than that, it does spread further, leading to similarly high total infection rates as mentioned above (i.e. over 90%), but the closer it gets to 60% the lower the number of new infections and the slower the spread, and if it passes 60% (i.e. 60% immune people in the living population) then the virus cannot spread at all and dies out almost immediately.

So that’s the threshold for herd immunity: 60% of people need to have recovered and gained immunity (assuming that the virus doesn’t mutate and that recovered people stay immune). Even without government policies to contain the virus, the model suggested that if people start practicing something like social distancing out of fear of infection, the percentage of infected people will stabilize at roughly 45% or 50% (see above), which is well below the herd immunity threshold. New infections in such a population will spread slowly and might be easier to contain, but it will take years of social distancing and repeated smaller and smaller waves of infections before finally the threshold is reached (or exceeded) and herd immunity is achieved. Developing a vaccine is faster. (And has a lower death toll.)

a further addition (May 8)

Apparently it is quite common for coronaviruses that people (and other animals) don’t develop long-term immunity after recovery, but only temporary immunity that wears off in two years or so. If that is the case for SARS-CoV2 as well, then herd immunity is impossible because – as shown above – developing herd immunity takes longer than immunity in individuals lasts.

This also implies that some form of repeated lockdown and social distancing regime will have to continue (to prevent health care from collapsing completely) until a vaccine is developed and the majority of people are vaccinated, which can take several years. That is, if a vaccine can be developed at all, because that is also still uncertain at this point.

lockdowns and other policies

In addition to the puzzle of the maximum spread of the virus (and the implications thereof), I’m also curious about the probable effects of different kinds of policies, and that was my main reason for building the model. Social distancing and lockdown change exposure – by reducing the (risky) interactions between people (or by reducing their riskiness), and the effect thereof can be shown in the model by changing a single parameter. A lockdown reduces interactions to a very low number, while social distancing has less extreme effects, and various intermediates could be tested as well.

The question, of course, is how much social distancing and lockdowns reduce R₀. If you are rather outgoing and/or tend to have much close contact with other people, then the difference with what you are used to may seem extreme, but then you are probably not average. Many people have very little interaction with other people, and certainly not much interaction that is close enough to risk transmission. In the end, I couldn’t find any useful data that tells me how much policies like these affect R₀, so I had to guess. For social distancing, I estimated its effect to be a reduction of R₀ to one fourth (of the normal value); and for the much more extreme policy of a lockdown (which keeps people inside their houses with very few exceptions) to one tenth.
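These guesses translate into reproduction numbers with one multiplication each. The sketch below assumes the target R₀ of 3.5 used earlier and ignores everything else in the model (isolation, immunity building up over time); the names are hypothetical.

```python
R0 = 3.5  # target value used for calibration in the text

# Guessed share of the normal R0 that remains under each policy.
POLICY_FACTORS = {
    "none": 1.0,
    "social distancing": 1 / 4,
    "lockdown": 1 / 10,
}


def reproduction_number_under(policy, r0=R0):
    """Reproduction number once the policy has reduced risky interactions."""
    return r0 * POLICY_FACTORS[policy]


for policy in POLICY_FACTORS:
    print(f"{policy}: {reproduction_number_under(policy):.2f}")
```

Under these guesses social distancing leaves the reproduction number only a little below 1 (about 0.88), so an epidemic would shrink slowly, whereas a lockdown pushes it far below 1 (0.35), so it would shrink quickly.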

Another kind of policy that can be tested in the model is one focused on mass testing and isolation of identified infections. Obviously, if 100% of infections are found and isolated, the virus dies out soon, while, if no testing is done, it spreads fast and wide.

Let’s say that we have a population of 10 million with 10 infected people at day 0 (5 in E, 5 in I). According to the model, infections would reach the maximum of 95.3% at day 114, and there would be no new infections after day 155. The death rate may be as high as 4%. This is the baseline.

If a social distancing policy were implemented on day 50, then only about 3.1% of people would get infected, but after 200 days (the end of the modeling period) there still would be more than 50 new infections every day. Waiting until day 70 would result in 40% of people getting infected, but the virus would go extinct within a year because there wouldn’t be sufficient interactions with susceptible people to survive. A lockdown is much more extreme, but also much more effective than social distancing. A lockdown on day 50 limits infections to about 1.2%, and no new infections occur after day 115 (i.e. after 65 days). A lockdown on day 70 results in 26% infections and no new infections after day 153 (i.e. after 83 days).

These data points suggest that social distancing by itself is not sufficient to get a local coronavirus epidemic under control. If “under control” means fewer than 10 new cases a day, social distancing can only achieve that goal if it is implemented so late that the virus has already burned through most of the population. (With an extremely high death count as a result.) A lockdown, on the other hand, brings an epidemic under control in roughly two months, even without additional testing and isolation of confirmed cases. And – rather obviously – these data points also suggest that an early response is better than a late response, but anyone could have guessed that.
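The “under control” criterion used above (fewer than 10 new cases a day) can be made precise with a small helper. This is only a sketch: the function name `first_day_under_control` is hypothetical, and the example series is made up for illustration rather than taken from the model.

```python
def first_day_under_control(new_cases, threshold=10):
    """Return the first day from which daily new cases stay below the threshold
    until the end of the series, or None if that never happens."""
    start = None
    for day, cases in enumerate(new_cases):
        if cases < threshold:
            if start is None:
                start = day   # a run of low days begins here
        else:
            start = None      # the run is broken, so start over
    return start

# Made-up daily counts: low at first, then a wave, then low again.
example = [3, 12, 40, 25, 9, 4, 1]
print(first_day_under_control(example))
```

The reset matters because an epidemic also has fewer than 10 new cases a day at its very beginning, which should not count as being under control.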

If there is no social distancing and no lockdown, but a testing-and-isolation policy instead, then the effects are different, of course. If testing and isolation increase from the base level of 10% (just self-isolation and/or hospitalization of very sick people) to 30% on day 50, then 90% of the population will get infected and there won’t be any new infections after day 185 (but mostly because almost everyone is immune or dead). If 50% of infections are found and isolated, 75% of people get infected, and the virus might be brought under control in less than a year (but well outside the model period). If 70% of infections are found and isolated, but nothing else is done, only about 13.5% of people get infected on or before day 200, but the 30% of cases that remain unidentified ensure that there are still thousands of new infections every day. At 80%, 2.3% of people get infected and there are approximately 20 new cases a day after day 200. At 90%, infections fall to 1.2% and there are no new infections after day 120.
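Why a high identification percentage is needed can be illustrated with a back-of-the-envelope calculation. The sketch below makes a strong simplifying assumption that may not match the spreadsheet model: identified cases are isolated immediately and transmit nothing, so the reproduction number is scaled by the share of cases that are missed. The function names and the R₀ values in the loop are placeholders.

```python
def r_with_isolation(r0: float, isolated_fraction: float) -> float:
    """Reproduction number if isolated cases cause no further infections (assumption)."""
    return r0 * (1.0 - isolated_fraction)

def critical_isolation_fraction(r0: float) -> float:
    """Smallest isolated fraction that brings the reproduction number down to 1."""
    return max(0.0, 1.0 - 1.0 / r0)

for assumed_r0 in (2.0, 3.0, 4.0):  # placeholder values, not from the article
    needed = critical_isolation_fraction(assumed_r0)
    print(f"R0 = {assumed_r0}: isolate more than {needed:.0%} of infections")
```

In this simplified picture the required percentage rises quickly with R₀, and anything below the threshold only slows the epidemic rather than stopping it.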

Achieving such high identification percentages is pretty much impossible, however, except maybe very early in an epidemic. South Korea may have come close to, or even exceeded, 90% for a while, but it had about 7000 identified cases then. In the example used here, with 10 infected people at day 0, there would already be 90,000 infections by day 50 when a policy is implemented. That is obviously much too late, but it should be equally obvious that if there are so many infections, testing can never catch up to reach such high percentages. Reaching 30% would already be quite an achievement.

But things change, of course, if testing starts very early. At day 30 there are only 3200 infections, and it might be possible to trace and identify 90% of those. If that success rate is kept up, there might be on average one new infection every day for a long time, but the epidemic would be pretty much under control fairly soon. This would, however, require a very effective test-and-trace policy.
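The two infection counts quoted above (about 3200 on day 30 and about 90,000 on day 50) give a rough idea of how fast the modeled epidemic grows before any policy kicks in. The sketch below assumes pure exponential growth between those two days, which is a simplification; the function name `growth_rate` is hypothetical.

```python
import math

def growth_rate(count_early, day_early, count_late, day_late):
    """Average exponential growth rate per day between two observations."""
    return math.log(count_late / count_early) / (day_late - day_early)

rate = growth_rate(3200, 30, 90_000, 50)      # counts taken from the text
doubling_time = math.log(2) / rate            # days needed to double
five_day_factor = math.exp(5 * rate)          # growth over a five-day delay

print(f"growth rate per day: {rate:.3f}")
print(f"doubling time in days: {doubling_time:.1f}")
print(f"growth factor over five days: {five_day_factor:.1f}")
```

On these numbers, infections double roughly every four days, so waiting even a few days multiplies the number of cases that testing and tracing would have to find.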

Later in an epidemic, when the number of infections is such that it is impossible to trace and test even half of them, social distancing or a lockdown is necessary to bring the spread of the virus under control, although, as mentioned above, the model suggests that social distancing alone is insufficient. If at day 50 testing and isolation is increased to 30% and simultaneously a social distancing policy starts, total infections reach 2% and zero new infections will be reached some time after day 200. (Down from 3.1% and 50 new infections a day after day 200 for just social distancing.) If there is a lockdown plus the same increase in testing efficiency, total infections reach 1.1% and there are no new infections after day 112. (Almost the same as just a lockdown.)

Timing matters a lot. Again, social distancing plus testing efficiency at 30% from day 50 reduces total infections to 2%, but starting 5 days earlier results in 0.8%, and 5 days later in 4.2%. But regardless of when social distancing starts, it is unlikely to be sufficient to stop a local epidemic. If this model is right, then only a lockdown can do that. And increasing testing does little to change that.

Of course, a lockdown is extremely damaging to an economy and society, but if the foregoing is right, then every alternative may be even worse. The best alternative to two months of lockdown is at least half a year of social distancing. But if many businesses have difficulty surviving two months without income, it is unlikely that they’ll survive half a year or more with only slightly more income. And it is an illusion to think that abstaining from such policies is any less damaging. If a two-digit percentage of people are too sick to work and between 2% and 4% of people are dying, that will be at least as devastating as a lockdown or long-term social distancing policy. Economic and social effects of an epidemic are outside the scope of this model, however, so I won’t say anything else about that topic here.

Something else that needs to be taken into consideration is that a lockdown may eliminate the virus from a population, but does not protect it from reintroduction from elsewhere (as the Chinese experience has made very clear). And given that it is virtually impossible that SARS-CoV-2 will go extinct in all populations, it is almost certainly here to stay, and even a lockdown can only be a temporary solution. There may be a long-term solution, including a vaccine as well as other measures,12 but that is well outside the scope of this article.

For now, I have some answers to the main questions that puzzled me. Whether they are the right answers, I don’t know, but I think that in trying to find answers to these questions I improved my understanding of how viruses spread and what that implies in this particular case, and that was my main goal. I hope that reading this article had a similar effect on you, but I want to emphasize one more time that I’m not an expert on any of this. Don’t trust me – trust real experts. And do your own research.



  1. And watching videos on Youtube (or reading this blog) does not count as “research”.
  2. Of course, economists routinely answer questions of this kind, but their answers almost always turn out to be wrong.
  3. Again, economists are the exception here, but that is mainly because they have redefined “evidence” to mean something like “true in a fictional universe that has absolutely nothing to do with the real world”. See the economics category of this blog.
  4. And my research of the last 15 years has been in philosophy mostly.
  5. Strictly speaking, this is the latency period rather than the incubation period, but if there is no pre-symptomatic infection, then these are the same.
  6. That number is really just the number of spreadsheet cells that have calculations, and the vast majority of those cells have formulas in them that are longer than one line of text in this blog. So, I guess that the actual number of calculations involved is closer to 100,000 than to 20,000, and probably even more than that.
  7. See below on the implementation of social distancing in the model.
  8. But probably closer to the lower end of that range than to the higher end. Perhaps, 60% is in the right direction, which would indicate that the two-thirds estimate mentioned above isn’t far off.
  9. Susceptible → exposed → infectious → (back to) susceptible.
  10. “Successful” here means that it doesn’t die out immediately, but starts spreading.
  11. I think, but I’m not sure about this. I don’t know anything about the mutation of viruses.
  12. Such as changes in health care, economic changes, and much more.