What Makes The COVID-19 Mortality Forecasts Upon Which The White House Relies Seem So Low

If, in the end, the total number of U.S. deaths from COVID-19 turns out to be remotely close to what the White House is projecting, will the outcome be due to brains or luck?

The forecasts of U.S. COVID-19 deaths upon which the White House relies imply that by August 4 the outbreak will be fully contained. By then, daily deaths will have dropped to zero, and total deaths will have reached about 68,000. These same forecasts tell us that about five weeks from now, the outbreak will be 96% contained, meaning that only 4% of the total eventual U.S. deaths from COVID-19 will occur after that point.

The source of the mortality forecasts just described is the University of Washington’s Institute for Health Metrics and Evaluation (IHME), whose director is Christopher Murray.

To be precise, the IHME is now projecting that the eventual number of COVID-19 deaths in the U.S. will be 68,841, with a confidence interval between 30,188 and 175,963.

There is an important debate taking place in the country around these numbers. For those arguing that it is time to reopen the economy, these are supportive forecasts. For those arguing that the country’s containment measures in the last month have been an overreaction, these are supportive forecasts. For those arguing that the country needs to avoid actions that severely weaken containment, the IHME’s forecasts are problematic, and appear unrealistically optimistic.

Last Sunday, on CBS News’ “Face the Nation,” IHME director Murray stated that the U.S. would “very clearly have a rebound” in COVID-19 cases were social distancing guidelines to be eased on May 1. On April 16, the White House outlined a general plan for restarting the economy on a state-by-state basis. The plan consists of a three-phase process in which, after each phase, states need to record a 14-day “downward trajectory” in their COVID-19 cases.

In this blog post, I discuss the question of whether the IHME’s forecast methodology exhibits unrealistic optimism. To that end, I compare the IHME methodology to both the methodology used by other institutions and to a naïve forecasting model.

Readers who prefer short posts, and want to skip the detailed discussion, can skip right to the conclusion for the main takeaways from this post.

The White House, IHME, and Imperial College

Dr. Deborah Birx is a physician who serves as the Coronavirus Response Coordinator for the White House. She is often at the side of President Trump during press briefings about the pandemic. In a press briefing on March 30, 2020, she explained how the White House came to rely on IHME.

Dr. Birx stated that the team advising the President on COVID-19 had reviewed 12 different models, from institutions such as Imperial College London and Columbia University. The models predicted a range of deaths between 1.6 and 2.2 million if the country took no containment measures, with at least half the country eventually becoming infected. The team did an analysis and evaluated the models by comparing their outputs to actual reported cases.

According to Dr. Birx, at the time of the analysis, her team was unaware of a parallel effort by IHME. However, they then became aware of it and discovered that IHME had arrived at results similar to theirs. After that, IHME’s forecasts came to play a central role in White House thinking.

IHME Forecasts

According to the IHME, the peak for new or daily deaths in the U.S. occurred on April 12; and on April 9, the forecast for April 12 was 2,212 (revised down from an earlier estimate of 3,130). Actual new deaths appear to have peaked on either April 10 or 11, at 2,087, and then declined for the next three days.

In a C-SPAN interview on April 15, Ali Mokdad, Chief Strategy Officer and Professor of Global Health at IHME, stated that the peak for the U.S. as a whole had occurred on April 10, although some states were still experiencing increased numbers of deaths.

During the C-SPAN interview, Professor Mokdad explained that the IHME methodology for projecting deaths is based on models that are different from those of most other research groups, because of IHME’s emphasis on fitting the patterns of daily mortality observed in the experiences of other geographic areas such as Wuhan, Italy and Spain.

Herd Immunity

Standard epidemic models, such as the Kermack-McKendrick model from 1927, emphasize that in the absence of vaccines and treatments, epidemics only conclude when herd immunity is achieved in the population. Herd immunity refers to the proportion of the population that is immune to the source of infection. When the proportion of the population which becomes infected and then immune is sufficiently high, the total number of infections stabilizes, instead of growing.

According to Dr. Anthony Fauci, director of the National Institute of Allergy and Infectious Diseases, who advises the President on COVID-19, a vaccine for COVID-19 will not be available for at least 10 months. Therefore, the country is going to have to rely on herd immunity if it is to end the COVID-19 pandemic in the U.S. this year.

I wish to emphasize that the research methodology employed by IHME implicitly builds in the containment measures being followed by the population, be those measures voluntary or government mandated. As a result, the number of deaths forecasted by the IHME model factors in the intensity of those measures.

In contrast to methodologies such as those used by researchers at Imperial College London, herd immunity is not a salient part of IHME communications. The Imperial College approach begins with herd immunity and then analyzes changes that result from containment measures. The IHME approach instead appears to focus on asking how behavioral patterns observed in Wuhan, Italy, and Spain carry over to the U.S.

Computations For Herd Immunity

Projected mortality based on herd immunity leads to much higher eventual deaths from COVID-19 than the U.S. deaths forecast by the IHME model. Moreover, it is straightforward to compute the expected number of deaths until herd immunity is achieved: it is the product of three numbers, namely the proportion of the population who become infected by the time herd immunity is achieved, the death rate per infection, and the size of the population.

The computation of herd immunity begins at the beginning of an outbreak with the question of how many people an infected person in turn infects. This is called the initial transmission rate. For example, suppose that for COVID-19, the initial transmission rate is 2, meaning every person infected with the novel coronavirus in turn infects two others, at least at the beginning of the outbreak. The initial transmission rate for the source of infection has the symbol R0. An R0 of 2 implies that the number of infections will double in short order and begin to grow exponentially. For example, if Person A encounters and infects two others, B and C, then in short order A will either die or recover and no longer be infected. However, B and C will be infected, and so in short order the number of those infected will have doubled.

Although the number of infected persons grows when R0 exceeds unity, over time, the newly infected will be more likely to encounter people who were infected in the past and have become immune. Such immunity will eventually slow the growth rate of new infections.

Suppose that half the population becomes immune when R0 is equal to 2. Then on average, whenever an infected person encounters two others, he will infect only one of them, the one who is susceptible, because the other one will be immune. This is important because in this situation, the number of infections will stabilize. To see why, suppose that Person A encounters two others, B and C, and in the course of their interactions transmits the virus to both. A, already infected, will either die or recover, and as a result will no longer be infected. If B is susceptible, then B will become infected. However, if C is immune, then C will not become infected (again). Notice that the number of infected cases remains the same. In this example, B simply replaces A as the infected party.

There is a simple formula to compute the proportion (p) of the population that needs to be infected to achieve herd immunity, and that formula is p = 1 – 1/R0. So, if R0 = 2, then p = 1 – ½ = ½.
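The formula follows from the two-person example above. Here is a short derivation, offered as a sketch: the symbol R_e (the transmission rate once a fraction p of the population is immune) is notation introduced here for illustration, and the derivation assumes that infected people encounter immune and susceptible people in proportion to their shares of the population.

```latex
\begin{align*}
R_e &= R_0\,(1 - p)
  && \text{only the susceptible fraction } 1 - p \text{ can be infected} \\
R_e &= 1
  && \text{infections stabilize: each case just replaces itself} \\
\Rightarrow\quad p &= 1 - \frac{1}{R_0}
  && \text{so } R_0 = 2 \text{ gives } p = \tfrac{1}{2}
\end{align*}
```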

Suppose that the death rate per infection is 1%, and that the size of the population is 300 million. In this case, with an R0 of 2, half the population, or 150 million people, eventually become infected, and 1.5 million (1% of the infected) die. That, more or less, is how to forecast the eventual number of deaths from an epidemic when the only way to stop the epidemic is to achieve herd immunity.

For COVID-19, there are varying estimates of R0, ranging from 2.2 at the low end to 5.7, with the upper end of a 95% confidence interval being 8.9. There are also varying estimates of the death rate per infection. Evidence from the Diamond Princess suggests a death rate of 0.99%. Economists Eichenbaum, Rebelo, and Trabandt report a similar statistic from South Korea, adjust the rate for age, and arrive at a value of 0.5%. The size of the U.S. population is approximately 330 million.

Taking the “best” case, with R0 equal to 2.2 and the death rate per infection equal to 0.5%, the estimated total number of deaths in the U.S. from COVID-19 is 900,000. This is 13 times as great as the IHME estimate.
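The arithmetic behind these herd immunity figures can be reproduced with a few lines of Python. This is a minimal sketch, not the author's own code; the function names are made up for illustration, and the inputs are the values quoted in the text.

```python
def herd_immunity_fraction(r0):
    """Share of the population infected when herd immunity is reached: p = 1 - 1/R0."""
    if r0 <= 1:
        return 0.0  # an outbreak with R0 at or below 1 does not grow
    return 1.0 - 1.0 / r0


def deaths_until_herd_immunity(r0, death_rate, population):
    """Product of the three numbers: infected share, death rate per infection, population."""
    return herd_immunity_fraction(r0) * death_rate * population


# Illustrative case from the text: R0 = 2, 1% death rate, 300 million people.
illustrative = deaths_until_herd_immunity(2.0, 0.01, 300_000_000)

# "Best" case from the text: R0 = 2.2, 0.5% death rate, 330 million people.
best_case = deaths_until_herd_immunity(2.2, 0.005, 330_000_000)

# IHME point forecast quoted in the text.
ihme_forecast = 68_841
ratio = best_case / ihme_forecast

print(f"Illustrative case: {illustrative:,.0f} deaths")
print(f"Best case: {best_case:,.0f} deaths, {ratio:.1f} times the IHME forecast")
```

Swapping in a higher R0 or the 0.99% death rate from the Diamond Princess only widens the gap, which is why 2.2 and 0.5% count as the best case.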

What 13X Means

Containment measures such as social distancing reduce the effective transmission rate below what it would otherwise be. In this regard, social distancing might lead a society facing an outbreak with an R0 of 2 to behave as though R0 were instead 1.2. The reduction occurs, not because R0 drops, but because the average infected person is less prone to interact with two other susceptible people.

The 13X difference tells us just how important containment measures such as social distancing, hand washing, and self-isolation in response to infection symptoms will be to achieving the low mortality projected by IHME. Think of herd immunity as how hard you press down on the accelerator of your car as it runs. Think of containment as how hard you press on the brake at the same time to stop the car from moving. A high R0 corresponds to pressing very hard on the accelerator. The 68,000 deaths correspond to the car moving forward but not racing. Strong containment measures correspond to keeping one’s foot firmly on the brake.

We are in a situation where the accelerator pedal is stuck. If we relax the foot on the brake, the car will race forward, meaning that the death rate from COVID-19 will soar; and remember R0 might be much higher than 2.2. That is why the Imperial College model suggests that U.S. deaths would be over 2 million in the absence of containment measures.

Is a forecast of 68,841 eventual deaths unrealistically optimistic? To answer this question, consider what transmission rate is associated with that number of deaths. It will be below 2.2, but how much below?

The answer is 1.04, barely above the threshold of 1.0 associated with a zero infection growth rate. That is a large reduction from 2.2. Large parts of the U.S. that have yet to experience high rates of infection have yet to adopt strong containment measures. The governors of some states are resistant, being more concerned about slowing their state economies. This low transmission rate is what gives me pause, leading me to believe that the risk is high that the country will not be able to sustain behavior that produces a transmission rate around 1.04. I would also say that for the IHME confidence interval, the associated transmission rates lie between 1.02 and 1.12.
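These transmission rates can be recovered by running the herd immunity formula in reverse: divide total deaths by the death rate and the population to get the infected share p, then solve p = 1 – 1/R for R. The sketch below assumes the best-case inputs used earlier (a 0.5% death rate per infection and a population of 330 million), which reproduce the figures quoted here; the function name is made up for illustration.

```python
def implied_transmission_rate(total_deaths, death_rate=0.005, population=330_000_000):
    """Transmission rate R at which herd immunity would yield the given death total.

    Inverts deaths = (1 - 1/R) * death_rate * population.
    The default death rate and population are assumptions taken from the best case.
    """
    infected_share = total_deaths / (death_rate * population)
    if not 0 <= infected_share < 1:
        raise ValueError("death total is inconsistent with the assumed inputs")
    return 1.0 / (1.0 - infected_share)


point = implied_transmission_rate(68_841)    # IHME point forecast
low = implied_transmission_rate(30_188)      # lower end of the IHME interval
high = implied_transmission_rate(175_963)    # upper end of the IHME interval

print(f"{point:.2f} {low:.2f} {high:.2f}")   # prints 1.04 1.02 1.12
```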

A Naive Forecasting Model

For the purpose of comparison, I developed a naïve forecast of the eventual U.S. deaths attributable to COVID-19 by fitting a curve to the U.S. daily mortality data, using a standard approach for epidemic modeling involving the Gompertz curve. Although the approach takes no account of herd immunity, it implicitly reflects the containment behaviors in place during the last month, because these behaviors are embodied within the data. The curve fitting approach is both fast and frugal.
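For readers unfamiliar with the Gompertz curve, the sketch below shows one common parameterization and the daily-deaths curve it implies. It is not the fitting code behind the naïve forecast. The parameter names K, b and c and the function names are assumptions introduced here, and the values of b and c are hypothetical, chosen only to make the example run; K is set to the eventual total that the naïve fit produces.

```python
import math


def gompertz_cumulative(t, K, b, c):
    """Cumulative deaths by day t under a Gompertz curve: K * exp(-b * exp(-c * t))."""
    return K * math.exp(-b * math.exp(-c * t))


def gompertz_daily(t, K, b, c):
    """Daily deaths on day t: the time derivative of the cumulative curve."""
    return gompertz_cumulative(t, K, b, c) * b * c * math.exp(-c * t)


def peak_day(b, c):
    """Day on which daily deaths peak, where the cumulative curve has its inflection."""
    return math.log(b) / c


K = 338_754  # eventual total deaths from the naive fit reported in this post
b = 12.0     # hypothetical shape parameter, NOT a fitted value
c = 0.03     # hypothetical growth-rate parameter, NOT a fitted value

t_peak = peak_day(b, c)
print(f"Daily deaths peak on day {t_peak:.1f}")
print(f"Share of eventual deaths reached by the peak: "
      f"{gompertz_cumulative(t_peak, K, b, c) / K:.1%}")
```

One property worth noting is that a Gompertz curve reaches its daily peak when only about 37% (1/e) of the eventual total has occurred, so it builds in a long tail after the peak.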

Figure 1 below depicts the outcome of a curve fitting exercise done at the end of the day on April 16, 2020.

If the IHME forecasting model is statistically efficient, then we should not expect that a naïve curve fitting exercise will outperform the IHME model.

According to the IHME model, by August 4, 2020 the outbreak will come to a close, featuring a total of 68,841 deaths attributable to the virus. The associated confidence interval for this forecast is 30,188 to 175,963. With this in mind, I point out that the naïve curve fit on daily deaths from February 29 through April 16 produces an estimate of 338,754 total eventual deaths. This, I note, is greater than 175,963, which forms the upper end of the IHME’s forecast confidence interval, but is also less than the 900,000 which is associated with the lowest estimate for herd immunity.

A Simple Test of IHME Forecasting Accuracy

Applying the approach described above for computing the transmission rate, we can ascertain that the value of the containment transmission rate associated with 338,754 total deaths is 1.26. That is, the containment measures put in place to mitigate COVID-19 reduce the transmission rate from its initial level, between 2.2 and 8.9, to a value of 1.26. The 1.26 value seems more plausible than a value hovering just above 1, an issue I mentioned above.

When August 4 arrives, it will be possible to compare the current IHME forecast of total deaths with the naïve model, to see which approach was the more accurate. However, as it happens, we need not wait that long to come to a conclusion, because going forward, we can make a series of informative comparisons along the way.

Recall from the above discussion that the IHME identifies the peak of U.S. daily deaths to have occurred on April 10. The naïve model predicts that the peak will not occur until May 22.

Readers should feel free to compare the IHME forecast of total deaths against the corresponding forecasts from the naïve model at a series of dates. Let me provide some milestones, meaning dates with associated forecasts.

For April 30, the naïve model predicts total U.S. deaths from the beginning of the pandemic to April 30 to be 58,185. In contrast, the IHME projects total deaths to be lower, 50,291, with a confidence interval of 28,042 to 109,208. As for daily deaths, the Gompertz fit estimate for April 30 is 2,750. In contrast, the IHME model projects daily deaths for April 30 to be 1,146, with a confidence interval of 249 to 3,421. In this regard, the forecast from the naïve model lies within the IHME confidence interval.

For May 24, the naïve model predicts total U.S. deaths from the beginning of the pandemic to May 24 to be 135,143. In contrast, the IHME projects total deaths to be lower, 66,319, with a confidence interval of 30,138 to 165,473. In this regard, the forecast from the naïve model lies within the IHME confidence interval. May 24 is an important date, because according to the IHME, the total number of deaths will be more than 96% of the way to the eventual total death toll of 68,841.

I should mention that the forecasts from the naïve model described above fall within the IHME’s confidence intervals. Therefore, even if actual deaths were exactly predicted by the naïve model, it would be possible to argue that the actual results fall into the confidence region generated by IHME. However, from June 6 onward, the forecasts from the naïve model exceed the upper bounds of the IHME confidence intervals, and that defense is no longer available. For June 6, the IHME forecast is 68,573 with the upper end of the confidence interval being 174,687, while the naïve model forecast is 177,564.
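Readers who want to track the milestones can do so with a short script. The sketch below simply encodes the cumulative forecasts quoted in this section and checks whether the naïve forecast exceeds the IHME upper bound at each date. The helper closer_forecast is a hypothetical function for scoring the two models once actual deaths are known; no actual figures are supplied here.

```python
# (date, naive cumulative forecast, IHME point forecast, IHME upper bound),
# all as quoted in this section.
milestones = [
    ("April 30", 58_185, 50_291, 109_208),
    ("May 24", 135_143, 66_319, 165_473),
    ("June 6", 177_564, 68_573, 174_687),
]


def exceeds_upper_bound(naive, upper):
    """True when the naive forecast lies above the IHME upper bound."""
    return naive > upper


def closer_forecast(actual, naive, ihme):
    """Name the forecast closer to the actual cumulative deaths (hypothetical scoring rule)."""
    naive_error = abs(actual - naive)
    ihme_error = abs(actual - ihme)
    if naive_error == ihme_error:
        return "tie"
    return "naive" if naive_error < ihme_error else "IHME"


results = {}
for date, naive, ihme, upper in milestones:
    results[date] = exceeds_upper_bound(naive, upper)
    status = "above" if results[date] else "at or below"
    print(f"{date}: naive {naive:,} is {status} the IHME upper bound of {upper:,}")
```

Once the reported total for a milestone date is available, calling closer_forecast with that total and the two forecasts for the date shows which model came closer.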


Conclusion

It is possible for the eventual total number of deaths in the U.S. from COVID-19 to be about 68,000, as predicted by the IHME model. We should hope that this turns out to be the case. However, for that to happen, the country needs to reduce the transmission rate from above 2 to a hair above 1.0.

Doing so would be a tall order in the best of times. However, given the lack of political uniformity across the country, in respect to willingness to engage in sustained containment, this is not the best of times.

There is a distinct risk that the IHME forecasts of COVID-19 related deaths exhibit unrealistic optimism bias. This bias has many contributing factors, and I can only offer speculative remarks here.

Being familiar with the forecasting task at hand is one factor.

Desirability, otherwise known as wishful thinking, is a second factor.

Becoming anchored on an estimate is a third factor. In this regard, the annual U.S. death toll from seasonal flu, at just under 35,000, might have served as an anchor: 68,000 is almost 70,000, which is twice as bad as 35,000.

Motivated reasoning is a fourth factor, whereby people attach undue weight to evidence that supports a position they hold, and correspondingly insufficient weight to evidence that does not. In this regard, Professor Mokdad has stated in interviews (cited above) that from the first, he and his team have thought that the total number of deaths would not exceed 100,000.

To test whether the IHME forecasts are too optimistic, readers can engage in the forecast comparisons described above, at the specified milestone dates, checking to see whether the IHME model or the naïve model provides the more accurate forecast. I hope it is the IHME model that is more accurate, but fear that it will be the naïve model.

