Coronavirus forecast

I’ve made a Shiny app that gives a ten-day forecast, by country, of likely numbers of coronavirus cases.

The app is designed to give people a sense of how fast this epidemic is progressing, as well as of one of the key uncertainties: the true number of cases.

At last update (23 March 2020), it is very impressive to see how things are progressing in China, Korea, and Japan, and quite alarming to see how things are progressing elsewhere.

The top graph gives the raw number of cases each day, with a ten-day projection. The projection is based on the bottom graph, which shows the same data plotted on a log scale. Exponential growth appears as a straight line on the log scale, so I fit a straight line to the last ten days of data, extrapolate it by ten days, and project that back onto the original scale in the top graph.

If your country is missing from the app, it is because it has fewer than 30 active cases. Once the number of active cases passes this threshold, your country will appear.
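The fit-and-extrapolate step described above can be sketched as follows. This is a minimal illustration in Python, not the app's actual code: the function names and the example series are hypothetical, and it assumes every count in the window is positive so that the logarithm is defined.

```python
import math

def fit_log_linear(cases):
    """Least-squares straight line through log(cases) against day index 0..n-1."""
    n = len(cases)
    xs = list(range(n))
    ys = [math.log(c) for c in cases]  # assumes all counts are > 0
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return slope, intercept

def project_cases(cases, window=10, horizon=10):
    """Fit the last `window` days on the log scale and extrapolate `horizon` days."""
    recent = cases[-window:]
    slope, intercept = fit_log_linear(recent)
    last_day = len(recent) - 1
    # Extend the fitted line past the last observed day, then exponentiate
    # to return to the original (raw count) scale.
    return [math.exp(intercept + slope * (last_day + k)) for k in range(1, horizon + 1)]

# Hypothetical series that doubles every day, starting from 100 cases.
example_cases = [100 * 2 ** d for d in range(10)]
forecast = project_cases(example_cases)
```

Because the example series is exactly exponential, the fitted line passes through every point and the forecast simply continues the doubling. Real data are noisier, which is why the slope is fitted over a window rather than read off two points.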

Raw/Active cases

Raw cases is the total number of cases reported to the WHO. As of 23 March, collation of data on recoveries ceased, so active cases is now calculated as raw cases, minus reported deaths, minus approximate recoveries. To approximate the number of recoveries, I assume recovery takes 22 days, so active cases is raw cases minus raw cases 22 days ago (corrected for deaths), minus deaths.
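One way to read the recovery approximation above is sketched below. The text does not spell out exactly how the lagged cases are "corrected for deaths", so this sketch assumes it means subtracting the cumulative deaths recorded at the lagged date. That reading, the function name, and the example data are all assumptions rather than the app's code.

```python
RECOVERY_LAG_DAYS = 22  # recovery time assumed in the text

def approximate_active_cases(cum_cases, cum_deaths, lag=RECOVERY_LAG_DAYS):
    """Active cases = cumulative cases - cumulative deaths - approximate recoveries."""
    active = []
    for t in range(len(cum_cases)):
        if t >= lag:
            # ASSUMPTION: "corrected for deaths" is read as removing the deaths
            # already recorded `lag` days ago from the cases recorded then.
            recovered = max(cum_cases[t - lag] - cum_deaths[t - lag], 0)
        else:
            recovered = 0  # nobody has had time to recover yet
        active.append(cum_cases[t] - cum_deaths[t] - recovered)
    return active

# Hypothetical data: 10 new cases per day for 30 days, one death on day 25.
example_cases = [10 * (d + 1) for d in range(30)]
example_deaths = [0] * 25 + [1] * 5
active = approximate_active_cases(example_cases, example_deaths)
```

With a constant 10 new cases per day, active cases level off at 220 once the first cohort passes the 22-day mark, and then drop by one when the death is recorded.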

Doubling time

I estimate the instantaneous growth rate over the last ten days, r, using simple regression on the log scale and calculate doubling time as ln(2)/r.
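For readers wondering where ln(2)/r comes from, here is a short derivation. The symbols N(t), N_0 and T_d are my own notation for the case count, its starting value and the doubling time. They are not taken from the app.

```latex
N(t) = N_0 e^{rt}, \qquad N(t + T_d) = 2\,N(t)
\;\Longrightarrow\; e^{r T_d} = 2
\;\Longrightarrow\; T_d = \frac{\ln 2}{r}
```

As an illustration, a fitted growth rate of r = 0.23 per day gives a doubling time of about 0.693/0.23, or roughly 3 days.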

Detection

I have also made a (very) rough estimate of the proportion of cases that are being detected in each country. This is very rough (have I said “very” enough?), and there are often not many data. The method assumes that there is community transmission, that deaths do not go unnoticed, that the case fatality rate of symptomatic (infective) people is about 3.3%, and that it takes about 17 days for those who are going to die to do so. Under these assumptions, we look at the number of deaths in a five-day period and estimate the number of symptomatic infections required to generate these deaths (expected = deaths/0.033). We compare that to the number of new cases detected in the five-day period 17 days earlier (observed), and use observed/expected to estimate a detection probability. Please take this number with a big dose of salt, but it does give you some indication of how good or bad detection might be in each country. For some countries there are insufficient data to even make this estimate.
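The detection calculation above can be sketched in a few lines. This is an illustration under the stated assumptions (3.3% case fatality rate, 17-day lag, five-day windows), not the app's code. The function name and example data are hypothetical, and capping the result at 1 is my own assumption.

```python
CASE_FATALITY_RATE = 0.033  # from the text
DAYS_TO_DEATH = 17          # from the text
WINDOW_DAYS = 5             # from the text

def detection_probability(cum_cases, cum_deaths,
                          cfr=CASE_FATALITY_RATE,
                          lag=DAYS_TO_DEATH,
                          window=WINDOW_DAYS):
    """Observed new cases divided by the cases implied by later deaths.

    Both inputs are cumulative daily series of equal length. Returns None
    when there are too few days or no deaths in the latest window.
    """
    t = len(cum_deaths) - 1
    if t - lag - window < 0:
        return None
    deaths_in_window = cum_deaths[t] - cum_deaths[t - window]
    if deaths_in_window <= 0:
        return None
    expected = deaths_in_window / cfr  # infections needed to produce these deaths
    observed = cum_cases[t - lag] - cum_cases[t - lag - window]
    # ASSUMPTION: cap at 1, since a detection probability cannot exceed 100%.
    return min(observed / expected, 1.0)

# Hypothetical data: 2 new cases per day, and 10 deaths in the last five days.
example_cases = [2 * (d + 1) for d in range(30)]
example_deaths = [0] * 25 + [2, 4, 6, 8, 10]
p_detect = detection_probability(example_cases, example_deaths)
```

In this example, ten deaths imply roughly 300 symptomatic infections 17 days earlier, while only ten new cases were reported in that window, so the estimated detection probability is about 3.3%.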

Curve flattening

The goal of every community in this pandemic should be to flatten the pandemic curve. I’ve made an index so you can see how well your country is doing. This index takes the function of log(active cases) against time and calculates the slope of that function as well as the change in slope. The index is calculated as the change in slope divided by the absolute value of slope (so the change you are achieving at any time is relative to steepness of the slope at that time). I then multiply the index by negative one, so positive numbers are good, negative numbers are bad.
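A minimal sketch of the index follows, using day-to-day differences of log(active cases) as the slope. The text does not say how the slope is computed or which day's slope goes in the denominator, so the finite differences, the use of the current day's slope, and the function name are all assumptions.

```python
import math

def curve_flattening_index(active):
    """-(change in slope) / |slope| for log(active cases) against time.

    Returns one value per day from the third day onward. Positive values
    mean growth is slowing. A day with zero slope yields None.
    """
    logs = [math.log(a) for a in active]  # assumes all counts are > 0
    # ASSUMPTION: slope is approximated by the one-day difference in log cases.
    slopes = [later - earlier for earlier, later in zip(logs, logs[1:])]
    index = []
    for prev_slope, curr_slope in zip(slopes, slopes[1:]):
        if curr_slope == 0:
            index.append(None)  # index is undefined when the slope is zero
        else:
            change = curr_slope - prev_slope
            index.append(-change / abs(curr_slope))
    return index

slowing = curve_flattening_index([100, 200, 300])       # growth decelerating
accelerating = curve_flattening_index([100, 150, 300])  # growth accelerating
steady = curve_flattening_index([100, 200, 400])        # constant doubling
```

The three hypothetical series give a positive index, a negative index, and an index of roughly zero respectively, which matches the "positive is good" convention described above.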

Growth Rate

The per day growth rate is calculated as the change in number of infected people between day t and day t-1, divided by the number infected on day t-1. This gives a per-infection per day growth rate.
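In code, the calculation above is a one-line ratio per day. The function name and example numbers below are hypothetical, and returning None when the previous day's count is zero is my own choice.

```python
def daily_growth_rate(infected):
    """(I_t - I_{t-1}) / I_{t-1} for each day t after the first."""
    rates = []
    for prev, curr in zip(infected, infected[1:]):
        # The rate is undefined when there were no infections the day before.
        rates.append((curr - prev) / prev if prev > 0 else None)
    return rates

# Hypothetical counts of infected people on four consecutive days.
growth = daily_growth_rate([100, 120, 150, 150])
```

Here the counts rise by 20%, then by 25%, and then stay flat, so the last rate is zero.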

This is a work in progress

The code is now up on GitHub. Anyone who would like to help make this app better (particularly the detection estimation), please get in contact. This is all correlative at present, with no mechanism anywhere; it would be great to work an SIR model into the backend (which will take time, and the current trajectories are sufficiently concerning as to warrant speed at this time). Alex Perkins’s group at U Notre Dame have built such a model for the US, and (as of 14 March) our estimate of the number of undiagnosed cases in the US (about 100,000) was eerily similar, and alarming. Alison Hill at Harvard has built a lovely Shiny app with an SEIR model, and we are working to push data to her interface now.

The app is based on data collated by the team at Johns Hopkins University for their really excellent global coronavirus tracker. In large countries with many states/provinces, people will want more localised reporting. The Johns Hopkins data report down to State/Province level, so my code would be a great place to start for anyone wanting to develop country-specific apps.

Acknowledgements

Various people on Twitter have made suggestions and caught bugs along the way. Thanks for the feedback. Matteo Tomasini (Michigan State University) and Daniel Bolnick (University of Connecticut) are helping with the code, data and ideas.

Join the Conversation

38 Comments

  1. Thanks a lot for this work Ben. Really useful. I shared it a lot around in France and hopefully it contributed – a bit – to the political decision to close schools and universities on monday and sine die. They should have made that decision a few days ago, but things are moving forward. I think they will have to take even more important decisions on movement restrictions (e.g. as in Belgium where they have just banned bars and restaurants). To be continued.
    Take care!
    Kind regards
    Denis Bourguet

  2. Ben,
    Can you also share your mathematical equations? That would really help in understanding how you are reaching these conclusions.
    Great work, thanks!

  3. Hi Ben (and all readers)

    Please take into account that death rates vary highly across ages, and country death rates depend not only on detection levels but on which people were affected.

    Whereas in Italy and Spain about 1/3 of affected people are over 70, only 1/10 of infected people in China are above that age. By contrast, in South Korea the most affected group is the population between 20 and 29 years old (link below, in Spanish).

    Personally, I would not share a prediction of the number of potential cases based on an estimator as weak as the one presented here. As data scientists we also have a responsibility here. And even if Ben wrote adequately that it is a bad estimator, this kind of prediction might spread fast and scare people who would not always read the blog description that carefully (and who, even reading it, might not always have the statistical skills and/or information required to really understand the meaning of the numbers).

    In any case, thanks Ben for your illustrative example. I hope that there will soon be a shared and homogeneous database at the micro level that enables us to better understand what’s going on.

    Yours. Pablo.

    https://www.eldiario.es/sociedad/fallecidos-coronavirus-Espana-anos_0_1008599338.html

    1. “If you need to be right before you move, you never win. Perfect is the enemy of what is right when it comes to emergency management. Speed trumps perfection.” – WHO Executive Director Dr. Michael Ryan

      Data scientists also have a responsibility to let people know that their house is on fire, before it burns to the ground.

      Yes, the estimator could be improved, but that takes time and data. Data in this case equals deaths, and those deaths are doubling in number about every three days across much of the globe. If these numbers alarm people, so be it.

      My method is ballpark correct. When there are more deaths than there should be given the confirmed cases, we are either missing a lot of cases, or the death rate is very much higher than we think. Let’s say we have 10 deaths today, and 17 days ago we detected 10 new cases. We either conclude that 100% of people die from infection, or that the case fatality rate is about 3.3% and there were actually about 300 cases 17 days ago, and we only noticed ten of them. Yes, I am glossing over all manner of uncertainty, and (without accounting for that uncertainty) wouldn’t get this method published in a scientific journal, but as a heuristic in a time of crisis, it suffices.

    2. …though I just noticed that I had a line on the front page warning people that this was a rough estimate, and it’s not there any more. I’ll fix that now.

      1. When one of the fundamental inputs to your estimation is a naive death rate, you are literally guessing. Don’t pass it off as science; be responsible.

      2. I didn’t say I had the answer, I said your information was significantly flawed. Be responsible with the information you are providing is what I suggested. Science is about being accurate, not posting your first attempt on the internet.

      3. Thanks for the lecture on what science is, Robert. We’re trying. I have finally had a chance to write the methods down. They’ll be up soon (with all the assumptions listed). I am painfully aware that detection estimate is not perfect, but also how important it is that we generate some idea of how poor our detection is in each country. Our estimate is MUCH better than complete ignorance.

  4. Great app. Overplotting data for different countries would be a great addition. I can help with that; I started working on my own app to do that.

  5. Hi Ben, I can see you’re using information we have now about how the number of cases is multiplying in each country. I was surprised that Germany was predicted to be so high in 10 days compared to some other countries. Were there any special or extra factors that influenced the prediction in Germany?

  6. Thanks Ben, a useful and most importantly transparent framework. I had been doing my own work with active cases and wondered whether you dropped the data because it was no longer available or no longer of good quality. I notice JHU still has data on recoveries so you could impute active cases…

  7. We all appreciate the Scientific Community coming together and providing information to the world. You’re helping remove the unknowns or at least helping us all to understand them. At the end of the day it’s up to each individual to make their own decisions with respect to personal safety based on data collected by them. This is really helpful as one of those data points. Thank you to all of you for sharing your work.

  8. This is a very useful app. Thank you for sharing this to the world. You are a big help to us. May I know how often do you update the numbers?

  9. Hello, I like your website, but I wonder why you state “Positive values are good, and China is an excellent reference series” and this curve for China is negative… I’ll appreciate your reply.

    1. This is a good point, and I’ve been noticing the same thing. The words were written ten days ago, and things have changed. Have a look at China on the growth chart. Their negative growth has been trending towards positive of late. They are still doing really well, but not as well as they were. The CFI is a tricky measure though: I may retire it soon and focus on improving plots of growth rate over time. These actually tell a clearer story than the curve-flattening index (see China on the current growth plot).

  10. Ben
    Thanks for the effort you have put into this, it is a brilliant visual presentation of the state of play and what might eventuate. I actually have provided a link to this (as well as government links) on the Fishing Victoria forum. We need to get the info out to the average Jo so they can comprehend the immensity of the situation.

    While I understand some of the issues with CFI it might still be useful to retain. Perhaps a clear explanation for the lay person of the interaction between growth and flattening might help.

  11. Hi. Excellent app. Is it possible to link this to the Worldometer page so that when they update their numbers this will also change? The raw cases are out of date by one day in Asian countries such as the Philippines.
    Thank you

  12. “Take this last number with a grain of salt; it is rough. But low detection indicates that there are many more deaths in the country than there should be given reported case numbers (so there must be more cases than are reported). Active cases are total number of infections minus deaths and recoveries.”

    Not sure how you come to this conclusion about misreported deaths. In small countries like mine (Greece), I’m pretty positive almost all deaths related to COVID-19 are attributed to it. People usually end up in intensive care first, then die, so they’ve been tested in the process. For any sudden deaths, they also do the testing if they can’t find other cause (and as a precaution before anatomy of the corpse etc.)

    1. Hi George. On the contrary, I assume that deaths are not misreported; that all deaths are detected. This is how we estimate how many cases there must have been about 17 days ago (as opposed to the number of cases that were reported 17 days ago).

  13. That’s great to see, Vipin. Nice job! You might have noticed that we have recently integrated state-level drill down into the main app, but we didn’t have state-level data for India. Perhaps we should feed your data pipeline into the main app?
