I’ve made a Shiny app that gives a ten-day forecast, by country, of likely numbers of coronavirus cases.
The app is designed to give people a sense of how fast this epidemic is progressing, as well as of one of the key uncertainties: the true number of cases.
At last update (23 March 2020), it is very impressive to see how things are progressing in China, Korea, and Japan, and quite alarming to see how things are progressing elsewhere.
The top graph gives the raw number of cases each day, with a ten-day projection. The projection is based on the bottom graph, which shows the same data plotted on a log scale: exponential growth presents as a straight line on the log scale. So I fit a straight line to the last ten days of data, extrapolate it by ten days, and project that back onto the original scale in the top graph.
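The fit-and-extrapolate step can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the app's own code: the function names (`fit_log_linear`, `project_cases`) and the example series are hypothetical, and the fit is an ordinary least-squares line through log(cases) against day number.

```python
import math

def fit_log_linear(cases):
    """Least-squares line through log(cases) vs. day index 0..n-1.
    Returns (intercept, slope) on the log scale."""
    n = len(cases)
    xs = list(range(n))
    ys = [math.log(c) for c in cases]
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_mean - slope * x_mean, slope

def project_cases(cases, window=10, horizon=10):
    """Fit the last `window` days on the log scale, extend the line by
    `horizon` days, and exponentiate back to the original scale."""
    recent = cases[-window:]
    intercept, slope = fit_log_linear(recent)
    last_day = len(recent) - 1
    return [math.exp(intercept + slope * (last_day + k))
            for k in range(1, horizon + 1)]

# Hypothetical series: 100 cases growing 20% per day for 15 days.
cases = [100 * 1.2 ** d for d in range(15)]
forecast = project_cases(cases)
```

Because the example series is exactly exponential, the forecast simply continues the same 20% daily growth; real data will scatter around the fitted line.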
If your country is missing from the app, it is because it has fewer than 30 active cases. Once the case number passes this threshold, your country will appear.
Raw cases is the total number of cases reported to the WHO. Collation of data on recoveries ceased on 23 March, so from that date active cases is raw cases, minus reported deaths, minus approximate recoveries. To approximate the number of recoveries, I assume it takes 22 days to recover, so active cases is raw cases minus raw cases 22 days ago (corrected for deaths), minus deaths.
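One way to read that calculation, sketched in Python. The assumption here, which the text does not spell out, is that "corrected for deaths" means the people counted as recovered today are the cases from 22 days ago minus the deaths recorded by that date. The function name and example numbers are hypothetical.

```python
def active_cases(cum_cases, cum_deaths, recovery_days=22):
    """Approximate active cases from cumulative cases and cumulative deaths.

    Assumption: recoveries on day t = (cases - deaths) as of day t - 22.
    Before day 22 there are no approximate recoveries.
    """
    active = []
    for t in range(len(cum_cases)):
        alive = cum_cases[t] - cum_deaths[t]
        if t >= recovery_days:
            lagged = t - recovery_days
            recovered = cum_cases[lagged] - cum_deaths[lagged]
        else:
            recovered = 0
        active.append(alive - recovered)
    return active

# Hypothetical data: 10 new cases and 1 new death per day for 30 days.
cum_cases = [10 * (t + 1) for t in range(30)]
cum_deaths = list(range(30))
active = active_cases(cum_cases, cum_deaths)
```

With steady daily inflow, the active count levels off once the first cases pass the 22-day mark, which is the behaviour you would expect from a fixed recovery time.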
I estimate the instantaneous growth rate over the last ten days, r, using simple regression on the log scale and calculate doubling time as ln(2)/r.
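The doubling-time formula follows from exponential growth: if cases grow as N(t) = N(0)·exp(rt), then N doubles when exp(rt) = 2, that is, when t = ln(2)/r. A minimal Python sketch (the function name is hypothetical, and returning infinity for a non-positive r is my own choice, not something the app is known to do):

```python
import math

def doubling_time(r):
    """Days for cases to double at instantaneous growth rate r (per day).
    A zero or negative r means cases are not growing, so there is no
    finite doubling time; infinity is returned in that case (an assumption)."""
    if r <= 0:
        return math.inf
    return math.log(2) / r

# A 20% daily increase corresponds to r = ln(1.2) per day.
days = doubling_time(math.log(1.2))
```

For the 20% daily increase in the example, the doubling time comes out a little under four days.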
I have also made a (very) rough estimate of the proportion of cases that are being detected in each country. This is very rough (have I said “very” enough?), and there are often not many data. The method assumes that there is community transmission, that deaths do not go unnoticed, that the case fatality rate of symptomatic (infective) people is about 3.3%, and that people who die do so about 17 days after becoming a case. Under these assumptions we look at the number of deaths in a five-day period and estimate the number of symptomatic infections required to generate these deaths (expected = deaths/0.033). We then compare that to the number of new cases detected in the five-day period 17 days earlier (observed), and use observed/expected to estimate a detection probability. Please take this number with a big dose of salt, but it does give you some indication of how good/bad detection might be in each country. For some countries there are insufficient data to even make this estimate.
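As a rough Python sketch of that calculation, assuming daily (not cumulative) counts as input: the function name, argument names and example data are hypothetical, and details such as what to do when there are no deaths in the window are my own choices rather than anything stated above.

```python
def detection_probability(new_cases, new_deaths, end_day,
                          window=5, lag=17, cfr=0.033):
    """Observed/expected cases, following the description in the text.

    new_cases, new_deaths: daily counts, indexed by day.
    end_day: index of the last day of the five-day death window.
    Returns None when there are too few days of data or no deaths
    in the window (an assumption; the text only says some countries
    have insufficient data).
    """
    death_start = end_day - window + 1
    case_start = death_start - lag
    if case_start < 0:
        return None
    deaths = sum(new_deaths[death_start:end_day + 1])
    if deaths == 0:
        return None
    # Cases detected in the matching window 17 days earlier.
    observed = sum(new_cases[case_start:case_start + window])
    # Symptomatic infections needed to produce this many deaths.
    expected = deaths / cfr
    return observed / expected

# Hypothetical data: 2 new cases every day, 10 deaths over the last 5 days.
new_cases = [2] * 30
new_deaths = [0] * 25 + [2] * 5
p = detection_probability(new_cases, new_deaths, end_day=29)
```

Here ten deaths imply roughly 300 symptomatic infections 17 days earlier, against ten detected, so the estimate is about 3%. Note that the ratio is not capped, so it can exceed 1 when a country detects more cases than the deaths imply.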
The goal of every community in this pandemic should be to flatten the pandemic curve. I’ve made an index so you can see how well your country is doing. This index takes the function of log(active cases) against time and calculates the slope of that function as well as the change in slope. The index is calculated as the change in slope divided by the absolute value of slope (so the change you are achieving at any time is relative to steepness of the slope at that time). I then multiply the index by negative one, so positive numbers are good, negative numbers are bad.
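A minimal Python sketch of the index, with two assumptions the description leaves open: the slope is taken as a simple day-to-day difference in log(active cases) with no smoothing, and the change in slope is divided by the absolute value of the most recent slope. The function name is hypothetical.

```python
import math

def curve_flattening_index(active):
    """Curve-flattening index for a series of daily active-case counts.

    slope[t]  = log(active[t+1]) - log(active[t])   (finite difference)
    index     = -(change in slope) / |current slope|
    Positive values mean the log curve is flattening.
    Returns None for a day where the slope is exactly zero.
    """
    logs = [math.log(a) for a in active]
    slopes = [b - a for a, b in zip(logs, logs[1:])]
    index = []
    for prev, curr in zip(slopes, slopes[1:]):
        change = curr - prev
        index.append(-change / abs(curr) if curr != 0 else None)
    return index

slowing = curve_flattening_index([100, 200, 300])       # growth easing off
accelerating = curve_flattening_index([100, 150, 300])  # growth speeding up
steady = curve_flattening_index([100, 200, 400])        # constant doubling
```

The slowing series gives a positive index, the accelerating one a negative index, and steady exponential growth gives an index of essentially zero.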
The per day growth rate is calculated as the change in number of infected people between day t and day t-1, divided by the number infected on day t-1. This gives a per-infection per day growth rate.
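In Python, the per-day growth rate might look like the sketch below. The function name and example numbers are hypothetical, and returning None when the previous day's count is zero is my own choice to avoid dividing by zero.

```python
def per_day_growth(infected):
    """(N[t] - N[t-1]) / N[t-1] for each consecutive pair of days.
    Returns None for a day whose previous count is zero (an assumption)."""
    rates = []
    for prev, curr in zip(infected, infected[1:]):
        rates.append((curr - prev) / prev if prev > 0 else None)
    return rates

# Hypothetical counts: +20% on the first day, +10% on the second.
rates = per_day_growth([100, 120, 132])
```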
This is a work in progress
The code is now up on GitHub. Anyone who would like to help make this app better (particularly the detection estimation), please get in contact. This is all correlative at present, with no mechanism anywhere; it would be great to work SIR into the backend (which will take time, and the current trajectories are sufficiently concerning as to warrant speed at this time). Alex Perkins’ group at U Notre Dame have built such a model for the US, and (as of 14 March) our estimate of the number of undiagnosed cases in the US (about 100,000) was eerily similar, and alarming. Alison Hill at Harvard has built a lovely Shiny app with an SEIR model, and we are working to push data to her interface now.
The app is based on data collated by the team at Johns Hopkins University for their really excellent global coronavirus tracker. In large countries with many states/provinces people will be wanting more localised reporting. The Johns Hopkins data reports down to State/Province level, so my code would be a great place to start for anyone wanting to develop country-specific apps.
Various people on Twitter have made suggestions and caught bugs along the way. Thanks for the feedback. Matteo Tomasini (Michigan State University) and Daniel Bolnick (University of Connecticut) are helping with the code, data and ideas.
Thanks a lot for this work, Ben. Really useful. I shared it widely in France and hopefully it contributed – a bit – to the political decision to close schools and universities on Monday, sine die. They should have made that decision a few days ago, but things are moving forward. I think they will have to take even more important decisions on movement restrictions (e.g. as in Belgium, where they have just closed bars and restaurants). To be continued.
Thanks Denis. Glad it was useful!
Thanks Ben ! …please disregard obvious error
We should thank Denis also. He (co-) created Peer Community In… Which (if you are interested in open science) is a very fine thing: https://evolbiol.peercommunityin.org/about/help_generic
Ben, any chance you can put some grid lines, or put the axis on the right hand side?
Working to put hover-reporting functionality in at the moment. I can add an axis on LHS fairly easily though: will add to the next deployment.
Can you also share your mathematical equations? It would be really helpful to see how you are reaching these conclusions.
Great work, thanks!
I will try to get that done today.
I’m a time optimist. They are up there now. Apologies for the delay.
Hi Ben (and all readers)
Please take into account that death rates vary greatly across ages, and that country death rates depend not only on detection levels but also on who gets infected.
Whereas in Italy and Spain about 1/3 of affected people are over 70, only 1/10 of infected people in China are above that age. By contrast, in South Korea, the most affected group is the population between 20 and 29 years old (Below link in Spanish)
Personally, I would not share a prediction of the number of potential cases based on an estimator as weak as the one presented here. As data scientists we also have a responsibility here. And even if Ben wrote adequately that it is a bad estimator, this kind of prediction might spread fast and scare people who will not always read the blog description that carefully (and who, even reading it, might not always have the statistical skills and/or information required to really understand the meaning of the numbers)
In any case, thanks Ben for your illustrative example. I hope that there will soon be a shared and homogeneous database at the micro level that enables us to better understand what’s going on.
“If you need to be right before you move, you never win. Perfect is the enemy of what is right when it comes to emergency management. Speed trumps perfection.” – WHO Executive Director Dr. Michael Ryan
Data scientists also have a responsibility to let people know that their house is on fire, before it burns to the ground.
Yes, the estimator could be improved, but that takes time and data. Data in this case equals deaths, and those deaths are doubling in number about every three days across much of the globe. If these numbers alarm people, so be it.
My method is ballpark correct. When there are more deaths than there should be given the confirmed cases, we are either missing a lot of cases, or the death rate is very much higher than we think. Let’s say we have 10 deaths today, and 17 days ago we detected 10 new cases. We either conclude that 100% of people die from infection, or that the case fatality rate is about 3.3% and there were actually about 300 cases 17 days ago, and we only noticed ten of them. Yes, I am glossing over all manner of uncertainty, and (without accounting for that uncertainty) wouldn’t get this method published in a scientific journal, but as a heuristic in a time of crisis, it suffices.
…though I just noticed that I had a line on the front page warning people that this was a rough estimate, and it’s not there any more. I’ll fix that now.
When one of the fundamental inputs to your estimation is a naive death rate you are literally guessing. Don’t pass it off as science, be responsible.
I completely disagree with you. Feel free to propose a better method.
I didn’t say I had the answer, I said your information was significantly flawed. Be responsible with the information you are providing is what I suggested. Science is about being accurate, not posting your first attempt on the internet.
Thanks for the lecture on what science is, Robert. We’re trying. I have finally had a chance to write the methods down. They’ll be up soon (with all the assumptions listed). I am painfully aware that detection estimate is not perfect, but also how important it is that we generate some idea of how poor our detection is in each country. Our estimate is MUCH better than complete ignorance.
Great app. Overplotting data for different countries would be a great addition. I can help with that; I started working on my own app to do that.
Thanks Pawel. So many possibilities, so little time! Let me know how you go.
Hi Ben, I can see you’re using information we have now about how the number of cases is multiplying in each country. I was surprised that Germany was predicted to be so high in 10 days compared to some other countries. Were there any special or extra factors that influenced the prediction in Germany?
Hi Janine. Nothing special. All countries are treated identically. It is simply an extrapolation from recent growth rates.
Thanks Ben, a useful and most importantly transparent framework. I had been doing my own work with active cases and wondered whether you dropped the data because it was no longer available or no longer of good quality. I notice JHU still has data on recoveries so you could impute active cases…
Hi Hamish. Thanks. JHU are dropping reporting on recoveries in the next few days. No-one has any time to track them. https://github.com/CSSEGISandData/COVID-19/issues/1250
We all appreciate the Scientific Community coming together and providing information to the world. You’re helping remove the unknowns or at least helping us all to understand them. At the end of the day it’s up to each individual to make their own decisions with respect to personal safety based on data collected by them. This is really helpful as one of those data points. Thank you to all of you for sharing your work.
This is a very useful app. Thank you for sharing this with the world. You are a big help to us. May I know how often you update the numbers?
Thanks Coly. The good people at JHU update the numbers every 24 hours. The app updates soon thereafter.
Hello, I like your website, but I wonder why you state “Positive values are good, and China is an excellent reference series” and this curve for China is negative… I’ll appreciate your reply.
This is a good point, and I’ve been noticing the same thing. The words were written ten days ago, and things have changed. Have a look at China on the growth chart. Their negative growth has been trending towards positive of late. They are still doing really well, but not as well as they were. The CFI is a tricky measure though: I may retire it soon and focus on improving plots of growth rate over time. These actually tell a clearer story than the curve-flattening index (see China on the current growth plot).
Thanks for the effort you have put into this, it is a brilliant visual presentation of the state of play and what might eventuate. I actually have provided a link to this (as well as government links) on the Fishing Victoria forum. We need to get the info out to the average Jo so they can comprehend the immensity of the situation.
While I understand some of the issues with CFI it might still be useful to retain. Perhaps a clear explanation for the lay person of the interaction between growth and flattening might help.
Thanks Will. I think I’ll put growth and cfi on the same tab, with some explanatory text as you suggested.
Hi. Excellent app. Is it possible to link this to the world o meter page so that when they update their numbers this will also change? The raw cases are outdated by one day in Asian countries such as the Philippines.
Thanks. I’ll look into it. There are a few data updating issues which I hope to resolve in the next few days.
“Take this last number with a grain of salt; it is rough. But low detection indicates that there are many more deaths in the country than there should be given reported case numbers (so there must be more cases than are reported). Active cases are total number of infections minus deaths and recoveries.”
Not sure how you come to this conclusion about misreported deaths. In small countries like mine (Greece), I’m pretty positive almost all deaths related to COVID-19 are attributed to it. People usually end up in intensive care first, then die, so they’ve been tested in the process. For any sudden deaths, they also do the testing if they can’t find another cause (and as a precaution before an autopsy etc.)
Hi George. On the contrary, I assume that deaths are not misreported; that all deaths are detected. This is how we estimate how many cases there must have been about 17 days ago (as opposed to the number of cases that were reported 17 days ago).
We used your app/code and tailored it to use the data from India (for its states). We are still learning the other details so that we can implement those local parameters affecting the COVID19 spread.
We have hosted it at: https://puehep.shinyapps.io/IndStates/
That’s great to see, Vipin. Nice job! You might have noticed that we have recently integrated state-level drill down into the main app, but we didn’t have state-level data for India. Perhaps we should feed your data pipeline into the main app?