
Using google search trends and machine learning to predict emergency department visits

nickmmark/ED-visit-prediction

ED-visit-prediction

Using Google search trends and local weather to predict patient visits to local emergency departments.

I conceived of this shortly after my toddler accidentally gave me a corneal abrasion; one of the first things I did was Google my local ED for driving directions. This led me to wonder: how many ED walk-ins are preceded by people doing the same? Can the volume of searches for EDs be used to predict visits in the near term?

Background

Emergency departments exhibit significant hourly, daily, and seasonal variability in the number of new patients arriving. The volume of patients who visit emergency departments is highly variable in part because numerous environmental factors (weather, local events, traffic, etc.) can affect the onset of illness or injury and influence patients' decisions to seek care in the ED. Being able to accurately predict patient arrivals at emergency departments is useful because it can be used to optimize staffing and resource availability, minimizing patient waiting times and maintaining optimal staffing ratios.

I explored how publicly available data, such as Google searches and local weather, can be used to predict the number of patients arriving at local emergency departments.

Google Trends (GT) has been used as a near-real-time method of predicting interest; for example, GT has been used to predict retail, automotive, and home sales, to forecast stock market changes, to predict changes in cryptocurrency prices, and to predict political election results.

In the medical arena, Google Trends was used for many years to estimate influenza activity in more than 25 countries, potentially predicting the onset of outbreaks by up to 10 days.

Google Trends predicts ED arrivals

First I compared the pattern of searches for "hospital" with the pattern of ED arrivals in a publicly available database. (MIMIC III is a good place to start for those interested in doing this work themselves; for those interested in requesting access see here). I observed that the same diurnal pattern seen in ED visits also appears in GT searches:

[Figure: pattern of ED visits and Google searches]

These diurnal patterns exist for all hospitals; however, they are specific to the individual hospital and, importantly, can predict the arrivals at that hospital. For example, here are searches for three local hospitals in the Seattle area:

[Figure: local hospitals autocorrelation]
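A daily rhythm in an hourly series can be checked with an autocorrelation function. The sketch below is a minimal illustration on synthetic data, not the analysis behind the figure: `dominant_lag` and the simulated `searches` series are hypothetical names introduced here, and the only library call is `acf()` from base R's stats package.

```r
# Hypothetical helper: find the lag (in hours) with the strongest
# autocorrelation inside a search range.
dominant_lag <- function(x, min_lag = 12, max_lag = 36) {
  ac <- acf(x, lag.max = max_lag, plot = FALSE)$acf[, 1, 1]
  lags <- min_lag:max_lag
  # ac[1] is lag 0, so lag k sits at position k + 1
  lags[which.max(ac[lags + 1])]
}

# Synthetic two-week hourly series with a 24-hour cycle (made-up data)
hours <- 0:(24 * 14 - 1)
searches <- 50 + 30 * sin(2 * pi * hours / 24)
daily_lag <- dominant_lag(searches)
```

With real GT data, `searches` would be replaced by the hourly hits for one hospital; a peak near lag 24 would be consistent with the diurnal pattern described above.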

To gather the data I used two R packages:

  • gtrendsR, which is a "pseudo-API" that pulls Google Trends data as specified. Note that there are limits on its use (a maximum of 4000 searches per day), so creative solutions (such as using a VPN) can be helpful for building a large historical dataset for model training.
  • rwunderground, which uses the Weather Underground API to pull historical or current weather data. There are free API keys available; however, I recommend a paid subscription.

To pull the Google Trends data for a particular hospital you can use the following code:

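# load libraries
library(gtrendsR)
library(reshape2)  # dcast() is assumed to come from reshape2

# connect to google (optional)
# gconnect() was part of older gtrendsR releases; it is skipped when unavailable
username <- "<gmail id>"
password <- "<password>"
if (exists("gconnect")) gconnect(username, password)

# define search parameters
search_word <- "hospital name"
time_range <- "2018-06-01T01 2018-06-02T12"
geo_range <- "US-WA"

# perform search; the first list element holds interest over time
google.trends <- gtrends(c(search_word), gprop = "web", time = time_range, geo = geo_range)[[1]]
# reshape to one row per timestamp and one column per keyword/region
google.trends <- dcast(google.trends, date ~ keyword + geo, value.var = "hits")
rownames(google.trends) <- google.trends$date
google.trends$date <- NULL

# export results
downloadDir <- "filename"  # placeholder: replace with the path to your output folder
setwd(downloadDir)
write.csv(google.trends, file = "MyData.csv")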

This API can provide data at different levels of granularity, depending on the time scale. For predicting ED arrivals in the next hour we need to pick an appropriate time scale. For example, we could choose between these time scales:

[Figure: search for 'emergency room' at different time scales]
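One way to get hourly granularity over a longer period is to split it into short windows and issue one request per window. The sketch below is a hedged illustration, not the loop used for this project: `make_windows` and `pull_windows` are hypothetical helpers, the 24-hour default window is an assumption (check which span lengths actually return hourly data), and the request itself is passed in as a function so the stitching logic can be tried without network access.

```r
# Hypothetical helper: build "Y-m-dTH Y-m-dTH" strings, the same format
# as time_range in the search code above.
make_windows <- function(start, end, hours_per_window = 24) {
  step <- hours_per_window * 3600
  s0 <- as.numeric(as.POSIXct(start, tz = "UTC"))
  e0 <- as.numeric(as.POSIXct(end, tz = "UTC"))
  starts <- seq(s0, e0 - 3600, by = step)
  ends <- pmin(starts + step, e0)
  fmt <- function(x) {
    format(as.POSIXct(x, origin = "1970-01-01", tz = "UTC"),
           "%Y-%m-%dT%H", tz = "UTC")
  }
  paste(fmt(starts), fmt(ends))
}

# Hypothetical helper: call `fetch` once per window and stack the results.
# `fetch` takes one window string and returns a data frame.
pull_windows <- function(windows, fetch) {
  pieces <- vector("list", length(windows))
  for (i in seq_along(windows)) {
    pieces[[i]] <- fetch(windows[i])
  }
  do.call(rbind, pieces)
}

windows <- make_windows("2018-06-01 00:00", "2018-06-03 00:00")
# Example fetch function (requires gtrendsR and network access):
# fetch_gt <- function(w) gtrends("hospital name", gprop = "web",
#                                 time = w, geo = "US-WA")$interest_over_time
# hourly <- pull_windows(windows, fetch_gt)
```

Keep in mind that Google Trends scales each request relative to its own window, so values from different windows are not directly comparable unless they are rescaled, for example by using overlapping windows.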

Using the above code, with a simple for loop to automate different searches and string the results together, I pulled an hourly dataset covering a month. By aligning the GT searches with the NEXT hour's ED arrivals we can build a database and perform some basic analysis. It is straightforward to do this in R:

library(Hmisc)
# rcorr() expects a numeric matrix, so convert the data frame first
rcorr(as.matrix(GTandArrivals), type = "pearson")
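The alignment step that produces a table like `GTandArrivals` can be sketched in base R. This is a minimal sketch under stated assumptions, not the original code: `align_next_hour`, the column names `hour`, `hits`, and `arrivals`, and the sample values are all hypothetical. Each row pairs the searches in one hour with the arrivals in the following hour.

```r
# Hypothetical helper: pair each hour's search hits with the NEXT hour's
# ED arrivals. `gt` needs columns hour (POSIXct) and hits; `arrivals`
# needs columns hour (POSIXct) and arrivals.
align_next_hour <- function(gt, arrivals) {
  next_hour <- as.numeric(gt$hour) + 3600
  idx <- match(next_hour, as.numeric(arrivals$hour))
  keep <- !is.na(idx)  # drop hours with no following arrivals record
  data.frame(
    hits_last_hour = gt$hits[keep],
    arrivals_next_hour = arrivals$arrivals[idx[keep]]
  )
}

# Made-up example data: six consecutive hours
hrs <- as.POSIXct("2018-06-01 00:00", tz = "UTC") + 3600 * (0:5)
gt_example <- data.frame(hour = hrs, hits = c(10, 20, 30, 40, 50, 60))
ed_example <- data.frame(hour = hrs, arrivals = c(1, 2, 4, 6, 8, 10))
aligned <- align_next_hour(gt_example, ed_example)
lag_cor <- cor(aligned$hits_last_hour, aligned$arrivals_next_hour)
```

The last hour of searches is dropped because it has no following hour of arrivals to pair with.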

You can graph the results as a correlogram using

library(corrgram)
corrgram(GTandArrivals, order=TRUE, lower.panel=panel.shade,
  upper.panel=panel.pie, text.panel=panel.txt,
  main="Last hour Google Searches and ED arrivals")

I made a slightly nicer looking figure using GraphPad Prism. As you can see below, even as a single variable, Google searches for the name of a specific hospital predict the number of arrivals there in the next hour.

[Figure: searches in the last 30 minutes predict the arrivals in the next 30 minutes]

If we use the (admittedly arbitrary) number of searches >= 35/hr as a cutoff, it does a reasonable job of predicting high or low volume over the next hour. It is important to recognize that a lot of this is just due to similar diurnal patterns in both GT searches and ED arrivals, but imagine how this data point could be combined with other temporal and environmental features to build an even better predictive model.
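How well a single cutoff separates high-volume from low-volume hours can be summarized with a 2x2 table. The sketch below is illustrative only: `evaluate_cutoff` is a hypothetical helper, the arrivals threshold that defines "high volume" is an assumption (only the search cutoff of 35/hr comes from the text above), and the data are made up.

```r
# Hypothetical helper: compare "searches >= cutoff" against
# "arrivals >= high_volume" and report sensitivity and specificity.
evaluate_cutoff <- function(hits, arrivals, high_volume, hits_cutoff = 35) {
  predicted <- factor(hits >= hits_cutoff, levels = c(FALSE, TRUE))
  actual <- factor(arrivals >= high_volume, levels = c(FALSE, TRUE))
  tab <- table(predicted = predicted, actual = actual)
  list(
    table = tab,
    sensitivity = as.numeric(tab["TRUE", "TRUE"] / sum(tab[, "TRUE"])),
    specificity = as.numeric(tab["FALSE", "FALSE"] / sum(tab[, "FALSE"]))
  )
}

# Made-up example: searches in one hour, arrivals in the next hour
example_hits <- c(10, 40, 50, 20, 36, 30)
example_arrivals <- c(2, 9, 8, 3, 4, 7)
cutoff_result <- evaluate_cutoff(example_hits, example_arrivals, high_volume = 6)
```

Because both series share a diurnal pattern, a comparison like this is more informative when run within the same time of day, or against a baseline that uses the hour of day alone.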

Local weather predicts ED arrivals

Anyone who has ever worked in an ED knows that extreme weather (blizzard, torrential rain, etc.) often "keeps people home." I posited that incorporating real-time weather information can help predict ED arrivals for the immediate future.

Parsing 911 dispatches to predict ED arrivals

Another data signature that may precede an ED visit is a call to 911 and the dispatch of an ambulance. When I worked in the Harborview ED as an IM resident, I would always keep the Seattle Fire Realtime 911 dispatch window open for situational awareness about EMS activity. This was a great way to know (before the radio call) about serious medical emergencies like cardiac arrest, overdoses, difficulty breathing, etc.

  • Caveat: Most cities don't provide EMS dispatch in real-time like Seattle.

My hypothesis is that by combining Google searches (for lower-acuity emergencies) and 911 dispatches (for higher-acuity emergencies) we can build a more robust prediction of ED arrivals.

Version/To-do

[] Explore other publicly available API data: traffic, social media posts (such as with the rtweet Twitter API), etc.

[] Demonstrate how to build more sophisticated models that use time, weather, and GT searches to predict the next hour's ED volume

[] In the future I would love to combine this with the work I did with the Seattle Fire realtime 911 API and geospatial exploration of out-of-hospital cardiac arrest.
