Disaster impacts surveillance from social media with topic modeling and feature extraction: case of Hurricane Harvey
This study analyzes Twitter data collected during Hurricane Harvey to identify the content most relevant for assessing infrastructure impacts, which it does by automatically grouping tweets by topic of discussion. Specifically, the researchers aimed to answer three research questions: (1) What are the common themes of discussion on Twitter during a major disaster, and do they contain infrastructure-related information? (2) How does the volume of tweets in each infrastructure-impact topic change over the course of the disaster response? (3) Does the spatial pattern of infrastructure-related tweet locations correlate with other measurements of real-world phenomena, such as flood depth, distributed disaster aid, or population density?
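Research question (3) comes down to correlating, across geographic areas, the number of infrastructure-related tweets with an area-level measurement. The abstract does not say which correlation statistic or significance test was used, so the following is only a minimal sketch under that gap: it assumes a Pearson coefficient with a two-sided permutation test. The helper names `pearson` and `permutation_p_value` and the eight-area toy values are hypothetical illustrations, not the study's data.

```python
import random
import statistics


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def permutation_p_value(x, y, n_perm=2000, seed=0):
    """Two-sided permutation p-value for the Pearson correlation of x and y."""
    rng = random.Random(seed)
    observed = abs(pearson(x, y))
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any pairing between x and y
        if abs(pearson(x, shuffled)) >= observed:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return (hits + 1) / (n_perm + 1)


# Toy per-area values for eight hypothetical areas (made up, not study data).
tweet_counts = [12, 30, 45, 8, 60, 22, 5, 38]
pop_density = [1500, 3200, 4100, 900, 5200, 2500, 600, 3900]
flood_depth = [1.2, 0.4, 2.0, 1.8, 0.9, 0.3, 1.5, 1.1]

for name, values in [("population density", pop_density),
                     ("flood depth", flood_depth)]:
    r = pearson(tweet_counts, values)
    p = permutation_p_value(tweet_counts, values)
    print(f"{name}: r = {r:.2f}, p = {p:.4f}")
```

A permutation test is used here because it needs only the standard library and makes no normality assumption; a rank-based coefficient would be an equally reasonable choice for skewed count data.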
Through a series of keyword and geographic filtering steps followed by latent Dirichlet allocation (LDA) topic modeling, the researchers identified 24 topics that dominated Twitter during Hurricane Harvey, ten of which were of interest to this study. In answer to the first research question, the infrastructure-related themes were (1) urban flooding and the need for rescue vehicles; (2) impacts to coastal areas; (3) overflowing waterbodies and associated evacuations; (4) impacts to roads, highways, and airports; (5) personal vehicle impacts and road accidents; (6) impacts to multiunit housing; (7) shortages of gas and supplies; (8) personal property damage; (9) insurance claims; and (10) prolonged power, cell, and Internet outages. In answer to the second question, the relevance of the topics changed over time: shortages of gas and supplies were discussed primarily before landfall, the various damage-impact topics during the active flooding phase, and property damage and insurance claims gained traction after the initial impacts dissipated. As for the third research question, the number of infrastructure-related tweets correlated significantly with population density, whereas the correlations with flood depth and disaster aid were not significant.
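The filtering and topic-modeling pipeline described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the researchers' implementation: the keyword set `KEYWORDS`, the bounding box `BBOX` (roughly the Houston area), the helper names `keep_tweet` and `lda_gibbs`, the hyperparameters, and the sample tweets are all hypothetical. LDA is fitted here by collapsed Gibbs sampling, which may differ from the inference method used in the study.

```python
import random
from collections import Counter

# Hypothetical filter settings (not the study's actual keywords or extent).
KEYWORDS = {"flood", "road", "power", "outage", "gas", "insurance"}
BBOX = (-96.0, 28.5, -94.5, 30.5)  # (min_lon, min_lat, max_lon, max_lat)


def keep_tweet(text, lon, lat):
    """Keep a tweet if it mentions any keyword and lies inside BBOX."""
    min_lon, min_lat, max_lon, max_lat = BBOX
    inside = min_lon <= lon <= max_lon and min_lat <= lat <= max_lat
    return inside and bool(KEYWORDS & set(text.lower().split()))


def lda_gibbs(docs, n_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Fit LDA by collapsed Gibbs sampling.

    docs is a list of token lists. Returns the document-topic count matrix
    and up to three top words per topic.
    """
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    ndk = [[0] * n_topics for _ in docs]        # tokens per (doc, topic)
    nkw = [Counter() for _ in range(n_topics)]  # tokens per (topic, word)
    nk = [0] * n_topics                         # tokens per topic
    z = []                                      # topic of every token

    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            k = rng.randrange(n_topics)
            z[d].append(k)
            ndk[d][k] += 1
            nkw[k][w] += 1
            nk[k] += 1

    topics = range(n_topics)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = z[d][i]
                ndk[d][k] -= 1
                nkw[k][w] -= 1
                nk[k] -= 1
                # Full conditional p(z = t | all other assignments).
                weights = [
                    (ndk[d][t] + alpha) * (nkw[t][w] + beta)
                    / (nk[t] + vocab_size * beta)
                    for t in topics
                ]
                k = rng.choices(topics, weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][w] += 1
                nk[k] += 1

    top_words = [
        [w for w, c in nkw[k].most_common(3) if c > 0] for k in topics
    ]
    return ndk, top_words


# Toy geotagged tweets (made up for illustration).
raw = [
    ("flood water rising need boat rescue", -95.37, 29.76),
    ("power outage no internet since monday", -95.40, 29.70),
    ("gas station out of gas long lines", -95.20, 29.90),
    ("flood water rescue boat needed now", -95.30, 29.80),
    ("power outage cell service down", -95.50, 29.60),
    ("great weather in dallas today", -96.80, 32.78),     # no keyword, outside box
    ("insurance claim for flood damage", -97.74, 30.27),  # outside box
]
docs = [t.lower().split() for t, lon, lat in raw if keep_tweet(t, lon, lat)]
doc_topic, top_words = lda_gibbs(docs, n_topics=3)
for k, words in enumerate(top_words):
    print(k, words)
```

The toy run uses three topics on five tweets; the study reports 24 topics on a full corpus, where stopword removal and a tuned topic count would also matter.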