This report analyzes the impact of natural disasters, both globally and in Canada specifically. It is based on historical data from 1900 to 2010 and aims to offer insights that can inform the development and features of our team's mobile app for crisis response and management.
import pandas as pd
import matplotlib.pyplot as plt
# Filter warnings
from warnings import filterwarnings
filterwarnings('ignore')
# Load the dataset
raw_data = pd.read_csv('natural-disasters.csv')
# Set display options to show all columns and rows
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)
# Display the first few rows of the dataset
raw_data.head()
| | Entity | Year | Number of deaths from drought | Number of people injured from drought | Number of people affected from drought | Number of people left homeless from drought | Number of total people affected by drought | Reconstruction costs from drought | Insured damages against drought | Total economic damages from drought | Death rates from drought | Injury rates from drought | Number of people affected by drought per 100,000 | Homelessness rate from drought | Total number of people affected by drought per 100,000 | Number of deaths from earthquakes | Number of people injured from earthquakes | Number of people affected by earthquakes | Number of people left homeless from earthquakes | Number of total people affected by earthquakes | Reconstruction costs from earthquakes | Insured damages against earthquakes | Total economic damages from earthquakes | Death rates from earthquakes | Injury rates from earthquakes | Number of people affected by earthquakes per 100,000 | Homelessness rate from earthquakes | Total number of people affected by earthquakes per 100,000 | Number of deaths from disasters | Number of people injured from disasters | Number of people affected by disasters | Number of people left homeless from disasters | Number of total people affected by disasters | Reconstruction costs from disasters | Insured damages against disasters | Total economic damages from disasters | Death rates from disasters | Injury rates from disasters | Number of people affected by disasters per 100,000 | Homelessness rate from disasters | Total number of people affected by disasters per 100,000 | Number of deaths from volcanic activity | Number of people injured from volcanic activity | Number of people affected by volcanic activity | Number of people left homeless from volcanic activity | Number of total people affected by volcanic activity | Reconstruction costs from volcanic activity | Insured damages against volcanic activity | Total economic damages from volcanic activity | Death rates from volcanic activity | Injury rates from volcanic activity | Number of people affected by volcanic activity per 100,000 | Homelessness rate from volcanic activity | Total number of people affected by volcanic activity per 100,000 | Number of deaths from floods | Number of people injured from floods | Number of people affected by floods | Number of people left homeless from floods | Number of total people affected by floods | Reconstruction costs from floods | Insured damages against floods | Total economic damages from floods | Death rates from floods | Injury rates from floods | Number of people affected by floods per 100,000 | Homelessness rate from floods | Total number of people affected by floods per 100,000 | Number of deaths from mass movements | Number of people injured from mass movements | Number of people affected by mass movements | Number of people left homeless from mass movements | Number of total people affected by mass movements | Reconstruction costs from mass movements | Insured damages against mass movements | Total economic damages from mass movements | Death rates from mass movements | Injury rates from mass movements | Number of people affected by mass movements per 100,000 | Homelessness rate from mass movements | Total number of people affected by mass movements per 100,000 | Number of deaths from storms | Number of people injured from storms | Number of people affected by storms | Number of people left homeless from storms | Number of total people affected by storms | Reconstruction costs from storms | Insured damages against storms | Total economic damages from storms | Death rates from storms | Injury rates from storms | Number of people affected by storms per 100,000 | Homelessness rate from storms | Total number of people affected by storms per 100,000 | Number of deaths from landslides | Number of people injured from landslides | Number of people affected by landslides | Number of people left homeless from landslides | Number of total people affected by landslides | Reconstruction costs from landslides | Insured damages against landslides | Total economic damages from landslides | Death rates from landslides | Injury rates from landslides | Number of people affected by landslides per 100,000 | Homelessness rate from landslides | Total number of people affected by landslides per 100,000 | Number of deaths from fog | Number of people injured from fog | Number of people affected by fog | Number of people left homeless from fog | Number of total people affected by fog | Reconstruction costs from fog | Insured damages against fog | Total economic damages from fog | Death rates from fog | Injury rates from fog | Number of people affected by fog per 100,000 | Homelessness rate from fog | Total number of people affected by fog per 100,000 | Number of deaths from wildfires | Number of people injured from wildfires | Number of people affected by wildfires | Number of people left homeless from wildfires | Number of total people affected by wildfires | Reconstruction costs from wildfires | Insured damages against wildfires | Total economic damages from wildfires | Death rates from wildfires | Injury rates from wildfires | Number of people affected by wildfires per 100,000 | Homelessness rate from wildfires | Total number of people affected by wildfires per 100,000 | Number of deaths from extreme temperatures | Number of people injured from extreme temperatures | Number of people affected by extreme temperatures | Number of people left homeless from extreme temperatures | Number of total people affected by extreme temperatures | Reconstruction costs from extreme temperatures | Insured damages against extreme temperatures | Total economic damages from extreme temperatures | Death rates from extreme temperatures | Injury rates from extreme temperatures | Number of people affected by extreme temperatures per 100,000 | Homelessness rate from extreme temperatures | Total number of people affected by extreme temperatures per 100,000 | Number of deaths from glacial lake outbursts | Number of people injured from glacial lake outbursts | Number of people affected by glacial lake outbursts | Number of people left homeless from glacial lake outbursts | Number of total people affected by glacial lake outbursts | Reconstruction costs from glacial lake outbursts | Insured damages against glacial lake outbursts | Total economic damages from glacial lake outbursts | Death rates from glacial lake outbursts | Injury rates from glacial lake outbursts | Number of people affected by glacial lake outbursts per 100,000 | Homelessness rate from glacial lake outbursts | Total number of people affected by glacial lake outbursts per 100,000 | Total economic damages from disasters as a share of GDP | Total economic damages from drought as a share of GDP | Total economic damages from earthquakes as a share of GDP | Total economic damages from extreme temperatures as a share of GDP | Total economic damages from floods as a share of GDP | Total economic damages from landslides as a share of GDP | Total economic damages from mass movements as a share of GDP | Total economic damages from storms as a share of GDP | Total economic damages from volcanic activity as a share of GDP | Total economic damages from volcanic activity as a share of GDP.1 | deaths_rate_per_100k_storm | injured_rate_per_100k_storm | total_affected_rate_per_100k_all_disasters |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Afghanistan | 1950 | 0.0 | 0.0 | 0.0 | 0 | 0.0 | NaN | NaN | NaN | 0.0 | 0.0 | 0.000000 | 0 | 0.000000 | 210.0 | 200.0 | 0.0 | 0.0 | 200.0 | NaN | NaN | NaN | 2.572748 | 2.381236 | 0.000000 | 0.000000 | 2.381236 | 215.1 | 200.0 | 0.0 | 0.0 | 200.0 | NaN | NaN | NaN | 2.633470 | 2.381236 | 0.000000 | 0.000000 | NaN | 0.0 | 0.0 | 0.0 | 0 | 0.0 | NaN | NaN | NaN | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 5.1 | 0.0 | 0.0 | 0.0 | 0.0 | NaN | NaN | NaN | 0.060722 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | NaN | NaN | NaN | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | NaN | NaN | NaN | NaN | NaN | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | NaN | NaN | NaN | 0.000000 | 0.000000 | 0.0 | 0.000000 | 0.000000 | 0 | 0 | 0 | 0 | 0 | NaN | NaN | NaN | 0.0 | 0 | 0 | 0 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | NaN | NaN | NaN | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | NaN | NaN | NaN | 0.000000 | 0.0 | 0.000000 | 0.0 | 0.000000 | 0 | 0 | 0 | 0 | 0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 0.000000 | 0.0 | 2.381236 |
| 1 | Afghanistan | 1960 | 0.0 | 0.0 | 4800.0 | 0 | 4800.0 | 0.0 | 0.0 | 20.0 | 0.0 | 0.0 | 44.060951 | 0 | 44.060951 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 10.7 | 0.0 | 4800.0 | 0.0 | 4800.0 | 0.0 | 0.0 | 20.0 | 0.112124 | 0.000000 | 44.060951 | 0.000000 | NaN | 0.0 | 0.0 | 0.0 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 10.7 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.112124 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | NaN | NaN | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.000000 | 0.000000 | 0.0 | 0.000000 | 0.000000 | 0 | 0 | 0 | 0 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0 | 0 | 0 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.000000 | 0.0 | 0.000000 | 0.0 | 0.000000 | 0 | 0 | 0 | 0 | 0 | 0.0 | 0.0 | 0.0 | NaN | NaN | NaN | NaN | NaN | 0.001420 | 0.00142 | 0.0 | 0.0 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.000000 | 0.0 | 44.060951 |
| 2 | Afghanistan | 1970 | 0.0 | 0.0 | 0.0 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.000000 | 0 | 0.000000 | 6.1 | 1.5 | 9000.0 | 0.0 | 9001.5 | 0.0 | 0.0 | 0.0 | 0.047960 | 0.012722 | 69.535656 | 0.000000 | 69.548378 | 48.2 | 15.5 | 68404.4 | 750.0 | 69169.9 | 0.0 | 0.0 | 5200.0 | 0.391674 | 0.117661 | 541.290447 | 5.621767 | NaN | 0.0 | 0.0 | 0.0 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 32.1 | 14.0 | 59404.4 | 750.0 | 60168.4 | 0.0 | 0.0 | 5200.0 | 0.256567 | 0.104940 | 471.754790 | 5.621767 | 477.481497 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | NaN | NaN | 0.0 | 0.0 | 0.0 | 10.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.087146 | 0.000000 | 0.0 | 0.000000 | 0.000000 | 0 | 0 | 0 | 0 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0 | 0 | 0 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.000000 | 0.0 | 0.000000 | 0.0 | 0.000000 | 0 | 0 | 0 | 0 | 0 | 0.0 | 0.0 | 0.0 | NaN | NaN | NaN | NaN | NaN | 0.157576 | 0.00000 | 0.0 | 0.0 | 0.157576 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.000000 | 0.0 | 547.029875 |
| 3 | Afghanistan | 1980 | 0.0 | 0.0 | 0.0 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.000000 | 0 | 0.000000 | 51.3 | 351.8 | 6244.0 | 658.0 | 7253.8 | 0.0 | 0.0 | 900.0 | 0.398499 | 2.742558 | 49.053053 | 5.248046 | 57.043657 | 58.3 | 351.8 | 25344.0 | 658.0 | 26353.8 | 0.0 | 0.0 | 26900.0 | 0.458817 | 2.742558 | 210.091255 | 5.248046 | NaN | 0.0 | 0.0 | 0.0 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 19100.0 | 0.0 | 19100.0 | 0.0 | 0.0 | 26000.0 | 0.000000 | 0.000000 | 161.038202 | 0.000000 | 161.038202 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | NaN | NaN | 0.0 | 0.0 | 0.0 | 7.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.060319 | 0.000000 | 0.0 | 0.000000 | 0.000000 | 0 | 0 | 0 | 0 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0 | 0 | 0 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.000000 | 0.0 | 0.000000 | 0.0 | 0.000000 | 0 | 0 | 0 | 0 | 0 | 0.0 | 0.0 | 0.0 | NaN | NaN | NaN | NaN | NaN | 0.000000 | 0.00000 | 0.0 | 0.0 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.000000 | 0.0 | 218.081859 |
| 4 | Afghanistan | 1990 | 0.0 | 0.0 | 0.0 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.000000 | 0 | 0.000000 | 742.6 | 358.3 | 27168.5 | 7702.5 | 35229.3 | 0.0 | 0.0 | 2001.0 | 3.814559 | 1.835906 | 146.665685 | 38.701410 | 187.203001 | 1038.9 | 394.7 | 43624.0 | 9578.5 | 53597.2 | 0.0 | 20.0 | 8401.0 | 5.830222 | 2.023235 | 263.136415 | 50.991165 | NaN | 0.0 | 0.0 | 0.0 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 199.0 | 30.0 | 16435.5 | 1765.0 | 18230.5 | 0.0 | 20.0 | 6400.0 | 1.414652 | 0.151991 | 116.320342 | 11.596787 | 128.069120 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | NaN | NaN | 0.0 | 0.0 | 0.0 | 73.9 | 6.4 | 0.0 | 111.0 | 117.4 | 0.0 | 0.0 | 0.0 | 0.418517 | 0.035338 | 0.0 | 0.692968 | 0.728305 | 0 | 0 | 0 | 0 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0 | 0 | 0 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 22.4 | 0.0 | 20.0 | 0.0 | 20.0 | 0.0 | 0.0 | 0.0 | 0.176172 | 0.0 | 0.150387 | 0.0 | 0.150387 | 0 | 0 | 0 | 0 | 0 | 0.0 | 0.0 | 0.0 | NaN | NaN | NaN | NaN | NaN | 0.000000 | 0.00000 | 0.0 | 0.0 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.006322 | 0.0 | 316.150814 |
# Check the shape of the dataframe
raw_data.shape
(1604, 171)
We have 1604 rows and 171 columns in our dataset.
Below are the data cleaning steps performed on the dataset:
# Check for missing values
missing_values = raw_data.isnull().sum()
# Calculate the percentage of missing values for each column
missing_percentage = (missing_values / len(raw_data)) * 100
# Display columns with missing values and their corresponding percentage
missing_data = pd.DataFrame({'Missing Values': missing_values, 'Percentage': missing_percentage})
missing_data = missing_data[missing_data['Missing Values'] > 0].sort_values(by='Percentage', ascending=False)
missing_data
| | Missing Values | Percentage |
|---|---|---|
| Number of people affected by glacial lake outbursts per 100,000 | 1604 | 100.000000 |
| Death rates from glacial lake outbursts | 1604 | 100.000000 |
| Injury rates from storms | 1604 | 100.000000 |
| Death rates from storms | 1604 | 100.000000 |
| Total number of people affected by disasters per 100,000 | 1604 | 100.000000 |
| Total number of people affected by glacial lake outbursts per 100,000 | 1604 | 100.000000 |
| Homelessness rate from glacial lake outbursts | 1604 | 100.000000 |
| Injury rates from glacial lake outbursts | 1604 | 100.000000 |
| Insured damages against wildfires | 480 | 29.925187 |
| Total economic damages from wildfires | 480 | 29.925187 |
| Reconstruction costs from extreme temperatures | 480 | 29.925187 |
| Insured damages against extreme temperatures | 480 | 29.925187 |
| Total economic damages from extreme temperatures | 480 | 29.925187 |
| Reconstruction costs from glacial lake outbursts | 480 | 29.925187 |
| Insured damages against glacial lake outbursts | 480 | 29.925187 |
| Total economic damages from glacial lake outbursts | 480 | 29.925187 |
| Reconstruction costs from drought | 480 | 29.925187 |
| Total economic damages from fog | 480 | 29.925187 |
| Total economic damages from disasters as a share of GDP | 480 | 29.925187 |
| Total economic damages from drought as a share of GDP | 480 | 29.925187 |
| Total economic damages from earthquakes as a share of GDP | 480 | 29.925187 |
| Total economic damages from extreme temperatures as a share of GDP | 480 | 29.925187 |
| Total economic damages from floods as a share of GDP | 480 | 29.925187 |
| Total economic damages from landslides as a share of GDP | 480 | 29.925187 |
| Total economic damages from mass movements as a share of GDP | 480 | 29.925187 |
| Total economic damages from storms as a share of GDP | 480 | 29.925187 |
| Total economic damages from volcanic activity as a share of GDP | 480 | 29.925187 |
| Reconstruction costs from wildfires | 480 | 29.925187 |
| Reconstruction costs from fog | 480 | 29.925187 |
| Insured damages against fog | 480 | 29.925187 |
| Reconstruction costs from floods | 480 | 29.925187 |
| Total economic damages from drought | 480 | 29.925187 |
| Reconstruction costs from earthquakes | 480 | 29.925187 |
| Insured damages against earthquakes | 480 | 29.925187 |
| Total economic damages from earthquakes | 480 | 29.925187 |
| Reconstruction costs from disasters | 480 | 29.925187 |
| Insured damages against disasters | 480 | 29.925187 |
| Total economic damages from disasters | 480 | 29.925187 |
| Reconstruction costs from volcanic activity | 480 | 29.925187 |
| Insured damages against volcanic activity | 480 | 29.925187 |
| Total economic damages from volcanic activity | 480 | 29.925187 |
| Insured damages against floods | 480 | 29.925187 |
| Insured damages against drought | 480 | 29.925187 |
| Total economic damages from floods | 480 | 29.925187 |
| Reconstruction costs from mass movements | 480 | 29.925187 |
| Insured damages against mass movements | 480 | 29.925187 |
| Total economic damages from mass movements | 480 | 29.925187 |
| Reconstruction costs from storms | 480 | 29.925187 |
| Insured damages against storms | 480 | 29.925187 |
| Total economic damages from storms | 480 | 29.925187 |
| Reconstruction costs from landslides | 480 | 29.925187 |
| Insured damages against landslides | 480 | 29.925187 |
| Total economic damages from landslides | 480 | 29.925187 |
| Total economic damages from volcanic activity as a share of GDP.1 | 480 | 29.925187 |
For columns with 100% missing values, it's advisable to drop them as they don't add any value to the analysis.
# Drop columns with 100% missing values
data = raw_data.drop(columns=missing_data[missing_data['Percentage'] == 100].index)
# Re-check for missing values
missing_values = data.isnull().sum()
# Calculate the percentage of missing values for each column
missing_percentage = (missing_values / len(data)) * 100
# Display columns with missing values and their corresponding percentage
missing_data = pd.DataFrame({'Missing Values': missing_values, 'Percentage': missing_percentage})
missing_data = missing_data[missing_data['Missing Values'] > 0].sort_values(by='Percentage', ascending=False)
missing_data
| | Missing Values | Percentage |
|---|---|---|
| Reconstruction costs from drought | 480 | 29.925187 |
| Insured damages against glacial lake outbursts | 480 | 29.925187 |
| Insured damages against fog | 480 | 29.925187 |
| Total economic damages from fog | 480 | 29.925187 |
| Reconstruction costs from wildfires | 480 | 29.925187 |
| Insured damages against wildfires | 480 | 29.925187 |
| Total economic damages from wildfires | 480 | 29.925187 |
| Reconstruction costs from extreme temperatures | 480 | 29.925187 |
| Insured damages against extreme temperatures | 480 | 29.925187 |
| Total economic damages from extreme temperatures | 480 | 29.925187 |
| Reconstruction costs from glacial lake outbursts | 480 | 29.925187 |
| Total economic damages from glacial lake outbursts | 480 | 29.925187 |
| Insured damages against drought | 480 | 29.925187 |
| Total economic damages from disasters as a share of GDP | 480 | 29.925187 |
| Total economic damages from drought as a share of GDP | 480 | 29.925187 |
| Total economic damages from earthquakes as a share of GDP | 480 | 29.925187 |
| Total economic damages from extreme temperatures as a share of GDP | 480 | 29.925187 |
| Total economic damages from floods as a share of GDP | 480 | 29.925187 |
| Total economic damages from landslides as a share of GDP | 480 | 29.925187 |
| Total economic damages from mass movements as a share of GDP | 480 | 29.925187 |
| Total economic damages from storms as a share of GDP | 480 | 29.925187 |
| Total economic damages from volcanic activity as a share of GDP | 480 | 29.925187 |
| Reconstruction costs from fog | 480 | 29.925187 |
| Total economic damages from landslides | 480 | 29.925187 |
| Insured damages against landslides | 480 | 29.925187 |
| Reconstruction costs from landslides | 480 | 29.925187 |
| Total economic damages from drought | 480 | 29.925187 |
| Reconstruction costs from earthquakes | 480 | 29.925187 |
| Insured damages against earthquakes | 480 | 29.925187 |
| Total economic damages from earthquakes | 480 | 29.925187 |
| Reconstruction costs from disasters | 480 | 29.925187 |
| Insured damages against disasters | 480 | 29.925187 |
| Total economic damages from disasters | 480 | 29.925187 |
| Reconstruction costs from volcanic activity | 480 | 29.925187 |
| Insured damages against volcanic activity | 480 | 29.925187 |
| Total economic damages from volcanic activity | 480 | 29.925187 |
| Reconstruction costs from floods | 480 | 29.925187 |
| Insured damages against floods | 480 | 29.925187 |
| Total economic damages from floods | 480 | 29.925187 |
| Reconstruction costs from mass movements | 480 | 29.925187 |
| Insured damages against mass movements | 480 | 29.925187 |
| Total economic damages from mass movements | 480 | 29.925187 |
| Reconstruction costs from storms | 480 | 29.925187 |
| Insured damages against storms | 480 | 29.925187 |
| Total economic damages from storms | 480 | 29.925187 |
| Total economic damages from volcanic activity as a share of GDP.1 | 480 | 29.925187 |
No more columns with 100% missing values, great!
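The same clean-up can also be expressed with `DataFrame.dropna`. The sketch below runs on a small made-up frame (`toy`, with hypothetical column names and values) rather than the real dataset, and is only meant to illustrate the call and the percentage check used above.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for raw_data; column names and values are made up.
toy = pd.DataFrame({
    "Entity": ["A", "B", "C"],
    "deaths": [1.0, np.nan, 3.0],            # partially missing: kept
    "empty_rate": [np.nan, np.nan, np.nan],  # entirely missing: dropped
})

# axis=1 targets columns; how="all" drops a column only if every value is NaN.
cleaned = toy.dropna(axis=1, how="all")

# Share of missing values per remaining column, in percent.
missing_pct = cleaned.isnull().mean() * 100
print(cleaned.columns.tolist())
print(missing_pct.round(1))
```

Columns that are only partly empty, like the ones still listed in the table above, are left untouched by this call.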
# Check for duplicate rows
duplicate_rows = data.duplicated().sum()
duplicate_rows
0
The dataset doesn't have any duplicate rows.
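Note that `duplicated()` without arguments only flags rows that match in every column. Since each row is meant to describe one entity in one period, it may also be worth checking the (`Entity`, `Year`) pair on its own. A minimal sketch on made-up rows (the values are illustrative, not taken from the dataset):

```python
import pandas as pd

# Two rows share the same Entity/Year key but differ in the measured value.
toy = pd.DataFrame({
    "Entity": ["Canada", "Canada", "Chile"],
    "Year": [1990, 1990, 1990],
    "Number of deaths from floods": [1.0, 2.0, 0.0],
})

# Rows identical in every column.
full_row_dupes = int(toy.duplicated().sum())

# Rows that repeat an Entity/Year combination, regardless of the other columns.
key_dupes = int(toy.duplicated(subset=["Entity", "Year"]).sum())

print(full_row_dupes, key_dupes)
```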
We begin by examining the global impact of four major types of natural disasters: earthquakes, floods, storms, and wildfires.
# Relevant columns for each disaster type
columns_of_interest = {
    "earthquake": [
        "Number of deaths from earthquakes",
        "Number of people injured from earthquakes",
        "Number of people left homeless from earthquakes",
        "Number of total people affected by earthquakes"
    ],
    "flood": [
        "Number of deaths from floods",
        "Number of people injured from floods",
        "Number of people left homeless from floods",
        "Number of total people affected by floods"
    ],
    "storm": [
        "Number of deaths from storms",
        "Number of people injured from storms",
        "Number of people left homeless from storms",
        "Number of total people affected by storms"
    ],
    "wildfire": [
        "Number of deaths from wildfires",
        "Number of people injured from wildfires",
        "Number of people left homeless from wildfires",
        "Number of total people affected by wildfires"
    ]
}
# Extract the relevant columns
filtered_data = data[["Entity", "Year"] + [col for sublist in columns_of_interest.values() for col in sublist]]
# Display the first few rows of the filtered dataset
filtered_data.head()
| | Entity | Year | Number of deaths from earthquakes | Number of people injured from earthquakes | Number of people left homeless from earthquakes | Number of total people affected by earthquakes | Number of deaths from floods | Number of people injured from floods | Number of people left homeless from floods | Number of total people affected by floods | Number of deaths from storms | Number of people injured from storms | Number of people left homeless from storms | Number of total people affected by storms | Number of deaths from wildfires | Number of people injured from wildfires | Number of people left homeless from wildfires | Number of total people affected by wildfires |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Afghanistan | 1950 | 210.0 | 200.0 | 0.0 | 200.0 | 5.1 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 1 | Afghanistan | 1960 | 0.0 | 0.0 | 0.0 | 0.0 | 10.7 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 2 | Afghanistan | 1970 | 6.1 | 1.5 | 0.0 | 9001.5 | 32.1 | 14.0 | 750.0 | 60168.4 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 3 | Afghanistan | 1980 | 51.3 | 351.8 | 658.0 | 7253.8 | 0.0 | 0.0 | 0.0 | 19100.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 4 | Afghanistan | 1990 | 742.6 | 358.3 | 7702.5 | 35229.3 | 199.0 | 30.0 | 1765.0 | 18230.5 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
# Aggregate the data for each disaster type over the years
global_aggregated_data = {}
for disaster, columns in columns_of_interest.items():
    if columns:
        global_aggregated_data[disaster] = filtered_data[columns].sum()
# Transform the aggregated data to the desired format
# (.iloc selects by position; the Series itself is labelled by column name)
transformed_data = {}
for disaster, stats in global_aggregated_data.items():
    transformed_data[disaster.capitalize()] = {
        "Deaths": f"{int(stats.iloc[0]):,}",
        "People injured": f"{int(stats.iloc[1]):,}",
        "People left homeless": f"{int(stats.iloc[2]):,}",
        "Total people affected": f"{int(stats.iloc[3]):,}"
    }
# Convert the dictionary to a DataFrame for display (disaster types as columns)
transformed_df = pd.DataFrame(transformed_data)
transformed_df
| | Earthquake | Flood | Storm | Wildfire |
|---|---|---|---|---|
| Deaths | 905,683 | 2,793,814 | 559,132 | 1,750 |
| People injured | 1,117,359 | 545,990 | 556,190 | 4,455 |
| People left homeless | 9,954,741 | 36,963,619 | 21,588,466 | 96,016 |
| Total people affected | 80,044,254 | 1,536,563,304 | 474,296,923 | 6,860,637 |
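Because every list in `columns_of_interest` follows the same order, a summary table like the one above can also be built in one step while keeping the values numeric, so they can still be sorted or plotted before being formatted for display. The following is a sketch on a toy frame; `labels`, `cols`, `toy`, and all numbers are illustrative assumptions, not values from the dataset.

```python
import pandas as pd

# Row labels, in the same order as the column lists below.
labels = ["Deaths", "Total people affected"]
cols = {
    "earthquake": ["Number of deaths from earthquakes",
                   "Number of total people affected by earthquakes"],
    "flood": ["Number of deaths from floods",
              "Number of total people affected by floods"],
}

# Made-up values standing in for filtered_data.
toy = pd.DataFrame({
    "Number of deaths from earthquakes": [2.0, 3.5],
    "Number of total people affected by earthquakes": [100.0, 50.0],
    "Number of deaths from floods": [1.0, 0.0],
    "Number of total people affected by floods": [1000.0, 2500.0],
})

# One column per disaster type; values stay numeric.
summary = pd.DataFrame(
    {name.capitalize(): toy[c].sum().to_numpy() for name, c in cols.items()},
    index=labels,
)

# Format for display only at the very end (int() truncates any fraction).
formatted = summary.apply(lambda col: col.map(lambda v: f"{int(v):,}"))
print(formatted)
```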
# Metric names to match the column names
metrics = {
    "Number of deaths": [
        "Number of deaths from earthquakes",
        "Number of deaths from floods",
        "Number of deaths from storms",
        "Number of deaths from wildfires"
    ],
    "Number of people injured": [
        "Number of people injured from earthquakes",
        "Number of people injured from floods",
        "Number of people injured from storms",
        "Number of people injured from wildfires"
    ],
    "Number of people left homeless": [
        "Number of people left homeless from earthquakes",
        "Number of people left homeless from floods",
        "Number of people left homeless from storms",
        "Number of people left homeless from wildfires"
    ]
}
# Function to plot data for the specified columns and region (Global/Canada)
def plot_disaster_data(columns, region, title_suffix):
    plt.figure(figsize=(14, 7))
    # Sum numeric columns only, so the text column 'Entity' stays out of the aggregation
    if region == "Global":
        data_to_plot = filtered_data.groupby('Year').sum(numeric_only=True)
    else:
        data_to_plot = canada_data.groupby('Year').sum(numeric_only=True)
    # Disaster types, in the same order as the entries of `columns`
    disasters = ["earthquake", "flood", "storm", "wildfire"]
    # Plotting data for each disaster type
    for disaster, column in zip(disasters, columns):
        plt.plot(data_to_plot.index, data_to_plot[column], label=disaster.capitalize())
    # Labeling the graph
    metric = columns[0].split(" from ")[0]
    plt.title(f"{metric} by Different Natural Disasters {title_suffix}", fontweight='bold')
    plt.ylabel(metric, fontweight='bold')
    plt.xlabel('Year', fontweight='bold')
    plt.legend()
    plt.grid(True, which='both', linestyle='--', linewidth=0.5)
    plt.tight_layout()
    plt.show()
# Plotting data globally
for metric, columns in metrics.items():
    plot_disaster_data(columns, "Global", "Globally Over the Years")
# Define the bars (natural disasters) and associated colors
bars = ["earthquake", "flood", "storm", "wildfire"]
colors = ["red", "blue", "green", "orange"]
# Aggregate the data for each disaster type globally
global_heights = [
    data['Number of total people affected by earthquakes'].sum(),
    data['Number of total people affected by floods'].sum(),
    data['Number of total people affected by storms'].sum(),
    data['Number of total people affected by wildfires'].sum()
]
# Sorting the data from most to least affected for global data
sorted_indices_global = sorted(range(len(global_heights)), key=lambda k: global_heights[k], reverse=True)
sorted_heights_global = [global_heights[i] for i in sorted_indices_global]
sorted_bars_global = [bars[i] for i in sorted_indices_global]
sorted_colors_global = [colors[i] for i in sorted_indices_global]
# Determine the timeframe from the global data
timeframe = f"{filtered_data['Year'].min()} - {filtered_data['Year'].max()}"
# Plotting the sorted data for global data with adjusted title and labels
plt.figure(figsize=(12, 6))
bars_plot_global = plt.bar(sorted_bars_global, sorted_heights_global, color=sorted_colors_global)
# Adding labels with rounded numbers to the bars for global data
for bar in bars_plot_global:
    yval = bar.get_height()
    plt.text(bar.get_x() + bar.get_width()/2, yval, f"{int(yval):,}", ha='center', va='bottom', fontsize=10)
plt.title(f'Number of Total People Affected by Different Natural Disasters Globally ({timeframe})', fontweight='bold')
plt.ylabel('Number of Total People Affected', fontweight='bold')
plt.xlabel('Disaster Type', fontweight='bold')
plt.tight_layout()
plt.show()
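The index-based sorting in the cell above keeps three parallel lists in step. An equivalent approach, sketched here, is to zip the lists together and sort the resulting tuples once. The heights are the "Total people affected" figures from the global summary table; the variable names are illustrative.

```python
bars = ["earthquake", "flood", "storm", "wildfire"]
colors = ["red", "blue", "green", "orange"]
# Totals copied from the global summary table (earthquake, flood, storm, wildfire).
heights = [80_044_254, 1_536_563_304, 474_296_923, 6_860_637]

# Sort (height, bar, color) triples by height, largest first.
ranked = sorted(zip(heights, bars, colors), key=lambda t: t[0], reverse=True)

# Unzip back into three lists that are still aligned with each other.
sorted_heights, sorted_bars, sorted_colors = (list(x) for x in zip(*ranked))

print(sorted_bars)
```

The three resulting lists can be passed to `plt.bar` exactly as before.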
This global perspective is essential to understand the broader context of natural disasters. However, for specific applications, it might be vital to delve deeper into regional or country-specific data, as the intensity and frequency of these disasters can vary significantly based on geographical factors.
Therefore, with a global perspective established, we now turn our focus to Canada.
Given our project's emphasis on providing a mobile app for Canadians, understanding the impact of natural disasters in Canada is crucial. We'll examine the trends over time, the aggregated impact, and highlight significant events.
# Filter the data for Canada
canada_data = filtered_data[filtered_data['Entity'] == 'Canada']
# Aggregate the data for each disaster type for Canada only
canada_aggregated_data = {}
for disaster, columns in columns_of_interest.items():
    if columns:
        canada_aggregated_data[disaster] = canada_data[columns].sum()
# Transform the aggregated data to the desired format with disaster types in columns
# (.iloc selects by position; the Series itself is labelled by column name)
transformed_canada_data = {}
for disaster, stats in canada_aggregated_data.items():
    transformed_canada_data[disaster.capitalize()] = {
        "Deaths": f"{int(stats.iloc[0]):,}",
        "People injured": f"{int(stats.iloc[1]):,}",
        "People left homeless": f"{int(stats.iloc[2]):,}",
        "Total people affected": f"{int(stats.iloc[3]):,}"
    }
# Convert the dictionary to a DataFrame for display
transformed_df_canada = pd.DataFrame(transformed_canada_data)
transformed_df_canada
| | Earthquake | Flood | Storm | Wildfire |
|---|---|---|---|---|
| Deaths | 2 | 5 | 30 | 11 |
| People injured | 0 | 0 | 88 | 0 |
| People left homeless | 0 | 1,200 | 453 | 1,823 |
| Total people affected | 0 | 33,078 | 1,651 | 22,007 |
# Using the metrics and plot_disaster_data function defined in the previous code section for global data
# Plotting data for Canada
for metric, columns in metrics.items():
    plot_disaster_data(columns, "Canada", "in Canada Over the Years")
# Regenerate the canada_aggregated_df dataframe
canada_data_filtered = data[data['Entity'] == 'Canada']
canada_aggregated_data = {
    'flood': canada_data_filtered['Number of total people affected by floods'].sum(),
    'storm': canada_data_filtered['Number of total people affected by storms'].sum(),
    'wildfire': canada_data_filtered['Number of total people affected by wildfires'].sum()
}
canada_aggregated_df = pd.DataFrame.from_dict(canada_aggregated_data, orient='index', columns=['Total Affected'])
# Data for the bar graph
heights = [
    canada_aggregated_df['Total Affected']['flood'],
    canada_aggregated_df['Total Affected']['storm'],
    canada_aggregated_df['Total Affected']['wildfire']
]
bars = ['Floods', 'Storms', 'Wildfires']
colors = ['blue', 'green', 'orange']
# Sorting the data from most to least affected
sorted_indices = sorted(range(len(heights)), key=lambda k: heights[k], reverse=True)
sorted_heights = [heights[i] for i in sorted_indices]
sorted_bars = [bars[i] for i in sorted_indices]
sorted_colors = [colors[i] for i in sorted_indices]
# Plotting the sorted data
plt.figure(figsize=(12, 6))
bars_plot = plt.bar(sorted_bars, sorted_heights, color=sorted_colors)
# Adding labels with rounded numbers to the bars
for bar in bars_plot:
    yval = bar.get_height()
    plt.text(bar.get_x() + bar.get_width()/2, yval, f"{int(yval):,}", ha='center', va='bottom', fontsize=10)
plt.title(f'Number of Total People Affected by Different Natural Disasters in Canada ({timeframe})', fontweight='bold')
plt.ylabel('Number of Total People Affected', fontweight='bold')
plt.xlabel('Disaster Type', fontweight='bold')
plt.tight_layout()
plt.show()
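The bar chart provides an aggregated view of the total number of people affected by each disaster type in Canada: floods affected the most people (33,078), followed by wildfires (22,007) and storms (1,651). Earthquakes are left out of the chart because the dataset records no people affected by them in Canada.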
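To put the chart in proportion, each disaster type's share of all people affected can be worked out from the "Total people affected" row of the Canada summary table. This is a small illustrative calculation and was not part of the original notebook; the dictionary simply restates the table's figures.

```python
# "Total people affected" in Canada, copied from the summary table.
affected = {"Floods": 33_078, "Storms": 1_651, "Wildfires": 22_007}

total = sum(affected.values())

# Percentage share per disaster type, ordered from largest to smallest.
shares = {
    name: round(100 * count / total, 1)
    for name, count in sorted(affected.items(), key=lambda kv: kv[1], reverse=True)
}

for name, pct in shares.items():
    print(f"{name}: {pct}% of {total:,} people affected")
```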
These insights provide a comprehensive view of the impact of natural disasters in Canada, informing the development and features of the mobile app.
The analysis provides valuable insights into the impact of natural disasters, both globally and in Canada. Understanding these patterns and trends can inform strategies for crisis response and management. In Canada, floods and wildfires affected by far the most people (33,078 and 22,007 respectively), while storms account for the most recorded deaths (30) and injuries (88). This information is crucial for prioritizing resources, designing preventive measures, and developing responsive solutions such as our proposed mobile app.