Global Animal Disease Surveillance
Build an interactive dashboard visually showcasing well-curated results of an advanced exploratory analysis conducted in Python.
Project Overview ------------------------------------------------------------------------------------------------------------------------------------------------------------
Data - Open Source
The data contains information on livestock and wildlife worldwide and the diseases they carry and succumb to. A small portion of the data records human disease contracted from animals, or human deaths attributed to these diseases.

Data sourced from Kaggle.com.
Further information can be found here.
GeoJSON files were obtained from geojson-maps.

Time-series data was obtained from Quandl.
The data used in the time-series portion of this project is not related to the original project.
Objective
Perform exploratory visual analysis in Python to identify variables worth further exploration. Form hypotheses, apply advanced analytical approaches to test them, and build a Tableau visual storyboard.
Tools
Anaconda - Python
Excel
Tableau
Skills
Exploratory Analysis through visualizations
- Scatterplots
- Correlation heatmaps
- Pair plots
- Categorical plots

- Geospatial analysis using a shapefile
- Regression analysis
- Cluster analysis
- Time-series analysis

Tableau Storyboard Visualization (interactive)
-------------------------------------------------------------------------- Process -------------
Understanding the Data
Data Profiling - Excel
Data Cleaning - Python/Excel
Quality Measures - Excel​
​
Data Exploration
Correlation Heatmaps
Choropleth Maps
K-means clusters
​
Data Visualization
Tableau Interactive Storyboard
​
Presentation
Microsoft Word, Excel, and Tableau are used to present summarized data as both documentation and visualizations.

--------------------------------------------- Analysis -----------------------------------------
Questions
- Which disease has the most cases?
- Which region has the highest disease prevalence?
- How many humans succumb to the diseases surveyed?
- Which disease has the highest death rate in animals?
- Are wildlife or domesticated animals more commonly affected by disease?

Correlation Maps
Correlation maps helped build an understanding of how the variables in the data relate to one another.
In this image, values closer to "1" indicate pairs of variables that are more strongly positively correlated.
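
As a rough illustration of how such a heatmap might be produced, the sketch below computes a Pearson correlation matrix with pandas and leaves the drawing to a helper. The column names (`cases`, `deaths`, `destroyed`) and their values are made-up placeholders rather than the actual dataset columns, and the use of seaborn for plotting is an assumption.

```python
import pandas as pd

# Hypothetical sample standing in for the cleaned dataset; the real
# column names and values are not shown in this write-up.
df = pd.DataFrame({
    "cases":     [10, 25, 40, 55, 80],
    "deaths":    [1, 3, 4, 6, 9],
    "destroyed": [5, 2, 8, 1, 4],
})

# Pairwise Pearson correlation between the numeric columns.
corr = df.corr()


def plot_heatmap(corr_matrix):
    """Draw an annotated heatmap (assumes seaborn and matplotlib are installed)."""
    import matplotlib.pyplot as plt
    import seaborn as sns

    ax = sns.heatmap(corr_matrix, annot=True, vmin=-1, vmax=1, cmap="coolwarm")
    ax.set_title("Correlation heatmap")
    plt.show()
    return ax
```

Calling `plot_heatmap(corr)` would render the matrix; cells near 1 (such as `cases` against `deaths` in this toy sample) mark variables that rise together.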


Choropleth Maps
Choropleth maps shade geographic areas according to a value of interest. In this case, they show where the sum of species at risk is lowest and highest.
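
A minimal sketch of the idea, assuming a folium-based workflow (the write-up does not say which mapping library was used): the data is first summed per country, then each country is shaded by that total. The column names `country` and `species_at_risk`, the sample values, and the `properties.name` key are all hypothetical.

```python
import pandas as pd

# Hypothetical records; "country" and "species_at_risk" are placeholder
# column names, not necessarily those used in the original dataset.
records = pd.DataFrame({
    "country": ["Kenya", "Kenya", "Brazil", "India", "Brazil"],
    "species_at_risk": [120, 30, 45, 200, 5],
})

# One value per region: the sum of species at risk.
totals = records.groupby("country", as_index=False)["species_at_risk"].sum()


def build_choropleth(geojson_path, totals_df):
    """Shade each country by its total.

    Assumes folium is installed and that the GeoJSON stores country
    names under properties.name (an assumption, check the actual file).
    """
    import folium

    world_map = folium.Map(location=[20, 0], zoom_start=2)
    folium.Choropleth(
        geo_data=geojson_path,
        data=totals_df,
        columns=["country", "species_at_risk"],
        key_on="feature.properties.name",
        fill_color="YlOrRd",
        legend_name="Sum of species at risk",
    ).add_to(world_map)
    return world_map
```

The `key_on` argument is what ties a row in the table to a shape in the GeoJSON, so the country names in both sources have to match exactly.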

K-Means Clusters
Using Python, a K-Means clustering algorithm is applied to the data. It groups the data points around centroids and repeats the process until the clusters stabilize.

Looking at species cases and deaths, the cluster labeled "0" represents reports that have fewer deaths associated with cases.
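
To show what "grouping around centroids until the clusters stabilize" means, here is a bare-bones version of the standard K-Means loop written with NumPy. It is only an illustration: the project itself most likely relied on a library implementation (for example scikit-learn's `KMeans`), and the (cases, deaths) pairs below are invented.

```python
import numpy as np


def kmeans(points, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm: assign to nearest centroid, recompute, repeat."""
    rng = np.random.default_rng(seed)
    # Start from k distinct data points chosen at random.
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        # Distance of every point to every centroid, shape (n_points, k).
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # New centroid = mean of its assigned points (kept as-is if empty).
        new_centroids = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break  # centroids stopped moving: the clusters have stabilized
        centroids = new_centroids
    return labels, centroids


# Hypothetical (cases, deaths) pairs, not values from the dataset.
data = np.array([
    [10, 1], [12, 0], [15, 2], [11, 1],
    [90, 40], [100, 55], [95, 48], [110, 60],
], dtype=float)

labels, centroids = kmeans(data, k=2)
```

The cluster numbers themselves are arbitrary, so which group ends up labeled "0" depends on the starting centroids; what matters is that low-case, low-death reports end up together.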
Final Report -----------------------------------------------------------------------------
Project Reflections
​
- This project required extensive cleaning, renaming, handling of mixed-type data, and the addition of GeoJSON files in order to build an accurate choropleth map.
- There were data limitations with the column titles, the human data, and the way the species were entered into the dataset.
  - Column Titles:
    - The column titles did not come with any explanation of what they represent.
    - This caused some confusion, and columns were renamed.
  - Human Data:
    - The human data is limited and inconsistent.
    - Some insight can be extracted, but it is not reliable because of this inconsistency.
  - Data Entry:
    - Species descriptions were entered as long lists within the Excel spreadsheet.
    - This required careful cleaning in order to determine species-specific diseases, cases, or deaths.
- The GeoJSON file posed a challenge: it was difficult to identify the correct key to work with and to sync it to the dataset.
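
One way to approach the key problem described in the last point is to inspect the property keys in the GeoJSON and count how many dataset values each one matches. The sketch below uses only the standard library; the inline GeoJSON, its property names (`name`, `iso_a3`), and the country values are illustrative assumptions, not the project's actual files.

```python
import json

# Tiny inline stand-in for a GeoJSON file; in practice this would come from
# json.load() on the downloaded file. The property names are assumptions.
geojson_text = """
{
  "type": "FeatureCollection",
  "features": [
    {"type": "Feature", "properties": {"name": "Kenya", "iso_a3": "KEN"}, "geometry": null},
    {"type": "Feature", "properties": {"name": "Brazil", "iso_a3": "BRA"}, "geometry": null}
  ]
}
"""
geo = json.loads(geojson_text)

# Hypothetical country values as they might appear in the dataset.
dataset_countries = {"Kenya", "Brazil", "United States of America"}


def match_report(geo, values):
    """For each property key, count how many features match a dataset value."""
    report = {}
    for feature in geo["features"]:
        for key, prop in feature["properties"].items():
            report.setdefault(key, 0)
            if prop in values:
                report[key] += 1
    return report


report = match_report(geo, dataset_countries)
best_key = max(report, key=report.get)

# Dataset values with no counterpart in the GeoJSON need renaming or dropping.
geo_names = {f["properties"][best_key] for f in geo["features"]}
unmatched = dataset_countries - geo_names
```

Here `best_key` identifies the property most likely to join cleanly with the dataset, and `unmatched` lists the values that would otherwise show up as blank regions on the map.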
​
​
Take Away
- The data provides valuable insight into disease prevalence by region and disease type.
- Disease persistence in various animals was determined, and the most common diseases that infect each species were identified.
- Using the limited human data, it was determined which disease humans succumbed to most often in this survey.
​
Deliverables
