Global Animal Disease Surveillance

Build an interactive dashboard visually showcasing well-curated results of an advanced exploratory analysis conducted in Python.

Project Overview

Data - Open Source

The data contains information on livestock and wildlife around the world and the potential diseases they carry and succumb to. A smaller portion of the data records human cases of disease contracted from animals, or human deaths attributed to these diseases.

Data sourced from Kaggle.com.

Further information can be found here.

GeoJSON files were obtained from geojson-maps.

Time-series data was obtained from Quandl.

   The data used in the time-series portion of this project is not related to the original animal disease dataset.

Objective

Perform exploratory visual analysis in Python to identify variables worth further exploration. Form hypotheses, apply advanced analytical approaches to test them, and build a Tableau visual storyboard.

Tools

Anaconda - Python

Excel

Tableau

Skills

Exploratory Analysis through visualizations

  1. Scatterplots

  2. Correlation heatmaps

  3. Pair Plots

  4. Categorical Plots

  • Geospatial analysis using shapefile

  • Regression analysis

  • Cluster analysis

  • Time-Series analysis

Tableau Storyboard Visualization (interactive)

Process

Understanding the Data

Data Profiling - Excel

Data Cleaning - Python/Excel

Quality Measures - Excel
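The Python side of the cleaning step (renaming columns and fixing mixed-type data) can be sketched as follows. This is a minimal illustration, not the project's actual code: the column names `sumAtRisk`, `sumCases`, and `sumDeaths` are assumed to resemble the Kaggle file's, and the values and the rename mapping are invented. The sketch flags columns that hold more than one Python type, coerces them with `pandas.to_numeric(..., errors="coerce")`, and reports the share of missing values as a simple quality measure.

```python
import pandas as pd

# Hypothetical raw extract. Column names are assumed to resemble the Kaggle
# file's; the values and the rename mapping are invented for illustration.
raw = pd.DataFrame({
    "disease": ["Anthrax", "Rabies", "Anthrax", "Avian influenza"],
    "sumAtRisk": ["120", "45", "n/a", "3000"],   # numbers stored as text
    "sumCases": [10, "7", 4, "unknown"],         # mixed int / str values
    "sumDeaths": [2, 7, "1", 150],               # mixed int / str values
})

# Rename terse source titles to descriptive ones (hypothetical mapping).
renamed = raw.rename(columns={
    "sumAtRisk": "species_at_risk",
    "sumCases": "cases",
    "sumDeaths": "deaths",
})
numeric_cols = ["species_at_risk", "cases", "deaths"]

# Profile: which columns hold more than one Python type before cleaning?
mixed = {
    col: sorted({type(v).__name__ for v in renamed[col]})
    for col in numeric_cols
    if renamed[col].map(type).nunique() > 1
}

# Coerce to numbers; anything unparseable ("n/a", "unknown") becomes NaN.
clean = renamed.copy()
for col in numeric_cols:
    clean[col] = pd.to_numeric(clean[col], errors="coerce")

# Simple quality measure: share of missing values per column after coercion.
missing_share = clean[numeric_cols].isna().mean()

print(mixed)          # columns that needed type cleaning
print(missing_share)  # fraction of entries that could not be parsed
```

Coercing to NaN rather than dropping rows keeps every report visible, so the missing-value share can be reviewed before deciding how to handle the gaps.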

Data Exploration

Correlation Heatmaps

Choropleth Maps

K-means clusters

Data Visualization

Tableau Interactive Storyboard

Presentation

Microsoft Word, Excel, and Tableau were used to present the summarized data as both documentation and visualizations.

[Image: Tableau screenshot]

Analysis

Questions

  1. What disease has the most cases?

  2. What region has the most disease prevalence?

  3. How many humans succumb to diseases surveyed?

  4. What disease has the highest death rate in animals?

  5. Are wildlife or domesticated animals more commonly affected by disease?
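The five questions above map naturally onto pandas group-by aggregations. The sketch below is illustrative only: the columns `disease`, `region`, `sumCases`, `sumDeaths`, and `humansDeaths` are assumed names, `is_wild` is a hypothetical flag assumed to be derived from the species description during cleaning, and the rows are invented. "Prevalence" for question 2 is approximated here by the number of reports per region, which is an assumption.

```python
import pandas as pd

# Invented sample standing in for the cleaned dataset. All column names are
# assumptions; is_wild is a hypothetical flag derived during cleaning.
df = pd.DataFrame({
    "disease": ["Anthrax", "Rabies", "Anthrax", "Avian influenza", "Rabies"],
    "region": ["Africa", "Asia", "Africa", "Asia", "Asia"],
    "sumCases": [10, 7, 4, 300, 2],
    "sumDeaths": [2, 7, 1, 150, 2],
    "humansDeaths": [0, 1, 0, 0, 0],
    "is_wild": [False, True, False, True, True],
})

# Q1: disease with the most reported cases.
cases_by_disease = df.groupby("disease")["sumCases"].sum()
top_disease = cases_by_disease.idxmax()

# Q2: region with the most reports (one row = one outbreak report, assumed).
top_region = df["region"].value_counts().idxmax()

# Q3: human deaths recorded per disease.
human_deaths = df.groupby("disease")["humansDeaths"].sum()

# Q4: animal death rate per disease = total deaths / total cases.
totals = df.groupby("disease")[["sumCases", "sumDeaths"]].sum()
death_rate = (totals["sumDeaths"] / totals["sumCases"]).sort_values(ascending=False)

# Q5: total cases in wildlife (True) versus domestic animals (False).
cases_by_origin = df.groupby("is_wild")["sumCases"].sum()

print(top_disease, top_region, human_deaths.sum())
print(death_rate)
print(cases_by_origin)
```

Computing the death rate from summed deaths and summed cases (rather than averaging per-report rates) keeps small reports from dominating the ranking.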

[Image: Tableau storyboard intro (Achievement 6)]

Correlation Maps

Correlation maps helped build an understanding of how the variables in the data relate to one another.

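In this heatmap, pairs of variables with coefficients closer to "1" are more strongly positively correlated (they tend to rise together), while values near 0 indicate little linear relationship.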

[Image: GADS correlation heatmap with labels (Python screenshot)]
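A labelled correlation heatmap like the one above can be produced along these lines. This is a sketch with invented values and assumed column names; it uses plain matplotlib so it runs without extra packages, although `seaborn.heatmap(corr, annot=True)` gives a similar result in one call.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical numeric columns; real names and values come from the cleaned data.
df = pd.DataFrame({
    "sumAtRisk": [120, 45, 80, 3000, 60],
    "sumCases": [10, 7, 4, 300, 2],
    "sumDeaths": [2, 7, 1, 150, 2],
})

# Pearson r: +1 = perfect positive linear relationship, 0 = none, -1 = inverse.
corr = df.corr(method="pearson")

fig, ax = plt.subplots(figsize=(5, 4))
im = ax.imshow(corr.to_numpy(), cmap="coolwarm", vmin=-1, vmax=1)
labels = list(corr.columns)
ax.set_xticks(range(len(labels)))
ax.set_xticklabels(labels, rotation=45, ha="right")
ax.set_yticks(range(len(labels)))
ax.set_yticklabels(labels)

# Write each coefficient into its cell, as in a labelled heatmap.
for i in range(len(labels)):
    for j in range(len(labels)):
        ax.text(j, i, f"{corr.iat[i, j]:.2f}", ha="center", va="center")

fig.colorbar(im, ax=ax, label="Pearson r")
fig.tight_layout()
fig.savefig("correlation_heatmap.png")
plt.close(fig)
```

Note that a single very large report can push Pearson r close to 1 on its own, which is one reason to look at the scatterplots alongside the heatmap.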
[Image: Choropleth map screenshot (6.3)]

Choropleth Maps

These maps shade geographic areas according to a value of interest. In this case, they show where the total number of species at risk is lowest and highest.
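Before a choropleth can be drawn, the per-country totals have to be joined to the polygons in the GeoJSON file. The sketch below shows that preparation under stated assumptions: the `country` and `sumAtRisk` columns, the `"name"` property key, and the alias table are all hypothetical, and geometry is omitted to keep the example short. It sums species at risk per country, attaches the totals to each feature, and lists dataset countries that have no matching polygon.

```python
import pandas as pd

# Hypothetical report-level data; column names are assumptions.
reports = pd.DataFrame({
    "country": ["Kenya", "Kenya", "India", "Viet Nam", "United States"],
    "sumAtRisk": [120, 80, 45, 3000, 10],
})

# Minimal GeoJSON-like FeatureCollection (null geometry for brevity).
# The property holding the country name ("name" here) varies between
# GeoJSON sources and must be checked against the real file.
geojson = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "properties": {"name": n}, "geometry": None}
        for n in ["Kenya", "India", "Vietnam", "United States of America"]
    ],
}

# Sum species at risk per country: the value the map will shade.
at_risk = reports.groupby("country")["sumAtRisk"].sum()

# Hypothetical alias table for names spelled differently in the two sources.
aliases = {"Viet Nam": "Vietnam"}
at_risk = at_risk.rename(index=aliases)

# Attach totals to each feature; None marks "no data", not zero.
feature_names = set()
for feature in geojson["features"]:
    name = feature["properties"]["name"]
    feature_names.add(name)
    value = at_risk.get(name)
    feature["properties"]["sum_at_risk"] = None if value is None else int(value)

# Countries in the data with no matching polygon would silently vanish from the map.
unmatched = sorted(set(at_risk.index) - feature_names)
print("No polygon for:", unmatched)
```

With the names aligned, a library such as Plotly can draw the map from a DataFrame with `country` and `sum_at_risk` columns, e.g. `plotly.express.choropleth(df, geojson=geojson, locations="country", featureidkey="properties.name", color="sum_at_risk")`; `featureidkey` is where the choice of GeoJSON key has to be made explicit.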

[Image: K-means clustering without potential outliers]

K-Means Clusters

Using Python, a K-means clustering algorithm was applied to the data. K-means assigns each observation to its nearest centroid, recomputes each centroid as the mean of its assigned points, and repeats until the assignments stabilize.

Looking at species cases and deaths, cluster "0" represents reports with fewer deaths relative to their number of cases.
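A minimal sketch of this clustering step, assuming scikit-learn's `KMeans` and invented (cases, deaths) pairs. The project's actual outlier criterion is not stated; the 1.5 × IQR rule on cases used here is an assumption. Both features are standardized so neither dominates the distance. Because cluster numbers are arbitrary between runs, the sketch identifies the lower-mortality group by its deaths-per-case ratio instead of assuming it is labelled 0.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical (cases, deaths) pairs, one row per report.
X = np.array([
    [10, 1], [11, 1], [12, 2], [11, 2],     # many cases, few deaths
    [10, 9], [11, 10], [12, 10], [11, 9],   # most cases end in death
    [500, 400],                             # extreme report (potential outlier)
], dtype=float)

# Drop potential outliers with a 1.5 * IQR rule on cases (assumed criterion).
cases = X[:, 0]
q1, q3 = np.percentile(cases, [25, 75])
iqr = q3 - q1
mask = (cases >= q1 - 1.5 * iqr) & (cases <= q3 + 1.5 * iqr)
X_trim = X[mask]

# Standardize so cases and deaths contribute equally to Euclidean distance.
X_scaled = StandardScaler().fit_transform(X_trim)

# Fit K-means with k=2; n_init restarts guard against a poor initial centroid draw.
km = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = km.fit_predict(X_scaled)

# Cluster ids are arbitrary, so rank clusters by deaths per case instead.
ratios = {
    int(k): X_trim[labels == k, 1].sum() / X_trim[labels == k, 0].sum()
    for k in np.unique(labels)
}
low_death_cluster = min(ratios, key=ratios.get)
print(ratios, "lower-mortality cluster:", low_death_cluster)
```

The choice of k=2 is only for illustration; on the real data, an elbow plot of `km.inertia_` over several values of k is a common way to pick the number of clusters.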

Final Report

Project Reflections

  • This project required extensive cleaning, renaming, handling of mixed-type data, and the addition of GeoJSON files in order to build an accurate choropleth map.

  • There were data limitations with the column titles, the human data, and the way species were entered into the dataset.

  • Column Titles:

    • The column titles did not come with any explanation of what each column represents.

    • This caused some confusion, so the columns were renamed.

  • Human Data:

    • The human data is limited and inconsistent.

    • Some insight can be extracted, but it is not reliable because of this inconsistency.

  • Data Entry:

    • Species descriptions were entered as long lists within the Excel spreadsheet.

    • This required careful cleaning in order to determine species-specific diseases, cases, or deaths.

  • The GeoJSON file created a challenge when identifying the correct key to work with and when matching it to the dataset.
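Cleaning species entered as long lists typically means splitting each cell into one row per species. The sketch below assumes a hypothetical `species` column with semicolon-separated names and invented rows; the real separator and column name in the spreadsheet may differ. It uses `Series.str.split` plus `DataFrame.explode`, then normalizes whitespace and case.

```python
import pandas as pd

# Hypothetical rows where several species are packed into one text cell.
# Column names and the ";" separator are assumptions for illustration.
df = pd.DataFrame({
    "disease": ["Anthrax", "Avian influenza", "Rabies"],
    "species": ["Cattle; sheep;goat", "duck; Swan", "dog"],
    "sumCases": [12, 5, 1],
})

# One row per species: split the text into a list, then explode the list.
long = df.assign(species=df["species"].str.split(";")).explode("species")

# Normalize stray whitespace and capitalization so "Swan" and " swan" match.
long["species"] = long["species"].str.strip().str.lower()
long = long[long["species"] != ""].reset_index(drop=True)

# Caution: sumCases is copied onto every exploded row, so summing it here
# would over-count. Counting reports per species and disease is safe.
reports_per_species = long.groupby(["species", "disease"]).size()
print(reports_per_species)
```

The duplicated case counts are the main trap here: report-level totals cannot be split across species without extra information, so per-species figures are best expressed as report counts.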

Takeaways

  • The data provides fantastic insight on disease prevalence based on region and disease type.

  • Disease persistence in various animals was determined, and the most common diseases infecting each species were identified.

  • Using the limited human data, the disease that humans succumbed to most often in this survey was identified.
