Ola Sanusi, PhD



Educator | Data Scientist | Researcher

View My LinkedIn Profile

View My GitHub Profile

Data Science Projects


A summary of selected projects completed in Python and R, covering Exploratory Data Analysis (EDA), Time Series analysis, and Natural Language Processing (NLP).

Title:
Fake News Detection using ML Classifiers *Natural Language Processing (NLP)*

Description:

This project used supervised ML classifiers to detect patterns in fake news articles. A very large fake news corpus was subsetted to obtain a raw dataset consisting of a balanced distribution of fake and real news articles (approximately 1 million instances each).
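As a rough illustration of what a supervised text classifier does, here is a minimal sketch of a bag-of-words naive Bayes model written with only the Python standard library. It is a stand-in chosen for brevity, not the classifiers used in the project, and the toy headlines and the function names `tokenize`, `train_naive_bayes`, and `predict` are made up for this example.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def train_naive_bayes(documents, labels):
    """Count documents per class and word occurrences per class."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, label in zip(documents, labels):
        tokens = tokenize(text)
        word_counts[label].update(tokens)
        vocabulary.update(tokens)
    return class_counts, word_counts, vocabulary


def predict(model, text):
    """Return the class with the highest log-posterior, using Laplace smoothing."""
    class_counts, word_counts, vocabulary = model
    total_docs = sum(class_counts.values())
    scores = {}
    for label, doc_count in class_counts.items():
        total_words = sum(word_counts[label].values())
        score = math.log(doc_count / total_docs)
        for token in tokenize(text):
            if token not in vocabulary:
                continue  # skip words never seen during training
            count = word_counts[label][token]
            score += math.log((count + 1) / (total_words + len(vocabulary)))
        scores[label] = score
    return max(scores, key=scores.get)


# Made-up toy headlines, NOT samples from the project's corpus.
train_texts = [
    "shocking miracle cure doctors hate",
    "you will not believe this secret trick",
    "city council approves annual budget",
    "central bank holds interest rates steady",
]
train_labels = ["fake", "fake", "real", "real"]

model = train_naive_bayes(train_texts, train_labels)
prediction = predict(model, "secret miracle trick revealed")
print(prediction)
```

On a real corpus the same train-then-predict pattern would be evaluated on a held-out test split, which this toy example omits.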

Technologies:

Pandas, NLTK, Gensim, Scikit-Learn, Keras, Yellowbrick, Regular Expression, Seaborn

Check out the full code and notebook on GitHub


Title:
Analysis of Crime Types in Baltimore City, MD *Exploratory Data Analysis (EDA)*

Description:

The purpose of this project is to use exploratory data analysis (EDA) to draw inferences from historical crime data for Baltimore City, MD. Because the data from 1963 to 2013 is sparse, only the seven-year period from 2014 to 2020 was considered. The investigation reveals that the top three crimes in the city are larceny (22%), common assault (17%), and burglary (14%). Overall crime increased gradually from 2014 to a peak in 2017 and has been decreasing steadily since. Most crimes occur during the summer, and the Northeast police district has the highest occurrence, with the most dangerous neighborhoods being Downtown and Frankford. Of the 14 reported crime types, five (agg. assault, homicide, rape, robbery-carjacking, shooting) show increasing trends when comparing the counts from 2014 to 2020. For all the crime types with increasing trends, the following neighborhoods are very dangerous: Downtown, Sandtown-Winchester, Brooklyn, Frankford, and Broadway East.
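The percentage shares and yearly counts described above come from simple frequency tables. The sketch below shows one way to compute them using only the Python standard library (the project itself lists Pandas). The records, the field names `date` and `crime_type`, and both helper functions are hypothetical and chosen for illustration only.

```python
from collections import Counter


def crime_type_shares(records):
    """Return (crime_type, percent) pairs sorted from most to least frequent."""
    counts = Counter(record["crime_type"] for record in records)
    total = sum(counts.values())
    return [(crime, round(100 * n / total, 1)) for crime, n in counts.most_common()]


def incidents_per_year(records):
    """Count incidents per calendar year from ISO-formatted date strings."""
    counts = Counter(int(record["date"][:4]) for record in records)
    return dict(sorted(counts.items()))


# Hypothetical records; these are NOT rows from the Baltimore dataset.
records = [
    {"date": "2014-06-01", "crime_type": "LARCENY"},
    {"date": "2014-07-15", "crime_type": "COMMON ASSAULT"},
    {"date": "2015-01-20", "crime_type": "LARCENY"},
    {"date": "2015-08-03", "crime_type": "BURGLARY"},
    {"date": "2015-08-09", "crime_type": "LARCENY"},
]

shares = crime_type_shares(records)
yearly = incidents_per_year(records)
print(shares)
print(yearly)
```

Grouping by month or by police district follows the same pattern with a different key in the `Counter`.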

Technologies:

Python, NumPy, Pandas, Seaborn, Matplotlib, Calendar, Squarify

Check out the full code and notebook on GitHub


Title:
Predicting Baltimore City (MD) Crimes using SARIMAX and fbProphet *Time Series Forecasting*

Description:

In this project, a time series (S)ARIMA(X) model was used to predict crime occurring in Baltimore City, MD. The data, obtained from the Baltimore Open Data website, was subsetted to cover the seven-year period from January 2014 to December 2020. Overall crime, which shows a decreasing trend, and four crime types with increasing trends (agg. assault, homicide, robbery-carjacking, and shooting) were considered in this investigation.

Technologies:

Pandas, NumPy, Scikit-Learn, Seaborn, Statsmodels, Sktime, Pyramid Arima, Matplotlib, fbProphet

Check out the full code and notebook on GitHub


Title:
Google Data Analytics Certificate - Cyclistic Bike Share Case Study *Exploratory Data Analysis (EDA) using R*

Figure: Number of weekly rides

Description:

In this case study, 12 months of historical bike trip data covering September 2020 to August 2021 were analyzed in R to understand the differences between casual riders and annual members, and to use these findings to design an appropriate strategy for converting casual riders to annual members. The analysis reveals that annual members (55%) make up the majority of Cyclistic users. Most trips occur during the weekend, with casual riders using the bikes for longer than annual members. However, annual members take more rides on weekdays than casual riders. The study also shows that both user types prefer classic bikes. Overall, casual riders spend more time using bikes than annual members.
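The case study itself was done in R. For consistency with the other projects on this page, the sketch below shows the same kind of comparison in Python using only the standard library: mean ride length per rider type, plus weekday and weekend ride counts. The trips, the field names `rider_type`, `started_at`, and `ended_at`, and the function `summarize_rides` are hypothetical.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def summarize_rides(rides):
    """Return mean ride length in minutes and weekday/weekend counts per rider type."""
    durations = defaultdict(list)
    day_counts = defaultdict(lambda: {"weekday": 0, "weekend": 0})
    for ride in rides:
        start = datetime.fromisoformat(ride["started_at"])
        end = datetime.fromisoformat(ride["ended_at"])
        rider = ride["rider_type"]
        durations[rider].append((end - start).total_seconds() / 60)
        # datetime.weekday() returns 5 for Saturday and 6 for Sunday.
        day_counts[rider]["weekend" if start.weekday() >= 5 else "weekday"] += 1
    mean_minutes = {rider: mean(values) for rider, values in durations.items()}
    return mean_minutes, dict(day_counts)


# Hypothetical trips; these are NOT rows from the Cyclistic dataset.
rides = [
    {"rider_type": "casual", "started_at": "2021-08-07 10:00:00", "ended_at": "2021-08-07 10:45:00"},
    {"rider_type": "casual", "started_at": "2021-08-08 14:00:00", "ended_at": "2021-08-08 14:35:00"},
    {"rider_type": "member", "started_at": "2021-08-02 08:00:00", "ended_at": "2021-08-02 08:12:00"},
    {"rider_type": "member", "started_at": "2021-08-03 17:30:00", "ended_at": "2021-08-03 17:48:00"},
    {"rider_type": "member", "started_at": "2021-08-07 09:00:00", "ended_at": "2021-08-07 09:15:00"},
]

mean_minutes, day_counts = summarize_rides(rides)
print(mean_minutes)
print(day_counts)
```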

Technologies:

R, Ggplot2, Tidyverse, Janitor, Openxlsx, Readxl, Lubridate, Plotrix

Check out the full code and notebook on GitHub