Analyzing Bike riding data

Vaibhav Lodhi
5 min readJul 14, 2022

By Vaibhav Lodhi

Introduction

A scenerio is provided to us in which we work for fictional company named Cyclistic; it is bike-share Chicago based company. In this case the company wants us to analyze the data of riders whether they are casual riders or annual members, their usage, travel distance, etc. So that they want prepare their future acion plan for its execution to broaden their market segment.

Data Overview

The company have a fleet of 5,824 bicycles that can be geotracked and locked into a network of 692 stations across Chicago.

12 months of latest available data in csv format was used and the data has been made available by Motivate International Inc. under this liscence and bike rental data from May-2020 to April -2021 was used for the task. As data was generated directly from the bike sharing system and released by the company, the data is reliable, original, comprehensive and current and therefore suitable to use for the analysis. It includes ride_id, bike_type, start and end riding time, Start and end station name and station_id, start and end station geolocation, and membership or casual rider information.

Index of bucket “divvy-tripdata” (This link have database of bike ride)

12 csv files were imported into MySQL for further data cleaning and analysis and at the later stage transferred to Tableau for data visualization.

Note:- convert all data files into csv file type.

Down below query is to combine all the data of months and form a view.

(Convert data into respective datatype otherwise may face error during execution)

create view  bike_data AS
select*
from(
SELECT * FROM [dbo].[QQ] UNION ALL
SELECT * FROM [dbo].[04]UNION ALL
SELECT * FROM [dbo].[05] UNION ALL
SELECT * FROM [dbo].[06] UNION ALL
SELECT * FROM [dbo].[07] UNION ALL
SELECT * FROM [dbo].[08] UNION ALL
SELECT * FROM [dbo].[09] UNION ALL
SELECT * FROM [dbo].[10] UNION ALL
SELECT * FROM [dbo].[11] UNION ALL
SELECT * FROM [dbo].[12] ) AS trip_duration_table

Or use another method i.e. INSERT INTO…

--To combine all one by one...
Insert into [dbo].[Q1]
select*
from[dbo].[12]
-- To create view data of main datacreate view Bike_data As
select* , DATEDIFF(MINUTE, started_at, ended_at) AS duration
from Q1

IF we go through the data we will find the values or duration in negative, positive or zero…So time can’t be negative or zero in this case. Thereby, we will run a query to eliminate the negative and zero values of duration (cleaning the data for further analysis).

-- To filter incorrect duration having values 0 or negative...
delete from bike_data
where duration<=0 --45006 rows deleted

We will now figure out null values from the data…

--check all fields for NULL values, plus station_names for TEST valuesSELECT
SUM(CASE WHEN ride_id IS NULL THEN 1 ELSE 0 END) AS ride_id_null,
SUM(CASE WHEN rideable_type IS NULL THEN 1 ELSE 0 END) AS rideable_type_null,
SUM(CASE WHEN started_at IS NULL THEN 1 ELSE 0 END) AS started_at_null,
SUM(CASE WHEN ended_at IS NULL THEN 1 ELSE 0 END) AS ended_at_null,
SUM(CASE WHEN start_station_name IS NULL THEN 1 ELSE 0 END) AS start_station_null,
SUM(CASE WHEN end_station_name IS NULL THEN 1 ELSE 0 END) AS end_station_null,
SUM(CASE WHEN start_lat IS NULL THEN 1 ELSE 0 END) AS start_lat_null,
SUM(CASE WHEN end_lat IS NULL THEN 1 ELSE 0 END) AS end_lat_null,
SUM(CASE WHEN start_lng IS NULL THEN 1 ELSE 0 END) AS start_lng_null,
SUM(CASE WHEN end_lng IS NULL THEN 1 ELSE 0 END) AS end_lng_null,
SUM(CASE WHEN member_casual IS NULL THEN 1 ELSE 0 END) AS member_casual_null,
SUM(CASE WHEN UPPER(start_station_name) LIKE '%TEST%' THEN 1 ELSE 0 END) AS start_station_test,
SUM(CASE WHEN UPPER(end_station_name) LIKE '%TEST%' THEN 1 ELSE 0 END) AS end_station_test
FROM bike_data

And the results of the query will be like this…

Null values count have been marked in red color
-- To find duplicate ride_idselect ride_id , count(*) AS 'count'
from bike_data
group by ride_id
having count( ride_id)>1
-- To count of member_casual, rideable_typeSelect member_casual, rideable_type, COUNT(member_casual) As "Count"
from bike_data
group by member_casual,rideable_type;

Query Results…

Now, to count bike_ride duration more than a day

Query…

-- To count of member_casual, rideable_type more than a daySelect member_casual, rideable_type, COUNT(member_casual) As "Count"
from bike_data
where duration> 1440 --24 hrs
group by member_casual,rideable_type;

Results:-

We can calculate the average bike riding time

-- To gather data of average ride time (in minutes)select member_casual, AVG(duration) As 'Average riding time'
from bike_data
group by member_casual

Results:-

To calculate the most frequent

-- Route frequencycreate view route_freuency AS
With frequent_route AS (
select
rideable_type,
member_casual,
concat(start_station_id, ' to ' ,end_station_id) AS 'route'
from bike_data
where start_station_id is not null and end_station_id is not null
)
select rideable_type, member_casual, route, count(route) AS route_count
from frequent_route
group by rideable_type,member_casual, route
having count(route)>1000
order by route_count DESC

Results:-

Almost 43 rows are obtained in result but displaying only few…

Query for getting ride day of week:

-- ride in day of weekWITH day_of_week AS(
SELECT member_casual, datepart(dw FROM started_at) AS dow
FROM bike_data
)
SELECT member_casual,
CASE
WHEN dow = 0 THEN 'Sunday'
WHEN dow = 1 THEN 'Monday'
WHEN dow = 2 THEN 'Tuesday'
WHEN dow = 3 THEN 'Wednesday'
WHEN dow = 4 THEN 'Thursday'
WHEN dow = 5 THEN 'Friday'
WHEN dow = 6 THEN 'Saturday'
ELSE 'others'
END AS day_of_week, COUNT(dow)
FROM day_of_week
GROUP BY member_casual, dow
ORDER BY COUNT(dow) DESC;

Results:

Query to get ride hours:

--ride by hoursWITH hour_interval AS(
SELECT member_casual, datepart(HOUR FROM started_at) AS bike_hour
FROM bike_data
)
SELECT member_casual, bike_hour, COUNT(bike_hour) AS bike_count
FROM hour_interval
GROUP BY member_casual, bike_hour
ORDER BY COUNT(bike_hour) DESC;

Results:

Displayed limited results only…

Now, to calculate ride count in a month we have to use create view :

--ride_per_monthCREATE VIEW ride_count AS
select*,
CONCAT(datepart(year from started_at), ' - ' ,datepart(month from started_at)) AS MONTHS,
datepart(month from started_at) AS ride_per_month
from bike_data

Now, we will write query to get data…

select MONTHS,member_casual, count(ride_per_month)
from ride_count
group by member_casual,ride_per_month ,months
order by months

Results :

contains 24 rows (displayed only few)

Data visualization by using tool Tableau:-

https://public.tableau.com/app/profile/vaibhav.lodhi/viz/tab_leau/Dashboard1?publish=yes

--

--