(Data analysis with SQL/Tableau) How do people use bike-sharing service in the first quarter of 2022?

This dataset is from: https://divvy-tripdata.s3.amazonaws.com/index.html .
I did the same research via Excel based on the data from one month in another post. In this post, I will do a similar analysis based on the first quarter's data in 2022 via SQL and Tableau.
The sample csv file is like this :

This research is designed to ask the following questions:

1. How do members and casual riders use bike-sharing services differently?

2. Which type of bikes do riders prefer?

I will use Bigquery for data process and analysis. And Tableau for data visualization.

Prepare the data

1. Import data: I directly import data from csv files.

I created a dataset called bikes_use and uploaded the csv files from January to March.

2. Check for data integrity:

1) data type: when I imported the data to Bigquery, it directly showed the datatype of columns:

--We can also use the following code to show the column data type.
--Through this step, we know that these three tables have the same data type for each column. 

select
  column_name,data_type
From `evident-apricot-368322.bikes_use.INFORMATION_SCHEMA.COLUMNS`
WHERE table_name = '202201_tripdata'

--In addition, Since this dataset contains the trip data from January to March, I want to ensure that these three tables have the same column names.

SELECT
 column_name,
 COUNT(table_name)
FROM
 `evident-apricot-368322.bikes_use.INFORMATION_SCHEMA.COLUMNS`
GROUP BY
 1;

These three tables contain the same column names. By doing so, I may use the union all function to unite these three tables together.

--since I need to calculate how long people used bikes, therefore I want to ensure that the data type of the columns are in the same data type

SELECT
 CONCAT(table_catalog,".",table_schema,".",table_name) AS table_path,
 table_name,
 column_name
FROM
 `evident-apricot-368322.bikes_use.INFORMATION_SCHEMA.COLUMNS`
WHERE
 data_type IN ("TIMESTAMP","DATE","DATETIME");

2) Data range :

To check the Data range of each table, we can use functions like AVG(), MAX(), MIN(), etc. Here I skip the code writing in this part since it is simple.

3) Mandatory items:

--find out if there are null cells in the mandatory columns

select
  countif(started_at is null) 
From `evident-apricot-368322.bikes_use.202201_tripdata`

 4) Unique:
--Find out if there are duplicates : 
select
  (select count(1) 
   from (select distinct * from `evident-apricot-368322.bikes_use.202201_tripdata`)) as distinct_rows,
  (select count(1) from `evident-apricot-368322.bikes_use.202201_tripdata`) as total_rows

 
5) Mandatory patterns: 
--Since the date-related data is formatted in TIMESTAMP, I will not check for Mandatory patterns
 in this case. 

6) Accuracy:  I assume that the data provided by company A is credible since I have no way to check if the data is accurate or not. 




3. This dataset is from company A directly and is accessible to everyone who has the link. 


Process/Analyze:
1. get the data about how members and casual riders use bikes differently by the type of bike. 

With ridedata as 
(Select 
   ride_id, rideable_type,member_casual, 
   ROUND(TIMESTAMP_DIFF(ended_at,started_at,SECOND)/60,2) as ride_time,
   Format_date("%A",started_at) as day_of_week,extract(Month from started_at) as Month
from `evident-apricot-368322.bikes_use.202201_tripdata`
UNION ALL 
SELECT
   ride_id, rideable_type,member_casual, 
   ROUND(TIMESTAMP_DIFF(ended_at,started_at,SECOND)/60,2) as ride_time,
   Format_date("%A",started_at) as day_of_week,extract(Month from started_at) as Month
from `evident-apricot-368322.bikes_use.202202_tripdata`
UNION ALL 
SELECT
   ride_id, rideable_type,member_casual, 
   ROUND(TIMESTAMP_DIFF(ended_at,started_at,SECOND)/60,2) as ride_time,
   Format_date("%A",started_at) as day_of_week,extract(Month from started_at) as Month
from `evident-apricot-368322.bikes_use.202203_tripdata`)

Select distinct member_casual, rideable_type,Month, count(rideable_type) as ride_times

from ridedata

group by member_casual,rideable_type, Month

2. get the data about how many times people ride a bike during a week

Select member_casual, day_of_week, count(ride_id) as ride_times
from ridedata
group by member_casual, day_of_week 

3. get the data about how long people ride a bike each day of the week.

Select distinct member_casual, day_of_week, round(avg(ride_time),2) as average_ride_time

From ridedata

group by member_casual,day_of_week

4. After this step, I exported three tables and imported them to Tableau for 

Data visualization. 

1) Members: It seems that members have a pretty fixed environment for riding bikes since the fluctuation of their ride-length data on each day of the week is minor. Besides, members prefer to ride bikes during the weekdays.

2) Casual riders: the times they ride bikes during the week change slightly. However, they ride much longer than members and they prefer to use bikes on the weekend.

3) There is an obvious growth in the ride times in March.

4) It seems that casual riders prefer electric bikes more than classic bikes. On the contrary, members prefer classic bikes more.

Thanks for your reading.

Search This Blog

Sherwin's Data nest

(Data analysis with SQL/Tableau) How do people use bike-sharing service in the first quarter of 2022?

Prepare the data

Process/Analyze:

Comments

Post a Comment

Popular posts from this blog

(Power BI) Superstore Sales Dataset analysis

(Power BI & Python) Sales performance of different sales method on the same product