This dataset is from: https://divvy-tripdata.s3.amazonaws.com/index.html .
Brief introduction of the dataset :
This dataset is from an unknown bicycle-share company. Let's name it company A. All data is stored in CSV files.
The sample csv file is like this :
This research is designed to ask the following questions:
1. How do members and casual riders use bike-sharing services differently?
2. If company A wants to increase the sales data of membership purchases in a certain month. Are there any possible advice?
Prepare the data
Since I used Excel (which is a limited tool for dealing with a big amount of data), so I only focused on the data from a specific month. I chose the data from the last November. The same research for a whole year will be published soon by using SQL queries.
1. All the data stored in this dataset are in CSV files and organized in the wide format.
2. To check the integrity of the datasets, I did the following work:
1). Trim the data and check data types. To trim columns, I used Trim(). The data type for each column is clearly defined in the targeted table.
2). Check data range. I checked the data range by using statistical functions like MAX(), MIN(), etc.
3). Mandatory items. =isblank()There are no blank cells in the key columns.
Use Iferror()function to replace the error data.
4). Unique:. There is no duplicated data.
5). Expression Patterns. Use the "remove duplicates" function in "Data" tab to do so . The data is checked without unexpected format in each column.
6). Accuracy. I assume that the data provided by company A is credible since I have no way to check if the data is accurate or not.
3. This dataset is from company A directly and is accessible to everyone who has the link.
Process / Analyze
Since the data is in a CSV file, I first created a copy of this file and change it from CSV to XLSX since I used Excel for my research.
1. Transform data for better analysis:
1) I need the length of people's riding. So I added two columns :
ride_time : =D2-C2

I want the data to be shown as time duration. Therefore I customed the data type to [h]:mm:ss. Then I have the result :
2). I also need which day of the week did people rode a bike. Therefore I created a column called day_of_week:
= WEEKDAY(C2, 2)
2. I used pivot table for further analysis. (Since my MS office 365 cannot change the language to English, there will be some Chinese characters in the following images.)
1) The average ride length for members and casual riders:
Then I changed this pivot table to a normal table.
Casual riders rode much longer than members.
2) On which day of the week do people ride bikes and how long did they ride?
If I changed it to a normal table:
3) How many times do members and casual riders ride a bike each day?
If I changed it to a normal table:
3. After this step, I have the tables for further exploration.
Then I tried to visualize them.
1) average ride length for members and casual riders:
2) average ride length on each day of the week :
3) ride times on each day of the week:
4. According to the plots, the findings are as follows:
1). From the first plot, casual riders rode much longer than members.
2). From the second plot,
a). The ride length of members changes slightly on each day;
b). Casual riders tend to ride longer during the weekend.
3). From the third plot,
a). It seems that casual riders use company A's service equally on each day of the week;
b). members may use bikes more frequently during the weekdays.
Discussion
1. Casual riders tend to rent a bike for long-distance rides more than members.
2: Members: It seems that members have a pretty fixed environment for riding bikes since the fluctuation of their ride-length data on each day of the week is minor. Besides, members prefer to ride bikes during the weekdays.
3. Casual riders: the times they ride bikes during the whole week are similar on each day. However, they ride much longer than members and they prefer to use bikes on Wednesday and the weekend.
Therefore, I think maybe company A should cater to causal users' situations and bring out a new membership plan which allows them to use the bike more flexibly and give them a discount policy for people who need long-distance rides.
Comments
Post a Comment