Posts

Showing posts from February, 2023

(Data analysis with Power BI) The highest-revenue movie and the highest-rated movie in every decade

This dataset is publicly shared at: https://www.kaggle.com/datasets/rounakbanik/the-movies-dataset Brief intro of the dataset: "These files contain metadata for all 45,000 movies listed in the Full MovieLens Dataset. The dataset consists of movies released on or before July 2017. Data points include cast, crew, plot keywords, budget, revenue, posters, release dates, languages, production companies, countries, TMDB vote counts, and vote averages. This dataset also has files containing 26 million ratings from 270,000 users for all 45,000 movies. Ratings are on a scale of 1-5 and have been obtained from the official GroupLens website." Research purpose: 1. The highest-revenue movie in every decade 2. The highest-rated movie in every decade The final dashboard is here. The analysis process is described below. Data Validation Data cleaning via Power Query 1) The data type of each column is directly shown here. Change it if the data type is wrongly fo...

(Data analysis with SQL/Tableau) How do people use bike-sharing services in the first quarter of 2022?

This dataset is from: https://divvy-tripdata.s3.amazonaws.com/index.html. I did the same research in Excel, based on one month of data, in another post. In this post, I do a similar analysis of the data from the first quarter of 2022 using SQL and Tableau. The sample CSV file looks like this: This research is designed to answer the following questions: 1. How do members and casual riders use bike-sharing services differently? 2. Which type of bikes do riders prefer? I will use BigQuery for data processing and analysis, and Tableau for data visualization. Prepare the data 1. Import data: I imported the data directly from CSV files. I created a dataset called bikes_use and uploaded the CSV files from January to March. 2. Check for data integrity: 1) Data type: when I imported the data into BigQuery, it directly showed the data type of each column: --We can also use the following code to show the column data type. -- Through...

(Data Analysis with Spreadsheet/Excel) How do people use bike-sharing services in a month?

This dataset is from: https://divvy-tripdata.s3.amazonaws.com/index.html. Brief introduction of the dataset: This dataset is from an unknown bicycle-share company. Let's name it company A. All data is stored in CSV files. The sample CSV file looks like this: This research is designed to answer the following questions: 1. How do members and casual riders use bike-sharing services differently? 2. If company A wants to increase membership sales in a certain month, what advice can the data offer? Prepare the data Since I used Excel (which is a limited tool for handling large amounts of data), I focused only on the data from a specific month. I chose the data from last November. The same research for a whole year, done with SQL queries, will be published soon. 1. All the data stored in this dataset are in CSV files and organized in the wide format. 2. To check the integrity of the datasets, I did the...

(DA with R) The connection between people's sleep condition and daily activity

This dataset is from: https://www.kaggle.com/datasets/arashnic/fitbit Brief intro of this dataset: This dataset is from Fitbit; it contains physical activity data shared by Fitbit wearable device users. All the data is in CSV files. This research is designed to find out whether there is any relationship between people's sleep time and their daily activity. Prepare the data Since the research purpose is to explore the possible relationship between sleep time and daily activity, I use two CSV files for my research: "dailyActivity_merged.csv" and "sleepDay_merged.csv". 1. All the data stored in this dataset are in CSV files and organized in the wide format. 2. To check the integrity of the datasets, I did the following work: 1). Import a) Upload the related files that need to be analy...