Video Game Sales with Ratings

데이터 소개
Video_Games_Sales_as_at_22_Dec_2016.csv

- 각 파일의 칼럼은 아래와 같습니다.
Name: 게임의 이름
Platform: 게임이 동작하는 콘솔
Year_of_Release: 발매 연도
Genre: 게임의 장르
Publisher: 게임의 유통사
NA_Sales: 북미 판매량 (Millions)
EU_Sales: 유럽 연합 판매량 (Millions)
JP_Sales: 일본 판매량 (Millions)
Other_Sales: 기타 판매량 (아프리카, 일본 제외 아시아, 호주, EU 제외 유럽, 남미) (Millions)
Global_Sales: 전국 판매량
Critic_Score: Metacritic 스태프 점수
Critic_Count: Critic_Score에 사용된 점수의 수
User_Score: Metacritic 구독자의 점수
User_Count: User_Score에 사용된 점수의 수
Developer: 게임의 개발사
Rating: ESRB 등급 (19+, 17+, 등등)

- 데이터 출처: https://www.kaggle.com/rush4ratio/video-game-sales-with-ratings

Video game sales from Vgchartz and corresponding ratings from Metacritic

www.kaggle.com

Step 1. 데이터셋 준비하기

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

import os

os.environ['KAGGLE_USERNAME'] = 'jhighllight'
os.environ['KAGGLE_KEY'] = 'xxxxxxxxxxxxxxxxxxxxxx'

!rm *.*
!kaggle datasets download -d rush4ratio/video-game-sales-with-ratings
!unzip '*.zip'

rm: cannot remove '*.*': No such file or directory

Downloading video-game-sales-with-ratings.zip to /content

0% 0.00/476k [00:00 <?,? B/s]

100% 476k/476k [00:00 <00:00, 68.8MB/s]

Archive: video-game-sales-with-ratings.zip

inflating: Video_Games_Sales_as_at_22_Dec_2016.csv

df = pd.read_csv('Video_Games_Sales_as_at_22_Dec_2016.csv')

df

Step 2. EDA 및 데이터 기초 통계 분석

df.head()

df.isna().sum()

Name 2

Platform 0

Year_of_Release 269

Genre 2

Publisher 54

NA_Sales 0

EU_Sales 0

JP_Sales 0

Other_Sales 0

Global_Sales 0

Critic_Score 8582

Critic_Count 8582

User_Score 6704

User_Count 9129

Developer 6623

Rating 6769

dtype: int64

df.dropna(inplace=True)

df

sns.histplot(x='Year_of_Release', data=df, bins=16)

sns.histplot(x='Global_Sales', data=df)

sns.rugplot(x='Global_Sales', data=df)

df[df['Global_Sales'] > 30]

gs = df['Global_Sales'].quantile(0.99)

df = df[df['Global_Sales'] < gs]

sns.histplot(x='Global_Sales', data=df)

fig = plt.figure(figsize=(20, 5))
sns.histplot(x='Global_Sales', hue='Genre', kde=True, data=df)

sns.histplot(x='Critic_Score', data=df, bins=16)

sns.histplot(x='Critic_Count', data=df, bins=16)

sns.histplot(data=df['User_Score'].apply(float), bins=16)

sns.histplot(x='User_Count', data=df, bins=16)

sns.rugplot(x='User_Count', data=df)

uc = df['User_Count'].quantile(0.99)

df = df[df['User_Count'] < uc]

sns.histplot(x='User_Count', data=df)

uc = df['User_Count'].quantile(0.97)
print(uc)

911.5599999999977

df = df[df['User_Count'] < uc]
sns.histplot(x='User_Count', data=df)

df['User_Score'] = df['User_Score'].apply(float)

sns.jointplot(x='Year_of_Release', y='Global_Sales', data=df, kind='hist')

sns.jointplot(x='Critic_Score', y='Global_Sales', data=df, kind='hist')

sns.jointplot(x='User_Score', y='Global_Sales', data=df, kind='hist')

sns.jointplot(x='Critic_Count', y='Global_Sales', data=df, kind='hist')

sns.jointplot(x='User_Count', y='Global_Sales', data=df, kind='hist')

fig = plt.figure(figsize=(15, 5))
sns.boxplot(x='Platform', y='Global_Sales', data=df)

fig = plt.figure(figsize=(15, 5))
sns.boxplot(x='Genre', y='Global_Sales', data=df)

sns.boxplot(y='Critic_Score', data=df)

sns.boxplot(y='User_Score', data=df)

critic_score = df[['Critic_Score']].copy()
critic_score.rename({'Critic_Score': 'Score'}, axis=1, inplace=True)
critic_score['ScoreBy'] = 'Critics'

user_score = df[['User_Score']].copy() *10
user_score.rename({'User_Score': 'Score'}, axis=1, inplace=True)
user_score['ScoreBy'] = 'Users'

scores = pd.concat([critic_score, user_score])
scores

sns.boxplot(x='ScoreBy', y='Score', data=scores)

sns.boxplot(x='Genre', y='Critic_Score', data=df)
plt.xticks(rotation=90)
plt.show()

sns.boxplot(x='Genre', y='User_Score', data=df)
plt.xticks(rotation=90)
plt.show()

fig = plt.figure(figsize=(8, 8))
sns.heatmap(df.corr(), annot=True, cmap='YlOrRd')

저작자표시

'Python > Kaggle' 카테고리의 다른 글

COVID-19 data from John Hopkins University (0)	2023.05.21
[패캠] 우리나라의 행복지수는 몇 위? 아니, 행복지수가 도대체 뭔데? (0)	2023.05.17
Part2. Chapter 2 - 뉴욕에서 방이 둘 딸린 집을 에어비엔비에 내놓으려 한다, 이 때 적당한 숙바.. (0)	2023.03.27
Part2. Chapter 1 - 자동으로 모은 데이터는 분석하기 어렵다면서? 자동으로 모은 중고 자동차 데ᄋ.. (0)	2023.03.26
Part1. Chapter 05 - 미국의 대통령은 어떻게 뽑힐까 (0)	2023.03.22