* Data source: https://www.kaggle.com/shivamb/netflix-shows
import pandas as pd
import os
# Set the Kaggle API username and key through os.environ
os.environ['KAGGLE_USERNAME'] = 'jhighllight'
os.environ['KAGGLE_KEY'] = 'xxxxxxxxxxxxxxxxxxxxxxxxxxx'
# Download the dataset through the Kaggle API with a Linux command (!kaggle ~)
# Extract the archive with a Linux command
!kaggle datasets download -d shivamb/netflix-shows
!unzip '*.zip'
netflix-shows.zip: Skipping, found more recently modified local copy (use --force to force download)
Archive: netflix-shows.zip
replace netflix_titles.csv? [y]es, [n]o, [A]ll, [N]one, [r]ename: y
inflating: netflix_titles.csv
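The `replace netflix_titles.csv?` prompt above appears because the CSV had already been extracted on an earlier run. As a minimal sketch, the same extraction can be done from Python with the standard-library `zipfile` module, which overwrites existing files without prompting. The helper name `extract_archives` and its default directory are assumptions made for illustration; they are not part of the original notebook.

```python
import zipfile
from pathlib import Path


def extract_archives(directory="."):
    """Extract every .zip file found in `directory` into that directory.

    Hypothetical helper: existing files are overwritten without a prompt.
    Returns the sorted names of all extracted archive members.
    """
    extracted = []
    for archive in sorted(Path(directory).glob("*.zip")):
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(directory)
            extracted.extend(zf.namelist())
    return sorted(extracted)
```

In a Colab session this could be called as `extract_archives('/content')` after the download step.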
df_net = pd.read_csv('/content/netflix_titles.csv')
df_net.head()
df_net.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 8807 entries, 0 to 8806
Data columns (total 12 columns):
 #   Column        Non-Null Count  Dtype
---  ------        --------------  -----
 0   show_id       8807 non-null   object
 1   type          8807 non-null   object
 2   title         8807 non-null   object
 3   director      6173 non-null   object
 4   cast          7982 non-null   object
 5   country       7976 non-null   object
 6   date_added    8797 non-null   object
 7   release_year  8807 non-null   int64
 8   rating        8803 non-null   object
 9   duration      8804 non-null   object
 10  listed_in     8807 non-null   object
 11  description   8807 non-null   object
dtypes: int64(1), object(11)
memory usage: 825.8+ KB
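The non-null counts above imply missing values in several columns (for example, `director` has 6173 non-null values out of 8807 rows, so 2634 are missing). A minimal sketch of how the missing counts could be read off directly is shown below. The `sample` frame is a toy stand-in with made-up values, not the real dataset; on the real data the same expression would be applied to `df_net`.

```python
import pandas as pd

# Toy stand-in for df_net; the values are illustrative only.
sample = pd.DataFrame({
    "title": ["A", "B", "C", "D"],
    "director": ["X", None, None, "Y"],
    "country": ["India", None, "United States", "India"],
})

# Count missing values per column, largest first.
missing = sample.isna().sum().sort_values(ascending=False)
print(missing)
```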
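# Note: without parentheses this displays the bound method (its repr is the DataFrame below); use df_net.describe() for summary statistics.
df_net.describe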
<bound method NDFrame.describe of show_id type title director \
0 s1 Movie Dick Johnson Is Dead Kirsten Johnson
1 s2 TV Show Blood & Water NaN
2 s3 TV Show Ganglands Julien Leclercq
3 s4 TV Show Jailbirds New Orleans NaN
4 s5 TV Show Kota Factory NaN
... ... ... ... ...
8802 s8803 Movie Zodiac David Fincher
8803 s8804 TV Show Zombie Dumb NaN
8804 s8805 Movie Zombieland Ruben Fleischer
8805 s8806 Movie Zoom Peter Hewitt
8806 s8807 Movie Zubaan Mozez Singh
cast country \
0 NaN United States
1 Ama Qamata, Khosi Ngema, Gail Mabalane, Thaban... South Africa
2 Sami Bouajila, Tracy Gotoas, Samuel Jouy, Nabi... NaN
3 NaN NaN
4 Mayur More, Jitendra Kumar, Ranjan Raj, Alam K... India
... ... ...
8802 Mark Ruffalo, Jake Gyllenhaal, Robert Downey J... United States
8803 NaN NaN
8804 Jesse Eisenberg, Woody Harrelson, Emma Stone, ... United States
8805 Tim Allen, Courteney Cox, Chevy Chase, Kate Ma... United States
8806 Vicky Kaushal, Sarah-Jane Dias, Raaghav Chanan... India
date_added release_year rating duration \
0 September 25, 2021 2020 PG-13 90 min
1 September 24, 2021 2021 TV-MA 2 Seasons
2 September 24, 2021 2021 TV-MA 1 Season
3 September 24, 2021 2021 TV-MA 1 Season
4 September 24, 2021 2021 TV-MA 2 Seasons
... ... ... ... ...
8802 November 20, 2019 2007 R 158 min
8803 July 1, 2019 2018 TV-Y7 2 Seasons
8804 November 1, 2019 2009 R 88 min
8805 January 11, 2020 2006 PG 88 min
8806 March 2, 2019 2015 TV-14 111 min
listed_in \
0 Documentaries
1 International TV Shows, TV Dramas, TV Mysteries
2 Crime TV Shows, International TV Shows, TV Act...
3 Docuseries, Reality TV
4 International TV Shows, Romantic TV Shows, TV ...
... ...
8802 Cult Movies, Dramas, Thrillers
8803 Kids' TV, Korean TV Shows, TV Comedies
8804 Comedies, Horror Movies
8805 Children & Family Movies, Comedies
8806 Dramas, International Movies, Music & Musicals
description
0 As her father nears the end of his life, filmm...
1 After crossing paths at a party, a Cape Town t...
2 To protect his family from a powerful drug lor...
3 Feuds, flirtations and toilet talk go down amo...
4 In a city of coaching centers known to train I...
... ...
8802 A political cartoonist, a crime reporter and a...
8803 While living alone in a spooky town, a young g...
8804 Looking to survive in a world taken over by zo...
8805 Dragged from civilian life, a former superhero...
8806 A scrappy but poor boy worms his way into a ty...
[8807 rows x 12 columns]>
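The output above is the repr of a bound method, because `describe` was referenced without parentheses, so no statistics were computed. The sketch below contrasts the two forms on a toy frame (the `sample` data is made up for illustration). With parentheses, `describe()` summarises numeric columns only, which for this dataset would be `release_year`; passing `include="all"` also summarises the object columns.

```python
import pandas as pd

# Toy stand-in for df_net; the values are illustrative only.
sample = pd.DataFrame({
    "title": ["A", "B", "C"],
    "release_year": [2019, 2020, 2021],
})

# Without parentheses: just a method object, nothing is computed.
print(callable(sample.describe))

# With parentheses: summary statistics of the numeric columns.
stats = sample.describe()
print(stats)

# include="all" also summarises object columns (count, unique, top, freq).
print(sample.describe(include="all"))
```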
df_net['country']
0 United States
1 South Africa
2 NaN
3 NaN
4 India
...
8802 United States
8803 NaN
8804 United States
8805 United States
8806 India
Name: country, Length: 8807, dtype: object
How many Korean titles are there in total?
Use the country column as the basis.
Count only rows whose value is exactly "South Korea". ("US, South Korea" does not count.)
df_net['country'].value_counts()
199 titles
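Rather than reading the number off the full `value_counts()` listing, the exact-match rule can be expressed directly. The following is a sketch on a toy series (the values are made up for illustration); on the real data the same comparison would be applied to `df_net['country']`.

```python
import pandas as pd

# Toy stand-in for df_net['country']; the values are illustrative only.
countries = pd.Series(
    ["South Korea", "United States, South Korea", None, "South Korea", "India"],
    name="country",
)

# Exact match: co-productions and missing values are not counted.
exact = (countries == "South Korea").sum()

# Substring match, shown only for contrast: this also counts co-productions.
loose = countries.str.contains("South Korea", na=False).sum()

print(exact, loose)
```

The exact comparison is what the rule above asks for; the substring version is included only to show how the count would change if co-productions were admitted.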
Which country has the most titles, and how many does it have?
Use the country column as the basis.
Compute the result for single-country entries only.
df_net_by_country = df_net.groupby('country')
df_net_by_country.count().sort_values(by='show_id', ascending=False).head(1)
United States, 2818 titles
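The `groupby` above treats every distinct string, including multi-country values, as its own group. A sketch that applies the single-country rule explicitly is shown below. It assumes that multiple countries are always separated by commas, as in "US, South Korea", and it uses a toy series with made-up values rather than the real data.

```python
import pandas as pd

# Toy stand-in for df_net['country']; the values are illustrative only.
countries = pd.Series(
    ["United States", "India", "United States, India", "United States", None],
    name="country",
)

# Keep only single-country rows (assumption: co-productions contain a comma).
single = countries.dropna()
single = single[~single.str.contains(",")]

# Tally titles per country and pick the largest.
counts = single.value_counts()
top_country, top_count = counts.idxmax(), counts.max()
print(top_country, top_count)
```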