[Week 2 - Day 4] Summarizing Results with Visualization - Seaborn

2023. 3. 24. 11:16 | BOOTCAMP/Programmers AI Dev Course

Seaborn, a Visualization Library

Seaborn

A visualization library built on top of matplotlib.

import seaborn as sns

tips = sns.load_dataset("tips")
sns.relplot(
    data=tips,
    x="total_bill", y="tip", col="time",
    hue="smoker", style="smoker", size="size",
)

It lets you draw many kinds of plots easily through a high-level interface.

 

Visualizing Scraping Results 1 - Web Scraping Basics

%pip install selenium
%pip install webdriver_manager

 

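from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from webdriver_manager.chrome import ChromeDriverManager

# Launch Chrome with a driver that webdriver_manager downloads automatically.
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))

driver.get("https://www.weather.go.kr/w/weather/forecast/short-term.do")
# Wait up to 1 second for elements to appear when they are looked up.
driver.implicitly_wait(1)

# The element with id "my-tchart" holds the forecast temperatures, one per line.
temps = driver.find_element(By.ID, "my-tchart").text
driver.quit()  # close the browser once the text has been read

# Strip the "℃" unit and convert each line to an integer.
temps = [int(i) for i in temps.replace("℃", "").split("\n")]

print(temps)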

 

[12, 14, 14, 15, 16, 16, 15, 14, 13, 12, 11, 11, 10, 9, 9, 8, 8]
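The cleanup step above assumes every line of the chart text is a number followed by ℃. A minimal sketch of a slightly more defensive parser follows; `parse_temperatures` is a hypothetical helper name (not part of Selenium), and the sample string is made up for illustration.

```python
import re

def parse_temperatures(chart_text):
    """Extract integer temperatures from text with one reading per line."""
    # Match an optional minus sign and digits directly before the ℃ sign,
    # so blank lines or stray labels in the text are simply ignored.
    return [int(m) for m in re.findall(r"(-?\d+)\s*℃", chart_text)]

# Hypothetical sample text, shaped like the element text scraped above.
sample = "12℃\n14℃\n\n-3℃\n"
print(parse_temperatures(sample))  # [12, 14, -3]
```

Matching on the unit sign rather than splitting on newlines means an empty line no longer raises a ValueError in `int()`.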

 

# Draw a line plot from the scraped data.
# x = elapsed time steps (0 to len(temps) - 1)
# y = temperatures

import seaborn as sns

sns.lineplot(
    x=list(range(len(temps))),
    y=temps
)
 
<AxesSubplot:>

 

# Widen the y-axis range slightly beyond the data.

import matplotlib.pyplot as plt

plt.ylim(min(temps) - 2, max(temps) + 2)
plt.title("Expected Temperature from now on")

sns.lineplot(
    x=list(range(len(temps))),
    y=temps
)

plt.show()
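The y-axis padding used above can be factored into a small function so the margin is easy to adjust. This is only a sketch; `padded_limits` and its `pad` parameter are hypothetical names introduced here, and the list is the scraped output shown earlier.

```python
def padded_limits(values, pad=2):
    """Return (low, high) axis limits that leave `pad` units around the data."""
    return min(values) - pad, max(values) + pad

temps = [12, 14, 14, 15, 16, 16, 15, 14, 13, 12, 11, 11, 10, 9, 9, 8, 8]
low, high = padded_limits(temps)
print(low, high)  # 6 18
```

The result can be passed straight to matplotlib, for example `plt.ylim(low, high)`.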

 

Visualizing Scraping Results 2 - Visualizing Programmers Question Tag Frequency

  • Use bs4 and Seaborn to visualize how often each question topic appears.

Target: Programmers question tags and their frequencies

On https://qna.programmers.co.kr/, each question lists its 'tags' below the question title. We scrape these tags and then visualize them.
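The extraction logic (find each tag list, then read its items) can be tried offline before sending any requests. The sketch below uses the standard library's `html.parser` instead of bs4 so that it runs without extra installs. `SAMPLE_HTML` is hypothetical markup that only mimics a `ul` with class `question-tags`, and `TagExtractor` and `extract_tags` are names made up for this example. Nested lists are not handled.

```python
from html.parser import HTMLParser

# Hypothetical markup: one <ul class="question-tags"> with one <li> per tag.
SAMPLE_HTML = """
<div class="question">
  <h4>How do I merge two DataFrames?</h4>
  <ul class="question-tags">
    <li>pandas</li>
    <li>dataframe</li>
  </ul>
</div>
<ul class="menu"><li>Home</li></ul>
"""

class TagExtractor(HTMLParser):
    """Collect the text of <li> elements inside <ul class="question-tags">."""

    def __init__(self):
        super().__init__()
        self.in_tag_list = False  # currently inside a matching <ul>?
        self.in_item = False      # currently inside an <li> of that <ul>?
        self.tags = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "ul" and "question-tags" in classes:
            self.in_tag_list = True
        elif tag == "li" and self.in_tag_list:
            self.in_item = True
            self.tags.append("")

    def handle_endtag(self, tag):
        if tag == "ul":
            self.in_tag_list = False
        elif tag == "li":
            self.in_item = False

    def handle_data(self, data):
        if self.in_item:
            self.tags[-1] += data

def extract_tags(html):
    parser = TagExtractor()
    parser.feed(html)
    return [t.strip() for t in parser.tags]

print(extract_tags(SAMPLE_HTML))  # ['pandas', 'dataframe']
```

The `Home` item is skipped because its parent list does not carry the `question-tags` class.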

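# Add the following User-Agent header.

user_agent = {"User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/111.0.0.0 Safari/537.36"}
# Import the required libraries and send the requests.
# Build a dict that tracks how often each tag appears.

import time

import requests
from bs4 import BeautifulSoup

frequency = {}

for i in range(1, 11):
    # Assumption: listing pages are selected with a `page` query parameter.
    # Without a {} placeholder, .format(i) leaves the URL unchanged and
    # every request returns the same page.
    # Headers go in by keyword; the second positional argument is `params`.
    res = requests.get("https://qna.programmers.co.kr/?page={}".format(i), headers=user_agent)
    soup = BeautifulSoup(res.text, "html.parser")

    # 1. Find every ul tag with the "question-tags" class.
    # 2. Extract the text of each li tag inside it.
    ul_tags = soup.find_all("ul", "question-tags")
    for ul in ul_tags:
        li_tags = ul.find_all("li")
        for li in li_tags:
            tag = li.text.strip()
            if tag not in frequency:
                frequency[tag] = 1
            else:
                frequency[tag] += 1
    time.sleep(0.5)  # pause between requests to avoid overloading the server

print(frequency)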

 

{'pandas': 20, 'dataframe': 20, 'ai': 10, 'gui': 10, 'tkinter': 10, 'software_development': 10, 'java': 60, 'javac': 10, 'python': 190, 'json': 10, 'return': 10, 'asp.net': 10, 'c': 40, 'ubuntu': 10, 'vmware': 10, 'multithreading': 10, 'algorithm': 20, 'coding-test': 20, 'bfs': 10, 'react': 10, 'javascript': 50, 'arduino': 10, 'node.js': 30, 'regex': 20, 'multiprocessing': 10, 'pygame': 10, 'html': 20, 'css': 20, 'application-development': 10, 'logistic-regression': 10, 'logistic': 10, 'error': 10, 'csv': 10, 'class': 10, 'instance': 10, 'hashmap': 10, 'object': 10, 'beautifulsoup': 10, 'windows': 10, 'for': 10, 'selenium-webdrive': 10, 'c++': 10}
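Every count in the output above is a multiple of 10, which is what ten requests for the same page would produce. A URL string without a `{}` placeholder is returned unchanged by `.format(i)`. The sketch below shows one way to build a distinct URL per page. It assumes the site paginates through a `page` query parameter, which is not confirmed here, and `build_page_url` is a hypothetical helper name.

```python
from urllib.parse import urlencode

BASE_URL = "https://qna.programmers.co.kr/"

def build_page_url(page):
    """Return the listing URL for a page number (assumes a `page` query parameter)."""
    return BASE_URL + "?" + urlencode({"page": page})

# A string without a {} placeholder is returned unchanged by .format().
print(BASE_URL.format(3))   # https://qna.programmers.co.kr/
print(build_page_url(3))    # https://qna.programmers.co.kr/?page=3
```

When the URL is passed to `requests.get`, the header dict should be given by keyword, as in `requests.get(url, headers=user_agent)`, because the second positional argument of `requests.get` is `params`.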

 

# Use Counter to extract the most frequent tags.

from collections import Counter
counter = Counter(frequency)
counter.most_common(10)
[('python', 190),
 ('java', 60),
 ('javascript', 50),
 ('c', 40),
 ('node.js', 30),
 ('pandas', 20),
 ('dataframe', 20),
 ('algorithm', 20),
 ('coding-test', 20),
 ('regex', 20)]
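`Counter` can also do the counting itself, which replaces the manual `if tag not in frequency` bookkeeping. A minimal sketch with a made-up tag list (the values are hypothetical, not scraped data):

```python
from collections import Counter

# Hypothetical tag list, standing in for the text of the scraped <li> elements.
tags = ["python", "pandas", "python", "java", "python", "java"]

counter = Counter(tags)        # counts each tag in one pass
top = counter.most_common(2)   # [(tag, count), ...] sorted by count, descending
labels, counts = zip(*top)     # split the pairs into x and y sequences

print(top)     # [('python', 3), ('java', 2)]
print(labels)  # ('python', 'java')
print(counts)  # (3, 2)
```

Unpacking with `zip(*top)` yields the two sequences a bar plot needs from a single `most_common` call.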

 

# Draw the result as a bar plot with Seaborn.
x = [elem[0] for elem in counter.most_common(10)]
y = [elem[1] for elem in counter.most_common(10)]

sns.barplot(x=x, y=y)

 

<AxesSubplot:>

The x-axis labels overlap and cannot be read, so we enlarge the figure with figsize to fix this.

# Set figsize, xlabel, ylabel, and title appropriately to finish the plot.

import matplotlib.pyplot as plt

plt.figure(figsize=(20, 10))
plt.title("Frequency of question in programmers")
plt.xlabel("Tag")
plt.ylabel("Frequency")

sns.barplot(x=x, y=y)
<AxesSubplot:title={'center':'Frequency of question in programmers'}, xlabel='Tag', ylabel='Frequency'>