Web Scraping (Crawling) with Python

Python 2021. 12. 17. 15:47

Preparing the packages

import requests  # the requests library must be installed first (pip install requests)

r = requests.get('url')

rjson = r.json()

print(rjson)
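To make it clearer what r.json() hands back, here is a small offline sketch. The body string below is a made-up sample (not a real API response), and json.loads stands in for roughly what r.json() does with the response text: it turns JSON into ordinary Python dicts and lists.

```python
import json

# Hypothetical response body, standing in for the text of a real response.
body = '{"items": [{"name": "alpha", "score": 3}, {"name": "beta", "score": 5}]}'

# Parsing gives plain Python objects: a dict holding a list of dicts.
rjson = json.loads(body)

# Once parsed, values are read with normal indexing and loops.
names = [item['name'] for item in rjson['items']]
total_score = sum(item['score'] for item in rjson['items'])

print(names)
print(total_score)
```

If the body is not valid JSON, parsing raises an error, so it is worth checking the response before calling r.json() on a page that might return HTML.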

Basic crawling setup

import requests

from bs4 import BeautifulSoup  # makes it easy to find the data you want in the page being crawled

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36'}

# the URL is the site to crawl
data = requests.get('https://movie.naver.com/movie/sdb/rank/rmovie.naver', headers=headers)

soup = BeautifulSoup(data.text, 'html.parser')

# start coding here
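Since the same setup gets repeated for every site, it can be wrapped in a small helper. This is only a sketch: the name fetch_soup, the HEADERS constant, and the 10-second timeout are my own choices for illustration, not something the setup requires. It also calls raise_for_status() so that an error page is not parsed by mistake.

```python
import requests
from bs4 import BeautifulSoup

# Same browser-like User-Agent as in the basic setup.
HEADERS = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) '
                  'AppleWebKit/537.36 (KHTML, like Gecko) '
                  'Chrome/73.0.3683.86 Safari/537.36'
}


def fetch_soup(url, timeout=10):
    """Download a page and return it as a BeautifulSoup object (hypothetical helper)."""
    response = requests.get(url, headers=HEADERS, timeout=timeout)
    # Raise an exception on 4xx/5xx responses instead of parsing an error page.
    response.raise_for_status()
    return BeautifulSoup(response.text, 'html.parser')
```

Usage would look like soup = fetch_soup('https://movie.naver.com/movie/sdb/rank/rmovie.naver'), after which the selectors work the same way as before.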

How to use BeautifulSoup

In the browser, right-click the title you want and click Inspect, then right-click the highlighted HTML code and choose Copy > Copy selector.

Selecting one element

ex) title = soup.select_one('#old_content > table > tbody > tr:nth-child(2) > td.title > div > a')

print(title.text)
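One thing to watch out for: select_one returns None when nothing matches, and calling .text on None raises an error. The sketch below runs offline on a made-up HTML snippet (the movie names and the tr:nth-child(9) selector are only for illustration) to show both the matching and the non-matching case.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML standing in for a downloaded page.
html = """
<div id="old_content">
  <table>
    <tbody>
      <tr><td class="title"><div><a href="/movie/1">First Movie</a></div></td></tr>
      <tr><td class="title"><div><a href="/movie/2">Second Movie</a></div></td></tr>
    </tbody>
  </table>
</div>
"""
soup = BeautifulSoup(html, 'html.parser')

# This selector matches the first row.
first = soup.select_one('#old_content > table > tbody > tr:nth-child(1) > td.title > div > a')
# There is no ninth row, so this returns None.
missing = soup.select_one('#old_content > table > tbody > tr:nth-child(9) > td.title > div > a')

# Check for None before reading .text.
first_title = first.text if first is not None else None
missing_title = missing.text if missing is not None else None

print(first_title)
print(missing_title)
```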

Selecting multiple elements

ex) movie = soup.select('#old_content > table > tbody > tr')

# use only the part of the selector that all the elements have in common

print(movie)
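select returns a list of tags, so the usual next step is to loop over the rows and call select_one on each row with the rest of the selector. Here is a minimal offline sketch; the table id, class names, and values are invented for the example, and the empty middle row imitates the divider rows that real ranking tables often contain.

```python
from bs4 import BeautifulSoup

# Hypothetical table; the middle row has no title or score, like a divider row.
html = """
<table id="ranking">
  <tr><td class="title"><a>First Movie</a></td><td class="point">9.1</td></tr>
  <tr><td colspan="2"></td></tr>
  <tr><td class="title"><a>Second Movie</a></td><td class="point">8.7</td></tr>
</table>
"""
soup = BeautifulSoup(html, 'html.parser')

# The shared part of the selector picks every row.
rows = soup.select('#ranking > tr')

movies = []
for row in rows:
    # The remaining part of the selector is applied inside each row.
    link = row.select_one('td.title > a')
    point = row.select_one('td.point')
    if link is None or point is None:
        continue  # skip rows that do not hold a movie
    movies.append((link.text.strip(), point.text.strip()))

print(movies)
```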

Crawling the EPL standings

import requests
from bs4 import BeautifulSoup

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36'}
data = requests.get('https://sports.news.naver.com/wfootball/index.nhn',headers=headers)
soup = BeautifulSoup(data.text, 'html.parser')

rows = soup.select('#_team_rank_epl > table > tbody > tr')

for row in rows:
    a = row.select_one('div > div.info > span')
    b = row.select_one('th > span > em > span')
    c = row.select_one('td:nth-child(7) > span')
    if a is None or b is None or c is None:
        continue  # skip rows that do not contain all three elements
    title = a.text
    rank = b.text
    win = c.text
    print(rank, title, win + ' pts')
