Crawling MP3 Files from a Web Page with Python

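Hello, this is 제로윈코딩 (ZeroWinCoding).

I've been studying English hard lately, mostly listening and speaking, and I thought it would be nice to listen to the lessons on CD in the car.

So I wanted to save the MP3 files from the web page below. While deciding what to write the program in, I heard that Python is well suited to reading and crawling web pages, so I'm giving it a try.

I considered doing this on a PC, but Python programming is easy to set up on a Raspberry Pi, so I went with the Raspberry Pi.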

Crawling / scraping

: reading a web page and extracting data from it

I'm not very familiar with Python yet, so I'm googling as I go. ^^

The working environment is Python on a Raspberry Pi.

This is the English listening web page. It's really great!

This is the source code of the web page above.

This is the part that has to be parsed with Python.
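As a rough illustration of that parsing step, here is a minimal sketch that pulls the arguments out of a `popAOD(...)` call with a regular expression. The sample markup is the replay button from the page source; the names `POP_AOD` and `extract_pop_aod` are made up for this example, and the sketch assumes the quotes in `onclick` are plain single quotes rather than HTML entities.

```python
import re

# Matches calls such as popAOD('/2015/05/', '201505010732492589.mp3', '0333').
# The three groups are the date path, the file name, and the program code.
POP_AOD = re.compile(r"popAOD\('([^']*)',\s*'([^']*)',\s*'([^']*)'\)")


def extract_pop_aod(html):
    """Return a (date_path, file_name, program_code) tuple for every popAOD call."""
    return POP_AOD.findall(html)


# Replay button as it appears in the page source.
sample = (
    '<button class="replay_btn" '
    "onclick=\"popAOD('/2015/05/', '201505010732492589.mp3', '0333')\" "
    'type="button"></button>'
)

for date_path, file_name, program_code in extract_pop_aod(sample):
    print(date_path, file_name, program_code)
```

Compared with slicing at fixed character offsets, a pattern like this should keep working if the date path or file name changes length, although it still depends on the page continuing to call `popAOD` in this form.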

These are the extracted MP3 files.
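Each saved file is named after the last part of its download URL. The sketch below shows one way to build that URL and derive the local file name with the standard library. The base URL is assumed from the example MP3 address on this site, and `build_mp3_url` and `local_name` are hypothetical helper names.

```python
import posixpath
from urllib.parse import urlsplit

# Assumed base, taken from an example URL such as
# http://aod.ytnradio.kr/ytnradio/aod/2015/05/201505190738361480.mp3
MP3_BASE_URL = "http://aod.ytnradio.kr/ytnradio/aod"


def build_mp3_url(date_path, file_name):
    """Join the base URL, a date path such as '/2015/05/', and a file name."""
    return MP3_BASE_URL + "/" + date_path.strip("/") + "/" + file_name


def local_name(mp3_url):
    """Return the last path component of the URL, ignoring any query string."""
    return posixpath.basename(urlsplit(mp3_url).path)


url = build_mp3_url("/2015/05/", "201505190738361480.mp3")
print(url)
print(local_name(url))
```

Stripping the slashes from the date path before joining avoids a doubled `//` in the URL, and taking the basename avoids relying on a hard-coded character offset.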

This is the completion screen after running the program (on the Raspberry Pi).

Source code

#!/usr/bin/env python3
# Requires the third-party packages requests and beautifulsoup4.

import os

import requests
from bs4 import BeautifulSoup

# Example of a full MP3 URL:
# http://aod.ytnradio.kr/ytnradio/aod/2015/05/201505190738361480.mp3
MP3_BASE_URL = "http://aod.ytnradio.kr/ytnradio/aod"


def get_html(url):
    """Return the page body, or an empty string if the status is not 200."""
    _html = ""
    resp = requests.get(url)
    if resp.status_code == 200:
        _html = resp.text
    return _html


def parse_html(html):
    """Collect the MP3 URLs referenced by the replay buttons on one page."""
    mp3_list = list()
    soup = BeautifulSoup(html, 'html.parser')
    # Selector: #ct > section > div > div.list_area2 > div.list_area > ul > li:nth-child(1) > button
    mp3_area = soup.find_all("button", {"type": "button"})
    # Each replay button looks like this:
    # <button class="replay_btn" onclick="popAOD('/2015/05/', '201505010732492589.mp3', '0333')" type="button"></button>
    for mp3_line in mp3_area:
        temp = str(mp3_line)
        index = temp.find("popAOD")
        if index != -1:
            print('line=', temp[index:])
            # Skip "popAOD('" (8 characters) to reach the 9-character date path.
            start = index + 8
            date = temp[start:start + 9]
            print('date=', date)
            # Skip the date path and "', '" (9 + 4 characters) to reach the
            # 22-character file name.
            start += 9 + 4
            file_name = temp[start:start + 22]
            print('file=', file_name)
            # The date path already starts with "/", so the base URL has no
            # trailing slash.
            mp3_url = MP3_BASE_URL + date + file_name
            print("MP3=", mp3_url)
            mp3_list.append(mp3_url)
    return mp3_list


if __name__ == "__main__":
    # Walk through list pages 1 to 32 of program 0333.
    for x in range(1, 33):
        URL = 'http://m.ytnradio.kr/program_view.php?page=' + str(x) + '&s_mcd=0333'
        html = get_html(URL)
        res_parse = parse_html(html)
        for mp3_url in res_parse:
            print(mp3_url)
            # The local file name is the last path component of the URL.
            mp3fileName = mp3_url.rsplit("/", 1)[-1]
            if os.path.isfile(mp3fileName):
                # Already downloaded on an earlier run.
                print("exist=" + mp3fileName)
            else:
                mp3file = requests.get(mp3_url)
                with open(mp3fileName, 'wb') as output:
                    output.write(mp3file.content)
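The download step in the script reads each MP3 fully into memory before writing it out. On a Raspberry Pi it may be gentler to stream the file to disk in chunks. The following is only a sketch of that idea using the standard library; `download_if_missing`, the `.part` suffix, and the 30-second timeout are choices made for this example, not part of the original script.

```python
import os
import shutil
import urllib.request


def download_if_missing(url, dest_path, timeout=30):
    """Stream url to dest_path unless the file already exists.

    Returns True if a download happened and False if it was skipped.
    """
    if os.path.isfile(dest_path):
        return False
    # Write to a temporary name first so an interrupted download is not
    # mistaken for a finished file on the next run.
    part_path = dest_path + ".part"
    with urllib.request.urlopen(url, timeout=timeout) as resp, \
            open(part_path, "wb") as out:
        shutil.copyfileobj(resp, out)  # copies in fixed-size chunks
    os.replace(part_path, dest_path)
    return True
```

A call such as `download_if_missing(mp3_url, mp3fileName)` could stand in for the `os.path.isfile` check and the write in the main loop, with the return value deciding whether to print the "exist=" message.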
