
Error when scraping Xiaozhu short-term rentals (小豬短租) with Python3

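I'm new to Python. Following the blog post at https://blog.csdn.net/mtbaby/... I tried to scrape listings from Xiaozhu short-term rentals, but my IP got banned.
So I read up on using a proxy IP, but I still can't get the listing data.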

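import time

import requests
from lxml import etree

# requests picks a proxy by URL scheme; the target below is http://, so the 'http' key is the one used
proxies = {
    'http': 'http://61.135.217.7:80',
}
user_agent = 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3325.162 Safari/537.36'
url = 'http://hz.xiaozhu.com/'
headers = {'User-Agent': user_agent}
# A timeout keeps the script from hanging on a dead proxy; raise_for_status surfaces HTTP errors
resp = requests.get(url, headers=headers, proxies=proxies, timeout=10)
resp.raise_for_status()
h = etree.HTML(resp.text)
home = h.xpath('//*[@id="page_list"]/ul/li')  # one <li> per listing
time.sleep(2)
for div in home:
    # './' is relative to the element .xpath() is called on, so query div (the <li>), not h
    title = div.xpath('./div[2]/div/a/span/text()')[0]  # title
    price = div.xpath('./div[2]/span[1]/i/text()')[0]  # price
    print("{}-->{}".format(title, price))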
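The loop in the question evaluates the relative XPath `./div[2]/...` on `h`, the document root (`<html>`), rather than on each `<li>`, so it matches nothing and `[0]` raises `IndexError`; the format string `"{}-->{}}"` also has a stray `}` that raises `ValueError`. The offline sketch below shows the difference using a small hand-written HTML fragment shaped to fit the question's XPath. The fragment and the helper `parse_listings` are hypothetical; Xiaozhu's real markup may differ or have changed.

```python
from lxml import etree

# Made-up markup shaped like the structure the question's XPath expects
SAMPLE = """
<html><body>
  <div id="page_list"><ul>
    <li><div>img</div><div><div><a><span>Cozy flat</span></a></div><span><i>198</i></span></div></li>
    <li><div>img</div><div><div><a><span>Lake view room</span></a></div><span><i>268</i></span></div></li>
  </ul></div>
</body></html>
"""


def parse_listings(html: str) -> list[tuple[str, str]]:
    """Return (title, price) pairs, querying each <li> with relative paths."""
    root = etree.HTML(html)
    results = []
    for li in root.xpath('//*[@id="page_list"]/ul/li'):
        titles = li.xpath('./div[2]/div/a/span/text()')
        prices = li.xpath('./div[2]/span[1]/i/text()')
        if titles and prices:  # skip items whose markup doesn't match instead of crashing
            results.append((titles[0].strip(), prices[0].strip()))
    return results


root = etree.HTML(SAMPLE)
# Evaluated on the root, './div[2]' looks for <div> children of <html>: none exist
print(root.xpath('./div[2]/div/a/span/text()'))  # []
for title, price in parse_listings(SAMPLE):
    print("{}-->{}".format(title, price))
```

Checking the lists before indexing with `[0]` also makes the scraper tolerate listings whose layout differs from the rest.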

Running it gives the following result:
I'd really appreciate any help solving this. Thanks!

Answer
咕嚕嚕

Not every proxy IP actually works. Check that a proxy is usable before you rely on it.
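A minimal sketch of that check, written with the standard library's `urllib.request` so it needs no extra packages (the real scrape can keep using requests with its `proxies=` and `timeout=` arguments). The helper names `check_proxy` and `working_proxies` are hypothetical, and the default test URL `http://httpbin.org/ip` is an assumption; any small page you trust to stay up will do. A proxy counts as usable here only if it returns HTTP 200 within the timeout, which does not guarantee the target site itself won't block it.

```python
import http.client
import urllib.request


def check_proxy(proxy_url: str,
                test_url: str = "http://httpbin.org/ip",  # assumed test target
                timeout: float = 5.0) -> bool:
    """Return True if test_url can be fetched through proxy_url with HTTP 200."""
    # Route both http:// and https:// requests through the candidate proxy
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    opener = urllib.request.build_opener(handler)
    try:
        with opener.open(test_url, timeout=timeout) as resp:
            return resp.status == 200
    except (OSError, http.client.HTTPException, ValueError):
        # URLError/HTTPError, refused connections and timeouts are all OSError subclasses;
        # HTTPException covers malformed replies, ValueError a malformed proxy URL
        return False


def working_proxies(candidates, **kwargs):
    """Keep only the candidates that pass check_proxy (kwargs are forwarded)."""
    return [p for p in candidates if check_proxy(p, **kwargs)]
```

For example, `working_proxies(['http://103.235.245.35:8080', 'http://61.135.217.7:80'])` returns whichever of the two proxies from this thread still respond, and the first survivor can go into `proxies={'http': ...}` for `requests.get`. Free proxies tend to die quickly, so it is worth re-checking them before each scraping session.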

import requests
from pyquery import PyQuery as Q

headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3325.181 Safari/537.36'}
proxies = {
    "http": "http://103.235.245.35:8080"
}

r = requests.get('http://hz.xiaozhu.com/', headers=headers, proxies=proxies, timeout=10)
# .items() yields each matched <li> already wrapped as a PyQuery object
for li in Q(r.text)('#page_list li').items():
    title = li.find('.result_title').text()
    price = li.find('.result_price').text()

    print(title, price)  # Python 3 print() function
May 3, 2018, 17:57