
How to set the number of pages to loop over after a POST in start_requests

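How can I find out how many pages a site has once the filter conditions are applied? I am crawling a dozen or so conditions in a loop and don't want to open each one to check. The site is http://www.landchina.com/defa...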

The ??? part in my code is the page count. How do I get the number of pages? Please help!

The code below assumes there are 7 pages. How should I write it when I don't know how many pages there are?

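# Python 3 import; on Python 2 this function was urllib.unquote.
from urllib.parse import unquote

from scrapy import FormRequest
from scrapy.spiders import CrawlSpider

# TudiItem is defined in the project's items module, which the question does not show.


class ListSpider(CrawlSpider):
    # Spider name
    name = "tudiwang"
    # Allowed domains
    allowed_domains = ["landchina.com"]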
    def start_requests(self):
        cities = {'4a611fc4-42b1-4861-ac43-8d25b002dc2b%3A31%A8%88%7E%C9%CF%BA%A3%CA%D0',
                  '4a611fc4-42b1-4861-ac43-8d25b002dc2b%3A2101%A8%88%7E%C9%F2%D1%F4%CA%D0'}
        for city in cities:
            for i in range(1, 8):
                yield FormRequest(url="http://www.landchina.com/default.aspx?tabid=262&ComName=default",
                      formdata={'TAB_QuerySubmitPagerData': str(i),
                                'hidComName': 'default',
                                '__VIEWSTATE': '/wEPDwUJNjkzNzgyNTU4D2QWAmYPZBYIZg9kFgICAQ9kFgJmDxYCHgdWaXNpYmxlaGQCAQ9kFgICAQ8WAh4Fc3R5bGUFIEJBQ0tHUk9VTkQtQ09MT1I6I2YzZjVmNztDT0xPUjo7ZAICD2QWAgIBD2QWAmYPZBYCZg9kFgJmD2QWBGYPZBYCZg9kFgJmD2QWAmYPZBYCZg9kFgJmDxYEHwEFIENPTE9SOiNEM0QzRDM7QkFDS0dST1VORC1DT0xPUjo7HwBoFgJmD2QWAgIBD2QWAmYPDxYCHgRUZXh0ZWRkAgEPZBYCZg9kFgJmD2QWAmYPZBYEZg9kFgJmDxYEHwEFhwFDT0xPUjojRDNEM0QzO0JBQ0tHUk9VTkQtQ09MT1I6O0JBQ0tHUk9VTkQtSU1BR0U6dXJsKGh0dHA6Ly93d3cubGFuZGNoaW5hLmNvbS9Vc2VyL2RlZmF1bHQvVXBsb2FkL3N5c0ZyYW1lSW1nL3hfdGRzY3dfc3lfamhnZ18wMDAuZ2lmKTseBmhlaWdodAUBMxYCZg9kFgICAQ9kFgJmDw8WAh8CZWRkAgIPZBYCZg9kFgJmD2QWAmYPZBYCZg9kFgJmD2QWAmYPZBYEZg9kFgJmDxYEHwEFIENPTE9SOiNEM0QzRDM7QkFDS0dST1VORC1DT0xPUjo7HwBoFgJmD2QWAgIBD2QWAmYPDxYCHwJlZGQCAg9kFgJmD2QWBGYPZBYCZg9kFgJmD2QWAmYPZBYCZg9kFgJmD2QWAmYPFgQfAQUgQ09MT1I6I0QzRDNEMztCQUNLR1JPVU5ELUNPTE9SOjsfAGgWAmYPZBYCAgEPZBYCZg8PFgIfAmVkZAICD2QWBGYPZBYCZg9kFgJmD2QWAmYPZBYCAgEPZBYCZg8WBB8BBYYBQ09MT1I6I0QzRDNEMztCQUNLR1JPVU5ELUNPTE9SOjtCQUNLR1JPVU5ELUlNQUdFOnVybChodHRwOi8vd3d3LmxhbmRjaGluYS5jb20vVXNlci9kZWZhdWx0L1VwbG9hZC9zeXNGcmFtZUltZy94X3Rkc2N3X3p5X2RrZ3NfMDEuZ2lmKTsfAwUCNDYWAmYPZBYCAgEPZBYCZg8PFgIfAmVkZAIBD2QWAmYPZBYCZg9kFgJmD2QWAgIBD2QWAmYPFgQfAQUgQ09MT1I6I0QzRDNEMztCQUNLR1JPVU5ELUNPTE9SOjsfAGgWAmYPZBYCAgEPZBYCZg8PFgIfAmVkZAIDD2QWAgIDDxYEHglpbm5lcmh0bWwFgwc8cCBhbGlnbj0iY2VudGVyIj48c3BhbiBzdHlsZT0iZm9udC1zaXplOiB4LXNtYWxsIj4mbmJzcDs8YnIgLz4NCiZuYnNwOzxhIHRhcmdldD0iX3NlbGYiIGhyZWY9Imh0dHA6Ly93d3cubGFuZGNoaW5hLmNvbS8iPjxpbWcgYm9yZGVyPSIwIiBhbHQ9IiIgd2lkdGg9IjI2MCIgaGVpZ2h0PSI2MSIgc3JjPSIvVXNlci9kZWZhdWx0L1VwbG9hZC9mY2svaW1hZ2UvdGRzY3dfbG9nZS5wbmciIC8+PC9hPiZuYnNwOzxiciAvPg0KJm5ic3A7PHNwYW4gc3R5bGU9ImNvbG9yOiAjZmZmZmZmIj5Db3B5cmlnaHQgMjAwOC0yMDE0IERSQ25ldC4gQWxsIFJpZ2h0cyBSZXNlcnZlZCZuYnNwOyZuYnNwOyZuYnNwOyA8c2NyaXB0IHR5cGU9InRleHQvamF2YXNjcmlwdCI+DQp2YXIgX2JkaG1Qcm90b2NvbCA9ICgoImh0dHBzOiIgPT0gZG9jdW1lbnQubG9jYXRpb24ucHJvdG9jb2wpID8gIiBodHRwczovLyIgOiAiIGh0dHA6Ly8iKTsNCmRvY3VtZW50LndyaXRlKHVuZXNjYXBlKCIlM0NzY3Jp
cHQgc3JjPSciICsgX2JkaG1Qcm90b2NvbCArICJobS5iYWlkdS5jb20vaC5qcyUzRjgzODUzODU5YzcyNDdjNWIwM2I1Mjc4OTQ2MjJkM2ZhJyB0eXBlPSd0ZXh0L2phdmFzY3JpcHQnJTNFJTNDL3NjcmlwdCUzRSIpKTsNCjwvc2NyaXB0PiZuYnNwOzxiciAvPg0K54mI5p2D5omA5pyJJm5ic3A7IOS4reWbveWcn+WcsOW4guWcuue9kSZuYnNwOyZuYnNwO+aKgOacr+aUr+aMgTrmtZnmsZ/oh7vlloTnp5HmioDogqHku73mnInpmZDlhazlj7gmbmJzcDvkupHlnLDnvZE8YnIgLz4NCuWkh+ahiOWPtzog5LqsSUNQ5aSHMDkwNzQ5OTLlj7cg5Lqs5YWs572R5a6J5aSHMTEwMTAyMDAwNjY2KDIpJm5ic3A7PGJyIC8+DQo8L3NwYW4+Jm5ic3A7Jm5ic3A7Jm5ic3A7PGJyIC8+DQombmJzcDs8L3NwYW4+PC9wPh8BBWRCQUNLR1JPVU5ELUlNQUdFOnVybChodHRwOi8vd3d3LmxhbmRjaGluYS5jb20vVXNlci9kZWZhdWx0L1VwbG9hZC9zeXNGcmFtZUltZy94X3Rkc2N3MjAxM195d18xLmpwZyk7ZGRj0Fiy7GPyRpmTRnlY5U0IOrZGzSKlbJQP2TxpPz5bSg==',
                                '__EVENTVALIDATION': '/wEWAgLRs7z/DgLN3cj/BExRVsB2t10osIeeCUJDCHk68eSfFrDyuUmn8ownEC0+',
                                'TAB_QuerySortItemList': '20da3312-3b36-4e96-9398-fc8c5174b02c:False',
                                'TAB_QuerySubmitConditionData': unquote(city),
                                'TAB_QuerySubmitOrderData': '20da3312-3b36-4e96-9398-fc8c5174b02c:False',
                                'TAB_QuerySubmitSortData': '',
                                'TAB_RowButtonActionControl': ''
                                }
                      )
    # Parse the result rows of one page
    def parse(self, response):
        for sel in response.xpath('//tr[@onmouseout="this.className=rowClass"]'):
            item = TudiItem()
            no = sel.xpath('td[@class="gridTdNumber"]/text()')[0].extract()
            item['no'] = no
            district = sel.xpath('td[@class="queryCellBordy"]/text()')[0].extract()
            item['district'] = district
            time = sel.xpath('td[@class="queryCellBordy"]/text()')[1].extract()
            item['time'] = time
            yield item
Answers
硬扛

You can wrap the request in a try: when there is no such page, catch the error and return from the callback.

try:
    ...
except Exception:
    ...
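The idea above can be sketched without Scrapy as a loop that keeps requesting pages until one comes back empty or missing. This is only a sketch: `fetch_rows` is a hypothetical callable standing in for the real POST request and row parsing, and `max_pages` is an assumed safety cap, not something from the question.

```python
def crawl_until_empty(fetch_rows, max_pages=1000):
    """Yield (page, rows) for page 1, 2, ... and stop at the first page with no rows.

    fetch_rows is a hypothetical callable: page number -> list of parsed rows.
    max_pages is a safety cap in case the site never returns an empty page.
    """
    for page in range(1, max_pages + 1):
        try:
            rows = fetch_rows(page)
        except LookupError:
            # Treat a missing page as the end of the results.
            break
        if not rows:
            break
        yield page, rows


# Stand-in for the real POST request: pretend the filter has 3 pages.
fake_site = {1: ["a", "b"], 2: ["c"], 3: ["d"]}
pages = [page for page, _ in crawl_until_empty(lambda p: fake_site[p])]
print(pages)  # [1, 2, 3]
```

In a Scrapy spider the same effect would likely come from having the callback yield the FormRequest for the next page only when the current page produced rows. Some sites repeat the last page instead of returning an empty one, so it is worth checking how this site behaves past the final page.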
August 11, 2018 05:51
任她鬧

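Then you don't have to put the loop in start_requests. Also, with CrawlSpider it is best not to override the parse method; CrawlSpider uses parse internally, so overriding it breaks the spider's own features.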
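One way to read this suggestion: request only page 1 for each condition, read the total page count from that response, and then generate the remaining page numbers from the callback. The helpers below are a minimal sketch of the counting part. The pager text format ("共7頁") is an assumption about the site's HTML, and `parse_total_pages` and `remaining_pages` are hypothetical names, so the pattern must be adjusted to the real page.

```python
import re

# Assumption: the pager on the first result page contains text like "共7頁";
# check the real HTML and adjust this pattern to match.
PAGE_COUNT_RE = re.compile(r"共\s*(\d+)\s*[頁页]")


def parse_total_pages(pager_text, default=1):
    """Return the page count found in the pager text, or `default` if none is found."""
    match = PAGE_COUNT_RE.search(pager_text)
    return int(match.group(1)) if match else default


def remaining_pages(total_pages):
    """Page numbers still to request after page 1 has been fetched."""
    return range(2, total_pages + 1)


print(list(remaining_pages(parse_total_pages("共7頁"))))  # [2, 3, 4, 5, 6, 7]
```

In the spider, start_requests would then yield a single FormRequest per city with 'TAB_QuerySubmitPagerData': '1', a named callback (not parse), and the city passed along in meta. That callback would extract the pager text, call these helpers, and yield one FormRequest for each remaining page.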

May 8, 2017 07:58