
Problem: raise ValueError("No <form> element found in %s" % response)

from scrapy.spiders import CrawlSpider, Rule, Request 
from scrapy.linkextractors import LinkExtractor 
from haoduofuli.items import HaoduofuliItem
from scrapy import FormRequest 
 
account = 'your_account'
password = 'your_password'
 
class myspider(CrawlSpider):
 
    name = 'haoduofuli'
    allowed_domains = ['haoduofuli.wang']
    start_urls = ['http://www.haoduofuli.wang/wp-login.php']
 
    def parse_start_url(self, response):

        formdate = {
                'log': account,
                'pwd': password,
                'rememberme': "forever",
                'wp-submit': "登錄",
                'redirect_to': "http://www.haoduofuli.wang/wp-admin/",
                'testcookie': "1"
         }
        return [FormRequest.from_response(response, formdata=formdate, callback=self.after_login)]
 
 
    def after_login(self, response):

        lnk = 'http://www.haoduofuli.wang'
        return Request(lnk)
 
    rules = (
        Rule(LinkExtractor(allow=(r'\.html',)), callback='parse_item', follow=True),
    )
 
    def parse_item(self, response):
        item = HaoduofuliItem()
        try:
            item['category'] = response.xpath('//*[@id="content"]/div[1]/div[1]/span[2]/a/text()').extract()[0]
            item['title'] = response.xpath('//*[@id="content"]/div[1]/h1/text()').extract()[0]
            item['imgurl'] = response.xpath('//*[@id="post_content"]/p/img/@src').extract()
            item['yunlink'] = response.xpath('//*[@id="post_content"]/blockquote/a/@href').extract()[0]
            item['password'] = response.xpath('//*[@id="post_content"]/blockquote/font/text()').extract()[0]
            return item
        except:
            item['category'] = response.xpath('//*[@id="content"]/div[1]/div[1]/span[2]/a/text()').extract()[0]
            item['title'] = response.xpath('//*[@id="content"]/div[1]/h1/text()').extract()[0]
            item['imgurl'] = response.xpath('//*[@id="post_content"]/p/img/@src').extract()
            item['yunlink'] = response.xpath('//*[@id="post_content"]/blockquote/p/a/@href').extract()[0] 
            item['password'] = response.xpath('//*[@id="post_content"]/blockquote/p/span/text()').extract()[0] 
            return item

According to the tutorial, the request from return Request(lnk) also counts as an initial URL, but because its response is not the one returned for start_urls, parse_start_url will not be called for it.

In practice, though, parse_start_url is called again, and when execution reaches:

return [FormRequest.from_response(response, formdata=formdate, callback=self.after_login)]

the following error is raised:

raise ValueError("No <form> element found in %s" % response)

How can I fix this?
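
The behaviour described above is consistent with how CrawlSpider handles a request that has no explicit callback: the response goes to the spider's default callback, which runs parse_start_url and then applies the rules. The home page has no login form, so from_response fails on the second pass. The toy model below is only a sketch of that dispatch order and is not Scrapy's real code; ToyCrawlSpider, default_callback, apply_rules and handle are hypothetical names made up for illustration.

```python
class ToyCrawlSpider:
    """Toy model of callback dispatch; NOT Scrapy's real implementation."""

    def __init__(self):
        self.calls = []  # records (method name, url) in call order

    def parse_start_url(self, url):
        self.calls.append(("parse_start_url", url))

    def parse_item(self, url):
        self.calls.append(("parse_item", url))

    def apply_rules(self, url):
        self.calls.append(("apply_rules", url))

    def default_callback(self, url):
        # Stand-in for the spider's default callback: it runs
        # parse_start_url first and then follows the rules.
        self.parse_start_url(url)
        self.apply_rules(url)

    def handle(self, url, callback=None):
        # A request without an explicit callback falls back to the default.
        (callback or self.default_callback)(url)


spider = ToyCrawlSpider()
# Entry from start_urls: goes through the default callback.
spider.handle("http://www.haoduofuli.wang/wp-login.php")
# Like Request(lnk) with no callback: default callback again.
spider.handle("http://www.haoduofuli.wang")
# Explicit callback: skips parse_start_url, but also skips the rules.
spider.handle("http://www.haoduofuli.wang", callback=spider.parse_item)

for call in spider.calls:
    print(call)
```

In this model the first two requests both pass through parse_start_url, while the one with an explicit callback reaches parse_item directly and never has the rules applied to it.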

Answers
薔薇花

That does get parse_item called, but then how do I use the rules to keep crawling?

May 10, 2017 23:09
涼汐
return Request(lnk, callback=self.parse_item)
August 11, 2017 16:13