Data scraped with Python won't write to the database

Goal: scrape the book titles, links, and comment counts from a page on Dangdang, then write them into a database.

The content of pipelines.py is as follows:

import pymysql

class DangdangPipeline(object):

    def process_item(self, item, spider):
        conn = pymysql.connect(host="127.0.0.1", port=3306, user="root", passwd="", db="dd")  # connect to the database
        cur = conn.cursor()  # create a cursor object
        for i in range(0, len(item["title"])):
            title = item["title"][i]
            link = item["link"][i]
            comment = item["comment"][i]
            sql = "insert into books(title,link,comment) values('title','link','comment')"  # build the SQL insert statement
            conn.commit()  # commit the transaction
            cur.execute(sql)  # execute the SQL
        conn.close()  # close the database connection
        return item

I won't paste the spider file, because the spider itself runs fine when the database-writing step is commented out; the problem is in the write to the database.
Also, the database named dd already exists, and it contains a table named books:

mysql> select * from books;
+--------+-------+---------+
| title  | link  | comment |
+--------+-------+---------+
| 1      | 2     | 3       |
| title  | link  | 0       |
| title1 | link1 | 0       |
+--------+-------+---------+
3 rows in set (0.00 sec)
Some of these rows are left over from when I tested connecting and inserting in the Python shell; the commands and their results were as follows (the bare 1 and 2 are the row counts returned by cursor.execute):

>>> import pymysql
>>> conn = pymysql.connect(host="127.0.0.1", user="root", password="", database="dd")
>>> cur = conn.cursor()
>>> cur.execute("insert into books(title,link,comment) values('1','2','3')")
1
>>> conn.commit()
>>> cur.execute("select * from books")
2
>>> cur.fetchall()
(('1', '2', 3), ('title1', 'link1', 0))
>>> conn.close()

The connect and insert statements in the pipeline are exactly the ones that worked in the Python shell, so why do they fail there?
Below is the error produced when running the spider from cmd:


2018-05-02 12:48:05 [scrapy.utils.log] INFO: Scrapy 1.5.0 started (bot: dangdang)
2018-05-02 12:48:05 [scrapy.utils.log] INFO: Versions: lxml 4.2.1.0, libxml2 2.9.7, cssselect 1.0.3, parsel 1.4.0, w3lib 1.19.0, Twisted 17.9.0, Python 3.5.4 (v3.5.4:3f56838, Aug 8 2017, 02:17:05) [MSC v.1900 64 bit (AMD64)], pyOpenSSL 17.5.0 (OpenSSL 1.1.0h 27 Mar 2018), cryptography 2.2.2, Platform Windows-10-10.0.16299-SP0
2018-05-02 12:48:05 [scrapy.crawler] INFO: Overridden settings: {'BOT_NAME': 'dangdang', 'NEWSPIDER_MODULE': 'dangdang.spiders', 'SPIDER_MODULES': ['dangdang.spiders']}
2018-05-02 12:48:05 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.telnet.TelnetConsole',
'scrapy.extensions.logstats.LogStats',
'scrapy.extensions.corestats.CoreStats']
2018-05-02 12:48:05 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
'scrapy.downloadermiddlewares.retry.RetryMiddleware',
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
'scrapy.downloadermiddlewares.stats.DownloaderStats']
2018-05-02 12:48:06 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
'scrapy.spidermiddlewares.referer.RefererMiddleware',
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
'scrapy.spidermiddlewares.depth.DepthMiddleware']
2018-05-02 12:48:06 [scrapy.middleware] INFO: Enabled item pipelines:
['dangdang.pipelines.DangdangPipeline']
2018-05-02 12:48:06 [scrapy.core.engine] INFO: Spider opened
2018-05-02 12:48:06 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2018-05-02 12:48:06 [scrapy.extensions.telnet] DEBUG: Telnet console listening on 127.0.0.1:6023
2018-05-02 12:48:06 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://www.dangdang.com/> (referer: None)
2018-05-02 12:48:06 [scrapy.core.scraper] DEBUG: Scraped from <200 http://www.dangdang.com/>
{'comment': [], 'link': [], 'title': []}
2018-05-02 12:48:06 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://category.dangdang.com/...> (referer: http://www.dangdang.com/)
2018-05-02 12:48:07 [scrapy.core.scraper] ERROR: Error processing {'comment': ['5470條評論',
         '120條評論',
         '6257條評論',
         '7448條評論',
         '1275條評論',
         '7518條評論',
         '25163條評論',
         '10043條評論',
         '6621條評論',
         '8796條評論',
         '8546條評論',
         '22721條評論',
         '15684條評論',
         '4726條評論',
         '677條評論',
         '1829條評論',
         '5966條評論',
         '1506條評論',
         '7959條評論',
         '14494條評論',
         '339條評論',
         '21條評論',
         '86條評論',
         '4208條評論',
         '14399條評論',
         '1951條評論',
         '853條評論',
         '4585條評論',
         '3559條評論',
         '7243條評論',
         '465條評論',
         '2832條評論',
         '9721條評論',
         '217條評論',
         '5659條評論',
         '4279條評論',
         '2977條評論',
         '10422條評論',
         '26條評論',
         '1086條評論',
         '4526條評論',
         '18232條評論',
         '139條評論',
         '217條評論',
         '12190條評論',
         '325條評論',
         '1579條評論',
         '743條評論',
         '4089條評論',
         '400條評論',
         '9條評論',
         '3024條評論',
         '1448條評論',
         '2273條評論',
         '14271條評論',
         '250條評論',
         '3255條評論',
         '45條評論',
         '845條評論',
         '4288條評論'],
'link': ['http://product.dangdang.com/2...',
      'http://product.dangdang.com/25218035.html',
      'http://product.dangdang.com/22880871.html',
      'http://product.dangdang.com/23961748.html',
      'http://product.dangdang.com/23997502.html',
      'http://product.dangdang.com/24042210.html',
      'http://product.dangdang.com/22783504.html',
      'http://product.dangdang.com/23259731.html',
      'http://product.dangdang.com/23958142.html',
      'http://product.dangdang.com/23254747.html',
      'http://product.dangdang.com/23368089.html',
      'http://product.dangdang.com/9265169.html',
      'http://product.dangdang.com/22628333.html',
      'http://product.dangdang.com/24035306.html',
      'http://product.dangdang.com/25108303.html',
      'http://product.dangdang.com/23617284.html',
      'http://product.dangdang.com/23734636.html',
      'http://product.dangdang.com/23928893.html',
      'http://product.dangdang.com/22606835.html',
      'http://product.dangdang.com/22722790.html',
      'http://product.dangdang.com/25071204.html',
      'http://product.dangdang.com/25227022.html',
      'http://product.dangdang.com/25159352.html',
      'http://product.dangdang.com/23594855.html',
      'http://product.dangdang.com/23321562.html',
      'http://product.dangdang.com/23926195.html',
      'http://product.dangdang.com/23954702.html',
      'http://product.dangdang.com/23931418.html',
      'http://product.dangdang.com/23794057.html',
      'http://product.dangdang.com/23532609.html',
      'http://product.dangdang.com/25111311.html',
      'http://product.dangdang.com/24142436.html',
      'http://product.dangdang.com/20459091.html',
      'http://product.dangdang.com/25107162.html',
      'http://product.dangdang.com/24144166.html',
      'http://product.dangdang.com/23964002.html',
      'http://product.dangdang.com/23918741.html',
      'http://product.dangdang.com/20165942.html',
      'http://product.dangdang.com/25218232.html',
      'http://product.dangdang.com/23834026.html',
      'http://product.dangdang.com/21063086.html',
      'http://product.dangdang.com/23473514.html',
      'http://product.dangdang.com/25163118.html',
      'http://product.dangdang.com/25089827.html',
      'http://product.dangdang.com/20255354.html',
      'http://product.dangdang.com/25086349.html',
      'http://product.dangdang.com/24188655.html',
      'http://product.dangdang.com/24214704.html',
      'http://product.dangdang.com/23371791.html',
      'http://product.dangdang.com/25071121.html',
      'http://product.dangdang.com/25249602.html',
      'http://product.dangdang.com/23462067.html',
      'http://product.dangdang.com/24159761.html',
      'http://product.dangdang.com/24011143.html',
      'http://product.dangdang.com/21110580.html',
      'http://product.dangdang.com/25123327.html',
      'http://product.dangdang.com/23800641.html',
      'http://product.dangdang.com/25195904.html',
      'http://product.dangdang.com/25060860.html',
      'http://product.dangdang.com/20569834.html'],
'title': [' Python編程 從入門到實踐',
       ' Python基礎教程(第3版)',
       ' 算法(第4版)【Sedgewick之巨著,與高德納TAOCP一脈相承】',
       ' Python核心編程 第3版',
       ' Python編程快速上手 讓繁瑣工作自動化(Python3編程從入門到實踐 新手學習必備用書)',
       ' Java從入門到精通(第4版)(附光盤)',
       ' C++ Primer Plus(第6版)中文版(暢銷30年C++必讀經(jīng)典教程全新升級,蔡學鏞、孟巖、高博傾力推薦)',
       ' 深入理解Java虛擬機:JVM高級特性與最佳實踐(第2版)',
       ' C Primer Plus 第6版 中文版 C語言入門經(jīng)典教程',
       ' 機器學習實戰(zhàn)【python基礎教程指南,python核心編程實例指導,對wxpython數(shù)據(jù)庫充分的講解,不可錯過的編',
       ' 利用Python進行數(shù)據(jù)分析',
       ' Head First Java(中文版)(JAVA經(jīng)典暢銷書 生動有趣 輕松學好JAVA)',
       ' JavaScript高級程序設計(第3版)(JavaScript技術(shù)名著,html+css+javascript教程精',
       ' Java核心技術(shù) 卷I:基礎知識(原書第10版)',
       ' HTML5+CSS3+JavaScript從入門到精通(標準版) web開發(fā)技術(shù)基礎視頻講解與案例實戰(zhàn),8大素材庫、源',
       ' Python編程(第四版)',
       ' 重構(gòu) 改善既有代碼的設計',
       ' 教孩子學編程 Python語言版',
       ' Java并發(fā)編程實戰(zhàn)(第16屆Jolt大獎提名圖書,Java并發(fā)編程必讀佳作)',
       ' JavaScript權(quán)威指南(第6版)(淘寶前端團隊傾情翻譯!經(jīng)典權(quán)威的JavaScript犀牛書!第6版特別涵蓋了H',
       ' 微信小程序開發(fā)圖解案例教程(附精講視頻)',
       ' 精通Python爬蟲框架Scrapy',
       ' 區(qū)塊鏈原理、設計與應用',
       ' “笨辦法”學Python(第3版)',
       ' C++ Primer中文版(第5版)',
       ' JavaEE開發(fā)的顛覆者: Spring Boot實戰(zhàn)',
       ' 程序員的數(shù)學1+2+3 數(shù)學思維+概率統(tǒng)計+線性代數(shù)(套裝共3冊)',
       ' Spring實戰(zhàn)(第4版)',
       ' 動手玩轉(zhuǎn)Scratch2.0編程—STEAM創(chuàng)新教育指南',
       ' 瘋狂Java講義(第4版)',
       ' Java EE互聯(lián)網(wǎng)輕量級框架整合開發(fā)— —SSM框架(Spring MVC+Spring+MyBatis)和Redi',
       ' MATLAB R2016a完全自學一本通',
       ' Effective Java中文版(第2版)',
       ' 區(qū)塊鏈核心算法解析',
       ' 第一行代碼 Android 第2版',
       ' R語言實戰(zhàn) 第2版',
       ' 數(shù)據(jù)結(jié)構(gòu)與算法分析:Java語言描述(原書第3版)',
       ' C 和指針(C指針的經(jīng)典之作,讀者好評如潮)',
       ' Python程序設計 第3版',
       ' Python算法教程',
       ' Python學習手冊:第4版(涵蓋Python2.6和3.X)',
       ' Python基礎教程(第2版·修訂版)',
       ' Java從入門到精通(項目案例版)掃碼看視頻重印80次銷售50萬冊',
       ' Android Studio開發(fā)實戰(zhàn):從零基礎到App上線',
       ' Python核心編程(第二版)(Python領域經(jīng)典作品、非常有名的開發(fā)指南)',
       ' PHP從入門到精通(第4版)',
       ' C語言從入門到精通(第3版)',
       ' 算法圖解',
       ' 深入淺出Node.js【深入講解Node的圖書,詳細講述基于JavaScript運行時所建立的平臺原理,與mongod',
       ' 流暢的Python',
       ' Python 3網(wǎng)絡爬蟲開發(fā)實戰(zhàn)',
       ' 圖解HTTP',
       ' Python機器學習 預測分析核心算法',
       ' C#入門經(jīng)典第7版',
       ' 用戶體驗要素:以用戶為中心的產(chǎn)品設計(原書第2版)(決戰(zhàn)大數(shù)據(jù)時代!IT技術(shù)人員不得不讀?。?,
       ' Python 3.5從零開始學',
       ' Redis實戰(zhàn)',
       ' 數(shù)據(jù)結(jié)構(gòu) Python語言描述',
       ' Scratch少兒趣味編程2',
       ' 程序員的自我修養(yǎng)—鏈接、裝載與庫( 網(wǎng)易云風力薦:莫到用時再讀書?。?]}

Traceback (most recent call last):
  File "d:\python3.5\lib\site-packages\twisted\internet\defer.py", line 653, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "E:\python第7次課\dangdang\dangdang\pipelines.py", line 20, in process_item
    cur.execute(sql)  # execute the SQL
  File "d:\python3.5\lib\site-packages\pymysql\cursors.py", line 165, in execute
    result = self._query(query)
  File "d:\python3.5\lib\site-packages\pymysql\cursors.py", line 321, in _query
    conn.query(q)
  File "d:\python3.5\lib\site-packages\pymysql\connections.py", line 860, in query
    self._affected_rows = self._read_query_result(unbuffered=unbuffered)
  File "d:\python3.5\lib\site-packages\pymysql\connections.py", line 1061, in _read_query_result
    result.read()
  File "d:\python3.5\lib\site-packages\pymysql\connections.py", line 1349, in read
    first_packet = self.connection._read_packet()
  File "d:\python3.5\lib\site-packages\pymysql\connections.py", line 1018, in _read_packet
    packet.check_error()
  File "d:\python3.5\lib\site-packages\pymysql\connections.py", line 384, in check_error
    err.raise_mysql_exception(self._data)
  File "d:\python3.5\lib\site-packages\pymysql\err.py", line 107, in raise_mysql_exception
    raise errorclass(errno, errval)
pymysql.err.IntegrityError: (1062, "Duplicate entry 'title' for key 'PRIMARY'")
2018-05-02 12:48:07 [scrapy.dupefilters] DEBUG: Filtered duplicate request: <GET http://category.dangdang.com/...> - no more duplicates will be shown (see DUPEFILTER_DEBUG to show all duplicates)
2018-05-02 12:48:07 [scrapy.core.engine] INFO: Closing spider (finished)
2018-05-02 12:48:07 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 498,
'downloader/request_count': 2,
'downloader/request_method_count/GET': 2,
'downloader/response_bytes': 72998,
'downloader/response_count': 2,
'downloader/response_status_count/200': 2,
'dupefilter/filtered': 1,
'finish_reason': 'finished',
'finish_time': datetime.datetime(2018, 5, 2, 4, 48, 7, 274283),
'item_scraped_count': 1,
'log_count/DEBUG': 5,
'log_count/ERROR': 1,
'log_count/INFO': 7,
'request_depth_max': 2,
'response_received_count': 2,
'scheduler/dequeued': 2,
'scheduler/dequeued/memory': 2,
'scheduler/enqueued': 2,
'scheduler/enqueued/memory': 2,
'start_time': datetime.datetime(2018, 5, 2, 4, 48, 6, 44732)}
2018-05-02 12:48:07 [scrapy.core.engine] INFO: Spider closed (finished)


I'm entirely self-taught, so I'd appreciate any pointers. One more thing: when I learned C, everything executed in order, so why does comment get scraped first here?
In my program it comes last; title is in front.
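On that last point: nothing is scraped "first". The spider fills all three fields into a single item before the pipeline ever sees it; the comment/link/title order in the error log is only print order. A Python 3.5 dict does not preserve insertion order, and Scrapy formats the item with pprint (hence the aligned line-wrapping in the log), which sorts dict keys alphabetically: comment, link, title. A minimal sketch with made-up values:

import pprint

# Made-up values; the assignment order mimics the spider (title first).
item = {}
item["title"] = ["t1", "t2"]
item["link"] = ["l1", "l2"]
item["comment"] = ["c1", "c2"]

# pprint sorts the keys, so 'comment' prints first regardless of the
# order in which the fields were filled.
pprint.pprint(item)
# {'comment': ['c1', 'c2'], 'link': ['l1', 'l2'], 'title': ['t1', 't2']}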

Answers
影魅

The code you posted runs commit first, then execute?
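Indeed: inside the loop, conn.commit() runs before cur.execute(sql), so the last insert is never committed. The crash itself comes from a second bug, though: the SQL string inserts the literal words 'title', 'link', 'comment' instead of the scraped values, and the table already holds a row whose primary key is 'title' (see the table dump above), hence Duplicate entry 'title' for key 'PRIMARY'. A minimal corrected sketch, using pymysql's %s placeholders (the charset argument is an assumption):

import pymysql

class DangdangPipeline(object):

    def process_item(self, item, spider):
        # charset is an assumption; without it, Chinese titles may not insert cleanly
        conn = pymysql.connect(host="127.0.0.1", port=3306, user="root",
                               passwd="", db="dd", charset="utf8")
        cur = conn.cursor()
        try:
            # %s placeholders make pymysql escape and substitute the real
            # values, instead of writing the literal word 'title' every time
            sql = "insert into books(title,link,comment) values(%s,%s,%s)"
            for i in range(len(item["title"])):
                cur.execute(sql, (item["title"][i], item["link"][i], item["comment"][i]))
            conn.commit()  # commit once, after the inserts
        finally:
            conn.close()
        return item

Opening the connection in open_spider and closing it in close_spider would avoid reconnecting for every item, but the version above keeps the original structure. Also, judging by the fetchall() output the comment column is numeric, while the scraped strings look like '5470條評論'; depending on MySQL's strict mode such an insert is truncated or rejected, so extracting the leading number first would be safer.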

January 24, 2018, 00:37