[A simple Python crawler in practice] Scraping product information from 1688 web pages

 huowufenghuang 2019-05-16

Language: Python 3.6 / Framework: Scrapy 1.5 / Database: MySQL 8.0 / IDE: PyCharm

1. Create the project

First, install the required software. Then go to the project folder, hold Shift and right-click, and open a command prompt there.

Run scrapy startproject [project name] to generate the project files. Enter the project folder with cd [project name], then run scrapy genspider <spider name> <domain (start page)> to generate the spider file.

2. Create the data object in items.py, and create the matching table in MySQL. (Mind the table's character encoding; here it is set to CHARACTER SET utf8 COLLATE utf8_general_ci.)

import scrapy

class A1688Item_selloffer(scrapy.Item):
    title = scrapy.Field()     # title
    company = scrapy.Field()   # company
    price = scrapy.Field()     # price
    sell = scrapy.Field()      # sales volume over the last 30 days
    method = scrapy.Field()    # sales model
    rebuy = scrapy.Field()     # repeat-purchase rate
    address = scrapy.Field()   # address
    subicon = scrapy.Field()   # service guarantees
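
The matching MySQL table mentioned in step 2 is not shown in the post. A minimal sketch of how its definition could be generated is below. The column names follow the item fields and the table name follows the insert statement used later in the pipeline; the helper name build_create_table_sql, the id column, and the VARCHAR(255) types are assumptions for illustration, not the author's schema.

```python
# Column names mirror the fields of A1688Item_selloffer.
# The column types and the id column are assumptions, not the author's schema.
FIELDS = ["title", "company", "price", "sell", "method", "rebuy", "address", "subicon"]


def build_create_table_sql(table="a1688item_selloffer", fields=FIELDS):
    """Return a CREATE TABLE statement with one VARCHAR column per item field."""
    columns = ",\n".join("    `%s` VARCHAR(255)" % name for name in fields)
    return (
        "CREATE TABLE IF NOT EXISTS `%s` (\n"
        "    `id` INT AUTO_INCREMENT PRIMARY KEY,\n"
        "%s\n"
        ") CHARACTER SET utf8 COLLATE utf8_general_ci;" % (table, columns)
    )


if __name__ == "__main__":
    # Print the statement so it can be pasted into a MySQL client.
    print(build_create_table_sql())
```

The resulting statement can be run once in a MySQL client, or passed to cursor.execute(), before the crawler starts.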

3. Write the spider logic

import scrapy
from bs4 import BeautifulSoup
from A1688.items import A1688Item_selloffer  # import the item class

KEYWORD = "keyword"  # search keyword (placeholder)
PAGE = '1'           # number of result pages to crawl

class A1688SellofferSpider(scrapy.Spider):
    name = 'A1688-selloffer'
    # allowed_domains = ['www.1688.com']     # Domains the spider is allowed to crawl.
    # start_urls = ['https://www.1688.com/'] # List of URLs to crawl; each response is passed to self.parse by default.
    def start_requests(self):
        for page in range(1, int(PAGE) + 1):
            url = 'https://s.1688.com/selloffer/offer_search.htm?keywords=%s&beginPage=%s' % (KEYWORD, page)
            yield scrapy.Request(url=url, callback=self.parse)

    def parse(self, response):
        for tag in response.css('.sw-dpl-offer-item').extract():
            # Instantiate a fresh item for every offer so that yielded items do not share state.
            item = A1688Item_selloffer()
            try:
                # Tags extracted from the response with the CSS selector are plain strings,
                # so they are parsed into BeautifulSoup Tag objects for further extraction.
                soup = BeautifulSoup(tag, 'lxml')

                item['title'] = soup.select(".sw-dpl-offer-photo img")[0].attrs['alt']
                item['company'] = soup.select(".sw-dpl-offer-companyName")[0].attrs['title']
                item['price'] = soup.select(".sw-dpl-offer-priceNum")[0].attrs['title']
                item['sell'] = soup.select(".sm-offer-tradeBt")[0].attrs['title']
                item['rebuy'] = soup.select(".sm-widget-offershopwindowshoprepurchaserate span")[2].string
                item['method'] = soup.select(".sm-widget-offershopwindowshoprepurchaserate i")[0].string
                # For data that may be missing, check that it exists before reading it.
                location = soup.select(".sm-offer-location")
                if location and location[0].attrs.get('title'):
                    address = location[0].attrs['title']
                else:
                    address = " "
                item['address'] = address

                icons = soup.select(".sm-offer-subicon a")
                if icons:
                    # Join the titles into one comma-separated string so the value fits a single text column.
                    item['subicon'] = ','.join(i.attrs['title'] for i in icons)
                else:
                    item['subicon'] = ' '
                # Return the item, handing it over to ITEM_PIPELINES.
                yield item
            except Exception as error:
                # Skip offers that cannot be parsed instead of yielding a half-filled item.
                print("Error:", error)
                continue
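
The spider above builds one search URL per page by inserting the keyword directly into the query string. A keyword containing spaces or Chinese characters needs percent-encoding first. The sketch below shows one way to do that with the standard library. The function name build_search_urls is hypothetical, and the default encoding="gbk" is an assumption about what the site expects, so adjust it if the results look wrong.

```python
from urllib.parse import urlencode

# Search endpoint taken from the spider above.
SEARCH_URL = "https://s.1688.com/selloffer/offer_search.htm"


def build_search_urls(keyword, pages, encoding="gbk"):
    """Return one search URL per result page, with the keyword percent-encoded.

    encoding="gbk" is an assumption about the site, not something the post states.
    """
    urls = []
    for page in range(1, int(pages) + 1):
        # urlencode escapes spaces and non-ASCII characters in the keyword.
        query = urlencode({"keywords": keyword, "beginPage": page}, encoding=encoding)
        urls.append("%s?%s" % (SEARCH_URL, query))
    return urls


if __name__ == "__main__":
    for url in build_search_urls("phone case", 2):
        print(url)
```

Inside start_requests, each returned URL could then be passed to scrapy.Request exactly as before.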

4. Build the item pipeline

# Connect to the database and get a cursor for later insert, delete, query and update operations.
import pymysql
from A1688 import settings

# The DBPipeline class performs the database operations.
class DBPipeline(object):
    def __init__(self):
        # Connect to the database. The connection parameters can be set in settings.
        self.connect = pymysql.connect(
            host=settings.MYSQL_HOST,
            port=settings.MYSQL_PORT,
            db=settings.MYSQL_DBNAME,
            user=settings.MYSQL_USER,
            passwd=settings.MYSQL_PASSWD,
            charset='utf8',  # Note: the charset is 'utf8' with no hyphen, because that is the character set's name in the database.
            use_unicode=True)

        # Create a cursor object; all inserts, deletes, queries and updates go through it.
        self.cursor = self.connect.cursor()
        print("mysql connect success")

    # Implement process_item(self, item, spider):
    # it returns a dict with data or an item object, or raises DropItem; a dropped item is not processed by later pipeline components.
    # Use this method to store the scraped item: run SQL statements through the cursor, then commit them with self.connect.commit().
    def process_item(self, item, spider):
        try:
            # Insert the data. execute() runs a SQL statement against the database; here it is "insert into table (columns) value (data)".
            # Mind the database encoding and the SQL conventions. The values are passed as parameters,
            # so pymysql quotes strings itself and the %s placeholders are written without ''.
            self.cursor.execute(
                "insert into a1688item_selloffer (title,company,price,sell,method,rebuy,address,subicon) value (%s,%s,%s,%s,%s,%s,%s,%s)",
                [item['title'],item['company'],item['price'],item['sell'],item['method'],item['rebuy'],item['address'],item['subicon']]
            )
            print("insert success")
            # Commit the SQL statement.
            self.connect.commit()
        except Exception as error:
            # Print an error log when something goes wrong.
            print('Insert error:', error)
        return item
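
The pipeline pattern above (parameterized insert, then commit) can be tried without a MySQL server. The sketch below swaps in Python's built-in sqlite3 module as a stand-in, so it is not the author's pipeline: the class name SqliteSketchPipeline is hypothetical, and sqlite3 uses ? placeholders where pymysql uses %s. It also uses Scrapy's open_spider and close_spider hooks, so the connection is opened when the spider starts and closed when it finishes.

```python
import sqlite3

FIELDS = ("title", "company", "price", "sell", "method", "rebuy", "address", "subicon")


class SqliteSketchPipeline:
    """Hypothetical stand-in for DBPipeline that writes to SQLite instead of MySQL."""

    def __init__(self, path=":memory:"):
        self.path = path
        self.connect = None

    def open_spider(self, spider):
        # Called once when the spider starts: open the connection and create the table.
        self.connect = sqlite3.connect(self.path)
        self.connect.execute(
            "CREATE TABLE IF NOT EXISTS a1688item_selloffer (%s)" % ", ".join(FIELDS)
        )

    def process_item(self, item, spider):
        # sqlite3 uses ? placeholders; values are passed separately, never formatted into the SQL.
        placeholders = ", ".join("?" for _ in FIELDS)
        sql = "INSERT INTO a1688item_selloffer (%s) VALUES (%s)" % (", ".join(FIELDS), placeholders)
        try:
            # Missing fields fall back to a single space, like the defaults in the spider.
            self.connect.execute(sql, [item.get(name, " ") for name in FIELDS])
            self.connect.commit()
        except sqlite3.Error as error:
            # Undo the failed statement and keep crawling.
            self.connect.rollback()
            print("Insert error:", error)
        return item

    def close_spider(self, spider):
        # Called once when the spider finishes: release the connection.
        self.connect.close()
```

Passing the values as a separate argument lets the driver do the quoting, which is the same reason the %s placeholders in the MySQL version are not wrapped in quotes.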

5. Modify settings

# Set USER_AGENT so that the server treats the crawler as a browser
USER_AGENT =  "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; AcooBrowser; .NET CLR 1.1.4322; .NET CLR 2.0.50727)"

# Set the item pipelines used while the spider runs
ITEM_PIPELINES = {
    'A1688.pipelines.DBPipeline': 300, # The number ranges from 0 to 1000 and determines the order in which pipelines run; lower numbers run first
}

# Variables for the database connection
MYSQL_HOST = 'localhost' # Host name; defaults to a local connection
MYSQL_PORT = 3306        # Database port; MySQL uses 3306 by default
MYSQL_DBNAME = 'a1688'   # Database name
MYSQL_USER = 'root'      # User name
MYSQL_PASSWD = '******'  # User password
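
The number attached to each entry in ITEM_PIPELINES only matters once there is more than one pipeline. The short sketch below illustrates the ordering rule. The CleanPipeline entry and the pipeline_order helper are hypothetical and exist only for this example.

```python
# 'A1688.pipelines.DBPipeline' comes from the settings above.
# 'A1688.pipelines.CleanPipeline' is a hypothetical second pipeline.
ITEM_PIPELINES = {
    'A1688.pipelines.DBPipeline': 300,
    'A1688.pipelines.CleanPipeline': 100,
}


def pipeline_order(pipelines):
    """Return the pipeline paths sorted by their number, lowest first."""
    return sorted(pipelines, key=pipelines.get)


if __name__ == "__main__":
    print(pipeline_order(ITEM_PIPELINES))
    # prints ['A1688.pipelines.CleanPipeline', 'A1688.pipelines.DBPipeline']
```

With these numbers, an item would pass through the hypothetical cleaning step before it reaches the database pipeline.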
