
Crawler

Getting started with web crawlers

pip3 install requests selenium beautifulsoup4 pyquery pymysql pymongo redis flask django jupyter

Install the libraries above, along with MongoDB, Redis, Anaconda, PyCharm, and Python 3.

import requests
# GET request with query parameters
>>> response=requests.get('https://httpbin.org/get?name=jackson&age=100')
>>> print(response.text)

>>> data={'name':'ap','age':99}
>>> response=requests.get('https://httpbin.org/get',params=data)
>>> print(response.text)
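
To see what `params` does without sending anything, the request can be prepared locally and its final URL inspected. This is only a minimal sketch; the variable name `prepared` is illustrative.

```python
import requests

# Same dictionary as in the example above.
data = {'name': 'ap', 'age': 99}

# Build the request locally; nothing is sent over the network.
prepared = requests.Request('GET', 'https://httpbin.org/get', params=data).prepare()

# The dictionary has been URL-encoded into the query string.
print(prepared.url)  # https://httpbin.org/get?name=ap&age=99
```

So passing a dictionary through `params` ends up producing the same kind of URL as writing the query string by hand.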

# Parsing JSON
import json
>>> response=requests.get('https://httpbin.org/get',params=data)
>>> print(response.json())
>>> print(json.loads(response.text))  # prints the same result as the line above
>>> print(type(response.json()))
<class 'dict'>

# Fetching binary data, e.g. to save images or videos
>>> response=requests.get("https://github.com/favicon.ico")
>>> print(type(response.text),type(response.content))
<class 'str'> <class 'bytes'>
>>> print(response.text)  # garbled characters, because the bytes are not text
>>> print(response.content)  # raw bytes, shown with hex escapes
# Saving a binary image or video to disk
>>> response=requests.get("https://github.com/favicon.ico")
>>> with open('favicon.ico','wb') as f:  # saved as favicon.ico; the with block closes the file
...     f.write(response.content)
...
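
For larger files such as videos, holding the whole body in `response.content` may use a lot of memory. One possible approach is a chunked download, sketched below. `save_response_body` and `download_file` are hypothetical helper names, and the 8192-byte chunk size and 10-second timeout are arbitrary assumptions.

```python
import requests


def save_response_body(response, path, chunk_size=8192):
    """Write a response body to `path` chunk by chunk; return the bytes written."""
    written = 0
    with open(path, 'wb') as f:  # the with block closes the file automatically
        for chunk in response.iter_content(chunk_size=chunk_size):
            written += f.write(chunk)
    return written


def download_file(url, path):
    """Stream `url` to `path` without loading the whole body into memory."""
    # stream=True defers downloading the body until it is iterated.
    with requests.get(url, stream=True, timeout=10) as response:
        response.raise_for_status()  # raise on 4xx/5xx instead of saving an error page
        return save_response_body(response, path)
```

It could be called as `download_file("https://github.com/favicon.ico", "favicon.ico")`.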
# Adding headers
>>> headers={'User-Agent':'...'}  # the long User-Agent string copied from the browser
>>> response=requests.get("https://www.zhihu.com/explore",headers=headers)
>>> print(response.text)

# Basic POST request
>>> data={'name':'ap','age':99}
>>> response=requests.post('https://httpbin.org/post',data=data)
>>> print(response.text)
# POST with headers, submitting form data
>>> data={'name':'ap','age':99}
>>> headers={'User-Agent':'...'}  # the long User-Agent string copied from the browser
>>> response=requests.post("https://httpbin.org/post",data=data,headers=headers)
>>> print(response.json())
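
What `data=` actually sends can be checked by preparing the POST locally instead of sending it. This is a rough sketch; the User-Agent value `my-crawler/0.1` is made up for illustration and is not from the original notes.

```python
import requests

data = {'name': 'ap', 'age': 99}
headers = {'User-Agent': 'my-crawler/0.1'}  # hypothetical value, for illustration only

# Prepare the POST locally; nothing is sent over the network.
prepared = requests.Request(
    'POST', 'https://httpbin.org/post', data=data, headers=headers
).prepare()

print(prepared.body)                     # form-encoded request body
print(prepared.headers['Content-Type'])  # set automatically for dict data
print(prepared.headers['User-Agent'])    # the custom header passed in
```

A dictionary passed as `data` is sent as an ordinary HTML form (`application/x-www-form-urlencoded`), which is why httpbin echoes it back under `form`.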

# Response attributes
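
The notes stop at this heading, so here is a hedged sketch of the attributes that are commonly inspected on a response. `summarize_response` is a hypothetical helper, not part of requests; it only reads attributes that `requests.Response` provides.

```python
import requests


def summarize_response(response):
    """Collect commonly used attributes of a requests.Response into a dict."""
    return {
        'status_code': response.status_code,     # e.g. 200 or 404
        'ok': response.ok,                       # True when status_code is below 400
        'url': response.url,                     # final URL after any redirects
        'headers': dict(response.headers),       # response headers
        'cookies': response.cookies.get_dict(),  # cookies set by the server
        'history': response.history,             # earlier responses in a redirect chain
    }
```

It could be called as `summarize_response(requests.get('https://httpbin.org/get'))`.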

# Finding the request headers: https://mkyong.com/computer-tips/how-to-view-http-headers-in-google-chrome/

Look for the User-Agent: Mozilla ... line at the bottom.
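
When no User-Agent is supplied, requests identifies itself as `python-requests/<version>`, which some sites may treat differently from a browser. The sketch below sets the header once on a `requests.Session` so that every request reuses it. The header value is a deliberate placeholder (an assumption), to be replaced with the string copied from the browser.

```python
import requests
from requests.utils import default_user_agent

# What requests sends when no User-Agent is supplied.
print(default_user_agent())

session = requests.Session()
# Placeholder value (an assumption); replace it with the string copied from Chrome.
session.headers.update({'User-Agent': 'PASTE-YOUR-BROWSER-USER-AGENT-HERE'})

# Every request made through this session now carries the custom header.
print(session.headers['User-Agent'])
```

After this, `session.get(...)` and `session.post(...)` can be used in place of `requests.get(...)` and `requests.post(...)` without passing `headers=` each time.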

Watched up to: python非常全资料/python3爬虫实战/课时09 (lesson 09)