Web Scraping Basics
pip3 install requests selenium beautifulsoup4 pyquery pymysql pymongo redis flask django jupyter
Install the libraries above, then install MongoDB, Redis, Anaconda, PyCharm, and Python 3.
```python
>>> import requests
>>> response = requests.get('https://httpbin.org/get?name=jackson&age=100')
>>> print(response.text)
```
```python
>>> data = {'name': 'ap', 'age': 99}
>>> response = requests.get('https://httpbin.org/get', params=data)
>>> print(response.text)
```
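Passing `params=data` saves you from writing the query string by hand. The sketch below uses only the standard library's `urllib.parse.urlencode` to show roughly the encoding requests performs, so it runs without network access. `build_url` is a hypothetical helper, not part of requests, and it assumes the base URL has no query string yet.

```python
from urllib.parse import urlencode

def build_url(base: str, params: dict) -> str:
    """Hypothetical helper: append URL-encoded params to base, similar to params= in requests."""
    # urlencode turns {'name': 'ap', 'age': 99} into 'name=ap&age=99'
    query = urlencode(params)
    return f"{base}?{query}" if query else base

url = build_url('https://httpbin.org/get', {'name': 'ap', 'age': 99})
print(url)  # https://httpbin.org/get?name=ap&age=99

# Special characters are escaped automatically: space -> '+', '&' -> '%26'
print(build_url('https://httpbin.org/get', {'q': 'a b&c'}))  # https://httpbin.org/get?q=a+b%26c
```

This is why the dict form is safer than hand-written URLs: values containing spaces or `&` would otherwise break the query string.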
```python
>>> import json
>>> response = requests.get('https://httpbin.org/get', params=data)
>>> print(response.json())              # parsed by requests
>>> print(json.loads(response.text))    # same result, parsed by hand
>>> print(type(response.json()))
<class 'dict'>
```
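`response.json()` gives the same dict as `json.loads(response.text)`. The sketch below parses a made-up body that is loosely shaped like an httpbin `/get` reply. The exact fields are an assumption. It shows that query values come back as strings inside a dict.

```python
import json

# Hypothetical response body, loosely shaped like an httpbin.org/get reply
body = '{"args": {"age": "99", "name": "ap"}, "url": "https://httpbin.org/get?name=ap&age=99"}'

parsed = json.loads(body)       # same idea as response.json()
print(type(parsed))             # <class 'dict'>
print(parsed['args']['name'])   # ap

# Query-string values arrive as strings, so convert before doing arithmetic
age = int(parsed['args']['age'])
print(age + 1)                  # 100
```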
Binary data (use `.content` to save images and videos):

```python
>>> response = requests.get('https://github.com/favicon.ico')
>>> print(type(response.text), type(response.content))
<class 'str'> <class 'bytes'>
>>> print(response.text)     # garbled: the binary file decoded as text
>>> print(response.content)  # raw bytes, printed as \x.. hex escapes
>>> with open('favicon.ico', 'wb') as f:  # 'wb' = write binary; the with block closes the file
...     f.write(response.content)
...
```
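`response.content` loads the whole file into memory. That is fine for an icon but wasteful for a large video. requests can instead stream the body with `requests.get(url, stream=True)` and `response.iter_content(chunk_size=...)`.

The sketch below keeps the file-writing part separate so it runs offline. `save_chunks` is a hypothetical helper that accepts any iterable of bytes, and the commented lines show one possible way to feed it from requests.

```python
from typing import Iterable

def save_chunks(chunks: Iterable[bytes], path: str) -> int:
    """Hypothetical helper: write byte chunks to path in binary mode, return bytes written."""
    total = 0
    with open(path, 'wb') as f:  # 'wb' is required for images/videos
        for chunk in chunks:
            if chunk:  # skip empty keep-alive chunks
                total += f.write(chunk)
    return total

# Possible use with requests (needs network access):
# import requests
# with requests.get('https://github.com/favicon.ico', stream=True) as r:
#     r.raise_for_status()
#     save_chunks(r.iter_content(chunk_size=8192), 'favicon.ico')

# Offline demo with fake chunks
n = save_chunks([b'\x00\x00', b'', b'\x01\x00'], 'demo.bin')
print(n)  # 4
```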
```python
>>> headers = {'User-Agent': '...'}  # paste the long User-Agent string copied from your browser
>>> response = requests.get('https://www.zhihu.com/explore', headers=headers)
>>> print(response.text)
```
```python
>>> data = {'name': 'ap', 'age': 99}
>>> response = requests.post('https://httpbin.org/post', data=data)
>>> print(response.text)
```
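When you pass a dict as `data=`, requests sends the fields form-encoded (`Content-Type: application/x-www-form-urlencoded`), the same format an HTML form submits. httpbin echoes them back under `form`.

The sketch below swaps in the standard library's `urllib.request.Request` to build an equivalent POST request without sending it, so you can inspect the body and headers. It illustrates the wire format only; it is not how requests works internally.

```python
from urllib.parse import urlencode
from urllib.request import Request

data = {'name': 'ap', 'age': 99}
body = urlencode(data).encode('ascii')  # b'name=ap&age=99'

req = Request(
    'https://httpbin.org/post',
    data=body,  # a non-None body makes urllib use POST
    headers={'Content-Type': 'application/x-www-form-urlencoded'},
)
print(req.get_method())  # POST
print(req.data)          # b'name=ap&age=99'
# urllib.request.urlopen(req) would actually send it (network required)
```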
```python
>>> data = {'name': 'ap', 'age': 99}
>>> headers = {'User-Agent': '...'}  # the User-Agent string copied from your browser
>>> response = requests.post('https://httpbin.org/post', data=data, headers=headers)
>>> print(response.json())
```
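Finding the headers (how to view HTTP headers in Google Chrome): https://mkyong.com/computer-tips/how-to-view-http-headers-in-google-chrome/
In the request headers, find the User-Agent line at the very bottom: Mozilla ...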
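Copying headers out of Chrome one at a time gets tedious. A small helper can turn the raw `Name: value` lines into a dict to pass as `headers=`. `parse_raw_headers` is hypothetical. The sample User-Agent is deliberately truncated, so paste your own. The helper also assumes that HTTP/2 pseudo-headers such as `:authority` should be skipped, since they are not ordinary request headers.

```python
def parse_raw_headers(raw: str) -> dict:
    """Hypothetical helper: turn 'Name: value' lines copied from DevTools into a dict."""
    headers = {}
    for line in raw.strip().splitlines():
        line = line.strip()
        # skip blank lines, HTTP/2 pseudo-headers (':authority: ...'), and lines without a colon
        if not line or line.startswith(':') or ':' not in line:
            continue
        # split on the first colon only; values such as URLs contain colons too
        name, value = line.split(':', 1)
        headers[name.strip()] = value.strip()
    return headers

raw = """
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) ...
Referer: https://www.zhihu.com/
Accept-Language: zh-CN,zh;q=0.9
"""
headers = parse_raw_headers(raw)
print(headers['Referer'])  # https://www.zhihu.com/
# requests.get('https://www.zhihu.com/explore', headers=headers)
```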
Progress: watched up to lesson 09 of the Python3 web scraping course (folder: python非常全资料/python3爬虫实战/课时09).