Nagisaのにちじょう

Created 2022-11-08 | StudyNote | python • web scraping • note • useful libraries

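Selenium: basic usage

Imports

    from selenium import webdriver  # entry point for the browser drivers
    from selenium.webdriver.common.by import By
    from selenium.webdriver.chrome.service import Service  # avoids the deprecation errors raised after a version update

Methods learned so far (continuously updated)

Setting browser options

    op = webdriver.ChromeOptions()
    # op.add_argument("--headless")  # add an option: headless mode (no browser window is shown)
    op.headless = True  # found later that headless mode can also be set this way (this setter was deprecated in later Selenium 4 releases, so add_argument is the safer form)
    driver_path = Service(r'absolute path to the browser driver')  # point Selenium at the browser driver
    driver = webdriver.Chrome(service=driver_path, options=op)
    driver.get("")  # ...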
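The excerpt imports `By` but stops before using it. As a minimal sketch of how the pieces might fit together, assuming Selenium 4 and a locally installed Chrome driver: the names `HEADLESS_ARGS` and `open_page`, the window size, and the choice of reading `h1` elements are illustrative and not from the original note.

```python
# Command-line switches passed to Chrome; the window size is an arbitrary choice.
HEADLESS_ARGS = ["--headless", "--window-size=1280,800"]


def open_page(url, driver_path, arguments=HEADLESS_ARGS):
    """Open url in headless Chrome and return (page title, list of h1 texts).

    Selenium is imported inside the function so that this module can be
    imported even on a machine without a browser driver.
    """
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service
    from selenium.webdriver.common.by import By

    options = webdriver.ChromeOptions()
    for arg in arguments:
        options.add_argument(arg)

    driver = webdriver.Chrome(service=Service(driver_path), options=options)
    try:
        driver.get(url)
        headings = [el.text for el in driver.find_elements(By.TAG_NAME, "h1")]
        return driver.title, headings
    finally:
        driver.quit()  # always close the browser, even if the page fails to load
```

Calling `open_page("https://example.com", r"path to chromedriver")` would need a real driver path; the `try`/`finally` is there so a failed page load does not leave a headless browser process running.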

Web scraping: using the BeautifulSoup parsing library

Created 2022-11-06 | StudyNote | python • web scraping • note • useful libraries

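A library for parsing web pages and extracting data from them

    from bs4 import BeautifulSoup  # import the bs4 library; the API may change between releases, so check the docs
    find(name, attrs, recursive, string, **kwargs)  # full signature, per the official documentation
    find_all(name, attrs, recursive, string, limit, **kwargs)  # full signature, per the official documentation
    find_all()
    variable = BeautifulSoup(string_to_parse, 'parser')
    # parser: the built-in html.parser is used here; it is not the only choice, but it is one of the simpler ones

Worked example

    import requests
    from bs4 import BeautifulSoup
    # send the request and convert the response to a string
    url = ' '
    res = requests.get(url)
    htmltxt = res.text
    # ...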
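To show what `find` and `find_all` return without depending on a live page, here is a small sketch that parses an inline HTML string with `html.parser`. The markup, class names, and variable names are made up for illustration.

```python
from bs4 import BeautifulSoup

# A tiny hand-written page, so the example needs no network access.
html = """
<html><body>
  <h1 id="top">Notes</h1>
  <ul>
    <li class="post"><a href="/selenium">Selenium</a></li>
    <li class="post"><a href="/bs4">BeautifulSoup</a></li>
    <li class="draft"><a href="/todo">Todo</a></li>
  </ul>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")

# find() returns the first matching tag (or None if nothing matches).
heading = soup.find("h1")

# find_all() returns every match; class_ filters on the CSS class.
posts = soup.find_all("li", class_="post")
links = [(li.a.get_text(), li.a["href"]) for li in posts]

# limit caps the number of results that find_all() returns.
first_only = soup.find_all("li", limit=1)

print(heading.get_text())
print(links)
```

In a real scraper the `html` string would be replaced by the `res.text` of a `requests.get` call.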

Web scraping: using the Requests library

Created 2022-10-04 | StudyNote | python • web scraping • note • useful libraries

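The basics

    requests.get(url, headers=headers, params=params)  # URL, request headers, query parameters (appended to the URL)
    requests.post(url, headers=headers, params=params)

Pass headers and params as keyword arguments: the second positional argument of requests.get is params, and that of requests.post is data. Both calls return a Response object.

Commonly used attributes of the Response object

    response.status_code  # the HTTP status code; print it to check whether the request succeeded
    response.content  # the response body as bytes; typically used when downloading images
    response.text  # the response body decoded to a string
    response.encoding  # the encoding used to decode response.text; can be set manually
    response.cookies  # the cookies sent back by the server
    xxxjson = response.json()  # parse a JSON response body into Python objects (typically a dict)

Example:

    res = requests.get(url)
    # print the status code of res to check whether the request succeeded
    print(res.status_code)
    # see my notes for what each status code means

Usage ...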
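To see how `headers` and `params` end up in the outgoing request without touching the network, one option is to build a `requests.Request` and call `prepare()` on it. This is only a sketch: the URL, header value, and parameter names below are placeholders, not from the original note.

```python
import requests

# Placeholder values chosen for illustration.
url = "https://example.com/search"
headers = {"User-Agent": "my-notes/0.1"}
params = {"q": "python", "page": 2}

# Build the request object and prepare it; nothing is sent over the network.
prepared = requests.Request("GET", url, headers=headers, params=params).prepare()

print(prepared.method)                 # the HTTP verb
print(prepared.url)                    # params are encoded into the query string
print(prepared.headers["User-Agent"])  # headers are attached as given
```

`requests.get(url, headers=headers, params=params)` builds the same kind of prepared request internally (adding the session's default headers) and then sends it, returning the Response object whose attributes are listed above.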