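Saving dynamic web pages with Python, saving a web page as an image with Python

Contents:

1. How do I scrape dynamic page content with Python?
2. How do I get the links on a dynamic web page with Python?
3. How to scrape dynamically loaded page data with Python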

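How do I scrape dynamic page content with Python?

Given a URL, return its HTML: I wrote a function for this long ago.

Search for:

getUrlRespHtml

and you will find the corresponding Python functions: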

#------------------------------------------------------------------------------
# Python 2 code (uses urllib2). gConst and gVal are module-level globals
# that are not shown in this snippet.
import logging
import urllib
import urllib2

def getUrlResponse(url, postDict={}, headerDict={}, timeout=0, useGzip=False, postDataDelimiter=""):
    """Get response from url; supports optional postDict, headerDict, timeout, useGzip.

    Note:
    1. If postDict is not empty, the request automatically becomes a POST instead of the default GET.
    2. To handle cookies automatically, call initAutoHandleCookies() before using this function;
       the following urllib2.Request will then handle cookies automatically.
    """
    # make sure url is a str, not unicode, otherwise urllib2.urlopen will raise an error
    url = str(url)

    if postDict:
        if postDataDelimiter == "":
            postData = urllib.urlencode(postDict)
        else:
            postData = ""
            for eachKey in postDict.keys():
                postData += str(eachKey) + "=" + str(postDict[eachKey]) + postDataDelimiter
        postData = postData.strip()
        logging.info("postData=%s", postData)
        req = urllib2.Request(url, postData)
        logging.info("req=%s", req)
        req.add_header('Content-Type', "application/x-www-form-urlencoded")
    else:
        req = urllib2.Request(url)

    defHeaderDict = {
        'User-Agent': gConst['UserAgent'],
        'Cache-Control': 'no-cache',
        'Accept': '*/*',
        'Connection': 'Keep-Alive',
    }

    # add the default headers first
    for eachDefHd in defHeaderDict.keys():
        req.add_header(eachDefHd, defHeaderDict[eachDefHd])

    if useGzip:
        req.add_header('Accept-Encoding', 'gzip, deflate')

    # add customized headers afterwards, so they can overwrite the defaults
    if headerDict:
        for key in headerDict.keys():
            req.add_header(key, headerDict[key])

    if timeout > 0:
        # set the timeout value if necessary
        resp = urllib2.urlopen(req, timeout=timeout)
    else:
        resp = urllib2.urlopen(req)

    # update cookies into the local file
    if gVal['cookieUseFile']:
        gVal['cj'].save()
        logging.info("gVal['cj']=%s", gVal['cj'])

    return resp

#------------------------------------------------------------------------------
# get the response html (the body) from url
import zlib

def getUrlRespHtml(url, postDict={}, headerDict={}, timeout=0, useGzip=True, postDataDelimiter=""):
    resp = getUrlResponse(url, postDict, headerDict, timeout, useGzip, postDataDelimiter)
    respHtml = resp.read()

    # Even if the request did not send "Accept-Encoding: gzip, deflate", the
    # server may still respond with gzip or deflate data, so decide whether to
    # decompress from the response headers rather than from useGzip.
    respInfo = resp.info()

    # Example response headers:
    # Server: nginx/1.0.8
    # Date: Sun, 08 Apr 2012 12:30:35 GMT
    # Content-Type: text/html
    # Transfer-Encoding: chunked
    # Connection: close
    # Vary: Accept-Encoding
    # ...
    # Content-Encoding: gzip

    # Conversely, the request may ask for gzip/deflate while the returned html
    # is not compressed; the response then has no "Content-Encoding" header.
    # So only decompress when the data really is compressed.
    if "Content-Encoding" in respInfo:
        if "gzip" == respInfo['Content-Encoding']:
            respHtml = zlib.decompress(respHtml, 16 + zlib.MAX_WBITS)
        elif "deflate" == respInfo['Content-Encoding']:
            respHtml = zlib.decompress(respHtml, -zlib.MAX_WBITS)

    return respHtml

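Example code:

url = ""
respHtml = getUrlRespHtml(url)

For the complete library, search for:

crifanLib.py

For more on scraping dynamic pages, see:

Python专题教程：抓取网站，模拟登陆，抓取动态网页 (Python tutorial: scraping websites, simulating login, scraping dynamic pages)

(searching for the title will find it)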
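The functions above target Python 2, where urllib2 exists. As a rough Python 3 sketch of the same idea, the decompression step can be separated from the network call so it can be checked on its own. The names decode_body and fetch_html are hypothetical, not part of crifanLib.py, and the fallback to a raw deflate stream is an assumption about how some servers behave.

```python
import zlib
import urllib.request


def decode_body(raw, content_encoding):
    """Decompress a response body according to its Content-Encoding header."""
    encoding = (content_encoding or "").strip().lower()
    if encoding == "gzip":
        # 16 + MAX_WBITS tells zlib to expect a gzip header and trailer
        return zlib.decompress(raw, 16 + zlib.MAX_WBITS)
    if encoding == "deflate":
        try:
            # "deflate" is normally a zlib-wrapped stream
            return zlib.decompress(raw)
        except zlib.error:
            # assumption: some servers send a raw deflate stream instead
            return zlib.decompress(raw, -zlib.MAX_WBITS)
    # no (or unknown) Content-Encoding: return the body unchanged
    return raw


def fetch_html(url, headers=None, timeout=10):
    """Fetch url and return the decompressed body as bytes."""
    request_headers = {"Accept-Encoding": "gzip, deflate"}
    request_headers.update(headers or {})
    req = urllib.request.Request(url, headers=request_headers)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return decode_body(resp.read(), resp.headers.get("Content-Encoding"))
```

Like the original, this only returns the HTML the server sends; content that the page builds later with JavaScript is not included.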

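How do I get the links on a dynamic web page with Python?

Four methods:

'''
Get all links on the current page
'''
import re

import requests
from bs4 import BeautifulSoup
from lxml import etree
from selenium import webdriver
from selenium.webdriver.common.by import By

url = ''  # target page URL (left blank in the original)
r = requests.get(url)
r.encoding = 'gb2312'  # encoding of the target page; adjust as needed

# Using re: the lookbehind keeps only the value inside the quotes
matchs = re.findall(r"(?<=href=\").+?(?=\")|(?<=href=\').+?(?=\')", r.text)
for link in matchs:
    print(link)
print()

# Using BeautifulSoup4 (DOM tree); href=True skips <a> tags without an href
soup = BeautifulSoup(r.text, 'lxml')
for a in soup.find_all('a', href=True):
    link = a['href']
    print(link)
print()

# Using lxml.etree (XPath)
tree = etree.HTML(r.text)
for link in tree.xpath("//@href"):
    print(link)
print()

# Using selenium (this opens a real browser!); it is the only method here
# that sees links added to the page by JavaScript
driver = webdriver.Firefox()
driver.get(url)
for link in driver.find_elements(By.TAG_NAME, "a"):
    print(link.get_attribute("href"))
driver.quit()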
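The first three methods return href values exactly as written in the HTML, so relative links come back relative. A minimal standard-library sketch of collecting <a> links and resolving them against the page URL might look like this. LinkCollector and extract_links are hypothetical names, and the example.com URLs in the usage note are placeholders.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag, resolved against base_url."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            # skip <a> tags that have no href or an empty one
            if name == "href" and value:
                self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url):
    """Return the absolute URLs of all <a href=...> links in html."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.links
```

For example, extract_links('<a href="/a.html">A</a>', "http://example.com/dir/index.html") resolves the link to http://example.com/a.html. This still works only on HTML that is already in hand, so links inserted by JavaScript need a browser-based method such as selenium.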

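How to scrape dynamically loaded page data with Python

There are a few typical approaches to scraping dynamic pages.

Approach 1: inspect the page's loading rules directly. If the page uses AJAX, find the AJAX request and have Python issue it. If the URL is generated by JavaScript, read the JS, work out the rule, and have Python generate the URL. This is the most common approach.

Approach 2: use Python to drive a browser built on the WebKit engine, the IE engine, or the Firefox engine, then save the rendered result. Browser testing frameworks are usually a good fit, since they have these features built in.

Approach 3: capture the content through an HTTP proxy and reassemble it. You can even inject your own JS scripts to hook the page. This method is typically used in reverse-engineering tools.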
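Approach 1 can be sketched as follows, assuming the browser's network panel showed the page loading its data from a paginated JSON endpoint. The query parameters page and size, the "items" key, and the function names are all hypothetical; a real site's rules have to be read from its own requests.

```python
import json
import urllib.request
from urllib.parse import urlencode


def build_ajax_url(endpoint, page, page_size=20):
    """Rebuild the URL that the page's JavaScript would request for one page."""
    # assumption: the endpoint paginates with "page" and "size" parameters
    query = urlencode({"page": page, "size": page_size})
    return f"{endpoint}?{query}"


def parse_items(body):
    """Pull the records out of a JSON response body (str or bytes)."""
    data = json.loads(body)
    # assumption: the records live under an "items" key
    return data.get("items", [])


def fetch_page(endpoint, page, timeout=10):
    """Issue the same request the page's AJAX call would and return the records."""
    req = urllib.request.Request(
        build_ajax_url(endpoint, page),
        # many AJAX endpoints expect this header; whether a given site does is an assumption
        headers={"X-Requested-With": "XMLHttpRequest"},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return parse_items(resp.read())
```

Keeping URL generation and response parsing in separate functions means the rule worked out from the JS can be checked without touching the network.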

2022-11-17
