cURL and file functions in PHP (phpcurl)

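Contents of this article:

1. How can a PHP cURL crawler get all the links on a web page?
2. Why does cURL return incomplete data in PHP when file_get_contents returns everything?
3. How can PHP get the IP address of a client that fetches a page via cURL or file_get_contents?
4. PHP cURL compared with file: why?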

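How can a PHP cURL crawler get all the links on a web page?

This article continues from the previous two. The example here calls the functions from those two articles to build a simple URL collector. PHP code usually fetches network data with file_get_contents, file, or cURL. cURL is reportedly faster and more capable than file_get_contents and file, and better suited to scraping. Today let's try using cURL to get all the links on a web page. The example is as follows: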

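<?php
/**
 * Use cURL to collect all the links on hao123.com.
 */
include_once('function.php');

$ch = curl_init();
// The target URL is empty in the source; set it to the page to crawl
curl_setopt($ch, CURLOPT_URL, '');
// Include the HTTP headers in the returned output
curl_setopt($ch, CURLOPT_HEADER, 1);
// Left commented out: CURLOPT_NOBODY would drop the page body, which link extraction needs
// curl_setopt($ch, CURLOPT_NOBODY, 1);
// Return the result instead of printing it
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$html = curl_exec($ch);
$info = curl_getinfo($ch);
if ($html === false) {
    echo "cURL Error: " . curl_error($ch);
}
curl_close($ch);

$linkarr = _striplinks($html);
// Host part, used to complete relative links (also empty in the source)
$host = '';
$linkresult = array();
if (is_array($linkarr)) {
    foreach ($linkarr as $k => $v) {
        $linkresult[$k] = _expandlinks($v, $host);
    }
}
printf("<p>All links on this page:</p><pre>%s</pre>\n", var_export($linkresult, true));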

The contents of function.php are as follows (the two functions from the previous two articles, combined):

<?php
function _striplinks($document) {
    preg_match_all("'<\s*a\s.*?href\s*=\s*([\"\'])?(?(1) (.*?)\\1 | ([^\s\>]+))'isx", $document, $links);
    // concatenate the non-empty matches from the conditional subpattern:
    // group 2 holds quoted href values, group 3 holds unquoted ones
    $match = array();
    foreach ($links[2] as $val) {
        if (!empty($val))
            $match[] = $val;
    }
    foreach ($links[3] as $val) {
        if (!empty($val))
            $match[] = $val;
    }
    // return the links
    return $match;
}

/*===================================================================*
    Function: _expandlinks
    Purpose:  expand each link into a fully qualified URL
    Input:    $links the links to qualify
              $URI   the full URI to get the base from
    Output:   $expandedLinks the expanded links
*===================================================================*/
function _expandlinks($links, $URI)
{
    $URI_PARTS = parse_url($URI);
    $host = $URI_PARTS["host"];
    // strip the query string, then the trailing file name and slash
    preg_match("/^[^\?]+/", $URI, $match);
    $match = preg_replace("|/[^\/\.]+\.[^\/\.]+$|", "", $match[0]);
    $match = preg_replace("|/$|", "", $match);
    $match_part = parse_url($match);
    $match_root = $match_part["scheme"] . "://" . $match_part["host"];
    $search = array(
        "|^http://" . preg_quote($host) . "|i",
        "|^(\/)|i",
        "|^(?!http://)(?!mailto:)|i",
        "|/\./|",
        "|/[^\/]+/\.\./|"
    );
    $replace = array(
        "",
        $match_root . "/",
        $match . "/",
        "/",
        "/"
    );
    $expandedLinks = preg_replace($search, $replace, $links);
    return $expandedLinks;
}
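As an alternative to the regular expression in _striplinks, the same extraction step can be sketched with PHP's DOM extension. This is only a sketch under assumptions: `extract_links_dom` is a hypothetical name that does not appear in the original articles, and the DOM extension (ext-dom) is assumed to be enabled. It swaps in an HTML parser for the regex and returns the raw href values, which would still need expanding into full URLs.

```php
<?php
// Hypothetical helper (not from the original articles): collect the href
// value of every <a> element using DOMDocument instead of a regex.
function extract_links_dom(string $html): array
{
    // DOMDocument::loadHTML() rejects an empty string, so return early.
    if ($html === '') {
        return [];
    }

    $doc = new DOMDocument();
    // Real-world HTML is often malformed; keep parser warnings internal.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $links = [];
    foreach ($doc->getElementsByTagName('a') as $anchor) {
        // getAttribute() returns '' when the attribute is absent.
        $href = trim($anchor->getAttribute('href'));
        if ($href !== '') {
            $links[] = $href;
        }
    }
    return $links;
}
```

A parser tends to cope better than a regex with unusual quoting or attribute order, at the cost of loading the whole document into memory.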


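Why does cURL return incomplete data in PHP when file_get_contents returns everything?

Because when a POST body is larger than a size threshold (1024 bytes by default, according to the original answer), the cURL library adds an `Expect: 100-continue` header and waits for the server to confirm before sending the body. A server that does not handle this can leave you with a delayed or incomplete response. So your code needs one more option, which sends an empty `Expect:` header to switch that behavior off: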

curl_setopt($ch, CURLOPT_HTTPHEADER, array('Expect:'));

Here is a more complete wrapper function:

function req_curl($url, &$status = null, $options = array())
{
    $res = '';
    // Defaults; any key passed in $options overrides them
    $options = array_merge(array(
        'follow_local' => true,
        'timeout' => 30,
        'max_redirects' => 4,
        'binary_transfer' => false,
        'include_header' => false,
        'no_body' => false,
        'cookie_location' => dirname(__FILE__) . '/cookie',
        'useragent' => 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)',
        'post' => array(),
        'referer' => null,
        // Certificate checks are off by default here; turn them on for real use
        'ssl_verifypeer' => 0,
        'ssl_verifyhost' => 0,
        // Empty Expect header: do not wait for "100 Continue"
        'headers' => array(
            'Expect:'
        ),
        'auth_name' => '',
        'auth_pass' => '',
        'session' => false
    ), $options);
    $options['url'] = $url;

    $s = curl_init();
    if (!$s) return false;

    curl_setopt($s, CURLOPT_URL, $options['url']);
    curl_setopt($s, CURLOPT_HTTPHEADER, $options['headers']);
    curl_setopt($s, CURLOPT_SSL_VERIFYPEER, $options['ssl_verifypeer']);
    curl_setopt($s, CURLOPT_SSL_VERIFYHOST, $options['ssl_verifyhost']);
    curl_setopt($s, CURLOPT_TIMEOUT, $options['timeout']);
    curl_setopt($s, CURLOPT_MAXREDIRS, $options['max_redirects']);
    curl_setopt($s, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($s, CURLOPT_FOLLOWLOCATION, $options['follow_local']);
    curl_setopt($s, CURLOPT_COOKIEJAR, $options['cookie_location']);
    curl_setopt($s, CURLOPT_COOKIEFILE, $options['cookie_location']);

    // HTTP authentication, only when a user name is given
    if (!empty($options['auth_name']) && is_string($options['auth_name']))
    {
        curl_setopt($s, CURLOPT_USERPWD, $options['auth_name'] . ':' . $options['auth_pass']);
    }
    // Send a POST request when POST fields are given
    if (!empty($options['post']))
    {
        curl_setopt($s, CURLOPT_POST, true);
        curl_setopt($s, CURLOPT_POSTFIELDS, $options['post']);
        //curl_setopt($s, CURLOPT_POSTFIELDS, array('username' => 'aeon', 'password' => '111111'));
    }
    if ($options['include_header'])
    {
        curl_setopt($s, CURLOPT_HEADER, true);
    }
    if ($options['no_body'])
    {
        curl_setopt($s, CURLOPT_NOBODY, true);
    }
    if ($options['session'])
    {
        curl_setopt($s, CURLOPT_COOKIESESSION, true);
        curl_setopt($s, CURLOPT_COOKIE, $options['session']);
    }
    curl_setopt($s, CURLOPT_USERAGENT, $options['useragent']);
    // Only set a referer when one was supplied
    if ($options['referer'] !== null)
    {
        curl_setopt($s, CURLOPT_REFERER, $options['referer']);
    }

    $res = curl_exec($s);
    // Hand the HTTP status code back through the by-reference parameter
    $status = curl_getinfo($s, CURLINFO_HTTP_CODE);
    curl_close($s);
    return $res;
}

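How can PHP get the IP address of a client that fetches a page via cURL or file_get_contents?

Search Baidu for material on anti-scraping. cURL and file_get_contents can imitate a normal user's behavior, and the server obtains their IP in exactly the same way as for an ordinary visitor. The real issue is telling them apart, which has to be done on the client side, typically with JavaScript.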
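To make the point about the IP concrete, here is a minimal sketch of what the server side can read about any requester, whether it is a browser, cURL, or file_get_contents. `describe_requester` is a hypothetical helper, not something from the original answer. It assumes the standard `REMOTE_ADDR` and `HTTP_USER_AGENT` entries of `$_SERVER`, and it takes the array as a parameter so it can be tried without a web server. A missing User-Agent is only a weak hint of a script, since a client can send any value it likes.

```php
<?php
// Hypothetical helper: summarize what the server can see about a requester.
// Pass $_SERVER in a real request; any array works for experiments.
function describe_requester(array $server): array
{
    // REMOTE_ADDR is the address of the connecting peer, read the same
    // way for a browser, cURL, or file_get_contents.
    $ip = $server['REMOTE_ADDR'] ?? '';
    $ip = filter_var($ip, FILTER_VALIDATE_IP) !== false ? $ip : null;

    // Scripts often send no User-Agent unless one is configured, but
    // the header is trivially spoofed, so treat this as a hint only.
    $userAgent = $server['HTTP_USER_AGENT'] ?? '';

    return [
        'ip' => $ip,
        'user_agent' => $userAgent,
        'missing_user_agent' => $userAgent === '',
    ];
}
```

In a page handler this would be called as `describe_requester($_SERVER)`. Behind a proxy or load balancer, `REMOTE_ADDR` is the proxy's address rather than the end client's.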

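PHP cURL compared with file: why?

Why is cURL slower than file_get_contents?

Or is the question something else?

Er, that's a lot of questions and I don't really know either, so just use it. Trust the benchmark results from the experts, or if you're not convinced, run a test yourself. If you still want to dig deeper, read the source code.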
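For anyone who does want to run their own test, a timing harness could look roughly like this. It is only a sketch: `average_seconds` is a hypothetical name, the URL in the commented usage is a placeholder, and a serious comparison would also need warm-up runs and repeated trials, because network jitter usually outweighs the difference between the two functions.

```php
<?php
// Hypothetical helper: average wall-clock seconds per call of $fn.
function average_seconds(callable $fn, int $runs = 5): float
{
    if ($runs < 1) {
        throw new InvalidArgumentException('runs must be at least 1');
    }
    $start = microtime(true);
    for ($i = 0; $i < $runs; $i++) {
        $fn();
    }
    return (microtime(true) - $start) / $runs;
}

// Example usage (placeholder URL, requires network access):
// $url = 'http://example.com/';
// $viaFile = average_seconds(function () use ($url) {
//     file_get_contents($url);
// });
// $viaCurl = average_seconds(function () use ($url) {
//     $ch = curl_init($url);
//     curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
//     curl_exec($ch);
//     curl_close($ch);
// });
// printf("file_get_contents: %.4fs, cURL: %.4fs\n", $viaFile, $viaCurl);
```

Passing closures keeps the harness independent of how each request is made, so the same function can time any variant you want to compare.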
