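Contents:
- 1. Sending HTTP requests with Jsoup: GET and POST, each with and without parameters
- 2. Looking for Java code that really works for simulating a Sina Weibo login, so the pages can then be scraped with Jsoup. Urgent! Points awarded as soon as the login succeeds!
- 3. Using Jsoup in Java to log in to a page protected by a CAPTCHA and get the post-login cookie
- 4. When scraping web pages with Jsoup or HttpClient, why do data, userAgent, cookie(), timeout() and post() need to be set?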
Sending HTTP requests with Jsoup: GET and POST, each with and without parameters
<dependency>
    <groupId>org.jsoup</groupId>
    <artifactId>jsoup</artifactId>
    <version>1.10.3</version>
</dependency>
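import org.jsoup.Connection;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public void JsoupGet() throws Exception {
    // The URL is truncated; replace it with the full address of the target page
    Connection connect = Jsoup.connect(";password=lisi");
    // Parameters start (omit these two calls to send the request without parameters)
    connect.data("username", "zhangsan");
    connect.data("password", "lisi");
    // Parameters end
    Document document = connect.get();
    System.out.println(document.toString());
}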
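public void JsoupPost() throws Exception {
    // The URL is truncated; replace it with the full address of the target page
    Connection connect = Jsoup.connect(";password=lisi");
    // Parameters start (omit these two calls to send the request without parameters)
    connect.data("username", "zhangsan");
    connect.data("password", "lisi");
    // Parameters end
    Document document = connect.post();
    System.out.println(document.toString());
}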
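The only difference between the two methods above is where the data(...) pairs travel: for a GET request they are appended to the URL as a query string, and for a plain POST they are sent in the request body as form data. The sketch below uses only the JDK to illustrate that encoding step. The class name FormEncodingSketch, the encode helper and the http://example.com/login address are illustrative assumptions, not part of Jsoup or of the original answer.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper, not a Jsoup class.
final class FormEncodingSketch {

    // Encodes name/value pairs as application/x-www-form-urlencoded text.
    // Requires Java 10+ for URLEncoder.encode(String, Charset).
    static String encode(Map<String, String> params) {
        StringJoiner joined = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            joined.add(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                    + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return joined.toString();
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("username", "zhangsan");
        params.put("password", "lisi");
        String encoded = encode(params);
        // GET: the pairs become the query string of the URL (placeholder address).
        System.out.println("GET  http://example.com/login?" + encoded);
        // POST: the same text is sent as the request body instead.
        System.out.println("POST body: " + encoded);
    }
}
```

Because a GET puts the values in the URL, a password sent that way can end up in browser history and server logs, which is one reason login forms normally use POST.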
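Looking for Java code that really works for simulating a Sina Weibo login, so the pages can then be scraped with Jsoup. Urgent! Points awarded as soon as the login succeeds!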
package jsoupTest;

import java.io.IOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import org.jsoup.Connection.Method;
import org.jsoup.Connection.Response;
import org.jsoup.Jsoup;

public class JsoupTest {

    public static void main(String[] args) throws IOException {
        Map<String, String> map = new HashMap<>();
        // Fill the map with map.put(name, value), using the cookies of your own logged-in Weibo session
        // The argument below is a placeholder for the home page id of the user to fetch
        Response res = Jsoup.connect("<home page id of the target user>")
                .cookies(map).method(Method.GET).execute();
        String s = res.body();
        System.out.println(s);
        // The page content is delivered as a series of FM.view(...) script blocks
        String[] ss = s.split("<script>FM.view");
        int i = 0;
        // pl_content_homeFeed
        // pl.content.homeFeed.index
        List<String> list = new ArrayList<>();
        for (String x : ss) {
            // System.out.println(i++ + "======================================");
            // System.out.println(x.substring(0,
            // x.length() > 200 ? 200 : x.length()));
            // System.out.println("===========================================");
            // Keep only the blocks that carry an "html" payload
            if (x.contains("\"html\":\"")) {
                String value = getHtml(x);
                list.add(value);
                System.out.println(value);
            }
        }
        // content=ss[8].split("\"html\":\"")[1].replaceAll("(\\\\t|\\\\n)",
        // "").replaceAll("\\\\\"", "\"").replaceAll("\\\\/", "/");
        // content=content.substring(0,
        // content.length()<=13?content.length():content.length()-13);
        // System.out.println(Native2AsciiUtils.ascii2Native(content));
    }

    public static String getHtml(String s) {
        // Take the text after "html":" and undo the JSON escaping of tabs, newlines, quotes and slashes
        String content = s.split("\"html\":\"")[1]
                .replaceAll("(\\\\t|\\\\n)", "").replaceAll("\\\\\"", "\"")
                .replaceAll("\\\\/", "/");
        // Drop the last 13 characters when the string is longer than that
        content = content.substring(0,
                content.length() <= 13 ? content.length()
                        : content.length() - 13);
        // Native2AsciiUtils is a helper class that is not shown here
        return Native2AsciiUtils.ascii2Native(content);
    }
}
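Native2AsciiUtils is not part of the JDK or of Jsoup, and its source is not included in the answer. Judging only by the name, ascii2Native presumably turns \uXXXX escapes back into the characters they stand for. The following is a minimal stand-in written under that assumption; the class name UnicodeUnescapeSketch is hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for the missing Native2AsciiUtils helper.
final class UnicodeUnescapeSketch {

    // Matches a backslash, the letter u, and exactly four hexadecimal digits.
    private static final Pattern ESCAPE = Pattern.compile("\\\\u([0-9a-fA-F]{4})");

    // Replaces every four-digit unicode escape in the text with its character.
    static String ascii2Native(String text) {
        Matcher m = ESCAPE.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            char decoded = (char) Integer.parseInt(m.group(1), 16);
            // quoteReplacement guards against a decoded '$' or backslash
            m.appendReplacement(out, Matcher.quoteReplacement(String.valueOf(decoded)));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // The doubled backslashes keep the escapes as literal text in the input string.
        System.out.println(ascii2Native("<p>\\u4f60\\u597d</p>"));
    }
}
```

Text without any escapes passes through unchanged, so the helper can be applied to every extracted fragment.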
Using Jsoup in Java to log in to a page protected by a CAPTCHA and get the post-login cookie
First, the jar dependencies:
<dependency>
    <groupId>org.seleniumhq.selenium</groupId>
    <artifactId>selenium-java</artifactId>
    <!-- version range: picks up the latest available release from 3.0.1 upward -->
    <version>[3.0.1,)</version>
</dependency>
<dependency>
    <groupId>org.jsoup</groupId>
    <artifactId>jsoup</artifactId>
    <version>1.8.2</version>
    <type>jar</type>
</dependency>
Code:
public static void getIndex2() {
    // Earlier runs left a large number of chromedriver processes behind, and I could not work out why.
    // A search suggested that this service helps, so I am trying it here; whether it actually works is unknown.
    ChromeDriverService service = new ChromeDriverService.Builder()
            .usingDriverExecutable(new File("./driver/chromedriver.exe"))
            .usingAnyFreePort()
            .build();
    try {
        service.start();
    } catch (IOException ex) {
        Logger.getLogger(kechengbiaoIndex.class.getName()).log(Level.SEVERE, null, ex);
    }
    // end of the service setup
    // The actual work starts here.
    // Define the browser driver first. I use Chrome: download chromedriver.exe, which has to be loaded at startup.
    System.getProperties().setProperty("webdriver.chrome.driver", "./driver/chromedriver.exe");
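Whatever tool performs the login, Jsoup ultimately needs the session cookies as name/value pairs, for example through the cookies(map) call used in the Weibo answer. As a rough sketch of that hand-off, the helper below splits a raw Cookie header string, such as one copied from the browser's developer tools, into a map. CookieHeaderSketch, parse and the sample cookie names are hypothetical; nothing here is taken from the listing above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for turning a copied Cookie header into a map.
final class CookieHeaderSketch {

    // Splits "name1=value1; name2=value2" into an ordered name -> value map.
    static Map<String, String> parse(String header) {
        Map<String, String> cookies = new LinkedHashMap<>();
        for (String pair : header.split(";")) {
            int eq = pair.indexOf('=');
            if (eq <= 0) {
                continue; // skip empty fragments and entries without a name
            }
            String name = pair.substring(0, eq).trim();
            // Everything after the first '=' is the value, even if it contains '='
            String value = pair.substring(eq + 1).trim();
            if (!name.isEmpty()) {
                cookies.put(name, value);
            }
        }
        return cookies;
    }

    public static void main(String[] args) {
        // Sample header with made-up cookie names and values.
        System.out.println(parse("sessionId=abc123; token=x=y"));
    }
}
```

The resulting map could then be passed to Jsoup.connect(url).cookies(map) before calling get() or post().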
When scraping web pages with Jsoup or HttpClient, why do data, userAgent, cookie(), timeout() and post() need to be set?
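userAgent makes the request look to the server more like a real browser. Whether you need cookie depends on whether the server requires one. Does timeout really need explaining? If you don't set it, a default timeout applies. As for the other two, data supplies the request parameters and post() sends the request as a POST.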
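To make the role of each setting concrete without depending on either library, here is a sketch that uses the JDK's own HttpURLConnection, which exposes the same knobs under different names. This is a swapped-in illustration, not Jsoup or HttpClient code; the address, the user-agent string, the cookie value and the class name RequestSettingsSketch are all placeholders. Nothing is sent over the network, the connection object is only configured.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URI;

// Hypothetical class showing the JDK counterparts of the settings in question.
final class RequestSettingsSketch {

    static HttpURLConnection configure(String url) {
        try {
            // openConnection() only creates the object; no network traffic happens yet.
            HttpURLConnection conn =
                    (HttpURLConnection) URI.create(url).toURL().openConnection();
            // Counterpart of userAgent(): identify the client as a browser (placeholder string).
            conn.setRequestProperty("User-Agent", "Mozilla/5.0 (placeholder browser string)");
            // Counterpart of cookie(): only needed when the server expects a session cookie.
            conn.setRequestProperty("Cookie", "sessionId=placeholder");
            // Counterpart of timeout(): for HttpURLConnection the default of 0 means wait forever.
            conn.setConnectTimeout(5_000);
            conn.setReadTimeout(10_000);
            // Counterpart of post(): switch the method and allow a request body.
            conn.setRequestMethod("POST");
            conn.setDoOutput(true);
            return conn;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        HttpURLConnection conn = configure("http://example.com/login");
        System.out.println(conn.getRequestMethod() + " with User-Agent "
                + conn.getRequestProperty("User-Agent"));
    }
}
```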