Java: reading PDFs and PDF files

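Contents:

1. How to read the contents of a PDF file in Java
2. How to read part of a PDF document with Java
3. How to read PDF file contents with Java
4. How to handle Chinese text when reading PDFs with Java
5. How to read tables in a PDF with Java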

How to read the contents of a PDF file in Java

import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.net.MalformedURLException;
import java.net.URL;
import java.nio.charset.StandardCharsets;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.text.PDFTextStripper;

// Written against PDFBox 2.x (org.apache.pdfbox); the legacy org.pdfbox package is obsolete.
// In PDFBox 3.x, PDDocument.load is replaced by Loader.loadPDF.
public class PdfReader {

    public void readPdf(String file) throws IOException {
        boolean sort = false;             // whether to sort text by position
        int startPage = 1;                // first page to extract
        int endPage = Integer.MAX_VALUE;  // last page to extract
        String textFile;                  // name of the output text file
        PDDocument document = null;       // the PDF document held in memory

        try {
            try {
                // First try to load the argument as a URL
                URL url = new URL(file);
                try (InputStream in = url.openStream()) {
                    document = PDDocument.load(in);
                }
                // Name the new .txt file after the original PDF
                textFile = toTxtName(new File(url.getPath()).getName());
            } catch (MalformedURLException e) {
                // Not a valid URL: load the file from the local file system instead
                document = PDDocument.load(new File(file));
                textFile = toTxtName(file);
            }

            // Write the extracted text to textFile as UTF-8; try-with-resources closes the writer
            try (Writer output = new OutputStreamWriter(
                    new FileOutputStream(textFile), StandardCharsets.UTF_8)) {
                PDFTextStripper stripper = new PDFTextStripper();
                stripper.setSortByPosition(sort);
                stripper.setStartPage(startPage);
                stripper.setEndPage(endPage);
                // writeText extracts the text and writes it to output
                stripper.writeText(document, output);
            }
        } finally {
            if (document != null) {
                document.close();  // close the PDF document
            }
        }
    }

    // Replace the last four characters (e.g. ".pdf") with ".txt"
    private static String toTxtName(String name) {
        return name.length() > 4 ? name.substring(0, name.length() - 4) + ".txt" : name + ".txt";
    }

    public static void main(String[] args) {
        PdfReader pdfReader = new PdfReader();
        try {
            // Extract the contents of SpringGuide.pdf on drive E
            pdfReader.readPdf("E://SpringGuide.pdf");
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}


How to read part of a PDF document with Java

You need the PDFBox API.

Here is an example:

import java.io.File;
import java.io.IOException;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.text.PDFTextStripper;

public class ReadPdfText {
    public static void main(String[] args) {
        // try-with-resources closes the document when done
        try (PDDocument document = PDDocument.load(new File("test.pdf"))) {
            // Only extract text from unencrypted documents
            if (!document.isEncrypted()) {
                PDFTextStripper stripper = new PDFTextStripper();
                String text = stripper.getText(document);
                System.out.println("Text:" + text);
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
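The example above extracts the whole document. To read only part of it, PDFTextStripper can be limited to a page range with setStartPage and setEndPage. The following is a minimal sketch assuming PDFBox 2.x; the class name PdfPageRange and the method extractPageRange are hypothetical names chosen for illustration.

```java
import java.io.File;
import java.io.IOException;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.text.PDFTextStripper;

// Hypothetical helper class (PDFBox 2.x assumed)
public class PdfPageRange {

    /**
     * Returns the text of pages startPage..endPage (1-based, inclusive).
     * An endPage past the last page is fine: the stripper just stops at the end.
     */
    public static String extractPageRange(File pdf, int startPage, int endPage) throws IOException {
        try (PDDocument document = PDDocument.load(pdf)) {
            PDFTextStripper stripper = new PDFTextStripper();
            stripper.setStartPage(startPage);
            stripper.setEndPage(endPage);
            return stripper.getText(document);
        }
    }

    public static void main(String[] args) throws IOException {
        // Example: print only pages 2 to 3 of the file given on the command line
        System.out.println(extractPageRange(new File(args[0]), 2, 3));
    }
}
```

This is the same mechanism as the startPage/endPage settings in the first example, just wrapped so it returns a String instead of writing a file.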

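How to read PDF file contents with Java

You can convert the PDF to Word first and then read the Word file.

The 转转大师 PDF-to-Word converter, a free online tool, is recommended.

Search for it on Baidu and convert online for free; no download or registration is required, which is convenient.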

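How to handle Chinese text when reading PDFs with Java

Use a system font directly to read or create PDFs containing Chinese text, and pay attention to the jar versions: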

<dependency>
    <groupId>com.itextpdf</groupId>
    <artifactId>itextpdf</artifactId>
    <version>5.5.8</version>
</dependency>
<dependency>
    <groupId>com.itextpdf</groupId>
    <artifactId>itext-asian</artifactId>
    <version>5.2.0</version>
</dependency>
<dependency>
    <groupId>com.itextpdf.tool</groupId>
    <artifactId>xmlworker</artifactId>
    <version>5.5.6</version>
</dependency>

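The code is as follows. When XMLWorker converts (X)HTML to PDF, overriding the getFont method of XMLWorkerFontProvider is enough to render Chinese text: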

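import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;

import com.itextpdf.text.BaseColor;
import com.itextpdf.text.Document;
import com.itextpdf.text.DocumentException;
import com.itextpdf.text.Font;
import com.itextpdf.text.pdf.BaseFont;
import com.itextpdf.text.pdf.PdfWriter;
import com.itextpdf.tool.xml.XMLWorkerFontProvider;
import com.itextpdf.tool.xml.XMLWorkerHelper;

public class HtmlToPdf {

    public void createPdf(String src, String dest) throws IOException, DocumentException {
        Document document = new Document();
        PdfWriter writer = PdfWriter.getInstance(document, new FileOutputStream(dest));
        document.open();
        try (InputStream html = new FileInputStream(src)) {
            XMLWorkerHelper.getInstance().parseXHtml(writer, document, html, null,
                    new XMLWorkerFontProvider() {
                        // Ignore the requested font name and always use a Chinese system font
                        @Override
                        public Font getFont(final String fontname, final String encoding,
                                final boolean embedded, final float size, final int style,
                                final BaseColor color) {
                            BaseFont bf = null;
                            try {
                                bf = BaseFont.createFont("C:/Windows/Fonts/SIMYOU.TTF",
                                        BaseFont.IDENTITY_H, BaseFont.NOT_EMBEDDED);
                            } catch (Exception e) {
                                e.printStackTrace();
                            }
                            return new Font(bf, size, style, color);
                        }
                    });
        }
        document.close();
    }
}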

When creating a PDF, simply use a system font (on Windows):

BaseFont baseFont = BaseFont.createFont("C:/Windows/Fonts/SIMYOU.TTF",BaseFont.IDENTITY_H,BaseFont.NOT_EMBEDDED);

Font font = new Font(baseFont);
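As an alternative to hard-coding a Windows font path, the itext-asian dependency listed above supplies CJK font descriptors such as STSong-Light with the UniGB-UCS2-H encoding. The sketch below assumes iText 5 plus itext-asian on the classpath; the class AsianFontPdf and the method createChinesePdf are hypothetical names. Because the font is not embedded, the PDF viewer has to provide a suitable CJK font.

```java
import java.io.FileOutputStream;
import java.io.IOException;

import com.itextpdf.text.Document;
import com.itextpdf.text.DocumentException;
import com.itextpdf.text.Font;
import com.itextpdf.text.Paragraph;
import com.itextpdf.text.pdf.BaseFont;
import com.itextpdf.text.pdf.PdfWriter;

// Hypothetical example class (iText 5 + itext-asian assumed)
public class AsianFontPdf {

    public static void createChinesePdf(String dest) throws IOException, DocumentException {
        // "STSong-Light" and the "UniGB-UCS2-H" CMap are provided by the itext-asian jar
        BaseFont bf = BaseFont.createFont("STSong-Light", "UniGB-UCS2-H", BaseFont.NOT_EMBEDDED);
        Font font = new Font(bf, 12);
        Document document = new Document();
        try (FileOutputStream out = new FileOutputStream(dest)) {
            PdfWriter.getInstance(document, out);
            document.open();
            document.add(new Paragraph("你好，世界（Hello, world）", font));
            document.close();
        }
    }

    public static void main(String[] args) throws Exception {
        createChinesePdf(args.length > 0 ? args[0] : "chinese.pdf");
    }
}
```

The trade-off versus SIMYOU.TTF is portability: no OS-specific path is needed, but the output depends on the reader having a matching font instead of carrying it inside the file.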

How to read tables in a PDF with Java

Using iText

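import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;

import com.itextpdf.text.Rectangle;
import com.itextpdf.text.pdf.PdfReader;
import com.itextpdf.text.pdf.parser.FilteredTextRenderListener;
import com.itextpdf.text.pdf.parser.LocationTextExtractionStrategy;
import com.itextpdf.text.pdf.parser.PdfTextExtractor;
import com.itextpdf.text.pdf.parser.RegionTextRenderFilter;
import com.itextpdf.text.pdf.parser.RenderFilter;
import com.itextpdf.text.pdf.parser.TextExtractionStrategy;

public class PdfRegionParser {
    /**
     * @param pdf path of the PDF file
     * @param txt path of the output text file
     * @throws IOException
     */
    public void parsePdf(String pdf, String txt) throws IOException {
        PdfReader reader = new PdfReader(pdf);
        try (PrintWriter out = new PrintWriter(new OutputStreamWriter(
                new FileOutputStream(txt), StandardCharsets.UTF_8))) {
            // Region in PDF user space: lower-left (70, 80), upper-right (490, 580)
            Rectangle rect = new Rectangle(70, 80, 490, 580);
            RenderFilter filter = new RegionTextRenderFilter(rect);
            // iText page numbers start at 1
            for (int i = 1; i <= reader.getNumberOfPages(); i++) {
                TextExtractionStrategy strategy = new FilteredTextRenderListener(
                        new LocationTextExtractionStrategy(), filter);
                out.println(PdfTextExtractor.getTextFromPage(reader, i, strategy));
            }
        } finally {
            reader.close();
        }
    }
}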

Using PDFBox

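import java.awt.Rectangle;
import java.io.File;
import java.io.IOException;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.text.PDFTextStripperByArea;

public class ExtractTextByArea {
    public static void main(String[] args) throws IOException {
        // PDFBox 2.x opens documents protected only by an owner password with an empty
        // user password automatically, so the 1.x decrypt("") call is no longer needed
        try (PDDocument document = PDDocument.load(new File(args[0]))) {
            PDFTextStripperByArea stripper = new PDFTextStripperByArea();
            stripper.setSortByPosition(true);
            // x, y, width, height in points; y is measured from the top of the page
            Rectangle rect = new Rectangle(10, 280, 275, 60);
            stripper.addRegion("class1", rect);
            PDPage firstPage = document.getPage(0);
            stripper.extractRegions(firstPage);
            System.out.println("Text in the area:" + rect);
            System.out.println(stripper.getTextForRegion("class1"));
        }
    }
}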
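When the table's column and row boundaries are known in advance, the same PDFTextStripperByArea technique can read it cell by cell by registering one region per cell. This is a minimal sketch assuming PDFBox 2.x; the class PdfTableGrid, the method readGrid and the boundary coordinates in main are hypothetical and would have to be measured from the actual PDF. Coordinates are in points with y measured from the top of the page.

```java
import java.awt.geom.Rectangle2D;
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.text.PDFTextStripperByArea;

// Hypothetical helper class (PDFBox 2.x assumed)
public class PdfTableGrid {

    /** Reads a grid whose cell edges are given by colEdges (x) and rowEdges (y from top). */
    public static List<List<String>> readGrid(PDDocument document, int pageIndex,
                                              float[] colEdges, float[] rowEdges) throws IOException {
        PDFTextStripperByArea stripper = new PDFTextStripperByArea();
        stripper.setSortByPosition(true);
        // Register one region per cell, named "r<row>c<col>"
        for (int r = 0; r + 1 < rowEdges.length; r++) {
            for (int c = 0; c + 1 < colEdges.length; c++) {
                Rectangle2D cell = new Rectangle2D.Float(colEdges[c], rowEdges[r],
                        colEdges[c + 1] - colEdges[c], rowEdges[r + 1] - rowEdges[r]);
                stripper.addRegion("r" + r + "c" + c, cell);
            }
        }
        PDPage page = document.getPage(pageIndex);
        stripper.extractRegions(page);
        // Collect the trimmed text of each cell, row by row
        List<List<String>> rows = new ArrayList<>();
        for (int r = 0; r + 1 < rowEdges.length; r++) {
            List<String> row = new ArrayList<>();
            for (int c = 0; c + 1 < colEdges.length; c++) {
                row.add(stripper.getTextForRegion("r" + r + "c" + c).trim());
            }
            rows.add(row);
        }
        return rows;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical layout: 2 columns x 2 rows near the top of the first page
        float[] cols = {50, 200, 350};
        float[] rows = {80, 110, 140};
        try (PDDocument document = PDDocument.load(new File(args[0]))) {
            for (List<String> row : readGrid(document, 0, cols, rows)) {
                System.out.println(String.join(" | ", row));
            }
        }
    }
}
```

This only suits fixed layouts; if row heights vary from page to page, the boundaries have to be recomputed for each page.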
