您的位置:

Python 中的语法和拼写检查器

在下面的教程中,我们将讨论一个名为语言工具的 Python 包,并了解如何使用 Python 编程语言创建一个简单的语法和拼写检查器。

所以,让我们开始吧。

理解 Python 语言工具库

LanguageTool 是一个用于语法和拼写检查的开源工具,也被称为 OpenOffice 的拼写检查器。这个包允许程序员通过 Python 代码片段或命令行界面来检测语法和拼写错误。

如何安装语言工具库?

要安装 Python 库,我们需要‘pip’,一个管理从可信公共存储库中安装模块所需的包的框架。一旦我们有了【画中画】,我们就可以使用来自窗口命令提示符(CMD)或终端的命令安装语言工具库,如下所示:

语法:


$ pip install language-tool-python

language_tool_python 库会默认下载一个 LanguageTool 服务器作为 JAR 文件,并在后台执行,在本地检测语法错误。但是 LanguageTool 也提供了支持的公共 HTTP 校对 API 然而,通话次数是有限制的。

验证安装

一旦安装了库,我们可以通过创建一个空的 Python 程序文件并编写一个 import 语句来验证它,如下所示:

文件:验证. py


import language_tool_python

现在,保存上述文件,并在终端中使用以下命令执行它:

语法:


$ python verify.py

如果上述 Python 程序文件没有返回任何错误,则库安装正确。但是,在出现异常的情况下,请尝试重新安装库,并且还建议参考模块的官方文档。

使用 Python 语言工具库

在下一节中,我们将使用一个实际的例子来理解 Python 中语言工具库的工作。下面的 Python 脚本演示了语法错误的检测和纠正。我们将使用以下文本:

以上文本包含一些用粗体突出显示的语法和拼写错误。让我们考虑下面的 Python 脚本来理解语言工具**实用程序的工作原理:

示例:


# importing the package
import language_tool_python

# using the tool
my_tool = language_tool_python.LanguageTool('en-US')

# given text
my_text = """LanguageTool provides utility to check grammar and spelling errors. We just have to paste the text here and click the 'Check Text' button. Click the colored phrases for for information on potential errors. or we can use this text too see an some of the issues that LanguageTool can dedect. Whot do someone thinks of grammar checkers? Please not that they are not perfect. Style problems get a blue marker: It is 7 P.M. in the evening. The weather was nice on Monday, 22 November 2021""" 

# getting the matches
my_matches = my_tool.check(my_text)

# printing matches
print(my_matches)

输出:

[Match({'ruleId': 'ENGLISH_WORD_REPEAT_RULE', 'message': 'Possible typo: you repeated a word', 'replacements': ['for'], 'offsetInContext': 43, 'context': "...Text' button. Click the colored phrases for for information on potential errors. or we ...", 'offset': 165, 'errorLength': 7, 'category': 'MISC', 'ruleIssueType': 'duplication', 'sentence': 'Click the colored phrases for for information on potential errors.'}), Match({'ruleId': 'UPPERCASE_SENTENCE_START', 'message': 'This sentence does not start with an uppercase letter.', 'replacements': ['Or'], 'offsetInContext': 43, 'context': '...or for information on potential errors. or we can use this text too see an some of...', 'offset': 206, 'errorLength': 2, 'category': 'CASING', 'ruleIssueType': 'typographical', 'sentence': 'or we can use this text too see an some of the issues that LanguageTool can dedect.'}), Match({'ruleId': 'TOO_TO', 'message': 'Did you mean "to see"?', 'replacements': ['to see'], 'offsetInContext': 43, 'context': '...tential errors. or we can use this text too see an some of the issues that LanguageTool...', 'offset': 230, 'errorLength': 7, 'category': 'CONFUSED_WORDS', 'ruleIssueType': 'misspelling', 'sentence': 'or we can use this text too see an some of the issues that LanguageTool can dedect.'}), Match({'ruleId': 'EN_A_VS_AN', 'message': 'Use "a" instead of 'an' if the following word doesn't start with a vowel sound, e.g. 'a sentence', 'a university'.', 'replacements': ['a'], 'offsetInContext': 43, 'context': '...errors. or we can use this text too see an some of the issues that LanguageTool ca...', 'offset': 238, 'errorLength': 2, 'category': 'MISC', 'ruleIssueType': 'misspelling', 'sentence': 'or we can use this text too see an some of the issues that LanguageTool can dedect.'}), Match({'ruleId': 'MORFOLOGIK_RULE_EN_US', 'message': 'Possible spelling mistake found.', 'replacements': ['detect', 'defect', 'deduct', 'deject'], 'offsetInContext': 43, 'context': '...ome of the issues that LanguageTool can dedect. Whot do someone thinks of grammar chec...', 'offset': 282, 'errorLength': 6, 'category': 'TYPOS', 'ruleIssueType': 'misspelling', 'sentence': 'or we can use this text too see an some of the issues that LanguageTool can dedect.'}), Match({'ruleId': 'MORFOLOGIK_RULE_EN_US', 'message': 'Possible spelling mistake found.', 'replacements': ['Who', 'What', 'Shot', 'Whom', 'Hot', 'WHO', 'Whet', 'Whit', 'Whoa', 'Whop', 'WHT', 'Wot', 'W hot'], 'offsetInContext': 43, 'context': '...he issues that LanguageTool can dedect. Whot do someone thinks of grammar checkers? ...', 'offset': 290, 'errorLength': 4, 'category': 'TYPOS', 'ruleIssueType': 'misspelling', 'sentence': 'Whot do someone thinks of grammar checkers?'}), Match({'ruleId': 'PLEASE_NOT_THAT', 'message': 'Did you mean "note"?', 'replacements': ['note'], 'offsetInContext': 43, 'context': '...eone thinks of grammar checkers? Please not that they are not perfect. Style proble...', 'offset': 341, 'errorLength': 3, 'category': 'TYPOS', 'ruleIssueType': 'misspelling', 'sentence': 'Please not that they are not perfect.'}), Match({'ruleId': 'PM_IN_THE_EVENING', 'message': 'This is redundant. Consider using "P.M."', 'replacements': ['P.M.'], 'offsetInContext': 43, 'context': '...yle problems get a blue marker: It is 7 P.M. in the evening. The weather was nice on Monday, 22 Nov...', 'offset': 414, 'errorLength': 19, 'category': 'REDUNDANCY', 'ruleIssueType': 'style', 'sentence': 'Style problems get a blue marker: It is 7 P.M. in the evening.'})]

说明:

在上面的代码片段中,我们已经导入了所需的库,并定义了一个工具,该工具使用 LanguageTool 实用程序来检查文本中的语法和拼写错误。然后,我们定义了另一个字符串变量来存储我们想要检查的文本段落。然后,我们使用 check() 功能检索匹配,并为用户打印它们。

因此,我们可以观察到我们有一个详细的字典,显示了规则标识、消息、替换、偏移上下文、上下文、偏移等等。我们可以在语言工具社区找到每个规则标识的详细解释。

既然我们已经发现了错误,是时候纠正它们了。让我们考虑下面演示相同内容的 Python 脚本:

示例:


# importing the package
import language_tool_python

# using the tool
my_tool = language_tool_python.LanguageTool('en-US')

# given text
my_text = """LanguageTool provides utility to check grammar and spelling errors. We just have to paste the text here and click the 'Check Text' button. Click the colored phrases for for information on potential errors. or we can use this text too see an some of the issues that LanguageTool can dedect. Whot do someone thinks of grammar checkers? Please not that they are not perfect. Style problems get a blue marker: It is 7 P.M. in the evening. The weather was nice on Monday, 22 November 2021""" 

# getting the matches
my_matches = my_tool.check(my_text)

# defining some variables
myMistakes = []
myCorrections = []
startPositions = []
endPositions = []

# using the for-loop
for rules in my_matches:
    if len(rules.replacements) > 0:
        startPositions.append(rules.offset)
        endPositions.append(rules.errorLength + rules.offset)
        myMistakes.append(my_text[rules.offset : rules.errorLength + rules.offset])
        myCorrections.append(rules.replacements[0])

# creating new object
my_NewText = list(my_text) 

# rewriting the correct passage
for n in range(len(startPositions)):
    for i in range(len(my_text)):
        my_NewText[startPositions[n]] = myCorrections[n]
        if (i > startPositions[n] and i < endPositions[n]):
            my_NewText[i] = ""

my_NewText = "".join(my_NewText)

# printing the text
print(my_NewText)

输出:

LanguageTool provides utility to check grammar and spelling errors. We just have to paste the text here and click the 'Check Text' button. Click the colored phrases for information on potential errors. Or we can use this text to see a some of the issues that LanguageTool can detect. Who do someone thinks of grammar checkers? Please note that they are not perfect. Style problems get a blue marker: It is 7 P.M.. The weather was nice on Monday, 22 November 2021

说明:

在上面的代码片段中,我们包含了一些新的变量来处理错误、更正、开始位置和结束位置。然后,我们使用循环的来遍历 my_matches 中的规则,并用它们的更正替换错误。然后,我们将这些更正的文本存储在一个列表中。最后,我们再次使用的循环来遍历列表中的字符串元素,将它们连接在一起,并为用户打印结果文本。

因此,我们成功地纠正了在前面的代码片段中发现的错误。

现在,让我们使用下面的 Python 脚本来观察我们之前捕获的错误以及它们各自的更正:

示例:


print(list(zip(myMistakes, myCorrections)))

输出:

[('for for', 'for'), ('or', 'Or'), ('too see', 'to see'), ('an', 'a'), ('dedect', 'detect'), ('Whot', 'Who'), ('not', 'note'), ('P.M. in the evening', 'P.M.')]

说明:

在上面的代码片段中,我们打印了文本中的错误列表及其相应的更正。

自动将建议应用于文本

让我们考虑一个简单的例子,演示如何使用 Python 中的语言工具库将建议自动应用于文本。

示例:


# importing the library
import language_tool_python

# creating the tool
my_tool = language_tool_python.LanguageTool('en-US')

# given text
my_text = 'A quick broun fox jumpps over a a little lazy dog.'

# correction
correct_text = my_tool.correct(my_text)

# printing some texts
print("Original Text:", my_text)
print("Text after correction:", correct_text)

输出:

Original Text: A quick broun fox jumpps over a a little lazy dog.
Text after correction: A quick brown fox jumps over a little lazy dog.

说明:

在上面的代码片段中,我们已经导入了所需的库,并为指定语言为美国英语的语言工具定义了工具。然后我们定义了一个字符串变量,并在其中存储了一些文本。然后,我们使用工具的纠正()功能自动纠正文本中的错误,并为用户打印结果文本。