Skip to content

tomwallace1990/Machete

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

8 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Machete

Machete is a small program which helps chop down overly-long sentences in long documents. It takes in a .docx or .txt file containing prose (structured text) and outputs two results files. The main output is a .csv file containing a list of all the sentences in the input document ranked by length (decending). This allows the user to easily edit the longest sentences in the original document. The second output is a .txt file containing basic statistics such as average sentence length.

Please see 'Install_readme.txt' for more information on using Machete.

Written in Python 3.6.4.

Change-log

1.0. - Initial version.

1.1. - Changed splitter from simple '. ' to regex which significantly increases sentence detection accuracy.

1.2. - Changed splitter regex to increase accuracy and parse non-English unicode characters (which occur in authors names). New regex '(?<!\b\p{Lu}.)(?<=.|?|!)\s+(?=\p{Lu}|"|‘|'|“)' should be near optimal.

1.3. - Added (?<!St.) to the regex string to catch St. such as in 'St. Andrews'.

About

A small program to help cut down long sentences in word documents

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages