New feature: uppercase first letter in sentence, code cleanup#20

Open
ivoras wants to merge 2 commits into oliverguhr:main from ivoras:main

ivoras commented Aug 19, 2024

Major change: making the first letter in the word following a "." or "?" uppercase (optional, defaults to off).
Minor changes: code cleanup, whitespace removal.
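
For readers following along, here is a minimal sketch of the behaviour described above: uppercasing the first letter of the word that follows a "." or "?". It is not the code from this PR; the function name `capitalize_after_punctuation` and the regex approach are assumptions made for illustration only.

```python
import re

# Hypothetical helper for illustration; not taken from the patch.
# Group 1: a '.' or '?' followed by whitespace. Group 2: the next word character.
_AFTER_SENTENCE_END = re.compile(r"([.?]\s+)(\w)")


def capitalize_after_punctuation(text: str) -> str:
    """Uppercase the first letter of each word that follows '.' or '?'."""
    return _AFTER_SENTENCE_END.sub(
        lambda m: m.group(1) + m.group(2).upper(), text
    )


print(capitalize_after_punctuation("how are you? fine. thanks."))
```

Note that this sketch leaves the very first letter of the text untouched and ignores "!", since only "." and "?" are mentioned in the description.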

@oliverguhr (Owner)

Thanks for the PR! I always had the idea to add true casing to the model.

However, I see an issue here. For example, given the following text:

this is an test my name is oliver

the output would be:

this is an test. My name is oliver.

This true casing would only work after a "." or "?", not at the beginning of a sentence, and not with "!", as we don't detect them.

ivoras (Author) commented Aug 21, 2024

> This true casing would only work after a "." or "?", not at the beginning of a sentence, and not with "!", as we don't detect them.

I don't know what you mean by "!", since the patch doesn't use it. I had also noticed that it doesn't capitalise the first sentence of the text, so I've updated the patch.

I know this is not proper true-casing as that would probably involve also applying it to possible names inside sentences, but it's good enough for my needs. There's a model on HF that attempts to do that (1-800-BAD-CODE/xlm-roberta_punctuation_fullstop_truecase) but it's too buggy.
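
The updated behaviour discussed in this thread (capitalising the start of the text as well as the word after "." or "?") could look roughly like the sketch below. This is an assumption about the intended result, not the patch itself; `capitalize_sentences` is a hypothetical name, and names inside a sentence are deliberately left alone, so it is not full true-casing.

```python
import re

# Hypothetical sketch, not the code from the patch.
# Group 1: either the start of the text (plus any leading whitespace)
#          or a '.' / '?' followed by whitespace.
# Group 2: the first word character of the sentence.
_SENTENCE_START = re.compile(r"(^\s*|[.?]\s+)(\w)")


def capitalize_sentences(text: str) -> str:
    """Uppercase the first letter of the text and of each word after '.' or '?'."""
    return _SENTENCE_START.sub(
        lambda m: m.group(1) + m.group(2).upper(), text
    )


# The example from the review, after punctuation has been restored:
print(capitalize_sentences("this is an test. my name is oliver."))
```

As in the discussion, "!" is not treated as a sentence boundary here, and "oliver" stays lowercase because only sentence-initial letters are changed.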
