Inconsistent sentence splitting when sentence boundary is defined by r'[}\]]\.\s[A-Z]' #16

@chguenther

Here is an example text that fails:
'''Ms. X was admitted on date]. Ultrasound
at the time of admission demonstrated pancreatic duct dilatation and
edematous gallbladder. She was admitted to the ICU.'''
This text should be split into three sentences, but it is split into only two: no boundary is detected between the first and second sentences, i.e., between "]." and "Ultrasound".
The same behavior occurs when the "]" is replaced by "}". Splitting works properly when the "]" is replaced by ")". No other characters were tried.

Here are some observations that might help to pinpoint the issue.

  1. It works properly with this text:
    '''Ms. X was admitted on date]. Ultrasound
    at the time of admission demonstrated pancreatic duct dilatation and
    edematous gallbladder.'''
    Note that this text does not contain the third sentence.
  2. It works properly when a "The" is inserted at the beginning of the second sentence before "Ultrasound" to obtain this text:
    '''Ms. X was admitted on date]. The Ultrasound
    at the time of admission demonstrated pancreatic duct dilatation and
    edematous gallbladder. She was admitted to the ICU.'''
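
As a sanity check on the pattern itself, here is a minimal sketch that runs the boundary expression from the title against the texts above using Python's standard `re` module. It does not call the sentence splitter at all; the helper name `boundary_matches` and the variable names are made up for illustration, and the splitter's own rule engine may evaluate the pattern differently.

```python
import re

# Boundary pattern exactly as quoted in the issue title.
BOUNDARY = re.compile(r'[}\]]\.\s[A-Z]')

# The failing example text from the report.
FAILING = '''Ms. X was admitted on date]. Ultrasound
at the time of admission demonstrated pancreatic duct dilatation and
edematous gallbladder. She was admitted to the ICU.'''


def boundary_matches(text):
    """Return the substrings matched by the boundary pattern (hypothetical helper)."""
    return [m.group(0) for m in BOUNDARY.finditer(text)]


# Variants described in the report, derived from the failing text.
variants = {
    "failing": FAILING,
    "curly brace": FAILING.replace("]", "}"),
    "parenthesis": FAILING.replace("]", ")"),
    "no third sentence": FAILING.replace(" She was admitted to the ICU.", ""),
    "with 'The'": FAILING.replace("]. Ultrasound", "]. The Ultrasound"),
}

for name, text in variants.items():
    print(f"{name}: {boundary_matches(text)}")
```

With plain `re`, the pattern matches once in every "]" and "}" variant, including the failing one, and does not match the ")" variant at all. That suggests the ")" case is handled by a different rule, and that the missing split may come from how rules interact rather than from this pattern failing to match, though this is only an assumption until checked against the splitter itself.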
