
Sentence break iterator, poor results with period followed by semicolon.

Description

The sentence break iterator gives incorrect results when a period ‘.’ is followed by a semicolon, a space, and a capital letter.

Text to be used:
Give me you phone no.; I will call up and check on you tomorrow. Sam and I will go to office today.

Actual result: when the period ‘.’ after "no" is followed by a semicolon, a space, and a capital letter, the Sentence Breaker reports "I will call up and check on you tomorrow." as the 2nd sentence.

Expectation: Sentence Breaker should consider "Sam and I will go to office today." as the 2nd sentence.
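
The expectation above can be spelled out as sentence-start offsets in the sample text. The sketch below is only an illustration of the expected result: `expectedSentenceStarts` is a hypothetical helper using a deliberately simple heuristic (break after a period only when it is directly followed by whitespace and then an uppercase letter). It is not ICU's rule set and does not use the ICU API. Its offsets could be compared against the boundaries returned by the break iterator.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper, NOT ICU's algorithm: returns the offsets at which
// sentences are expected to start. A break is placed after a '.' only when
// the period is directly followed by whitespace and then an uppercase letter,
// so the ".;" in "no.; I" does not end a sentence.
std::vector<std::size_t> expectedSentenceStarts(const std::string& text) {
    std::vector<std::size_t> starts;
    if (text.empty()) {
        return starts;
    }
    starts.push_back(0);  // the first sentence always starts at offset 0
    for (std::size_t i = 0; i + 1 < text.size(); ++i) {
        if (text[i] != '.') {
            continue;
        }
        // Skip the whitespace that immediately follows the period.
        std::size_t j = i + 1;
        while (j < text.size() &&
               std::isspace(static_cast<unsigned char>(text[j]))) {
            ++j;
        }
        // Require at least one whitespace character and an uppercase letter.
        if (j > i + 1 && j < text.size() &&
            std::isupper(static_cast<unsigned char>(text[j]))) {
            starts.push_back(j);
        }
    }
    return starts;
}
```

Called on the sample text from this report, the helper yields two starts: the beginning of the text and the beginning of "Sam and I will go to office today.".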

Sample Code:

#include <stdio.h>
#include <unicode/brkiter.h>
#include <stdlib.h>

U_CFUNC int c_main(void);

// Forward declaration; the definition follows main().
// Note: printTextRange() and c_main() are not included in this report.
void printEachForward( BreakIterator& boundary);

/* Creating and using Sentence boundaries */
int main( void )
{
    puts("ICU Break Iterator Sample Program\n");
    BreakIterator* boundary;
    UnicodeString stringToExamine("Give me you phone no.; I will call up and check on you tomorrow. Sam and I will go to office today.");

    // print each sentence in forward order
    UErrorCode status = U_ZERO_ERROR;
    boundary = BreakIterator::createSentenceInstance( Locale::getUS(), status );
    if (U_FAILURE(status)) {
        printf("failed to create sentence break iterator. status = %s", u_errorName(status));
        exit(1);
    }

    boundary->setText(stringToExamine);
    puts("\n Sentence Boundaries... ");
    printEachForward(*boundary);
    delete boundary;

    puts("\nEnd C++ Break Iteration");

    // Call the C version
    return c_main();
}

/* Print each element in order: */
void printEachForward( BreakIterator& boundary)
{
    int32_t start = boundary.first();
    for (int32_t end = boundary.next();
         end != BreakIterator::DONE;
         start = end, end = boundary.next())
    {
        printTextRange( boundary, start, end );
    }
}

Environment:
Status:
Assignee: Andy Heninger
Reporter: TracBot
Labels:
Time Needed: Days
tracCc: mark,ovinod_nair@50cd1a9a18375803,pedberg
tracCreated: Mar 31, 2011, 8:48 AM
tracOwner: andy
tracProject: all
tracReporter: rupichd@50cd1a9a18375803
tracStatus: accepted
tracWeeks: 0.5
Components:
Priority: medium