I want to split a text into sentences using JavaScript.
So far I have used this code:
str.replace(/([.?!])\s*(?=[A-Z])/g, "$1|").split("|");
The problem is that a text such as:
This application claims the benefit of U.S. Prov. Pat. App. No.. This is a nice day. Today we are having a lot of fun.
was split into:
This application claims the benefit of U.,S., Prov., Pat., App., No.,., This is a nice day., Today we are having a lot of fun.,
because of the dots contained in U.S., Prov., App. and No.
So I used this instead:
str.replace(/([.?!])\s+(?=[A-Z])/g, "$1|").split("|");
Using \s+ instead of \s* solved my problem only partially.
U.S. was no longer split up, but the other abbreviations still were:
This application claims the benefit of U.S. Prov., Pat., App., No.,., This is a nice day., Today we are having a lot of fun.,
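To make the difference between the two patterns easier to see, here is a minimal sketch that runs both on the sample text. The helper name splitWith is hypothetical and not part of my original code; the pieces it produces may not match the output quoted above character for character, it only illustrates why \s* cuts inside "U.S." while \s+ does not.

```javascript
// Sample text from the question.
const text =
  "This application claims the benefit of U.S. Prov. Pat. App. No.. " +
  "This is a nice day. Today we are having a lot of fun.";

// Hypothetical helper: mark every matched boundary with "|" and split there.
// Assumes the input itself never contains a "|" character.
function splitWith(boundary, str) {
  return str.replace(boundary, "$1|").split("|");
}

// \s* also matches zero whitespace characters, so the dot in "U." followed
// directly by "S" counts as a boundary and "U.S." is cut in two.
const loose = splitWith(/([.?!])\s*(?=[A-Z])/g, text);

// \s+ requires at least one whitespace character, so "U.S." stays intact,
// but "Prov. Pat. App." is still cut at every dot + space + capital letter.
const strict = splitWith(/([.?!])\s+(?=[A-Z])/g, text);

console.log(loose);
console.log(strict);
```

Neither pattern can tell an abbreviation followed by a capitalized word from a real sentence end, which is the part I am still stuck on.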
Is there a way to recognize abbreviations?
I cannot define them all manually, because my input text varies and I could get anything as input.
Asked by Mauritius (translated from Stack Overflow)