You can use preg_split()
combined with a PCRE lookahead condition to split the string after each occurance of .
, ;
, :
, ?
, !
, .. while keeping the actual punctuation intact:
Code:
$subject = 'abc sdfs. def ghi; this is [email protected]! asdasdasd? abc xyz';
// split on whitespace between sentences preceded by a punctuation mark
$result = preg_split('/(?<=[.?!;:])s+/', $subject, -1, PREG_SPLIT_NO_EMPTY);
print_r($result);
Result:
Array
(
[0] => abc sdfs.
[1] => def ghi;
[2] => this is [email protected]!
[3] => asdasdasd?
[4] => abc xyz
)
You can also add a blacklist for abbreviations (Mr., Mrs., Dr., ..) that should not be split into own sentences by inserting a negative lookbehind assertion:
$subject = 'abc sdfs. Dr. Foo said he is not a sentence; asdasdasd? abc xyz';
// split on whitespace between sentences preceded by a punctuation mark
$result = preg_split('/(?<!Mr.|Mrs.|Dr.)(?<=[.?!;:])s+/', $subject, -1, PREG_SPLIT_NO_EMPTY);
print_r($result);
Result:
Array
(
[0] => abc sdfs.
[1] => Dr. Foo said he is not a sentence;
[2] => asdasdasd?
[3] => abc xyz
)
与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…