Friday, 26 December 2014

JS RegEx to split text into sentences [duplicate]







I'm having a little difficulty with a regex for JavaScript.


Here's my fiddle: http://ift.tt/1GVDtZA


The function I have created is:



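var splitSentences = function(text) {
  // Each match is a run of non-terminator characters plus one terminator,
  // optionally wrapped in parentheses. match() returns null on no match.
  var messy = text.match(/\(?[^\.\?\!]+[\.!\?]\)?/g) || [];
  var clean = [];
  for (var i = 0; i < messy.length; i++) {
    var s = messy[i];
    var sTrimmed = s.trim();
    if (sTrimmed.length > 0) {
      // Fragments containing a space count as sentences; so does the very
      // first fragment, since there is nothing before it to merge into.
      if (sTrimmed.indexOf(' ') >= 0 || clean.length === 0) {
        clean.push(sTrimmed);
      } else {
        // A single-word fragment is glued onto the previous sentence.
        var d = clean[clean.length - 1];
        d = d + s;

        // Guard against reading past the end of the array.
        var e = messy[i + 1];
        if (e !== undefined && e.trim().indexOf(' ') >= 0) {
          d = d + e;
          i++;
        }
        clean[clean.length - 1] = d;
      }
    }
  }
  return clean;
};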


I get really good results with text.match(/\(?[^\.\?\!]+[\.!\?]\)?/g). My big issue is that when a closing quote follows the period, the quote is attached to the start of the next sentence instead of the end of the one it closes.


So for example the following:



"Hello friend. My name is Mud." Said Mud.


Should be split into the following array:



['"Hello friend.', 'My name is Mud."', 'Said Mud.']


But instead it is the following:



['"Hello friend.', 'My name is Mud.', '" Said Mud.']


(Note the stray quote at the start of the 'Said Mud' string.)
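One possible adjustment, offered only as a sketch: let the pattern consume any closing quotes or parentheses that directly follow the terminator, so they stay with the sentence they close. The names sentencePattern and splitWithQuotes are hypothetical, and the set of closing characters is an assumption that may need extending for other text. Abbreviations such as "Mr." would still be split.

```javascript
// Hypothetical variant of the pattern: after one or more terminators, also
// consume closing quotes or parentheses (straight and curly; this set is an
// assumption) so they remain attached to the sentence they end.
var sentencePattern = /[^.?!]+[.!?]+["')\u201D\u2019]*/g;

var splitWithQuotes = function(text) {
  // match() returns null when nothing matches, so fall back to an empty array.
  var matches = text.match(sentencePattern) || [];
  return matches
    .map(function(s) { return s.trim(); })
    .filter(function(s) { return s.length > 0; });
};

var result = splitWithQuotes('"Hello friend. My name is Mud." Said Mud.');
console.log(result);
// → [ '"Hello friend.', 'My name is Mud."', 'Said Mud.' ]
```

Text that does not end with a terminator is dropped by this pattern, just as it is by the original one.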


Can anyone help me with this, or point me to a good JavaScript library that can split text into paragraphs, sentences and words? I found blast.js, but I am using Angular.js and it did not integrate well at all.




