Thursday 1 January 2015

English stop words

I recently wondered exactly which word list the Filter Stopwords (English) operator uses.

I consulted the code for version 5.3 and made the following file. It contains 395 words in total, including some interesting ones like "wert", an archaic second-person singular past form of "be" ("thou wert") found in Shakespearean English, and "summat", Yorkshire dialect for "something". I'm not sure these would always be stop words.

I assume the operator hasn't changed in the latest version, but if it has, the list can be used with the Filter Stopwords (Dictionary) operator to make a facsimile of the version 5.3 behaviour. The list could also serve as the basis for your own stop word filtering operator, and publishing it would make research more reproducible.
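
As a rough illustration of using the list outside the operator, here is a minimal Python sketch of dictionary-based stop word filtering. It is not the operator's actual implementation. The function names are my own, and the sample uses only a handful of words: "wert" and "summat" come from the list, while "the" and "of" are assumed to be in it for the sake of the example.

```python
from typing import Iterable, List, Set


def load_stop_words(lines: Iterable[str]) -> Set[str]:
    """Build a stop word set from lines of text, one word per line.

    Blank lines are skipped and words are lower-cased.
    """
    return {line.strip().lower() for line in lines if line.strip()}


def filter_stop_words(tokens: Iterable[str], stop_words: Set[str]) -> List[str]:
    """Return the tokens that are not stop words (case-insensitive match)."""
    return [token for token in tokens if token.lower() not in stop_words]


# Tiny illustrative sample, not the real 395-word list.
# "the" and "of" are assumed entries; "wert" and "summat" are mentioned above.
sample_lines = ["the", "of", "wert", "summat", ""]
stop_words = load_stop_words(sample_lines)

tokens = "Thou wert the finest of friends".split()
filtered = filter_stop_words(tokens, stop_words)
print(filtered)  # ['Thou', 'finest', 'friends']
```

To use the full list, the lines could be read from a file instead, for example by passing an open file object (with a file name of your choosing) to `load_stop_words`, since iterating over a file yields one line at a time.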
