It does this by using the "Cut Document" operator with the following regular expression.
This splits the document into words, and each word is returned as its own document based on the result of the capturing group. The brackets define the capturing group, and everything inside them is returned as a value. The "Documents To Data" operator then converts all the documents, one for each word, into examples in an example set. The text field name is set to "word", and this name is used by the later operators.
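Outside RapidMiner, the effect of a capturing group can be sketched in a few lines of Python. This is only an illustration: the pattern `(\w+)` is an assumption and not necessarily the expression used in the process, and `split_into_words` is a hypothetical helper name.

```python
import re

# Assumed pattern: a single capturing group matching a run of word characters.
WORD_PATTERN = re.compile(r"(\w+)")

def split_into_words(text):
    """Return one row per word, roughly like Cut Document followed by Documents To Data."""
    # With exactly one capturing group, findall returns the group's contents for each match.
    return [{"word": match} for match in WORD_PATTERN.findall(text)]

rows = split_into_words("The cat sat on the mat.")
for row in rows:
    print(row["word"])
```

Each dictionary plays the role of one example, with "word" as the attribute name used by the later steps.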
An "Extract Macro" operator obtains the number of examples, which is the same as the count of all words. An aggregation then counts the occurrences of each word, and a second "Extract Macro" operator determines the number of unique words from the resulting example set. Both macros are reported as log values using "Provide Macro As Log Value" operators, and the log is converted to an example set using "Log To Data".
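The two counts can be sketched in Python as follows. The function name `count_words` and the default pattern `(\w+)` are assumptions for illustration; the `Counter` stands in for the aggregation step, and the returned dictionary stands in for the final example set.

```python
import re
from collections import Counter

def count_words(text, pattern=r"(\w+)"):
    """Return the total and unique word counts for a piece of text."""
    words = re.findall(pattern, text)   # one entry per word
    total_words = len(words)            # like the first Extract Macro (number of examples)
    counts = Counter(words)             # like the aggregation: one entry per distinct word
    unique_words = len(counts)          # like the second Extract Macro
    return {"total_words": total_words, "unique_words": unique_words}

summary = count_words("the cat and the hat")
print(summary)
```

Note that this sketch is case sensitive, so "The" and "the" would be counted as two different words unless the text is lowercased first.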
Other regular expressions could be used, for example to ignore numbers. It is also possible to place other filtering operators, such as stemming, inside the "Cut Document" operator.
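As a rough illustration of ignoring numbers, the Python sketch below uses a pattern that only accepts tokens made entirely of letters. The pattern and the helper name `letters_only_words` are assumptions; RapidMiner uses Java regular expressions, so an equivalent expression there may need to be written differently.

```python
import re

# Assumed pattern: a whole token made only of letters (no digits, no underscores).
# [^\W\d_] means "a word character that is neither a digit nor an underscore".
LETTERS_ONLY = re.compile(r"\b([^\W\d_]+)\b")

def letters_only_words(text):
    """Return the words in text, skipping numbers and tokens that contain digits."""
    return LETTERS_ONLY.findall(text)

words = letters_only_words("room 101 has 2 doors and abc123")
print(words)
```

Tokens that mix letters and digits are dropped entirely here; whether that is the desired behaviour depends on the text being analysed.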