Search this blog

Thursday 7 July 2011

Using regular expressions with the Replace (Dictionary) operator

The "Replace (Dictionary)" operator replaces occurrences of one nominal in one example set with another looked up from another example set.

By default, this replaces a continuous sequence regardless of its position in the nominal.

For example, if an attribute in the main example set contains the value "network" and the dictionary example set contains the value pair "work", "banana", the result of the operation would be "netbanana".

This is fine but if you want to limit to whole words only then you can use the "use regular expressions" parameter in the replace operator. To make this work, you also have to change the text within the dictionary for the nominal to be replaced with "\b" at the beginning and the end. In regular expression speak, this means match a whole word only.

In addition, if the word to be replaced contains reserved characters (from a regular expressions perspective) then "\Q" and "\E" have to be placed around the word.

One way to do this is to use a "generate attributes" operator and create a new attribute in the dictionary example set using the following expression.

"\\b\\Q"+word+"\\E\\b"
In this case, "word" is the attribute containing the word to be replaced. The "\" must be escaped with an additional "\" in order for it all to come out correctly.

The new attribute would then be used in the "from attribute" parameter of the "Replace (Dictionary)" operator. The "to attribute" would be set to the attribute within the example set dictionary that is the replacement value.

No comments:

Post a Comment