The PMML file in this case is a model created by running a decision tree algorithm on the iris data set. Here's a snapshot; the highlighted parts are referred to in the explanation that follows.
The process writes this to c:\temp\Iris.xml.pmml and then reads it back in (the subprocess operator is used so that the write finishes before the read starts). RapidMiner handles the file as a document.
Next, the process uses the following XPath to split the document into chunks (the leading PMML step is probably not required, since the // that follows matches at any depth).
/xmlns:PMML//xmlns:SimplePredicate
Here xmlns is a namespace prefix, and it is bound to a namespace by the following name-value pair in the "cut document" operator.
The value is copied from the namespace declaration in the raw XML, and it is important to get it exactly right, because an XPath with the wrong namespace matches nothing. The "assume html" checkbox is unchecked.
The XPath itself simply finds all XML nodes anywhere beneath the PMML node that are named "SimplePredicate". From inspecting the PMML, this looked to be the correct way of determining the fields used in the decision tree.
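As a rough illustration outside RapidMiner, the namespace can be read straight from the root element instead of being copied by hand. This is a minimal Python sketch using the standard library; the helper name namespace_of_root and the sample URI are assumptions for illustration, and the real value must come from your own PMML file.

```python
import xml.etree.ElementTree as ET

def namespace_of_root(xml_text):
    """Return the namespace URI of the root element, or "" if it has none."""
    root = ET.fromstring(xml_text)
    # ElementTree stores a namespaced tag as "{uri}localname".
    if root.tag.startswith("{"):
        return root.tag[1:].split("}", 1)[0]
    return ""

# Hypothetical one-line stand-in for the real PMML file; the URI is an assumption.
sample = '<PMML xmlns="http://www.dmg.org/PMML-4_0" version="4.0"/>'
print(namespace_of_root(sample))
```

The printed string is what would go into the value side of the name-value pair.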
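The same lookup can be sketched in Python to show why the namespace matters. The PMML below is a hypothetical, heavily trimmed stand-in for the real file (the namespace URI, thresholds and scores are assumptions), and the prefix is called pmml here only to show that the prefix name is arbitrary as long as it maps to the right URI. ElementTree supports a subset of XPath, so the query is written relative to the root element.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed PMML; the URI must match the xmlns value of the real file.
PMML_TEXT = """<PMML xmlns="http://www.dmg.org/PMML-4_0" version="4.0">
  <TreeModel functionName="classification">
    <Node>
      <True/>
      <Node score="Iris-setosa">
        <SimplePredicate field="a3" operator="lessOrEqual" value="2.45"/>
      </Node>
      <Node>
        <SimplePredicate field="a3" operator="greaterThan" value="2.45"/>
        <Node score="Iris-versicolor">
          <SimplePredicate field="a4" operator="lessOrEqual" value="1.75"/>
        </Node>
        <Node score="Iris-virginica">
          <SimplePredicate field="a4" operator="greaterThan" value="1.75"/>
        </Node>
      </Node>
    </Node>
  </TreeModel>
</PMML>"""

# Prefix-to-URI mapping, playing the role of the name-value pair in the operator.
NAMESPACES = {"pmml": "http://www.dmg.org/PMML-4_0"}

root = ET.fromstring(PMML_TEXT)

# Namespace-aware search: every SimplePredicate at any depth below the root.
predicates = root.findall(".//pmml:SimplePredicate", NAMESPACES)

# The same search without a namespace finds nothing, because every element
# in the document belongs to the default namespace declared on the root.
unqualified = root.findall(".//SimplePredicate")

print(len(predicates), len(unqualified))
```

With this sample the first search returns one element per predicate and the second returns an empty list, which is the failure you see when the namespace value is wrong or missing.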
Within the "cut document" operator, the inner operator extracts information using more XPath. This time, the XPath is looking for an attribute named "field" and the XPath to do this is as follows.
A namespace is not required here because it looks like the document fragments don't refer to one.
In the example, attributes a3 and a4 can be seen on the decision tree and these are also output as an example set.
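The extraction step can be sketched the same way. This assumes each chunk is a standalone fragment with no namespace declaration, as described above; the fragments and their values are hypothetical, and the plain attribute lookup stands in for the XPath used inside the operator rather than reproducing it. Removing duplicates is an addition made here for readability.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragments, one per chunk produced by the cutting step.
FRAGMENTS = [
    '<SimplePredicate field="a3" operator="lessOrEqual" value="2.45"/>',
    '<SimplePredicate field="a3" operator="greaterThan" value="2.45"/>',
    '<SimplePredicate field="a4" operator="lessOrEqual" value="1.75"/>',
    '<SimplePredicate field="a4" operator="greaterThan" value="1.75"/>',
]

def fields_used(fragments):
    """Collect the distinct values of the "field" attribute, in first-seen order."""
    seen = []
    for fragment in fragments:
        element = ET.fromstring(fragment)
        # Unprefixed attributes are not in any namespace, so no mapping is needed.
        name = element.get("field")
        if name is not None and name not in seen:
            seen.append(name)
    return seen

print(fields_used(FRAGMENTS))  # ['a3', 'a4']
```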