Search this blog

Saturday 21 May 2011

Worked example using the "Pivot" operator

Here is an example process using the "Pivot" operator that converts this input data

into this output data.

In the input data, each row corresponds to a transaction and sub transaction with an attribute value.

In the output data, the number of rows corresponds to the number of unique transactions. All the sub transactions for a transaction are combined. New attributes are created so that the attribute for a given sub transaction can be found by looking at the name. For example, "attribute_subTransaction2" is the attribute in the input data which comes from the "subTransaction2" rows.

This is done by setting the "group attribute" parameter of the "Pivot" operator to "transactionId" and the "index attribute" parameter to "subTransactionId". This causes grouping by "transactionId" and the values of the subTransactionId are used to create the names of the new attributes in the output data.

The "consider weights" check box allows weights to be handled although it's not clear how these would get fed into the operator; a problem for another day.
Edit: If there is a numeric attribute whose role is "weight", setting the check box above allows this to be aggregated within the group. Refer to myexperiment.org for an example.

The "skip constant attributes" check box should be left unchecked. If checked, an attribute that does not vary within the group, is ignored and will not appear in the output data. The observed behaviour seems to be a little different but that's also for another day.

The process also uses the "Generate Data by User Specification" and "Append" operators to make the fake data.

4 comments:

  1. Thank you for the post.

    ReplyDelete
  2. Thanks for the example. In answer to your "note to self", to use "check weights" you only need to apply the weight role to a numerical attribute.

    ReplyDelete
  3. Hi , do you know how to make de inverse process? unpivot?

    Thanks

    ReplyDelete
  4. Hello Thomas

    I made a brand new blog entry that does a simple de-pivot

    ReplyDelete