Search this blog

Wednesday, 29 October 2014

Using Groovy to extract the last part of a folder structure

Imagine you are using "Loop Files" to find files one by one and import them perhaps using the "Read CSV" operator. The "Loop Files" operator provides macros such as file_path, file_name and so on to allow you to create meta data with the example set.

So if you have a folder name like this...
c:\users\andrew\bigdata\lotsofdata\subregion\
where each subregion contains many files and there are many different subregions. It makes sense to label all the files for a subregion. This can be done by using the folder name which is contained in the parent_path macro provided by the "Loop Files" operator. There is a lot of redundant information that it would be sensible to get rid of and I suppose it would be possible using some heavy combination of macro and attribute manipulation operators but I decided to write some Groovy to do it. The resulting script is simple.
String filePath = operator.getProcess().macroHandler.getMacro("parent_path")
String lastPart = filePath.tokenize('\\').last()
operator.getProcess().getMacroHandler().addMacro("subregion", lastPart);
It assumes a macro called parent_path which contains the folder name. The tokenize function splits this into tokens separated by "\" and the last one is returned using the last function. A macro called subregion is then created. This can be used as a normal macro.

No comments:

Post a Comment