Search this blog

Saturday 5 October 2013

Parsing macros with values containing backslashes

Let's imagine we are looping through some files and we happen to know that the files are buried somewhere in folders called chapterNN where NN is a number. How can we extract information about the location for each file individually?

Within a Loop Files operator, the parent_path macro would take values like this within the loop (I'm assuming Windows obviously).

c:\myGiantFolderStructure\mybook\chapter01
c:\myGiantFolderStructure\mybook\chapter02
...
c:\myGiantFolderStructure\mybook\chapter11

If we wanted to extract the last part of the folder name including the number to use in the loop we could try the Generate Macro operator with the replaceAll function.

The basic idea would be to configure the operator like so.

chapter = replaceAll("%{parent_path}", ".*(chapter.*)", "$1")

This matches the entire macro with a regular expression but locates the word "chapter" and whatever follows it inside a capturing group. This replaces the entire value of the parent_path macro. The result should be a new macro called chapter with values from chapter01 upwards.

Unfortunately, this doesn't work because the backslashes cause trouble. I'm guessing but I think the replaceAll function (and its siblings, replace and concat) try to parse the string within the macro and get confused by treating the backslash as an escape character.

Fortunately, there is a solution: a little known function called macro. This function simply returns the string representation of a named macro.

The expression would then look like this.

chapter = replaceAll(macro("parent_path"), ".*(chapter.*)", "$1")

Knowing that backslashes are processed enables to us work out how to pass them simply by escaping them. If we felt we needed a more sophisticated match inside the capturing group to ensure we picked up numbers, we would do the following.

chapter = replaceAll(macro("parent_path"), ".*(chapter\\d+)", "$1")

This matches only if there is at least one number after the word "chapter". The double backslash becomes a single backslash when the regular expression is evaluated and \d+ means one or more numbers.



2 comments:

  1. Thank you so much, that did the trick! :)

    ReplyDelete
  2. Oh my! You saved my day!! Many thanks!

    ReplyDelete