DOCX generation improvements

Recently we ran into some issues with DOCX templates created in Word 2010. Unlike its predecessor, Word 2010 uses more complicated constructs for the merge fields we use as placeholders, and we did not recognize such definitions properly. We revised the merge field identification algorithm, fixed a few other issues along the way and, step by step, ended up with a completely new document generation algorithm.
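The exact constructs that caused trouble are not listed here. One plausible reading, assumed for this sketch, is the difference between the two ways WordprocessingML can store a field: a simple field (`w:fldSimple` with the instruction in an attribute) and a complex field (`w:fldChar` begin/separate/end markers, with the instruction text possibly split across several runs). The function name `merge_field_names` and the sample paragraph are illustrative only, and nested fields are deliberately not handled.

```python
import re
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Hypothetical paragraph: one simple field, then one complex field whose
# instruction text is split across two runs.
SAMPLE = f"""
<w:p xmlns:w="{W}">
  <w:fldSimple w:instr=' MERGEFIELD "Master Info" '>
    <w:r><w:t>«Master Info»</w:t></w:r>
  </w:fldSimple>
  <w:r><w:fldChar w:fldCharType="begin"/></w:r>
  <w:r><w:instrText xml:space="preserve"> MERGEFIELD </w:instrText></w:r>
  <w:r><w:instrText xml:space="preserve">"Detail/Column 1" </w:instrText></w:r>
  <w:r><w:fldChar w:fldCharType="separate"/></w:r>
  <w:r><w:t>«Detail/Column 1»</w:t></w:r>
  <w:r><w:fldChar w:fldCharType="end"/></w:r>
</w:p>
"""

# The field name is either quoted (may contain spaces) or a single token.
FIELD_RE = re.compile(r'MERGEFIELD\s+(?:"([^"]+)"|(\S+))')


def q(name):
    """Return the namespace-qualified tag or attribute name."""
    return f"{{{W}}}{name}"


def _collect(instruction, names):
    match = FIELD_RE.search(instruction)
    if match:
        names.append(match.group(1) or match.group(2))


def merge_field_names(root):
    """List MERGEFIELD names in document order (nested fields not handled)."""
    names = []
    instr = None  # instruction fragments of the complex field being read
    for el in root.iter():
        if el.tag == q("fldSimple"):
            _collect(el.get(q("instr"), ""), names)
        elif el.tag == q("fldChar"):
            kind = el.get(q("fldCharType"))
            if kind == "begin":
                instr = []
            elif kind in ("separate", "end") and instr is not None:
                # The instruction is complete once the result part starts.
                _collect("".join(instr), names)
                instr = None
        elif el.tag == q("instrText") and instr is not None:
            instr.append(el.text or "")
    return names


found = merge_field_names(ET.fromstring(SAMPLE))
print(found)
```

The point of the sketch is that a reader which only looks for `w:fldSimple` would miss the second placeholder entirely, and one that inspects each `w:instrText` on its own would see only fragments of the instruction.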

Our old client-side .DOC generation module used Word Automation to expand the template. When we started working on the server-side code, we simply replicated the old algorithm with all its quirks. This helped us ensure that the move from the old technology to the new one would produce (nearly) identical results.

Yet one part definitely needed improvement: the generation of detail records. Because the automation lacked suitable methods, our code had a rather sophisticated algorithm that tracked the row and column index of each placeholder in the detail table and replaced whole cells, not just the placeholders. This imposed some formatting and layout restrictions: for example, the font and color had to be defined for the cell as a whole, not for the content of the cell, and nested tables were not allowed.

Now the generator works in a more straightforward way from the user’s perspective. First, we replace all master-level placeholders in the document, wherever they are placed. Next, we generate the detail tables one by one. Here we have to repeat some part of the template for each detail record. Because Word documents lack non-visual markers, and for backward compatibility, we define the repeatable item as a table row. We locate all the placeholders for a detail table and find the innermost table row they have in common. This row becomes the repeatable block. For each record we clone the row with all its placeholders, replace the placeholders and insert the now-expanded content back into the table. Here are some illustrations:

Column 1 Column 2 * Column 3
«Detail/Column 1» «Detail/Column 2» some text «Detail/Column 3»

The row marked gray (the one holding the placeholders) will be the repeatable block, since it is the common row for all detail placeholders. Because we now operate at the field level, a single table cell can contain several placeholders and/or text around the placeholders.
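The clone-and-replace step for this simple layout can be sketched as follows. This is a minimal illustration under stated assumptions, not the generator's actual code: the table is modeled with bare `tbl`/`tr`/`tc` elements instead of real WordprocessingML, placeholders are plain `«Detail/...»` text rather than merge fields, and `expand_row` and the record dictionaries are hypothetical names.

```python
import copy
import re
import xml.etree.ElementTree as ET

# Simplified stand-in for the first illustration (not real DOCX markup).
TEMPLATE = """
<tbl>
  <tr><tc>Column 1</tc><tc>Column 2</tc><tc>Column 3</tc></tr>
  <tr><tc>«Detail/Column 1»</tc><tc>«Detail/Column 2»</tc><tc>some text «Detail/Column 3»</tc></tr>
</tbl>
"""

PLACEHOLDER = re.compile(r"«Detail/([^»]+)»")


def expand_row(table, template_row, records):
    """Replace template_row with one filled-in copy per record."""
    position = list(table).index(template_row)
    table.remove(template_row)
    for offset, record in enumerate(records):
        row = copy.deepcopy(template_row)  # keeps cells and surrounding text
        for el in row.iter():
            if el.text:
                # Only the placeholder is substituted, not the whole cell.
                el.text = PLACEHOLDER.sub(
                    lambda m: str(record.get(m.group(1), "")), el.text
                )
        table.insert(position + offset, row)


table = ET.fromstring(TEMPLATE)
detail_row = table.findall("tr")[1]
expand_row(
    table,
    detail_row,
    [
        {"Column 1": "a", "Column 2": "b", "Column 3": "c"},
        {"Column 1": "d", "Column 2": "e", "Column 3": "f"},
    ],
)
rows = [[cell.text for cell in row] for row in table.findall("tr")]
print(rows)
```

Because the substitution works on the text inside the cloned row, the literal "some text" in the third cell survives in every generated row, which is the behavior whole-cell replacement could not offer.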

Now, a more complicated layout:

Master Info Detail
«Master Info»
Column 1 Column 2
«Detail/Column 1» «Detail/Column 2»

Here we nest the detail table inside another table. The repeatable block is the row in the inner table, since it is the innermost row common to both detail placeholders. And now an even more complicated one:

Column 1 Mixed Content
«Detail/Column 1»
Master Info Column 2
«Master Info» «Detail/Column 2»

Here, the row in the outer table will be repeated, since it is the innermost row common to both detail placeholders. There is no problem with the inner table containing master information: it will simply be repeated for each row in the detail table.
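The "innermost common row" rule behind the two nested layouts can be sketched like this. As before, this is an assumption-laden illustration: the markup is a simplified `tbl`/`tr`/`tc` model, placeholders are plain text, and `innermost_common_row` plus the `id` attributes are hypothetical names used only to make the result visible.

```python
import xml.etree.ElementTree as ET

# Detail table nested inside a master table (second illustration).
NESTED = """
<tbl>
  <tr id="outer">
    <tc>«Master Info»</tc>
    <tc>
      <tbl>
        <tr id="header"><tc>Column 1</tc><tc>Column 2</tc></tr>
        <tr id="inner"><tc>«Detail/Column 1»</tc><tc>«Detail/Column 2»</tc></tr>
      </tbl>
    </tc>
  </tr>
</tbl>
"""

# One detail placeholder outside the inner table, one inside (third illustration).
MIXED = """
<tbl>
  <tr id="outer">
    <tc>«Detail/Column 1»</tc>
    <tc>
      <tbl>
        <tr id="inner"><tc>«Master Info»</tc><tc>«Detail/Column 2»</tc></tr>
      </tbl>
    </tc>
  </tr>
</tbl>
"""


def innermost_common_row(root, prefix="«Detail/"):
    """Return the deepest <tr> that contains every detail placeholder."""
    # ElementTree has no parent pointers, so build a child -> parent map.
    parent = {child: node for node in root.iter() for child in node}
    chains = []
    for el in root.iter():
        if el.text and prefix in el.text:
            chain = []
            node = el
            while node is not None:
                if node.tag == "tr":
                    chain.append(node)
                node = parent.get(node)
            chain.reverse()  # outermost row first
            chains.append(chain)
    common = None
    if chains:
        # Walk the row chains in parallel; stop where they diverge.
        for rows in zip(*chains):
            if all(row is rows[0] for row in rows):
                common = rows[0]
            else:
                break
    return common


nested_row = innermost_common_row(ET.fromstring(NESTED))
mixed_row = innermost_common_row(ET.fromstring(MIXED))
print(nested_row.get("id"), mixed_row.get("id"))
```

In the nested layout both placeholders share the outer and the inner row, so the inner one wins. In the mixed layout the only row they share is the outer one, so the whole outer row, including the inner table with its master information, is what gets repeated.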

We try to preserve the text formatting wherever possible, but formatting should be applied to the placeholder as a whole. Many of Word’s formatting commands apply only to the word the cursor is placed on. As we replace the placeholder completely, we do not preserve styling defined within the placeholder. Moreover, if multiple styles are defined for the placeholder content, it’s not clear which style to use for the replacement. For example, which font, size, style and color should we use when replacing this?

«Master Info»

As a bonus, we’ve added column usage information for the documents. Make any change in your app, save it, and columns will start to report their usage in DOCX documents.
