Drupal Book XML import/export

Over the past few days I've been revisiting XML export and import for the Drupal book module. I've introduced a simpler XML structure to support the basic task of export and import. The definition is still in flux as I experiment with what information needs to be saved and how it is represented. Eventually we will need to be able to specify what information is to be exported or imported.

Another set of questions revolves around how this functionality should be packaged and presented. How much should live in node.module, how much in book.module, and how much in another module? Some of the functionality can be applied to other types of Drupal content. Some of what I'm doing overlaps with the existing XML import/export module.

At the moment we have export to DocBook XML, with content stored as CDATA. I've been able to edit this in oXygen, and I've been able to produce PDF output via PassiveTeX and FOP. Of course the output is pretty ugly, because we're rendering CDATA: the HTML markup in each node is treated as pre-formatted text rather than as structure.
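To make the CDATA approach concrete, here is a minimal sketch in Python (not the PHP used by the module itself). The `section`, `title` and `literallayout` elements are real DocBook elements, but choosing them as the wrapper is my assumption for illustration, and `export_section` is a hypothetical helper name rather than anything in book.module.

```python
from xml.dom.minidom import Document


def export_section(title, html_body):
    """Wrap one node's HTML body in a DocBook-style section, stored as CDATA.

    Assumption: the element names below are illustrative only; the real
    export format may wrap the body differently.
    """
    doc = Document()
    section = doc.createElement("section")
    doc.appendChild(section)

    title_el = doc.createElement("title")
    title_el.appendChild(doc.createTextNode(title))
    section.appendChild(title_el)

    # The HTML is stored verbatim inside a CDATA section, so an XML
    # processor sees it as plain character data, not as markup.
    literal = doc.createElement("literallayout")
    literal.appendChild(doc.createCDATASection(html_body))
    section.appendChild(literal)

    return section.toxml()


if __name__ == "__main__":
    print(export_section("Introduction", "<p>Hello &amp; welcome</p>"))
```

Because the `<p>` tags survive only as literal characters, a DocBook formatter has no paragraph structure to work with, which is why the rendered output looks like pre-formatted text.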

I have also been able to transform HTML into DocBook via the HTML2DocBook XSL stylesheets from within a Drupal PHP page. This opens the door to full DocBook export. Because HTML is not intended as a print markup language, the result of the transformation will, in general, have to be manually edited to produce a useful DocBook manuscript. Because this XSLT transformation must be applied to each node of a book during export, this is likely to be a compute-intensive operation.

Finally, I have various programs to do things with the exported 'Drupal' XML. One such program imports the XML back into Drupal, either as an entirely new book hierarchy, or as an update of an existing book.

Another program explodes the XML export file into a directory hierarchy, with one HTML file per node, child nodes represented as subdirectories, and a metadata file per directory. This directory can be edited and manipulated with ordinary text editors and file utilities. Given a program to traverse such a structure and generate an XML file for import or update, this gives a way to do bulk desktop editing of a book and then either move it to a new Drupal site or update an existing Drupal book.
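The explode step can be sketched roughly as follows, again in Python rather than PHP. Since the export format is still in flux, everything about the XML here is an assumption: a hypothetical `<book>` root containing nested `<node nid="...">` elements, each with a `<title>` and a CDATA `<body>`. The file names `index.html` and `metadata.json`, and the position-prefixed directory names, are likewise just one possible layout.

```python
import json
import xml.etree.ElementTree as ET
from pathlib import Path

# Hypothetical export format, for illustration only.
SAMPLE_XML = """<book>
  <node nid="1">
    <title>Handbook</title>
    <body><![CDATA[<p>Welcome</p>]]></body>
    <node nid="2"><title>Installing</title><body><![CDATA[<p>Install</p>]]></body></node>
    <node nid="3"><title>Upgrading</title><body><![CDATA[<p>Upgrade</p>]]></body></node>
  </node>
</book>"""


def explode_node(node, parent_dir, position):
    """Write one <node> as a directory, then recurse into its child nodes."""
    nid = node.get("nid", "new")
    # Prefix with the sibling position so a directory listing keeps book order.
    node_dir = Path(parent_dir) / f"{position:02d}-{nid}"
    node_dir.mkdir(parents=True, exist_ok=True)

    # One HTML file per node holds the body.
    body = node.findtext("body", default="")
    (node_dir / "index.html").write_text(body, encoding="utf-8")

    # One metadata file per directory holds everything else.
    meta = {
        "nid": nid,
        "title": node.findtext("title", default=""),
        "position": position,
    }
    (node_dir / "metadata.json").write_text(
        json.dumps(meta, indent=2), encoding="utf-8"
    )

    # Only direct children become subdirectories of this node.
    for i, child in enumerate(node.findall("node")):
        explode_node(child, node_dir, i)
    return node_dir


def explode_book(xml_text, out_dir):
    """Explode every top-level node of the export into out_dir."""
    root = ET.fromstring(xml_text)
    return [explode_node(n, out_dir, i) for i, n in enumerate(root.findall("node"))]


if __name__ == "__main__":
    for top in explode_book(SAMPLE_XML, "book_export"):
        print(top)
```

Keeping the body and the metadata in separate files means the HTML can be opened directly in any editor, while the metadata file carries what a reverse traversal would need to rebuild the XML for import or update.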