Converting RedPen documentation to AsciiDoc

We recently converted the RedPen documentation from RST (reStructuredText) to AsciiDoc, primarily to provide a working example of AsciiDoc markup to use as a test case for RedPen.

Our initial attempt to convert the RST documents used pandoc v1.13.2, an automated markup translator.

We then processed the resulting .adoc files with Asciidoctor.
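
A conversion of this kind can be scripted roughly as follows. This is a minimal sketch, not the exact procedure used for RedPen: the function names and the directory layout are hypothetical, and it assumes pandoc is installed and on the PATH.

```python
import subprocess
from pathlib import Path


def pandoc_command(rst_path: Path) -> list[str]:
    """Build the pandoc invocation that converts one RST file to AsciiDoc."""
    adoc_path = rst_path.with_suffix(".adoc")
    return [
        "pandoc",
        "--from", "rst",
        "--to", "asciidoc",
        "--output", str(adoc_path),
        str(rst_path),
    ]


def convert_tree(source_dir: Path) -> None:
    """Convert every .rst file under source_dir (hypothetical layout)."""
    for rst_path in sorted(source_dir.rglob("*.rst")):
        # check=True stops at the first file pandoc fails to convert.
        subprocess.run(pandoc_command(rst_path), check=True)
```

Each .adoc file is written next to its .rst source, which keeps the two versions easy to compare by eye.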

Pandoc correctly converted inline formatting and automatically provided anchors for all headings, which quickly gave us basic AsciiDoc versions of the existing files.

Although pandoc is a useful and powerful tool, it had a few problems converting our RST files to AsciiDoc. This was not entirely unexpected, since some markup cannot be translated directly between reStructuredText and AsciiDoc. Even so, some of the problems meant that a significant amount of text had to be reconverted by hand.

The issues we encountered were:

Missing tables

Several of the tables in the source documents were totally absent in the AsciiDoc files pandoc created. For example, this RST text:

SentenceLength validator checks the length of sentences in the input document. If the length of the sentence is greater than the specified maximum length, the validator generates a warning.

.. table::

  ============== ============= ============================
  Property       Default Value Description
  ============== ============= ============================
  ``"max_len"``  50            Maximum length of sentence.
  ============== ============= ============================

was translated by pandoc to:

SentenceLength validator checks the length of sentences in the input
document. If the length of the sentence is greater than the specified
maximum length, the validator generates a warning.

The table is completely absent. The correct AsciiDoc table is as follows:

[options="header"]
|====
|Property        |Default Value  |Description
|``max_len``     |50             |Maximum length of sentence.
|====
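
Reconverting such tables by hand is mechanical enough to script. The sketch below is one possible approach, not something pandoc provides: the function name is hypothetical, and it assumes an RST simple table with a header row, single-line rows, and no column spans. Cell text is copied verbatim, so inline markup inside cells still needs checking by hand.

```python
import re


def rst_simple_table_to_asciidoc(lines):
    """Convert the lines of one RST simple table to an AsciiDoc table.

    Assumes every row shares the indentation of the '====' border lines.
    """
    rows = [line.rstrip() for line in lines if line.strip()]

    def is_border(row):
        return set(row.strip()) <= {"=", " "}

    if not rows or not is_border(rows[0]):
        raise ValueError("expected the table to start with a '====' border")

    # Each run of '=' in the border marks the extent of one column.
    spans = [(m.start(), m.end()) for m in re.finditer(r"=+", rows[0])]
    # Text in the last column may run past the border, so leave it open.
    spans[-1] = (spans[-1][0], None)

    out = ['[options="header"]', "|===="]
    for row in rows:
        if is_border(row):
            continue
        out.append(" ".join("|" + row[a:b].strip() for a, b in spans))
    out.append("|====")
    return "\n".join(out)
```

Applied to the table lines from the RST example above, this yields a header row and one body row between the `|====` delimiters.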

Source blocks

Source blocks in AsciiDoc should be formatted as follows:

[source,xml]
----
<validators>
    <validator name="SentenceLength">
        <property name="max_len" value="200"/>
    </validator>
    <validator name="InvalidSymbol" />
    <validator name="SpaceWithSymbol" />
    <validator name="SectionLength">
        <property name="max_num" value="2000"/>
    </validator>
    <validator name="ParagraphNumber" />
</validators>
----

However, pandoc produced the following:

code,sourceCode,xml------------------------------------------------------------------------------------------------
code,sourceCode,xml
<validators>
    <validator name="SentenceLength">
        <property name="max_len" value="200"/>
    </validator>
    <validator name="InvalidSymbol" />
    <validator name="SpaceWithSymbol" />
    <validator name="SectionLength">
        <property name="max_num" value="2000"/>
    </validator>
    <validator name="ParagraphNumber" />
 </validators>
------------------------------------------------------------------------------------------------

This output did not render properly when processed with Asciidoctor.
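
If the converted files contain exactly the pattern shown above, the blocks could be repaired with a small post-processing step. The following is only a sketch under that assumption: the function name is hypothetical, and the regular expressions are keyed to the sample output in this article, so other pandoc versions may need different patterns.

```python
import re

# Patterns taken from the sample output above (an assumption, not a spec).
OPEN = re.compile(r"^code,sourceCode,(\w+)-{4,}\s*$")
REPEAT = re.compile(r"^code,sourceCode,\w+\s*$")
CLOSE = re.compile(r"^-{4,}\s*$")


def repair_source_blocks(text):
    """Rewrite the broken listing delimiters as [source,LANG] blocks."""
    out = []
    in_block = False
    just_opened = False
    for line in text.splitlines():
        if not in_block:
            match = OPEN.match(line)
            if match:
                # Emit the block attribute and a plain four-dash delimiter.
                out += [f"[source,{match.group(1)}]", "----"]
                in_block = just_opened = True
            else:
                out.append(line)
            continue
        if just_opened and REPEAT.match(line):
            # Drop the repeated attribute line that follows the opener.
            just_opened = False
            continue
        just_opened = False
        if CLOSE.match(line):
            out.append("----")
            in_block = False
        else:
            out.append(line)
    return "\n".join(out)
```

Lines outside a broken block pass through unchanged, so the step can be run over a whole file.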

Unsupported :ref: and :doc:

The RST source files included :ref: and :doc: references, which pandoc reduced to plain text. For example, this RST text:

RedPen supports default symbols for "en" and "ja", which are 
described in :ref:`en-default-symbol-setting` 
and :ref:`ja-default-symbol-setting`.

was converted to:

RedPen supports default symbols for ``en'' and ``ja'', which are
described in en-default-symbol-setting 
and ja-default-symbol-setting.

We were hoping for something more like the following, which, although not 100% correct, would preserve the notion that a link was involved:

RedPen supports default symbols for "en" and "ja", which are 
described in <<en-default-symbol-setting>> 
and <<ja-default-symbol-setting>>.
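
The mapping we had in mind can be expressed as a simple text substitution. This is a sketch only: the function name is hypothetical, it covers :ref: but not :doc:, and it does not check that each label matches an anchor that actually exists in the AsciiDoc output. If it is run on the RST source before conversion, pandoc may escape or alter the inserted text, so the result would still need checking.

```python
import re

# Matches :ref:`label` and :ref:`link text <label>`.
REF = re.compile(
    r":ref:`(?:(?P<text>[^`<]+?)\s*<(?P<target>[^`>]+)>|(?P<label>[^`]+))`"
)


def refs_to_xrefs(rst_text):
    """Replace RST :ref: roles with AsciiDoc <<...>> cross-references."""

    def replace(match):
        if match.group("target"):
            # Keep the custom link text as the xref's second argument.
            return f"<<{match.group('target')},{match.group('text')}>>"
        return f"<<{match.group('label')}>>"

    return REF.sub(replace, rst_text)
```

A reference with explicit link text, such as :ref:`symbols <en-default-symbol-setting>`, becomes <<en-default-symbol-setting,symbols>>.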

Heading levels

Although pandoc converted our headings correctly, it used the two-line AsciiDoc heading style, which is arguably the least flexible. Pandoc chose the following (correct) translation:

Heading Level 1
---------------
Heading Level 2
~~~~~~~~~~~~~~~
Heading Level 3
^^^^^^^^^^^^^^^

However, we would have preferred the easier-to-edit alternate format:

= Heading Level 1
== Heading Level 2
=== Heading Level 3
==== Heading Level 4
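
Switching between the two styles after conversion is easy to automate. The sketch below uses AsciiDoc's standard mapping for two-line titles, in which an `=` underline marks the document title (level 0), `-` a level-1 section, `~` level 2, `^` level 3, and `+` level 4. Under that mapping, a `-` underline becomes `==` in the single-line form. The function name and the length tolerance are assumptions. The sketch also does not track delimited blocks, so a short line directly above a `----` listing delimiter could be mistaken for a title.

```python
# Underline character -> section level in the two-line title syntax.
UNDERLINE_LEVELS = {"=": 0, "-": 1, "~": 2, "^": 3, "+": 4}


def two_line_to_single_line_titles(text, tolerance=1):
    """Rewrite two-line (underlined) titles as single-line '=' titles."""
    lines = text.splitlines()
    out = []
    i = 0
    while i < len(lines):
        title = lines[i]
        underline = lines[i + 1] if i + 1 < len(lines) else ""
        char = underline[:1]
        is_title = (
            len(set(title.strip())) > 1          # not itself a delimiter
            and char in UNDERLINE_LEVELS
            and len(underline) >= 2
            and set(underline) == {char}         # one repeated character
            and abs(len(underline) - len(title)) <= tolerance
        )
        if is_title:
            out.append("=" * (UNDERLINE_LEVELS[char] + 1) + " " + title)
            i += 2  # skip the underline as well
        else:
            out.append(title)
            i += 1
    return "\n".join(out)
```

Run on the three headings pandoc produced above, this gives `== Heading Level 1`, `=== Heading Level 2`, and `==== Heading Level 3`.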

Other considerations

Asciidoctor does not currently support creating a table of contents that spans multiple source documents. Given this limitation, we decided to combine all RedPen documents into a single HTML page. Although the page is larger, it is easier to navigate on some devices and does not require any additional tools to build a surrounding multi-document index or menu.

Summary

Converting documents between different markup formats is reasonably straightforward, and tools such as pandoc are excellent choices for making headway quickly and easily. However, such tools do not always support all markup formats equally, and plenty of manual validation and editing may still be needed to get your documents in good order.

Although pandoc v1.13.2 is not the current version, it was the version available on Fedora 23 at the time. At the time of writing, the latest version available via http://pandoc.org/try/ still appears to produce the same translations that we encountered.
