Thursday, 16 August 2012

(Not so) simple app configuration under Linux with xml schema and xslt

This post describes configuration files for an application running on embedded Linux. The goal is a simple, flexible and error-resistant mechanism for applying configurable values to the application.

Requirements:
- default values of configuration items are compiled into the app (to avoid unnecessary file reads and consistency problems),
- configuration items can be changed using a human-readable, flat (no value nesting) text file,
- types of configuration items: string, int, bool, enum, ...
- configuration items are checked (type safety) at compile time,
- different variants of the settings are stored and used for the appropriate build target,
- basic Linux tools shall be used (here: xmllint, xsltproc).

Solution:
- configuration metadata is stored in an XML Schema file,
- configuration values for different targets are stored in separate XML files,
- the XML Schema validates the XML files at build time,
- XSLT creates the header files and a human-readable text file to be used on the target.

As the configuration is defined for different compilation targets with potentially different configuration items, a base configuration schema with the base type definitions is required. Here is the file:

<schema xmlns="http://www.w3.org/2001/XMLSchema"
        targetNamespace="http://www.mycorp.com/path"
        xmlns:es="http://www.mycorp.com/path"
        elementFormDefault="qualified">

   <annotation>
      <documentation xml:lang="en">
         Default configuration file.
         The file provides base types allowed for configuration items.
         Author: Andrzej Polanski (andrzej.polanski(at)gmail.com)
      </documentation>
   </annotation>

   <complexType name="uint32">
      <simpleContent>
         <extension base="es:uint32Internal">
            <attribute name="type" type="string" use="required" fixed="uint32"/>
            <attribute name="comment" type="string" use="optional"/>
         </extension>
      </simpleContent>
   </complexType>

   <!-- type of uint32 that allows for hex, octal or decimal input -->
   <!-- type is realized as string therefore is not ordered value space (does not allow for minInclusive, etc.) -->
   <complexType name="uint32hod">
      <simpleContent>
         <extension base="es:uint32hodInternal">
            <attribute name="type" type="string" use="required" fixed="uint32hod"/>
            <attribute name="comment" type="string" use="optional"/>
         </extension>
      </simpleContent>
   </complexType>

   <complexType name="uint16">
      <simpleContent>
         <extension base="es:uint16Internal">
            <attribute name="type" type="string" use="required" fixed="uint16"/>
            <attribute name="comment" type="string" use="optional"/>
         </extension>
      </simpleContent>
   </complexType>

   <!-- type of uint16 that allows for hex, octal or decimal input -->
   <!-- type is realized as string therefore is not ordered value space (does not allow for minInclusive, etc.) -->
   <complexType name="uint16hod">
      <simpleContent>
         <extension base="es:uint16hodInternal">
            <attribute name="type" type="string" use="required" fixed="uint16hod"/>
            <attribute name="comment" type="string" use="optional"/>
         </extension>
      </simpleContent>
   </complexType>

   <complexType name="uint8">
      <simpleContent>
         <extension base="es:uint8Internal">
            <attribute name="type" type="string" use="required" fixed="uint8"/>
            <attribute name="comment" type="string" use="optional"/>
         </extension>
      </simpleContent>
   </complexType>

   <!-- type of uint8 that allows for hex, octal or decimal input -->
   <!-- type is realized as string therefore is not ordered value space (does not allow for minInclusive, etc.) -->
   <complexType name="uint8hod">
      <simpleContent>
         <extension base="es:uint8hodInternal">
            <attribute name="type" type="string" use="required" fixed="uint8hod"/>
            <attribute name="comment" type="string" use="optional"/>
         </extension>
      </simpleContent>
   </complexType>

   <complexType name="bool">
      <simpleContent>
         <extension base="es:boolInternal">
            <attribute name="type" type="string" use="required" fixed="bool"/>
            <attribute name="comment" type="string" use="optional"/>
         </extension>
      </simpleContent>
   </complexType>

   <complexType name="string">
      <simpleContent>
         <extension base="string">
            <attribute name="type" type="string" use="required" fixed="string"/>
            <attribute name="comment" type="string" use="optional"/>
         </extension>
      </simpleContent>
   </complexType>

   <complexType name="filePathCleanAbs">
      <simpleContent>
         <extension base="es:cleanAbsPathInernal">
            <attribute name="type" type="string" use="required" fixed="filePathCleanAbs"/>
            <attribute name="comment" type="string" use="optional"/>
         </extension>
      </simpleContent>
   </complexType>


   <simpleType name="uint32Internal">
      <restriction base="unsignedInt">
         <pattern value="[1-9][0-9]*|0"/>
      </restriction>
   </simpleType>

   <simpleType name="uint32hodInternal">
      <union>
         <!--decimal-->
         <simpleType>
            <restriction base="unsignedInt">
               <pattern value="[1-9][0-9]*"/>
            </restriction>
         </simpleType>
         <!--hexadecimal-->
         <simpleType>
            <restriction base="string">
               <pattern value="0[xX]0*[A-Fa-f0-9]{1,8}"/>
            </restriction>
         </simpleType>
         <!--octal-->
         <simpleType>
            <restriction base="nonNegativeInteger">
               <pattern value="0[0-7]*"/>
               <maxInclusive value="37777777777"/>
            </restriction>
         </simpleType>
      </union>
   </simpleType>

   <simpleType name="uint16Internal">
      <restriction base="unsignedShort">
         <pattern value="[1-9][0-9]*|0"/>
      </restriction>
   </simpleType>

   <simpleType name="uint16hodInternal">
      <union>
         <!--decimal-->
         <simpleType>
            <restriction base="unsignedShort">
               <pattern value="[1-9][0-9]*"/>
            </restriction>
         </simpleType>
         <!--hexadecimal-->
         <simpleType>
            <restriction base="string">
               <pattern value="0[xX]0*[A-Fa-f0-9]{1,4}"/>
            </restriction>
         </simpleType>
         <!--octal-->
         <simpleType>
            <restriction base="nonNegativeInteger">
               <pattern value="0[0-7]*"/>
               <maxInclusive value="177777"/>
            </restriction>
         </simpleType>
      </union>
   </simpleType>

   <simpleType name="uint8Internal">
      <restriction base="unsignedByte">
         <pattern value="[1-9][0-9]*|0"/>
      </restriction>
   </simpleType>

   <simpleType name="uint8hodInternal">
      <union>
         <!--decimal-->
         <simpleType>
            <restriction base="unsignedByte">
               <pattern value="[1-9][0-9]*"/>
            </restriction>
         </simpleType>
         <!--hexadecimal-->
         <simpleType>
            <restriction base="string">
               <pattern value="0[xX]0*[A-Fa-f0-9]{1,2}"/>
            </restriction>
         </simpleType>
         <!--octal-->
         <simpleType>
            <restriction base="nonNegativeInteger">
               <pattern value="0[0-7]*"/>
               <maxInclusive value="377"/>
            </restriction>
         </simpleType>
      </union>
   </simpleType>

   <simpleType name="boolInternal">
      <restriction base="string">
         <enumeration value="true"/>
         <enumeration value="false"/>
      </restriction>
   </simpleType>

   <simpleType name="cleanAbsPathInernal">
      <restriction base="string">
         <!-- clean absolute path -->
         <pattern value="/([a-z0-9_\.]+/?)+[a-z0-9_\.]+"/>
      </restriction>
   </simpleType>

</schema>



As one can see, unsigned integer, bool, string and file path types are defined, and validation rules are provided for every type. Unsigned integer types can be declared in two forms: decimal only (uint8, uint16 and uint32), or decimal, hex or octal (uint8hod, uint16hod and uint32hod). The distinction makes the intent explicit and avoids mistakes such as typing a decimal value with a leading 0, which would be interpreted as octal. Range checking is provided for all integer types.
The boolean type is simply an enumeration, and the string type is an extension of the standard xsd string.
All the types have two additional attributes: 'comment' and 'type'. The former is optional; the latter is fixed and required. This is due to a limitation of XSLT 1.0, which is not schema-aware, so the type information has to be included in the XML file together with the values. Because the attribute is fixed, a mistyped type name is rejected during validation.
The last of the common types is "filePathCleanAbs", which requires an absolute path to a file or directory.
Other types, e.g. signed integers, can be defined in the same way.
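
The schema only guards values at build time. If the application later reads a changed value on the target, it may want to apply the same rules itself. Below is a minimal sketch of such a check for the uint16hod type, written in C. The function name parse_uint16hod is hypothetical and not part of the original project; it assumes that the standard strtoul with base 0 is an acceptable way to mirror the decimal/hex/octal rules.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper (an assumption, not part of the original project):
 * parses text that follows the uint16hod rules (decimal, 0x-prefixed hex
 * or 0-prefixed octal) and stores the result in *out.
 * Returns false on malformed or out-of-range input. */
static bool parse_uint16hod(const char *text, uint16_t *out)
{
    assert(text != NULL && out != NULL);

    /* The schema patterns start with a digit; strtoul alone would also
     * accept a sign or leading whitespace, so reject those up front. */
    if (text[0] < '0' || text[0] > '9') {
        return false;
    }

    char *end = NULL;
    errno = 0;
    /* Base 0 selects the radix from the prefix, as a C compiler does:
     * "0x..." is hex, a leading "0" is octal, anything else is decimal. */
    unsigned long value = strtoul(text, &end, 0);

    if (errno != 0 || end == text || *end != '\0') {
        return false;               /* overflow or trailing characters */
    }
    if (value > UINT16_MAX) {
        return false;               /* same upper limit as in the schema */
    }

    *out = (uint16_t)value;
    return true;
}
```

With this sketch "100" yields 100, "0100" yields 64 and "0x100" yields 256, while "65536" and "08" are rejected, which matches what the uint16hod type accepts.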

The next step is to define the target configuration schema (one or more, depending on the project and its versions), e.g.:


<schema xmlns="http://www.w3.org/2001/XMLSchema"
        targetNamespace="http://www.mycorp.com/path"
        xmlns:es="http://www.mycorp.com/path"
        elementFormDefault="qualified">

   <include schemaLocation="config_def_base.xsd"/>

   <annotation>
      <documentation xml:lang="en">
         Target configuration schema.
         The file defines the configuration items, using the base types from config_def_base.xsd.
         Author: Andrzej Polanski (andrzej.polanski(at)gmail.com)
      </documentation>
   </annotation>

   <element name="config">
      <complexType>
         <sequence>
            <element name="PARAMS_STRUCT">
               <complexType>
                  <sequence>
                     <element name="FIRST" type="es:uint8"/>
                     <element name="SECOND" type="es:uint16hod"/>
                     <element name="THIRD" type="es:uint32hod"/>
                  </sequence>
                  <attribute name="comment" type="string" use="optional"/>
               </complexType>
            </element>

            <element name="FILES">
               <complexType>
                  <sequence>
                     <element name="FILE_1">
                        <complexType>
                           <sequence>
                              <element name="FILE_NAME" type="es:filePathCleanAbs"/>
                           </sequence>
                           <attribute name="comment" type="string" use="optional"/>
                        </complexType>
                     </element>
                     <element name="FILE_2">
                        <complexType>
                           <sequence>
                              <element name="FILE_NAME" type="es:filePathCleanAbs"/>
                           </sequence>
                           <attribute name="comment" type="string" use="optional"/>
                        </complexType>
                     </element>
                  </sequence>
                  <attribute name="comment" type="string" use="optional"/>
               </complexType>
            </element>

            <element name="SETTING_BOOL" type="es:bool"/>
            <element name="SETTING_STRING" type="es:string"/>

         </sequence>
         <attribute name="appName" type="string" use="required"/>
      </complexType>
      
      <!-- global check for files' names uniqueness -->
      <unique name="uniqeFilesNames">
         <selector xpath=".//es:FILE_NAME"/>
         <field xpath="."/>
      </unique>
   </element>
</schema>

An additional remark on the "unique" constraint: it is there to catch copy-paste mistakes. The value of each "FILE_NAME" element must be different, i.e. point to a different file or directory. This can sometimes be too restrictive; whether it is appropriate depends on the specific configuration file.


Of course schema nesting can be deeper: there can be a general schema for all of the project's variants, which includes the base schema, and then a specific schema for each variant. Such a variant schema can extend the base configuration (here the "config" element) with the XML Schema "extension" element.

Configuration values for the target project/version are stored in an XML file, e.g.:


<?xml version="1.0"?>
<config xmlns="http://www.mycorp.com/path"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:schemaLocation="http://www.mycorp.com/path
                            config.xsd"
        appName="APPNAME">

   <PARAMS_STRUCT comment="sizes of configuration variables">
      <FIRST type="uint8">100</FIRST>      <!-- decimal -->
      <SECOND type="uint16hod">0100</SECOND>  <!-- octal -->
      <THIRD type="uint32hod">0x100</THIRD>   <!-- hexadecimal -->
   </PARAMS_STRUCT>

   <FILES comment="settings for fileslist">
      <FILE_1 comment="names for file 1">
         <FILE_NAME type="filePathCleanAbs">/dir/dir2/filename1</FILE_NAME>
      </FILE_1>
      <FILE_2 comment="names for file 2">
         <FILE_NAME type="filePathCleanAbs">/dir/dir2/filename2</FILE_NAME>
      </FILE_2>
   </FILES>

   <SETTING_BOOL type="bool">true</SETTING_BOOL>
   <SETTING_STRING type="string">4:3</SETTING_STRING>

</config>




Unfortunately 'xsltproc' implements only XSLT 1.0 and is therefore not schema-aware. Because of this the XML file has to store redundant type information. This is not dangerous, because the schema does not allow the value of the "type" attribute to be changed (it is 'fixed'), but additional typing (or copy-pasting) is necessary.

Now we can check whether the solution works, i.e. whether the XML is well-formed and valid according to the schema, using the standard 'xmllint' tool:

xmllint --noout --schema config.xsd config.xml



After successful validation the required output is produced automatically from the XML. In this case it is a header file laid out for further x-macro processing. XSLT can be used for this task; such a transformation can look like the following:

<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:xsd="http://www.w3.org/2001/XMLSchema"
                xmlns:es="http://www.mycorp.com/path">

<xsl:output omit-xml-declaration="yes"/>


<xsl:template match="/">
/*
 *=======================================================================
 * Automatically generated file - do not edit.
 *=======================================================================
 */

#ifndef _CONFIGURATION_INTERNAL_
#error "Do not include this file directly!"
#endif

CONFIGURATION_ITEMS_START

<xsl:call-template name="traverse">
   <xsl:with-param name="elem" select="es:config"/>
   <xsl:with-param name="pname" select="es:config/@appName"/>
</xsl:call-template>

CONFIGURATION_ITEMS_END

</xsl:template>




<xsl:template name="traverse">
   <xsl:param name="elem"/>
   <xsl:param name="pname"/>

   <xsl:if test="$elem">
      <xsl:for-each select="$elem/*">
         <xsl:choose>
            <xsl:when test="count(./*) > 0">
               <xsl:call-template name="traverse">
                  <xsl:with-param name="elem" select="."/>
                  <xsl:with-param name="pname" select="concat($pname, '_', local-name())"/>
               </xsl:call-template>
            </xsl:when>
            <xsl:otherwise>
               <xsl:apply-templates select=".">
                  <xsl:with-param name="pname" select="concat($pname, '_', local-name())"/>
               </xsl:apply-templates>
               <xsl:text>&#xa;</xsl:text>
            </xsl:otherwise>
         </xsl:choose>
      </xsl:for-each>
   </xsl:if>

</xsl:template>


<xsl:template match="es:*[(@type='filePathCleanAbs') or (@type='string')]" priority="2">
   <xsl:param name="pname"/>
   <xsl:value-of select="concat('CONFIGURATION_ITEM(', $pname, ', string', ', &#x22;', ., '&#x22;)')"/>
</xsl:template>

<xsl:template match="es:*[@type]" priority="1">
   <xsl:param name="pname"/>
   <xsl:value-of select="concat('CONFIGURATION_ITEM(', $pname, ', ', ./@type, ', ', ., ')')"/>
</xsl:template>

</xsl:stylesheet>


In the above example a flat list of configuration items is created from the XML input data. The transformation consists of the recursive template "traverse", which builds a unique name for each item and applies one of the matching templates at the end of the recursion.
Please note the separate template (a specialization) for elements whose 'type' attribute is 'filePathCleanAbs' or 'string'; it emits the value in double quotes. This template has higher priority because both templates match elements with a 'type' attribute.
To run the transformation the following command can be used:

xsltproc header.xsl config.xml >config.h


The output looks like the following:

/*
 *=========================================================================
 * Automatically generated file - do not edit.
 *=========================================================================
 */

#ifndef _CONFIGURATION_INTERNAL_
#error "Do not include this file directly!"
#endif

CONFIGURATION_ITEMS_START

CONFIGURATION_ITEM(APPNAME_PARAMS_STRUCT_FIRST, uint8, 100)
CONFIGURATION_ITEM(APPNAME_PARAMS_STRUCT_SECOND, uint16hod, 0100)
CONFIGURATION_ITEM(APPNAME_PARAMS_STRUCT_THIRD, uint32hod, 0x100)
CONFIGURATION_ITEM(APPNAME_FILES_FILE_1_FILE_NAME, string, "/dir/dir2/filename1")
CONFIGURATION_ITEM(APPNAME_FILES_FILE_2_FILE_NAME, string, "/dir/dir2/filename2")
CONFIGURATION_ITEM(APPNAME_SETTING_BOOL, bool, true)
CONFIGURATION_ITEM(APPNAME_SETTING_STRING, string, "4:3")

CONFIGURATION_ITEMS_END
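
How the application consumes this header is not shown in this post, so the following is only a sketch of one possible x-macro setup in C. To keep it self-contained, the generated item list is reproduced as the macro GENERATED_CONFIG_ITEMS; in a real build each expansion would rather be an #include of config.h with _CONFIGURATION_INTERNAL_ defined. The cfg_* typedefs, struct configuration, config_defaults, config_item_names and config_has_item are all assumed names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for the generated config.h (assumption made for this sketch). */
#define GENERATED_CONFIG_ITEMS \
    CONFIGURATION_ITEMS_START \
    CONFIGURATION_ITEM(APPNAME_PARAMS_STRUCT_FIRST, uint8, 100) \
    CONFIGURATION_ITEM(APPNAME_PARAMS_STRUCT_SECOND, uint16hod, 0100) \
    CONFIGURATION_ITEM(APPNAME_PARAMS_STRUCT_THIRD, uint32hod, 0x100) \
    CONFIGURATION_ITEM(APPNAME_FILES_FILE_1_FILE_NAME, string, "/dir/dir2/filename1") \
    CONFIGURATION_ITEM(APPNAME_FILES_FILE_2_FILE_NAME, string, "/dir/dir2/filename2") \
    CONFIGURATION_ITEM(APPNAME_SETTING_BOOL, bool, true) \
    CONFIGURATION_ITEM(APPNAME_SETTING_STRING, string, "4:3") \
    CONFIGURATION_ITEMS_END

/* Hypothetical mapping from schema type names to C types. */
typedef uint8_t     cfg_uint8;
typedef uint16_t    cfg_uint16hod;
typedef uint32_t    cfg_uint32hod;
typedef bool        cfg_bool;
typedef const char *cfg_string;

/* Pass 1: declare one struct member per configuration item. */
#define CONFIGURATION_ITEMS_START              struct configuration {
#define CONFIGURATION_ITEM(name, type, value)  cfg_##type name;
#define CONFIGURATION_ITEMS_END                };
GENERATED_CONFIG_ITEMS
#undef CONFIGURATION_ITEMS_START
#undef CONFIGURATION_ITEM
#undef CONFIGURATION_ITEMS_END

/* Pass 2: compile the default values into the application. */
#define CONFIGURATION_ITEMS_START              static struct configuration config_defaults = {
#define CONFIGURATION_ITEM(name, type, value)  .name = value,
#define CONFIGURATION_ITEMS_END                };
GENERATED_CONFIG_ITEMS
#undef CONFIGURATION_ITEMS_START
#undef CONFIGURATION_ITEM
#undef CONFIGURATION_ITEMS_END

/* Pass 3: keep the item names as strings, e.g. for matching override lines. */
#define CONFIGURATION_ITEMS_START              static const char *const config_item_names[] = {
#define CONFIGURATION_ITEM(name, type, value)  #name,
#define CONFIGURATION_ITEMS_END                };
GENERATED_CONFIG_ITEMS
#undef CONFIGURATION_ITEMS_START
#undef CONFIGURATION_ITEM
#undef CONFIGURATION_ITEMS_END

/* Returns true if 'name' is one of the generated configuration items. */
static bool config_has_item(const char *name)
{
    assert(name != NULL);
    for (size_t i = 0; i < sizeof config_item_names / sizeof config_item_names[0]; i++) {
        if (strcmp(config_item_names[i], name) == 0) {
            return true;
        }
    }
    return false;
}
```

Because the values are pasted into C source as they are, the compiler interprets 0100 as octal (64) and 0x100 as hex (256), which is the reason the schema makes the "hod" types explicit. A value of the wrong kind for its member, such as a quoted string in a uint8 item, is diagnosed by the compiler.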




The last step is to create a flat text file with values that can be read during embedded application startup to override the defaults. In this case the flat file is also XML, so we need a transformation from XML to XML in a different form.


<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:xsd="http://www.w3.org/2001/XMLSchema"
                xmlns:es="http://www.mycorp.com/path">

<xsl:template match="/">

   <xsl:comment>Automatically generated file.</xsl:comment>

   <xsl:element name="configuration">
      <xsl:text>&#xa;</xsl:text>
      <xsl:call-template name="traverse">
         <xsl:with-param name="elem" select="es:config"/>
         <xsl:with-param name="pname" select="es:config/@appName"/>
      </xsl:call-template>
   </xsl:element>

</xsl:template>

<xsl:template name="traverse">
   <xsl:param name="elem"/>
   <xsl:param name="pname"/>

   <xsl:if test="$elem">
      <xsl:for-each select="$elem/*">
         <xsl:choose>
            <xsl:when test="count(./*) > 0">
               <xsl:call-template name="traverse">
                  <xsl:with-param name="elem" select="."/>
                  <xsl:with-param name="pname" select="concat($pname, '_', local-name())"/>
               </xsl:call-template>
            </xsl:when>
            <xsl:otherwise>
               <xsl:text disable-output-escaping="yes">&lt;!--</xsl:text>
               <xsl:element name="{concat($pname, '_', local-name())}">
                  <xsl:value-of select="."/>
               </xsl:element>
               <xsl:text disable-output-escaping="yes">--&gt;</xsl:text>
               <xsl:text>&#xa;</xsl:text>
            </xsl:otherwise>
         </xsl:choose>
      </xsl:for-each>
   </xsl:if>

</xsl:template>

</xsl:stylesheet>


The transformation is quite similar to the previous one. It is even simpler, as no specialization is needed.
The interesting point here is the output, in which every element is wrapped in a comment:

<?xml version="1.0"?>
<!--Automatically generated file.-->
<configuration>
<!--<APPNAME_PARAMS_STRUCT_FIRST>100</APPNAME_PARAMS_STRUCT_FIRST>-->
<!--<APPNAME_PARAMS_STRUCT_SECOND>0100</APPNAME_PARAMS_STRUCT_SECOND>-->
<!--<APPNAME_PARAMS_STRUCT_THIRD>0x100</APPNAME_PARAMS_STRUCT_THIRD>-->
<!--<APPNAME_FILES_FILE_1_FILE_NAME>/dir/dir2/filename1</APPNAME_FILES_FILE_1_FILE_NAME>-->
<!--<APPNAME_FILES_FILE_2_FILE_NAME>/dir/dir2/filename2</APPNAME_FILES_FILE_2_FILE_NAME>-->
<!--<APPNAME_SETTING_BOOL>true</APPNAME_SETTING_BOOL>-->
<!--<APPNAME_SETTING_STRING>4:3</APPNAME_SETTING_STRING>-->
</configuration>


All elements are commented out. This is because there is no point in reading a default value that is already provided in the header file and compiled into the application (this avoids needless parsing and processing).
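
The reader for this file is not part of this post. As a rough sketch in C, and assuming the one-item-per-line layout produced by the stylesheet above, an override could be recognized as follows. The function name parse_override_line is hypothetical, and the sketch neither decodes XML entities nor replaces a real XML parser.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: extracts name and value from one line such as
 * "<APPNAME_SETTING_BOOL>false</APPNAME_SETTING_BOOL>".
 * Commented-out lines, the XML declaration, the <configuration> wrapper
 * and blank lines return false, so the compiled-in default stays in use. */
static bool parse_override_line(const char *line,
                                char *name, size_t name_size,
                                char *value, size_t value_size)
{
    assert(line != NULL && name != NULL && value != NULL);

    while (*line == ' ' || *line == '\t') {
        line++;                                 /* skip indentation */
    }
    /* "<!--...", "<?xml ..." and "</configuration>" are not overrides. */
    if (line[0] != '<' || line[1] == '!' || line[1] == '?' || line[1] == '/') {
        return false;
    }

    const char *name_end = strchr(line, '>');
    if (name_end == NULL) {
        return false;
    }
    size_t name_len = (size_t)(name_end - (line + 1));
    if (name_len == 0 || name_len >= name_size) {
        return false;
    }

    const char *value_start = name_end + 1;
    const char *value_end = strstr(value_start, "</");
    if (value_end == NULL) {
        return false;                           /* e.g. "<configuration>" */
    }
    size_t value_len = (size_t)(value_end - value_start);
    if (value_len >= value_size) {
        return false;
    }

    /* The closing tag must repeat the opening name. */
    if (strncmp(value_end + 2, line + 1, name_len) != 0 ||
        value_end[2 + name_len] != '>') {
        return false;
    }

    memcpy(name, line + 1, name_len);
    name[name_len] = '\0';
    memcpy(value, value_start, value_len);
    value[value_len] = '\0';
    return true;
}
```

Under these assumptions, removing the comment markers around a line in config_flat.xml and editing its value is enough to override that item; the extracted value would still need to be checked against the rules for the item's type before it is applied.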

To run the transformation the following command shall be used:

xsltproc runtime_config.xsl config.xml >config_flat.xml


Additionally, to check that the output XML is well-formed, we can run:

xmllint --noout config_flat.xml



The presented solution has the problem that the XSLT processing is not schema-aware. This can be overcome by using an XSLT 2.0 processor. I have tried AltovaXML, which is free in its community edition, and this basic version allows for schema-aware XSLT processing. Unfortunately the application is for MS Windows only, but it works very well under Wine.

