minixsv: A Lightweight XML schema validator

New release 0.9.0 available!

minixsv is a lightweight XML schema validator package written in pure Python (Python 2.4 or later is required).
It is based on genxmlif, a generic XML interface package,
which currently supports the standard Python DOM implementations minidom and 4DOM as well as Fredrik Lundh's elementtree module
(configurable via the parameter "xmlIfClass", which can be XMLIF_MINIDOM, XMLIF_4DOM or XMLIF_ELEMENTTREE).
Other DOM implementations can be adapted by implementing a new derived XML interface class.

minixsv provides a programming interface (API), which is explained below.
minixsv was developed for using XML schemas in code generators,
but it can be used in any other application as well.

New features of release 0.9.0:
- check of facets of derived primitive types added
- unicode support added (except wide unicode characters)
- major improvements for pattern matching (but there are still some restrictions, refer below)
- limited support of XInclude added (no support of fallback tag)
- performance optimizations (caching option introduced)
- several bugs fixed

Release 0.9.0 has been tested against the W3C XML Schema Test Suite (new testsuite from 2006-11-06).

Results:
NIST tests:      3943 of 3953 testgroups passed
Microsoft tests: 8645 of 9745 testgroups passed
SUN tests:        559 of  679 testgroups passed

Most of the testgroups that did not pass correspond to the limitations listed below!


Constructor of the API class pyxsval.XsValidator:

def __init__(self, xmlIfClass=XMLIF_MINIDOM,
             warningProc=IGNORE_WARNINGS,
             errorLimit=_XS_VAL_DEFAULT_ERROR_LIMIT,
             verbose=0,
             useCaching=1,
             processXInclude=1):

xmlIfClass:       XMLIF_MINIDOM, XMLIF_4DOM or XMLIF_ELEMENTTREE
warningProc:      IGNORE_WARNINGS, PRINT_WARNINGS or STOP_ON_WARNINGS
verbose:          0, 1 or 2
useCaching:       0 or 1   (1: use internal caching for performance optimization; option new in release 0.9.0)
processXInclude:  0 or 1   (1: process XInclude instructions before validation; option new in release 0.9.0)


Convenience functions:

parseAndValidate (inputFile, xsdFile=None, **kw):
By default, minixsv uses the schema file referenced in the "schemaLocation" or
"noNamespaceSchemaLocation" attribute of the root tag of "inputFile".
The schema file given by the input parameter xsdFile is used only if the input file specifies no schema file,
i.e. the schema specification in the input file has priority (changed in release 0.8)!
Other options (**kw) are forwarded to the XsValidator class.
The return value is a wrapper object containing the PSVI (Post-Schema-Validation-Information-Set).
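
The priority rule described above can be illustrated with a small stand-alone sketch. It does not use minixsv and is not its implementation; the helper name resolve_schema_location is hypothetical, and only the standard minidom parser is assumed. For "schemaLocation", which holds namespace/location pairs, the sketch simply takes the location of the first pair.

```python
from xml.dom import minidom

# Namespace of the xsi:schemaLocation / xsi:noNamespaceSchemaLocation attributes.
XSI_NS = "http://www.w3.org/2001/XMLSchema-instance"

def resolve_schema_location(xml_text, fallback=None):
    """Hypothetical helper: return the schema location named in the root tag,
    or 'fallback' (the role played by xsdFile) if the document names none."""
    root = minidom.parseString(xml_text).documentElement

    # noNamespaceSchemaLocation holds a single location.
    no_ns = root.getAttributeNS(XSI_NS, "noNamespaceSchemaLocation").strip()
    if no_ns:
        return no_ns

    # schemaLocation holds whitespace-separated "namespace location" pairs;
    # this sketch uses the location of the first pair only.
    pairs = root.getAttributeNS(XSI_NS, "schemaLocation").split()
    if len(pairs) >= 2:
        return pairs[1]

    # Nothing specified in the document: fall back to the caller's schema.
    return fallback
```

With such a rule, a document that names its own schema ignores the fallback, while a document without a schema attribute uses it.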

parseAndValidateString (inputText, xsdText=None, **kw):
This function expects text strings containing XML code instead of filenames.
By default, minixsv uses the schema file referenced in the "schemaLocation" or
"noNamespaceSchemaLocation" attribute of the root tag of "inputText".
The schema given by the input parameter xsdText is used only if "inputText" specifies no schema file,
i.e. the schema specification of "inputText" has priority (changed in release 0.8)!
Other options (**kw) are forwarded to the XsValidator class.
The return value is a wrapper object containing the PSVI (Post-Schema-Validation-Information-Set).


parseAndValidateXmlInput (inputFile, xsdFile=None, validateSchema=0, **kw):
Same as parseAndValidate, but the schema file is not validated by default.
By default, minixsv uses the schema file referenced in the "schemaLocation" or
"noNamespaceSchemaLocation" attribute of the root tag of "inputFile".
The schema file given by the input parameter xsdFile is used only if the input file specifies no schema file,
i.e. the schema specification in the input file has priority (changed in release 0.8)!
Other options (**kw) are forwarded to the XsValidator class.
The return value is a wrapper object containing the PSVI (Post-Schema-Validation-Information-Set).


parseAndValidateXmlInputString (inputText, xsdText=None, validateSchema=0, **kw):
Same as parseAndValidateString, but the schema is not validated by default.
This function expects text strings containing XML code instead of filenames.
By default, minixsv uses the schema file referenced in the "schemaLocation" or
"noNamespaceSchemaLocation" attribute of the root tag of "inputText".
The schema given by the input parameter xsdText is used only if "inputText" specifies no schema file,
i.e. the schema specification of "inputText" has priority (changed in release 0.8)!
Other options (**kw) are forwarded to the XsValidator class.
The return value is a wrapper object containing the PSVI (Post-Schema-Validation-Information-Set).


parseAndValidateXmlSchema (xsdFile, **kw):
This function validates only the given schema file.
Other options (**kw) are forwarded to the XsValidator class.
The return value is a wrapper object containing the PSVI (Post-Schema-Validation-Information-Set).


parseAndValidateXmlSchemaString (xsdText, **kw):
This function validates only the given schema text string.
Other options (**kw) are forwarded to the XsValidator class.
The return value is a wrapper object containing the PSVI (Post-Schema-Validation-Information-Set).


Examples for invoking minixsv:

from genxmlif import GenXmlIfError
from minixsv import pyxsval

try:
    # use default values of minixsv; the location of the schema file must be specified in the XML file
    domTreeWrapper = pyxsval.parseAndValidate("Test.xml")

    # domTree is a minidom document object
    domTree = domTreeWrapper.getTree()

    # call validator with non-default values
    elementTreeWrapper = pyxsval.parseAndValidate("Test.xml", xsdFile="TestSchema.xsd",
                                                  xmlIfClass=pyxsval.XMLIF_ELEMENTTREE,
                                                  warningProc=pyxsval.PRINT_WARNINGS,
                                                  errorLimit=200, verbose=1,
                                                  useCaching=0, processXInclude=0)

    # get elementtree object after validation
    elemTree = elementTreeWrapper.getTree()

except pyxsval.XsvalError, errstr:
    print errstr
    print "Validation aborted!"

except GenXmlIfError, errstr:
    print errstr
    print "Parsing aborted!"

Steps of validation performed by minixsv:
1. Parse XML input file and XML schema file (calls the parser of the configured DOM implementation)
2. Validate XML schema
3. Validate XML input

Since the validator is written in pure Python, it is not very fast. Instead of the function "parseAndValidate()",
the functions "parseAndValidateXmlSchema()" and "parseAndValidateXmlInput()" can be used.

To speed up validation, "parseAndValidateXmlSchema()" can be skipped
if you are sure that the XML schema file is valid.
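
One way to apply this split is to validate the schema once and then validate several input files without re-validating it. The following is only a sketch under assumptions: the helper name validate_many is hypothetical, and the pyxsval module is passed in as a parameter so that the sketch itself does not depend on an installed minixsv. It relies only on the two function signatures documented above.

```python
def validate_many(pyxsval, xml_files, xsd_file, **options):
    """Hypothetical helper: validate the schema once, then each input file."""
    # Step 1: check the schema itself a single time.
    pyxsval.parseAndValidateXmlSchema(xsd_file, **options)

    # Step 2: validate every input file. validateSchema=0 (the documented
    # default) skips the schema check that was already done in step 1.
    wrappers = []
    for xml_file in xml_files:
        wrapper = pyxsval.parseAndValidateXmlInput(
            xml_file, xsdFile=xsd_file, validateSchema=0, **options)
        wrappers.append(wrapper)
    return wrappers
```

A call could look like validate_many(pyxsval, ["a.xml", "b.xml"], "TestSchema.xsd") after "from minixsv import pyxsval". Since a schema referenced inside an input file has priority over xsdFile, this only makes sense if the input files do not point to a different, unchecked schema.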


Using the 4DOM interface is rather slow. For best performance the elementtree interface should be used.

Note: It is essential for validation of the XML input that the XML schema file is valid.
          Otherwise runtime errors may occur inside minixsv.


The input file and the XSD file for validation may each be given as a path or a URL.

Caution: Interface changed in release 0.9.0!
Parser and XInclude errors now result in "GenXmlIfError" exceptions; validation errors result in "XsvalError" exceptions.
After successful validation, the "parseAndValidate...()" functions return an XML tree wrapper object
(containing a DOM document or an elementtree) to the caller.
minixsv automatically inserts "default" and "fixed" attributes into the XML tree if they are not specified in the XML input file.
minixsv also normalizes and collapses whitespace in the XML input according to the specification in the XML schema.
Note that the PSVI (Post-Schema-Validation-Information-Set) of minixsv does not contain all the other information
specified by the XML schema standard 1.0.
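
The whitespace handling mentioned above corresponds to the "whiteSpace" facet of XML schema, which knows the values "preserve", "replace" and "collapse". The following stand-alone sketch shows what these three modes do to a string; the function name normalize_whitespace is hypothetical and the code is not taken from minixsv.

```python
import re

def normalize_whitespace(value, mode):
    """Hypothetical helper applying an XML schema whiteSpace mode to a string."""
    if mode == "preserve":
        # Value is left untouched.
        return value

    # "replace": every tab, line feed and carriage return becomes a space.
    replaced = re.sub(r"[\t\n\r]", " ", value)
    if mode == "replace":
        return replaced

    if mode == "collapse":
        # "collapse": after replacing, runs of spaces shrink to one space
        # and leading/trailing spaces are removed.
        return re.sub(r" +", " ", replaced).strip(" ")

    raise ValueError("unknown whiteSpace mode: %r" % (mode,))
```

For the input "  a\t\tb\n c  ", "replace" keeps the length and yields only spaces between the letters, while "collapse" yields "a b c".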

Limitations

minixsv is in beta state (version 0.9.0) and supports a subset of the XML schema standard 1.0.
minixsv currently has at least the following limitations/restrictions:

- no check if derived type and base type match
- no check of attributes "final", "finalDefault"
- no support of substitution groups
- no support of abstract elements and types
- restrictions regarding pattern matching:
  * subtraction of character sets not supported, e.g. regex = "[\w-[ab]]"
  * character sets with \I, \C, \P{...} not supported, e.g. regex = "[\S\I\?a-c\?]"
     (character sets with \i, \c, \p{...} are supported!)

Note: This list of limitations may not be complete!
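
To make the first pattern restriction more concrete: character set subtraction such as "[\w-[ab]]" means "any character of \w except 'a' and 'b'". Python's re module has no such syntax either; the sketch below expresses a comparable rule with a negative lookahead. It only illustrates the semantics, it is not a workaround inside an XML schema (schema patterns have no lookahead), and Python's \w is not identical to the \w of XML schema. The names WORD_EXCEPT_AB and matches are hypothetical.

```python
import re

# XML schema pattern (subtraction, not supported by minixsv 0.9.0):  [\w-[ab]]+
# Comparable Python expression: one or more word characters,
# none of which may be 'a' or 'b'. \Z anchors the match at the end.
WORD_EXCEPT_AB = re.compile(r"(?:(?![ab])\w)+\Z")

def matches(text):
    """Return True if the whole text consists of word characters other than a/b."""
    return WORD_EXCEPT_AB.match(text) is not None
```

With this expression, "cde" is accepted, whereas "cab" is rejected because it contains the subtracted characters.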


Advanced features

addUserSpecXmlIfClass (xmlIfKey, factory):
Convenience function to add a user-defined XML interface class.
This function expects a key that identifies the interface class and a factory function that creates an instance
of the user-defined XML interface class.
Example of a factory function for minidom:

def _minidomInterfaceFactory (verbose):
    import minidomif
    return minidomif.MiniDomInterface(verbose)
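
The idea of a key plus a factory function can be pictured as a small registry. The sketch below is a generic illustration of that pattern and makes no claim about how minixsv stores its interface classes; all names in it (add_interface_factory, create_interface, DummyInterface, "XMLIF_DUMMY") are hypothetical.

```python
# Hypothetical registry mapping an interface key to its factory function.
_interface_factories = {}

def add_interface_factory(key, factory):
    """Register a factory under a key; a later registration replaces an earlier one."""
    _interface_factories[key] = factory

def create_interface(key, verbose=0):
    """Look up the factory for 'key' and call it with the verbose level."""
    try:
        factory = _interface_factories[key]
    except KeyError:
        raise ValueError("unknown XML interface key: %r" % (key,))
    return factory(verbose)

class DummyInterface(object):
    """Stand-in for a user-defined XML interface class."""
    def __init__(self, verbose):
        self.verbose = verbose

# A class is itself a callable taking 'verbose', so it can serve as factory.
add_interface_factory("XMLIF_DUMMY", DummyInterface)
```

Passing a factory instead of an instance delays the import and construction of the interface until it is actually requested, which is also what the minidom example above does with its local import.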




Download

You can get the current release of minixsv here.


Copyright 2004-2008 by Roland Leuthe