EMF: Reading a model from XML - how to correctly declare its namespace - variants

January 3, 2011

When you use the Eclipse Modeling Framework (EMF) to read a model instance from an XML file, such as a webservice call message payload, it's essential for EMF to be able to match the root XML element with the model's "ePackage" that should be used for (re)constructing the model instance from the XML and this is done by matching the root element's and the ePackage's namespaces (as in XSD). So it's very important to have proper configuration of EMF and proper content of the XML. Since there ale multiple variations of both, there are more ways to get it wrong than right. To learn what I've discovered regarding the valid combinations suitable at different situations, read on.

EMF micro-introduction

In Java you can define models by creating classes with attributes (such as Book with a name, an Author, a Library) and connecting them together. You can then create an instance of the model by creating instances of the classes (Book: Clean Code, Author: Uncle Bob, ...). The meta-model you use for defining the models is defined by the Java language and consists of classes etc.

EMF is Java API which makes it also possible to define models and create, save, and load their instances in various formats (for example native EMF, UML, XSD). The main differences to Java are that models can be defined not only statically but also dynamically at runtime (and I suppose that the syntax may be richer than Java's as EMF is more general-purpose), that EMF supports multiple formats for model and model instance input and output, that there is a better reflection-like API, and that you can easily generate editors for your models. The generic meta-model for defining models consists of elements like EClass, EAttribute, EOperation etc. and these elements are grouped in so-called EPackages, each identified uniquely by a namespace URI as you might know it from XSD. Instances of a model are composed of EObjects (and supporting classes like EList). Each of this elements is represented by a Java class.

Nano-glossary: Feature (or structural feature) is, simply said, an attribute.

Assumptions

In this post I'll suppose that you've already defined your model, for example dynamically by reading it from an XSD or statically by generating it based on UML, and that you want to read an XML file representing an instance of the model and transform it into the runtime EMF representation composed of EObjects. The XML may be for example a part of a SOAP message.

For simplicity we may assume that all your model elements are within a single EPackage, under a single namespace, though there is no problem with having them distributed among multiple packages provided that each top-most XML element belonging to another package has proper namespace information so that EMF can match it to the correct EPackage.

Proper EMF configurations and namespace declarations

There are basically three options regarding the XML root element's namespace declaration:

It declares its namespace with an explicit prefix
It declares its namespace without any prefix, i.e. it declares its namespace as the default one since that on
There is no namespace information at all in the XML

Based on where you're getting the XML from you may encounter any of these situation and you will need to configure EMF accordingly. If you get EMF wrong then you will get strange results such as

null
a valid root EObject without any attributes for the nested elements or the failure "unknown feature"
an EObject without any EClass information (signifying that EMF failed to match the root XML element with the EPackage that you wanted and used its defaults instead)

1. XML root element in a namespace with a prefix (xmlns:ns="...">)

EMF normally expects the input XML to have the target namespace declared with a prefix on the root element and the nested elements to have neither the prefix nor any namespace declaration (and thus these local elements are unqualified in the XSD language). Example:


<myRootElement xmlns:ns="http://example.com/myXml" someAttribute="value">
   <myAnotherElement anotherAttribute="value 2"/>
<myRootElement>

This is what you would get if you saved an EMF model using its serialization API (Resource.save(..)).

The root element's namespace - http://example.com/myXml in this case - must correspond to the namespace URI of a registered EPackage in the EMF package registry (via put(namespaceUri, anEPackageInstance)).

2. XML root element in the default namespace(<elm xmlns="...">)

Another, a bit more tricky option is to declare the namespace without any prefix at all:


<myRootElement xmlns="http://example.com/myXml" someAttribute="value">
   <myAnotherElement anotherAttribute="value 2"/>
</myRootElement>

Beware that having the root element without any prefix and with a declaration of the default NS as here may cause a strange behavior, namely the root object being loaded but without any content, producing the error "unknown feature" for the 1st nested element.

It's important to know how namespaces and qualified elements work. In the case #1 (explicit prefix) you have only the root element in a namespace (NS) while the nested elements, having no prefix, have null NS. Therefore if you declare a default NS on the root element you have a totally different situation because now all elements without a prefix - including the nested ones - have a namespace. And from the point of view of XML and thus also EMF the elements "myAnotherElement from the namespace XY" and "myAnotherElement from the null namespace" are completely different and you will thus get the rather confusing error "unknown feature myAnotherElement".

The solution is to tell EMF that it should suppose that the nested elements are qualified with a namespace name instead of having a null NS. I don't know how to do that in general but if you load your model from an XSD then you can do it by adding the attribute 'elementFormDefault="qualified"' to the xsd:schema element to tell it that even local elements (those within a complexType) should be qualified:


<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
 targetNamespace="http://example.com/myXml"
 elementFormDefault="qualified">
	  <xsd:complexType name="myRootElement">
		<xsd:sequence>
			<xsd:element name="myAnotherElement" type="xsd:string" />
			...
		</xsd:sequence>
	</xsd:complexType>
</xsd:schema>

(Setting the elementFormDefault attribute was really the only change I had to do to get EMF working again.)

This tip comes from a forum post and its response. See the XSD spec. - 3.1 Target Namespaces & Unqualified Locals for details.

Notice that qualified, in XML terms, means "Associated with a namespace, either by the use of a declared prefix or via a default namespace declaration". All "unqualified" elements and attributes are in no (null) namespace. By default, all global elements and attributes are qualified.

Why you may need this

Either you receive input in this format
Or you receive input without any namespace information and you want to add it there with the least effort (adding just a namespace declaration is easier than also needing to change element's namespace prefix)

3. No namespace information in the XML (<elm>)

This is the worst case because the input XML doesn't describe itself sufficiently and you need to know or guess the right EPackage to parse it, it would be certainly better if it had a namespace information. But sometimes you are not able to influence the input yet still want to read it with EMF.


<myRootElement someAttribute="value">
   <myAnotherElement anotherAttribute="value 2"/>
</myRootElement>

If the root XML element has no prefix and no associated namespace information then EMF won't be able to match it with the corresponding model EPackage and thus won't be able to load it properly unless you register the target EPackage with the ResourceSet's/global package registry (also) under the null URI:


for (EPackage ePackage: eCorePackages) {
	resourceSet.getPackageRegistry().put(null, ePackage);
	// alternatively could call EPackage.Registry.INSTANCE.put(..)
}

This tip comes from an Eclipse forum thread.

General notes on XML saving/loading in EMF

When loading/saving model instance from/to a XML you will likely want to pass some of the following options to the save or load methods:


Map<String, Object> options = new HashMap<String, Object>();
options.put(XMLResource.OPTION_EXTENDED_META_DATA, Boolean.TRUE);
// options.put(XMLResource.OPTION_RECORD_UNKNOWN_FEATURE, Boolean.TRUE);
options.put(XMLResource.OPTION_ENCODING, "UTF-8");

OPTION_EXTENDED_META_DATA - create nested elements rather than attributes; I'm not sure whether it has any effect when loading
OPTION_RECORD_UNKNOWN_FEATURE - when loading and an unknown element is encountered (see #2), do not throw the "unknown feature" exception but simply skip it
OPTION_ENCODING - the encoding of the generated XML, default is ASCII; not sure what it does when loading

Environment

EMF version: 2.2.1

Summary

EMF needs to map the root element of an XML being loaded to a registered EPacakge by matching their namespaces. In the XML, the namespace may be declared with a prefix or it may be the default namespace - in which case EMF must be informed that the nested elements are qualified too - or, finally, there may be no namespace information, in which case you need to register the "default EPackage" to be used under the null namespace URI.

The documentation and JavaDoc of EMF 2.2 is not very good but fortunately you can find most answers in the Eclipse and EclipseZone forums.

In the next post I'll describe how to create a dynamic EMF model based on XSD schemas and how to use it to load model instances from XML either into the standard EObjects or into EMF SDO implementation's EDataObjects.

Tags: java library tool