The Translet API and TrAX
Contents
Note: This document describes the design of XSLTC's TrAX implementation. The XSLTC TrAX API user documentation is kept in a separate document.
The structure of this document is, and should be kept, as follows:
- A brief introduction to TrAX/JAXP
- Overall design of the XSLTC TrAX implementation
- Detailed design of various TrAX components
Abstract
JAXP is the Java extension API for XML parsing. TrAX is an API for XML transformations and is included in the later versions of JAXP. JAXP includes two packages, one for XML parsing and one for XML transformations (TrAX):
javax.xml.parsers
javax.xml.transform
XSLTC is an XSLT processing engine and fulfills the role of the XML transformation engine behind the TrAX portion of the JAXP API. XSLTC is a provider for the TrAX API and a client of the JAXP parser API.
This document describes the design used for integrating XSLTC translets with the JAXP TrAX API. The heart of the design is a wrapper class around the XSLTC compiler that extends the JAXP SAXTransformerFactory interface. This factory delivers translet class definitions (Java bytecodes) wrapped inside TrAX Templates objects. These Templates objects can be used to instantiate Transformer objects that transform XML documents into markup or plain text. Alternatively, a Transformer object can be created directly by the TransformerFactory, but this approach is not recommended with XSLTC. The reason for this is explained later in this document.
TrAX basics
The Java API for XML Processing (JAXP) includes an XSLT framework based on the Transformation API for XML (TrAX). A JAXP transformation application can use the TrAX framework in two ways. The simplest way is:
- create an instance of the TransformerFactory class
- from the factory instance and a given XSLT stylesheet, create a new Transformer object
- call the Transformer object's transform() method, specifying the XML input and a Result object.
import javax.xml.transform.*;

public class Compile {

    public void run(Source xsl) {
        ....
        TransformerFactory factory = TransformerFactory.newInstance();
        Transformer transformer = factory.newTransformer(xsl);
        ....
    }
}
This suits most conventional XSLT processors that transform XML documents in one go. XSLTC needs one extra step to compile the XSL stylesheet into a Java class (a "translet"). Fortunately, TrAX offers another approach that suits XSLTC's two-step transformation model:
- create an instance of the TransformerFactory class
- from the factory instance and a given XSLT stylesheet, create a new Templates object (this step will compile the stylesheet and put the bytecodes for the translet class(es) into the Templates object)
- from the Templates object, create a Transformer object (this will instantiate a new translet object)
- call the Transformer object's transform() method, specifying the XML input and a Result object.
import javax.xml.transform.*;

public class Compile {

    public void run(Source xsl) {
        ....
        TransformerFactory factory = TransformerFactory.newInstance();
        Templates templates = factory.newTemplates(xsl);
        Transformer transformer = templates.newTransformer();
        ....
    }
}
Note that the first two steps need to be performed only once for each stylesheet. Once the stylesheet is compiled into a translet and wrapped in a Templates object, the Templates object can be used over and over again to create Transformer objects (instances of the translet). The Templates instances can even be serialized and stored on stable storage (for example, in a memory or disk cache) for later use.
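The four steps above can be sketched as follows. This is a minimal illustration, not XSLTC's own code: the class name CompileOnce, the in-memory stylesheet and the helper methods are assumptions made for the example, and only standard JAXP calls are used. Checked exceptions are wrapped so that the helpers are easy to call.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Templates;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class CompileOnce {
    // Hypothetical stylesheet: emits the text of the first /doc/name element.
    static final String XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='text'/>"
      + "<xsl:template match='/'><xsl:value-of select='/doc/name'/></xsl:template>"
      + "</xsl:stylesheet>";

    /** Steps 1 and 2: done once per stylesheet. */
    static Templates compile(String xsl) {
        try {
            TransformerFactory factory = TransformerFactory.newInstance();
            return factory.newTemplates(new StreamSource(new StringReader(xsl)));
        } catch (TransformerException e) {
            throw new IllegalStateException("stylesheet did not compile", e);
        }
    }

    /** Steps 3 and 4: done once per transformation. */
    static String transform(Templates templates, String xml) {
        try {
            Transformer transformer = templates.newTransformer();
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)),
                                  new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("transformation failed", e);
        }
    }

    public static void main(String[] args) {
        Templates templates = compile(XSL);   // compiled a single time
        System.out.println(transform(templates, "<doc><name>first</name></doc>"));
        System.out.println(transform(templates, "<doc><name>second</name></doc>"));
    }
}
```

The same Templates object serves both transformations, so the stylesheet is compiled only once.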
The code below illustrates a simple JAXP transformation application that creates the Transformer directly. Remember that this is not the ideal approach with XSLTC, as the stylesheet is compiled for each transformation.
import javax.xml.transform.stream.StreamSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;

public class Proto {

    public void run(String xmlfile, String xslfile) {

        Transformer transformer;
        TransformerFactory factory = TransformerFactory.newInstance();

        try {
            StreamSource stylesheet = new StreamSource(xslfile);
            transformer = factory.newTransformer(stylesheet);
            transformer.transform(new StreamSource(xmlfile),
                                  new StreamResult(System.out));
        }
        catch (Exception e) {
            // handle errors...
        }
        :
        :
    }
This approach seems simple and is probably used in many applications. However, Templates objects are useful when multiple instances of the same Transformer are needed. Transformer objects are not thread safe, so a server that handles requests from several clients is best off creating one global Templates object and then creating from it a Transformer object for each thread handling the requests. This approach is also by far the best for XSLTC, as the Templates object will hold the class definitions that make up the translet and its auxiliary classes. (Note that the bytecodes, and not the actual class definitions, are stored when serializing a Templates object to disk. This is because of class loader security restrictions.) To accommodate this second approach to TrAX transformations, the above class would be modified as follows:
try {
    StreamSource stylesheet = new StreamSource(xslfile);
    Templates templates = factory.newTemplates(stylesheet);
    transformer = templates.newTransformer();
    // 'xmlfile' is the input file parameter of run() in the class above
    transformer.transform(new StreamSource(xmlfile),
                          new StreamResult(System.out));
}
catch (Exception e) {
    // handle errors...
}
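The server scenario described above (one shared Templates object, one Transformer per request) might look like the following sketch. The class name SharedTemplates, the thread pool and the sample stylesheet are assumptions made for the illustration. The sketch relies on the TrAX contract that Templates objects may be shared between threads while Transformer objects may not.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import javax.xml.transform.Templates;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class SharedTemplates {
    // Hypothetical stylesheet: emits the text of the first /doc/name element.
    static final String XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='text'/>"
      + "<xsl:template match='/'><xsl:value-of select='/doc/name'/></xsl:template>"
      + "</xsl:stylesheet>";

    static Templates compile(String xsl) {
        try {
            return TransformerFactory.newInstance()
                    .newTemplates(new StreamSource(new StringReader(xsl)));
        } catch (TransformerConfigurationException e) {
            throw new IllegalStateException("stylesheet did not compile", e);
        }
    }

    /** Transforms every document on a pool thread; results keep the input order. */
    static List<String> transformAll(Templates templates, List<String> documents, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<String>> pending = new ArrayList<>();
            for (String xml : documents) {
                pending.add(pool.submit(() -> {
                    // A fresh Transformer per task: Transformers are not thread safe.
                    Transformer transformer = templates.newTransformer();
                    StringWriter out = new StringWriter();
                    transformer.transform(new StreamSource(new StringReader(xml)),
                                          new StreamResult(out));
                    return out.toString();
                }));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> f : pending) {
                results.add(f.get());
            }
            return results;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("transformation failed", e);
        } finally {
            pool.shutdown();
        }
    }
}
```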
TrAX configuration
JAXP's TransformerFactory is configurable in a manner similar to the other Java extensions. The API supports configuring the factory by:
- passing vendor-specific attributes from the application, through the TrAX interface, to the underlying XSL processor
- registering an ErrorListener that will be used to pass error and warning messages from the XSL processor to the application
- registering a URIResolver that the application can use to load XSL and XML documents on behalf of the XSL processor (the XSL processor will use this to support the xsl:include and xsl:import elements and the document() function).
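The listener and resolver registration described in the list above could be sketched like this. The class names CollectingErrorListener and FactoryConfig are hypothetical. Vendor-specific attributes are left out on purpose, since their names depend on the underlying processor.

```java
import java.util.ArrayList;
import java.util.List;
import javax.xml.transform.ErrorListener;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.URIResolver;

/** Hypothetical listener that records messages instead of printing them. */
final class CollectingErrorListener implements ErrorListener {
    final List<String> messages = new ArrayList<>();

    @Override
    public void warning(TransformerException e) {
        messages.add("warning: " + e.getMessage());
    }

    @Override
    public void error(TransformerException e) {
        messages.add("error: " + e.getMessage());
    }

    @Override
    public void fatalError(TransformerException e) throws TransformerException {
        messages.add("fatal: " + e.getMessage());
        throw e;   // fatal errors abort processing
    }
}

final class FactoryConfig {
    /** Creates a factory that reports to the listener and loads documents via the resolver. */
    static TransformerFactory configure(ErrorListener listener, URIResolver resolver) {
        TransformerFactory factory = TransformerFactory.newInstance();
        factory.setErrorListener(listener);
        factory.setURIResolver(resolver);
        return factory;
    }
}
```

Templates and Transformer objects created from such a factory are expected to report problems through the registered listener rather than to the console.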
The JAXP TransformerFactory can be queried at runtime to discover what features it supports. For example, an application might want to know if a particular factory implementation supports the use of SAX events as a source, or whether it can write out transformation results as a DOM. The factory is queried with the getFeature() method. In the above code, we could add the following before the try-catch block:
if (!factory.getFeature(StreamSource.FEATURE) ||
    !factory.getFeature(StreamResult.FEATURE)) {
    System.err.println("Stream Source/Result not supported by TransformerFactory\nExiting....");
    System.exit(1);
}
Other elements in the TrAX API are configurable. A Transformer object can be passed settings that override the default output settings and the settings defined in the stylesheet for indentation, output document type, etc.
XSLTC TrAX architecture
XSLTC's architecture fits nicely behind the TrAX interface. XSLTC's compiler is put behind the TransformerFactory interface, the translet class definition (either as a set of in-memory Class objects or as a two-dimensional array of bytecodes on disk) is encapsulated in the Templates implementation, and the instantiated translet object is wrapped inside the Transformer implementation. Figure 1 (below) shows this two-layered TrAX architecture:
Figure 1: Translet class definitions are wrapped inside Templates objects
The TransformerFactory implementation also implements the SAXTransformerFactory and ErrorListener interfaces from the TrAX API.
The TrAX implementation has intentionally been kept completely separate from the XSLTC native code. This prevents users of XSLTC's native API from having to include the TrAX code in an application. All the code that makes up our TrAX implementation resides in this package:
org.apache.xalan.xsltc.trax
Message to all XSLTC developers: Keep it this way! Do not mix TrAX and Native code!
TrAX implementation details
The main components of our TrAX implementation are:
TransformerFactory implementation
The methods that make up the basic TransformerFactory interface are:
public Templates newTemplates(Source source);
public Transformer newTransformer();
public ErrorListener getErrorListener();
public void setErrorListener(ErrorListener listener);
public Object getAttribute(String name);
public void setAttribute(String name, Object value);
public boolean getFeature(String name);
public URIResolver getURIResolver();
public void setURIResolver(URIResolver resolver);
public Source getAssociatedStylesheet(Source src, String media, String title, String charset);
And for the SAXTransformerFactory interface:
public TemplatesHandler newTemplatesHandler();
public TransformerHandler newTransformerHandler();
public TransformerHandler newTransformerHandler(Source src);
public TransformerHandler newTransformerHandler(Templates templates);
public XMLFilter newXMLFilter(Source src);
public XMLFilter newXMLFilter(Templates templates);
And for the ErrorListener interface:
public void error(TransformerException exception);
public void fatalError(TransformerException exception);
public void warning(TransformerException exception);
TransformerFactory basics
The very core of XSLTC's TrAX support is the implementation of the basic TransformerFactory interface. This factory class is more or less a wrapper around the XSLTC compiler and creates Templates objects in which compiled translet classes can reside. These Templates objects can then be used to create Transformer objects. In cases where the Transformer is created directly by the factory, we use the Templates class internally. That way the transformation appears to be done in one step from the user's point of view, while in reality we use two steps. As described earlier, this is not the best approach when using XSLTC, as it causes the stylesheet to be compiled for each and every transformation.
TransformerFactory attribute settings
The getAttribute() and setAttribute() methods only recognise two attributes: translet-name and debug. The latter is obvious - it forces XSLTC to output debug information (dumps the stack in the very unlikely case of a failure). The translet-name attribute can be used to set the default class name for any nameless translet classes that the factory creates. A nameless translet will, for instance, be created when the factory compiles a translet for the identity transformation. There is a default name, GregorSamsa, for nameless translets, so there is no absolute need to set this attribute. (Gregor Samsa is the main character from Kafka's "Metamorphosis" - transformations, metamorphosis - I am sure you see the connection.)
TransformerFactory stylesheet handling
The compiler can be passed a stylesheet through various methods in the TransformerFactory interface. A stylesheet is passed in as a Source object that contains either a DOM, a SAX parser or a stream. The getInputSource() method handles all inputs and converts them, if necessary, to SAX. The TrAX implementation contains an adapter that generates SAX events from a DOM, and this adapter is used for DOM input. If the Source object contains a SAX parser, this parser is passed directly to the compiler. A SAX parser is instantiated (using JAXP) if the Source object contains a stream.
TransformerFactory URI resolver
A TransformerFactory needs a URIResolver to locate documents that are referenced in <xsl:import> and <xsl:include> elements. XSLTC has an internal interface that shares the same purpose. This internal interface is implemented by the TransformerFactory:
public InputSource loadSource(String href, String context, XSLTC xsltc);
This method will simply use any defined URIResolver and proxy the call on to the URI resolver's resolve() method. This method returns a Source object, which is converted to SAX events and passed back to the compiler.
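An application-side resolver that feeds an <xsl:include> from memory could be sketched as below. InMemoryResolver, IncludeDemo and the two stylesheets are hypothetical names and data. The sketch only shows the TrAX side of the call chain and makes no claim about how loadSource() is implemented.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.HashMap;
import java.util.Map;
import javax.xml.transform.Source;
import javax.xml.transform.Templates;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.URIResolver;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Hypothetical resolver that serves stylesheets from a map keyed by href. */
final class InMemoryResolver implements URIResolver {
    private final Map<String, String> stylesheets = new HashMap<>();

    void register(String href, String text) {
        stylesheets.put(href, text);
    }

    @Override
    public Source resolve(String href, String base) {
        String text = stylesheets.get(href);
        // Returning null asks the processor to resolve the URI itself.
        return text == null ? null : new StreamSource(new StringReader(text), href);
    }
}

final class IncludeDemo {
    static final String MAIN =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:include href='greeting.xsl'/>"
      + "<xsl:output method='text'/>"
      + "<xsl:template match='/'><xsl:call-template name='greet'/></xsl:template>"
      + "</xsl:stylesheet>";

    static final String INCLUDED =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:template name='greet'>hello from include</xsl:template>"
      + "</xsl:stylesheet>";

    /** Compiles MAIN, resolving its include through the in-memory resolver. */
    static String run(String xml) {
        try {
            InMemoryResolver resolver = new InMemoryResolver();
            resolver.register("greeting.xsl", INCLUDED);

            TransformerFactory factory = TransformerFactory.newInstance();
            factory.setURIResolver(resolver);   // consulted while compiling MAIN

            Templates templates = factory.newTemplates(new StreamSource(new StringReader(MAIN)));
            StringWriter out = new StringWriter();
            templates.newTransformer().transform(
                new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("transformation failed", e);
        }
    }
}
```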
Templates design
Templates creation
The TransformerFactory implementation invokes the XSLTC compiler to generate the translet class and auxiliary classes. These classes are stored inside our Templates implementation in a manner which allows the Templates object to be serialized. By making it possible to store Templates on stable storage we allow the TrAX user to store/cache translet class(es), thus making room for XSLTC's one-compilation-multiple-transformations approach. This was done by giving the Templates implementation an array of byte-arrays that contain the bytecodes for the translet class and its auxiliary classes. When the user requests a Transformer instance from the Templates object for the first time, we create one or more Class objects from these byte arrays. Note that this is done only once as long as the Templates object resides in memory. The Templates object then invokes the JVM's class loader with the class definition(s) to instantiate the translet class(es). The translet objects are then wrapped inside a Transformer object, which is returned to the client code:
// Contains the name of the main translet class
private String _transletName = null;

// Contains the actual class definition for the translet class and
// any auxiliary classes (representing node sort records, predicates, etc.)
private byte[][] _bytecodes = null;

/**
 * Defines the translet class and auxiliary classes.
 * Returns a reference to the Class object that defines the main class
 */
private Class defineTransletClasses() {
    TransletClassLoader loader = getTransletClassLoader();

    try {
        Class transletClass = null;
        final int classCount = _bytecodes.length;
        for (int i = 0; i < classCount; i++) {
            Class clazz = loader.defineClass(_bytecodes[i]);
            if (clazz.getName().equals(_transletName))
                transletClass = clazz;
        }
        return transletClass; // Could still be 'null'
    }
    catch (ClassFormatError e) {
        return null;
    }
}
Translet class loader
The Templates object will create the actual translet Class object(s) the first time the newTransformer() method is called. (The "first time" means the first time either after the object was instantiated or the first time after it has been read from storage using serialization.) These class(es) cannot be created using the standard class loader since the method:
Class defineClass(String name, byte[] b, int off, int len);
of the ClassLoader is protected. XSLTC uses its own class loader that extends the standard class loader:
// Our own private class loader - builds Class definitions from bytecodes
private class TransletClassLoader extends ClassLoader {
    public Class defineClass(byte[] b) {
        return super.defineClass(null, b, 0, b.length);
    }
}
This class loader is instantiated inside a privileged code section:
TransletClassLoader loader =
    (TransletClassLoader) AccessController.doPrivileged(
        new PrivilegedAction() {
            public Object run() {
                return new TransletClassLoader();
            }
        }
    );
Then, when the newTransformer() method returns, it passes back an instance of XSLTC's Transformer implementation that contains an instance of the main translet class. (One transformation may need several Java classes - for sort-records, predicates, etc. - but there is always one main translet class.)
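The class loader technique in this section can be tried in isolation. The following stand-alone sketch mirrors the listings above under a hypothetical name, BytecodeLoader. It omits the privileged block, takes the parent loader as a constructor argument (an assumption of the example) and, like defineTransletClasses(), returns null when the bytecodes are malformed.

```java
final class BytecodeLoader extends ClassLoader {

    BytecodeLoader(ClassLoader parent) {
        super(parent);
    }

    /** Exposes the protected defineClass(); the class name is taken from the bytecodes. */
    Class<?> define(byte[] bytecode) {
        return defineClass(null, bytecode, 0, bytecode.length);
    }

    /**
     * Defines every class in the array and returns the one called mainName.
     * Returns null if no class has that name or if the bytecodes are malformed.
     */
    static Class<?> defineAll(byte[][] bytecodes, String mainName) {
        BytecodeLoader loader = new BytecodeLoader(BytecodeLoader.class.getClassLoader());
        Class<?> mainClass = null;
        try {
            for (byte[] bytecode : bytecodes) {
                Class<?> clazz = loader.define(bytecode);
                if (clazz.getName().equals(mainName)) {
                    mainClass = clazz;
                }
            }
        } catch (ClassFormatError e) {
            return null;   // not valid class file data
        }
        return mainClass;
    }

    public static void main(String[] args) {
        // Three arbitrary bytes are not a class file, so the result is null.
        System.out.println(defineAll(new byte[][] {{1, 2, 3}}, "Example"));
    }
}
```

Because every call to defineAll() creates a fresh loader, the same bytecodes can be defined again after a Templates object has been deserialized, which matches the "first time" behaviour described above.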
Class loader security issues
When XSLTC is placed inside a JAR-file in the $JAVA_HOME/jre/lib/ext directory, it is loaded by the extensions class loader and not the default (bootstrap) class loader. The extensions class loader does not look for class files/definitions in the user's CLASSPATH. This can cause two problems: A) XSLTC does not find classes for external Java functions, and B) XSLTC does not find translet or auxiliary classes when used through the native API.
Both of these problems are caused by XSLTC internally calling the Class.forName() method. This method will use the current class loader to locate the desired class (be it an external Java class or a translet/aux class). The failure is avoided by forcing XSLTC to use the bootstrap class loader, as illustrated below:
Figure 2: Avoiding the extensions class loader
These are the steps that XSLTC will go through to load a class:
- the application requests an instance of the transformer factory
- the Java extensions mechanism locates XSLTC as the transformer factory implementation using the extensions class loader
- the extensions class loader loads XSLTC
- XSLTC's compiler attempts to get a reference to an external Java class, but the call to Class.forName() fails, as the extensions class loader does not use the user's class path
- XSLTC attempts to get a reference to the bootstrap class loader, and requests it to load the external class
- the bootstrap class loader loads the requested class
Step 5) is only allowed if XSLTC has special permissions. But remember that this problem only occurs when XSLTC is put in the $JAVA_HOME/jre/lib/ext directory, where it is given all permissions (by the default security file).
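The fallback in steps 4 to 6 follows a common pattern: try the current class loader first, then retry with a loader that can see the user's class path. The sketch below is not XSLTC's code. The class name FallbackClassLookup is hypothetical, and it uses ClassLoader.getSystemClassLoader() as a stand-in for the loader this document calls the bootstrap class loader, on the assumption that this is the loader that searches CLASSPATH.

```java
final class FallbackClassLookup {

    /**
     * Looks the class up with the caller's class loader first and, if that
     * fails, retries with the system class loader (assumed to see CLASSPATH).
     */
    static Class<?> find(String name) {
        try {
            return Class.forName(name);
        } catch (ClassNotFoundException first) {
            try {
                return Class.forName(name, true, ClassLoader.getSystemClassLoader());
            } catch (ClassNotFoundException second) {
                throw new IllegalArgumentException("class not found: " + name, second);
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(find("java.lang.String").getName());
    }
}
```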
Transformer detailed design
The Transformer class is a simple proxy that passes transformation settings on to its translet instance before it invokes the translet's doTransform() method. The Transformer's transform() method maps directly to the translet's doTransform() method.
Transformer input and output handling
The Transformer handles its input in a manner similar to that of the TransformerFactory. It has two methods for creating standard SAX input and output handlers for its input and output files:
private DOMImpl getDOM(Source source, int mask);
private ContentHandler getOutputHandler(Result result);
One aspect of the getDOM method is that it handles four different types of Source objects. In addition to the standard DOM, SAX and stream types, it also handles an extended XSLTCSource input type. This input type is a lightweight wrapper around XSLTC's internal DOM-like input tree. This allows the user to create a cache or pool of XSLTC's native input data structures containing the input XML document. The XSLTCSource class is located in:
org.apache.xalan.xsltc.trax.XSLTCSource
Transformer parameter settings
XSLTC's native interface has get/set methods for stylesheet parameters, identical to those of the TrAX API. The parameter handling methods of the Transformer implementation are pure proxies.
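On the TrAX side, parameter passing looks like the sketch below. The class name ParamDemo, the parameter name who and the stylesheet are assumptions made for the illustration. Only the standard Transformer.setParameter() call is used.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Templates;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class ParamDemo {
    // Hypothetical stylesheet with one top-level parameter, 'who'.
    static final String XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='text'/>"
      + "<xsl:param name='who'>nobody</xsl:param>"
      + "<xsl:template match='/'>Hello, <xsl:value-of select='$who'/></xsl:template>"
      + "</xsl:stylesheet>";

    /** Runs the stylesheet with the 'who' parameter set to the given value. */
    static String greet(String who) {
        try {
            Templates templates = TransformerFactory.newInstance()
                    .newTemplates(new StreamSource(new StringReader(XSL)));
            Transformer transformer = templates.newTransformer();
            transformer.setParameter("who", who);   // overrides the stylesheet default
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader("<doc/>")),
                                  new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(greet("World"));
    }
}
```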
Transformer output settings
The Transformer interface of TrAX has four methods for retrieving and defining the transformation output document settings:
public Properties getOutputProperties();
public String getOutputProperty(String name);
public void setOutputProperties(Properties properties);
public void setOutputProperty(String name, String value);
There are three levels of output settings. First there are the default settings defined in the XSLT 1.0 spec, then there are the settings defined in the attributes of the <xsl:output> element, and finally there are the settings passed in through the TrAX get/setOutputProperty() methods.
Figure 3: Passing output settings from TrAX to the translet
The AbstractTranslet class has a series of fields that contain the default values for the output settings. The compiler/Output class will compile code into the translet's constructor that updates these values depending on the attributes in the <xsl:output> element. The Transformer implementation keeps an instance of the java.util.Properties class in which it stores all properties that are set by the setOutputProperty() and the setOutputProperties() methods. These settings are written to the translet's output settings fields prior to initiating the transformation.
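The precedence of the three levels can be observed from client code. In this sketch (the class name OutputOverride and the stylesheet are hypothetical) the stylesheet asks for an XML declaration, and a property set through TrAX overrides that request.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Templates;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class OutputOverride {
    // Hypothetical stylesheet: <xsl:output> explicitly keeps the XML declaration.
    static final String XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='xml' omit-xml-declaration='no'/>"
      + "<xsl:template match='/'><out>hi</out></xsl:template>"
      + "</xsl:stylesheet>";

    /** Renders the fixed result, optionally overriding the stylesheet's setting. */
    static String render(boolean omitDeclaration) {
        try {
            Templates templates = TransformerFactory.newInstance()
                    .newTemplates(new StreamSource(new StringReader(XSL)));
            Transformer transformer = templates.newTransformer();
            if (omitDeclaration) {
                // Third level: a TrAX property takes precedence over <xsl:output>.
                transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            }
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader("<in/>")),
                                  new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("transformation failed", e);
        }
    }
}
```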
Transformer URI resolver
The setURIResolver() method of the Transformer interface is used to set a locator for documents referenced by the document() function in XSL. The native XSLTC API has a defined interface for a DocumentCache. The functionality provided by XSLTC's internal DocumentCache interface is somewhat complementary to the URIResolver, and the two can be used side-by-side. To accomplish this we needed to find out in which ways the translet can load an external document:
Figure 4: Using URIResolver and DocumentCache objects
From the diagram we see that these four ways are:
- LoadDocument -> .xml
- LoadDocument -> DocumentCache -> .xml
- LoadDocument -> URIResolver -> .xml
- LoadDocument -> DocumentCache -> URIResolver -> .xml