




































|
|
|
This document describes the transformation API for XML
(TrAX), the set of APIs contained in javax.xml.transform,
javax.xml.transform.stream,
javax.xml.transform.dom,
and javax.xml.transform.sax.
There is a broad need for Java applications to be able to
transform XML and related tree-shaped data structures. In
fact, XML is not normally very useful to an application
without going through some sort of transformation, unless the
semantic structure is used directly as data. Almost all
XML-related applications need to perform transformations.
Transformations may be described by Java code, Perl code, XSLT stylesheets, other
types of script, or proprietary formats. The inputs to a transformation, of which
there may be one or several, may be a URL, an XML stream, a DOM
tree, SAX events, or a proprietary format or data structure.
The output types are much the same as the
input types, but different inputs may need to be combined with
different outputs.
The great challenge of a transformation API is how to deal
with all the possible combinations of inputs and outputs,
without becoming specialized for any of the given types.
The Java community will greatly benefit from a common API
that will allow them to understand and apply a single model,
write to consistent interfaces, and apply the transformations
polymorphically. TrAX attempts to define a model that is clean
and generic, yet fills general application requirements across
a wide variety of uses. |
|
|
This section will explain some general terminology used in
this document. Technical terminology will be explained in the
Model section. In many cases, the general terminology overlaps
with the technical terminology.
- Tree
- This term, as used within this document, describes an
abstract structure that consists of nodes or events that may
be produced by XML. A Tree physically may be a DOM tree, a
series of well-balanced parse events (such as those coming
from a SAX2 ContentHandler), a series of requests (the result
of which can describe a tree), or a stream of marked-up
characters.
- Source Tree(s)
- One or more trees that are the inputs to the
transformation.
- Result Tree(s)
- One or more trees that are the output of the
transformation.
- Transformation
- The process of consuming a stream or tree to produce
another stream or tree.
- Identity (or Copy) Transformation
- The process of transformation from a source to a result,
making as few structural changes as possible and no
informational changes. The term is somewhat loosely used, as
the process is really a copy from one "format" (such as a
DOM tree, stream, or set of SAX events) to another.
- Serialization
- The process of taking a tree and turning it into a
stream. In some sense, a serialization is a specialized
transformation.
- Parsing
- The process of taking a stream and turning it into a
tree. In some sense, parsing is a specialized
transformation.
- Transformer
- A Transformer is the object that executes the
transformation.
- Transformation instructions
- Instructions that describe the transformation. They may take the
form of code, a script, or simply a declaration or series of declarations.
- Stylesheet
- The same as "transformation instructions," except it is
likely to be used in conjunction with XSLT.
- Templates
- Another form of "transformation instructions." In the
TrAX interface, this term is used to describe processed or
compiled transformation instructions. The Source flows
through a Templates object to be formed into the Result.
- Processor
- A general term for the component that may both process the
transformation instructions and perform the transformation.
- DOM
- Document Object Model, specifically referring to the Document Object
Model (DOM) Level 2 Specification.
- SAX
- Simple API for XML, specifically referring to the SAX 2.0
release.
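The relationship between identity transformation, parsing, and serialization described above can be sketched with the javax.xml.transform classes. This is a minimal illustration, not a reference implementation: the class name IdentityCopy and its two helper methods are hypothetical, and it assumes the processor returned by TransformerFactory.newInstance() supports stream and DOM inputs and outputs.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Node;

// Hypothetical helper class, used only for illustration.
public class IdentityCopy {

    // "Parsing" as a specialized transformation: stream in, tree out.
    static Node parse(String xml) throws TransformerException {
        // newTransformer() with no arguments yields an identity (copy) Transformer.
        Transformer identity = TransformerFactory.newInstance().newTransformer();
        DOMResult tree = new DOMResult();
        identity.transform(new StreamSource(new StringReader(xml)), tree);
        return tree.getNode();
    }

    // "Serialization" as a specialized transformation: tree in, stream out.
    static String serialize(Node tree) throws TransformerException {
        Transformer identity = TransformerFactory.newInstance().newTransformer();
        identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        identity.transform(new DOMSource(tree), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        Node tree = parse("<doc><item>one</item></doc>");
        System.out.println(serialize(tree));
    }
}
```

In both directions the same identity Transformer does the work; only the Source and Result "formats" differ.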
|
|
|
The following requirements have been determined from broad
experience with XML projects from the various members
participating on the JCP.
- TrAX must provide a clean, simple interface for simple
uses.
- TrAX must be powerful enough to be applied to a wide
range of uses, such as e-commerce, content management,
server content delivery, and client applications.
- A processor that implements a TrAX interface must be
optimizable. Performance is a critical issue for most
transformation use cases.
- As a specialization of the above requirement, a TrAX
processor must be able to support a compiled model, so that
a single set of transformation instructions can be compiled,
optimized, and applied to a large set of input sources.
- TrAX must not be dependent on any given type of
transformation instructions. For instance, it must remain
independent of XSLT.
- TrAX must be able to allow processors to transform DOM
trees.
- TrAX must be able to allow processors to produce DOM
trees.
- TrAX must allow processors to transform SAX events.
- TrAX must allow processors to produce SAX events.
- TrAX must allow processors to transform streams of XML.
- TrAX must allow processors to produce XML, HTML, and
other types of streams.
- TrAX must allow processors to implement the various
combinations of inputs and outputs within a single
processor.
- TrAX must allow processors to implement only a limited
set of inputs. For instance, it should be possible to write
a processor that implements the TrAX interfaces and that
only processes DOM trees, not streams or SAX events.
- TrAX should allow a processor to implement
transformations of proprietary data structures. For
instance, it should be possible to implement a processor
that provides TrAX interfaces and performs transformations
of JDOM trees.
- TrAX must allow the setting of serialization properties,
without constraint as to what the details of those
properties are.
- TrAX must allow the setting of parameters to the
transformation instructions.
- TrAX must support the setting of parameters and
properties as XML Namespaced items (i.e., qualified names).
- TrAX must support URL resolution from within the
transformation, and have it return the needed data
structure.
- TrAX must have a mechanism for reporting errors and
warnings to the calling application.
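Several of these requirements (parameters, serialization properties, and error reporting) can be illustrated together. The sketch below is only one possible arrangement: it assumes XSLT as the transformation instructions, which TrAX does not require, and the class ConfiguredTransform, the stylesheet text, and the parameter name greeting are all made up for the example.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import javax.xml.transform.ErrorListener;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical example class; XSLT is assumed as the instruction format.
public class ConfiguredTransform {

    // Illustrative stylesheet that reads one parameter and one input element.
    static final String INSTRUCTIONS =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:param name='greeting'/>"
      + "<xsl:template match='/'>"
      + "<out><xsl:value-of select=\"concat($greeting, ', ', /name)\"/></out>"
      + "</xsl:template></xsl:stylesheet>";

    // Records warnings for the caller and rethrows errors.
    static final class CollectingListener implements ErrorListener {
        final List<String> warnings = new ArrayList<>();

        @Override
        public void warning(TransformerException e) {
            warnings.add(e.getMessageAndLocation());
        }

        @Override
        public void error(TransformerException e) throws TransformerException {
            throw e;
        }

        @Override
        public void fatalError(TransformerException e) throws TransformerException {
            throw e;
        }
    }

    static String greet(String greeting, String inputXml) throws TransformerException {
        Transformer transformer = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(INSTRUCTIONS)));
        transformer.setErrorListener(new CollectingListener());                 // error reporting
        transformer.setParameter("greeting", greeting);                         // parameter
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");  // serialization property
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(inputXml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        System.out.println(greet("Hello", "<name>World</name>"));
    }
}
```

URL resolution would be configured in a similar way, through Transformer.setURIResolver; it is left out here to keep the sketch short.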
|
|
|
This section defines the abstract model for TrAX, apart from
the details of the interfaces.
A TrAX TransformerFactory
is an object that processes transformation instructions, and
produces Templates
(in the technical terminology). A Templates
object provides a Transformer,
which transforms one or more Sources
into one or more Results.
To use the TrAX interface, you create a TransformerFactory,
which may directly provide a Transformer,
or which can provide Templates
from a variety of Sources.
The Templates
object is a processed or compiled representation of the
transformation instructions, and provides a Transformer.
The Transformer
processes a Source
according to the instructions found in the Templates,
and produces a Result.
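The flow from TransformerFactory to Templates to Transformer can be sketched as follows. This is an illustration under stated assumptions rather than a prescribed usage: the class ModelSketch and the stylesheet are invented for the example, and XSLT is assumed as the instruction format.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Templates;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical example class walking through the abstract model.
public class ModelSketch {

    // Illustrative instructions: report how many <item> elements the input has.
    static final String INSTRUCTIONS =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output omit-xml-declaration='yes'/>"
      + "<xsl:template match='/'>"
      + "<count><xsl:value-of select='count(//item)'/></count>"
      + "</xsl:template></xsl:stylesheet>";

    // Step 1: the factory processes the instructions into a Templates object.
    static Templates compile() throws TransformerConfigurationException {
        TransformerFactory factory = TransformerFactory.newInstance();
        return factory.newTemplates(new StreamSource(new StringReader(INSTRUCTIONS)));
    }

    // Steps 2 and 3: the Templates object provides a Transformer,
    // which processes a Source into a Result.
    static String apply(Templates templates, String xml) throws TransformerException {
        Transformer transformer = templates.newTransformer();
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        Templates templates = compile();
        // The same compiled instructions are applied to two different sources.
        System.out.println(apply(templates, "<list><item/><item/></list>"));
        System.out.println(apply(templates, "<list><item/></list>"));
    }
}
```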
The process of transformation from a tree, either in the
form of an object model, or in the form of parse events, into
a stream, is known as serialization. We believe this is
the most suitable term for this process, despite the overlap
with Java object
serialization. |
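Serialization from parse events, as opposed to an object model, can be sketched with a TransformerHandler from javax.xml.transform.sax. The example is a rough illustration: the class SaxSerialization and the events it fires are invented, and it assumes the installed processor reports support for SAXTransformerFactory.FEATURE.

```java
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.helpers.AttributesImpl;

// Hypothetical example class: parse events in, character stream out.
public class SaxSerialization {

    static String serializeEvents() throws Exception {
        TransformerFactory factory = TransformerFactory.newInstance();
        // Not every processor has to accept SAX events, so check before casting.
        if (!factory.getFeature(SAXTransformerFactory.FEATURE)) {
            throw new IllegalStateException("processor does not support SAX input");
        }
        // With no arguments, the handler performs an identity transformation.
        TransformerHandler handler =
            ((SAXTransformerFactory) factory).newTransformerHandler();
        handler.getTransformer().setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");

        StringWriter out = new StringWriter();
        handler.setResult(new StreamResult(out));

        // Fire a small, well-balanced series of parse events by hand.
        char[] text = "hello".toCharArray();
        handler.startDocument();
        handler.startElement("", "greeting", "greeting", new AttributesImpl());
        handler.characters(text, 0, text.length);
        handler.endElement("", "greeting", "greeting");
        handler.endDocument();
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(serializeEvents());
    }
}
```

This form of serialization is unrelated to java.io.Serializable; the output is marked-up text, not a Java object stream.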
|
|
The intent, responsibilities, and thread safety of TrAX
objects:
|
|
Processor
- Intent
- Generic concept for the set of objects that
implement the TrAX interfaces.
- Responsibilities
- Create compiled transformation instructions,
transform sources, and manage transformation
parameters and properties.
- Thread safety
- Only the Templates object can be used concurrently
in multiple threads. The rest of the processor does
not synchronize internally, and so must not be used
to perform multiple concurrent operations.
|
|
|
TransformerFactory
- Intent
- Serve as a vendor-neutral Processor interface for
XSLT and
similar processors.
- Responsibilities
- Serve as a factory for a concrete implementation
of a TransformerFactory, serve as a direct factory
for Transformer objects, serve as a factory for
Templates objects, and manage processor-specific
features.
- Thread safety
- A TransformerFactory may not perform multiple
concurrent operations.
|
|
|
Templates
- Intent
- The runtime representation of the transformation
instructions.
- Responsibilities
- A data bag for transformation instructions; act as
a factory for Transformers.
- Thread safety
- Threadsafe for concurrent usage over multiple
threads once construction is complete.
|
|
|
Transformer
- Intent
- Act as a per-thread execution context for
transformations, and act as an interface for performing
the transformation.
- Responsibilities
- Perform the transformation.
- Thread safety
- Not threadsafe; each Transformer instance may be used by only one thread at a time.
 |
The Transformer is bound
to the Templates object that created
it. | |
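The thread-safety rules for Templates and Transformer suggest a common arrangement: compile the instructions once, share the Templates object, and give each task its own Transformer. The sketch below assumes that arrangement; the class SharedTemplates, the pool size, and the stylesheet are illustrative choices, and XSLT is assumed as the instruction format.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import javax.xml.transform.Templates;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical example class: one shared Templates, one Transformer per task.
public class SharedTemplates {

    // Illustrative instructions: report the length of the text in <word>.
    static final String INSTRUCTIONS =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output omit-xml-declaration='yes'/>"
      + "<xsl:template match='/'>"
      + "<len><xsl:value-of select='string-length(/word)'/></len>"
      + "</xsl:template></xsl:stylesheet>";

    static List<String> transformAll(List<String> inputs) throws Exception {
        // Built once on the calling thread; safe to share after construction.
        Templates templates = TransformerFactory.newInstance()
            .newTemplates(new StreamSource(new StringReader(INSTRUCTIONS)));

        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Future<String>> pending = new ArrayList<>();
            for (String xml : inputs) {
                pending.add(pool.submit(() -> {
                    // Each task creates and uses its own Transformer.
                    Transformer transformer = templates.newTransformer();
                    StringWriter out = new StringWriter();
                    transformer.transform(
                        new StreamSource(new StringReader(xml)), new StreamResult(out));
                    return out.toString();
                }));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> future : pending) {
                results.add(future.get()); // keeps the input order
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(transformAll(
            Arrays.asList("<word>abc</word>", "<word>hello</word>")));
    }
}
```

The TransformerFactory is used from a single thread only, in line with the note that it may not perform multiple concurrent operations.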
|
|
Source
- Intent
- Serve as a single vendor-neutral object for
multiple types of input.
- Responsibilities
- Act as simple data holder for System IDs, DOM
nodes, streams, etc.
- Thread safety
- Threadsafe for concurrent read-only operations
across multiple threads; edit operations must be
synchronized.
|
|
|
Result
Alternative name: ResultTarget.
- Intent
- Serve as a single object for multiple types of
output, so that the transformation methods can have
simple signatures.
- Responsibilities
- Act as simple data holder for output stream, DOM
node, ContentHandler, etc.
- Thread safety
- Threadsafe for concurrent read-only operations across multiple
threads; edit operations must be synchronized.
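Because Source and Result are plain data holders, one method signature can cover every input and output combination. The following sketch shows that idea with stream, DOM, and SAX holders; the class SourceResultHolders and its helper methods are hypothetical, and the processor is assumed to support all three holder types.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.transform.Result;
import javax.xml.transform.Source;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.sax.SAXResult;
import javax.xml.transform.stream.StreamSource;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example class: one signature, several kinds of holder.
public class SourceResultHolders {

    // Accepts any Source and any Result; the holders carry the actual data.
    static void copy(Source source, Result result) throws TransformerException {
        TransformerFactory.newInstance().newTransformer().transform(source, result);
    }

    static List<String> elementNames(String xml) throws TransformerException {
        // Stream holder in, DOM holder out.
        DOMResult tree = new DOMResult();
        copy(new StreamSource(new StringReader(xml)), tree);

        // DOM holder in, SAX holder (wrapping a ContentHandler) out.
        List<String> names = new ArrayList<>();
        DefaultHandler collector = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attributes) {
                names.add(qName);
            }
        };
        copy(new DOMSource(tree.getNode()), new SAXResult(collector));
        return names;
    }

    public static void main(String[] args) throws TransformerException {
        System.out.println(elementNames("<doc><item/><item/></doc>"));
    }
}
```

Each holder is created, filled, and read on a single thread here, which sidesteps the need to synchronize edit operations.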
|
|
| |