XSLT Transformations
Executive Summary

This chapter concludes our exploration of XML technologies and their application to data integration with a discussion of XSLT, a very rich and complex tool. We cover only its salient characteristics: XSLT uses XPath constructs, and the transformations themselves are intimately connected to the models (XSDs) of the data to be transformed. Using a handful of XSLT instructions we learn how to normalize data so that the resulting data sets can be readily imported into a database. We also see how new data can be built from the source data and, lastly, how to process multiple records using the same template.

Introduction

In the preceding chapters we learned not only to build well-formed XML documents and to specify their structure for validation via XSDs but, more importantly, to 'model' the source data using this powerful technology. As we also saw there, however, the same data can be modeled by different people in vastly different ways. Consequently, absent some agreement among the data providers and users, we must be prepared to recast one XML vocabulary into another. Fortunately, there is another XML technology that provides exactly that capability: Extensible Stylesheet Language Transformations, or XSLT for short.

A Simple Scenario
The best way to begin learning about XSLT is to look at a simple case. Let's assume that Healthcare Provider A already has modeled its data and has a schema for all its XML documents as shown in the listing below.

<?xml version="1.0" encoding="UTF-8"?>

Documents built according to this schema would look like this:

<?xml version="1.0"?>

Let's suppose that the data model chosen by Healthcare Provider B for its data is as follows:

<?xml version="1.0" encoding="UTF-8"?>

XML documents built according to this schema would look like this:

<?xml version="1.0"?>

From One Schema to Another
If Healthcare Provider B wants to use the data collected by Healthcare Provider A, we need a way to process A's data so that the resulting XML document conforms to the schema B has already adopted. Looking at both schemas we can see substantial areas of overlap. For example, it is fair to assume that the value assigned to physician in the first schema is the same as the one covered by Clinician in the target schema. We also see that both schemas have the concepts of 'patient', 'diagnosis', and 'date'. The complete mapping showing the specifics of each XSD would look like this:

What is needed now is a way to extract the appropriate pieces of data from the source document and put them into the right container of the target document. XSLT accomplishes this by specifying, via XPath expressions, how to fetch the content from the source document, and then providing the XML tags that will wrap that content. For the first row in the table above we could express this in words as follows:
We are now almost ready to begin writing our first XSLT. All we need is to learn which XPath expressions accomplish the steps delineated above. But before we do that we need to understand one more concept.

As we learned at the very beginning of the course, XML documents are essentially equivalent to what is known in graph theory as a 'tree'. Trees are made of nodes and connecting lines. In an XML document there are no explicit lines connecting the tags (i.e., the element nodes); instead we use nesting. If a tag is inside another tag, that is equivalent to connecting the parent tag to the child tag. In addition to element nodes (i.e., the XML tags), XML also has attribute nodes, text nodes, comment nodes, processing instruction nodes and namespace nodes. Every XML document maps to a tree structure made of the kinds of nodes just listed.

The reason for spending some time on this is that in order to use XPath effectively we need to understand the concept of the path operator. If you have ever used the command line interface in MS-DOS or in Unix you already know what a path looks like. In XPath, it is a string separated by forward slashes where each chunk between the slashes corresponds to a node of our XML tree. The XPath convention is that if a node is an attribute node it must be prefixed by the symbol '@'. In XSLT the path is supplied as the value of a select attribute. Once the select expression has been correctly built, we invoke the XSLT instruction that will act on the node it specifies.

Most of the transformations we are interested in need only two XSLT instructions. The first one is xsl:value-of which, as its name suggests, emits the string value of the node identified by the select expression. The anatomy of the XSL Transformation we just described is then:

<xsl:value-of select="/path/to/node"/>

Thus, for example, the XSL Transformation for fetching the name of the physician would be:

<xsl:value-of select="cases/case/physician/@name"/>

Our First XSLT – Part 1
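To make the path notation from the previous section concrete before assembling a full stylesheet, here is a minimal sketch of a source fragment annotated with the path that reaches each node. Only the path cases/case/physician/@name comes from the text; the attribute value and the overall shape of the fragment are assumptions for illustration, not Healthcare Provider A's actual document.

```xml
<?xml version="1.0"?>
<!-- Hypothetical source fragment: only the path cases/case/physician/@name is taken from the text -->
<cases>                              <!-- element node: /cases -->
  <case>                             <!-- element node: /cases/case -->
    <physician name="Dr. Example"/>  <!-- element node: /cases/case/physician; attribute node: /cases/case/physician/@name -->
  </case>
</cases>
```

Under these assumptions, evaluating <xsl:value-of select="cases/case/physician/@name"/> with the document root as the context node would emit the string Dr. Example.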
To write our first XSLT we need to make sure the processing application (e.g., XMLSpy) can properly identify it as such. To that effect, just as with XSDs, one places the stylesheet root element at the beginning:

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

Next we put

<xsl:template match="/">

to indicate that this is a template that matches the root of the source document and therefore applies to the whole document.
After that we put the tag or tags that will show up in the resulting
document, and between the tags we put the appropriate XSLT expressions.
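
The layout described so far can be sketched as a complete, minimal stylesheet. This is an illustration under stated assumptions rather than the chapter's own listing: CaseReports and Clinician are names used in this chapter for the target vocabulary, while CaseReport is a hypothetical name standing in for the record delimiter tag.

```xml
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <!-- This template matches the document root, so it runs once for the whole source document -->
  <xsl:template match="/">
    <!-- Literal tags are copied to the output as written -->
    <CaseReports>
      <!-- CaseReport is a hypothetical record delimiter name (assumption) -->
      <CaseReport>
        <Clinician>
          <!-- The XSLT expression between the tags supplies the content -->
          <xsl:value-of select="cases/case/physician/@name"/>
        </Clinician>
      </CaseReport>
    </CaseReports>
  </xsl:template>
</xsl:stylesheet>
```

The literal tags decide the shape of the result document, and each xsl:value-of fills one of them with content fetched from the source.
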
For our first example we are only going to output an XML document consisting of the root tag, the record delimiter tag and one tag, namely, <Clinician>. Therefore, our first XSLT would have as the next lines:

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0"> </CaseReports>

To carry out the transformation it is necessary to add a processing instruction in the source XML document that links it to the XSLT that we just created. This is done using the following processing instruction:

<?xml-stylesheet type="text/xsl" href="C:\Data\04_Word\HSCI720\A2B.xsl"?>

The modified XML document now looks like this:

<?xml version="1.0"?>

Executing the transformation produces the following output:

<?xml version="1.0" encoding="UTF-8"?>

We can now add the path to the schema used by Healthcare Provider B to validate the document.

<?xml version="1.0" encoding="UTF-8"?>

Using XMLSpy one can test that the file is in fact valid.

Alternative XSLT
The preceding example was created using the commercial application XMLSpy. There are, however, freeware applications that accomplish the same task. A very popular one is XALAN, which can be downloaded from http://xml.apache.org/xalan. An even simpler XSLT engine is the one offered by Microsoft. The executable can be placed in any directory of your choice and invoked by typing msxsl at the prompt:

C:\Data\04_Word\HSCI720>msxsl
Microsoft (R) XSLT Processor Version 4.0
-? Show this message

For our example we can type:

C:\Data\04_Word\HSCI720>msxsl XSLT_01.xml A2B.xsl -o A2B_2.xml

where XSLT_01.xml is the file we want to transform, A2B.xsl is the XSLT we are going to use, and -o A2B_2.xml indicates the name of the output file. After running the application the resulting file looks like this:

<?xml version="1.0" encoding="UTF-16"?>

Our First XSLT – Part 2
Class exercise: Complete the XSLT for all the remaining nodes in the source XML and test the transformation. Use Microsoft's msxsl.exe for the transformation, and then validate the file using XMLSpy.

Repeat Elements
The previous sections have shown the basic concepts of the XSL Transformation, namely, the processing instruction for stylesheets, the select expression and path operator, the concept of a template using something like <xsl:template match="/">, and the transformation instruction xsl:value-of. We have also tested two engines that can carry out XSL transformations.

In real life, though, the XML sources will not consist of just one record. There would be little point in writing and testing a whole transformation script to process a single instance. The real power lies in being able to process thousands of records from a source and have them recast in an XML vocabulary that conforms to the target XSD. As we alluded to earlier, the second most used XSLT instruction is the one that lets us process multiple records in a single pass. This is the xsl:for-each instruction. Its specification is as follows:

<xsl:for-each select = node-set-expression>

When xsl:for-each is invoked, it instantiates the template it contains once for each node in the node set returned by the select expression. The order of evaluation can be influenced using one or more xsl:sort elements. With this in mind let's look at an XML source data example from Healthcare Provider A containing more than one record:

<cases>

To transform it, all we need is to let the transformation engine know that there are multiple instances of the tag <case> and that all of them need to be processed in the same way. A possible solution is shown below:

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

The msxsl transformation engine can be invoked using a command like this:

C:\Data\04_Word\HSCI720>msxsl XSLT_01B.xml A2B.xsl -o A2B_4.xml

The resulting file is shown below:

<?xml version="1.0" encoding="UTF-16"?>

Transforming a Flat Table
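Since the rest of this chapter builds on processing several records in one pass, the xsl:for-each pattern can be sketched as follows. This is a minimal illustration under assumptions, not the chapter's own listing: the paths cases/case and physician/@name and the tags CaseReports and Clinician come from the text, CaseReport is a hypothetical record delimiter name, and the xsl:sort line is optional.

```xml
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:template match="/">
    <CaseReports>
      <!-- The body of xsl:for-each is instantiated once per case element.
           The optional xsl:sort orders the records by physician name. -->
      <xsl:for-each select="cases/case">
        <xsl:sort select="physician/@name"/>
        <!-- CaseReport is a hypothetical record delimiter name (assumption) -->
        <CaseReport>
          <Clinician>
            <!-- Inside the loop, paths are relative to the current case element -->
            <xsl:value-of select="physician/@name"/>
          </Clinician>
        </CaseReport>
      </xsl:for-each>
    </CaseReports>
  </xsl:template>
</xsl:stylesheet>
```

Note that the select expressions inside the loop no longer start at cases/case, because each iteration makes one case element the current node.
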
The sections above have shown how we can extract data from one XML source and recast it into a new XML vocabulary. XSLT also allows us to insert new tags wherever we desire. We can use this capability to transform a flat table into a series of tables, i.e., to normalize the data as required. The listing below shows a single record from a notional CDC report on an anthrax-related incident.

<?xml version="1.0"?>

As we can see, the structure of the record is similar to that of a row in a flat table. One could begin to normalize the data by breaking it into its logical components. For example, the data could be recast to look like the listing below:

<?xml version="1.0" encoding="UTF-8"?>

To accomplish this we can use an XSLT like the one listed below:

<?xml version="1.0"?>

If you paid attention to the discussion on normalization you are probably wondering what good it would do us to break the original record into three distinct sections if there is no way to relink them. There is a way to solve this. Remember that in a transformation we can insert new tags as required. This means that we could add, for example, an <ID> tag to each of the segments of the resulting record. Since the original record gives no indication of an ID, we could simply assign the record count as the new ID. In other words, all segments from the first record will have ID = 1, those from the second will have ID = 2, etc. For example, take a source XML document containing two records as shown in the listing below:

<?xml version="1.0"?>

We would like the transformed XML to look like the listing below, because with such an XML instance document it would be easy to load the data into a database and then create the foreign key constraints necessary to rebuild the original data:

<?xml version="1.0" encoding="UTF-8"?>

The XSLT instruction that allows us to do this is xsl:number. Its specification is as follows:

<xsl:number

Its effect is to emit a number based on the XPath number expression found in value. All that is required now is to add the new tag, namely <ID>, to each segment of the original XSLT and to assign the proper value using xsl:number. The listing would look like this:

<?xml version="1.0"?>

What Do You Know?
(1) Modify the previous XSLT so that there is also an index tag <IDX> in the description section. This will permit entering additional description information related to the same incident, for example updates on the severity and the number of individuals affected. See it done (SWF file).

(2) The example below shows the use of the concat() function. As shown, an XML document where the name of the person is broken into three pieces can be recast in the form of a single concatenated string.

<?xml version="1.0"?>

The XSLT to accomplish this is shown below:

<?xml version="1.0"?>

Use the same technique to transform the input file shown below:

<?xml version="1.0"?>

The output file should contain an incident section made up of the <ID> and a name for the incident made up of the concatenation of the city and the year values (see listing below):

<?xml version="1.0" encoding="UTF-8"?>

Submit your work by email to your instructor.

Appendix: Listing of XSLT Instructions
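
As a compact recap of the instructions used in this chapter, the sketch below combines xsl:for-each, xsl:number, xsl:value-of and the XPath concat() function in one stylesheet. It is an illustration only: every element name in it (persons, person, first, middle, last, people, fullname) is hypothetical and chosen for the example, and only the <ID> tag follows the naming used in the chapter.

```xml
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:template match="/">
    <!-- people, person and fullname are hypothetical output tags (assumption) -->
    <people>
      <!-- One output record per source person element -->
      <xsl:for-each select="persons/person">
        <person>
          <!-- position() is 1 for the first record, 2 for the second, and so on -->
          <ID><xsl:number value="position()"/></ID>
          <fullname>
            <!-- concat() joins the three name pieces, separated by single spaces -->
            <xsl:value-of select="concat(first, ' ', middle, ' ', last)"/>
          </fullname>
        </person>
      </xsl:for-each>
    </people>
  </xsl:template>
</xsl:stylesheet>
```

Here xsl:number is given an explicit value expression, so the emitted ID is simply the running count of the record being processed.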