George Mason University
Process Improvement
 

 

XSLT Transformations

 

Executive Summary

This chapter concludes our exploration of XML technologies and their application to data integration with a discussion of XSLT, a rich and complex tool.  We cover only its salient characteristics: XSLT uses XPath constructs, and the formulation of a transformation is intimately connected to the models (XSD's) of the data being transformed.  Using a handful of XSLT instructions we learn how to normalize data so that the resulting data sets can be readily imported into a database.  We also see how new data can be built from the source data and, lastly, how to process multiple records using the same template.

Introduction

In the preceding chapters, we have learned not only to build well-formed XML documents and to specify their structure for validation via XSD's but, more importantly, to 'model' the source data using this powerful technology.  However, as we also saw there, the same data can be modeled by different people in vastly different ways.  The consequence is that, absent some agreement among the data providers and users, we must be prepared to recast one XML vocabulary into another.  Fortunately, there is already another XML technology that provides exactly that capability.  Welcome to Extensible Stylesheet Language Transformations, or XSLT for short.

A Simple Scenario

The best way to begin learning about XSLT is to look at a simple case.  Let's assume that Healthcare Provider A has already modeled its data and has a schema for all its XML documents, as shown in the listing below.

<?xml version="1.0" encoding="UTF-8"?>
<!-- W3C Schema generated by XMLSPY v2004 rel. 3 U (http://www.xmlspy.com) -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified">
    <xs:element name="cases">
        <xs:complexType>
            <xs:sequence>
                <xs:element ref="case" maxOccurs="unbounded"/>
            </xs:sequence>
        </xs:complexType>
    </xs:element>
    <xs:element name="case">
        <xs:complexType>
            <xs:sequence>
                <xs:element ref="physician"/>
                <xs:element ref="diagnosis"/>
                <xs:element ref="visit"/>
            </xs:sequence>
            <xs:attribute name="patient" type="xs:string" use="required"/>
        </xs:complexType>
    </xs:element>
    <xs:element name="diagnosis">
        <xs:complexType>
            <xs:attribute name="code" type="xs:string" use="required"/>
        </xs:complexType>
    </xs:element>
    <xs:element name="physician">
        <xs:complexType>
            <xs:attribute name="name" type="xs:string" use="required"/>
        </xs:complexType>
    </xs:element>
    <xs:element name="visit">
        <xs:complexType>
            <xs:attribute name="date" type="xs:date" use="required"/>
        </xs:complexType>
    </xs:element>
</xs:schema>

Documents built according to this schema would look like this:

<?xml version="1.0"?>
<cases>
    <case patient="John Doe">
        <physician name="Peter Alonzo"/>
        <diagnosis code="heart arrhythmia"/>
        <visit date="2004-03-22"/>
    </case>
</cases>

Let's suppose that the data model chosen by Healthcare Provider B for its data is as follows:

<?xml version="1.0" encoding="UTF-8"?>
<!-- edited with XMLSPY v2004 rel. 3 U (http://www.xmlspy.com) by DR. FRANCISCO LOAIZA (IDA) -->
<!-- W3C Schema generated by XMLSPY v2004 rel. 3 U (http://www.xmlspy.com) -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified">
    <xs:element name="Case">
        <xs:complexType>
            <xs:all>
                <xs:element ref="Patient" minOccurs="0"/>
                <xs:element ref="Clinician" minOccurs="0"/>
                <xs:element ref="Date" minOccurs="0"/>
                <xs:element ref="Diagnosis" minOccurs="0"/>
                <xs:element ref="Treatment" minOccurs="0"/>
            </xs:all>
        </xs:complexType>
    </xs:element>
    <xs:element name="CaseReports">
        <xs:complexType>
            <xs:sequence>
                <xs:element ref="Case"/>
            </xs:sequence>
        </xs:complexType>
    </xs:element>
    <xs:element name="Clinician" type="xs:string"/>
    <xs:element name="Date" type="xs:date"/>
    <xs:element name="Diagnosis" type="xs:string"/>
    <xs:element name="Patient" type="xs:string"/>
    <xs:element name="Treatment" type="xs:string"/>
</xs:schema>

XML documents built according to this schema would look like this:

<?xml version="1.0"?>
<CaseReports>
    <Case>
        <Patient>Jane Doe</Patient>
        <Clinician>Dr. Phillips</Clinician>
        <Date>2004-03-22</Date>
        <Diagnosis>herniated disk</Diagnosis>
        <Treatment>physiotherapy</Treatment>
    </Case>
</CaseReports>
 

From One Schema to Another

If Healthcare Provider B wants to use the data collected by Healthcare Provider A, we need to figure out a way to process A's data so that the resulting XML document conforms to the schema that B has already adopted.

As we look at both schemas we can see that there are substantial areas of overlap.  For example, it is fair to assume that the value assigned to physician in the first schema is the same as the one covered by Clinician in the target schema.  We also see that both schemas have the concept of 'patient', 'diagnosis', as well as 'date'.

The complete mapping showing the specifics of each XSD would look like this:

Schema A                                        Schema B
Concept     XSD                                 Concept      XSD
-------     ------------------------------      ---------    -------
name        Attribute of element physician      Clinician    Element
code        Attribute of element diagnosis      Diagnosis    Element
patient     Attribute of element case           Patient      Element
date        Attribute of element visit          Date         Element
N/A         N/A                                 Treatment    Element

What is needed now is a way to extract the appropriate pieces of data from the source document and put them into the right container of the target document.  The way XSLT accomplishes this is by specifying via XPath expressions how to fetch the content from the source document and then providing the XML tags that will go with that content.

For the first row in the table above we could express this in words as follows:

  1. Traverse the source document and find the attribute 'name' in the element 'physician',

  2. Fetch the value it currently has,

  3. Make that value the content of the element 'Clinician'.

We are now almost ready to begin writing our first XSLT.  All we need now is to learn what the proper XPath expressions are that will accomplish the steps delineated above.  But before we do that we need to understand one more concept.  As we learned in the very beginning of the course, XML documents are essentially equivalent to what is known in graph theory as a 'tree'.  Trees are made of connecting lines and nodes.  In an XML document there are no explicit lines connecting the tags (i.e., the element nodes), instead we use nesting.  If a tag is inside another tag that's equivalent to connecting the parent tag to the child tag.  In addition to the element nodes (i.e., the XML tags) we also have in XML attribute nodes, text nodes, comment nodes, processing instruction nodes and namespace nodes.  Every XML document is made of, or more precisely, maps to a tree structure made of the kinds of nodes just listed.

The reason for spending some time on this is that, in order to use XPath effectively, we need to understand the concept of the path operator.  If you have ever used the command line interface in MS-DOS or in Unix you already know what a path looks like.  In XPath it is a string of steps separated by forward slashes, where each step between the slashes corresponds to a node of our XML tree.  The XPath convention is that an attribute node must be prefixed by the symbol '@'.  In XSLT the path is supplied as the value of a select attribute.  Once the select expression has been correctly built, we invoke the appropriate XSLT instruction to act on the node or nodes that the path identifies.

Most of the transformations we are interested in here need only two XSLT instructions.  The first one is xsl:value-of, which, as its name suggests, emits the string value of the node identified by the select expression.  The anatomy of the instruction we just described is then:

<xsl:value-of select="/path/to/node"/>

Thus, for example, the XSL Transformation for fetching the name of the physician would be:

<xsl:value-of select="cases/case/physician/@name"/>

Our First XSLT – Part 1

To write our first XSLT we need to make sure the processing application (e.g., XMLSpy) can properly identify it as such.  To that effect, just as with XSD's one places the stylesheet root element at the beginning:

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

Next we put <xsl:template match="/"> to indicate that this is a template that applies to the whole document.  After that we put the tag or tags that will show up in the resulting document, and between the tags we put the appropriate XSLT expressions.  For our first example we are only going to output an XML document consisting of the root tag, the record delimiter tag and one tag, namely, <Clinician>.  Therefore, our first XSLT would have as the next lines: 

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
    <xsl:template match="/">
        <CaseReports>
            <Case>
                <Clinician><xsl:value-of select="cases/case/physician/@name"/></Clinician>
            </Case>
        </CaseReports>
    </xsl:template>
</xsl:stylesheet>
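
As a further sketch of the path notation, the stylesheet below reads two attributes that sit at different depths of the Provider A document: patient, an attribute of the element case itself, and name, an attribute of the child element physician.  The text output method and the label strings are choices made here for illustration only; they are not part of the A2B.xsl stylesheet built in this chapter.

```xml
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
    <!-- Plain-text output keeps the focus on the values the paths return -->
    <xsl:output method="text"/>
    <xsl:template match="/">
        <!-- '@patient' is an attribute of <case>, one level below the root element -->
        <xsl:text>patient: </xsl:text>
        <xsl:value-of select="/cases/case/@patient"/>
        <xsl:text>&#10;</xsl:text>
        <!-- '@name' is an attribute of <physician>, one level deeper -->
        <xsl:text>physician: </xsl:text>
        <xsl:value-of select="/cases/case/physician/@name"/>
        <xsl:text>&#10;</xsl:text>
    </xsl:template>
</xsl:stylesheet>
```

Run against the single-record Provider A document, this should print the patient's name on one line and the physician's name on the next.  The leading slash makes both paths absolute, i.e., anchored at the document root rather than at the current node.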

To carry out the transformation it is necessary to add a processing instruction in the source XML document that links it to the XSLT that we just created.  This is done using the following processing instruction:

<?xml-stylesheet type="text/xsl" href="C:\Data\04_Word\HSCI720\A2B.xsl"?>

The modified XML document now looks like this:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="C:\Data\04_Word\HSCI720\A2B.xsl"?>
<cases>
    <case patient="John Doe">
        <physician name="Peter Alonzo"/>
        <diagnosis code="heart arrhythmia"/>
        <visit date="2004-03-22"/>
    </case>
</cases>

Executing the transformation produces the following output:

<?xml version="1.0" encoding="UTF-8"?>
<CaseReports>
    <Case>
        <Clinician>Peter Alonzo</Clinician>
    </Case>
</CaseReports>

We can now add the path to the schema used by Healthcare Provider B to validate the document.

<?xml version="1.0" encoding="UTF-8"?>
<CaseReports xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="C:\Data\04_Word\HSCI720\XSL_02.xsd">
    <Case>
        <Clinician>Peter Alonzo</Clinician>
    </Case>
</CaseReports>

Using XMLSpy one can test that the file is in fact valid.

Alternative XSLT

The preceding example was created using the commercial application XMLSpy.  There are, however, freeware applications that accomplish the same task.  A very popular one is XALAN, which can be downloaded from http://xml.apache.org/xalan.

An even simpler XSLT engine is the command-line processor offered by Microsoft.  The executable can be placed in any directory of your choice and invoked by typing msxsl at the prompt.  Typed with no arguments, it prints its usage summary:

C:\Data\04_Word\HSCI720>msxsl

Microsoft (R) XSLT Processor Version 4.0
Usage: MSXSL source stylesheet [options] [param=value...] [xmlns:prefix=uri...]
Options:
    -?            Show this message
    -o filename   Write output to named file
    -m startMode  Start the transform in this mode
    -xw           Strip non-significant whitespace from source and stylesheet
    -xe           Do not resolve external definitions during parse phase
    -v            Validate documents during parse phase
    -t            Show load and transformation timings
    -pi           Get stylesheet URL from xml-stylesheet PI in source document
    -u version    Use a specific version of MSXML: '2.6', '3.0', '4.0'
    -             Dash used as source argument loads XML from stdin
    -             Dash used as stylesheet argument loads XSL from stdin

For our example we can type:

C:\Data\04_Word\HSCI720>msxsl XSLT_01.xml A2B.xsl -o A2B_2.xml

Here XSLT_01.xml is the file we want to transform, A2B.xsl is the XSLT we are going to use, and -o A2B_2.xml gives the name of the output file.  After running the application the resulting file looks like this:

<?xml version="1.0" encoding="UTF-16"?>
<CaseReports>
    <Case>
        <Clinician>Peter Alonzo</Clinician>
    </Case>
</CaseReports>
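
Note that the two engines declared different encodings in their results (UTF-8 from XMLSpy, UTF-16 from msxsl).  If a particular encoding is wanted, XSLT 1.0 provides the xsl:output element for requesting it.  The sketch below adds such a request to our first stylesheet; the choice of UTF-8 and of indentation is an assumption made for illustration, and whether a given engine honors the request when writing a file is worth verifying on your own installation.

```xml
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
    <!-- Request XML output, serialized as UTF-8 and indented for readability -->
    <xsl:output method="xml" encoding="UTF-8" indent="yes"/>
    <xsl:template match="/">
        <CaseReports>
            <Case>
                <Clinician><xsl:value-of select="cases/case/physician/@name"/></Clinician>
            </Case>
        </CaseReports>
    </xsl:template>
</xsl:stylesheet>
```

xsl:output is a top-level element, so it must be a direct child of xsl:stylesheet and not placed inside a template.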

Our First XSLT – Part 2

Class exercise: Complete the XSLT for all the remaining nodes in the source XML and test the transformation.  Use Microsoft's msxsl.exe for the transformation, and then validate the file using XMLSpy.

Repeat Elements

The previous sections have shown the basic concepts of the XSL Transformation, namely, the use of the processing instruction for stylesheets, the select and path operator, the concept of a template using something like <xsl:template match="/">, and the transformation instruction xsl:value-of.  In addition we have also tested two engines that can accomplish XSL transformations.  In real life, though, the XML sources will not consist of just one record.  In fact there would be little point in writing and testing a whole transformation script to process one instance alone.  So the real power lies in being able to process thousands of records from a source and have them recast in an XML vocabulary that conforms to the target XSD.

As we mentioned earlier, the second most used XSLT instruction is the one that lets us process multiple records in a single pass.  This is the xsl:for-each instruction.  Its specification is as follows:

<xsl:for-each select = node-set-expression>
    <!-- Content: (xsl:sort*, template) -->
</xsl:for-each>

 

When xsl:for-each is invoked, it evaluates the template it contains once for each node in the node-set returned by the select expression.  Inside the loop each selected node becomes the current node in turn, so paths written there are relative to that node.  The order of evaluation can be influenced using one or more xsl:sort elements.

With this in mind let's look at an XML source data example from Healthcare Provider A containing more than one record:

<cases>
    <case patient="John Doe">
        <physician name="Peter Alonzo"/>
        <diagnosis code="heart arrhythmia"/>
        <visit date="2004-03-22"/>
    </case>
    <case patient="Mary Jones">
        <physician name="Hans Allers"/>
        <diagnosis code="high cholesterol"/>
        <visit date="2004-03-20"/>
    </case>
    <case patient="Sandy Mullens">
        <physician name="Emily Lang"/>
        <diagnosis code="dislocated shoulder"/>
        <visit date="2004-02-19"/>
    </case>
</cases>

To transform it, all we need is to let the transformation engine know that there are multiple instances of the tag <case> and that all of them need to be processed in the same way.  A possible solution is shown below:

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
    <xsl:template match="/">
        <CaseReports>
            <xsl:for-each select="cases/case">
                <Case>
                    <Clinician>
                        <xsl:value-of select="physician/@name"/>
                    </Clinician>
                </Case>
            </xsl:for-each>
        </CaseReports>
    </xsl:template>
</xsl:stylesheet>

The msxsl transformation engine can be invoked using a command like this:

C:\Data\04_Word\HSCI720>msxsl XSLT_01B.xml A2B.xsl -o A2B_4.xml

The resulting file is shown below:

<?xml version="1.0" encoding="UTF-16"?>
<CaseReports>
    <Case>
        <Clinician>Peter Alonzo</Clinician>
    </Case>
    <Case>
        <Clinician>Hans Allers</Clinician>
    </Case>
    <Case>
        <Clinician>Emily Lang</Clinician>
    </Case>
</CaseReports>
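
The xsl:sort element mentioned in the specification of xsl:for-each can be sketched on the same three-record source.  The variant below is an illustration only, not part of A2B.xsl: it processes the cases in order of visit date instead of document order.  Sorting the dates as plain text is adequate here only because they are written in the fixed-width form YYYY-MM-DD.

```xml
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
    <xsl:output method="xml" indent="yes"/>
    <xsl:template match="/">
        <CaseReports>
            <xsl:for-each select="cases/case">
                <!-- xsl:sort must precede any other content of xsl:for-each -->
                <xsl:sort select="visit/@date" order="ascending"/>
                <Case>
                    <Clinician>
                        <!-- relative path: evaluated against the current <case> -->
                        <xsl:value-of select="physician/@name"/>
                    </Clinician>
                </Case>
            </xsl:for-each>
        </CaseReports>
    </xsl:template>
</xsl:stylesheet>
```

With this stylesheet the clinicians should appear in the order Emily Lang, Hans Allers, Peter Alonzo, following the visit dates 2004-02-19, 2004-03-20 and 2004-03-22.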

Transforming a Flat Table

The sections above have shown how we can extract data from one XML source and recast it into a new XML vocabulary.  XSLT also allows us to insert new tags wherever we desire.  We can use this capability to transform a flat table into a series of tables, i.e., to normalize the data as required.

The listing below shows a single record from a notional CDC report on an anthrax related incident. 

<?xml version="1.0"?>
<CDC_REPORT>
    <INCIDENT>
        <CITY>New York City</CITY>
        <STATE>New York</STATE>
        <SEVERITY>High</SEVERITY>
        <PEOPLE_QY>139</PEOPLE_QY>
        <CONTAMINANT>Anthrax</CONTAMINANT>
        <FATALITIES>23</FATALITIES>
        <FEMALE_ADULT>87</FEMALE_ADULT>
        <MALE_ADULT>32</MALE_ADULT>
        <CHILDREN>20</CHILDREN>
        <TREATMENT>ANTIBIOTICS</TREATMENT>
        <YEAR>2001</YEAR>
    </INCIDENT>
</CDC_REPORT>

As we can see, the structure of the record is similar to that of a row in a flat table.  One could begin to normalize the data by breaking it into its logical components.  For example, the data could be recast to look like the listing below:

<?xml version="1.0" encoding="UTF-8"?>
<CDC_REPORT>
    <INCIDENT>
        <LOCATION>
            <CITY>New York City</CITY>
            <STATE>New York</STATE>
        </LOCATION>
        <DESCRIPTION>
            <YEAR>2001</YEAR>
            <SEVERITY>High</SEVERITY>
            <PEOPLE_QY>139</PEOPLE_QY>
            <CONTAMINANT>Anthrax</CONTAMINANT>
        </DESCRIPTION>
        <STATISTICS>
            <FATALITIES>23</FATALITIES>
            <FEMALE_ADULT>87</FEMALE_ADULT>
            <MALE_ADULT>32</MALE_ADULT>
            <CHILDREN>20</CHILDREN>
        </STATISTICS>
    </INCIDENT>
</CDC_REPORT>

To accomplish this we can use an XSLT like the one listed below:

<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
    <xsl:output method="xml" indent="yes"/>
    <xsl:template match="/">
        <CDC_REPORT>
            <xsl:for-each select="CDC_REPORT/INCIDENT">
                <INCIDENT>
                    <LOCATION>
                        <CITY>
                            <xsl:value-of select="CITY"/>
                        </CITY>
                        <STATE>
                            <xsl:value-of select="STATE"/>
                        </STATE>
                    </LOCATION>
                    <DESCRIPTION>
                        <YEAR>
                            <xsl:value-of select="YEAR"/>
                        </YEAR>
                        <SEVERITY>
                            <xsl:value-of select="SEVERITY"/>
                        </SEVERITY>
                        <PEOPLE_QY>
                            <xsl:value-of select="PEOPLE_QY"/>
                        </PEOPLE_QY>
                        <CONTAMINANT>
                            <xsl:value-of select="CONTAMINANT"/>
                        </CONTAMINANT>
                    </DESCRIPTION>
                    <STATISTICS>
                        <FATALITIES>
                            <xsl:value-of select="FATALITIES"/>
                        </FATALITIES>
                        <FEMALE_ADULT>
                            <xsl:value-of select="FEMALE_ADULT"/>
                        </FEMALE_ADULT>
                        <MALE_ADULT>
                            <xsl:value-of select="MALE_ADULT"/>
                        </MALE_ADULT>
                        <CHILDREN>
                            <xsl:value-of select="CHILDREN"/>
                        </CHILDREN>
                    </STATISTICS>
                </INCIDENT>
            </xsl:for-each>
        </CDC_REPORT>
    </xsl:template>
</xsl:stylesheet>
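
When the element names stay the same, as they do here, the regrouping can be written more compactly with xsl:copy-of (listed in the appendix), which copies whole elements rather than rebuilding each tag by hand.  The stylesheet below is a sketch of that alternative under the assumption that the source records look like the CDC listing above; it is not the version used in the rest of this chapter.

```xml
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
    <xsl:output method="xml" indent="yes"/>
    <xsl:template match="/">
        <CDC_REPORT>
            <xsl:for-each select="CDC_REPORT/INCIDENT">
                <INCIDENT>
                    <LOCATION>
                        <!-- A union (|) is copied in document order: CITY, then STATE -->
                        <xsl:copy-of select="CITY | STATE"/>
                    </LOCATION>
                    <DESCRIPTION>
                        <!-- YEAR comes last in the source, so it is copied separately first -->
                        <xsl:copy-of select="YEAR"/>
                        <xsl:copy-of select="SEVERITY | PEOPLE_QY | CONTAMINANT"/>
                    </DESCRIPTION>
                    <STATISTICS>
                        <xsl:copy-of select="FATALITIES | FEMALE_ADULT | MALE_ADULT | CHILDREN"/>
                    </STATISTICS>
                </INCIDENT>
            </xsl:for-each>
        </CDC_REPORT>
    </xsl:template>
</xsl:stylesheet>
```

The trade-off is that xsl:copy-of reproduces elements exactly as they are, so it cannot rename a tag.  Whenever the target vocabulary differs from the source, as in the Provider A to Provider B example, the literal tags with xsl:value-of are still needed.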

If you paid attention to the discussion on normalization, you are probably wondering what good it does to break the original record into three distinct sections if there is no way to relink them.  There is a way to solve this.  Remember that in a transformation we can insert new tags as required.  This means that we could add, for example, an <ID> tag to each of the segments of the resulting record.  Since the original record gives no indication of an ID, we can simply assign the record count as the new ID.  In other words, all segments from the first record will have ID = 1, those from the second will have ID = 2, etc.  For example, take a source XML document containing two records as shown in the listing below:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="C:\Data\04_Word\HSCI720\Flat2nonFlat2.xsl"?>
<CDC_REPORT>
    <INCIDENT>
        <CITY>New York City</CITY>
        <STATE>New York</STATE>
        <SEVERITY>High</SEVERITY>
        <PEOPLE_QY>139</PEOPLE_QY>
        <CONTAMINANT>Anthrax</CONTAMINANT>
        <FATALITIES>23</FATALITIES>
        <FEMALE_ADULT>87</FEMALE_ADULT>
        <MALE_ADULT>32</MALE_ADULT>
        <CHILDREN>20</CHILDREN>
        <TREATMENT>ANTIBIOTICS</TREATMENT>
        <YEAR>2001</YEAR>
    </INCIDENT>
    <INCIDENT>
        <CITY>New York City</CITY>
        <STATE>New York</STATE>
        <SEVERITY>Medium</SEVERITY>
        <PEOPLE_QY>49</PEOPLE_QY>
        <CONTAMINANT>Anthrax</CONTAMINANT>
        <FATALITIES>13</FATALITIES>
        <FEMALE_ADULT>17</FEMALE_ADULT>
        <MALE_ADULT>12</MALE_ADULT>
        <CHILDREN>20</CHILDREN>
        <TREATMENT>ANTIBIOTICS</TREATMENT>
        <YEAR>2002</YEAR>
    </INCIDENT>
</CDC_REPORT>

We would like the transformed XML to look like the listing below.  With an XML instance document such as this it would be easy to load the data into a database and then create the foreign key constraints necessary to rebuild the original data:

<?xml version="1.0" encoding="UTF-8"?>
<CDC_REPORT>
    <INCIDENT>
        <LOCATION>
            <ID>1</ID>
            <CITY>New York City</CITY>
            <STATE>New York</STATE>
        </LOCATION>
        <DESCRIPTION>
            <ID>1</ID>
            <YEAR>2001</YEAR>
            <SEVERITY>High</SEVERITY>
            <PEOPLE_QY>139</PEOPLE_QY>
            <CONTAMINANT>Anthrax</CONTAMINANT>
        </DESCRIPTION>
        <STATISTICS>
            <ID>1</ID>
            <FATALITIES>23</FATALITIES>
            <FEMALE_ADULT>87</FEMALE_ADULT>
            <MALE_ADULT>32</MALE_ADULT>
            <CHILDREN>20</CHILDREN>
        </STATISTICS>
    </INCIDENT>
    <INCIDENT>
        <LOCATION>
            <ID>2</ID>
            <CITY>New York City</CITY>
            <STATE>New York</STATE>
        </LOCATION>
        <DESCRIPTION>
            <ID>2</ID>
            <YEAR>2002</YEAR>
            <SEVERITY>Medium</SEVERITY>
            <PEOPLE_QY>49</PEOPLE_QY>
            <CONTAMINANT>Anthrax</CONTAMINANT>
        </DESCRIPTION>
        <STATISTICS>
            <ID>2</ID>
            <FATALITIES>13</FATALITIES>
            <FEMALE_ADULT>17</FEMALE_ADULT>
            <MALE_ADULT>12</MALE_ADULT>
            <CHILDREN>20</CHILDREN>
        </STATISTICS>
    </INCIDENT>
</CDC_REPORT>

The XSLT instruction that allows us to do this is xsl:number.  Its specification is as follows:

<xsl:number
    level = "single" | "multiple" | "any"
    count = pattern
    from = pattern
    value = number-expression
    format = { string }
    lang = { nmtoken }
    letter-value = { "alphabetic" | "traditional" }
    grouping-separator = { char }
    grouping-size = { number } />

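When the value attribute is present, xsl:number emits a number based on the XPath number expression found in value.  When value is omitted, as in the listing below, it numbers the current node instead: with level="single" it counts the current node together with its preceding siblings that match the count pattern.  All that is required now is to add the new tag, namely <ID>, to each segment of the original XSLT and to assign the proper value using xsl:number.  The listing would look like this: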

<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
    <xsl:output method="xml" indent="yes"/>
    <xsl:template match="/">
        <CDC_REPORT>
            <xsl:for-each select="CDC_REPORT/INCIDENT">
                <INCIDENT>
                    <LOCATION>
                        <ID><xsl:number level="single" count="CDC_REPORT/INCIDENT" format="1"/></ID>
                        <CITY>
                            <xsl:value-of select="CITY"/>
                        </CITY>
                        <STATE>
                            <xsl:value-of select="STATE"/>
                        </STATE>
                    </LOCATION>
                    <DESCRIPTION>
                        <ID><xsl:number level="single" count="CDC_REPORT/INCIDENT" format="1"/></ID>
                        <YEAR>
                            <xsl:value-of select="YEAR"/>
                        </YEAR>
                        <SEVERITY>
                            <xsl:value-of select="SEVERITY"/>
                        </SEVERITY>
                        <PEOPLE_QY>
                            <xsl:value-of select="PEOPLE_QY"/>
                        </PEOPLE_QY>
                        <CONTAMINANT>
                            <xsl:value-of select="CONTAMINANT"/>
                        </CONTAMINANT>
                    </DESCRIPTION>
                    <STATISTICS>
                        <ID><xsl:number level="single" count="CDC_REPORT/INCIDENT" format="1"/></ID>
                        <FATALITIES>
                            <xsl:value-of select="FATALITIES"/>
                        </FATALITIES>
                        <FEMALE_ADULT>
                            <xsl:value-of select="FEMALE_ADULT"/>
                        </FEMALE_ADULT>
                        <MALE_ADULT>
                            <xsl:value-of select="MALE_ADULT"/>
                        </MALE_ADULT>
                        <CHILDREN>
                            <xsl:value-of select="CHILDREN"/>
                        </CHILDREN>
                    </STATISTICS>
                </INCIDENT>
            </xsl:for-each>
        </CDC_REPORT>
    </xsl:template>
</xsl:stylesheet>
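
A possible variation, sketched here as an assumption and not as the method used above, is to compute the record number once per incident and reuse it.  The XPath function position() gives the 1-based index of the current node within the node-set that xsl:for-each is looping over, and xsl:variable (listed in the appendix) stores it under a name; the variable name id is our own choice.

```xml
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
    <xsl:output method="xml" indent="yes"/>
    <xsl:template match="/">
        <CDC_REPORT>
            <xsl:for-each select="CDC_REPORT/INCIDENT">
                <!-- position() is the 1-based index of the current INCIDENT
                     within the node-set selected by xsl:for-each -->
                <xsl:variable name="id" select="position()"/>
                <INCIDENT>
                    <LOCATION>
                        <ID><xsl:value-of select="$id"/></ID>
                        <CITY><xsl:value-of select="CITY"/></CITY>
                        <STATE><xsl:value-of select="STATE"/></STATE>
                    </LOCATION>
                    <DESCRIPTION>
                        <ID><xsl:value-of select="$id"/></ID>
                        <YEAR><xsl:value-of select="YEAR"/></YEAR>
                        <SEVERITY><xsl:value-of select="SEVERITY"/></SEVERITY>
                        <PEOPLE_QY><xsl:value-of select="PEOPLE_QY"/></PEOPLE_QY>
                        <CONTAMINANT><xsl:value-of select="CONTAMINANT"/></CONTAMINANT>
                    </DESCRIPTION>
                    <STATISTICS>
                        <ID><xsl:value-of select="$id"/></ID>
                        <FATALITIES><xsl:value-of select="FATALITIES"/></FATALITIES>
                        <FEMALE_ADULT><xsl:value-of select="FEMALE_ADULT"/></FEMALE_ADULT>
                        <MALE_ADULT><xsl:value-of select="MALE_ADULT"/></MALE_ADULT>
                        <CHILDREN><xsl:value-of select="CHILDREN"/></CHILDREN>
                    </STATISTICS>
                </INCIDENT>
            </xsl:for-each>
        </CDC_REPORT>
    </xsl:template>
</xsl:stylesheet>
```

For the two-record source this yields the same IDs, 1 and 2, as the xsl:number version.  The two approaches differ once an xsl:sort is added to the loop: position() follows the sorted order, whereas xsl:number keeps counting in the order of the source document.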

What Do You Know?

(1) Modify the previous XSLT so that there is also an index tag <IDX> in the description section.  This will permit entering additional description information related to the same incident, for example updates on the severity and the number of individuals affected.

(2) The example below shows the use of the concat() function.  As shown, an XML document where the name of the person is broken into three pieces can be recast in the form of a single concatenated string.

<?xml version="1.0"?>
<Persons>
    <Person>
        <FirstName>Farrokh</FirstName>
        <MiddleInitial>M</MiddleInitial>
        <LastName>Alemi</LastName>
    </Person>
</Persons>

After the transformation:

<?xml version="1.0" encoding="UTF-8"?>
<Persons>
    <Person>
        <PersonName>Farrokh M. Alemi</PersonName>
    </Person>
</Persons>

The XSLT to accomplish this is shown below:

<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
    <xsl:output method="xml" indent="yes"/>
    <xsl:template match="/">
        <Persons>
            <Person>
                <PersonName><xsl:value-of select="concat(Persons/Person/FirstName, ' ', Persons/Person/MiddleInitial, '. ', Persons/Person/LastName)"/></PersonName>
            </Person>
        </Persons>
    </xsl:template>
</xsl:stylesheet>
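
The stylesheet above handles a single <Person>.  If the source held several, each path inside concat() would still be converted to the string value of the first matching node only, so just the first person would appear in the result.  A minimal sketch of a multi-record version, assuming a source with any number of <Person> elements under <Persons>, combines concat() with xsl:for-each and relative paths:

```xml
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
    <xsl:output method="xml" indent="yes"/>
    <xsl:template match="/">
        <Persons>
            <!-- One output <Person> per input <Person> -->
            <xsl:for-each select="Persons/Person">
                <Person>
                    <PersonName>
                        <!-- Paths are relative to the current <Person> -->
                        <xsl:value-of select="concat(FirstName, ' ', MiddleInitial, '. ', LastName)"/>
                    </PersonName>
                </Person>
            </xsl:for-each>
        </Persons>
    </xsl:template>
</xsl:stylesheet>
```

Applied to the one-person document shown earlier, it produces the same PersonName, Farrokh M. Alemi.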

Use the same technique to transform the input file shown below:

<?xml version="1.0"?>
<CDC_REPORT>
    <INCIDENT>
        <CITY>New York City</CITY>
        <STATE>New York</STATE>
        <SEVERITY>High</SEVERITY>
        <PEOPLE_QY>139</PEOPLE_QY>
        <CONTAMINANT>Anthrax</CONTAMINANT>
        <FATALITIES>23</FATALITIES>
        <FEMALE_ADULT>87</FEMALE_ADULT>
        <MALE_ADULT>32</MALE_ADULT>
        <CHILDREN>20</CHILDREN>
        <TREATMENT>ANTIBIOTICS</TREATMENT>
        <YEAR>2001</YEAR>
    </INCIDENT>
</CDC_REPORT>

The output file should contain an incident section made up of the <ID> and a name for the incident made up of the concatenation of the city and the year values (see listing below):

<?xml version="1.0" encoding="UTF-8"?>
<CDC_REPORT>
    <REPORT>
        <INCIDENT>
            <ID>1</ID>
            <NAME>New York City -- 2001</NAME>
        </INCIDENT>
        <LOCATION>
            <ID>1</ID>
            <CITY>New York City</CITY>
            <STATE>New York</STATE>
        </LOCATION>
        <DESCRIPTION>
            <ID>1</ID>
            <YEAR>2001</YEAR>
            <SEVERITY>High</SEVERITY>
            <PEOPLE_QY>139</PEOPLE_QY>
            <CONTAMINANT>Anthrax</CONTAMINANT>
        </DESCRIPTION>
        <STATISTICS>
            <ID>1</ID>
            <FATALITIES>23</FATALITIES>
            <FEMALE_ADULT>87</FEMALE_ADULT>
            <MALE_ADULT>32</MALE_ADULT>
            <CHILDREN>20</CHILDREN>
        </STATISTICS>
    </REPORT>
</CDC_REPORT>

Submit your work by email to your instructor.

Appendix—Listing of XSLT Instructions

Each entry below gives the instruction, its syntax, and a description.

xsl:copy-of

    Syntax:
        <xsl:copy-of
            select = expression />

    Description: Emits the node-set corresponding to the select expression.

xsl:value-of

    Syntax:
        <xsl:value-of
            select = string-expression
            disable-output-escaping = "yes" | "no" />

    Description: Emits the string corresponding to the select expression.

xsl:if

    Syntax:
        <xsl:if
            test = boolean-expression>
            <!-- Content: template -->
        </xsl:if>

    Description: Evaluates the template if and only if the test expression evaluates to true.

xsl:choose

    Syntax:
        <xsl:choose>
            <!-- Content: (xsl:when+, xsl:otherwise?) -->
        </xsl:choose>

    Description: Evaluates the template from the first xsl:when clause whose test expression evaluates to true. If none of the test expressions evaluate to true, then the template contained in the xsl:otherwise clause is evaluated.

xsl:for-each

    Syntax:
        <xsl:for-each
            select = node-set-expression>
            <!-- Content: (xsl:sort*, template) -->
        </xsl:for-each>

    Description: Evaluates the template against each node in node-set returned by the select expression. The order of evaluation can be influenced using one or more xsl:sorts.

xsl:call-template

    Syntax:
        <xsl:call-template
            name = qname>
            <!-- Content: xsl:with-param* -->
        </xsl:call-template>

    Description: Invokes the template rule named by name.

xsl:variable

    Syntax:
        <xsl:variable
            name = qname
            select = expression>
            <!-- Content: template -->
        </xsl:variable>

    Description: Declares a variable named name and initializes it using the select expression or template.

xsl:text

    Syntax:
        <xsl:text
            disable-output-escaping = "yes" | "no">
            <!-- Content: #PCDATA -->
        </xsl:text>

    Description: Emits the text found in #PCDATA. Escaping of the five built-in entities is controlled using disable-output-escaping.

xsl:number

    Syntax:
        <xsl:number
            level = "single" | "multiple" | "any"
            count = pattern
            from = pattern
            value = number-expression
            format = { string }
            lang = { nmtoken }
            letter-value = { "alphabetic" | "traditional" }
            grouping-separator = { char }
            grouping-size = { number } />

    Description: Emits a number based on the XPath number expression found in value.

xsl:copy

    Syntax:
        <xsl:copy
            use-attribute-sets = qnames>
            <!-- Content: template -->
        </xsl:copy>

    Description: Copies the current context node (and associated namespace nodes) to the result tree fragment.

xsl:apply-templates

<xsl:apply-templates

 select = node-set-expression

 mode = qname>

 <!- - Content: (xsl:sort | xsl:with-param)* - ->

</xsl:apply-templates>

Invokes the best-match template rules against the node-set returned by the select expression.

xsl:apply-imports

<xsl:apply-imports />

Promotes the current stylesheet in import precedence.

xsl:message

<xsl:message

 terminate = "yes" | "no">

 <!- - Content: template - ->

</xsl:message>

Emits a message in a processor-dependent manner.

xsl:fallback

<xsl:fallback>

 <!- - Content: template - ->

</xsl:fallback>

Evaluates the template when the parent instruction/directive is not supported by the current processor.

xsl:comment

<xsl:comment>

 <!- - Content: template - ->

</xsl:comment>

Emits an XML comment containing the template as its character data.

xsl:processing-instruction

<xsl:processing-instruction

 name = { ncname }>

 <!- - Content: template - ->

</xsl:processing-instruction>

Emits an XML processing instruction whose [target] is name and whose [children] are based on template.

xsl:element

<xsl:element

 name = { qname }

 namespace = { uri-reference }

 use-attribute-sets = qnames>

 <!- - Content: template - ->

</xsl:element>

Emits an XML element whose [local name] is name, whose [namespace URI] is namespace, and whose [children] are based on template.

xsl:attribute

<xsl:attribute

 name = { qname }

 namespace = { uri-reference }>

 <!- - Content: template - ->

</xsl:attribute>

Emits an XML attribute whose [local name] is name, whose [namespace URI] is namespace, and whose [children] are based on template.
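
To see several of these instructions working together, the stylesheet below is a small illustrative sketch written against the CDC_REPORT input file from the exercise. It assumes XSLT 1.0. The output element names SUMMARY, CASE, TOTAL, OUTCOME and ALERT, and the year attribute, are hypothetical names chosen only for this illustration; they are not part of the exercise's required output.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" encoding="UTF-8" indent="yes"/>

  <!-- xsl:apply-templates: hand each INCIDENT to the rule below. -->
  <xsl:template match="/CDC_REPORT">
    <SUMMARY>
      <xsl:apply-templates select="INCIDENT"/>
    </SUMMARY>
  </xsl:template>

  <xsl:template match="INCIDENT">
    <!-- xsl:variable: sum of the three population counts. -->
    <xsl:variable name="total"
                  select="FEMALE_ADULT + MALE_ADULT + CHILDREN"/>
    <CASE>
      <!-- xsl:attribute: must be emitted before any child elements. -->
      <xsl:attribute name="year">
        <xsl:value-of select="YEAR"/>
      </xsl:attribute>

      <!-- xsl:copy-of: copies the CITY element together with its content. -->
      <xsl:copy-of select="CITY"/>

      <!-- xsl:value-of: emits the variable's value as a string. -->
      <TOTAL><xsl:value-of select="$total"/></TOTAL>

      <!-- xsl:choose: pick exactly one branch. -->
      <xsl:choose>
        <xsl:when test="FATALITIES &gt; 0">
          <OUTCOME>fatal</OUTCOME>
        </xsl:when>
        <xsl:otherwise>
          <OUTCOME>non-fatal</OUTCOME>
        </xsl:otherwise>
      </xsl:choose>

      <!-- xsl:if: emit ALERT only for high-severity incidents. -->
      <xsl:if test="SEVERITY = 'High'">
        <ALERT>yes</ALERT>
      </xsl:if>
    </CASE>
  </xsl:template>

</xsl:stylesheet>
```

For the single incident in the exercise input, this sketch should yield one CASE element with year="2001" containing the copied CITY element, a TOTAL of 139 (87 + 32 + 20), an OUTCOME of "fatal", and an ALERT of "yes". The exact indentation of the result depends on the XSLT processor.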

(From XSL Transformations: XSLT Alleviates XML Schema Incompatibility Headaches, by Don Box, Aaron Skonnard, John Lam.  Modified excerpt from Essential XML (Chapter 5), by Don Box, Aaron Skonnard, and John Lam © 2001 Addison Wesley Longman.)

Recommended Reading

XSLT by Doug Tidwell, O'Reilly 2001, ISBN 0-596-00053-7


This page is part of the course on Data Integration, the lecture on XSLT transformations.  This page was first prepared on 3/24/2004 and last revised on 10/22/2011.