This question has been significantly edited to make things a bit clearer.
I am attempting to pull data out of the electronic Code of Federal Regulations XML feed (http://www.gpo.gov/fdsys/bulkdata/CFR/2015/title-15/CFR-2015-title15-vol2.xml) and am having trouble.
Specifically, I'd like to grab data that will be matched by a combination of Node and Attribute. In the following snippet of XML, you can see some of the text I'd like to grab. I would like to obtain the data for each FP node where the attribute FP-2 is present. I would also like to grab the data for each FP node having the attribute FP-1.
<APPENDIX>
<EAR>Pt. 774, Supp. 1</EAR>
<HD SOURCE="HED">Supplement No. 1 to Part 774—The Commerce Control List</HD>
<HD SOURCE="HD1">Category 0—Nuclear Materials, Facilities, and Equipment [and Miscellaneous Items]</HD>
<HD SOURCE="HD1">A. “End Items,” “Equipment,” “Accessories,” “Attachments,” “Parts,” “Components,” and “Systems”</HD>
<FP SOURCE="FP-2">
<E T="02">0A002Power generating or propulsion equipment “specially designed” for use with space, marine or mobile “nuclear reactors”. (These items are “subject to the ITAR.” See 22 CFR parts 120 through 130.)</E>
</FP>
<FP SOURCE="FP-2">
<E T="02">0A018Items on the Wassenaar Munitions List (see List of Items Controlled).</E>
</FP>
<FP SOURCE="FP-1">
<E T="04">License Requirements</E>
</FP>
<FP SOURCE="FP-1">
<E T="03">Reason for Control:</E> NS, AT, UN</FP>
<GPOTABLE CDEF="s50,r50" COLS="2" OPTS="L2">
<BOXHD>
<CHED H="1">Control(s)</CHED>
<CHED H="1">Country Chart (See Supp. No. 1 to part 738)</CHED>
</BOXHD>
<ROW>
<ENT I="01">NS applies to entire entry</ENT>
<ENT>NS Column 1.</ENT>
</ROW>
<ROW>
<ENT I="01">AT applies to entire entry</ENT>
<ENT>AT Column 1.</ENT>
</ROW>
<ROW>
<ENT I="01">UN applies to entire entry</ENT>
<ENT>See § 746.1(b) for UN controls.</ENT>
</ROW>
</GPOTABLE>
<FP SOURCE="FP-1">
<E T="05">List Based License Exceptions (See Part 740 for a description of all license exceptions)</E>
</FP>
<FP SOURCE="FP-1">
<E T="03">LVS:</E> $3,000 for 0A018.b</FP>
<FP SOURCE="FP-1">$1,500 for 0A018.c and .d</FP>
<FP SOURCE="FP-1">
<E T="03">GBS:</E> N/A</FP>
<FP SOURCE="FP-1">
<E T="03">CIV:</E> N/A</FP>
<FP SOURCE="FP-1">
<E T="04">List of Items Controlled</E>
</FP>
<FP SOURCE="FP-1">
<E T="03">Related Controls:</E> (1) See also 0A979, 0A988, and 22 CFR 121.1 Categories I(a), III(b-d), and X(a). (2) See ECCN 0A617.y.1 and .y.2 for items formerly controlled by ECCN 0A018.a. (3) See ECCN 1A613.c for military helmets providing less than NIJ Type IV protection and ECCN 1A613.y.1 for conventional military steel helmets that, immediately prior to July 1, 2014, were classified under 0A018.d and 0A988. (4) See 22 CFR 121.1 Category X(a)(5) and (a)(6) for controls on other military helmets.</FP>
<FP SOURCE="FP-1">
<E T="03">Related Definitions:</E> N/A</FP>
<FP>
<E T="03">Items:</E> a. [Reserved]</FP>
<P>b. “Specially designed” components and parts for ammunition, except cartridge cases, powder bags, bullets, jackets, cores, shells, projectiles, boosters, fuses and components, primers, and other detonating devices and ammunition belting and linking machines (all of which are “subject to the ITAR.” (See 22 CFR parts 120 through 130);</P>
<NOTE>
<HD SOURCE="HED">
<E T="03">Note:</E>
</HD>
<P>
<E T="03">0A018.b does not apply to “components” “specially designed” for blank or dummy ammunition as follows:</E>
</P>
<P>
<E T="03">a. Ammunition crimped without a projectile (blank star);</E>
</P>
</APPENDIX>
To complicate matters, I'm trying to pull this data into Filemaker, but upon edit, I'll stick to simple XSL.
The following XSL grabs all of the FP nodes without differentiation.
<?xml version='1.0' encoding='UTF-8'?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/">
<xsl:for-each select="//FP">
<xsl:value-of select="."/>
</xsl:for-each>
</xsl:template>
</xsl:stylesheet>
Modifying this to match on <xsl:template match="FP[@SOURCE='FP-1']"> allows me to make the necessary match based on the attribute, but I'm still not clear on how to capture the data I need. Thoughts?
A few things:
Your stylesheet is not actually valid XSLT.
In XPath, an attribute reference (e.g., SOURCE) must be prefixed with @.
Finally, there are many FP-1s and FP-2s, but your setup only chooses the first instances.
Consider the following XSLT:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:output version="1.0" encoding="UTF-8"/>
<xsl:template match="/">
<FMPXMLRESULT xmlns="http://www.filemaker.com/fmpxmlresult">
<METADATA>
<FIELD NAME="ECCNFP_2" TYPE="TEXT"/>
<FIELD NAME="ECCNFP_1" TYPE="TEXT"/>
</METADATA>
<RESULTSET>
<xsl:for-each select="//FP[@SOURCE = 'FP-2']/E[@T='02']">
<ROW>
<COL>
<DATA><xsl:value-of select="substring(.,1,5)"/></DATA>
</COL>
</ROW>
</xsl:for-each>
<xsl:for-each select="//FP[@SOURCE = 'FP-1']/E[@T='02']">
<ROW>
<COL>
<DATA><xsl:value-of select="substring(.,1,5)"/></DATA>
</COL>
</ROW>
</xsl:for-each>
</RESULTSET>
</FMPXMLRESULT>
</xsl:template>
</xsl:stylesheet>
Which would output:
<?xml version='1.0' encoding='UTF-8'?>
<FMPXMLRESULT xmlns="http://www.filemaker.com/fmpxmlresult">
<METADATA>
<FIELD NAME="ECCNFP_2" TYPE="TEXT"/>
<FIELD NAME="ECCNFP_1" TYPE="TEXT"/>
</METADATA>
<RESULTSET>
<ROW>
<COL>
<DATA>0A002</DATA>
</COL>
</ROW>
<ROW>
<COL>
<DATA>0A018</DATA>
</COL>
</ROW>
</RESULTSET>
</FMPXMLRESULT>
And partial output when run against the full XML from the link:
<?xml version='1.0' encoding='UTF-8'?>
<FMPXMLRESULT xmlns="http://www.filemaker.com/fmpxmlresult">
<METADATA>
<FIELD NAME="ECCNFP_2" TYPE="TEXT"/>
<FIELD NAME="ECCNFP_1" TYPE="TEXT"/>
</METADATA>
<RESULTSET>
<ROW>
<COL>
<DATA>2A000</DATA>
</COL>
</ROW>
<ROW>
<COL>
<DATA>0A002</DATA>
</COL>
</ROW>
<ROW>
<COL>
<DATA>0A018</DATA>
</COL>
</ROW>
<ROW>
<COL>
<DATA>0A521</DATA>
</COL>
</ROW>
<ROW>
<COL>
<DATA>0A604</DATA>
</COL>
</ROW>
<ROW>
<COL>
<DATA>0A606</DATA>
</COL>
</ROW>
...
In fact, point your XSLT processor at the GPO link and all FP-1s and FP-2s are output. I just did so with Python! Close to 3,000 lines!
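For readers who want to check the selection logic outside an XSLT processor, here is a minimal Python sketch of the same XPath predicates. It is an assumption-laden illustration, not the script used above: it relies only on the standard library's xml.etree.ElementTree (which supports a limited XPath subset), the SAMPLE string is a trimmed stand-in for the APPENDIX snippet, and eccn_codes is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Trimmed stand-in for the APPENDIX snippet in the question (assumption).
SAMPLE = """
<APPENDIX>
  <FP SOURCE="FP-2"><E T="02">0A002Power generating or propulsion equipment</E></FP>
  <FP SOURCE="FP-2"><E T="02">0A018Items on the Wassenaar Munitions List</E></FP>
  <FP SOURCE="FP-1"><E T="04">License Requirements</E></FP>
  <FP SOURCE="FP-1"><E T="03">Reason for Control:</E> NS, AT, UN</FP>
</APPENDIX>
"""


def eccn_codes(xml_text, source):
    """Return the first five characters of each E[@T='02'] under FP[@SOURCE=source]."""
    root = ET.fromstring(xml_text)
    # Same predicates as the stylesheet: attributes are addressed with @.
    path = f".//FP[@SOURCE='{source}']/E[@T='02']"
    return [(e.text or "")[:5] for e in root.findall(path)]


fp2_codes = eccn_codes(SAMPLE, "FP-2")
fp1_codes = eccn_codes(SAMPLE, "FP-1")
print(fp2_codes)
print(fp1_codes)
```

In this sample the FP-1 list comes back empty, because none of the FP-1 elements contains an E with T="02". That matches the two-row output shown for the snippet.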
Your question is still not clear. If I concentrate on this part:
I would like to obtain the data for each FP node where the attribute
FP-2 is present. I would also like to grab the data for each FP node
having the attribute FP-1.
then you probably want to change this:
<xsl:for-each select="//FP">
to:
<xsl:for-each select="//FP[@SOURCE='FP-1' or @SOURCE='FP-2']">
Note that this returns the value of each FP element where the SOURCE attribute has a value of either 'FP-1' or 'FP-2'. I see no "FP node where the attribute FP-2 is present" in your input.
Note also that the // syntax is expensive in terms of processing power. You will get better performance if you use a full, explicit path.
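To make the effect of that predicate concrete, here is a rough Python equivalent, offered as a sketch rather than a replacement for the XSLT. ElementTree's XPath subset has no "or", so the filter is written in plain Python; SAMPLE and fp_values are hypothetical names, and "".join(itertext()) stands in for what xsl:value-of returns for an element with mixed content.

```python
import xml.etree.ElementTree as ET

# Three FP elements: two with a SOURCE value we want, one with no SOURCE at all.
SAMPLE = """
<APPENDIX>
  <FP SOURCE="FP-2"><E T="02">0A018Items on the Wassenaar Munitions List</E></FP>
  <FP SOURCE="FP-1"><E T="03">Reason for Control:</E> NS, AT, UN</FP>
  <FP><E T="03">Items:</E> a. [Reserved]</FP>
</APPENDIX>
"""


def fp_values(xml_text, wanted=("FP-1", "FP-2")):
    """Return (SOURCE, string value) for each FP whose SOURCE is in wanted."""
    root = ET.fromstring(xml_text)
    return [
        (fp.get("SOURCE"), "".join(fp.itertext()).strip())
        for fp in root.iter("FP")          # document order, like //FP
        if fp.get("SOURCE") in wanted      # FP without SOURCE is skipped
    ]


values = fp_values(SAMPLE)
for source, text in values:
    print(source, "->", text)
```

The FP element without a SOURCE attribute (the "Items:" line) is left out, which is also what the XSLT predicate does.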
Given the following xml
<dsQueryResponse>
<Proposals>
<Rows>
<Row ID="1"/>
<Row ID="2"/>
<Row ID="3"/>
</Rows>
</Proposals>
<ProposalReviewers>
<Rows>
<Row ID="1" ProposalID="1"/>
<Row ID="2" ProposalID="1"/>
<Row ID="3" ProposalID="2"/>
</Rows>
</ProposalReviewers>
</dsQueryResponse>
What XPath expression, or XSLT transform (XSLT 1.0), will give me the following output, based on the values of attribute ProposalID?
<Rows>
<Row ID="1"/>
<Row ID="2"/>
</Rows>
I know if I'm running inside of a for-each I can use current(), but I am hoping to do this outside the for-each.
Your question can be read in many ways, and I am mostly guessing here. Still, the only logical way to return only the two Rows from the input that has three (or six), seems to be this:
XSLT 1.0
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
<xsl:key name="proposal" match="Proposals/Rows/Row" use="@ID" />
<xsl:template match="/dsQueryResponse">
<Rows>
<xsl:copy-of select="key('proposal', ProposalReviewers/Rows/Row/@ProposalID)"/>
</Rows>
</xsl:template>
</xsl:stylesheet>
To understand how this works, see: https://www.w3.org/TR/xslt/#key
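If the key mechanism feels opaque, it may help to think of xsl:key as an index from a value to the nodes carrying it. The following Python sketch mimics that idea with a dictionary; it is only an analogy under my own assumptions (reviewed_proposals and SAMPLE are hypothetical names), not how any XSLT processor is implemented.

```python
import xml.etree.ElementTree as ET

SAMPLE = """
<dsQueryResponse>
  <Proposals>
    <Rows><Row ID="1"/><Row ID="2"/><Row ID="3"/></Rows>
  </Proposals>
  <ProposalReviewers>
    <Rows>
      <Row ID="1" ProposalID="1"/>
      <Row ID="2" ProposalID="1"/>
      <Row ID="3" ProposalID="2"/>
    </Rows>
  </ProposalReviewers>
</dsQueryResponse>
"""


def reviewed_proposals(xml_text):
    """Return a <Rows> element holding each proposal Row that has a reviewer."""
    root = ET.fromstring(xml_text)
    # The "key": proposal rows indexed by their ID attribute.
    by_id = {row.get("ID"): row for row in root.findall("Proposals/Rows/Row")}
    # The lookup values: every ProposalID mentioned by a reviewer row.
    wanted = {row.get("ProposalID") for row in root.findall("ProposalReviewers/Rows/Row")}
    rows = ET.Element("Rows")
    for proposal_id, row in by_id.items():  # dicts keep document order
        if proposal_id in wanted:           # each proposal appears once
            rows.append(row)
    return rows


result_ids = [row.get("ID") for row in reviewed_proposals(SAMPLE)]
print(result_ids)
```

Proposal 1 is referenced twice but appears once in the result, just as key() called with a node-set returns each matching node once.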
In the XML below, I need to replace the namespace using XPath.
<application xmlns="http://ns.adobe.com/air/application/4.0">
<child id="1"></child>
<child id="2"></child>
</application>
I tried with
/application/@xmlns
and
/*[local-name()='application']/@[local-name()='xmlns']
Both failed to give the desired output. To replace the text, I have used xmltask replace.
<xmltask source="${temp.file1}" dest="${temp.file1}">
<replace path="/application/@xmlns" withText="http://ns.adobe.com/air/application/16.0" />
</xmltask>
The problem is that xmlns is not an attribute. You cannot select it with XPath.
A namespace is part of the node name in XML: <foo xmlns="urn:foo-namespace" /> and <foo xmlns="urn:bar-namespace" /> are not two nodes with the same name and different attributes, they are two nodes with different names and no attributes.
If you want to change a namespace, you must construct a completely new node.
XSLT is better-suited to this task:
<!-- update-air-ns.xsl -->
<xsl:transform
version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:air4="http://ns.adobe.com/air/application/4.0"
xmlns="http://ns.adobe.com/air/application/16.0"
>
<xsl:output method="xml" encoding="UTF-8" indent="yes" />
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
<xsl:template match="air4:*">
<xsl:element name="{local-name()}">
<xsl:apply-templates select="@*|node()"/>
</xsl:element>
</xsl:template>
</xsl:element>
</xsl:template>
</xsl:transform>
This XSLT transformation does two things:
the first template (identity template) copies nodes recursively, unless there is a better matching template for a given node
the second template matches elements in the air4 namespace and constructs new elements that have the same local name but a different namespace. This happens because of the default namespace declaration in the XSLT. The http://ns.adobe.com/air/application/16.0 namespace is used for all newly constructed elements.
Applied to your input XML, the result is
<application xmlns="http://ns.adobe.com/air/application/16.0">
<child id="1"/>
<child id="2"/>
</application>
You can use Ant's xslt task:
<xslt in="${temp.file1}" out="${temp.file1}" style="update-air-ns.xsl" />
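The claim that xmlns is part of the element name rather than an attribute can be checked with any namespace-aware parser. Here is a small sketch using Python's standard xml.etree.ElementTree, where the namespace shows up inside the tag and the attribute dictionary stays empty. It is an illustration under my own assumptions, not a substitute for the stylesheet; the variable names are hypothetical.

```python
import xml.etree.ElementTree as ET

OLD_NS = "http://ns.adobe.com/air/application/4.0"
NEW_NS = "http://ns.adobe.com/air/application/16.0"

SAMPLE = (
    f'<application xmlns="{OLD_NS}">'
    '<child id="1"></child><child id="2"></child>'
    "</application>"
)

root = ET.fromstring(SAMPLE)
original_tag = root.tag                  # the namespace is part of the name
original_attributes = dict(root.attrib)  # xmlns is not listed as an attribute
print(original_tag, original_attributes)

# "Changing the namespace" therefore means renaming every element.
old_prefix = "{" + OLD_NS + "}"
for element in root.iter():
    if element.tag.startswith(old_prefix):
        element.tag = "{" + NEW_NS + "}" + element.tag[len(old_prefix):]

# Serialize the new namespace as the default one (no ns0: prefix).
ET.register_namespace("", NEW_NS)
result = ET.tostring(root, encoding="unicode")
print(result)
```

The loop does the same job as the air4:* template: every element keeps its local name and gets a new namespace.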
I really do hope that my title is at least a bit clear.
Important: I can only use XSLT 1.0 because the project needs to work with the MSXML XSLT processor.
What I try to do:
I generate documents containing information about rooms. Rooms have walls, and I need the sum of the wall areas per room.
The input xml file I get is dynamically created by another program.
Changing the structure of the input xml file is not the solution, trust me, it's needed like that and is much more complex than I show you here.
My XML (the innerArea attribute in the wall element has to get summed up):
<root>
<floor id="30" name="EG">
<flat name="Wohnung" nr="1">
<Room id="49" area="93.08565">
<WallSegments>
<WallSegment id="45"/>
<WallSegment id="42"/>
<WallSegment id="39"/>
</WallSegments>
</Room>
</flat>
</floor>
<components>
<Wall id="20" innerArea="20.7654"/>
<wallSegment id="45" wall="20">[...]</wallSegment>
<Wall id="21" innerArea="12.45678"/>
<wallSegment id="42" wall="21">[...]</wallSegment>
<Wall id="22" innerArea="17.8643"/>
<wallSegment id="39" wall="22">[...]</wallSegment>
</components>
</root>
With my XSLT I was able to reach the values of the walls which belong to a room.
But I have really no idea how I could get the sum of the value out of that.
My XSLT:
<xsl:for-each select="flat/Room">
<xsl:for-each select="WallSegments/WallSegment">
<xsl:variable name="curWallSegId" select="@id"/>
<xsl:for-each select="/root/components/wallSegment[@id = $curWallSegId]">
<xsl:variable name="curWallId" select="@wall"/>
<xsl:for-each select="/root/components/Wall[@id = $curWallId]">
<!--I didn't expect that this was working, but at least I tried :D-->
<xsl:value-of select="sum(@AreaInner)"/>
</xsl:for-each>
</xsl:for-each>
</xsl:for-each>
</xsl:for-each>
Desired Output should be something like...
[...]
<paragraph>
Room 1:
Wall area: 51.09 m²
[...]
</paragraph>
[...]
So I hope I described my problem properly. If not: I am sorry, you may beat me right into the face x)
It's best to use keys to get "related" data. Place this at the top of your stylesheet, outside of any template:
<xsl:key name="wall" match="components/Wall" use="@id" />
<xsl:key name="wallSegment" match="components/wallSegment" use="@id" />
Then:
<xsl:for-each select="flat/Room">
<paragraph>
<xsl:text>Room </xsl:text>
<xsl:value-of select="position()"/>
<xsl:text>:
Wall area: </xsl:text>
<xsl:value-of select="format-number(sum(key('wall', key('wallSegment', WallSegments/WallSegment/@id)/@wall)/@innerArea), '0.00m²')"/>
<xsl:text>
</xsl:text>
</paragraph>
</xsl:for-each>
will return:
<paragraph>Room 1:
Wall area: 51.09m²</paragraph>
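The nested key() call is a two-step lookup: segment id to wall id, then wall id to inner area. As a cross-check of the arithmetic, here is a sketch of the same chain in Python with two dictionaries. It is an analogy under my own assumptions (wall_area_per_room and SAMPLE are hypothetical names, and the wallSegment content elided as [...] in the question is simply left empty).

```python
import xml.etree.ElementTree as ET

SAMPLE = """
<root>
  <floor id="30" name="EG">
    <flat name="Wohnung" nr="1">
      <Room id="49" area="93.08565">
        <WallSegments>
          <WallSegment id="45"/>
          <WallSegment id="42"/>
          <WallSegment id="39"/>
        </WallSegments>
      </Room>
    </flat>
  </floor>
  <components>
    <Wall id="20" innerArea="20.7654"/>
    <wallSegment id="45" wall="20"/>
    <Wall id="21" innerArea="12.45678"/>
    <wallSegment id="42" wall="21"/>
    <Wall id="22" innerArea="17.8643"/>
    <wallSegment id="39" wall="22"/>
  </components>
</root>
"""


def wall_area_per_room(xml_text):
    """Map each Room id to the summed innerArea of the walls it references."""
    root = ET.fromstring(xml_text)
    # Counterparts of the two xsl:key declarations.
    wall_area = {w.get("id"): float(w.get("innerArea"))
                 for w in root.findall("components/Wall")}
    segment_wall = {s.get("id"): s.get("wall")
                    for s in root.findall("components/wallSegment")}
    areas = {}
    for room in root.findall("floor/flat/Room"):
        # A set, so a wall shared by two segments is counted once,
        # as with the node-set returned by key().
        wall_ids = {segment_wall[s.get("id")]
                    for s in room.findall("WallSegments/WallSegment")}
        areas[room.get("id")] = sum(wall_area[w] for w in sorted(wall_ids))
    return areas


areas = wall_area_per_room(SAMPLE)
formatted = {room: format(total, ".2f") for room, total in areas.items()}
print(formatted)
```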
If what you need is the wall area of every room, this is a way of getting it:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="1.0">
<xsl:template match="/root/floor">
<xsl:for-each select="flat/Room">
<xsl:variable name="currentRoomSegmentsIds" select="WallSegments/WallSegment/@id"/>
<xsl:variable name="currentRoomWallsIds" select="/root/components/wallSegment[@id = $currentRoomSegmentsIds]/@wall"/>
<xsl:variable name="currentRoomWallsInnerAreas" select="/root/components/Wall[@id = $currentRoomWallsIds]/@innerArea"/>
Id of the room = <xsl:value-of select="@id"/>.
Area of the room = <xsl:value-of select="sum($currentRoomWallsInnerAreas)"/>
</xsl:for-each> <!-- End of for each room -->
</xsl:template>
</xsl:stylesheet>
This produces the following result:
Id of the room = 49.
Area of the room = 51.08648
I have the following XML structure:
<INFO>
<para Type="07">07 Lähetysluettelo</para>
<para Type="+0">+0 074064</para>
<para Type="07">07 Tilausnumero Ostajan viite</para>
<para Type="+0">+0 044275 5549177</para>
<para Type=" 0"> 0 836679586 (LONG 2478 3.63 8995.14</para>
<para Type="07">07 Lähetysluettelo2</para>
<para Type="+0">+0 074517</para>
<para Type="07">07 Tilausnumero Ostajan viite</para>
<para Type="+0">+0 044276 5534435</para>
<para Type=" 0"> 0 836679586 (LONG 2478 3.63 8995.14</para>
<para Type=" 0"> 0 L1 KUORMAL. 800 14 0.00 0.00</para>
</INFO>
I would like to loop over each para whose Type is a space followed by 0 (" 0"). While doing this, I would also like to fetch the data from the elements that precede it, for example the +0 and 07 segments.
I am looking for this output:
<ROWS>
<ROW>
<NUMBER>074064</NUMBER>
<ORDER_NUMBER>044275</ORDER_NUMBER>
<REF>5549177</REF>
<NAME>836679586 (LONG</NAME>
<COUNT>2478</COUNT>
<PRICE>3.63</PRICE>
<TOTAL>8995.14</TOTAL>
</ROW>
<ROW>
<NUMBER>074517</NUMBER>
<ORDER_NUMBER>044276</ORDER_NUMBER>
<REF>5534435</REF>
<NAME>836679586 (LONG</NAME>
<COUNT>2478</COUNT>
<PRICE>3.63</PRICE>
<TOTAL>8995.14</TOTAL>
</ROW>
<ROW>
<NUMBER>074517</NUMBER>
<ORDER_NUMBER>044276</ORDER_NUMBER>
<REF>5534435</REF>
<NAME>L1 KUORMAL. 800</NAME>
<COUNT>14</COUNT>
<PRICE>0.00</PRICE>
<TOTAL>0.00</TOTAL>
</ROW>
</ROWS>
In some cases the NUMBER, ORDER_NUMBER, and REF values will remain the same because they belong to the same segment.
Is it possible to do this?
I have tried this:
<xsl:for-each select="INFO/para[@Type=' 0']">
<ROW>
<NUMBER>
<xsl:value-of select="normalize-space(substring(../following-sibling::para[1],6,24))"/>
</NUMBER>
<ORDER_NUMBER>
<xsl:value-of select="normalize-space(substring(../following-sibling::para[3],7,24))"/>
</ORDER_NUMBER>
<REF>
<xsl:value-of select="normalize-space(substring(../following-sibling::para[3],31,50))"/>
</REF>
<NAME>
<xsl:value-of select="normalize-space(substring(.,7,24))"/>
</NAME>
<COUNT>
<xsl:value-of select="normalize-space(substring(.,33,8))"/>
</COUNT>
<PRICE>
<xsl:value-of select="normalize-space(substring(.,41,11))"/>
</PRICE>
<TOTAL>
<xsl:value-of select="normalize-space(substring(.,69,11))"/>
</TOTAL>
</ROW>
</xsl:for-each>
Thanks.
By using substring functions and nested loops, it worked well. So marking it closed.
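Since the working stylesheet was not posted, here is a sketch of the carry-forward idea in Python instead: walk the para elements in order, remember the most recent +0 values, and emit a row for each " 0" line. This is not the poster's solution. It makes two loud assumptions because the fixed-width columns are lost in the sample as shown: fields are split on whitespace rather than by substring positions, and a +0 line with one field is taken as NUMBER while one with two fields is taken as ORDER_NUMBER and REF. build_rows and SAMPLE are hypothetical names.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<INFO>
<para Type="07">07 Lähetysluettelo</para>
<para Type="+0">+0 074064</para>
<para Type="07">07 Tilausnumero Ostajan viite</para>
<para Type="+0">+0 044275 5549177</para>
<para Type=" 0"> 0 836679586 (LONG 2478 3.63 8995.14</para>
<para Type="07">07 Lähetysluettelo2</para>
<para Type="+0">+0 074517</para>
<para Type="07">07 Tilausnumero Ostajan viite</para>
<para Type="+0">+0 044276 5534435</para>
<para Type=" 0"> 0 836679586 (LONG 2478 3.63 8995.14</para>
<para Type=" 0"> 0 L1 KUORMAL. 800 14 0.00 0.00</para>
</INFO>"""


def build_rows(xml_text):
    """Emit one dict per ' 0' line, carrying the latest '+0' values forward."""
    context = {"NUMBER": "", "ORDER_NUMBER": "", "REF": ""}
    rows = []
    for para in ET.fromstring(xml_text).findall("para"):
        kind = para.get("Type")
        tokens = (para.text or "").split()[1:]  # drop the leading type code
        if kind == "+0" and len(tokens) == 1:
            context["NUMBER"] = tokens[0]
        elif kind == "+0" and len(tokens) == 2:
            context["ORDER_NUMBER"], context["REF"] = tokens
        elif kind == " 0" and len(tokens) >= 4:
            # Assumption: the last three fields are COUNT, PRICE, TOTAL;
            # everything before them is the NAME.
            rows.append({
                **context,
                "NAME": " ".join(tokens[:-3]),
                "COUNT": tokens[-3],
                "PRICE": tokens[-2],
                "TOTAL": tokens[-1],
            })
    return rows


rows = build_rows(SAMPLE)
for row in rows:
    print(row)
```

In XSLT 1.0 the same "most recent earlier line" lookup is usually written with the preceding-sibling axis, taking the first match with [1].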
I have a large collection of xml documents with a wide array of different tags in them. I need to change all tags of the form <foo> and turn them into tags of the form <field name="foo"> in a way that will also ignore the attributes of a given tag. That is, a tag of the form <foo id="bar"> should also be changed to the tag <field name="foo">.
In order for this transformation to work, I also need to distinguish between <foo> and </foo>, as </foo> must go to </field>.
I have played around with sed in a bash script, but to no avail.
Although sed is not ideal for this task (see comments; further reading: regular, context-free grammar and xml), it can be pressed into service. Try this one-liner:
sed -e 's/<\([^>\/\ ]*\)[^>]*>/<field name=\"\1\">/g' -e 's/<field name=\"\">/<\/field>/g' file
The first expression rewrites every tag as <field name="...">, using the tag's first word as the name. For an end tag the captured name is empty, so the second expression then turns every <field name=""> into </field>.
This solution prints everything on the standard output. If you want to replace it in file directly when processing, try
sed -i -e 's/<\([^>\/\ ]*\)[^>]*>/<field name=\"\1\">/g' -e 's/<field name=\"\">/<\/field>/g' file
That turns this input:
<html>
<person>
but <person name="bob"> and <person name="tom"> would both become
</person>
into this:
<field name="html">
<field name="person">
but <field name="person"> and <field name="person"> would both become
</field>
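To see what the two sed expressions do step by step, here is the same pair of substitutions sketched with Python's re module. The pattern is transcribed from the sed command, with the escaping adjusted for Python; sed_like is a hypothetical helper name. The sketch also shows one input the approach gets wrong.

```python
import re

# Same pattern as the sed command: capture the tag's first word,
# stopping at '>', '/' or a space, then skip the rest of the tag.
TAG = re.compile(r"<([^>/ ]*)[^>]*>")


def sed_like(text):
    """Apply the two substitutions from the sed one-liner, in order."""
    # Step 1: every tag becomes <field name="...">; end tags capture "".
    step1 = TAG.sub(r'<field name="\1">', text)
    # Step 2: the empty-name leftovers are the former end tags.
    return step1.replace('<field name="">', "</field>")


converted = sed_like('<person name="bob">Bob</person>')
self_closing = sed_like("<foo/>")
print(converted)
print(self_closing)
```

The self-closing case comes out as an open tag with no matching close, which is one reason a regex-based rewrite is fragile on real XML.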
Sed is the wrong tool for the job - a simple XSL Transform can do this much more reliably:
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="foo">
<field name="foo">
<xsl:apply-templates/>
</field>
</xsl:template>
<xsl:template match="@* | node()">
<xsl:copy>
<xsl:apply-templates select="@* | node()" />
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
Note that unlike sed, it can handle short empty elements, newlines within tags (e.g. as produced by some tools), and just about anything that's well-formed XML. Here's my test file:
<?xml version="1.0"?>
<doc>
<section>
<foo>Plain foo, simple content</foo>
</section>
<foo attr="0">Foo with attr, with content
<bar/>
<foo attr="shorttag"/>
</foo>
<foo
attr="1"
>multiline</foo
>
<![CDATA[We mustn't transform <foo> in here!]]>
</doc>
which is transformed by the above (using xsltproc 16970175.xslt 16970175.xml) to:
<?xml version="1.0"?>
<doc>
<section>
<field name="foo">Plain foo, simple content</field>
</section>
<field name="foo">Foo with attr, with content
<bar/>
<field name="foo"/>
</field>
<field name="foo">multiline</field>
We mustn't transform &lt;foo&gt; in here!
</doc>
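The stylesheet above renames only foo, while the question asks for every tag. As a rough sketch of the general case, using a real XML parser instead of XSLT, the following Python renames every element to field, stores the old name in a name attribute, and drops the original attributes. It is an illustration under my own assumptions (to_fields is a hypothetical name), and note that it renames the document element too.

```python
import xml.etree.ElementTree as ET


def to_fields(xml_text):
    """Turn every <foo ...> into <field name="foo">, discarding old attributes."""
    root = ET.fromstring(xml_text)
    for element in root.iter():
        name = element.tag
        element.attrib.clear()        # ignore the tag's own attributes
        element.set("name", name)     # remember the old tag name
        element.tag = "field"
    return root


converted = to_fields('<doc><foo id="bar">text</foo><baz/></doc>')
print(ET.tostring(converted, encoding="unicode"))
```

Because the parser builds a tree, start and end tags never have to be told apart by hand, and CDATA sections or tags split across lines are handled for free.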