Nokogiri XSLT transform using multiple source XML files - ruby

I want to transform XML using Nokogiri. I built an XSL stylesheet and it works fine; I also tested it in IntelliJ. My data comes from two XML files.
My problem occurs when I try to get Nokogiri to do the transform. I can't seem to find a way to get it to parse multiple source files.
This is the code I am using from the documentation:
require 'nokogiri'
doc1 = Nokogiri::XML(File.read('F:/transcoder/xslt_repo/core_xml.xml'))
xslt = Nokogiri::XSLT(File.read('F:/transcoder/xslt_repo/google.xsl'))
puts xslt.transform(doc1)
I tried:
require 'nokogiri'
doc1 = Nokogiri::XML(File.read('F:/transcoder/xslt_repo/core_xml.xml'))
doc2 = Nokogiri::XML(File.read('F:/transcoder/xslt_repo/file_data.xml'))
xslt = Nokogiri::XSLT(File.read('F:/transcoder/xslt_repo/test.xsl'))
puts xslt.transform(doc1, doc2)
However, transform accepts only one source document (its optional second argument is a set of stylesheet parameters, not another document), so at the moment I can only process half the data I need:
<?xml version="1.0"?>
<package package_id="LB000001">
<asset_metadata>
<series_title>test asset 1</series_title>
<season_title>Number 1</season_title>
<episode_title>ET 1</episode_title>
<episode_number>1</episode_number>
<license_start_date>21-07-2016</license_start_date>
<license_end_date>31-07-2016</license_end_date>
<rating>15</rating>
<synopsis>This is a test asset</synopsis>
</asset_metadata>
<video_file>
<file_name/>
<file_size/>
<check_sum/>
</video_file>
<image_1>
<file_name/>
<file_size/>
<check_sum/>
</image_1>
</package>
How can I get this to work?
Edit:
This is core_metadata.xml, which is generated by a PHP code block from data in a database:
<?xml version="1.0" encoding="utf-8"?>
<manifest task_id="00000000373">
<asset_metadata>
<material_id>LB111111</material_id>
<series_title>This is a test</series_title>
<season_title>This is a test</season_title>
<season_number>1</season_number>
<episode_title>that test</episode_title>
<episode_number>2</episode_number>
<start_date>23-08-2016</start_date>
<end_date>31-08-2016</end_date>
<ratings>15</ratings>
<synopsis>this is a test</synopsis>
</asset_metadata>
<file_info>
<source_filename>LB111111</source_filename>
<number_of_segments>2</number_of_segments>
<segment_1 seg_1_start="00:00:10.000" seg_1_dur="00:01:00.000"/>
<segment_2 seg_2_start="00:02:00.000" seg_2_dur="00:05:00.000"/>
<conform_profile definition="hd" aspect_ratio="16f16">ffmpeg -i S_PATH/F_NAME.mp4 SEG_CONFORM 2> F:/Transcoder/logs/transcode_logs/LOG_FILE.txt</conform_profile>
<transcode_profile profile_name="xbox" package_type="tar">ffmpeg -f concat -i T_PATH/CONFORM_LIST TRC_PATH/F_NAME.mp4 2> F:/Transcoder/logs/transcode_logs/LOG_FILE.txt</transcode_profile>
<target_path>F:/profiles/xbox</target_path>
</file_info>
</manifest>
The second XML file (file_data.xml) is created dynamically by Nokogiri during the transcode process:
<?xml version="1.0"?>
<file_data>
<video_file>
<file_name>LB111111_xbox_230816114438.mp4</file_name>
<file_size>141959922</file_size>
<md5_checksum>bac7670e55c0694059d3742285079cbf</md5_checksum>
</video_file>
<image_1>
<file_name>test</file_name>
<file_size>test</file_size>
<md5_checksum>test</md5_checksum>
</image_1>
</file_data>
I worked around this issue by hard-coding file_data.xml into the XSLT file:
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" indent="yes"/>
<xsl:template match="/">
<package>
<xsl:attribute name="package_id">
<xsl:value-of select="manifest/asset_metadata/material_id"/>
</xsl:attribute>
<asset_metadata>
<series_title>
<xsl:value-of select="manifest/asset_metadata/series_title"/>
</series_title>
<season_title>
<xsl:value-of select="manifest/asset_metadata/season_title"/>
</season_title>
<episode_title>
<xsl:value-of select="manifest/asset_metadata/episode_title"/>
</episode_title>
<episode_number>
<xsl:value-of select="manifest/asset_metadata/episode_number"/>
</episode_number>
<license_start_date>
<xsl:value-of select="manifest/asset_metadata/start_date"/>
</license_start_date>
<license_end_date>
<xsl:value-of select="manifest/asset_metadata/end_date"/>
</license_end_date>
<rating>
<xsl:value-of select="manifest/asset_metadata/ratings"/>
</rating>
<synopsis>
<xsl:value-of select="manifest/asset_metadata/synopsis"/>
</synopsis>
</asset_metadata>
<video_file>
<file_name>
<xsl:value-of select="document('file_data.xml')/file_data/video_file/file_name"/>
</file_name>
<file_size>
<xsl:value-of select="document('file_data.xml')/file_data/video_file/file_size"/>
</file_size>
<check_sum>
<xsl:value-of select="document('file_data.xml')/file_data/video_file/md5_checksum"/>
</check_sum>
</video_file>
<image_1>
<file_name>
<xsl:value-of select="document('file_data.xml')/file_data/image_1/file_name"/>
</file_name>
<file_size>
<xsl:value-of select="document('file_data.xml')/file_data/image_1/file_size"/>
</file_size>
<check_sum>
<xsl:value-of select="document('file_data.xml')/file_data/image_1/md5_checksum"/>
</check_sum>
</image_1>
</package>
</xsl:template>
</xsl:stylesheet>
I then use Saxon to do the transform:
xslt = "java -jar C:/SaxonHE9-7-0-7J/saxon9he.jar #{temp}core_metadata.xml #{temp}#{profile}.xsl > #{temp}#{file_name}.xml"
system("#{xslt}")
I would love to find a way to do this without having to hard-code file_data.xml into the XSLT.

Merge XML Documents and Transform
You'll have to do a bit of work to combine the XML content before the XSL transformation. @the-Tin-Man has a nice answer to a similar question in the archives, which can be adapted for your use case.
Let's say we have the following sample content:
<!--a.xml-->
<?xml version="1.0"?>
<xml>
<packages>
<package>Data here for A</package>
<package>Another Package</package>
</packages>
</xml>
<!--end a.xml-->
<!--b.xml-->
<?xml version="1.0"?>
<xml>
<packages>
<package>B something something</package>
</packages>
</xml>
<!--end b.xml-->
And we want to apply the following XSLT stylesheet:
<!--transform.xslt-->
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="//packages">
<html>
<body>
<h2>Packages</h2>
<ol>
<xsl:for-each select="./package">
<li><xsl:value-of select="text()"/></li>
</xsl:for-each>
</ol>
</body>
</html>
</xsl:template>
</xsl:stylesheet>
<!--end transform.xslt-->
If the two documents have parallel structures, as in this case, we can merge their content and pass the result along for transformation.
require 'nokogiri'
doc1 = Nokogiri::XML(File.read('./a.xml'))
doc2 = Nokogiri::XML(File.read('./b.xml'))
moved_packages = doc2.search('package')
doc1.at('/descendant::packages[1]').add_child(moved_packages)
xslt = Nokogiri::XSLT(File.read('./transform.xslt'))
puts xslt.transform(doc1)
This would generate the following output:
<html><body>
<h2>Packages</h2>
<ol>
<li>Data here for A</li>
<li>Another Package</li>
<li>B something something</li>
</ol>
</body></html>
If your XML documents have varying structure, you may benefit from an intermediary XML nodeset that you add your content to, rather than the shortcut of merging document 2 content into document 1.

Related

XSLT apply templates select condition on node list

I have an XML document with a list, and I want to apply a template that processes only the nodes matching a condition, but it is being applied to the whole list. Could someone tell me what I am missing? I am relatively new to XSL.
The condition I want to apply is: dep is 7 and no city element exists. I started by checking whether dep is 7. After applying the template, printing the list gives all of the employees instead of only those whose dep is 7. In my output I expect no dep with value 9.
Input XML:
<employeeList>
<employee>
<dep>7</dep>
<salary>900</salary>
</employee>
<employee>
<dep>7</dep>
<city>LA</city>
<salary>500</salary>
</employee>
<employee>
<dep>9</dep>
<salary>600</salary>
</employee>
<employee>
<dep>7</dep>
<salary>800</salary>
</employee>
</employeeList>
My XSL:
<xsl:apply-templates select="employeeList[employee/dep = '7']" mode="e"/>
<xsl:template match="employeeList" mode="e">
<xsl:for-each select="employee">
<dep>
<xsl:value-of select="dep" />
</dep>
</xsl:for-each>
</xsl:template>
Output XML:
<dep>7</dep><dep>7</dep><dep>9</dep><dep>7</dep>
The condition I wanted to apply is if dep is 7 and no city tag exists
Such a condition can easily be implemented, for example:
XSLT 1.0
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
<xsl:template match="/employeeList">
<root>
<xsl:for-each select="employee[dep='7' and not(city)]">
<dep>7</dep>
</xsl:for-each>
</root>
</xsl:template>
</xsl:stylesheet>
Or, more concisely:
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
<xsl:template match="/employeeList">
<root>
<xsl:copy-of select="employee[dep='7' and not(city)]/dep"/>
</root>
</xsl:template>
</xsl:stylesheet>
But it's hard to see the point of outputting several identical <dep>7</dep> elements.
You select the employeeList based on a condition on its employee/dep, but once it has been selected, that condition no longer matters: <xsl:for-each select="employee"> selects all employees, regardless of their dep.
You can repeat the condition in the xsl:for-each statement:
<xsl:for-each select="employee[dep = '7']">

How to get nested nodes from XML to CSV via XSLT

I have XML like below:
<?xml version="1.0" encoding="UTF-8"?>
<Envelope xmlns="http://schemas.microsoft.com/dynamics/2011/01/documents/Message">
<Header>
<MessageId>{70BF3A9B-9111-48D8-93B4-C6232E74307F}</MessageId>
<Action>http://tempuri.org/example/find</Action>
</Header>
<Body>
<MessageParts>
<Document xmlns="urn:iso:std:iso:20022:tech:xsd:pain.001.001.02" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<pain.001.001.02>
<GrpHdr>
<MsgId>AB01029407</MsgId>
<CreDtTm>2020-05-07T11:23:08</CreDtTm>
<NbOfTxs>2</NbOfTxs>
<CtrlSum>4598</CtrlSum>
<Grpg>MIXD</Grpg>
<InitgPty>
<Nm>MY COMPANY Ltd1</Nm>
<Id>
<OrgId>
<TaxIdNb>GB 823825133</TaxIdNb>
</OrgId>
</Id>
</InitgPty>
</GrpHdr>
<PmtInf>
<PmtInfId>AB01029407</PmtInfId>
<PmtMtd>TRF</PmtMtd>
<PmtTpInf>
<SvcLvl>
<Cd>SEPA</Cd>
</SvcLvl>
</PmtTpInf>
<Dbtr>
<Nm>MY COMPANY Ltd</Nm>
<PstlAdr>
<AdrLine>Address Line 1</AdrLine>
<AdrLine>Address Line 2</AdrLine>
<Ctry>CB</Ctry>
</PstlAdr>
</Dbtr>
<DbtrAcct>
<Id>
<IBAN>98</IBAN>
</Id>
</DbtrAcct>
<DbtrAgt>
<FinInstnId>
<BIC>ABC123</BIC>
</FinInstnId>
</DbtrAgt>
<ChrgBr>SLEV</ChrgBr>
<CdtTrfTxInf>
<PmtId>
<EndToEndId>Not-Provided</EndToEndId>
</PmtId>
<Amt>
<InstdAmt Ccy="CAD">2198.00</InstdAmt>
</Amt>
<CdtrAgt>
<FinInstnId>
<BIC>SWIFT01</BIC>
</FinInstnId>
</CdtrAgt>
<Cdtr>
<Nm>Creditor Name</Nm>
<PstlAdr>
<AdrLine>tests</AdrLine>
<AdrLine>Chicago</AdrLine>
<Ctry>US</Ctry>
</PstlAdr>
</Cdtr>
<CdtrAcct>
<Id>
<IBAN>98</IBAN>
</Id>
</CdtrAcct>
<RmtInf>
<Ustrd>1345</Ustrd>
</RmtInf>
</CdtTrfTxInf>
<CdtTrfTxInf>
<PmtId>
<EndToEndId>Not-Provided</EndToEndId>
</PmtId>
<Amt>
<InstdAmt Ccy="EUR">2400.00</InstdAmt>
</Amt>
<CdtrAgt>
<FinInstnId>
<BIC>SWIFT01</BIC>
</FinInstnId>
</CdtrAgt>
<Cdtr>
<Nm>Creditor Name1</Nm>
<PstlAdr>
<AdrLine>tests</AdrLine>
<AdrLine>Chicago</AdrLine>
<Ctry>US</Ctry>
</PstlAdr>
</Cdtr>
<CdtrAcct>
<Id>
<IBAN>98</IBAN>
</Id>
</CdtrAcct>
<RmtInf>
<Ustrd>123456765</Ustrd>
</RmtInf>
</CdtTrfTxInf>
</PmtInf>
</pain.001.001.02>
</Document>
</MessageParts>
</Body>
</Envelope>
I have XSLT like this:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:ns1="http://schemas.microsoft.com/dynamics/2011/01/documents/Message"
xmlns:ns2="urn:iso:std:iso:20022:tech:xsd:pain.001.001.02"
version="1.0">
<xsl:output method="text"/>
<xsl:template match="/">
<xsl:apply-templates
select="ns1:Envelope/ns1:Body//ns2:pain.001.001.02//ns2:GrpHdr"/>
</xsl:template>
<xsl:template match="ns2:GrpHdr">
<xsl:value-of select="ns2:CreDtTm"/>
<xsl:text>,</xsl:text>
<xsl:value-of select="ns2:NbOfTxs"/>
<xsl:text>,</xsl:text>
<xsl:value-of select="ns2:CtrlSum"/>
<xsl:text>,</xsl:text>
<xsl:value-of select="ns2:Grpg"/>
<xsl:text>,</xsl:text>
<xsl:value-of select="ns2:InitgPty/Nm"/>
<xsl:text>
</xsl:text> <!-- Line Return -->
</xsl:template>
</xsl:stylesheet>
With this XSLT I get only one group of values and cannot get beyond it. The output I got is:
2020-05-07T11:23:08,2,4598,MIXD,
This part looks correct, but I want almost all of the specific nodes, and I could not get the inner nested elements from a template.
The desired output is:
2020-05-07T11:23:08,2,4598,MIXD,MY COMPANY Ltd1,GB 823825133,AB01029407,TRF,SEPA,MY COMPANY Ltd,Address Line 1,Address Line 2,CB,98,ABC123,SLEV,Not-Provided,2198.00,SWIFT01,Creditor Name,tests,Chicago,US,98,1345
2020-05-07T11:23:08,2,4598,MIXD,MY COMPANY Ltd1,GB 823825133,AB01029407,TRF,SEPA,MY COMPANY Ltd,Address Line 1,Address Line 2,CB,98,ABC123,SLEV,Not-Provided,2400.00,SWIFT01,Creditor Name1,tests,Chicago,US,98,123456765
I am new to XSLT. Can anyone help with this?
Thanks in advance.
Try this as your starting point:
XSLT 1.0
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:ns1="http://schemas.microsoft.com/dynamics/2011/01/documents/Message"
xmlns:ns2="urn:iso:std:iso:20022:tech:xsd:pain.001.001.02">
<xsl:output method="text"/>
<xsl:template match="/ns1:Envelope">
<!-- data from header -->
<xsl:variable name="header" select="ns1:Body/ns1:MessageParts/ns2:Document/ns2:pain.001.001.02/ns2:GrpHdr" />
<xsl:value-of select="$header/ns2:CreDtTm"/>
<xsl:text>,</xsl:text>
<xsl:value-of select="$header/ns2:NbOfTxs"/>
<xsl:text>,</xsl:text>
<xsl:value-of select="$header/ns2:CtrlSum"/>
<xsl:text>,</xsl:text>
<xsl:value-of select="$header/ns2:Grpg"/>
<xsl:text>,</xsl:text>
<xsl:value-of select="$header/ns2:InitgPty/ns2:Nm"/>
<xsl:text>,</xsl:text>
<xsl:value-of select="$header/ns2:InitgPty/ns2:Id/ns2:OrgId/ns2:TaxIdNb"/>
<xsl:text>,</xsl:text>
<!-- data from pmt -->
<xsl:variable name="pmt" select="ns1:Body/ns1:MessageParts/ns2:Document/ns2:pain.001.001.02/ns2:PmtInf" />
<xsl:value-of select="$pmt/ns2:PmtMtd"/>
<xsl:text>,</xsl:text>
<xsl:value-of select="$pmt/ns2:Dbtr/ns2:Nm"/>
<xsl:text>,</xsl:text>
<!-- CONTINUE HERE -->
</xsl:template>
</xsl:stylesheet>
Note that this assumes there is only one record in the input XML and therefore only one row in the output CSV. Your XML is structured in a way that allows multiple nodes of the same kind at various levels of the hierarchy. If you want to reflect this in your CSV, you need to decide which node will represent a record and adjust the stylesheet so that it creates a separate row for each instance of such node - see an example here: https://stackoverflow.com/a/55311500/3016153

Failing XPath on xsl:variable with XML fragment

I must be missing something very basic. I want to look up a key from a transformed XML document in XML fragment stored in xsl:variable. Here's a minimal example:
XML:
<?xml version="1.0" encoding="UTF-8"?>
<code>A</code>
XSL:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
<xsl:variable name="mappings">
<mapping key="A">Amy</mapping>
</xsl:variable>
<xsl:template match="code">
<xsl:value-of select="$mappings/mapping[@key = text()]"/>
</xsl:template>
</xsl:stylesheet>
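Transforming this XML document with the XSL stylesheet produces an empty result. It seems that the comparison @key = text() is wrong, because when I use <xsl:value-of select="$mappings/mapping[@key = 'A']"/>, it retrieves the expected value (i.e. "Amy"). What am I missing?
Use current(). Inside the predicate, text() is evaluated relative to the mapping element being tested (whose text is "Amy", not "A"), whereas current() returns the code element that the template matched: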
<xsl:template match="code">
<xsl:value-of select="$mappings/mapping[@key = current()]"/>
</xsl:template>
With an intermediate variable, it also works:
<xsl:template match="code">
<xsl:variable name="keyval" select="./text()" />
<out>
<xsl:value-of select="$mappings/mapping[@key = $keyval]"/>
</out>
</xsl:template>

How do I use the msxsl:node-set to get a node set that I can use in a template parameter?

TL;DR: Why can't I use the element names in an XPath expression against a msxsl:node-set? It always returns nothing, as if the node-set were empty, even though debugging shows that it is not.
Details: I need to use a node-set in an XSLT 1.0 document because my source XML is missing an important node. Instead of having to rewrite the entire XSLT, I'd like to instead inject a node-set so that my XSLT processing can continue as normal. I would like to use XPATH on the node-set but I am not able to use the actual element names, instead only a * works, but I am not sure why, or how I can access the actual element names in the XPATH.
Here is my XML (example only, the XML document here is the least important, see XSLT):
<?xml version="1.0" encoding="utf-8" ?>
<?xml-stylesheet type="text/xsl" href="generic.xslt" ?>
<ParentNode xmlns:i="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" i:noNamespaceSchemaLocation="generic.xsd">
<SomeChildNode>text</SomeChildNode>
</ParentNode>
Here is my XSLT:
<?xml version="1.0" encoding="utf-16"?>
<xsl:stylesheet version="1.0" xmlns="http://schemas.datacontract.org/2004/07/MeM.BizEntities.Integration.DataFeedV2" xmlns:msxsl="urn:schemas-microsoft-com:xslt" xmlns:a="http://schemas.datacontract.org/2004/07/MeM.BizEntities" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:i="http://www.w3.org/2001/XMLSchema-instance">
<xsl:output method="xml" indent="yes" encoding="utf-16" omit-xml-declaration="no" />
<!-- Global Variables, used in multiple places -->
<xsl:variable name="empty"/>
<!-- Match Templates -->
<xsl:template match="ParentNode">
<ArrayOfSalesOrder>
<xsl:for-each select="SomeChildNode">
<xsl:call-template name="SomeChildNodeTemplate">
<xsl:with-param name="order" select="."/>
</xsl:call-template>
</xsl:for-each>
</ArrayOfSalesOrder>
</xsl:template>
<xsl:template name="SomeChildNodeTemplate">
<xsl:variable name="someRTF">
<Items>
<Item>
<Code>code</Code>
<Price>75</Price>
<Quantity>1</Quantity>
</Item>
<Item>
<Code>code2</Code>
<Price>100</Price>
<Quantity>3</Quantity>
</Item>
</Items>
</xsl:variable>
<xsl:call-template name="ItemsTemplate">
<xsl:with-param name="items" select="msxsl:node-set($someRTF)"/>
</xsl:call-template>
</xsl:template>
<xsl:template name="ItemsTemplate">
<xsl:param name="items"/>
<ItemsTransformed>
<xsl:for-each select="$items/Item">
<NewItem>
<NewCode>
<xsl:value-of select="Code"/>
</NewCode>
</NewItem>
</xsl:for-each>
</ItemsTransformed>
<ItemsTransformedThatWorksButNotHowIWant>
<xsl:for-each select="$items/*/*">
<NewItem>
<NewCode>
<xsl:value-of select="*[1]"/>
</NewCode>
<NewPrice>
<xsl:value-of select="*[2]"/>
</NewPrice>
<NewQuantity>
<xsl:value-of select="*[3]"/>
</NewQuantity>
</NewItem>
</xsl:for-each>
</ItemsTransformedThatWorksButNotHowIWant>
</xsl:template>
</xsl:stylesheet>
I would expect to be able to use XPATH to query into the node-set such that I can use their proper element names. This doesn't seem to be the case, and I'm struggling to understand why. I know there can be namespacing issues, but trying *:Item etc. doesn't work for me. I am able to use *[local-name()='Item'] but this seems like a horrible work around, not to mention that I'll have to rewrite any downstream templates and that is what I'm trying to avoid by using the node-set in the first place.
Result:
<?xml version="1.0" encoding="utf-16"?>
<ArrayOfSalesOrder xmlns="http://schemas.datacontract.org/2004/07/MeM.BizEntities.Integration.DataFeedV2" xmlns:msxsl="urn:schemas-microsoft-com:xslt" xmlns:a="http://schemas.datacontract.org/2004/07/MeM.BizEntities" xmlns:i="http://www.w3.org/2001/XMLSchema-instance">
<ItemsTransformed />
<ItemsTransformedThatWorksButNotHowIWant>
<NewItem>
<NewCode>code</NewCode>
<NewPrice>75</NewPrice>
<NewQuantity>1</NewQuantity>
</NewItem>
<NewItem>
<NewCode>code2</NewCode>
<NewPrice>100</NewPrice>
<NewQuantity>3</NewQuantity>
</NewItem>
</ItemsTransformedThatWorksButNotHowIWant>
</ArrayOfSalesOrder>
As you can see, I can get it to work with * but this is not very usable on a more complex structure. What am I doing wrong? Does this have to do with namespaces?
I would expect to see something under the <ItemsTransformed /> node, but instead it is just empty, and so far I can't get anything except the * to work.
The SO question below is what I was using, I thought I had an answer there, but I can't get the XPATH to work.
Reference:
XSLT 1.0 - Create node set and pass as a parameter
The problem here is that your stylesheet has a default namespace:
xmlns="http://schemas.datacontract.org/2004/07/MeM.BizEntities.Integration.DataFeedV2"
Therefore, when you do:
<xsl:variable name="someRTF">
<Items>
<Item>
<Code>code</Code>
<Price>75</Price>
<Quantity>1</Quantity>
</Item>
<Item>
<Code>code2</Code>
<Price>100</Price>
<Quantity>3</Quantity>
</Item>
</Items>
</xsl:variable>
you are populating your variable with elements in the default namespace, so the variable actually contains:
<Items xmlns="http://schemas.datacontract.org/2004/07/MeM.BizEntities.Integration.DataFeedV2">
<Item>
<Code>code</Code>
<Price>75</Price>
<Quantity>1</Quantity>
</Item>
<Item>
<Code>code2</Code>
<Price>100</Price>
<Quantity>3</Quantity>
</Item>
</Items>
Naturally, when you try later to select something like:
<xsl:for-each select="xyz:node-set($someRTF)/Items/Item">
you select nothing, because Items and Item are in the stylesheet's default namespace, while an unprefixed name in an XPath 1.0 expression always refers to an element in no namespace.
--- edit: ---
The problem can be easily solved by making sure that the root element of the variable - and by extension, all its descendants - are in no namespace.
Here's a simplified example (will run with any input):
XSLT 1.0
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns="http://schemas.datacontract.org/2004/07/MeM.BizEntities.Integration.DataFeedV2"
xmlns:exsl="http://exslt.org/common"
exclude-result-prefixes="exsl">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
<xsl:variable name="someRTF">
<Items xmlns="">
<Item>
<Code>code</Code>
<Price>75</Price>
<Quantity>1</Quantity>
</Item>
<Item>
<Code>code2</Code>
<Price>100</Price>
<Quantity>3</Quantity>
</Item>
</Items>
</xsl:variable>
<xsl:template match="/">
<ArrayOfSalesOrder>
<ItemsTransformed>
<xsl:for-each select="exsl:node-set($someRTF)/Items/Item">
<NewItem>
<NewCode>
<xsl:value-of select="Code"/>
</NewCode>
</NewItem>
</xsl:for-each>
</ItemsTransformed>
</ArrayOfSalesOrder>
</xsl:template>
</xsl:stylesheet>
Result:
<?xml version="1.0" encoding="UTF-8"?>
<ArrayOfSalesOrder xmlns="http://schemas.datacontract.org/2004/07/MeM.BizEntities.Integration.DataFeedV2">
<ItemsTransformed>
<NewItem>
<NewCode>code</NewCode>
</NewItem>
<NewItem>
<NewCode>code2</NewCode>
</NewItem>
</ItemsTransformed>
</ArrayOfSalesOrder>

XPath ignore span

I have HTML that contains tags like these:
<div id="SNT">text1</div>
<div id="SNT">text2</div>
<div id="SNT">textbase1<span style='color: #EFFFFF'>text3</span></div>
<div id="SNT">textbase2<span style='color: #EFFFFF'>text4</span></div>
How can I get all the text contained in all the <div> tags using XPath, ignoring the <span> tags?
i.e.:
text1
text2
textbase1text3
textbase2text4
This cannot be specified with a single XPath 1.0 expression.
You need to first select all relevant div elements:
//div[@id='SNT']
then, for each selected node, get its string value:
string(.)
In XPath 2.0 this can be specified with a single expression:
//div[@id='SNT']/string(.)
XSLT-based verification:
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text"/>
<xsl:template match="div[@id='SNT']">
<xsl:copy-of select="string()"/>
========
</xsl:template>
</xsl:stylesheet>
When this XSLT 1.0 transformation is applied on the following XML document (the provided XML fragment, wrapped into a single top element):
<t>
<div id="SNT">text1</div>
<div id="SNT">text2</div>
<div id="SNT">textbase1<span style='color: #EFFFFF'>text3</span></div>
<div id="SNT">textbase2<span style='color: #EFFFFF'>text4</span></div>
</t>
the relevant div elements are selected (matched) and processed by the only specified template, in which the string(.) XPath expression is evaluated and its result is copied to the output:
text1
========
text2
========
textbase1text3
========
textbase2text4
========
And for the XPath 2.0 expression:
<xsl:stylesheet version="2.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text"/>
<xsl:template match="/">
<xsl:copy-of select="//div[@id='SNT']/string(.)"/>
</xsl:template>
</xsl:stylesheet>
When this XSLT 2.0 transformation is applied on the same XML document (above), the XPath 2.0 expression is evaluated and the result (four strings) is copied to the output:
text1 text2 textbase1text3 textbase2text4
You could simply use:
//div/text()
or
div/text()
Hope this helps.
See the lxml.etree tutorial, section "Using XPath to find text".
For example:
from lxml import etree
html = """
<span class='demo'>
Hi,
<span>Tom</span>
</span>
"""
tree = etree.HTML(html)
node = tree.xpath('//span[@class="demo"]')[0]
print(node.xpath('string()'))
If there is no other content in the HTML files, just those <div>s inside the usual HTML root elements, the following stylesheet will be sufficient to extract the text:
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text"/>
</xsl:stylesheet>
If you only want the <div>s, and only those with that particular id, use the following code; it also reproduces the line breaks from your example:
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text"/>
<xsl:template match="//div[@id='SNT']">
<xsl:copy-of select="node()|text()"/><xsl:text>
</xsl:text>
</xsl:template>
</xsl:stylesheet>
