I have questions about libxml-ruby.
There is an XML file, "sample.xml":
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<worksheet xmlns="http://***" xmlns:r="http://???">
<sheetData>
<row><v>1</v></row>
</sheetData>
</worksheet>
I want to work with nodes without specifying the default namespace, like this:
xml = XML::Document.file('sample.xml')
sheet_data = xml.find_first('sheetData')
Of course, I can do it like this:
NS = {
main: 'http://***',
r: 'http://???',
}
sheet_data = xml.find_first('main:sheetData', NS)
But I want to omit the default namespace prefix.
I tried some properties and methods of XML::Namespace and XML::Namespaces, but they had no effect.
There is one more problem, when I save the XML file:
ns = XML::Namespace.new(xml.root, 'main', 'http://***')
row = XML::Node.new('row', nil, ns)
sheet_data << row
xml.save("sample.xml")
The output looks like this:
<row><v>1</v></row>
<main:row/>
I want the "main:" prefix to be omitted.
So I do this, but it's really ugly:
open('sample.xml', 'wb') do |f|
f.write(xml.to_s.gsub(/(<\/?)main:/, '\1'))
end
Do you have any good idea?
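One way to make the post-processing workaround a little less blunt, sketched in plain Ruby string handling rather than any libxml-ruby API. The helper name strip_prefix is hypothetical, and it assumes double-quoted attribute values (which is how libxml2 writes them). Besides removing the prefix from start and end tags, it also drops the xmlns:main declaration that XML::Namespace.new presumably added to the root element, since nothing uses that prefix any more.

```ruby
# Hypothetical helper: remove a namespace prefix from serialized XML text.
# Plain string processing; assumes attribute values are double-quoted.
def strip_prefix(xml_string, prefix)
  escaped = Regexp.escape(prefix)
  # <main:row/> -> <row/> and </main:row> -> </row>
  without_tags = xml_string.gsub(/(<\/?)#{escaped}:/, '\1')
  # Drop the now-redundant xmlns:main="..." declaration
  without_tags.gsub(/\s+xmlns:#{escaped}="[^"]*"/, '')
end

serialized = '<worksheet xmlns="http://***" xmlns:main="http://***">' \
             '<sheetData><row><v>1</v></row><main:row/></sheetData></worksheet>'

puts strip_prefix(serialized, 'main')
```

With the document from the question, the write step would then become File.binwrite('sample.xml', strip_prefix(xml.to_s, 'main')). It is still text surgery on serialized XML, so the prefix would also be rewritten if it appeared right after a < inside a comment or CDATA section.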
I need to parse and print the ns4:feature part; Karate prints it in JSON format. I tried referring to this answer, but when I use the suggested XPath, i.e.
* def list = $Test1/Envelope/Body/getPlan/planSummary/feature[1]
I get the error 'Namespace for prefix 'xsi' has not been declared.'
This is my XML. It contains many more parts with different 'ns' values, but I have given only an extract here:
<S:Envelope xmlns:S="http://schemas.xmlsoap.org/soap/envelope/">
<S:Header/>
<S:Body>
<ns9:getPlan xmlns:ns10="http://xmlschema.test.com/xsd_v8" xmlns:ns9="http://xmlschema.test.com/srv/SMO_v4" xmlns:ns8="http://xmlschema.test.com/xsd/Customer_v2" xmlns:ns7="http://xmlschema.test.com/xsd/Customer/Customer_v4" xmlns:ns6="http://schemas.test.com/eca/common_types_2_1" xmlns:ns5="http://xmlschema.test.com/xsd/Customer/BaseTypes_1_0" xmlns:ns4="http://xmlschema.test.com/xsd_v4" xmlns:ns3="http://xmlschema.test.com/xsd/Enterprise/BaseTypes/types/ping_v1" xmlns:ns2="http://xmlschema.test.com/xsd/common/exceptions/Exceptions_v1_0">
<ns9:planSummary xsi:type="ns4:Plan" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<ns5:code>XPBSMWAT</ns5:code>
<ns5:description>Test Plan</ns5:description>
<ns4:category xsi:nil="true"/>
<ns4:effectiveDate>2009-11-05</ns4:effectiveDate>
<ns4:sharingGroupList>
<ns4:sharingCode>CAD_DATA</ns4:sharingCode>
<ns4:contributingInd>true</ns4:contributingInd>
</ns4:sharingGroupList>
<ns4:feature>
<ns5:code>ABC</ns5:code>
<ns5:description>Service</ns5:description>
<ns5:descriptionFrench>Service</ns5:descriptionFrench>
<ns4:poolGroupId xsi:nil="true"/>
<ns4:switchCode/>
<ns4:type/>
<ns4:dtInd>false</ns4:dtInd>
<ns4:usageCharge>0.0</ns4:usageCharge>
<ns4:connectInd>false</ns4:connectInd>
</ns4:feature>
</ns9:planSummary>
</ns9:getPlan>
</S:Body>
</S:Envelope>
This is the XPath I used.
Note: I saved the above XML in a separate file, test1.xml; I am just reading it and parsing the value.
* def Test1 = read('classpath:PP1/data/test1.xml')
* def list = $Test1/Envelope/Body/*[local-name()='getPlan']/*[local-name()='planSummary']/*[local-name()='feature']/*
* print list
This is the output I am getting:
16:20:10.729 [ForkJoinPool-1-worker-1] INFO com.intuit.karate - [print] [
"ABC",
"Service",
"Service",
"",
"",
"",
"false",
"0.0",
"false"
]
How can I get the same in XML?
This is interesting; I haven't seen this before. The problem is that you have a namespaced attribute, xsi:nil="true", while the xsi prefix is declared on the planSummary element. When you take a subset of the XML (the feature element), that declaration is no longer in scope, so the prefix is undefined. If you remove the attribute first, things will work.
Try this:
* remove Test1 //poolGroupId/#nil
* def temp = $Test1/Envelope/Body/getPlan/planSummary/feature
Another approach is to do a string replace that removes the troublesome parts of the XML before running the XPath.
EDIT: added info on how to do a string replace using Java. The code below strips out everything from xmlns:xsi up to the end of the start tag; in the XML above, that is the xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" declaration (the xsi:type="ns4:Plan" attribute comes before it and is left in place).
* string temp = Test1
* string temp = temp.replaceAll("xmlns:xsi[^>]*", "")
* print temp
So you get the idea: just use a regex.
Also see: https://stackoverflow.com/a/50372295/143475
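Before wiring a pattern like this into a feature file, it can help to try it on the actual start tag. The sketch below uses Ruby purely for illustration; the constructs involved (\s, \w, character classes, alternation) mean the same thing in the Java regex engine behind replaceAll, though string escaping differs. The second pattern is a hypothetical extension, not part of the answer: it removes the xsi declaration together with every xsi:-prefixed attribute (xsi:type, xsi:nil), so no undeclared xsi prefix is left behind. It assumes double-quoted attribute values.

```ruby
plan_tag = '<ns9:planSummary xsi:type="ns4:Plan" ' \
           'xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">'

# The answer's pattern: removes everything from "xmlns:xsi" up to the ">" of the tag.
# Here xsi:type comes before the declaration, so it survives.
answer_pattern = /xmlns:xsi[^>]*/
puts plan_tag.gsub(answer_pattern, '')

# Hypothetical broader pattern: drop the xsi declaration and every xsi:* attribute
# (xsi:type, xsi:nil, ...), assuming double-quoted attribute values.
xsi_pattern = /\s+xmlns:xsi="[^"]*"|\s+xsi:\w+="[^"]*"/
puts plan_tag.gsub(xsi_pattern, '')
puts '<ns4:category xsi:nil="true"/>'.gsub(xsi_pattern, '')
```

The first puts shows that, on this particular tag, the answer's pattern leaves xsi:type="ns4:Plan" behind, because that attribute precedes the declaration.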
I'm trying to read and write an XML file using the Boost functions read_xml and write_xml.
The original encoding of the XML file is "windows-1252", but after the read/write operations the encoding becomes "utf-8".
This is the original XML file:
<?xml version="1.0" encoding="windows-1252" standalone="no" ?>
<lot>
<name>Lot1</name>
<lot_id>123</lot_id>
<descr></descr>
<job>
<name>TEST</name>
<num_items>2</num_items>
<item>
<label>Item1</label>
<descr>Item First Test</descr>
</item>
<item>
<label>Item2</label>
<descr>Item Second Test</descr>
</item>
</job>
</lot>
And this is the output:
<?xml version="1.0" encoding="utf-8"?>
<lot>
<name>Lot1</name>
<lot_id>123</lot_id>
<descr></descr>
<job>
<name>TEST</name>
<num_items>2</num_items>
<item>
<label>Item1</label>
<descr>Item First Test</descr>
</item>
<item>
<label>Item2</label>
<descr>Item Second Test</descr>
</item>
</job>
</lot>
This is my C++ code (just test code):
#include <boost/property_tree/ptree.hpp>
#include <boost/property_tree/xml_parser.hpp>
#include <locale>
#include <string>
using boost::property_tree::ptree;
ptree xmlTree;
// FILE_XML: path of the XML file, defined elsewhere
read_xml(FILE_XML, xmlTree);
// Assumption: xmlTreeChild (not shown in the question) refers to the <job> node
ptree& xmlTreeChild = xmlTree.get_child("lot.job");
for (auto it = xmlTreeChild.begin(); it != xmlTreeChild.end();)
{
    // Remove every <item> whose <label> is not "item3"
    if (it->first == "item" && it->second.get_child("label").data() != "item3")
    {
        it = xmlTreeChild.erase(it);  // erase() returns the next element
    }
    else
    {
        ++it;  // advance only when nothing was erased
    }
}
auto settings = boost::property_tree::xml_writer_make_settings<std::string>('\t', 1);
write_xml(FILE_XML, xmlTree, std::locale(), settings);
I need to read and rewrite the file using the same encoding as the original file.
I've also tried changing the locale settings, using:
std::locale newlocale1("English_USA.1252");
read_xml(FILE_XML, xmlTree, 0, newlocale1);
...
auto settings = boost::property_tree::xml_writer_make_settings<std::string>('\t', 1);
write_xml(FILE_XML, xmlTree, newlocale1, settings);
but I get the same result.
How can I read and write the file in its original encoding with the Boost functions?
Thank you
You can pass an encoding via the writer settings:
auto settings = boost::property_tree::xml_writer_make_settings<std::string>(
'\t', 1, "windows-1252");
Of course, make sure the keys and values are in fact latin1/cp1252 compatible. This holds as long as all the information comes from the source file; however, you have to take care when, e.g., assigning user input to a property tree node, where you might need to convert from the input encoding to cp1252 first.
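The conversion step mentioned above does not depend on Boost. As an illustration only, in Ruby (the language of the other snippets on this page) rather than C++, here is one way to check whether a UTF-8 string fits into Windows-1252 and to convert it. The helper names and the choice of "?" as the replacement for unrepresentable characters are assumptions; in C++ you would use whatever conversion facility your project already relies on.

```ruby
# Hypothetical helper: true if every character can be represented in Windows-1252.
def cp1252_compatible?(value)
  value.encode('Windows-1252')
  true
rescue Encoding::UndefinedConversionError
  false
end

# Hypothetical helper: convert to Windows-1252, turning unrepresentable characters into "?".
def to_cp1252(value)
  value.encode('Windows-1252', undef: :replace, replace: '?')
end

label = "Caf\u00e9 \u20ac"           # e-acute and the euro sign
puts cp1252_compatible?(label)       # both characters exist in Windows-1252
p to_cp1252(label).bytes             # e-acute -> 0xE9, euro sign -> 0x80
puts cp1252_compatible?("\u65e5")    # a CJK character does not
```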
To fix the problem you are experiencing, replace this line:
read_xml(FILE_XML, xmlTree);
with
read_xml(FILE_XML,
xmlTree,
boost::property_tree::xml_parser::trim_whitespace);
As far as I know, your issue cannot be fixed only by modifying the settings of the write_xml function.
I tried it and it worked: when I compare the files ignoring whitespace, the input and output XML files are identical.
You can also write to a string stream as follows:
#include <boost/property_tree/ptree.hpp>
#include <boost/property_tree/xml_parser.hpp>
#include <sstream>
#include <string>
boost::property_tree::ptree pt;
std::ostringstream oss;
// Boost 1.56+ takes the string type here (as in the question's code), not the character type
write_xml(
    oss, pt,
    boost::property_tree::xml_writer_make_settings<std::string>(
        '\t', 0, "ASCII"));
Given the following XML:
<?xml version="1.0" encoding="UTF-8" ?>
<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<soapenv:Header>
<WFContext xmlns="http://service.wellsfargo.com/entity/message/2003/" soapenv:actor="" soapenv:mustUnderstand="0">
<messageId>cci-sf-dev14.wellsfargo.com:425a9286:14998ac6245:-7e1e</messageId>
<sessionId>425a9286:14998ac6245:-7e1d</sessionId>
<sessionSequenceNumber>1</sessionSequenceNumber>
<creationTimestamp>2014-11-10T00:14:49.243-08:00</creationTimestamp>
<invokerId>cci-sf-dev14.wellsfargo.com</invokerId>
<activitySourceId>P7</activitySourceId>
<activitySourceIdType>FNC</activitySourceIdType>
<hostName>cci-sf-dev14.wellsfargo.com</hostName>
<billingAU>05426</billingAU>
<originatorId>287586861901211</originatorId>
<originatorIdType>ECN</originatorIdType>
<initiatorId>GTST0793</initiatorId>
<initiatorIdType>ACF2</initiatorIdType>
</WFContext>
</soapenv:Header>
<soapenv:Body>
<getCustomerInformation xmlns="http://service.wellsfargo.com/provider/ecpr/customerProfile/inquiry/getCustomerInformation/2012/05/">
<initiatorInformation xmlns="http://service.wellsfargo.com/provider/ecpr/shared/common/2011/11/">
<channelInfo>
<initiatorCompanyNbr xmlns="http://service.wellsfargo.com/entity/message/2003/">114</initiatorCompanyNbr>
</channelInfo>
</initiatorInformation>
<custNbr xmlns="http://service.wellsfargo.com/entity/party/2003/">287586861901211</custNbr>
<customerViewList xmlns="http://service.wellsfargo.com/provider/ecpr/customerProfile/inquiry/getCustomerInformationCommon/2012/05/">
<customerView>
<customerViewType>GENERAL_INFORMATION_201205</customerViewType>
<preferences>
<generalInformationPreferences201205 xmlns="http://service.wellsfargo.com/provider/ecpr/customerProfile/inquiry/common/2012/05/">
<formattedNameIndicator xmlns="">true</formattedNameIndicator>
<includeTaxCertificationIndicator xmlns="">true</includeTaxCertificationIndicator>
</generalInformationPreferences201205>
</preferences>
</customerView>
<customerView>
<customerViewType>SEGMENT_LIST</customerViewType>
</customerView>
<customerView>
<customerViewType>LIMITED_PROFILE_REQUIRED_DATA</customerViewType>
</customerView>
<customerView>
<customerViewType>INDIVIDUAL_CUSTOMER_GENERAL_INFORMATION_201205</customerViewType>
<preferences>
<individualGeneralInformationPreferences xmlns="http://service.wellsfargo.com/provider/ecpr/customerProfile/inquiry/common/2012/05/">
<includeMinorIndicator xmlns="">true</includeMinorIndicator>
</individualGeneralInformationPreferences>
</preferences>
</customerView>
</customerViewList>
</getCustomerInformation>
</soapenv:Body>
</soapenv:Envelope>
I am trying to access the getCustomerInformation element using a relative XPath in VBScript.
XMLDataFile = "C:\testReqfile.xml"
Set xmlDoc = XMLUtil.CreateXML()
xmlDoc.LoadFile(XMLDataFile)
Print xmlDoc.ToString
'xmlDoc.AddNamespace "ns","xmlns:soapenv=http://schemas.xmlsoap.org/soap/envelope/"
Set childrenObj = xmlDoc.ChildElementsByPath("//*[contains(@xmlns,'getCustomerInformation')]")
msgbox childrenObj.Count
But it fails to return a node.
Your XPath expression does not work because xmlns, as in
<getCustomerInformation xmlns="http://service.wellsfargo.com/provider/ecpr/customerProfile/inquiry/getCustomerInformation/2012/05/">
is a default namespace declaration, not an attribute. Therefore, it cannot be accessed with @xmlns.
But it seems you do not have to rely on the namespace at all, because the element name ("getCustomerInformation") is already telling. To get around those elements being in a namespace, use local-name() to select elements by their name.
Set childrenObj = xmlDoc.ChildElementsByPath("//*[local-name() = 'getCustomerInformation']")
As @Mathias Müller already explained in his answer, xmlns declares a namespace and thus cannot be accessed like a regular attribute. I don't have experience with XMLUtil, but in plain VBScript you could select the node(s) like this:
Set xml = CreateObject("Msxml2.DOMDocument.6.0")
xml.async = False
xml.load "C:\path\to\your.xml"
If xml.ParseError Then
WScript.Echo xml.ParseError.Reason
WScript.Quit 1
End If
'define a namespace alias "ns"
uri = "http://service.wellsfargo.com/provider/ecpr/customerProfile/inquiry/getCustomerInformation/2012/05/"
xml.setProperty "SelectionNamespaces", "xmlns:ns='" & uri & "'"
'select nodes using the namespace alias
Set nodes = xml.SelectNodes("//ns:getCustomerInformation")
I'm parsing a pptx file and ran into an issue. This is a sample of the source XML:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<p:presentation xmlns:a="http://schemas.openxmlformats.org/drawingml/2006/main" xmlns:p="http://schemas.openxmlformats.org/presentationml/2006/main" xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships">
<p:sldMasterIdLst>
<p:sldMasterId id="2147483648" r:id="rId2"/>
</p:sldMasterIdLst>
<p:sldIdLst>
<p:sldId id="256" r:id="rId3"/>
</p:sldIdLst>
<p:sldSz cx="10080625" cy="7559675"/>
<p:notesSz cx="7772400" cy="10058400"/>
</p:presentation>
I need to get the r:id attribute value in the sldMasterId tag.
doc = Nokogiri::XML(path_to_pptx)
doc.xpath('p:presentation/p:sldMasterIdLst/p:sldMasterId').attr('id').value
returns 2147483648 but I need rId2, which is the r:id attribute value.
I found the attribute_with_ns(name, namespace) method, but
doc.xpath('p:presentation/p:sldMasterIdLst/p:sldMasterId').attribute_with_ns('id', 'r')
returns nil.
You can reference the namespace of attributes in your XPath the same way you reference element namespaces:
doc.xpath('p:presentation/p:sldMasterIdLst/p:sldMasterId/@r:id')
If you want to use attribute_with_ns, you need to use the actual namespace, not just the prefix:
doc.at_xpath('p:presentation/p:sldMasterIdLst/p:sldMasterId')
.attribute_with_ns('id', "http://schemas.openxmlformats.org/officeDocument/2006/relationships")
http://nokogiri.org/Nokogiri/XML/Node.html#method-i-attributes
If you need to distinguish attributes that have the same name but different namespaces, use attribute_nodes instead.
doc.xpath('p:presentation/p:sldMasterIdLst/p:sldMasterId').each do |element|
  # each, not select: we only want to print the matching attributes
  element.attribute_nodes.each do |node|
    puts node if node.namespace && node.namespace.prefix == "r"
  end
end
I am trying to download my last 3200 tweets in groups of 200 (across multiple pages) using the restclient gem.
In the process, I end up adding the following lines multiple times to my file:
</statuses>
<?xml version="1.0" encoding="UTF-8"?>
<statuses type="array">
To get this right (otherwise the XML parsing fails), after downloading the file I want to remove all occurrences of the above string except the first.
I am trying the following:
tweets_page = RestClient.get("#{GET_STATUSES_URL}&page=#{page_number}")
message = <<-MSG
</statuses>
<?xml version="1.0" encoding="UTF-8"?>
<statuses type="array">
MSG
unless page_number == 1
tweets_page.gsub!(message,"")
end
What is wrong in the above? Is there a better way to do the same?
I believe it would be faster to download everything at once, split the response body on message, and put message back only after the first part.
Something like this; I can't try it out, so consider it just an idea.
tweets_page = RestClient.get("#{GET_STATUSES_URL}").body
# message is the separator string from the question
tweets = tweets_page.split(message)
# tweets[1..-1] is an Array, so join it back into a String before concatenating
tweets_page = tweets[0] + message + tweets[1..-1].join
You could also easily break them up into groups of 200 that way.
If you want to do it with a gsub on the whole text, you could use the following:
tweets_page = <<-MSG
first
</statuses>
<?xml version="1.0" encoding="UTF-8"?>
<statuses type="array">
second
</statuses>
<?xml version="1.0" encoding="UTF-8"?>
<statuses type="array">
rest
MSG
message = <<-MSG
</statuses>
<?xml version="1.0" encoding="UTF-8"?>
<statuses type="array">
MSG
new_str = tweets_page.gsub message do |match|
if defined? @first
""
else
@first = true
message
end
end
p new_str
gives
"first\n</statuses>\n<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<statuses type=\"array\">\nsecond\nrest\n"
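The block version above keeps its state in @first, an instance variable of the top-level main object, so it is still set the next time the same code runs and every occurrence would then be stripped. A stateless alternative, sketched with a hypothetical helper keep_first_only: locate the first occurrence with String#index and run gsub only on the text after it.

```ruby
# Hypothetical helper: keep the first occurrence of separator, remove all later ones.
def keep_first_only(text, separator)
  first = text.index(separator)
  return text if first.nil?

  cut = first + separator.length
  # String#gsub with a String pattern matches it literally (no regex escaping needed)
  text[0...cut] + text[cut..-1].gsub(separator, '')
end

message = <<-MSG
</statuses>
<?xml version="1.0" encoding="UTF-8"?>
<statuses type="array">
MSG

tweets_page = "first\n#{message}second\n#{message}rest\n"
p keep_first_only(tweets_page, message)
```

For the sample string from the answer, this returns the same result as the block version.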