I have exhausted google on this subject and I just can't seem to get it right..
I have the following XML payload returned from Savon:
<?xml version='1.0' encoding='UTF-8'?>
<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/">
<soapenv:Body>
<ns:listGFUsersResponse xmlns:ns="http://ws.fds.com">
<ns:return>
<responseCode>0000</responseCode><responseDescription>No Errors-DWI</responseDescription><user><login>aa1283</login><name>Andrew Alonzo</name><team>DIALER</team><secLev>-1</secLev><maxDiscount>0.00</maxDiscount><phoneSystemId></phoneSystemId></user><user><login>aaronc</login><name>Aaron Callison</name><team></team><secLev>-1</secLev><maxDiscount>0.00</maxDiscount><phoneSystemId></phoneSystemId></user>
</ns:return>
</ns:listGFUsersResponse>
</soapenv:Body>
</soapenv:Envelope>
I would like to parse out ALL values of <name> * </name> and <login> * </login>
A few of my attempts here:
response1 = client1.call(
:list_gf_users,
message: message)
doc = Nokogiri::XML(response1.to_s)
pp doc
p doc.search('/name').text
p doc.search('/login').text
Nothing returned...
doc = Nokogiri::XML(response1.to_s)
value = doc.xpath('/name').map(&:text)
puts value
Nada....
doc = Nokogiri::XML(response1.to_s)
value = doc.xpath('/user[name]').map(&:text)
puts value
Zilch...
would love to be able to see:
name: Andrew Alonzo
login: aa1283
or even better a Hash?
{"aa1283" => "Andrew Alonzo"}
Getting 0 results such as:
""
[]
nil
Figured it out... probably not most efficient but gets the job done:
Convert Savon response to string(can't use scan on Savon output)
doc = response1.to_s
subFile = doc.gsub("<","<") #Replace the string convert characters
Run scan using regex capture groups:
#user = subFile.scan /<user><login>(.+?)<\/login><name>(.*?)<\/name>.+?><\/user>/
In your comments you have
doc = response1.doc
which gives you a Nokogiri document. With that you should be able to do the following:
doc.xpath("//user").each do |user|
login = user.at("login")&.text
name = user.at("name")&.text
puts "#{login}: #{name}"
end
The output is
aa1283: Andrew Alonzo
aaronc: Aaron Callison
I used the XML from your comment:
<root>
<responseCode>0000</responseCode>
<responseDescription>No Errors-DWI</responseDescription>
<user>
<login>aa1283</login>
<name>Andrew Alonzo</name>
<team>DIALER</team>
<secLev>-1</secLev>
<maxDiscount>0.00</maxDiscount>
<phoneSystemId></phoneSystemId>
</user>
<user>
<login>aaronc</login>
<name>Aaron Callison</name>
<team></team>
<secLev>-1</secLev>
<maxDiscount>0.00</maxDiscount>
<phoneSystemId></phoneSystemId>
</user>
</root>
Note that I had to convert this to plaintext. You have some non-printing unicode characters sprinkled throughout the document in seemingly random places (which makes me wonder if that's actually the cause of your problems).
Related
I need to parse and print ns4:feature part. Karate prints it in json format. I tried referring to this answer. But, i get 'ERROR: 'Namespace for prefix 'xsi' has not been declared.' error, if used suggested xPath. i.e.,
* def list = $Test1/Envelope/Body/getPlan/planSummary/feature[1]
This is my XML: It contains lot many parts with different 'ns' values, but i have given here an extraxt.
<S:Envelope xmlns:S="http://schemas.xmlsoap.org/soap/envelope/">
<S:Header/>
<S:Body>
<ns9:getPlan xmlns:ns10="http://xmlschema.test.com/xsd_v8" xmlns:ns9="http://xmlschema.test.com/srv/SMO_v4" xmlns:ns8="http://xmlschema.test.com/xsd/Customer_v2" xmlns:ns7="http://xmlschema.test.com/xsd/Customer/Customer_v4" xmlns:ns6="http://schemas.test.com/eca/common_types_2_1" xmlns:ns5="http://xmlschema.test.com/xsd/Customer/BaseTypes_1_0" xmlns:ns4="http://xmlschema.test.com/xsd_v4" xmlns:ns3="http://xmlschema.test.com/xsd/Enterprise/BaseTypes/types/ping_v1" xmlns:ns2="http://xmlschema.test.com/xsd/common/exceptions/Exceptions_v1_0">
<ns9:planSummary xsi:type="ns4:Plan" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<ns5:code>XPBSMWAT</ns5:code>
<ns5:description>Test Plan</ns5:description>
<ns4:category xsi:nil="true"/>
<ns4:effectiveDate>2009-11-05</ns4:effectiveDate>
<ns4:sharingGroupList>
<ns4:sharingCode>CAD_DATA</ns4:sharingCode>
<ns4:contributingInd>true</ns4:contributingInd>
</ns4:sharingGroupList>
<ns4:feature>
<ns5:code>ABC</ns5:code>
<ns5:description>Service</ns5:description>
<ns5:descriptionFrench>Service</ns5:descriptionFrench>
<ns4:poolGroupId xsi:nil="true"/>
<ns4:switchCode/>
<ns4:type/>
<ns4:dtInd>false</ns4:dtInd>
<ns4:usageCharge>0.0</ns4:usageCharge>
<ns4:connectInd>false</ns4:connectInd>
</ns4:feature>
</ns9:planSummary>
</ns9:getPlan>
</S:Body>
</S:Envelope>
This is the xPath i used;
Note: I saved above xml in a separate file test1.xml. I am just reading it and parsing the value.
* def Test1 = read('classpath:PP1/data/test1.xml')
* def list = $Test1/Envelope/Body/*[local-name()='getPlan']/*[local-name()='planSummary']/*[local-name()='feature']/*
* print list
This is the response i am getting;
16:20:10.729 [ForkJoinPool-1-worker-1] INFO com.intuit.karate - [print] [
"ABC",
"Service",
"Service",
"",
"",
"",
"false",
"0.0",
"false"
]
How can i get the same in XML?
This is interesting, I haven't seen this before. The problem was you have an attribute with a namespace xsi:nil="true" which is causing problems when you take a sub-set of the XML but the namespace is not defined anymore. If you remove it first, things will work.
Try this:
* remove Test1 //poolGroupId/#nil
* def temp = $Test1/Envelope/Body/getPlan/planSummary/feature
Another approach you could have tried is to do a string replace to remove troublesome stuff in the XML before doing XPath.
EDIT: added info on how to do a string replace using Java. The below will strip out the entire xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:type="ns4:Plan" part.
* string temp = Test1
* string temp = temp.replaceAll("xmlns:xsi[^>]*", "")
* print temp
So you get the idea. Just use regex.
Also see: https://stackoverflow.com/a/50372295/143475
I try to parse savon's response as nokokiri document
c = Savon.client(wsdl: 'http://test.fedresurs.ru/MessageService/WebService.svc?wsdl', digest_auth: ['demowebuser', 'Ax!761BN'], namespace: "http://tempuri.org/", namespace_identifier: :tem, log: true)
r = c.call(:get_trade_messages, message: {'tem:startFrom' => DateTime.now-1})
r.doc.search("TradePlace")
and it returns an empty array.
What I'm doing wrong? May be I should deal somehow with namespaces? But, how?. Examples, that I found in nokogiri documentation use Xpath, not search. And even with Xpath it returns an empty array.
XML-response:
<s:Envelope xmlns:s="http://schemas.xmlsoap.org/soap/envelope/">
<s:Body xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<GetTradeMessagesResponse xmlns="http://tempuri.org/">
<GetTradeMessagesResult>
<TradePlace INN="7606055642" Name="Первая электронная площадка " Site="1torgi.ru " OwnerName="ООО "Промтех"">
<TradeList>
<Trade ID_External="ЗКОФЦП-17136" ID_EFRSB="653476">
<MessageList>
<TradeMessage ID="4851134"/>
<TradeMessage ID="4851135"/>
</MessageList>
</Trade>
</TradeList>
</TradePlace>
</GetTradeMessagesResult>
</GetTradeMessagesResponse>
</s:Body>
</s:Envelope>
You can use Nokogiri to break apart the XML response.
A (now nofunctional) example is this:
doc = Nokogiri::XML(response.to_hash[:get_quote_response][:get_quote_result])
print doc.to_xml(indent: 2)
print "Date : ", doc.at_css("Date").text, "\n"
print "Last price: ", doc.at_css("Last").text
are more complete example in my pastebin https://pastebin.com/W0RUuaHU. The WebserviceX was unfortunately discontinued.
As I expected the answer was in namespace, code below works fine:
r.doc.search("a|TradePlace", {"a" => "http://tempuri.org/"})
I have an XML, that as I understand it has already been parsed by tags. My goal is to parse all the information that is in the <GetResidentsContactInfoResult> tag. In this tag of the sample xml below there are two records in here which begin each with the Lease PropertyId key. How can I iterate over the <GetResidentsContactInfoResult> tag and print out the key/value pairs for each record? I'm new to Ruby and working with XML files, is this something I can do with Nokogiri?
<?xml version="1.0" encoding="UTF-8"?>
<soap:Envelope xmlns:soap="http://www.w3.org/2003/05/soap-envelope" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<soap:Body>
<GetResidentsContactInfoResponse xmlns="http://tempuri.org/">
<GetResidentsContactInfoResult><PropertyResidents><Lease PropertyId="21M" BldgID="00" UnitID="0903" ResiID="3" occustatuscode="P" occustatuscodedescription="Previous" MoveInDate="2016-01-07T00:00:00" MoveOutDate="2016-02-06T00:00:00" LeaseBeginDate="2016-01-07T00:00:00" LeaseEndDate="2017-01-31T00:00:00" MktgSource="DBY" PrimaryEmail="noemail1#fake.com"><Occupant PropertyId="21M" BldgID="00" UnitID="0903" ResiID="3" OccuSeqNo="3444755" OccuFirstName="Efren" OccuLastName="Cerda" Phone2No="(832) 693-9448" ResponsibleFlag="Responsible" /></Lease><Lease PropertyId="21M" BldgID="00" UnitID="0908" ResiID="2" occustatuscode="P" occustatuscodedescription="Previous" MoveInDate="2016-02-20T00:00:00" MoveOutDate="2016-04-25T00:00:00" LeaseBeginDate="2016-02-20T00:00:00" LeaseEndDate="2017-02-28T00:00:00" MktgSource="PW" PrimaryEmail="noemail1#fake.com"><Occupant PropertyId="21M" BldgID="00" UnitID="0908" ResiID="2" OccuSeqNo="3451301" OccuFirstName="Donna" OccuLastName="Mclean" Phone2No="(713) 785-4240" ResponsibleFlag="Responsible" /></Lease></PropertyResidents></GetResidentsContactInfoResult>
</GetResidentsContactInfoResponse>
</soap:Body>
</soap:Envelope>
This uses Nokogiri to find all the GetResidentsContactInfoResponse elements, and then Active Support to convert the inner text to a hash of key-value pairs.
Read "sparklemotion/nokogiri" and "Tutorials" regarding installing and using Nokogiri.
Read "Active Support Core Extensions" about more capabilities of Active Support (though the guide does not include Hash.from_xml). To install it simply do gem install activesupport.
I assume you're fine with Nokogiri as you mentioned it in your question.
If you don't want to use Active Support, consider looking into "Convert a Nokogiri document to a Ruby Hash" as an alternative to the line Hash.from_xml(elm.text):
# Needed in order to use the `Hash.from_xml`
require 'active_support/core_ext/hash/conversions'
def find_key_values(str)
doc = Nokogiri::XML(str)
# Ignore namespaces for easier traversal
doc.remove_namespaces!
doc.css('GetResidentsContactInfoResponse').map do |elm|
Hash.from_xml(elm.text)
end
end
Usage:
# Option 1: if your XML above is stored in a variable called `string`
find_key_values string
# Option 2: if your XML above is stored in a file
find_key_values File.open('/path/to/file')
Which returns:
[{"PropertyResidents"=>
{"Lease"=>
[{"PropertyId"=>"21M",
"BldgID"=>"00",
"UnitID"=>"0903",
"ResiID"=>"3",
"occustatuscode"=>"P",
"occustatuscodedescription"=>"Previous",
"MoveInDate"=>"2016-01-07T00:00:00",
"MoveOutDate"=>"2016-02-06T00:00:00",
"LeaseBeginDate"=>"2016-01-07T00:00:00",
"LeaseEndDate"=>"2017-01-31T00:00:00",
"MktgSource"=>"DBY",
"PrimaryEmail"=>"noemail1#fake.com",
"Occupant"=>
{"PropertyId"=>"21M",
"BldgID"=>"00",
"UnitID"=>"0903",
"ResiID"=>"3",
"OccuSeqNo"=>"3444755",
"OccuFirstName"=>"Efren",
"OccuLastName"=>"Cerda",
"Phone2No"=>"(832) 693-9448",
"ResponsibleFlag"=>"Responsible"}},
{"PropertyId"=>"21M",
"BldgID"=>"00",
"UnitID"=>"0908",
"ResiID"=>"2",
"occustatuscode"=>"P",
"occustatuscodedescription"=>"Previous",
"MoveInDate"=>"2016-02-20T00:00:00",
"MoveOutDate"=>"2016-04-25T00:00:00",
"LeaseBeginDate"=>"2016-02-20T00:00:00",
"LeaseEndDate"=>"2017-02-28T00:00:00",
"MktgSource"=>"PW",
"PrimaryEmail"=>"noemail1#fake.com",
"Occupant"=>
{"PropertyId"=>"21M",
"BldgID"=>"00",
"UnitID"=>"0908",
"ResiID"=>"2",
"OccuSeqNo"=>"3451301",
"OccuFirstName"=>"Donna",
"OccuLastName"=>"Mclean",
"Phone2No"=>"(713) 785-4240",
"ResponsibleFlag"=>"Responsible"}}]}}]
I am extracting a value from XML and using that value to check if it exists in a PDF file:
XML I have is
<RealTimeLetter>
<Customer>
<RTLtr_Acct>0163426</RTLtr_Acct>
<RTLtr_CustomerName>LSIH JHTWVZ</RTLtr_CustomerName>
<RTLtr_CustomerAddress1>887 YPCLY THYZO SU</RTLtr_CustomerAddress1>
<RTLtr_CustomerAddress2 />
<RTLtr_CustomerCity>WOODSTOCK,</RTLtr_CustomerCity>
<RTLtr_CustomerState>GA</RTLtr_CustomerState>
<RTLtr_CustomerZip>30188</RTLtr_CustomerZip>
<RTLtr_ADAPreference>NONE</RTLtr_ADAPreference>
<RTLtr_Addressee>0</RTLtr_Addressee>
</Customer>
</RealTimeLetter>
The PDF file has the Customer Name and address
LSIH JHTWVZ
887 YPCLY THYZO SU
WOODSTOCK, GA 30188
I am using PDF Reader and Nokogiri gems to read the text from PDF, extract the Customer name from XML and perform a check if the PDF includes the Customer name in it.
PDF reader is parsed as
require 'pdf_reader'
def parse_pdf
PDF::Reader.new(#filename)
end
#reader = file('C:\Users\ecz560\Desktop\30004_Standard.pdf').parse_pdf
require 'nokogiri'
#xml = Nokogiri::XML(File.open('C:\Users\ecz560\Desktop\30004Standard.xml'))
#CustName = #xml.xpath("//Customer[RTLtr_Loancust='0163426']//RTLtr_CustomerName").map(&:text).to_s
page_index = 0
#reader.pages.each do |page|
page_index = page_index+1
if expect(page.text).to include #CustName
valid_text = "Given text is present in -- #{page_index}"
puts valid_text
end
end
But I am getting a error:
RSpec::Expectations::ExpectationNotMetError: expected "LSIH JHTWVZ\n 887 YPCLY THYZO SU\n WOODSTOCK, GA 30188\n Page 1 of 1" to include "[\"LSIH JHTWVZ\"]"
Diff:
## -1,2 +1,80 ##
-["LSIH JHTWVZ"]
+ LSIH JHTWVZ
+ 887 YPCLY THYZO SU
+ WOODSTOCK, GA 30188
./features/step_definitions/Letters/Test1_Letters.rb:372:in `block (2 levels) in <top (required)>'
./features/step_definitions/Letters/Test1_Letters.rb:370:in `each'
./features/step_definitions/Letters/Test1_Letters.rb:370:in `/^I validate the PDF content$/'
C:\Users\ecz560\Documents\GitHub\ATDD Local\features\FeatureFiles\Letters\Test1_Letters.feature:72:in `Then I validate the PDF content'
In understanding the issue is with the way I am comparing the #Custname.
How do I resolve this?
One thing I see is that your XPath selector isn't working.
//Customer[RTLtr_Loancust='0163426']//RTLtr_CustomerName
Testing it:
require 'nokogiri'
doc = Nokogiri::XML(<<EOT)
<RealTimeLetter>
<Customer>
<RTLtr_Acct>0163426</RTLtr_Acct>
<RTLtr_CustomerName>LSIH JHTWVZ</RTLtr_CustomerName>
<RTLtr_CustomerAddress1>887 YPCLY THYZO SU</RTLtr_CustomerAddress1>
<RTLtr_CustomerAddress2 />
<RTLtr_CustomerCity>WOODSTOCK,</RTLtr_CustomerCity>
<RTLtr_CustomerState>GA</RTLtr_CustomerState>
<RTLtr_CustomerZip>30188</RTLtr_CustomerZip>
<RTLtr_ADAPreference>NONE</RTLtr_ADAPreference>
<RTLtr_Addressee>0</RTLtr_Addressee>
</Customer>
</RealTimeLetter>
EOT
doc.search("//Customer[RTLtr_Loancust='0163426']//RTLtr_CustomerName").to_xml # => ""
Using a bit of a modification finds the <Customer> node:
doc.search('//Customer/RTLtr_Acct/text()[contains(., "0163426")]/../..').to_xml
# => "<Customer>\n <RTLtr_Acct>0163426</RTLtr_Acct>\n <RTLtr_CustomerName>LSIH JHTWVZ</RTLtr_CustomerName>\n <RTLtr_CustomerAddress1>887 YPCLY THYZO SU</RTLtr_CustomerAddress1>\n <RTLtr_CustomerAddress2/>\n <RTLtr_CustomerCity>WOODSTOCK,</RTLtr_CustomerCity>\n <RTLtr_CustomerState>GA</RTLtr_CustomerState>\n <RTLtr_CustomerZip>30188</RTLtr_CustomerZip>\n <RTLtr_ADAPreference>NONE</RTLtr_ADAPreference>\n <RTLtr_Addressee>0</RTLtr_Addressee>\n</Customer>"
At that point it's easy to grab content from elements inside <Customer>:
customer = doc.search('//Customer/RTLtr_Acct/text()[contains(., "0163426")]/../..')
customer.at('RTLtr_Acct').text # => "0163426"
customer.at('RTLtr_CustomerAddress1').text # => "887 YPCLY THYZO SU"
if expect(page.text).to include #CustName
expect is not used this way.
an expectation is used in testing, to verify that your code is working the way it should. It should not be used during normal code.
an expectation throws an exception and halts all code if it fails. It doesn't return true/false - you can't continue if it fails - it will throw the exception (correctly) as it did in your code, and then all your code will stop and won't start again.
What you probably want to do is just check the value like this:
if page.text.includes?(#CustName)
(Note: I have not bug-tested that... you will probably have to google for the correct way of writing it and write something similar that actually works.)
I am trying to down my last 3200 tweets in groups of 200(in multiple pages) using restclient gem.
In the process, I end up adding the following lines multiple times to my file:
</statuses>
<?xml version="1.0" encoding="UTF-8"?>
<statuses type="array">
To get this right(as the XML parsing goes for a toss), after downloading the file, I want to replace all occurrences of the above string except the first.
I am trying the following:
tweets_page = RestClient.get("#{GET_STATUSES_URL}&page=#{page_number}")
message = <<-MSG
</statuses>
<?xml version="1.0" encoding="UTF-8"?>
<statuses type="array">
MSG
unless page_number == 1
tweets_page.gsub!(message,"")
end
What is wrong in the above? Is there a better way to do the same?
I believe it would be faster to download the whole bunch at once and split the body of your response by message and add it for the first entry.
Something like this, can't try it out so consider this just as an idea.
tweets_page = RestClient.get("#{GET_STATUSES_URL}").body
tweets = tweets_page.split(message)
tweets_page = tweets[0]+message+tweets[1..-1]
You could easily break them up in groups of 200 like that also
If you want to do it with a gsub on the whole text you could use the following
tweets_page = <<-MSG
first
</statuses>
<?xml version="1.0" encoding="UTF-8"?>
<statuses type="array">
second
</statuses>
<?xml version="1.0" encoding="UTF-8"?>
<statuses type="array">
rest
MSG
message = <<-MSG
</statuses>
<?xml version="1.0" encoding="UTF-8"?>
<statuses type="array">
MSG
new_str = tweets_page.gsub message do |match|
if defined? #first
""
else
#first = true
message
end
end
p new_str
gives
type=\"array\">\nrest\n"
"first\n</statuses>\n<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<statuses type=\"array\">\nsecond\nrest\n"