REXML parsing an XML in Ruby

Folks,
I am using REXML to parse a sample XML file:
<Accounts title="This is the test title">
  <Account name="frenchcustomer">
    <username name="frencu"/>
    <password pw="hello34"/>
    <accountdn dn="https://frenchcu.com/"/>
    <exporttest name="basic">
      <exportname name="basicexport"/>
      <exportterm term="oldschool"/>
    </exporttest>
  </Account>
  <Account name="britishcustomer">
    <username name="britishcu"/>
    <password pw="mellow34"/>
    <accountdn dn="https://britishcu.com/"/>
    <exporttest name="existingsearch">
      <exportname name="largexpo"/>
      <exportterm term="greatschool"/>
    </exporttest>
  </Account>
</Accounts>
I am reading the XML like this:
@data = (REXML::Document.new file).root
@dataarr = @@testdata.elements.to_a("//Account")
Now I want to get the username of the frenchcustomer, so I tried this:
@dataarr[@name=fenchcustomer].elements["username"].attributes["name"]
this fails, I do not want to use the array index, for example
@dataarr[1].elements["username"].attributes["name"]
will work, but I don't want to do that. Is there something that I'm missing here? I want to use the array and get the username of the French user via the Account name.
Thanks a lot.

I recommend using XPath.
For the first match, you can use the first method; for an array of all matches, use match.
The code below returns the username for the Account "frenchcustomer":
REXML::XPath.first(yourREXMLDocument, "//Account[@name='frenchcustomer']/username/@name").value
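And a minimal sketch of the array variant with match, based on the sample XML above:
usernames = REXML::XPath.match(yourREXMLDocument, "//Account/username/@name").map(&:value)
# => ["frencu", "britishcu"]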
If you really want to use the array created with @@testdata.elements.to_a("//Account"), you could use the find method:
french_cust_elt = the_array.find { |elt| elt.attributes['name'].eql?('frenchcustomer') }
french_username = french_cust_elt.elements["username"].attributes["name"]

puts @data.elements["//Account[@name='frenchcustomer']"]
          .elements["username"]
          .attributes["name"]
If you want to iterate over multiple identical names:
@data.elements.each("//Account[@name='frenchcustomer']") do |fc|
  puts fc.elements["username"].attributes["name"]
end

I don't know what your @@testdata is; I tried with the following test code:
require "rexml/document"
@data = (REXML::Document.new DATA).root
@dataarr = @data.elements.to_a("//Account")

# Works
p @dataarr[1].elements["username"].attributes["name"]
# Does not work
#~ p @dataarr[@name='fenchcustomer'].elements["username"].attributes["name"]
# @dataarr is an array
puts "===Array#each"
@dataarr.each{|acc|
  next unless acc.attributes['name'] == 'frenchcustomer'
  p acc.elements["username"].attributes["name"]
}
puts "===XPATH"
#data.elements.to_a("//Account[#name='frenchcustomer']").each{|acc|
p acc.elements["username"].attributes["name"]
}
__END__
<Accounts title="This is the test title">
  <Account name="frenchcustomer">
    <username name="frencu"/>
    <password pw="hello34"/>
    <accountdn dn="https://frenchcu.com/"/>
    <exporttest name="basic">
      <exportname name="basicexport"/>
      <exportterm term="oldschool"/>
    </exporttest>
  </Account>
  <Account name="britishcustomer">
    <username name="britishcu"/>
    <password pw="mellow34"/>
    <accountdn dn="https://britishcu.com/"/>
    <exporttest name="existingsearch">
      <exportname name="largexpo"/>
      <exportterm term="greatschool"/>
    </exporttest>
  </Account>
</Accounts>
I'm not very familiar with REXML, so I expect there is a better solution. But perhaps somebody can take my code and build a better one from it.

Related

For loop while using scrapy

I am trying to crawl a website for several different dates, so I am storing the dates in a list. But when accessing the items of the list, the crawler works only for the first value in the list. Please help. Following is my code:
import scrapy


class SpidyQuotesViewStateSpider(scrapy.Spider):
    name = 'retail_price'

    def start_requests(self):
        print "start request"
        urls = "http://fcainfoweb.nic.in/PMSver2/Reports/Report_Menu_web.aspx"
        yield scrapy.Request(url=urls, callback=self.parse)

    def parse(self, response):
        dated = ["05/03/2017", "04/03/2017"]
        urls = "http://fcainfoweb.nic.in/PMSver2/Reports/Report_Menu_web.aspx"
        #frmdata =
        cookies1 = {}
        val = response.headers.getlist('Set-Cookie')
        print "login session values", response.headers.getlist('Set-Cookie')
        if len(val) != 0:
            cookies1['ASP.NET_SessionId'] = str(response.headers.getlist('Set-Cookie')[0].split(";")[0].split("=")[1])
            cookies1['path'] = str(response.headers.getlist('Set-Cookie')[0].split(";")[1].split("=")[1])
        print cookies1
        for i in range(len(dated)):
            yield scrapy.FormRequest(url=urls, callback=self.parse1, formdata={'ctl00$MainContent$btn_getdata1':"Get Data",
'ctl00$MainContent$Txt_FrmDate':dated[i],
'ctl00$MainContent$Ddl_Rpt_Option0':"Daily Prices",
'ctl00$MainContent$Rbl_Rpt_type':"Price report",
'ctl00$MainContent$ddl_Language':"English",
'ctl00$MainContent$Ddl_Rpt_type':"Retail",
'__EVENTVALIDATION':"IwZyKgfTXVzxiHxiPXGk/W8XQZBDb0EOPxJh6s8hofq0ffqOpiHSH77CafcxySF3PbkYgSMNFCJhLM2cGnL6SxT0PJuGDCJtV0V8Y4a94UErUCiSANiin+4uKckk9v9Ux8JqTVeaipppmlH+wyks2U9SgPfkNUsqw4eHCkDyB5akNNZImRIixOHHVY3JSXGkwXn7ueK9w+AgnqJzpXaWdMr9J1++M4VAFImSNF8brFSfPHe5kb/qzkGIwUr/KRouaRYK8WLWZh/Mbl9xwREwhDSxWJSOdihSE0WWoaqSMtpaR99rDDCsD3mdJqfu0aPIlREupTZRzlrmztXU0eS3949YW+ywdTRvykaMNgOW2Q4saYP5j/niKbRW6GiDnaLV2A38X/HW80+trrsjwJr9tjTKVFyikf6s/3gzyiTp11ivSkwIY2b3hutjYn7OfTDo",
#'__EVENTVALIDATION':"HqVo2xHk04clYwnBposXbZGhbIr181A7RbyeZv74Cia7rXSKmpOpbeSnn3XXnoDJKRxMK0W9nxKZFfkNje+P/K7gE5HVjHJr9Gr0Gs46TntzKDsvzyii8jZ7e0fdZgQCJKoXxQNgR2vNkWqChKcEldBuMHCOgJRqCNCF/JPFKpdKZoIWr7GU8rhzwLijf/Gkm+FuTULs/fl2HHK6Z1QQEozzEHFsDwzl0G4IiN//eNYfHuUBXKZ3wdZzPqG0s53WHEuSBzhqBC9AtCJOs4ZZhdtwFh8iyTJ4PlsLP9DLHYHRCOAd72UO0UH8gT7gAkKVo1I4L540DilowOR9SttH7MM/oOs9qhKlnG61FgqkYGW8zGzF/yNEXO+beVAK1RVvuO+FDnuq/g36TRnUieei5GpAZ+96CSoCIxykdvHx8R+smTNF/5erlowV4ci+tcI7",
'__VIEWSTATEENCRYPTED':"",
'__VIEWSTATEGENERATOR':"85862B00",
'__VIEWSTATE':"+a+3jrBEKxDdkPOzx2wXwKaTMWvCB60WPaHRfJUAZQrdFIpxSqFr5VseTclpGzeHXdxaFnxJe/PkxDKYa7sj3Wiv/os1bNeX0IEB3s45eFsHYWGiU8cvsXCGa5z7rrGRDL5hotg7k/MuUWj8w27xXZO423MN5OsHS+wh+tC/5/Xix+w3zxuQhi8jR5DnreimHbhGZn1sYaKYIGCc8mDIDRNl+w1OZ058F+3LAx96QUu5BYiMYOmrlyxrb9b2yPTmmIrI4NtC4ClBQlxuST5wMDP3vUqqWMhn4auk8ev5gHyPestCRrsAXWs07wDNnikemMwo/4wPiTEbnZQV6SLcDUw0gZpXjXwLI7mhsVjEyVNaQnJp6+Wi6FLsAEEMlFYmQut3JecpVIUkjF9uYSN2GLIbXHPs37AiEXPeQ8E/GyBMx3z1X5l8sw/xSNmFgYQC3riajn8V0+SdkuV2PbNbYKtc+uoSCNLppLYCqiOv5eWanGvAQro2Q67FBA4w2xY+V/K8mzHaGMLoDBxJxLslWyJpL5cX0C6qoXVUu8B028auAQM4eVzH1YPF5qrJiCDo",
#'__VIEWSTATE':"W+m8kNAS6QHiRPo+zFj00EDs/Dbq+y/XvtCmSNwOIkGKlikAlphT8HBAWQDskSm1vdNterBuo0Hy7m4xPbXMOnyEm6IlseXO3jPw+ofnI2WHAKknLil+GeS0IfMWGeoD5aNyiz3zh1jZkKU7R7hQsxwARoHRyjhf8UCooFbkVvL6ddHVYZbH5LcocmCF1BTOCqYN5y5yzfDfYbp3KNW9kH53pdmwCsjiEirdxxUGDoG1Ke3JBEXfSl+4XubirHSR8z+VlFmPPXZGU8mMogwq9Eg822RYjvbwvZG74djcf7kdfB9KXCPO9u6cWIjLiW+cfXHSXD+1XYFVf9ATU2/NV4YbUzsI4PJRwoGD4BryUNIm2JFeT4c8F4REYTA16shxz5mDTFQ6rbmg6SmqP8G9gAc2Hr9ABD8+2BUNabGhNZ8wDIZArfYS4pl5DNrlPlpqeCjhmvv0znKAJSOac3pCUej8G90ZGwQKOPORWbNVzQShoH7QvrXV8pCklcia6psuAGO+Oj72oDWPxedE4DjdjX5TbLoW4bzsk/YNfUv4JpjGR8DWpG8IFYJG9CCjMEYb",
'__LASTFOCUS':"",
'__EVENTARGUMENT':"",
'__EVENTTARGET':"",
'ctl00_MainContent_ToolkitScriptManager1_HiddenField':";;AjaxControlToolkit,+Version=4.1.51116.0,+Culture=neutral,+PublicKeyToken=28f01b0e84b6d53e:en-US:fd384f95-1b49-47cf-9b47-2fa2a921a36a:475a4ef5:addc6819:5546a2b:d2e10b12:effe2a26:37e2e5c9:5a682656:c7029a2:e9e598a9"},method='POST',cookies = cookies1)
    def parse1(self, response):
        path1 = "id('Panel1')"
        value1 = response.xpath(path1).extract_first()
        print value1
First of all, you are sending the spider to the same site multiple times, though with different form parameters. You therefore have to use dont_filter=True in the request; otherwise Scrapy filters out the duplicate calls.
Then it seems to me that the site you are scraping doesn't allow you to make more than one request during the same session. Try, for example, going to http://fcainfoweb.nic.in/PMSver2/Reports/Report_Menu_web.aspx with your browser, filling in the form, getting the data, and then going back to the initial page: it's impossible. So you have to modify your spider. Here's a very rough version just to give you the idea. It works for me, but please don't use it in production!
import scrapy


class SpidyQuotesViewStateSpider(scrapy.Spider):
    name = 'retail_price'
    urls = "http://fcainfoweb.nic.in/PMSver2/Reports/Report_Menu_web.aspx"

    def start_requests(self):
        dated = ["01/03/2017", "05/03/2017", "04/03/2017"]
        for i in dated:
            request = scrapy.Request(url=self.urls, dont_filter=True, callback=self.parse)
            request.meta['question'] = i
            yield request

    def parse(self, response):
        thedate = response.meta['question']
        cookies1 = {}
        val = response.headers.getlist('Set-Cookie')
        print("login session values", response.headers.getlist('Set-Cookie'))
        if len(val) != 0:
            cookies1['ASP.NET_SessionId'] = str(str(response.headers.getlist('Set-Cookie')[0]).split(";")[0].split("=")[1])
            cookies1['path'] = str(str(response.headers.getlist('Set-Cookie')[0]).split(";")[1].split("=")[1])
        yield scrapy.FormRequest(url=self.urls, dont_filter=True, callback=self.parse1, formdata={'ctl00$MainContent$btn_getdata1':"Get Data",
'ctl00$MainContent$Txt_FrmDate': thedate,
'ctl00$MainContent$Ddl_Rpt_Option0':"Daily Prices",
'ctl00$MainContent$Rbl_Rpt_type':"Price report",
'ctl00$MainContent$ddl_Language':"English",
'ctl00$MainContent$Ddl_Rpt_type':"Retail",
'__EVENTVALIDATION':"IwZyKgfTXVzxiHxiPXGk/W8XQZBDb0EOPxJh6s8hofq0ffqOpiHSH77CafcxySF3PbkYgSMNFCJhLM2cGnL6SxT0PJuGDCJtV0V8Y4a94UErUCiSANiin+4uKckk9v9Ux8JqTVeaipppmlH+wyks2U9SgPfkNUsqw4eHCkDyB5akNNZImRIixOHHVY3JSXGkwXn7ueK9w+AgnqJzpXaWdMr9J1++M4VAFImSNF8brFSfPHe5kb/qzkGIwUr/KRouaRYK8WLWZh/Mbl9xwREwhDSxWJSOdihSE0WWoaqSMtpaR99rDDCsD3mdJqfu0aPIlREupTZRzlrmztXU0eS3949YW+ywdTRvykaMNgOW2Q4saYP5j/niKbRW6GiDnaLV2A38X/HW80+trrsjwJr9tjTKVFyikf6s/3gzyiTp11ivSkwIY2b3hutjYn7OfTDo",
#'__EVENTVALIDATION':"HqVo2xHk04clYwnBposXbZGhbIr181A7RbyeZv74Cia7rXSKmpOpbeSnn3XXnoDJKRxMK0W9nxKZFfkNje+P/K7gE5HVjHJr9Gr0Gs46TntzKDsvzyii8jZ7e0fdZgQCJKoXxQNgR2vNkWqChKcEldBuMHCOgJRqCNCF/JPFKpdKZoIWr7GU8rhzwLijf/Gkm+FuTULs/fl2HHK6Z1QQEozzEHFsDwzl0G4IiN//eNYfHuUBXKZ3wdZzPqG0s53WHEuSBzhqBC9AtCJOs4ZZhdtwFh8iyTJ4PlsLP9DLHYHRCOAd72UO0UH8gT7gAkKVo1I4L540DilowOR9SttH7MM/oOs9qhKlnG61FgqkYGW8zGzF/yNEXO+beVAK1RVvuO+FDnuq/g36TRnUieei5GpAZ+96CSoCIxykdvHx8R+smTNF/5erlowV4ci+tcI7",
'__VIEWSTATEENCRYPTED':"",
'__VIEWSTATEGENERATOR':"85862B00",
'__VIEWSTATE':"+a+3jrBEKxDdkPOzx2wXwKaTMWvCB60WPaHRfJUAZQrdFIpxSqFr5VseTclpGzeHXdxaFnxJe/PkxDKYa7sj3Wiv/os1bNeX0IEB3s45eFsHYWGiU8cvsXCGa5z7rrGRDL5hotg7k/MuUWj8w27xXZO423MN5OsHS+wh+tC/5/Xix+w3zxuQhi8jR5DnreimHbhGZn1sYaKYIGCc8mDIDRNl+w1OZ058F+3LAx96QUu5BYiMYOmrlyxrb9b2yPTmmIrI4NtC4ClBQlxuST5wMDP3vUqqWMhn4auk8ev5gHyPestCRrsAXWs07wDNnikemMwo/4wPiTEbnZQV6SLcDUw0gZpXjXwLI7mhsVjEyVNaQnJp6+Wi6FLsAEEMlFYmQut3JecpVIUkjF9uYSN2GLIbXHPs37AiEXPeQ8E/GyBMx3z1X5l8sw/xSNmFgYQC3riajn8V0+SdkuV2PbNbYKtc+uoSCNLppLYCqiOv5eWanGvAQro2Q67FBA4w2xY+V/K8mzHaGMLoDBxJxLslWyJpL5cX0C6qoXVUu8B028auAQM4eVzH1YPF5qrJiCDo",
#'__VIEWSTATE':"W+m8kNAS6QHiRPo+zFj00EDs/Dbq+y/XvtCmSNwOIkGKlikAlphT8HBAWQDskSm1vdNterBuo0Hy7m4xPbXMOnyEm6IlseXO3jPw+ofnI2WHAKknLil+GeS0IfMWGeoD5aNyiz3zh1jZkKU7R7hQsxwARoHRyjhf8UCooFbkVvL6ddHVYZbH5LcocmCF1BTOCqYN5y5yzfDfYbp3KNW9kH53pdmwCsjiEirdxxUGDoG1Ke3JBEXfSl+4XubirHSR8z+VlFmPPXZGU8mMogwq9Eg822RYjvbwvZG74djcf7kdfB9KXCPO9u6cWIjLiW+cfXHSXD+1XYFVf9ATU2/NV4YbUzsI4PJRwoGD4BryUNIm2JFeT4c8F4REYTA16shxz5mDTFQ6rbmg6SmqP8G9gAc2Hr9ABD8+2BUNabGhNZ8wDIZArfYS4pl5DNrlPlpqeCjhmvv0znKAJSOac3pCUej8G90ZGwQKOPORWbNVzQShoH7QvrXV8pCklcia6psuAGO+Oj72oDWPxedE4DjdjX5TbLoW4bzsk/YNfUv4JpjGR8DWpG8IFYJG9CCjMEYb",
'__LASTFOCUS':"",
'__EVENTARGUMENT':"",
'__EVENTTARGET':"",
'ctl00_MainContent_ToolkitScriptManager1_HiddenField':";;AjaxControlToolkit,+Version=4.1.51116.0,+Culture=neutral,+PublicKeyToken=28f01b0e84b6d53e:en-US:fd384f95-1b49-47cf-9b47-2fa2a921a36a:475a4ef5:addc6819:5546a2b:d2e10b12:effe2a26:37e2e5c9:5a682656:c7029a2:e9e598a9"},method='POST',cookies = cookies1)
    def parse1(self, response):
        path1 = "id('Panel1')"
        value1 = response.xpath(path1).extract_first()[:574]
        print(value1)

How to pull data from tags based on other tags

I have the following example document:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<n1:Form109495CTransmittalUpstream xmlns="urn:us:gov:treasury:irs:ext:aca:air:7.0" xmlns:irs="urn:us:gov:treasury:irs:common" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="urn:us:gov:treasury:irs:msg:form1094-1095Ctransmitterupstreammessage IRS-Form1094-1095CTransmitterUpstreamMessage.xsd" xmlns:n1="urn:us:gov:treasury:irs:msg:form1094-1095Ctransmitterupstreammessage">
  <Form1095CUpstreamDetail RecordType="String" lineNum="1">
    <RecordId>1</RecordId>
    <CorrectedInd>0</CorrectedInd>
    <irs:TaxYr>2015</irs:TaxYr>
    <EmployeeInfoGrp>
      <OtherCompletePersonName>
        <PersonFirstNm>JOHN</PersonFirstNm>
        <PersonMiddleNm>B</PersonMiddleNm>
        <PersonLastNm>Doe</PersonLastNm>
      </OtherCompletePersonName>
      <PersonNameControlTxt/>
      <irs:TINRequestTypeCd>INDIVIDUAL_TIN</irs:TINRequestTypeCd>
      <irs:SSN>123456790</irs:SSN>
    </EmployeeInfoGrp>
  </Form1095CUpstreamDetail>
  <Form1095CUpstreamDetail RecordType="String" lineNum="1">
    <RecordId>2</RecordId>
    <CorrectedInd>0</CorrectedInd>
    <irs:TaxYr>2015</irs:TaxYr>
    <EmployeeInfoGrp>
      <OtherCompletePersonName>
        <PersonFirstNm>JANE</PersonFirstNm>
        <PersonMiddleNm>B</PersonMiddleNm>
        <PersonLastNm>DOE</PersonLastNm>
      </OtherCompletePersonName>
      <PersonNameControlTxt/>
      <irs:TINRequestTypeCd>INDIVIDUAL_TIN</irs:TINRequestTypeCd>
      <irs:SSN>222222222</irs:SSN>
    </EmployeeInfoGrp>
  </Form1095CUpstreamDetail>
</n1:Form109495CTransmittalUpstream>
Using Nokogiri, I want to extract the values of the <PersonFirstNm>, <PersonLastNm> and <irs:SSN> tags for each <Form1095CUpstreamDetail>, based on the <RecordId>.
I tried removing namespaces as well. I posted a small snippet, but I have tried many iterations of working through the XML with no success. This is my first time using XML, so I realize I am likely missing something easy.
When I set my XPath:
require 'nokogiri'
submission_doc = Nokogiri::XML(open('1094C_Request.xml'))
submissions = submission_doc.remove_namespaces!
nodes = submissions.xpath('//Form1095CUpstreamDetail')
I do not seem to have any association between the RecordId and the tags mentioned above, and I am stuck on where to go next.
The fields are not listed as children for the RecordId, so I can't think of how to approach obtaining their values. I am including the full document as an example to make sure I am not excluding anything.
I have an array of values, and I would like to pull the three tags mentioned above if the RecordId is contained within the array of numbers.
Nokogiri makes it pretty easy to do what you want (assuming the XML is syntactically correct). I'd do something like:
require 'nokogiri'
require 'pp'
doc = Nokogiri::XML(<<EOT)
<n1:Form109495CTransmittalUpstream xmlns="urn:us:gov:treasury:irs:ext:aca:air:7.0" xmlns:irs="urn:us:gov:treasury:irs:common" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="urn:us:gov:treasury:irs:msg:form1094-1095Ctransmitterupstreammessage IRS-Form1094-1095CTransmitterUpstreamMessage.xsd" xmlns:n1="urn:us:gov:treasury:irs:msg:form1094-1095Ctransmitterupstreammessage">
  <Form1095CUpstreamDetail RecordType="String" lineNum="1">
    <RecordId>1</RecordId>
    <PersonFirstNm>JOHN</PersonFirstNm>
    <PersonLastNm>Doe</PersonLastNm>
    <irs:SSN>123456790</irs:SSN>
  </Form1095CUpstreamDetail>
  <Form1095CUpstreamDetail RecordType="String" lineNum="1">
    <RecordId>2</RecordId>
    <PersonFirstNm>JANE</PersonFirstNm>
    <PersonLastNm>DOE</PersonLastNm>
    <irs:SSN>222222222</irs:SSN>
  </Form1095CUpstreamDetail>
</n1:Form109495CTransmittalUpstream>
EOT
info = doc.search('Form1095CUpstreamDetail').map { |form|
  {
    record_id:       form.at('RecordId').text,
    person_first_nm: form.at('PersonFirstNm').text,
    person_last_nm:  form.at('PersonLastNm').text,
    ssn:             form.at('irs|SSN').text
  }
}
pp info
# >> [{:record_id=>"1",
# >> :person_first_nm=>"JOHN",
# >> :person_last_nm=>"Doe",
# >> :ssn=>"123456790"},
# >> {:record_id=>"2",
# >> :person_first_nm=>"JANE",
# >> :person_last_nm=>"DOE",
# >> :ssn=>"222222222"}]
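Since you mentioned having an array of RecordId values you care about, filtering that result is a one-liner; wanted_ids here is a hypothetical example array:
wanted_ids = ["1"]
picked = info.select { |record| wanted_ids.include?(record[:record_id]) }
# => [{:record_id=>"1", :person_first_nm=>"JOHN", :person_last_nm=>"Doe", :ssn=>"123456790"}]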
While it's possible to do this with XPath, Nokogiri's implementation of CSS selectors tends to result in more easily read selectors, which translates into easier maintenance, which is a very good thing.
You'll see the use of | in 'irs|SSN' which is Nokogiri's way of defining a namespace for CSS. This is documented in "Namespaces".
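For comparison, a sketch of the CSS selector and its XPath equivalent, with the namespace URI taken from the document above; both should return the first <irs:SSN> node:
doc.at('irs|SSN')
doc.at_xpath('//irs:SSN', 'irs' => 'urn:us:gov:treasury:irs:common')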
First of all, the XML validator reports an error:
The default (no prefix) Namespace URI for XPath queries is always '' and it cannot be redefined to 'urn:us:gov:treasury:irs:ext:aca:air:7.0'.
So you must set this default xmlns to "".
You can use this code.
require 'nokogiri'

doc = Nokogiri::XML(open('1094C_Request.xml'))
doc.namespaces['xmlns'] = ''

details = doc.xpath("//:Form1095CUpstreamDetail")
elem_a = ["PersonFirstNm", "PersonLastNm", "irs:SSN"]

output = details.each_with_object({}) do |element, exp|
  exp[element.xpath("./:RecordId").text] = elem_a.each_with_object({}) do |elem_n, exp_h|
    exp_h[elem_n] = element.xpath(".//#{elem_n.include?(':') ? elem_n : ":#{elem_n}"}").text
  end
end

p output
# {
# "1" => {"PersonFirstNm" => "JOHN", "PersonLastNm" => "Doe", "irs:SSN" => "123456790"},
# "2" => {"PersonFirstNm" => "JANE", "PersonLastNm" => "DOE", "irs:SSN" => "222222222"}
# }
I hope this helps

how to replace @VARIABLE in text file with value of EMails in xml file

How can I read this XML file
<Subs>
  <Sub Report="BusinessSummarySubs" EMails="lalla@yahoo.com; haha@yahoo.com"/>
  <Sub Report="PlayerSubs" EMails="hehe@hotmail.com"/>
</Subs>
and replace @VARIABLE in BusinessSummarySubs.txt with the EMails value.
Here is part of the content from BusinessSummarySubs.txt:
CType(extensionParams(0),ParameterValue).Name = "TO"
CType(extensionParams(0),ParameterValue).Label = ""
CType(extensionParams(0),ParameterValue).Value = "@VARIABLE"
If you look here, you'll see how to search for and access attributes. Follow the link chain to 'the same for text' and do a mental diff if you want a skeleton for a minimal XML-processing script to use for your next task.
Single placeholder substitution in VBScript is easy: just use Replace:
>> attr = "lalla@yahoo.com; haha@yahoo.com"
>> content = "... .Value = ""@VARIABLE"" ..."
>> ph = "@VARIABLE"
>> WScript.Echo Replace(content, ph, attr)
>>
... .Value = "lalla@yahoo.com; haha@yahoo.com" ...
>>
Something like this, I suppose:
Set xmlDoc = CreateObject("Microsoft.XMLDOM")
xmlDoc.async = "false"
xmlDoc.load("note.xml")
For Each Emails In xmlDoc.documentElement.childNodes
  document.write(Emails.nodeName)
  document.write(": ")
  document.write(Emails.text)
Next
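If you would rather do the whole substitution from Ruby, here is a minimal REXML sketch; the file names and paths are assumptions taken from the question:
require 'rexml/document'

doc = REXML::Document.new(File.read('subs.xml'))   # hypothetical path to the XML above
REXML::XPath.each(doc, '//Sub') do |sub|
  report = sub.attributes['Report']                # e.g. "BusinessSummarySubs"
  emails = sub.attributes['EMails']                # e.g. "lalla@yahoo.com; haha@yahoo.com"
  path = "#{report}.txt"
  next unless File.exist?(path)                    # skip reports without a matching file
  File.write(path, File.read(path).gsub('@VARIABLE', emails))
end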

How to sort a comma-delimited string?

I have the following Ruby code:
settings = Hash.new
batchSettings = batch.getPartialSettings
settings = batchSettings.merge(batch.getEntireSettings)
puts settings
The result is:
{"Resolution"=>"1024", "Applications"=>"Mozilla,IE,Chrome", "Programming"=>"Java,HTML"}
I want "Applications" to be sorted as:
"Applications"=>"Chrome,IE,Mozilla"
So, my final result should be:
{"Resolution"=>"1024", "Applications"=>"Chrome,IE,Mozilla", "Programming"=>"Java,HTML"}
Split the string on commas, sort, and join it back together:
unsorted_apps = settings['Applications']
sorted_apps = unsorted_apps.split(',').sort.join(',')
settings['Applications'] = sorted_apps
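Condensed into a single line, with the expected result from the question:
settings['Applications'] = settings['Applications'].split(',').sort.join(',')
puts settings
# {"Resolution"=>"1024", "Applications"=>"Chrome,IE,Mozilla", "Programming"=>"Java,HTML"}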

Ruby Hash parsed_response error

BACKGROUND
I am using HTTParty to parse an XML hash response. Unfortunately, when the hash response only has one entry(?), the resulting hash is not indexable. I have confirmed the resulting XML syntax is the same for single and multiple entry(?). I have also confirmed my code works when there are always multiple entries(?) in the hash.
QUESTION
How do I accommodate the single hash entry case and/or is there an easier way to accomplish what I am trying to do?
CODE
require 'httparty'
class Rest
  include HTTParty
  format :xml
end

def test_redeye
  # rooms and devices (@reIp is assumed to be set elsewhere, e.g. {"theater" => "http://..."})
  roomID = Hash.new
  deviceID = Hash.new { |h, k| h[k] = Hash.new }

  rooms = Rest.get(@reIp["theater"] + "/redeye/rooms/").parsed_response["rooms"]
  puts "rooms #{rooms}"
  rooms["room"].each do |room|
    puts "room #{room}"
    roomID[room["name"].downcase.strip] = "/redeye/rooms/" + room["roomId"]
    puts "roomid #{roomID}"

    devices = Rest.get(@reIp["theater"] + roomID[room["name"].downcase.strip] + "/devices/").parsed_response["devices"]
    puts "devices #{devices}"
    devices["device"].each do |device|
      puts "device #{device}"
      deviceID[room["name"].downcase.strip][device["displayName"].downcase.strip] = "/devices/" + device["deviceId"]
      puts "deviceid #{deviceID}"
    end
  end
  say "Done"
end
XML - SINGLE ENTRY
<?xml version="1.0" encoding="UTF-8" ?>
<devices>
  <device manufacturerName="Philips" description="" portType="infrared" deviceType="0" modelName="" displayName="TV" deviceId="82" />
</devices>
XML - MULTIPLE ENTRY
<?xml version="1.0" encoding="UTF-8" ?>
<devices>
  <device manufacturerName="Denon" description="" portType="infrared" deviceType="6" modelName="Avr-3311ci" displayName="AVR" deviceId="77" />
  <device manufacturerName="Philips" description="" portType="infrared" deviceType="0" modelName="" displayName="TV" deviceId="82" />
</devices>
RESULTING ERROR
[Info - Plugin Manager] Matches, executing block
rooms {"room"=>[{"name"=>"Home Theater", "currentActivityId"=>"78", "roomId"=>"-1", "description"=>""}, {"name"=>"Living", "currentActivityId"=>"-1", "roomId"=>"81", "description"=>"2nd Floor"}, {"name"=>"Theater", "currentActivityId"=>"-1", "roomId"=>"80", "description"=>"1st Floor"}]}
room {"name"=>"Home Theater", "currentActivityId"=>"78", "roomId"=>"-1", "description"=>""}
roomid {"home theater"=>"/redeye/rooms/-1"}
devices {"device"=>[{"manufacturerName"=>"Denon", "description"=>"", "portType"=>"infrared", "deviceType"=>"6", "modelName"=>"Avr-3311ci", "displayName"=>"AVR", "deviceId"=>"77"}, {"manufacturerName"=>"Philips", "description"=>"", "portType"=>"infrared", "deviceType"=>"0", "modelName"=>"", "displayName"=>"TV", "deviceId"=>"82"}]}
device {"manufacturerName"=>"Denon", "description"=>"", "portType"=>"infrared", "deviceType"=>"6", "modelName"=>"Avr-3311ci", "displayName"=>"AVR", "deviceId"=>"77"}
deviceid {"home theater"=>{"avr"=>"/devices/77"}}
device {"manufacturerName"=>"Philips", "description"=>"", "portType"=>"infrared", "deviceType"=>"0", "modelName"=>"", "displayName"=>"TV", "deviceId"=>"82"}
deviceid {"home theater"=>{"avr"=>"/devices/77", "tv"=>"/devices/82"}}
room {"name"=>"Living", "currentActivityId"=>"-1", "roomId"=>"81", "description"=>"2nd Floor"}
roomid {"home theater"=>"/redeye/rooms/-1", "living"=>"/redeye/rooms/81"}
devices {"device"=>{"manufacturerName"=>"Philips", "description"=>"", "portType"=>"infrared", "deviceType"=>"0", "modelName"=>"", "displayName"=>"TV", "deviceId"=>"82"}}
device ["manufacturerName", "Philips"]
/usr/local/rvm/gems/ruby-1.9.3-p374@SiriProxy/gems/siriproxy-0.3.2/plugins/siriproxy-redeye/lib/siriproxy-redeye.rb:145:in `[]': can't convert String into Integer (TypeError)
There are a couple of options I see. If you control the endpoint, you could modify the XML being sent to accommodate HTTParty's underlying XML parser, Crack, by putting a type="array" attribute on the devices XML element.
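A sketch of the single-entry response with that attribute added; if I remember Crack's type-casting correctly, parsed_response["devices"] then comes back as an array even with one child, so the inner "device" key disappears and the lookup code changes accordingly:
<devices type="array">
  <device manufacturerName="Philips" description="" portType="infrared" deviceType="0" modelName="" displayName="TV" deviceId="82" />
</devices>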
Otherwise, you could check to see what class the device is before indexing into it:
case devices["device"]
when Array
  # act on the collection
else
  # act on the single element
end
It's much less than ideal whenever you have to do type-checking in a dynamic language, so if you find yourself doing this more than once it may be worth introducing polymorphism or at the very least extracting a method to do this.
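As a sketch of that extracted method (wrap is a hypothetical helper name; note that Kernel#Array would be wrong here, because it turns a Hash into an array of key/value pairs):
def wrap(node)
  node.is_a?(Array) ? node : [node]
end

wrap(devices["device"]).each do |device|
  # device is always a Hash here, whether the XML had one entry or many
end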
