I am reading amounts from an XML file, and I need to sum them as a check.
I am using Ruby on Rails with the Nokogiri gem.
Example from the XML file:
<cfdi:Concepto ClaveProdServ="15101514" NoIdentificacion="PL/762/EXP/ES/2015-16665610" Cantidad="52.967" ClaveUnidad="LTR" Descripcion="MAGNA (LT)" ValorUnitario="16.34" Importe="865.74">
<cfdi:Traslado Base="842.59" Impuesto="002" TipoFactor="Tasa" TasaOCuota="0.160000" Importe="134.81"/>
<cfdi:Concepto ClaveProdServ="15101514" NoIdentificacion="PL/767/EXP/ES/2015-8515840" Cantidad="35.045" ClaveUnidad="LTR" Descripcion="MAGNA (LT)" ValorUnitario="16.34" Importe="572.80">
<cfdi:Traslado Base="557.49" Impuesto="002" TipoFactor="Tasa" TasaOCuota="0.160000" Importe="89.20"/>
<cfdi:Concepto ClaveProdServ="15101514" NoIdentificacion="PL/762/EXP/ES/2015-16665910" Cantidad="21.992" ClaveUnidad="LTR" Descripcion="MAGNA (LT)" ValorUnitario="16.34" Importe="359.45">
<cfdi:Traslado Base="349.84" Impuesto="002" TipoFactor="Tasa" TasaOCuota="0.160000" Importe="55.97"/>
<cfdi:Concepto ClaveProdServ="15101514" NoIdentificacion="PL/762/EXP/ES/2015-16665560" Cantidad="25.002" ClaveUnidad="LTR" Descripcion="MAGNA (LT)" ValorUnitario="16.34" Importe="408.62">
<cfdi:Traslado Base="397.69" Impuesto="002" TipoFactor="Tasa" TasaOCuota="0.160000" Importe="63.63"/>
I managed to obtain all the amounts and taxes with these lines of code:
array = []
array_i = []
file = Nokogiri::XML([:consumption][:factura]))
doc_pass = file.xpath("//cfdi:Comprobante/cfdi:Conceptos/cfdi:Concepto")
doc_pass.each do |pass|
  hash_importe = {}
  hash_importe[:importe] = pass['Importe']
  array << hash_importe
end
doc_pass2 = file.xpath("//cfdi:Comprobante/cfdi:Conceptos/cfdi:Concepto/cfdi:Impuestos/cfdi:Traslados/cfdi:Traslado")
doc_pass2.each do |pass2|
  hash_impuesto = {}
  hash_impuesto[:importe] = pass2['Importe']
  array_i << hash_impuesto
end
These are the results I get from the XML file:
(byebug) array
[{:importe=>"865.74"}, {:importe=>"572.80"}, {:importe=>"359.45"}, {:importe=>"408.62"}, {:importe=>"324.48"}, {:importe=>"649.64"}, {:importe=>"823.45"}, {:importe=>"545.15"}, {:importe=>"428.02"}, {:importe=>"527.21"}, {:importe=>"487.67"}, {:importe=>"331.72"}, {:importe=>"511.64"}, {:importe=>"406.67"}, {:importe=>"820.81"}, {:importe=>"1635.54"}, {:importe=>"484.14"}, {:importe=>"564.83"}, {:importe=>"1463.30"}]
(byebug) array_i
[{:importe=>"134.81"}, {:importe=>"89.20"}, {:importe=>"55.97"}, {:importe=>"63.63"}, {:importe=>"50.52"}, {:importe=>"101.18"}, {:importe=>"128.21"}, {:importe=>"84.88"}, {:importe=>"66.73"}, {:importe=>"82.10"}, {:importe=>"75.90"}, {:importe=>"51.58"}, {:importe=>"79.67"}, {:importe=>"63.33"}, {:importe=>"127.80"}, {:importe=>"254.69"}, {:importe=>"75.36"}, {:importe=>"87.92"}, {:importe=>"227.84"}]
Now I want to add each pair of values (importe + impuesto), for example:
865.74 + 134.81
572.80 + 89.20
359.45 + 55.97
I am new to Rails and would appreciate your help.
If both arrays have the same size (which I believe they do), you can return an array with the results like this:
(0..array.size - 1).each_with_object([]) { |i, obj| obj << array[i][:importe].to_f + array_i[i][:importe].to_f }
[1000.55, 662.0, 415.41999999999996, 472.25, 375.0, 750.8199999999999, 951.6600000000001, 630.03, 494.75, 609.3100000000001, 563.57, 383.3, 591.31, 470.0, 948.6099999999999, 1890.23, 559.5, 652.75, 1691.1399999999999]
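The long tails in results such as 415.41999999999996 come from binary floating point, not from the data. Here is a minimal sketch, assuming the amounts always arrive as plain decimal strings: convert them with String#to_r so the addition is exact, and turn the sum into a Float only at the end. The helper name exact_totals and the shortened sample arrays are illustrative, not part of the original code.

```ruby
# Sum amounts pairwise using exact Rational arithmetic.
# `exact_totals` is a hypothetical helper name used only for this sketch.
def exact_totals(amounts, taxes)
  amounts.zip(taxes).map do |amount, tax|
    # String#to_r parses "865.74" as the exact fraction 43287/50.
    (amount[:importe].to_r + tax[:importe].to_r).to_f
  end
end

# Shortened copies of the arrays from the question.
array   = [{ importe: "865.74" }, { importe: "572.80" }, { importe: "359.45" }]
array_i = [{ importe: "134.81" }, { importe: "89.20" },  { importe: "55.97" }]

p exact_totals(array, array_i)
# prints [1000.55, 662.0, 415.42]
```

If the totals are only compared against another value from the same file, keeping them as Rational (dropping the final to_f) avoids floating point entirely.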
Use the zip method to combine the values at corresponding indexes of the two arrays:
result = array.zip(array_i).map { |importe, impuesto| importe[:importe].to_f + impuesto[:importe].to_f }
Or, for your concrete data structure, this can be simplified further:
result = array.zip(array_i).map { |hashes| hashes.sum { |h| h[:importe].to_f } }
A better approach would be to extract each Concepto together with its Importe and Impuesto values directly from the XML. Then you don't need to combine separate arrays and can work with a nicely structured object instead.
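A rough sketch of that idea follows. Note the assumptions: it uses REXML instead of Nokogiri so that it runs without extra gems, the namespace URI is a placeholder rather than the real CFDI one, and the Concepto struct and the conceptos_from helper are names invented for this example. The element nesting follows the XPath from the question.

```ruby
require "rexml/document"

# Placeholder namespace mapping; the real document declares its own URI.
NS = { "cfdi" => "urn:example:cfdi" }.freeze

# Trimmed-down, hypothetical invoice with the nesting used in the question's XPath.
SAMPLE_XML = <<~XML
  <cfdi:Comprobante xmlns:cfdi="urn:example:cfdi">
    <cfdi:Conceptos>
      <cfdi:Concepto Importe="865.74">
        <cfdi:Impuestos><cfdi:Traslados>
          <cfdi:Traslado Importe="134.81"/>
        </cfdi:Traslados></cfdi:Impuestos>
      </cfdi:Concepto>
      <cfdi:Concepto Importe="572.80">
        <cfdi:Impuestos><cfdi:Traslados>
          <cfdi:Traslado Importe="89.20"/>
        </cfdi:Traslados></cfdi:Impuestos>
      </cfdi:Concepto>
    </cfdi:Conceptos>
  </cfdi:Comprobante>
XML

# One object per Concepto, holding the amount and its tax as exact Rationals.
Concepto = Struct.new(:importe, :impuesto) do
  def total
    (importe + impuesto).to_f
  end
end

# Build one Concepto per <cfdi:Concepto> node, reading the tax from its own child.
def conceptos_from(xml)
  doc = REXML::Document.new(xml)
  REXML::XPath.match(doc, "//cfdi:Concepto", NS).map do |node|
    tax = REXML::XPath.first(node, "cfdi:Impuestos/cfdi:Traslados/cfdi:Traslado", NS)
    tax_amount = tax ? tax.attributes["Importe"].to_r : Rational(0)
    Concepto.new(node.attributes["Importe"].to_r, tax_amount)
  end
end

p conceptos_from(SAMPLE_XML).map(&:total)
# prints [1000.55, 662.0]
```

Because each tax is looked up relative to its own Concepto node, the pairing no longer depends on two arrays happening to be in the same order.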
I am trying to crawl a website for several different dates, which I store in a list. But when I iterate over the list, the crawler only works for the first value. Please help. The following is my code:
class SpidyQuotesViewStateSpider(scrapy.Spider):
    name = 'retail_price'
    def start_requests(self):
        print "start request"
        urls = ""
        yield scrapy.Request(url=urls, callback=self.parse)
    def parse(self, response):
        dated = ["05/03/2017","04/03/2017"]
        urls = ""
        #frmdata =
        cookies1 ={}
        val = response.headers.getlist('Set-Cookie')
        print "login session values" ,response.headers.getlist('Set-Cookie')
        if(len(val) != 0):
            cookies1['ASP.NET_SessionId'] = str(response.headers.getlist('Set-Cookie')[0].split(";")[0].split("=")[1])
            cookies1['path'] = str(response.headers.getlist('Set-Cookie')[0].split(";")[1].split("=")[1])
        print cookies1;
        for i in range(len(dated)):
            yield scrapy.FormRequest(url=urls, callback=self.parse1, formdata={'ctl00$MainContent$btn_getdata1':"Get Data",
                'ctl00$MainContent$Ddl_Rpt_Option0':"Daily Prices",
                'ctl00$MainContent$Rbl_Rpt_type':"Price report",
                'ctl00_MainContent_ToolkitScriptManager1_HiddenField':";;AjaxControlToolkit,+Version=4.1.51116.0,+Culture=neutral,+PublicKeyToken=28f01b0e84b6d53e:en-US:fd384f95-1b49-47cf-9b47-2fa2a921a36a:475a4ef5:addc6819:5546a2b:d2e10b12:effe2a26:37e2e5c9:5a682656:c7029a2:e9e598a9"},method='POST',cookies = cookies1)
    def parse1(self, response):
        path1 = "id('Panel1')"
        value1 = response.xpath(path1).extract_first()
        print value1
First of all, you are sending the spider to the same URL several times, only with different form parameters. You therefore have to pass dont_filter=True in the request; otherwise Scrapy filters out the duplicate requests.
Second, it seems to me that the site you are scraping doesn't allow more than one request during the same session. Try, for example, opening the site in your browser, filling in the form, getting the data, and then going back to the initial page: it's impossible. So you have to modify your spider. Here's some very rough code just to give an idea. It works for me, but please don't use it in production!
class SpidyQuotesViewStateSpider(scrapy.Spider):
    name = 'retail_price'
    urls = ""
    def start_requests(self):
        dated = ["01/03/2017","05/03/2017","04/03/2017"]
        for i in dated:
            request = scrapy.Request(url=self.urls, dont_filter=True, callback=self.parse)
            request.meta['question'] = i
            yield request
    def parse(self, response):
        thedate = response.meta['question']
        cookies1 ={}
        val = response.headers.getlist('Set-Cookie')
        print("login session values" ,response.headers.getlist('Set-Cookie'))
        if(len(val) != 0):
            cookies1['ASP.NET_SessionId'] = str(str(response.headers.getlist('Set-Cookie')[0]).split(";")[0].split("=")[1])
            cookies1['path'] = str(str(response.headers.getlist('Set-Cookie')[0]).split(";")[1].split("=")[1])
        yield scrapy.FormRequest(url=self.urls, dont_filter=True, callback=self.parse1, formdata={'ctl00$MainContent$btn_getdata1':"Get Data",
            'ctl00$MainContent$Txt_FrmDate': thedate,
            'ctl00$MainContent$Ddl_Rpt_Option0':"Daily Prices",
            'ctl00$MainContent$Rbl_Rpt_type':"Price report",
            'ctl00_MainContent_ToolkitScriptManager1_HiddenField':";;AjaxControlToolkit,+Version=4.1.51116.0,+Culture=neutral,+PublicKeyToken=28f01b0e84b6d53e:en-US:fd384f95-1b49-47cf-9b47-2fa2a921a36a:475a4ef5:addc6819:5546a2b:d2e10b12:effe2a26:37e2e5c9:5a682656:c7029a2:e9e598a9"},method='POST',cookies = cookies1)
    def parse1(self, response):
        path1 = "id('Panel1')"
        value1 = response.xpath(path1).extract_first()[:574]
I need to map some intervals (actually, intervals of addresses) to object ids.
I tried Boost's interval_map. The example looks very neat; it easily enumerates all intervals like this:
while(it != party.end())
{
    interval<ptime>::type when = it->first;
    // Who is at the party within the time interval 'when' ?
    GuestSetT who = (*it++).second;
    cout << when << ": " << who << endl;
}
Which outputs:
----- History of party guests -------------------------
[2008-May-20 19:30:00, 2008-May-20 20:10:00): Harry Mary
[2008-May-20 20:10:00, 2008-May-20 22:15:00): Diana Harry Mary Susan
[2008-May-20 22:15:00, 2008-May-20 23:00:00): Diana Harry Mary Peter Susan
[2008-May-20 23:00:00, 2008-May-21 00:00:00): Diana Peter Susan
[2008-May-21 00:00:00, 2008-May-21 00:30:00): Peter
But it cannot do something like this:
interval<ptime>::type when =
time_from_string("2008-05-20 22:00"),
time_from_string("2008-05-20 22:01"));
GuestSetT who = party[when];
cout << when << ": " << who << endl;
It outputs: error: no match for 'operator[]' in 'party[when]'
This looks strange, since operator[] is the main function of a map.
So I cannot get the information "who was at the party at a given time".
Is there a ready-to-use solution for this problem?
It's somewhat counter-intuitive, but the () operator is what you're looking for. From the docs, operator() is defined as "Return[ing] the mapped value for a key x. The operator is only available for total maps."
I'm a Java guy, new to Ruby. I've been playing with it just to see what it can do, and I'm running into an issue that I can't solve.
I decided to try out Sinatra, again, just to see what it can do, and decided to play with the ESPN API and see if I can pull the venue of a team via the API.
I'm able to make the call and get the data back, but I am having trouble parsing it:
{"sports"=>[{"name"=>"baseball", "id"=>1, "uid"=>"s:1", "leagues"=>[{"name"=>"Major League Baseball", "abbreviation"=>"mlb", "id"=>10, "uid"=>"s:1~l:10", "groupId"=>9, "shortName"=>"MLB", "teams"=>[{"id"=>17, "uid"=>"s:1~l:10~t:17", "location"=>"Cincinnati", "name"=>"Reds", "abbreviation"=>"CIN", "color"=>"D60042", "venues"=>[{"id"=>83, "name"=>"Great American Ball Park", "city"=>"Cincinnati", "state"=>"Ohio", "country"=>"", "capacity"=>0}], "links"=>{"api"=>{"teams"=>{"href"=>""}, "news"=>{"href"=>""}, "notes"=>{"href"=>""}}, "web"=>{"teams"=>{"href"=>""}}, "mobile"=>{"teams"=>{"href"=>""}}}}]}]}], "resultsOffset"=>0, "resultsLimit"=>50, "resultsCount"=>1, "timestamp"=>"2013-08-04T14:47:13Z", "status"=>"success"}
I want to pull the venues part of the object, specifically the name value. Every time I try to parse it I end up getting an error along the lines of "cannot change from nil to string" and then also I've gotten an integer to string error.
Here's what I have so far:
get '/venue/:team' do
  id = ids[params[:team]]
  url = '' + id + '?enable=venues&apikey=' + $key
  resp = Net::HTTP.get_response(URI.parse(url))
  data = resp.body
  parsed = JSON.parse(resp.body)
  #venueData = parsed["sports"]
  "Looking for the venue of the #{params[:team]}, which has id " + id + ", and here's the data returned: " + venueData.to_s
end
When I do parsed["sports"] I get:
[{"name"=>"baseball", "id"=>1, "uid"=>"s:1", "leagues"=>[{"name"=>"Major League Baseball", "abbreviation"=>"mlb", "id"=>10, "uid"=>"s:1~l:10", "groupId"=>9, "shortName"=>"MLB", "teams"=>[{"id"=>17, "uid"=>"s:1~l:10~t:17", "location"=>"Cincinnati", "name"=>"Reds", "abbreviation"=>"CIN", "color"=>"D60042", "venues"=>[{"id"=>83, "name"=>"Great American Ball Park", "city"=>"Cincinnati", "state"=>"Ohio", "country"=>"", "capacity"=>0}], "links"=>{"api"=>{"teams"=>{"href"=>""}, "news"=>{"href"=>""}, "notes"=>{"href"=>""}}, "web"=>{"teams"=>{"href"=>""}}, "mobile"=>{"teams"=>{"href"=>""}}}}]}]}]
But nothing else parses. Please help!
Like I said, I'm not trying to do anything fancy, just figure out Ruby a little for fun, but I have been stuck on this issue for days now. Any help would be appreciated!
JSON straight from the API:
{"sports" :[{"name" :"baseball","id" :1,"uid" :"s:1","leagues" :[{"name" :"Major League Baseball","abbreviation" :"mlb","id" :10,"uid" :"s:1~l:10","groupId" :9,"shortName" :"MLB","teams" :[{"id" :17,"uid" :"s:1~l:10~t:17","location" :"Cincinnati","name" :"Reds","abbreviation" :"CIN","color" :"D60042","venues" :[{"id" :83,"name" :"Great American Ball Park","city" :"Cincinnati","state" :"Ohio","country" :"","capacity" :0}],"links" :{"api" :{"teams" :{"href" :""},"news" :{"href" :""},"notes" :{"href" :""}},"web" :{"teams" :{"href" :""}},"mobile" :{"teams" :{"href" :""}}}}]}]}],"resultsOffset" :0,"resultsLimit" :50,"resultsCount" :1,"timestamp" :"2013-08-05T19:44:32Z","status" :"success"}
The result of data.inspect:
"{\"sports\" :[{\"name\" :\"baseball\",\"id\" :1,\"uid\" :\"s:1\",\"leagues\" :[{\"name\" :\"Major League Baseball\",\"abbreviation\" :\"mlb\",\"id\" :10,\"uid\" :\"s:1~l:10\",\"groupId\" :9,\"shortName\" :\"MLB\",\"teams\" :[{\"id\" :17,\"uid\" :\"s:1~l:10~t:17\",\"location\" :\"Cincinnati\",\"name\" :\"Reds\",\"abbreviation\" :\"CIN\",\"color\" :\"D60042\",\"venues\" :[{\"id\" :83,\"name\" :\"Great American Ball Park\",\"city\" :\"Cincinnati\",\"state\" :\"Ohio\",\"country\" :\"\",\"capacity\" :0}],\"links\" :{\"api\" :{\"teams\" :{\"href\" :\"\"},\"news\" :{\"href\" :\"\"},\"notes\" :{\"href\" :\"\"}},\"web\" :{\"teams\" :{\"href\" :\"\"}},\"mobile\" :{\"teams\" :{\"href\" :\"\"}}}}]}]}],\"resultsOffset\" :0,\"resultsLimit\" :50,\"resultsCount\" :1,\"timestamp\" :\"2013-08-05T19:44:24Z\",\"status\" :\"success\"}"
parsed["sports"] does not exist. Parse your input, then inspect or dump it.
With the data you've provided in the question, you can get to the venues information like this:
require 'json'
json = JSON.parse data
json["sports"].first["leagues"].first["teams"].first["venues"]
# => [{"id"=>83, "name"=>"Great American Ball Park", "city"=>"Cincinnati", "state"=>"Ohio", "country"=>"", "capacity"=>0}]
By replacing each of the first calls with an iterator, you can search through without knowing where the data is:
json["sports"].each{|h|
  h["leagues"].each{|h|
    h["teams"].each{|h|
      venues = h["venues"].map{|h| h["name"]}.join(", ")
      puts %Q!name: #{h["location"]} #{h["name"]} venues: #{venues}!
    }
  }
}
This outputs:
name: Cincinnati Reds venues: Great American Ball Park
Depending on how stable the response data is, you may be able to cut out several of the iterators:
json["sports"].first["leagues"].first["teams"].each{|h|
  venues = h["venues"].map{|h| h["name"] }.join(", ")
  puts %Q!name: #{h["location"]} #{h["name"]} venues: #{venues}!
}
You'll most likely want to save the data, so something like each_with_object is helpful:
team_and_venues = json["sports"].first["leagues"].first["teams"].each_with_object([]){|h,xs|
  venues = h["venues"].map{|h| h["name"]}.join(", ")
  xs << %Q!name: #{h["location"]} #{h["name"]} venues: #{venues}!
}
# => ["name: Cincinnati Reds venues: Great American Ball Park"]
team_and_venues
# => ["name: Cincinnati Reds venues: Great American Ball Park"]
Notice that when an iterator declares variables, even if there is a variable with the same name outside the block, the scope of the block is respected and the block's variables remain local.
That's some pretty ugly code if you ask me, but it's a place to start.
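For a somewhat tidier variant, here is a small sketch under a couple of assumptions: the response is cut down to the keys that matter, and venue_names is a helper name made up for this example. It uses flat_map to walk every sport, league and team, and Hash#dig for the case where only the first entry at each level is of interest.

```ruby
require "json"

# Trimmed-down copy of the response from the question (only the relevant keys).
RESPONSE = '{"sports":[{"leagues":[{"teams":[{"location":"Cincinnati","name":"Reds",' \
           '"venues":[{"id":83,"name":"Great American Ball Park"}]}]}]}]}'

# `venue_names` is a hypothetical helper: it collects every venue name,
# treating a missing key at any level as an empty list.
def venue_names(parsed)
  parsed.fetch("sports", [])
        .flat_map { |sport| sport.fetch("leagues", []) }
        .flat_map { |league| league.fetch("teams", []) }
        .flat_map { |team| team.fetch("venues", []) }
        .map { |venue| venue["name"] }
end

parsed = JSON.parse(RESPONSE)

p venue_names(parsed)
# prints ["Great American Ball Park"]

# dig returns nil instead of raising when any intermediate key or index is missing.
p parsed.dig("sports", 0, "leagues", 0, "teams", 0, "venues", 0, "name")
# prints "Great American Ball Park"
```

The fetch defaults and dig both avoid the nil errors mentioned in the question when part of the structure is absent.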
I am using REXML for a sample XML file:
<Accounts title="This is the test title">
  <Account name="frenchcustomer">
    <username name = "frencu"/>
    <password pw = "hello34"/>
    <accountdn dn = ""/>
    <exporttest name="basic">
      <exportname name = "basicexport"/>
      <exportterm term = "oldschool"/>
    </exporttest>
  </Account>
  <Account name="britishcustomer">
    <username name = "britishcu"/>
    <password pw = "mellow34"/>
    <accountdn dn = ""/>
    <exporttest name="existingsearch">
      <exportname name = "largexpo"/>
      <exportterm term = "greatschool"/>
    </exporttest>
  </Account>
</Accounts>
I am reading the XML like this:
@data = (REXML::Document.new file).root
@dataarr = @@testdata.elements.to_a("//Account")
Now I want to get the username of the frenchcustomer, so I tried this:
This fails. I do not want to use the array index; for example
will work, but I don't want to do that. Is there something that I am missing here? I want to use the array and get the username of the French user by its Account name.
Thanks a lot.
I recommend using XPath.
For the first match, use the first method; for an array of all matches, use match.
The code below returns the username for the Account "frenchcustomer":
REXML::XPath.first(yourREXMLDocument, "//Account[@name='frenchcustomer']/username/@name").value
If you really want to use the array created with @@testdata.elements.to_a("//Account"), you could use the find method:
french_cust_elt = the_array.find { |elt| elt.attributes['name'].eql?('frenchcustomer') }
french_username = french_cust_elt.elements["username"].attributes["name"]
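Put together as a self-contained sketch, with a shortened copy of the XML from the question: username_for is a helper name invented here, and it returns nil when no Account matches instead of raising.

```ruby
require "rexml/document"

# Shortened copy of the sample XML from the question.
ACCOUNTS_XML = <<~XML
  <Accounts title="This is the test title">
    <Account name="frenchcustomer">
      <username name="frencu"/>
    </Account>
    <Account name="britishcustomer">
      <username name="britishcu"/>
    </Account>
  </Accounts>
XML

# `username_for` is a hypothetical helper: look the Account up by its name
# attribute in the array, without relying on its position.
def username_for(xml, account_name)
  root = REXML::Document.new(xml).root
  accounts = root.elements.to_a("//Account")
  account = accounts.find { |elt| elt.attributes["name"] == account_name }
  account && account.elements["username"].attributes["name"]
end

p username_for(ACCOUNTS_XML, "frenchcustomer")
# prints "frencu"
p username_for(ACCOUNTS_XML, "nobody")
# prints nil
```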
puts @data.elements["//Account[@name='frenchcustomer']"]
If you want to iterate over multiple identical names:
@data.elements.each("//Account[@name='frenchcustomer']") do |fc|
  puts fc.elements["username"].attributes["name"]
end
I don't know what your @@testdata is; I tried with the following test code:
require "rexml/document"
@data = (REXML::Document.new DATA).root
@dataarr = @data.elements.to_a("//Account")
# Works
p @dataarr[1].elements["username"].attributes["name"]
# Does not work
#~ p @dataarr[@name='frenchcustomer'].elements["username"].attributes["name"]
# @dataarr is an array
puts "===Array#each"
@dataarr.each do |acc|
  next unless acc.attributes['name'] == 'frenchcustomer'
  p acc.elements["username"].attributes["name"]
end
puts "===XPATH"
@data.elements.each("//Account[@name='frenchcustomer']") do |acc|
  p acc.elements["username"].attributes["name"]
end
__END__
<Accounts title="This is the test title">
  <Account name="frenchcustomer">
    <username name = "frencu"/>
    <password pw = "hello34"/>
    <accountdn dn = ""/>
    <exporttest name="basic">
      <exportname name = "basicexport"/>
      <exportterm term = "oldschool"/>
    </exporttest>
  </Account>
  <Account name="britishcustomer">
    <username name = "britishcu"/>
    <password pw = "mellow34"/>
    <accountdn dn = ""/>
    <exporttest name="existingsearch">
      <exportname name = "largexpo"/>
      <exportterm term = "greatschool"/>
    </exporttest>
  </Account>
</Accounts>
I'm not very familiar with REXML, so I expect there is a better solution. But perhaps somebody can take my code and build a better solution from it.