Parsing XML message - ruby

I'm attempting to parse the following XML:
<marketstat><type id="18">
<buy><volume>33000000</volume><avg>40.53</avg><max>65.57</max><min>6.55</min><stddev>26.61</stddev><median>58.56</median><percentile>65.57</percentile></buy>
<sell><volume>494489</volume><avg>69.47</avg><max>69.47</max><min>69.47</min><stddev>0.00</stddev><median>69.47</median><percentile>69.47</percentile></sell>
<all><volume>33494489</volume><avg>40.96</avg><max>69.47</max><min>6.55</min><stddev>26.77</stddev><median>58.56</median><percentile>6.55</percentile></all>
</type><type id="19">
<buy><volume>270000</volume><avg>1707.31</avg><max>3549.38</max><min>239.74</min><stddev>1554.26</stddev><median>239.75</median><percentile>3549.34</percentile></buy>
<sell><volume>48599</volume><avg>24930.45</avg><max>29869.95</max><min>5200.00</min><stddev>9875.66</stddev><median>29869.93</median><percentile>5232.20</percentile></sell>
<all><volume>280926</volume><avg>1957.07</avg><max>10750.00</max><min>239.74</min><stddev>3352.87</stddev><median>1874.31</median><percentile>239.74</percentile></all>
</type></marketstat>
</evec_api>
The pieces of information that I want to retrieve are the minimum sell and maximum buy values, associated with the ID, found here: <sell><min>69.47</min></sell>.
I'm currently using the following to get the XML: marketData = Nokogiri::XML(open(api))

Use xpath to pull out the nodes of interest, then convert them to Floats and pick the value you want. The path to your minimum sell node is /marketstat/type/sell/min, or if you want to use shorthand, // says "anywhere in the document", so you can specify just //sell/min to get all of the minimum sell nodes and //buy/max to get all of the maximum buys.
# market_data is the parsed Nokogiri document (marketData in the question)
sells = market_data.xpath('//sell/min').map(&:content).map(&:to_f)
buys = market_data.xpath('//buy/max').map(&:content).map(&:to_f)
puts sells.min, buys.max

The following prints each ID with the largest max and the smallest min found anywhere under that type (across buy, sell, and all):
marketData = Nokogiri::XML(open(api))
marketData.xpath("//type").each do |i|
  puts "#{i.attr('id')}: #{i.xpath('.//max').map {|j| j.text.to_f}.max}"
  puts "#{i.attr('id')}: #{i.xpath('.//min').map {|j| j.text.to_f}.min}"
end
Output:
18: 69.47
18: 6.55
19: 29869.95
19: 239.74
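Note that the loop above takes the extremes across buy, sell, and all. If you only want the maximum buy and minimum sell per ID, as the question asks, a narrower path does it. Here is a minimal sketch, assuming a trimmed copy of the feed and using REXML so it runs without extra gems; with Nokogiri the equivalent calls would be type.at_xpath('buy/max').text and type['id']. The prices hash is just an illustrative name.

```ruby
require 'rexml/document'

# Trimmed copy of the feed from the question (only the fields used here).
xml = <<~XML
  <marketstat>
    <type id="18">
      <buy><max>65.57</max><min>6.55</min></buy>
      <sell><max>69.47</max><min>69.47</min></sell>
    </type>
    <type id="19">
      <buy><max>3549.38</max><min>239.74</min></buy>
      <sell><max>29869.95</max><min>5200.00</min></sell>
    </type>
  </marketstat>
XML

doc = REXML::Document.new(xml)

# Map each type id to its maximum buy and minimum sell price.
prices = {}
REXML::XPath.each(doc, '//type') do |type|
  prices[type.attributes['id']] = {
    max_buy:  type.elements['buy/max'].text.to_f,
    min_sell: type.elements['sell/min'].text.to_f
  }
end

prices.each do |id, value|
  puts "#{id}: max buy #{value[:max_buy]}, min sell #{value[:min_sell]}"
end
```

Keeping the values in a hash keyed by ID makes it easy to look up a single type later instead of re-running the XPath query.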

Related

Scan/Match incorrect input error messages

I am trying to count the correct inputs from the user. An input looks like:
m = "<ex=1>test xxxx <ex=1>test xxxxx test <ex=1>"
The tag ex=1 and the word test have to be connected and in this particular order to count as correct. In case of an invalid input, I want to send the user an error message that explains the error.
I tried to do it as written below:
ex_test_size = m.scan(/<ex=1>test/).size # => 2
test_size = m.scan(/test/).size # => 3
ex_size = m.scan(/<ex=1>/).size # => 3
puts "lack of tags(<ex=1>)" if ex_test_size < ex_size
puts "Lack of the word(test)" if ex_test_size < test_size
I believe there is a better way to write this, since my approach seems error-prone. How can I make sure that all the errors will be found and shown to the user?
You might use negative lookarounds:
# the scan alone returns ["xxx test", "<ex=1>"]
m.scan(/<ex=1>(?!test).{,4}|.{,4}(?<!<ex=1>)test/).map do |msg|
  "<ex=1>test expected, #{msg} got"
end.join(', ')
We scan the string for either <ex=1> not followed by test, or test not preceded by <ex=1>. We also grab up to 4 characters next to the violation to make the message more descriptive.
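To see both halves together (counting valid pairs and reporting the invalid fragments), here is a small sketch run against the sample input from the question. The variable names invalid, errors, and valid_count are only illustrative.

```ruby
m = "<ex=1>test xxxx <ex=1>test xxxxx test <ex=1>"

# Either a tag not followed by "test", or a "test" not preceded by the tag.
# .{,4} grabs up to four neighbouring characters for context.
invalid = /<ex=1>(?!test).{,4}|.{,4}(?<!<ex=1>)test/

errors = m.scan(invalid)
valid_count = m.scan(/<ex=1>test/).size

puts "valid pairs: #{valid_count}"
errors.each { |fragment| puts "expected <ex=1>test, got #{fragment.inspect}" }
```

For this input there are two valid pairs and two offending fragments: the dangling test (with the characters before it) and the trailing tag.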

Ruby splitting a record into multiple records based on contents of a field

Record layout contains two fields:
Requisition
Test Names
Example record:
R00000001,"4 Calprotectin, 1 Luminex xTAG, 8 H. pylori stool antigen (IgA), 9 Lactoferrin, 3 Anti-gliadin IgA, 10 H. pylori Panel, 6 Fecal Fat, 11 Antibiotic Resistance Panel, 2 C. difficile Tox A/ Tox B, 5 Elastase, 7 Fecal Occult Blood, 12 Shigella"
The current Ruby code snippet used in the LIMS (Lab Info Management System) is this:
subj.get_value('Tests').join(', ')
What I need to be able to do in the Ruby code snippet is create a new record off each comma-separated value in the second field.
NOTE:
The number of values in the 'Test Names' field varies from 1 to 20...or more.
There can be hundreds of Requisition records.
Final result would be:
R00000001,"4 Calprotectin"
R00000001,"1 Luminex xTAG"
R00000001,"8 H. pylori stool antigen (IgA)"
R00000001,"9 Lactoferrin"
R00000001,"3 Anti-gliadin IgA"
R00000001,"10 H. pylori Panel"
R00000001,"6 Fecal Fat"
R00000001,"11 Antibiotic Resistance Panel"
R00000001,"2 C. difficile Tox A/ Tox B"
R00000001,"5 Elastase"
R00000001,"7 Fecal Occult Blood"
R00000001,"12 Shigella"
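If your data is a reliable string like the one in your example, here's a method:
data = subj.get_value('Tests').join(', ') # assuming this gives the full record string
def split_data(data)
  # drop the quotes, then split into the requisition followed by the test names
  arr = data.gsub('"','').split(',')
  # pair the requisition (arr[0]) with each test name, comma-separated as in the expected output
  arr.map {|l| "#{arr[0]},\"#{l.strip}\""}[1..-1]
end
puts split_data(data)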
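A self-contained variant of the same idea, in case it helps to test outside the LIMS. This is only a sketch: it assumes the record arrives as one string shaped like the example (shortened here to three tests) and that no test name contains a comma. The names record, requisition, tests, and rows are illustrative.

```ruby
record = 'R00000001,"4 Calprotectin, 1 Luminex xTAG, 8 H. pylori stool antigen (IgA)"'

# Split only on the first comma so the quoted list stays in one piece.
requisition, tests = record.split(',', 2)

# Remove the surrounding quotes, split the list, and rebuild one row per test.
rows = tests.delete('"').split(',').map do |name|
  %(#{requisition},"#{name.strip}")
end

puts rows
```

Splitting with a limit of 2 keeps the requisition separate from the test names, so the result does not depend on slicing the first element off afterwards.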

`*': negative argument (ArgumentError)

I'm trying to sort an array of photo objects from the Flickr API in descending order, based on the number of comments (count_comments) on each photo. I'm using the following code.
def rank_photos(photos)
  photos.sort_by { |photo| photo.count_comments * -1 }
end
However I get the following error message.
`*': negative argument (ArgumentError)
Here is what the Array looks like
[{"id"=>"38280904752", "owner"=>"131718287#N07",
"secret"=>"abe0b93180", "server"=>"4583", "farm"=>5,
"title"=>"IMG_3640", "ispublic"=>1, "isfriend"=>0, "isfamily"=>0,
"count_comments"=>"0", "tags"=>"washington post dc web women codeher17
dctech tech technology",
"url_m"=>"https://farm5.staticflickr.com/4583/38280904752_abe0b93180.jpg", "height_m"=>"333", "width_m"=>"500"}, {"id"=>"38312540901",
"owner"=>"131718287#N07", "secret"=>"7b6e6805d4", "server"=>"4568",
"farm"=>5, "title"=>"IMG_3458", "ispublic"=>1, "isfriend"=>0,
"isfamily"=>0, "count_comments"=>"0", "tags"=>"washington post dc web
women codeher17 dctech tech technology",
"url_m"=>"https://farm5.staticflickr.com/4568/38312540901_7b6e6805d4.jpg", "height_m"=>"500", "width_m"=>"333"}, {"id"=>"38281453252",
"owner"=>"131718287#N07", "secret"=>"438293cffd", "server"=>"4539",
"farm"=>5, "title"=>"IMG_3460", "ispublic"=>1, "isfriend"=>0,
"isfamily"=>0, "count_comments"=>"0", "tags"=>"washington post dc web
women codeher17 dctech tech technology",
"url_m"=>"https://farm5.staticflickr.com/4539/38281453252_438293cffd.jpg", "height_m"=>"333", "width_m"=>"500"}
Why is it throwing this error?
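count_comments is a string, and String#* repeats the string, so "0" * -1 raises ArgumentError because the repeat count is negative. Convert the value to a number first; in the process you can also replace the multiplication with a unary minus.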
def rank_photos(photos)
  photos.sort_by { |photo| -photo.count_comments.to_i }
end
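To try this without the Flickr API, here is a sketch using plain hashes shaped like the array in the question. The ids and counts are made up, and hash access (photo["count_comments"]) stands in for the count_comments method of the real photo objects.

```ruby
# Made-up sample data; count_comments is a String, as in the API response.
photos = [
  { "id" => "a", "count_comments" => "2" },
  { "id" => "b", "count_comments" => "10" },
  { "id" => "c", "count_comments" => "0" }
]

# Convert to Integer before negating; "0" * -1 would raise ArgumentError.
def rank_photos(photos)
  photos.sort_by { |photo| -photo["count_comments"].to_i }
end

ranked = rank_photos(photos)
puts ranked.map { |photo| photo["id"] }.join(", ")
```

Converting to an integer also matters for correctness: compared as strings, "2" would sort after "10".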

I want to create a new local variable from the sum of two others in Ruby but I'm stuck

I am trying to change one example to take a user input rather than using hard coded values then use those local variables to work out items needed.
So far my code looks like this:
print "Number of cars available today."
cars = gets.chomp()
print "Number of available seats in the car."
space_in_a_car = gets.chomp()
print "Number of drivers available."
drivers = gets.chomp()
print "Number of passagers that need transport."
passangers = gets.chomp
cars_not_driven = #{cars} - #{drivers}
cars_driven = drivers
carpool_capacity = #{cars_driven} * #{space_in_a_car}
average_passanger_per_car = #{passangers} / #{drivers}
print "The number of cars being driven today is #{cars_driven}.\n"
print "The number of cars not being driven today is #{cars_not_driven}.\n"
print "We have #{carpool_capacity} cars available.\n"
print "So we need to carry #{average_passanger_per_car} passangers per car to make sure we can transport everyone.\n"
The code runs without throwing any errors, but because I am not writing these lines correctly:
cars_not_driven = #{cars} - #{drivers}
cars_driven = drivers
carpool_capacity = #{cars_driven} * #{space_in_a_car}
average_passanger_per_car = #{passangers} / #{drivers}
the only value I am getting in the return is:
print "The number of cars being driven today is #{cars_driven}.\n"
How should I be writing:
cars_not_driven = #{cars} - #{drivers} etc
to get the number of cars_not_driven?
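I'm not sure why you use #{var} and <br> here. #{...} is interpolation syntax that only works inside a double-quoted string; outside one, # starts a comment, so the rest of the line is ignored. If you want plain Ruby, this should be the solution: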
print "Number of cars available today."
cars = gets.chomp().to_i
print "Number of available seats in the car."
space_in_a_car = gets.chomp().to_i
print "Number of drivers available."
drivers = gets.chomp().to_i
print "Number of passagers that need transport."
passangers = gets.chomp.to_i
cars_not_driven = cars - drivers
cars_driven = drivers
carpool_capacity = cars_driven * space_in_a_car
average_passanger_per_car = passangers / drivers
print "The number of cars being driven today is #{cars_driven}.\n"
print "The number of cars not being driven today is #{cars_not_driven}.\n"
print "We have #{carpool_capacity} cars available.\n"
print "So we need to carry #{average_passanger_per_car} passangers per car to make sure we can transport everyone.\n"
gets reads a line of input and returns it as a String, so the variables cars, drivers, etc. are all Strings.
To do integer arithmetic on them, you need to convert them to integers first. In Ruby you can do this with .to_i.
Now with that information, try:
cars_not_driven = cars.to_i - drivers.to_i
cars_driven = drivers.to_i
carpool_capacity = cars_driven * space_in_a_car.to_i
average_passanger_per_car = passangers.to_i / drivers.to_i
You can check the class of a variable using .class. Here:
cars = gets.chomp()
10
# => "10"
cars.class
# => String
drivers = gets.chomp()
20
# => "20"
drivers.class
# => String
Now let's add them:
cars + drivers
# => "1020"
Since they are strings, the + operator concatenates them into one string, which is not what you intended. Now try this:
cars.to_i + drivers.to_i
# => 30
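Two details worth knowing once the conversion is in place: .to_i silently returns 0 for text that is not a number, and dividing two integers truncates. A small sketch of both, where parse_count is a hypothetical helper and the input strings stand in for what gets would return:

```ruby
# to_i never fails: it returns 0 when the text does not start with a number.
puts "12".to_i   # prints 12
puts "abc".to_i  # prints 0

# Integer() is stricter and raises ArgumentError on invalid input.
# Base 10 is given explicitly so "010" is not read as octal.
def parse_count(text)
  Integer(text.strip, 10)
rescue ArgumentError
  nil
end

passengers = parse_count("7\n")
drivers    = parse_count("2\n")

# Integer / Integer truncates; convert one operand for a fractional average.
puts passengers / drivers       # prints 3
puts passengers.to_f / drivers  # prints 3.5
```

Returning nil for bad input lets the caller decide whether to ask again or stop, rather than carrying on with a 0.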

Sorting and Balancing Across Multiple Columns

Problem
I have a Hash of data that looks something like this.
{ "GROUP_A" => [22, 440],
"GROUP_B" => [14, 70],
"GROUP_C" => [60, 620],
"GROUP_D" => [174, 40],
"GROUP_E" => [4, 12]
# ...few hundred more
}
GROUP_A has 22 accounts and they are using 440GB of data...and so on. There are a couple hundred of these groups. Some have a lot of accounts but use very little storage and some have only a few users and use A LOT of storage, some are just average.
I have X number of buckets (servers) that I want to put these groups of accounts into, and I want there to be approximately the same number of accounts per bucket and have each bucket also contain approximately the same amount of data. Number of groups is not important, so if a bucket had 1 group of 1000 accounts using 500GB of data and the next bucket had 10 groups of 97 accounts (970 total) using 450GB of data...I'd call it good.
So far I've not come up with an algorithm that will do this. In my mind I'm thinking of something like this perhaps?
PASS 1
Bucket 1: Group with largest data, 60 users.
Bucket 2: Next largest data group, 37 users.
Bucket 3: Next largest data group, 72 users.
Bucket 4: etc....
PASS 2
Bucket 1: Add a group with small amount of data, but more users than average.
# There's probably a ratio I can calculate to figure this out...divide users/data maybe?
Bucket 2: Find a "small data" group where sum of users in Bucket 1 ~= sum of users in Bucket 2
# But then there's no guarantee that the data usages will be close enough
Bucket 3: etc...
PASS 3
Bucket 1: Now what? Back to next largest data group?
I still think there's a better way to figure this out but it's not coming to me. If anyone has any thoughts I'm open to suggestions.
Matt
Solution 1.1 - Brute Force Update
Well....here's an update to the first attempt. This is still not a "knapsack-problem" solution. Just brute forcing the data so the accounts balance across buckets. This time I added some logic so that if a bucket has a higher full percentage of accounts vs. data...it will find the largest group (by data) that fits best based on number of accounts. I get a lot better distribution of data now vs. my first attempt (see the edit history if you want to look at the first attempt).
Right now I load each bucket in sequence, filling bucket one, then bucket two, etc... I think if I was to modify the code so that I filled them simultaneously (or nearly so) I'd get a better data balance.
e.g. 1st department into bucket 1, 2nd department into bucket 2, etc...until all buckets have one department... Then start back with bucket 1 again.
dept_arr_sorted_by_acct = dept_hsh.sort_by {|key, value| value[0]}
ap "MAX ACCTS: #{max_accts} AVG ACCTS: #{avg_accts}"
ap "MAX SIZE: #{max_size} AVG SIZE: #{avg_data}"
# puts dept_arr_sorted_by_acct
# exit
bucket_arr = Array.new
used_hsh = Hash.new
server_names.each do |s|
  bucket_hsh = Hash.new
  this_accts=0
  this_data=0
  my_key=""
  my_val=[]
  accts=0
  data=0
  accts_space_pct_used = 0
  data_space_pct_used = 0
  while this_accts < avg_accts
    if accts_space_pct_used <= data_space_pct_used
      # This loop runs if the % used of accts is less than % used of data
      dept_arr_sorted_by_acct.each do |val|
        # Sorted by num accts - ascending. Loop until we find the last entry in the array that has <= accts than what we need
        next if used_hsh.has_key?(val[0])
        #do nothing
        if val[1][0] <= avg_accts-this_accts
          my_key = val[0]
          my_val = val[1]
          accts = val[1][0]
          data = val[1][1]
        end
      end
    else
      # This loop runs if the % used of data is less than % used of accts
      dept_arr_sorted_by_data = dept_arr_sorted_by_acct.sort { |a,b| b[1][1] <=> a[1][1] }
      dept_arr_sorted_by_data.each do |val|
        # Sorted by size - descending. Find the first (largest data) entry where accts <= what we need
        next if used_hsh.has_key?(val[0])
        # do nothing
        if val[1][0] <= avg_accts-this_accts
          my_key = val[0]
          my_val = val[1]
          accts = val[1][0]
          data = val[1][1]
          break
        end
      end
    end
    used_hsh[my_key] = my_val
    bucket_hsh[my_key] = my_val
    this_accts = this_accts + accts
    this_data = this_data + data
    accts_space_pct_used = this_accts.to_f / avg_accts * 100
    data_space_pct_used = this_data.to_f / avg_data * 100
  end
  bucket_arr << [this_accts, this_data, bucket_hsh]
end
x=0
while x < bucket_arr.size do
  th = bucket_arr[x][2]
  list_of_depts = []
  th.each_key do |key|
    list_of_depts << key
  end
  ap "Bucket #{x}: #{bucket_arr[x][0]} accounts :: #{bucket_arr[x][1]} data :: #{list_of_depts.size} departments"
  #ap list_of_depts
  x = x+1
end
...and the results...
"MAX ACCTS: 2279 AVG ACCTS: 379"
"MAX SIZE: 1693315 AVG SIZE: 282219"
"Bucket 0: 379 accounts :: 251670 data :: 7 departments"
"Bucket 1: 379 accounts :: 286747 data :: 10 departments"
"Bucket 2: 379 accounts :: 278226 data :: 14 departments"
"Bucket 3: 379 accounts :: 281292 data :: 19 departments"
"Bucket 4: 379 accounts :: 293777 data :: 28 departments"
"Bucket 5: 379 accounts :: 298675 data :: 78 departments"
(379 * 6 <> 2279) I still need to figure out how to account for when the MAX_ACCTS are not evenly divisible by the number of buckets. I tried adding a 1% pad to the AVG_ACCTS value, which in this case means the average would be 383 I think, but then all the buckets say they have 383 accounts in them...which can't be true because then there are more accounts in the buckets than MAX_ACCTS. I've got a mistake in the code somewhere that I haven't found yet.
This is an example of the knapsack problem. There are a few solutions, but it's a really tricky problem and it's better to research a good solution than to try and make your own.
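As a starting point while researching, here is a sketch of a simple greedy heuristic. It is not an optimal solver and not the code from the question: it places the largest groups first and always picks the bucket whose worse dimension (share of accounts or share of data) stays smallest after the addition. The sample hash is the one from the question; bucket_count and the bucket layout are assumptions for illustration.

```ruby
groups = {
  "GROUP_A" => [22, 440],
  "GROUP_B" => [14, 70],
  "GROUP_C" => [60, 620],
  "GROUP_D" => [174, 40],
  "GROUP_E" => [4, 12]
}
bucket_count = 2 # hypothetical number of servers

# Totals as floats so the shares below are fractions, not truncated integers.
total_accts = groups.values.sum { |accts, _| accts }.to_f
total_data  = groups.values.sum { |_, data| data }.to_f

buckets = Array.new(bucket_count) { { groups: [], accts: 0, data: 0 } }

# Largest groups first, measured by their combined share of both totals.
ordered = groups.sort_by do |_, (accts, data)|
  -(accts / total_accts + data / total_data)
end

ordered.each do |name, (accts, data)|
  # Pick the bucket whose worse dimension stays smallest after adding the group.
  target = buckets.min_by do |b|
    [(b[:accts] + accts) / total_accts, (b[:data] + data) / total_data].max
  end
  target[:groups] << name
  target[:accts] += accts
  target[:data]  += data
end

buckets.each_with_index do |b, i|
  puts "Bucket #{i}: #{b[:accts]} accounts :: #{b[:data]} data :: #{b[:groups].join(', ')}"
end
```

Because every group is assigned exactly once, the bucket totals always add up to the overall totals, which sidesteps the uneven-division issue mentioned above. With only five groups the balance will be rough; the heuristic has more room to even things out with a few hundred groups.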
