Input: <ArrayOfSMSIncomingMessage xmlns=\"http://sms2.cdyne.com\" xmlns:i=\"http://www.w3.org/2001/XMLSchema-instance\"><SMSIncomingMessage><FromPhoneNumber>19176230250</FromPhoneNumber><IncomingMessageID>cf8ef62d-9169-4908-a527-891fca056475</IncomingMessageID><MatchedMessageID>6838594b-288f-4e9a-863c-3ad9f4d501ca</MatchedMessageID><Message>This is a test</Message><ResponseReceiveDate>2013-04-07T17:19:06.953</ResponseReceiveDate><ToPhoneNumber>13146667368</ToPhoneNumber></SMSIncomingMessage><SMSIncomingMessage><FromPhoneNumber>19176230250</FromPhoneNumber><IncomingMessageID>ebf11b38-c176-439a-a2d0-7a2bb35390df</IncomingMessageID><MatchedMessageID>6838594b-288f-4e9a-863c-3ad9f4d501ca</MatchedMessageID><Message>Does it wotk</Message><ResponseReceiveDate>2013-04-07T17:19:17.303</ResponseReceiveDate><ToPhoneNumber>13146667368</ToPhoneNumber></SMSIncomingMessage></ArrayOfSMSIncomingMessage>
Expected Output: [["191760250", "This is a test", "2013-04-07T17:19:06.953", "13146636 8"],["191760250", "Does it wotk", "2013-04-07T17:19:17.303", "131466368"]]
I am a newbie, and I can't solve this problem or find an answer. The objective is to parse a text. I put the information into an array b and then push array b into array c. However, c[0] ends up equal to c[1] even though they should hold different information. I don't know how to fix this.
data='"<ArrayOfSMSIncomingMessage xmlns=\"http://sms2.cdyne.com\" xmlns:i=\" <FromPhoneNumber>191760250</FromPhoneNumber>'
data=data+'<Message>This is a test</Message><ResponseReceiveDate>2013-04-07T17:19:06.953</ResponseReceiveDate>'
data=data+'<ToPhoneNumber>13146636 8</ToPhoneNumber></SMSIncomingMessage><SMSIncomingMessage><FromPhoneNumber>191760250'
data=data+'</FromPhoneNumber><Message>Does it wotk</Message><ResponseReceiveDate>2013-04-07T17:19:17.303</ResponseRecei'
data=data+'veDate><ToPhoneNumber>131466368</ToPhoneNumber></SMSIncomingMessage></ArrayOfSMSIncomingMessage>'
a=[['<FromPhoneNumber>','</FromPhoneNumber>'],['<Message>','</Message>'],
['<ResponseReceiveDate>','</ResponseReceiveDate>'],['<ToPhoneNumber>','</ToPhoneNumber>']]
b=[]
c=[]
d=true
ii=-1
while data.index(a[0][0])!=nil do
ii+=1
for i in 0..3
print "\ni is #{i} first term: #{a[i][0]} second term #{a[i][1]}\n"
b[i]= data[data.index(a[i][0])+a[i][0].length..data.index(a[i][1])-1]
print "b[i] is #{b[i]}\n"
end
print "b is #{b}\n"
print "c is #{c}\n"
c.push(b)
print "c is #{c}\n"
d=data.slice!(0,data.index('</SMSIncomingMessage>')+5)
print "d is #{d}\n"
print "data is #{data}\n"
end
I really don't understand what your code is trying to accomplish, but regarding the part that isn't working as you expect (c[0] becomes equal to c[1] even though they should hold different information): you are pushing b itself onto c, and what c stores is a reference to that one array, not a copy. Every element of c therefore points at the same object, so when you change b, the contents of c appear to change too.
Change
c.push(b)
to
c.push(b.dup)
if you want what you push onto c to stay the same even after you change b.
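To make the difference concrete, here is a minimal sketch using made-up two-element rows instead of the SMS data. The method names collect_shared and collect_copies are hypothetical and exist only for this illustration.

```ruby
# Pushing the same array object every time: all slots in c alias b.
def collect_shared
  b = []
  c = []
  [%w[a 1], %w[b 2]].each do |row|
    row.each_with_index { |value, i| b[i] = value } # overwrites b in place
    c.push(b)                                       # stores a reference to b
  end
  c
end

# Pushing a shallow copy: each slot in c keeps its own snapshot of b.
def collect_copies
  b = []
  c = []
  [%w[a 1], %w[b 2]].each do |row|
    row.each_with_index { |value, i| b[i] = value }
    c.push(b.dup)
  end
  c
end

p collect_shared  # => [["b", "2"], ["b", "2"]]
p collect_copies  # => [["a", "1"], ["b", "2"]]
```

Note that dup makes a shallow copy. That is enough here because each b[i] is reassigned rather than mutated in place; creating a fresh b = [] at the top of every loop iteration would avoid the problem as well.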
You are parsing XML. Don't waste time trying to manipulate strings, because all you'll do is generate fragile code.
Instead, use a real XML parser, which lets you navigate through the structure, and pick what you want.
First, your XML is malformed. I worked around that by supplying a closing tag, which leaves the XML damaged, but not fatally so.
require 'nokogiri'
xml = '<ArrayOfSMSIncomingMessage xmlns="http://sms2.cdyne.com" xmlns:i="">
<SMSIncomingMessage>
<FromPhoneNumber>191760250</FromPhoneNumber>
<Message>This is a test</Message>
<ResponseReceiveDate>2013-04-07T17:19:06.953</ResponseReceiveDate>
<ToPhoneNumber>131466368</ToPhoneNumber>
</SMSIncomingMessage>
<SMSIncomingMessage>
<FromPhoneNumber>191760250</FromPhoneNumber>
<Message>Does it wotk</Message>
<ResponseReceiveDate>2013-04-07T17:19:17.303</ResponseReceiveDate>
<ToPhoneNumber>131466368</ToPhoneNumber>
</SMSIncomingMessage>
</ArrayOfSMSIncomingMessage>'
doc = Nokogiri::XML(xml)
pp doc.search('SMSIncomingMessage').map{ |incoming_msg|
%w[FromPhoneNumber Message ResponseReceiveDate ToPhoneNumber].map{ |n| incoming_msg.at(n).text }
}
Which outputs:
[["191760250", "This is a test", "2013-04-07T17:19:06.953", "131466368"],
["191760250", "Does it wotk", "2013-04-07T17:19:17.303", "131466368"]]
I've read through tons of questions and solutions to determine whether this was already answered elsewhere, but it seems that none of the things I found were exactly what I was trying to get at.
I have an XML document that has hundreds of entries of text, and each entry also lists a URL. Each URL is a string (within tags), ending with a unique 4-digit number. The XML file is basically formatted like so:
<entry>
[other content]
<id>http://www.URL.com/blahblahblah-1234</id>
[other content]
</entry>
I want to essentially single out only the URLs that have a particular number at the end, out of a list of numbers. I put all of the numbers in an array, with the values set as strings ( numbers = ["1234", "8649", etc.]). I've been using nokogiri for other parts of my script, and when I am only looking for a particular string, I just use include?, which works perfectly. However, I'm not sure how to automate this when I have hundreds of strings within the "numbers" array. This is essentially what I logistically need to happen:
id = nokodoc.css("id")
id.each { |id|
hyperlink = id.text
if hyperlink.include?(numbers)
puts "yes!"
else
puts "no :("
end
}
Obviously this doesn't work, because include? expects a string, whereas I'm passing an entire array. (For instance, if I do include?(numbers[0]), it works.) I've tried this with any? but it doesn't seem to work in this case.
Is there a Ruby method that I'm not aware of, that can tell me whether any of the values within an array is present in any of the nodes that I'm looping through? Let me know if any of this needs to be clarified—phrasing the proper question is often the hardest part!
Edit: As a sidenote, ultimately I'd like to remove all entries that correspond to any links that do not end with one of the numbers in the array, i.e.
if hyperlink.include? (any number from the array)
puts "this one is good"
else
id.parent.remove
So I would somehow need the final product to remain parsable with nokogiri.
Thank you so much in advance, for any and all insight!
You can do this:
numbers = ['1234', '8649', ..]
urls = nokodoc.css('id').map(&:text)
urls = urls.select { |url| numbers.any? { |n| url.include? n } }
This works, but it is not efficient: every URL is checked against every number. If you know the pattern, extract the number and then check whether it is in the array. For example, if it is always the last 4 digits:
numbers = ['1234', '8649', ..]
urls = nokodoc.css('id').map(&:text)
urls = urls.select { |url| numbers.include? url[-4..-1] }
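With hundreds of numbers, one possible refinement is to put them in a Set and pull the trailing digits out with a regular expression. The sketch below works on plain strings so it runs without nokogiri; the helper name select_wanted and the sample URLs are assumptions made for illustration.

```ruby
require 'set'

# Hypothetical helper: keep only URLs whose trailing digits are in `wanted`.
def select_wanted(urls, wanted)
  lookup = wanted.to_set
  urls.select do |url|
    suffix = url[/\d+\z/]      # trailing run of digits, or nil if there is none
    lookup.include?(suffix)
  end
end

urls = [
  'http://www.URL.com/blahblahblah-1234',
  'http://www.URL.com/blahblahblah-5555',
  'http://www.URL.com/blahblahblah-8649'
]
p select_wanted(urls, %w[1234 8649])
# => ["http://www.URL.com/blahblahblah-1234", "http://www.URL.com/blahblahblah-8649"]
```

Matching only the end of the string also avoids a false positive that include? allows, where one of the numbers happens to appear somewhere in the middle of a URL.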
UPDATE
For the change in the question:
numbers = ['1234', '8649', ..]
nodes = nokodoc.css('id')
nodes.each do |node|
url = node.text
if numbers.any? { |n| url.include? n }
puts 'this one is good'
else
node.parent.remove
end
end
For a project that I am working on for school, one of the parts of the project asks us to take a collection of all the Federalist papers and run it through a program that essentially splits up the text and writes new files (per different Federalist paper).
The logic I decided to go with is to run a search, and every time the search is positive for "Federalist No." it would save into a new file everything until the next "Federalist No".
This is the algorithm that I have so far:
file_name = "Federalist"
section_number = "1"
new_text = File.open(file_name + section_number, 'w')
i = 0
n= 1
while i < l.length
if (l[i]!= "federalist") and (l[i+1]!= "No")
new_text.puts l[i]
i = i + i
else
new_text.close
section_number = (section_number.to_i +1).to_s
new_text = File.open(file_name + section_number, "w")
new_text.puts(l[i])
new_text.puts(l[i+1])
i=i+2
end
end
After debugging the code as much as I could (I am a beginner at Ruby), the problem that I run into now is that because the while function always holds true, it never proceeds to the else command.
In terms of going about this in a different way, my TA suggested the following:
Put the entire text in one string by looping through the array(l) and adding each line to the one big string each time.
Split the string using the split method and the key word "FEDERALIST No." This will create an array with each element being one section of the text:
arrayName = bigString.split("FEDERALIST No.")
You can then loop through this new array to create files for each element using a similar method you use in your program.
But as simple as it may sound, I'm having an extremely difficult time putting even that code together.
i = i + i
i starts at 0, so i + i adds 0 to 0 and gives 0 again. i never advances, which means i < l.length stays true forever (for any non-empty l) and the loop never ends. The line was presumably meant to be i = i + 1.
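A small sketch of why that one line matters. The helper steps_to_finish is hypothetical; it counts loop iterations for a given update rule and gives up after a limit so the broken rule cannot hang.

```ruby
# Count how many iterations an update rule needs to walk a list of the given
# length, stopping after `limit` iterations as a safety net.
def steps_to_finish(length, limit: 20)
  i = 0
  steps = 0
  while i < length && steps < limit
    i = yield(i)
    steps += 1
  end
  steps
end

p steps_to_finish(5) { |i| i + i }  # => 20 (hits the limit: 0 + 0 is still 0)
p steps_to_finish(5) { |i| i + 1 }  # => 5
```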
Since this is a school assignment, I hesitate to give you a straight-up answer. That's really not what SO is for, and I'm glad that you haven't solicited a full solution either.
So I'll direct you to some useful methods in Ruby instead that could help.
In Array: .join, .each or .map
In String: .split
FYI, your TA's suggestion is far simpler than the algorithm you've decided to embark on, although technically yours is not wrong, merely more complex.
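To show how those methods fit together without doing the assignment, here is a toy sketch on a four-line array. The sample lines and the helper name split_sections are made up for illustration, and writing the files is left out on purpose.

```ruby
# Join the lines into one string, cut it at every marker, and drop the empty
# piece that split leaves in front of the first marker.
def split_sections(lines, marker)
  lines.join.split(marker).reject(&:empty?)
end

lines = [
  "FEDERALIST No. 1\n", "General Introduction\n",
  "FEDERALIST No. 2\n", "Concerning Dangers\n"
]

split_sections(lines, "FEDERALIST No.").each_with_index do |text, index|
  puts "section #{index + 1}: #{text.strip.inspect}"
end
```

Keep in mind that split removes the marker itself from every piece, so it has to be written back if the output files should still start with it.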
Consider the following Ruby code:
a = ["x"] * 3 # or a = Array.new(3, "x")
a[0].insert(0, "a")
a.each {|i| puts i}
I would expect the output to be ax, x, x (on new lines of course). However, with Ruby 1.9.1 the output is ax, ax, ax. What's going on? I've narrowed the problem down to the way the array a is defined. If I explicitly write out
a = ["x", "x", "x"]
then the code works as expected, but either version in the original code gives me this unexpected behaviour. It appears that the */initializer means the copies are actually references to the same copy of the string "x". However, if instead of the insert command I write
a[0] = "a" + a[0]
Then I get the desired output. Is this a bug, or is there some feature at work which I'm not understanding?
The documentation to Array.new(size=0, obj=nil):
... it is created with size copies of obj (that is, size references to the same obj).
and Array * int:
... returns a new array built by concatenating the int copies of self
So in both of the forms you're surprised by, you end up with three references to the same "x" object, just as you figured out. I'd say you might argue about the design decision, but it's a documented intentional behavior, not a bug.
The best way I know to get the behavior you want without manually writing the array literal (["x", "x", "x"]) is
a = Array.new(3) {"x"}
Of course, with just three elements, it doesn't much matter, but with anything much bigger, this form comes in handy.
In short, although "x" is just a literal, it is an object. With ["x"] * 3, a contains three references to the same object, so when you insert "a" into one of them, all three appear to change.
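The sharing can be checked directly by counting distinct object ids. This is a small sketch; the helper distinct_objects is a made-up name, and String.new("x") is used instead of a bare literal only so the example also runs when string literals are frozen.

```ruby
# Number of different objects an array actually references.
def distinct_objects(array)
  array.map(&:object_id).uniq.size
end

shared   = [String.new("x")] * 3               # three references to one string
separate = Array.new(3) { String.new("x") }    # block runs once per slot

p distinct_objects(shared)    # => 1
p distinct_objects(separate)  # => 3

shared[0].insert(0, "a")      # mutates the single shared string
separate[0].insert(0, "a")    # mutates only the first of three strings
p shared    # => ["ax", "ax", "ax"]
p separate  # => ["ax", "x", "x"]
```

This also explains why a[0] = "a" + a[0] gives the expected result: it builds a new string and stores it in slot 0, leaving the shared object in the other slots untouched.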
I believe that I may be missing something here, so please bear with me as I explain two scenarios in hopes to reconcile my misunderstanding:
My end goal is to create a dataset that's acceptable by Highcharts via lazy_high_charts, however in this quest, I'm finding that it is rather particular about the format of data that it receives.
A) I have found that when data is formatted like this going into it, it draws the points just fine:
[0.0000001240,0.0000000267,0.0000000722, ..., 0.0000000512]
I'm able to generate an array like this simply with:
array = Array.new
data.each do |row|
array.push row[:datapoint1].to_f
end
B) Yet if I attempt to use the map function, I end up with a result like this, and Highcharts fails to render the data:
[[6.67e-09],[4.39e-09],[2.1e-09],[2.52e-09], ..., [3.79e-09]]
From code like:
array = data.map{|row| [(row.datapoint1.to_f)] }
Is there a way to coax the map function into producing results in B that are more akin to the data structure from scenario A?
This gets more involved, as I also have to add datetime into this, but that's another topic; I just want to understand this first, along with what can be done to better control where I'm going.
Ultimately, EVEN SCENARIO B SHOULD WORK according to the data in the example here: http://www.highcharts.com/demo/spline-irregular-time (press the "View options" button at bottom)
Heck, I'll send you a sucker in the mail if you can fill me in on that part! ;)
You can fix arrays like this
[[6.67e-09],[4.39e-09],[2.1e-09],[2.52e-09], ..., [3.79e-09]]
that have nested arrays inside them by using the flatten method on the array.
But you should be able to avoid generating nested arrays in the first place. Just remove the square brackets from your map line:
array = data.map{|row| row.datapoint1.to_f }
Code
a = [[6.67e-09],[4.39e-09],[2.1e-09],[2.52e-09], [3.79e-09]]
b = a.flatten.map{|el| "%.10f" % el }
puts b.inspect
Output
["0.0000000067", "0.0000000044", "0.0000000021", "0.0000000025", "0.0000000038"]
Unless I, too, am missing something, your problem is that you're returning a single-element array from your block (thereby creating an array of arrays) instead of just the value. This should do you:
array = data.map {|row| row.datapoint1.to_f }
# => [ 6.67e-09, 4.39e-09, 2.1e-09, 2.52e-09, ..., 3.79e-09 ]
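A side-by-side sketch of the two block shapes. The rows here are hypothetical hashes standing in for the real records, which may expose datapoint1 as a method instead of a hash key; the method names are made up as well.

```ruby
# The block returns a bare float, so map yields a flat array.
def flat_series(rows)
  rows.map { |row| row[:datapoint1].to_f }
end

# The block returns a one-element array, so map yields an array of arrays.
def nested_series(rows)
  rows.map { |row| [row[:datapoint1].to_f] }
end

rows = [{ datapoint1: "6.67e-09" }, { datapoint1: "4.39e-09" }]

p flat_series(rows)                                  # => [6.67e-09, 4.39e-09]
p nested_series(rows)                                # => [[6.67e-09], [4.39e-09]]
p nested_series(rows).flatten == flat_series(rows)   # => true
```

map collects whatever the block returns, so the shape of the result is decided entirely by the last expression in the block.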
As far as I understand, matrices are very inflexible to work with. Therefore, I'm trying to get an array of vectors to deal with. My needs are to be able to add vectors and perform arithmetic operations on their components. When I write the code below,
require 'matrix'
x = Matrix.rows( IO.readlines("input.txt").each {|line| line.split} )
puts x.row_vectors
ruby falls into an exception. Why?
matrix.rb:1265:in `to_s': undefined method `join' for "1.2357 2.1742 -5.4834 -2.0735":String (NoMethodError)
OK then, I've calmed down and tried another approach. I wrote:
a = Array.[]( IO.readlines("input.txt").each {|line| Vector.[](line.split) } )
But the only way I can access my vectors inside the array is by addressing a second index:
puts a[0][0]
This means that when I want to access a particular scalar inside a vector, I will be forced to use a third index, like:
puts a[0][0][1]
So the second question is: where the hell does that second index come from, and how do I get rid of it? Am I missing something when reading the data into the array?
I can't reproduce your first problem. Extracting what looks like input.txt, I can execute that first expression without an exception.
As to the second question, your expression seems kind of complex. How about:
b = IO.readlines("input.txt").map { |x| x.split(' ') }
This will get you a "2D" array of arrays, and you will need only two subscripts. (As to where the extra index came from: Array.[] wraps the array returned by IO.readlines in another array, and because each returns its receiver rather than the block's results, the Vector objects are discarded, so the last subscript ends up indexing into a line's string.)
Or maybe:
result = []
IO.foreach('input.txt') { |ln| result << ln.split(' ') }
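One likely explanation for both symptoms, offered as a reading of the posted code rather than something confirmed against the original input.txt: each returns the array it was called on and throws the block's results away, whereas map returns the results. The sketch below uses an in-memory array of lines instead of the file; the first line comes from the error message and the second is made up.

```ruby
# each: the split results are discarded and the original lines come back.
def parse_with_each(lines)
  lines.each { |line| line.split }
end

# map: each line becomes an array of floats.
def parse_with_map(lines)
  lines.map { |line| line.split.map(&:to_f) }
end

lines = ["1.2357 2.1742 -5.4834 -2.0735\n", "0.5 1.5 2.5 3.5\n"]

p parse_with_each(lines).equal?(lines)  # => true (same array of strings)
p parse_with_map(lines)
# => [[1.2357, 2.1742, -5.4834, -2.0735], [0.5, 1.5, 2.5, 3.5]]
```

That would account for the error message showing a String where a row was expected. Rows in the shape produced by parse_with_map can be passed to Matrix.rows, or turned into vectors one at a time with Vector.elements.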