ruby string splitting problem - ruby

I have this string:
"asdasda=asdaskdmasd&asmda=asdasmda&ACK=Success&asdmas=asdakmsd&asmda=adasda"
I want to get the value between ACK= and the following & symbol. That value can change.
Thanks.
I want the solution in Ruby.

require "cgi"
query_string = "asdasda=asdaskdmasd&asmda=asdasmda&ACK=Success&asmda=asdakmsd"
parsed_query_string = CGI.parse(query_string)
#=> { "asdasda" => ["asdaskdmasd"],
# "asmda" => ["asdasmda", "asdakmsd"],
# "ACK" => ["Success"] }
parsed_query_string["ACK"].first
#=> "Success"
If you also want to reconstruct the query string (especially together with the rest of a URL), I would recommend looking into the addressable gem.
require "addressable/uri"
# Note the leading '?'
query_string = "?asdasda=asdaskdmasd&asmda=asdasmda&ACK=Success&asmda=asdakmsd"
parsed_uri = Addressable::URI.parse(query_string)
parsed_uri.query_values["ACK"]
#=> "Success"
parsed_uri.query_values = parsed_uri.query_values.merge("ACK" => "Changed")
parsed_uri.to_s
#=> "?ACK=Changed&asdasda=asdaskdmasd&asmda=asdakmsd"
# Note how the order has changed and the duplicate key has been removed due to
# Addressable's built-in normalisation.

"asdasda=asdaskdmasd&asmda=asdasmda&ACK=Success&asdmas=asdakmsd&asmda=adasda"[/ACK=([^&]*)&/]
$1 # => 'Success'

A quick approach:
s = "asdasda=asdaskdmasd&asmda=asdasmda&ACK=Success&asdmas=asdakmsd&asmda=adasda"
s.gsub(/ACK[=\w]+&/,"ACK[changedValue]&")
#=> asdasda=asdaskdmasd&asmda=asdasmda&ACK[changedValue]&asdmas=asdakmsd&asmda=adasda

s = "asdasda=asdaskdmasd&asmda=asdasmda&ACK=Success&asdmas=asdakmsd&asmda=adasda"
m = s.match(/ACK=(.*?)&/)
puts m[1]
and just for fun without regexp:
Hash[s.split("&").map{|p| p.split("=")}]["ACK"]
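For completeness, the standard library's URI.decode_www_form also splits a query string into key/value pairs without a hand-written regexp. This is only a minimal sketch, assuming the string is a well-formed form-encoded query; the helper name query_value is made up for illustration:

```ruby
require "uri"

# Hypothetical helper: returns the first value stored under key, or nil.
def query_value(query_string, key)
  # decode_www_form returns an array of [key, value] pairs, in order.
  pair = URI.decode_www_form(query_string).assoc(key)
  pair && pair.last
end

s = "asdasda=asdaskdmasd&asmda=asdasmda&ACK=Success&asdmas=asdakmsd&asmda=adasda"
puts query_value(s, "ACK") # prints Success
```

Unlike the regexp versions, this does not depend on ACK being followed by an &, so it should also work when ACK is the last parameter.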

Related

How to read multiple XML files then output to multiple CSV files with the same XML filenames

I am trying to parse multiple XML files then output them into CSV files to list out the proper rows and columns.
I was able to do so by processing one file at a time by defining the filename, and specifically output them into a defined output file name:
File.open('H:/output/xmloutput.csv','w')
I would like to write into multiple files and make their name the same as the XML filenames without hard coding it. I tried doing it multiple ways but have had no luck so far.
Sample XML:
<?xml version="1.0" encoding="UTF-8"?>
<record:root>
<record:Dataload_Request>
<record:name>Bob Chuck</record:name>
<record:Address_Data>
<record:Street_Address>123 Main St</record:Street_Address>
<record:Postal_Code>12345</record:Postal_Code>
</record:Address_Data>
<record:Age>45</record:Age>
</record:Dataload_Request>
</record:root>
Here is what I've tried:
require 'nokogiri'
require 'set'
files = ''
input_folder = "H:/input"
output_folder = "H:/output"
if input_folder[input_folder.length-1,1] == '/'
input_folder = input_folder[0,input_folder.length-1]
end
if output_folder[output_folder.length-1,1] != '/'
output_folder = output_folder + '/'
end
files = Dir[input_folder + '/*.xml'].sort_by{ |f| File.mtime(f)}
file = File.read(input_folder + '/' + files)
doc = Nokogiri::XML(file)
record = {} # hashes
keys = Set.new
records = [] # array
csv = ""
doc.traverse do |node|
value = node.text.gsub(/\n +/, '')
if node.name != "text" # skip these nodes: if class isnt text then skip
if value.length > 0 # skip empty nodes
key = node.name.gsub(/wd:/,'').to_sym
if key == :Dataload_Request && !record.empty?
records << record
record = {}
elsif key[/^root$|^document$/]
# neglect these keys
else
key = node.name.gsub(/wd:/,'').to_sym
# in case our value is html instead of text
record[key] = Nokogiri::HTML.parse(value).text
# add to our key set only if not already in the set
keys << key
end
end
end
end
# build our csv
File.open('H:/output/.*csv', 'w') do |file|
file.puts %Q{"#{keys.to_a.join('","')}"}
records.each do |record|
keys.each do |key|
file.write %Q{"#{record[key]}",}
end
file.write "\n"
end
print ''
print 'output files ready!'
print ''
end
I have been getting 'read memory': no implicit conversion of Array into String (TypeError) and other errors.
Here's a quick peer-review of your code, something like you'd get in a corporate environment...
Instead of writing:
input_folder = "H:/input"
input_folder[input_folder.length-1,1] == '/' # => false
Consider doing it using the -1 offset from the end of the string to access the character:
input_folder[-1] # => "t"
That simplifies your logic making it more readable because it's lacking unnecessary visual noise:
input_folder[-1] == '/' # => false
See [] and []= in the String documentation.
This looks like a bug to me:
files = Dir[input_folder + '/*.xml'].sort_by{ |f| File.mtime(f)}
file = File.read(input_folder + '/' + files)
files is an array of filenames. input_folder + '/' + files is appending an array to a string:
foo = ['1', '2'] # => ["1", "2"]
'/parent/' + foo # =>
# ~> -:9:in `+': no implicit conversion of Array into String (TypeError)
# ~> from -:9:in `<main>'
How you want to deal with that is left as an exercise for the programmer.
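One possible way to deal with it, as a rough sketch: loop over the array and derive each output name from the input name with File.basename. The helper csv_path_for and the temporary folders are assumptions made for this example so it can run anywhere; the real code would use H:/input and H:/output and do the XML parsing inside the loop.

```ruby
require 'tmpdir'
require 'fileutils'

# Hypothetical helper: maps an input XML path to a CSV path in output_folder,
# keeping the same base filename.
def csv_path_for(xml_path, output_folder)
  File.join(output_folder, File.basename(xml_path, '.xml') + '.csv')
end

Dir.mktmpdir do |dir|
  input_folder  = File.join(dir, 'input')
  output_folder = File.join(dir, 'output')
  FileUtils.mkdir_p([input_folder, output_folder])

  # Two placeholder input files standing in for the real XML documents.
  %w[a.xml b.xml].each { |name| File.write(File.join(input_folder, name), '<root/>') }

  Dir.glob(File.join(input_folder, '*.xml')).sort.each do |xml_path|
    contents = File.read(xml_path) # read one file at a time, not the whole array
    # Placeholder output; the parsed rows would be written here instead.
    File.write(csv_path_for(xml_path, output_folder), "bytes\n#{contents.bytesize}\n")
  end

  puts Dir.children(output_folder).sort.inspect
end
```

File.join also takes care of the trailing-slash bookkeeping that the original code does by hand.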
doc.traverse do |node|
is icky because it sidesteps the power of Nokogiri being able to search for a particular tag using accessors. Very rarely do we need to iterate over a document tag by tag, usually only when we're peeking at its structure and layout. traverse is slower so use it as a very last resort.
length is nice but isn't needed when checking whether a string has content:
value = 'foo'
value.length > 0 # => true
value > '' # => true
value = ''
value.length > 0 # => false
value > '' # => false
Programmers coming from Java like to use the accessors but I like being lazy, probably because of my C and Perl backgrounds.
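Another option, not mentioned above, is String#empty?, which states the intent directly. A small sketch; the helper name has_content? is invented for this example, and stripping whitespace first is an assumption about what "has content" should mean here:

```ruby
# Hypothetical helper: true when the string has something other than whitespace.
def has_content?(value)
  !value.strip.empty?
end

['foo', '', "  \n"].each do |value|
  puts "#{value.inspect}: empty? #{value.empty?}, has_content? #{has_content?(value)}"
end
```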
Be careful with sub and gsub, as they may not do what you expect. Both accept a regular expression, but they will also take a string, which they escape and match literally before beginning their scan.
You're passing in a regular expression, which is OK in this case, but it could cause unexpected problems if you don't remember all the rules for pattern matching and that gsub scans until the end of the string:
foo = 'wd:barwd:' # => "wd:barwd:"
key = foo.gsub(/wd:/,'') # => "bar"
In general I recommend people think a couple times before using regular expressions. I've seen some gaping holes opened up in logic written by fairly advanced programmers because they didn't know what the engine was going to do. They're wonderfully powerful, but need to be used surgically, not as a universal solution.
The same thing happens with a string, because gsub doesn't know when to quit:
key = foo.gsub('wd:','') # => "bar"
So, if you're looking to change just the first instance use sub:
key = foo.sub('wd:','') # => "barwd:"
I'd do it a little differently though.
foo = 'wd:bar'
I can check to see what the first three characters are:
foo[0,3] # => "wd:"
Or I can replace them with something else using string indexing:
foo[0,3] = ''
foo # => "bar"
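If you're on Ruby 2.5 or newer, String#delete_prefix does the same job without counting characters, and String#start_with? covers the check. A short sketch of both:

```ruby
foo = 'wd:barwd:'

# Removes the prefix only when the string actually starts with it.
stripped = foo.delete_prefix('wd:')

puts foo.start_with?('wd:') # prints true
puts stripped               # prints barwd:
puts foo                    # prints wd:barwd: (the original is untouched)
```

Unlike foo[0,3] = '', this returns a new string and leaves foo alone; delete_prefix! is the in-place variant.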
There's more but I think that's enough for now.
You should use Ruby's CSV class. Also, you don't need to do any string matching or regex stuff. Use Nokogiri to target elements. If you know the node names in the XML will be consistent it should be pretty simple. I'm not exactly sure if this is the output you want, but this should get you in the right direction:
require 'nokogiri'
require 'csv'
def xml_to_csv(filename)
xml_str = File.read(filename)
xml_str.gsub!('record:','') # remove the record: namespace
doc = Nokogiri::XML xml_str
csv_filename = filename.gsub('.xml', '.csv')
CSV.open(csv_filename, 'wb' ) do |row|
row << ['name', 'street_address', 'postal_code', 'age']
row << [
doc.xpath('//name').text,
doc.xpath('//Street_Address').text,
doc.xpath('//Postal_Code').text,
doc.xpath('//Age').text,
]
end
end
# iterate over all xml files
Dir.glob('*.xml').each { |filename| xml_to_csv(filename) }

Extract url params in ruby

I would like to extract parameters from url. I have following path pattern:
pattern = "/foo/:foo_id/bar/:bar_id"
And example url:
url = "/foo/1/bar/2"
I would like to get {foo_id: 1, bar_id: 2}. I tried to convert pattern into something like this:
"\/foo\/(?<foo_id>.*)\/bar\/(?<bar_id>.*)"
I failed at the first step, when I tried to escape the slashes in the pattern:
formatted = pattern.gsub("/", "\/")
Do you know how to fix this gsub? Maybe you know better solution to do this.
EDIT:
It is plain Ruby. I am not using RoR.
As I said above, you only need to escape slashes in a Regexp literal, e.g. /foo\/bar/. When defining a Regexp from a string it's not necessary: Regexp.new("foo/bar") produces the same Regexp as /foo\/bar/.
As to your larger problem, here's how I'd solve it, which I'm guessing is pretty much how you'd been planning to solve it:
PATTERN_PART_MATCH = /:(\w+)/
PATTERN_PART_REPLACE = '(?<\1>.+?)'
def pattern_to_regexp(pattern)
expr = Regexp.escape(pattern) # just in case
.gsub(PATTERN_PART_MATCH, PATTERN_PART_REPLACE)
Regexp.new(expr)
end
pattern = "/foo/:foo_id/bar/:bar_id"
expr = pattern_to_regexp(pattern)
# => /\/foo\/(?<foo_id>.+?)\/bar\/(?<bar_id>.+?)/
str = "/foo/1/bar/2"
expr.match(str)
# => #<MatchData "/foo/1/bar/2" foo_id:"1" bar_id:"2">
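One thing to watch: because the expression is unanchored and the last group is lazy, a final segment such as "23" would be captured as just "2". Here's a variation as a sketch only, which anchors the expression, limits each parameter to one path segment, and returns the symbol-keyed hash the question asks for. The name extract_params is made up for this example, and the values stay strings rather than integers.

```ruby
# Variation on pattern_to_regexp: anchored, one path segment per parameter.
def pattern_to_regexp(pattern)
  source = Regexp.escape(pattern).gsub(/:(\w+)/, '(?<\1>[^/]+)')
  Regexp.new("\\A#{source}\\z")
end

# Hypothetical helper: returns a hash of params, or nil when the url doesn't match.
def extract_params(pattern, url)
  match = pattern_to_regexp(pattern).match(url)
  return nil unless match

  # named_captures gives string keys; convert them to symbols.
  match.named_captures.transform_keys(&:to_sym)
end

params = extract_params("/foo/:foo_id/bar/:bar_id", "/foo/1/bar/23")
puts params[:foo_id] # prints 1
puts params[:bar_id] # prints 23
```

This needs Ruby 2.5 or newer for Hash#transform_keys.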
Try this:
regex = /\/foo\/(?<foo_id>.*)\/bar\/(?<bar_id>.*)/i
matches = "/foo/1/bar/2".match(regex)
Hash[matches.names.zip(matches[1..-1])]
IRB output:
2.3.1 :032 > regex = /\/foo\/(?<foo_id>.*)\/bar\/(?<bar_id>.*)/i
=> /\/foo\/(?<foo_id>.*)\/bar\/(?<bar_id>.*)/i
2.3.1 :033 > matches = "/foo/1/bar/2".match(regex)
=> #<MatchData "/foo/1/bar/2" foo_id:"1" bar_id:"2">
2.3.1 :034 > Hash[matches.names.zip(matches[1..-1])]
=> {"foo_id"=>"1", "bar_id"=>"2"}
I'd advise reading this article on how Rack parses query params. The above works for the example you gave, but it does not extend to other params.
http://codefol.io/posts/How-Does-Rack-Parse-Query-Params-With-parse-nested-query
This might help you, the foo id and bar id will be dynamic.
require 'json'
#url to scan
url = "/foo/1/bar/2"
#scanning ids from url (\d+ keeps multi-digit ids together)
id = url.scan(/\d+/)
#gsub method to replacing values from url
url_with_id = url.gsub(url, "{foo_id: #{id[0]}, bar_id: #{id[1]}}")
#output
=> "{foo_id: 1, bar_id: 2}"
If you want to change string to hash
url_hash = eval(url_with_id)
=>{:foo_id=>1, :bar_id=>2}
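The same hash can be built without eval, which is worth avoiding on anything derived from a URL. A minimal sketch, assuming (as this answer does) that every id is numeric and that the ids appear in the same order as the names in the pattern:

```ruby
pattern = "/foo/:foo_id/bar/:bar_id"
url     = "/foo/1/bar/2"

# Parameter names come from the pattern, values from the digits in the url.
keys   = pattern.scan(/:(\w+)/).flatten.map(&:to_sym)
values = url.scan(/\d+/).map(&:to_i)

params = keys.zip(values).to_h
p params
```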

ngram a database file in Ruby

I am trying to ngram my database file. It works when I ngram a parsed string, but I do not know how to do the same for my database file.
I have the following code so far:
(hopefully I am on the right track)
require 'ngram'
require 'sqlite3'
ngram = NGram.new({
:size => 2,
:word_separator => " ",
:padchar => "_"
})
p ngram.parse('test')
# => ["__", "_t", "te", "es", "st", "t_", "__"]
p ngram.parse('test phrase')
db = SQLite3::Database.new("sample.db") #opens db
#ngram sample.db
Help is very much appreciated!
From the github code of ngram gem's parse method:
def parse(phrase)
words = phrase.split(@separator)
if words.length == 1
process(phrase)
else
words.map { |w| process(w) }
end
end
So, it's expecting a string object so that it can call String#split on it. That's why it works with your first example where you pass a string as an argument to the ngram.parse method.
I am not exactly sure what you want to accomplish here, but as long as you pass a string to the ngram.parse method, it would work. Or, at least, pass an argument that responds to the split method.
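To make that concrete, here's a rough sketch of feeding database rows to a parser one string at a time. Everything in it is a stand-in: rows is hard-coded in the shape I'd assume a query returns (an array of rows, each an array of column values), and char_ngrams is a small pure-Ruby substitute written for this example, not the gem's implementation. With the real objects you'd pass each text value to ngram.parse instead.

```ruby
# Stand-in for the gem's parser: pads the word and slides a window over it.
def char_ngrams(word, size: 2, padchar: "_")
  padding = padchar * size
  (padding + word + padding).chars.each_cons(size).map(&:join)
end

# Stand-in for rows fetched from the database (one text column per row).
rows = [["test"], ["test phrase"], [nil]]

ngrams_per_row = rows.map do |(text)|
  # to_s guards against NULL columns, which come back as nil.
  text.to_s.split(" ").map { |word| char_ngrams(word) }
end

p ngrams_per_row.first
```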

How can I parse out elements in a "< tag >"?

I have a string:
string = "<RECALL>first_name</RECALL>, I'd like to send you something. It'll help you learn more about both me and yourself. What is your email?"
I want to pull out the value "first_name" of the tag <RECALL>.
I used gem crack, but it doesn't behave as I expected:
parsed = Crack::XML.parse(string) =>
{"RECALL"=>"first_name, I'd like to send you something. It'll help you learn more about both me and yourself. What is your email?"}
Maybe XML parsing isn't the right way. What is the way so that I could get the following, desired behavior, instead?
{"RECALL"=>"first_name"}
That does not look like valid XML to me. I would just use a regexp here:
string = "<RECALL>first_name</RECALL>, I'd like to send you something..."
/<RECALL>(.*)<\/RECALL>/.match(string)[1]
#=> "first_name"
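If you want the hash form from the question rather than just the value, String#scan with a backreference can collect tag/content pairs. A sketch, assuming simple tags with no attributes and no nesting:

```ruby
string = "<RECALL>first_name</RECALL>, I'd like to send you something. What is your email?"

# \1 makes the closing tag match whatever name the opening tag captured.
tags = string.scan(%r{<(\w+)>([^<]*)</\1>}).to_h

puts tags["RECALL"] # prints first_name
```

Because the content group is [^<]* rather than .*, it stops at the first closing tag, so several tags in one string are picked up separately.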
Here's two ways you could get the content of the tags:
string = "<RECALL>first_name</RECALL>"
firstname = string[/<RECALL>([^<]+)</, 1]
firstname # => "first_name"
Parsing strings containing tags gets tricky. It's doable for simple content, but once tags are nested or additional < or > show up, it gets a lot harder.
You can use a trick using an XML parser:
require 'nokogiri'
string = "foo <RECALL>first_name</RECALL> bar"
doc = Nokogiri::XML::DocumentFragment.parse(string)
doc.at('RECALL').text # => "first_name"
Note that I'm using Nokogiri::XML::DocumentFragment.parse. That tells Nokogiri to only expect a partial XML document and relaxes a lot of its normally strict XML rules. Then I can tell the parser to find the <RECALL> tag and grab its contained text.
...wondering if there's a way to extract it (I use Crack to extract it, but it only works if the <tag> is at the end of the string.
This pattern matches mid-string:
str = "foo <RECALL>first_name</RECALL> bar"
str[%r!<RECALL>([^<]+)</RECALL>!, 1] # => "first_name"
This pattern fails if the tag is not at the end of the string:
str[%r!<RECALL>([^<]+)</RECALL>\z!, 1] # => nil
And succeeds if it is at the end of the string:
str = "foo <RECALL>first_name</RECALL>"
str[%r!<RECALL>([^<]+)</RECALL>\z!, 1] # => "first_name"
This is one place where a regexp pattern makes it easier to do something than using a parser.
Using a parser:
require 'nokogiri'
Normally we don't care where a tag occurs in a DOM, but if it's important we can figure out where it is in relation to the other tags. It won't always be this straightforward though:
This returns nil if the tag isn't at the end of the string/DOM:
str = "foo <RECALL>first_name</RECALL> bar"
doc = Nokogiri::XML::DocumentFragment.parse(str)
recall_node = doc.at('RECALL')
recall_node == doc.children.last ? doc.at('RECALL').text : nil # => nil
This returns the text of the node because it is at the end of the DOM:
str = "foo <RECALL>first_name</RECALL>"
doc = Nokogiri::XML::DocumentFragment.parse(str)
recall_node = doc.at('RECALL')
recall_node == doc.children.last ? doc.at('RECALL').text : nil # => "first_name"
This works because every node in a document has an identifier and we can ask whether the node of interest matches the last node in the DOM:
require 'nokogiri'
doc = Nokogiri::XML::DocumentFragment.parse("<node>first_name</node> text")
# => #(DocumentFragment:0x3ffc89c3d3e8 {
# name = "#document-fragment",
# children = [
# #(Element:0x3ffc89c3cf9c {
# name = "node",
# children = [ #(Text "first_name")]
# }),
# #(Text " text")]
# })
doc.at('node').object_id.to_s(16) # => "3ffc89c3cf9c"
doc.children.last.object_id.to_s(16) # => "3ffc89c3cec0"
doc = Nokogiri::XML::DocumentFragment.parse("<node>first_name</node>")
# => #(DocumentFragment:0x3ffc89c345cc {
# name = "#document-fragment",
# children = [
# #(Element:0x3ffc89c342c0 {
# name = "node",
# children = [ #(Text "first_name")]
# })]
# })
doc.at('node').object_id.to_s(16) # => "3ffc89c342c0"
doc.children.last.object_id.to_s(16) # => "3ffc89c342c0"

In Ruby, how do I replace the question mark character in a string?

In Ruby, I have:
require 'uri'
foo = "et tu, brutus?"
bar = URI.encode(foo) # => "et%20tu,%20brutus?"
I'm trying to get bar to equal "et%20tu,%20brutus%3f" ("?" replaced with "%3F"). When I try to add this:
bar["?"] = "%3f"
the "?" matches everything, and I get
=> "%3f"
I've tried
bar["\?"]
bar['?']
bar["/[?]"]
bar["/[\?]"]
And a few other things, none of which work.
require 'cgi' and call CGI.escape
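For reference, this is roughly what that looks like. Note that CGI.escape follows form encoding, so spaces become + and the comma is encoded too; ERB::Util.url_encode (also in the standard library, and my addition here rather than part of the suggestion above) uses %20 for spaces instead.

```ruby
require 'cgi'
require 'erb'

foo = "et tu, brutus?"

form_encoded    = CGI.escape(foo)           # spaces become "+"
percent_encoded = ERB::Util.url_encode(foo) # spaces become "%20"

puts form_encoded    # prints et+tu%2C+brutus%3F
puts percent_encoded # prints et%20tu%2C%20brutus%3F
```

Neither leaves the comma as-is, so if you need exactly "et%20tu,%20brutus%3F" you'd still have to handle the comma separately.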
There is only one good way to do this right now in Ruby:
require "addressable/uri"
Addressable::URI.encode_component(
"et tu, brutus?",
Addressable::URI::CharacterClasses::PATH
)
# => "et%20tu,%20brutus%3F"
But if you're doing stuff with URIs you should really be using Addressable anyways.
sudo gem install addressable
Here's a sample irb session:
irb(main):001:0> x = "geo?"
=> "geo?"
irb(main):002:0> x.sub!("?","a")
=> "geoa"
irb(main):003:0>
However, sub will only replace the first occurrence. If you want to replace all the question marks in a string, use the gsub method like this:
str.gsub!("?","replacement")
If you know which characters you accept, you can remove those that don't match.
accepted_chars = 'A-Za-z0-9\s,' # A-z would also let [ \ ] ^ _ ` through
foo = "et tu, brutus?"
bar = foo.gsub(/[^#{accepted_chars}]/, '')
URI.escape accepts an optional second parameter that specifies which characters to escape. It overrides the defaults, so you'll have to call it twice. (URI.escape was removed in Ruby 3.0, so this only works on older versions.)
> URI.escape URI.escape("et tu, brutus?"), "?"
=> "et%20tu,%20brutus%3F"
