Ruby Regex text parsing - ruby

I am trying to parse a string that contains a bunch of values represented different ways. Here is what I have so far:
acc=acc.scan(/"([achievement|stat]).([\w]+.)*":[0-9]+/).flatten
which works for:
{"stat.crouchOneCm":2392,"stat.craftItem.minecraft.bed":1,"stat.craftItem.minecraft.wooden_pickaxe":1
I get (exactly what I expect):
s
crouchOneCm
2392
s
bed
1
s
wooden_pickaxe
1
but it doesn't work for something like this, in addition to the stuff above:
"achievement.exploreAllBiomes":{"value":0,"progress":["Desert","DesertHills","Ocean","Beach","Savanna Plateau M","Savanna","Savanna Plateau","River"]},"stat.craftItem.minecraft.iron_pickaxe":1,"
My goal was something like:
a
exploreAllBiomes
["Desert","DesertHills","Ocean","Beach","Savanna Plateau M","Savanna","Savanna Plateau","River"]
s
iron_pickaxe
1
Anybody have an idea?

Sigh... the example string is not complete, but you're dealing with JSON...
require 'json'
hash = JSON['{"stat.crouchOneCm":2392,"stat.craftItem.minecraft.bed":1,"stat.craftItem.minecraft.wooden_pickaxe":1}']
hash
# => {"stat.crouchOneCm"=>2392,
# "stat.craftItem.minecraft.bed"=>1,
# "stat.craftItem.minecraft.wooden_pickaxe"=>1}
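To get from the parsed hash to the output the question asks for, something along these lines might work. This is only a sketch: the sample string is stitched together from the fragments in the question (with a shortened "progress" list) and assumes the full input is one valid JSON object, and the choice to report "progress" for achievements is my reading of the desired output.

```ruby
require 'json'

# Sample input assembled from the question's fragments (assumed shape).
raw = '{"stat.crouchOneCm":2392,' \
      '"achievement.exploreAllBiomes":{"value":0,"progress":["Desert","Ocean"]},' \
      '"stat.craftItem.minecraft.iron_pickaxe":1}'

rows = JSON.parse(raw).map do |key, value|
  parts = key.split('.')
  # Achievements carry a nested object; report its "progress" list when
  # present, otherwise fall back to its "value".
  payload = value.is_a?(Hash) ? value.fetch('progress', value['value']) : value
  # First letter of the category, last segment of the key, then the payload.
  [parts.first[0], parts.last, payload]
end

rows.each { |row| puts row.inspect }
```

No regex is needed here: JSON.parse handles the nesting, and String#split takes care of the dotted keys.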

Related

Ruby gsub with string manipulation

I am new to Ruby and am writing an expression that replaces the string between XML tags with a hash of that value.
I did the following to replace with the new password
puts "<password>check1</password>".gsub(/(?<=password\>)[^\/]+(?=\<\/password)/,'New \0')
RESULT: <password>New check1</password> (EXPECTED)
My expectation is to get the result like this (Md5 checksum of the value "New check1")
<password>6aaf125b14c97b307c85fc6e681c410e</password>
I tried it in the following ways and none of them was successful (I have included the required libraries "require 'digest'").
puts "<password>check1</password>".gsub(/(?<=password\>)[^\/]+(?=\<\/password)/,Digest::MD5.hexdigest('\0'))
puts "<password>check1</password>".gsub(/(?<=password\>)[^\/]+(?=\<\/password)/,Digest::MD5.hexdigest '\0')
puts "<password>check1</password>".gsub(/(?<=password\>)[^\/]+(?=\<\/password)/, "Digest::MD5.hexdigest \0")
Any help in achieving this would be very much appreciated.
This will work:
require 'digest'
line = "<other>stuff</other><password>check1</password><more>more</more>"
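# The block form is required: a named group is not available as a local variable
# inside the replacement argument, so the match data is read from $~ instead.
line.sub(/<password>(?<pwd>[^<]+)<\/password>/) { "<password>#{Digest::SHA2.hexdigest($~[:pwd])}</password>" }
=> "<other>stuff</other><password>8a859fd2a56cc37285bc3e307ef0d9fc1d2ec054ea3c7d0ec0ff547cbfacf8dd</password><more>more</more>"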
Make sure the input is one line at a time, and you'll probably want sub, not gsub.
P.S.: I agree with Tom Lord's comment. If your XML is not gargantuan in size, try to use an XML library to parse it, perhaps Ox or Nokogiri.
Different libraries have different advantages.
This is a variant of Tilo's answer.
require 'digest'
line = "<other>stuff</other><password>check1</password><more>more</more>"
r = /(?<=<password>).+?(?=<\/password>)/
line.sub(r) { |pwd| Digest::SHA2.hexdigest(pwd) }
#=> "<other>stuff</other><password>8a859fd2a56cc37285bc3e307ef0d9f
# c1d2ec054ea3c7d0ec0ff547cbfacf8dd</password><more>more</more>"
(I've displayed the returned string on two lines to make it readable without the need for horizontal scrolling.)
The regular expression reads, "match '<password>' in a positive lookbehind ((?<=...)), followed by one or more characters, lazily ('?'), followed by the string '</password>' in a positive lookahead ((?=...))".
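For the MD5 variant the question asks about, the same block form should apply. The attempts in the question fail because the replacement argument is evaluated before gsub runs, so Digest::MD5.hexdigest('\0') hashes the literal two-character string \0 rather than the matched text. A minimal sketch, reusing the question's input and its "New " prefix:

```ruby
require 'digest'

xml = '<password>check1</password>'
pattern = /(?<=<password>)[^<]+(?=<\/password>)/

# The block runs once per match, so the digest is computed from the matched text.
hashed = xml.gsub(pattern) { |match| Digest::MD5.hexdigest("New #{match}") }
puts hashed
```

Because the lookarounds keep the tags out of the match, only the text between them is replaced.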

Ruby - Extra punctuation in file when using regex and csv class to write to a file

I'm using regex to grab parameters from an html file.
I've tested the regexp and it seems to be fine. It appears that the CSV conversion is what's causing the issue, but I'm not sure.
Here is what I have:
mechanics_file= File.read(filename)
mechanics= mechanics_file.scan(/(?<=70%">)(.*)(?=<\/td)/)
id_file= File.read(filename)
id=id_file.scan(/(?<="propertyids\[]" value=")(.*)(?=")/)
puts id.zip(mechanics)
CSV.open('csvfile.csv', 'w') do |csv|
id.zip(mechanics) { |row| csv << row }
end
The puts output looks like this:
2073
Acting
2689
Action / Movement Programming
But the contents of the csv look like this:
"[""2073""]","[""Acting""]"
"[""2689""]","[""Action / Movement Programming""]"
How do I get rid of all of the extra quotes and brackets? Am I doing something wrong in the process of writing to a csv?
This is my first project in ruby so I would appreciate a child-friendly explanation :) Thanks in advance!
String#scan returns an Array of Arrays (bold emphasis mine):
scan(pattern) → array
Both forms iterate through str, matching the pattern (which may be a Regexp or a String). For each match, a result is generated and either added to the result array or passed to the block. If the pattern contains no groups, each individual result consists of the matched string, $&. If the pattern contains groups, each individual result is itself an array containing one entry per group.
a = "cruel world"
# […]
a.scan(/(...)/) #=> [["cru"], ["el "], ["wor"]]
So, id looks like this:
id == [['2073'], ['2689']]
and mechanics looks like this:
mechanics == [['Acting'], ['Action / Movement Programming']]
id.zip(mechanics) then looks like this:
id.zip(mechanics) == [[['2073'], ['Acting']], [['2689'], ['Action / Movement Programming']]]
Which means that in your loop, each row looks like this:
row == [['2073'], ['Acting']]
row == [['2689'], ['Action / Movement Programming']]
CSV#<< expects an Array of Strings, or things that can be converted to Strings as an argument. You are passing it an Array of Arrays, which it will happily convert to an Array of Strings for you by calling Array#to_s on each element, and that looks like this:
[['2073'], ['Acting']].map(&:to_s) == [ '["2073"]', '["Acting"]' ]
[['2689'], ['Action / Movement Programming']].map(&:to_s) == [ '["2689"]', '["Action / Movement Programming"]' ]
Lastly, " is the string delimiter in CSV, and needs to be escaped by doubling it, so what actually gets written to the CSV file is this:
"[""2073""]","[""Acting""]"
"[""2689""]","[""Action / Movement Programming""]"
The simplest way to correct this would be to flatten the return values of the scans (and maybe also convert the IDs to Integers, assuming that they are, in fact, Integers):
require 'csv'
mechanics_file = File.read(filename)
mechanics = mechanics_file.scan(/(?<=70%">)(.*)(?=<\/td)/).flatten
id_file = File.read(filename)
id = id_file.scan(/(?<="propertyids\[]" value=")(.*)(?=")/).flatten.map(&:to_i)
CSV.open('csvfile.csv', 'w') do |csv|
id.zip(mechanics) { |row| csv << row }
end
Another suggestion would be to forgo the Regexps completely and use an HTML parser to parse the HTML.
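An alternative to flatten is to drop the capture groups altogether, since scan then returns plain strings. The sketch below is self-contained; the HTML fragment is made up to resemble what the question's regexes seem to expect, and CSV.generate is used instead of a file only so the result can be inspected.

```ruby
require 'csv'

# Hypothetical HTML fragment shaped like the input the question implies.
html = <<~HTML
  <input name="propertyids[]" value="2073"><td width="70%">Acting</td>
  <input name="propertyids[]" value="2689"><td width="70%">Action / Movement Programming</td>
HTML

# No capture groups, so each result is the matched String, not an Array.
ids = html.scan(/(?<="propertyids\[\]" value=")[^"]*/)
mechanics = html.scan(/(?<=70%">)[^<]*/)

rows = ids.zip(mechanics)
csv_text = CSV.generate { |csv| rows.each { |row| csv << row } }
puts csv_text
```

Each row handed to csv << is now an Array of Strings, so no brackets or doubled quotes appear in the output.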

Change string to hash with ruby

I have an ugly string that looks like this:
"\"New\"=>\"0\""
What is the best way to convert it into a Hash object?
The problem with "\"New\"=>\"0\"" is that it does not look like a Hash. So the first step is to wrap it in braces so that it does (here a holds the original string):
"{" + a + "}"
# => "{\"New\"=>\"0\"}"
Once you have a Hash-looking string, you can convert it into a Hash like this:
eval "{" + a + "}"
# => {"New"=>"0"}
However, there is still one issue: eval is not safe and is inadvisable to use on untrusted input. So let's manipulate the string further to make it look JSON-like and use JSON.parse:
require 'json'
JSON.parse(("{" + a + "}").gsub("=>", ":"))
# => {"New"=>"0"}
How about JSON.parse("{" + string.gsub("=>", ":") + "}")? (Without the surrounding braces the string is not valid JSON.)
You can use a regex to pull out the key and value, then create the Hash directly. The capture group keeps the surrounding quotes out of the results:
Hash[*"\"New\"=>\"0\"".scan(/"(.*?)"/).flatten]
It is hard to nail down the best way if you can't tell us the general format of those strings. You may not even need the regex, e.g.:
Hash[*"\"New\"=>\"0\"".split('"').values_at(1,3)]
Also works for "\"Rocket\"=>\"=>\""
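If the string can hold more than one pair, a scan with two capture groups may be a reasonable middle ground between eval and JSON. This is a sketch under the assumption that keys and values are always double-quoted and contain no escaped quotes; parse_pairs is a name made up for illustration.

```ruby
# Hypothetical helper: extracts "key"=>"value" pairs without eval.
# Assumes double-quoted keys and values with no embedded double quotes.
def parse_pairs(text)
  # With two groups, scan yields [key, value] arrays, which to_h accepts.
  text.scan(/"([^"]*)"\s*=>\s*"([^"]*)"/).to_h
end

puts parse_pairs("\"New\"=>\"0\"").inspect
puts parse_pairs("\"Rocket\"=>\"=>\"").inspect
puts parse_pairs("\"A\"=>\"1\", \"B\"=>\"2\"").inspect
```

Unlike the gsub("=>", ":") approach, a value that itself contains => is left alone.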

Extract a single line string having "foo: XXXX"

I have a file with one or more key:value lines, and I want to pull a key:value out if key=foo. How can I do this?
I can get as far as this:
if File.exist?('/file_name')
content = open('/file_name').grep(/foo:??/)
I am unsure about the grep portion, and also once I get the content, how do I extract the value?
People like to slurp the files into memory, which, if the file will always be small, is a reasonable solution. However, slurping isn't scalable, and the practice can lead to excessive CPU and I/O waits as content is read.
Instead, because you could have multiple hits in a file, and you're comparing the content line-by-line, read it line-by-line. Line I/O is very fast and avoids the scalability problems. Ruby's File.foreach is the way to go:
File.foreach('path/to/file') do |li|
puts $1 if li[/foo:\s*(\w+)/]
end
Because there are no samples of actual key/value pairs, we're shooting in the dark for valid regex patterns, but this is the basis for how I'd solve the problem.
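To make that concrete, here is a small self-contained sketch that collects every value for a given key while still reading line by line. The helper name values_for, the "key: value" line format, and the sample file contents are all assumptions, since the question shows no real data.

```ruby
require 'tempfile'

# Hypothetical helper: returns the values of all "key: value" lines for one key.
def values_for(path, key)
  pattern = /\A\s*#{Regexp.escape(key)}\s*:\s*(.*?)\s*\z/
  # File.foreach without a block returns an Enumerator, so lines are read lazily.
  # filter_map (Ruby 2.7+) keeps only the lines where the pattern matched.
  File.foreach(path).filter_map { |line| line[pattern, 1] }
end

# Throwaway sample file standing in for the real one.
sample = Tempfile.new('key_values')
sample.write("bar: 1\nfoo: XXXX\nbaz\nfoo: YYYY\n")
sample.close

found = values_for(sample.path, 'foo')
puts found.inspect # => ["XXXX", "YYYY"]
sample.unlink
```

Regexp.escape guards against keys that contain regex metacharacters.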
Try this:
IO.readlines('key_values.txt').find_all{|line| line.match('key1')}
I would recommend reading the file into an array and selecting only the lines you need:
regex = /\A\s?key\s?:/
results = File.readlines('file').inject([]) do |f,l|
l =~ regex ? f << "key = %s" % l.sub(regex, '') : f
end
This detects lines starting with key: and adds them to results in the form key = value,
where value is the portion after key:.
so if you have a file like this:
key:1
foo
key:2
bar
key:3
you'll get results like this:
key = 1
key = 2
key = 3
Does that make sense?
value = File.open('/file_name').read.match("key:(.*)").captures[0] rescue nil
File.read('file_name')[/foo: (.*)/, 1]
#=> XXXX

How should I parse a fixed length record file in Ruby?

I was wondering if anyone had any advice on parsing a file with fixed length records in Ruby. The file has several sections, each section has a header, n data elements and a footer. For example (This is total nonsense - but has roughly similar content)
1923 000-230SomeHeader 0303030
209231-231992395 MoreData
293894-329899834 SomeData
298342-323423409 OtherData
3 3423942Footer record 9832422
Headers, Footers and Data rows each begin with a specific number (1,2 & 3) in this example.
I have looked at http://rubyforge.org/projects/file-formatter/ and it looks good - except that the documentation is light and I can't see how to have n data elements.
Cheers,
Dan
There are a number of ways to do this. The unpack method of String could be used to define a pattern of fields as follows :-
"209231-231992395    MoreData".unpack('aa5A1A9a4Z*')
This returns an array as follows :-
["2", "09231", "-", "231992395", "    ", "MoreData"]
See the documentation for a description of the pack/unpack format.
Several options exist as usual.
If you want to do it manually I would suggest something like this:
very pseudo-code:
Read file
while lines in file
handle_line(line)
end
def handle_line
type=first_char
parse_line(type)
end
def parse_line
split into elements and do_whatever_to_them
end
Splitting the line into elements of fixed width can be done with, for instance, unpack():
irb(main):001:0> line="1923  000-230SomeHeader     0303030"
=> "1923  000-230SomeHeader     0303030"
irb(main):002:0* list=line.unpack("A1A5A7a15A10")
=> ["1", "923", "000-230", "SomeHeader     ", "0303030"]
irb(main):003:0>
The pattern used for unpack() will vary with the field lengths of the different kinds of records, and the code will depend on whether you want trailing spaces and such. See the unpack reference for details.
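Putting the pseudo-code and unpack() together, a runnable sketch might look like the following. The LAYOUTS table, its field widths, and the sample records are all invented for illustration; the real widths have to come from the file's specification.

```ruby
# Hypothetical layouts keyed by the record-type character that starts each line.
LAYOUTS = {
  '1' => { name: :header, format: 'A1A5A7A15A10' },
  '2' => { name: :data,   format: 'A1A5A1A9A4A*' },
  '3' => { name: :footer, format: 'A1A5A7A15A10' }
}.freeze

# Looks up the layout by the first character and unpacks the fixed-width fields.
# The 'A' directive strips trailing spaces from each field.
def parse_line(line)
  layout = LAYOUTS.fetch(line[0]) do
    raise ArgumentError, "unknown record type: #{line[0].inspect}"
  end
  [layout[:name], line.chomp.unpack(layout[:format])]
end

# Sample records built with ljust so the column widths are explicit.
lines = [
  '1' + '923'.ljust(5) + '000-230' + 'SomeHeader'.ljust(15) + '0303030',
  '2' + '09231' + '-' + '231992395' + ''.ljust(4) + 'MoreData',
  '3' + ''.ljust(5) + '3423942' + 'Footer record'.ljust(15) + '9832422'
]

records = lines.map { |line| parse_line(line) }
records.each { |name, fields| puts "#{name}: #{fields.inspect}" }
```

Any number of data rows between a header and a footer is handled naturally, because each line is dispatched on its own type character.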
