Ruby regex string into key value pairs - ruby

I have a String like this:
price<=>656000<br>bathrooms<=>1<br>bedrooms<=>3<br>pets<=>1<br>surface<=>60<br>brokerfree<=>1
model<=>opel/corsa<br>mileage<=>67000<br>vinnumber<=>unknown<br>price<=>145000<br>year<=>2010<br>condition<=>2<br>transmission<=>unknown<br>cartype<=>1
I want a Hash:
:model => 'opel/corsa'
etc etc... the string is variable so this is also valid:
year<=>2015<br>condition<=>1<br>price<=>2100<br>mileage<=>22000<br>price<=>120000<br>year<=>2012<br>condition<=>2
or this
price<=>656000<br>bathrooms<=>1<br>bedrooms<=>3<br>pets<=>1<br>surface<=>60<br>brokerfree<=>1
model<=>opel/corsa<br>mileage<=>67000<br>vinnumber<=>unknown<br>price<=>145000<br>year<=>2010<br>condition<=>2<br>transmission<=>unknown<br>cartype<=>1

You don't need a regex. You can use plain ruby methods.
array = string.split('<br>')
hash = Hash[array.map {|el| el.split('<=>') }]

You can also use .to_h method of Array, see the following example.
string = "model<=>opel/corsa<br>mileage<=>67000<br>vinnumber<=>unknown<br>price<=>145000<br>year<=>2010<br>condition<=>2<br>transmission<=>unknown<br>cartype<=>1"
hash = string.split('<br>').map{|a| a.split('<=>')}.to_h
## OUTPUT
{"model"=>"opel/corsa", "mileage"=>"67000", "vinnumber"=>"unknown", "price"=>"145000", "year"=>"2010", "condition"=>"2", "transmission"=>"unknown", "cartype"=>"1"}

If str is your string, you can construct your hash as follows:
Hash[*str.split(/<=>|<br>/)]
#=> {"price"=>"145000", "bathrooms"=>"1", "bedrooms"=>"3", "pets"=>"1",
# "surface"=>"60", "brokerfree"=>"1", "model"=>"opel/corsa",
# "mileage"=>"67000", "vinnumber"=>"unknown", "year"=>"2010",
# "condition"=>"2", "transmission"=>"unknown", "cartype"=>"1"}
A second example:
str = "year<=>2015<br>condition<=>1<br>price<=>2100<br>mileage<=>22000"+
"<br>price<=>120000<br>year<=>2012<br>condition<=>2"
Hash[*str.split(/<=>|<br>/)]
#=> {"year"=>"2012", "condition"=>"2", "price"=>"120000", "mileage"=>"22000"}
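The question asks for symbol keys (:model => 'opel/corsa'), while the answers above produce string keys. Here is a minimal sketch that combines the split approach with a key conversion. The method name parse_pairs is made up for illustration, and it assumes every pair has the form key<=>value. As in the answers above, a repeated key keeps its last value.

```ruby
# Hypothetical helper: turn "k1<=>v1<br>k2<=>v2" into { k1: "v1", k2: "v2" }.
def parse_pairs(str)
  str.split(/<br>|\n/)                      # pairs may be separated by <br> or a newline
     .reject(&:empty?)                      # ignore blank pieces
     .map { |pair| pair.split('<=>', 2) }   # limit 2 keeps any later "<=>" inside the value
     .each_with_object({}) { |(key, value), hash| hash[key.to_sym] = value }
end

result = parse_pairs("model<=>opel/corsa<br>mileage<=>67000<br>price<=>145000")
p result[:model]
```

Here result[:model] is "opel/corsa" and all values stay strings; convert them with to_i where you need numbers.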

Related

Ruby regex method

I need to get the expected output in ruby by using any method like scan or match.
Input string:
"http://test.com?t&r12=1&r122=1&r1=1&r124=1"
"http://test.com?t&r12=1&r124=1"
Expected:
r12=1,r122=1, r1=1, r124=1
r12=1,r124=1
How can I get the expected output using regex?
Use regex /r\d+=\d+/:
"http://test.com?t&r12=1&r122=1&r1=1&r124=1".scan(/r\d+=\d+/)
# => ["r12=1", "r122=1", "r1=1", "r124=1"]
"http://test.com?t&r12=1&r124=1".scan(/r\d+=\d+/)
# => ["r12=1", "r124=1"]
You can use join to get a string output. Here:
"http://test.com?t&r12=1&r122=1&r1=1&r124=1".scan(/r\d+=\d+/).join(',')
# => "r12=1,r122=1,r1=1,r124=1"
Update
If the URL may contain other parameters whose names merely end in r followed by digits (such as ar1 or tr2), the regex can be made stricter by requiring a leading ? or &:
a = []
"http://test.com?r1=2&r12=1&r122=1&r1=1&r124=1&ar1=2&tr2=3&xy4=5".scan(/(&|\?)(r\d+=\d+)/) {|x,y| a << y}
a.join(',')
# => "r1=2,r12=1,r122=1,r1=1,r124=1"
Since the input strings are URLs with query strings, I would guard against false positives by splitting out the parameters first:
input = "http://test.com?t&r12=1&r122=1&r1=1&r124=1"
query_params = input.split('?').last.split('&')
#⇒ ["t", "r12=1", "r122=1", "r1=1", "r124=1"]
r_params = query_params.select { |e| e =~ /\Ar\d+=\d+/ }
#⇒ ["r12=1", "r122=1", "r1=1", "r124=1"]
r_params.join(',')
#⇒ "r12=1,r122=1,r1=1,r124=1"
This is safer than scanning the whole input with a regexp.
If you really need to do it with regex correctly, you'll need to use a regex like this:
puts "http://test.com?t&r12=1&r122=1&r1=1&r124=1".scan(/(?:http.*?\?t|(?<!^)\G)\&*(\br\d*=\d*)(?=.*$)/i).join(',')
puts "http://test.com?t&r12=1&r124=1".scan(/(?:http.*?\?t|(?<!^)\G)\&*(\br\d*=\d*)(?=.*$)/i).join(',')
Sample program output:
r12=1,r122=1,r1=1,r124=1
r12=1,r124=1
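As an alternative to matching against the raw URL, the query string can be parsed with Ruby's standard uri library and filtered afterwards. This is only a sketch: the method name r_params is invented here, and it assumes a reasonably recent Ruby (String#match? needs 2.4+) where URI.decode_www_form accepts a bare key such as t.

```ruby
require 'uri'

# Hypothetical helper: return the "rN=M" query parameters of a URL as "r12=1,r124=1".
def r_params(url)
  query = URI.parse(url).query.to_s   # "" when the URL has no query string
  URI.decode_www_form(query)          # array of [key, value] pairs
     .select { |key, value| key.match?(/\Ar\d+\z/) && value.match?(/\A\d+\z/) }
     .map { |key, value| "#{key}=#{value}" }
     .join(',')
end

puts r_params("http://test.com?t&r12=1&r122=1&r1=1&r124=1")
puts r_params("http://test.com?t&r12=1&r124=1")
```

Anchoring both ends of the key with \A and \z means parameters such as ar1 or r1x are rejected rather than partially matched.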

Check if a string includes any of the keys in a hash and return the value of the key it contains

I have a hash with multiple keys and a string which contains either none or one of the keys in the hash.
h = {"k1"=>"v1", "k2"=>"v2", "k3"=>"v3"}
s = "this is an example string that might occur with a key somewhere in the string k1(with special characters like (^&*$##!^&&*))"
What would be the best way to check if s contains any of the keys in h and if it does, return the value of the key that it contains?
For instance, for the above examples of h and s, the output should be v1.
Edit: Only the string would be user defined. The hash will always be the same.
I find this way readable:
hash_key_in_s = s[Regexp.union(h.keys)]
p h[hash_key_in_s] #=> "v1"
Or in one line:
p h.fetch s[Regexp.union(h.keys)] #=> "v1"
And here is a version not using regexp:
p h.fetch( h.keys.find{|key|s[key]} ) #=> "v1"
Create a regex out of the keys of Hash h and match it against the string:
h[s.match(/#{h.keys.join('|')}/).to_s]
# => "v1"
Or as Amadan suggested using Regexp#escape for safety:
h[s.match(/#{h.keys.map(&Regexp.method(:escape)).join('|')}/).to_s]
# => "v1"
If the key in String s were always delimited by whitespace, we could also have done something like this:
s = "this is an example string that might occur with a key somewhere in the string k1 (with special characters like (^&*$\##!^&&*))"
h[(s.split & h.keys).first]
# => "v1"
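The question says the string contains either none or one of the keys, and the fetch-based one-liners above raise a KeyError when no key is present. The following sketch wraps the Regexp.union approach so that it returns nil in that case. The method name value_for_key_in is hypothetical.

```ruby
# Hypothetical helper: value of the first key of `hash` that occurs in `string`, or nil.
def value_for_key_in(hash, string)
  pattern = Regexp.union(hash.keys)   # union escapes each key, so special characters are safe
  key = string[pattern]               # the matched key, or nil when none occurs
  key && hash[key]
end

h = { "k1" => "v1", "k2" => "v2", "k3" => "v3" }
p value_for_key_in(h, 'a string with k1(and special characters like (^&*$!))')  # => "v1"
p value_for_key_in(h, 'a string without any of them')                           # => nil
```

Since the hash never changes, the pattern could also be built once and stored in a constant instead of on every call.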

keys of a hash that loads from a json string

Ruby has symbols, and a symbol is typically used as a hash key because it saves space compared with a String object. Say:
myhash[:mykey] = "myvalue"
But if I load a hash from json string, say:
str = '{"mykey": "myvalue"}'
myhash = JSON.parse(str)
Then I must use a string key to access the hash:
puts myhash["mykey"] # myvalue
Is this reasonable? Why doesn't JSON.parse just use symbols for hash keys?
Returning keys as strings is the default behavior of JSON.parse. You can override it by passing the additional symbolize_names option.
str = '{"mykey": "myvalue"}'
JSON.parse(str)
#=> {"mykey"=>"myvalue"}
JSON.parse(str, {:symbolize_names => true})
#=> {:mykey=>"myvalue"}
As @Matt said in his comment, if the key happens to contain whitespace (e.g. "my key"), it becomes the symbol :"my key".
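A short sketch of how symbolize_names behaves with nested objects and with keys that contain whitespace. The JSON document here is made up for illustration.

```ruby
require 'json'

str = '{"mykey": "myvalue", "nested": {"my key": 1}}'

with_strings = JSON.parse(str)
with_symbols = JSON.parse(str, symbolize_names: true)

p with_strings["nested"]["my key"]   # string keys at every level
p with_symbols[:nested][:"my key"]   # symbol keys at every level, including nested objects
p with_symbols[:mykey]
```

Values are not affected by the option; only the keys of JSON objects are converted.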

Regex with named capture groups getting all matches in Ruby

I have a string:
s="123--abc,123--abc,123--abc"
I tried using Ruby 1.9's new feature "named groups" to fetch all named group info:
/(?<number>\d*)--(?<chars>\s*)/
Is there an API like Python's findall that returns a collection of MatchData objects? In this case I need to return two matches, because 123 and abc repeat twice. Each MatchData should contain the details of every named capture, so I can use m['number'] to get the matched value.
Named captures are suitable only for one matching result.
Ruby's analogue of findall is String#scan. You can either use scan result as an array, or pass a block to it:
irb> s = "123--abc,123--abc,123--abc"
=> "123--abc,123--abc,123--abc"
irb> s.scan(/(\d*)--([a-z]*)/)
=> [["123", "abc"], ["123", "abc"], ["123", "abc"]]
irb> s.scan(/(\d*)--([a-z]*)/) do |number, chars|
irb* p [number,chars]
irb> end
["123", "abc"]
["123", "abc"]
["123", "abc"]
=> "123--abc,123--abc,123--abc"
Chiming in super-late, but here's a simple way of replicating String#scan but getting the matchdata instead:
matches = []
foo.scan(regex){ matches << $~ }
matches now contains the MatchData objects that correspond to scanning the string.
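A minimal runnable version of this idea, using an illustrative input string and regex (not the exact ones from the question) and Regexp.last_match, which is the same object as $~. MatchData#named_captures assumes Ruby 2.4 or later.

```ruby
s = "123--abc,456--def,789--ghi"
regex = /(?<number>\d+)--(?<chars>[a-z]+)/

matches = []
s.scan(regex) { matches << Regexp.last_match }   # one MatchData per match

# Each element supports named access, just like the result of a single match
matches.each { |m| puts "#{m[:number]} -> #{m[:chars]}" }

# Plain hashes of name => captured text, if MatchData objects are not needed
captures = matches.map(&:named_captures)
```

The block form is needed because scan only sets the last-match variable while it is iterating; after scan returns, only the final match is left in $~.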
You can get the capture-group names from the regexp with the names method. I used the regular scan method to get the matches, then zipped the names with each match to create a Hash.
class String
  def scan2(regexp)
    names = regexp.names
    scan(regexp).collect do |match|
      Hash[names.zip(match)]
    end
  end
end
Usage:
>> "aaa http://www.google.com.tr aaa https://www.yahoo.com.tr ddd".scan2 /(?<url>(?<protocol>https?):\/\/[\S]+)/
=> [{"url"=>"http://www.google.com.tr", "protocol"=>"http"}, {"url"=>"https://www.yahoo.com.tr", "protocol"=>"https"}]
@Nakilon is correct in showing scan with a regex; however, you don't even need to venture into regex land if you don't want to:
s = "123--abc,123--abc,123--abc"
s.split(',')
#=> ["123--abc", "123--abc", "123--abc"]
s.split(',').inject([]) { |a,s| a << s.split('--'); a }
#=> [["123", "abc"], ["123", "abc"], ["123", "abc"]]
This returns an array of arrays, which is convenient if you have multiple occurrences and need to see/process them all.
s.split(',').inject({}) { |h,s| n,v = s.split('--'); h[n] = v; h }
#=> {"123"=>"abc"}
This returns a hash, which, because the elements have the same key, has only the unique key value. This is good when you have a bunch of duplicate keys but want the unique ones. Its downside occurs if you need the unique values associated with the keys, but that appears to be a different question.
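If every value for a repeated key should be kept rather than overwritten, one option is to collect the values into arrays. This is a sketch with an illustrative input in which the values differ; it still uses only split, without any regex.

```ruby
s = "123--abc,123--def,456--ghi"

# The default block gives each new key its own empty array
grouped = s.split(',').each_with_object(Hash.new { |hash, key| hash[key] = [] }) do |pair, hash|
  number, chars = pair.split('--', 2)
  hash[number] << chars
end

p grouped
```

Here grouped["123"] holds both "abc" and "def", and grouped["456"] holds only "ghi".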
If using ruby >=1.9 and the named captures, you could:
class String
  def scan2(regexp2_str, placeholders = {})
    return regexp2_str.to_re(placeholders).match(self)
  end

  def to_re(placeholders = {})
    re2 = self.dup
    separator = placeholders.delete(:SEPARATOR) || '' # Returns and removes the separator if :SEPARATOR is set.
    # Search for the pattern placeholders and replace them with the regex
    placeholders.each do |placeholder, regex|
      re2.sub!(separator + placeholder.to_s + separator, "(?<#{placeholder}>#{regex})")
    end
    return Regexp.new(re2, Regexp::MULTILINE) # Returns a regex using named captures.
  end
end
Usage (ruby >=1.9):
> "1234:Kalle".scan2("num4:name", num4:'\d{4}', name:'\w+')
=> #<MatchData "1234:Kalle" num4:"1234" name:"Kalle">
or
> re="num4:name".to_re(num4:'\d{4}', name:'\w+')
=> /(?<num4>\d{4}):(?<name>\w+)/m
> m=re.match("1234:Kalle")
=> #<MatchData "1234:Kalle" num4:"1234" name:"Kalle">
> m[:num4]
=> "1234"
> m[:name]
=> "Kalle"
Using the separator option:
> "1234:Kalle".scan2("#num4#:#name#", SEPARATOR:'#', num4:'\d{4}', name:'\w+')
=> #<MatchData "1234:Kalle" num4:"1234" name:"Kalle">
I needed something similar recently. This should work like String#scan, but return an array of MatchData objects instead.
class String
  # This method will return an array of MatchData's rather than the
  # array of strings returned by the vanilla `scan`.
  def match_all(regex)
    match_str = self
    match_datas = []
    while match_str.length > 0 do
      md = match_str.match(regex)
      break unless md
      match_datas << md
      match_str = md.post_match
    end
    return match_datas
  end
end
Running your sample data in the REPL results in the following:
> "123--abc,123--abc,123--abc".match_all(/(?<number>\d*)--(?<chars>[a-z]*)/)
=> [#<MatchData "123--abc" number:"123" chars:"abc">,
#<MatchData "123--abc" number:"123" chars:"abc">,
#<MatchData "123--abc" number:"123" chars:"abc">]
You may also find my test code useful:
describe String do
  describe :match_all do
    it "it works like scan, but uses MatchData objects instead of arrays and strings" do
      mds = "ABC-123, DEF-456, GHI-098".match_all(/(?<word>[A-Z]+)-(?<number>[0-9]+)/)
      mds[0][:word].should == "ABC"
      mds[0][:number].should == "123"
      mds[1][:word].should == "DEF"
      mds[1][:number].should == "456"
      mds[2][:word].should == "GHI"
      mds[2][:number].should == "098"
    end
  end
end
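One caveat with a loop that re-matches against post_match: a regex that can match the empty string (for example \d* applied to letters) never shrinks the remaining string, and the offsets in each MatchData are relative to the remainder rather than to the original string. The sketch below is a variant under those assumptions, not the code above. The name match_all_from is made up, and it relies on the optional start position of String#match.

```ruby
# Hypothetical variant: collect MatchData objects by advancing a start position
# instead of re-matching against the remainder of the string.
def match_all_from(str, regex)
  results = []
  pos = 0
  while pos <= str.length && (md = str.match(regex, pos))
    results << md
    # Continue after the match; step one character further after an empty match
    pos = md.end(0) > md.begin(0) ? md.end(0) : md.end(0) + 1
  end
  results
end

mds = match_all_from("123--abc,456--def", /(?<number>\d+)--(?<chars>[a-z]+)/)
mds.each { |m| puts "#{m[:number]} at offset #{m.begin(0)}" }
```

Because every match is made against the full string, begin and end report positions in the original input.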
I really liked @Umut-Utkan's solution, but it didn't quite do what I wanted, so I rewrote it a bit (note: the code below might not be beautiful, but it seems to work):
class String
  def scan2(regexp)
    names = regexp.names
    # Each capture name maps to an array of everything it matched
    captures = Hash.new { |hash, key| hash[key] = [] }
    scan(regexp).each do |match|
      names.zip(match).each do |name, value|
        captures[name.to_sym] << value
      end
    end
    captures
  end
end
Now, if you do
p '12f3g4g5h5h6j7j7j'.scan2(/(?<alpha>[a-zA-Z])(?<digit>[0-9])/)
You get
{:alpha=>["f", "g", "g", "h", "h", "j", "j"], :digit=>["3", "4", "5", "5", "6", "7", "7"]}
(ie. all the alpha characters found in one array, and all the digits found in another array). Depending on your purpose for scanning, this might be useful. Anyway, I love seeing examples of how easy it is to rewrite or extend core Ruby functionality with just a few lines!
A year ago I wanted regular expressions that were easier to read and had named captures, so I made the following addition to String (it probably does not belong there, but it was convenient at the time):
scan2.rb:
class String
  # Works as scan but stores the result in a hash indexed by variable/constant names (regexp PLACEHOLDERS) within parentheses.
  # Example: Given the (constant) strings BTF, RCVR and SNDR and the regexp /#BTF# (#RCVR#) (#SNDR#)/
  # the matches will be returned in a hash like: match[:RCVR] = <the match> and match[:SNDR] = <the match>
  # Note: The #STRING_VARIABLE_OR_CONST# syntax has to be used. All occurrences of #STRING# will work as #{STRING}
  # but is needed for the method to see the names to be used as indices.
  def scan2(regexp2_str, mark='#')
    regexp = regexp2_str.to_re(mark) # Evaluates the strings. Note: Must be reachable from here!
    hash_indices_array = regexp2_str.scan(/\(#{mark}(.*?)#{mark}\)/).flatten # Look for string variable names within (#VAR#) or # replaced by <mark>
    match_array = self.scan(regexp)
    # Save matches in hash indexed by string variable names:
    match_hash = Hash.new
    match_array.flatten.each_with_index do |m, i|
      match_hash[hash_indices_array[i].to_sym] = m
    end
    return match_hash
  end

  def to_re(mark='#')
    re = /#{mark}(.*?)#{mark}/
    return Regexp.new(self.gsub(re){eval $1}, Regexp::MULTILINE) # Evaluates the strings, creates RE. Note: Variables must be reachable from here!
  end
end
Example usage (irb1.9):
> load 'scan2.rb'
> AREA = '\d+'
> PHONE = '\d+'
> NAME = '\w+'
> "1234-567890 Glenn".scan2('(#AREA#)-(#PHONE#) (#NAME#)')
=> {:AREA=>"1234", :PHONE=>"567890", :NAME=>"Glenn"}
Notes:
Of course it would have been more elegant to put the patterns (e.g. AREA, PHONE...) in a hash and add this hash with patterns to the arguments of scan2.
Piggybacking off of Mark Hubbart's answer, I added the following monkey-patch:
class ::Regexp
  def match_all(str)
    matches = []
    str.scan(self) { matches << $~ }
    matches
  end
end
which can be used as /(?<letter>\w)/.match_all('word'), and returns:
[#<MatchData "w" letter:"w">, #<MatchData "o" letter:"o">, #<MatchData "r" letter:"r">, #<MatchData "d" letter:"d">]
This relies on, as others have said, the use of $~ in the scan block for the match data.
I like the match_all given by John, but I think it has an error.
The line:
match_datas << md
works if there are no captures () in the regex.
This code gives the whole line up to and including the pattern matched/captured by the regex. (The [0] part of MatchData) If the regex has capture (), then this result is probably not what the user (me) wants in the eventual output.
I think in the case where there are captures () in regex, the correct code should be:
match_datas << md[1]
The eventual output of match_datas will then be an array of captured substrings, starting from match_datas[0]. This is not quite what may be expected if a normal MatchData is wanted, where [0] is the whole matched substring, followed by [1], [2], ... which are the captures (if any) in the regex pattern.
Things are complex - which may be why match_all was not included in native MatchData.

Remove a character at an index position in Ruby

Basically what the question says. How can I delete a character at a given index position in a string? The String class doesn't seem to have any methods to do this.
If I have a string "HELLO" I want the output to be this
["ELLO", "HLLO", "HELO", "HELO", "HELL"]
I do that using
d = Array.new(c.length){|i| c.slice(0, i)+c.slice(i+1, c.length)}
I don't know if using slice! will work here, because it will modify the original string, right?
Won't Str.slice! do it? From ruby-doc.org:
str.slice!(fixnum) => fixnum or nil [...]
Deletes the specified portion from str, and returns the portion deleted.
If you're using Ruby 1.8, you can use delete_at (mixed in from Enumerable), otherwise in 1.9 you can use slice!.
Example:
mystring = "hello"
mystring.slice!(1) # mystring is now "hllo"
# now do something with mystring
$ cat m.rb
class String
  def maulin! n
    slice! n
    self
  end

  def maulin n
    dup.maulin! n
  end
end
$ irb
>> require 'm'
=> true
>> s = 'hello'
=> "hello"
>> s.maulin(2)
=> "helo"
>> s
=> "hello"
>> s.maulin!(1)
=> "hllo"
>> s
=> "hllo"
To avoid needing to monkey patch String you can make use of tap:
"abc".tap {|s| s.slice!(2) }
=> "ab"
If you need to leave your original string unaltered, call dup first, e.g. "abc".dup.tap {|s| s.slice!(2) }.
I did something like this
c.slice(0, i)+c.slice(i+1, c.length)
Where c is the string and i is the index position I want to delete. Is there a better way?
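One possible tidier form, combining the dup and tap suggestions above into a small helper that leaves the original string alone. The method name remove_at is hypothetical, and this is only a sketch of one way to write it.

```ruby
# Hypothetical helper: copy of `str` without the character at `index`; `str` is untouched.
def remove_at(str, index)
  str.dup.tap { |copy| copy.slice!(index) }
end

c = "HELLO"
d = Array.new(c.length) { |i| remove_at(c, i) }
p d   # => ["ELLO", "HLLO", "HELO", "HELO", "HELL"]
p c   # => "HELLO"
```

An index past the end of the string simply returns an unchanged copy, because slice! returns nil without raising in that case.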

Resources