How do I make these string substitutions using hash keys in Ruby?

I have a bunch of JSON files, processed in both Python and Ruby, that look something like this:
{
"KEY1": "foo",
"KEY2": "bar",
"URL": "https://{KEY2}.com/{KEY1}",
"IMPORTANT_THING": "repos/{KEY1}",
"NOTE": "This thing is {KEY1}{KEY2}ed",
"PYTHON_ONLY_THING": "{}/test/{}.py"
}
Note that the order that the keys will show up is not consistent, and I'd rather not change the JSON.
Here's my test code showing what I've tried so far:
my_config = {"KEY1"=>"foo",
"KEY2"=>"bar",
"URL"=>"https://{KEY2}.com/{KEY1}",
"IMPORTANT_THING"=>"repos/{KEY1}",
"NOTE"=>"This thing is {KEY1}{KEY2}ed",
"PYTHON_ONLY_THING"=>"{}/test/{}.py"}
my_config.each_key do |key|
# Braindead, hard-coded solution that works:
# my_config[key].gsub!("{KEY1}", my_config["KEY1"])
# my_config[key].gsub!("{KEY2}", my_config["KEY2"])
# More flexible (if it would work):
# my_config[key].gsub!(/{.*}/, my_config['\0'.slice(1,-2)])
my_config[key].gsub!(/{.*}/) {|s| my_config[s.slice(1,-2)]}
end
puts my_config
I'm using the braindead solution for now, which produces the expected output:
{"KEY1"=>"foo", "KEY2"=>"bar", "URL"=>"https://bar.com/foo", "IMPORTANT_THING"=>"repos/foo", "NOTE"=>"This thing is foobared", "PYTHON_ONLY_THING"=>"{}/test/{}.py"}
But I want to make it more flexible and maintainable. The first "better" solution throws an error apparently because slice operates on '\0' itself and not the match, plus I'm not sure it would match more than once. The currently uncommented solution doesn't work because the second part seems to operate on one letter at a time rather than each match like I expected, so it just removes the stuff in curly braces. Worse, it removes everything between the outer braces in the PYTHON_ONLY_THING, which is no good.
I figure I need to change both my regex and Ruby code if this is going to work, but I'm not sure where to look for more help. Or perhaps gsub isn't the right tool for this job. Any ideas?
I am using Ruby 2.3.7 on Linux x86_64.

Use String#gsub with an initial hash for replacements:
my_config.map do |k, v|
[
k,
v.gsub(/(?<={)[^}]+(?=})/, my_config).gsub(/{(?!})|(?<!{)}/, '')
]
end.to_h
#⇒ {"KEY1"=>"foo",
# "KEY2"=>"bar",
# "URL"=>"https://bar.com/foo",
# "IMPORTANT_THING"=>"repos/foo",
# "NOTE"=>"This thing is foobared",
# "PYTHON_ONLY_THING"=>"{}/test/{}.py"}
Starting with Ruby 2.4 (or using Rails) it might be done simpler using Hash#transform_values.
If you dislike the second gsubbing, transform the hash upfront:
my_substs = my_config.map { |k, v| ["{#{k}}", v] }.to_h
my_config.map do |k, v|
[k, v.gsub(/{[^}]+}/, my_substs)]
end.to_h
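For Ruby 2.4 or later, the Hash#transform_values variant mentioned above could look roughly like this. It is only a sketch that reuses the substitution-table idea; the names substitutions and resolved are made up for illustration, and the block form of gsub is a deliberate deviation so that unknown placeholders are kept instead of being replaced with an empty string.

```ruby
my_config = {
  "KEY1" => "foo",
  "KEY2" => "bar",
  "URL" => "https://{KEY2}.com/{KEY1}",
  "IMPORTANT_THING" => "repos/{KEY1}",
  "NOTE" => "This thing is {KEY1}{KEY2}ed",
  "PYTHON_ONLY_THING" => "{}/test/{}.py"
}

# Build a lookup table keyed by the full placeholder text, e.g. "{KEY1}" => "foo".
substitutions = my_config.map { |k, v| ["{#{k}}", v] }.to_h

# Hash#transform_values needs Ruby 2.4+ (or ActiveSupport).
resolved = my_config.transform_values do |value|
  # fetch(match, match) falls back to the matched text itself,
  # so placeholders without a matching key are left untouched.
  value.gsub(/\{[^}]+\}/) { |match| substitutions.fetch(match, match) }
end
```

Because the regex requires at least one character between the braces, the empty {} pairs in PYTHON_ONLY_THING are never matched.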

Here's a possible solution:
my_config = {"KEY1"=>"foo",
"KEY2"=>"bar",
"URL"=>"https://{KEY2}.com/{KEY1}",
"IMPORTANT_THING"=>"repos/{KEY1}",
"NOTE"=>"This thing is {KEY1}{KEY2}ed",
"PYTHON_ONLY_THING"=>"{}/test/{}.py"}
my_config.each_key do |key|
placeholders = my_config[key].scan(/{([^}]+)}/).flatten
placeholders.each do |placeholder|
my_config[key].gsub!("{#{placeholder}}", my_config[placeholder]) if my_config.keys.include?(placeholder)
end
end
puts my_config
By using scan, this will substitute all matches, not just the first match.
Using [^}]+ in the regex, rather than .*, means you won't "swallow" too much in this part of the match. For example, if the input contains "{FOO} bar {BAZ}", then you want that pattern to only capture FOO and BAZ, not FOO} bar {BAZ.
Grouping the scan result, then calling flatten, is an easy way to reject what's outside the capture group, i.e. in this case the { and } characters. (This just makes the code a little less cryptic than using indexes like slice(1,-2)!)
my_config.keys.include?(placeholder) checks whether this is actually a known value, so you don't replace things with nil.
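The block form of gsub from the question can also be made to work. The sketch below assumes a hypothetical helper named expand; it uses a capture group instead of slicing. As a side note, slice(1, -2) returns nil because the second argument is a length, and a negative length is not allowed, which is why the original block replaced every match with nothing.

```ruby
# Hypothetical helper: replaces {NAME} with values["NAME"] when that key exists.
def expand(template, values)
  # [^}]+ needs at least one character, so "{}" is never matched.
  template.gsub(/\{([^}]+)\}/) do |match|
    # $1 is the text between the braces; unknown names keep their original text.
    values.fetch($1, match)
  end
end

values = { "KEY1" => "foo", "KEY2" => "bar" }

url     = expand("https://{KEY2}.com/{KEY1}", values)
python  = expand("{}/test/{}.py", values)
unknown = expand("{NOT_A_KEY}/x", values)

# Why the original attempt failed:
sliced_with_length = "{KEY1}".slice(1, -2)  # second argument is a length
sliced_with_range  = "{KEY1}"[1..-2]        # a range does what was intended
```

This works on Ruby 2.3 as well, since it relies only on String#gsub and Hash#fetch.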

Related

Ruby get strings from array which contain substring

I've got an array of strings. A few of the strings in this array contain a certain substring I'm looking for. I want to get an array of those strings containing the substring.
I would hope to do it like this:
o = ["abc", "def", "ghi"]
o.select(&:include?("c"))
But that gives me this error:
(repl):2: syntax error, unexpected ')', expecting end-of-input
o.select(&:include?("c"))
^
If your array were a file lines.txt:
abc
def
ghi
Then you would select the lines containing c with the grep command-line utility:
$ grep c lines.txt
abc
Ruby has adopted this as Enumerable#grep. You can pass a regular expression as the pattern and it returns the strings matching this pattern:
['abc', 'def', 'ghi'].grep(/c/)
#=> ["abc"]
More specifically, the result array contains all elements for which pattern === element is true:
/c/ === 'abc' #=> true
/c/ === 'def' #=> false
/c/ === 'ghi' #=> false
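Because grep relies on ===, the kind of pattern matters. The following sketch (variable names are made up for illustration) shows that a String pattern only finds exact matches, and how a literal substring can be turned into a safe regex with Regexp.escape.

```ruby
words = ["abc", "def", "a.c", "abcd"]

# String#=== is plain equality, so a String pattern finds exact matches only.
exact = words.grep("abc")

# Escape the substring so that "." is treated literally, not as "any character".
needle = "."
with_dot = words.grep(/#{Regexp.escape(needle)}/)

# Enumerable#grep_v (Ruby 2.3+) returns the elements that do NOT match.
without_c = words.grep_v(/c/)
```

Here exact is ["abc"], with_dot is ["a.c"] and without_c is ["def"].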
You can use the &-shorthand here. It's rather irrational (don't do this), but possible.
If you do manage to find an object and a method so you can make checks in your select like so:
o.select { |e| some_object.some_method(e) }
(the important part is that some_object and some_method need to be the same in all iterations)
...then you can use Object#method to get a block like that. It returns something that implements to_proc (a requirement for &-shorthand) and that proc, when called, calls some_method on some_object, forwarding its arguments to it. Kinda like:
o.m(a, b, c) # <=> o.method(:m).to_proc.call(a, b, c)
Here's how you use this with the &-shorthand:
collection.select(&some_object.method(:some_method))
In this particular case, /c/ and its method =~ do the job:
["abc", "def", "ghi"].select(&/c/.method(:=~))
Kinda verbose, readability is relatively bad.
Once again, don't do this here. But the trick can be helpful in other situations, particularly where the proc is passed in from the outside.
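As a small illustration of where the Object#method trick reads better, here is a sketch with a hypothetical top-level helper called shout, next to the regex example from above:

```ruby
# Object#method returns a Method object; the & operator calls to_proc on it.
contains_c = /c/.method(:=~)
selected = ["abc", "def", "ghi"].select(&contains_c)

# Hypothetical helper defined elsewhere in the program.
def shout(text)
  text.upcase + "!"
end

# Each element is passed as the argument to shout.
shouted = ["abc", "def"].map(&method(:shout))
```

selected ends up as ["abc"] and shouted as ["ABC!", "DEF!"]. Note the difference from &:name, which calls the method on each element rather than passing the element as an argument.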
Note: you may have heard of this shorthand syntax in a pre-release of Ruby 2.7, which was, unfortunately, reverted and didn't make it to 2.7:
["abc", "def", "ghi"].select(&/c/.:=~)
You are almost there, but you cannot pass a parameter with &:. You can do something like:
o.select{ |e| e.include? 'c' }

Ruby creating title case method, can't handle words like McDuff or McComb

The method is supposed to take in the name of a book and return it in proper title case. All of my specs pass (handles non-letter characters, handles upper and mixed cases) except the last one, which is to return special words like McDuff or McComb with a capital 3rd letter. Anyone see what I'm doing wrong? And is there a way to simplify this, using the tools at hand and not some higher-level shortcut?
class String
define_method(:title_case) do
sentence_array = self.downcase.split
no_caps = ["a", "an", "the", "at", "by", "and", "as", "but", "or", "for", "in", "nor", "on", "at", "up", "to", "on", "of", "from", "by"]
sentence_array.each do |word|
if no_caps.include?(word)
word
else
word.capitalize!
end
sentence_array.first.capitalize!
# Manage special words
if (word.include?("mc"))
letter_array = word.split!("") # word with mc changed to an array of letters
if (letter_array[0] == "m") && (letter_array[1] == "c") # 1st & 2nd letters
letter_array[2].capitalize!
word = letter_array.join
end
end
end
sentence_array.join(" ")
end
end
There are several issues with your "Mc" code:
if (word.include?("mc"))
For a word like "mcduff" this returns false, because you have already capitalized word to "Mcduff" at this point. It has to be:
if word.include?('Mc')
This line doesn't work either:
letter_array = word.split!("")
because there is no split! method, just split. There is however no reason to use a character array at all. String#[] allows you to access a string's characters (or sub-strings), so the next line becomes:
if (word[0] == 'M') && (word[1] == 'c')
or just:
if word[0, 2] == 'Mc'
or even better using start_with?:
if word.start_with?('Mc')
In fact, we can replace the first if with this one.
The next line is a bit tricky:
letter_array[2].capitalize!
Using String#[] this becomes:
word[2].capitalize!
But unfortunately, neither works as expected. This is because [] returns a new object, so the bang method doesn't change the original object. Instead you have to call the element assignment method []=:
word[2] = word[2].upcase
Everything put together:
if word.start_with?('Mc')
word[2] = word[2].upcase
end
Or in a single line:
word[2] = word[2].upcase if word.start_with?('Mc')
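Putting the pieces into a complete method might look like the sketch below. It is written as a plain helper rather than a String method, the word list is a de-duplicated copy of the one in the question, and the names NO_CAPS and title_case are only illustrative. The actual specs are not shown in the question, so this is untested against them.

```ruby
NO_CAPS = %w[a an the at by and as but or for in nor on up to of from].freeze

def title_case(title)
  words = title.downcase.split.each_with_index.map do |word, index|
    # Always capitalize the first word; otherwise skip the small words.
    word = word.capitalize unless index > 0 && NO_CAPS.include?(word)
    # "Mcduff" -> "McDuff": assign the upcased third letter back with []=.
    word[2] = word[2].upcase if word.start_with?('Mc') && word.length > 2
    word
  end
  words.join(' ')
end

result = title_case("the tale of mcduff and MCCOMB")
```

With this input, result is "The Tale of McDuff and McComb".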
First of all, please, don't monkey patch. This is bad design, just make a helper function that takes an argument you need (string in your case).
def title_case(string)
no_caps = %w(a an the at by and as but or for in nor on at up to on of from by)
no_caps_regex = /\b(#{no_caps.join('|')})\b/i # match separate words from above, case-insensitive
# you will need ActiveSupport (or Rails) for +String#titleize+ support
titleized = string.titleize
handle_special = titleized.gsub(/\b(mc)(.+?)\b/i) do |match|
[$1, $2].map(&:capitalize).join
end
no_capsed = handle_special.gsub(no_caps_regex) { |match| match.downcase }
end
title_case('mcdonalds is fast food, but mrmcduff is not')
# => "McDonalds Is Fast Food, but Mrmcduff Is Not"
UPDATE: I am sorry about that, it was really bad reading, but I still want to elaborate on the confused terms you noted:
Monkey patching is a technique, available for some dynamic languages (Ruby or Javascript, for example) where you can change (add or remove methods/properties) to already existing classes, such as String, Fixnum, DateTime and others. Often this technique is used for "enhancing" core types (exactly like you did in your code, adding method title_case to String).
The problem here is that if another library developer chooses the same name and adds it to the String class, and you eventually want to try their library in your project, the implementations will clash and whichever one is loaded later wins (depending on the code loading order, usually yours). This will break either your code or the library, which is no good either.
Another similar problem is when you try to "fix" some bug in a third-party library this way. You monkey patch it, everything works and you forget about it. Then 6 months later you decide to upgrade the library to a new version and suddenly everything blows up, because the library code clashes with your changes, and you may not even remember your monkey patch (or it may be another developer, who doesn't even know your monkey patch exists).
A helper function is just a function that you can add either a) to a separate file, called a helper, or b) to the current controller/model (the place you need it).
\b is an anchor that matches a word boundary, i.e. the position between a word character and a non-word character. For example, the regex /as/ matches the word as but also the as inside fast. If you instead use /\bas\b/, only the standalone word as will be matched.
Regexes are very powerful; please find some time to learn them and you'll take your text-processing skills to the next level. Combined with some knowledge of console tools (I mean commands in UNIX terminals, such as ls, ps, find, grep, etc.), they can be very powerful in day-to-day routines such as "do yesterday's logs contain some IP?", "what is the name of the process that eats all the memory on my machine right now?" or "which files in my project contain this function call?".
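The difference is easy to check in irb. A minimal sketch, with a made-up sample string:

```ruby
text = "fast as lightning"

# Without boundaries, "as" is found inside "fast" as well.
loose_matches  = text.scan(/as/)
# With \b on both sides, only the standalone word matches.
strict_matches = text.scan(/\bas\b/)

# =~ returns the index of the first match.
loose_index  = text =~ /as/
strict_index = text =~ /\bas\b/
```

loose_matches is ["as", "as"] while strict_matches is ["as"]; the first match moves from index 1 (inside fast) to index 5 (the separate word).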
The classic book on this subject is J. Friedl's "Mastering regular expressions", highly recommended.
Have a nice day.
@Stefan, you were right!
str = "Lay on, Macduff, and damn'd be him that first cries, 'Hold, enough!'"
no_caps = ["a", "an", "the", "at", "by", "and", "as", "but", "or",
"for", "in", "nor", "on", "at", "up", "to", "on", "of",
"from", "by", "that", "lay"]
str.gsub(/\w+/) do |s|
(no_caps.include?(s.downcase) && $~.begin(0) > 0) ? s.downcase! : s.capitalize!
case s
when /^Mc./ then s[2] = s[2].upcase
when /^Mac./ then s[3] = s[3].upcase
end
s
end
# => "Lay on, MacDuff, and Damn'D Be Him that First Cries, 'Hold, Enough!'"

Ruby Regex Replace Last Occurrence of Grouping

In Ruby regular expressions I would like to use gsub to replace the last occurrence of a grouping, if it occurs, and otherwise perform a replacement anyway at a default location. I am trying to replace the last occurrence of a number in the 40s (40 to 49). I have the following regular expression, which is correctly capturing the grouping I would like in '\3':
/(([1-3,5-9][0-9]|([4][0-9]))[a-z])*Foo/
Some sample strings I am using this regex on are:
12a23b34c45d56eFoo
12a45b34c46d89eFoo
45aFoo
Foo
12a23bFoo
12a23b445cFoo
Using https://regex101.com/, I see the last number in 40s is captured in '\3'. I would then like to somehow perform string.gsub(regex, '\3' => 'NEW') to replace this last occurrence or append before Foo if not present. My desired results would be:
12a23b34cNEWd56eFoo
12a45b34cNEWd89eFoo
NEWaFoo
NEWFoo
12a23bNEWFoo
12a23b4NEWcFoo
If I correctly understood, you are interested in gsub with codeblock:
str.gsub(PATTERN) { |mtch|
puts mtch # the whole match
puts $~[3] # the third group
mtch.gsub($~[3], 'NEW') # the result
}
'abc'.gsub(/(b)(c)/) { |m| m.gsub($~[2], 'd') }
#⇒ "abd"
Probably you should handle the case when there are no occurrences of a number in the 40s at all, like:
gsub($~[1], "NEW#{$~[1]}") if $~[3].nil?
To handle all the possible cases, one might declare the group for Foo:
# NOTE THE GROUP ⇓⇓⇓⇓⇓
▶ re = /(([1-3,5-9][0-9]|([4][0-9]))[a-z])*(Foo)/
#⇒ /(([1-3,5-9][0-9]|([4][0-9]))[a-z])*(Foo)/
▶ inp.gsub(re) do |mtch|
▷ $~[3].nil? ? mtch.gsub($~[4], "NEW#{$~[4]}") : mtch.gsub(/#{$~[3]}/, 'NEW')
▷ end
#⇒ "12a23b34cNEWd56eFoo\n12a45b34cNEWd89eFoo\nNEWaFoo\nNEWFoo\n12a23bNEWFoo"
Hope it helps.
I suggest the following:
'12a23b34c45d56eFoo'.gsub(/(([1-3,5-9][0-9]|([4][0-9]))[a-z])*Foo/) {
if Regexp.last_match[3].nil? then
puts "Append before Foo"
else
puts "Replace group 3"
end
}
You'd need to find a way to append or replace accordingly or maybe someone can edit with a concise code...
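One possible concise version uses a lookahead instead of capture groups. This is a sketch under the assumption that the input always consists of two-digit numbers, each followed by one lowercase letter, before Foo; the constant and method names are made up, and the stray comma from the original character class is dropped.

```ruby
# First alternative: a 40s number followed only by non-40s pairs up to Foo,
# which makes it the last 40s number. Second alternative: the empty position
# right before Foo, used when no 40s number qualifies.
LAST_FORTY_OR_FOO = /4[0-9](?=[a-z](?:[1-35-9][0-9][a-z])*Foo)|(?=Foo)/

def replace_last_forty(str)
  # sub replaces only the first match; an empty match simply inserts 'NEW'.
  str.sub(LAST_FORTY_OR_FOO, 'NEW')
end

samples = %w[12a23b34c45d56eFoo 12a45b34c46d89eFoo 45aFoo Foo 12a23bFoo 12a23b445cFoo]
results = samples.map { |s| replace_last_forty(s) }
```

For the six sample strings this yields the six desired results listed in the question, including 12a23b4NEWcFoo for the last one.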

Replace characters from string Ruby

I have the following string which has an array element in it, and I would like to remove the quotes inside the array element and put them around the outside of the array:
"date":"2014-05-04","name":"John","products":["12","14","45"],"status":"completed"
Is there a way to remove the double quotes in [] and add double quotes to the start and end of []? Results:
"date":"2014-05-04","name":"John","products":"[12,14,45]","status":"completed"
Can that be done in ruby or is there a command line that I can use?
Your string looks like a json hash to me:
json = '{"date":"2014-05-04","name":"John","products":["12","14","45"],"status":"completed"}'
require 'json'
hash = JSON.load(json)
hash.update('products' => hash['products'].map(&:to_i))
puts hash.to_json
# => {"date":"2014-05-04","name":"John","products":[12,14,45],"status":"completed"}
Or if you really want to have the array represented as a string (so it is no longer a JSON array), serialize it with to_json; plain to_s would add spaces after the commas:
hash.update('products' => hash['products'].map(&:to_i).to_json) # note .to_json here
puts hash.to_json
# => {"date":"2014-05-04","name":"John","products":"[12,14,45]","status":"completed"}
The answer by @spickermann is pretty good, and the best way I can think of, but since I had fun trying to find an alternative without using json, here it goes:
def string_to_result(str)
str.match(/(?:\[)((?:")+(.)+(?:")+)+(?:\])/)
str.gsub($1, "#{$1.split(',').map{ |num| num.gsub('"', '') }.join(',')}").gsub(/\[/, '"[').gsub(/\]/, ']"').gsub(/String/, 'Results')
end
Is ugly as hell, but it works :P
I tried to do it on a single step, but that was way harder for my regexp skills.
Anyway, you should never parse something structured such as json or xml using only regexps, and this is merely for fun.
[EDIT] Had the bracket-adjacent quotes wrong, sorry. Fixed.
Also, one more thing, this fails A LOT! An empty array or an array in other place in the string are just a few cases where it would fail.
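For comparison, a JSON-based sketch that copes with empty arrays and with arrays anywhere in the string could look like this. It assumes the input is the inside of a JSON object without the surrounding braces, as in the question; stringify_arrays is a hypothetical name.

```ruby
require 'json'

def stringify_arrays(fragment)
  # Wrap the fragment in braces so it parses as a JSON object.
  hash = JSON.parse("{#{fragment}}")
  converted = hash.map do |key, value|
    # Turn ["12","14"] into the string "[12,14]"; leave other values alone.
    value = "[#{value.join(',')}]" if value.is_a?(Array)
    [key, value]
  end.to_h
  # Serialize again and strip the surrounding braces.
  converted.to_json[1..-2]
end

input  = '"date":"2014-05-04","name":"John","products":["12","14","45"],"status":"completed"'
output = stringify_arrays(input)
```

Here output is the string from the question's desired result, and an empty array such as "ids":[] becomes "ids":"[]".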
You could use the form of String#gsub that takes a block:
str = '"2014-05-04","name":"John","products":["12","14","45"],"status":"completed"'
puts str.gsub(/\["(\d+)","(\d+)","(\d+)"\]/) { "\"[#{$1},#{$2},#{$3}]\"" }
#"2014-05-04","name":"John","products":"[12,14,45]","status":"completed"

Optimising ruby regexp -- lots of match groups

I'm working on a Ruby-based lexer. To improve performance, I joined all the tokens' regexps into one big regexp with match group names. The resulting regexp looks like:
/\A(?<__anonymous_-1038694222803470993>(?-mix:\n+))|\A(?<__anonymous_-1394418499721420065>(?-mix:\/\/[\A\n]*))|\A(?<__anonymous_3077187815313752157>(?-mix:include\s+"[\A"]+"))|\A(?<LET>(?-mix:let\s))|\A(?<IN>(?-mix:in\s))|\A(?<CLASS>(?-mix:class\s))|\A(?<DEF>(?-mix:def\s))|\A(?<DEFM>(?-mix:defm\s))|\A(?<MULTICLASS>(?-mix:multiclass\s))|\A(?<FUNCNAME>(?-mix:![a-zA-Z_][a-zA-Z0-9_]*))|\A(?<ID>(?-mix:[a-zA-Z_][a-zA-Z0-9_]*))|\A(?<STRING>(?-mix:"[\A"]*"))|\A(?<NUMBER>(?-mix:[0-9]+))/
I'm matching it to my string producing a MatchData where exactly one token is parsed:
bigregex =~ "\n ... garbage"
puts $~.inspect
Which outputs
#<MatchData
"\n"
__anonymous_-1038694222803470993:"\n"
__anonymous_-1394418499721420065:nil
__anonymous_3077187815313752157:nil
LET:nil
IN:nil
CLASS:nil
DEF:nil
DEFM:nil
MULTICLASS:nil
FUNCNAME:nil
ID:nil
STRING:nil
NUMBER:nil>
So, the regex actually matched the "\n" part. Now, I need to figure out the match group it belongs to (it's clearly visible from the #inspect output that it's __anonymous_-1038694222803470993, but I need to get it programmatically).
I could not find any option other than iterating over #names:
m.names.each do |n|
if m[n]
type = n.to_sym
resolved_type = (n.start_with?('__anonymous_') ? nil : type)
val = m[n]
break
end
end
which verifies that the match group did have a match.
The problem here is that it's slow: I spend about 10% of the time in the loop, and another 8% grabbing @input[@pos..-1] to make sure that \A works as expected and matches at the start of the string (I do not discard input, I just shift @pos in it).
You can check the full code at GH repo.
Any ideas on how to make it at least a bit faster? Is there any option to figure the "successful" match group easier?
You can do this using the MatchData methods .captures() and .names():
matching_string = "\n ...garbage" # or whatever this really is in your code
@input = matching_string.match bigregex # bigregex = your regex
arr = @input.captures
arr.each_with_index do |value, index|
if not value.nil?
the_name_you_want = @input.names[index]
end
end
Or if you expect multiple successful values, you could do:
success_names_arr = []
success_names_arr.push(@input.names[index]) # within the above loop
Pretty similar to your original idea, but if you're looking for efficiency .captures() method should help with that.
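A variation on the same idea, as a sketch only: pair names with captures and pick the first non-nil one, and use \G together with the position argument of Regexp#match so the input does not have to be sliced for every token. TOKEN_RE is a made-up, cut-down token set and next_token is a hypothetical helper; whether this is actually faster for the real lexer would need profiling.

```ruby
# \G anchors the match at the position passed to Regexp#match,
# so there is no need to build input[pos..-1] first.
TOKEN_RE = /\G(?:(?<NEWLINE>\n+)|(?<LET>let\s)|(?<ID>[a-zA-Z_][a-zA-Z0-9_]*)|(?<NUMBER>[0-9]+))/

def next_token(input, pos)
  m = TOKEN_RE.match(input, pos)
  return nil unless m
  # names and captures are in the same order; take the first group that matched.
  name, text = m.names.zip(m.captures).find { |_, value| value }
  # m.end(0) is an offset into the whole input, i.e. the next position to scan from.
  [name.to_sym, text, m.end(0)]
end

first  = next_token("let x", 0)
second = next_token("let x", first[2])
```

first is [:LET, "let ", 4] and second is [:ID, "x", 5]; a position where no token starts returns nil.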
I may have misunderstood this completely, but I'm assuming that all but one of the tokens are nil and the non-nil one is the one you're after?
If so then, depending on the flavour of regex you're using, you could use a negative lookahead to check for a non-nil value
([^\n:]+:(?!nil)[^\n\>]+)
This will match the whole token ie NAME:value.
