Related
So this may seem odd, and I have done quite a bit of googling, however, I am not really a programmer, (sysops) and trying to figure out how to pass data to the AWS API in the required format, which does seem a little odd.
So, working with resources in AWS, I need to pass tags which are keys and values. The key is a string. The value is a comma separated string, in the first element of an array. So in Ruby terms, looks like this.
{env => ["stage,qa,dev"]}
and not
{env => ["stage","qa","dev"]}
I'm created an admittedly. not a very pretty little app that will allow me to run ssm documents on targeted instances in aws.
I can get the string into an array element using this class I created
class Tags
attr_accessor :tags
def initialize
#tags = {"env" => nil ,"os" => nil ,"group" => nil }
end
def set_values()
puts "please enter value/s for the following keys, using spaces or commas for multiple values"
#tags.each { |key,value|
print "enter #{key} value/s: "
#tags[key] = [gets.strip.chomp]
#tags[key] = Validate.multi_value(tags[key])
}
end
end
I then call this Validate.multi_value passing in the created Array, but it spits an array of my string value back.
class Validate
def self.multi_value(value)
if value.any?{ |sub_string| sub_string.include?(",") || sub_string.include?(" ") }
value = value[0].split(/[,\s]+/)
return value
else
return value
end
end
end
Using pry, I've seen it gets for example ["stage dev qa"] then the if statement does work, then it spits out ["stage","dev","qa"].
and I need it to output ["stage,dev,qa"] but for the life of me, I can't make it work.
I hope that's clear.
If you have any suggestions, I'd be most grateful.
I'm not hugely experienced at ruby and the may be class methods that I've missed.
If your arrays are always coming through in the format ["stage dev qa"] then first we need to split the one string into the parts we want:
arr = ["stage dev qa"]
arr.split(' ')
=> ["stage", "dev", "qa"]
Then we need to join them with the comma:
arr.split(' ').join(',')
=> "stage,dev,qa"
And finally we need to wrap it in an array:
[arr.first.split(' ').join(',')]
=> ["stage,dev,qa"]
All together:
def transform_array(arr)
[arr.first.split(' ').join(',')]
end
transform_array(['stage dev qa'])
=> ['stage,dev,qa']
More info: How do I convert an array of strings into a comma-separated string?
I see no point in creating a class here when a simple method would do.
def set_values
["env", "os", "group"].map do |tag|
puts "Please enter values for #{tag}, using spaces or commas"
print "to separate multiple values: "
gets.strip.gsub(/[ ,]+/, ',')
end
end
Suppose, when asked, the user enters, "stage dev,qa" (for"env"), "OS X" (for"OS") and "Hell's Angels" for "group". Then:
set_values
#=> ["stage,dev,qa", "OS,X", "Hell's,Angels"]
If, as I suspect, you only wish to convert spaces to commas for "env" and not for "os" or "group", write:
def set_values
puts "Please enter values for env, using spaces or commas"
print "to separate multiple values: "
[gets.strip.gsub(/[ ,]+/, ',')] +
["os", "group"].map do |tag|
print "Please enter value for #{tag}: "
gets.strip
end
end
set_values
#=> ["stage,dev,ga", "OS X", "Hell's Angels"]
See Array#map, String#gsub and Array#+.
gets.strip.gsub(/[ ,]+/, ',') merely chains the two operations s = gets.strip and s.gsub(/[ ,]+/, ','). Chaining is commonplace in Ruby.
The regular expression used by gsub reads, "match one or more spaces or commas", [ ,] being a character class, requiring one of the characters in the class be matched, + meaning that one or more of those spaces or commas are to be matched. If the string were "a , b,, c" there would be two matches, " , " and ",, "; gsub would convert both to a single comma.
Using print rather than puts displays the user's entry on the same line as the prompt, immediately after ": ", rather than on the next line. That is of course purely stylistic.
Often one would write gets.chomp rather than gets.strip. Both remove newlines and other whitespace at the end of the string, strip also removes any whitespace at the beginning of the string. strip is probably best in this case.
What do you think about this?, everything gets treated in the Validates method. I don't know if you wanted to remove repeated values, but, just in case I did, so a
"this string,, has too many,,, , spaces"
will become
"this,string,has,too,many,spaces"
and not
"this,,,,string,,,has,too,,many,,,,,,spaces"
Here's the code
class Tags
attr_accessor :tags
# initializes the class (no change)
#
def initialize
#tags = {"env" => nil ,"os" => nil ,"group" => nil }
end
# request and assign the values <- SOME CHANGES
#
def set_values
puts "please enter value/s for the following keys, using spaces or commas for multiple values"
#tags.each do |key,value|
print "enter #{key} value/s: "
#tags[key] = Validate.multi_value( gets )
end
end
end
class Validate
# Sets the array
#
def self.multi_value(value)
# Remove leading spaces, then remove special chars,
# replace all spaces with commas, then remove repetitions
#
[ value.strip.delete("\n","\r","\t","\rn").gsub(" ", ",").squeeze(",") ]
end
end
EDITED, thanks lacostenycoder
Say that we want to count the number of words in a document. I know we can do the following:
text.each_line(){ |line| totalWords = totalWords + line.split.size }
Say, that I just want to add some exceptions, such that, I don't want to count the following as words:
(1) numbers
(2) standalone letters
(3) email addresses
How can we do that?
Thanks.
You can wrap this up pretty neatly:
text.each_line do |line|
total_words += line.split.reject do |word|
word.match(/\A(\d+|\w|\S*\#\S+\.\S+)\z/)
end.length
end
Roughly speaking that defines an approximate email address.
Remember Ruby strongly encourages the use of variables with names like total_words and not totalWords.
assuming you can represent all the exceptions in a single regular expression regex_variable, you could do:
text.each_line(){ |line| totalWords = totalWords + line.split.count {|wrd| wrd !~ regex_variable }
your regular expression could look something like:
regex_variable = /\d.|^[a-z]{1}$|\A([^#\s]+)#((?:[-a-z0-9]+\.)+[a-z]{2,})\Z/i
I don't claim to be a regex expert, so you may want to double check that, particularly the email validation part
In addition to the other answers, a little gem hunting came up with this:
WordsCounted Gem
Get the following data from any string or readable file:
Word count
Unique word count
Word density
Character count
Average characters per word
A hash map of words and the number of times they occur
A hash map of words and their lengths
The longest word(s) and its length
The most occurring word(s) and its number of occurrences.
Count invividual strings for occurrences.
A flexible way to exclude words (or anything) from the count. You can pass a string, a regexp, an array, or a lambda.
Customisable criteria. Pass your own regexp rules to split strings if you prefer. The default regexp has two features:
Filters special characters but respects hyphens and apostrophes.
Plays nicely with diacritics (UTF and unicode characters): "São Paulo" is treated as ["São", "Paulo"] and not ["S", "", "o", "Paulo"].
Opens and reads files. Pass in a file path or a url instead of a string.
Have you ever started answering a question and found yourself wandering, exploring interesting, but tangential issues, or concepts you didn't fully understand? That's what happened to me here. Perhaps some of the ideas might prove useful in other settings, if not for the problem at hand.
For readability, we might define some helpers in the class String, but to avoid contamination, I'll use Refinements.
Code
module StringHelpers
refine String do
def count_words
remove_punctuation.split.count { |w|
!(w.is_number? || w.size == 1 || w.is_email_address?) }
end
def remove_punctuation
gsub(/[.!?,;:)](?:\s|$)|(?:^|\s)\(|\-|\n/,' ')
end
def is_number?
self =~ /\A-?\d+(?:\.\d+)?\z/
end
def is_email_address?
include?('#') # for testing only
end
end
end
module CountWords
using StringHelpers
def self.count_words_in_file(fname)
IO.foreach(fname).reduce(0) { |t,l| t+l.count_words }
end
end
Note that using must be in a module (possibly a class). It does not work in main, presumably because that would make the methods available in the class self.class #=> Object, which would defeat the purpose of Refinements. (Readers: please correct me if I'm wrong about the reason using must be in a module.)
Example
Let's first informally check that the helpers are working correctly:
module CheckHelpers
using StringHelpers
s = "You can reach my dog, a 10-year-old golden, at fido#dogs.org."
p s = s.remove_punctuation
#=> "You can reach my dog a 10 year old golden at fido#dogs.org."
p words = s.split
#=> ["You", "can", "reach", "my", "dog", "a", "10",
# "year", "old", "golden", "at", "fido#dogs.org."]
p '123'.is_number? #=> 0
p '-123'.is_number? #=> 0
p '1.23'.is_number? #=> 0
p '123.'.is_number? #=> nil
p "fido#dogs.org".is_email_address? #=> true
p "fido(at)dogs.org".is_email_address? #=> false
p s.count_words #=> 9 (`'a'`, `'10'` and "fido#dogs.org" excluded)
s = "My cat, who has 4 lives remaining, is at abbie(at)felines.org."
p s = s.remove_punctuation
p s.count_words
end
All looks OK. Next, put I'll put some text in a file:
FName = "pets"
text =<<_
My cat, who has 4 lives remaining, is at abbie(at)felines.org.
You can reach my dog, a 10-year-old golden, at fido#dogs.org.
_
File.write(FName, text)
#=> 125
and confirm the file contents:
File.read(FName)
#=> "My cat, who has 4 lives remaining, is at abbie(at)felines.org.\n
# You can reach my dog, a 10-year-old golden, at fido#dogs.org.\n"
Now, count the words:
CountWords.count_words_in_file(FName)
#=> 18 (9 in ech line)
Note that there is at least one problem with the removal of punctuation. It has to do with the hyphen. Any idea what that might be?
Something like...?
def is_countable(word)
return false if word.size < 2
return false if word ~= /^[0-9]+$/
return false if is_an_email_address(word) # you need a gem for this...
return true
end
wordCount = text.split().inject(0) {|count,word| count += 1 if is_countable(word) }
Or, since I am jumping to the conclusion that you can just split your entire text into an array with split(), you might need:
wordCount = 0
text.each_line do |line|
line.split.each{|word| wordCount += 1 if is_countable(word) }
end
I've looked around but haven't been able to find a working solution to my problem.
I have an array of two strings input and want to test which element of the array contains an exact substring Test.
One thing I have tried (among numerous other attempts):
input = ["Test's string", "Test string"]
# Alternative input array that it needs to work on:
# ["Testing string", "some Test string"]
substring = "Test"
if (input[0].match(/\b#{substring}\b/))
puts "Test 0 "
# Do something...
elsif (input[1].match(/\b#{substring}\b/))
puts "Test 1"
# Do something different...
end
The desired result is a print of "Test 1". The input can be more complex but overall I am looking for a way to find an exact match of a substring in a longer string.
I feel like this should be a rather trivial regex but I haven't been able to come up with the correct pattern. Any help would be greatly appreciated!
Following code may be what you are looking for.
input = ["Testing string", "Test string"]
substring = "Test"
if (input[0].match(/[^|\s]#{substring}[\s|$]/)
puts "Test 0 "
elsif (input[1].match(/[^|\s]#{substring}[\s|$]/)
puts "Test 1"
end
The meaning of the pattern /[^|\s]#{substring}[\s|$]/ is
[^|\s] : left side of the substring is begining of string(^) or white space,
{substring} : subsring is matched exactly,
[\s|$] : right side of the substring is white space or end of string($).
One way to that is as follows:
input = ["Testing string", "Test"]
"Test #{ input.index { |s| s[/\bTest\b/] } }"
#=> "Test 1"
input = ["Test", "Testing string"]
"Test #{ input.index { |s| s[/\bTest\b/] } }"
#=> "Test 0"
\b is the regex denotes a word boundary.
Maybe you want a method to return the index of the first element of input that contains the word? That could be:
def matching_index(input, word)
input.index { |s| s[/\b#{word}\b/i] }
end
input = ["Testing string", "Test"]
matching_index(input, "Test") #=> 1
matching_index(input, "test") #=> 1
matching_index(input, "Testing") #=> 0
matching_index(input, "Testy") #=> nil
Then you could use it like this, for example:
word = 'Test'
puts "The matching element for '#{word}' is at index #{ matching_index(input, word) }"
#=> The matching element for 'Test' is at index 1
word = "Testing"
puts "The matching element for '#{word}' is '#{ input[matching_index(input, word)] }'"
#The matching element for 'Testing' is 'Testing string'
The problem is with your bounding. In your original question, the word Test will match the first string because the ' is will match the \b word boundary. It's a perfect match and is responding with "Test 0" correctly. You need to determine how you'll terminate your search. If your input contains special characters, I don't think the regex will work properly. /\bTest my $money.*/ will never match because the of the $ in your substring.
What happens if you have multiple matches in your input array? Do you want to do something to all of them or just the first one?
I have such code
reg = /(.+)_path/
if reg.match('home_path')
puts reg.match('home_path')[0]
end
This will eval regex twice :(
So...
reg = /(.+)_path/
result = reg.match('home_path')
if result
puts result[0]
end
But it will store variable result in memory till.
I have one functional-programming idea
/(.+)_path/.match('home_path').compact.each do |match|
puts match[0]
end
But seems there should be better solution, isn't it?
There are special global variables (their names start with $) that contain results of the last regexp match:
r = /(.+)_path/
# $1 - the n-th group of the last successful match (may be > 1)
puts $1 if r.match('home_path')
# => home
# $& - the string matched by the last successful match
puts $& if r.match('home_path')
# => home_path
You can find full list of predefined global variables here.
Note, that in the examples above puts won't be executed at all if you pass a string that doesn't match the regexp.
And speaking about general case you can always put assignment into condition itself:
if m = /(.+)_path/.match('home_path')
puts m[0]
end
Though, many people don't like that as it makes code less readable and gives a good opportunity for confusing = and ==.
My personal favorite (w/ 1.9+) is some variation of:
if /(?<prefix>.+)_path/ =~ "home_path"
puts prefix
end
If you really want a one-liner: puts /(?<prefix>.+)_path/ =~ 'home_path' ? prefix : false
See the Ruby Docs for a few limitations of named captures and #=~.
From the docs: If a block is given, invoke the block with MatchData if match succeed.
So:
/(.+)_path/.match('home_path') { |m| puts m[1] } # => home
/(.+)_path/.match('homepath') { |m| puts m[1] } # prints nothing
How about...
if m=/regex here/.match(string) then puts m[0] end
A neat one-line solution, I guess :)
how about this ?
puts $~ if /regex/.match("string")
$~ is a special variable that stores the last regexp match. more info: http://www.regular-expressions.info/ruby.html
Actually, this can be done with no conditionals at all. (The expression evaluates to "" if there is no match.)
puts /(.+)_path/.match('home_xath').to_a[0].to_s
I have a string:
s="123--abc,123--abc,123--abc"
I tried using Ruby 1.9's new feature "named groups" to fetch all named group info:
/(?<number>\d*)--(?<chars>\s*)/
Is there an API like Python's findall which returns a matchdata collection? In this case I need to return two matches, because 123 and abc repeat twice. Each match data contains of detail of each named capture info so I can use m['number'] to get the match value.
Named captures are suitable only for one matching result.
Ruby's analogue of findall is String#scan. You can either use scan result as an array, or pass a block to it:
irb> s = "123--abc,123--abc,123--abc"
=> "123--abc,123--abc,123--abc"
irb> s.scan(/(\d*)--([a-z]*)/)
=> [["123", "abc"], ["123", "abc"], ["123", "abc"]]
irb> s.scan(/(\d*)--([a-z]*)/) do |number, chars|
irb* p [number,chars]
irb> end
["123", "abc"]
["123", "abc"]
["123", "abc"]
=> "123--abc,123--abc,123--abc"
Chiming in super-late, but here's a simple way of replicating String#scan but getting the matchdata instead:
matches = []
foo.scan(regex){ matches << $~ }
matches now contains the MatchData objects that correspond to scanning the string.
You can extract the used variables from the regexp using names method. So what I did is, I used regular scan method to get the matches, then zipped names and every match to create a Hash.
class String
def scan2(regexp)
names = regexp.names
scan(regexp).collect do |match|
Hash[names.zip(match)]
end
end
end
Usage:
>> "aaa http://www.google.com.tr aaa https://www.yahoo.com.tr ddd".scan2 /(?<url>(?<protocol>https?):\/\/[\S]+)/
=> [{"url"=>"http://www.google.com.tr", "protocol"=>"http"}, {"url"=>"https://www.yahoo.com.tr", "protocol"=>"https"}]
#Nakilon is correct showing scan with a regex, however you don't even need to venture into regex land if you don't want to:
s = "123--abc,123--abc,123--abc"
s.split(',')
#=> ["123--abc", "123--abc", "123--abc"]
s.split(',').inject([]) { |a,s| a << s.split('--'); a }
#=> [["123", "abc"], ["123", "abc"], ["123", "abc"]]
This returns an array of arrays, which is convenient if you have multiple occurrences and need to see/process them all.
s.split(',').inject({}) { |h,s| n,v = s.split('--'); h[n] = v; h }
#=> {"123"=>"abc"}
This returns a hash, which, because the elements have the same key, has only the unique key value. This is good when you have a bunch of duplicate keys but want the unique ones. Its downside occurs if you need the unique values associated with the keys, but that appears to be a different question.
If using ruby >=1.9 and the named captures, you could:
class String
def scan2(regexp2_str, placeholders = {})
return regexp2_str.to_re(placeholders).match(self)
end
def to_re(placeholders = {})
re2 = self.dup
separator = placeholders.delete(:SEPARATOR) || '' #Returns and removes separator if :SEPARATOR is set.
#Search for the pattern placeholders and replace them with the regex
placeholders.each do |placeholder, regex|
re2.sub!(separator + placeholder.to_s + separator, "(?<#{placeholder}>#{regex})")
end
return Regexp.new(re2, Regexp::MULTILINE) #Returns regex using named captures.
end
end
Usage (ruby >=1.9):
> "1234:Kalle".scan2("num4:name", num4:'\d{4}', name:'\w+')
=> #<MatchData "1234:Kalle" num4:"1234" name:"Kalle">
or
> re="num4:name".to_re(num4:'\d{4}', name:'\w+')
=> /(?<num4>\d{4}):(?<name>\w+)/m
> m=re.match("1234:Kalle")
=> #<MatchData "1234:Kalle" num4:"1234" name:"Kalle">
> m[:num4]
=> "1234"
> m[:name]
=> "Kalle"
Using the separator option:
> "1234:Kalle".scan2("#num4#:#name#", SEPARATOR:'#', num4:'\d{4}', name:'\w+')
=> #<MatchData "1234:Kalle" num4:"1234" name:"Kalle">
I needed something similar recently. This should work like String#scan, but return an array of MatchData objects instead.
class String
# This method will return an array of MatchData's rather than the
# array of strings returned by the vanilla `scan`.
def match_all(regex)
match_str = self
match_datas = []
while match_str.length > 0 do
md = match_str.match(regex)
break unless md
match_datas << md
match_str = md.post_match
end
return match_datas
end
end
Running your sample data in the REPL results in the following:
> "123--abc,123--abc,123--abc".match_all(/(?<number>\d*)--(?<chars>[a-z]*)/)
=> [#<MatchData "123--abc" number:"123" chars:"abc">,
#<MatchData "123--abc" number:"123" chars:"abc">,
#<MatchData "123--abc" number:"123" chars:"abc">]
You may also find my test code useful:
describe String do
describe :match_all do
it "it works like scan, but uses MatchData objects instead of arrays and strings" do
mds = "ABC-123, DEF-456, GHI-098".match_all(/(?<word>[A-Z]+)-(?<number>[0-9]+)/)
mds[0][:word].should == "ABC"
mds[0][:number].should == "123"
mds[1][:word].should == "DEF"
mds[1][:number].should == "456"
mds[2][:word].should == "GHI"
mds[2][:number].should == "098"
end
end
end
I really liked #Umut-Utkan's solution, but it didn't quite do what I wanted so I rewrote it a bit (note, the below might not be beautiful code, but it seems to work)
class String
def scan2(regexp)
names = regexp.names
captures = Hash.new
scan(regexp).collect do |match|
nzip = names.zip(match)
nzip.each do |m|
captgrp = m[0].to_sym
captures.add(captgrp, m[1])
end
end
return captures
end
end
Now, if you do
p '12f3g4g5h5h6j7j7j'.scan2(/(?<alpha>[a-zA-Z])(?<digit>[0-9])/)
You get
{:alpha=>["f", "g", "g", "h", "h", "j", "j"], :digit=>["3", "4", "5", "5", "6", "7", "7"]}
(ie. all the alpha characters found in one array, and all the digits found in another array). Depending on your purpose for scanning, this might be useful. Anyway, I love seeing examples of how easy it is to rewrite or extend core Ruby functionality with just a few lines!
A year ago I wanted regular expressions that were more easy to read and named the captures, so I made the following addition to String (should maybe not be there, but it was convenient at the time):
scan2.rb:
class String
#Works as scan but stores the result in a hash indexed by variable/constant names (regexp PLACEHOLDERS) within parantheses.
#Example: Given the (constant) strings BTF, RCVR and SNDR and the regexp /#BTF# (#RCVR#) (#SNDR#)/
#the matches will be returned in a hash like: match[:RCVR] = <the match> and match[:SNDR] = <the match>
#Note: The #STRING_VARIABLE_OR_CONST# syntax has to be used. All occurences of #STRING# will work as #{STRING}
#but is needed for the method to see the names to be used as indices.
def scan2(regexp2_str, mark='#')
regexp = regexp2_str.to_re(mark) #Evaluates the strings. Note: Must be reachable from here!
hash_indices_array = regexp2_str.scan(/\(#{mark}(.*?)#{mark}\)/).flatten #Look for string variable names within (#VAR#) or # replaced by <mark>
match_array = self.scan(regexp)
#Save matches in hash indexed by string variable names:
match_hash = Hash.new
match_array.flatten.each_with_index do |m, i|
match_hash[hash_indices_array[i].to_sym] = m
end
return match_hash
end
def to_re(mark='#')
re = /#{mark}(.*?)#{mark}/
return Regexp.new(self.gsub(re){eval $1}, Regexp::MULTILINE) #Evaluates the strings, creates RE. Note: Variables must be reachable from here!
end
end
Example usage (irb1.9):
> load 'scan2.rb'
> AREA = '\d+'
> PHONE = '\d+'
> NAME = '\w+'
> "1234-567890 Glenn".scan2('(#AREA#)-(#PHONE#) (#NAME#)')
=> {:AREA=>"1234", :PHONE=>"567890", :NAME=>"Glenn"}
Notes:
Of course it would have been more elegant to put the patterns (e.g. AREA, PHONE...) in a hash and add this hash with patterns to the arguments of scan2.
Piggybacking off of Mark Hubbart's answer, I added the following monkey-patch:
class ::Regexp
def match_all(str)
matches = []
str.scan(self) { matches << $~ }
matches
end
end
which can be used as /(?<letter>\w)/.match_all('word'), and returns:
[#<MatchData "w" letter:"w">, #<MatchData "o" letter:"o">, #<MatchData "r" letter:"r">, #<MatchData "d" letter:"d">]
This relies on, as others have said, the use of $~ in the scan block for the match data.
I like the match_all given by John, but I think it has an error.
The line:
match_datas << md
works if there are no captures () in the regex.
This code gives the whole line up to and including the pattern matched/captured by the regex. (The [0] part of MatchData) If the regex has capture (), then this result is probably not what the user (me) wants in the eventual output.
I think in the case where there are captures () in regex, the correct code should be:
match_datas << md[1]
The eventual output of match_datas will be an array of pattern capture matches starting from match_datas[0]. This is not quite what may be expected if a normal MatchData is wanted which includes a match_datas[0] value which is the whole matched substring followed by match_datas[1], match_datas[[2],.. which are the captures (if any) in the regex pattern.
Things are complex - which may be why match_all was not included in native MatchData.