How to summarize strings into shell globs - ruby

Are there any Ruby idioms or popular libraries for consolidating strings into a shell glob that would generate them? For example, given the strings,
abc1
abc2
abc3
I want to produce the string abc{1..3} or abc{1,2,3}. It's much like summarizing subnets in IP addressing.
I imagine a Rubyesque approach might involve sorting the strings and then building an array of their constituent characters, recursively placing characters that don't overlap into nested arrays. However, if there is something already out there, I'd rather not reinvent that wheel.

I borrowed the accepted solution to Finding common string in array of strings (ruby) and used a recursive algorithm:
def longest_common_substr(strings)
shortest = strings.min_by &:length
maxlen = shortest.length
maxlen.downto(0) do |len|
0.upto(maxlen - len) do |start|
substr = shortest[start,len]
return substr if strings.all?{|str| str.include? substr }
end
end
end
def create_glob(files)
return '' if files.compact.empty?
stub=longest_common_substr(files)
if stub.length == 0
expansion = files.uniq.join(',')
return '' if expansion == ''
return '{' + expansion + '}'
end
pre = []
post = []
files.each do |file|
i = file.index(stub)
pre << file[0, i]
post << file[i+stub.length..-1]
end
return create_glob(pre) + stub + create_glob(post)
end
That works well for your example:
puts create_glob(['abc1',
'abc2',
'abc3'
])
#=> abc{1,2,3}
It also covers more complicated cases such as variable prefixes:
puts create_glob(['first.abc1',
'second.abc2',
'third.abc3'
])
#=> {first,second,third}.abc{1,2,3}
And even missing values:
puts create_glob(['.abc1',
'abc',
'abc3'
])
#=> {.,}abc{1,,3}
Notice brace expansion might produce strings that are not files. For instance, the last example expands like this:
$ echo {.,}abc{1,,3}
.abc1 .abc .abc3 abc1 abc abc3
So you have to be just a little bit careful using the output.
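The question also asks about the range form abc{1..3}. Here is a minimal sketch of how the numeric alternatives could be collapsed into ranges. The helper name brace_alternatives is hypothetical, and it assumes the values are plain base-10 integers without zero padding (padding would be lost):

```ruby
# Hypothetical helper: turn integer strings into a brace-expansion fragment,
# using {a..b} for runs of three or more consecutive values.
# Assumption: values are base-10 integers; zero padding is not preserved.
def brace_alternatives(values)
  nums = values.map { |v| Integer(v, 10) }.uniq.sort
  return '' if nums.empty?
  # Split the sorted numbers wherever two neighbours are not consecutive.
  runs = nums.slice_when { |a, b| b != a + 1 }
  parts = runs.flat_map do |run|
    run.size >= 3 ? ["{#{run.first}..#{run.last}}"] : run.map(&:to_s)
  end
  # A single part needs no surrounding braces ("{5}" would not expand).
  return parts.first if parts.size == 1
  "{#{parts.join(',')}}"
end

puts 'abc' + brace_alternatives(%w[1 2 3])
puts 'abc' + brace_alternatives(%w[1 2 3 7])
```

The second call relies on nested brace expansion, which bash supports; other shells may behave differently.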

Related

How to match and replace pattern without regex?

I was recently asked this in an interview and was told it would be a bonus to solve it in Ruby without using regex, so I have been trying to figure out a way to do that.
Question: Assume the hash has 1 million key-value pairs, and we have to substitute the variables in the string that are enclosed between % characters (%like_this%). How would I do this without regex?
We have a string str = "%greet%! Hi there, %var_1% that can be any other %var_2% injected to the %var_3%. Nice!, goodbye)"
we have a hash called dict = { greet: 'Hi there', var_1: 'FIRST VARIABLE', var_2: 'values', var_3: 'string', }
This was my solution:
def template(str, dict)
vars = str.scan(/%(.*?)%/).flatten
vars.each do |var|
str = str.gsub("%#{var}%", dict[var.to_sym])
end
str
end
There are many ways to solve this, but you will probably need some kind of parsing and / or lexical analysis if you don't want to use built-in pattern matching.
Let's keep it very simple and say that your string's content falls into two categories: text and variable which are separated by %, e.g. (you could also think of the variables being enclosed by %, but that's harder to implement)
str = "Hello %name%, hope to see you %when%!"
# TTTTTT VVVV TTTTTTTTTTTTTTTTTT VVVV T
As you can see, the categories are alternating. We can utilize this and write a little helper method that turns a string into a list of [type, value] pairs, something like this:
def each_part(str)
return enum_for(__method__, str) unless block_given?
type = [:text, :var].cycle
buf = ''
str.each_char do |char|
if char != '%'
buf << char
else
yield type.next, buf
buf = ''
end
end
yield type.next, buf
end
It starts by defining an enumerator that will cycle between the two types and an empty buffer. It will then read each_char from the string. If the char is not %, it will just append it to the buffer and keep reading. Once it encounters a %, it will yield the current buffer along with the type and start a new buffer (next will also switch the type). After the loop ends, it will yield once more to output the remaining characters.
It outputs this kind of data:
each_part(str).to_a
#=> [[:text, "Hello "],
# [:var, "name"],
# [:text, ", hope to see you "],
# [:var, "when"],
# [:text, "!"]]
We can use this to convert the string:
dict = { name: 'Tom', when: 'soon' }
output = ''
each_part(str) do |type, value|
case type
when :text
output << value
when :var
output << dict[value.to_sym]
end
end
p output
#=> "Hello Tom, hope to see you soon!"
You could of course combine parsing and evaluation, but I like the separation. A full-fledged parser might involve even more steps.
A very simple approach:
First, split the string on '%':
str = "%greet%! Hi there, %var_1% that can be any other %var_2% injected to the %var_3%. Nice!, goodbye)"
chunks = str.split('%')
Given the way the problem has been specified, we can assume that every other "chunk" is a key to replace. Iterating with the index makes that easier to figure out.
chunks.each_with_index { |c, i| chunks[i] = (i.even? ? c : dict[c.to_sym]) }.join
Result:
"Hi there! Hi there, FIRST VARIABLE that can be any other values injected to the string. Nice!, goodbye)"
Note: this does not handle malformed input well at all.
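For malformed input, one option is to walk the string with String#index and only substitute when a closing % is found and the key exists. This is just a sketch: the method name render and the policy of copying unknown keys and stray % signs through unchanged are my assumptions, not part of the question.

```ruby
# Sketch: replace %key% placeholders using String#index only (no regex).
# Assumed policy: unknown keys and unmatched '%' are left in the output as-is.
def render(str, dict)
  out = String.new
  pos = 0
  while (open = str.index('%', pos))
    close = str.index('%', open + 1)
    break if close.nil?                 # unmatched '%': stop substituting
    key = str[(open + 1)...close]
    out << str[pos...open]              # text before the opening '%'
    if dict.key?(key.to_sym)
      out << dict[key.to_sym].to_s
      pos = close + 1
    else
      out << '%' << key                 # not a known variable: keep the text
      pos = close                       # the closing '%' may open the next one
    end
  end
  out << str[pos..-1]                   # remaining tail
end

puts render("100% sure %name%", { name: 'Tom' })
```

Each hash lookup is constant time, so the size of the hash (1 million pairs in the question) does not affect the scan over the string.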

ruby: Grab numbers only within quotes

I would like the following sub-string
1100110011110000
from
foo = "bar9-9 '11001100 11110000 A'A\n"
I have so far used the below, which yields
puts foo.split(',').map!(&:strip)[0].gsub(/\D/, '')
>> 991100110011110000
Getting rid of the 2 leading 9's is not too difficult in this scenario, but I would like a general solution which grabs numbers only within the ' ' single quotes
You can find the quoted part first with scan and then remove non-digits:
> results = "bar9-9 '11001100 11110000 A'A\n".scan(/'[^']*'/).map{|m| m.gsub(/\D/, '')}
# => ["1100110011110000"]
> results[0]
# => "1100110011110000"
The zeros and ones within the quoted string can be extracted using String#gsub with a regular expression, as opposed to methods that convert the string to an array of strings, modify the array and convert it back to a string. Here are three ways of doing that.
str ="bar9-9 '11001100 11110000 A'A\n"
#1: Extract the substring of interest and then remove characters other than zero and one
def extract(str)
str[str.index("'")+1...str.rindex("'")].gsub(/[^01]/,'')
end
extract str
#=> "1100110011110000"
#2 Use a flag to indicate when zeroes and ones are to be kept
def extract(str)
found = false
str.gsub(/./m) do |c|
found = !found if c == "'"
(found && (c =~ /[01]/)) ? c : ''
end
end
extract str
#=> "1100110011110000"
Here the regular expression requires the m modifier (to enable multiline mode) in order to convert the newline character to an empty string. (One could alternatively write str.chomp.gsub(/./)....)
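A small illustration of why the m modifier matters here (the sample string is made up for the example): without it, "." does not match the newline, so the block never sees it and it survives the substitution.

```ruby
# Without /m, "." skips "\n", so the newline is never passed to the block.
without_m = "01\n".gsub(/./)  { |c| c =~ /[01]/ ? c : '' }
# With /m, "." also matches "\n", and the block replaces it with ''.
with_m    = "01\n".gsub(/./m) { |c| c =~ /[01]/ ? c : '' }

p without_m  #=> "01\n"
p with_m     #=> "01"
```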
Notice that this second method works when there are multiple single-quoted substrings.
extract "bar9-9 '11001100 11110000 A'A'10x1y'\n"
#=> "1100110011110000101"
#3 Use the flip-flop operator (variant of #2)
def extract(str)
str.gsub(/./m) do |c|
next '' unless (c=="'") ... (c=="'")
c =~ /[01]/ ? c : ''
end
end
extract str
#=> "1100110011110000"
extract "bar9-9 '11001100 11110000 A'A'10x1y'\n"
#=> "1100110011110000101"
foo.slice(/'.*?'/).scan(/\d+/).join
#=> "1100110011110000"

Ruby, looping through a string deleting groups of characters until a desired output is achieved

I have a coding problem I solved and want to refactor. I know there has to be a cleaner way of doing what I did.
The goal is to write a method that takes a string of "!" and "?" and reduces the string by eliminating all odd groupings of each symbol.
Example - a string "????!!!" would have an odd grouping of "!!!" because there are three in a row. These would be deleted from the string.
If there is only one "!" or "?", it's left alone because it is not in a group.
Ex -
remove("!????!!!?") answer == "!"
# => ("!????!!!?" --> "!?????" --> "!")
In the first string, the only odd grouping is "!!!", once removed, it leaves a new string with an odd grouping "?????". You remove the next odd grouping so you're left with "!". This fits the desired output.
Another example
remove("!???!!") == ""
# => ("!???!!" --> "!!!" --> "")
Current code:
def remove(s)
arr = [s]
i = 0
until i == arr[0].length
s = s.chars.chunk{|c|c}.map{ |n,a| a.join }.select{|x| x if x.length.even? || x.length <= 1}.join
arr << s
i += 1
end
return arr[-1]
end
My code solves this problem and all test cases. I have a suspicion that my until loop can be removed/refactored so that I could solve this problem in one line and have spent hours trying to figure it out with no luck.
Suppose
str = "???!!!???!"
If we first remove the two groups "???" we are left with "!!!!", which cannot be reduced further.
If we first remove the group "!!!" we are left with "??????!", which cannot be reduced further.
If we are permitted to remove all odd groups of either character without reference to the effect that either has on the other, we obtain !, which cannot be reduced further.
It's not clear what rule is to be used. Here are three possibilities and code to implement each.
I will use the following two regular expressions, and in the first two cases a helper method.
Rq = /
(?<!\?) # do not match a question mark, negative lookbehind
\? # match a question mark
(\?{2})+ # match two question marks one or more times
(?!\?) # do not match a question mark, negative lookahead
/x # free-spacing regex definition mode
which is commonly written /(?<!\?)\?(\?{2})+(?!\?)/.
Similarly,
Rx = /(?<!!)!(!{2})+(?!!)/
def sequential(str, first_regex, second_regex)
s = str.dup
loop do
size = s.size
s = s.gsub(first_regex,'').gsub(second_regex,'')
return s if s.size == size
end
end
I apply each of the three methods below to two example strings:
str1 = "???!!!???!"
str2 = 50.times.map { ['?', '!'].sample }.join
#=> "?!!!?!!!?!??????!!!?!!??!!???!?!????!?!!!?!?!???!?"
Replace all odd groups of "?" then odd groups of "!" then repeat until no further removals are possible
def question_before_exclamation(str)
sequential(str, Rq, Rx)
end
question_before_exclamation str1 #=> "!!!!"
question_before_exclamation str2 #=> "??!??!?!!?!?!!?"
Replace all odd groups of "!" then odd groups of "?" then repeat until no further removals are possible
def exclamation_before_question(str)
sequential(str, Rx, Rq)
end
exclamation_before_question str1 #=> "??????!"
exclamation_before_question str2 #=> "??!????!!?!?!!?!?!!?"
Replace all odd groups of both "?" and "!" then repeat until no further removals are possible
Rqx = /#{Rq}|#{Rx}/
#=> /(?-mix:(?<!\?)\?(\?{2})+(?!\?))|(?-mix:(?<!!)!(!{2})+(?!!))/
def question_and_exclamation(str)
s = str.dup
loop do
size = s.size
s = s.gsub(Rqx,'')
return s if s.size == size
end
end
question_and_exclamation str1 #=> "!"
question_and_exclamation str2 #=> "??!?!!?!?!!?!?!!?"
I don't know the exact Ruby syntax for this, but you could simplify your solution by using regular expressions:
1. Gather all matches of consecutive characters.
   If all matches are of even length or length 1, exit.
2. Test whether each match has an odd length.
   If it has an odd length, replace it with the empty string;
   else do nothing.
3. Go to step 1.
A solution in Perl would be:
#!perl
use strict;
use warnings;
use feature qw(say);
my $string = '!????!!!?';
sub reduce {
my ($s) = @_;
while ( my @matches = $s =~ m/((.)\2+)/g ) {
last if ! grep { length($_) > 1 && length($_) % 2 == 1 } @matches;
foreach my $match ( @matches ) {
$s =~ s/\Q$match// if length($match) > 1 && length($match) % 2 == 1;
}
}
return $s;
}
say reduce($string);
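The same steps could be sketched in Ruby roughly like this. It is not a translation of the Perl line by line: it uses a single gsub with a block per pass, and the method name reduce_odd_runs is made up for the example.

```ruby
# Sketch: repeatedly delete runs of a repeated character whose length is odd
# (3, 5, ...). Single characters are never matched by /(.)\1+/, so they stay.
def reduce_odd_runs(str)
  loop do
    reduced = str.gsub(/(.)\1+/) { |run| run.length.odd? ? '' : run }
    return reduced if reduced == str   # nothing changed: we are done
    str = reduced
  end
end

p reduce_odd_runs("!????!!!?")  #=> "!"
p reduce_odd_runs("!???!!")     #=> ""
```

Note that this removes odd groups of both characters in the same pass, which is only one of the possible interpretations of the problem.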
I could be wrong (this is ruby, after all) but I don't think you'll find a one-liner for this because ruby's utility functions generally aren't recursive. But you can use regex to simplify your logic, at the very least:
def remove(s)
while s =~ /(?<!\!)\!([\!]{2})+(?!\!)/ || s =~ /(?<!\?)\?([\?]{2})+(?!\?)/
s.gsub! /(?<!\!)\!([\!]{2})+(?!\!)/, "" # remove odd !
s.gsub! /(?<!\?)\?([\?]{2})+(?!\?)/, "" # remove odd ?
end
return s
end
To make the regex less mind-boggling, it helps to look at them with 'a' instead of '?' and '!':
/(?<!a)a([a]{2})+(?!a)/ #regex for 'a'
(?<!a) #negative lookbehind: the match cannot start with an 'a'
a([a]{2})+ #the match should be an 'a' followed by 1 or more pairs
(?!a) #negative lookahead: the match cannot end with an 'a'
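To see the 'a' version in action, here is a quick check against runs of different lengths (the constant name ODD_RUN_OF_A is only for this example, and String#match? needs Ruby 2.4 or later):

```ruby
# Matches a run of 'a' whose length is odd and at least 3.
ODD_RUN_OF_A = /(?<!a)a(aa)+(?!a)/

results = %w[a aa aaa aaaa aaaaa].map { |s| s.match?(ODD_RUN_OF_A) }
p results  #=> [false, false, true, false, true]
```

The lookbehind and lookahead are what stop the regex from matching three characters out of the middle of a run of four.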
It should be simple enough with a regular expression replacement
def remove(string)
begin
original = string.dup
string.gsub!(/(\!{3,})|(\?{3,})/) { |s| s.length.even? ? s : '' }
end until original == string
string
end
puts remove("!????!!!?").inspect # answer == "!"
puts remove("!???!!").inspect # answer == ""
puts remove("!????!!").inspect # answer == "!????!!"

Taking a string and returning it with vowels removed

I'm attempting to write a function that takes a string and returns it with all vowels removed. Below is my code.
def vowel(str)
result = ""
new = str.split(" ")
i = 0
while i < new.length
if new[i] == "a"
i = i + 1
elsif new[i] != "a"
result = new[i] + result
end
i = i + 1
end
return result
end
When I run the code, it returns the exact string that I entered for (str). For example, if I enter "apple", it returns "apple".
This was my original code. It had the same result.
def vowel(str)
result = ""
new = str.split(" ")
i = 0
while i < new.length
if new[i] != "a"
result = new[i] + result
end
i = i + 1
end
return result
end
I'd like to keep this methodology. What am I doing wrong?
Finding the bug
Let's see what's wrong with your original code by executing your method's code in IRB:
$ irb
irb(main):001:0> str = "apple"
#=> "apple"
irb(main):002:0> new = str.split(" ")
#=> ["apple"]
Bingo! ["apple"] is not the expected result. What does the documentation for String#split say?
split(pattern=$;, [limit]) → anArray
Divides str into substrings based on a delimiter, returning an array of these substrings.
If pattern is a String, then its contents are used as the delimiter when splitting str. If pattern is a single space, str is split on whitespace, with leading whitespace and runs of contiguous whitespace characters ignored.
Our pattern is a single space, so split returns an array of words. This is definitely not what we want. To get the desired result, i.e. an array of characters, we could pass an empty string as the pattern:
irb(main):003:0> new = str.split("")
#=> ["a", "p", "p", "l", "e"]
"split on empty string" feels a bit hacky and indeed there's another method that does exactly what we want: String#chars
chars → an_array
Returns an array of characters in str. This is a shorthand for str.each_char.to_a.
Let's give it a try:
irb(main):004:0> new = str.chars
#=> ["a", "p", "p", "l", "e"]
Perfect, just as advertised.
Another bug
With the new method in place, your code still doesn't return the expected result (I'm going to omit the IRB prompt from now on):
vowel("apple") #=> "elpp"
This is because
result = new[i] + result
prepends the character to the result string. To append it, we have to write
result = result + new[i]
Or even better, use the append method String#<<:
result << new[i]
Let's try it:
def vowel(str)
result = ""
new = str.chars
i = 0
while i < new.length
if new[i] != "a"
result << new[i]
end
i = i + 1
end
return result
end
vowel("apple") #=> "pple"
That looks good, "a" has been removed ("e" is still there, because you only check for "a").
Now for some refactoring.
Removing the explicit loop counter
Instead of a while loop with an explicit loop counter, it's more idiomatic to use something like Integer#times:
new.length.times do |i|
# ...
end
or Range#each:
(0...new.length).each do |i|
# ...
end
or Array#each_index:
new.each_index do |i|
# ...
end
Let's apply the latter:
def vowel(str)
result = ""
new = str.chars
new.each_index do |i|
if new[i] != "a"
result << new[i]
end
end
return result
end
Much better. We don't have to worry about initializing the loop counter (i = 0) or incrementing it (i = i + 1) any more.
Avoiding character indices
Instead of iterating over the character indices via each_index:
new.each_index do |i|
if new[i] != "a"
result << new[i]
end
end
we can iterate over the characters themselves using Array#each:
new.each do |char|
if char != "a"
result << char
end
end
Removing the character array
We don't even have to create the new character array. Remember the documentation for chars?
This is a shorthand for str.each_char.to_a.
String#each_char passes each character to the given block:
def vowel(str)
result = ""
str.each_char do |char|
if char != "a"
result << char
end
end
return result
end
The return keyword is optional. We could just write result instead of return result, because a method's return value is the last expression that was evaluated.
Removing the explicit string
Ruby even allows you to pass an object into the loop using Enumerator#with_object, thus eliminating the explicit result string:
def vowel(str)
str.each_char.with_object("") do |char, result|
if char != "a"
result << char
end
end
end
with_object passes "" into the block as result and returns it (after the characters have been appended within the block). It is also the last expression in the method, i.e. its return value.
You could also use if as a modifier, i.e.:
result << char if char != "a"
Alternatives
There are many different ways to remove characters from a string.
Another approach is to filter out the vowel characters using Enumerable#reject (it returns a new array containing the remaining characters) and then join the characters (see Nathan's answer for a version to remove all vowels):
def vowel(str)
str.each_char.reject { |char| char == "a" }.join
end
For basic operations like string manipulation however, Ruby usually already provides a method. Check out the other answers for built-in alternatives:
str.delete('aeiouAEIOU') as shown in Gagan Gami's answer
str.tr('aeiouAEIOU', '') as shown in Cary Swoveland's answer
str.gsub(/[aeiou]/i, '') as shown in Avinash Raj's answer
Naming things
Cary Swoveland pointed out that vowel is not the best name for your method. Choose the names for your methods, variables and classes carefully. It's desirable to have a short and succinct method name, but it should also communicate its intent.
vowel(str) obviously has something to do with vowels, but it's not clear what it is. Does it return a vowel or all vowels from str? Does it check whether str is a vowel or contains a vowel?
remove_vowels or delete_vowels would probably be a better choice.
Same for variables: new is an array of characters. Why not call it characters (or chars if space is an issue)?
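Putting the naming advice together with the reject approach from above, a version that removes all vowels might look like this. It is only a sketch; the constant VOWELS and the exact method name are my choice for the example.

```ruby
# Illustrative constant: the characters treated as vowels.
VOWELS = 'aeiouAEIOU'

# Returns a copy of str with all vowels removed.
def remove_vowels(str)
  str.each_char.reject { |char| VOWELS.include?(char) }.join
end

p remove_vowels("apple")  #=> "ppl"
```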
Bottom line: read the fine manual and get to know your tools. Most of the time, an IRB session is all you need to debug your code.
I would use a regex:
str.gsub(/[aeiou]/i, "")
> string= "This Is my sAmple tExt to removE vowels"
#=> "This Is my sAmple tExt to removE vowels"
> string.delete 'aeiouAEIOU'
#=> "Ths s my smpl txt t rmv vwls"
You can create a method like this:
def remove_vowel(str)
result = str.delete 'aeiouAEIOU'
return result
end
remove_vowel("Hello World, This is my sample text")
# output : "Hll Wrld, Ths s my smpl txt"
Assuming you're trying to learn about the basics of programming, rather than finding the quickest one-liner to do this (which would be to use a regular expression as Avinash has said), you have a number of problems with your code you need to change.
new = str.split(" ")
This line is likely the culprit, because it splits the string based on spaces. So your input string would have to be "a p p l e" to have the effect you're looking for.
new = str.split("")
You should also remove the duplicate i = i+1 once you've changed that.
As others have already identified the problems with the OP's code, I will merely suggest an alternative; namely, you could use String#tr:
"Now is the time for all good people...".tr('aeiouAEIOU', '')
#=> "Nw s th tm fr ll gd ppl..."
If regex is not allowed, you can do it this way:
def remove_vowels(string)
string.split("").delete_if { |letter| %w[a e i o u].include? letter.downcase }.join
end

How to implement addition operator in math parser (ruby)

I'm trying to build my own evaluator for mathematical expressions in Ruby, and before doing that I am trying to implement a parser to break the expression into a tree (of arrays). It correctly breaks down expressions with parentheses, but I am having a lot of trouble figuring out how to make it break up an expression correctly with respect to operator precedence for addition.
Right now, a string like 1+2*3+4 becomes 1+[2*[3+4]] instead of 1+[2*3]+4. I'm trying to do the simplest solution possible.
Here is my code:
@d = 0
@error = false
#manipulate an array by reference
def calc_expr expr, array
until @d == expr.length
c = expr[@d]
case c
when "("
@d += 1
array.push calc_expr(expr, Array.new)
when ")"
@d += 1
return array
when /[\*\/]/
@d +=1
array.push c
when /[\+\-]/
@d+=1
array.push c
when /[0-9]/
x = 0
matched = false
expr[@d]
until matched == true
y = expr.match(/[0-9]+/,@d).to_s
case expr[@d+x]
when /[0-9]/
x+=1
else matched = true
end
end
array.push expr[@d,x].to_i
@d +=(x)
else
unless @error
@error = true
puts "Problem evaluating expression at index:#{@d}"
puts "Char '#{expr[@d]}' not recognized"
end
return
end
end
return array
end
@expression = ("(34+45)+(34+67)").gsub(" ","")
evaluated = calc_expr @expression, []
puts evaluated.inspect
For fun, here's a regex-based 'parser' that uses the nice "inside-out" approach suggested by @DavidLjungMadison. It performs simple "a*b" multiplication and division first, followed by "a+b" addition and subtraction, then unwraps any number left in parentheses, "(a)", and then starts over.
For simplicity in the regex I've only chosen to support integers; expanding each -?\d+ to something more robust, and replacing the .to_i with .to_f would allow it to work with floating point values.
module Math
def self.eval( expr )
expr = expr.dup
go = true
while go
go = false
go = true while expr.sub!(/(-?\d+)\s*([*\/])\s*(-?\d+)/) do
m,op,n = $1.to_i, $2, $3.to_i
op=="*" ? m*n : m/n
end
go = true while expr.sub!(/(-?\d+)\s*([+-])\s*(-?\d+)/) do
a,op,b = $1.to_i, $2, $3.to_i
op=="+" ? a+b : a-b
end
go = true while expr.gsub!(/\(\s*(-?\d+)\s*\)/,'\1')
end
expr.to_i
end
end
And here's a bit of testing for it:
tests = {
"1" => 1,
"1+1" => 2,
"1 + 1" => 2,
"1 - 1" => 0,
"-1" => -1,
"1 + -1" => 0,
"1 - -1" => 2,
"2*3+1" => 7,
"1+2*3" => 7,
"(1+2)*3" => 9,
"(2+(3-4) *3 ) * -6 * ( 3--4)" => 42,
"4*6/3*2" => 16
}
tests.each do |expr,expected|
actual = Math.eval expr
puts [expr.inspect,'=>',actual,'instead of',expected].join(' ') unless actual == expected
end
Note that I use sub! instead of gsub! on the operators in order to survive the last test case. If I had used gsub! then "4*6/3*2" would first be turned into "24/6" and thus result in 4, instead of the correct expansion "24/3*2" → "8*2" → 16.
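To make the sub/gsub difference concrete, here is one reduction step applied both ways. This is only an illustration; the names MUL_DIV and apply_op are made up for it, while the regex is the multiplication/division one from the code above.

```ruby
# The multiplication/division pattern from Math.eval above.
MUL_DIV = /(-?\d+)\s*([*\/])\s*(-?\d+)/

# Hypothetical helper: evaluate one matched "a*b" or "a/b" (integers only).
def apply_op(md)
  a, op, b = md[1].to_i, md[2], md[3].to_i
  (op == '*' ? a * b : a / b).to_s
end

# sub replaces only the leftmost match, preserving left-to-right evaluation.
one_at_a_time = "4*6/3*2".sub(MUL_DIV)  { apply_op(Regexp.last_match) }
# gsub replaces every non-overlapping match in one pass, which regroups terms.
all_at_once   = "4*6/3*2".gsub(MUL_DIV) { apply_op(Regexp.last_match) }

p one_at_a_time  #=> "24/3*2"
p all_at_once    #=> "24/6"
```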
If you really need to do the expression parsing yourself, then you should search for both sides of an expression (such as '2*3') and replace that with either your answer (if you are trying to calculate the answer) or an expression object (such as your tree of arrays, if you want to keep the structure of the expressions and evaluate later). If you do this in the order of precedence, then precedence will be preserved.
As a simplified example, your expression parser should:
1. Repeatedly search for all inner parens: /\(([^()]+)\)/ and replace that with a call to the expression parser of $1 (sorry about the ugly regexp :)
2. Now all parens are gone, so you are looking at math operations between numbers and/or expression objects - treat them the same
3. Search for multiplication: /(expr|number)\*(expr|number)/
   Replace this with either the answer or encapsulate the two expressions in
   a new expression. Again, depending on whether you need the answer now or
   if you need the expression tree.
4. Search for addition: ... etc ...
If you are calculating the answer now then this is easy: each call to the expression parser eventually (after the necessary recursion) returns a number, which you can substitute for the original expression. It's a different story if you want to build the expression tree. How you deal with a mixture of strings and expression objects so that you can run a regexp on it is up to you. You could encode a pointer to the expression object in the string, or else replace the entire string at the outset with an array of objects and use something similar to a regexp to search the array.
You should also consider dealing with unary operators: "3*+3"
(It might simplify things if the very first step you take is to convert all numbers to a simple expression object just containing the number, you might be able to deal with unary operators here, but that can involve tricky situations like "-3++1")
Or just find an expression parsing library as suggested. :)
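If you want to keep the tree of arrays from the question, one possible sketch is a second pass over the flat token array that groups everything between + and - into its own sub-array. The method names group_by_precedence and unwrap are hypothetical, and the sketch assumes binary operators only (no unary minus) and tokens that are Integers, operator Strings, or nested Arrays from parentheses.

```ruby
# A one-element term does not need its own array.
def unwrap(term)
  term.size == 1 ? term.first : term
end

# Sketch: group tokens so that * and / bind tighter than + and -.
# Assumption: binary operators only; nested Arrays come from parentheses.
def group_by_precedence(tokens)
  result = []
  term = []
  tokens.each do |tok|
    if tok == '+' || tok == '-'
      result << unwrap(term) << tok      # close the current term
      term = []
    else
      term << (tok.is_a?(Array) ? group_by_precedence(tok) : tok)
    end
  end
  result << unwrap(term)
  # If the whole input was a single multi-token term, avoid double wrapping.
  result.size == 1 && result.first.is_a?(Array) ? result.first : result
end

p group_by_precedence([1, '+', 2, '*', 3, '+', 4])
#=> [1, "+", [2, "*", 3], "+", 4]
```

This gives the 1+[2*3]+4 shape the question asks for, and an evaluator can then walk each array strictly left to right.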
