Are there Ruby methods to select a string between other strings? - ruby

I'm new to programming and I want to write a program that extracts all the words contained between two words within a text, so that I can store them in a variable.
For example with the words "START" & "STOP":
"START 1 2 3 STOP 5 6 START 7 8 STOP 9 10"
I would like to store in variables: 1 2 3 7 8
I started to do it in Ruby, as you can see in the code below. My idea was to convert the string into an array, find the positions of the start word and the stop word, and then build a new array from the elements at positions start + 1 through stop - 1.
Unfortunately, it only works once, because .index only finds the first occurrence. Is there a better way to do this?
Thank you in advance for your help.
text = "0 start 2 3 4 stop 6 7 start 9 10 stop 12"
start= text.split(' ')
a = start.index('start')
b = start.index('stop')
puts a
puts b
puts c = start[a+1,b-a-1].join(" ")
# prints
# 1
# 5
# 2 3 4

You could start with the scan-method and a regular expression:
text = "0 start 2 3 4 stop 6 7 start 9 10 stop 12"
res1 = text.scan(/start\s*(.*?)\s*stop/) #[["2 3 4"], ["9 10"]]
res2 = res1.flatten #["2 3 4", "9 10"]
or without the intermediate variables:
res = text.scan(/start\s*(.*?)\s*stop/).flatten #["2 3 4", "9 10"]
Explanation:
See https://apidock.com/ruby/String/scan for the scan method.
The regular expression /start\s*(.*?)\s*stop/ is the combination of:
start: the literal start keyword
\s*: zero or more whitespace characters
(.*?): the part to capture
The ( and ) form a capture group, which remembers the content matched inside it.
. means any character (except a newline), * means repetition (zero or more times), and ? makes the repetition lazy, so it matches the shortest possible text (see below for details).
\s*: zero or more whitespace characters
stop: the literal stop keyword
The result is an array with the hits of the regular expression. A regular expression can contain several capture groups (multiple () pairs), so scan returns an array of arrays. In our case each inner array has one element, so you can use flatten to get a 'flat' array.
If you left out the ? in the regular expression, you would get a single match, 2 3 4 stop 6 7 start 9 10, instead of the two shorter parts.
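To see the difference the ? makes, here is a small side-by-side comparison of the lazy and the greedy version of the same pattern. It is only a sketch; the variable names lazy and greedy are mine and do not come from the question.

```ruby
text = "0 start 2 3 4 stop 6 7 start 9 10 stop 12"

# Lazy quantifier (.*?): each match ends at the nearest following "stop".
lazy = text.scan(/start\s*(.*?)\s*stop/).flatten

# Greedy quantifier (.*): one single match that runs to the last "stop".
# The greedy .* also swallows the space before "stop", hence the strip.
greedy = text.scan(/start\s*(.*)\s*stop/).flatten.map(&:strip)

p lazy   # ["2 3 4", "9 10"]
p greedy # ["2 3 4 stop 6 7 start 9 10"]
```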

You are not exactly getting an error, so Code Review might be a better place to ask. But since you are new to the community, here is a regular expression with a lookaround assertion (a negative lookahead) that does the job:
text = "0 start 2 3 4 stop 6 7 start 9 10 stop 12"
text.scan(/start ((?:(?!start).)*?) stop/).join(' ')
# => "2 3 4 9 10"
Btw, a great place to test your regular expressions in Ruby is https://rubular.com/
I hope you find this helpful.

A One-Line Method Chain
Here's an approach based on String#scan:
text = "0 start 2 3 4 stop 6 7 start 9 10 stop 12"
text.scan(/\bstart\s+(.*?)\s+stop\b/i).flat_map { _1.flat_map &:split }
#=> ["2", "3", "4", "9", "10"]
The idea here is to:
Extract all string segments that are bracketed between case-insensitive start and stop keywords.
text.scan(/\bstart\s+(.*?)\s+stop\b/i)
#=> [["2 3 4"], ["9 10"]]
Extract words separated by whitespace from between your keywords.
[["2 3 4"], ["9 10"]].flat_map { _1.flat_map &:split }
#=> ["2", "3", "4", "9", "10"]
Caveats
Notable caveats to the approach outlined above include:
String#scan creates nested arrays, and the repeated calls to Enumerable#flat_map used to handle them are less elegant than I might prefer.
\b is a zero-width assertion, so looking for word boundaries can cause #scan to include leading and trailing whitespace in the results that then need to be handled by String#strip or String#split.
Substituting \s+ for \b handles some edge cases while creating others.
It doesn't do anything to guard against unbalanced pairs, e.g. "start 0 start 2 3 4 stop 6 stop".
For simple use cases, String#scan with a tuned regex is probably all you need. The more varied and unpredictable your input and data structures are, the more edge cases your parsing routines will need to handle.
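If unbalanced pairs matter, one alternative to a regex is to walk the words and keep a flag. The following is a minimal sketch, not part of the approach above: words_between is a hypothetical helper name, and its policy (a repeated start is ignored, a stray stop just switches collecting off) is an assumption you may want to change.

```ruby
# Hypothetical helper: collect the words that appear while we are "inside"
# a start ... stop region, comparing the markers case-insensitively.
def words_between(text, start_word: "start", stop_word: "stop")
  collecting = false
  text.split.each_with_object([]) do |word, found|
    case word.downcase
    when start_word then collecting = true   # a nested start changes nothing
    when stop_word  then collecting = false  # a stray stop is harmless
    else found << word if collecting
    end
  end
end

balanced   = words_between("0 start 2 3 4 stop 6 7 start 9 10 stop 12")
unbalanced = words_between("start 0 start 2 3 4 stop 6 stop")

p balanced   # ["2", "3", "4", "9", "10"]
p unbalanced # ["0", "2", "3", "4"]
```

Whether "0" belongs in the second result depends on how you want to treat a second start before the first stop; the sketch simply keeps collecting.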

An option using arrays: as a starting point, I would suggest Enumerable#slice_before after String#split.
Given your command and the two marker words:
command = "START 1 2 3 STOP 5 6 START 7 8 STOP 9 10"
start = 'START'
stop = 'STOP'
You can use it something like this:
grouped_cmd = command.split.slice_before { |e| [start, stop].include? e } # .to_a
#=> [["START", "1", "2", "3"], ["STOP", "5", "6"], ["START", "7", "8"], ["STOP", "9", "10"]]
Then you can manipulate as you like, for example:
grouped_cmd.select { |first, *rest| first == start }
#=> [["START", "1", "2", "3"], ["START", "7", "8"]]
Or
grouped_cmd.each_with_object([]) { |(first, *rest), ary| ary << rest if first == start }
#=> [["1", "2", "3"], ["7", "8"]]
Or even
grouped_cmd.each_slice(2).map { |(start, *stt), (stop, *stp)| { start.downcase.to_sym => stt, stop.downcase.to_sym => stp } }
#=> [{:start=>["1", "2", "3"], :stop=>["5", "6"]}, {:start=>["7", "8"], :stop=>["9", "10"]}]
And so on.
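To get from those groups to the flat list the question asks for (1 2 3 7 8), one possibility is to keep only the groups that begin with the start marker and drop the marker itself. This is a sketch under that assumption; the names markers, groups and inside are mine.

```ruby
command = "START 1 2 3 STOP 5 6 START 7 8 STOP 9 10"
markers = %w[START STOP]

# Start a new group whenever a marker word is seen.
groups = command.split.slice_before { |word| markers.include?(word) }

# Keep the groups opened by START and collect everything after the marker.
inside = groups
  .select { |first, *| first == "START" }
  .flat_map { |_marker, *rest| rest }

p inside           # ["1", "2", "3", "7", "8"]
puts inside.join(" ")
```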

Related

How to make a repeated string to the left be deleted without using While?

For example, I have this string of only numbers:
0009102
If I convert it to integer Ruby automatically gives me this value:
9102
That's correct. But my program gives me different types of numbers:
2229102 desired output => 9102
9999102 desired output => 102
As you can see, I want the leading 2s and 9s to be treated like leading zeros and dropped automatically. It is easy to delete them with a while loop, but I must avoid that.
In other words, how can I make Ruby treat a repeated leading digit n like a leading zero?
"2229102".sub(/\A(\d)\1*/, "") #=> "9102"
The regular expression reads, "match the first digit in the string (\A is the beginning-of-string anchor) in capture group 1 ((\d)), followed by zero or more characters (*) that equal the contents of capture group 1 (\1)". String#sub replaces that match with an empty string.
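Wrapped in a small method, the same substitution can be checked against all the inputs from the question. The method name strip_leading_run is made up for this sketch.

```ruby
# Hypothetical helper: drop the first digit and every immediate repeat of it.
def strip_leading_run(digits)
  digits.sub(/\A(\d)\1*/, "")
end

p strip_leading_run("2229102")      # "9102"
p strip_leading_run("9999102")      # "102"
p strip_leading_run("0009102")      # "9102"
p strip_leading_run("2229102").to_i # 9102
```

Note that the first digit is always removed, even when it is not repeated, so "9102" would become "102".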
Try with Enumerable#chunk_while:
s = '222910222'
s.each_char.chunk_while(&:==).drop(1).join
#=> "910222"
Where s.each_char.chunk_while(&:==).to_a #=> [["2", "2", "2"], ["9"], ["1"], ["0"], ["2", "2", "2"]]
Similar to the solution of iGian you could also use drop_while.
s = '222910222'
s.each_char.each_cons(2).drop_while { |a, b| a == b }.map(&:last).join
#=> "910222"
# or
s.each_char.drop_while.with_index(-1) { |c, i| i < 0 || c == s[i] }.join
#=> "910222"
You can also try this way:
s = '9999102938'
s.chars.then{ |chars| chars[chars.index(chars.uniq[1])..-1] }.join
#=> "102938"

How to extract numbers from an array of strings? (I'm using regex)

I have an array of strings:
test= ["ChangeServer<br/>Test: 3-7<br/>PinCode:DFSFSDFB04008<br/>ShipCode:DFADFSDFSDM-000D3<br/>SomeCode:sdfsdf", "werwerwe", "adfsdfsd",
"sdfsdfsdfsd<br/>Test: 9<br/>PinCode:ADFSDF4NS0<br/>ShipCode:FADFSDFD-9ZM170<br/>"]
I want to grab the numbers after Test:, which in the array above are 3, 4, 5, 6, 7 (the range 3-7) and 9.
Desired output:
["3","4","5","6","7","9"]
What I tried so far
test.join.scan(/(?<=Test: )[0-9]+/)
=> ["3", "9"]
How do I deal with the range?
Second test case:
test= ["ChangeServer<br/>Test: 3-7<br/>PinCode:DFSFSDFB04008<br/>ShipCode:DFADFSDFSDM-000D3<br/>SomeCode:sdfsdf", "werwerwe", "adfsdfsd",
"sdfsdfsdfsd<br/>Test: 9<br/>PinCode:ADFSDF4NS0<br/>ShipCode:FADFSDFD-9ZM170<br/>", "sdfsdfsdfsd<br/>Test: 15-18<br/>PinCode:ADFSDF4NS0<br/>ShipCode:FADFSDFD-9ZM170<br/>"]
Desired output:
["3","4","5","6","7","9","15","16","17","18"]
There are a lot of ways you could solve this. I'd probably do it this way:
test.flat_map do |s|
  _, m, n = *s.match(/Test:\s*(\d+)(?:-(\d+))?/)
  m ? (m..n||m).to_a : []
end
See it in action on repl.it: https://repl.it/JFwT/13
Or, more succinctly:
test.flat_map {|s| s.match(/Test:\s*(\d+)(?:-(\d+))?/) { $1..($2||$1) }.to_a }
https://repl.it/JFwT/11
You could create a new Range for each range found (i.e N-N) using the splat operator (i.e. *) and combine the results, like this 1:
test.join.scan(/(?<=Test: )[0-9-]+/)
.flat_map { |r| Range.new(*r.split('-').values_at(0, -1)).to_a }
#=> ["3", "4", "5", "6", "7", "9"]
This will work for both examples.
1 Notice the added - next to 0-9 in the regex.
Is there a way to include both Test: 1 (with a space between
Test: and 1) and Test:1 (without a space between Test: and 1)?
Yes, update your regex (change where the space is placed) and add an additional map to get rid of those spaces:
test.join
.scan(/(?<=Test:)[ 0-9-]+/)
.map(&:strip)
.flat_map { |r| Range.new(*r.split('-').values_at(0, -1)).to_a }
And here's a shortened option using two captures in the regex, as suggested by Jordan:
test.join
.scan(/Test:\s*(\d+)(?:-(\d+))?/)
.flat_map { |m,n| (m..n||m).to_a }
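For completeness, here is the two-capture version wrapped in a method and run on data shaped like the second test case. It is a sketch with assumptions: expand_tests and samples are names I made up, the sample strings are shortened, and the endpoints are converted to integers (instead of relying on string ranges) to make the numeric intent explicit.

```ruby
# Hypothetical helper: expand every "Test: N" or "Test: N-M" into its numbers.
def expand_tests(strings)
  strings.join("\n").scan(/Test:\s*(\d+)(?:-(\d+))?/).flat_map do |first, last|
    # last is nil for a single number, so fall back to first.
    (first.to_i..(last || first).to_i).map(&:to_s)
  end
end

samples = [
  "ChangeServer<br/>Test: 3-7<br/>PinCode:DFSFSDFB04008",
  "werwerwe",
  "x<br/>Test:9<br/>ShipCode:FADFSDFD-9ZM170<br/>",
  "y<br/>Test: 15-18<br/>"
]

p expand_tests(samples)
# ["3", "4", "5", "6", "7", "9", "15", "16", "17", "18"]
```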
Just out of curiosity:
test.
join.
scan(/(?<=Test: )[\d-]+/).
map { |e| e.gsub(/\A\d+\Z/) { |m| "#{m}..#{m}" }.gsub('-', '..') }.
map(&method(:eval)).
flat_map(&:to_a)

Adding/Deleting numbers in an array via user input

I want to add/remove numbers in an array based on user input. Here's what I tried:
a = %w[1 2 3 4 5 6 7 8 9]
delete_list = []
puts a
puts "pick 1-9 to del"
input = gets.to_i
input << a
puts a
The last line is to check if it worked, and I get "no implicit conversion of Array into Integer". Is this because I used %w and the array isn't integer based?
a = %w[1 2 3 4 5 6 7 8 9]
a.map! {|e| e.to_i}
puts a
puts "pick 1-9 to del"
input = gets.chomp
a.delete(input)
puts a
Well, I changed it up like so. But I don't seem to be having success with the a.delete(input) command, as my array still prints out 1-9. What am I doing wrong?
To remove an element at specific position use Array#delete_at:
input = gets.to_i
a.delete_at(input - 1) # index starts from `0`
If you want to delete item not by position, but by value, use Array#delete.
input = gets.chomp # `a` contains strings; used `.chomp` instead of `.to_i`
a.delete(input)
Yes. It is because the argument to Integer#<< (Fixnum#<< before Ruby 2.4) has to be an integer, not an array.
Focusing on the key lines of code:
a = %w[1 2 3 4 5 6 7 8 9]
This makes the variable "a" an array of string elements:
=> ["1", "2", "3", "4", "5", "6", "7", "8", "9"]
Then you set this variable:
input = gets.to_i
This gets a string from the user ("gets" - like an abbreviation of the name getStringFromUser) and then .to_i turns it to an integer.
This would have likely resulted in a "0" (if letters entered) or whatever integer was entered:
=>0 OR => #some integer
Then you tried to put an array into the integer:
input << a
Ruby tried to take the "a" array (class Array) and cram it into the integer stored in the variable "input" (class Integer; Fixnum before Ruby 2.4). This is where you got your error: Ruby can't put an array into an integer using a method like "<<".
If you replaced the line:
input << a
With:
a << input
You'll at least get a functional result.
If the user had entered, say, 9, then your last puts a would give you:
=> ["1", "2", "3", "4", "5", "6", "7", "8", "9", 9]
Which is an Array that consists of a bunch of string elements and an integer element that was pushed onto the end.
Now, from your puts "pick 1-9 to del", it seems like you want to delete an element from the array.
First, you'll want your array to be integers and not strings... something like:
a.map! {|e|e.to_i}
=> [1, 2, 3, 4, 5, 6, 7, 8, 9]
(if you hadn't converted the input to an integer, you could skip that last step... or oddly convert the "input" back to a string with input.to_s)
Now that "a" is an array of integers, you can delete one using the "delete" method for Arrays and telling it to delete the value of the "input" variable:
a.delete(input)
=> 9
#it returns the value you deleted.
Your last puts a would return:
=> [1, 2, 3, 4, 5, 6, 7, 8]
It's a long step-wise answer, but hopefully that helps.
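Putting the pieces together, the type mismatch is the whole story: the value you delete must have the same type as the array elements. Below is a minimal sketch that takes the raw input as a parameter so it can be tried without typing anything. delete_choice is a hypothetical helper name, and Integer(..., exception: false) assumes Ruby 2.6 or later.

```ruby
# Hypothetical helper: remove the number the user typed from a list of integers.
# raw_input is the string returned by gets, for example "9\n".
def delete_choice(numbers, raw_input)
  # Base 10 is passed explicitly; the result is nil when the input is not a number.
  choice = Integer(raw_input.strip, 10, exception: false)
  return numbers if choice.nil?

  numbers - [choice] # non-destructive; Array#delete would change numbers in place
end

a = %w[1 2 3 4 5 6 7 8 9].map(&:to_i)

p delete_choice(a, "9\n")   # [1, 2, 3, 4, 5, 6, 7, 8]
p delete_choice(a, "abc\n") # unchanged: [1, 2, 3, 4, 5, 6, 7, 8, 9]
p a.dup.delete("9")         # nil, because the string "9" never equals the integer 9
```

In a real script you would call it as a = delete_choice(a, gets).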

Ruby regex for a split every four characters not working

I'm trying to split a sizeable string every four characters. This is how I'm trying to do it:
big_string.split(/..../)
This is yielding a nil array. As far as I can see, this should be working. It even does when I plug it into an online ruby regex test.
Try scan instead:
$ irb
>> "abcd1234beefcake".scan(/..../)
=> ["abcd", "1234", "beef", "cake"]
or
>> "abcd1234beefcake".scan(/.{4}/)
=> ["abcd", "1234", "beef", "cake"]
If the number of characters isn't divisible by 4, you can also grab the remaining characters:
>> "abcd1234beefcakexyz".scan(/.{1,4}/)
=> ["abcd", "1234", "beef", "cake", "xyz"]
(The {1,4} will greedily grab between 1 and 4 characters)
Hmm, I don't know what Rubular is doing there and why - but
big_string.split(/..../)
does translate to
split the string at every 4-character sequence
which correctly results in something like
["", "", "", "abc"]
Whoops.
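str = 'asdfasdfasdf'
c = 0      # current offset into str
out = []
inum = 4   # chunk size
# ceil keeps a shorter final chunk when the length is not a multiple of inum
(str.length / inum.to_f).ceil.times do
  out.push(str[c, inum]) # take inum characters starting at offset c
  c += inum
end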
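To make the split versus scan difference concrete, here is a short comparison on one input, plus a regex-free variant based on each_slice. It is a sketch; the variable names are mine.

```ruby
str = "abcd1234beefcakexyz"

# split treats every 4-character match as a separator, so only the text
# between separators survives: empty strings plus the leftover "xyz".
pieces_from_split = str.split(/..../)

# scan returns the matches themselves.
pieces_from_scan = str.scan(/.{1,4}/)

# Regex-free alternative: group the characters four at a time.
pieces_from_slice = str.chars.each_slice(4).map(&:join)

p pieces_from_split # ["", "", "", "", "xyz"]
p pieces_from_scan  # ["abcd", "1234", "beef", "cake", "xyz"]
p pieces_from_slice # ["abcd", "1234", "beef", "cake", "xyz"]
```

When the length is an exact multiple of four, split returns an empty array, because it drops trailing empty strings. Also note that . does not match a newline by default, so the each_slice variant is the safer choice for multi-line input.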

Get numbers from string

I got a string:
"1|2 3 4 oh 5 oh oh|e eewrewr|7|".
I want to get the digits between first pipes (|), returning "2 3 4 5".
Can anyone help me with the regular expression to do that?
Does this work?
"1|2 3 4 oh 5 oh oh|e eewrewr|7|".split('|')[1].scan(/\d/)
Arun's answer is perfect if you want only digits.
i.e.
"1|2 3 4 oh 5 oh oh|e eewrewr|7|".split('|')[1].scan(/\d/)
# Will return ["2", "3", "4", "5"]
"1|2 3 4 oh 55 oh oh|e eewrewr|7|".split('|')[1].scan(/\d/)
# Will return ["2", "3", "4", "5", "5"]
If you want numbers instead,
# Just adding a '+' in the regex:
"1|2 3 4 oh 55 oh oh|e eewrewr|7|".split('|')[1].scan(/\d+/)
# Will return ["2", "3", "4", "55"]
If you want to use just regex, first match the segment between the pipes with
\|[\d\s\w]+\|
and then pick out the digits with
\d
but that's probably not the best solution.
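A possible way to combine both steps without split is String#[] with a regex and a capture index, followed by scan. This is only a sketch; segment and numbers are names I chose, and [^|]* is my substitute for the character class above.

```ruby
str = "1|2 3 4 oh 5 oh oh|e eewrewr|7|"

# Capture group 1 is whatever sits between the first pair of pipes.
segment = str[/\|([^|]*)\|/, 1]

# to_s guards against nil when the string contains no pair of pipes.
numbers = segment.to_s.scan(/\d+/).join(" ")

p segment # "2 3 4 oh 5 oh oh"
p numbers # "2 3 4 5"
```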
