How to split a string and skip whitespace? - ruby

I have a string like " This is a test ". I want to split the string by the space character. I do it like this:
puts " This is a test ".strip.each_line(' ') {|s| puts s.strip}
The result is:
This
is
a
test
This is a test
Why is the last line "This is a test" printed?
I also need that, if there are two or more space characters between two words, no empty "row" is returned.
I only want to get the individual words of a given string.
Does anyone have an idea?

irb(main):002:0> " This is a test ".split
=> ["This", "is", "a", "test"]
irb(main):016:0* puts " This is a test ".split
This
is
a
test
str.split(pattern=$;, [limit]) => anArray
If pattern is omitted, the value of $; is used. If $; is nil (which is the default), str is split on whitespace as if ' ' were specified.
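A short sketch of the difference the documentation describes, assuming Ruby 2.x or later and an example string with doubled spaces: with no argument, or with the single-space string ' ', split works on runs of whitespace and ignores leading whitespace, while a regex that matches exactly one space produces an empty string for every extra space.

```ruby
str = "  This  is a test  "

# No argument (or ' '): awk-style split on runs of whitespace.
p str.split       # => ["This", "is", "a", "test"]
p str.split(' ')  # => ["This", "is", "a", "test"]

# A regex matching a single space: each extra space yields an empty string.
# Trailing empty strings are dropped, leading and inner ones are kept.
p str.split(/ /)  # => ["", "", "This", "", "is", "a", "test"]
```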

You should do
" This is a test ".strip.each_line(' ') {|s| puts s.strip}
if you don't want the last "This is a test".
The reason is:
irb>>> puts " This is a test ".strip.each_line(' ') {}
This is a test

The outer puts runs after the each_line block has executed, and it prints the return value of each_line, which is the receiver itself (the whole stripped string).
Omit the first puts and you are done.
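In current Ruby, String#each (Ruby 1.8) no longer exists and each_line is its equivalent. A minimal sketch of the behaviour explained above: each_line returns its receiver, which is what the outer puts printed, and without a block it returns an Enumerator whose pieces can be collected instead of printed. As the last line shows, a doubled space still produces an empty "row" this way, which is why split with no argument fits the question better.

```ruby
str = " This is a test "

# each_line yields "This ", "is ", "a ", "test" and then returns its receiver.
returned = str.strip.each_line(' ') { |s| puts s.strip }
p returned  # => "This is a test"

# Without a block, each_line returns an Enumerator, so the pieces can be mapped.
words = str.strip.each_line(' ').map(&:strip)
p words     # => ["This", "is", "a", "test"]

# With a doubled space, the extra separator becomes an empty string.
p "This  is".each_line(' ').map(&:strip)  # => ["This", "", "is"]
```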

Related

Regular Expression lookahead / lookback for punctuation patterns

Using Ruby, I want to find a regular expression that correctly identifies sentence boundaries, which I am defining as any string that ends in [.!?] except when these punctuation marks exist within quotation marks, as in
My friend said "John isn't here!" and then he left.
My current code that is falling short is:
text = para.text.scan(/[^\.!?]+[(?<!(.?!)\"|.!?] /).map(&:strip)
I've pored over the regex docs, but still can't seem to understand lookbacks/lookaheads correctly.
How about something like this?
/(?:"(?:\\.|[^"\\])+"|[a-z]\.[a-z]\.|[^.?!])+[!.?]/i
Demo: https://regex101.com/r/bJ8hM5/2
How it works:
At each position in the string, the regex checks for the following:
A quoted string in the form of "quote", which can contain anything up to the closing quote. You can also have escaped quotes, such as "hell\"o".
Match any letter, followed by a dot, followed by another letter, and finally a dot. This handles special cases such as U.S.
Match everything else that isn't a punctuation character (., ? or !).
Repeat until we reach a punctuation character.
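A minimal Ruby sketch of this approach, assuming String#scan is used to collect the matches. Ruby has no /g flag (scan already returns every match), and the quoted-string branch is written as (?:\\.|[^"\\])+ so that a backslash escape is consumed before an ordinary character; SENTENCE and sentences are hypothetical names.

```ruby
# Quoted strings, letter.letter. abbreviations, or any non-terminator,
# repeated, followed by a terminator.
SENTENCE = /(?:"(?:\\.|[^"\\])+"|[a-z]\.[a-z]\.|[^.?!])+[!.?]/i

def sentences(text)
  # scan returns every full match because the pattern has no capture groups.
  text.scan(SENTENCE).map(&:strip)
end

p sentences(%q{My friend said "John isn't here!" and then he left. Let's go!})
# => ["My friend said \"John isn't here!\" and then he left.", "Let's go!"]
p sentences("We met in the U.S. yesterday. Great!")
# => ["We met in the U.S. yesterday.", "Great!"]
```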
Here's a partial-regex solution that disregards sentence terminators that are contained between double-quotes.
Code
def extract_sentences(str, da_terminators)
start_with_quote = (str[0] == '"')
str.split(/(\".*?\")/)
.flat_map.with_index { |b,i|
(start_with_quote == i.even?) ? b : b.split(/([#{da_terminators}])/) }
.slice_after(/^[#{da_terminators}]$/)
.map { |sb| sb.join.strip }
end
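Example
str = %q{My friend said "John isn't here!", then "I'm outta' here" and then he left. Let's go! Later, he said "Aren't you coming?"}
puts extract_sentences(str, '!?.')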
# My friend said "John isn't here!", then "I'm outta' here" and then he left.
# Let's go!
# Later, he said "Aren't you coming?"
Explanation
For str above and
da_terminators = '!?.'
We will need the following later:
start_with_quote = (str[0] == '"')
#=> false
Split the string on "...". We need to make \".*?\" a capture group in order to keep the quoted strings in the result of split. The result is an array, blocks, whose elements alternate between strings surrounded by double quotes and other strings. start_with_quote tells us which is which.
blocks = str.split(/(\".*?\")/)
#=> ["My friend said ",
# "\"John isn't here!\"",
# ", then ",
# "\"I'm outta' here\"",
# " and then he left. Let's go! Later, he said ",
# "\"Aren't you coming?\""]
Split the string elements that are not surrounded by double quotes. The split is on any of the sentence terminating characters. Again it must be in a capture group in order to keep the separator.
new_blocks = blocks.flat_map.with_index { |b,i|
(start_with_quote == i.even?) ? b : b.split(/([#{da_terminators}])/) }
#=> ["My friend said ",
# "\"John isn't here!\"",
# ", then ",
# "\"I'm outta' here\"",
# " and then he left",
# ".",
# " Let's go",
# "!",
# " Later, he said ",
# "\"Aren't you coming?\""]
sentence_blocks_enum = new_blocks.slice_after(/^[#{da_terminators}]$/)
#=> #<Enumerator:0x007f9a3b853478>
Convert this enumerator to an array to see what it will pass into its block:
sentence_blocks_enum.to_a
#=> [["My friend said ",
# "\"John isn't here!\"",
# ", then ",
# "\"I'm outta' here\"",
# " and then he left", "."],
# [" Let's go", "!"],
# [" Later, he said ", "\"Aren't you coming?\""]]
Join the blocks of each sentence, strip whitespace and return the array:
sentence_blocks_enum.map { |sb| sb.join.strip }
#=> ["My friend said \"John isn't here!\", then \"I'm outta' here\" and then he left.",
# "Let's go!",
# "Later, he said \"Aren't you coming?\""]
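One caveat, assuming current Ruby semantics: String#split keeps a leading empty string when the string starts with a match, so the quoted pieces always land at odd indices, even when str begins with a double quote. In that case the start_with_quote test flips the roles, and the quoted text gets split on its terminators. The sketch below is an adaptation, not the original answer's code: it relies on index parity alone, passes da_terminators through Regexp.escape (in case the set contains characters such as - or ]) and drops empty sentences.

```ruby
def extract_sentences(str, da_terminators)
  terminators = Regexp.escape(da_terminators)
  # Quoted blocks sit at odd indices; split only the unquoted ones,
  # keeping each terminator as its own element (capture group).
  str.split(/(".*?")/)
     .flat_map.with_index { |b, i| i.odd? ? b : b.split(/([#{terminators}])/) }
     .slice_after(/\A[#{terminators}]\z/)
     .map { |sb| sb.join.strip }
     .reject(&:empty?)
end

p extract_sentences(%q{"Hi!" she said. Bye.}, '!?.')
# => ["\"Hi!\" she said.", "Bye."]
```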

Regex to match exact word in string

I've looked around but haven't been able to find a working solution to my problem.
I have an array, input, of two strings and want to test which element of the array contains the exact substring Test.
One thing I have tried (among numerous other attempts):
input = ["Test's string", "Test string"]
# Alternative input array that it needs to work on:
# ["Testing string", "some Test string"]
substring = "Test"
if (input[0].match(/\b#{substring}\b/))
puts "Test 0 "
# Do something...
elsif (input[1].match(/\b#{substring}\b/))
puts "Test 1"
# Do something different...
end
The desired result is a print of "Test 1". The input can be more complex but overall I am looking for a way to find an exact match of a substring in a longer string.
I feel like this should be a rather trivial regex but I haven't been able to come up with the correct pattern. Any help would be greatly appreciated!
The following code may be what you are looking for.
input = ["Testing string", "Test string"]
substring = "Test"
if input[0].match(/(?:^|\s)#{substring}(?:\s|$)/)
puts "Test 0 "
elsif input[1].match(/(?:^|\s)#{substring}(?:\s|$)/)
puts "Test 1"
end
The meaning of the pattern /(?:^|\s)#{substring}(?:\s|$)/ is
(?:^|\s) : the left side of the substring is the beginning of a line (^) or whitespace,
#{substring} : the substring is matched exactly,
(?:\s|$) : the right side of the substring is whitespace or the end of a line ($).
The alternatives must be written as a group rather than a character class: in [^|\s] the leading ^ negates the class, and inside [\s|$] both | and $ are literal characters.
One way to do that is as follows:
input = ["Testing string", "Test"]
"Test #{ input.index { |s| s[/\bTest\b/] } }"
#=> "Test 1"
input = ["Test", "Testing string"]
"Test #{ input.index { |s| s[/\bTest\b/] } }"
#=> "Test 0"
In a regex, \b denotes a word boundary.
Maybe you want a method to return the index of the first element of input that contains the word? That could be:
def matching_index(input, word)
input.index { |s| s[/\b#{word}\b/i] }
end
input = ["Testing string", "Test"]
matching_index(input, "Test") #=> 1
matching_index(input, "test") #=> 1
matching_index(input, "Testing") #=> 0
matching_index(input, "Testy") #=> nil
Then you could use it like this, for example:
word = 'Test'
puts "The matching element for '#{word}' is at index #{ matching_index(input, word) }"
#=> The matching element for 'Test' is at index 1
word = "Testing"
puts "The matching element for '#{word}' is '#{ input[matching_index(input, word)] }'"
#=> The matching element for 'Testing' is 'Testing string'
The problem is with your boundaries. In your original question, the word Test matches the first string because the position between t and ' is a \b word boundary (' is not a word character). It's a genuine match, so responding with "Test 0" is correct. You need to decide how your search should be terminated. If your input contains special characters, I don't think the regex will work properly: /\bTest my $money.*/ will never match, because $ is interpreted as an end-of-line anchor rather than a literal dollar sign.
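A sketch of one way to address both points, assuming "exact match" means the word is delimited by whitespace or the ends of the string: the lookarounds (?<!\S) and (?!\S) reject Test's where \b accepts it, and Regexp.escape makes characters such as $ literal. whole_token_index is a hypothetical helper name; String#match? needs Ruby 2.4 or later.

```ruby
# \b treats ' as a boundary, which is why the question's code printed "Test 0".
p "Test's string"[/\bTest\b/]  # => "Test"

# Hypothetical helper: index of the first element that contains `word`
# as a whitespace-delimited token, or nil if there is none.
def whole_token_index(input, word)
  # Regexp.escape turns $, ., * etc. into literal characters.
  pattern = /(?<!\S)#{Regexp.escape(word)}(?!\S)/
  input.index { |s| s.match?(pattern) }
end

input = ["Test's string", "Test string", "pay my $money now"]
p whole_token_index(input, "Test")    # => 1
p whole_token_index(input, "$money")  # => 2
p whole_token_index(input, "money")   # => nil
```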
What happens if you have multiple matches in your input array? Do you want to do something to all of them or just the first one?

Split a string at every occurrence of particular character?

I would like to pass a sequence of characters into a function as a string and have it return to me that string split at the following characters:
# # $ % ^ & *
such that if the string is
'hey#man^you*are#awesome'
the program returns
'hey man you are awesome'
How can I do this?
To split the string you can use String#split
'hey#man^you*are#awesome'.split(/[##$%^&*]/)
#=> ["hey", "man", "you", "are", "awesome"]
To bring it back together, you can use Array#join
'hey#man^you*are#awesome'.split(/[##$%^&*]/).join(' ')
#=> "hey man you are awesome"
split and join should be self-explanatory. The interesting part is the regular expression /[##$%^&*]/ which matches any of the characters inside the character class [...]. The above code is essentially equivalent to
'hey#man^you*are#awesome'.gsub(/[##$%^&*]/, ' ')
#=> "hey man you are awesome"
where gsub means "globally substitute every occurrence of any of the characters ##$%^&* with a space".
You could also use String#tr, which avoids the need to convert an array back to a string:
'hey#man^you*are#awesome'.tr('##$%^&*', ' ')
#=> "hey man you are awesome"
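Since the question asks for a function that takes the separator characters as a string, here is a minimal sketch; split_on_chars is a hypothetical name. It builds the pattern with Regexp.union, which escapes each character, so metacharacters such as $, ^ and * are matched literally without hand-writing a character class.

```ruby
# Hypothetical helper: split str at any single character listed in separators.
def split_on_chars(str, separators)
  str.split(Regexp.union(separators.chars))
end

words = split_on_chars('hey#man^you*are#awesome', '#$%^&*')
p words               # => ["hey", "man", "you", "are", "awesome"]
puts words.join(' ')  # prints: hey man you are awesome
```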

Ruby -- capitalize first letter of every sentence in a paragraph

Using Ruby language, I would like to capitalize the first letter of every sentence, and also get rid of any space before the period at the end of every sentence. Nothing else should change.
Input = "this is the First Sentence . this is the Second Sentence ."
Output = "This is the First Sentence. This is the Second Sentence."
Thank you folks.
Using regular expression (String#gsub):
Input = "this is the First Sentence . this is the Second Sentence ."
Input.gsub(/[a-z][^.?!]*/) { |match| match[0].upcase + match[1..-1].rstrip }
# => "This is the First Sentence. This is the Second Sentence."
Input.gsub(/([a-z])([^.?!]*)/) { $1.upcase + $2.rstrip } # Using capturing group
# => "This is the First Sentence. This is the Second Sentence."
I assumed that a sentence ends with ., ? or !.
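To illustrate that assumption, here is the capture-group version from the answer applied to an assumed input whose sentences end with ?, ! and .:

```ruby
text = "is it done ? yes ! good ."
# $1 is the first letter of each sentence, $2 the rest up to the terminator;
# rstrip removes the space before the terminator.
result = text.gsub(/([a-z])([^.?!]*)/) { $1.upcase + $2.rstrip }
p result  # => "Is it done? Yes! Good."
```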
UPDATE
input = "TESTest me is agreat. testme 5 is awesome"
input.gsub(/([a-z])((?:[^.?!]|\.(?=[a-z]))*)/i) { $1.upcase + $2.rstrip }
# => "TESTest me is agreat. Testme 5 is awesome"
input = "I'm headed to stackoverflow.com"
input.gsub(/([a-z])((?:[^.?!]|\.(?=[a-z]))*)/i) { $1.upcase + $2.rstrip }
# => "I'm headed to stackoverflow.com"
Input.split('.').map(&:strip).map { |s|
s[0].upcase + s[1..-1] + '.'
}.join(' ')
=> "This is the First Sentence. This is the Second Sentence."
My second approach is cleaner but produces a slightly different output:
Input.split('.').map(&:strip).map(&:capitalize).join('. ') + '.'
=> "This is the first sentence. This is the second sentence."
I'm not sure if you're fine with it.

How do I reverse words not the characters using Ruby?

I want to reverse the words of a text file:
If my input is:
Hello World
My output should be:
World Hello
I tried this:
File.open('teste.txt').each_line do |line|
print line.reverse.gsub(/\n/,"")
end
but I got the characters reversed.
"Hello World".split.reverse.join(" ")
=> "World Hello"
It splits the string into an array with a whitespace being the default delimiter. Then it reverses the array and concatenates the strings in the array using a white space as well.
Your solution should look like this:
# File.foreach opens the file, yields each line and closes the file again
File.foreach("test.txt") do |line|
puts line.split.reverse.join(" ")
end
puts appends a line break after the output, while print does not. This is necessary because split discards the original line break of each line when it splits the line into an array of words.
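The same loop can be tried without a file by iterating over the lines of a string, which yields lines the same way reading the file does; the sample text is an assumption.

```ruby
text = "Hello World\nfoo bar baz\n"
# split with no argument also drops the trailing "\n" of each line.
reversed = text.each_line.map { |line| line.split.reverse.join(" ") }
reversed.each { |line| puts line }
# prints:
# World Hello
# baz bar foo
```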
Break string into words and reverse that.
"Hello World".split.reverse.join(' ') # => "World Hello"
Split the string on whitespace, reverse the resulting array, then join the words back together.
You could make a method to do it as such:
def reverse_words(string)
return string.split(" ").reverse.join(" ")
end
then later, call that method with:
print reverse_words("Hello World")
Or set a string to the returned value:
reversed_string = reverse_words("Hello World")
