get groups of characters n at a time - ruby

I'm trying to group a string by three (but could be any number) characters at a time. Using this code:
"this gets three at a time".scan(/\w\w\w/)
I get:
["thi","get","thr","tim"]
But what I'm trying to get is:
["thi","sge","tst","hre","eat","ati","me"]

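\w matches letters, digits, and underscores (i.e. it's shorthand for [a-zA-Z0-9_]), not spaces. It does not magically skip spaces though, as you seem to expect: each match has to be three consecutive word characters, so anything left over before a space is dropped.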
So you'll first have to remove the spaces:
"this gets three at a time".gsub(/\s+/, "").scan(/.../)
or non-word characters:
"this gets three at a time".gsub(/\W+/, "").scan(/.../)
before you match the three characters.
Although you should rather use
"this gets three at a time".gsub(/\W+/, "").scan(/.{1,3}/)
to also obtain the last 1 or 2, if the length is not divisible by 3.
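Since the question says the group size could be any number, the same idea can be wrapped in a small helper. This is only a sketch: the method name chunk_chars and its parameter n are made up for illustration, and it assumes that every non-word character should be discarded before grouping.

```ruby
# Hypothetical helper: strip non-word characters, then take up to n
# characters at a time (the last group may be shorter).
def chunk_chars(str, n)
  str.gsub(/\W+/, "").scan(/.{1,#{n}}/)
end

p chunk_chars("this gets three at a time", 3)
# => ["thi", "sge", "tst", "hre", "eat", "ati", "me"]
```

The quantifier {1,n} is built by interpolating n into the regex literal, so the same method works for any positive group size.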

"this gets three at a time".tr(" \t\n\r", "").scan(/.{1,3}/)

You can try these as well (String#delete! removes every space from the string):
sentence = "this gets three at a time"
sentence.delete!(" ")
sentence.scan(/\w\w\w/) # no change in regex, but the trailing "me" is dropped
Or:
sentence = "this gets three at a time"
sentence.delete!(" ")
sentence.scan(/.{1,3}/)
Or:
sentence = "this gets three at a time"
sentence.delete!(" ")
sentence.scan(/[a-zA-Z]{1,3}/)

Related

Check 4 chars with regex Ruby console input

I have 4 characters: the first is a letter ('L', for example), the next two are digits, and the last is a letter again, all separated by single spaces. The user enters them in the Ruby console. I need to check that they are separated by exactly one space, that there are no other stray characters, and that nothing follows the last letter.
So if a user enters, for example, 'L 5 7 A' (read with gets.chomp), I need to check that everything is OK and separated by only one space, and return input[1], input[2], input[3]. How can I do that? Thanks.
You can do something like this:
puts "Enter string"
input = gets.chomp
r = /^(L)\s(\d)\s(\d)\s([A-Z])$/
matches = input.match r
puts matches ? "inputs: #{$1}, #{$2}, #{$3}, #{$4}" : "input-format incorrect"
Here $1 is the first capture, similarly for $2, $3 etc. If you want to store the result in an array you can use:
matches = input.match(r).to_a
then the first element is the entire match, followed by each capture.
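A slightly stricter variant, as a sketch. In Ruby, ^ and $ match at line boundaries and \s matches tabs as well as spaces, so this version uses \A and \z together with a literal space. The constant and method names are illustrative only, and it assumes any uppercase letter is acceptable in the first position.

```ruby
# Hypothetical names. Assumed format: letter, digit, digit, letter,
# each separated by exactly one space, with nothing before or after.
INPUT_FORMAT = /\A([A-Z]) (\d) (\d) ([A-Z])\z/

# Returns the four captures as an array, or nil if the format is wrong.
def parse_input(input)
  match = INPUT_FORMAT.match(input)
  match ? match.captures : nil
end

p parse_input("L 5 7 A")   # => ["L", "5", "7", "A"]
p parse_input("L  5 7 A")  # => nil (two spaces)
p parse_input("L 5 7 A!")  # => nil (trailing character)
```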
Try
/^\w\s(\d)\s(\d)\s(\w)$/
Rubular is a good sandbox site for experimenting with and debugging regexes.

Splitting a string on variable numbers of words

The following question was posted by #ruhroe about an hour ago. I was about to post an answer when it was taken down. That's unfortunate, as I thought it was rather interesting. I'm putting it back up in case the OP sees this and also to give others an opportunity to post solutions.
The original question (which I've edited):
The problem is to split a string on some spaces in the string, based on criteria which depend in part on a number given by the user. If that number were, say, 5, each substring would contain either:
one word having 5 or more characters or
as many consecutive words (separated by spaces) as possible, provided the resulting string has at most 5 characters.
For example, if the string were:
"abcdefg fg hijkl mno pqrs tuv wx yz"
the result would be:
["abcdefg", "fg", "hijkl", "mno", "pqrs", "tuv", "wx yz"]
"abcdefg" is on a separate line because it has at least five characters.
"fg" is on a separate line because "fg" contains 5 or fewer characters and when combined with the following word, with a space between them, the resulting string, "fg hijkl", contains more than 5 characters.
"hijkl" is on a separate line because it satisfies both criteria.
How can I do that?
I believe this does it:
str = "abcdefg fg hijkl e mn pqrs tuv wx yz"
str.scan(/\b(?:\w{5,}|\w[\w\s]{0,3}\w|\w)\b/)
#=> ["abcdefg", "fg", "hijkl", "e mn", "pqrs", "tuv", "wx yz"]
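The quantifiers in that pattern are tied to the limit of 5. As a sketch, the limit can be interpolated instead, assuming words consist of \w characters and are separated by single spaces. The method name split_by_width is hypothetical. The middle quantifier becomes n - 2 because a group of short words is one leading word character, up to n - 2 characters in between, and one trailing word character.

```ruby
# Hypothetical helper generalizing the regex above to a limit of n characters.
def split_by_width(str, n)
  raise ArgumentError, "n must be at least 2" if n < 2

  # Either one word of n or more characters, or a run of short words
  # totalling at most n characters, or a single-character word.
  pattern = /\b(?:\w{#{n},}|\w[\w\s]{0,#{n - 2}}\w|\w)\b/
  str.scan(pattern)
end

p split_by_width("abcdefg fg hijkl e mn pqrs tuv wx yz", 5)
# => ["abcdefg", "fg", "hijkl", "e mn", "pqrs", "tuv", "wx yz"]
```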
As you iterate through the words in your collection (splitting the original string up into words should be trivial), it seems like there are three possible scenarios:
It's a blank line, and we should insert the current word into the line
It's a non-blank line, and the word can fit
It's a non-blank line, and the word can't fit and it should go into a new line
Something like this should work (note - I haven't tested this much outside of your solution. You'll definitely want to do that):
# assumes str (the input string) and length (the limit) are already defined
words = str.split
lines = []
line = String.new
words.each do |word|
  if line.empty?
    # this is a new line, so start it with the current word
    line << word
  elsif word_can_fit_line?(line, word, length)
    # the word fits, so append it to the current line
    line << " #{word}"
  else
    # the word doesn't fit, so keep this line and start a new one with
    # the current word
    lines << line
    line = word.dup
  end
end
# add the last line and we're done
lines << line
lines
Note that the implementation of word_can_fit_line? should be trivial - you just want to see if the current line length, plus a space, plus the word length, is less than or equal to your desired line length.
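A minimal sketch of that helper, following the description above (current line length, plus one for the space, plus the word length, compared against the limit):

```ruby
# Returns true if `word` can be appended to `line` (with one separating
# space) without the result exceeding `length` characters.
def word_can_fit_line?(line, word, length)
  line.length + 1 + word.length <= length
end

p word_can_fit_line?("wx", "yz", 5)     # => true  ("wx yz" has 5 characters)
p word_can_fit_line?("fg", "hijkl", 5)  # => false ("fg hijkl" has 8 characters)
```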

Ruby regex to capture any string with a number

I am looking for a regular expression in Ruby to capture a sentence that has any sort of number in it.
For instance, I need to capture all of the following:
"5 different ways to do it"
"2 x 2 is certainly 4"
"there are 15 different things"
"Try to get to 10"
I only want to capture sentences with a number within, but that has nothing else before or after the number. I don't want to include things like:
"$2 billion dollars"
"The 5x effect"
It has to be just a sequence for 1 or more numbers at the beginning, middle, or end of a sentence.
Thanks.
You probably want something like:
/^.*(?<!\S)\d+(?!\S).*$/
This matches a run of digits and uses negative look-arounds to make sure it is neither preceded nor followed by a non-space character.
This
(s =~ /(^|\s)\d+(\s|$)/) ? s : nil
will return the string s if it contains at least one non-negative integer that is:
the entire string,
at the beginning of the string and followed by a whitespace character,
at the end of the string and preceded by a whitespace character, or
both preceded and followed by a whitespace character.
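To compare the two suggestions, here is a quick check against the sentences from the question. It is only a sketch: the constant names are made up, and String#match? assumes Ruby 2.4 or later.

```ruby
# Both patterns are copied from the answers above; the names are hypothetical.
LOOKAROUND  = /^.*(?<!\S)\d+(?!\S).*$/
ALTERNATION = /(^|\s)\d+(\s|$)/

WANTED = [
  "5 different ways to do it",
  "2 x 2 is certainly 4",
  "there are 15 different things",
  "Try to get to 10"
]
UNWANTED = [
  "$2 billion dollars",
  "The 5x effect"
]

# Print each sentence with the verdict of both patterns.
(WANTED + UNWANTED).each do |s|
  puts "#{s.inspect}: #{LOOKAROUND.match?(s)} / #{ALTERNATION.match?(s)}"
end
```

Both patterns accept the four wanted sentences and reject the two unwanted ones.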

Ruby regular expressions for movie titles and ratings

The quiz problem:
You are given the following short list of movies exported from an Excel comma-separated values (CSV) file. Each entry is a single string that contains the movie name in double quotes, zero or more spaces, and the movie rating in double quotes. For example, here is a list with three entries:
movies = [
%q{"Aladdin", "G"},
%q{"I, Robot", "PG-13"},
%q{"Star Wars","PG"}
]
Your job is to create a regular expression to help parse this list:
movies.each do |movie|
movie.match(regexp)
title,rating = $1,$2
end
# => for first entry, title should be Aladdin, rating should be G,
# => WITHOUT the double quotes
You may assume movie titles and ratings never contain double-quote marks. Within a single entry, a variable number of spaces (including 0) may appear between the comma after the title and the opening quote of the rating.
Which of the following regular expressions will accomplish this? Check all that apply.
regexp = /"([^"]+)",\s*"([^"]+)"/
regexp = /"(.*)",\s*"(.*)"/
regexp = /"(.*)", "(.*)"/
regexp = /(.*),\s*(.*)/
Would someone explain why the answer was (1) and (2)?
The strings to match will look like "Aladdin", "G". Let's take a look at correct answer #1:
/"([^"]+)",\s*"([^"]+)"/
"([^"]+)" = at least one character that is not a " surrounded by "
, = a comma
\s* = a number of spaces (including 0)
"([^"]+)" = like first
Which is exactly the type of strings you will get. Let's take a look at the above string:
"Aladdin", "G"
# 1 = "Aladdin", 2 = the comma, 3 = the space, 4 = "G"
Now let's take a look at the second correct answer:
/"(.*)",\s*"(.*)"/
"(.*)" = any number (including 0) of almost any character surrounded by ".
, = a comma
\s* = any number of spaces (including 0)
"(.*)" = see first point
Which is correct as well, as the following irb session (using Ruby 1.9.3) shows:
'"Aladdin", "G"'.match(/"([^"]+)",\s*"([^"]+)"/) # number 1
# => #<MatchData "\"Aladdin\", \"G\"" 1:"Aladdin" 2:"G">
'"Aladdin", "G"'.match(/"(.*)",\s*"(.*)"/) # number 2
# => #<MatchData "\"Aladdin\", \"G\"" 1:"Aladdin" 2:"G">
Just for completeness I'll tell why the third and fourth are wrong as well:
/"(.*)", "(.*)"/
The above regex is:
"(.*)" = any number (including 0) of almost any character surrounded by "
, = a comma
(space) = a single space
"(.*)" = see first point
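Which is wrong because it requires exactly one space between the comma and the rating, while the problem allows any number of spaces, including none. It matches the first two entries but fails on the third, as the following irb session shows:
'"Star Wars","PG"'.match(/"(.*)", "(.*)"/) # number 3
# => nil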
The fourth regex is:
/(.*),\s*(.*)/
which is:
(.*) = any number (including 0) of almost any character
, = a comma
\s* = any number (including 0) of spaces
(.*) = see first point
Which is wrong because the text explicitly says that titles and ratings are surrounded by double quotes and never contain a " character themselves. The above regex checks neither, so for valid entries it captures the surrounding quotes along with the text, and it also accepts strings like "," (which are not valid) as the following irb session shows:
'","'.match(/(.*),\s*(.*)/) # number 4
# => #<MatchData "\",\"" 1:"\"" 2:"\"">
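To see all four candidates side by side, here is a sketch that runs each one against the three entries from the quiz. The constant names are made up, and &. and transform_values assume Ruby 2.4 or later.

```ruby
MOVIES = [
  %q{"Aladdin", "G"},
  %q{"I, Robot", "PG-13"},
  %q{"Star Wars","PG"}
]

# The four candidate regexes from the quiz, keyed by their number.
CANDIDATES = {
  1 => /"([^"]+)",\s*"([^"]+)"/,
  2 => /"(.*)",\s*"(.*)"/,
  3 => /"(.*)", "(.*)"/,
  4 => /(.*),\s*(.*)/
}

# For each candidate, collect [title, rating] per movie (nil if no match).
RESULTS = CANDIDATES.transform_values do |regexp|
  MOVIES.map { |movie| movie.match(regexp)&.captures }
end

RESULTS.each { |number, captures| puts "#{number}: #{captures.inspect}" }
```

Candidates 1 and 2 give title and rating without quotes for every entry, candidate 3 returns nil for "Star Wars" because there is no space after the comma, and candidate 4 matches everything but keeps the double quotes in both captures.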

ruby regex to extract two parts: digits, then whatever comes after

My user input is a string I need to split into two parts, (1) a partial phone number [any sequence of digits - . space, parens so I assume that is represented by /[\d\. \-\(\)]/ ] and (2) whatever follows (if anything).
For example
"88 comment" -> "88" & "comment"
"415-915 second part" --> "415-915" & "second part"
"(415) 915 part 2" --> "(415) 915" & "part 2"
"a note" --> "" & "a note"
"part 2" --> "" & "part 2"
As a relative newbie to ruby and regex, I have no idea how to extract multiple parts, and how to define the second part as being whatever comes after the first part (which basically means whatever comes after anything that doesn't match the first part)
Here's the regex (I'll explain below):
/^([-\d. ()]*)(.*)$/
^ means "start at the beginning of the string"
In ([-\d. ()]*), the square brackets define the set of allowed characters (hyphen, digit, dot, space, parentheses), the * means "match any number of the preceding item", and the parens mean to create a match group (this is how you will get the value later). So this is the first sequence.
In (.*), . means "match any single character", so .* means "match any number of any characters", it's basically a catch-all. The parens create a second match group.
$ means "finish at the end of the string"
So in ruby:
string =~ /^([-\d. ()]*)(.*)$/
puts $1.strip # is the phone number (with excess whitespace removed)
puts $2.strip # is the rest (with excess whitespace removed)
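As a sketch, the same pattern can be wrapped in a method and checked against the examples from the question. The constant and method names are hypothetical.

```ruby
# Pattern taken from the answer above; the names below are made up.
PHONE_THEN_REST = /^([-\d. ()]*)(.*)$/

# Returns [phone_part, rest], both with surrounding whitespace removed.
def split_phone_and_rest(input)
  match = PHONE_THEN_REST.match(input)
  return ["", input.strip] unless match

  [match[1].strip, match[2].strip]
end

["88 comment", "415-915 second part", "(415) 915 part 2", "a note", "part 2"].each do |s|
  p split_phone_and_rest(s)
end
```

Note that the first group is greedy, so leading digits of the second part are absorbed into the first: "88 2nd comment" would split into "88 2" and "nd comment".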
Try /([\d.\s()\/-]*)(.+)/ (the slash inside the character class is escaped because / delimits the regex literal). The first group will capture the number, the second one the "other" part. I don't know Ruby, so you will have to implement that pattern yourself.
