Ruby: extract substring between 2nd and 3rd fullstops [closed] - ruby

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 7 years ago.
I am constructing a program in Ruby that needs to extract the value between the 2nd and 3rd full stop in a string.
I have searched online for related solutions, including truncation and this prior Stack Overflow question: Get value between 2nd and 3rd comma. However, no answer illustrated a solution in Ruby.
Thanks in advance.

list = my_string.split(".")
list[2]
That will do it I think. First command splits it into a list. Second gets the bit you want
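If the string is long, a limit argument keeps split from cutting up everything after the part you need. A minimal sketch, assuming a made-up sample string:

```ruby
my_string = "one.two.three.four.five"

# With a limit of 4, split makes at most three cuts and leaves
# the rest of the string untouched in the last element.
list = my_string.split(".", 4)

between_2nd_and_3rd = list[2]
puts between_2nd_and_3rd
```

The limit is one more than the number of pieces you want to keep, because the final element holds the unsplit remainder.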

You could split the string on full stops (aka periods), but that creates an array with one element for each substring preceding a full stop. If the document had, say, one million such substrings, that would be a rather inefficient way of getting just the third one.
Suppose the string is:
mystring =<<_
Now is the time
for all Rubiests
to come to the
aid of their
bowling team.
Or their frisbee
team. Or their
air guitar team.
Or maybe something
else...
_
Here are a couple of approaches you could take.
#1 Use a regular expression
r = /
(?: # start a non-capture group
.*?\. # match any character any number of times, lazily, followed by a full stop
){2} # end non-capture group and perform operation twice
\K # forget everything matched before
[^.]* # match everything up to the next full stop
/xm # extended/free-spacing regex definition mode and multiline mode
mystring[r]
#=> " Or their\nair guitar team"
You could of course write the regex:
r = /(?:.*?\.){2}\K[^.]*/m
but the extended form makes it self-documenting.
The regex engine will step through the string until it finds a match or concludes that there can be no match, and stop there.
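To see how the same regex behaves on edge cases, here is a small sketch with made-up inputs (the short strings are illustrative only):

```ruby
r = /(?:.*?\.){2}\K[^.]*/m

with_three_stops = "a.b.c.d"[r] # text between the 2nd and 3rd full stop
nothing_after    = "a.b."[r]    # nothing follows the 2nd full stop
too_few_stops    = "a.b"[r]     # fewer than two full stops: no match

p with_three_stops
p nothing_after
p too_few_stops
```

String#[] returns nil when the regex does not match, so a caller should be prepared for nil as well as for an empty string.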
#2 Pretend a full stop is a newline
First suppose we were looking for the third line, rather than the third substring followed by a full stop. We could write:
mystring.each_line.take(3).last.chomp
# => "to come to the"
String#each_line determines where a line ends by examining the input record separator, which is held by the global variable $/. By default, $/ equals a newline. We could therefore do this:
irs = $/ # save old value, normally \n
$/ = '.'
mystring.each_line.take(3).last[0..-2]
#=> " Or their\nair guitar team"
Then leave no footprints:
$/ = irs
Here String#each_line returns an enumerator (in effect, a rule for determining a sequence of values), not an array.
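The same idea can be sketched without touching $/ at all, since String#each_line also accepts the separator as an argument (newer Ruby versions may warn when $/ is reassigned). The helper name third_segment is hypothetical, not part of the answer above:

```ruby
# Hypothetical helper: returns the text between the 2nd and 3rd separator,
# or nil when the string has fewer than three segments.
def third_segment(text, sep = ".")
  # each_line takes an explicit separator, so $/ stays untouched;
  # take(3) stops reading after the third segment.
  segment = text.each_line(sep).take(3)[2]
  segment && segment.chomp(sep)
end

puts third_segment("one.two.three.four")
```

Unlike take(3).last, indexing with [2] avoids returning the wrong segment when the string contains fewer than three.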

Related

Get selected value from string, from point to point [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 3 years ago.
Is it possible in Ruby to take a name such as "Doe,Jon" (exact format) and get only the first name, "Jon"? The name can of course be different each time. I was wondering whether it is possible to take everything from the end of the string back to the "," separator. If so, how?
Thanks for your help.
So let's examine some of the solutions that were given to you in the comments.
Split
"Doe,Jon".split(',').last
# or a bit more verbose
parts = "Doe,Jon".split(',') # ["Doe", "Jon"]
name = parts.last # "Jon"
String#split splits a string into an array. It uses the parameter "," as the separator. Array#last returns the last item of an array.
Gsub
"Doe,Jon".gsub(/.*,/, '')
String#gsub substitutes the part that matches the Regular Expression (/.*,/) with the substitution value ("").
The regexp matches everything (.*) up to (and including) the comma. And the replacement is an empty string, essentially deleting the part that matches the regexp.
Note that you could/should probably have an anchor to make the regexp more strict (/\A.*,/)
Slice
String#slice creates a substring given a range. -1 is a shortcut for the last element.
String#index finds the index of a character inside a String.
"Doe,Jon".slice(("Doe,Jon".index(',')+1)..-1)
# or more verbose
full = "Doe,Jon"
index_of_comma = full.index(',') # => 3
index_after_comma = index_of_comma + 1 # => 4
name = full.slice(index_after_comma..full.size)
CSV
CSV (Comma Separated Values) is a format where multiple values are separated by a comma (or other separation character).
require "csv"
CSV.parse("Doe,Jon")[0][1]
This treats the name as CSV data and then accesses the first row of data ([0]). From that row it accesses the second element ([1]), which is the name.
Now what?
There are usually multiple ways to reach a goal. And it's up to you to pick a way. I'd go with the first one. To me it is easy to read and understand its purpose.
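As a quick check that the approaches agree, here is a small sketch that runs three of them on the sample name (CSV is left out so the snippet has no dependencies; the variable names are illustrative):

```ruby
full = "Doe,Jon"

by_split = full.split(',').last                  # last element after splitting
by_gsub  = full.gsub(/\A.*,/, '')                # delete everything up to the comma
by_slice = full.slice((full.index(',') + 1)..-1) # substring after the comma

p [by_split, by_gsub, by_slice]
```

The three only agree when the string contains exactly one comma. With no comma, String#index returns nil and the slice version raises an error, which is one more reason to prefer split.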

Starting a simple data entry program. A method for writing values in variables from an entry text ruby [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 6 years ago.
I want to write a simple program to collect data from pieces of texts by creating regular expressions to identify values through the phrases of the texts.
I want to start from something simple:
The car is red
I'm looking for an expression that lets me store the value red, or other possible values such as blue, yellow, or green if the phrase changes. I want to do that from the interpreter or from a .txt file.
So my question has two parts. One is to specify the value I want to save, in this case "red". I imagine a piece of code like {"The car is 'value'"} => value #color, or whatever regular expression captures the pattern. Sorry, I am not yet very familiar with Ruby syntax; that is precisely what my question is about.
The other part is about creating a variable to store the string "red", or whatever the case may be: yellow, green, brown...
I hope the question is clear.
It's pretty simple using a match group:
string = "The car is red. The car is blue"
regex = /The car is (\w+)/
matches = string.scan(regex)
print matches
# => [["red"], ["blue"]]
print matches.flatten(1)
# => ["red", "blue"]
(\w+) in the regex is the match group. The parentheses mark the start and end of the group. The group's match is what scan returns. You can use multiple match groups if you want.
\w in a regex matches a word character (a letter, digit, or underscore). \w+ effectively captures one word.
Cucumber makes use of this approach.
You need to build a regular expression that will extract desired word from the input. You can read about it in the official documentation about Regexp class. You can also use rubular.com to test and play with expressions on sample data. You can assign match result to a variable in Ruby in the following way:
match = "The car is red".match(/red/)
color = match[0] unless match.nil?
or in a single line:
color, *_ = "The car is red".match(/red/).to_a
# color => "red"
color, *_ = "The car is red".match(/blue/).to_a
# color => nil
The match method can capture several values at once; in the sample code above we used only one (color). It returns a MatchData object, or nil when nothing matches. Calling to_a converts the result to an array (nil.to_a is an empty array, so this is safe when there is no match). You can then assign the first element to a variable and discard the rest with the splat (*_).
I didn't provide a regexp, because that is something you can work out yourself depending on the use case (for your simple example, there is one in the other answer).
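Putting the two answers together, a minimal sketch might combine the capture group with match and store the result in a variable. The constant and method names here are hypothetical:

```ruby
# Hypothetical pattern: captures the word that follows "The car is ".
COLOR_PATTERN = /The car is (\w+)/

# Returns the captured colour, or nil when the sentence does not match.
def extract_color(sentence)
  match = sentence.match(COLOR_PATTERN)
  match && match[1] # index 1 is the first capture group
end

color = extract_color("The car is red")
puts color
```

Index 0 of a MatchData object holds the whole match, so the first capture group is at index 1.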

Adding \n at every # characters in string Ruby [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 8 years ago.
I would like to start a new line after every 66 characters for any file that is input into a Ruby script.
some_string.insert( 66, "\n" )
puts some_string
shows that a new line starts after the 66th character but I need it to happen after each 66th character. In other words, each line should be 66 characters long (except possibly the last).
I'm sure it involves a regex but I've tried various with insert, scan, gsub and cannot get it to work.
I'm new to Ruby and programming and this is the first thing I've tried outside of a tutorial. Thanks for the information, all.
You could do something like this:
<your_string>.scan(/.{1,66}/).join("\n")
It will basically split <your_string> at every 66th character and then re-join it by adding the \n between each part.
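Or this variation, which avoids splitting words in half. Each chunk must end at a space or at the end of the string, so the text after the last space is not dropped:
<your_string>.scan(/.{1,66}(?: |\z)/).join("\n")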
some_string.gsub(/.{66}/) { |chunk| chunk + "\n" } # keep each chunk and append a newline
If you're interested in exploring an answer that doesn't use RegEx, try something like:
a = "Your string goes here"
d = 66
Array(0..a.length/d).collect {|j| a[j*d..(j+1)*d-1]}.join("\n")
The regex is likely faster, but this uses the Array constructor, .collect and .join, so it might be an interesting learning exercise. The first part generates an array of numbers based on the number of chunks (a.length/d). The collect gathers the substrings into an array. The body of the collect generates substrings by taking ranges of the original string, and the join puts it back together with "\n" separators.
Use the following to split the string into an array of strings of length 66 and join those strings with a newline character.
some_string.scan(/.{1,66}/).join("\n")
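Wrapped in a method, the scan approach might look like the sketch below. The name wrap_fixed and the width parameter are assumptions for illustration, and the input is assumed to be a single line, since . does not match a newline by default:

```ruby
# Hypothetical helper: breaks text into lines of at most `width` characters.
def wrap_fixed(text, width = 66)
  # .{1,width} grabs up to `width` characters per chunk, so the
  # last chunk may be shorter than the others.
  text.scan(/.{1,#{width}}/).join("\n")
end

puts wrap_fixed("abcdefgh", 3)
```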

Validating version number using regex for user input [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 9 years ago.
I need to validate version numbers, e.g., 6.0.2/6.0.2.011 when a user enters them.
I checked to_i, but it doesn't serve my purpose. What can be a way to validate the version number? Can anyone let me know?
Here's a regular expression that matches a "valid" version number as per your specifications (only numbers separated by .):
/\A\d+(?:\.\d+)*\z/
This expression can be broken down as follows:
\A anchor the expression to the start of the string
\d+ match one or more digit characters ([0-9])
(?: begin a non-capturing group
\. match a literal dot (.) character
\d+ match one or more digit characters
)* end the group, and allow it to repeat 0 or more times
\z anchor the expression to the end of the string
This expression will only allow . when followed by at least one more digit, but will allow any number of "sections" in the version number (i.e. 6, 6.0, 6.0.2, and 6.0.2.011 will all match).
If you want to work with version numbers, I advise the versionomy (GitHub) gem.
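A minimal sketch of how the expression could be used to check user input. The constant and method names are hypothetical, and String#match? needs Ruby 2.4 or later:

```ruby
VERSION_PATTERN = /\A\d+(?:\.\d+)*\z/

# Hypothetical helper: true when the input is digits separated by single dots.
def valid_version?(input)
  input.match?(VERSION_PATTERN)
end

%w[6.0.2 6.0.2.011 6 6. 6..0 6a.0].each do |candidate|
  puts "#{candidate}: #{valid_version?(candidate)}"
end
```

Because the pattern ends with \z rather than $, input with a trailing newline is rejected, so call chomp on anything read with gets first.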
See if this helps.
if a.length == a.scan(/\d|\./).length
# only dots and numbers are present
# do something
else
# do something else
end
e.g
a = '6.0.2.011'
a.length == a.scan(/\d|\./).length #=> true
b = '6b.0.2+2.011'
b.length == b.scan(/\d|\./).length #=> false
The input length is checked against the length of the scan result to ensure that only dots and digits are present. That said, it is very hard to guarantee that future version numbers will all follow the same conventions. How will you make sure that someone does not introduce something like 6a.0.2.011?
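One caveat worth checking: the length comparison only looks at which characters are present, not at where the dots sit. A small sketch, with a hypothetical method name wrapping the check above:

```ruby
# Hypothetical wrapper around the length comparison shown above.
def only_digits_and_dots?(input)
  input.length == input.scan(/\d|\./).length
end

p only_digits_and_dots?('6.0.2.011') # well-formed version
p only_digits_and_dots?('6..0')      # doubled dot, still only digits and dots
p only_digits_and_dots?('6b.0')      # contains a letter
```

Inputs such as '6..0' or '...' pass this check, so it is better suited to a first filter than to full validation.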

Is it possible to make these regex shorter? [closed]

This question is unlikely to help any future visitors; it is only relevant to a small geographic area, a specific moment in time, or an extraordinarily narrow situation that is not generally applicable to the worldwide audience of the internet. For help making this question more broadly applicable, visit the help center.
Closed 10 years ago.
As the topic suggests, is it possible to make these regexes shorter? I am using Ruby 1.9.3.
/\n\s+(\w{0,3})[\s&&[^\n]\S]+?([\d\.]+)[\S\s&&[^\n]]+?([\d\.]+)/
and this
/\s+(\w+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+-*\s+(\d+)\s+(\d+)\s+/
Thanks!
/\n\s+(\w{0,3})[\s&&[^\n]\S]+?([\d\.]+)[\S\s&&[^\n]]+?([\d\.]+)/
If I understand ruby regular expressions correctly, [\s&&[^\n]\S] means that a character should be a whitespace character AND either a non-whitespace character or not a newline. As a character cannot be both a whitespace and non-whitespace character, you could probably shorten it to [\s&&[^\n]].
You could also remove the parentheses, (\w{0,3}) becomes \w{0,3}, but if you are trying to use the characters in those groups later on in your code, then you shouldn't.
/\s+(\w+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+-*\s+(\d+)\s+(\d+)\s+/
You could combine some of your statements, \s+\w+(\s+\d+){5}\s+-*(\s+\d+){2}\s+, but again this would cause headaches if your code actually uses those groups to extract information.
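To make the headache concrete, here is a sketch comparing the captures of both forms on a made-up row (the sample string is an assumption, not taken from the real data):

```ruby
long_form  = /\s+(\w+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+-*\s+(\d+)\s+(\d+)\s+/
short_form = /\s+\w+(\s+\d+){5}\s+-*(\s+\d+){2}\s+/

row = " abc 1 2 3 4 5 - 6 7 "

long_captures  = long_form.match(row).captures
short_captures = short_form.match(row).captures

p long_captures  # one entry per column
p short_captures # a repeated group keeps only its last iteration
```

Both patterns match the row, but the quantified groups in the short form overwrite their capture on every repetition, so most of the columns are lost.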
Are you essentially aiming to split a fixed-width-column web page?
Regexp is one way. You may be interested in a fixed-width-column approach:
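require 'open-uri' # adds #read to HTTP URIs
uri = URI.parse 'http://www.ida.liu.se/~TDP007/material/seminarie2/weather.txt'
page = uri.read
rows = page.split(/\n/)[9..-3] # drop the first 9 and the last 2 lines
rows.each{|r|
# slice each row by fixed column positions
day, max, mnt = r[0..3].strip, r[4..11].strip, r[12..17].strip
}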
The following might not be shorter (if you count the number of characters needed to type it), but it is a lot more readable:
arr = ['(\w+)'] # Match a word
arr += ['(\d+)']*5 # Match five numbers
arr += ['-*'] # ignore dashes
arr += ['(\d+)']*2 # Match two numbers
# All of the above separated with space, plus space before and after.
my_regexp = Regexp.new(([''] + arr + ['']).join('\s+'))
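As a sanity check, the assembled pattern can be compared with the literal from the question. This sketch repeats the construction so that it runs on its own:

```ruby
arr = ['(\w+)']          # a word
arr += ['(\d+)'] * 5     # five numbers
arr += ['-*']            # optional dashes
arr += ['(\d+)'] * 2     # two numbers
my_regexp = Regexp.new(([''] + arr + ['']).join('\s+'))

original = /\s+(\w+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+-*\s+(\d+)\s+(\d+)\s+/

puts my_regexp.source == original.source
```

The single-quoted strings matter here: inside double quotes, '\s' and '\d' would be processed as string escapes before Regexp.new ever saw them.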
If that is the only file that you need to process then you can remove unnecessary data by hand, then read the file line by line, split by space characters \s+ and pick out the columns.
Even without removing unnecessary data by hand, you can also read the original file line by line, split by \s+, and test whether the first few entries are numbers. This is exactly what you are doing with regex also (test format and extract data matching the format).
Note that [\s&&[^\n]\S] means intersecting \s and [^\n]\S, which results in the set: all space characters but new line. So we can rewrite it to [\s&&[^\n]]. However, [\S\s&&[^\n]] means intersecting \S\s and [^\n], which results in the set: all characters but new line. The equivalent rewrite is . or [^\n], but I doubt this is what you mean. The result will still be correct for the current input due to the lazy quantifier, but it might not for bad input.
Another thing is . will mean literal . inside character class, so [\d.] is equivalent to [\d\.].
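Both points can be checked quickly in irb. A small sketch, using made-up test strings:

```ruby
# Intersection: whitespace characters, excluding the newline.
non_newline_space = /[\s&&[^\n]]/

p non_newline_space.match?(" ")
p non_newline_space.match?("\t")
p non_newline_space.match?("\n")

# Inside a character class the dot is literal, escaped or not.
unescaped = "1.5x"[/[\d.]+/]
escaped   = "1.5x"[/[\d\.]+/]
p unescaped == escaped

# If the dot meant "any character" here, this would return "1a5".
p "1a5"[/[\d.]+/]
```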
