I'm looking to extract the elements of an array that contain a version number, where a version number is either at the start or end of a string or padded by spaces, and is a series of digits and periods that does not start or end with a period. For example, "10.10 Thingy" and "Thingy 10.10.5" are valid, but "Whatever 4" is not.
haystack = ["10.10 Thingy", "Thingy 10.10.5", "Whatever 4", "Whatever 4.x"]
haystack.select{ |i| i[/(?<=^| )(\d+)(\.\d+)*(?=$| )/] }
=> ["10.10 Thingy", "Thingy 10.10.5", "Whatever 4"]
I'm not sure how to modify the regex to require at least one period so that "Whatever 4" is not in the results.
This is only a slight variant of Archonic's answer.
r = /
(?<=\A|\s) # match the beginning of the string or a space in a positive lookbehind
(?:\d+\.)+ # match >= 1 digits followed by a period in a non-capture group, >= 1 times
\d+ # match >= 1 digits
(?=\s|\z) # match a space or the end of the string in a positive lookahead
/x # free-spacing regex definition mode
haystack = ["10.10 Thingy", "Thingy 10.10.5", "Whatever 4", "Whatever 4.x"]
haystack.select { |str| str =~ r }
#=> ["10.10 Thingy", "Thingy 10.10.5"]
The question was not to return the version information, but to return the strings that contain correct version information. As a result there is no need for the lookarounds:
r = /
(?:\A|\s) # match the beginning of the string or a whitespace character
(?:\d+\.)+ # match >= 1 digits followed by a period in a non-capture group, >= 1 times
\d+ # match >= 1 digits
(?:\s|\z) # match a whitespace character or the end of the string
/x # free-spacing regex definition mode
haystack.select { |str| str =~ r }
#=> ["10.10 Thingy", "Thingy 10.10.5"]
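Since only the matching strings are wanted, Enumerable#grep is another way to write the select. This is a minimal sketch; the constant name VERSION_RE and the method name with_version are chosen here for illustration and do not come from the answer.

```ruby
# Same pattern as above, written on one line without lookarounds.
VERSION_RE = /(?:\A|\s)(?:\d+\.)+\d+(?:\s|\z)/

# Returns the elements of strings that contain a dotted version number.
# grep keeps every element for which VERSION_RE === element is true.
def with_version(strings)
  strings.grep(VERSION_RE)
end

haystack = ["10.10 Thingy", "Thingy 10.10.5", "Whatever 4", "Whatever 4.x"]
p with_version(haystack)
```

grep(pattern) is equivalent to select { |str| pattern === str }, so it reads a little shorter when the block only tests a regex.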
Suppose one wanted to obtain both the strings that contain valid versions and the versions contained in those strings. One could write the following:
r = /
(?<=\A|\s) # match the beginning of string or a space in a pos lookbehind
(?:\d+\.)+ # match >= 1 digits then a period in non-capture group, >= 1 times
\d+ # match >= 1 digits
(?=\s|\z) # match a space or end of string in a pos lookahead
/x # free-spacing regex definition mode
haystack.each_with_object({}) do |str,h|
version = str[r]
h[str] = version if version
end
# => {"10.10 Thingy"=>"10.10", "Thingy 10.10.5"=>"10.10.5"}
Ah hah! I knew I was close.
haystack.select{ |i| i[/(?<=^| )(\d+)(\.\d+)+(?=$| )/] }
The asterisk at the end of (\.\d+)* was allowing that pattern to repeat any number of times, including zero times. You can limit that with (\.\d+){x,y} where x and y are the min and max times. You can also only identify a minimum with (\.\d+){x,}. In my case I wanted a minimum of once, which would be (\.\d+){1,}, however that's synonymous with (\.\d+)+. That only took half the day to figure out...
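To make the difference between the quantifiers concrete, here is a small sketch that runs the same pattern with *, +, {1,} and a bounded {1,2}. The constant names are made up for this illustration.

```ruby
# Same lookarounds as in the question; only the quantifier on (\.\d+) changes.
ZERO_OR_MORE = /(?<=^| )\d+(\.\d+)*(?=$| )/      # "4" is accepted
ONE_OR_MORE  = /(?<=^| )\d+(\.\d+)+(?=$| )/      # needs at least one ".digits"
AT_LEAST_ONE = /(?<=^| )\d+(\.\d+){1,}(?=$| )/   # same meaning as +
ONE_OR_TWO   = /(?<=^| )\d+(\.\d+){1,2}(?=$| )/  # at most two ".digits" parts

p "Whatever 4".match?(ZERO_OR_MORE)   # true: the group may repeat zero times
p "Whatever 4".match?(ONE_OR_MORE)    # false
p "Whatever 4".match?(AT_LEAST_ONE)   # false
p "Thingy 10.10.5".match?(ONE_OR_TWO) # true: two ".digits" parts
p "Thingy 1.2.3.4".match?(ONE_OR_TWO) # false: three ".digits" parts
```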
I am busy working through some problems I have found on the net and I feel like this should be simple but I am really struggling.
Say you have the string 'AbcDeFg' and the next string of 'HijKgLMnn', I want to be able to find the same characters in the string so in this case it would be 'g'.
Perhaps I wasn't giving enough information - I am doing Advent of Code and I am on day 3. I just need help with the first bit which is where you are given a string of characters - you have to split the characters in half and then compare the 2 strings. You basically have to get the common character between the two. This is what I currently have:
file_data = File.read('Day_3_task1.txt')
arr = file_data.split("\n")
finals = []
arr.each do |x|
len = x.length
divided_by_two = len / 2
second = x.slice!(divided_by_two..len).split('')
first = x.split('')
count = 0
(0..len).each do |z|
first.each do |y|
if y == second[count]
finals.push(y)
end
end
count += 1
end
end
finals = finals.uniq
Hope that helps in terms of clarity :)
Did you try to convert both strings to arrays with the String#chars method and find the intersection of those arrays?
Like this:
string_one = 'AbcDeFg'.chars
string_two = 'HijKgLMnn'.chars
string_one & string_two # => ["g"]
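Applied to the task described in the question (split one line in half, then compare the halves), the same intersection idea might look like the sketch below. The method name common_chars and the sample lines are assumptions made for this example; lines are assumed to have an even length.

```ruby
# Splits line into two halves and returns the characters found in both.
# Array#& already removes duplicates, so no call to uniq is needed.
def common_chars(line)
  half = line.length / 2
  first = line[0, half].chars      # first half as an array of characters
  second = line[half..-1].chars    # second half as an array of characters
  first & second
end

p common_chars('AbcDeFgHijKgLMnn')
p common_chars('vJrwpWtwJgWrhcsFMMfFFhFp')
```

To process a whole file, one could map this method over the lines, for example with File.readlines(path, chomp: true).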
One way to do that is to use the method String#scan with the regular expression
rgx = /(.)(?!.*\1.*_)(?=.*_.*\1)/
I'm not advocating this approach. I merely thought some readers might find it interesting.
Suppose
str1 = 'AbcDgeFg'
str2 = 'HijKgLMnbn'
Now form the string
str = "#{str1}_#{str2}"
#=> "AbcDgeFg_HijKgLMnbn"
I've assumed the strings contain letters only, in which case they are separated in str with any character other than a letter. I've used an underscore. Naturally, if the strings could contain underscores a different separator would have to be used.
We then compute
str.scan(rgx).flatten
#=> ["b", "g"]
Array#flatten is needed because
str.scan(rgx)
#=>[["b"], ["g"]]
The regular expression can be written in free-spacing mode to make it self-documenting:
rgx =
/
(.) # match any character, save to capture group 1
(?! # begin a negative lookahead
.* # match zero or more characters
\1 # match the contents of capture group 1
.* # match zero or more characters
_ # match an underscore
) # end the negative lookahead
(?= # begin a positive lookahead
.* # match zero or more characters
_ # match an underscore
.* # match zero or more characters
\1 # match the contents of capture group 1
) # end the positive lookahead
/x # invoke free-spacing regex definition mode
Note that if a character appears more than once in str1 and at least once in str2 the negative lookahead ensures that only the last one in str1 is matched, to avoid returning duplicates.
Alternatively, one could write
str.gsub(rgx).to_a
This uses the form of String#gsub that takes a single argument and no block and returns an enumerator.
I have a large file, and I want to be able to check if a word is present twice.
puts "Enter a word: "
$word = gets.chomp
if File.read('worldcountry.txt') # do something if the word entered is present twice...
How can I check whether the file worldcountry.txt contains the $word I entered twice?
I found what I needed in this question: count-the-frequency-of-a-given-word-in-text-file-in-ruby
It is in Gerry's post, with this code:
word_count = 0
my_word = "input"
File.open("texte.txt", "r") do |f|
f.each_line do |line|
line.split(' ').each do |word|
word_count += 1 if word == my_word
end
end
end
puts "\n" + word_count.to_s
Thanks, I will pay more attention next time.
If the file is not overly large, it can be gulped into a string. Suppose:
str = File.read('cat')
#=> "There was a dog 'Henry' who\nwas pals with a dog 'Buck' and\na dog 'Sal'."
puts str
There was a dog 'Henry' who
was pals with a dog 'Buck' and
a dog 'Sal'.
Suppose the given word is 'dog'.
Confirm the file contains at least two instances of the given word
One can attempt to match the regular expression
r1 = /\bdog\b.*\bdog\b/m
str.match?(r1)
#=> true
Confirm the file contains exactly two instances of the given word
Using a regular expression to determine whether the file contains exactly two instances of the given word is somewhat more complex. Let
r2 = /\A(?:(?:(?!\bdog\b).)*\bdog\b){2}(?!.*\bdog\b)/m
str.match?(r2)
#=> false
The two regular expressions can be written in free-spacing mode to make them self-documenting.
r1 = /
\bdog\b # match 'dog' surrounded by word breaks
.* # match zero or more characters
\bdog\b # match 'dog' surrounded by word breaks
/m # cause . to match newlines
r2 = /
\A # match beginning of string
(?: # begin non-capture group
(?: # begin non-capture group
(?! # begin negative lookahead
\bdog\b # match 'dog' surrounded by word breaks
) # end negative lookahead
. # match one character, provided 'dog' does not begin at it
) # end non-capture group
* # execute preceding non-capture group zero or more times
\bdog\b # match 'dog' surrounded by word breaks
) # end non-capture group
{2} # execute preceding non-capture group twice
(?! # begin negative lookahead
.* # match zero or more characters
\bdog\b # match 'dog' surrounded by word breaks
) # end negative lookahead
/xm # cause . to match newlines and invoke free-spacing mode
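The regular expressions above hard-code 'dog'. For a word typed in by the user, a simpler route is to count the occurrences and compare the count. The sketch below assumes the word consists of word characters only (so that \b behaves as expected) and that matching is case-sensitive; the method names are invented for this example.

```ruby
# Counts whole-word occurrences of word in text.
# Regexp.escape guards against regex metacharacters in user input.
def word_count(text, word)
  text.scan(/\b#{Regexp.escape(word)}\b/).size
end

# True only when word occurs exactly twice in text.
def appears_exactly_twice?(text, word)
  word_count(text, word) == 2
end

sample = "There was a dog 'Henry' who\nwas pals with a dog 'Buck' and\na dog 'Sal'."
p word_count(sample, 'dog')
p appears_exactly_twice?(sample, 'dog')
```

With a real file, text would come from File.read, as in the answer above.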
If I have a string that's a sentence, I want to check if the first and last letter of each word are the same and find which of the words have their first and last letter the same. For example:
sentence_one = "Label the bib numbers in red."
You could use a regex:
sentence_one = "Label the bib numbers in red"
sentence_one.scan(/(\b(\w)\w*(\2)\b)/i)
#=> [["Label", "L", "l"], ["bib", "b", "b"]]
\b is a word boundary, \w matches a word character (a letter, digit or underscore; you may have to adjust this). There are 3 captures: (1) the whole word, (2) the first letter and (3) the last letter. Using \2 requires the last letter to match the first.
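If only the words themselves are wanted, the first capture of each match can be kept and the rest dropped. A small sketch follows; the names SAME_ENDS and same_end_words are made up here. Note that this pattern needs at least two characters, so one-letter words are not returned.

```ruby
# Group 1 is the whole word, group 2 its first character;
# \2 forces the last character to equal the first (ignoring case).
SAME_ENDS = /\b((\w)\w*\2)\b/i

# Returns the words of sentence whose first and last characters match.
def same_end_words(sentence)
  sentence.scan(SAME_ENDS).map(&:first)
end

p same_end_words("Label the bib numbers in red.")
```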
This will print out all words that start with and end with the same letter (not case-sensitive)
sentence_one = "Label the bib numbers in red"
words = sentence_one.split(' ')
words.each do |word|
if word[0].downcase == word[-1].downcase
puts word
end
end
sentence_one.scan(/\S+/).select{|s| s[0].downcase == s[-1].downcase}
# => ["Label", "bib"]
In a comment the OP asked how one could obtain a count of words having the desired property. I assume that the desired property is that a word's first and last characters are the same, though possibly of different case. The following does so without producing an intermediate array whose elements would then be counted.
r = /
\b # match a word break
(?: # begin a non-capture group
\p{Alpha} # match a letter
| # or
(\p{Alpha}) # match a letter in capture group 1
\p{Alpha}* # match zero or more letters
\1 # match the contents of capture group 1
) # end the non-capture group
\b # match a word break
/ix # case-indifferent and free-spacing regex definition modes
str = "How, now is that a brown cow?"
str.gsub(r).count
#=> 2
See String#gsub, in particular the case where there is only one argument and no block is provided.
Note
str.gsub(r).to_a
#=> ["that", "a"]
str.scan(r)
#=> [["t"], [nil]]
Sometimes it is awkward to use scan when the regular expression contains capture groups (see String#scan). Those problems often can be avoided by instead using gsub followed by to_a (or Enumerable#entries).
Just to add one more option, splitting into an array (and skipping one-letter words):
sentence_one = "Label the bib numbers in a red color"
sentence_one.split(' ').keep_if{ |w| w.end_with?(w[0].downcase) & (w.size > 1) }
#=> ["Label", "bib"]
sentence_one = "Label the bib numbers in red"
puts sentence_one.split(' ').count{|word| word[0] == word[-1]} # => 1
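The count above is case-sensitive, so "Label" is not counted. A case-insensitive variant could look like this sketch, which also ignores punctuation by scanning for runs of word characters. The method name is an assumption, and one-letter words count as matches here.

```ruby
# Counts words whose first and last characters are equal, ignoring case.
# String#casecmp? (Ruby 2.4+) compares two strings case-insensitively.
def same_ends_count(sentence)
  sentence.scan(/\w+/).count { |word| word[0].casecmp?(word[-1]) }
end

p same_ends_count("Label the bib numbers in red.")
```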
I've been having difficulty trying to figure out how to go about solving this issue. I have 2 kinds of URLs in which I need to be able to update/increment the number value for the page.
Url 1:
forum-351-page-2.html
In the above, I would like to modify this url for n pages. So I'd like to generate new urls with a given range of say page-1 to page-30. But that's all I'd like to change. page-n.html
Url 2:
forumdisplay.php?fid=115&page=3
The second URL is different, but I feel it's the easier one.
R = /
(?: # begin non-capture group
(?<=-page-) # match string in a positive lookbehind
\d+ # match 1 or more digits
(?=\.html) # match period followed by 'html' in a positive lookahead
) # close non-capture group
| # or
(?: # begin non-capture group
(?<=&page=) # match string in a positive lookbehind
\d+ # match 1 or more digits
\z # match end of string
) # close non-capture group
/x # free-spacing regex definition mode
def update(str, val)
str.sub(R, val.to_s)
end
update("forum-351-page-2.html", 4)
#=> "forum-351-page-4.html"
update("forumdisplay.php?fid=115&page=3", "4")
#=> "forumdisplay.php?fid=115&page=4"
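To generate the URLs for a range of pages, which is what the question asks for, the same substitution can be mapped over the range. This is a sketch: PAGE_NUMBER is the regex above written on one line, and page_urls is a name invented for the example.

```ruby
# The page number is either between "-page-" and ".html",
# or after "&page=" at the very end of the string.
PAGE_NUMBER = /(?<=-page-)\d+(?=\.html)|(?<=&page=)\d+\z/

# Returns one URL per page number in pages, built from url.
def page_urls(url, pages)
  pages.map { |n| url.sub(PAGE_NUMBER, n.to_s) }
end

puts page_urls("forum-351-page-2.html", 1..3)
puts page_urls("forumdisplay.php?fid=115&page=3", 1..3)
```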
For the first url
url1 = "forum-351-page-2.html"
(1..30).each do |x|
puts url1.sub(/page-\d*/, "page-#{x}")
end
This will output
forum-351-page-1.html
forum-351-page-2.html
forum-351-page-3.html
...
forum-351-page-28.html
forum-351-page-29.html
forum-351-page-30.html
You can do the same thing for the second url, inside the same kind of loop (here url2 holds "forumdisplay.php?fid=115&page=3"):
url2.sub(/page=\d*$/, "page=#{x}")
Picked up Ruby recently and have been fiddling around with it. I wanted to learn how to use regex or other Ruby tricks to check for certain words, whitespace characters, valid format etc in a given text line.
Let's say I have an order list that looks strictly like this in this format:
cost: 50 items: book,lamp
One space after each colon, no space after each comma, no trailing whitespace at the end and stuff like that.
How can I check for errors in this format using Ruby? This for example should fail my checks:
cost: 60 items:shoes,football
My goal was to split the string by a " " and check to see if the first word was "cost:", if the second word was a number and so on but I realized that splitting on a " " doesn't help me check for extra whitespaces as it just eats it up. Also doesn't help me check for trailing whitespaces. How do I go about doing this?
You could use the following regular expression.
r = /
\A # match beginning of string
cost:\s # match "cost:" followed by a space
\d+\s # match > 0 digits followed by a space
items:\s # match "items:" followed by a space
[[:alpha:]]+ # match > 0 lowercase or uppercase letters
(?:,[[:alpha:]]+) # match a comma followed by > 0 lowercase or uppercase
# letters in a non-capture group (?: ... )
* # perform the match on non-capture group >= 0 times
\z # match the end of the string
/x # free-spacing regex definition mode
"cost: 50 items: book,lamp" =~ r #=> 0 (a match, beginning at index 0)
"cost: 50 items: book,lamp,table" =~ r #=> 0 (a match, beginning at index 0)
"cost: 60 items:shoes,football" =~ r #=> nil (no match)
The regex can of course be written in the normal manner:
r = /\Acost:\s\d+\sitems:\s[[:alpha:]]+(?:,[[:alpha:]]+)*\z/
or
r = /\Acost: \d+ items: [[:alpha:]]+(?:,[[:alpha:]]+)*\z/
though a whitespace character (\s) cannot be replaced by a bare space in the free-spacing mode definition (/x), where unescaped spaces are ignored.
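Because \s also matches tabs and newlines, a stricter check can put the space in a character class, which is not stripped in free-spacing mode. The following is a sketch only; ORDER_LINE and valid_order_line? are names chosen for this example.

```ruby
ORDER_LINE = /
  \A                        # beginning of string
  cost:[ ]\d+[ ]            # "cost:", one space, digits, one space
  items:[ ]                 # "items:" followed by exactly one space
  [[:alpha:]]+              # first item
  (?:,[[:alpha:]]+)*        # further items, each preceded by a comma
  \z                        # end of string, so no trailing whitespace
/x

# True when line follows the strict order format.
def valid_order_line?(line)
  ORDER_LINE.match?(line)
end

p valid_order_line?("cost: 50 items: book,lamp")
p valid_order_line?("cost: 60 items:shoes,football")
p valid_order_line?("cost: 50 items: book,lamp ")
```

The last call has a trailing space, which \z rejects.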