Ruby-Regular expression to find index - ruby

I want to fetch the value 10 from video[10]. How can I do this using a regular expression?
I have tried : /video\[(.*?)\]/.
This did not work.

It should be /video\[(.*)\]/.
/video\[(.*)\]/.match "video[10]"
#=> #<MatchData "video[10]" 1:"10">
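The lazy pattern from the question also extracts 10 from this particular input, so the two forms differ only when more than one closing bracket follows. A small sketch of the difference, where the second input string is made up for illustration:

```ruby
greedy = /video\[(.*)\]/
lazy   = /video\[(.*?)\]/

single = "video[10]"
double = "video[10] audio[3]" # hypothetical input with two bracket pairs

single[greedy, 1] #=> "10"
single[lazy, 1]   #=> "10"
double[greedy, 1] #=> "10] audio[3"  (.* runs to the last closing bracket)
double[lazy, 1]   #=> "10"           (.*? stops at the first closing bracket)
```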

r = /
\b # match a word break
video\[ # match 'video' followed by a left bracket
(\d+) # match >= 1 digits in capture group 1
\] # match a right bracket
/x # extended/free-spacing mode
"Martha, where did you put video[10]?"[r,1]
#=> "10"
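If the index is needed as a number, or the string may hold several indices, the same pattern can be reused. This is only a sketch: the constant VIDEO_INDEX and the helper video_index are hypothetical names, not part of the answers above.

```ruby
VIDEO_INDEX = /\bvideo\[(\d+)\]/

# Hypothetical helper: returns the first index as an Integer, or nil if none is found.
def video_index(str)
  m = VIDEO_INDEX.match(str)
  m && m[1].to_i
end

video_index("Martha, where did you put video[10]?") #=> 10
video_index("no index here")                         #=> nil

# scan returns one array of captures per match, so flatten before converting.
"video[3] and video[42]".scan(VIDEO_INDEX).flatten.map(&:to_i) #=> [3, 42]
```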

Related

How to check with ruby if a word is repeated twice in a file

I have a large file, and I want to be able to check if a word is present twice.
puts "Enter a word: "
$word = gets.chomp
if File.read('worldcountry.txt') # do something if the word entered is present twice...
How can I check whether the file worldcountry.txt contains the $word I entered twice?
I found what I needed in this question: count-the-frequency-of-a-given-word-in-text-file-in-ruby
It is in Gerry's post, with this code:
word_count = 0
my_word = "input"
File.open("texte.txt", "r") do |f|
f.each_line do |line|
line.split(' ').each do |word|
word_count += 1 if word == my_word
end
end
end
puts "\n" + word_count.to_s
Thanks, i will pay more attention next time.
If the file is not overly large, it can be gulped into a string. Suppose:
str = File.read('cat')
#=> "There was a dog 'Henry' who\nwas pals with a dog 'Buck' and\na dog 'Sal'."
puts str
There was a dog 'Henry' who
was pals with a dog 'Buck' and
a dog 'Sal'.
Suppose the given word is 'dog'.
Confirm the file contains at least two instances of the given word
One can attempt to match the regular expression
r1 = /\bdog\b.*\bdog\b/m
str.match?(r1)
#=> true
Confirm the file contains exactly two instances of the given word
Using a regular expression to determine whether the file contains exactly two instances of the given word is somewhat more complex. Let
r2 = /\A(?:(?:(?!\bdog\b).)*\bdog\b){2}(?!.*\bdog\b)/m
str.match?(r2)
#=> false
The two regular expressions can be written in free-spacing mode to make them self-documenting.
r1 = /
\bdog\b # match 'dog' surrounded by word breaks
.* # match zero or more characters
\bdog\b # match 'dog' surrounded by word breaks
/m # cause . to match newlines
r2 = /
\A # match beginning of string
(?: # begin non-capture group
(?: # begin non-capture group
(?! # begin negative lookahead
\bdog\b # match 'dog' surrounded by word breaks
) # end negative lookahead
. # match one character
) # end non-capture group
* # execute preceding non-capture group zero or more times
\bdog\b # match 'dog' surrounded by word breaks
) # end non-capture group
{2} # execute preceding non-capture group twice
(?! # begin negative lookahead
.* # match zero or more characters
\bdog\b # match 'dog' surrounded by word breaks
) # end negative lookahead
/xm # cause . to match newlines and invoke free-spacing mode
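Counting the matches is a simpler alternative to both regular expressions when the file fits in memory. A minimal sketch, assuming a hypothetical helper named word_count and using Regexp.escape so that the user's input is treated literally:

```ruby
# Hypothetical helper: counts whole-word occurrences of word in str.
def word_count(str, word)
  str.scan(/\b#{Regexp.escape(word)}\b/).size
end

str = "There was a dog 'Henry' who\nwas pals with a dog 'Buck' and\na dog 'Sal'."

word_count(str, "dog")      #=> 3
word_count(str, "dog") >= 2 #=> true  (at least two instances)
word_count(str, "dog") == 2 #=> false (not exactly two instances)
word_count(str, "do")       #=> 0     (\b prevents matching inside 'dog')
```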

Insert hyphen into number

I want to convert:
"890414.14.1422, 900515141092, 950616-12-5414"
to:
"890414-14-1422, 900515-14-1092, 950616-12-5414"
How can I achieve it?
I tried:
def format_ids(string)
string.gsub(/(\d{6})[.-](\d{2})[.-](\d{4})/, '\1-\2-\3')
end
format_ids("890414.14.1422, 900515141092, 950616-12-5414")
# => "890414-14-1422, 900515141092, 950616-12-5414"
You should make the delimiters in the input string optional:
- string.gsub(/(\d{6})[.-](\d{2})[.-](\d{4})/, '\1-\2-\3')
+ string.gsub(/(\d{6})[.-]?(\d{2})[.-]?(\d{4})/, '\1-\2-\3')
Note the question marks after the delimiters: they make each delimiter optional, which does the trick.
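Applied to the method from the question, the change can be checked as follows. The second method, format_ids_strict, is a hypothetical variant that adds word breaks so that a longer run of digits is left alone; the 13-digit input is made up for illustration.

```ruby
def format_ids(string)
  string.gsub(/(\d{6})[.-]?(\d{2})[.-]?(\d{4})/, '\1-\2-\3')
end

# Hypothetical stricter variant: the id must start and end at a word break.
def format_ids_strict(string)
  string.gsub(/\b(\d{6})[.-]?(\d{2})[.-]?(\d{4})\b/, '\1-\2-\3')
end

format_ids("890414.14.1422, 900515141092, 950616-12-5414")
#=> "890414-14-1422, 900515-14-1092, 950616-12-5414"

format_ids("1234567890123")        #=> "123456-78-90123" (first 12 digits are reformatted)
format_ids_strict("1234567890123") #=> "1234567890123"   (13 digits, left unchanged)
```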
str = "890414.14.1422, 900515141092, 950616-12-5414"
r = /
( # begin capture group 1
\. # match a period
| # or
(?<=\d{6}) # match after 6 digits (positive lookbehind)
(?=\d{6}) # match before 6 digits (positive lookahead)
| # or
(?<=\d{8}) # match after 8 digits (positive lookbehind)
(?=\d{4}) # match before 4 digits (positive lookahead)
) # end capture group 1
/x # free-spacing regex definition mode
str.gsub(r,'-')
#=> "890414-14-1422, 900515-14-1092, 950616-12-5414"
This regular expression is conventionally (not free-spacing mode) written as follows:
/(\.|(?<=\d{6})(?=\d{6})|(?<=\d{8})(?=\d{4}))/
Note that (?<=\d{6})(?=\d{6}) matches a zero-width position between two consecutive characters, as does (?<=\d{8})(?=\d{4}).
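To see the zero-width matches, one can record where each match starts while substituting. This is only an illustrative sketch: it drops the period alternative and the capture group, and the positions array is a name introduced here.

```ruby
r = /(?<=\d{6})(?=\d{6})|(?<=\d{8})(?=\d{4})/

positions = []
result = "900515141092".gsub(r) do |match|
  # Regexp.last_match is the MatchData for the current substitution.
  positions << [Regexp.last_match.begin(0), match]
  "-"
end

positions #=> [[6, ""], [8, ""]]  (each match is the empty string)
result    #=> "900515-14-1092"
```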

Detect specific format of version number using regex

I'm looking to extract elements of an array containing a version number, where a version number is either at the start or end of a string or padded by spaces, and is a series of digits and periods but does not start or end with a period. For example "10.10 Thingy" and "Thingy 10.10.5" are valid, but "Whatever 4" is not.
haystack = ["10.10 Thingy", "Thingy 10.10.5", "Whatever 4", "Whatever 4.x"]
haystack.select{ |i| i[/(?<=^| )(\d+)(\.\d+)*(?=$| )/] }
=> ["10.10 Thingy", "Thingy 10.10.5", "Whatever 4"]
I'm not sure how to modify the regex to require at least one period so that "Whatever 4" is not in the results.
This is only a slight variant of Archonic's answer.
r = /
(?<=\A|\s) # match the beginning of the string or a space in a positive lookbehind
(?:\d+\.)+ # match >= 1 digits followed by a period in a non-capture group, >= 1 times
\d+ # match >= 1 digits
(?=\s|\z) # match a space or the end of the string in a positive lookahead
/x # free-spacing regex definition mode
haystack = ["10.10 Thingy", "Thingy 10.10.5", "Whatever 4", "Whatever 4.x"]
haystack.select { |str| str =~ r }
#=> ["10.10 Thingy", "Thingy 10.10.5"]
The question was not to return the version information, but to return the strings that have correct version information. As a result there is no need for the lookarounds:
r = /
(?:\A|\s) # match the beginning of the string or a space
(?:\d+\.)+ # match >= 1 digits followed by a period in a non-capture group, >= 1 times
\d+ # match >= 1 digits
(?:\s|\z) # match a space or the end of the string
/x # free-spacing regex definition mode
haystack.select { |str| str =~ r }
#=> ["10.10 Thingy", "Thingy 10.10.5"]
Suppose one wanted to obtain both the strings that contain valid versions and the versions contained in those strings. One could write the following:
r = /
(?<=\A|\s) # match the beginning of string or a space in a pos lookbehind
(?:\d+\.)+ # match >= 1 digits then a period in non-capture group, >= 1 times
\d+ # match >= 1 digits
(?=\s|\z) # match a space or end of string in a pos lookahead
/x # free-spacing regex definition mode
haystack.each_with_object({}) do |str,h|
version = str[r]
h[str] = version if version
end
# => {"10.10 Thingy"=>"10.10", "Thingy 10.10.5"=>"10.10.5"}
Ah hah! I knew I was close.
haystack.select{ |i| i[/(?<=^| )(\d+)(\.\d+)+(?=$| )/] }
The asterisk at the end of (\.\d+)* was allowing that pattern to repeat any number of times, including zero times. You can limit that with (\.\d+){x,y} where x and y are the min and max times. You can also specify only a minimum with (\.\d+){x,}. In my case I wanted a minimum of once, which would be (\.\d+){1,}, however that's synonymous with (\.\d+)+. That only took half the day to figure out...
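The three quantifiers can be compared side by side on the haystack from the question. The variable names star, plus and at_least_one are introduced here only for the comparison.

```ruby
haystack = ["10.10 Thingy", "Thingy 10.10.5", "Whatever 4", "Whatever 4.x"]

star         = /(?<=^| )\d+(\.\d+)*(?=$| )/    # zero or more ".digits" groups
plus         = /(?<=^| )\d+(\.\d+)+(?=$| )/    # one or more ".digits" groups
at_least_one = /(?<=^| )\d+(\.\d+){1,}(?=$| )/ # same as plus, written as a range

haystack.select { |s| s.match?(star) }
#=> ["10.10 Thingy", "Thingy 10.10.5", "Whatever 4"]
haystack.select { |s| s.match?(plus) }
#=> ["10.10 Thingy", "Thingy 10.10.5"]
haystack.select { |s| s.match?(at_least_one) }
#=> ["10.10 Thingy", "Thingy 10.10.5"]
```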

extracting data through regexps is returning nil

I'm trying to extract a pair of strings from a parsed PDF and I have this extract:
Number:731 / 13/06/2016 1823750212 10/06/2016\n\n\n\n Articolo
http://rubular.com/r/GRI6j4Byz3
My goal is to get out the 731 and 1823750212 values.
I tried something like text[/Number:(.*)Articolo/] for the first steps but it's returning nil while on rubular it somewhat matches.
Any tips?
If the format of the string is fixed (the dates and the long number), this will do the trick:
text.scan /\ANumber:(\d+).*?(\d{5,})/
#⇒ [[ "731", "1823750212" ]]
I have assumed that we do not know the length of either string (representations of non-negative integers) to be extracted, only that the first follows "Number:", which is at the beginning of the string, and the second is preceded and followed by at least one space.
r = /
(?<=\ANumber:) # match beginning of string followed by 'Number:' in a
# positive lookbehind
\d+ # match one or more digits
| # or
(?<=\s) # match a whitespace char in a positive lookbehind
\d+ # match one or more digits
(?=\s) # match a whitespace char in a positive lookahead
/x # free-spacing regex definition mode
str = "Number:731 / 13/06/2016 1823750212 10/06/2016\n\n\n\n Articolo"
str.scan(r)
#=> ["731", "1823750212"]
If there could be intervening spaces between the colon and "731", you could modify the regex as follows.
r = /
\A # match beginning of string
Number: # match string 'Number:'
\s* # match zero or more spaces
\K # forget everything matched so far
\d+ # match one or more digits
| # or
(?<=\s) # match a whitespace char in a positive lookbehind
\d+ # match one or more digits
(?=\s) # match a whitespace char in a positive lookahead
/x # free-spacing regex definition mode
str = "Number: 731 / 13/06/2016 1823750212 10/06/2016\n\n\n\n Articolo"
str.scan(r)
#=> ["731", "1823750212"]
Here \K must be used because Ruby does not support variable-length positive lookbehinds.
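As for why the attempt in the question returns nil: one likely cause, assuming the parsed text contains real newline characters as in str above, is that . does not match a newline in Ruby unless the m modifier is given. A short sketch:

```ruby
text = "Number:731 / 13/06/2016 1823750212 10/06/2016\n\n\n\n Articolo"

# Without /m, .* cannot cross the newlines before 'Articolo', so nothing matches.
without_m = text[/Number:(.*)Articolo/]
without_m #=> nil

# With /m, . also matches newlines and capture group 1 holds everything in between.
with_m = text[/Number:(.*)Articolo/m, 1]
with_m #=> "731 / 13/06/2016 1823750212 10/06/2016\n\n\n\n "
```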

Capitalize the first character after a dash

So I've got a string that's an improperly formatted name. Let's say, "Jean-paul Bertaud-alain".
I want to use a regex in Ruby to find the first character after every dash and make it uppercase. So, in this case, I want to apply a method that would yield: "Jean-Paul Bertaud-Alain".
Any help?
String#gsub can take a block argument, so this is as simple as:
str = "Jean-paul Bertaud-alain"
str.gsub(/-[a-z]/) {|s| s.upcase }
# => "Jean-Paul Bertaud-Alain"
Or, more succinctly:
str.gsub(/-[a-z]/, &:upcase)
Note that the regular expression /-[a-z]/ only matches letters in the a-z range, meaning it won't match e.g. à. In Ruby versions before 2.4, String#upcase did not attempt to capitalize characters with diacritics anyway, because capitalization is language-dependent (e.g. i is capitalized differently in Turkish than in English). Ruby 2.4 and later apply Unicode case mapping in String#upcase, so on those versions a pattern such as /-\p{Lower}/ can be used to cover accented letters as well. Read this answer for more information: https://stackoverflow.com/a/4418681
"Jean-paul Bertaud-alain".gsub(/(?<=-)\w/, &:upcase)
# => "Jean-Paul Bertaud-Alain"
I suggest you make the test more demanding by requiring that the letter to be upcased 1) be preceded by a capitalized word followed by a hyphen and 2) be followed by lowercase letters followed by a word break.
r = /
\b # Match a word break
[A-Z] # Match an upper-case letter
[a-z]+ # Match >= 1 lower-case letters
\- # Match hyphen
\K # Forget everything matched so far
[a-z] # Match a lower-case letter
(?= # Begin a positive lookahead
[a-z]+ # Match >= 1 lower-case letters
\b # Match a word break
) # End positive lookahead
/x # Free-spacing regex definition mode
"Jean-paul Bertaud-alain".gsub(r) { |s| s.upcase }
#=> "Jean-Paul Bertaud-Alain"
"Jean de-paul Bertaud-alainM".gsub(r) { |s| s.upcase }
#=> "Jean de-paul Bertaud-alainM"
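The difference between the simple and the stricter pattern shows up on text that contains an ordinary hyphenated word. A small sketch, in which the input sentence and the names simple and strict are made up for illustration:

```ruby
simple = /-[a-z]/
strict = /\b[A-Z][a-z]+-\K[a-z](?=[a-z]+\b)/ # single-line form of r above

input = "a well-known Jean-paul" # hypothetical input

input.gsub(simple, &:upcase) #=> "a well-Known Jean-Paul"
input.gsub(strict, &:upcase) #=> "a well-known Jean-Paul"
```

The stricter pattern skips well-known because the word before the hyphen does not start with a capital letter.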
