As a beginner in Ruby, is there a quick way to extract the first and second numbers from the string 5.16.0.0-15? In this case, I am looking for 5 and 16. Thanks
One way is to use the method String#match with the regular expression
rgx = /(\d+)\.(\d+)/
to construct a MatchData object. The regular expression captures the first two strings of digits, separated by a period. The method MatchData#captures is then used to extract the contents of capture groups 1 and 2 (strings) and save them to an array. Lastly, String#to_i is used to convert the strings in the array to integers:
"5.16.0.0-15".match(rgx).captures.map(&:to_i)
#=> [5, 16]
We see that
m = "5.16.0.0-15".match(rgx)
#=> #<MatchData "5.16" 1:"5" 2:"16">
a = m.captures
#=> ["5", "16"]
a.map(&:to_i)
#=> [5, 16]
a.map(&:to_i) can be thought of as shorthand for a.map { |s| s.to_i }.
We can express the regular expression in free-spacing mode to make it self-documenting:
/
( # begin capture group 1
\d+ # match one or more digits
) # end capture group 1
\. # match a period
( # begin capture group 2
\d+ # match one or more digits
) # end capture group 2
/x # invoke free-spacing regex definition mode
One reason for using a regular expression here is to confirm the structure of the string, should that be desired. That could be done by using the following regex:
rgx1 =
/
\A # match the beginning of the string
( # begin capture group 1
\d+ # match one or more digits
) # end capture group 1
\. # match a period
( # begin capture group 2
\d+ # match one or more digits
) # end capture group 2
(?: # begin a non-capture group
\. # match a period
\d+ # match one or more digits
(?: # begin a non-capture group
\- # match a hyphen
\d+ # match one or more digits
)? # end non-capture group and make it optional
)* # end non-capture group and execute it zero or more times
\z # match the end of the string
/x # invoke free-spacing regex definition mode
"5.16.0.0-15".match(rgx1).captures.map(&:to_i)
#=> [5, 16]
"5.16.0.A".match(rgx1)
#=> nil
"5.16.0.0-1-5".match(rgx1)
#=> nil
The last two examples would raise exceptions if captures were called on the result, because nil has no method captures. One could of course handle those exceptions.
rgx1 is conventionally written /\A(\d+)\.(\d+)(?:\.\d+(?:\-\d+)?)*\z/.
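Handling the nil case can be sketched as follows. The names VERSION_RGX and major_minor are hypothetical (they do not appear in the answer above); the method returns nil instead of raising when the string does not have the expected structure.

```ruby
# Hypothetical helper: returns [major, minor] as integers, or nil when
# the string does not match the expected version structure.
VERSION_RGX = /\A(\d+)\.(\d+)(?:\.\d+(?:-\d+)?)*\z/

def major_minor(str)
  m = str.match(VERSION_RGX)
  # m is nil when there is no match, so captures is never called on nil
  m && m.captures.map(&:to_i)
end

p major_minor("5.16.0.0-15")   #=> [5, 16]
p major_minor("5.16.0.A")      #=> nil
p major_minor("5.16.0.0-1-5")  #=> nil
```

The caller can then test the return value for nil rather than rescuing NoMethodError.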
Use #split, telling it to split on "." and only split into three parts, then access the first two.
irb(main):003:0> s = "5.16.0.0-15"
=> "5.16.0.0-15"
irb(main):004:0> s.split(".", 3)[0..1]
=> ["5", "16"]
Optionally map to integers.
irb(main):005:0> s.split(".", 3)[0..1].map(&:to_i)
=> [5, 16]
I am busy working through some problems I have found on the net and I feel like this should be simple but I am really struggling.
Say you have the string 'AbcDeFg' and the next string of 'HijKgLMnn', I want to be able to find the same characters in the string so in this case it would be 'g'.
Perhaps I wasn't giving enough information - I am doing Advent of Code and I am on day 3. I just need help with the first bit which is where you are given a string of characters - you have to split the characters in half and then compare the 2 strings. You basically have to get the common character between the two. This is what I currently have:
file_data = File.read('Day_3_task1.txt')
arr = file_data.split("\n")
finals = []
arr.each do |x|
len = x.length
divided_by_two = len / 2
second = x.slice!(divided_by_two..len).split('')
first = x.split('')
count = 0
(0..len).each do |z|
first.each do |y|
if y == second[count]
finals.push(y)
end
end
count += 1
end
end
finals = finals.uniq
Hope that helps in terms of clarity :)
Did you try to convert both strings to arrays with the String#chars method and find the intersection of those arrays?
Like this:
string_one = 'AbcDeFg'.chars
string_two = 'HijKgLMnn'.chars
string_one & string_two # => ["g"]
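Applied to the puzzle described in the question (one line split into two halves), the same idea might look like the sketch below. The method name common_chars is hypothetical, and the sketch assumes each line has an even length.

```ruby
# Hypothetical helper: splits a line into two halves and returns the
# characters that occur in both halves, without duplicates.
def common_chars(line)
  half = line.length / 2
  first = line[0...half].chars    # characters of the first half
  second = line[half..-1].chars   # characters of the second half
  first & second                  # Array#& is set intersection
end

p common_chars("abcXdeXf")          #=> ["X"]
p common_chars("AbcDeFgHijKgLMnn")  #=> ["g"]
```

Array#& already removes duplicates, so no separate call to uniq is needed for a single line.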
One way to do that is to use the method String#scan with the regular expression
rgx = /(.)(?!.*\1.*_)(?=.*_.*\1)/
I'm not advocating this approach. I merely thought some readers might find it interesting.
Suppose
str1 = 'AbcDgeFg'
str2 = 'HijKgLMnbn'
Now form the string
str = "#{str1}_#{str2}"
#=> "AbcDgeFg_HijKgLMnbn"
I've assumed the strings contain letters only, in which case they are separated in str with any character other than a letter. I've used an underscore. Naturally, if the strings could contain underscores a different separator would have to be used.
We then compute
str.scan(rgx).flatten
#=> ["b", "g"]
Array#flatten is needed because
str.scan(rgx)
#=>[["b"], ["g"]]
The regular expression can be written in free-spacing mode to make it self-documenting:
rgx =
/
(.) # match any character, save to capture group 1
(?! # begin a negative lookahead
.* # match zero or more characters
\1 # match the contents of capture group 1
.* # match zero or more characters
_ # match an underscore
) # end the negative lookahead
(?= # begin a positive lookahead
.* # match zero or more characters
_ # match an underscore
.* # match zero or more characters
\1 # match the contents of capture group 1
) # end the positive lookahead
/x # invoke free-spacing regex definition mode
Note that if a character appears more than once in str1 and at least once in str2 the negative lookahead ensures that only the last one in str1 is matched, to avoid returning duplicates.
Alternatively, one could write
str.gsub(rgx).to_a
This uses the (fourth) form of String#gsub, which takes a single argument and no block and returns an enumerator.
I have a large file, and I want to be able to check if a word is present twice.
puts "Enter a word: "
$word = gets.chomp
if File.read('worldcountry.txt') # do something if the word entered is present twice...
How can I check whether the file worldcountry.txt contains the $word I entered twice?
I found what I needed here: count-the-frequency-of-a-given-word-in-text-file-in-ruby
It is in Gerry's post, with this code:
word_count = 0
my_word = "input"
File.open("texte.txt", "r") do |f|
f.each_line do |line|
line.split(' ').each do |word|
word_count += 1 if word == my_word
end
end
end
puts "\n" + word_count.to_s
Thanks, i will pay more attention next time.
If the file is not overly large, it can be gulped into a string. Suppose:
str = File.read('cat')
#=> "There was a dog 'Henry' who\nwas pals with a dog 'Buck' and\na dog 'Sal'."
puts str
There was a dog 'Henry' who
was pals with a dog 'Buck' and
a dog 'Sal'.
Suppose the given word is 'dog'.
Confirm the file contains at least two instances of the given word
One can attempt to match the regular expression
r1 = /\bdog\b.*\bdog\b/m
str.match?(r1)
#=> true
Confirm the file contains exactly two instances of the given word
Using a regular expression to determine if the file contains exactly two instances of the given word is somewhat more complex. Let
r2 = /\A(?:(?:(?!\bdog\b).)*\bdog\b){2}(?!.*\bdog\b)/m
str.match?(r2)
#=> false
The two regular expressions can be written in free-spacing mode to make them self-documenting.
r1 = /
\bdog\b # match 'dog' surrounded by word breaks
.* # match zero or more characters
\bdog\b # match 'dog' surrounded by word breaks
/m # cause . to match newlines
r2 = /
\A # match beginning of string
(?: # begin non-capture group
(?: # begin non-capture group
(?! # begin negative lookahead
\bdog\b # match 'dog' surrounded by word breaks
) # end negative lookahead
. # match one character that does not begin the word 'dog'
) # end non-capture group
* # execute preceding non-capture group zero or more times
\bdog\b # match 'dog' surrounded by word breaks
) # end non-capture group
{2} # execute preceding non-capture group twice
(?! # begin negative lookahead
.* # match zero or more characters
\bdog\b # match 'dog' surrounded by word breaks
) # end negative lookahead
/xm # cause . to match newlines and invoke free-spacing mode
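If the file fits in memory, counting the matches with String#scan is a simpler alternative to a single "exactly two" regex. In this sketch the method name word_count is hypothetical, and it assumes the word consists of word characters only, so that \b behaves as expected at both ends.

```ruby
# Hypothetical helper: counts whole-word occurrences of `word` in `text`.
# Regexp.escape guards against regex metacharacters in the user's input.
def word_count(text, word)
  text.scan(/\b#{Regexp.escape(word)}\b/).size
end

str = "There was a dog 'Henry' who\nwas pals with a dog 'Buck' and\na dog 'Sal'."

p word_count(str, "dog")        #=> 3
p word_count(str, "dog") == 2   #=> false (exactly two?)
p word_count(str, "was") >= 2   #=> true  (at least two?)
```

With a real file, str would come from File.read, as in the answer above.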
I'm trying to use Ruby regex to get word combo like below.
In the example below I only need cases 1-4, which I've marked in caps for easy testing. The word in the middle (dbo, bcd) could be anything, or nothing as in case #3. I'm having trouble getting the double-period case #3 to work. It would also be good to match the standalone word SALES, but that is probably too much for one regex? Thanks, all you gurus.
This is my script, which partially works; I still need to add alpha..SALES
s = '1 alpha.dbo.SALES 2 alpha.bcd.SALES 3 alpha..SALES 4 SALES
bad cases 5x alpha.saleS 6x saleSXX'
regex = /alpha+\.+[a-z]+\.?sales/ix
puts 'R: ' + s.scan(regex).to_s
##R: ["alpha.dbo.SALES", "alpha.bcd.SALES"]
s = '1 alpha.dbo.SALES 2 alpha.bcd.SALES 3 alpha..SALES 4 SALES
bad cases 5x alpha.saleS 6x saleSXX 7x alpha.abc.SALES.etc'
regex = /(?<=^|\s)(?:alpha\.[a-z]*\.)?(?:sales)(?=\s|$)/i
puts 'R: ' + s.scan(regex).to_s
Output:
R: ["alpha.dbo.SALES", "alpha.bcd.SALES", "alpha..SALES", "SALES"]
r = /
(?<=\d[ ]) # match a digit followed by a space in a positive lookbehind
(?: # begin a non-capture group
\p{Alpha}+ # match one or more letters
\. # match a period
(?: # begin a non-capture group
\p{Alpha}+ # match one or more letters
\. # match a period
| # or
\. # match a period
) # end non-capture group
)? # end non-capture group and optionally match it
SALES # match string
(?![.\p{Alpha}]) # do not match a period or letter (negative lookahead)
/x # free-spacing regex definition mode.
s.scan(r)
#=> ["alpha.dbo.SALES", "alpha.bcd.SALES", "alpha..SALES", "SALES"]
This regular expression is customarily written as follows.
r = /(?<=\d )(?:\p{Alpha}+\.(?:\p{Alpha}+\.|\.))?SALES(?![.\p{Alpha}])/
In free-spacing mode the space must be put in a character class ([ ]); else it would be stripped out.
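As a quick check, the sketch below runs a one-line form of the pattern, with the negative lookahead written as (?![.\p{Alpha}]), against the sample string plus one extra hypothetical case (8 alpha.abc.SALES.etc) that should be rejected because SALES is followed by a period. The constant name SALES_RGX is an assumption made for illustration.

```ruby
# One-line form of the pattern; the name SALES_RGX is hypothetical.
SALES_RGX = /(?<=\d )(?:\p{Alpha}+\.(?:\p{Alpha}+\.|\.))?SALES(?![.\p{Alpha}])/

s = "1 alpha.dbo.SALES 2 alpha.bcd.SALES 3 alpha..SALES 4 SALES\n" \
    "bad cases 5x alpha.saleS 6x saleSXX 7x alpha.abc.SALES.etc " \
    "8 alpha.abc.SALES.etc"

matches = s.scan(SALES_RGX)
p matches
#=> ["alpha.dbo.SALES", "alpha.bcd.SALES", "alpha..SALES", "SALES"]
```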
I want to convert:
"890414.14.1422, 900515141092, 950616-12-5414"
to:
"890414-14-1422, 900515-14-1092, 950616-12-5414"
How can I achieve it?
I tried:
def format_ids(string)
string.gsub(/(\d{6})[.-](\d{2})[.-](\d{4})/, '\1-\2-\3')
end
format_ids("890414.14.1422, 900515141092, 950616-12-5414")
# => "890414-14-1422, 900515141092, 950616-12-5414"
You should make the delimiters in the input string non mandatory:
- string.gsub(/(\d{6})[.-](\d{2})[.-](\d{4})/, '\1-\2-\3')
+ string.gsub(/(\d{6})[.-]?(\d{2})[.-]?(\d{4})/, '\1-\2-\3')
Note the question marks after the delimiters; they do the trick.
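One caveat, stated as an assumption about the input: once the delimiters are optional, the pattern can also match the first twelve digits of a longer run of digits. If that matters, word boundaries can guard against it. The method name format_ids_strict below is hypothetical.

```ruby
# Hypothetical variant: \b on both sides stops the pattern from matching
# inside a longer run of digits.
def format_ids_strict(string)
  string.gsub(/\b(\d{6})[.-]?(\d{2})[.-]?(\d{4})\b/, '\1-\2-\3')
end

p format_ids_strict("890414.14.1422, 900515141092, 950616-12-5414")
#=> "890414-14-1422, 900515-14-1092, 950616-12-5414"
p format_ids_strict("1234567890123")
#=> "1234567890123" (13 digits, left unchanged)
```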
str = "890414.14.1422, 900515141092, 950616-12-5414"
r = /
( # begin capture group 1
\. # match a period
| # or
(?<=\d{6}) # match after 6 digits (positive lookbehind)
(?=\d{6}) # match before 6 digits (positive lookahead)
| # or
(?<=\d{8}) # match after 8 digits (positive lookbehind)
(?=\d{4}) # match before 4 digits (positive lookahead)
) # end capture group 1
/x # free-spacing regex definition mode
str.gsub(r,'-')
#=> "890414-14-1422, 900515-14-1092, 950616-12-5414"
This regular expression is conventionally (not free-spacing mode) written as follows:
/(\.|(?<=\d{6})(?=\d{6})|(?<=\d{8})(?=\d{4}))/
Note that (?<=\d{6}) and (?=\d{6}) together match a position between two consecutive digits that has a width of zero, as do (?<=\d{8}) and (?=\d{4}).
I've been having difficulty trying to figure out how to go about solving this issue. I have 2 kinds of URLs in which I need to be able to update/increment the number value for the page.
Url 1:
forum-351-page-2.html
In the above, I would like to modify this url for n pages. So I'd like to generate new urls with a given range of say page-1 to page-30. But that's all I'd like to change. page-n.html
Url 2:
href="forumdisplay.php?fid=115&page=3
The second url is different, but I feel it's easier.
R = /
(?: # begin non-capture group
(?<=-page-) # match string in a positive lookbehind
\d+ # match 1 or more digits
(?=\.html) # match period followed by 'html' in a positive lookahead
) # close non-capture group
| # or
(?: # begin non-capture group
(?<=&page=) # match string in a positive lookbehind
\d+ # match 1 or more digits
\z # match end of string
) # close non-capture group
/x # free-spacing regex definition mode
def update(str, val)
str.sub(R, val.to_s)
end
update("forum-351-page-2.html", 4)
#=> "forum-351-page-4.html"
update("forumdisplay.php?fid=115&page=3", "4")
#=> "forumdisplay.php?fid=115&page=4"
For the first url
url1 = "forum-351-page-2.html"
(1..30).each do |x|
puts url1.sub(/page-\d*/, "page-#{x}")
end
This will output
forum-351-page-1.html
forum-351-page-2.html
forum-351-page-3.html
...
forum-351-page-28.html
forum-351-page-29.html
forum-351-page-30.html
You can do the same thing for the second url, assuming it is stored in url2:
url2.sub(/page=\d*$/, "page=#{x}")
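Both URL forms can be handled by one helper that returns the generated URLs instead of printing them. This is only a sketch: the names PAGE_RGX and page_urls are hypothetical, and it assumes the page number appears either before ".html" or at the very end of the string.

```ruby
# Matches the page number in "-page-N.html" or in a trailing "&page=N".
PAGE_RGX = /(?<=-page-)\d+(?=\.html)|(?<=&page=)\d+\z/

# Hypothetical helper: returns one URL per page number in `pages`.
def page_urls(url, pages)
  pages.map { |n| url.sub(PAGE_RGX, n.to_s) }
end

p page_urls("forum-351-page-2.html", 1..3)
#=> ["forum-351-page-1.html", "forum-351-page-2.html", "forum-351-page-3.html"]
p page_urls("forumdisplay.php?fid=115&page=3", 1..2)
#=> ["forumdisplay.php?fid=115&page=1", "forumdisplay.php?fid=115&page=2"]
```

Returning an array leaves it to the caller to print, fetch, or store the URLs.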