I have a regex to find TV series files on my drive:
if (filename =~ /S\d+?E\d+?/ix)
  puts "EPISODE : #{filename}"
end
It works well enough and prints the filename, which looks something like this:
EPISODE : Lie.to.Me.S02E02.Truth.or.Consequences.HDTV.XviD-2HD.avi
How can I display everything before the match, instead of the whole filename?
So I want to match on the S02E02 but display Lie.to.Me, but this Lie.to.Me string can really be anything, so I cannot do a regex for something specific.
s = "Lie.to.Me.S02E02.Truth.or.Consequences.HDTV.XviD-2HD.avi"
m = s.match(/S\d+?E\d+?/ix)
puts m.pre_match
=> "Lie.to.Me."
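Building on pre_match, here is a minimal sketch of a helper that returns just the show name. The names EPISODE_RE and series_name are made up for illustration, and stripping the trailing separator (".", "-", "_" or space) is an assumption about what you want displayed.

```ruby
# Hypothetical constant: the episode marker, e.g. S02E02 (case-insensitive).
EPISODE_RE = /S\d+E\d+/i

# Hypothetical helper: returns the text before the episode marker,
# or nil when the filename contains no marker at all.
def series_name(filename)
  m = EPISODE_RE.match(filename)
  return nil unless m

  # Assumption: trailing separators are not part of the name.
  m.pre_match.sub(/[.\-_ ]+\z/, '')
end

puts series_name("Lie.to.Me.S02E02.Truth.or.Consequences.HDTV.XviD-2HD.avi")
# prints Lie.to.Me
```

Returning nil for non-matching names lets the caller skip files that are not episodes.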
Try using the $` special variable:
def check(filename)
if (filename =~ /S\d+?E\d+?/ix)
puts "MATCH: #{filename}"
puts "PRE: #{$`}"
end
end
check 'EPISODE : Lie.to.Me.S02E02.Truth.or.Consequences.HDTV.XviD-2HD.avi'
# MATCH: EPISODE : Lie.to.Me.S02E02.Truth.or.Consequences.HDTV.XviD-2HD.avi
# PRE: EPISODE : Lie.to.Me.
Use #match with a .* before your pattern, with a capturing group.
"Lie.To.Me-S02E01-Xvid.avi".match(/\A(.*?)S\d+E\d+?/ix)[1]
# => Lie.To.Me-
Use pre_match:
match = /S\d+?E\d+?/ix.match(filename)
if match then
puts match.pre_match
end
You should look into using parentheses in your regular expression so that you can capture groups:
if (filename =~ /(.+?)S\d+?E\d+?/ix)
  puts "EPISODE : #{$1}"
end
Here the group captures everything before the episode marker, and $1 holds the text matched by the first group.
In Ruby regular expressions I would like to use gsub to replace a last occurrence of a grouping, if it occurs, otherwise, perform a replacement anyways at a default location. I am trying to replace the last occurrence of a number in the 40s (40...49). I have the following regular expression, which is correctly capturing the grouping I would like in '\3':
/(([1-3,5-9][0-9]|([4][0-9]))[a-z])*Foo/
Some sample strings I am using this regex on are:
12a23b34c45d56eFoo
12a45b34c46d89eFoo
45aFoo
Foo
12a23bFoo
12a23b445cFoo
Using https://regex101.com/, I see the last number in 40s is captured in '\3'. I would then like to somehow perform string.gsub(regex, '\3' => 'NEW') to replace this last occurrence or append before Foo if not present. My desired results would be:
12a23b34cNEWd56eFoo
12a45b34cNEWd89eFoo
NEWaFoo
NEWFoo
12a23bNEWFoo
12a23b4NEWcFoo
If I understood correctly, you are interested in gsub with a block:
str.gsub(PATTERN) { |mtch|
puts mtch # the whole match
puts $~[3] # the third group
mtch.gsub($~[3], 'NEW') # the result
}
'abc'.gsub(/(b)(c)/) { |m| m.gsub($~[2], 'd') }
#⇒ "abd"
You should probably also handle the case when there are no occurrences of a number in the 40s at all, like:
gsub($~[1], "NEW#{$~[1]}") if $~[3].nil?
To handle all the possible cases, one might declare the group for Foo:
# NOTE THE GROUP ⇓⇓⇓⇓⇓
▶ re = /(([1-3,5-9][0-9]|([4][0-9]))[a-z])*(Foo)/
#⇒ /(([1-3,5-9][0-9]|([4][0-9]))[a-z])*(Foo)/
▶ inp.gsub(re) do |mtch|
▷ $~[3].nil? ? mtch.gsub($~[4], "NEW#{$~[4]}") : mtch.gsub(/#{$~[3]}/, 'NEW')
▷ end
#⇒ "12a23b34cNEWd56eFoo\n12a45b34cNEWd89eFoo\nNEWaFoo\nNEWFoo\n12a23bNEWFoo"
Hope it helps.
I suggest the following:
'12a23b34c45d56eFoo'.gsub(/(([1-3,5-9][0-9]|([4][0-9]))[a-z])*Foo/) {
if Regexp.last_match[3].nil? then
puts "Append before Foo"
else
puts "Replace group 3"
end
}
You'd need to find a way to append or replace accordingly, or maybe someone can edit this into more concise code...
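One possible way to fill in the two branches is to use the match offsets, so that only the captured occurrence is touched. This is just a sketch: FORTIES_RE and replace_last_forties are hypothetical names, Foo is captured as group 4 (as in the other answer), and it relies on group 3 keeping its value from the last iteration in which it took part, which is what the outputs shown above suggest.

```ruby
# Pattern from the question, with Foo captured as group 4.
FORTIES_RE = /(([1-3,5-9][0-9]|([4][0-9]))[a-z])*(Foo)/

# Hypothetical helper: replaces the last number in the 40s before Foo,
# or inserts the replacement right before Foo when there is none.
def replace_last_forties(str, replacement = 'NEW')
  str.gsub(FORTIES_RE) do
    m = Regexp.last_match
    whole = m[0].dup
    if m[3]
      # Offsets are relative to the whole string, so shift by the match start.
      whole[m.begin(3) - m.begin(0), m[3].length] = replacement
    else
      whole.insert(m.begin(4) - m.begin(0), replacement)
    end
    whole
  end
end

puts replace_last_forties('12a23b34c45d56eFoo')
# prints 12a23b34cNEWd56eFoo
```

Working with offsets instead of a nested gsub avoids replacing an earlier, textually identical number by accident.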
I answered my own question. Forgot to initialize count = 0
I have a bunch of sentences in a paragraph.
a = "Hello there. this is the best class. but does not offer anything." as an example.
To figure out if the first letter is capitalized, my thought is to .split the string so that a_sentence = a.split(".")
I know I can call "hello world".capitalize!, and if it returns nil, that means to me the string was already capitalized.
EDIT
Now I can use an array method to go through each value and call .capitalize! on it.
And I know I can check whether something is already capitalized with .strip.capitalize!.nil?
But I can't seem to output how many were capitalized.
EDIT
count = 0
a_sentence.each do |sentence|
  if (sentence.strip.capitalize!.nil?)
    count += 1
    puts "#{count} capitalized"
  end
end
It outputs:
1 capitalized
Thanks for all your help. I'll stick with the above code I can understand within the framework I only know in Ruby. :)
Try this:
b = []
a.split(".").each do |sentence|
b << sentence.strip.capitalize
end
b = b.join(". ") + "."
# => "Hello there. This is the best class. But does not offer anything."
Your post's title is misleading because from your code, it seems that you want to get the count of capitalized letters at the beginning of a sentence.
Assuming that every sentence ends with a period (a full stop) followed by a space, the following should work for you:
split_str = ". "
regex = /^[A-Z]/
paragraph_text.split(split_str).count do |sentence|
regex.match(sentence)
end
And if you want to simply ensure that each starting letter is capitalized, you could try the following:
paragraph_text.split(split_str).map(&:capitalize).join(split_str) + split_str
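One caveat: String#capitalize also downcases the rest of the string, so a sentence like "this is Ruby" would become "This is ruby". Below is a sketch that upcases only the first letter. capitalize_sentences is a hypothetical name, and it assumes sentences are separated by ". " and that the text does not end with a trailing space.

```ruby
# Hypothetical helper: upcase the first letter of each sentence,
# leaving the rest of the sentence untouched.
def capitalize_sentences(text, separator = '. ')
  text.split(separator)
      .map { |sentence| sentence.sub(/\A[a-z]/) { |letter| letter.upcase } }
      .join(separator)
end

puts capitalize_sentences("Hello there. this is the Best class. but does not offer anything.")
# prints Hello there. This is the Best class. But does not offer anything.
```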
There's no need to split the string into sentences:
str = "It was the best of times. sound familiar? Out, damn spot! oh, my."
str.scan(/(?:^|[.!?]\s)\s*\K[A-Z]/).length
#=> 2
The regex could be written with documentation by adding x after the closing /:
r = /
(?: # start a non-capture group
^|[.!?]\s # match ^ or (|) any of ([]) ., ! or ?, then one whitespace char
) # end non-capture group
\s* # match any number of whitespace chars
\K # forget the preceding match
[A-Z] # match one capital letter
/x
a = str.scan(r)
#=> ["I", "O"]
a.length
#=> 2
Instead of Array#length, you could use its alias, size, or Array#count.
You can count how many were capitalized, like this:
a = "Hello there. this is the best class. but does not offer anything."
a_sentence = a.split(".")
a_sentence.inject(0) { |sum, s| s.strip!; s.capitalize!.nil? ? sum += 1 : sum }
# => 1
a_sentence
# => ["Hello there", "This is the best class", "But does not offer anything"]
And then put it back together, like this:
"#{a_sentence.join('. ')}."
# => "Hello there. This is the best class. But does not offer anything."
EDIT
As @Humza suggested, you could use count:
a_sentence.count { |s| s.strip!; s.capitalize!.nil? }
# => 1
How can I get the filename without the extensions? For example, input of "/dir1/dir2/test.html.erb" should return "test".
In the actual code I will be passing in __FILE__ instead of "/dir1/dir2/test.html.erb".
Read the documentation:
basename(file_name [, suffix] ) → base_name
Returns the last component of the filename given in file_name, which
can be formed using both File::SEPARATOR and File::ALT_SEPARATOR as
the separator when File::ALT_SEPARATOR is not nil. If suffix is given
and present at the end of file_name, it is removed.
=> File.basename('public/500.html', '.html')
=> "500"
In your case:
=> File.basename("test.html.erb", ".html.erb")
=> "test"
How about this
File.basename(f, File.extname(f))
This returns the file name without the extension. It also works for filenames with multiple '.' in them, although only the last extension is removed.
In case you don't know the extension you can combine File.basename with File.extname:
filepath = "dir/dir/filename.extension"
File.basename(filepath, File.extname(filepath)) #=> "filename"
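Note that File.extname only returns the last extension, so for "test.html.erb" this combination gives "test.html". If you want "test", one option is to repeat the step until nothing is left to strip. A sketch (strip_all_extensions is a made-up name, not part of File):

```ruby
# Hypothetical helper: drop the directory part, then peel off
# extensions one at a time until none remain.
def strip_all_extensions(path)
  name = File.basename(path)
  loop do
    ext = File.extname(name)
    stripped = File.basename(name, ext)
    # Stop when there is no extension left or nothing changed.
    break if ext.empty? || stripped == name

    name = stripped
  end
  name
end

puts strip_all_extensions("/dir1/dir2/test.html.erb")
# prints test
```

Dotfiles such as ".bashrc" are left alone, because File.extname treats a leading dot as part of the name.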
Pathname provides a convenient object-oriented interface for dealing with file names.
One method lets you replace the existing extension with a new one, and that method accepts the empty string as an argument:
>> Pathname('foo.bar').sub_ext ''
=> #<Pathname:foo>
>> Pathname('foo.bar.baz').sub_ext ''
=> #<Pathname:foo.bar>
>> Pathname('foo').sub_ext ''
=> #<Pathname:foo>
This is a convenient way to get the filename stripped of its extension, if there is one.
But if you want to get rid of all extensions, you can use a regex:
>> "foo.bar.baz".sub(/(?<=.)\..*/, '')
=> "foo"
Note that this only works on bare filenames, not paths like foo.bar/pepe.baz. For that, you might as well use a function:
require 'pathname'

def without_extensions(path)
  p = Pathname(path)
  p.parent / p.basename.sub(
    /
    (?<=.)  # look-behind: ensure some character, e.g., for ‘.foo’
    \.      # literal ‘.’
    .*      # extensions
    /x, '')
end
Split by dot and the first part is what you want.
filename = 'test.html.erb'
result = filename.split('.')[0]
Considering the premise, the most appropriate answer for this case (and similar cases with other extensions) would be something such as this:
__FILE__.split('.')[0...-1].join('.')
This removes only the last extension (not the other parts of the name): myfile.html.erb here becomes myfile.html, rather than just myfile.
Thanks to @xdazz and @Monk_Code for their ideas. In case others are looking, the final code I'm using is:
File.basename(__FILE__, ".*").split('.')[0]
This generically allows you to remove the full path in the front and the extensions in the back of the file, giving only the name of the file without any dots or slashes.
name = "filename.100.jpg"
puts "#{name.split('.')[-1]}"
Although it's not a cross-platform solution, this would work on Unix-like systems:
def without_extensions(path)
lastSlash = path.rindex('/')
if lastSlash.nil?
theFile = path
else
theFile = path[lastSlash+1..-1]
end
# not an easy thing to define
# what an extension is
theFile[0...theFile.index('.')]
end
puts without_extensions("test.html.erb")
puts without_extensions("/test.html.erb")
puts without_extensions("a.b/test.html.erb")
puts without_extensions("/a.b/test.html.erb")
puts without_extensions("c.d/a.b/test.html.erb")
I think I'm close, but the regex isn't evaluating. Hoping someone may know why.
def new_title(title)
words = title.split(' ')
words = [words[0].capitalize] + words[1..-1].map do |w|
if w =~ /and|an|a|the|in|if|of/
w
else
w.capitalize
end
end
words.join(' ')
end
When I pass in lowercase titles, they get returned as lowercase.
You need to properly anchor your regular expression:
new_title("the last hope")
# => "The last Hope"
This is because /a/ matches a word with an a in it. /\Aa\Z/ matches a string that consists entirely of a, and /\A(a|of|...)\Z/ matches against a set of words.
In any case, what you might want is this:
case (w)
when 'and', 'an', 'a', 'the', 'in', 'if', 'of'
w
else
w.capitalize
end
Using a regular expression here is a bit heavy handed. What you want is an exclusion list.
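If you do want to stay with a regular expression, the anchoring advice above could look something like this. It is only a sketch: SMALL_WORDS is a hypothetical constant name, and \z is used as the end anchor so a trailing newline does not sneak through.

```ruby
# Hypothetical constant: matches a word only if it is exactly one of these.
SMALL_WORDS = /\A(?:and|an|a|the|in|if|of)\z/

def new_title(title)
  first, *rest = title.split(' ')
  return '' if first.nil? # empty or whitespace-only title

  # The first word is always capitalized; small words elsewhere are kept as-is.
  rest = rest.map { |w| w =~ SMALL_WORDS ? w : w.capitalize }
  ([first.capitalize] + rest).join(' ')
end

puts new_title("the last hope")
# prints The Last Hope
```

Without the anchors, "last" matches because it contains an "a", which is why it was left in lowercase.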
This is called titleize, and is implemented like this:
def titleize(word)
humanize(underscore(word)).gsub(/\b('?[a-z])/) { $1.capitalize }
end
See the docs.
If you want fancy titleizing, check out granth's titleize.
Your regular expression should be checking the whole word (^word$). Anyway, isn't it simpler to use Enumerable#include?:
def new_title(title)
words = title.split(' ')
rest_words = words.drop(1).map do |word|
%w(and an a the in if of).include?(word) ? word : word.capitalize
end
([words[0].capitalize] + rest_words).join(" ")
end
I have such code
reg = /(.+)_path/
if reg.match('home_path')
puts reg.match('home_path')[0]
end
This will eval regex twice :(
So...
reg = /(.+)_path/
result = reg.match('home_path')
if result
puts result[0]
end
But this keeps the variable result around in memory.
I have one functional-programming idea
/(.+)_path/.match('home_path').compact.each do |match|
puts match[0]
end
But it seems there should be a better solution, shouldn't there?
There are special global variables (their names start with $) that contain results of the last regexp match:
r = /(.+)_path/
# $1, $2, ... - the n-th group of the last successful match
puts $1 if r.match('home_path')
# => home
# $& - the string matched by the last successful match
puts $& if r.match('home_path')
# => home_path
You can find the full list of predefined global variables here.
Note that in the examples above, puts won't be executed at all if you pass a string that doesn't match the regexp.
And speaking about general case you can always put assignment into condition itself:
if m = /(.+)_path/.match('home_path')
puts m[0]
end
Though, many people don't like that as it makes code less readable and gives a good opportunity for confusing = and ==.
My personal favorite (w/ 1.9+) is some variation of:
if /(?<prefix>.+)_path/ =~ "home_path"
puts prefix
end
If you really want a one-liner: puts /(?<prefix>.+)_path/ =~ 'home_path' ? prefix : false
See the Ruby Docs for a few limitations of named captures and #=~.
From the docs: If a block is given, invoke the block with MatchData if match succeed.
So:
/(.+)_path/.match('home_path') { |m| puts m[1] } # => home
/(.+)_path/.match('homepath') { |m| puts m[1] } # prints nothing
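The block form also returns the block's value (or nil when there is no match), so it can be wrapped in a small helper without a temporary variable. A sketch, with prefix_of as a hypothetical name:

```ruby
# Hypothetical helper: returns the captured prefix, or nil if nothing matches.
def prefix_of(name)
  /(.+)_path/.match(name) { |m| m[1] }
end

puts prefix_of('home_path')
# prints home
```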
How about...
if m=/regex here/.match(string) then puts m[0] end
A neat one-line solution, I guess :)
How about this?
puts $~ if /regex/.match("string")
$~ is a special variable that stores the last regexp match. More info: http://www.regular-expressions.info/ruby.html
Actually, this can be done with no conditionals at all. (The expression evaluates to "" if there is no match.)
puts /(.+)_path/.match('home_xath').to_a[0].to_s
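Another option without an explicit conditional is String#[] with a regex and a capture index, which returns nil when nothing matches. A minimal sketch (path_prefix is a made-up name):

```ruby
# Hypothetical helper: String#[] with (regexp, capture) returns the
# captured group, or nil when the regexp does not match.
def path_prefix(name)
  name[/(.+)_path/, 1]
end

puts path_prefix('home_path')
# prints home
```

Calling .to_s on the result gives "" for the no-match case, as in the one-liner above.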