Ruby regex replace some subpattern captures

I have a regex for path parsing. Below is the part of the regex that repeats multiple times.
dir_pattern = /
\/?
(?<dir> #pattern to catch directory
[^[:cntrl:]\/\n\r]+ #directory name
)
(?=\/) #indistinguishable from file otherwise
/x
Input:
/really/long/absolute/path/to/file.extension
Desired output:
to/really/long/file.extension
I want to cut off some (not all directories) and reorder remaining ones. How could I achieve that?
Since I'm already using regexes to filter the files I need, I would like to keep using them.

Ok, here is a regex answer based on the new information posted above:
rx = /\/[^\/]+/i
# matches a '/' followed by every character up to the next '/'
# this ensures any character like a '.' in a file name or the dot
# in the extension is kept.
path = '/really/long/absolute/path/to/file.extension'
d = path.scan(rx)
# returns an array of all matches ["/really", "/long", "/absolute", "/path", "/to", "/file.extension"]
new_path = [d[4], d[0], d[1], d[-1]].join
# returns "/to/really/long/file.extension"
Let's wrap it in a method:
def short_path(path, keepers)
rx = /\/[^\/]+/i
d = path.scan(rx)
new_path = []
keepers.each do |dir|
new_path << d[dir]
end
new_path << d[-1]
new_path.join
end
Usage: just pass the method the path and an array of the positions you want to keep, in the new order.
path = '/really/long/absolute/path/to/file.extension'
new_path = short_path(path, [4,0,1])
# returns '/to/really/long/file.extension'
If you need to remove the first '/' for a relative path just:
new_path.sub!(/\//, '')
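As a variation on the method above, here is a minimal sketch that splits the file name off first and picks the directories with Array#values_at, so the result comes out relative without the extra sub! step. The name reorder_path is made up for this example and is not part of the original answer.

```ruby
# Hypothetical helper: keep only the directories at the given positions,
# in the given order, and always keep the file name at the end.
def reorder_path(path, keepers)
  # Every run of non-slash characters is one path segment;
  # the last segment is treated as the file name.
  *dirs, file = path.scan(%r{[^/]+})
  (dirs.values_at(*keepers) << file).join('/')
end

path = '/really/long/absolute/path/to/file.extension'
puts reorder_path(path, [4, 0, 1])
# prints "to/really/long/file.extension"
```

A position outside the range comes back as nil from values_at, so real code would probably want to validate keepers before joining.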
Old answer using string manipulation without regex...
x = "01234567 capture me!"
puts "#{x[7]}#{x[4]}#{x[2]}"
#=> "742"

Related

Ruby: Get filename without the extensions

How can I get the filename without the extensions? For example, input of "/dir1/dir2/test.html.erb" should return "test".
In actual code I will be passing in __FILE__ instead of "/dir1/dir2/test.html.erb".
Read the documentation:
basename(file_name [, suffix] ) → base_name
Returns the last component of the filename given in file_name, which
can be formed using both File::SEPARATOR and File::ALT_SEPARATOR as
the separator when File::ALT_SEPARATOR is not nil. If suffix is given
and present at the end of file_name, it is removed.
File.basename('public/500.html', '.html')
#=> "500"
In your case:
File.basename("test.html.erb", ".html.erb")
#=> "test"
How about this
File.basename(f, File.extname(f))
This returns the file name without its last extension. It works for filenames with multiple '.' in them, but only the final extension is removed.
In case you don't know the extension you can combine File.basename with File.extname:
filepath = "dir/dir/filename.extension"
File.basename(filepath, File.extname(filepath)) #=> "filename"
Pathname provides a convenient object-oriented interface for dealing with file names.
One method lets you replace the existing extension with a new one, and that method accepts the empty string as an argument:
>> Pathname('foo.bar').sub_ext ''
=> #<Pathname:foo>
>> Pathname('foo.bar.baz').sub_ext ''
=> #<Pathname:foo.bar>
>> Pathname('foo').sub_ext ''
=> #<Pathname:foo>
This is a convenient way to get the filename stripped of its extension, if there is one.
But if you want to get rid of all extensions, you can use a regex:
>> "foo.bar.baz".sub(/(?<=.)\..*/, '')
=> "foo"
Note that this only works on bare filenames, not paths like foo.bar/pepe.baz. For that, you might as well use a function:
require 'pathname'
def without_extensions(path)
p = Pathname(path)
p.parent / p.basename.sub(
/
(?<=.) # look-behind: ensure some character, e.g., for ‘.foo’
\. # literal ‘.’
.* # extensions
/x, '')
end
Split by dot and the first part is what you want.
filename = 'test.html.erb'
result = filename.split('.')[0]
Considering the premise, the most appropriate answer for this case (and similar cases with other extensions) would be something such as this:
__FILE__.split('.')[0...-1].join('.')
This removes only the last extension, not the other parts of the name: myfile.html.erb becomes myfile.html, rather than just myfile.
Thanks to @xdazz and @Monk_Code for their ideas. In case others are looking, the final code I'm using is:
File.basename(__FILE__, ".*").split('.')[0]
This generically allows you to remove the full path in the front and the extensions in the back of the file, giving only the name of the file without any dots or slashes.
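To make the differences between the approaches in this thread concrete, here is a small side-by-side sketch run on the path from the question. The variable names are only for illustration.

```ruby
path = "/dir1/dir2/test.html.erb"

last_ext_removed = File.basename(path, ".*")                 # strips only ".erb"
same_via_extname = File.basename(path, File.extname(path))   # also strips only ".erb"
known_suffix     = File.basename(path, ".html.erb")          # strips the exact suffix given
before_first_dot = File.basename(path).split('.').first      # everything before the first dot

puts last_ext_removed   # prints test.html
puts same_via_extname   # prints test.html
puts known_suffix       # prints test
puts before_first_dot   # prints test
```

Which one fits depends on whether "extension" means only the last dot-separated part or everything after the first dot.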
name = "filename.100.jpg"
puts "#{name.split('.')[-1]}"
#=> "jpg" (the last dot-separated part, i.e. the extension; index 0 gives "filename")
This is not a cross-platform solution, but it works on Unix-like systems:
def without_extensions(path)
lastSlash = path.rindex('/')
if lastSlash.nil?
theFile = path
else
theFile = path[lastSlash+1..-1]
end
# not an easy thing to define
# what an extension is
theFile[0...theFile.index('.')]
end
puts without_extensions("test.html.erb")
puts without_extensions("/test.html.erb")
puts without_extensions("a.b/test.html.erb")
puts without_extensions("/a.b/test.html.erb")
puts without_extensions("c.d/a.b/test.html.erb")

How to use gsub to extract and replace in Ruby?

I have a bunch of markdown image paths in several files and I want to change the root directory. The regex for the image tag is this:
/\!\[image\]\((.*?)\)/
I need to be able to grab the group, parse out the filename and give it a new path before returning it to gsub to be substituted out.
For instance, I want to find all strings like this:
![image](/old/path/to/image1.png)
And convert them to:
![image](/new/path/to/image1.png)
I know I can do this in a gsub block, I'm just not very clear how it works.
Here's one way, verbosely for clarity's sake:
markdown = "![image](/old/path/to/image1.png)"
regex = /(\w+\.png)/
match_data = regex.match markdown
p base_name = match_data[1]
#=> "image1.png"
p new_markdown = "![image](/new/path/to/#{base_name})"
#=> "![image](/new/path/to/image1.png)"
More succinctly:
p markdown.gsub( /\/.+\/(\w+\.png)/ ) { "/new/path/to/#{$1}" }
#=> "![image](/new/path/to/image1.png)"
You can use a regular expression with positive lookbehind and positive lookahead to replace only the filename part in the original String. I have a new_path variable holding the new path, and simply substitute that using .sub.
img = "![image](/old/path/to/image1.png)"
new_path = '/new/path/to/image1.png'
p img.sub(/(?<=!\[image\]\()[^)]+(?=\))/, new_path)
# => "![image](/new/path/to/image1.png)"
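Since the question asks specifically about the block form of gsub, here is a minimal sketch of it using the regex from the question. Inside the block, $1 refers to the capture group of the current match, so each image gets its own file name. The new_root variable and the sample text are assumptions made for the example.

```ruby
# Sample input (hypothetical): two image tags with different old directories.
markdown = <<~MD
  Intro ![image](/old/path/to/image1.png) text
  ![image](/other/dir/image2.jpg)
MD

new_root = "/new/path/to" # hypothetical target directory

# The block runs once per match; its return value replaces the whole match.
updated = markdown.gsub(/!\[image\]\((.*?)\)/) do
  "![image](#{File.join(new_root, File.basename($1))})"
end

puts updated
```

File.basename keeps only the file name from the captured path, and File.join puts it under the new root.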

How to mass rename files in ruby

I have been trying to work out a file rename program based on ruby, as a programming exercise for myself (I am aware of rename under linux, but I want to learn Ruby, and rename is not available in Mac).
In the code below, the issue is that the .include? method always returns false even though I can see that the filename contains the search pattern. If I comment out the include? check, gsub() does not seem to generate a new file name at all (i.e. the file name remains the same). So can someone please take a look and see what I did wrong? Thanks a bunch in advance!
Here is the expected behavior:
Assuming that in current folder there are three files: a1.jpg, a2.jpg, and a3.jpg
The Ruby script should be able to rename it to b1.jpg, b2.jpg, b3.jpg
#!/Users/Antony/.rvm/rubies/ruby-1.9.3-p194/bin/ruby
puts "Enter the file search query"
searchPattern = gets
puts "Enter the target to replace"
target = gets
puts "Enter the new target name"
newTarget = gets
Dir.glob("./*").sort.each do |entry|
origin = File.basename(entry, File.extname(entry))
if origin.include?(searchPattern)
newEntry = origin.gsub(target, newTarget)
File.rename( origin, newEntry )
puts "Rename from " + origin + " to " + newEntry
end
end
Slightly modified version:
puts "Enter the file search query"
searchPattern = gets.strip
puts "Enter the target to replace"
target = gets.strip
puts "Enter the new target name"
newTarget = gets.strip
Dir.glob(searchPattern).sort.each do |entry|
if File.basename(entry, File.extname(entry)).include?(target)
newEntry = entry.gsub(target, newTarget)
File.rename( entry, newEntry )
puts "Rename from " + entry + " to " + newEntry
end
end
Key differences:
Use .strip to remove the trailing newline that you get from gets. Otherwise, this newline character will mess up all of your match attempts.
Use the user-provided search pattern in the glob call instead of globbing for everything and then manually filtering it later.
Use entry (that is, the complete filename) in the calls to gsub and rename instead of origin. origin is really only useful for the .include? test. Since it's a fragment of a filename, it can't be used with rename. I removed the origin variable entirely to avoid the temptation to misuse it.
For your example folder structure, entering *.jpg, a, and b for the three input prompts (respectively) should rename the files as you are expecting.
I used the accepted answer to fix a bunch of copied files' names.
Dir.glob('./*').sort.each do |entry|
if File.basename(entry).include?(' copy')
newEntry = entry.gsub(' copy', '')
File.rename( entry, newEntry )
end
end
Your problem is that gets returns a newline at the end of the string. So, if you type "foo" then searchPattern becomes "foo\n". The simplest fix is:
searchPattern = gets.chomp
I might rewrite your code slightly:
$stdout.sync = true
print "Enter the file search query: "; search = gets.chomp
print "Enter the target to replace: "; target = gets.chomp
print " Enter the new target name: "; replace = gets.chomp
Dir['*'].each do |file|
# Skip directories
next unless File.file?(file)
old_name = File.basename(file,'.*')
if old_name.include?(search)
# Are you sure you want gsub here, and not sub?
# Don't use `old_name` here, it doesn't have the extension
new_name = File.basename(file).gsub(target,replace)
File.rename( file, new_name )
puts "Renamed #{file} to #{new_name}" if $DEBUG
end
end
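Putting the points from these answers together, here is a self-contained sketch that can be tried safely because it works in a temporary directory. The method name rename_matching is hypothetical, and Dir.children assumes Ruby 2.5 or later.

```ruby
require 'tmpdir'
require 'fileutils'

# Hypothetical helper: rename every regular file in dir whose name
# (without extension) contains target, replacing its first occurrence.
def rename_matching(dir, target, replacement)
  renamed = []
  Dir.children(dir).sort.each do |name|
    old_path = File.join(dir, name)
    next unless File.file?(old_path)           # skip directories
    stem = File.basename(name, '.*')
    next unless stem.include?(target)
    # Substitute in the stem only, so the extension is never touched.
    new_name = stem.sub(target, replacement) + File.extname(name)
    new_path = File.join(dir, new_name)
    next if File.exist?(new_path)              # never overwrite an existing file
    File.rename(old_path, new_path)
    renamed << [name, new_name]
  end
  renamed
end

# Try it on throwaway files: a1.jpg, a2.jpg, a3.jpg become b1.jpg, b2.jpg, b3.jpg.
result = Dir.mktmpdir do |dir|
  FileUtils.touch(%w[a1.jpg a2.jpg a3.jpg].map { |n| File.join(dir, n) })
  rename_matching(dir, 'a', 'b')
  Dir.children(dir).sort
end
puts result.inspect
```

Passing full paths to File.rename means the script does not depend on the current working directory.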
Here's a short version I've used today (without pattern matching)
Save this as rename.rb file and run it inside the command prompt with ruby rename.rb
count = 1
newname = "car"
Dir["/path/to/folder/*"].each do |old|
File.rename(old, newname + count.to_s)
count += 1
end
I had /Copy of _MG_2435.JPG converted into car1, car2, ...
I made a small script to rename the entire DBZ series by season, implemented like this:
count = 1
new_name = "Dragon Ball Z S05E"
format_file = ".mkv"
Dir.glob("dragon ball Z*").each do |old_name|
File.rename(old_name, new_name + count.to_s + format_file)
count += 1
end
The result would be:
Dragon Ball Z S05E1.mkv
Dragon Ball Z S05E2.mkv
Dragon Ball Z S05E3.mkv
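An optional variation on the script above: this sketch only builds the new names (it renames nothing) and zero-pads the episode number so the files sort correctly past episode 9. The input names are made up; in a real run they would come from Dir.glob("dragon ball Z*").sort.

```ruby
# Hypothetical input list standing in for the glob result.
old_names = ["dragon ball Z a.mkv", "dragon ball Z b.mkv", "dragon ball Z c.mkv"]

renames = old_names.each_with_index.map do |old_name, index|
  # %02d pads the episode number to two digits: 1 becomes "01".
  new_name = format("Dragon Ball Z S05E%02d%s", index + 1, File.extname(old_name))
  [old_name, new_name]
end

# Preview the plan before touching any files.
renames.each { |old_name, new_name| puts "#{old_name} -> #{new_name}" }

# To apply it for real:
# renames.each { |old_name, new_name| File.rename(old_name, new_name) }
```

Reusing File.extname keeps whatever extension each file already has instead of hard-coding ".mkv".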
In a folder, I wanted to remove the trailing underscore _ of any audio filename while keeping everything else. Sharing my code here as it might help someone.
What the program does:
Prompts the user for the:
Directory path: c:/your/path/here (make sure to use slashes /, not backslashes, \, and without the final one).
File extension: mp3 (without the dot .)
Trailing characters to remove: _
Looks for any file whose path ends with the given characters plus the extension (for example c:/your/path/here/filename_.mp3) and renames it to c:/your/path/here/filename.mp3, keeping the file's original extension.
puts 'Enter directory path'
path = gets.strip
directory_path = Dir.glob("#{path}/*")
# Get file extension
puts 'Enter file extension'
file_extension = gets.strip
# Get trailing characters to remove
puts 'Enter trailing characters to remove'
trailing_characters = gets.strip
suffix = "#{trailing_characters}.#{file_extension}"
# Rename file if condition is met
directory_path.each do |file_path|
next unless file_path.end_with?(suffix)
File.rename(file_path, "#{file_path.delete_suffix(suffix)}.#{file_extension}")
end

Create regular expression from string

Is there any way to create the regex /func:\[sync\] displayPTS/ from string func:[sync] displayPTS?
The story behind this question is that I have several string patterns to search for in a text file and I don't want to write the same thing again and again.
File.open($f).readlines.reject {|l| not l =~ /"#{string1}"/}
File.open($f).readlines.reject {|l| not l =~ /"#{string2}"/}
Instead, I want to have a function to do the job:
def filter string
#build the reg pattern from string
File.open($f).readlines.reject {|l| not l =~ pattern}
end
filter string1
filter string2
s = "func:[sync] displayPTS"
# => "func:[sync] displayPTS"
r = Regexp.new(s)
# => /func:[sync] displayPTS/
r = Regexp.new(Regexp.escape(s))
# => /func:\[sync\]\ displayPTS/
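Applied to the filter function from the question, that could look like the sketch below. It takes the lines as an argument instead of reading $f, purely so the example runs without a file; the sample lines are invented.

```ruby
# Build a literal-match regex from the string, then keep the matching lines.
def filter(lines, string)
  pattern = Regexp.new(Regexp.escape(string))
  lines.grep(pattern)
end

# Hypothetical log lines for demonstration.
lines = [
  "12:00 func:[sync] displayPTS 100\n",
  "12:01 func:[async] other\n",
  "12:02 funcs displayPTS\n"
]

matches = filter(lines, "func:[sync] displayPTS")
puts matches
```

With a real file, the call would be something like filter(File.readlines(path), string1), where path is the file to search.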
I like Bob's answer, but just to save you some typing:
string = 'func:\[sync] displayPTS'
/#{string}/
If the strings are just strings, you can combine them into one regular expression, like so:
targets = [
"string1",
"string2",
].collect do |s|
Regexp.escape(s)
end.join('|')
target = Regexp.new(targets)
And then:
lines = File.readlines('/tmp/bar').reject do |line|
line !~ target
end
s !~ regexp is equivalent to not s =~ regexp, but easier to read.
Avoid using File.open without closing the file. The file will remain open until the discarded file object is garbage collected, which could be long enough that your program will run out of file handles. If you need to do more than just read the lines, then:
File.open(path) do |file|
# do stuff with file
end
Ruby will close the file at the end of the block.
You might also consider whether using find_all and a positive match would be easier to read than reject and a negative match. The fewer negatives the reader's mind has to go through, the clearer the code:
lines = File.readlines('/tmp/bar').find_all do |line|
line =~ target
end
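A shorter route to the same combined pattern is Regexp.union, which escapes plain strings itself, together with Enumerable#grep for the positive match. This is a sketch with invented sample data, not the answer author's code.

```ruby
# Hypothetical search strings containing regex metacharacters.
strings = ["func:[sync] displayPTS", "a.b"]

# Regexp.union escapes each string and joins the alternatives with '|'.
target = Regexp.union(strings)

lines = ["x func:[sync] displayPTS\n", "a.b here\n", "aXb\n", "funcs\n"]

# grep keeps the elements that match the pattern.
kept = lines.grep(target)
puts kept
```

For a file, lines would come from File.readlines, which opens and closes the file itself.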
How about using %r{}:
my_regex = "func:[sync] displayPTS"
File.open($f).readlines.reject { |l| not l =~ %r{#{Regexp.escape(my_regex)}} }

Cut off the filename and extension of a given string

I built a little script that scans a directory for files of a given file type and stores each location (including the filename) in an array. It looks like this:
def getFiles(directory)
arr = Dir[directory + '/**/*.plt']
arr.each do |k|
puts "#{k}"
end
end
The output is the path and the files. But I want only the path.
Instead of /foo/bar.txt I want only /foo/
My first thought was a regexp but I am not sure how to do that.
Could File.dirname be of any use?
File.dirname(file_name) → dir_name
Returns all components of the filename
given in file_name except the last
one. The filename must be formed using
forward slashes ("/") regardless of
the separator used on the local file
system.
File.dirname("/home/gumby/work/ruby.rb") #=> "/home/gumby/work"
You don't need a regex or split.
File.dirname("/foo/bar/baz.txt")
# => "/foo/bar"
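Applied to the glob from the question, this could be sketched as follows. The method name plt_directories is made up, and the temporary directory exists only so the example can run anywhere.

```ruby
require 'tmpdir'
require 'fileutils'

# Hypothetical helper: the distinct directories that contain .plt files.
def plt_directories(directory)
  Dir[File.join(directory, '**', '*.plt')]
    .map { |file| File.dirname(file) } # drop the file name
    .uniq                              # one entry per directory
    .sort
end

# Demo on a throwaway tree: a/one.plt, a/two.plt, a/b/three.plt
dirs = Dir.mktmpdir do |root|
  FileUtils.mkdir_p(File.join(root, 'a', 'b'))
  FileUtils.touch([
    File.join(root, 'a', 'one.plt'),
    File.join(root, 'a', 'two.plt'),
    File.join(root, 'a', 'b', 'three.plt')
  ])
  # Strip the temp root so the output is readable.
  plt_directories(root).map { |dir| dir.delete_prefix(root) }
end
puts dirs.inspect
```

Note that File.dirname returns the directory without a trailing slash; append "/" if the output needs one.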
The following code should work (tested in the ruby console):
>> path = "/foo/bar/file.txt"
=> "/foo/bar/file.txt"
>> path[0..path.rindex('/')]
=> "/foo/bar/"
rindex finds the index of the last occurrence of substring. Here is the documentation http://docs.huihoo.com/api/ruby/core/1.8.4/classes/String.html#M001461
Good luck!
I would split it into an array by the slashes, then remove the last element (the filename), then join it into a string again.
path = '/foo/bar.txt'
path = path.split '/'
path.pop
path = path.join '/'
# path is now '/foo'
Not sure what language you're in, but here is a regex that matches everything from the last / to the end of the string.
/[^\/]*+$/
It translates to: all characters that are not '/' before the end of the string.
For a regular expression, this should work, since * is greedy:
.*/

Resources