Cut off the filename and extension of a given string - ruby

I built a little script that searches a directory for files of a given type and stores their locations (including the file names) in an array. It looks like this:
def getFiles(directory)
  arr = Dir[directory + '/**/*.plt']
  arr.each do |k|
    puts "#{k}"
  end
end
The output is the full path including the file name, but I want only the directory path.
Instead of /foo/bar.txt I want only /foo/.
My first thought was a regexp, but I am not sure how to write it.

Could File.dirname be of any use?
File.dirname(file_name) → dir_name
Returns all components of the filename given in file_name except the last one. The filename must be formed using forward slashes ("/") regardless of the separator used on the local file system.
File.dirname("/home/gumby/work/ruby.rb") #=> "/home/gumby/work"

You don't need a regex or split.
File.dirname("/foo/bar/baz.txt")
# => "/foo/bar"
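As a sketch of how File.dirname could slot into the question's script (the method name plt_directories is hypothetical). File.dirname returns /foo without a trailing slash, so '/' is appended here only to get the /foo/ form the asker wanted:

```ruby
# Hypothetical variant of the question's getFiles: collect only the directory
# part of each *.plt path instead of the full path.
def plt_directories(directory)
  paths = Dir[File.join(directory, '**', '*.plt')]
  # "/foo/bar.plt" -> "/foo/"; uniq because several files can share a directory
  paths.map { |path| File.dirname(path) + '/' }.uniq
end

# File.dirname is pure string manipulation; the path does not have to exist.
puts File.dirname('/foo/bar.txt')       # prints /foo
puts File.dirname('/foo/bar.txt') + '/' # prints /foo/
```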

The following code should work (tested in the ruby console):
>> path = "/foo/bar/file.txt"
=> "/foo/bar/file.txt"
>> path[0..path.rindex('/')]
=> "/foo/bar/"
rindex finds the index of the last occurrence of the given substring. Here is the documentation: http://docs.huihoo.com/api/ruby/core/1.8.4/classes/String.html#M001461
Good luck!
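A hedged sketch of the rindex approach wrapped in a helper (the name dir_part is hypothetical). String#rindex returns nil when the string has no '/'; depending on the Ruby version, path[0..nil] then either raises or, with 2.6+ endless ranges, returns the whole string, so this version handles that case explicitly and returns an empty string:

```ruby
# Hypothetical helper: everything up to and including the last '/'.
def dir_part(path)
  last_slash = path.rindex('/') # nil when the string contains no '/'
  last_slash ? path[0..last_slash] : ''
end

puts dir_part('/foo/bar/file.txt') # prints /foo/bar/
p dir_part('file.txt')             # prints ""
```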

I would split it into an array by the slashes, then remove the last element (the filename), then join it into a string again.
path = '/foo/bar.txt'
path = path.split '/'
path.pop
path = path.join '/'
# path is now '/foo'
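For comparison, Ruby's File.split performs this split in one call and returns [dirname, basename]. The sketch also shows one edge case where the two differ: for a file directly under the root, split/pop/join yields an empty string, while File.split keeps the "/":

```ruby
# File.split returns [dirname, basename] in one call.
dir, file = File.split('/foo/bar.txt')
puts dir  # prints /foo
puts file # prints bar.txt

# Edge case: a file directly under the root directory.
p '/bar.txt'.split('/')[0...-1].join('/') # prints "" (same result as split/pop/join)
p File.split('/bar.txt').first            # prints "/"
```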

Not sure what language you're in, but here is a regex that matches everything after the last / up to the end of the string:
/[^\/]*+$/
In words: all characters that are not '/' right before the end of the string.
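In Ruby this pattern works as written, since Ruby's regex engine supports the possessive *+. A minimal sketch of using it to get either half of the path; it swaps $ for \z, because $ in Ruby matches at the end of any line rather than only at the end of the string:

```ruby
path = '/foo/bar.txt'

# The regex matches the run of non-'/' characters at the very end of the string,
# i.e. the file name. \z anchors to the end of the string (Ruby's $ means end of line).
puts path[/[^\/]*+\z/]         # prints bar.txt
puts path.sub(/[^\/]*+\z/, '') # prints /foo/
```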

For a regular expression, this should work, since * is greedy:
.*/
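A short Ruby sketch of the greedy pattern. Inside a /.../ literal the slash has to be escaped (/.*\//), so %r{} is used here instead; String#[] returns nil when there is no '/' at all:

```ruby
path = '/foo/bar.txt'

# .* is greedy, so the match extends up to the LAST '/' in the string.
puts path[%r{.*/}]   # prints /foo/
p 'bar.txt'[%r{.*/}] # prints nil (no '/' to match)
```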

Related

How to extract part of the string which comes after given substring?

For example, I have a URL string like:
https://abc.s3-something.amazonaws.com/subfolder/1234/5.html?X-Amz-Credential=abcd12bhhh34-1%2Fs3%2Faws4_request&X-Amz-Date=2016&X-Amz-Expires=3&X-Amz-SignedHeaders=host&X-Amz-Signature=abcd34hhhhbfbbf888ksdskj
From this string I need to extract the number 1234, which comes after subfolder/. I tried gsub but had no luck. Any help would be appreciated.
Suppose your URL is saved in a variable called url.
Then the following should return "1234":
url.match(/subfolder\/(\d*)/)[1]
Explanation:
url.match(/ # call the match function which takes a regex
subfolder\/ # search for the first appearance of the string 'subfolder/'
# note: we must escape the `/` so we don't end the regex early
(\d*) # match any number of digits in a capture group,
/)[1] # close the regex and return the first capture group
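A slightly more defensive variant, assuming a shortened copy of the question's URL: String#[] accepts a regexp plus a capture-group index and returns nil when nothing matches, whereas url.match(...)[1] raises NoMethodError on nil. Using \d+ instead of \d* also avoids returning an empty string when subfolder/ is not followed by digits:

```ruby
# Shortened copy of the question's URL, for illustration only.
url = 'https://abc.s3-something.amazonaws.com/subfolder/1234/5.html?X-Amz-Date=2016'

# String#[](regexp, capture) returns that capture group, or nil if there is no match.
id = url[/subfolder\/(\d+)/, 1]
puts id      # prints 1234
puts id.to_i # prints 1234, now as an Integer

p 'https://example.com/other/99'[/subfolder\/(\d+)/, 1] # prints nil
```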
lwassink has the right idea, but it can be done more simply. If subfolder is always the same:
url = "https://abc.s3-something.amazonaws.com/subfolder/1234/5.html?X-Amz-Credential=abcd12bhhh34-1%2Fs3%2Faws4_request&X-Amz-Date=2016&X-Amz-Expires=3&X-Amz-SignedHeaders=host&X-Amz-Signature=abcd34hhhhbfbbf888ksdskj"
url[/subfolder\/\K\d+/]
# => "1234"
The \K discards the matched text up to that point, so only "1234" is returned.
If you want to get the number after any subfolder, and the domain name is always the same, you might do this instead:
url[%r{amazonaws\.com/[^/]+/\K\d+}]
# => "1234"
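For comparison, a look-behind gives the same result for the fixed prefix (again on a shortened copy of the question's URL). Ruby's look-behind has to be essentially fixed-width, though, so a variable-width prefix like [^/]+ only works with \K, which simply drops what was matched before it:

```ruby
url = 'https://abc.s3-something.amazonaws.com/subfolder/1234/5.html?X-Amz-Date=2016'

# Fixed-width look-behind: the prefix must be present but is not part of the match.
p url[/(?<=subfolder\/)\d+/]          # prints "1234"

# Variable-width prefix ([^/]+): use \K, which discards everything matched before it.
p url[%r{amazonaws\.com/[^/]+/\K\d+}] # prints "1234"
```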
s.split('/')[4]
Add a .to_i at the end if you like.
Or, to key it on a substring like you asked for...
a = s.split '/'
a[a.find_index('subfolder') + 1]
Or, to do it as a one-liner, I suppose you could write:
s.split('/').tap { |a| @i = 1 + a.find_index('subfolder')}[@i]
Or, since I am a damaged individual, I would actually write that:
s.split('/').tap { |a| @i = 1 + (a.find_index 'subfolder')}[@i]
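A multi-line sketch of the same find_index lookup, with a guard for URLs where 'subfolder' is not a path segment (Array#index, an alias of find_index, returns nil in that case, and adding 1 to nil would raise). The URL is a shortened copy of the question's:

```ruby
s = 'https://abc.s3-something.amazonaws.com/subfolder/1234/5.html?X-Amz-Date=2016'

parts = s.split('/')
i = parts.index('subfolder') # nil if 'subfolder' is not a path segment
value = i && parts[i + 1]    # only index when 'subfolder' was found
p value                      # prints "1234"
p value&.to_i                # prints 1234
```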
url = 'http://abc/xyz'
index = url.index('/abc/')
url[index + 5, length_of_string_you_want_to_extract]
Hope that helps!
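A sketch of the same idea with the offset computed from the marker instead of a hard-coded 5, plus String#partition as an alternative that splits around the first occurrence of the marker (the variable name marker is illustrative):

```ruby
url = 'http://abc/xyz'
marker = '/abc/'

index = url.index(marker)                   # 6 here; nil if the marker is absent
p url[(index + marker.length)..-1] if index # prints "xyz"

# String#partition returns [before, marker, after] around the first occurrence.
p url.partition(marker)      # prints ["http:/", "/abc/", "xyz"]
p url.partition(marker).last # prints "xyz" ("" if the marker is absent)
```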

Ruby regex replace some subpattern captures

I have regex for path parsing. Below is the part of regex that repeats multiple times.
dir_pattern = /
\/?
(?<dir> #pattern to catch directory
[^[:cntrl:]\/\n\r]+ #directory name
)
(?=\/) #indistinguishable from file otherwise
/x
Input:
/really/long/absolute/path/to/file.extension
Desired output:
to/really/long/file.extension
I want to cut off some (not all) directories and reorder the remaining ones. How could I achieve that?
Since I'm already using regexes for filtering files needed, I would like to keep using them.
Ok, here is a regex answer based on the new information posted above:
rx = /\/[^\/]+/
# matches a '/' followed by one or more characters that are not '/'.
# This ensures any character like a '.' in a file name or the dot
# in the extension is kept.
path = '/really/long/absolute/path/to/file.extension'
d = path.scan(rx)
# returns an array of all matches ["/really", "/long", "/absolute", "/path", "/to", "/file.extension"]
new_path = [d[4], d[0], d[1], d[-1]].join
# returns "/to/really/long/file.extension"
Let's wrap it in a method:
def short_path(path, keepers)
  rx = /\/[^\/]+/
  d = path.scan(rx)
  new_path = []
  keepers.each do |dir|
    new_path << d[dir]
  end
  new_path << d[-1]
  new_path.join
end
Usage: just pass the method the path and an array of the positions you want to keep, in the new order.
path = '/really/long/absolute/path/to/file.extension'
new_path = short_path(path, [4,0,1])
# returns '/to/really/long/file.extension'
If you need to remove the leading '/' for a relative path, just use:
new_path.sub!(/\//, '')
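A more compact sketch of short_path using Array#values_at; the helper name short_relative_path is hypothetical. Because it scans bare segment names and joins them with '/', the result has no leading slash, which matches the question's desired output without the extra sub!:

```ruby
# Hypothetical variant of short_path: scan out the bare segment names, pick the
# wanted ones with values_at, then append the file name (the last segment).
def short_relative_path(path, keepers)
  segments = path.scan(%r{[^/]+}) # ["really", "long", ..., "file.extension"]
  (segments.values_at(*keepers) << segments.last).join('/')
end

p short_relative_path('/really/long/absolute/path/to/file.extension', [4, 0, 1])
# prints "to/really/long/file.extension"
```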
Old answer using string manipulation without regex...
x = "01234567 capture me!"
puts "#{x[7]}#{x[4]}#{x[2]}"
#=> "742"

Ruby: Get filename without the extensions

How can I get the filename without the extensions? For example, input of "/dir1/dir2/test.html.erb" should return "test".
In my actual code I will be passing in __FILE__ instead of "/dir1/dir2/test.html.erb".
Read the documentation:
basename(file_name [, suffix] ) → base_name
Returns the last component of the filename given in file_name, which
can be formed using both File::SEPARATOR and File::ALT_SEPARATOR as
the separator when File::ALT_SEPARATOR is not nil. If suffix is given
and present at the end of file_name, it is removed.
>> File.basename('public/500.html', '.html')
=> "500"
In your case:
>> File.basename("test.html.erb", ".html.erb")
=> "test"
How about this:
File.basename(f, File.extname(f))
This returns the file name without its extension. With several dots in the name, only the last extension is removed (test.html.erb becomes test.html).
In case you don't know the extension you can combine File.basename with File.extname:
filepath = "dir/dir/filename.extension"
File.basename(filepath, File.extname(filepath)) #=> "filename"
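A quick comparison of these options on the question's example path. Passing '.*' as the suffix to File.basename strips whatever the last extension is, so it behaves like the extname version; removing every extension needs the split:

```ruby
path = '/dir1/dir2/test.html.erb'

p File.extname(path)                      # prints ".erb" (last extension only)
p File.basename(path, File.extname(path)) # prints "test.html"
p File.basename(path, '.*')               # prints "test.html" ('.*' = any extension)
p File.basename(path).split('.').first    # prints "test" (drops every extension)
```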
Pathname provides a convenient object-oriented interface for dealing with file names.
One method lets you replace the existing extension with a new one, and that method accepts the empty string as an argument:
>> Pathname('foo.bar').sub_ext ''
=> #<Pathname:foo>
>> Pathname('foo.bar.baz').sub_ext ''
=> #<Pathname:foo.bar>
>> Pathname('foo').sub_ext ''
=> #<Pathname:foo>
This is a convenient way to get the filename stripped of its extension, if there is one.
But if you want to get rid of all extensions, you can use a regex:
>> "foo.bar.baz".sub(/(?<=.)\..*/, '')
=> "foo"
Note that this only works on bare filenames, not paths like foo.bar/pepe.baz. For that, you might as well use a function:
require 'pathname'

def without_extensions(path)
  p = Pathname(path)
  p.parent / p.basename.sub(
    /
      (?<=.)  # look-behind: ensure some character, e.g., for ‘.foo’
      \.      # literal ‘.’
      .*      # extensions
    /x, '')
end
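An alternative sketch that stays within Pathname: call sub_ext('') repeatedly until the path stops changing, peeling off one extension per pass. Because sub_ext only looks at the final component, dots in directory names such as foo.bar/ are left alone. The helper name strip_all_extensions is hypothetical:

```ruby
require 'pathname'

# Hypothetical helper: remove one extension per pass until none is left.
def strip_all_extensions(path)
  current = Pathname(path)
  loop do
    stripped = current.sub_ext('')
    return stripped if stripped == current # nothing left to strip
    current = stripped
  end
end

p strip_all_extensions('foo.bar/pepe.baz.qux').to_s     # prints "foo.bar/pepe"
p strip_all_extensions('/dir1/dir2/test.html.erb').to_s # prints "/dir1/dir2/test"
p strip_all_extensions('.foo').to_s                     # prints ".foo" (dotfile)
```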
Split by dot and the first part is what you want.
filename = 'test.html.erb'
result = filename.split('.')[0]
Given the premise, the most appropriate answer for this case (and similar cases with other extensions) would be something like this:
__FILE__.split('.')[0...-1].join('.')
This removes only the last extension, not the other parts of the name: myfile.html.erb becomes myfile.html rather than just myfile.
Thanks to @xdazz and @Monk_Code for their ideas. In case others are looking, the final code I'm using is:
File.basename(__FILE__, ".*").split('.')[0]
This generically allows you to remove the full path in the front and the extensions in the back of the file, giving only the name of the file without any dots or slashes.
name = "filename.100.jpg"
puts "#{name.split('.')[0]}"
This isn't a cross-platform solution, but it works on Unix-like systems:
def without_extensions(path)
  lastSlash = path.rindex('/')
  if lastSlash.nil?
    theFile = path
  else
    theFile = path[lastSlash+1..-1]
  end
  # not an easy thing to define
  # what an extension is
  theFile[0...theFile.index('.')]
end
puts without_extensions("test.html.erb")
puts without_extensions("/test.html.erb")
puts without_extensions("a.b/test.html.erb")
puts without_extensions("/a.b/test.html.erb")
puts without_extensions("c.d/a.b/test.html.erb")

How to use gsub to extract and replace in Ruby?

I have a bunch of markdown image paths in several files and I want to change the root directory. The regex for the image tag is this:
/\!\[image\]\((.*?)\)/
I need to be able to grab the group, parse out the filename and give it a new path before returning it to gsub to be substituted out.
For instance, I want to find all strings like this:
![image](/old/path/to/image1.png)
And convert them to:
![image](/new/path/to/image1.png)
I know I can do this in a gsub block, I'm just not very clear how it works.
Here's one way, verbosely for clarity's sake:
markdown = "![image](/old/path/to/image1.png)"
regex = /(\w+\.png)/
match_data = regex.match markdown
p base_name = match_data[1]
#=> "image1.png"
p new_markdown = "![image](/new/path/to/#{base_name})"
#=> "![image](/new/path/to/image1.png)"
More succinctly:
p markdown.gsub(/\/.+\/(\w+\.png)/) { "/new/path/to/#{$1}" }
#=> "![image](/new/path/to/image1.png)"
You can use a regular expression with a positive lookbehind and a positive lookahead to replace only the path between the parentheses in the original String. Here a new_path variable holds the new path, which is simply substituted in using .sub.
img = "![image](/old/path/to/image1.png)"
new_path = '/new/path/to/image1.png'
p img.sub(/(?<=!\[image\]\()[^)]+(?=\))/, new_path)
# => "![image](/new/path/to/image1.png)"
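Since the question asks specifically about the block form of gsub, here is a sketch: inside the block, Regexp.last_match(1) (or $1) refers to the capture of the match currently being replaced, so the file name can be parsed out with File.basename and given a new directory. The sample text and the new_dir value are made up for illustration:

```ruby
markdown = <<~MD
  ![image](/old/path/to/image1.png)
  Some text in between.
  ![image](/old/path/to/image2.png)
MD

new_dir = '/new/path/to' # made-up target directory

updated = markdown.gsub(/!\[image\]\((.*?)\)/) do
  old_path = Regexp.last_match(1) # capture group 1 of the match being replaced
  "![image](#{File.join(new_dir, File.basename(old_path))})"
end

puts updated
# ![image](/new/path/to/image1.png)
# Some text in between.
# ![image](/new/path/to/image2.png)
```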

Removing underscore character from each entry in a list of paths

There is an array of strings
paths = ['foo/bar_baz/_sunny', 'bar/foo_baz/_warm', 'foo/baz/_cold'] # etc.
I need to remove underscore in each last part of path (_sunny => sunny, _warm => warm, _cold => cold)
paths.each do |path|
  path_parts = path.split('/')
  path_parts.last.sub!(/^_/, '')
  puts path_parts.join('/')
end
However, that solution is a bit dirty. I feel it can be done without path.split and path.join. Do you have any ideas?
Thanks in advance
I don't know Ruby, but the pattern
/('[a-zA-Z0-9_\/]*\/)_([a-zA-Z0-9_]*')/g
could be replaced with
'$1$2'
if $x is used in Ruby to reference matching groups, and g is a valid flag. It would need to be applied once to the string, with no splits or joins. (In Ruby, a replacement string references groups as '\1\2', and there is no g flag: gsub already replaces every match.)
Or, more compactly:
paths.map {|p| p.sub(/_(?=[^\/]*$)/,"")}
That is, strip out the first underscore that is followed only by non-slash characters up to the end of the string.
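One caveat with the lookahead version: it removes the first underscore anywhere in the last segment, so a hypothetical path like 'foo/sun_ny' would become 'foo/sunny'. A sketch that only drops a leading underscore of the last segment adds a look-behind for '/':

```ruby
paths = ['foo/bar_baz/_sunny', 'bar/foo_baz/_warm', 'foo/baz/_cold', 'foo/sun_ny']

# (?<=/) requires a '/' right before the underscore, and (?=[^/]*\z) requires
# that no further '/' follows, so only a LEADING '_' of the last segment goes.
cleaned = paths.map { |path| path.sub(%r{(?<=/)_(?=[^/]*\z)}, '') }
p cleaned
# prints ["foo/bar_baz/sunny", "bar/foo_baz/warm", "foo/baz/cold", "foo/sun_ny"]
```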
