Ruby: Windows path conversion

I often use long paths in my scripts, and since I'm on Windows I have to convert them to *nix style, with slashes instead of backslashes. That is not difficult, but it is annoying when I later copy a path to open that folder, because Explorer needs the opposite conversion again.
So I made a function that does the conversion; now I can use Windows paths that I can copy around and still keep Ruby satisfied.
Question: is there a more elegant solution? I don't like the second gsub that handles the double \ at the beginning, and I would also like to handle a \ at the end (currently not possible). The function should be able to handle network UNC paths (\\..) and local drive paths (c:..).
class String
  def path
    self.gsub('\\','/').gsub(/^\//,'//')
  end
end
path = '\\server\share\folder'.path
# the converted path has no trailing slash, so add one before the glob pattern
Dir.glob(path+'/**/*') do |file|
  puts file
end
#=>
#//server/share/folder/file1.txt
#//server/share/folder/file2.txt

The suggestion to use File.join made me try a regular split & join and now i have this version, got rid of the ugly double gsub, now it's longer but can handle an ending slash. Has someone a better version ?
class String
  def to_path(end_slash=false)
    "#{'/' if self[0]=='\\'}#{self.split('\\').join('/')}#{'/' if end_slash}"
  end
end
puts '\\server\share\folder'.to_path(true) #//server/share/folder/
puts 'c:\folder'.to_path #c:/folder
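One possible alternative, sketched as a plain method instead of a String monkey patch: String#tr swaps every backslash in a single pass, and one sub normalizes the UNC prefix. The name to_ruby_path and the end_slash keyword are hypothetical, chosen only for this illustration.

```ruby
# Hypothetical helper, not from the original post.
def to_ruby_path(win_path, end_slash: false)
  path = win_path.tr('\\', '/')        # every backslash becomes a forward slash
  path = path.sub(%r{\A/+}, '//')      # any run of leading slashes becomes the UNC prefix
  path = "#{path}/" if end_slash && !path.end_with?('/')
  path
end

puts to_ruby_path('\\server\share\folder', end_slash: true) # //server/share/folder/
puts to_ruby_path('c:\folder')                              # c:/folder
```

Like the versions above, this treats any path that starts with a backslash as a UNC path, so a drive-relative path such as \folder would need separate handling.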

The portable way to write paths is with Ruby's File.join method. This creates OS-independent paths with the correct path separators.
For UNC paths, this previous answer addresses the creation of a custom File.to_unc method:
def File.to_unc( path, server="localhost", share=nil )
  parts = path.split(File::SEPARATOR)
  parts.shift while parts.first.empty?
  if share
    parts.unshift share
  else
    # Assumes the drive will always be a single letter up front
    parts[0] = "#{parts[0][0,1]}$"
  end
  parts.unshift server
  "\\\\#{parts.join('\\')}"
end
I haven't tried it myself, but it would appear to be the result you're looking for.
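As a rough illustration of the File.join suggestion in this answer, the sketch below builds a local and a UNC path from components and converts one back to backslashes for pasting into Explorer. The server, share and folder names are made up for the example.

```ruby
# File.join puts File::SEPARATOR ("/") between components and does not
# double up separators where two components meet.
local = File.join('c:', 'folder', 'file.txt')
unc   = File.join('//server/share', 'folder/', '/file.txt')

# Going the other way (for Explorer) is a single tr call.
explorer = local.tr('/', '\\')

puts local    # c:/folder/file.txt
puts unc      # //server/share/folder/file.txt
puts explorer # c:\folder\file.txt
```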

Related

Regex in Ruby for a URL that is an image

So I'm working on a crawler to get a bunch of images on a page that are saved as links. The relevant code, at the moment, is:
def parse_html(html)
  html_doc = Nokogiri::HTML(html)
  nodes = html_doc.xpath("//a[@href]")
  nodes.inject([]) do |uris, node|
    uris << node.attr('href').strip
  end.uniq
end
I am currently getting a bunch of links, most of which are images, but not all. I want to narrow down the links with a regex before downloading. So far, I haven't been able to come up with a Ruby-friendly regex for the job. The best I have is:
^https?:\/\/(?:[a-z0-9\-]+\.)+[a-z]{2,6}(?:/[^\/?]+)+\.(?:jpg|gif|png)$.match(nodes)
Admittedly, I got that regex from someone else, and tried to edit it to work and I'm failing. One of the big problems I'm having is the original Regex I took had a few "#"'s in it, which I don't know if that is a character I can escape, or if Ruby is just going to stop reading at that point. Help much appreciated.
I would consider modifying your XPath to include your logic. For example, if you only wanted the a elements that contained an img you can use the following:
"//a[img][@href]"
Or even go further and extract just the URIs directly from the href values:
uris = html_doc.xpath("//a[img]/@href").map(&:value)
As some have said, you may not want to use Regex for this, but if you're determined to:
^http(s?):\/\/.*\.(jpeg|jpg|gif|png)$
Is a pretty simple one that will grab anything beginning with http or https and ending with one of the file extensions listed. You should be able to figure out how to extend this one, Rubular.com is good for experimenting with these.
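A minimal sketch of how such a pattern could be applied to the collected links in Ruby. The constant and method names are hypothetical. On the question about "#": a plain # is an ordinary character in a Ruby regex literal; it only matters when it starts an interpolation such as #{...}, or when the x (extended) flag is set, and it can always be escaped as \#.

```ruby
# %r{...} avoids escaping every "/"; \A and \z anchor the whole string,
# and the trailing i makes the extension check case-insensitive.
IMAGE_URL_RE = %r{\Ahttps?://.*\.(?:jpe?g|gif|png)\z}i

# Keeps only the links that look like http(s) image URLs.
def image_links(uris)
  uris.grep(IMAGE_URL_RE)
end

links = ['http://example.com/a.png', 'http://example.com/page.html']
p image_links(links) # ["http://example.com/a.png"]
```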
Regular expressions are a very powerful tool, but compared to simple string comparisons they are pretty slow.
For your simple example, I would suggest using a simple condition like:
IMAGE_EXTS = %w[gif jpg png]
if IMAGE_EXTS.any? { |ext| uri.end_with?(ext) }
  # ...
end
In the context of your question, you might want to change your method to:
IMAGE_EXTS = %w[gif jpg png]
def parse_html(html)
  uris = []
  Nokogiri::HTML(html).xpath("//a[@href]").each do |node|
    uri = node.attr('href').strip
    uris << uri if IMAGE_EXTS.any? { |ext| uri.end_with?(ext) }
  end
  uris.uniq
end
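A plain end_with? check misses links that carry a query string or an upper-case extension. One way around that, sketched here with only the standard library, is to compare the extension of the URI's path component. The names IMAGE_EXTENSIONS and image_uri? are assumptions made for this example.

```ruby
require 'uri'

IMAGE_EXTENSIONS = %w[.gif .jpg .jpeg .png].freeze

# True when the path component of the URI ends in a known image extension.
def image_uri?(uri)
  path = URI.parse(uri).path.to_s # path is nil for opaque URIs such as mailto:
  IMAGE_EXTENSIONS.include?(File.extname(path).downcase)
rescue URI::InvalidURIError
  false # unparseable links are simply skipped
end

p image_uri?('http://example.com/a/b.PNG?size=2') # true
p image_uri?('http://example.com/index.html')     # false
```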

How to check for multiple words inside a folder

I have words in a text file called words.txt, and I need to check whether any of those words appear in my Source folder, which also contains sub-folders and files.
I was able to get all of the words into an array using this code:
array_of_words = []
File.readlines('words.txt').map do |word|
  array_of_words << word
end
And I also have (kinda) figured out how to search through the whole Source folder including the sub-folders and sub-files for a specific word using:
Dir['Source/**/*'].select{|f| File.file?(f) }.each do |filepath|
  puts filepath
  puts File.readlines(filepath).any?{ |l| l['api'] }
end
Instead of searching for one word like api, I want to search the Source folder for the whole array of words (if that is possible).
Consider this:
File.readlines('words.txt').map do |word|
  array_of_words << word
end
will read the entire file into memory, then convert it into individual elements in an array. You could accomplish the same thing using:
array_of_words = File.readlines('words.txt')
A potential problem is that it's not scalable. If "words.txt" is larger than the available memory, your code will have problems, so be careful.
Searching a file for an array of words can be done a number of ways, but I've always found it easiest to use a regular expression. Perl has a great module called Regexp::Assemble that makes it easy to convert a list of words into a very efficient pattern, but Ruby is missing that sort of functionality. See "Is there an efficient way to perform hundreds of text substitutions in Ruby?" for one solution I put together in the past to help with that.
Ruby does have Regexp.union; however, it's only a partial help.
words = %w(foo bar)
re = Regexp.union(words) # => /foo|bar/
The pattern generated has flags for the expression so you have to be careful with interpolating it into another pattern:
/#{re}/ # => /(?-mix:foo|bar)/
(?-mix: will cause you problems so don't do that. Instead use:
/#{re.source}/ # => /foo|bar/
which will generate the pattern and behave like we expect.
Unfortunately, that's not a complete solution either, because the words could be found as sub-strings in other words:
'foolish'[/#{re.source}/] # => "foo"
The way to work around that is to set word-boundaries around the pattern:
/\b(?:#{re.source})\b/ # => /\b(?:foo|bar)\b/
which then looks for whole words only:
'foolish'[/\b(?:#{re.source})\b/] # => nil
More information is available in Ruby's Regexp documentation.
Once you have a pattern you want to use, searching becomes a simpler matter. Ruby has the Find module, which makes it easy to recursively search directories for files. The documentation covers how to use it.
Alternately, you can cobble your own method using the Dir class. Again, it has examples in the documentation to use it, but I usually go with Find.
When reading the files you're scanning I'd recommend using foreach to read the files line-by-line. File.read and File.readlines are not scalable and can make your program behave erratically as Ruby tries to read a big file into memory. Instead, foreach will result in very scalable code that runs more quickly. See "Why is "slurping" a file not a good practice?" for more information.
Using the links above you should be able to put something together quickly that'll run efficiently and be flexible.
This untested code should get you started:
WORD_ARRAY = File.readlines('words.txt').map(&:chomp)
# the closing \b sits outside the group so that it applies to every word
WORD_RE = /\b(?:#{Regexp.union(WORD_ARRAY).source})\b/
Dir['Source/**/*'].select{|f| File.file?(f) }.each do |filepath|
  puts "#{filepath}: #{!!File.read(filepath)[WORD_RE]}"
end
It will output the file it's reading, and "true" or "false" whether there is a hit finding one of the words in the list.
It's not scalable because of readlines and read and could suffer serious slowdown if any of the files are huge. Again, see the caveats in the "slurp" link above.
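A more scalable variant along the lines recommended above could combine Find with File.foreach, so that each file is read one line at a time and scanning stops at the first hit. This is only a sketch: the method names are hypothetical, and scrub is used on the assumption that some files may contain bytes that are invalid in the default encoding.

```ruby
require 'find'

# Whole-word pattern for a list of words; Regexp.union escapes each word.
def word_pattern(words)
  /\b(?:#{Regexp.union(words).source})\b/
end

# Reads line by line and stops at the first matching line.
def file_contains_word?(filepath, pattern)
  File.foreach(filepath).any? { |line| line.scrub.match?(pattern) }
end

# Returns a hash of file path => true/false for every file under root.
def scan_directory(root, pattern)
  results = {}
  Find.find(root) do |path|
    next unless File.file?(path)
    results[path] = file_contains_word?(path, pattern)
  end
  results
end
```

A call such as scan_directory('Source', word_pattern(File.readlines('words.txt').map(&:chomp))) would then produce the same true/false report as the code above.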
This recursively searches the directory for any of the words contained in words.txt:
re = /#{File.readlines('words.txt').map { |word| Regexp.quote(word.strip) }.join('|')}/
Dir['Source/**/*.{cpp,txt,html}'].select{|f| File.file?(f) }.each do |filepath|
  puts filepath
  # the second positional argument of readlines is the line separator, so the encoding is passed as a keyword
  puts File.readlines(filepath, encoding: 'ascii').grep(re).any?
end

`File.exist?` won't find file that exists

I'm trying to use File.exist? to have a dynamic background. When I try to point File.exist? to a specific file it doesn't seem to find it.
image_path = "../assets/" + city_code + ".jpg"
if
File.exist?('#{image_path}')
  @image = image_path
else
  @image = "../assets/coastbanner.jpg"
end
If I replace city_code with a path that I know exists, it still won't find it. I must be missing something small.
I wanted to be a little more clear: If I remove the interpolation/variable and replace it with a path that I know works, '../assets/coastbanner.jpg' for example, it still will not work. In fact, no valid path seems to get it to yield a true response.
There are some issues with your code:
String interpolation works only with double quotes.
There is no need for string interpolation when the variable is already a string.
The condition belongs on the same line as the if.
Furthermore, the folder in which the server is running might not be the same folder that the current web page is served from. That said, it might make sense to use an absolute path for image links and a path that includes Rails.root for the File.exist? test.
I am not sure about your application's folder structure, but something like the following should work:
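image_path = "/assets/#{city_code}.jpg"
# File.join is used here because Pathname#join would discard Rails.root when given the absolute image_path
if File.exist?(File.join(Rails.root.to_s, 'app', image_path))
  @image = image_path
else
  @image = '/assets/coastbanner.jpg'
end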
Tip: Test with coastbanner.jpg first to get the image path correct, and then with File.exist? to get the path within the app right.
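The underlying point is that a relative path given to File.exist? is resolved against the process's working directory (Dir.pwd), not against the page being served. A framework-independent sketch that takes the assets directory explicitly might look like this; image_for and its parameters are hypothetical names, and in a Rails app the directory would be built from Rails.root.

```ruby
# Returns the web path of the city image if the file exists on disk,
# otherwise the web path of the fallback banner.
def image_for(city_code, assets_dir, fallback = 'coastbanner.jpg')
  candidate = File.join(assets_dir, "#{city_code}.jpg")
  name = File.exist?(candidate) ? "#{city_code}.jpg" : fallback
  "/assets/#{name}"
end

# To see where a relative path really points, expand it against Dir.pwd:
puts File.expand_path('../assets/coastbanner.jpg')
```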
You need to use double quotes for the string interpolation to work.
File.exist?("#{image_path}")
In this case, you could actually just use the variable since it already is the string you want.
File.exist?(image_path)
If you want to use the #{} feature, a good place to use it is when creating image_path.
image_path = "../assets/#{city_code}.jpg"
if
File.exist?( image_path )
I don't think you can use #{} in a single-quoted string. Try double-quoting it.
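The difference is easy to check in irb. In this small sketch the value 'nyc' is just a made-up city code:

```ruby
city_code = 'nyc'

single = '#{city_code}.jpg' # single quotes: no interpolation, the text stays literal
double = "#{city_code}.jpg" # double quotes: the variable's value is inserted

puts single # prints #{city_code}.jpg
puts double # prints nyc.jpg
```

So File.exist?('#{image_path}') looks for a file whose name is literally #{image_path}, which explains why it never returns true.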
I assume you are using this within a Rails application, and if so, the most probable cause is the following:
When Rails compiles assets it adds a digest to the file names. Because of this, your image name might contain a digest string. So I suggest using image_path or asset_path.
Ex: File.exist?(image_path("#{city_code}.jpg"))
Having said that, the proper way to handle this kind of case would be to keep a flag in the database that says whether to use the custom image, because that is much faster than accessing the file system.

Capturing parts of an absolute filepath in Ruby

I'm writing a class that parses a filename. I've got 3 questions:
The regex
Given hello/this/is/my/page.html I want to capture three parts:
The parent folders: hello/this/is/my
The filename itself: page
The extension: .html
This is the regex: /^((?:[^\/]+\/)*)(\w+)(\.\w+)$/
The problem is that when I try this (using Rubular) with a relative path such as page.html, it all gets captured into the first capturing group.
Can someone suggest a regex that works correctly for both relative and absolute filepaths?
The class
Would this class be ok?
class RegexFilenameHelper
  filenameRegex = /^((?:[^\/]+\/)*)(\w+)(\.\w+)$/
  def self.getParentFolders(filePath)
    matchData = filenameRegex.match(filePath)
    return matchData[1]
  end
  def self.getFileName(filePath)
    # ...
  end
  def self.getFileExtension(filePath)
    # ...
  end
end
I understand that it's inefficient to call .match for every function, but I don't intend to use all three functions sequentially.
I also intend to call the class itself, and not instantiate an object.
An aside
Assuming this is important: would you rather capture .html or html, and why?
Using the standard library:
As Tim Pietzcker suggested, the functionality is already implemented in the Pathname and File classes.
filepath = "hello/this/is/my/page.html"
Getting the parents: File.dirname(filepath) => "hello/this/is/my"
Getting the name: File.basename(filepath) => "page.html"
without extension: File.basename(filepath, File.extname(filepath)) => "page"
Getting the extension: File.extname(filepath) => ".html"
We call class methods without having to instantiate any class, which is exactly what I wanted.
It's not necessary for the file or folders to actually exist in the file system!
Thanks to Tim Pietzcker for letting me know!
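The same three parts can also be obtained through Pathname, which is mentioned above alongside File. A minimal sketch, with split_path as a hypothetical helper name:

```ruby
require 'pathname'

# Returns [parent folders, file name without extension, extension].
def split_path(filepath)
  path = Pathname.new(filepath)
  ext  = path.extname
  [path.dirname.to_s, path.basename(ext).to_s, ext]
end

p split_path('hello/this/is/my/page.html') # ["hello/this/is/my", "page", ".html"]
p split_path('page.html')                  # [".", "page", ".html"]
```

Note that a bare file name yields "." as its parent, which is also what File.dirname returns.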
Using regex:
If I had wanted to do it with regex, the correct regex would be ((?:^.*\/)?)([^\/]+)(\..*$).
((?:^.*\/)?): Captures everything before the last /, or nothing (that's what the last ? is for). This is the parent path, which is optional.
([^\/]+): Gets everything that's not /, which is the filename.
(\..*$): Captures everything coming after the last ., including it.
I tried this in Rubular and it worked like a charm, but I'm still not sure if the second capturing group is too broad, so be careful if you use this!
Thanks to user230910 for helping me get there! :)
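For anyone who wants to check the regex outside Rubular, here is a small sketch of how it behaves in Ruby. PATH_RE and regex_parts are hypothetical names, and the last call shows one case where the pattern is indeed too broad: a dot in a folder name combined with a file that has no extension.

```ruby
# Same pattern as above, written with %r{} so the slashes need no escaping.
PATH_RE = %r{((?:^.*/)?)([^/]+)(\..*$)}

# Returns the three captures, or nil when the path does not match.
def regex_parts(filepath)
  match = PATH_RE.match(filepath)
  match && match.captures
end

p regex_parts('hello/this/is/my/page.html') # ["hello/this/is/my/", "page", ".html"]
p regex_parts('page.html')                  # ["", "page", ".html"]
p regex_parts('my.dir/README')              # ["", "my", ".dir/README"]
```

Unlike File.dirname, the first group keeps the trailing slash.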

Get full path in Sinatra route including everything after question mark

I have the following path:
http://192.168.56.10:4567/browse/foo/bar?x=100&y=200
I want absolutely everything that comes after "http://192.168.56.10:4567/browse/" in a string.
Using a splat doesn't work (only catches "foo/bar"):
get '/browse/*' do
Neither does the regular expression (also only catches "foo/bar"):
get %r{/browse/(.*)} do
The x and y params are all accessible in the params hash, but doing a .map on the ones I want seems unreasonable and un-ruby-like (also, this is just an example.. my params are actually very dynamic and numerous). Is there a better way to do this?
More info: my path looks this way because it is communicating with an API and I use the route to determine the API call I will make. I need the string to look this way.
If you are willing to ignore the hash fragment in the URL, this should work (by the way, the browser does not send anything after the # to the server):
updated answer
get "/browse/*" do
  p "#{request.path}?#{request.query_string}".split("browse/")[1]
end
Or even simpler
request.fullpath.split("browse/")[1]
get "/browse/*" do
  # params[:splat] is an array, so take its first element
  a = "#{params[:splat].first}?#{request.query_string}"
  "Got #{a}"
end
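Outside of Sinatra, the same idea can be tried with the standard library alone. The sketch below rebuilds the path plus query string from a full URL and strips the prefix with delete_prefix (Ruby 2.5+), which, unlike split("browse/")[1], does not cut the result short if "browse/" shows up again later in the path. The helper name after_prefix is an assumption for this example.

```ruby
require 'uri'

# Returns everything after the prefix, including the query string if present.
def after_prefix(url, prefix = '/browse/')
  uri = URI.parse(url)
  fullpath = uri.query ? "#{uri.path}?#{uri.query}" : uri.path
  fullpath.delete_prefix(prefix)
end

puts after_prefix('http://192.168.56.10:4567/browse/foo/bar?x=100&y=200')
# foo/bar?x=100&y=200
```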
