How to use gsub to extract and replace in Ruby?

I have a bunch of markdown image paths in several files and I want to change the root directory. The regex for the image tag is this:
/\!\[image\]\((.*?)\)/
I need to be able to grab the group, parse out the filename and give it a new path before returning it to gsub to be substituted out.
For instance, I want to find all strings like this:
![image](/old/path/to/image1.png)
And convert them to:
![image](/new/path/to/image1.png)
I know I can do this in a gsub block, I'm just not very clear how it works.

Here's one way, verbosely for clarity's sake:
markdown = "![image](/old/path/to/image1.png)"
regex = /(\w+\.png)/
match_data = regex.match markdown
p base_name = match_data[1]
#=> "image1.png"
p new_markdown = "![image](/new/path/to/#{base_name})"
#=> "![image](/new/path/to/image1.png)"
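More succinctly, using the \1 backreference in the replacement (a "#{$1}" there would be interpolated before gsub runs, so it would not see the current match):
p markdown.gsub(/\/.+\/(\w+\.png)/, '/new/path/to/\1')
#=> "![image](/new/path/to/image1.png)"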

You can use a regular expression with a positive lookbehind and a positive lookahead to replace only the path between the parentheses of the original string. Here the new_path variable holds the new path, which is substituted in with .sub.
img = "![image](/old/path/to/image1.png)"
new_path = '/new/path/to/image1.png'
p img.sub(/(?<=!\[image\]\()[^)]+(?=\))/, new_path)
# => "![image](/new/path/to/image1.png)"
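Since the question asks specifically about the block form, here is a minimal sketch of it. gsub yields each match to the block and substitutes whatever the block returns; the capture group is read with Regexp.last_match(1), and File.basename extracts the filename. The method name rewrite_image_paths and the new_root parameter are illustrative and not taken from the posts above.

```ruby
# Hypothetical helper (name and parameters are assumptions for illustration).
# Rewrites every ![image](...) tag so the file keeps its name but moves to new_root.
def rewrite_image_paths(markdown, new_root)
  markdown.gsub(/!\[image\]\((.*?)\)/) do
    old_path = Regexp.last_match(1)          # text captured by (.*?)
    file_name = File.basename(old_path)      # "image1.png"
    "![image](#{File.join(new_root, file_name)})"
  end
end

text = "Intro ![image](/old/path/to/image1.png) and ![image](/old/other/image2.jpg)"
puts rewrite_image_paths(text, "/new/path/to")
# prints: Intro ![image](/new/path/to/image1.png) and ![image](/new/path/to/image2.jpg)
```

The block runs once per match, so it also handles files with several image tags.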

Related

Combination of strings, other variables and regexp in variable declaration

I'm trying to move files to other directories with FileUtils.mv. I'm trying to define a variable called name_convention, which is a mix of strings, other variables and I also want to include a regexp, where I'm failing. My code so far:
#these are my other variables already declared from an array
season = array[11..13]
episode = array[15..17]
#and this is my 'name_convention' variable
name_convention = "friends" + season + episode + "bluray.mkv"
Up to here, everything is working fine. Except that between friends and season, there can be either a . or a _. For example:
friends_s01e01_bluray.mkv
friends.s01e01.bluray.mkv
I tried to use a regexp, like /(\.|-)/, but I got the error: no implicit conversion of regex into string ruby
How can I provide the two options to my name_convention variable, so that it can be applied to both filenames?
You're trying to interpolate a regex into a string, but you need to do the opposite - interpolate the strings into the regex:
season = "s01"
episode = "e01"
regex = /friends[._]#{Regexp.escape(season)}#{Regexp.escape(episode)}[._]bluray\.mkv/
regex.match "friends_s01e01_bluray.mkv"
# => MatchData
regex.match "friends.s01e01_bluray.mkv"
# => MatchData
regex.match "friends-s01e01_bluray.mkv"
# => nil
For this particular example (s01 and e01) you don't need the Regexp.escape but it's a good idea to include it just in case.
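As a rough sketch of how such a pattern could be used to pick out the files to move, the snippet below filters a list of filenames with Array#grep. The helper name episode_pattern and the sample filenames are assumptions for illustration; the pattern is anchored with \A and \z so that only whole filenames match.

```ruby
# Hypothetical helper: builds an anchored pattern for one season/episode pair.
def episode_pattern(season, episode)
  /\Afriends[._]#{Regexp.escape(season)}#{Regexp.escape(episode)}[._]bluray\.mkv\z/
end

# Sample filenames (made up for this example).
files = [
  "friends_s01e01_bluray.mkv",
  "friends.s01e01.bluray.mkv",
  "friends-s01e01-bluray.mkv",
  "friends_s01e02_bluray.mkv"
]

# grep keeps the elements that match the pattern.
p files.grep(episode_pattern("s01", "e01"))
# => ["friends_s01e01_bluray.mkv", "friends.s01e01.bluray.mkv"]
```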
If you're looking for a quick and dirty sNNeNN parser, try this:
def parse_episode(str)
  m = str.match(/\A(.*?)[\-\_\.]?(s\d+)(e\d+)[\-\_\.]?(.*)\z/i)
  # If matched, strip out the first entry which is the complete match
  m&.to_a&.drop(1)
end
Where this produces results like:
parse_episode('snowpiercer-s01e01-stream')
# => ["snowpiercer", "s01", "e01", "stream"]
parse_episode('s01')
# => nil
parse_episode('wilford')
# => nil
parse_episode('simpsons_S04E12_monorail')
# => ["simpsons", "S04", "E12", "monorail"]
parse_episode('simpsons.S04E12')
# => ["simpsons", "S04", "E12", ""]

Use ARGV[] argument vector to pass a regular expression in Ruby

I am trying to use gsub or sub on a regex passed through terminal to ARGV[].
Query in terminal: $ruby script.rb input.json "\[\{\"src\"\:\"
Input file first 2 lines:
[{
"src":"http://something.com",
"label":"FOO.jpg","name":"FOO",
"srcName":"FOO.jpg"
}]
[{
"src":"http://something123.com",
"label":"FOO123.jpg",
"name":"FOO123",
"srcName":"FOO123.jpg"
}]
script.rb:
dir = File.dirname(ARGV[0])
output = File.new(dir + "/output_" + Time.now.strftime("%H_%M_%S") + ".json", "w")
open(ARGV[0]).each do |x|
x = x.sub(ARGV[1], '')
output.puts(x) if !x.nil?
end
output.close
This is very basic stuff really, but I am not quite sure on how to do this. I tried:
Regexp.escape with this pattern: [{"src":".
Escaping the characters and not escaping.
Wrapping the pattern between quotes and not wrapping.
Meditate on this:
I wrote a little script containing:
puts ARGV[0].class
puts ARGV[1].class
and saved it to disk, then ran it using:
ruby ~/Desktop/tests/test.rb foo /abc/
which returned:
String
String
The documentation says:
The pattern is typically a Regexp; if given as a String, any regular expression metacharacters it contains will be interpreted literally, e.g. '\d' will match a backslash followed by ‘d’, instead of a digit.
That means that what you pass in, though it looks like a regular expression, isn't one. It's a string, because ARGV can only return strings: the command line can only contain strings.
When we pass a string into sub, Ruby sees that it's not a regular expression and treats it as a literal string. Here's the difference in action:
'foo'.sub('/o/', '') # => "foo"
'foo'.sub(/o/, '') # => "fo"
The first can't find "/o/" in "foo" so nothing changes. The second can find /o/, though, and returns the result after removing the first "o" (sub replaces only the first match; gsub would replace both).
Another way of looking at it is:
'foo'.match('/o/') # => nil
'foo'.match(/o/) # => #<MatchData "o">
where match finds nothing for the string but can find a hit for /o/.
And all that leads to what's happening in your code. Because sub is being passed a string, it's trying to do a literal match for the regex, and won't be able to find it. You need to change the code to:
sub(Regexp.new(ARGV[1]), '')
but that's not all that has to change. Regexp.new(...) will convert what's passed in into a regular expression, but if you're passing in '/o/' the resulting regular expression will be:
Regexp.new('/o/') # => /\/o\//
which is probably not what you want:
'foo'.match(/\/o\//) # => nil
Instead you want:
Regexp.new('o') # => /o/
'foo'.match(/o/) # => #<MatchData "o">
So, besides changing your code, you'll need to make sure that what you pass in is a valid expression, minus any leading and trailing /.
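One way to make the script tolerant of both forms is to strip a surrounding pair of slashes before calling Regexp.new. This is only a sketch: pattern_from_arg is a hypothetical helper name, and it assumes that an argument wrapped in /.../ was meant as a regex literal rather than a pattern that really starts and ends with a slash.

```ruby
# Hypothetical helper: turns a command-line string into a Regexp.
# If the string looks like /.../, the surrounding slashes are dropped first.
def pattern_from_arg(arg)
  m = arg.match(%r{\A/(.*)/\z}m)
  Regexp.new(m ? m[1] : arg)
end

p pattern_from_arg('/o/')                  # => /o/
p pattern_from_arg('o')                    # => /o/
p 'foo'.sub(pattern_from_arg('/o/'), '')   # => "fo"
```

Note that Regexp.new raises a RegexpError when the string is not a valid pattern, so an unescaped argument such as [{"src":" still has to be escaped by the caller.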
Based on this answer in the thread Convert a string to regular expression ruby, you should use
x = x.sub(/#{ARGV[1]}/,'')
I tested it with this file (test.rb):
puts "You should not see any number [0123456789].".gsub(/#{ARGV[0]}/,'')
I called the file like so:
ruby test.rb "\d+"
# => You should not see any number [].

Ruby regex replace some subpattern captures

I have regex for path parsing. Below is the part of regex that repeats multiple times.
dir_pattern = /
\/?
(?<dir> #pattern to catch directory
[^[:cntrl:]\/\n\r]+ #directory name
)
(?=\/) #indistinguishable from file otherwise
/x
Input:
/really/long/absolute/path/to/file.extension
Desired output:
to/really/long/file.extension
I want to cut off some (not all directories) and reorder remaining ones. How could I achieve that?
Since I'm already using regexes for filtering files needed, I would like to keep using them.
Ok, here is a regex answer based on the new information posted above:
rx = /\/[^\/]+/i
# matches a '/' followed by one or more characters that are not a '/'
# this ensures any character like a '.' in a file name or the dot
# in the extension is kept.
path = '/really/long/absolute/path/to/file.extension'
d = path.scan(rx)
# returns an array of all matches ["/really", "/long", "/absolute", "/path", "/to", "/file.extension"]
new_path = [d[4], d[0], d[1], d[-1]].join
# returns "/to/really/long/file.extension"
Let's wrap it in a method:
def short_path(path, keepers)
  rx = /\/[^\/]+/i
  d = path.scan(rx)
  new_path = []
  keepers.each do |dir|
    new_path << d[dir]
  end
  new_path << d[-1]
  new_path.join
end
Usage: just pass the method the path and an array of the positions you want to keep, in the new order.
path = '/really/long/absolute/path/to/file.extension'
new_path = short_path(path, [4,0,1])
# returns '/to/really/long/file.extension'
If you need to remove the first '/' for a relative path just:
new_path.sub!(/\//, '')
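The same idea can be written more compactly with Array#values_at, which picks several indices in one call. This is a sketch of an equivalent, not the answerer's code; the name short_path_compact is made up to keep it apart from the method above.

```ruby
# Hypothetical compact variant: keeps the segments at the given positions,
# in the given order, and always appends the last segment (the file name).
def short_path_compact(path, keepers)
  path.scan(%r{/[^/]+}).values_at(*keepers, -1).join
end

path = '/really/long/absolute/path/to/file.extension'
p short_path_compact(path, [4, 0, 1])
# => "/to/really/long/file.extension"

# String#delete_prefix (Ruby 2.5+) drops the leading slash for a relative path.
p short_path_compact(path, [4, 0, 1]).delete_prefix('/')
# => "to/really/long/file.extension"
```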
Old answer using string manipulation without regex...
x = "01234567 capture me!"
puts "#{x[7]}#{x[4]}#{x[2]}"
#=> "742"

Create regular expression from string

Is there any way to create the regex /func:\[sync\] displayPTS/ from string func:[sync] displayPTS?
The story behind this question is that I have several string patterns to search for in a text file and I don't want to write the same thing again and again.
File.open($f).readlines.reject {|l| not l =~ /"#{string1}"/}
File.open($f).readlines.reject {|l| not l =~ /"#{string2}"/}
Instead, I want to have a function to do the job:
def filter string
#build the reg pattern from string
File.open($f).readlines.reject {|l| not l =~ pattern}
end
filter string1
filter string2
s = "func:[sync] displayPTS"
# => "func:[sync] displayPTS"
r = Regexp.new(s)
# => /func:[sync] displayPTS/
r = Regexp.new(Regexp.escape(s))
# => /func:\[sync\]\ displayPTS/
I like Bob's answer, but just to save the time on your keyboard:
string = 'func:\[sync\] displayPTS'
/#{string}/
If the strings are just strings, you can combine them into one regular expression, like so:
targets = [
  "string1",
  "string2",
].collect do |s|
  Regexp.escape(s)
end.join('|')
target = Regexp.new(targets)
And then:
lines = File.readlines('/tmp/bar').reject do |line|
line !~ target
end
s !~ regexp is equivalent to not s =~ regexp, but easier to read.
Avoid using File.open without closing the file. The file will remain open until the discarded file object is garbage collected, which could be long enough that your program will run out of file handles. If you need to do more than just read the lines, then:
File.open(path) do |file|
# do stuff with file
end
Ruby will close the file at the end of the block.
You might also consider whether using find_all and a positive match would be easier to read than reject and a negative match. The fewer negatives the reader's mind has to go through, the clearer the code:
lines = File.readlines('/tmp/bar').find_all do |line|
line =~ target
end
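Putting these pieces together, the filter function the question asks for might look like the sketch below. It assumes the path is passed in explicitly instead of read from the global $f, and it uses Regexp.escape so that characters such as [ and ] are matched literally. File.foreach reads the file line by line and closes it when it reaches the end.

```ruby
require 'tempfile'

# Hypothetical helper: returns the lines of the file at `path` that contain
# `string` literally. The explicit `path` parameter is an assumption.
def filter(path, string)
  pattern = Regexp.new(Regexp.escape(string))
  File.foreach(path).grep(pattern)
end

# Small demonstration using a temporary file with made-up content.
Tempfile.create('log') do |file|
  file.puts 'func:[sync] displayPTS 1234'
  file.puts 'func:[async] something else'
  file.flush
  p filter(file.path, 'func:[sync] displayPTS')
  # => ["func:[sync] displayPTS 1234\n"]
end
```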
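How about using %r{} (with Regexp.escape, so that the brackets are matched literally instead of being read as a character class):
my_regex = "func:[sync] displayPTS"
File.open($f).readlines.reject { |l| not l =~ %r{#{Regexp.escape(my_regex)}} }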

Cut off the filename and extension of a given string

I built a little script that searches a directory for files of a given file type and stores their locations (including the filenames) in an array. It looks like this:
def getFiles(directory)
  arr = Dir[directory + '/**/*.plt']
  arr.each do |k|
    puts "#{k}"
  end
end
The output is the path and the files. But I want only the path.
Instead of /foo/bar.txt I want only the /foo/
My first thought was a regexp but I am not sure how to do that.
Could File.dirname be of any use?
File.dirname(file_name) → dir_name
Returns all components of the filename given in file_name except the last one. The filename must be formed using forward slashes ("/") regardless of the separator used on the local file system.
File.dirname("/home/gumby/work/ruby.rb") #=> "/home/gumby/work"
You don't need a regex or split.
File.dirname("/foo/bar/baz.txt")
# => "/foo/bar"
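Applied to the array from the question, that could look like the sketch below. The helper name directories_of and the sample paths are assumptions; uniq is there because several files usually share a directory.

```ruby
# Hypothetical helper: maps file paths to their directories, without duplicates.
def directories_of(paths)
  paths.map { |path| File.dirname(path) }.uniq
end

# Sample paths (made up); in the original script they would come from
# Dir[directory + '/**/*.plt'].
paths = ['/foo/bar.plt', '/foo/baz/qux.plt', '/foo/other.plt']

p directories_of(paths)
# => ["/foo", "/foo/baz"]
```

File.dirname returns the directory without a trailing slash, so append one yourself if you need "/foo/" instead of "/foo".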
The following code should work (tested in the ruby console):
>> path = "/foo/bar/file.txt"
=> "/foo/bar/file.txt"
>> path[0..path.rindex('/')]
=> "/foo/bar/"
rindex finds the index of the last occurrence of substring. Here is the documentation http://docs.huihoo.com/api/ruby/core/1.8.4/classes/String.html#M001461
Good luck!
I would split it into an array by the slashes, then remove the last element (the filename), then join it into a string again.
path = '/foo/bar.txt'
path = path.split '/'
path.pop
path = path.join '/'
# path is now '/foo'
Not sure what language you're in, but here is a regex that matches everything after the last / up to the end of the string.
/[^\/]*+$/
It translates to: all characters that are not '/' before the end of the string.
For a regular expression, this should work, since * is greedy:
.*/
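In Ruby, the two regular expressions above could be applied as in the following sketch (the sample path is an assumption). String#[] with a regex returns the matched part, and sub removes the trailing file name.

```ruby
path = '/foo/bar.txt'

# Greedy .*/ matches everything up to and including the last slash.
p path[%r{.*/}]
# => "/foo/"

# Alternatively, delete the run of non-slash characters at the end.
p path.sub(%r{[^/]*\z}, '')
# => "/foo/"
```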

Resources