Is there a gsub alternative for ruby that can run a method everytime a match is made? - ruby

I need to run a script that will rewrite the folder path of a html file, there will be many matches, and the replacement string needs to be computed, something like
"Html string".gsub( /images/([a-zA-Z0-9\-]+)/, "/images/#{replacement_method($1)}/" )
only problem is gsub, at least to my knowledge, will only run the replacement_method() once, and I need it to run every time since the desired replacement string changes occurring to the folder string.
Is there a way to make this work with gsub? something like the replace function in wordpress?
Any other realistic approaches?

You have to use a block:
"Html string".gsub( /images/(folder)/) { |match| "/images/#{replacement_method(match)}/" }
The block will be called for each match in the string.

Related

How do I filter file names out of a SQLite dump?

I'm trying to filter out all file names from an SQLite text dump using Ruby. I'm not very handy/familiar with regex and need a way to read, and write to a file, another dump of image files that are within the SQLite dump. I can filter out everything except stuff like this:
VALUES(3,5,1,43,'/images/e/e5/Folder%2FOrders%2FFinding_Orders%2FView_orders3.JPG','1415',NULL);
and this:
src="/images/9/94/folder%2FGraph.JPG"
I can't figure out the easiest way to filter through this. I've tried using split and other functions, but instead of splitting the string into an array by the character specified, it just removed the character.
You should be able to use .gsub('%2', ' ') the %2 with a space, while quoted, it should be fine.
Split does remove the character that is being split, though. So you may not want to do that, or if you do, you may want to use the Array#join method with the argument of the character you split with to put it back in.
I want to 'extract' the file name from the statements above. Say I have src="/images/9/94/folder%2FGraph.JPG", I want folder%2FGraph.JPG to be extracted out.
If you want to extract what is inside the src parameter:
foo = 'src="/images/9/94/folder%2FGraph.JPG"'
foo[/^src="(.+)"/, 1]
=> "/images/9/94/folder%2FGraph.JPG"
That returns a string without the surrounding parenthesis.
Here's how to do the first one:
bar = "VALUES(3,5,1,43,'/images/e/e5/Folder%2FOrders%2FFinding_Orders%2FView_orders3.JPG','1415',NULL);"
bar.split(',')[4][1..-2]
=> "/images/e/e5/Folder%2FOrders%2FFinding_Orders%2FView_orders3.JPG"
Not everything in programming is a regex problem. Somethings, actually, in my opinion, most things, are not candidates for a pattern. For instance, the first example could be written:
foo.split('=')[1][1..-2]
and the second:
bar[/'(.+?)'/, 1]
The idea is to use whichever is most clean and clear and understandable.
If all you want is the filename, then use a method designed to return only the filename.
Use one of the above and pass its output to File.basename. Filename.basename returns only the filename and extension.

Content Inside Parenthesis Regular Expression Ruby

I'm trying to take out the the content inside the parenthesis. For example, if the string is "(blah blah) This is stack(over)flow", I want to just take out "(blah blah)" but leave "(over)" alone. I'm trying
/\A\(.*\)/
but returns "(blah blah) This is stack(over)", and I'm sure why it's returning that.
Easiest fix:
/\A\(.*?\)/
Normally, * will try to match as much as it possibly can, so it'll match all the way to the last ) in the line. This is called "greedy" matching. Putting ? after +/*/? makes them non-greedy, and they'll match the shortest possible string.
But note that this won't work for nested parentheses. That's rather more complicated. Given your example, I assume this is for a pretty simple ad-hoc format where nesting isn't a concern.

Replacing partial regex matches in place with Ruby

I want to transform the following text
This is a ![foto](foto.jpeg), here is another ![foto](foto.png)
into
This is a ![foto](/folder1/foto.jpeg), here is another ![foto](/folder2/foto.png)
In other words I want to find all the image paths that are enclosed between brackets (the text is in Markdown syntax) and replace them with other paths. The string containing the new path is returned by a separate real_path function.
I would like to do this using String#gsub in its block version. Currently my code looks like this:
re = /!\[.*?\]\((.*?)\)/
rel_content = content.gsub(re) do |path|
real_path(path)
end
The problem with this regex is that it will match ![foto](foto.jpeg) instead of just foto.jpeg. I also tried other regexen like (?>\!\[.*?\]\()(.*?)(?>\)) but to no avail.
My current workaround is to split the path and reassemble it later.
Is there a Ruby regex that matches only the path inside the brackets and not all the contextual required characters?
Post-answers update: The main problem here is that Ruby's regexen have no way to specify zero-width lookbehinds. The most generic solution is to group what the part of regexp before and the one after the real matching part, i.e. /(pre)(matching-part)(post)/, and reconstruct the full string afterwards.
In this case the solution would be
re = /(!\[.*?\]\()(.*?)(\))/
rel_content = content.gsub(re) do
$1 + real_path($2) + $3
end
A quick solution (adjust as necessary):
s = 'This is a ![foto](foto.jpeg)'
s.sub!(/!(\[.*?\])\((.*?)\)/, '\1(/folder1/\2)' )
p s # This is a [foto](/folder1/foto.jpeg)
You can always do it in two steps - first extract the whole image expression out and then second replace the link:
str = "This is a ![foto](foto.jpeg), here is another ![foto](foto.png)"
str.gsub(/\!\[[^\]]*\]\(([^)]*)\)/) do |image|
image.gsub(/(?<=\()(.*)(?=\))/) do |link|
"/a/new/path/" + link
end
end
#=> "This is a ![foto](/a/new/path/foto.jpeg), here is another ![foto](/a/new/path/foto.png)"
I changed the first regex a bit, but you can use the same one you had before in its place. image is the image expression like ![foto](foto.jpeg), and link is just the path like foto.jpeg.
[EDIT] Clarification: Ruby does have lookbehinds (and they are used in my answer):
You can create lookbehinds with (?<=regex) for positive and (?<!regex) for negative, where regex is an arbitrary regex expression subject to the following condition. Regexp expressions in lookbehinds they have to be fixed width due to limitations on the regex implementation, which means that they can't include expressions with an unknown number of repetitions or alternations with different-width choices. If you try to do that, you'll get an error. (The restriction doesn't apply to lookaheads though).
In your case, the [foto] part has a variable width (foto can be any string) so it can't go into a lookbehind due to the above. However, lookbehind is exactly what we need since it's a zero-width match, and we take advantage of that in the second regex which only needs to worry about (fixed-length) compulsory open parentheses.
Obviously you can put real_path in from here, but I just wanted a test-able example.
I think that this approach is more flexible and more readable than reconstructing the string through the match group variables
In your block, use $1 to access the first capture group ($2 for the second and so on).
From the documentation:
In the block form, the current match string is passed in as a parameter, and variables such as $1, $2, $`, $&, and $' will be set appropriately. The value returned by the block will be substituted for the match on each call.
As a side note, some people think '\1' inappropriate for situations where an unconfirmed number of characters are matched. For example, if you want to match and modify the middle content, how can you protect the characters on both sides?
It's easy. Put a bracket around something else.
For example, I hope replace a-ruby-porgramming-book-531070.png to a-ruby-porgramming-book.png. Remove context between last "-" and last ".".
I can use /.*(-.*?)\./ match -531070. Now how should I replace it? Notice
everything else does not have a definite format.
The answer is to put brackets around something else, then protect them:
"a-ruby-porgramming-book-531070.png".sub(/(.*)(-.*?)\./, '\1.')
# => "a-ruby-porgramming-book.png"
If you want add something before matched content, you can use:
"a-ruby-porgramming-book-531070.png".sub(/(.*)(-.*?)\./, '\1-2019\2.')
# => "a-ruby-porgramming-book-2019-531070.png"

what does the empty regex match in ruby?

following a RoR security tutorial (here), i wrote something along the lines of
##private_re = //
def secure?
action_name =~ ##private_re
end
the idea is that in the base case, this shouldn't match anything, and return nil. problem is that it doesn't. i've worked around for the time being by using a nonsensical string, but i'd like to know the answer.
The empty regular expression successfully matches every string.
Examples of regular expressions that will always fail to match:
/(?=a)b/
/\Zx\A/
/[^\s\S]/
It is meant to not change the behavior of the controller in any way, as // will match every string.
The idea is that ##private is meant to be set in the controller to match things you DO want to be private. Thus, that code is meant to do nothing, but when combined with
##private = /.../ in the controller, gives you a nice privacy mechanism.

Inserting characters before whatever is on a line, for many lines

I have been looking at regular expressions to try and do this, but the most I can do is find the start of a line with ^, but not replace it.
I can then find the first characters on a line to replace, but can not do it in such a way with keeping it intact.
Unfortunately I donĀ“t have access to a tool like cut since I am on a windows machine...so is there any way to do what I want with just regexp?
Use notepad++. It offers a way to record an sequence of actions which then can be repeated for all lines in the file.
Did you try replacing the regular expression ^ with the text you want to put at the start of each line? Also you should use the multiline option (also called m in some regex dialects) if you want ^ to match the start of every line in your input rather than just the first.
string s = "test test\ntest2 test2";
s = Regex.Replace(s, "^", "foo", RegexOptions.Multiline);
Console.WriteLine(s);
Result:
footest test
footest2 test2
I used to program on the mainframe and got used to SPF panels. I was thrilled to find a Windows version of the same editor at Command Technology. Makes problems like this drop-dead simple. You can use expressions to exclude or include lines, then apply transforms on just the excluded or included lines and do so inside of column boundaries. You can even take the contents of one set of lines and overlay the contents of another set of lines entirely or within column boundaries which makes it very easy to generate mass assignments of values to variables and similar tasks. I use Notepad++ for most stuff but keep a copy of SPFSE around for special-purpose editing like this. It's not cheap but once you figure out how to use it, it pays for itself in time saved.

Resources