Ruby regexp to replace equations

I have some HTML text in MathJax format:
text = "an inline \\( f(x) = \\frac{a}{b} \\) equation, a display equation \\[ F = m a \\] \n and another inline \\(y = x\\)"
(Note: the equations are delimited by single backslashes, e.g. \(, not \\(; the extra \ only escapes the first one inside the Ruby string. For the same reason \frac has to be written \\frac, because "\f" is a form feed in a double-quoted Ruby string.)
I want to replace each equation with an image rendered by latex.codecogs, appending \inline for the inline equations, e.g.:
desired_output = "an inline <img src=\"http://latex.codecogs.com/png.latex?f(x) = \\frac{a}{b}\\inline\"/> equation, a display equation <img src=\"http://latex.codecogs.com/png.latex?F = m a\"/> \n and another inline <img src=\"http://latex.codecogs.com/png.latex?y = x\\inline\"/>"
In Ruby, I tried:
desired = text.gsub("(\\[)(.*?)(\\])", "<img src=\"http://latex.codecogs.com/png.latex?\2\" />")
desired = desired.gsub("(\\()(.*?)(\\))", "<img src=\"http://latex.codecogs.com/png.latex?\2\\inline\" />")
desired
But this is unsuccessful: it returns the original input unchanged. What did I miss? How do I construct this regex correctly?

Try:
desired = text.gsub(/\\\[\s*(.*?)\s*\\\]/, "<img src=\"http://latex.codecogs.com/png.latex?\\1\"/>")
desired = desired.gsub(/\\\(\s*(.*?)\s*\\\)/, "<img src=\"http://latex.codecogs.com/png.latex?\\1\\inline\"/>")
desired
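The important changes that had to happen:
The first argument to gsub should be a Regexp (as Anthony mentioned); a String argument is matched literally.
If the replacement is a double-quoted string, back-references have to be written like \\1 (instead of just \1), because "\1" in a double-quoted string is an octal escape, not a back-reference (see the String#gsub rdoc).
The pattern was not escaping the backslash: matching the two characters \[ takes \\\[ in the regex.
There were a couple of other minor changes: \s* trims the spaces inside the delimiters, and only the equation body is captured, so it is group 1 rather than group 2.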
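As a runnable check of the approach above, here is a sketch that does both substitutions in a single gsub call with a block. The helper name equations_to_images, the CODECOGS constant and the /m flag (so an equation body may span lines) are my own choices, not part of the answer; the URL is left unencoded to match the desired output in the question.

```ruby
# Base URL from the question; the equation text is appended unencoded,
# matching the desired output (a real page may need URL-encoding).
CODECOGS = "http://latex.codecogs.com/png.latex?"

# Hypothetical helper: turns MathJax \[...\] and \(...\) into <img> tags.
def equations_to_images(text)
  # Group 1 holds a display body, group 2 an inline body; \s* trims padding.
  # /m lets . match newlines, so an equation may span several lines.
  text.gsub(/\\\[\s*(.*?)\s*\\\]|\\\(\s*(.*?)\s*\\\)/m) do
    if $1
      %(<img src="#{CODECOGS}#{$1}"/>)
    else
      # A block's return value is inserted literally, so "\\inline" is \inline.
      %(<img src="#{CODECOGS}#{$2}\\inline"/>)
    end
  end
end

text = "an inline \\( f(x) = \\frac{a}{b} \\) equation, a display equation \\[ F = m a \\] \n and another inline \\(y = x\\)"
puts equations_to_images(text)
```

Using a block means the replacement-string escaping rules for \1 do not apply at all, which is one less layer of backslashes to get right.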

In Ruby, Regexp literals are delimited by //, and the backslash itself also needs escaping inside the pattern. Try something like this:
desired = text.gsub(/(\\\[)(.*?)(\\\])/, "<img src=\"http://latex.codecogs.com/png.latex?\\2\" />")
You were passing a String, so gsub did a plain substring search and of course never found the literal text (\[)(.*?)(\]) in the input.

Related

Extract values after pattern in Ruby string

I have a string like this:
"<root><some ProdCode=\"40\" ProducerName=\"demo1\" ProdCode=\"40\" Need_Confirmation=\"1\"/><some ProdCode=\"40\" ProducerName=\"demo1\" ProdCode=\"40\" Need_Confirmation=\"1\"/></root>"
I'm trying to pull out the content between each =\" and the closing \" and put it in an array, like ["40", "demo1", "40", "1", "40", ...]
Use scan to select the elements by a regexp pattern, then strip the quote characters.
string.scan(/"[^"]+"/).map { |element| element.delete('\\"') }
Explanation of pattern:
/ – regexp starts
" – first char should be "
[^"]+ – next comes any character except "; the + means there must be at least one such character.
" – next should be again "
/ – regexp ends
So string.scan(/"[^"]+"/) would return:
["\"40\"", "\"demo1\"", "\"40\"", "\"1\"", "\"40\"", "\"demo1\"", "\"40\"", "\"1\""]
Then delete('\\"') removes every \ and " character from each element.
A convenient tool for building regexps is http://rubular.com/
When your string is this simple, you can use scan with a capture group like this:
result = html.scan(/ProdCode="(\d+)"/).flatten
If it is more complex, use an HTML/XML parser such as Nokogiri or Oga.
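A sketch of the capture-group variant applied to the string from the question: with a group in the pattern, scan returns only the captured text, so no quote stripping is needed afterwards. The variable names are illustrative. One reason to be careful with a parser here: the sample repeats the ProdCode attribute within the same element, which is not well-formed XML, so a strict XML parser may reject it.

```ruby
str = "<root><some ProdCode=\"40\" ProducerName=\"demo1\" ProdCode=\"40\" Need_Confirmation=\"1\"/><some ProdCode=\"40\" ProducerName=\"demo1\" ProdCode=\"40\" Need_Confirmation=\"1\"/></root>"

# One capture group: scan returns [["40"], ["demo1"], ...]; flatten unwraps it.
values = str.scan(/="([^"]*)"/).flatten
p values
# => ["40", "demo1", "40", "1", "40", "demo1", "40", "1"]

# Two capture groups give [name, value] pairs instead.
pairs = str.scan(/(\w+)="([^"]*)"/)
p pairs.first(2)
# => [["ProdCode", "40"], ["ProducerName", "demo1"]]
```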

How can I select quoted strings that are outside html tags?

I am working on a syntax highlighter in ruby. From this input string (processed per line):
"left"<div class="wer">"test"</div>"right"
var car = ['Toyota', 'Honda']
How can I find "left", and "right" in the first line, 'Toyota', and 'Honda' on the second line?
I have (["'])(\\\1|[^\1]*?)\1 to highlight the quoted strings. I am struggling with the negative look behind part of the regex.
I tried appending another regex (?![^<]*>|[^<>]*<\/), but I can't get it to work with quoted strings. It only works with simple alphanumeric text.
You can match one or more tokens by creating groups using parentheses in regex, and using | to create an or condition:
/("left")|("right")|('Toyota')|('Honda')/
Here's an example:
http://rubular.com/r/C8ONnxKYEV
EDIT
I just saw that the title of your question specifies that you want to search outside HTML tags.
Unfortunately this isn't practical with regular expressions alone. HTML, like any language with arbitrarily nested delimiters such as (), isn't a regular language: a classic regular expression has no way of counting nesting depth, so you'll need a parser alongside your regex. If you're doing this strictly in Ruby, consider using a tool like Nokogiri or Mechanize to properly parse and interact with the DOM.
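To illustrate "a parser alongside your regex", here is a minimal sketch using Ruby's StringScanner: small regexes recognise tags and quoted strings, and a counter tracks element depth, which a single regex cannot do. The helper quoted_outside_tags is hypothetical, and it assumes simple markup: attribute values containing >, void elements written without a slash (like <br>) and apostrophes in plain text are not handled.

```ruby
require 'strscan'

# Hypothetical helper: returns quoted strings found at depth 0,
# i.e. outside every HTML element on the line.
def quoted_outside_tags(line)
  scanner = StringScanner.new(line)
  depth = 0
  found = []
  until scanner.eos?
    if scanner.scan(%r{</[a-z][^>]*>}i)              # closing tag
      depth -= 1 if depth > 0
    elsif (tag = scanner.scan(%r{<[a-z][^>]*>}i))    # opening or self-closing tag
      depth += 1 unless tag.end_with?('/>')
    elsif (quoted = scanner.scan(/"[^"]*"|'[^']*'/)) # a quoted string
      found << quoted if depth.zero?
    else
      scanner.getch                                  # skip any other character
    end
  end
  found
end

p quoted_outside_tags(%q{"left"<div class="wer">"test"</div>"right"})
p quoted_outside_tags(%q{var car = ['Toyota', 'Honda']})
```

Because each tag is consumed as a whole token, the "wer" attribute value is never seen as a quoted string, and "test" is skipped because it sits at depth 1.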
Description
This Ruby script first replaces each HTML element (opening tag, content and matching closing tag) with _. Note that this is not perfect and is susceptible to many edge cases. The script then simply looks for all the single- and double-quoted values.
str = %Q["left" <div class="wer">"test"</div>"right"\n]
str = str + %Q<var car = ['Toyota', 'Honda']>
puts "SourceString: \n" + str + "\n\n"
str.gsub!(/(?:<([a-z]+)(?:[^>=]|='[^']*'|="[^"]*"|=[^'"][^\s>]*)*?>).*?<\/\1>/i, '_')
puts "SourceString after replacement: \n" + str + "\n\n"
puts "array of quoted values"
str.scan(/"[^"]*"|'[^']*'/)
Sample Output
SourceString:
"left" <div class="wer">"test"</div>"right"
var car = ['Toyota', 'Honda']
SourceString after replacement:
"left" _"right"
var car = ['Toyota', 'Honda']
=> ["\"left\"", "\"right\"", "'Toyota'", "'Honda'"]
Live Example
https://repl.it/CRGo
HTML Parsing
I do recommend using an HTML parsing engine instead. The Ruby Toolbox lists several options for Ruby: https://www.ruby-toolbox.com/categories/html_parsing

regex replace [ with \[

I want to write a regex in Ruby that will add a backslash prior to any open square brackets.
str = "my.name[0].hello.line[2]"
out = str.gsub(/\[/,"\\[")
# desired out = "my.name\[0].hello.line\[2]"
I've tried multiple combinations of backslashes in the substitution string and can't get it to leave a single backslash.
You don't need a regular expression here.
str = "my.name[0].hello.line[2]"
puts str.gsub('[', '\[')
# my.name\[0].hello.line\[2]
I tried your code and it worked correctly:
str = "my.name[0].hello.line[2]"
out = str.gsub(/\[/,"\\[")
puts out #my.name\[0].hello.line\[2]
If you replace puts with p, you get the inspect version of the string:
p out #"my.name\\[0].hello.line\\[2]"
Note the surrounding " and the escaped \: inspect shows each backslash doubled, but the string contains only one. Maybe that is the result you saw.
As Daniel already answered, you can also write the replacement in single quotes ('\['), where the backslash does not need to be doubled.
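A quick way to confirm how many backslashes the result really contains is to count characters instead of reading the inspect output. This sketch also shows the block form of gsub, whose return value is inserted literally, so replacement-string escaping never comes into play; the variable names are illustrative.

```ruby
str = "my.name[0].hello.line[2]"
out = str.gsub(/\[/, "\\[")

puts out                   # raw characters: my.name\[0].hello.line\[2]
p out                      # inspect form:   "my.name\\[0].hello.line\\[2]"
p out.chars.count("\\")    # => 2, exactly one backslash per bracket

# Block form: the block's result is used as-is, with no \1-style processing.
same = str.gsub("[") { "\\[" }
p same == out              # => true
```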

Reformatting dates

I'm trying to reformat German dates (e.g. 13.03.2011 to 2011-03-13).
This is my code:
str = "13.03.2011\n14:30\n\nHannover Scorpions\n\nDEG Metro Stars\n60\n2 - 3\n\n\n\n13.03.2011\n14:30\n\nThomas Sabo Ice Tigers\n\nKrefeld Pinguine\n60\n2 - 3\n\n\n\n"
str = str.gsub("/(\d{2}).(\d{2}).(\d{4})/", "/$3-$2-$1/")
I get the same output as the input. I also tried my code with and without the leading and trailing slashes, but I don't see a difference. Any hints?
I tried to store my regexes in variables like find = /(\d{2}).(\d{2}).(\d{4})/ and replace = /$3-$2-$1/, so my code looked like this:
str = "13.03.2011\n14:30\n\nHannover Scorpions\n\nDEG Metro Stars\n60\n2 - 3\n\n\n\n13.03.2011\n14:30\n\nThomas Sabo Ice Tigers\n\nKrefeld Pinguine\n60\n2 - 3\n\n\n\n"
find = /(\d{2}).(\d{2}).(\d{4})/
replace = /$3-$2-$1/
str = str.gsub(find, replace)
TypeError: no implicit conversion of Regexp into String
from (irb):4:in `gsub'
Any suggestions for this problem?
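The first mistake is the regex delimiter. You do not need to put the regex in a string: a String argument is matched literally. Just place the pattern between delimiters like //.
The second mistake is referring to captured groups as $1 in the replacement string, where $1 is just literal text. Use \\1 instead (the double-quoted "\\1" becomes \1, which gsub expands to the first group).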
str = str.gsub(/(\d{2})\.(\d{2})\.(\d{4})/, "\\3-\\2-\\1")
Also, notice I have escaped the . character with \., because in regex . means any character except \n
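If you prefer the $1 style you originally reached for, it does work inside a block, because gsub sets the match variables before calling the block. Named groups are another option. A short sketch of both, using the first lines of the sample string (the variable names are illustrative):

```ruby
str = "13.03.2011\n14:30\n\nHannover Scorpions\n"
pattern = /(\d{2})\.(\d{2})\.(\d{4})/

# Block form: $1..$3 refer to the current match.
with_block = str.gsub(pattern) { "#{$3}-#{$2}-#{$1}" }

# Named groups, referenced as \k<name> in a single-quoted replacement string.
named = /(?<day>\d{2})\.(?<month>\d{2})\.(?<year>\d{4})/
with_names = str.gsub(named, '\k<year>-\k<month>-\k<day>')

p with_block                 # => "2011-03-13\n14:30\n\nHannover Scorpions\n"
p with_block == with_names   # => true
```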

Escaping single and double quotes in a string in ruby?

How can I escape single and double quotes in a string?
I want to escape single and double quotes together. I know how to escape each of them on its own, but not how to handle both in the same string.
e.g.: str = "ruby 'on rails" " = ruby 'on rails"
My preferred way is to not worry about escaping and instead use %q, which behaves like a single-quoted string (no interpolation and almost no escape processing), or %Q for double-quoted string behavior:
str = %q[ruby 'on rails" ] # like single-quoting
str2 = %Q[quoting with #{str}] # like double-quoting: will insert variable
See https://docs.ruby-lang.org/en/trunk/syntax/literals_rdoc.html#label-Strings, in particular the section on percent strings.
Use backslash to escape characters
str = "ruby \'on rails\" "
A complete list of escape sequences is at http://learnrubythehardway.org/book/ex10.html
You can use %Q strings, which allow you to use any delimiter you like:
str = %Q|ruby 'on rails" " = ruby 'on rails|
>> str = "ruby 'on rails\" \" = ruby 'on rails"
=> "ruby 'on rails\" \" = ruby 'on rails"
I would go with a heredoc if I'm starting to have to worry about escaping. It will take care of it for you:
string = <<MARKER
I don't have to "worry" about escaping!!'"!!
MARKER
MARKER delimits the start and end of the string. Start the string on the line after the opening <<MARKER, then end it by putting the delimiter again on its own line.
Ruby handles all the escaping for you; inspecting the result shows the equivalent double-quoted string:
string
=> "I don't have to \"worry\" about escaping!!'\"!!\n"
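One thing worth knowing: a heredoc with a bare identifier behaves like a double-quoted string, so #{} interpolation and backslash escapes still apply inside it. Quoting the identifier turns it into single-quoted behavior. This sketch uses the squiggly <<~ form (Ruby 2.3+), which also strips the common leading indentation; the variable name is illustrative.

```ruby
name = "Ruby"

# Bare identifier: double-quote rules, so #{name} is interpolated.
interpolated = <<~MARKER
  I don't have to "worry" about #{name}!
MARKER

# Quoted identifier: single-quote rules, so #{name} stays as literal text.
literal = <<~'MARKER'
  I don't have to "worry" about #{name}!
MARKER

puts interpolated   # I don't have to "worry" about Ruby!
puts literal        # I don't have to "worry" about #{name}!
```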
I would use just:
str = %(ruby 'on rails ")
because a bare % behaves like %Q (double-quote rules) and allows interpolation of variables in the string.
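To see that the literal forms from these answers really produce the same string, here is a small sketch that writes the same text five ways and checks they are all equal (the variable names are illustrative):

```ruby
single  = 'ruby \'on rails" '   # single quotes: escape the '
double  = "ruby 'on rails\" "   # double quotes: escape the "
lower_q = %q[ruby 'on rails" ]  # %q: single-quote rules, nothing to escape here
upper_q = %Q[ruby 'on rails" ]  # %Q: double-quote rules, interpolation allowed
bare    = %(ruby 'on rails" )   # bare %: same as %Q

p [single, double, lower_q, upper_q, bare].uniq.size  # => 1
```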
Here is an example of how to use %Q[] in a more complex scenario:
%Q[
<meta property="og:title" content="#{@title}" />
<meta property="og:description" content="#{@fullname}'s profile. #{@fullname}'s location, ranking, outcomes, and more." />
].html_safe
One caveat:
Using %Q[] and %q[] in string comparisons can be counterintuitive.
If the value you load is meant to signify "empty" but actually contains quote characters, like "" or '', you need to compare against those characters using real escape sequences. For example, say qvar contains the two characters "" rather than being an empty string.
This will evaluate to false
if qvar == "%Q[]"
As will this,
if qvar == %Q[]
While this will evaluate to true
if qvar == "\"\""
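The root of the surprise is that %Q[] is simply an empty string; the brackets are delimiters, not content. A sketch of the comparisons above, plus a percent literal that does contain the two quote characters (qvar is set by hand here to stand in for the loaded value):

```ruby
qvar = '""'           # two double-quote characters, not an empty string

p(%Q[])               # => "" (the brackets are only delimiters)
p qvar == %Q[]        # => false
p qvar == "%Q[]"      # => false: compares against the text %Q[]
p qvar == "\"\""      # => true
p qvar == %q[""]      # => true: this literal holds the two quote characters
```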
I ran into this issue when sending command-line vars from a different stack to my ruby script. Only Gabriel Augusto's answer worked for me.
