I would like to escape each # with a \ when it appears inside an \href command.
Normally I would write a regex such as s/(\\href\{.*?)#(.*?)\}/\1\\#\2/g, but I imagine gsub would be a good choice here: first extract the \href content, then replace # with \#.
Here is some text with a \href{./file.pdf#section.1.5}{link} to section 1.5.
There can be multiple links in one line.
Question
Can gsub simplify these sorts of problems?
Unless one or more of the URLs inside your \href{...} commands contains a password part enclosed in quotes, like http://username:"sdkfj#lkn#"@domainname.org/path/file.ext, the only place a # can appear in a URL is at the end, where it delimits the fragment: ./path/path/file.rb?val=toto#thefragmentpart.
In other words, if I'm not mistaken, there is at most one # to escape per \href{...}, so you can simply do this:
text.gsub(/\\href{[^#}]*\K#/, "\\#")
The negated character class [^#}] excludes both } and #, so the match can never run past the closing brace: it stays between the curly brackets and stops at the first #.
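A minimal check of the \K version against a line with two links, which the question says can happen. The sample string is made up for illustration; \{ and \} are escaped here only for readability and match the same characters as the unescaped braces above.

```ruby
# Hypothetical input: two \href commands on one line.
text = 'See \href{./file.pdf#section.1.5}{here} and \href{./file.pdf#section.2.1}{there}.'

# \K drops everything matched so far from the replacement, so only the # is replaced.
# [^#}]* cannot cross a } or a #, so each match stays inside one \href{...}
# and stops at its first #.
escaped = text.gsub(/\\href\{[^#}]*\K#/, "\\#")

puts escaped
# => See \href{./file.pdf\#section.1.5}{here} and \href{./file.pdf\#section.2.1}{there}.
```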
You could also use two gsub calls: an outer one with a pattern and a block (to find each href{...}), and an inner one with two arguments (to replace # with \# inside it):
text = %q(Here is some text with a \href{./file.pdf#section.1.5}{link} to section 1.5.)
puts text.gsub(/href{[^}]+}/){ |href| href.gsub('#', '\#') }
#=> Here is some text with a \href{./file.pdf\#section.1.5}{link} to section 1.5.
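The same nested-gsub idea wrapped in a method, as a sketch. escape_href_hashes is a hypothetical name, and the pattern adds \\ before href (an assumption, not in the answer) so that only \href commands match. Unlike the \K version, it escapes every # inside the first argument, not just the first one, and leaves a # outside any \href alone.

```ruby
# Hypothetical helper around the two-gsub approach.
def escape_href_hashes(text)
  # Outer gsub: find each \href{...} first argument (no } allowed inside).
  # Inner gsub: escape every # within that argument only.
  text.gsub(/\\href\{[^}]+\}/) { |href| href.gsub('#', '\#') }
end

line = 'A \href{a.pdf#s1}{one}, a bare # sign, and \href{b.pdf#s2#x}{two}.'
result = escape_href_hashes(line)
puts result
# => A \href{a.pdf\#s1}{one}, a bare # sign, and \href{b.pdf\#s2\#x}{two}.
```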
If you want to try it from a terminal on a test.txt file, you can use ruby -pe. Since -p prints $_ after each line, the substitution has to modify $_ in place with gsub!:
ruby -pe '$_.gsub!(/href{[^}]+}/){ |href| href.gsub(%q|#|, %q|\#|) }' test.txt
# Here is some text with a \href{./file.pdf\#section.1.5}{link} to section 1.5.
# Here is some text with a \href{./file.pdf\#section.1.6}{link} to section 1.6.
# Here is some text with a \href{./file.pdf\#section.1.7}{link} to section 1.7.
or
ruby -e 'puts ARGF.read.gsub(/href{[^}]+}/){ |href| href.gsub(%q|#|, %q|\#|) }' test.txt
# Here is some text with a \href{./file.pdf\#section.1.5}{link} to section 1.5.
# Here is some text with a \href{./file.pdf\#section.1.6}{link} to section 1.6.
# Here is some text with a \href{./file.pdf\#section.1.7}{link} to section 1.7.
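Do not mix ruby -pe and ARGF.read: -p already reads the input line by line, so ARGF.read would only get the rest of the file after the first line, and that first line would be printed last, untransformed.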
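For comparison, the same transformation as a plain Ruby script, sketched under the assumption that test.txt holds lines like the ones above (here they are generated into a temporary file). File.foreach mirrors what -p does, one line at a time, and File.read mirrors ARGF.read, the whole file at once; for this input both give the same result. The \\ before href in the pattern is an added assumption.

```ruby
require 'tempfile'

# Nested-gsub escaping of # inside each \href{...} first argument.
def escape_hrefs(str)
  str.gsub(/\\href\{[^}]+\}/) { |href| href.gsub('#', '\#') }
end

# Hypothetical test.txt contents.
content = (5..7).map { |n|
  "Here is some text with a \\href{./file.pdf#section.1.#{n}}{link} to section 1.#{n}.\n"
}.join

line_by_line = whole_file = nil
Tempfile.create('test') do |f|
  f.write(content)
  f.flush
  # Like `ruby -p`: read and transform one line at a time.
  line_by_line = File.foreach(f.path).map { |line| escape_hrefs(line) }.join
  # Like `ARGF.read`: slurp the whole file, then transform it once.
  whole_file = escape_hrefs(File.read(f.path))
end

print whole_file
```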
I recently used the <<- operator to output a multi-line string, like this:
<<-form
<h1>Name to say hi!</h1>
<form method="post">
<input type="text" name="name">
<input type="submit" value="send">
</form>
form
I picked up the <<- operator from some open-source code, but I couldn't find any documentation on it.
I kinda figured out that it works the same as in bash:
$ cat <<EOF >> form.html
> <h1>Name to say hi!</h1>
> <form method="post">
> <input type="text" name="name">
> <input type="submit" value="send">
> </form>
> EOF
Does it work that way? I just wanna find documentation on it.
From The Ruby Programming Language:
Here Documents
For long string literals, there may be no single character delimiter that can be used without worrying about remembering to escape characters within the literal. Ruby's solution to this problem is to allow you to specify an arbitrary sequence of characters to serve as the delimiter for the string. This kind of literal is borrowed from Unix shell syntax and is historically known as a here document. (Because the document is right here in the source code rather than in an external file.)
Here documents begin with << or <<-. These are followed immediately (no space is allowed, to prevent ambiguity with the left-shift operator) by an identifier or string that specifies the ending delimiter. The text of the string literal begins on the next line and continues until the text of the delimiter appears on a line by itself. For example:
document = <<HERE # This is how we begin a here document
This is a string literal.
It has two lines and abruptly ends...
HERE
The Ruby interpreter gets the contents of a string literal by reading a line at a time from its input. This does not mean, however, that the << must be the last thing on its own line. In fact, after reading the content of a here document, the Ruby interpreter goes back to the line it was on and continues parsing it. The following Ruby code, for example, creates a string by concatenating two here documents and a regular single-quoted string:
greeting = <<HERE + <<THERE + "World"
Hello
HERE
There
THERE
The <<HERE on line 1 causes the interpreter to read lines 2 and 3. And the <<THERE causes the interpreter to read lines 4 and 5. After these lines have been read, the three string literals are concatenated into one.
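Running the book's example confirms how the pieces join: each here document keeps its trailing newline, and "World" is appended after them.

```ruby
# Two here documents and a regular string literal on one line.
greeting = <<HERE + <<THERE + "World"
Hello
HERE
There
THERE

p greeting  # => "Hello\nThere\nWorld"
```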
The ending delimiter of a here document really must appear on a line by itself: no comment may follow the delimiter. If the here document begins with <<, then the delimiter must start at the beginning of the line. If the literal begins with <<- instead, then the delimiter may have whitespace in front of it. The newline at the beginning of a here document is not part of the literal, but the newline at the end of the document is. Therefore, every here document ends with a line terminator, except for an empty here document, which is the same as "":
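A short sketch of the delimiter rules just described: the trailing newline, the empty here document, and the indented terminator that <<- allows. The identifiers EOS and END are arbitrary.

```ruby
# A here document ends with a newline...
two_lines = <<EOS
first
second
EOS

# ...except an empty one, which is the same as "".
empty = <<END
END

# With <<- the terminator may be preceded by whitespace.
indented_end = <<-EOS
last line
    EOS

p two_lines     # => "first\nsecond\n"
p empty         # => ""
p indented_end  # => "last line\n"
```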
empty = <<END
END
If you use an unquoted identifier as the terminator, as in the previous examples, then the here document behaves like a double-quoted string for the purposes of interpreting backslash escapes and the # character. If you want to be very, very literal, allowing no escape characters whatsoever, place the delimiter in single quotes. Doing this also allows you to use spaces in your delimiter:
document = <<'THIS IS THE END, MY ONLY FRIEND, THE END'
.
. lots and lots of text goes here
. with no escaping at all.
.
THIS IS THE END, MY ONLY FRIEND, THE END
The single quotes around the delimiter hint that this string literal is like a single-quoted string. In fact, this kind of here document is even stricter. Because the single quote is not a delimiter, there is never a need to escape a single quote with a backslash. And because the backslash is never needed as an escape character, there is never a need to escape the backslash itself. In this kind of here document, therefore, backslashes are simply part of the string literal.
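A small comparison of an unquoted and a single-quoted delimiter, assuming a variable name is defined just before the here documents:

```ruby
name = "Ruby"

# Unquoted delimiter: behaves like a double-quoted string.
interpolated = <<EOS
Hello, #{name}!\tTabbed
EOS

# Single-quoted delimiter: nothing is interpreted, backslashes included.
literal = <<'EOS'
Hello, #{name}!\tTabbed
EOS

puts interpolated  # Hello, Ruby! followed by a real tab, then Tabbed
puts literal       # Hello, #{name}!\tTabbed (exactly as written)
```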
You may also use a double-quoted string literal as the delimiter for a here document. This is the same as using a single identifier, except that it allows spaces within the delimiter:
document = <<-"# # #" # This is the only place we can put a comment
<html><head><title>#{title}</title></head>
<body>
<h1>#{title}</h1>
#{body}
</body>
</html>
# # #
Note that there is no way to include a comment within a here document except on the first line after the << token and before the start of the literal. Of all the # characters in this code, one introduces a comment, three interpolate expressions into the literal, and the rest are the delimiter.
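A sketch of a double-quoted delimiter that contains spaces, combined with <<- so the terminator can be indented; title and body are placeholder values.

```ruby
title = "Greeting"
body  = "<p>Hi!</p>"

# A double-quoted delimiter interpolates like an unquoted one,
# but may contain spaces.
page = <<-"END OF PAGE"
<h1>#{title}</h1>
#{body}
  END OF PAGE

puts page
# <h1>Greeting</h1>
# <p>Hi!</p>
```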
http://www.ruby-doc.org/docs/ruby-doc-bundle/Manual/man-1.4/syntax.html#here_doc
This is Ruby's "here document" (heredoc) syntax. The - allows the closing delimiter to be indented.
The reason you can't find any documentation on the <<- operator is that it isn't an operator. It's literal syntax, like ' or ".
Specifically, it's the here document syntax which is one of the many syntactic forms of string literals in Ruby. Ruby here documents are similar to POSIX sh here documents, but handling of whitespace removal is different: in POSIX sh here documents delimited by <<-, only leading tabs are removed, but they are removed from the contents of the string, whereas in Ruby all leading whitespace is removed, but only from the delimiter.
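A quick check of the Ruby side of that comparison: with <<- the body keeps its leading spaces, and only the terminator line may be indented.

```ruby
# <<- relaxes only where the terminator may sit; the body's leading
# spaces are kept exactly as written.
html = <<-EOS
    <p>indented</p>
    EOS

p html  # => "    <p>indented</p>\n"
```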
This post will tell you everything you need to know about the "heredoc" string syntax. In addition, you can view the rubydoc page for string syntax.
Is it possible in YAML to write a string over multiple lines without an extra character being inserted between the lines?
Folded (>) syntax puts spaces between lines; literal syntax (|) puts newlines between them.
The summary in "In YAML, how do I break a string over multiple lines?" does not give a solution.
E.g.
>-
line1_
line2
generates line1_<space>line2 (the line break becomes a space). I would like line1_line2, with no additional character.
Use a double-quoted string:
"line1_\
line2"
By escaping the newline character, it is completely removed instead of being translated into a space. It is not possible to do this with block scalars because they have no escape sequences.
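One way to check the difference from Ruby, assuming the standard yaml library (Psych) is available; the keys folded and quoted are made up for the example. The YAML is written in a single-quoted here document so the backslash reaches the parser untouched.

```ruby
require 'yaml'

doc = <<-'YAML'
folded: >-
  line1_
  line2
quoted: "line1_\
  line2"
YAML

data = YAML.load(doc)

p data["folded"]  # => "line1_ line2"  (folding turns the break into a space)
p data["quoted"]  # => "line1_line2"   (escaped break is removed entirely)
```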
I need to create a string like this:
drawtext="fontfile=/Users/stpn/Documents/Video_Experiments/fonts/Trebuchet_MS.ttf:text='content':fontsize=100:fontcolor=red:y=h/2"
I want to do something like
str = %Q[drawtext="fontfile=/Users/stpn/Documents/Video_Experiments/fonts/Trebuchet_MS.ttf:text='content':fontsize=100:fontcolor=red:y=h/2"]
I am getting this:
=> "drawtext=\"fontfile=/Users/stpn/Documents/Video_Experiments/fonts/Trebuchet_MS.ttf:text='content':fontsize=100:fontcolor=red:y=h/2\""
I want to get rid of the escape characters after the equals sign in drawtext=". How can I achieve that?
The string is to be used as a command-line argument.
Like many languages, Ruby needs a way to distinguish a quote that is part of the string from the quotes that enclose it.
What you're seeing is the escape character, which is a way of saying "literal quote" instead of "syntactic quote":
foo = 'test="test"'
# => "test=\"test\""
The escape character is only there because double quotes are used by default when inspecting a string. It's stored internally as a single character, of course. You may also see escapes in other circumstances, such as a line from a CR+LF-delimited file:
"example_line\r\n"
The \r and \n here stand for the carriage-return and line-feed characters. There are several of these escape sequences defined in ANSI C that have carried over into many languages, including Ruby and JavaScript.
When you output a string those escape characters are not displayed:
puts foo
test="test"
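A sketch using the string from the question: p shows the inspect form with \" escapes, while puts shows the characters actually stored, which are what a command line would receive.

```ruby
arg = %Q[drawtext="fontfile=/Users/stpn/Documents/Video_Experiments/fonts/Trebuchet_MS.ttf:text='content':fontsize=100:fontcolor=red:y=h/2"]

p arg     # inspect form: the inner double quotes are displayed as \"
puts arg  # the real characters: drawtext="fontfile=...", no backslashes

puts arg.include?("\\")  # => false (no backslash is stored)
puts arg.count('"')      # => 2
```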
I'm a bit stuck on this issue. I'm trying to make a newline using '\n'. I'm opening a file, then replacing the text, then writing it back as an html file:
replace = text.gsub(/aaa/, 'aaa\nbbb')
But this results in:
aaa\nbbb
What I want is:
aaa
bbb
In single-quoted strings a backslash is just a backslash (except when it precedes another backslash or a single quote). Use double quotes instead: "aaa\nbbb".
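Comparing the two replacement strings side by side, with a made-up input:

```ruby
text = "xx aaa yy"

single = text.gsub(/aaa/, 'aaa\nbbb')  # backslash followed by "n": two characters
double = text.gsub(/aaa/, "aaa\nbbb")  # a real newline character

puts single
# xx aaa\nbbb yy
puts double
# xx aaa
# bbb yy
```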
You'll want to read: Backslashes in Single quoted strings vs. Double quoted strings in Ruby?