split string by spaces properly accounting for quotes and backslashes (ruby) - ruby

I want to split a string (an untrusted external line, such as a line from exim_mainlog) on spaces, but not on spaces inside double quotes. A quote escaped by a backslash, like \", must not end the quoted part, and a backslash that is itself escaped, like \\, must not count as escaping the quote after it. I'd like to avoid slowly parsing the string by hand with an FSM.
Example line:
U=mailnull T="test \"quote\" and wild blackslash\\" P=esmtps
Should be split into:
["U=mailnull", "T=\"test \\\"quote\\\" and wild blackslash\\\\\"", "P=esmtps"]
(Btw, I think Ruby should have a method for this kind of split, sigh.)

I think I found a simple enough solution: input.scan(/(?:"(?:\\.|[^"])*"|[^" ])+/)
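For anyone who wants to try that scan on the sample line, here is a minimal sketch. The names line, TOKEN and fields are only for illustration, and the single-quoted heredoc is there so Ruby does no escape processing on the log line itself.

```ruby
# The sample line from the question. A single-quoted heredoc keeps every
# backslash exactly as written.
line = <<~'LOG'.chomp
  U=mailnull T="test \"quote\" and wild blackslash\\" P=esmtps
LOG

# A token is one or more of: a double-quoted chunk (inside it, a backslash
# plus any character is consumed as a pair), or any character that is
# neither a double quote nor a space.
TOKEN = /(?:"(?:\\.|[^"])*"|[^" ])+/

fields = line.scan(TOKEN)

# puts shows the raw contents, one field per line, without inspect escapes.
fields.each { |field| puts field }
```

Because the pattern has no capture groups, scan returns a flat array of the matched substrings, three of them for this line.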

Related

How come you can't gsub this string in Ruby?

These \\n are showing up in my strings even though it should only be \n.
But if I do this :
"\n".gsub('\\n','\b')
It returns :
"\n"
Ideally, I'm trying to find a regex that could rewrite this string :
"R3pQvDqmz/EQ7zho2mhIeE6UB4dLa6GUH7173VEMdGCcdsRm5pernkqCgbnj\\nZjTX\\n"
To not display two backslashes, but just one like this :
"R3pQvDqmz/EQ7zho2mhIeE6UB4dLa6GUH7173VEMdGCcdsRm5pernkqCgbnj\nZjTX\n"
But none of the regexes I try work. I can gsub out the \n and put something like X there, but if I put a \ in it, Ruby escapes it with an additional \, which breaks my encryption module because it needs the exact string.
Any ideas?
You are falling into the trap of a different meaning of escapes when used in strings with double quotes vs single quotes. Double-quoted strings allow escape characters to be used. Thus, here "\n" actually is a one-character string containing a single line feed. Compare that to '\n' which is a two-character string containing a literal backslash followed by a character n.
This explains why your gsub doesn't match. If you use the following code, it should work:
"\\n".gsub('\n','\b')
For your actual issue, you can use this
string = "R3pQvDqmz/EQ7zho2mhIeE6UB4dLa6GUH7173VEMdGCcdsRm5pernkqCgbnj\\nZjTX\\n"
new_string = string.gsub("\\n", "\n")
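A small sketch of the difference, in case it helps to see the character counts. The variable names and the short encoded sample are made up for the example; the real Base64-like string from the question behaves the same way.

```ruby
literal = '\n'   # single quotes: two characters, a backslash and the letter n
newline = "\n"   # double quotes: one character, a line feed

puts literal.length     # 2
puts newline.length     # 1
puts literal == "\\n"   # true

# Assumed sample input that contains backslash + n pairs instead of real
# line feeds.
encoded = 'abc\ndef\n'

# With a String pattern, gsub matches the two characters literally and the
# replacement here is a single line feed.
decoded = encoded.gsub("\\n", "\n")
p decoded               # "abc\ndef\n" (inspect form; each \n is one character)
```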

ruby .split('\n') not splitting on new line

Why does this string not split on each "\n"? (RUBY)
"ADVERTISING [7310]\n\t\tIRS NUMBER:\t\t\t\t061340408\n\t\tSTATE OF INCORPORATION:\t\t\tDE\n\t\tFISCAL YEAR END:\t\t\t0331\n\n\tFILING VALUES:\n\t\tFORM TYPE:\t\t10-Q\n\t\tSEC ACT:\t\t1934 Act\n\t".split('\n')
>> ["ADVERTISING [7310]\n\t\tIRS NUMBER:\t\t\t\t061340408\n\t\tSTATE OF INCORPORATION:\t\t\tDE\n\t\tFISCAL YEAR END:\t\t\t0331\n\n\tFILING VALUES:\n\t\tFORM TYPE:\t\t10-Q\n\t\tSEC ACT:\t\t1934 Act\n\t"]
You need .split("\n"). The escape sequence has to be interpreted for \n to mean a newline, and double quotes are one way to get that. (This is escape processing, not string interpolation, although double-quoted strings enable both.)
In Ruby, single quotes around a string mean that escape sequences are not interpreted, apart from \\ and \'. This is unlike C, where single quotes denote a single character. In this case '\n' is actually equivalent to "\\n".
So if you want to split on \n you need to change your code to use double quotes.
.split("\n")
Ruby has the methods String#each_line and String#lines.
String#each_line returns an enumerator when called without a block:
http://www.ruby-doc.org/core-1.9.3/String.html#method-i-each_line
String#lines returns an array:
http://www.ruby-doc.org/core-2.1.2/String.html#method-i-lines
I didn't test it against your scenario but I bet it will work better than manually choosing the newline chars.
Or a regular expression
.split(/\n/)
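To compare the options mentioned so far side by side, here is a rough sketch on a shortened, made-up version of the input (the text variable is just for illustration):

```ruby
text = "FORM TYPE:\t10-Q\nSEC ACT:\t1934 Act\n"

# '\n' is a backslash followed by n, which never occurs in text,
# so nothing is split.
single = text.split('\n')
p single.length                      # 1

# "\n" and /\n/ both mean a real line feed.
p text.split("\n")                   # ["FORM TYPE:\t10-Q", "SEC ACT:\t1934 Act"]
p text.split(/\n/) == text.split("\n")   # true

# String#lines keeps the line terminators unless asked to chomp them.
p text.lines                         # ["FORM TYPE:\t10-Q\n", "SEC ACT:\t1934 Act\n"]
by_each_line = text.each_line.map(&:chomp)
p by_each_line == text.split("\n")   # true
```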
You can't use single quotes for this:
"ADVERTISING [7310]\n\t\tIRS NUMBER:\t\t\t\t061340408\n\t\tSTATE OF INCORPORATION:\t\t\tDE\n\t\tFISCAL YEAR END:\t\t\t0331\n\n\tFILING VALUES:\n\t\tFORM TYPE:\t\t10-Q\n\t\tSEC ACT:\t\t1934 Act\n\t".split("\n")

Ruby string with quotes for shell command args?

Hi, I need to create a string like this:
drawtext="fontfile=/Users/stpn/Documents/Video_Experiments/fonts/Trebuchet_MS.ttf:text='content':fontsize=100:fontcolor=red:y=h/2"
I want to do something like
str = %Q[drawtext="fontfile=/Users/stpn/Documents/Video_Experiments/fonts/Trebuchet_MS.ttf:text='content':fontsize=100:fontcolor=red:y=h/2"]
I am getting this:
=> "drawtext=\"fontfile=/Users/stpn/Documents/Video_Experiments/fonts/Trebuchet_MS.ttf:text='content':fontsize=100:fontcolor=red:y=h/2\""
The escape characters around the double quotes, such as the one after the equals sign in drawtext=", are what I want to get rid of. How can I achieve that?
The string is to be used as a command-line argument.
Like many languages, Ruby needs a way to tell a quote that is part of the string apart from the quotes that enclose it.
What you're seeing is the escape character which is a way of saying literal quote instead of syntactic quote:
foo = 'test="test"'
# => "test=\"test\""
The escape character is only there because double-quotes are used by default when inspecting a string. It's stored internally as a single character, of course. You may also see these in other circumstances such as a CR+LF delimited file line:
"example_line\r\n"
The \r and \n here correspond to the carriage-return and line-feed characters. Several of these escape sequences are defined in ANSI C and have carried over into many languages, including Ruby and JavaScript.
When you output a string those escape characters are not displayed:
puts foo
test="test"
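Applied to the drawtext case, a minimal sketch looks like this. The font path is a shortened, hypothetical one, and arg is just an illustrative name.

```ruby
# Hypothetical, shortened font path; %Q[...] lets the string contain both
# kinds of quotes without any escaping in the source.
arg = %Q[drawtext="fontfile=/tmp/font.ttf:text='content':fontsize=100:fontcolor=red:y=h/2"]

p arg                     # inspect form: the inner double quotes appear as \"
puts arg                  # actual contents: no backslashes anywhere
puts arg.include?("\\")   # false
```

So the backslashes only exist in the inspected representation that irb or p prints; the string itself already has the contents the question asks for.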

ruby regex about escape a escape

I am trying to write a regex in Ruby to test a string such as:
"GET \"anything/here.txt\""
The point is that anything can appear inside the outer double quotes, but every double quote inside them must be escaped by a backslash (otherwise the string doesn't match). So for example
"GET "anything/here.txt""
this will not be a proper line.
I tried many ways to write the regex, but none of them work. Can anyone help me with this? Thank you.
You can use positive lookbehind:
/\A"((?<=\\)"|[^"])*"\z/
This does exactly what you asked for: "if a double quote appears inside the outer double quotes without a backslash prefixed, it doesn't match."
Some comments:
\A,\z: These match only at the beginning and end of the string. So the pattern has to match against the whole string, not a part of it.
(?<=): This is the syntax for positive lookbehind; it asserts that a pattern must match directly before the current position. So (?<=\\)" matches "a double quote which is preceded by a backslash".
[^"]: This matches "any character which is not a double quote".
One point about this regex is that it will match an inner double quote which is preceded by two backslashes. If that is a problem, post a comment and I'll fix it.
If your version of Ruby doesn't have lookbehind, you could do something like:
/\A"(\\.|[^"\\])*"\z/
Note that unlike the first regexp, this one does not count a double backslash as escaping a quote (rather, the first backslash escapes the second one), so "\\"" will not match.
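Here is a quick sketch that runs both patterns against the two lines from the question plus the double-backslash case. The constant names LOOKBEHIND and PAIRS and the samples array are only for this example, and match? needs Ruby 2.4 or later.

```ruby
LOOKBEHIND = /\A"((?<=\\)"|[^"])*"\z/   # the lookbehind version
PAIRS      = /\A"(\\.|[^"\\])*"\z/      # the version without lookbehind

samples = [
  '"GET \"anything/here.txt\""',  # inner quotes escaped
  '"GET "anything/here.txt""',    # inner quotes not escaped
  '"\\\\""'                       # two backslashes, then a bare quote
]

samples.each do |s|
  puts "#{s}  lookbehind=#{LOOKBEHIND.match?(s)}  pairs=#{PAIRS.match?(s)}"
end
```

Both patterns accept the first sample and reject the second. They differ only on the third: the lookbehind version accepts it, while the other rejects it because the first backslash is consumed together with the second one.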
This works:
/"(?<method>[A-Z]*)\s*\\\"(?<file>[^\\"]*)\\""/
See it on Rubular.
Edit:
"(?<method>[A-Z]*)\s(?<content>(\\\"|[a-z\/\.]*)*)"
See it here.
Edit 2: without (? ...) sequence (for Ruby 1.8.6):
"([A-Z]*)\s((\\\"|[a-z\/\.]*)*)"
Rubular here.
Tested this on Rubular successfully:
\"GET \\\".*\\\"\"
Breakdown:
\" - Escape the " for the regex string, meaning the literal character "
GET - Assuming you just want GET, then this is explicit
\\\" - Escape \ and " to get the literal string \"
.* - 0 or more of any character other than \n
\\\"\" - Escapes for the literal \""
I'm not sure a regex is really your best tool here, but if you insist on using one, I recommend thinking of the string as a sequence of tokens: a quote, then a series of things that are either \\, \" or anything that isn't a quote, then a closing quote at the end. So this:
^"(\\\\|\\"|[^"])*"$
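Two things to watch with the token pattern, shown in a rough sketch. In Ruby, ^ and $ anchor at line boundaries rather than at the ends of the string, and [^"] can also consume a lone backslash, which lets the engine backtrack into treating the second backslash of \\ as escaping the quote. STRING_ANCHORED below is my own variant, not the answer's pattern: it assumes you want \A and \z and a catch-all class that excludes the backslash.

```ruby
LINE_ANCHORED   = /^"(\\\\|\\"|[^"])*"$/           # the pattern as posted
STRING_ANCHORED = /\A"(?:\\\\|\\"|[^"\\])*"\z/     # assumed stricter variant

valid     = '"GET \"anything/here.txt\""'
bare      = '"\\\\""'            # two backslashes, then an unescaped quote
two_lines = "\"ok\"\ntrailing"   # a valid first line followed by more text

[valid, bare, two_lines].each do |s|
  puts "#{s.inspect}  posted=#{LINE_ANCHORED.match?(s)}  strict=#{STRING_ANCHORED.match?(s)}"
end
```

Both accept the valid sample. The posted pattern also accepts the other two, while the stricter variant rejects them. Note that the variant only allows \\ and \" as escapes, so a backslash followed by any other character is rejected as well.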

Ruby - Making a newline within usage of gsub

I'm a bit stuck on this issue. I'm trying to insert a newline using '\n'. I'm opening a file, replacing some text, then writing it back as an HTML file:
replace = text.gsub(/aaa/, 'aaa\nbbb')
But this results in:
aaa\nbbb
What I'm trying to get is:
aaa
bbb
In single-quoted strings a backslash is just a backslash (except if it precedes another backslash or a quote). Use double quotes: "aaa\nbbb" .
You'll want to read: Backslashes in Single quoted strings vs. Double quoted strings in Ruby?
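A minimal sketch of the two replacements, using a made-up text value in place of the file contents:

```ruby
text = "xx aaa yy"   # stand-in for the contents read from the file

# Single-quoted replacement: the backslash and the n stay as two characters.
single = text.gsub(/aaa/, 'aaa\nbbb')
puts single          # xx aaa\nbbb yy

# Double-quoted replacement: \n is a real line feed.
double = text.gsub(/aaa/, "aaa\nbbb")
puts double          # prints "xx aaa" and "bbb yy" on separate lines
```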

Resources