Ruby text variable with \ inside - ruby

I need to create a db record with a column that holds this value:
query = "booster pro 12\24 v"
but this is what gets stored in the db column:
"booster pro 12\u0014 v"
If I try with a single-quoted string I get
'booster pro 12\24 v' -> "booster pro 12\\24 v"
but if I use this, the correct string is printed:
puts 'booster 12\24 v' -> booster 12\24 v

If I try with a single-quoted string I get
'booster pro 12\24 v' -> "booster pro 12\\24 v"
This is correct. You've already got the right answer.
A backslash is the escape character. To write it literally inside a double-quoted string, you must escape it with another backslash, so a literal backslash is represented by two backslashes. That is also why the double-quoted version went wrong: "\24" is read as an octal escape (octal 24 is 0x14), which is the "\u0014" stored in your column.
Two literal backslashes would be represented by four backslashes in a double-quoted string.
Or for example, consider:
How would you display a newline character? Answer: "\n"
How would you display a backslash character, followed by an "n"? Answer: "\\n"
This behaviour is not specific to Ruby.
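A minimal Ruby sketch of the point above, using the string from the question (the variable names are only for illustration). It compares the three ways of writing the literal and shows that the inspect form doubles the backslash while puts prints the raw characters.

```ruby
# Single quotes keep the backslash as-is.
single = 'booster pro 12\24 v'
# Double quotes need the backslash escaped to mean the same thing.
double = "booster pro 12\\24 v"
# Unescaped, "\24" is read as an octal escape (octal 24 = 0x14).
octal  = "booster pro 12\24 v"

puts single == double   # true: same string, written two ways
puts single.length      # 19 characters, one of them a backslash
puts octal.length       # 17: "\24" became a single control character
puts octal[14].ord      # 20, the code point of that control character

p    single             # inspect form: the backslash is shown doubled
puts single             # raw characters: one backslash
```

So the value shown as "booster pro 12\\24 v" already contains exactly one backslash; only its displayed form has two.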

Related

ruby gsub new line characters

I have a string with newline characters that I want to gsub out for white space.
"hello I\r\nam a test\r\n\r\nstring".gsub(/[\\r\\n]/, ' ')
Something like this ^, only my regex seems to be replacing the 'r' and 'n' letters as well. The other constraint is that the pattern sometimes repeats twice and would then be replaced with two spaces in a row; that is not ideal, but it is better than all the text being cut apart.
Is there a way to select only the newline characters? Or, even better, is there a more Ruby-like way of approaching this without resorting to regex?
If you have mixed consecutive line breaks that you want to replace with a single space, you may use the following regex solution:
s.gsub(/\R+/, ' ')
See the Ruby demo.
The \R matches any type of line break and + matches one or more occurrences of the quantified subpattern.
Note that if you have to deal with an older version of Ruby, you will need to use the character class [\r\n], which matches either \r or \n:
.gsub(/[\r\n]+/, ' ')
or, to cover all possible line breaks:
.gsub(/(?:\u000D\u000A|[\u000A\u000B\u000C\u000D\u0085\u2028\u2029])+/, ' ')
This should work for your test case:
"hello I\r\nam a test\r\n\r\nstring".gsub(/[\r\n]/, ' ')
If you don't want successive \r\n characters to result in duplicate spaces you can use this instead:
"hello I\r\nam a test\r\n\r\nstring".gsub(/[\r\n]+/, ' ')
(Note the addition of the + after the character class.)
As Wiktor mentioned, you're using \\ in your regex, which inside the regex literal /.../ actually escapes a backslash, meaning you're matching a literal backslash \, r, or n as part of your expression. Escaping characters works differently in regex literals, since \ is used so much, it makes no sense to have a special escape for it (as opposed to regular strings, which is a whole different animal).
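The difference between these patterns can be checked side by side. This is a small sketch on the test string from the question; the variable names are illustrative, and \R assumes a Ruby version that supports it (2.0 or later).

```ruby
s = "hello I\r\nam a test\r\n\r\nstring"

# [\\r\\n] is a class of: literal backslash, "r", "n". Real line breaks
# are left alone. (Ruby may warn about the duplicated backslash.)
wrong = s.gsub(/[\\r\\n]/, ' ')

# One space per \r or \n character, so "\r\n" yields two spaces.
each_char = s.gsub(/[\r\n]/, ' ')

# One space per run of line-break characters.
collapsed = s.gsub(/[\r\n]+/, ' ')

# Same result with \R, which matches any kind of line break.
any_break = s.gsub(/\R+/, ' ')

p wrong      # "hello I\r\nam a test\r\n\r\nst i g"
p each_char  # "hello I  am a test    string"
p collapsed  # "hello I am a test string"
p any_break  # "hello I am a test string"
```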

Is there a way to match double quotes inside two double quotes?

I tried the following regex, but it matches all the double quotes:
(?>(?<=(")|))"(?(1)(?!"))
Here is a sample of the text:
"[\"my cars last night\",
\"Burger\",\"Decaf\" shirt\",
\"Mocha\",\"marshmallows\",
\"Coffee Mission\"]"
The pattern I want to match is the double quote between the double quotes in line 2.
As a general rule, I would say: no.
Given a string:
\"Burger\" \"Decaf\" shirt\"
How do you decide which \" is superfluous (non-matching)? Is this one after Burger, one after Decaf or one after shirt? Or one before any of these words? I believe the choice is arbitrary.
In your particular example, though, it seems that you want every \" that is not adjacent to a comma or a square bracket.
These can be found with the following regexp:
(?<!^)(?<![,\[])\\"(?![,\]])
We start with \\" (a backslash followed by a double quote) in the center.
Then we use a negative lookahead to discard all matches that are followed by a comma or a closing square bracket.
Then we use a negative lookbehind to discard all matches that come right after a comma or an opening square bracket.
The regexp engine that I used can't cope with alternation inside lookaround statements. To work around that, I take advantage of the fact that lookarounds are zero-length matches and prepend a second negative lookbehind, at the start of the expression, that rejects matches at the beginning of a line.
Proof (in perl):
$ cat test
"[\"my cars last night\",
\"Burger\",\"Decaf\" shirt\",
\"Mocha\",\"marshmallows\",
\"Coffee Mission\"]"
$ perl -n -e '$_ =~ s/(?<!^)(?<![,\[])\\"(?![,\]])/|||/g; print $_' test
"[\"my cars last night\",
\"Burger\",\"Decaf||| shirt\",
\"Mocha\",\"marshmallows\",
\"Coffee Mission\"]"
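The same idea can be sketched in Ruby. This is an adaptation, not the exact regexp above: it assumes the text is processed line by line (as perl -n does) and swaps the two negative lookbehinds for one positive lookbehind, (?<=[^,\[]), which requires some preceding character that is neither a comma nor an opening bracket and therefore also rejects a match at the start of a line. The variable names are illustrative.

```ruby
# Single-quoted heredoc: backslashes are kept exactly as typed.
text = <<~'TEXT'
  "[\"my cars last night\",
  \"Burger\",\"Decaf\" shirt\",
  \"Mocha\",\"marshmallows\",
  \"Coffee Mission\"]"
TEXT

# \" that has a preceding character other than , or [
# and is not followed by , or ]
stray_quote = /(?<=[^,\[])\\"(?![,\]])/

# Work line by line so that "start of line" means "no preceding character".
marked = text.each_line.map { |line| line.gsub(stray_quote, '|||') }.join

puts marked
```

Only the \" between Decaf and shirt is replaced, matching the Perl output above.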
Let's assume that the format of your string must be like this:
["item1", "item2", ... "itemN"]
The way to know if a double quote is a closing double quote is to check if it is followed by a comma or a closing square bracket.
To find a double quote enclosed by double quotes, you must match all well formatted items from the beginning until an unexpected quote.
Example to find the first enclosed quote (if it exists):
(?:"[^"]*",\s*)*+"[^"]*\K"
demo
But this works only for one enclosed quote in all the string and isn't useful if you want to find all of them.
To find all quotes:
(?:\G(?!\A)|(?:\A[^"]*|[^"]*",\s*)(?:"[^"]*",\s*)*+")[^"]*\K"(?!\s*[\],])
demo

How come you can't gsub this string in Ruby?

These \\n are showing up in my strings even though it should only be \n.
But if I do this :
"\n".gsub('\\n','\b')
It returns :
"\n"
Ideally, I'm trying to find a regex that could rewrite this string :
"R3pQvDqmz/EQ7zho2mhIeE6UB4dLa6GUH7173VEMdGCcdsRm5pernkqCgbnj\\nZjTX\\n"
To not display two backslashes, but just one like this :
"R3pQvDqmz/EQ7zho2mhIeE6UB4dLa6GUH7173VEMdGCcdsRm5pernkqCgbnj\nZjTX\n"
But none of the regexes I try work. I can gsub out the \n and put something like X there, but if I put a \ in the replacement, Ruby escapes it with an additional \, which consequently breaks my encryption module, as it needs the exact string.
Any ideas?
You are falling into the trap of a different meaning of escapes when used in strings with double quotes vs single quotes. Double-quoted strings allow escape characters to be used. Thus, here "\n" actually is a one-character string containing a single line feed. Compare that to '\n' which is a two-character string containing a literal backslash followed by a character n.
This explains why your gsub doesn't match. If you use the following code, it should work:
"\\n".gsub('\n','\b')
For your actual issue, you can use this
string = "R3pQvDqmz/EQ7zho2mhIeE6UB4dLa6GUH7173VEMdGCcdsRm5pernkqCgbnj\\nZjTX\\n"
new_string = string.gsub("\\n", "\n")
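A short sketch that checks the claims above (the variable names other than string and new_string are illustrative). It compares the two kinds of literal and then applies the suggested gsub to the string from the question.

```ruby
newline  = "\n"   # double quotes: one line-feed character
two_char = '\n'   # single quotes: a backslash followed by "n"

puts newline.length    # 1
puts two_char.length   # 2
puts '\\n' == '\n'     # true: both are backslash + "n"

# The pattern '\\n' (backslash + "n") never occurs in a lone line feed,
# so the original call returns its input unchanged.
unchanged = "\n".gsub('\\n', '\b')
p unchanged            # "\n"

# Replace each backslash + "n" pair with a real line feed.
string = "R3pQvDqmz/EQ7zho2mhIeE6UB4dLa6GUH7173VEMdGCcdsRm5pernkqCgbnj\\nZjTX\\n"
new_string = string.gsub("\\n", "\n")
p new_string
```

After the replacement the string contains no backslash characters at all; the \n that p prints is only the inspect form of the line feed.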

ruby regex about escape a escape

I am trying to write a regex in Ruby to test a string such as:
"GET \"anything/here.txt\""
The point is that anything can appear inside the outer double quotes, but every double quote inside them must be escaped by a backslash (otherwise the string should not match). So for example
"GET "anything/here.txt""
is not a proper line.
I tried many ways to write the regex, but none of them work. Can anyone help me with this? Thank you.
You can use positive lookbehind:
/\A"((?<=\\)"|[^"])*"\z/
This does exactly what you asked for: "if a double quote appears inside the outer double quotes without a backslash prefixed, it doesn't match."
Some comments:
\A,\z: These match only at the beginning and end of the string. So the pattern has to match against the whole string, not a part of it.
(?<=): This is the syntax for positive lookbehind; it asserts that a pattern must match directly before the current position. So (?<=\\)" matches "a double quote which is preceded by a backslash".
[^"]: This matches "any character which is not a double quote".
One point about this regex is that it will also match an inner double quote that is preceded by two backslashes. If that is a problem, post a comment and I'll fix it.
If your version of Ruby doesn't have lookbehind, you could do something like:
/\A"(\\.|[^"\\])*"\z/
Note that unlike the first regexp, this one does not count a double backslash as escaping a quote (rather, the first backslash escapes the second one), so "\\"" will not match.
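The two patterns can be compared on the strings from the question plus the double-backslash case. This is a small sketch; the constant names are illustrative, and Regexp#match? assumes Ruby 2.4 or later.

```ruby
LOOKBEHIND = /\A"((?<=\\)"|[^"])*"\z/   # first pattern (lookbehind)
CONSUMING  = /\A"(\\.|[^"\\])*"\z/      # second pattern (no lookbehind)

# Single-quoted literals, so \" stays a backslash followed by a quote.
samples = {
  '"GET \"anything/here.txt\""' => 'inner quotes escaped',
  '"GET "anything/here.txt""'   => 'inner quotes bare',
  '"\\\\""'                     => 'quote after two backslashes'
}

samples.each do |text, label|
  puts "#{label}: lookbehind=#{LOOKBEHIND.match?(text)} " \
       "consuming=#{CONSUMING.match?(text)}"
end
# inner quotes escaped: lookbehind=true consuming=true
# inner quotes bare: lookbehind=false consuming=false
# quote after two backslashes: lookbehind=true consuming=false
```

The last line is the difference described above: the second pattern pairs the two backslashes with each other, which leaves the following quote unescaped.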
This works:
/"(?<method>[A-Z]*)\s*\\\"(?<file>[^\\"]*)\\""/
See it on Rubular.
Edit:
"(?<method>[A-Z]*)\s(?<content>(\\\"|[a-z\/\.]*)*)"
See it here.
Edit 2: without (? ...) sequence (for Ruby 1.8.6):
"([A-Z]*)\s((\\\"|[a-z\/\.]*)*)"
Rubular here.
Tested this on Rubular successfully:
\"GET \\\".*\\\"\"
Breakdown:
\" - Escape the " for the regex string, meaning the literal character "
GET - Assuming you just want GET, then this is explicit
\\\" - Escape \ and " to get the literal string \"
.* - 0 or more of any character other than \n
\\\"\" - Escapes for the literal \""
I'm not sure a regex is really your best tool here, but if you insist on using one, I recommend thinking of the string as a sequence of tokens: a quote, then a series of things that are either \\, \" or anything that isn't a quote, then a closing quote at the end. So this:
^"(\\\\|\\"|[^"])*"$

Watir magic escape sequence?

I am currently using Watir with Firefox and it seems that when I try to set a field with the following text:
##$QWER7890uiop
The command I am using is the following:
text_field(:name, "password").value=("!##$QWER7890uiop)
I've also tried this:
text_field(:name, "password").set "!##$QWER7890uiop)
Only the first 2 characters get entered. Is there something I can do to bypass this feature?
You need to quote the string with single quotes (') instead of double quotes.
text_field(:name, "password").value='"!##$QWER7890uiop'
Many characters are substituted inside double quotes.
Escape sequences like \n, \t, \s, etc are replaced by their equivalent character(s). See here for full list.
#{} where anything inside the braces is interpreted as a Ruby expression.
#$something where $something is interpreted as a Ruby global variable. That's the problem with your string above, besides it not being terminated.
%s is a format placeholder; it is not replaced by the quotes themselves but when the string is passed to the % operator (String#%) or to format.
For instance:
puts "%s hours later" % 'Five'
results in
"Five hours later".
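A minimal sketch of why only two characters arrive, using the password text from the question without any Watir calls (the variable names are illustrative). It assumes the global $QWER7890uiop has never been set, so it is nil.

```ruby
# "#$name" interpolates the global variable $name; an unset global is nil,
# which interpolates as an empty string.
double = "!##$QWER7890uiop"
single = '!##$QWER7890uiop'

p double             # "!#"
puts double.length   # 2: everything from #$ onwards was interpolated away
puts single.length   # 16: single quotes keep every character

# %s is filled in by String#%, not by the string literal.
template = "%s hours later"
puts template           # %s hours later
puts template % 'Five'  # Five hours later
```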
