I am trying to write a regex in Ruby to test a string such as:
"GET \"anything/here.txt\""
The point is that anything can appear inside the outer double quotes, but every double quote inside them must be escaped with a backslash (otherwise the string doesn't match). So, for example:
"GET "anything/here.txt""
this will not be a proper line.
I tried many ways to write the regex, but none of them work. Can anyone help me with this? Thank you.
You can use positive lookbehind:
/\A"((?<=\\)"|[^"])*"\z/
This does exactly what you asked for: "if a double quote appears inside the outer double quotes without a backslash prefixed, it doesn't match."
Some comments:
\A,\z: These match only at the beginning and end of the string. So the pattern has to match against the whole string, not a part of it.
(?<=): This is the syntax for positive lookbehind; it asserts that a pattern must match directly before the current position. So (?<=\\)" matches "a double quote which is preceded by a backslash".
[^"]: This matches "any character which is not a double quote".
One caveat about this regex is that it will also match an inner double quote which is preceded by two backslashes. If that is a problem, post a comment and I'll fix it.
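A quick way to check the lookbehind pattern against the two lines from the question is sketched below. The constant and helper names are made up for illustration, and match? assumes Ruby 2.4 or later.

```ruby
# Pattern copied from the answer above. QUOTED and properly_quoted? are
# illustrative names, not part of any library.
QUOTED = /\A"((?<=\\)"|[^"])*"\z/

def properly_quoted?(line)
  QUOTED.match?(line)
end

# Single-quoted Ruby strings keep \" as a backslash followed by a quote.
good = '"GET \"anything/here.txt\""'
bad  = '"GET "anything/here.txt""'

puts properly_quoted?(good)  # inner quotes are escaped
puts properly_quoted?(bad)   # inner quotes are bare
```

The first call should report a match and the second should not, since the bare inner quote can be consumed by neither alternative.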
If your version of Ruby doesn't have lookbehind, you could do something like:
/\A"(\\.|[^"\\])*"\z/
Note that unlike the first regexp, this one does not count a double backslash as escaping a quote (rather, the first backslash escapes the second one), so "\\"" will not match.
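To see the difference between the two patterns on the double-backslash case, a small comparison can be run. This is only a sketch; the constant names are hypothetical.

```ruby
# Both patterns are copied from the answer above; the names are illustrative.
WITH_LOOKBEHIND = /\A"((?<=\\)"|[^"])*"\z/
WITH_ESCAPES    = /\A"(\\.|[^"\\])*"\z/

# Five characters: quote, backslash, backslash, quote, quote.
tricky = '"\\\\""'

puts tricky.length
puts WITH_LOOKBEHIND.match?(tricky)  # treats the second backslash as escaping the quote
puts WITH_ESCAPES.match?(tricky)     # pairs the two backslashes, leaving a bare quote
```

The second pattern consumes input one token at a time (an escape pair or a single ordinary character), which is why it cannot be fooled by an escaped backslash.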
This works:
/"(?<method>[A-Z]*)\s*\\\"(?<file>[^\\"]*)\\""/
See it on Rubular.
Edit:
"(?<method>[A-Z]*)\s(?<content>(\\\"|[a-z\/\.]*)*)"
See it here.
Edit 2: without (? ...) sequence (for Ruby 1.8.6):
"([A-Z]*)\s((\\\"|[a-z\/\.]*)*)"
Rubular here.
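For Ruby 1.9 or later, the named groups in the first pattern of this answer can be read back from the match data. A minimal sketch, assuming the input has exactly the shape shown in the question (LINE_PATTERN and request are illustrative names):

```ruby
# First pattern from the answer above, with its named groups.
LINE_PATTERN = /"(?<method>[A-Z]*)\s*\\\"(?<file>[^\\"]*)\\""/

# Single quotes keep the backslashes as literal characters.
request = '"GET \"anything/here.txt\""'

parsed = LINE_PATTERN.match(request)
if parsed
  puts parsed[:method]
  puts parsed[:file]
else
  puts "no match"
end
```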
Tested this on Rubular successfully:
\"GET \\\".*\\\"\"
Breakdown:
\" - Escape the " for the regex string, meaning the literal character "
GET - Assuming you just want GET, then this is explicit
\\" - Escape \ and " to get the literal string \"
.* - 0 or more of any character other than \n
\\"\" - Escapes for the literal \""
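As a hedged illustration of what this pattern does and does not check, the sketch below runs it as a Ruby regexp literal. Because .* accepts any character, a bare quote in the middle still gets through; the sample strings and names are invented for the example.

```ruby
# Pattern from the breakdown above, written as a Ruby regexp literal.
LOOSE = /\"GET \\\".*\\\"\"/

escaped   = '"GET \"anything/here.txt\""'  # well formed
unescaped = '"GET "anything/here.txt""'    # inner quotes not escaped
sneaky    = '"GET \"any"thing\""'          # bare quote in the middle

puts LOOSE.match?(escaped)
puts LOOSE.match?(unescaped)
puts LOOSE.match?(sneaky)
```

The first and third strings match while the second does not, so this pattern only verifies the escaped quotes right after GET and at the very end.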
I'm not sure a regex is really your best tool here, but if you insist on using one, I recommend thinking of the string as a sequence of tokens: a quote, then a series of things that are either \\, \" or anything that isn't a quote, then a closing quote at the end. So this:
^"(\\\\|\\"|[^"])*"$
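One thing to keep in mind when using this in Ruby: ^ and $ anchor at line boundaries, not at the ends of the string, so multi-line input can slip through. The sketch below compares the pattern as written with a \A and \z variant; the constant names and the sample input are assumptions for illustration.

```ruby
# Token-based pattern from the answer, anchored two different ways.
BY_LINE   = /^"(\\\\|\\"|[^"])*"$/    # as written above
BY_STRING = /\A"(\\\\|\\"|[^"])*"\z/  # whole-string variant

good      = '"GET \"anything/here.txt\""'
two_lines = "\"ok\"\nnot \"quoted"

puts BY_LINE.match?(good)
puts BY_STRING.match?(good)
puts BY_LINE.match?(two_lines)    # first line alone satisfies ^...$
puts BY_STRING.match?(two_lines)  # the whole string does not
```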
Related
I have a string that looks something like '\\u001E\\u001Csome_text\u001F' - where the first two characters are escaped with two backslashes and the last one only has one backslash.
I want to convert that string so that all the unicode literals have two backslashes in them, so the output would look like '\\u001E\\u001Csome_text\\u001F'. What ways could I go about doing this?
If the goal is to find instances of '\u' not preceded by a '\' character, then a negative lookbehind should suit the need to match:
(?<!\\)\\u
If you want to match the whole character, then you can also use a positive lookahead to verify the following characters:
(?<!\\)\\u(?=[0-9A-Fa-f]{4})
Both of these will return the instances of '\u', which you can replace with '\\u'
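The question doesn't say which language is in use, so here is a minimal sketch of the replacement in Ruby. It assumes the four characters after \u are hex digits, and it uses the block form of gsub so the replacement text is not re-interpreted for backslash sequences.

```ruby
# A backslash-u that is not already preceded by a backslash and is
# followed by four hex digits (the hex-digit class is an assumption).
BARE_UNICODE_ESCAPE = /(?<!\\)\\u(?=[0-9A-Fa-f]{4})/

# In a single-quoted literal, \\ is one backslash, so this string holds
# two backslashes before the first two escapes and one before the last.
raw = '\\\\u001E\\\\u001Csome_text\\u001F'

# The block returns two backslashes followed by u.
fixed = raw.gsub(BARE_UNICODE_ESCAPE) { '\\\\u' }

puts raw
puts fixed
```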
I tried the following regex, but it matches all the double quotes:
(?>(?<=(")|))"(?(1)(?!"))
Here is a sample of the text:
"[\"my cars last night\",
\"Burger\",\"Decaf\" shirt\",
\"Mocha\",\"marshmallows\",
\"Coffee Mission\"]"
The pattern I want to match is the double quote between the double quotes in line 2
As a general rule, I would say: no.
Given a string:
\"Burger\" \"Decaf\" shirt\"
How do you decide which \" is superfluous (non-matching)? Is it the one after Burger, the one after Decaf, or the one after shirt? Or the one before any of these words? I believe the choice is arbitrary.
In your particular example, though, it seems that you want every \" that is not adjacent to a comma or a square bracket.
These can be found with the following regexp:
(?<!^)(?<![,\[])\\"(?![,\]])
We start with \\" (backslash followed by double quote) in the center.
Then we use negative lookahead to discard all matches that are followed by comma or closing square bracket.
Then we use negative lookbehind to discard all matches that happen after comma or opening bracket.
The regexp engine that I used can't cope with alternation inside lookaround assertions. To work around that, I take advantage of the fact that lookarounds are zero-length matches and prepend a second negative lookbehind, one that matches the beginning of the line, at the start of the expression.
Proof (in perl):
$ cat test
"[\"my cars last night\",
\"Burger\",\"Decaf\" shirt\",
\"Mocha\",\"marshmallows\",
\"Coffee Mission\"]"
$ perl -n -e '$_ =~ s/(?<!^)(?<![,\[])\\"(?![,\]])/|||/g; print $_' test
"[\"my cars last night\",
\"Burger\",\"Decaf||| shirt\",
\"Mocha\",\"marshmallows\",
\"Coffee Mission\"]"
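The same substitution can be sketched in Ruby. Note that this version swaps the pair of negative lookbehinds for a single positive lookbehind on a negated character class, which also rules out a newline or the start of the string, so it is a variation on the pattern above rather than a direct port. The names are illustrative.

```ruby
# Quoted heredoc: backslashes are kept exactly as typed.
sample = <<'TEXT'
"[\"my cars last night\",
\"Burger\",\"Decaf\" shirt\",
\"Mocha\",\"marshmallows\",
\"Coffee Mission\"]"
TEXT

# A \" that follows some character other than comma, "[" or newline,
# and is not followed by a comma or "]".
STRAY_QUOTE = /(?<=[^,\[\n])\\"(?![,\]])/

marked = sample.gsub(STRAY_QUOTE, '|||')
puts marked
```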
Let's assume that the format of your string must be like this:
["item1", "item2", ... "itemN"]
The way to know if a double quote is a closing double quote is to check if it is followed by a comma or a closing square bracket.
To find a double quote enclosed by double quotes, you must match all well formatted items from the beginning until an unexpected quote.
Example to find the first enclosed quote (if it exists):
(?:"[^"]*",\s*)*+"[^"]*\K"
demo
But this works only for a single enclosed quote in the whole string and isn't useful if you want to find all of them.
To find all quotes:
(?:\G(?!\A)|(?:\A[^"]*|[^"]*",\s*)(?:"[^"]*",\s*)*+")[^"]*\K"(?!\s*[\],])
demo
I want to create a regular expression such as:
/\\\s*\\\s*$/
I am trying it in this way:
Regexp.new('\\\s*\\\s*$') # => /\\s*\\s*$/
What am I doing wrong?
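Well, \\ matches a single backslash: the backslash is the escape character in a Regexp, and it is also the escape character inside the string literal, so each one has to be doubled again there.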
rgx = Regexp.new('\\\\\\s*\\\\\\s*$')
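Be careful with the quantifier-style shorthand that @Cary Swoveland mentioned. In a single-quoted string, '\\{3}' is the text \{3}, which the regexp reads as a literal {3} rather than as a repeat count, so this does not compile to the same pattern:
rgx = Regexp.new('\\{3}s*\\{3}s*$') # => /\{3}s*\{3}s*$/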
Using literal notation avoids some confusion. This compiles to what you said you want:
/\\\s*\\\s*$/
Though, to be clear, this still matches a single backslash, optional whitespace, another single backslash, and more optional whitespace. Backslashes are shown escaped when you inspect a regexp.
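A short check of the points above, written as a sketch with illustrative variable names: the six-backslash string and the literal compile to the same pattern, while the attempt from the question compiles to something else.

```ruby
# In a single-quoted string, \\ is one backslash and \s stays as \s.
from_string  = Regexp.new('\\\\\\s*\\\\\\s*$')
from_literal = /\\\s*\\\s*$/
asker_try    = Regexp.new('\\\s*\\\s*$')

puts from_string == from_literal  # Regexp#== compares source and options
puts from_string.source
puts asker_try.source

# Backslash, space, backslash, space at the end of the string.
puts from_literal.match?("x \\ \\ ")
```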
I want to split a string (an untrusted foreign line, such as an exim_mainlog line) on spaces, but not on spaces inside double quotes. A quote escaped by a backslash, like \", should not end the quoted section, and a backslash that is itself escaped, like \\, should not count as escaping what follows. I'd like to do this without slowly parsing the string by hand with an FSM.
Example line:
U=mailnull T="test \"quote\" and wild blackslash\\" P=esmtps
Should be split into:
["U=mailnull", "T=\"test \\\"quote\\\" and wild blackslash\\\\\"", "P=esmtps"]
(By the way, I think Ruby should have a method for this kind of split, sigh.)
I think I found a simple enough solution: input.scan(/(?:"(?:\\.|[^"])*"|[^" ])+/)
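Here is the scan approach applied to the example line, as a rough sketch. The method name is made up for illustration, and the input is assumed to contain balanced quotes.

```ruby
# Each token is a run of quoted sections and non-space, non-quote
# characters. Inside quotes, a backslash always consumes the next character.
def split_outside_quotes(line)
  line.scan(/(?:"(?:\\.|[^"])*"|[^" ])+/)
end

# Single-quoted literal: \" stays as typed, \\\\ is two backslashes.
line = 'U=mailnull T="test \"quote\" and wild blackslash\\\\" P=esmtps'

tokens = split_outside_quotes(line)
tokens.each { |token| puts token }
```

Because the pattern has no capture groups, scan returns the whole matches, one string per field.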
I'm a bit stuck on this issue. I'm trying to make a newline using '\n'. I'm opening a file, then replacing the text, then writing it back as an html file:
replace = text.gsub(/aaa/, 'aaa\nbbb')
But this results in:
aaa\nbbb
What I want is:
aaa
bbb
In single-quoted strings a backslash is just a backslash (except if it precedes another backslash or a quote). Use double quotes: "aaa\nbbb" .
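The difference is easy to see side by side. A minimal sketch, using a made-up input string in place of the file contents:

```ruby
text = "xx aaa yy"

# Single quotes: \n is two characters, a backslash and the letter n.
single = text.gsub(/aaa/, 'aaa\nbbb')

# Double quotes: \n is one character, a newline.
double = text.gsub(/aaa/, "aaa\nbbb")

puts single
puts double
puts double.lines.size
```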
You'll want to read: Backslashes in Single quoted strings vs. Double quoted strings in Ruby?