Regex test if value is valid format - ruby

I have a task where I need to check whether a value is a properly quoted CSV column. The cases are:
1. no quotation - OK
2. "with quotation" - OK
3. "opening quote - Not Good
4. improper"quote" - Not Good
5. closing quote" - Not Good
CSV flags an error like below:
Illegal quoting in line 5. (CSV::MalformedCSVError)
Question: How can I check this with a single regex? I need to flag an error for cases 3-5.
If you know of anything else that should be checked to decide whether a CSV value is valid, please say so.
EDIT: I have added 2 scenarios/cases below:
"quote "inside quotes" - Not Good
"quotes ""inside quotes" - Not Good
EDIT: added 1 more case:
"" - OK

Without considering escaped quotes:
/^("[^"]*"|[^"]+)$/m
It means:
beginning of line
1 quote + anything except quote + 1 quote, or
anything except quote (at least one character)
end of the line
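As a rough check of the pattern above against the cases listed in the question, here is a minimal Ruby sketch. The constant and method names are made up for illustration, and it assumes each value is tested as its own string, so \A and \z stand in for ^ and $ (which in Ruby match at every line boundary).

```ruby
# Hypothetical names; the pattern is the one above with \A/\z instead of ^/$,
# assuming each CSV value is checked as a separate string.
QUOTED_OR_BARE = /\A("[^"]*"|[^"]+)\z/

def well_quoted?(value)
  QUOTED_OR_BARE.match?(value)
end

cases = [
  'no quotation',
  '"with quotation"',
  '"opening quote',
  'improper"quote"',
  'closing quote"',
  '"quote "inside quotes"',
  '"quotes ""inside quotes"',
  '""'
]

# Print each case with the verdict the pattern gives it.
cases.each do |value|
  puts "#{value} - #{well_quoted?(value) ? 'OK' : 'Not Good'}"
end
```

An empty string is rejected by this pattern, because the unquoted branch requires at least one character.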

^"{1}.+"{1}$|^[^"]*$
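This matches lines that either start and end with a quotation mark and have at least one character in between, or contain no quotation marks at all. Because .+ also matches quotation marks, it accepts "quote "inside quotes" and rejects "", so it does not cover the cases added in the edits.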


YAML syntax sed in Gitlab-CI

I've made a mistake in the file below, but I cannot see where my mistake is. I have this command in my .gitlab-ci.yml configuration file.
- sed "s/use_scm_version=True/use_scm_version={'write_to': '..\/version.txt', 'root': '..'},\/"setup.py
It seems that the ":" is being interpreted as a key/value separator even though I surround the entire sed expression with double quotes.
(<unknown>): did not find expected key while parsing a block mapping at line 109 column 11
Any ideas?
Since your double quotes are not at the beginning of the scalar node, they don't have special meaning in YAML and the colon is seen as the normal value indicator (and both the key and value have an embedded double quote).
I recommend you quote the whole scalar:
- "sed s/use_scm_version=True/use_scm_version={'write_to': '..\/version.txt', 'root': '..'},\/setup.py"
If that doesn't work, add \" (backslash-escaped double quotes) inside the scalar where needed.
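To see the difference between the two forms outside of GitLab, the following sketch parses two simplified list items with Ruby's standard YAML library. The sed expressions are hypothetical stand-ins, not the original command; only the position of the double quotes matters here.

```ruby
require "yaml"

# Simplified, hypothetical stand-ins for the CI line (not the original sed command).
UNQUOTED_ITEM = %q(- sed "s/a: b/c/" file)
QUOTED_ITEM   = %q(- "sed 's/a: b/c/' file")

# The quotes start mid-scalar, so the ": " splits the item into a key and a value.
p YAML.safe_load(UNQUOTED_ITEM)

# The scalar starts with a double quote, so the whole item stays one string.
p YAML.safe_load(QUOTED_ITEM)
```

The first item comes back as a one-entry mapping inside the list, which is the same misreading that produces the "block mapping" error on the longer original line.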

Regex - How can I remove specific characters between strings/delimiters?

This is related to cleaning files before parsing them elsewhere, namely, malformed/ugly CSV. I see plenty of examples for removing/matching all characters between certain strings/characters/delimiters, but I cannot find any for specific strings. Example portion of line would look something like:
","Should now be allowed by rule above "Server - Access" added by Rich"\r
To be clear, this is not the entire line, but the entire line is enclosed in quotes and separated by "," and ends in ^M (Windows newline/carriage return). The 'columns' preceding this would be enclosed at each side by ",". I would probably use this too to remove cruft that appears earlier in the line.
What I am trying to get to is the removal of all double quotes between "," and "\r ("Server - Access" - these ones) without removing the delimiters. Alternatively, I may just find and replace them with \" to escape them for the Ruby CSV library. So far I have this:
(?<=",").*?(?="\\r)
Which basically matches everything between the delimiters. If I replace .*? with anything, be that a letter, double quotes etc, I get zero matches. What am I doing wrong?
Note: This should be Ruby compatible please.
If I understand you correctly, you can use negative lookahead and lookbehind:
text = '","Should now be allowed by rule above "Server - Access" added by Rich"\r'
puts text.gsub(/(?<!,)"(?![,\\r])/, '\"')
# ","Should now be allowed by rule above \"Server - Access\" added by Rich"\r
Of course, this won't work if the values themselves can contain commas or newlines...
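One detail of the pattern above: [,\\r] is a character class, so it also skips any quote that is directly followed by a plain r. A possible variant, sketched here with a made-up method name and sample line, tests for the two-character sequence \r instead:

```ruby
# Hypothetical helper: escape inner quotes, leaving alone any quote that follows
# a comma, or that precedes a comma or the literal two characters \r.
def escape_inner_quotes(line)
  # Block form, so the replacement is inserted literally (backslash + quote).
  line.gsub(/(?<!,)"(?!,|\\r)/) { '\"' }
end

puts escape_inner_quotes('","He said "right" to Rich"\r')
# ","He said \"right\" to Rich"\r
```

The same limitation applies: values that contain commas or line breaks next to a quote will still be misread.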

Regexp in Ruby does not give the expected result

I have some strings like 2015 - THIS Test and 2015 - THAT Test.
I want to have the part THIS Test or THAT Test so I tried this:
"2015 - THIS Test"[/((THIS|THAT)\s\.*)/]
But that only gives me THIS or THAT.
Why does it cut the rest?
How to get the desired substring correctly?
I don't want to rely on just cutting the first 7 characters.
You escaped the dot, so it lost its meaning of "any character but a newline" and now denotes a literal . symbol. \.* matches zero or more literal dots.
Remove the \:
puts "2015 - THIS Test"[/((THIS|THAT)\s.*)/]
puts "2015 - THAT Test"[/((THIS|THAT)\s.*)/]
Result:
THIS Test
THAT Test
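A side-by-side sketch of the two patterns may make the difference easier to see. The constant names and the third input string are illustrative only:

```ruby
ESCAPED_DOT = /(THIS|THAT)\s\.*/  # \.* = zero or more literal dots
ANY_CHAR    = /(THIS|THAT)\s.*/   # .*  = zero or more of any character except newline

p "2015 - THIS Test"[ESCAPED_DOT]     # => "THIS "
p "2015 - THIS Test"[ANY_CHAR]        # => "THIS Test"
p "2015 - THAT ...Test"[ESCAPED_DOT]  # => "THAT ..."
```

Note that the original attempt returns the keyword plus the trailing space, since \s still matches before \.* matches nothing.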

ruby regex about escape a escape

I am trying to write a regex in Ruby to test a string such as:
"GET \"anything/here.txt\""
The point is that anything can appear inside the outer double quotes, but every double quote inside them must be escaped by a backslash (otherwise the string shouldn't match). So, for example,
"GET "anything/here.txt""
is not a proper line.
I tried many ways to write the regex, but none of them work. Can anyone help me with this? Thank you.
You can use positive lookbehind:
/\A"((?<=\\)"|[^"])*"\z/
This does exactly what you asked for: "if a double quote appears inside the outer double quotes without a backslash prefixed, it doesn't match."
Some comments:
\A,\z: These match only at the beginning and end of the string. So the pattern has to match against the whole string, not a part of it.
(?<=): This is the syntax for positive lookbehind; it asserts that a pattern must match directly before the current position. So (?<=\\)" matches "a double quote which is preceded by a backslash".
[^"]: This matches "any character which is not a double quote".
One point about this regex, is that it will match an inner double quote which is preceded by two backslashes. If that is a problem, post a comment and I'll fix it.
If your version of Ruby doesn't have lookbehind, you could do something like:
/\A"(\\.|[^"\\])*"\z/
Note that unlike the first regexp, this one does not count a double backslash as escaping a quote (rather, the first backslash escapes the second one), so "\\"" will not match.
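The difference between the two patterns can be checked with a short sketch. The constant names and the third sample string are assumptions made for illustration:

```ruby
LOOKBEHIND = /\A"((?<=\\)"|[^"])*"\z/   # first pattern: quote allowed after a backslash
CONSUMING  = /\A"(\\.|[^"\\])*"\z/      # second pattern: backslash consumes the next character

SAMPLES = {
  escaped:   '"GET \"anything/here.txt\""',    # inner quotes escaped
  unescaped: '"GET "anything/here.txt""',      # inner quotes not escaped
  doubled:   '"a' + ('\\' * 2) + '"b"'         # two backslashes, then a bare quote
}

SAMPLES.each do |label, sample|
  puts "#{label}: lookbehind=#{LOOKBEHIND.match?(sample)} consuming=#{CONSUMING.match?(sample)}"
end
```

Both patterns agree on the first two samples; only the third, where a double backslash precedes a bare quote, separates them.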
This works:
/"(?<method>[A-Z]*)\s*\\\"(?<file>[^\\"]*)\\""/
Edit:
"(?<method>[A-Z]*)\s(?<content>(\\\"|[a-z\/\.]*)*)"
Edit 2: without the (?<name>...) named groups (for Ruby 1.8.6):
"([A-Z]*)\s((\\\"|[a-z\/\.]*)*)"
Tested this on Rubular successfully:
\"GET \\\".*\\\"\"
Breakdown:
\" - Escape the " for the regex string, meaning the literal character "
GET - Assuming you just want GET, then this is explicit
\\\" - Escape \ and " to get the literal string \"
.* - 0 or more of any character other than \n
\\\"\" - Escapes for the literal \""
I'm not sure a regex is really your best tool here, but if you insist on using one, I recommend thinking of the string as a sequence of tokens: a quote, then a series of things that are either \\, \" or anything that isn't a quote, then a closing quote at the end. So this:
^"(\\\\|\\"|[^"])*"$

count quotes in a string that do not have a backslash before them

Hey, I'm trying to use a regex to count the number of quotes in a string that are not preceded by a backslash.
for example the following string:
"\"Some text
"\"Some \"text
The code I have was previously using String#count('"'), which is obviously not good enough.
When I count the quotes in both of these examples, I need the result to be 1.
I have been searching here for similar questions and I've tried using lookbehinds, but I cannot get them to work in Ruby.
I have tried the following regexes on Rubular, taken from a previous question:
/[^\\]"/
^"((?<!\\)[^"]+)"
^"([^"]|(?<!\)\\")"
None of them give me the results I'm after.
Maybe a regex is not the way to do this, and a programmatic approach is the solution.
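How about string.count('"') - string.scan('\"').size? String#count takes a set of characters rather than a substring, so the backslash-quote pairs are counted with scan instead.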
result = subject.scan(
  /(?:        # match either
    ^         # start of string or line
    |         # or
    \G        # the position where the previous match ended
    |         # or
    [^\\]     # one non-backslash character
  )           # then
  (?:\\\\)*   # match an even number of backslashes (0 is even, too)
  "           # match a quote
  /x)
gives you an array of all quote characters (each possibly with a preceding non-backslash character and pairs of backslashes) except escaped ones.
The \G anchor is needed to match successive quotes, and the (?:\\\\)* makes sure that backslashes are only counted as escaping characters if they occur in odd numbers before the quote (to take Amarghosh's correct caveat into account).
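Wrapped in a method, the scan-based count could look like the sketch below. The constant and method names are hypothetical, and the group for the backslash pairs is written as non-capturing so that scan returns whole matches rather than captures.

```ruby
# Hypothetical names; same pattern as above on one line, with a non-capturing
# group for the backslash pairs.
UNESCAPED_QUOTE = /(?:^|\G|[^\\])(?:\\\\)*"/

def count_unescaped_quotes(text)
  text.scan(UNESCAPED_QUOTE).size
end

puts count_unescaped_quotes('"\"Some text')     # => 1
puts count_unescaped_quotes('"\"Some \"text')   # => 1
```

Two adjacent quotes are both counted thanks to \G, and a quote preceded by two backslashes is counted as unescaped.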
