Regular Expression with Double Quotes - ruby

Given a string between double quotes, such as "Hello", the following regular expression will print out a match of the string without the double quotes:
/"([^"]+)"/
I don't understand how it is capturing the characters. I believe what this should be capturing is just the initial double quote. What this regular expression is saying is find an expression that starts and ends with double quotes and again has one or more double quotes at the beginning. And it captures that one or more double quotes at the beginning. How does it end up matching the string here with [^"]+ ?

The expression [^"]+ matches one or more characters that are not the double quote ". So when it is placed inside (), all characters following the first " up to (but not including) the next " are captured. This works because a ^ at the start of a [] character class means negation, rather than the start-of-line anchor it would be outside the []. So [^"] literally means any character except ".
The () itself is the capture group, and it captures only the part of the text matched by the expression inside it. Depending on the programming language you use, the entire match "Hello" (quotes included) produced by the whole expression /"([^"]+)"/ may also be available separately (in Ruby, as $~[0] or $&), but the purpose of () is to capture its contents.
Full breakdown of the expression:
" - first literal quote
( - begin capture group
[^"]+ - one or more characters, up to but not including the next "
) - end capture group
" - final closing quote literal
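The same pattern can be tried directly in Ruby. This is a small sketch (the sample strings are made up for illustration) showing that index 0 of the MatchData holds the whole match, quotes included, while index 1 holds only the capture group. String#scan also shows that [^"]+ cannot run past a quote, so each quoted word is captured separately.

```ruby
quoted = '"Hello"'

m = quoted.match(/"([^"]+)"/)
whole = m[0]   # the entire match, including both quotes
inner = m[1]   # capture group 1: only the characters between the quotes

# String#[] with a capture index returns just that group.
same_inner = quoted[/"([^"]+)"/, 1]

# [^"]+ stops at the next quote, so each quoted word is matched on its own.
# scan with one group returns an array of one-element arrays, hence flatten.
words = 'say "Hi" and "Bye"'.scan(/"([^"]+)"/).flatten

puts whole   # prints "Hello" (with the quotes)
puts inner   # prints Hello
p words      # prints ["Hi", "Bye"]
```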

Related

How does the REGEXP_REPLACE below work?

I have a query in my project that uses REGEXP_REPLACE.
I tried to find out how it works by searching, but all I found was this:
\w+ Matches a word character (that is, an alphanumeric or underscore (_) character).
But I could not find out why the "" are used in '"\w+\":', or what '{|}|"','' means.
UPDATE (SELECT data,data_value FROM TEMP) t
SET t.DATA_VALUE=REGEXP_REPLACE(REGEXP_REPLACE(t.data, '"\w+\":',''),'{|}|"','');
Can you please tell me how it works?
This appears to be a regular expression for stripping keys and enclosing curly braces from a JSON string; unfortunately, if that is the case, then it does not work in all situations.
The regular expression
'"\w+\":'
will match:
A " double quotation mark;
\w+ one-or-more word (a-z or A-Z or 0-9 or _) characters;
\" another double quotation mark - note: the \ character is not necessary; then
A : colon.
So:
SELECT REGEXP_REPLACE(
  '{"key":"value","key2":"value with \"quote"}',
  '"\w+":', -- Pattern matched
  ''        -- Replacement string
) AS result
FROM DUAL;
Will output:
{"value","value with \"quote"}
The second pattern {|}|" will match either a {, or a } or a " character (and could have been equivalently written as [{}"]) so:
SELECT REGEXP_REPLACE(
  '{"value","value with \"quote"}',
  '{|}|"', -- Pattern matched
  ''       -- Replacement string
) AS result
FROM DUAL;
Will output:
value,value with \quote
Which is fine until, as in my example, you have an escaped double quote (or curly brace) in the value string; in that case those characters also get stripped, leaving the escape character behind.
(Note: you would not typically find this, but it is possible to include escaped quotes in the key. {"keywith\":quote":"value"} is not matched by the first pattern at all, because \w cannot match the backslash, so the second replacement turns it into keywith\:quote:value, which is not the intended output.)
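For experimenting outside the database, the two nested REGEXP_REPLACE calls can be mirrored with two chained gsub calls in Ruby. This is only an approximation of Oracle's behaviour: it assumes Ruby's \w and Oracle's \w agree on plain ASCII input, which holds for this sample but is not guaranteed for every character set.

```ruby
# In a Ruby single-quoted literal, \" stays as a backslash followed by a quote,
# just as it does inside the Oracle string literal.
data = '{"key":"value","key2":"value with \"quote"}'

# Inner call: delete every "word": key (mirrors '"\w+\":').
without_keys = data.gsub(/"\w+":/, '')

# Outer call: delete every {, } and " (mirrors '{|}|"', i.e. [{}"]).
stripped = without_keys.gsub(/[{}"]/, '')

puts without_keys  # {"value","value with \"quote"}
puts stripped      # value,value with \quote  (the escape character is left behind)
```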
If parsing JSON is what you are trying to do (pre-Oracle 12) then you can use:
SELECT REGEXP_REPLACE(
  '{"key":"value","key2":"value with \"quote","keywith\":quote":"value with \"{}"}',
  '^{|"(\\"|[^"])+":(")?((\\"|[^"])+?)\2((,)|})',
  '\3\6'
) AS result
FROM DUAL;
Which outputs:
value,value with \"quote,value with \"{}
Or in Oracle 12 you can do:
SELECT *
FROM JSON_TABLE(
'{"key":"value","key2":"value with \"quote","keywith\":quote":"value with \"{}"}',
'$.*' NULL ON ERROR
COLUMNS (
value VARCHAR2(4000) PATH '$'
)
)
Which outputs:
VALUE
-----------------
value
value with "quote
value with "{}
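If the JSON can be processed in application code rather than in SQL, a real JSON parser sidesteps the escaping problems entirely. A minimal Ruby sketch using the standard json library on the same input (this swaps in Ruby's parser for Oracle's JSON_TABLE; it is not an Oracle feature):

```ruby
require "json"

# Same document as the JSON_TABLE example; single quotes keep the \" escapes
# intact so the JSON parser sees and resolves them.
doc = '{"key":"value","key2":"value with \"quote","keywith\":quote":"value with \"{}"}'

values = JSON.parse(doc).values

values.each { |v| puts v }
# value
# value with "quote
# value with "{}
```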
Syntax: REGEXP_REPLACE( string, pattern [, replacement_string [, start_position [, nth_appearance [, match_parameter ] ] ] ] )
| means OR (it separates alternatives), and a comma inside braces means "at least", as in {n,}, which matches at least n times.
Source: https://www.techonthenet.com/oracle/functions/regexp_replace.php
As for why the "" are used in '"\w+\":' and what '{|}|"','' means:
\w+ matches a word character (\w) one or more times (+). Surrounding it with \" allows a literal " to appear in the pattern. The statement nests two calls: the inner REGEXP_REPLACE changes the string, and its result is then used as the input for the outer one. Good luck figuring the rest out. Regular expressions aren't too bad; they're pretty intuitive once you get the basics down.

Unable to substitute escaped characters in string

I have this string:
str = "no,\"contact_last_name\",\"token\""
=> "no,\"contact_last_name\",\"token\""
I want to remove the escaped double quote characters (\"). I use gsub:
result = str.gsub('\\"','')
=> "no,\"contact_last_name\",\"token\""
It appears that gsub has not removed the escaped double quote characters from the string.
Why am I trying to do this? I have this csv file:
no,"contact_last_name","token",company,urbanization,sec-"property_address","property_address",city-state-zip,ase,oel,presorttrayid,presortdate,imbno,encodedimbno,fca,"property_city","property_state","property_zip"
1,MARIE A JEANTY,1083123,,,,17 SW 6TH AVE,DANIA BEACH FL 33004-3260,Electronic Service Requested,,T00215,12/14/2016,00-314-901373799-105112-33004-3260-17,TATTTADTATTDDDTTFDDFATFTDDDTTFADTTDFAAADDATDAATTFDTDFTTAFFTTATFFF,017,DANIA BEACH,FL, 33004-3260
When I try to open it with CSV, I get the following error:
CSV.foreach(path, headers: true) do |row|
end
CSV::MalformedCSVError: Illegal quoting in line 1.
Once I removed those double quoted strings in the first row (the header), the error went away. So I am trying to remove those double quoted strings before I run it through CSV:
file = File.open "file.csv"
contents = file.read
# => "no,\"contact_last_name\",\"token\" ... "
contents.gsub!('\\"','')
So again, my question is: why is gsub not removing the specified characters? Note that this actually does work:
contents.gsub /"/, ""
as if the string is ignoring the \ character.
There is no escaped double quote in this string:
"no,\"contact_last_name\",\"token\""
The interpreter recognizes the text above as a string because it is enclosed in double quotes. And because of the same reason, the double quotes embedded in the string must be escaped; otherwise they signal the end of the string.
The enclosing double quote characters are part of the language, not part of the string. Using backslash (\) as an escape character is also the language's way to put characters that otherwise have a special meaning (double quotes, for example) inside a string.
The actual string stored in the str variable is:
no,"contact_last_name","token"
You can check this for yourself if you tell the interpreter to put the string on screen (puts str).
To answer the question in the title: all your attempts to substitute the escaped characters were in vain simply because the string doesn't contain the character sequences you tried to find and replace.
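A short sketch that makes this checkable in irb or a script, using the string from the question:

```ruby
str = "no,\"contact_last_name\",\"token\""

puts str   # prints: no,"contact_last_name","token"

# No backslash is actually stored; each \" in the literal became a single ".
has_backslash = str.include?("\\")

# '\\"' is the two-character text backslash + quote, which never occurs in str.
unchanged = str.gsub('\\"', '')

# To drop the quotes themselves, target the " character.
no_quotes = str.delete('"')   # "no,contact_last_name,token"
```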
And the actual problem is that the CSV file is malformed. The 6th value on the first row (sec-"property_address") doesn't follow the format of a correctly encoded CSV file.
It should read either sec-property_address or "sec-property_address"; i.e. the value should be either not enclosed in quotes at all or completely enclosed in quotes. Having it partially enclosed in quotes confuses Ruby's CSV parser.
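A minimal sketch of that difference, using a shortened, made-up header line rather than the full file: the partially quoted field is rejected with CSV::MalformedCSVError, while the fully quoted version parses cleanly. The exact error message may vary between versions of the csv library.

```ruby
require "csv"

bad  = 'no,"contact_last_name","token",sec-"property_address"'
good = 'no,"contact_last_name","token","sec-property_address"'

# A quote in the middle of an unquoted field is rejected by the parser.
error = begin
  CSV.parse_line(bad)
  nil
rescue CSV::MalformedCSVError => e
  e
end
puts error.message if error

# With the field fully quoted, the quotes are treated as delimiters and removed.
fields = CSV.parse_line(good)
p fields  # ["no", "contact_last_name", "token", "sec-property_address"]
```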
The string looks fine; you're not understanding what you're seeing. Meditate on this:
"no,\"contact_last_name\",\"token\"" # => "no,\"contact_last_name\",\"token\""
'no,"contact_last_name","token"' # => "no,\"contact_last_name\",\"token\""
%q[no,"contact_last_name","token"] # => "no,\"contact_last_name\",\"token\""
%Q#no,"contact_last_name","token"# # => "no,\"contact_last_name\",\"token\""
When looking at a string that is delimited by double-quotes, it's necessary to escape certain characters, such as embedded double-quotes. Ruby, along with many other languages, has multiple ways of defining a string to remove that need.

Why won't my simple regex pattern match and remove a file extension?

I have a string:
app_copy--28.ipa
The result I want is:
app_copy
The number after -- could be of variable length, so I want to match everything including and after --.
I've tried a few patterns, but none are matching for some reason:
gsub("--\*", "")
gsub("--*", "")
gsub("--*.ipa", "")
gsub("--\[0-9].ipa", "")
What am I missing?
Let's take a look at your test patterns:
"--\*" is actually equivalent to "--*" (since the \* is an escape sequence).
"--*" will match a single - character, followed by zero or more - characters.
"--*.ipa" will match a single - character, followed by zero or more - characters, followed by any single character, followed by a literal ipa.
"--\[0-9].ipa" is actually equivalent to "--[0-9].ipa" (since the \[ is an escape sequence), which will match a literal --, followed by a single decimal digit, followed by any single character, followed by a literal ipa.
However, none of these patterns would work as you used them, because gsub does not treat a String pattern as a regular expression:
The pattern is typically a Regexp; if given as a String, any regular expression metacharacters it contains will be interpreted literally…
You'd need to convert your pattern to a Regexp (using Regexp.new), or use a regular expression literal.
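A small sketch of the String versus Regexp difference on the file name from the question. It also shows that converting "--*" to a Regexp is not enough on its own, because /--*/ only matches the dashes.

```ruby
name = "app_copy--28.ipa"

# A String pattern is matched literally: gsub looks for the exact text "--*".
literal = name.gsub("--*", "")                  # nothing found, name is unchanged

# As a Regexp, * becomes a quantifier, but /--*/ means "-" followed by
# zero or more "-", so only the dashes are removed.
dashes_only = name.gsub(Regexp.new("--*"), "")  # "app_copy28.ipa"

# --.* means a literal "--", then any characters to the end.
fixed = name.gsub(Regexp.new("--.*"), "")       # "app_copy"
same  = name.sub(/--.*/, "")                    # regex literal, same result
```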
Try this pattern
--.*
This pattern will find any literal --, followed by zero or more of any character.
For example:
"app_copy--28.ipa".gsub(/--.*/, "") # app_copy
Don't use gsub to try to change the string; simply use a pattern to match the part you want:
"app_copy--28.ipa"[/^(.+?)--/, 1] # => "app_copy"
String's [] takes a lot of different types of parameters. You can pass in a pattern, and the index of the capture that you want, to extract just that part. From the documentation:
str[regexp, capture] → new_str or nil
If a Regexp is supplied, the matching portion of the string is returned. If a capture (a capture group index or name) follows the regular expression, that component of the MatchData is returned instead.
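A quick sketch of those variants of String#[]: a numeric capture index, a named capture (the group name base is just an illustrative choice), no index at all, and the nil returned when the pattern does not match.

```ruby
name = "app_copy--28.ipa"

by_index = name[/^(.+?)--/, 1]               # capture group 1 => "app_copy"
by_name  = name[/^(?<base>.+?)--/, "base"]   # named capture   => "app_copy"
whole    = name[/^(.+?)--/]                  # whole match     => "app_copy--"
missing  = "app_copy.ipa"[/^(.+?)--/, 1]     # no "--" at all  => nil
```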
How about this?
str = "app_copy--28.ipa"
str[0..str.index("-")-1]
# => "app_copy"
str = "app_copy--28.ipa"
str.split("--").first
# => "app_copy"
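Both snippets above work for the name in the question, but they differ once the base name itself contains a dash. A sketch with a hypothetical file name (my-app--28.ipa is made up for illustration):

```ruby
# Hypothetical file name with a dash inside the base name.
name = "my-app--28.ipa"

# index("-") finds the FIRST single dash, which is inside the base name here.
cut_at_dash = name[0..name.index("-") - 1]   # "my"

# Searching for the two-character separator avoids that.
cut_at_sep = name[0...name.index("--")]      # "my-app"

# split keeps everything before the first "--".
first_part = name.split("--").first          # "my-app"
```

Note that index returns nil when the separator is missing, so the index-based versions raise NoMethodError on such names, while split("--").first simply returns the whole string.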

using regular expressions in ruby to find a string in quotations

I am trying to construct a regex to find a quoted string in Ruby:
str = "foo"
I want to be able to stop trying to find the string after it finds the closing quotation mark. I also want to keep the quotation marks so I can output the string I found as:
puts "the string is:" + str
=> the string is: "foo"
I am pretty new to using regular expressions.
Here is a start:
/".*?"/
Explanation:
" Match a literal double quote.
.*? Match any characters, as few as possible (non-greedy)
" Match a second literal double quote.
Note that this won't work if the string contains escaped quotes or is quoted with single quotes.
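A sketch of the non-greedy pattern in Ruby, with a made-up sample containing two quoted strings, so the difference from the greedy /".*"/ is visible:

```ruby
text = 'say "foo" and "bar"'

lazy       = text[/".*?"/]      # stops at the first closing quote: "\"foo\""
greedy     = text[/".*"/]       # runs to the LAST quote: "\"foo\" and \"bar\""
all_quoted = text.scan(/".*?"/) # every quoted string: ["\"foo\"", "\"bar\""]

puts "the string is: " + lazy   # the string is: "foo"
```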

How to remove the first 4 characters from a string if it matches a pattern in Ruby

I have the following string:
"h3. My Title Goes Here"
I basically want to remove the first four characters from the string so that I just get back:
"My Title Goes Here".
The thing is I am iterating over an array of strings and not all have the h3. part in front so I can't just ditch the first four characters blindly.
I checked the docs and the closest thing I could find was chomp, but that only works for the end of a string.
Right now I am doing this:
"h3. My Title Goes Here".reverse.chomp(" .3h").reverse
This gives me my desired output, but there has to be a better way. I don't want to reverse a string twice for no reason. Is there another method that will work?
To alter the original string, use sub!, e.g.:
my_strings = [ "h3. My Title Goes Here", "No h3. at the start of this line" ]
my_strings.each { |s| s.sub!(/^h3\. /, '') }
To leave the original unaltered and only return the result, remove the exclamation point, i.e. use sub. In the general case you may have regular expressions that can (and should) match more than one instance; in that case use gsub! and gsub. Without the g, only the first match is replaced, which is what you want here. (In a single-line string the ^ can only match once, at the start; in Ruby, ^ matches at the start of every line, so use \A if the strings may contain newlines.)
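A sketch of the sub / sub! difference, plus the ^ versus \A point, using the strings from the question and one made-up multi-line string. The main gotcha with sub! is that it returns nil when nothing was replaced.

```ruby
titles = ["h3. My Title Goes Here", "No h3. at the start of this line"]

# sub returns new strings and leaves the array's elements alone.
cleaned = titles.map { |s| s.sub(/\Ah3\. /, "") }

# sub! edits in place and returns nil when nothing was replaced,
# so its return value is not safe to chain.
heading = +"h3. My Title Goes Here"   # unary + gives an unfrozen copy
plain   = +"No heading here"
r1 = heading.sub!(/\Ah3\. /, "")      # the modified heading itself
r2 = plain.sub!(/\Ah3\. /, "")        # nil

# In Ruby, ^ matches at the start of every line; \A only at the start of the string.
multi = "intro\nh3. Title"
line_anchor   = multi.sub(/^h3\. /, "")    # "intro\nTitle"
string_anchor = multi.sub(/\Ah3\. /, "")   # unchanged
```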
You can use sub with a regular expression:
s = 'h3. foo'
s.sub!(/^h[0-9]+\. /, '')
puts s
Output:
foo
The regular expression should be understood as follows:
^ Match at the start of a line (for a single-line string, the start of the string).
h A literal "h".
[0-9] A digit from 0-9.
+ One or more of the previous (i.e. one or more digits)
\. A literal period.
' ' A literal space (yes, spaces are significant by default in regular expressions!)
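Applied to a few made-up headings, the pattern handles multi-digit levels and leaves untouched anything that does not match exactly, including a heading with no space after the dot:

```ruby
lines = ["h1. Intro", "h12. Deep Section", "h3.No space", "Plain text"]

stripped = lines.map { |s| s.sub(/^h[0-9]+\. /, "") }
# => ["Intro", "Deep Section", "h3.No space", "Plain text"]
```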
You can modify the regular expression to suit your needs; a regular expression tutorial or syntax guide will help with that.
A standard approach would be to use regular expressions:
"h3. My Title Goes Here".gsub /^h3\. /, '' #=> "My Title Goes Here"
gsub means global substitution: it replaces every match of a pattern with a string, in this case an empty string.
The regular expression is enclosed in / and consists of:
^ means beginning of a line (for a single-line string, the beginning of the string)
h3 is matched literally, so it means h3
\. - a dot normally means any character so we escape it with a backslash
' ' (a single space) is matched literally
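For a fixed prefix like this, a regex is not strictly needed. On Ruby 2.5 or newer, String#delete_prefix is the start-of-string counterpart to the chomp mentioned in the question: it removes the prefix only if it is actually there. A sketch using the question's strings:

```ruby
title = "h3. My Title Goes Here"

a = title.delete_prefix("h3. ")                        # "My Title Goes Here"
b = "No h3. at the start".delete_prefix("h3. ")        # unchanged

# Equivalent explicit check, also without a regex.
c = title.start_with?("h3. ") ? title[4..-1] : title   # "My Title Goes Here"
```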
