Using regex to replace all spaces NOT in quotes in Ruby - ruby

I'm trying to write a regex to replace all spaces that are not included in quotes so something like this:
a = 4, b = 2, c = "space here"
would return this:
a=4,b=2,c="space here"
I spent some time searching this site and I found a similar q/a ( Split a string by spaces -- preserving quoted substrings -- in Python ) that would replace all the spaces inside quotes with a token that could be re-substituted in after wiping all the other spaces...but I was hoping there was a cleaner way of doing it.

It's worth noting that any regular expression solution will fail in cases like the following:
a = 4, b = 2, c = "space" here"
While it is true that you could construct a regexp to handle the three-quote case specifically, you cannot solve the problem in the general sense. This is a mathematically provable limitation of simple DFAs, of which regexps are a direct representation. To perform any serious brace/quote matching, you will need the more powerful pushdown automaton, usually in the form of a text parser library (ANTLR, Bison, Parsec).
With that said, it sounds like regular expressions should be sufficient for your needs. Just be aware of the limitations.

This seems to work:
result = string.gsub(/( |(".*?"))/, "\\2")

I consider this very clean:
mystring.scan(/((".*?")|([^ ]))/).map { |x| x[0] }.join
I doubt gsub could do any better (assuming you want a pure regex approach).

try this one, string in single/double quoter is also matched (so you need to filter them, if you only need space):
/( |("([^"\\]|\\.)*")|('([^'\\]|\\.)*'))/

Related

Methods to concatenate strings on separate lines

This produces newlines:
%(https://api.foursquare.com/v2/venues/search
?ll=80.914207,%2030.328466&radius=200
&v=20161201&m=foursquare&categoryId=4d4b7105d754a06374d81259
&intent=browse)
This produces spaces:
"https://api.foursquare.com/v2/venues/search
?ll=80.914207,%2030.328466&radius=200
&v=20161201&m=foursquare&categoryId=4d4b7105d754a06374d81259
&intent=browse"
This produces one string:
"https://api.foursquare.com/v2/venues/search"\
"?ll=80.914207,%2030.328466&radius=200"\
"&v=20161201&m=foursquare&categoryId=4d4b7105d754a06374d81259"\
"&intent=browse"
When I want to separate one string on multiple lines to read it better on screen, is it preferred to use the escape character?
My IDE complains that I should use single quoted strings rather than double quoted strings since there is no interpolation.
Normally you'd put something like this on one line, readability be damned, because the alternatives are going to be problematic. There's no way of declaring a string with whitespace ignored, but you can do this:
url = %w[ https://api.foursquare.com/v2/venues/search
?ll=80.914207,%2030.328466&radius=200
&v=20161201&m=foursquare&categoryId=4d4b7105d754a06374d81259
&intent=browse
].join
Where you explicitly remove the whitespace.
I'd actually suggest avoiding this whole mess by properly composing this URI:
uri = url("https://api.foursquare.com/v2/venues/search",
ll: [ 80.914207,30.328466 ],
radius: 200,
v: 20161201,
m: 'foursquare',
categoryId: '4d4b7105d754a06374d81259',
intent: 'browse'
)
Where you have some kind of helper function that properly encodes that using URI or other tools. By keeping your parameters as data, not as encoded strings, for as long as possible you make it easier to spot bugs as well as make last-second changes to them.
The answer by #tadman definitely suggests the proper way to do it; I’ll post another approach just for the sake of diversity:
query = "https://api.foursquare.com/v2/venues/search"
"?ll=80.914207,%2030.328466&radius=200"
"&v=20161201&m=foursquare&categoryId=4d4b7105d754a06374d81259"
"&intent=browse"
Yes, without any visible concatenation, 4 strings in quotes one by one in a row. This example won’t work in irb/pry (due to it’s REPL nature,) but the above is the most efficient way to concatenate strings in ruby without producing any intermediate result.
Contrived example to test in pry/irb:
value = "a" "b" "c" "d"

delete matched characters using regex in ruby

I need to write a regex for the following text:
"How can you restate your point (something like: \"<font>First</font>\") as a clear topic?"
that keeps whatever is between the
\" \"
characters (in this case <font>First</font>
I came up with this:
/"How can you restate your point \(something like: |\) as a clear topic\?"/
but how do I get ruby to remove the unwanted surrounding text and only return <font>First</font>?
lookbehind, lookahead and making what is greedy, lazy.
str[/(?<=\").+?(?=\")/] #=> "<font>First</font>"
If you have strings just like that, you can .split and get the first:
> str.split(/"/)[1]
=> "<font>First</font>"
You certainly can use a regular expression, but you don't need to:
str = "How can you restate (like: \"<font>First</font>\") as a clear topic?"
str[str.index('"')+1...str.rindex('"')]
#=> "<font>First</font>"
or, for those like me who never use three dots:
str[str.index('"')+1..str.rindex('"')-1]

Ruby Regular Expression lookahead to Split at pipe unless contained in brackets

I'm trying to decode the following string:
body = '{type:paragaph|class:red|content:[class:intro|body:This is the introduction paragraph.][body:This is the second paragraph.]}'
body << '{type:image|class:grid|content:[id:1|title:image1][id:2|title:image2][id:3|title:image3]}'
I need the string to split at the pipes but not where a pipe is contained with square brackets, to do this I think I need to perform a lookahead as described here: How to split string by ',' unless ',' is within brackets using Regex?
My attempt(still splits at every pipe):
x = self.body.scan(/\{(.*?)\}/).map {|m| m[0].split(/ *\|(?!\]) */)}
->
[
["type:paragaph", "class:red", "content:[class:intro", "body:This is the introduction paragraph.][body:This is the second paragraph.]"]
["type:image", "class:grid", "content:[id:1", "title:image1][id:2", "title:image2][id:3", "title:image3]"]
]
Expecting:
->
[
["type:paragaph", "class:red", "content:[class:intro|body:This is the introduction paragraph.][body:This is the second paragraph.]"]
["type:image", "class:grid", "content:[id:1|title:image1][id:2|title:image2][id:3|title:image3]"]
]
Does anyone know the regex required here?
Is it possible to match this regex? I can't seem to modify it correctly Regular Expression to match underscores not surrounded by brackets?
I modified the answer here Split string in Ruby, ignoring contents of parentheses? to get:
self.body.scan(/\{(.*?)\}/).map {|m| m[0].split(/\|\s*(?=[^\[\]]*(?:\[|$))/)}
Seems to do the trick. Though I'm sure if there's any shortfalls.
Dealing with nested structures that have identical syntax is going to make things difficult for you.
You could try a recursive descent parser (a quick Google turned up https://github.com/Ragmaanir/grammy - not sure if any good)
Personally, I'd go for something really hacky - some gsubs that convert your string into JSON, then parse with a JSON parser :-). That's not particularly easy either, though, but here goes:
require 'json'
b1 = body.gsub(/([^\[\|\]\:\}\{]+)/,'"\1"').gsub(':[',':[{').gsub('][','},{').gsub(']','}]').gsub('}{','},{').gsub('|',',')
JSON.parse('[' + b1 + ']')
It wasn't easy because the string format apparently uses [foo:bar][baz:bam] to represent an array of hashes. If you have a chance to modify the serialised format to make it easier, I would take it.
I modified the answer here Split string in Ruby, ignoring contents of parentheses? to get:
self.body.scan(/\{(.*?)\}/).map {|m| m[0].split(/\|\s*(?=[^\[\]]*(?:\[|$))/)}
Seems to do the trick. If it has any shortfalls please suggest something better.

Regex can this be achieved

I'm too ambitious or is there a way do this
to add a string if not present ?
and
remove a the same string if present?
Do all of this using Regex and avoid the if else statement
Here an example
I have string
"admin,artist,location_manager,event_manager"
so can the substring location_manager be added or removed with regards to above conditions
basically I'm looking to avoid the if else statement and do all of this plainly in regex
"admin,artist,location_manager,event_manager".test(/some_regex/)
The some_regex will remove location_manager from the string if present else it will add it
Am I over over ambitions
You will need to use some sort of logic.
str += ',location_manager' unless str.gsub!(/location_manager,/,'')
I'm assuming that if it's not present you append it to the end of the string
Regex will not actually add or remove anything in any language that I am aware of. It is simply used to match. You must use some other language construct (a regex based replacement function for example) to achieve this functionality. It would probably help to mention your specific language so as to get help from those users.
Here's one kinda off-the-wall solution. It doesn't use regexes, but it also doesn't use any if/else statements either. It's more academic than production-worthy.
Assumptions: Your string is a comma-separated list of titles, and that these are a unique set (no duplicates), and that order doesn't matter:
titles = Set.new(str.split(','))
#=> #<Set: {"admin", "artist", "location_manager", "event_manager"}>
titles_to_toggle = ["location_manager"]
#=> ["location_manager"]
titles ^= titles_to_toggle
#=> #<Set: {"admin", "artist", "event_manager"}>
titles ^= titles_to_toggle
#=> #<Set: {"location_manager", "admin", "artist", "event_manager"}>
titles.to_a.join(",")
#=> "location_manager,admin,artist,event_manager"
All this assumes that you're using a string as a kind of set. If so, you should probably just use a set. If not, and you actually need string-manipulation functions to operate on it, there's probably no way around except for using if-else, or a variant, such as the ternary operator, or unless, or Bergi's answer
Also worth noting regarding regex as a solution: Make sure you consider the edge cases. If 'location_manager' is in the middle of the string, will you remove the extraneous comma? Will you handle removing commas correctly if it's at the beginning or the end of the string? Will you correctly add commas when it's added? For these reasons treating a set as a set or array instead of a string makes more sense.
No. Regex can only match/test whether "a string" is present (or not). Then, the function you've used can do something based on that result, for example replace can remove a match.
Yet, you want to do two actions (each can be done with regex), remove if present and add if not. You can't execute them sequentially, because they overlap - you need to execute either the one or the other. This is where if-else structures (or ternary operators) come into play, and they are required if there is no library/native function that contains them to do exactly this job. I doubt there is one in Ruby.
If you want to avoid the if-else-statement (for one-liners or expressions), you can use the ternary operator. Or, you can use a labda expression returning the correct value:
# kind of pseudo code
string.replace(/location,?|$/, function($0) return $0 ? "" : ",location" )
This matches the string "location" (with optional comma) or the string end, and replaces that with nothing if a match was found or the string ",location" otherwise. I'm sure you can adapt this to Ruby.
to remove something matching a pattern is really easy:
(admin,?|artist,?|location_manager,?|event_manager,?)
then choose the string to replace the match -in your case an empty string- and pass everything to the replace method.
The other operation you suggested was more difficult to achieve with regex only. Maybe someone knows a better answer

Ruby gsub : is there a better way

I need to remove all leading and trailing non-numeric characters. This is what I came up with. Is there a better implementation.
puts s.gsub(/^\D+/,'').gsub(/\D+$/,'')
Instead of eliminating what you don't want, it's often clearer to select what you do want (using parentheses). Also, this only requires one regex evaluation:
s.match(/^\D*(.*?)\D*$/)[1]
Or, this convenient shorthand:
s[/^\D*(.*?)\D*$/, 1]
Perhaps a single #gsub(/(^\D+)|(\D+$)/, '')
Also, when in doubt Rubular it.

Resources