Ruby regex section to match multiline - ruby

So this is my code
convert = contents.gsub(/\\s1(.*?)(\n\\r.*?)?\n((?s)\\ms3(.*?)\\p)/, 'replacement code')
In the first bit, \\s1(.*?)(\n\\r.*?)?\n, I only want it to match a newline where I explicitly put one. But when searching for \\ms3(.*?)\\p I want it to pick up any newlines that are there. Unfortunately it looks like Ruby doesn't support this (?s) prefix. Is there any way of doing this?
thanks

(.*?) ==> ([\s\S]*?)
You can use this instead of the DOTALL modifier: [\s\S] matches any character, including newlines.
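Ruby's own name for DOTALL is the m flag, and it can be scoped to part of a pattern with (?m:...). A minimal sketch of that approach, assuming made-up sample input (the sample string and variable names are hypothetical):

```ruby
# Hypothetical sample: an \s1 heading line, then an \ms3 ... \p block spanning lines.
contents = "\\s1 Title\n\\ms3 line one\nline two\\p"

# Outside the group, . stops at newlines; inside (?m:...), . also matches "\n".
pattern = /\\s1(.*?)\n(?m:\\ms3(.*?)\\p)/

m = contents.match(pattern)
heading = m[1]
body = m[2]
```

Here only the second capture is allowed to run across line breaks; the first still ends at the first newline.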

convert = contents.gsub(/\\s1(.*?)(\n\\r.*?)?\n((\n*)\\ms3(.*?)\\p)/, 'replacement code')
This will capture any (0+) newlines before "\ms3". If that's not what you meant, please clarify what functionality you expect from (?s).

Related

delete matched characters using regex in ruby

I need to write a regex for the following text:
"How can you restate your point (something like: \"<font>First</font>\") as a clear topic?"
that keeps whatever is between the
\" \"
characters (in this case <font>First</font>).
I came up with this:
/"How can you restate your point \(something like: |\) as a clear topic\?"/
but how do I get ruby to remove the unwanted surrounding text and only return <font>First</font>?
Use lookbehind and lookahead, and make the greedy quantifier lazy:
str[/(?<=\").+?(?=\")/] #=> "<font>First</font>"
If you have strings just like that, you can .split and get the first:
> str.split(/"/)[1]
=> "<font>First</font>"
You certainly can use a regular expression, but you don't need to:
str = "How can you restate (like: \"<font>First</font>\") as a clear topic?"
str[str.index('"')+1...str.rindex('"')]
#=> "<font>First</font>"
or, for those like me who never use three dots:
str[str.index('"')+1..str.rindex('"')-1]
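Another option, sketched here as an alternative to the lookarounds above, is to capture the quoted part and ask String#[] for that group directly:

```ruby
str = "How can you restate your point (something like: \"<font>First</font>\") as a clear topic?"

# String#[] with a regex and a capture index returns just that group.
inner = str[/"(.+?)"/, 1]
```

This returns nil rather than raising if the string contains no quoted part.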

Changing "word" to "Word" using a RegEx like [A-Z]([a-z]*)\b

The title sums up my conundrum pretty well. I've been searching around the net for a while, and being new to Ruby and Regular Expressions as a whole, I'm stuck trying to figure out how to alter the case of a single word string using a RegEx "filter" such as [A-Z]([a-z]*)\b.
Basically I want the flow to be
input: woRD
filter: [A-Z]([a-z]*)\b
output: Word
I already have the words filtered into a list, so I don't need to match words; I only need to filter the case of the word using a RegEx filter.
I do not want to use standard capitalization methods, I want this to be done using Regular Expressions.
You can use
"woRD".downcase.capitalize
Ruby provides predefined methods for this kind of functionality. Try to use them instead of a regex; it saves coding time!
Well, for some reason you want to use regexps. Here you go:
# prepare hashes for gsub
to_down = (to_upper = Hash[('a'..'z').zip('A'..'Z')]).invert
# convert to downcase
downcased = 'woRD'.gsub(/[A-Z]/, to_down)
# ⇒ 'word'
titlecased = downcased.gsub(/^\w/, to_upper)
# ⇒ 'Word'
Hope it helps. Note the usage of String#gsub(re, hash) method.
You can't do the kind of altering you want with a regex alone.
Please read this topic carefully: How to change case of letters in string using regex in Ruby.
The best way to solve your problem is to use:
"woRD".downcase.capitalize
or
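name_of_your_variable.capitalize!
if you want to alter the string in your variable in place, without assigning it to another variable. Don't chain downcase!.capitalize!: bang methods return nil when they make no change, so the chain can raise NoMethodError.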
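If a regex really has to be involved, one option is to let the regex pick out the pieces and do the case change in a gsub block. This is only a sketch; the helper name titlecase is made up for illustration:

```ruby
# Hypothetical helper: the regex splits the word into its first letter and the
# rest, and the block applies the case change to each captured piece.
def titlecase(word)
  word.gsub(/\A([[:alpha:]])([[:alpha:]]*)\z/) { "#{$1.upcase}#{$2.downcase}" }
end

result = titlecase('woRD')
```

Strings that are not a single alphabetic word are returned unchanged, because the anchored pattern does not match them.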

Ruby how to remove a more than one space character?

Ruby
Okay, I want to remove runs of more than one space character in a string, if there are any. What I mean is, let's say I have a text like this:
I want to     learn ruby more and more.
See, there's more than one space character after "to" and before "learn", either a tab or just several spaces. What I want to know is how I can detect something like this in a text file and reduce it to just one space between words. So it will become like this:
I want to learn ruby more and more.
Can I use gsub, or do I need to use another method? I tried gsub, but can't figure out how to implement it the right way so it produces the result I want. Hopefully I explained it clearly. Any help is appreciated, thanks.
String#squeeze collapses runs of a repeated character into a single character:
'I want to     learn ruby more and more.'.squeeze(' ')
# => "I want to learn ruby more and more."
You can use gsub to replace one or more spaces (regex / +/) with a single space:
'I want to     learn ruby more and more.'.gsub(/ +/, " ")
#=> "I want to learn ruby more and more."
Use this regex to remove all whitespace from a string, including spaces and also tabs. I use this for stripping whitespace from email addresses on login fields.
' I want to learn ruby more and more.'.gsub(/\s/,"")
# => "Iwanttolearnrubymoreandmore."
The /\s/ matches any whitespace character including tabs, whereas / +/ won't.
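If the runs can mix tabs and spaces, as the question suggests, a character class covers both while leaving newlines alone. A small sketch with a made-up sample line:

```ruby
line = "I want to \t  learn ruby more   and more."

# [ \t]+ matches any run of spaces and/or tabs; each run becomes one space.
cleaned = line.gsub(/[ \t]+/, ' ')
```

Using /\s+/ instead would also collapse line breaks, which is usually not wanted when processing a whole text file.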

How can I simplify this regular expression?

The format I'm trying to match is:
# (Apple push notification codes)
"11a735e9 9f696c2f 700b2700 728042c6 137eeb7a 8442c27d 40e59d9e 3c7e0de7"
The simplest expression I can think of is: /((\w{8}\s){7}\w{8})/i
Can anyone think of a simpler one?
(I'm using Ruby regular expressions)
UPDATE - thanks to user1096188, I've removed \d - this is included in \w
You can detect a word boundary using \b, and use (?:...) to avoid creating capturing groups:
/(?:\w{8}\b\s?){8}/
You could do this if the end of the match is the end of the whole string.
(\w{8}(?:\s|$)){8}
Taking @zapthedingbat's solution one stage further, it looks like the code only contains hexadecimal characters (0-9 and a-f) and spaces. So you could possibly sacrifice a little simplicity for accuracy.
I'm making an assumption, but I suspect letters g to z are invalid.
If the format is hexadecimal only (you should check Apple's documentation to be sure), a tighter match would be:
/(?:[0-9a-f]{8}\b\s?){8}/
EDIT
In fact, in Ruby, it looks like you should be able to do:
/(?:\h{8}\b\s?){8}/
> "11a735e9 9f696c2f 700b2700 728042c6 137eeb7a 8442c27d 40e59d9e 3c7e0de7".match(/((\w{8}\s?)+)/)
> $&
=> "11a735e9 9f696c2f 700b2700 728042c6 137eeb7a 8442c27d 40e59d9e 3c7e0de7"
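For validation rather than searching, a stricter variant anchors the pattern to the whole string and spells out the seven separators. This sketch assumes a token is exactly eight groups of eight hex digits separated by single spaces, which should be checked against Apple's documentation; the variable names are made up:

```ruby
# Assumption: 8 groups of 8 hex digits, separated by single spaces.
token_format = /\A\h{8}(?: \h{8}){7}\z/

token = "11a735e9 9f696c2f 700b2700 728042c6 137eeb7a 8442c27d 40e59d9e 3c7e0de7"
valid = token.match?(token_format)
invalid = "11a735e9 zzzzzzzz".match?(token_format)
```

The \A and \z anchors reject strings with extra text before or after the token, which the unanchored versions above would accept.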

regex to match trailing whitespace, but not lines which are entirely whitespace (indent placeholders)

I've been trying to construct a ruby regex which matches trailing spaces - but not indentation placeholders - so I can gsub them out.
I had this /\b[\t ]+$/ and it was working a treat until I realised it only works when the line ends in [a-zA-Z]. :-( So I evolved it into this /(?!^[\t ]+)[\t ]+$/ and it seems like it's getting better, but it still doesn't work properly. I've spent hours trying to get this to work to no avail. Please help.
Here's some test text so it's easy to throw into Rubular, but the indent lines are getting stripped, so it'll need a few spaces and/or tabs added back. Once lines 3 & 4 have spaces back in, it shouldn't match on lines 3-5, 7, 9.
some test test
some test test
some other test (text)
some other test (text)
likely here{ dfdf }
likely here{ dfdf }
and this ;
and this ;
Alternatively, is there a simpler / more elegant way to do this?
If you're using 1.9, you can use look-behind:
/(?<=\S)[\t ]+$/
but unfortunately, it's not supported in older versions of ruby, so you'll have to handle the captured character:
str.gsub(/(\S)[\t ]+$/) { $1 }
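As a quick check of the look-behind version, here is a small sketch with made-up sample text containing a whitespace-only line:

```ruby
text = "some test  \n    \nand this ;\t\n"

# (?<=\S) requires a non-whitespace character right before the run,
# so lines made only of spaces or tabs are left untouched.
cleaned = text.gsub(/(?<=\S)[\t ]+$/, '')
```

The second line keeps its four spaces because the character before them is a newline, which \S does not match.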
Your first expression is close, and you just need to change the \b to a negated character class. This should work better:
/([^\t ])[\t ]+$/
In plain words, this matches all tabs and spaces on lines that follow a character that is not a tab or a space.
Wouldn't this help?
/([^\t ])([\t ]+)$/
You need to do something with the matched last non-space character, though.
edit: oh, you meant non blank lines. Then you would need something like /([^\s])\s+/ and sub them with the first part
I'm not entirely sure what you are asking for, but wouldn't something like this work if you just want to capture the trailing whitespaces?
([\s]+)$
or if you only wanted to capture tabs
([ \t]+)$
Since quantifiers are greedy by default, they'll capture as much as they can. You don't really need to give them context beforehand if you know what you want to capture.
I still am not sure what you mean by trailing indentation placeholders, so I'm sorry if I'm misunderstanding.
perhaps this...
[\t|\s]+?$
or
[ ]+$

Resources