Regex for extracting text after a specific word between quotes - ruby

I have some strings that have a format like this:
"text that comes before\"start\":\"Desired Info\"text that comes after"
I'd like to extract only "Desired Info". It will always be preceded by "\"start\":" and this will only appear once in the string. What regex can I use to do this?

This should work:
s = "text that comes before\"start\":\"Desired Info\"text that comes after"
s[/(?<="start":")[^"]*(?=")/]
# => "Desired Info"

Here is the regular expression:
"start":"(.*)"
In code:
match = /"start":"(.*)"/.match("text that comes before\"start\":\"Desired Info\"text that comes after")
if match
  print match[1]
end

Simplest is:
s[/"start":"(.*?)"/, 1]
#=> "Desired Info"
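The answers above can be wrapped in a small helper. This is only a sketch: the method name value_after_start is made up for illustration, and it assumes the value itself never contains a double quote.

```ruby
# Hypothetical helper: returns the quoted value that follows "start":,
# or nil when the marker is not present.
def value_after_start(text)
  # [^"]* stops at the first closing quote, so later quotes are ignored.
  text[/"start":"([^"]*)"/, 1]
end

s = "text that comes before\"start\":\"Desired Info\"text that comes after"
puts value_after_start(s)          # prints Desired Info
p value_after_start("no marker")   # prints nil
```

Using [^"]* (or the lazy .*?) matters once the text after the value contains more quotes; a greedy .* would run on to the last quote in the string.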

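(?:\\"start\\":\\")(.+)(?:\\")
This puts "Desired Info" into the only capturing group; the (?:...) groups are non-capturing. Note that \\" matches a literal backslash followed by a quote, so this pattern only applies when the backslashes are actually present in the data.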

Related

How to delete a substring after a certain word?

I have a long String and want to delete the part of the String that comes after a word and I'm looking for the gsub! command that does that. I would appreciate it if you could provide it.
For reference:
I know that the command to delete the part of the String (the String is called contents) that comes before the word "body" is:
contents.gsub!(/.*?(?=body)/im, "")
Thanks.
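This code:
"this has a word in it".gsub(/(word).*/, '\1')
returns the string "this has a word".
The parentheses around "word" create the first capture group, and '\1' in the replacement string refers to that group. (Passing $1 as the replacement does not work here, because it is evaluated before gsub runs.)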
See the Ruby docs for gsub
Going by your regex, which requires the / in </body> to be escaped, I'm assuming you mean everything after </body>:
contents = "Stuff before </body> stuff after"
contents.gsub(/(?<=\/body>).+/, "")
=> "Stuff before </body>"
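If the content spans several lines, the lookbehind version needs the /m flag so that . also matches newlines. A rough sketch of both a regex-free and a regex variant follows; truncate_after is a hypothetical name, not something from the question.

```ruby
# Hypothetical helper: keep everything up to and including the first
# occurrence of `word`, and drop the rest. Returns the text unchanged
# when `word` does not occur.
def truncate_after(text, word)
  head, match, _tail = text.partition(word)
  head + match
end

contents = "Stuff before </body>\nstuff after\nmore stuff"
puts truncate_after(contents, "</body>")    # prints Stuff before </body>

# Regex equivalent; /m lets . match newlines as well.
puts contents.sub(/(?<=<\/body>).*/m, "")   # prints Stuff before </body>
```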

Regex for validating constant field with numbers

I am new to Ruby. I am trying to write a regex pattern that matches my input. My requirement is that my input should strictly adhere to the following format
CHECK ID#<number>
(Eg. my input should be CHECK ID#3213)
How do I frame the pattern for this?
If you want to extract the ID number use this
"CHECK ID#123".scan(/CHECK ID#(\d+)/).last.first.to_i # => 123
Because you need just one result, there is no need to use .scan or .match:
"CHECK ID#123"[/CHECK ID#(\d+)/, 1].to_i
How about this:
match = "CHECK ID#1221".match(/^CHECK ID#(\d+)$/)
puts match[1] if match
=> 1221
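One caveat for strict validation: in Ruby, ^ and $ match at line boundaries, so a multi-line input can slip through. A sketch using \A and \z instead (check_id is a hypothetical helper name):

```ruby
# Hypothetical helper: returns the ID as an Integer, or nil when the input
# is not exactly of the form "CHECK ID#<number>".
def check_id(input)
  # \A and \z anchor to the start and end of the whole string.
  m = /\ACHECK ID#(\d+)\z/.match(input)
  m && m[1].to_i
end

p check_id("CHECK ID#3213")          # prints 3213
p check_id("CHECK ID#3213\nextra")   # prints nil
p check_id("check id#1")             # prints nil
```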

Deleting all special characters from a string - ruby

I was doing the challenges from pythonchallenge writing code in ruby, specifically this one. It contains a really long string in page source with special characters. I was trying to find a way to delete them/check for the alphabetical chars.
I tried using scan method, but I think I might not use it properly. I also tried delete! like that:
a = "PAGE SOURCE CODE PASTED HERE"
a.delete! "!", "#" #and so on with special chars, does not work(?)
a
How can I do that?
Thanks
You can do this
a.gsub!(/[^0-9A-Za-z]/, '')
Try gsub:
a.gsub!(/[!#%&"]/,'')
You can test the regexp on rubular.com.
If you want something more general, you can list the valid characters and remove everything that is not among them:
a.gsub!(/[^abcdefghijklmnopqrstuvwxyz ]/,'')
When you give multiple arguments to String#delete, it's the intersection of those arguments that is deleted. a.delete! "!", "#" deletes the intersection of the sets ! and #, which is empty, so nothing is deleted and the method returns nil.
What you wanted to do is a.delete! "!#" with the characters to delete passed as a single string.
Since the challenge is asking to clean up the mess and find a message in it, I would go with a whitelist instead of deleting special characters. The delete method accepts ranges with - and negations with ^ (similar to a regex) so you can do something like this: a.delete! "^A-Za-z ".
You could also use regular expressions as shown by @arieljuod.
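To make the intersection behaviour concrete, here is a short sketch with a made-up sample string:

```ruby
a = 'He!!o, W#rld!'

# Two arguments: only characters present in BOTH sets are deleted.
# The sets "!" and "#" share nothing, so the string is unchanged.
p a.delete("!", "#")        # prints "He!!o, W#rld!"

# The bang version returns nil when it changes nothing.
p a.dup.delete!("!", "#")   # prints nil

# One argument listing all unwanted characters.
p a.delete("!#")            # prints "Heo, Wrld"

# Whitelist: ^ negates the set, - defines ranges.
p a.delete("^A-Za-z ")      # prints "Heo Wrld"
```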
gsub is one of the most used Ruby methods in the wild.
specialname = 'Hello!#$#'
cleanedname = specialname.gsub(/[^a-zA-Z0-9\-]/,"")
I think a.gsub(/[^A-Za-z0-9 ]/, '') works better in this case. Otherwise, if you have a sentence, which typically should start with a capital letter, you will lose your capital letter. You would also lose any 1337 speak, or other possible crypts within the text.
Case in point:
phrase = "Joe can't tell between 'large' and large."
=> "Joe can't tell between 'large' and large."
phrase.gsub(/[^a-z ]/, '')
=> "oe cant tell between large and large"
phrase.gsub(/[^A-Za-z0-9 ]/, '')
=> "Joe cant tell between large and large"
phrase2 = "W3 a11 f10a7 d0wn h3r3!"
phrase2.gsub(/[^a-z ]/, '')
=> " a fa dwn hr"
phrase2.gsub(/[^A-Za-z0-9 ]/, '')
=> "W3 a11 f10a7 d0wn h3r3"
If you don't want to change the original string (for example, to solve the challenge), iterate over its characters and print only the letters:
str.each_char do |letter|
  if letter =~ /[a-z]/
    p letter
  end
end
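The question also mentions scan; it can collect the letters into a new string without touching the original. A small sketch with a made-up stand-in for the page source:

```ruby
# Stand-in for the long page source (sample data, not the real challenge text).
source = "a%b!!c\n[D]&e"

# scan returns every match as an array element; join glues them together.
letters = source.scan(/[A-Za-z]/).join

puts letters   # prints abcDe
```

The original string in source is left as it was, since scan does not modify its receiver.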
You will have to write your own string sanitizing function; a regex with the gsub method makes this easy.
Atomic sample:
your_text.gsub!(/[!#\[;\]^%*\(\);\-_\/&\\|$\{#\}<>:`~"]/,'')
API sample:
Route: post 'api/sanitize_text', to: 'api#sanitize_text'
Controller:
def sanitize_text
  return render_bad_request unless params[:text].present?
  # gsub rather than gsub!, which returns nil when nothing is replaced
  sanitized_text = params[:text].gsub(/[!#\[;\]^%*\(\);\-_\/&\\|$\{#\}<>:`~"]/,'')
  render_response({ safe_text: sanitized_text })
end
Then you call it
POST /api/sanitize_text?text=abcdefghijklmnopqrstuvwxyz123456<>$!#%23^%26*[]:;{}()`,.~'"\|/

Ruby regex: split string with match beginning with either a newline or the start of the string?

Here's my regular expression that I have for this. I'm in Ruby, which — if I'm not mistaken — uses POSIX regular expressions.
regex = /(?:\n^)(\*[\w+ ?]+\*)\n/
Here's my goal: I want to split a string with a regex that is *delimited by asterisks*, including those asterisks. However: I only want to split by the match if it is prefaced with a newline character (\n), or it's the start of the whole string. This is the string I'm working with.
"*Friday*\nDo not *break here*\n*But break here*\nBut again, not this"
My regular expression is not splitting properly at the *Friday* match, but it is splitting at the *But break here* match (it's also throwing in a here split). My issue is somewhere in the first group, I think: (?:\n^) — I know it's wrong, and I'm not entirely sure of the correct way to write it. Can someone shed some light? Here's my complete code.
regex = /(?:\n^)(\*[\w+ ?]+\*)\n/
str = "*Friday*\nDo not *break here*\n*But break here*\nBut again, not this"
str.split(regex)
Which results in this:
>>> ["*Friday*\nDo not *break here*", "*But break here*", "But again, not this"]
I want it to be this:
>>> ["*Friday*", "Do not *break here*", "*But break here*", "But again, not this"]
Edit #1: I've updated my regex and result. (2011/10/18 16:26 CST)
Edit #2: I've updated both again. (16:32 CST)
What if you just add a '\n' to the front of the string? That simplifies the processing quite a bit:
regex = /(?:\n)(\*[\w+ ?]+\*)\n/
str = "*Friday*\nDo not *break here*\n*But break here*\nBut again, not this"
res = ("\n"+str).split(regex)
res.shift if res[0] == ""
res
=> [ "*Friday*", "Do not *break here*",
"*But break here*", "But again, not this"]
We have to watch for the initial extra match but it's not too bad. I suspect someone can shorten this a bit.
Groups 1 and 2 of the regex below:
(?:\A|\\n)(\*.*?\*)|(?:\A|\\n)(.*?)(?=\\n|\Z)
will give you your desired output. I am no Ruby expert, so you will have to create the list yourself :)
Why not just split at newlines? From your example, it looks like that's what you're really trying to do.
str.split("\n")
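If the text between the starred lines can itself span several lines, splitting at every newline is no longer equivalent. A sketch that accepts either the start of the string (\A) or a newline before the starred line, so no "\n" needs to be prepended; split_on_headers is a hypothetical name, and the character class is simplified to [\w ].

```ruby
# Hypothetical helper: split on *starred* lines that begin at the start of
# the string or right after a newline, keeping the starred text itself.
def split_on_headers(text)
  # The capture group makes split include the delimiter in the result;
  # a match at position 0 leaves an empty first element, which is dropped.
  text.split(/(?:\A|\n)(\*[\w ]+\*)\n/).reject(&:empty?)
end

str = "*Friday*\nDo not *break here*\n*But break here*\nBut again, not this"
p split_on_headers(str)
```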

Ruby regex: "capture string unless it is followed by..."

My regex captures quoted phrases:
"([^"]*)"
I want to improve it, by ignoring quotes, which are followed by ', -' (a comma, a space and a dash in this particular order).
How do I do this?
The test: http://rubular.com/r/xls6vN1w92
This should do it, using a Negative Lookahead:
"(?!, -)([^"]*)"(?!, -)
A little icky, but it works. You want to make sure either quote isn't followed by your string, or else the match will start at the closing quotes.
http://rubular.com/r/yFMyUKJOHL
Regex
"(.*?)"(?!, -)
Working Example
http://rubular.com/r/9kOmZLxLfy
This is unparsable in your context; it's open-ended. The only way to parse it is to consume the unwanted phrases as well as the wanted ones, but it's still an invalid premise.
/"([^"]*?)"(?!, -)|"[^"]*?"(?=, -)/
Then check for capture group 1 on each match, something like this (in Perl):
$rx = qr/"([^"]*?)"(?!, -)|"[^"]*?"(?=, -)/;
while (' "ignore me", - "but not me" ' =~ /$rx/g) {
    print "'$1'\n" if defined $1
}
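Since the question is about Ruby, here is a rough Ruby equivalent of the Perl loop above. It uses the same regex; the sample text is made up for illustration.

```ruby
# The second alternative consumes the phrases we want to skip, so the scan
# can never restart at one of their closing quotes.
rx = /"([^"]*?)"(?!, -)|"[^"]*?"(?=, -)/

text = '"skip me", - "keep me" and "also keep"'

# scan yields one array per match; group 1 is nil for skipped phrases.
kept = text.scan(rx).flatten.compact

p kept   # prints ["keep me", "also keep"]
```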
Add (?!...) at the end of the regex:
"([^"\n]*)"(?!, -)

Resources