Validate that first char has to be integer or letter - ruby

I have this method where I "spot" user input I want to prevent.
def has_forbidden_prefix?(string)
string =~ %r{^(http://|www.)}
end
If has_forbidden_prefix? is true than I don't want to accept the input.
For example:
Allowed: google.com
Not allowed: www.google.com, http://google.com, http://www.google.com
Now I want to detect also any beginning special characters in my method.
Not allowed: .google.com, /google.com, ...
What do I have to include in my regex?

The regex to see if the first character is an alphanumeric or number is:
^[a-zA-Z0-9]
where ^ stands for the start of the string the regex pattern is applied to. For more info refer to http://www.regular-expressions.info/reference.html

That regex matches what you didn't want from the previous question
So expanding you could have
def has_forbidden_prefix?(string)
disallowed_string = string =~ %r{\A(http://|www)}
non_alphanumeric = string =~ /\A[^a-zA-Z0-9]/
disallowed_string || non_alphanumeric
end

Related

How to print an escape character in Ruby?

I have a string containing an escape character:
word = "x\nz"
and I would like to print it as x\nz.
However, puts word gives me:
x
z
How do I get puts word to output x\nz instead of creating a new line?
Use String#inspect
puts word.inspect #=> "x\nz"
Or just p
p word #=> "x\nz"
I have a string containing an escape character:
No, you don't. You have a string containing a newline.
How do I get puts word to output x\nz instead of creating a new line?
The easiest way would be to just create the string in the format you want in the first place:
word = 'x\nz'
# or
word = "x\\nz"
If that isn't possible, you can translate the string the way you want:
word = word.gsub("\n", '\n')
# or
word.gsub!("\n", '\n')
You may be tempted to do something like
puts word.inspect
# or
p word
Don't do that! #inspect is not guaranteed to have any particular format. The only requirement it has, is that it should return a human-readable string representation that is suitable for debugging. You should never rely on the content of #inspect, the only thing you should rely on, is that it is human readable.

How to remove strings that end with a particular character in Ruby

Based on "How to Delete Strings that Start with Certain Characters in Ruby", I know that the way to remove a string that starts with the character "#" is:
email = email.gsub( /(?:\s|^)#.*/ , "") #removes strings that start with "#"
I want to also remove strings that end in ".". Inspired by "Difference between \A \z and ^ $ in Ruby regular expressions" I came up with:
email = email.gsub( /(?:\s|$).*\./ , "")
Basically I used gsub to remove the dollar sign for the carrot and reversed the order of the part after the closing parentheses (making sure to escape the period). However, it is not doing the trick.
An example I'd like to match and remove is:
"a8&23q2aas."
You were so close.
email = email.gsub( /.*\.\s*$/ , "")
The difference lies in the fact that you didn't consider the relationship between string of reference and the regex tokens that describe the condition you wish to trigger. Here, you are trying to find a period (\.) which is followed only by whitespace (\s) or the end of the line ($). I would read the regex above as "Any characters of any length followed by a period, followed by any amount of whitespace, followed by the end of the line."
As commenters pointed out, though, there's a simpler way: String#end_with?.
I'd use:
words = %w[#a day in the life.]
# => ["#a", "day", "in", "the", "life."]
words.reject { |w| w.start_with?('#') || w.end_with?('.') }
# => ["day", "in", "the"]
Using a regex is overkill for this if you're only concerned with the starting or ending character, and, in fact, regular expressions will slow your code in comparison with using the built-in methods.
I would really like to stick to using gsub....
gsub is the wrong way to remove an element from an array. It could be used to turn the string into an empty string, but that won't remove that element from the array.
def replace_suffix(str,suffix)
str.end_with?(suffix)? str[0, str.length - suffix.length] : str
end

Regex data validation for input containing only the digits

I'm trying to prompt the user to input a data value, and check that the string they enter only contains the digits 0-9:
class String
def validate(regex)
!self[regex].nil?
end
end
regex = /\A[0-9]\z/
validInput = false
until validInput
puts "Enter digits: "
input = gets.chomp
if !input.validate(regex)
puts "Invalid input. Seeking integer."
else
validInput = true
end
end
However this loop returns false on everything other than single digits excluding 0. I previously had this working with regex = /\A[1-9]\z/, but I going back to where it was working, it doesn't anymore, and this also means the user can't enter 0.
How can I use a regex to validate that user input only contains the digits 0-9?
Add a + quantifier after [1-9](Shouldn't it be [0-9] is you want digits from 0 to 9?). i.e. regex = /\A[0-9]+\z/
You can represent the character class [0-9] as \d so that the regex looks neater: \A\d+\z
The + checks for one(minimum) or more occurrences of a digit in the string. Right now, you are checking for presence of only ONE digit.
s !~ /\D/
.....................

regex for a pattern at end of string

I have a string which looks like:
hello/world/1.9.2-some-text
hello/world/2.0.2-some-text
hello/world/2.11.0
Through regex I want to get the string after last '/' and until end of line i.e. in above examples output should be 1.9.2-some-text, 2.0.2-some-text, 2.11.0
I tried this - ^(.+)\/(.+)$ which returns me an array of which first object is "hello/world" and 2nd object is "1.9.2-some-text"
Is there a way to just get "1.9.2-some-text" as the output?
Try using a negative character class ([^…]) like this:
[^\/]+$
This will match one or more of any character other than / followed by the end of the string.
You can use a negated match here.
'hello/world/1.9.2-some-text'.match(Regexp.new('[^/]+$'))
# => "1.9.2-some-text"
Meaning any character except: / (1 or more times) followed by the end of the string.
Although, the simplest way would be to split the string.
'hello/world/1.9.2-some-text'.split('/').last
# => "1.9.2-some-text"
OR
'hello/world/1.9.2-some-text'.split('/')[-1]
# => "1.9.2-some-text"
If you do not need to use a regex, the ordinary way of doing such thing is:
File.basename("hello/world/1.9.2-some-text")
#=> "1.9.2-some-text"
This is one way:
s = 'hello/world/1.9.2-some-text
hello/world/2.0.2-some-text
hello/world/2.11.0'
s.lines.map { |l| l[/.*\/(.*)/,1] }
#=> ["1.9.2-some-text", "2.0.2-some-text", "2.11.0"]
You said, "in above examples output should be 1.9.2-some-text, 2.0.2-some-text, 2.11.0". That's neither a string nor an array, so I assumed you wanted an array. If you want a string, tack .join(', ') onto the end.
Regex's are naturally "greedy", so .*\/ will match all characters up to and including the last / in each line. 1 returns the contents of the capture group (.*) (capture group 1).

Ruby: How to append to each line of a string based on a given regex?

I want to append </tag> to each line where it's missing:
text = '<tag>line 1</tag>
<tag>line2 # no closing tag, append
<tag>line3 # no closing tag, append
line4</tag> # no opening tag, but has a closing tag, so ignore
<tag>line5</tag>'
I tried to create a regular expression to match this but I know its wrong:
text.gsub! /.*?(<\/tag>)Z/, '</tag>'
How can I create a regular expression to conditionally append each line?
Here you go:
text.gsub!(%r{(?<!</tag>)$}, "</tag>")
Explanation:
$ means end of line and \z means end of string. \Z means something similar, with complications.
(?<!) work together to create a negative lookbehind.
Given the example provided, I'd just do something like this:
text.split(/<\/?tag>/).
reject {|t| t.strip.length == 0 }.
map {|t| "<tag>%s</tag>" % t.strip }.
join("\n")
You're basically treating either and as record delimiters, so you can just split on them, reject any blank records, then construct a new combined string from the extracted values. This works nicely when you can't count on newlines being record delimiters and will generally be tolerant of missing tags.
If you're insistent on a pure regex solution, though, and your data format will always match the given format (one record per line), you can use a negative lookbehind:
text.strip.gsub(/(?<!<\/tag>)(\n|$)/, "</tag>\\1")
One that could work is:
/<tag>[^\n ]+[^>][\s]*(\n)/
This is will return all the newline chars without a ">" before them.
Replace it with "\n", i.e.
text.gsub!( /<tag>[^\n ]+[^>][\s]*(\n)/ , "</tag>\n")
For more polishing, try http://rubular.com/
text = '<tag>line 1</tag>
<tag>line2
<tag>line3
line4</tag>
<tag>line5</tag>'
result = ""
text.each_line do |line|
line.rstrip!
line << "</tag>" if not line.end_with?("</tag>")
result << line << "\n"
end
puts result
--output:--
<tag>line 1</tag>
<tag>line2</tag>
<tag>line3</tag>
line4</tag>
<tag>line5</tag>

Resources