def parse( line )
  _, remote_addr, status, request, size, referrer, http_user_agent, http_x_forwarded_for = /^([^\s]+) - (\d+) \"(.+)\" (\d+) \"(.*)\" \"([^\"]*)\" \"(.*)\"/.match(line).to_a
  print line
  print request
  if request && request != nil
    _, referrer_host, referrer_url = /^http[s]?:\/\/([^\/]+)(\/.*)/.match(referrer).to_a if referrer
    method, full_url, _ = request.split(' ')
in parse: private method 'split' called for nil:NilClass (NoMethodError)
So, as I understand it, split is being called on nil rather than on a string.
This code parses a web server log, but I can't understand why request is nil (which, as I understand it, is Ruby's null).
Did some of the subpatterns in the regex fail to match? So is it the web server's fault for sometimes writing malformed log lines?
By the way, how do I write to a file in Ruby? I can't read the output properly in the cmd window on Windows.
You seem to have a few questions here, so I'll take a stab at what seems to be the main one:
If you want to see if something is nil, just use .nil? - so in your example, you can just say request.nil?, which returns true if it is nil and false otherwise.
Ruby 2.3.0 added a safe navigation operator (&.) that checks for nil before calling a method.
request&.split(' ')
This is functionally* equivalent to
!request.nil? && request.split(' ')
*(They are slightly different. When request is nil, the top expression evaluates to nil, while the bottom expression evaluates to false.)
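A small sketch of that difference, assuming two hypothetical helper methods (the names are mine) that wrap the two expressions above:

```ruby
# Safe navigation: returns nil when request is nil.
def split_with_safe_navigation(request)
  request&.split(' ')
end

# Explicit check: returns false when request is nil.
def split_with_nil_check(request)
  !request.nil? && request.split(' ')
end

p split_with_safe_navigation(nil)  # nil
p split_with_nil_check(nil)        # false
p split_with_safe_navigation('GET /index.html HTTP/1.1')
```

Either form avoids the NoMethodError; which one you pick only matters if the caller distinguishes nil from false.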
To write to a file:
File.open("file.txt", "w") do |file|
  file.puts "whatever"
end
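As I wrote in a comment above, you didn't say what is nil. Also, check whether referrer contains what you think it contains. EDIT: I see it's request that is nil. That means the regexp did not match the line: match then returns nil, nil.to_a is an empty array, and every variable on the left-hand side is assigned nil.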
Use rubular.com to easily test your regexp. Copy a line from your input file into "Your test string", and your regexp into "Your regular expression", and tweak until you get a highlight in "Match result".
Also, what are "wrong logging strings"? If we're talking Apache, log format is configurable.
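One way to guard against lines that don't match is to test the match result before using any capture. This is only a sketch: the regexp is the one from the question, while the constant name LOG_LINE, the method name parse_request, and both sample lines are made up for illustration.

```ruby
# Regexp copied from the question.
LOG_LINE = /^([^\s]+) - (\d+) \"(.+)\" (\d+) \"(.*)\" \"([^\"]*)\" \"(.*)\"/

# Returns [http_method, full_url], or nil when the line does not match.
def parse_request(line)
  match = LOG_LINE.match(line)
  return nil if match.nil?  # no match, so there is nothing to split

  request = match[3]
  http_method, full_url, _protocol = request.split(' ')
  [http_method, full_url]
end

good = '1.2.3.4 - 200 "GET /index.html HTTP/1.1" 512 "http://example.com/" "Mozilla/5.0" "-"'
bad  = 'this line is not in the expected format'

p parse_request(good)  # ["GET", "/index.html"]
p parse_request(bad)   # nil
```

Printing the lines for which parse_request returns nil is a quick way to see which log entries the regexp fails on.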
I was given this regex /\A[\w+\-.]+#[a-z\d\-]+(\.[a-z\d\-]+)*\.[a-z]{2,}\z/ and I'm to use it in order to verify the validity of an email address.
This is my first Ruby code and I'm not sure how to do it. I was told to use the .match method, which returns a MatchData object. But how do I go about checking that the MatchData object confirms the validity?
I attempted using the following, but it seems that it's accepting any string, even not an email address.
# Register new handler
def register_handler
  email_regex = /\A[\w+\-.]+#[a-z\d\-]+(\.[a-z\d\-]+)*\.[a-z]{2,}\z/
  email = "invalid"
  unless email =~ email_regex then
    puts("Insert your e-mail address:")
    email = gets
  end
  puts("email received")
end
What is the correct way to do this? Either using .match or the method I attempted above.
The regex matches email addresses:
'invalid' =~ email_regex
=> nil # nil is falsy
'email#example.com' =~ email_regex
=> 0 # 0 is truthy in Ruby
however:
"email#example.com\n" =~ email_regex
=> nil
gets appends the newline character \n to every line it reads.
So if the check is written as an until loop, it will run forever no matter what you type in the terminal: the match result is always nil because of the trailing newline.
Try using gets.chomp, which strips the trailing newline, and your code should work.
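Putting that together, here is a minimal sketch of a prompt loop that uses chomp. The regex is copied from the question; the method name read_email and its input/output keyword parameters are hypothetical, added only so the loop can be run against a StringIO instead of a real terminal.

```ruby
require 'stringio'

# Regex copied from the question.
EMAIL_REGEX = /\A[\w+\-.]+#[a-z\d\-]+(\.[a-z\d\-]+)*\.[a-z]{2,}\z/

# Keeps prompting until a line read from `input` matches the regex.
# Returns the accepted address, or nil if the input runs out first.
def read_email(input: $stdin, output: $stdout)
  loop do
    output.puts 'Insert your e-mail address:'
    line = input.gets
    return nil if line.nil?   # end of input: give up

    email = line.chomp        # strip the trailing newline before matching
    return email if email =~ EMAIL_REGEX
  end
end

fake_input = StringIO.new("invalid\nemail#example.com\n")
p read_email(input: fake_input, output: StringIO.new)  # "email#example.com"
```

Called with no arguments, read_email reads from the terminal as in the original code.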
Try this. I edited the regex: the whole pattern is wrapped in () as a capture group, and it is anchored with ^ at the beginning and $ at the end so that it matches line by line:
re = /^([\w+\-.]+#[a-z\d\-]+(\.[a-z\d\-]+)*\.[a-z]{2,})$/m
str = 'a#b.com
p#qasdf.com
adbadf#bwdsfqaf.com
....
a#bdotcom
aasdf.com
www.yahoo.com'
# Print the match result
str.scan(re) do |match|
  puts match.to_s
end
I wrote the following:
greeting="i am awesome"
puts("I was saying that #{greeting}")
# => I was saying that i am awesome
When I change it to this:
def showman
  print("I am awesome ")
end
puts("I was saying that #{showman}")
# => I am awesome I was saying that
Why is the method's output printed first, and then the string? Why is it not printed as "I was saying that I am awesome"? What can I do to get that output?
If I modify the showman function to:
def showman
  return ("I am awesome ")
end
then it gives the desired output. Why does using return in this way make a difference to the output?
In the first output, why is the method's output printed first, and then the string?
To build the string, Ruby has to evaluate showman first, and calling showman prints "I am awesome " before puts has anything to print.
Why is it not printed as "I was saying that I am awesome"?
Because print returns nil, and interpolating nil in a string evaluates to an empty string ("#{showman}" → "#{nil}" → ""). Without print, the showman method returns the string "I am awesome ".
Why does using return in this way make a difference to the output?
It is not the use of return that is making the difference. It is the absence of print that is making the difference.
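A short sketch of the two variants side by side. The method names showman_printing and showman_returning are mine, chosen to keep both versions in one file:

```ruby
# print writes to standard output and returns nil.
def showman_printing
  print('I am awesome ')
end

# No print here: the string is the method's return value.
# An explicit `return` is optional, since the last expression is returned.
def showman_returning
  'I am awesome '
end

# Interpolating nil gives an empty string.
p "#{nil}"  # ""

puts("I was saying that #{showman_returning}")
```

Calling showman_printing inside the interpolation would reproduce the original behaviour: the text appears first, and the interpolated value is empty.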
I have a variable that represents a path:
path = "/foo/bar"
I want to remove the last part of the path. I tried it with gsub! like this:
path.gsub!("/bar","")
but I also want to throw an error if "/bar" isn't at the end of the string. I also tried path.split("/"), but that does not seem very memory efficient. The method is called a lot, so an in-place approach would be perfect. Another variation would be to remove everything after the last "/", without throwing the error.
What would be a fast and memory efficient method to do this?
You could use a Regexp to match only at the end of the string:
'/foo/bar'.gsub!(/\/bar\z/, '')
#=> '/foo'
Since gsub! returns nil if there wasn't a match, just combine it with raising an error:
'/foo/blub'.gsub!(/\/bar\z/, '') || raise(StandardError)
#=> StandardError: StandardError
To get what you want, you can do:
File.dirname("/foo/bar")
# => "/foo"
Raising an error is a separate step:
raise unless "/foo/bar".end_with?("/bar")
You could use String#rindex to find the last occurrence of / and use that value to get the preceding sub-string:
path[0, path.rindex("/")]
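As another in-place option, not mentioned in the answers above: String#delete_suffix! (available since Ruby 2.5) modifies the receiver and returns nil when the suffix was not present, so it combines with raise the same way gsub! does. A minimal sketch, with a hypothetical helper name strip_suffix!:

```ruby
# Removes `suffix` from the end of `path` in place.
# Raises ArgumentError when `path` does not end with `suffix`.
def strip_suffix!(path, suffix)
  # delete_suffix! returns nil if nothing was removed.
  path.delete_suffix!(suffix) ||
    raise(ArgumentError, "#{path.inspect} does not end with #{suffix.inspect}")
  path
end

path = '/foo/bar'.dup  # dup so the string is not frozen
strip_suffix!(path, '/bar')
p path  # "/foo"
```

No regexp or intermediate array is built here, which may matter if the method is called very often.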
I am writing a 6502 assembler in Ruby. I am looking for a way to validate hexadecimal operands in string form. I understand that the String object provides a "hex" method to return a number, but here's a problem I run into:
"0A".hex #=> 10 - a valid hexadecimal value
"0Z".hex #=> 0 - invalid, produces a zero
"asfd".hex #=> 10 - Why 10? I guess it reads 'a' first and stops at 's'?
You will get some odd results by typing in a bunch of gibberish. What I need is a way to first verify that the value is a legit hex string.
I was playing around with regular expressions, and realized I can do this:
true if "0A" =~ /[A-Fa-f0-9]/
#=> true
true if "0Z" =~ /[A-Fa-f0-9]/
#=> true <-- PROBLEM
I'm not sure how to address this issue. I need to be able to verify that letters are only A-F and that if it is just numbers that is ok too.
I'm hoping to avoid spaghetti code riddled with "if" statements. I am hoping that someone could provide a "one-liner" or some other form of elegant code.
Thanks!
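!str[/\H/] is true when the string contains no invalid hex characters: \H matches any character that is not a hex digit, so str[/\H/] returns the first offending character, or nil if there is none. Be aware that it is also true for an empty string.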
String#hex does not interpret the whole string as hex, it extracts from the beginning of the string up to as far as it can be interpreted as hex. With "0Z", the "0" is valid hex, so it interpreted that part. With "asfd", the "a" is valid hex, so it interpreted that part.
One method:
str.to_i(16).to_s(16) == str.downcase
Another:
str =~ /\A[a-f0-9]+\Z/i # or simply /\A\h+\Z/ (see hirolau's answer)
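About your regex: you have to use anchors (\A for the beginning of the string and \Z for the end) to require that the full string matches. Strictly, \Z also matches just before a trailing newline; \z matches only at the absolute end of the string. Also, the + repeats the character class one or more times.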
Note that you could use ^ (begin of line) and $ (end of line), but this would allow strings like "something\n0A" to pass.
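A minimal sketch of the anchored check, wrapped in a hypothetical helper named hex_string? (the name is mine). It uses \z rather than \Z, so a trailing newline is rejected as well:

```ruby
# True only when the whole string consists of one or more hex digits.
def hex_string?(str)
  str.match?(/\A\h+\z/)
end

p hex_string?('0A')               # true
p hex_string?('0Z')               # false
p hex_string?('asfd')             # false
p hex_string?("something\n0A")    # false
```

String#match? needs Ruby 2.4 or later; on older versions, !!(str =~ /\A\h+\z/) does the same job.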
This is an old question, but I just had the issue myself. I opted for this in my code:
str =~ /^\h+$/
It has the added benefit of returning nil if str is nil.
Since Ruby has hex literals built in, you can eval the string and rescue the SyntaxError:
eval "0xA" #=> 10
eval "0xZ" #=> SyntaxError
You can wrap this in a method like:
def is_hex?(str)
  begin
    eval("0x#{str}")
    true
  rescue SyntaxError
    false
  end
end
is_hex?('0A') #=> true
is_hex?('0Z') #=> false
Of course, since this uses eval, make sure you only pass trusted values to the method.
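If the eval is a concern, a similar check is possible without executing the input, using Kernel#Integer with base 16. This is a swapped-in technique, not what the answer above does, and the helper name hex_value is hypothetical. The exception: false keyword needs Ruby 2.6 or later.

```ruby
# Returns the integer value of a hex string, or nil if it is not valid.
# Integer() is more lenient than a strict digit check: it also accepts
# a leading "0x" and underscores between digits.
def hex_value(str)
  Integer(str, 16, exception: false)
end

p hex_value('0A')    # 10
p hex_value('0Z')    # nil
p hex_value('asfd')  # nil
```

Because of that leniency, an anchored regex is still the better fit when operands must be plain hex digits and nothing else.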
Good afternoon,
I'm learning about using RegEx's in Ruby, and have hit a point where I need some assistance.
I am trying to extract 0 to many URLs from a string.
This is the code I'm using:
sStrings = ["hello world: http://www.google.com", "There is only one url in this string http://yahoo.com . Did you get that?", "The first URL in this string is http://www.bing.com and the second is http://digg.com","This one is more complicated http://is.gd/12345 http://is.gd/4567?q=1", "This string contains no urls"]
sStrings.each do |s|
  x = s.scan(/((http|https):\/\/[a-z0-9]+([\-\.]{1}[a-z0-9]+)*\.[a-z]{2,5}(([0-9]{1,5})?\/.[\w-]*)?)/ix)
  x.each do |url|
    puts url
  end
end
This is what is returned:
http://www.google.com
http
.google
nil
nil
http://yahoo.com
http
nil
nil
nil
http://www.bing.com
http
.bing
nil
nil
http://digg.com
http
nil
nil
nil
http://is.gd/12345
http
nil
/12345
nil
http://is.gd/4567
http
nil
/4567
nil
What is the best way to extract only the full URLs and not the parts of the RegEx?
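You could use non-capturing groups (?:...) instead of (...). When a regex contains capture groups, scan returns the captures of each match rather than the whole match, which is why you see the parts.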
I see that you are doing this in order to learn Regex, but in case you really want to extract URLs from a String, take a look at URI.extract, which extracts URIs from a String. (require "uri" in order to use it)
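For illustration, here is the regex from the question with every group made non-capturing, so that scan returns a flat array of matched strings. The constant name URL_RE is mine; the pattern is unchanged apart from the added ?: markers.

```ruby
# Regex from the question, with each (...) turned into (?:...).
URL_RE = /(?:http|https):\/\/[a-z0-9]+(?:[\-\.]{1}[a-z0-9]+)*\.[a-z]{2,5}(?:(?:[0-9]{1,5})?\/.[\w-]*)?/ix

strings = [
  "hello world: http://www.google.com",
  "The first URL in this string is http://www.bing.com and the second is http://digg.com",
  "This string contains no urls"
]

strings.each do |s|
  # With no capture groups, scan yields whole matches only.
  p s.scan(URL_RE)
end
```

The second string gives ["http://www.bing.com", "http://digg.com"], and the last one gives an empty array.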
You can create a non-capturing group using (?:SUB_PATTERN). Here's an illustration, with some additional simplifications thrown in. Also, since you're using the /x option, take advantage of it by laying out your regex in a readable way.
sStrings = [
  "hello world: http://www.google.com",
  "There is only one url in this string http://yahoo.com . Did you get that?",
  "... is http://www.bing.com and the second is http://digg.com",
  "This one is more complicated http://is.gd/12345 http://is.gd/4567?q=1",
  "This string contains no urls",
]
sStrings.each do |s|
  x = s.scan(/
    https?:\/\/
    \w+
    (?: [.-]\w+ )*
    (?:
      \/
      [0-9]{1,5}
      \?
      [\w=]*
    )?
  /ix)
  p x
end
This is fine for learning, but don't really try to match URLs this way. There are tools for that.