I have a method which parses a string into a date, but I want to make sure that I don't try to parse a non-numeric string, or a string which doesn't represent a date or time format.
How can I do this?
At the moment I have:
if string =~ /^\D*$/
  return false
else
  do_something_else
end
This was fine for a non-numeric string like "UNKNOWN", but it wouldn't work for "UNKNOWN1".
Any idea what I can use to make sure that only date or time formats are parsed?
DateTime.strptime v ParseDate.parsedate
No pun intended, but the information herein is now out of date (2015), and some methods and modules have been removed from Ruby 2.x. I'm leaving it here just in case someone, somewhere is still using 1.8.7.
Ok, maybe there was a small pun intended there ;-)
You would think that you could use either Date.parse or DateTime.parse to check for bad dates:
d = Date.parse(string) rescue nil
if d
do_something
else
return false
end
because bad values raise an exception which you can rescue. However, the suggested test strings actually return a Date with Date.parse.
For example:
~> irb
>> Date.parse '12-UNKN/34/OWN1'
=> #<Date: 4910841/2,0,2299161>
>>
Date.parse just isn't clever enough to do the job :-(
ParseDate.parsedate does a better job. You can see that it attempts to parse the date but, in the test examples, doesn't find a valid year or month.
>> require 'parsedate'
=> true
>> ParseDate.parsedate '2010-09-09'
=> [2010, 9, 9, nil, nil, nil, nil, nil]
>> ParseDate.parsedate 'dsadasd'
=> [nil, nil, nil, nil, nil, nil, nil, nil]
>> ParseDate.parsedate '12-UNKN/34/OWN1'
=> [nil, nil, 12, nil, nil, nil, nil, nil]
Regardless of which method you use to parse a date, you can validate strict conformance by reformatting the resulting date and comparing it with the original input. For example (expect and eq come from RSpec):
def strict_parse(input, format)
Time.strptime(input, format).tap { |output| expect(output.strftime(format)).to eq input }
end
This is strict, however: "1/9/2014" won't pass with the format "%d/%m/%Y". It would have to be "01/09/2014" to be accepted.
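Outside a test suite, the same round-trip idea can be written without RSpec. This is only a sketch: strict_time is a hypothetical name, and it returns nil rather than raising when the input does not conform.

```ruby
require 'time'

# Hypothetical helper: returns the parsed Time only when formatting it
# again with the same format reproduces the input exactly; otherwise nil.
def strict_time(input, format)
  parsed = Time.strptime(input, format)
  parsed.strftime(format) == input ? parsed : nil
rescue ArgumentError
  # Time.strptime raises ArgumentError when the input does not fit the format.
  nil
end

strict_time('01/09/2014', '%d/%m/%Y')  # a Time for 1 September 2014
strict_time('1/9/2014', '%d/%m/%Y')    # nil: parses, but does not round-trip
strict_time('UNKNOWN1', '%d/%m/%Y')    # nil: does not fit the format at all
```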
Ruby's parsers are optimistic: if they can throw out a bunch of garbage and still get a result from the input string, Date.parse and DateTime.strptime will try to do it.
You want a pessimistic and strict check, which means instead of assuming acceptance after trying to hunt for garbage with a regex, you should assume rejection and hunt for treasure with your regex.
Your first check, "Is a string numeric?", uses a regex to find a string composed entirely of non-numeric characters, and rejects the input if it finds one. \D (with a capital D) matches non-numeric characters, so an input string will only match your regex if it consists entirely of zero or more non-numeric characters.
You'll likely have better luck with the following logic for numerics:
if(string=~ /^\d*$/ )
something_else
else
return false
end
This matches a string comprised entirely of 0 or more numeric characters, does something_else if it finds it, and returns false otherwise.
For times, you want to explicitly search for times and reject all other values. For an HH:MM:SSAM format with 12-hour times that tolerates omitting the leading zero in each field, you could use the following:
if (string =~ /^[01]?\d:[0-5]?\d:[0-5]?\d[AP]M$/)
something_else
else
return false
end
Likewise, for dates you want to explicitly search for dates that are valid and reject all other values. For MM/DD/YYYY, tolerating an omitted leading zero in everything but the year field, you could go with:
if (string =~ /^[0-1]?\d\/[0-3]?\d\/\d{4}$/)
something_else
else
return false
end
Ruby's utility functions try to be liberal in what they accept, but for validation that is not a useful trait. Be strict: assume that everything is invalid until it proves otherwise, then accept it.
I'd advise you to establish a list of the date and datetime formats that you expect and intend to support. You can define them using strftime-compatible strings, and then use the same strings when parsing dates with DateTime.strptime. Try to parse your input string with each supported pattern; the first one that doesn't raise an exception returns the parsed date. If every pattern raises an exception, the string is not a valid date.
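A minimal sketch of that approach follows. SUPPORTED_FORMATS and parse_supported are hypothetical names, and the three formats are placeholders for whatever list you settle on.

```ruby
require 'date'

# Hypothetical whitelist; replace it with the formats your input may use.
# More specific formats are listed before less specific ones.
SUPPORTED_FORMATS = ['%Y-%m-%d %H:%M:%S', '%Y-%m-%d', '%d/%m/%Y'].freeze

# Returns the DateTime from the first format that fits, or nil if none does.
def parse_supported(string)
  SUPPORTED_FORMATS.each do |format|
    begin
      return DateTime.strptime(string, format)
    rescue ArgumentError
      next # this format did not fit; try the following one
    end
  end
  nil
end

parse_supported('2010-09-09')  # a DateTime for 9 September 2010
parse_supported('UNKNOWN1')    # nil
```

Returning nil instead of raising keeps the caller's check to a single if, at the cost of losing the reason for the failure.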
Check this out:
Returns true if string is a valid time, false otherwise:
require 'time'
def is_a_time?(string)
!!(Time.parse(string) rescue false)
end
Returns true if string is a valid date, false otherwise:
require 'date'
def is_a_date?(string)
!!(Date.parse(string) rescue false)
end
I created a hash out of a file that contains dates as strings in different formats (for example September 1988 on one line, July 11th 1960 on another, and sometimes the year only).
require 'date'
def create_book_hash(book_array)
{
link: book_array[0],
title: book_array[1],
author: book_array[2],
pages: book_array[3].to_i,
date: book_array[4],
rating: book_array[5].to_f,
genre: book_array[6]
}
end
def books_sorted_by_date (books_array)
books_array.sort_by { |key| Date.strptime(key[:date], '%Y, %m') }
end
book_file= File.read("books.txt")
.split("\n")
.map { |line| line.split("|")}
.map { |book_array| create_book_hash(book_array)}
puts books_sorted_by_date(book_file)
I'm trying to sort the books by date in ascending order by year. Since I have different string types, I pass the hash value as the first argument to strptime to access all the values in :date. That gives me `strptime': invalid date (Date::Error). I don't understand why. What can I do to convert these strings into Date objects? (Just Ruby, no Rails.)
Handle Both Standard and Custom Date Strings
Date.parse doesn't handle arbitrary strings in all cases. Even when it does, it may not handle them the way you expect. For example:
Date.parse "1/1/18"
#=> #<Date: 2001-01-18 ((2451928j,0s,0n),+0s,2299161j)>
While Date.parse handles many date formats automagically, it only successfully parses strings that match its internal expectations. When you have multiple or arbitrary date formats, you have to define your own date specifications using Date.strptime to handle those formats that Date.parse doesn't understand, or that it handles incorrectly. For example:
require 'date'
def parse_date str
Date.parse str
rescue Date::Error
case str
when /\A\d{4}\z/
Date.strptime str, '%Y'
when /\A\d{2}\z/
Date.strptime str, '%y'
else
raise "unexpected date format: #{str}"
end
end
date_samples = ["July 11th 1960", "September 1988", "1776"]
date_samples.map { |date| parse_date(date) }
#=> [#<Date: 1960-07-11 ((2437127j,0s,0n),+0s,2299161j)>, #<Date: 1988-09-01 ((2447406j,0s,0n),+0s,2299161j)>, #<Date: 1776-01-01 ((2369731j,0s,0n),+0s,2299161j)>]
This obviously is not an exhaustive list of potential formats, but you can add more examples to date_samples and update the case statement to include any unambiguous date formats you expect from your data set.
Date.strptime needs two parameters: the date string and the format of the date. To use strptime you need to know the format of the string beforehand.
See some examples here: https://apidock.com/ruby/Date/strptime/class
In your program you don't know the exact format of the date on each line when it is parsed, so you need to try something like:
def books_sorted_by_date (books_array)
books_array.sort_by { |key| Date.parse(key[:date]) }
end
Date.parse needs one argument, the date string; it then tries to guess the date.
See details: https://apidock.com/ruby/v2_6_3/Date/parse/class
You will still have problems with year-only strings with this approach.
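One way around the year-only case is to check for it before calling Date.parse. The sketch below assumes a year-only entry is exactly four digits and should sort as 1 January of that year; to_sortable_date is a hypothetical name.

```ruby
require 'date'

# Hypothetical helper: a bare four-digit year becomes 1 January of that
# year; every other string is handed to Date.parse.
def to_sortable_date(string)
  text = string.strip
  return Date.new(text.to_i) if text =~ /\A\d{4}\z/
  Date.parse(text)
end

dates = ['September 1988', '1776', 'July 11th 1960']
sorted = dates.sort_by { |d| to_sortable_date(d) }
#=> ["1776", "July 11th 1960", "September 1988"]
```

In the question's code this would mean replacing Date.parse(key[:date]) with to_sortable_date(key[:date]) inside sort_by.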
I am trying to use a regex to verify a date format, and I would like to check that the day is less than 32 and, similarly, that the month is no greater than 12. I have no idea how to go about it. Currently, this is what I have:
^[0-1]?[0-9]{1}\-[0-3]?[0-9]{1}\-[0-9]{2,4}$
This regex achieves the format (m)m-(d)d-(yy)yy
TL;DR
Don't use regular expressions for comparison operations. Use a regex to split off values to compare, or use an actual parser.
Use Regular Expressions to Extract Comparables
Date comparisons are a really poor problem for a regex to solve. At most, you should use a regular expression to extract your days of the month for a numeric comparison. For example:
date = '01-01-1970'
date.split('-')[1].to_i < 32
#=> true
However, the code above won't really tell you if a given date is valid. For example, what about February 30th or November 31st? Instead, you should attempt to parse the date to determine its validity.
Use a Date Parser
The best way to tell if a given date is valid is to parse it with a date parser, and then report a Boolean result or handle the exception. For example, you could attempt to parse the date with Date.parse.
Boolean Results
If you just want a Boolean result, you can coerce a valid/invalid parse to true or false. For example:
require 'date'
date = '01-33-1970'
!!(Date.parse date rescue nil)
#=> false
Rescuing and Reporting the Exception
Less magically, you would need to rescue ArgumentError from Date.parse. For example:
require 'date'
def valid_date? date_string
true if Date.parse date_string
rescue ArgumentError => e
STDERR.puts "#{e.class}: #{e}: '#{date_string}'"
false
end
valid_date? '11-31-1970'
This will do what you expect, albeit more verbosely. For example, the above example will print the exception to standard error, and then return false as the result.
ArgumentError: invalid date: '11-31-1970'
#=> false
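If the field order is fixed, as in the question's (m)m-(d)d-yyyy layout, the parser does not have to guess it: a regex can extract the three numbers and Date.valid_date? can do the calendar check. This is a sketch under that assumption; valid_mdy? is a hypothetical name, and two-digit years are left out because their century is ambiguous.

```ruby
require 'date'

# Hypothetical helper for the (m)m-(d)d-yyyy layout: the regex only
# extracts the fields, Date.valid_date? decides whether the day exists.
def valid_mdy?(string)
  match = /\A(\d{1,2})-(\d{1,2})-(\d{4})\z/.match(string)
  return false unless match
  month, day, year = match.captures.map(&:to_i)
  Date.valid_date?(year, month, day)
end

valid_mdy?('12-25-1970')  #=> true
valid_mdy?('11-31-1970')  #=> false (November has 30 days)
valid_mdy?('02-30-1970')  #=> false
valid_mdy?('13-01-1970')  #=> false
```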
^(?:0[1-9]|1[0-2]|[1-9])\-(?:3[01]|[12][0-9]|0[1-9]|[1-9])\-[0-9]{2}(?:[0-9]{2})?$
should do what you're looking for. It will only allow months from 1-12 (either 1-9 or 01-12), days from 1-31 (either 1-9 or 01-31) and years of at least 2 digits with a maximum of four. Tested on regex101.
Basic:
Here is a regex that should do what you want:
^(0[1-9]|1[0-2]|[1-9])-(0[1-9]|[1-2][0-9]|3[0-1]|[1-9])-\d{2}(\d{2})?$
It matches months greater than 0 and less than 13, then -, then days greater than 0 and less than 32, then -, then years (2 digits or 4 digits).
Bonus:
Full regex for matching dates in that format with validation:
^((0?[13578]|10|12)-(([1-9])|(0[1-9])|([12])([0-9]?)|(3[01]?))-((19)([2-9])(\d{1})|(20)([01])(\d{1})|([8901])(\d{1}))|(0?[2469]|11)-(([1-9])|(0[1-9])|([12])([0-9]?)|(3[0]?))-((19)([2-9])(\d{1})|(20)([01])(\d{1})|([8901])(\d{1})))$
If you want to determine whether the string is a valid date, you'd be better off attempting to convert it. If it won't convert, it's not valid.
require 'date'
def date_valid?(date_string)
  # Pick %Y or %y depending on whether the year has four digits or two.
  format = '%m-%d-' + (date_string.split('-').last.size == 4 ? '%Y' : '%y')
  return true if Date.strptime(date_string, format)
rescue ArgumentError
  return false
end
Is there a way to see if a string is a valid month name in ruby?
You can do:
require 'date'
Date::MONTHNAMES.include? string
Note that this will return true if string is nil. All month names are capitalized, so if you don't care for case:
Date::MONTHNAMES.include?(string && string.capitalize)
If you want nil to return false:
!!string && Date::MONTHNAMES.include?(string.capitalize)
I would use the method #grep, which checks the string against every month name.
require 'date'
Date::MONTHNAMES.grep(Regexp.new(string, true)).empty?
If the expression above returns true, the string is not a valid month name; otherwise it is.
I passed true as the second argument to Regexp.new to make the pattern case-insensitive. Note that the pattern is not anchored, so a fragment such as "ember" also counts as a match.
Try using the Date.parse method instead. Its benefit over Date::MONTHNAMES.include?(string) is that it also accepts shorthand month strings (e.g. jun, aug, dec).
require 'date'
if (Date.parse(string) rescue false)
# code for valid month string
else
# code for invalid month string
end
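If you want the abbreviations without going through a date parser, Date::ABBR_MONTHNAMES can be combined with Date::MONTHNAMES. A minimal sketch, assuming English names and case-insensitive matching; MONTH_WORDS and month_name? are hypothetical names.

```ruby
require 'date'

# Full and abbreviated English month names, lower-cased. compact drops
# the nil placeholder that sits at index 0 of both constants.
MONTH_WORDS = (Date::MONTHNAMES + Date::ABBR_MONTHNAMES).compact.map(&:downcase).freeze

# Hypothetical helper: true only for an exact full or abbreviated name.
def month_name?(string)
  !string.nil? && MONTH_WORDS.include?(string.strip.downcase)
end

month_name?('jun')     #=> true
month_name?('AUGUST')  #=> true
month_name?('Monday')  #=> false
month_name?(nil)       #=> false
```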
You could also use a regex:
months_regex = /(Jan|Febr)uary|March|April|May|June|July|August|September|(Octo|Novem|Decem)ber/
string =~ months_regex
This returns the position of the match (0 when the string starts with a month name), or nil if there is no match.
I am writing a 6502 assembler in Ruby. I am looking for a way to validate hexadecimal operands in string form. I understand that the String object provides a "hex" method to return a number, but here's a problem I run into:
"0A".hex #=> 10 - a valid hexadecimal value
"0Z".hex #=> 0 - invalid, produces a zero
"asfd".hex #=> 10 - Why 10? I guess it reads 'a' first and stops at 's'?
You will get some odd results by typing in a bunch of gibberish. What I need is a way to first verify that the value is a legit hex string.
I was playing around with regular expressions, and realized I can do this:
true if "0A" =~ /[A-Fa-f0-9]/
#=> true
true if "0Z" =~ /[A-Fa-f0-9]/
#=> true <-- PROBLEM
I'm not sure how to address this issue. I need to be able to verify that letters are only A-F and that if it is just numbers that is ok too.
I'm hoping to avoid spaghetti code riddled with "if" statements. I am hoping that someone could provide a "one-liner" or some form of elegant code.
Thanks!
!str[/\H/] will look for invalid hex values.
String#hex does not interpret the whole string as hex, it extracts from the beginning of the string up to as far as it can be interpreted as hex. With "0Z", the "0" is valid hex, so it interpreted that part. With "asfd", the "a" is valid hex, so it interpreted that part.
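One method (note that it rejects strings with leading zeros, such as "0A", because to_s(16) does not restore them):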
str.to_i(16).to_s(16) == str.downcase
Another:
str =~ /\A[a-f0-9]+\z/i # or simply /\A\h+\z/ (see hirolau's answer)
About your regex: you have to use anchors (\A for the beginning of the string and \z for the end of the string) to say that you want the full string to match. Also, the + repeats the match for one or more characters.
Note that you could use ^ (beginning of line) and $ (end of line), but this would allow strings like "something\n0A" to pass.
This is an old question, but I just had the issue myself. I opted for this in my code:
str =~ /^\h+$/
It has the added benefit of returning nil if str is nil.
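For an assembler operand, the whole-string anchors are probably the safer choice. A small sketch wrapping that check in a predicate; valid_hex? is a hypothetical name.

```ruby
# Hypothetical predicate: true only when the entire string consists of
# one or more hex digits. \A and \z anchor to the string, not the line.
def valid_hex?(str)
  !!(str =~ /\A\h+\z/)
end

valid_hex?('0A')             #=> true
valid_hex?('0Z')             #=> false
valid_hex?('asfd')           #=> false
valid_hex?("something\n0A")  #=> false
valid_hex?(nil)              #=> false

# Convert only after the check has passed.
value = '0A'.hex if valid_hex?('0A')  # value is 10
```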
Since Ruby has hex literals built in, you can eval the string and rescue the SyntaxError:
eval "0xA" #=> 10
eval "0xZ" #=> raises SyntaxError
You can use this in a method like:
def is_hex?(str)
begin
eval("0x#{str}")
true
rescue SyntaxError
false
end
end
is_hex?('0A') #=> true
is_hex?('0Z') #=> false
Of course, since you are using eval, make sure you only pass safe values to the method.
def parse( line )
_, remote_addr, status, request, size, referrer, http_user_agent, http_x_forwarded_for = /^([^\s]+) - (\d+) \"(.+)\" (\d+) \"(.*)\" \"([^\"]*)\" \"(.*)\"/.match(line).to_a
print line
print request
if request && request != nil
_, referrer_host, referrer_url = /^http[s]?:\/\/([^\/]+)(\/.*)/.match(referrer).to_a if referrer
method, full_url, _ = request.split(' ')
in parse: private method 'split' called for nil:NilClass (NoMethodError)
So, as I understand it, split is being called on nil rather than on a string.
This part parses a web server log, but I can't understand why it's getting nil. As I understand it, that is Ruby's null.
Did some of the subpatterns in the regex fail? Is it the web server's fault for sometimes generating malformed log lines?
By the way, how do I write to a file in Ruby? I can't read the output properly in this cmd window under Windows.
You seem to have a few questions here, so I'll take a stab at what seems to be the main one:
If you want to see if something is nil, just use .nil? - so in your example, you can just say request.nil?, which returns true if it is nil and false otherwise.
Ruby 2.3.0 added a safe navigation operator (&.) that checks for nil before calling a method.
request&.split(' ')
This is functionally* equivalent to
!request.nil? && request.split(' ')
*(They are slightly different. When request is nil, the top expression evaluates to nil, while the bottom expression evaluates to false.)
To write to a file:
File.open("file.txt", "w") do |file|
file.puts "whatever"
end
As I wrote in a comment above, you didn't say what is nil. Also, check whether referrer contains what you think it contains. EDIT: I see it's request that is nil. Obviously, regexp trouble.
Use rubular.com to easily test your regexp. Copy a line from your input file into "Your test string", and your regexp into "Your regular expression", and tweak until you get a highlight in "Match result".
Also, what are "wrong logging strings"? If we're talking Apache, log format is configurable.
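Whatever the cause, the parser can guard against lines that do not fit the pattern instead of destructuring a failed match. A sketch using the regex from the question; LOG_PATTERN, parse_request and the sample line are made up for illustration.

```ruby
# Pattern copied from the question; the constant name is illustrative.
LOG_PATTERN = /^([^\s]+) - (\d+) \"(.+)\" (\d+) \"(.*)\" \"([^\"]*)\" \"(.*)\"/

# Hypothetical helper: returns [method, url], or nil when the line does
# not fit the pattern, so the caller can log or skip it.
def parse_request(line)
  match = LOG_PATTERN.match(line)
  return nil unless match
  method, full_url = match[3].split(' ')
  [method, full_url]
end

# Made-up sample line in the layout the pattern expects.
sample = '1.2.3.4 - 200 "GET /index.html HTTP/1.1" 512 "http://example.com/a" "Mozilla/5.0" "-"'
parse_request(sample)          #=> ["GET", "/index.html"]
parse_request('not a log line') #=> nil
```

Writing the rejected lines to a file makes it easy to see which log entries the regex misses.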