I created a hash from a file that contains dates as strings in different formats (for example, one line has September 1988, another has July 11th 1960, and sometimes there is only a year).
require 'date'
def create_book_hash(book_array)
{
link: book_array[0],
title: book_array[1],
author: book_array[2],
pages: book_array[3].to_i,
date: book_array[4],
rating: book_array[5].to_f,
genre: book_array[6]
}
end
def books_sorted_by_date (books_array)
books_array.sort_by { |key| Date.strptime(key[:date], '%Y, %m') }
end
book_file= File.read("books.txt")
.split("\n")
.map { |line| line.split("|")}
.map { |book_array| create_book_hash(book_array)}
puts books_sorted_by_date(book_file)
I'm trying to sort the books by date, in ascending order by year. Since the date strings come in different formats, I passed the hash key as the first argument to strptime to access all the values in :date. That gives me `strptime': invalid date (Date::Error). I don't understand why. What can I do to convert these strings into Date objects? (Just Ruby, no Rails.)
Handle Both Standard and Custom Date Strings
Date.parse doesn't handle arbitrary strings in all cases. Even when it does, it may not handle them the way you expect. For example:
parse_date "1/1/18"
#=> #<Date: 2001-01-18 ((2451928j,0s,0n),+0s,2299161j)>
While Date.parse handles many date formats automagically, it only successfully parses strings that match its internal expectations. When you have multiple or arbitrary date formats, you have to define your own date specifications using Date.strptime to handle those formats that Date.parse doesn't understand, or that it handles incorrectly. For example:
require 'date'
def parse_date str
Date.parse str
rescue Date::Error
case str
when /\A\d{4}\z/
Date.strptime str, '%Y'
when /\A\d{2}\z/
Date.strptime str, '%y'
else
raise "unexpected date format: #{str}"
end
end
date_samples = ["July 11th 1960", "September 1988", "1776"]
date_samples.map { |date| parse_date(date) }
#=> [#<Date: 1960-07-11 ((2437127j,0s,0n),+0s,2299161j)>, #<Date: 1988-09-01 ((2447406j,0s,0n),+0s,2299161j)>, #<Date: 1776-01-01 ((2369731j,0s,0n),+0s,2299161j)>]
This obviously is not an exhaustive list of potential formats, but you can add more examples to date_samples and update the case statement to include any unambiguous date formats you expect from your data set.
Date.strptime needs two arguments: the date string and the format of the date. To use strptime, you need to know the format of the string beforehand.
See some examples here: https://apidock.com/ruby/Date/strptime/class
In your program you don't know the exact format of the date on each line when it is parsed, so you need to try something like this:
def books_sorted_by_date (books_array)
books_array.sort_by { |key| Date.parse(key[:date]) }
end
Date.parse needs one argument, the date string; it then tries to guess the format.
See details: https://apidock.com/ruby/v2_6_3/Date/parse/class
You will still have problems with year-only strings with this approach.
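One way around the year-only problem is to test for that shape before handing the string to Date.parse. The following is a minimal sketch, not the only approach: parse_loose_date is a hypothetical helper name, and the sample hashes are made up for illustration. Checking the four-digit pattern first also avoids relying on Date.parse raising an error, since as far as I know it can read some four-digit strings as a month and day instead of a year.

```ruby
require 'date'

# Hypothetical helper: handle the year-only shape explicitly,
# then let Date.parse guess at everything else.
def parse_loose_date(str)
  text = str.to_s.strip
  return Date.strptime(text, '%Y') if text.match?(/\A\d{4}\z/)

  Date.parse(text)
end

def books_sorted_by_date(books_array)
  books_array.sort_by { |book| parse_loose_date(book[:date]) }
end

# Made-up sample data using the three formats from the question.
books = [
  { title: 'C', date: 'September 1988' },
  { title: 'A', date: '1776' },
  { title: 'B', date: 'July 11th 1960' }
]

sorted_titles = books_sorted_by_date(books).map { |book| book[:title] }
puts sorted_titles.inspect
```

A year-only string becomes January 1st of that year, so it sorts before any other date in the same year.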
How do you get the month as an integer from the below code
3.2.21#2.1.3 (#<VouchersController:0x007ff453)> t = (Date.today + 5).to_s
=> "2015-12-01"
3.2.21#2.1.3 (#<VouchersController:0x007ff453)> t.to_i
=> 2015
3.2.21#2.1.3 (#<VouchersController:0x007ff453)>
I can get the year. But how do I get the month as an integer so this returns 12?
You're only getting the year because you're converting the string "2015-12-01" to an integer.
When you use to_i on a Ruby string, it uses only leading digit characters, then throws away the rest of the string. When it reaches the first - character, it stops parsing as an integer and returns what it has so far: 2015.
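A short sketch of that behavior, using made-up strings alongside the one from the question:

```ruby
# String#to_i reads leading digits and ignores everything after them.
leading_digits = "2015-12-01".to_i  # stops at the first "-"
no_digits      = "December".to_i    # no leading digits at all
mixed          = "12abc".to_i       # stops at "a"

puts leading_digits
puts no_digits
puts mixed
```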
In order to use the actual functionality of Date, don't use to_s to convert the object into a string.
require 'date'
t = Date.today + 5 # => #<Date: 2015-11-30 ((2457357j,0s,0n),+0s,2299161j)>
t.year # => 2015
t.month # => 11
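If all you have is the string from the question, you can parse it back into a Date first. This is a small sketch with a fixed date so the result does not depend on today's date; the variable names are only illustrative.

```ruby
require 'date'

# Parse the string form back into a Date, then ask for the month.
date = Date.parse("2015-12-01")
month_number = date.month

# Alternative without Date: split the ISO-style string on "-".
month_from_string = "2015-12-01".split("-")[1].to_i

puts month_number
puts month_from_string
```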
I'm reading from xls & csv files with the dates that have the following formatting;
10-Aug-14
And I need them to be: dd/mm/yyyy (11/08/2014)
I have tried the date_format gem and the standard Ruby Date & Time classes with no luck.
Inspection shows it's an array consisting of a Date object & a String;
p date_start #=> #<Date: -4712-01-01 ((0j,0s,0n),+0s,2299161j)> "11-Aug-14"
puts date_start #=> -4712-01-01
#=> 11-Aug-14
puts date_start.class #=> Array
puts date_start[0].class #=> Date
puts date_start[1].class #=> String
Any idea how I can parse this into a date that Ruby understands?
Also I need to get the weekdays in numbers between two dates so getting this right is key.
To parse the date:
my_date = Date.strptime("10-Aug-14", "%d-%b-%y")
To convert it to the other format (dd/mm/yyyy):
puts my_date.strftime("%d/%m/%Y")
For the weekday count you can use the 'weekdays' gem: https://github.com/mdarby/weekdays
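If you would rather not add a gem, the standard library is enough for a simple count. This sketch assumes "weekdays" means Monday through Friday and that both end dates are included; weekday_count is a hypothetical helper name and the end date is made up.

```ruby
require 'date'

# Parse the spreadsheet-style strings into Date objects.
start_date = Date.strptime("10-Aug-14", "%d-%b-%y")
end_date   = Date.strptime("16-Aug-14", "%d-%b-%y")

# Reformat as dd/mm/yyyy.
formatted = start_date.strftime("%d/%m/%Y")

# Hypothetical helper: counts Monday-Friday dates in an inclusive range.
def weekday_count(from, to)
  (from..to).count { |date| !date.saturday? && !date.sunday? }
end

weekdays = weekday_count(start_date, end_date)

puts formatted
puts weekdays
```

Public holidays are not considered here; a gem or a custom list would be needed for that.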
I am learning Ruby and ran into a problem. I tried to compare a sum of expressions with an integer and got this error: "comparison of String with 2000 failed". Thanks a lot!
puts "Hello! Please type here your birthday date."
puts "Day"
day = gets.chomp
day.capitalize!
puts "Month"
month = gets.chomp
month.capitalize!
puts "Year"
year = gets.chomp
year.capitalize!
if month + day + year > 2000
puts "Sum of all the numbers from your birthday date is more than 2000"
else month + day + year < 2000
puts "Sum of all the numbers from your birthday date is less than 2000"
end
day = gets.chomp
Here day is a string. And month + day + year is a string too, only longer. To get integers, call .to_i.
day = gets.to_i # to_i will handle the newline, no need to chomp.
# repeat for month and year
(Of course, once you've converted the strings to integers, you won't be able to capitalize them. It made no sense anyway.)
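Putting that together, here is a minimal sketch of the comparison with the conversion applied. birthday_sum_message is a hypothetical helper that takes the raw strings gets would return, so it can be tried without typing input; the "exactly 2000" branch is my addition, since the original code does not cover that case.

```ruby
# Hypothetical helper: converts each input with to_i before adding.
def birthday_sum_message(day, month, year)
  sum = day.to_i + month.to_i + year.to_i

  if sum > 2000
    "Sum of all the numbers from your birthday date is more than 2000"
  elsif sum < 2000
    "Sum of all the numbers from your birthday date is less than 2000"
  else
    "Sum of all the numbers from your birthday date is exactly 2000"
  end
end

# The trailing newlines mimic what gets returns; to_i ignores them.
message = birthday_sum_message("11\n", "7\n", "1960\n")
puts message
```

In the interactive script you would pass gets for each argument instead of the literal strings.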
I got a CSV dump from SQL Server 2008 that has lines like this:
Plumbing,196222006P,REPLACE LEAD WATER SERVICE W/1" COPPER,1996-08-09 00:00:00
Construction,197133031B,"MORGAN SHOES" ALT,1997-05-13 00:00:00
Electrical,197135021E,"SERVICE, "OUTLETS"",1997-05-15 00:00:00
Electrical,197135021E,"SERVICE, "OUTLETS" FOOBAR",1997-05-15 00:00:00
Construction,198120036B,"""MERITER"",""DO IT CTR"", ""NCR"" AND ""TRACE"" ALTERATION",1998-04-30 00:00:00
parse_dbenhur is pretty, but can it be rewritten to support the presence of both commas and quotes? parse_ugly is, well, ugly.
# @dbenhur's excellent answer, which works 100% for what I originally asked for
SEP = /(?:,|\Z)/
QUOTED = /"([^"]*)"/
UNQUOTED = /([^,]*)/
FIELD = /(?:#{QUOTED}|#{UNQUOTED})#{SEP}/
def parse_dbenhur(line)
line.scan(FIELD)[0...-1].map{ |matches| matches[0] || matches[1] }
end
def parse_ugly(line)
dumb_fields = line.chomp.split(',').map { |v| v.gsub(/\s+/, ' ') }
fields = []
open = false
dumb_fields.each_with_index do |v, i|
open ? fields.last.concat(v) : fields.push(v)
open = (v.start_with?('"') and (v.count('"') % 2 == 1) and dumb_fields[i+1] and dumb_fields[i+1].start_with?(' ')) || (open and !v.end_with?('"'))
end
fields.map { |v| (v.start_with?('"') and v.end_with?('"')) ? v[1..-2] : v }
end
lines = []
lines << 'Plumbing,196222006P,REPLACE LEAD WATER SERVICE W/1" COPPER,1996-08-09 00:00:00'
lines << 'Construction,197133031B,"MORGAN SHOES" ALT,1997-05-13 00:00:00'
lines << 'Electrical,197135021E,"SERVICE, "OUTLETS"",1997-05-15 00:00:00'
lines << 'Electrical,197135021E,"SERVICE, "OUTLETS" FOOBAR",1997-05-15 00:00:00'
lines << 'Construction,198120036B,"""MERITER"",""DO IT CTR"", ""NCR"" AND ""TRACE"" ALTERATION",1998-04-30 00:00:00'
require 'csv'
lines.each do |line|
puts
puts line
begin
c = CSV.parse_line(line)
puts "#{c.to_csv.chomp} (size #{c.length})"
rescue
puts "FasterCSV says: #{$!}"
end
a = parse_ugly(line)
puts "#{a.to_csv.chomp} (size #{a.length})"
b = parse_dbenhur(line)
puts "#{b.to_csv.chomp} (size #{b.length})"
end
Here's the output when I run it:
Plumbing,196222006P,REPLACE LEAD WATER SERVICE W/1" COPPER,1996-08-09 00:00:00
FasterCSV says: Illegal quoting in line 1.
Plumbing,196222006P,"REPLACE LEAD WATER SERVICE W/1"" COPPER",1996-08-09 00:00:00 (size 4)
Plumbing,196222006P,"REPLACE LEAD WATER SERVICE W/1"" COPPER",1996-08-09 00:00:00 (size 4)
Construction,197133031B,"MORGAN SHOES" ALT,1997-05-13 00:00:00
FasterCSV says: Unclosed quoted field on line 1.
Construction,197133031B,"""MORGAN SHOES"" ALT",1997-05-13 00:00:00 (size 4)
Construction,197133031B,"""MORGAN SHOES"" ALT",1997-05-13 00:00:00 (size 4)
Electrical,197135021E,"SERVICE, "OUTLETS"",1997-05-15 00:00:00
FasterCSV says: Missing or stray quote in line 1
Electrical,197135021E,"SERVICE ""OUTLETS""",1997-05-15 00:00:00 (size 4)
Electrical,197135021E,"""SERVICE"," ""OUTLETS""""",1997-05-15 00:00:00 (size 5)
Electrical,197135021E,"SERVICE, "OUTLETS" FOOBAR",1997-05-15 00:00:00
FasterCSV says: Missing or stray quote in line 1
Electrical,197135021E,"SERVICE ""OUTLETS"" FOOBAR",1997-05-15 00:00:00 (size 4)
Electrical,197135021E,"""SERVICE"," ""OUTLETS"" FOOBAR""",1997-05-15 00:00:00 (size 5)
Construction,198120036B,"""MERITER"",""DO IT CTR"", ""NCR"" AND ""TRACE"" ALTERATION",1998-04-30 00:00:00
Construction,198120036B,"""MERITER"",""DO IT CTR"", ""NCR"" AND ""TRACE"" ALTERATION",1998-04-30 00:00:00 (size 4)
Construction,198120036B,"""""MERITER""","""DO IT CTR"""," """"NCR"""" AND """"TRACE"""" ALTERATION""",1998-04-30 00:00:00 (size 6)
Construction,198120036B,"""""""MERITER""""","""""DO IT CTR"""""," """"NCR"""" AND """"TRACE"""" ALTERATION""",1998-04-30 00:00:00 (size 6)
UPDATE
Note that the CSV uses double quotes when a field has a comma.
UPDATE 2
It's fine if commas are stripped out of the fields in question... my parse_ugly method doesn't preserve them.
UPDATE 3
I learned from the client that it's SQL Server 2008 that's exporting this strange CSV - which has been reported to Microsoft here and here
UPDATE 4
@dbenhur's answer worked perfectly for what I originally asked for, but he pointed out that I neglected to show lines with both commas and quotes. I will accept @dbenhur's answer, but I'm hoping it can be improved to work on all lines above.
HOPEFULLY FINAL UPDATE
This code works (and I would consider it "semantically correct"):
QUOTED = /"((?:[^"]|(?:""(?!")))*)"/
SEPQ = /,(?! )/
UNQUOTED = /([^,]*)/
SEPU = /,(?=(?:[^ ]|(?: +[^",]*,)))/
FIELD = /(?:#{QUOTED}#{SEPQ})|(?:#{UNQUOTED}#{SEPU})|\Z/
def parse_sql_server_2008_csv_line(line)
line.scan(FIELD)[0...-1].map{ |matches| (matches[0] || matches[1]).tr(',', ' ').gsub(/\s+/, ' ') }
end
Adapted from @dbenhur and @ghostdog74's answer in How can I process a CSV file with “bad commas”?
The following uses a regexp and String#scan. I observe that in the broken CSV format you're dealing with, " only has quoting properties when it comes at the beginning and end of a field.
Scan moves through the string, successively matching the regexp, so the regexp can assume its start match point is the beginning of a field. We construct the regexp so it can match a balanced quoted field with no internal quotes (QUOTED) or a string of non-commas (UNQUOTED). When either alternative field representation is matched, it must be followed by a separator, which can be either a comma or the end of the string (SEP).
Because UNQUOTED can match a zero length field before a separator, the scan always matches an empty field at the end which we discard with [0...-1]. Scan produces an array of tuples; each tuple is an array of the capture groups, so we map over each element picking the captured alternate with matches[0] || matches[1].
None of your example lines show a field which contains both a comma and a quote. I have no idea how it would be legally represented, and this code probably won't recognize such a field correctly.
SEP = /(?:,|\Z)/
QUOTED = /"([^"]*)"/
UNQUOTED = /([^,]*)/
FIELD = /(?:#{QUOTED}|#{UNQUOTED})#{SEP}/
def ugly_parse line
line.scan(FIELD)[0...-1].map{ |matches| matches[0] || matches[1] }
end
lines.each do |l|
puts l
puts ugly_parse(l).inspect
puts
end
# Electrical,197135021E,"SERVICE, OUTLETS",1997-05-15 00:00:00
# ["Electrical", "197135021E", "SERVICE, OUTLETS", "1997-05-15 00:00:00"]
#
# Plumbing,196222006P,REPLACE LEAD WATER SERVICE W/1" COPPER,1996-08-09 00:00:00
# ["Plumbing", "196222006P", "REPLACE LEAD WATER SERVICE W/1\" COPPER", "1996-08-09 00:00:00"]
#
# Construction,197133031B,"MORGAN SHOES" ALT,1997-05-13 00:00:00
# ["Construction", "197133031B", "MORGAN SHOES\" ALT", "1997-05-13 00:00:00"]
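To see why the trailing [0...-1] is there, it can help to look at the raw scan result. This is a small sketch on a made-up line; field_regexp is a hypothetical wrapper that rebuilds the same pattern as FIELD so the snippet stands on its own.

```ruby
# Same pattern as FIELD above, built inside a method so the sketch is self-contained.
def field_regexp
  sep      = /(?:,|\Z)/
  quoted   = /"([^"]*)"/
  unquoted = /([^,]*)/
  /(?:#{quoted}|#{unquoted})#{sep}/
end

# Each match is a pair of capture groups: [quoted, unquoted].
raw_matches = 'a,"b,c",d'.scan(field_regexp)
puts raw_matches.inspect

# The last pair comes from the zero-length match at the end of the string,
# so it is dropped before picking whichever group matched.
fields = raw_matches[0...-1].map { |quoted, unquoted| quoted || unquoted }
puts fields.inspect
```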
If your CSV never uses a double quote as a legitimate quoting character, tweak the options to CSV to pass :quote_char => "\0", and then you can do this (strings wrapped for clarity):
1.9.3p327 > puts 'Construction,197133031B,"MORGAN SHOES" ALT,
1997-05-13 00:00:00'.parse_csv(:quote_char => "\0")
Construction
197133031B
"MORGAN SHOES" ALT
1997-05-13 00:00:00
1.9.3p327 > puts 'Plumbing,196222006P,REPLACE LEAD WATER SERVICE W/1" COPPER,
1996-08-09 00:00:00'.parse_csv(:quote_char => "\0")
Plumbing
196222006P
REPLACE LEAD WATER SERVICE W/1" COPPER
1996-08-09 00:00:00