I'm working on an example problem from Chris Pine's Learn to Program book and I'm having an issue removing white space in my hash values.
I start with a txt file that contains names and birthday information, like so:
Christopher Alexander, Oct 4, 1936
Christopher Lambert, Mar 29, 1957
Christopher Lee, May 27, 1922
Christopher Lloyd, Oct 22, 1938
Christopher Pine, Aug 3, 1976
Then I go through each line, split at the first comma, and then try to go through each key,value to strip the white space.
birth_dates = Hash.new {}
File.open 'birthdays.txt', 'r' do |f|
  f.read.each_line do |line|
    name, date = line.split(/,/, 2)
    birth_dates[name] = date
    birth_dates.each_key { |a| birth_dates[a].strip! }
  end
end
But nothing is getting stripped.
{"Christopher Alexander"=>" Oct 4, 1936", "Christopher Lambert"=>" Mar 29, 1957", "Christopher Lee"=>" May 27, 1922", "Christopher Lloyd"=>" Oct 22, 1938", "Christopher Pine"=>" Aug 3, 1976", "Christopher Plummer"=>" Dec 13, 1927", "Christopher Walken"=>" Mar 31, 1943", "The King of Spain"=>" Jan 5, 1938"}
I've seen a handful of solutions for Arrays using .map - but this was the only hash example I came across. Any idea why it may not be working for me?
UPDATE: removed the redundant chomp as per sawa's comment.
For parsing comma-delimited files I use CSV, like this:
require 'csv'
def parse_birthdays(file = 'birthdays.txt', hash = {})
  strip = lambda { |f| f ? f.strip : nil } # trims every field as it is read
  CSV.foreach(file, :converters => strip) do |name, date, year|
    hash[name] = "#{year}-#{date.gsub(/ +/, '-')}"
  end
  hash
end
parse_birthdays
# {"Christopher Alexander"=>"1936-Oct-4", "Christopher Lambert"=>"1957-Mar-29", "Christopher Lee"=>"1922-May-27", "Christopher Lloyd"=>"1938-Oct-22", "Christopher Pine"=>"1976-Aug-3"}
Or, if you need real dates, you can drop the lambda and hand the day and year to Date.parse in an order it understands:
require 'csv'
require 'date'
def parse_birthdays(file = 'birthdays.txt', hash = {})
  CSV.foreach(file) { |name, date, year| hash[name] = Date.parse("#{date.strip}, #{year.strip}") }
  hash
end
parse_birthdays
# => {"Christopher Alexander"=>#<Date: 1936-10-04 ...>, "Christopher Lambert"=>#<Date: 1957-03-29 ...>, ...}
I would write this
File.open 'birthdays.txt', 'r' do |f|
  f.read.each_line do |line|
    name, date = line.split(/,/, 2)
    birth_dates[name] = date.chomp
    birth_dates.each_key { |a| birth_dates[a].strip! }
  end
end
as below:
File.open 'birthdays.txt', 'r' do |f|
  f.read.each_line do |line|
    name, date = line.split(/,/, 2)
    birth_dates[name] = date.strip # strip also removes the trailing newline
  end
end
or
birth_dates = File.readlines('birthdays.txt').each_with_object({}) do |line, hsh|
  name, date = line.split(/,/, 2)
  hsh[name] = date.strip
end
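As a quick check of what strip does here, a minimal sketch with one hard-coded sample line (an assumption standing in for a line read from birthdays.txt). It also shows that strip! returns nil when there is nothing to remove, which is worth knowing before chaining on its result:

```ruby
# One line as each_line would yield it (hypothetical inline sample).
line = "Christopher Alexander, Oct 4, 1936\n"

name, date = line.split(',', 2)
# date is now " Oct 4, 1936\n": a leading space plus the trailing newline

stripped = date.strip # returns a new string with both ends cleaned

# strip! mutates the receiver and returns nil if no change was needed
already_clean = "Oct 4, 1936"
bang_result = already_clean.strip!

p name
p stripped
p bang_result
```

Because strip removes newlines as well as spaces, a separate chomp is not needed before it.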
I am scraping the website https://www.bananatic.com/de/forum/games/. I want to extract only the year of the dates.
require 'nokogiri'
require 'open-uri'
require 'pp'
unless File.readable?('data.html')
url = 'https://www.bananatic.com/de/forum/games/'
data = URI.open(url).read
File.open('data.html', 'wb') { |f| f << data }
end
data = File.read('data.html')
document = Nokogiri::HTML(data)
links3 = document.css('.topics ul li div')
re = links3.map do |lk3|
name = lk3.css('.name').children.text.strip.split("\n")[2]
end
date = ' '
size_dates = re.length
(0..size_dates).each do |i|
unless i.nil?
date = re[i]
print date
end
end
As a result of the execution I get dates in what appears to be a String with the following format:
day.month.year, hour:minutes
But I only need the year. I tried a split, but I get an error.
Your issue is that if you look at the output from this block
re = links3.map do |lk3|
lk3.css('.name').children.text.strip.split("\n")[2]
end
You will see:
[" 07.08.2016, 13:47", nil, nil, nil, nil, " 06.08.2016, 9:24", nil, nil, nil, nil,...]
So you could solve your immediate issue by just adding .compact to the end or switching map to filter_map.
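A minimal sketch of both options, using a small hard-coded array (an assumption that mirrors the shape of the output above) in place of the scraped result:

```ruby
# Hypothetical sample with the same mix of strings and nils.
raw = [" 07.08.2016, 13:47", nil, nil, " 06.08.2016, 9:24", nil]

# Option 1: drop the nil entries after mapping
dates = raw.compact

# Option 2: filter_map (Ruby 2.7+) maps and drops nil/false results in one pass;
# here it also pulls out the four-digit year
years = raw.filter_map { |s| s && s[/\d{4}/] }

p dates
p years
```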
That being said here is another way to solve your issue:
You can get just the year from that text on that page using the following:
require 'nokogiri'
require 'open-uri'
url = "https://www.bananatic.com/de/forum/games/"
doc = Nokogiri::HTML(URI.open(url))
doc
  .xpath('//div[@class="name"]/text()[string-length(normalize-space(.)) > 0]')
  .map { |node| node.to_s[/\d{4}/] }
#=> ["2016", "2016", "2022", "2022", "2022", "2021", "2022", "2017", "2022", "2021", "2019", "2016", "2021", "2021", "2021", "2021", "2020", "2021", "2017", "2021"]
The 2 parts are:
//div[@class="name"]/text()[string-length(normalize-space(.)) > 0] - the XPath which finds all divs with the class "name" and then pulls the text nodes that are non-empty once trimmed of white space.
.map {|node| node.to_s[/\d{4}/]} - map these into an array by slicing the String based on a regex for 4 contiguous digits.
If you would like the XPath to be as specific as your post you can use:
'//div[@class="topics"]/ul/li//div[@class="name"]/text()[string-length(normalize-space(.)) > 0]'
You could use a regex to get only the year once you have the list.
This works as long as what you are showing is the actual pattern, since the year is then the only run of 4 consecutive digits.
Example:
17.01.2023, 17:40
The pattern \b\d{4}\b will match 2023.
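A short sketch of that pattern in Ruby, assuming the dates have already been collected into an array of strings (the sample values below are hard-coded for illustration):

```ruby
# Hypothetical sample strings in the "day.month.year, hour:minutes" format.
stamps = ["17.01.2023, 17:40", " 07.08.2016, 13:47"]

# String#[] with a regex returns the first match, or nil if there is none
years = stamps.map { |s| s[/\b\d{4}\b/] }

p years
```

The day, month, hour and minute parts are each at most two digits, so only the year satisfies \d{4} between word boundaries.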
I have a string that looks like this: log/archive/2016-12-21.zip, and I need to extract the date part.
So far I have tried these solutions:
1) ["log/archive/2016-12-21.zip"].map{|i|i[/\d{4}-\d{2}-\d{2}/]}.first
2) "log/archive/2016-12-21.zip".to_date
3) "log/archive/2016-12-21.zip".split("/").last.split(".").first
Is there a better way of doing this?
You can use File.basename passing the extension:
File.basename("log/archive/2016-12-21.zip", ".zip")
# => "2016-12-21"
If you want the value to be a Date, simply use Date.parse to convert the string into a Date.
require 'date'
Date.parse(File.basename("log/archive/2016-12-21.zip", ".zip"))
require 'date'
def pull_dates(str)
str.split(/[\/.]/).map { |s| Date.strptime(s, '%Y-%m-%d') rescue nil }.compact
end
pull_dates "log/archive/2016-12-21.zip"
#=> [#<Date: 2016-12-21 ((2457744j,0s,0n),+0s,2299161j)>]
pull_dates "log/2016-12-21/archive.zip"
#=> [#<Date: 2016-12-21 ((2457744j,0s,0n),+0s,2299161j)>]
pull_dates "log/2016-12-21/2016-12-22.zip"
#=> [#<Date: 2016-12-21 ((2457744j,0s,0n),+0s,2299161j)>,
# #<Date: 2016-12-22 ((2457745j,0s,0n),+0s,2299161j)>]
pull_dates "log/2016-12-21/2016-12-32.zip"
#=> [#<Date: 2016-12-21 ((2457744j,0s,0n),+0s,2299161j)>]
pull_dates "log/archive/2016A-12-21.zip"
#=> []
pull_dates "log/archive/2016/12/21.zip"
#=> []
If you just want the date string, rather than the date object, change the method as follows.
def pull_dates(str)
str.split(/[\/.]/).
each_with_object([]) { |s,a|
a << s if (Date.strptime(s, '%Y-%m-%d') rescue nil)}
end
pull_dates "log/archive/2016-12-21.zip"
#=> ["2016-12-21"]
This regex should cover most cases. It allows an optional non-digit between year, month and day:
require 'date'
def extract_date(filename)
if filename =~ /((?:19|20)\d{2})\D?(\d{2})\D?(\d{2})/ then
year, month, day = $1.to_i, $2.to_i, $3.to_i
# Do something with year, month, day, or just leave it like this to return an array : [2016, 12, 21]
# Date.new(year, month, day)
end
end
p extract_date("log/archive/2016-12-21.zip")
p extract_date("log/archive/2016.12.21.zip")
p extract_date("log/archive/2016:12:21.zip")
p extract_date("log/archive/2016_12_21.zip")
p extract_date("log/archive/20161221.zip")
p extract_date("log/archive/2016/12/21.zip")
p extract_date("log/archive/2016/12/21")
#=> Every example returns [2016, 12, 21]
Please try this
"log/archive/2016-12-21.zip".scan(/\d{4}-\d{2}-\d{2}/).pop
=> "2016-12-21"
If the date format is invalid, it will return nil.
Example:-
"log/archive/20-12-21.zip".scan(/\d{4}-\d{2}-\d{2}/).pop
^^
=> nil
Hope it helps.
I have a few strings that I am retrieving from a file birthdays.txt. An example of a string is below:
Christopher Alexander, Oct 4, 1936
I would like to separate the strings and let variable name be a hash key and birthdate the hash value. Here is my code:
birthdays = {}
File.read('birthdays.txt').each_line do |line|
line = line.chomp
name, birthdate = line.split(/\s*,\s*/).first
birthdays = {"#{name}" => "#{birthdate}"}
puts birthdays
end
I managed to assign name to the key. However, birthdate returns "".
File.new('birthdays.txt').each.with_object({}) do
|line, birthdays|
birthdays.store(*line.chomp.split(/\s*,\s*/, 2))
puts birthdays
end
I feel like some of the other solutions are overthinking this a bit. All you need to do is split each line into two parts, the part before the first comma and the part after, which you can do with line.split(/,\s*/, 2), then call to_h on the resulting array of arrays:
data = <<END
Christopher Alexander, Oct 4, 1936
Winston Churchill, Nov 30, 1874
Max Headroom, Apr 4, 1985
END
data.each_line.map do |line|
line.chomp.split(/,\s*/, 2)
end.to_h
# => { "Christopher Alexander" => "Oct 4, 1936",
# "Winston Churchill" => "Nov 30, 1874",
# "Max Headroom" => "Apr 4, 1985" }
(You will, of course, want to replace data with your File object.)
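For the file-based version, here is one possible sketch. The helper name load_birthdays is made up for illustration, and a temporary file stands in for birthdays.txt; it also assumes the file has no blank lines, since a blank line would not split into a key/value pair:

```ruby
require 'tempfile'

# Hypothetical helper: builds the name => birthdate hash straight from a file.
def load_birthdays(path)
  # File.foreach without a block returns an Enumerator over the lines.
  # to_h with a block (Ruby 2.6+) expects each block result to be a [key, value] pair.
  File.foreach(path).to_h { |line| line.chomp.split(/,\s*/, 2) }
end

# Usage, with a temporary file standing in for birthdays.txt
Tempfile.create('birthdays') do |f|
  f.write("Christopher Alexander, Oct 4, 1936\nMax Headroom, Apr 4, 1985\n")
  f.flush
  p load_birthdays(f.path)
end
```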
birthdays = Hash.new
File.read('birthdays.txt').each_line do |line|
line = line.chomp
name, birthdate = line.split(/\s*,\s*/, 2)
birthdays[name]= birthdate
puts birthdays
end
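For anyone wondering why the original version printed an empty birthdate, a minimal sketch using one hard-coded line (an assumption in place of the file contents). The .first call reduces the array to a single String, so there is nothing left to assign to the second variable:

```ruby
line = "Christopher Alexander, Oct 4, 1936"

# split returns an array; .first keeps only its first element (a String)
name, birthdate = line.split(/\s*,\s*/).first
# name is "Christopher Alexander", birthdate is nil
interpolated = "#{birthdate}" # interpolating nil gives an empty string

# Limiting the split to 2 pieces keeps the date, with its own comma, intact
fixed_name, fixed_birthdate = line.split(/\s*,\s*/, 2)

p [name, birthdate, interpolated]
p [fixed_name, fixed_birthdate]
```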
Using @Jordan's data:
data.each_line.with_object({}) do |line, h|
name, _, bdate = line.chomp.partition(/,\s*/)
h[name] = bdate
end
#=> {"Christopher Alexander"=>"Oct 4, 1936",
# "Winston Churchill"=>"Nov 30, 1874",
# "Max Headroom"=>"Apr 4, 1985"}
I'm creating a small app based on the conditions of the results of the last game played, or the last row with game data (win/lose and game number).
My issue is accessing the first column of the last row (most recent game played). How is that accomplished?
require 'open-uri'
class BrooklynPizzaController < ApplicationController
  def index
    # URL for dynamic content
    url = "http://www.basketball-reference.com/teams/BRK/2015_games.html"
    # Open URL using nokogiri
    doc = Nokogiri::HTML(open(url))
    # Scrape result from Web site
    @result = doc.css("#teams_games").xpath("//table/tbody/tr/td[8]/text()")
    # IN PROGRESS - Get date of last game played
    @result_date = doc.xpath('//table/tbody/tr/td[2]/a/text()') do |link|
      @result_date[link.text.strip] = link['a']
    end
    ###############################################################
    # IN PROGRESS - Get number of last game played from 1st column
    # doc.xpath('//table/tbody/tr/td[1]/text()') do |game|
    # last_game_number =
    # end
    ################################################################
    # @result_date = doc.css("#teams_games").xpath("//table/tbody/tr/td[2]/text()")
    # Set date to current
    @date = Date.today
    # Get date of last game played
    if (@result.last.next == nil)
      flag = doc.xpath("//table/tbody/tr[#{@result}]")
      @result_date = doc.xpath("//table/tbody/tr#{flag}/td[2]/a/text()")
    end
  end
end
Please let me know what information is missing, because I feel like I've left out some things.
To get the row you would do this:
win_loss_tds = doc.css("#teams_games tbody tr td:nth-child(8):not(:empty)")
last_win_loss_row = win_loss_tds.last.parent
There's undoubtedly a way to do that in a single XPath expression, but I'll leave that as an exercise to the reader since I don't care for XPath.
To get the game number from the first column you would do this:
game_num_col = last_win_loss_row.at("td:first-child")
game_num = game_num_col.text.to_i
# => 82
And to get the date from the second column:
date_col = last_win_loss_row.at("td:nth-child(2)") # XPath: td[2]
date = DateTime.parse(date_col.text)
# => 2015-04-15T00:00:00+00:00
If you want date and time, you could do this:
time_col = last_win_loss_row.at("td:nth-child(3)")
date_time = DateTime.parse("#{date_col.text} #{time_col.text}")
# => 2015-04-15T08:00:00-03:00
Well, I'd do this:
require 'open-uri'
require 'nokogiri'
doc = Nokogiri::HTML(URI.open("http://www.basketball-reference.com/teams/BRK/2015_games.html")) # Kernel#open no longer opens URLs on Ruby 3+
latest_score_row = doc.search('//tr/td/a[contains(.,"Box Score")]/../..').last
latest_text = latest_score_row.search('td').map(&:text)
# => ["13",
# "Sat, Nov 22, 2014",
# "8:30p EST",
# "",
# "Box Score",
# "#",
# "San Antonio Spurs",
# "L",
# "",
# "87",
# "99",
# "5",
# "8",
# "L 1",
# ""]
But YMMV.
How does it work? Easy. It looks for <a> nodes in the page containing "Box Score", then, for each one found, backs up two levels to the <tr> node and returns an array to Nokogiri/Ruby. last takes the last one found.
Then it's just a matter of looking in that row for the <td> nodes and grabbing their text.
The time stamp is then a matter of pulling the date and time from the array, doing a tiny bit of massaging of the "am/pm" and letting Ruby build an object:
latest_time = Time.strptime(
[
latest_text[1], # => "Sat, Nov 22, 2014"
latest_text[2].sub(/([ap])/, '\1m') # => "8:30pm EST"
].join(' '), # => "Sat, Nov 22, 2014 8:30pm EST"
'%a, %b %d, %Y %H:%M%P %Z' # => "%a, %b %d, %Y %H:%M%P %Z"
) # => 2014-11-22 18:30:00 -0700
This is homework, so I would prefer not to put up all of my code. I have 2 parallel arrays: 1. names and 2. ages. The idea is to puts all ages less than 21, and I can do this. The problem is that when I puts "#{count}. #{names[count]}, #{ages[count]}", the leading count prints the index number, that is, the position of the element in the array. What I want is for the numbering to start at 1. If there are three names:
1. name, age
2. name, age
3. name, age
NOT
5, name, age
6, name, age
I am using a while loop with an if statement. I don't need code, just would like some feedback to trigger more ideas. Thanks for your time, much appreciated.
names[name1, name2, name3]
ages[age1, age2, age3]
#view all younger than 21
count = 0
while count < names.length
if ages[count] < 21
puts "#{count}. #{names[count]}, #{ages[count]}" #works
end
count += 1
end
pause
You shouldn't have "parallel arrays" in the first place! Data that belongs together should be manipulated together, not separately.
Instead of something like
names = %w[john bob alice liz]
ages = [16, 22, 18, 23 ]
You could, for example, have a map (called Hash in Ruby):
people = { 'john' => 16, 'bob' => 22, 'alice' => 18, 'liz' => 23 }
Then you would have something like:
puts people.select {|_name, age| age > 21 }.map.
with_index(1) {|(name, age), i| "#{i}. #{name}, #{age}" }
# 1. bob, 22
# 2. liz, 23
If you have no control over the creation of those parallel arrays, then it is still better to convert them to a sensible data structure first, and avoid the pain of having to juggle them in your algorithm:
people = Hash[names.zip(ages)]
Even better yet: you should have Person objects. After all, Ruby is object-oriented, not array-oriented or hash-oriented:
class Person < Struct.new(:name, :age)
def to_s
"#{name}, #{age}"
end
def adult?
age > 21
end
end
people = [
Person.new('john', 16),
Person.new('bob', 22),
Person.new('alice', 18),
Person.new('liz', 23)]
puts people.select(&:adult?).map.with_index(1) {|p, i| "#{i}. #{p}" }
Again, if you don't have control of the creation of those two parallel arrays, you can still convert them first:
people = names.zip(ages).map {|name, age| Person.new(name, age) }
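Tying this back to the original question (everyone younger than 21, numbered from 1), a minimal sketch using the sample arrays from above. The sample values are only for illustration, and the comparison is flipped to age < 21 to match what the question asks for:

```ruby
names = %w[john bob alice liz]
ages  = [16, 22, 18, 23]

# Pair the arrays up, keep the under-21 entries, then number what is left.
# with_index(1) counts the filtered results, not the original positions.
lines = names.zip(ages)
             .select { |_name, age| age < 21 }
             .map.with_index(1) { |(name, age), i| "#{i}. #{name}, #{age}" }

puts lines
```

The key point is that the number shown comes from a counter over the filtered list, so it no longer depends on where each person sat in the original arrays.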