How to download a GitHub raw CSV file using Ruby

I'm trying to grab an updated CSV file, COVID-19, that's posted on GitHub, but I keep getting an error that it's not there. It's a file that's constantly updated so I want to grab it at the source, which is GitHub.
COVID-19 Time Series is the third item on the page.
I tried the raw file URL, the CSV page URL, and GitHub consistently tells me that there is "no such file or directory".
Here's my code:
require 'open-uri'
require 'csv'
covids = "https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_19-covid-Confirmed.csv"
puts File.exist?(covids)
keys = CSV.open(covids, &:readline)
How can I reference this file? I know I am logged in, but Ruby should be able to see those file paths.

A URL is not a file, so you can't open it with CSV.open, nor can you use it in a File.exist? call. You've already required open-uri, so the quick way to solve this is to download the file using open and pass the result to CSV.open:
keys = CSV.open(open(covids), &:readline)
puts keys

The selected answer has some problems:
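OpenURI's override of Kernel#open is deprecated (Ruby 3.0 removed it, so there open with a URL string just tries to open a local file). Instead use URI.open: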
pry(main)> open(covids)
(pry):9: warning: calling URI.open via Kernel#open is deprecated, call URI.open directly or use URI#open
CSV.open, while it works, is counter to the signature of the method, which wants a filename, not an IO object. It's conceivable that relying on CSV.open to continue taking an IO object will break in the future if they fix this behavior.
Instead, the CSV documentation's first example recommends:
csv = CSV.new(string_or_io, **options)
# Reading: IO object should be open for read
csv.read # => array of rows
# or
csv.each do |row|
  # ...
end
...
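Since CSV.new accepts a String or any IO-like object, the same code works on whatever URI.open returns. Here's a minimal sketch that runs offline by using a StringIO as a stand-in for the downloaded file. The sample lines are made up in the shape of the time-series header, not real data, and headers: true is an added option, not part of the docs example above.

```ruby
require 'csv'
require 'stringio'

# Made-up sample in the same layout as the time-series file.
sample = <<~TEXT
  Province/State,Country/Region,Lat,Long,1/22/20,1/23/20
  ,Thailand,15.0,101.0,2,3
  ,Japan,36.0,138.0,2,1
TEXT

# StringIO stands in for the IO that URI.open(covids) would return.
io = StringIO.new(sample)

# CSV.new wraps the IO; headers: true turns each data row into a CSV::Row.
csv = CSV.new(io, headers: true)

rows = csv.read # a CSV::Table holding the data rows
rows.each do |row|
  puts "#{row['Country/Region']}: #{row['1/23/20']}"
end
```

Swapping StringIO.new(sample) for URI.open(covids) should be the only change needed to read the real file.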
foreach is the form of each I'd use because that fits my brain better, YMMV:
CSV.foreach(URI.open(covids))
as a starting point. Here's an example looking at the first record in the file:
require 'open-uri'
require 'csv'
covids = "https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_19-covid-Confirmed.csv"
CSV.foreach(URI.open(covids)).first
# => ["Province/State",
# "Country/Region",
# "Lat",
# "Long",
# "1/22/20",
# "1/23/20",
# "1/24/20",
# "1/25/20",
# "1/26/20",
# "1/27/20",
# "1/28/20",
# "1/29/20",
# "1/30/20",
# "1/31/20",
# "2/1/20",
# "2/2/20",
# "2/3/20",
# "2/4/20",
# "2/5/20",
# "2/6/20",
# "2/7/20",
# "2/8/20",
# "2/9/20",
# "2/10/20",
# "2/11/20",
# "2/12/20",
# "2/13/20",
# "2/14/20",
# "2/15/20",
# "2/16/20",
# "2/17/20",
# "2/18/20",
# "2/19/20",
# "2/20/20",
# "2/21/20",
# "2/22/20",
# "2/23/20",
# "2/24/20",
# "2/25/20",
# "2/26/20",
# "2/27/20",
# "2/28/20",
# "2/29/20",
# "3/1/20",
# "3/2/20",
# "3/3/20",
# "3/4/20",
# "3/5/20",
# "3/6/20",
# "3/7/20",
# "3/8/20",
# "3/9/20",
# "3/10/20"]
While OpenURI is convenient, it's not the most full-featured of the Ruby HTTP clients. I'd recommend working with something at the top of the Ruby HTTP client list.
Also, write your code carefully so you don't hammer your network or GitHub's servers. Follow best practices by using HEAD requests to check when the file was last updated; don't repeatedly GET (download) a file that hasn't changed, because that's just bad network manners.
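A minimal sketch of that HEAD check using Net::HTTP from the standard library. It assumes the server sends an ETag and/or a Last-Modified header (which one you get is up to the server), prefers ETag, and falls back to Last-Modified. The helper names and the shape of the saved hash are hypothetical, and where you persist the saved values between runs is left to you.

```ruby
require 'net/http'
require 'uri'

# Build a HEAD request: the server answers with headers only, no body.
def head_request(uri)
  Net::HTTP::Head.new(uri)
end

# Send the HEAD request and keep the two validator headers we care about.
def fetch_validators(uri)
  Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https') do |http|
    response = http.request(head_request(uri))
    { etag: response['ETag'], last_modified: response['Last-Modified'] }
  end
end

# Decide whether a fresh GET is warranted. `saved` holds the validators
# recorded after the previous download (nil on the first run).
def changed?(current, saved)
  return true if saved.nil?

  if current[:etag] && saved[:etag]
    current[:etag] != saved[:etag]
  else
    # Without a usable ETag, compare Last-Modified; if that is missing too,
    # assume the file may have changed.
    current[:last_modified].nil? || current[:last_modified] != saved[:last_modified]
  end
end
```

In use, you'd call fetch_validators(URI(covids)), pass the result and the values saved last time to changed?, and only then download with URI.open or Net::HTTP.get, saving the new validators afterward.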
At this point you'd be ready to parse the file and save the information to disk or reuse it for something else. I'd recommend dumping it into a database for easier reuse, using something like Sequel, which makes it trivial to build and access the schema and data whether you use SQLite writing to a disk-based DB, or PostgreSQL or MySQL for more full-featured DBMSs.
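Before loading it into a database, it can help to reshape the file, which has one column per date, into one record per location and date. A sketch using only the csv library, assuming the first four columns are Province/State, Country/Region, Lat and Long as in the header shown above; the record keys and the sample text are hypothetical.

```ruby
require 'csv'

# Turn each wide row (one column per date) into one hash per date.
def to_records(csv_text)
  table = CSV.parse(csv_text, headers: true)
  date_columns = table.headers.drop(4) # skip the four location columns
  table.flat_map do |row|
    date_columns.map do |date|
      {
        province: row['Province/State'],
        country: row['Country/Region'],
        date: date,
        confirmed: row[date].to_i
      }
    end
  end
end

# Made-up sample in the same layout as the real file.
sample = <<~TEXT
  Province/State,Country/Region,Lat,Long,1/22/20,1/23/20
  ,Thailand,15.0,101.0,2,3
TEXT

records = to_records(sample)
records.each { |r| p r }
```

An array of hashes like this is what Sequel's Dataset#multi_insert accepts, so something like DB[:confirmed_cases].multi_insert(records) could load it; the table name there is only an example.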

Related

How do I share an object from the main file with a supporting file in Ruby?

I have something similar.
# MAIN.RB
require 'socket'
require_relative 'replies.rb'
hostname = 'localhost'
port = 6500
s = TCPSocket.open(hostname, port)
$connected = 0
while line = s.gets # Read lines from the socket
  #DO A BUNCH OF STUFF
  if line == "Hi"
    reply line
  end
end
s.close
Then I have the reply function in a secondary file.
# REPLIES.RB
def reply(input)
  if input == "Hi"
    s.write("Hello my friend.\n")
  end
end
However, referencing the object s from the second file does not seem to work. How would I go about making this work? I'm new to Ruby. I've searched Google for the answer, but the only results I have found are about sharing variables across files. I could always return "Hello my friend.\n", but I'd rather write to the socket object directly from the function in REPLIES.rb.
Remember that variables are strictly local unless you expressly pass them in. This means s only exists in the main context. You can fix this by passing it in:
reply(s, line)
And on the receiving side:
def reply(s, input)
# ...
end
I'd strongly encourage you to indent things consistently, since this code is really out of sorts, and to avoid global variables like $connected. Using a simple self-contained class, you could clean up this code considerably.
Also, don't add .rb extensions when calling require. It's implied.
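A minimal sketch of the self-contained class suggested above (the ChatClient name is hypothetical). The socket is kept in an instance variable, so reply can reach it without a global or an extra argument, and @connected replaces $connected. It also chomps each line: gets returns the line with its trailing newline, so a comparison like line == "Hi" would otherwise never match. Input and output are separate parameters only so the class can be exercised with StringIO objects; against a real server you'd pass the same TCPSocket for both, which is the default.

```ruby
require 'socket' # for TCPSocket in the usage comment below

class ChatClient
  def initialize(input, output = input)
    @input = input   # where lines are read from (the socket)
    @output = output # where replies go (the same socket by default)
    @connected = false
  end

  def connected?
    @connected
  end

  # Read until the other side closes the connection.
  def run
    @connected = true
    while (line = @input.gets)
      reply(line.chomp) # drop the trailing "\n" before comparing
    end
  ensure
    @connected = false
  end

  def reply(input)
    @output.write("Hello my friend.\n") if input == 'Hi'
  end
end

# Usage against the server from the question:
#   socket = TCPSocket.open('localhost', 6500)
#   ChatClient.new(socket).run
#   socket.close
```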

How to create, read and transform an XML file with Ruby

I am downloading an XML record from Musicbrainz.org, applying an XSLT transformation and outputting a new and different XML record.
I am running into one issue that I wonder if it is a limitation with my approach, XSLT transformations or applying Ruby to text.
I download the record:
require 'open-uri'
mb_metadata = open('http://musicbrainz.org/ws/2/release/?query=barcode:744861082927', 'User-Agent' => 'MarcBrainz marc4brainz#gmail.com').read
File.open('mb_record.xml', 'w').write(mb_metadata)
This works fine.
Then I want to transform that record. First I tried using Nokogiri:
# mb_metadata to transformed record
mb_record = Nokogiri::XML(File.read('mb_record.xml'))
#if we have the xslt document locally this introduces it
template = Nokogiri::XSLT(File.read('mb_to_marc.xsl'))
# this transforms the input document with the template.xslt
puts template.transform(mb_record)
If I run this on its own, it works. However, if I download the record and then run this, it doesn't: it produces a transformed record that contains only some inserted text, and no element from the original XML file is transformed.
So I thought this might be an issue with Nokogiri and then I tried using the Ruby/XSLT gem:
xslt = XML::XSLT.new()
xslt.xml = 'mb_record.xml'
xslt.xsl = 'mb_to_marc.xsl'
out = xslt.serve()
print out;
Again, if I'm running this on a local file it works, but if I download it and try to transform it it doesn't work - it produces the following error:
xslt.xml = 'mb_record.xml'
Both methods work fine if I just run them on a file which has been downloaded already.
So what's the issue? Is it a naming problem, an XSLT issue, or something else?
Here's the whole script:
#!/usr/bin/env ruby
# encoding: UTF-8
require 'rubygems' if RUBY_VERSION >= '1.9'
require 'pathname'
require 'httpclient'
require 'xml/xslt'
require 'nokogiri'
require 'open-uri'
# DOWNLOAD RECORD FROM MusicBrainz.org - this works
mb_metadata = open('http://musicbrainz.org/ws/2/release/?query=barcode:744861082927', 'User-Agent' => 'MarcBrainz marc4brainz#gmail.com').read
#puts record
File.open('mb_record.xml', 'w').write(mb_metadata)
# mb_metadata to transformed record - this works on a saved file but not if the file is created earlier in this file .
#
#mb_record = Nokogiri::XML(File.read('mb_record.xml'))
#if we have the xslt document locally this introduces it
#template = Nokogiri::XSLT(File.read('mb_to_marc.xsl'))
# this is supposed to transform the input document with the template.xslt
#puts template.transform(mb_record)
# TRYING ANOTHER TACK
# This works if acting on a saved file. i.e. if I comment out the nokogiri lines above and just run the lines below - to 'print out' the xml is correctly transfored by the xslt to produce more xml.
# I added 'sleep 3' to see if that would help but it doesn't make a difference.
xslt = XML::XSLT.new()
xslt.xml = 'mb_record.xml'
xslt.xsl = 'mb_to_marc.xsl'
out = xslt.serve()
print out;
File.open('mb_record.xml', 'w').write(mb_metadata)
is better written as
File.write('mb_record.xml', mb_metadata)
The first leaves the file open, and possibly not yet flushed to disk, so when the XSLT step reads mb_record.xml a moment later it can find no contents or only partial contents.
The second writes the file and immediately flushes and closes it.
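A small sketch contrasting the two safe forms: File.write, and File.open with a block, which closes (and therefore flushes) the file when the block exits. The XML string is a made-up stand-in for the MusicBrainz response, and a temporary directory keeps the example self-contained.

```ruby
require 'tmpdir'

# Made-up stand-in for the XML downloaded from MusicBrainz.
mb_metadata = %(<metadata><release-list count="1"/></metadata>\n)

written = Dir.mktmpdir do |dir|
  path = File.join(dir, 'mb_record.xml')

  # One call: open, write, flush, and close.
  File.write(path, mb_metadata)

  # Equivalent block form: the file is closed when the block exits.
  File.open(path, 'w') { |f| f.write(mb_metadata) }

  # Safe to read back immediately, e.g. before the XSLT step.
  File.read(path)
end

puts written == mb_metadata
```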

Changing information in a CSV file

I'm trying to write a ruby script that will read through a CSV file and prepend information to certain cells (for instance adding a path to a file). I am able to open and mutate the text just fine, but am having issues writing back to the CSV without overriding everything. This is a sample of what I have so far:
CSV.foreach(path) { |row|
text = row[0].to_s
new_text = "test:#{text}"
}
I would like to add something within that block that would then write new_text back to the same cell (row) in the file. The only way I have found to write to a file is
CSV.open(path, "wb") { |row|
row << new_text
}
But I think that is bad practice since you are reopening the file within the file block already. Is there a better way I could do this?
EX: I have a CSV file that looks something like:
file,destination
test.txt,A101
and need it to be:
file,destination
path/test.txt,id:A101
Hope that makes sense. Thanks in advance!
Depending on the size of the file, you might consider loading its contents into a local variable, manipulating that, and then overwriting the original file:
lines = CSV.read(path)
File.open(path, "wb") do |file|
  lines.each do |line|
    text = line[0].to_s
    line[0] = "test:#{text}" # Replace this with your editing logic
    file.write CSV.generate_line(line)
  end
end
Alternately, if the file is big, you could write each modified line to a new file along the way and then replace the old file with the new one at the end.
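A sketch of that streaming approach, applied to the example from the question: each row is read with CSV.foreach, edited, and written to a temporary file next to the original, which then replaces it via File.rename. The rewrite_csv name, the ".tmp" suffix and the "path/" prefix are placeholders.

```ruby
require 'csv'
require 'tmpdir'

# Stream rows into a sibling temp file, then swap it in for the original.
def rewrite_csv(path)
  tmp_path = "#{path}.tmp"
  CSV.open(tmp_path, 'w') do |out|
    CSV.foreach(path, headers: true, return_headers: true) do |row|
      if row.header_row?
        out << row # copy the header line unchanged
        next
      end
      row['file'] = "path/#{row['file']}"
      row['destination'] = "id:#{row['destination']}"
      out << row
    end
  end
  # Replace the original only after the new file is completely written.
  File.rename(tmp_path, path)
end
```

Only one row is held in memory at a time, and if the script dies part-way, the original file is still intact.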
Given that you don't appear to be doing anything that draws on CSV capabilities, I'd recommend using Ruby's "in-place" option variable $-i.
Some of the stats software I use wants just the data, and can't deal with a header line. Here's a script I wrote a while back to (appear to) strip the first line out of one or more data files specified on the command-line.
#! /usr/bin/env ruby -w
#
# User supplies the name of one or more files to be "stripped"
# on the command-line.
#
# This script ignores the first line of each file.
# Subsequent lines of the file are copied to the new version.
#
# The operation saves each original input file with a suffix of
# ".orig" and then operates in-place on the specified files.
$-i = ".orig" # specify backup suffix
oldfilename = ""
ARGF.each do |line|
  if ARGF.filename == oldfilename  # If it's an old file
    puts line                      # copy lines through.
  else                             # If it's a new file remember it
    oldfilename = ARGF.filename    # but don't copy the first line.
  end
end
Obviously you'd want to change the puts line pass-through to whatever edit operations you want to perform.
I like this solution because even if you screw it up, you've preserved your original file as its original name with .orig (or whatever suffix you choose) appended.

How to write a file that is both valid ruby syntax and valid YAML syntax

In order to have only a single point of configuration for my app I need to make a YAML config file that is also valid ruby code. I.e. a mixed syntax file that can be parsed as YAML and parsed as ruby.
My application is a suite of processes managed by the god gem. I want to load a new group of maintained processes (watches) for each new configuration file.
God allows loading a new app.god (ruby) file with new watches defined, but I don't want an app.god and app.yml, just one file. Simplest might be to just have the app.god file and include the configuration within that, but I preferred the idea of a yml file that was also valid ruby code.
#I found this that might be helpful:
#This is a valid ruby and a valid YAML file
#Comments are the same in YAML and ruby
true ?true:
- <<YAML.to_i
# At this point in ruby it is the contents of a here doc (that will be
# converted to an integer and negated if true happens not to be true)
# In YAML it is a hash with the first entry having key "true ?true"
# representing a list containing the string "- <<YAML.to_i"
# If your YAML object should be a list not a hash you could remove the first line
any_valid_yaml: from here
a_list:
- or
- anything
- really
#Then mark the end of the YAML document with
---
#And YAML is done and ignores anything from here on
#Next terminate the ruby here document
YAML
#Now we're in ruby world
#this = "pure ruby"
def anything(ruby)
"here"
end

Discover which file Ruby's require method would load?

The require method in Ruby searches the load path and loads the first matching file found, if needed. Is there any way to print the path to the file that would be loaded? I'm looking for functionality, ideally built in, similar to the which command in bash, and hoping it can be that simple too. Thanks.
I don't know of a built-in functionality, but defining your own isn't hard. Here's a solution adapted from this question:
def which(string)
  $:.each do |p|
    if File.exist? File.join(p, string)
      puts File.join(p, string)
      break
    end
  end
end
which 'nokogiri'
#=> /opt/local/lib/ruby1.9/gems/1.9.1/gems/nokogiri-1.4.1/lib/nokogiri
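Explanation: $: is a predefined variable (an alias for $LOAD_PATH). It's an array of the places Ruby searches for files you can load or require. The which method iterates through each path, looking for the name you called it with. When it finds a match, it prints that path and stops searching. Because it checks the name exactly as given, without adding .rb, it can match a directory such as lib/nokogiri, as in the output above, rather than the lib/nokogiri.rb file that require actually loads.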
I'm assuming you just want the output to be a single line showing the full filepath of the required file, like which. If you want to also see the files your required file will load itself, something like the solution in the linked question might be more appropriate:
module Kernel
  def require_and_print(string)
    $:.each do |p|
      if File.exist? File.join(p, string)
        puts File.join(p, string)
        break
      end
    end
    require_original(string)
  end
  alias_method :require_original, :require
  alias_method :require, :require_and_print
end
require 'nokogiri'
#=> /opt/local/lib/ruby1.9/gems/1.9.1/gems/nokogiri-1.4.1/lib/nokogiri
# /opt/local/lib/ruby1.9/gems/1.9.1/gems/rubygems-update-1.3.5/lib/rbconfig
# /opt/local/lib/ruby1.9/gems/1.9.1/gems/nokogiri-1.4.1/lib/nokogiri/xml
# /opt/local/lib/ruby1.9/gems/1.9.1/gems/nokogiri-1.4.1/lib/nokogiri/xml/pp
# /opt/local/lib/ruby1.9/gems/1.9.1/gems/nokogiri-1.4.1/lib/nokogiri/xml/sax
# /opt/local/lib/ruby1.9/gems/1.9.1/gems/nokogiri-1.4.1/lib/nokogiri/xml/node
# /opt/local/lib/ruby1.9/gems/1.9.1/gems/nokogiri-1.4.1/lib/nokogiri/xml/xpath
# /opt/local/lib/ruby1.9/gems/1.9.1/gems/nokogiri-1.4.1/lib/nokogiri/xslt
# /opt/local/lib/ruby1.9/gems/1.9.1/gems/nokogiri-1.4.1/lib/nokogiri/html
# /opt/local/lib/ruby1.9/gems/1.9.1/gems/nokogiri-1.4.1/lib/nokogiri/css
# /opt/local/lib/ruby1.9/1.9.1/racc/parser.rb
$ gem which filename # (no .rb suffix) is what I use...
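For the built-in route the question asks about, Ruby 2.7 and later provide $LOAD_PATH.resolve_feature_path, which applies require's own lookup rules to the load path and returns the type and path. A sketch that uses it when available and otherwise falls back to scanning $LOAD_PATH for a matching .rb file (the fallback ignores native extensions). The require_which name is made up. Depending on the Ruby version, a missing feature either raises LoadError or returns nil, so the sketch handles both. It only consults $LOAD_PATH, so a gem RubyGems hasn't activated yet won't be found; that's where gem which helps.

```ruby
# Hypothetical helper: report which file `require feature` would load.
def require_which(feature)
  if $LOAD_PATH.respond_to?(:resolve_feature_path)
    begin
      type_and_path = $LOAD_PATH.resolve_feature_path(feature) # e.g. [:rb, "/path/to/feature.rb"]
      return type_and_path && type_and_path[1]
    rescue LoadError
      return nil # some versions raise instead of returning nil
    end
  end

  # Fallback for older Rubies: first .rb match on the load path.
  $LOAD_PATH.each do |dir|
    candidate = File.join(dir, "#{feature}.rb")
    return candidate if File.file?(candidate)
  end
  nil
end

puts require_which('shellwords')
```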
