Get lines added/deleted for list of pull requests - ruby

Assume I have a list of pull request IDs, such as in this gist.
I simply want two values for each ID: "lines added" and "lines deleted". How can I use Octokit to get these values for each pull request?
I'd imagine I'd start like this in ruby:
require 'octokit'
require 'csv'
list = [2825, 2119, 2629]
output = []
for id in list
  output.push(Octokit.pull_request('rubinius/rubinius', id, options = {}))
end
begin
  file = File.open("/Users/Username/Desktop/pr_mining_output.txt", "w")
  file.write(output)
rescue IOError => e
  #some error occur, dir not writable etc.
ensure
  file.close unless file == nil
end
But this seems to simply overwrite the file and give me one result instead of 3 (or however many are in the list object). How can I make it give me the data for all 3?

require 'octokit'
require 'csv'
client = Octokit::Client.new :login => 'mylogin', :password => 'mypass'
repo = 'rubinius/rubinius'
numbers = [2825, 2119, 2629]
CSV.open('results.csv', 'w') do |csv|
  for number in numbers
    begin
      pull = client.pull_request(repo, number)
      csv << [pull.number, pull.additions, pull.deletions]
    rescue Octokit::NotFound
    end
  end
end

require 'octokit'
require 'csv'
client = Octokit::Client.new :login => 'username', :password => 'password'
repo = 'rubinius/rubinius'
numbers = CSV.read('/Users/User/Downloads/numbers.csv').flatten
CSV.open('results.csv', 'w') do |csv|
  for number in numbers
    begin
      pull = client.pull_request(repo, number)
      csv << [pull.number, pull.additions, pull.deletions]
    rescue
      csv << [number, 0, 0]
      next
    end
  end
end

Related

Ruby CSV: Comparison of columns (from two csvs), write new column in one

I've searched and haven't found a method for this particular conundrum. I have two CSV files of data that sometimes relate to the same thing. Here's an example:
CSV1 (500 lines):
date,reference,amount,type
10/13/2015,,1510.40,sale
10/13/2015,,312.90,sale
10/14/2015,,928.50,sale
10/15/2015,,820.25,sale
10/12/2015,,702.70,credit
CSV2 (20000 lines):
reference,date,amount
243534985,10/13/2015,312.90
345893745,10/15/2015,820.25
086234523,10/14/2015,928.50
458235832,10/13/2015,1510.40
My goal is to match the date and amount from CSV2 with the date and amount in CSV1, and write the reference from CSV2 to the reference column in the corresponding row.
This is a simplified view, as CSV2 actually contains many many more columns - these are just the relevant ones, so ideally I'd like to refer to them by header name or maybe index somehow?
Here's what I've attempted, but I'm a bit stuck.
require 'csv'
data1 = {}
data2 = {}
CSV.foreach("data1.csv", :headers => true, :header_converters => :symbol, :converters => :all) do |row|
  data1[row.fields[0]] = Hash[row.headers[1..-1].zip(row.fields[1..-1])]
end
CSV.foreach("data2.csv", :headers => true, :header_converters => :symbol, :converters => :all) do |row|
  data2[row.fields[0]] = Hash[row.headers[1..-1].zip(row.fields[1..-1])]
end
data1.each do |data1_row|
  data2.each do |data2_row|
    if (data1_row['comparitive'] == data2_row['comparitive'])
      puts data1_row['identifier'] + data2_row['column_thats_important_and_wanted']
    end
  end
end
Result:
22:in `[]': no implicit conversion of String into Integer (TypeError)
I've also tried:
CSV.foreach('data2.csv') do |data2|
  CSV.foreach('data1.csv') do |data1|
    if (data1[3] == data2[4])
      data1[1] << data2[1]
      puts "Change made!"
    else
      puts "nothing changed."
    end
  end
end
This however did not match anything inside the if statement, so perhaps not the right approach?
The headers method should help you match columns--from there it's a matter of parsing and writing the modified data back out to a file.
Solved.
data1 = CSV.read('data1.csv')
data2 = CSV.read('data2.csv')
data2.each do |data2|
  data1.each do |data1|
    if (data1[5] == data2[4])
      data1[1] = data2[1]
      puts "Change made!"
      puts data1
    end
  end
end
File.open('referenced.csv','w'){ |f| f << data1.map(&:to_csv).join("")}
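Since the question asks about referring to columns by header name, here is a minimal sketch of that approach. It assumes the sample layout from the question (date, reference, amount columns) and uses inline strings in place of the two files. The helper name fill_references is made up for illustration. Instead of a nested loop, it indexes the second file once in a Hash keyed on the (date, amount) pair.

```ruby
require 'csv'

# Inline stand-ins for data1.csv and data2.csv, using sample rows from the question.
csv1_text = <<~DATA
  date,reference,amount,type
  10/13/2015,,1510.40,sale
  10/13/2015,,312.90,sale
  10/12/2015,,702.70,credit
DATA

csv2_text = <<~DATA
  reference,date,amount
  243534985,10/13/2015,312.90
  458235832,10/13/2015,1510.40
DATA

# Hypothetical helper: copy references from the second table into the first,
# matching rows on the (date, amount) pair and addressing columns by header name.
def fill_references(csv1_text, csv2_text)
  # Index the larger file once so each lookup is a Hash access, not a rescan.
  lookup = {}
  CSV.parse(csv2_text, headers: true).each do |row|
    lookup[[row['date'], row['amount']]] = row['reference']
  end

  table = CSV.parse(csv1_text, headers: true)
  table.each do |row|
    match = lookup[[row['date'], row['amount']]]
    row['reference'] = match if match # rows without a match stay unchanged
  end
  table.to_csv
end

puts fill_references(csv1_text, csv2_text)
```

No converters are used here, so every field stays a string. That keeps a leading zero in a reference such as 086234523 and the trailing zero in an amount such as 1510.40 intact, at the cost of comparing amounts as text.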

How to use CSV.open and CSV.foreach methods to convert specific data in a csv file?

The Old.csv file contains these headers, "article_category_id", "articleID", "timestamp", "udid", but some of the values in those columns are strings. So, I am trying to convert them to integers and store in another CSV file, New.csv. This is my code:
require 'csv'
require 'time'
CSV.foreach('New.csv', "wb", :write_headers=> true, :headers =>["article_category_id", "articleID", "timestamp", "udid"]) do |csv|
  CSV.open('Old.csv', :headers=>true) do |row|
    csv['article_category_id']=row['article_category_id'].to_i
    csv['articleID']=row['articleID'].to_i
    csv['timestamp'] = row['timestamp'].to_time.to_i unless row['timestamp'].nil?
    unless udids.include?(row['udid'])
      udids << row['udid']
    end
    csv['udid'] = udids.index(row['udid']) + 1
    csv<<row
  end
end
But, I am getting the following error: in 'foreach': ruby wrong number of arguments (3 for 1..2) (ArgumentError).
When I change the foreach to open, I get the following error: undefined method '[]' for #<CSV:0x36e0298> (NoMethodError). Why is that? And how can I resolve it? Thanks.
CSV.foreach does not accept a file mode as its second parameter; the mode and the write options belong to CSV.open:
CSV.open('New.csv', 'wb',
  :write_headers => true,
  :headers => ["article_category_id", "articleID", "timestamp", "udid"]
) do |csv|
  CSV.foreach('Old.csv', :headers => true) do |row|
    row['article_category_id'] = row['article_category_id'].to_i
    ...
    csv << row
  end
end
CSV.open should be placed before foreach: you iterate over the old file and produce the new one. Inside the loop you change the row and then append it to the output.
You can refer to my code:
require 'csv'
require 'time'
udids = [] # udid values seen so far, in first-seen order
CSV.open('New.csv', "wb") do |csv|
  csv << ["article_category_id", "articleID", "timestamp", "udid"]
  CSV.foreach('Old.csv', :headers=>true) do |row|
    array = []
    article_category_id=row['article_category_id'].to_i
    articleID=row['articleID'].to_i
    timestamp = row['timestamp'].to_i unless row['timestamp'].nil?
    unless udids.include?(row['udid'])
      udids << row['udid']
    end
    udid = udids.index(row['udid']) + 1
    array << [article_category_id, articleID, timestamp, udid]
    csv<<array
  end
end
The problem with Vinh's answer is that the array variable ends up as an array containing another array.
So what is inserted into the CSV looks like
[[article_category_id, articleID, timestamp, udid]]
And that is why you get results in double quotes.
Please try something like this:
require 'csv'
require 'time'
udids = [] # udid values seen so far, in first-seen order
CSV.open('New.csv', "wb") do |csv|
  csv << ["article_category_id", "articleID", "timestamp", "udid"]
  CSV.foreach('Old.csv', :headers=>true) do |row|
    article_category_id = row['article_category_id'].to_i
    articleID = row['articleID'].to_i
    timestamp = row['timestamp'].to_i unless row['timestamp'].nil?
    unless udids.include?(row['udid'])
      udids << row['udid']
    end
    udid = udids.index(row['udid']) + 1
    output_row = [article_category_id, articleID, timestamp, udid]
    csv << output_row
  end
end
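For experimenting without touching files, the same conversion can be sketched on an in-memory string. This is only an illustration: the helper name convert_rows and the sample rows are made up, and it swaps the udids array for a Hash so that each udid lookup does not rescan the list.

```ruby
require 'csv'

# Hypothetical helper: convert the numeric columns to integers and replace each
# udid with a 1-based number assigned in first-seen order.
def convert_rows(old_csv_text)
  udid_ids = {} # udid string => number; must exist before the loop starts
  CSV.generate do |csv|
    csv << %w[article_category_id articleID timestamp udid]
    CSV.parse(old_csv_text, headers: true).each do |row|
      # Assign the next number the first time a udid is seen.
      udid_ids[row['udid']] ||= udid_ids.size + 1
      # Leave a missing timestamp empty instead of turning it into 0.
      timestamp = row['timestamp'] && row['timestamp'].to_i
      csv << [row['article_category_id'].to_i, row['articleID'].to_i,
              timestamp, udid_ids[row['udid']]]
    end
  end
end

# Made-up sample input standing in for Old.csv.
sample = <<~DATA
  article_category_id,articleID,timestamp,udid
  3,101,1445000000,abc
  4,102,,def
  3,103,1445000100,abc
DATA

puts convert_rows(sample)
```

Each output row is a flat array, which is what `csv <<` expects; wrapping it in another array is what produced the quoted rows discussed above.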

Moving forward in a for loop despite an error

I have this code:
require 'octokit'
require 'csv'
client = Octokit::Client.new :login => 'github_username', :password => 'github_password'
repo = 'rubinius/rubinius'
numbers = CSV.read('/Users/Name/Downloads/numbers.csv').flatten
# at this point, essentially numbers = [642, 630, 623, 643, 626]
CSV.open('results.csv', 'w') do |csv|
  for number in numbers
    begin
      pull = client.pull_request(repo, number)
      csv << [pull.number, pull.additions, pull.deletions]
    rescue
      next
    end
  end
end
However, client.pull_request sometimes encounters a 404, and the loop then skips to the next number. I still need it to print the number from the numbers array, put a blank or zero for pull.additions and pull.deletions, and then move on to the next item in the array, producing something like:
pull.number pull.additions pull.deletions
642, 12, 3
630, ,
623, 15, 23
...
How can this be done?
I have removed the for loop, as it is not idiomatic Ruby. The code below should work:
require 'octokit'
require 'csv'
client = Octokit::Client.new :login => 'github_username', :password => 'github_password'
repo = 'rubinius/rubinius'
numbers = CSV.read('/Users/Name/Downloads/numbers.csv').flatten
# at this point, essentially numbers = [642, 630, 623, 643, 626]
CSV.open('results.csv', 'w') do |csv|
  numbers.each do |number|
    begin
      pull = client.pull_request(repo, number)
      csv << [pull.number, pull.additions, pull.deletions]
    rescue
      # keep the requested number in the output, with zeros for the missing stats
      csv << [number, 0, 0]
      next
    end
  end
end
Have you tried using a begin/rescue/ensure such that the rescue/ensure code will set the pull variable appropriately? See https://stackoverflow.com/a/2192010/832648 for examples.
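The rescue-and-continue pattern can be tried without a network connection. In the sketch below, fetch_pull is a made-up stand-in for client.pull_request: it returns [additions, deletions] for known numbers and raises KeyError otherwise, playing the role of the 404. The helper name write_stats is likewise hypothetical.

```ruby
require 'csv'

# Made-up stand-in for client.pull_request. Hash#fetch raises KeyError for an
# unknown number, which plays the role of the 404 here.
fetch_pull = lambda do |number|
  { 642 => [12, 3], 623 => [15, 23] }.fetch(number)
end

# Hypothetical helper: one CSV row per requested number, with blank cells
# when the lookup fails.
def write_stats(numbers, fetch_pull)
  CSV.generate do |csv|
    numbers.each do |number|
      begin
        additions, deletions = fetch_pull.call(number)
        csv << [number, additions, deletions]
      rescue KeyError
        # Still emit the number; nil fields are written as empty cells.
        csv << [number, nil, nil]
      end
    end
  end
end

puts write_stats([642, 630, 623], fetch_pull)
```

Rescuing one specific exception class, rather than using a bare rescue, keeps unrelated failures such as a typo in a method name from being silently written out as empty rows.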

Export content of a SQLite3 table in CSV

I have a Ruby script that generates a SQLite3 database.
I want to be able to generate an "output.csv" file containing one of the database tables.
Is there a way to handle that in Ruby?
It is easy with Sequel and to_csv:
require 'sequel'
DB = Sequel.sqlite
# since Sequel 3.48.0 to_csv is deprecated,
# we must load the to_csv feature via an extension
DB.extension(:sequel_3_dataset_methods) #define to_csv
DB.create_table(:test){
  Fixnum :one
  Fixnum :two
  Fixnum :three
}
#Prepare some test data
5.times{|i|
  DB[:test].insert(i,i*2,i*3)
}
File.open('test.csv', 'w'){|f|
  f << DB[:test].to_csv
}
The result is:
one, two, three
0, 0, 0
1, 2, 3
2, 4, 6
3, 6, 9
4, 8, 12
In my test I had problems with line ends, so I needed an additional gsub:
File.open('test.csv', 'w'){|f|
  f << DB[:test].to_csv.gsub("\r\n","\n")
}
If you want the export without the header line, use to_csv(false)
Remarks:
.to_csv is deprecated since Sequel 3.48.0 (2013-06-01).
You may use an old version with gem 'sequel', '< 3.48.0' or load the extension sequel_3_dataset_methods.
To get support for other separators and other CSV features you may use a combination of Sequel and CSV:
require 'sequel'
require 'csv'
#Build test data
DB = Sequel.sqlite
DB.create_table(:test){
  Fixnum :one
  Fixnum :two
  Fixnum :three
  String :four
}
#Prepare some test data
5.times{|i|
  DB[:test].insert(i,i*2,i*3, 'test, no %i' % i)
}
#Build csv-file
File.open('test.csv', 'w'){|f|
  DB[:test].each{|data|
    f << data.values.to_csv(:col_sep=>';')
  }
}
Result:
0;0;0;"test, no 0"
1;2;3;"test, no 1"
2;4;6;"test, no 2"
3;6;9;"test, no 3"
4;8;12;"test, no 4"
As an alternative you may patch Sequel::Dataset (modified code from a post of marcalc at Github):
class Sequel::Dataset
  require 'csv'
  #
  #Options:
  #* include_column_titles: true/false. default true
  #* Other options are forwarded to CSV.generate
  def to_csv(options={})
    include_column_titles = options.delete(:include_column_titles){true} #default: true
    n = naked
    cols = n.columns
    # pass the remaining options as keyword arguments (required on Ruby 3+)
    csv_string = CSV.generate(**options) do |csv|
      csv << cols if include_column_titles
      n.each{|r| csv << cols.collect{|c| r[c] } }
    end
    csv_string
  end
end
# Assume that Model is an ActiveRecord model
@secrets = Model.all
@csv = CSV.generate do |csv|
  @secrets.each { |secret|
    csv << ["#{secret.attr1.to_s}", "#{secret.attr2.to_s}"] # and so on till your row is finished
  }
end
render :text => @csv, :content_type => 'application/csv'
If you have further problems, leave a comment.
Adding an update for 2020. Since Sequel v5, sequel_3_dataset_methods has been completely removed and is unavailable. As such, generating a CSV as a Database extension has also been completely removed.
It appears the current "best practice" is to add the csv_serializer plugin to a Sequel::Model class. There is a catch here though, that the Sequel::Model class you define must be defined after the call to Sequel.connect. The act of subclassing Sequel::Model invokes a read from the database.
This prevents a typical workflow of pre-defining your classes as part of any generic Gem.
According to the Sequel author, the preferred way to do this is through MyClass = Class.new(Sequel::Model(:tablename)) in-line, or otherwise only calling require within your method definitions.
Making no promises about efficiency, here is a code sample that follows this 'best practice':
require 'sequel'
require 'csv'
module SequelTsv
  class One
    def self.main
      db = Sequel.connect('sqlite://blog.db') # requires sqlite3
      db.create_table :items do
        primary_key :id
        String :name
        Float :price
      end
      items = db[:items] # Create a dataset
      items.insert(:name => 'abc', :price => rand * 100)
      items.insert(:name => 'def', :price => rand * 100)
      items.insert(:name => 'ghi', :price => rand * 100)
      item_class = Class.new(Sequel::Model(:items))
      item_class.class_eval do
        plugin :csv_serializer
      end
      tsv = item_class.to_csv(write_headers: true, col_sep:"\t")
      CSV.open('output.tsv', 'w') do |csv|
        CSV.parse(tsv) do | c |
          csv << c
        end
      end
    end
  end
end
SequelTsv::One.main
output:
id name price
1 abc 39.307899453608364
2 def 99.28471503410731
3 ghi 58.0295131255661

Ruby: Reading contents of a xls file and getting each cells information

This is the link to an XLS file. I am trying to use the Spreadsheet gem to extract the contents of the XLS file. In particular, I want to collect all the column headers (Year, Gross National Product, etc.). The issue is that they are not in the same row. For example, Gross National Income spans three rows. I also want to know how many row cells are merged to make the cell 'Year'.
I have started writing the program and have got this far:
require 'rubygems'
require 'open-uri'
require 'spreadsheet'
rows = Array.new
url = 'http://www.stats.gov.cn/tjsj/ndsj/2012/html/C0201e.xls'
doc = Spreadsheet.open (open(url))
sheet1 = doc.worksheet 0
sheet1.each do |row|
  if row.is_a? Spreadsheet::Formula
    # puts row.value
    rows << row.value
  else
    # puts row
    rows << row
  end
  # puts row.value
end
But now I am stuck and need some guidance to proceed. Any help is appreciated.
require 'rubygems'
require 'open-uri'
require 'spreadsheet'
rows = Array.new
temp_rows = Array.new
column_headers = Array.new
index = 0
url = 'http://www.stats.gov.cn/tjsj/ndsj/2012/html/C0201e.xls'
doc = Spreadsheet.open (open(url))
sheet1 = doc.worksheet 0
sheet1.each do |row|
  rows << row.to_a
end
rows.each_with_index do |row,ind|
  if row[0]=="Year"
    index = ind
    break
  end
end
(index..7).each do |i|
  # puts rows[i].inspect
  if rows[i][0] =~ /[0-9]/
    break
  else
    temp_rows << rows[i]
  end
end
col_size = temp_rows[0].size
# puts temp_rows.inspect
col_size.times do |c|
  temp_str = ""
  temp_rows.each do |row|
    temp_str +=' '+ row[c] unless row[c].nil?
  end
  # puts temp_str.inspect
  column_headers << temp_str unless temp_str.nil?
end
puts 'Column Headers of this xls file are : '
# puts column_headers.inspect
column_headers.each do |col|
  puts col.strip.inspect if col.length >1
end
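The column-wise joining step in the code above can be isolated and tried on plain arrays, without downloading the spreadsheet. The sketch below is an illustration only: merge_header_rows is a made-up helper name, and the sample rows are invented to resemble what row.to_a might return for a header that spans three rows, with nil where a cell is empty.

```ruby
# Invented sample: three header rows as plain arrays, nil for empty cells.
header_rows = [
  ['Year', 'Gross',    'Gross',    nil],
  [nil,    'National', 'Domestic', 'Primary'],
  [nil,    'Income',   'Product',  'Industry']
]

# Hypothetical helper: join the pieces of each column, top to bottom,
# into a single header string.
def merge_header_rows(rows)
  width = rows.map(&:size).max
  # Pad short rows so that transpose sees a rectangular array.
  padded = rows.map { |row| row + [nil] * (width - row.size) }
  padded.transpose.map do |column|
    column.compact.map { |cell| cell.to_s.strip }.reject(&:empty?).join(' ')
  end
end

p merge_header_rows(header_rows)
```

Using transpose turns the per-column loop into a single pass over columns, and calling to_s on each cell avoids a TypeError if a header cell holds a number rather than a string.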
