Plotting Tweets from DB in Ruby, grouping by hour - ruby

Hey guys, I've got a couple of issues with my code.
I suspect I am plotting the results very inefficiently, since the grouping by hour takes ages.
The DB is very simple: it contains the tweets, the created date and the username. It is fed by the Twitter gardenhose.
Thanks for your help!
require 'rubygems'
require 'sequel'
require 'gnuplot'
DB = Sequel.sqlite("volcano.sqlite")
tweets = DB[:tweets]
def get_values(keyword, tweets)
  my_tweets = tweets.filter(:text.like("%#{keyword}%"))
  r = Hash.new(0) # hours not seen yet start at 0
  start = my_tweets.first[:created_at]
  my_tweets.each do |t|
    # Bucket each tweet by its offset in hours from the first match
    hour = ((t[:created_at] - start) / 3600).round
    r[hour] += 1
  end
  x = []
  y = []
  r.sort.each do |e|
    x << e[0]
    y << e[1]
  end
  [x, y]
end
keywords = ["iceland", "island", "vulkan", "volcano"]
values = {}
keywords.each do |k|
values[k] = get_values(k,tweets)
end
Gnuplot.open do |gp|
Gnuplot::Plot.new(gp) do |plot|
plot.terminal "png"
plot.output "volcano.png"
plot.data = []
values.each do |k,v|
plot.data << Gnuplot::DataSet.new([v[0],v[1]]){ |ds|
ds.with = "linespoints"
ds.title = k
}
end
end
end

This is one of those cases where it makes more sense to use SQL. I'd recommend doing something like what is described in this other grouping question and just modify it to use SQLite date functions instead of MySQL ones.
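A minimal sketch of what that could look like with SQLite's strftime, assuming created_at is stored in a format strftime understands (for example "2010-04-15 10:23:00"). The constant HOURLY_COUNTS_SQL and the helper to_series are names made up for this illustration, and the sample rows stand in for what the database would return:

```ruby
# Let SQLite bucket and count the matching tweets per hour.
# Assumption: created_at is a timestamp string strftime() can parse.
HOURLY_COUNTS_SQL = <<~SQL
  SELECT strftime('%Y-%m-%d %H', created_at) AS hour, COUNT(*) AS n
  FROM tweets
  WHERE text LIKE ?
  GROUP BY hour
  ORDER BY hour
SQL

# Turn result rows (hashes with :hour and :n) into the [x, y] pair
# that the plotting code expects.
def to_series(rows)
  [rows.map { |r| r[:hour] }, rows.map { |r| r[:n] }]
end

# With Sequel the query could be run as (not executed here):
#   rows = DB.fetch(HOURLY_COUNTS_SQL, "%volcano%").all
# Sample rows standing in for the query result:
rows = [{ hour: "2010-04-15 10", n: 3 }, { hour: "2010-04-15 11", n: 7 }]
x, y = to_series(rows)
```

Here x holds hour labels rather than offsets from the first tweet, so it may need converting before it goes to gnuplot. The point is that only one row per hour leaves the database instead of every tweet.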


Ruby Sinatra storing variables

In the code below, the initial get '/' renders a form whose action is post '/'. When the user inputs a number, it should be converted to a variable that is used to create a Game instance, and I have added another action that shows a new form at get '/game'. The variable created in the post method is not being stored. How can I both store the variable created in post and then carry it into the get '/game' action?
require 'sinatra'
require 'sinatra/reloader'
@@count = 5
Dict = File.open("enable.txt")
class Game
attr_accessor :letters, :number, :guess, :disp
@@count = 5
def initialize (number)
letters = find(number)
end
def find (n)
words =[]
dictionary = File.read(Dict)
dictionary.scan(/\w+/).each {|word| words << word if word.length == n}
letters = words.sample.split("").to_a
letters
end
def counter
if letters.include?guess
correct = check_guess(guess, letters)
else
@@count -= 1
end
end
end
get '/' do
erb :index
end
post '/' do
n = params['number'].to_i
@letters = Game.new(n)
redirect '/game'
end
get "/game" do
guess = params['guess']
letters = @letters
if guess != nil
correct = check_guess(guess, letters)
end
disp = display(letters, correct)
erb :game, :locals => {:letters => letters, :disp => disp}
end
def display(letters, correct)
line = "__"
d=[]
letters.each do |x|
if correct == nil
d << line
elsif correct.include?x
d << x
else
d << line
end
end
d.join(" ")
end
def check_guess(guess, letters)
correct = []
if guess != nil
if letters.include?guess
correct << guess
end
end
correct
end
You cannot do this:
@letters = Game.new(n)
Each time a request comes in, a new Request instance is created, so the @letters attribute set by the previous request no longer exists.
It's the equivalent of
r = Request.new()
r.letters = Game.new()
r = Request.new()
r.letters # not defined anymore!!
You could achieve what you want using a class variable instead:
@@letters = Game.new(n)
However, this will become a nightmare when you have multiple users, and it will only work when you have a single Ruby server process.
A more advanced approach would be to store params['number'] in a session cookie or in a database.
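The difference between per-request state and session state can be sketched in plain Ruby, without Sinatra. FakeRequest and the session hash below are hypothetical stand-ins; in a real Sinatra app the analogous facility is `enable :sessions` together with the `session` helper:

```ruby
# Hypothetical stand-in for the object a framework creates per request.
class FakeRequest
  attr_reader :letters

  def initialize(session)
    @session = session # shared store that outlives a single request
  end

  # Plays the role of the POST handler: remember the chosen word length.
  def choose(number)
    @letters = "x" * number    # instance state: lost with this request
    @session[:number] = number # session state: available next request
  end

  def remembered_number
    @session[:number]
  end
end

session = {} # stands in for the per-user session store

first = FakeRequest.new(session)
first.choose(5)

second = FakeRequest.new(session)
second.letters           # nil, the instance variable did not survive
second.remembered_number # 5, the session value did
```

Storing only the number and rebuilding the game from it on each request keeps the stored value small, which matters if the session lives in a cookie.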

Merging Ranges using Sets - Error - Stack level too deep (SystemStackError)

I have a number of ranges that I want to merge together if they overlap. The way I’m currently doing this is by using Sets.
This is working. However, when I attempt the same code with larger ranges, as follows, I get a stack level too deep (SystemStackError).
require 'set'
ranges = [Range.new(73, 856), Range.new(82, 1145), Range.new(116, 2914), Range.new(3203, 3241)]
set = Set.new
ranges.each { |r| set << r.to_set }
set.flatten!
sets_subsets = set.divide { |i, j| (i - j).abs == 1 } # this line causes the error
puts sets_subsets
The line that is failing is taken directly from the Ruby Set Documentation.
I would appreciate it if anyone could suggest a fix or an alternative that works for the above example
EDIT
I have put the full code I’m using here:
Basically it is used to add html tags to an amino acid sequence according to some features.
require 'set'
def calculate_formatting_classes(hsps, signalp)
merged_hsps = merge_ranges(hsps)
sp = format_signalp(merged_hsps, signalp)
hsp_class = (merged_hsps - sp[1]) - sp[0]
rank_format_positions(sp, hsp_class)
end
def merge_ranges(ranges)
set = Set.new
ranges.each { |r| set << r.to_set }
set.flatten
end
def format_signalp(merged_hsps, sp)
sp_class = sp - merged_hsps
sp_hsp_class = sp & merged_hsps # overlap regions between sp & merged_hsp
[sp_class, sp_hsp_class]
end
def rank_format_positions(sp, hsp_class)
results = []
results += sets_to_hash(sp[0], 'sp')
results += sets_to_hash(sp[1], 'sphsp')
results += sets_to_hash(hsp_class, 'hsp')
results.sort_by { |s| s[:pos] }
end
def sets_to_hash(set = nil, cl)
return nil if set.nil?
hashes = []
merged_set = set.divide { |i, j| (i - j).abs == 1 }
merged_set.each do |s|
hashes << { pos: s.min.to_i - 1, insert: "<span class=#{cl}>" }
hashes << { pos: s.max.to_i - 0.1, insert: '</span>' } # for ordering
end
hashes
end
working_hsp = [Range.new(7, 136), Range.new(143, 178)]
not_working_hsp = [Range.new(73, 856), Range.new(82, 1145),
Range.new(116, 2914), Range.new(3203, 3241)]
sp = Range.new(1, 20).to_set
# working
results = calculate_formatting_classes(working_hsp, sp)
# Not Working
# results = calculate_formatting_classes(not_working_hsp, sp)
puts results
Here is one way to do this:
ranges = [Range.new(73, 856), Range.new(82, 1145),
Range.new(116, 2914), Range.new(3203, 3241)]
ranges.size.times do
ranges = ranges.sort_by(&:begin)
t = ranges.each_cons(2).to_a
t.each do |r1, r2|
if (r2.cover? r1.begin) || (r2.cover? r1.end) ||
(r1.cover? r2.begin) || (r1.cover? r2.end)
ranges << Range.new([r1.begin, r2.begin].min, [r1.end, r2.end].max)
ranges.delete(r1)
ranges.delete(r2)
t.delete [r1,r2]
end
end
end
p ranges
#=> [73..2914, 3203..3241]
The other answers aren't bad, but I prefer a simple recursive approach:
def merge_ranges(*ranges)
range, *rest = ranges
return if range.nil?
# Find the index of the first range in `rest` that overlaps this one
other_idx = rest.find_index do |other|
range.cover?(other.begin) || other.cover?(range.begin)
end
if other_idx
# An overlapping range was found; remove it from `rest` and merge
# it with this one
other = rest.slice!(other_idx)
merged = ([range.begin, other.begin].min)..([range.end, other.end].max)
# Try again with the merged range and the remaining `rest`
merge_ranges(merged, *rest)
else
# No overlapping range was found; move on
[ range, *merge_ranges(*rest) ]
end
end
Note: This code assumes each range is ascending (e.g. 10..5 will break it).
Usage:
ranges = [ 73..856, 82..1145, 116..2914, 3203..3241 ]
p merge_ranges(*ranges)
# => [73..2914, 3203..3241]
ranges = [ 0..10, 5..20, 30..50, 45..80, 50..90, 100..101, 101..200 ]
p merge_ranges(*ranges)
# => [0..20, 30..90, 100..200]
I believe your resulting set has too many items (2881) to be used with divide. With a two-argument block, divide calls the block for every pair of elements (about 2881^2, roughly 8.3 million calls) and then, as far as I can tell from set.rb, groups the elements with TSort, which recurses along each chain of adjacent integers. A run of nearly 3,000 consecutive numbers is deep enough to trigger the stack level too deep error.
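If the goal of the divide call is only to find runs of consecutive positions, one alternative is to sort the elements and cut wherever two neighbours are not adjacent. This is a sketch using Enumerable#slice_when (Ruby 2.2+); consecutive_runs is a name chosen for this example, and it assumes the set holds integers:

```ruby
require 'set'

# Split a collection of integers into ranges of consecutive values.
# Sorting first means adjacent values sit next to each other, so a
# single linear pass is enough and nothing recurses.
def consecutive_runs(positions)
  positions.sort
           .slice_when { |a, b| b != a + 1 }
           .map { |run| run.first..run.last }
end

positions = (73..2914).to_set | (3203..3241).to_set
p consecutive_runs(positions)
# => [73..2914, 3203..3241]
```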
Without using sets, you can use this code to merge the ranges:
module RangeMerger
def merge(range_b)
if cover?(range_b.first) && cover?(range_b.last)
self
elsif cover?(range_b.first)
self.class.new(first, range_b.last)
elsif cover?(range_b.last)
self.class.new(range_b.first, last)
else
nil # Unmergable
end
end
end
module ArrayRangePusher
def <<(item)
if item.kind_of?(Range)
item.extend RangeMerger
each_with_index do |own_item, idx|
own_item.extend RangeMerger
if new_range = own_item.merge(item)
self[idx] = new_range
return self
end
end
end
super
end
end
ranges = [Range.new(73, 856), Range.new(82, 1145), Range.new(116, 2914), Range.new(3203, 3241)]
new_ranges = Array.new
new_ranges.extend ArrayRangePusher
ranges.each do |range|
new_ranges << range
end
puts ranges.inspect
puts new_ranges.inspect
This will output:
[73..856, 82..1145, 116..2914, 3203..3241]
[73..2914, 3203..3241]
which I believe is the intended output for your original problem. It's a bit ugly, but I'm a bit rusty at the moment.
Edit: I don't think this has anything to do with your original problem before the edits which was about merging ranges.
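For the range-merging part of this thread, another option is a sort-then-sweep pass that never expands the ranges into individual integers. This is a sketch, assuming ascending, inclusive ranges with finite ends; merge_overlapping is a name made up for the example:

```ruby
# Merge overlapping ranges in one pass over the ranges sorted by start.
# Each range either extends the last merged range or starts a new one.
def merge_overlapping(ranges)
  ranges.sort_by(&:begin).each_with_object([]) do |r, merged|
    last = merged.last
    if last && r.begin <= last.end
      # Overlaps (or touches) the previous range: widen it
      merged[-1] = last.begin..[last.end, r.end].max
    else
      merged << r
    end
  end
end

p merge_overlapping([73..856, 82..1145, 116..2914, 3203..3241])
# => [73..2914, 3203..3241]
```

Because the work is proportional to the number of ranges rather than the number of positions they cover, the size of each range does not matter.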

Ruby how to merge two CSV files with slightly different headers

I have two CSV files with some common headers and others that only appear in one or in the other, for example:
# csv_1.csv
H1,H2,H3
V11,V22,V33
V14,V25,V35
# csv_2.csv
H1,H4
V1a,V4b
V1c,V4d
I would like to merge both and obtain a new CSV file that combines all the information from the previous CSV files, injecting new columns when needed and filling the new cells with null values.
Result example:
H1,H2,H3,H4
V11,V22,V33,
V14,V25,V35,
V1a,,,V4b
V1c,,,V4d
Challenge accepted :)
#!/usr/bin/env ruby
require "csv"
module MergeCsv
class << self
def run(csv_paths)
csv_files = csv_paths.map { |p| CSV.read(p, headers: true) }
merge(csv_files)
end
private
def merge(csv_files)
headers = csv_files.flat_map(&:headers).uniq.sort
hash_array = csv_files.flat_map(&method(:csv_to_hash_array))
CSV.generate do |merged_csv|
merged_csv << headers
hash_array.each do |row|
merged_csv << row.values_at(*headers)
end
end
end
# Probably not the most performant way, but easy
def csv_to_hash_array(csv)
csv.to_a[1..-1].map { |row| csv.headers.zip(row).to_h }
end
end
end
if(ARGV.length == 0)
puts "Use: ruby merge_csv.rb <file_path_csv_1> <file_path_csv_2>"
exit 1
end
puts MergeCsv.run(ARGV)
I have the answer; I just wanted to help people who are looking for the same solution.
require "csv"
module MergeCsv
def self.run(csv_1_path, csv_2_path)
merge(File.read(csv_1_path), File.read(csv_2_path))
end
def self.merge(csv_1, csv_2)
csv_1_table = CSV.parse(csv_1, :headers => true)
csv_2_table = CSV.parse(csv_2, :headers => true)
return csv_2_table.to_csv if csv_1_table.headers.empty?
return csv_1_table.to_csv if csv_2_table.headers.empty?
headers_in_1_not_in_2 = csv_1_table.headers - csv_2_table.headers
headers_in_1_not_in_2.each do |header_in_1_not_in_2|
csv_2_table[header_in_1_not_in_2] = nil
end
headers_in_2_not_in_1 = csv_2_table.headers - csv_1_table.headers
headers_in_2_not_in_1.each do |header_in_2_not_in_1|
csv_1_table[header_in_2_not_in_1] = nil
end
csv_2_table.each do |csv_2_row|
csv_1_table << csv_1_table.headers.map { |csv_1_header| csv_2_row[csv_1_header] }
end
csv_1_table.to_csv
end
end
if(ARGV.length != 2)
puts "Use: ruby merge_csv.rb <file_path_csv_1> <file_path_csv_2>"
exit 1
end
puts MergeCsv.run(ARGV[0], ARGV[1])
And execute it from the console this way:
$ ruby merge_csv.rb csv_1.csv csv_2.csv
Any other, maybe cleaner, solution is welcome.
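One possibly shorter variant, sketched here on CSV strings so it can be tried without files: collect the union of the headers, then emit every row through values_at so that missing columns come out empty. merge_csv_strings, CSV_1 and CSV_2 are names chosen for this illustration:

```ruby
require "csv"

# Merge any number of CSV documents (given as strings) into one.
# Columns a row does not have are written as empty cells.
def merge_csv_strings(*sources)
  tables  = sources.map { |s| CSV.parse(s, headers: true) }
  headers = tables.flat_map(&:headers).uniq # keep first-seen order
  CSV.generate do |out|
    out << headers
    tables.each do |table|
      table.each { |row| out << row.to_h.values_at(*headers) }
    end
  end
end

CSV_1 = "H1,H2,H3\nV11,V22,V33\nV14,V25,V35\n"
CSV_2 = "H1,H4\nV1a,V4b\nV1c,V4d\n"
puts merge_csv_strings(CSV_1, CSV_2)
# H1,H2,H3,H4
# V11,V22,V33,
# V14,V25,V35,
# V1a,,,V4b
# V1c,,,V4d
```

To work from files instead, the strings could come from File.read on each path.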
Simplified first answer:
How to use it:
listPart_A = CSV.read(csv_path_A, headers:true)
listPart_B = CSV.read(csv_path_B, headers:true)
listPart_C = CSV.read(csv_path_C, headers:true)
list = merge(listPart_A,listPart_B,listPart_C)
Function:
def merge(*csvs)
headers = csvs.map {|csv| csv.headers }.flatten.compact.uniq.sort
csvs.flat_map(&method(:csv_to_hash_array))
end
def csv_to_hash_array(csv)
csv.to_a[1..-1].map do |row|
Hash[csv.headers.zip(row)]
end
end
I had to do something very similar: merge n CSV files that might share some columns but not others.
If you want to keep the structure and do it easily,
I think the best way is to convert to a hash and then convert back to a CSV file.
My solution:
#!/usr/bin/env ruby
require "csv"
def join_multiple_csv(csv_path_array)
return nil if csv_path_array.nil? or csv_path_array.empty?
f = CSV.parse(File.read(csv_path_array[0]), :headers => true)
f_h = {}
f.headers.each {|header| f_h[header] = f[header]}
n_rows = f.size
csv_path_array.shift(1)
csv_path_array.each do |csv_file|
curr_csv = CSV.parse(File.read(csv_file), :headers => true)
curr_h = {}
curr_csv.headers.each {|header| curr_h[header] = curr_csv[header]}
new_headers = curr_csv.headers - f_h.keys
exist_headers = curr_csv.headers - new_headers
new_headers.each { |new_header|
f_h[new_header] = Array.new(n_rows) + curr_csv[new_header]
}
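exist_headers.each {|exist_header|
f_h[exist_header] = f_h[exist_header] + curr_csv[exist_header]
}
# Pad columns this file lacks so rows from later files stay aligned
(f_h.keys - curr_csv.headers).each {|missing_header|
f_h[missing_header] = f_h[missing_header] + Array.new(curr_csv.size)
}
n_rows = n_rows + curr_csv.size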
end
csv_string = CSV.generate do |csv|
csv << f_h.keys
(0..n_rows-1).each do |i|
row = []
f_h.each_key do |header|
row << f_h[header][i]
end
csv << row
end
end
return csv_string
end
if(ARGV.length < 2)
puts "Use: ruby merge_csv.rb <file_path_csv_1> <file_path_csv_2> .. <file_path_csv_n>"
exit 1
end
csv_str = join_multiple_csv(ARGV)
# File.write opens, writes and closes the file in one call
File.write("results.csv", csv_str)
puts "CSV merge is done"

ruby range parts

I have a problem with this Ruby code:
def give_me_all_periods(period, paid_periods)
# Can you help me?
end
period = [1..10]
paid_periods = [1..2, 5..8]
give_me_all_periods(period, paid_periods).should == [1...2, 2...5, 5...8, 8...10]
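def give_me_all_periods(period, paid_periods)
  all = period | paid_periods
  # Collect every boundary once, in ascending order
  bounds = all.inject([]) { |u, x| u | range_to_arr(x) }.sort
  ranges = []
  # Build end-exclusive ranges (a...b) to match the expected result
  bounds.each_cons(2) { |a| ranges << Range.new(a[0], a[1], true) }
  ranges
end

def range_to_arr(r)
  [r.first, r.last]
end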

Recursively merge multidimensional arrays, hashes and symbols

I need a chunk of Ruby code to combine an array of contents like such:
[{:dim_location=>[{:dim_city=>:dim_state}]},
:dim_marital_status,
{:dim_location=>[:dim_zip, :dim_business]}]
into:
[{:dim_location => [:dim_business, {:dim_city=>:dim_state}, :dim_zip]},
:dim_marital_status]
It needs to support an arbitrary level of depth, though the depth will rarely be beyond 8 levels deep.
Revised after comment:
source = [{:dim_location=>[{:dim_city=>:dim_state}]}, :dim_marital_status, {:dim_location=>[:dim_zip, :dim_business]}]
expected = [{:dim_location => [:dim_business, {:dim_city=>:dim_state}, :dim_zip]}, :dim_marital_status]
source2 = [{:dim_location=>{:dim_city=>:dim_state}}, {:dim_location=>:dim_city}]
def merge_dim_locations(array)
return array unless array.is_a?(Array)
values = array.dup
dim_locations = values.select {|x| x.is_a?(Hash) && x.has_key?(:dim_location)}
old_index = values.index(dim_locations[0]) unless dim_locations.empty?
merged = dim_locations.inject({}) do |memo, obj|
values.delete(obj)
x = merge_dim_locations(obj[:dim_location])
if x.is_a?(Array)
memo[:dim_location] = (memo[:dim_location] || []) + x
else
memo[:dim_location] ||= []
memo[:dim_location] << x
end
memo
end
unless merged.empty?
values.insert(old_index, merged)
end
values
end
puts "source1:"
puts source.inspect
puts "result1:"
puts merge_dim_locations(source).inspect
puts "expected1:"
puts expected.inspect
puts "\nsource2:"
puts source2.inspect
puts "result2:"
puts merge_dim_locations(source2).inspect
I don't think there's enough detail in your question to give you a complete answer, but this might get you started:
class Hash
def recursive_merge!(other)
other.keys.each do |k|
if self[k].is_a?(Array) && other[k].is_a?(Array)
self[k] += other[k]
elsif self[k].is_a?(Hash) && other[k].is_a?(Hash)
self[k].recursive_merge!(other[k])
else
self[k] = other[k]
end
end
self
end
end
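As a rough illustration of how a recursive merge of this kind could be applied to the array in the question, here is a sketch that avoids reopening Hash. deep_merge and combine are names made up for the example. It merges hashes that share a key and concatenates their values, but it does not sort or de-duplicate the result, and it does not merge hashes nested inside arrays:

```ruby
# Merge two hashes; on a key clash, recurse for hashes and
# concatenate everything else into one flat array.
def deep_merge(a, b)
  a.merge(b) do |_key, old, new|
    if old.is_a?(Hash) && new.is_a?(Hash)
      deep_merge(old, new)
    else
      [old].flatten(1) + [new].flatten(1)
    end
  end
end

# Fold all hash entries of the array into one hash and keep the
# non-hash entries (symbols) after it.
def combine(items)
  hashes, others = items.partition { |i| i.is_a?(Hash) }
  merged = hashes.reduce({}) { |acc, h| deep_merge(acc, h) }
  (merged.empty? ? [] : [merged]) + others
end

source = [{ dim_location: [{ dim_city: :dim_state }] },
          :dim_marital_status,
          { dim_location: [:dim_zip, :dim_business] }]
result = combine(source)
```

For the question's input, result holds a single :dim_location hash whose array contains the city hash, :dim_zip and :dim_business, followed by :dim_marital_status.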
