The example
require 'gnuplot'
require 'gnuplot/multiplot'
def sample
x = (0..50).collect { |v| v.to_f }
mult2 = x.map {|v| v * 2 }
squares = x.map {|v| v * v }
Gnuplot.open do |gp|
Gnuplot::Multiplot.new(gp, layout: [2,1]) do |mp|
Gnuplot::Plot.new(mp) { |plot| plot.data << Gnuplot::DataSet.new( [x, mult2] ) }
Gnuplot::Plot.new(mp) { |plot| plot.data << Gnuplot::DataSet.new( [x, squares] ) }
end
end
end
works pretty well. But how can I send the output to a file instead of the screen? Where should I put plot.terminal "png enhanced truecolor" and plot.output "data.png"?
Indeed, I don't even know where to call the #terminal and #output methods, since the plot objects are created inside a multiplot block.
As a workaround, the approach below works as expected. Consider this part of the code:
Gnuplot.open do |gp|
...
end
Here the block parameter gp receives the IO object for the pipe to the gnuplot process. We can therefore send commands ("set terminal", "set output") directly to gnuplot through gp:
Gnuplot.open do |gp|
gp << 'set terminal png enhanced truecolor' << "\n"
gp << 'set output "data.png"' << "\n"
Gnuplot::Multiplot.new(gp, layout: [2,1]) do |mp|
Gnuplot::Plot.new(mp) { |plot| plot.data << Gnuplot::DataSet.new( [x, mult2] ) }
Gnuplot::Plot.new(mp) { |plot| plot.data << Gnuplot::DataSet.new( [x, squares] ) }
end
end
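As a rough sketch of why this works: anything that responds to << can stand in for the pipe, so the two setup commands can be wrapped in a small helper and inspected without gnuplot installed. The helper name send_png_setup and the use of StringIO are illustrative assumptions, not part of the gnuplot gem.

```ruby
require 'stringio'

# Hypothetical helper (not part of the gnuplot gem): writes the terminal and
# output commands to any IO-like object, e.g. the gp pipe yielded by
# Gnuplot.open, or a StringIO when you only want to inspect the commands.
def send_png_setup(io, path)
  io << "set terminal png enhanced truecolor\n"
  io << "set output \"#{path}\"\n"
  io
end

# Stand-in for the gnuplot pipe so the commands can be printed and checked.
buffer = StringIO.new
send_png_setup(buffer, "data.png")
puts buffer.string
```

Inside Gnuplot.open the same helper would be called as send_png_setup(gp, "data.png") before the Multiplot block, so the settings are in place before any plot command is sent.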
I have a CSV file "harvest.csv"; one of its columns contains dates.
Here is what I came up with (plot.rb):
require 'csv'
require 'gnuplot'
days = Array.new
mg = Array.new
csv = CSV.open("../data/harvest.csv", headers: :first_row, converters: :numeric)
csv.each do |row|
days << row[1]
mg << row[3]
end
dates = []
days.each {|n| dates << Date.strptime(n,"%Y-%m-%d")}
Gnuplot.open do |gp|
Gnuplot::Plot.new( gp ) do |plot|
plot.timefmt "'%Y%m%d'"
plot.title "Best Harvest Day"
plot.xlabel "Time"
plot.xrange "[('2013-04-01'):('2013-06-01')]"
plot.ylabel "Harvested"
plot.data << Gnuplot::DataSet.new( [dates,mg] ) do |ds|
ds.with = "linespoints"
ds.title = "Pollen harvested"
end
end
end
When I run plot.rb an error is raised:
line 735: Can't plot with an empty x range!
Should I convert [dates] to something else?
The format you set with plot.timefmt must match the one you use in the range; right now the hyphens are missing from the format. You also need to set xdata to time so that gnuplot treats the x axis as time data.
Gnuplot::Plot.new(gp) do |plot|
plot.timefmt "'%Y-%m-%d'"
plot.title "Best Harvest Day"
plot.xlabel "Time"
plot.xdata "time"
plot.xrange '["2013-04-01":"2013-06-01"]'
plot.ylabel "Harvested"
plot.data << Gnuplot::DataSet.new([dates, mg]) do |ds|
ds.with = "linespoints"
ds.title = "Pollen harvested"
ds.using = "1:2"
end
end
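One way to keep the formats from drifting apart again is to define the format string once and use it both for the data and for the range. This is only a sketch: the constant TIME_FORMAT and the helper format_dates are names made up for illustration.

```ruby
require 'date'

# Assumed constant: a single format string shared by Ruby and gnuplot.
TIME_FORMAT = "%Y-%m-%d"

# Hypothetical helper: turns Date objects into strings in the shared format,
# so the data no longer relies on Date#to_s happening to match timefmt.
def format_dates(dates, format = TIME_FORMAT)
  dates.map { |d| d.strftime(format) }
end

dates = [Date.new(2013, 4, 1), Date.new(2013, 6, 1)]
x_values = format_dates(dates)

# Build the xrange string from the same formatted values.
xrange = "[\"#{x_values.first}\":\"#{x_values.last}\"]"
puts xrange
```

In the plot block this would be used as plot.timefmt "'#{TIME_FORMAT}'" and plot.xrange xrange, with x_values passed to the DataSet in place of dates.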
I have a shop filled with 26597 unique products.
The data I use to import the products into the shop looks something like this:
{
"description":"AH Uien rood",
"category":"/Aardappel, groente, fruit/Kruiden, uien, knoflook/Uien/",
"brand":"AH"
}, {...}
530 of the 26597 products don't have a brand value. However, the brand name is present in the description. For the example product above, with "description":"AH Uien rood", AH is the brand name. The brand name is always the first one or more words of the description, but brand names vary in length and number of words and often contain spaces. Therefore I cannot simply take the first word of the description and assign it as the product's brand name.
I figured I'd use Machine Learning to help me classify product brand names based on the description and category.
It's my first real experience with Machine Learning, and I decided to use the ai4r Ruby gem. It looks good, is well maintained and properly documented here.
Of the 530 products, only 13 get classified at all; the rest raise this error:
Ai4r::Classifiers::ModelFailureError: There was not enough information during training to do a proper induction for the data element ...
I don't quite understand this, since DATA_SET, which is used to train the model, contains 25266 items.
This is what my code looks like:
require 'json'
require 'open-uri'
require 'csv'
require 'ai4r'
r = JSON.parse(URI.open('http://goo.gl/2IHtVU') {|f| f.read }.force_encoding('UTF-8'))
def extract_categories(product)
a = product['category'].split('/')
a.delete('')
b = []
a.each { |category| b << category.gsub(',', ' -') }
c = b.join(', ')
end
nb = []
r.each {|p| nb << p if p['brand'].nil? }
DATA_LABELS = ["title", "category", "brand"]
DATA_SET = []
r.each {|pnb| DATA_SET << [pnb['description'], extract_categories(pnb), pnb['brand']] unless pnb['brand'].nil? || pnb['category'].nil? }
data_set = Ai4r::Data::DataSet.new(:data_items=>DATA_SET, :data_labels=>DATA_LABELS)
id3 = Ai4r::Classifiers::ID3.new.build(data_set)
classified = []
nb.each do |pnb|
begin
classified << id3.eval([ pnb['description'], extract_categories(pnb) ])
rescue => e
puts 'There was not enough information during training to do a proper induction for the data element, moving on...'
end
end
classified.size
# => 13
# Save DATA_SET to csv
# CSV.open('/data_set.csv','wb', :quote_char => '"', encoding: "UTF-8") do |csv|
# csv << DATA_LABELS
#
# DATA_SET.each do |data|
# csv << [data[0], data[1], data[2]]
# end
# end
#
# => https://gist.github.com/narzero/ba8c521a370326a57a68
What is a better way to classify the brand name of a product based on the description?
I would go for a Naive Bayes classifier instead of a decision tree in this case. There is a gem for it: stuff-classifier.
In the code below I trained the classifier on your data set with the gem and classified 10 random entries. I used only the description for training, not the categories. See how it performs. Otherwise, you can include the categories by appending them to the description, prefixing each category token with something like cattt to distinguish category tokens from description tokens.
require 'json'
require 'open-uri'
require 'stuff-classifier'
r = JSON.parse(open('data_file.json') {|f| f.read }.force_encoding('UTF-8'))
def extract_categories(product)
a = product['category'].split('/')
a.delete('')
b = []
a.each { |category| b << category.gsub(',', ' -') }
c = b.join(', ')
end
nb = []
r.each {|p| nb << p if p['brand'].nil? }
DATA_LABELS = ["title", "category", "brand"]
DATA_SET = []
r.each {|pnb| DATA_SET << [pnb['description'], extract_categories(pnb), pnb['brand']] unless pnb['brand'].nil? || pnb['category'].nil? }
cls = StuffClassifier::Bayes.new("Product Label")
#train the classifier by feeding it the label and then the features
DATA_SET.each do |record|
begin
cls.train(record[2], record[0])
rescue
end
end
# print 10 random classifications
1.upto(10){
random_entry = DATA_SET.sample[0]
puts "#{random_entry} - Classified as - #{cls.classify(random_entry)}"
}
Results:
Organix Goodies squeezy banaan, aardbei & zuivel - Classified as - Organix
AH Dames hipster elastisch zwart maat M => John Cabot / AH
Piramide Sterrenmix fair trade => Piramide
Royal Club Bitter lemon => Royal Club
AH Fruitbiscuit yoghurt/ aardbei => AH
Toni & Guy Mask reconstruction treatment => Toni & Guy
AH Kinder enkelsok wit mt 23-26 => AH
Theramed Aardbei junior 6+ jaar => Theramed
Arla Bio drinkyoghurt limoen/ munt => Arla
AH Rauwkost Amsterdamse ui => AH
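The idea of prefixing category tokens mentioned above could be sketched roughly as follows. The helper name description_with_categories is hypothetical, and whether the prefix survives the classifier's own tokenizing or stemming is an assumption that would need checking against the gem.

```ruby
# Hypothetical helper: appends every word of the category path to the
# description, each marked with a prefix so that a category token cannot be
# confused with the same word appearing in a description.
def description_with_categories(description, category_path, prefix = "cattt")
  tokens = category_path.scan(/[[:alnum:]]+/)
  ([description] + tokens.map { |t| "#{prefix}#{t}" }).join(" ")
end

puts description_with_categories("AH Uien rood", "/Aardappel, groente, fruit/Uien/")
# => AH Uien rood catttAardappel catttgroente catttfruit catttUien
```

Training would then pass this combined string instead of record[0], and the same transformation would have to be applied to an entry before calling classify on it.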
I'm in an introductory software development class, and my homework is to
create a rock paper scissors program that takes two arguments (rock,
paper), etc, and returns the arg that wins.
Now I would make quick work of this problem if I could use conditionals,
but the assignment says everything we need to know is in the first three
chapters of the Ruby textbook, and these chapters DO NOT include
conditionals! Would it be possible to create this program without them?
Or is he just expecting us to be resourceful and use the conditionals?
It's a very easy assignment with conditionals though...I'm thinking that
I might be missing something here.
EDIT: I'm thinking of that chmod numerical system and think a solution may be possible through that additive system...
Here's one only using hashes:
RULES = {
:rock => {:rock => :draw, :paper => :paper, :scissors => :rock},
:paper => {:rock => :paper, :paper => :draw, :scissors => :scissors},
:scissors => {:rock => :rock, :paper => :scissors, :scissors => :draw}
}
def play(p1, p2)
RULES[p1][p2]
end
puts play(:rock, :paper) # :paper
puts play(:scissors, :rock) # :rock
puts play(:scissors, :scissors) # :draw
def winner(p1, p2)
wins = {rock: :scissors, scissors: :paper, paper: :rock}
{true => p1, false => p2}[wins[p1] == p2]
end
winner(:rock, :rock) # => :rock d'oh! – tokland
Per @sarnold, leaving this as an exercise for the student :).
I very much doubt you've seen array/set intersections, so just for fun:
def who_wins(p1, p2)
win_moves = {"rock" => "paper", "paper" => "scissors", "scissors" => "rock"}
([p1, p2] & win_moves.values_at(p1, p2)).first
end
who_wins("rock", "paper") # "paper"
who_wins("scissors", "rock") # "rock"
who_wins("scissors", "scissors") # nil
A simple hash to the rescue:
def tell_me(a1, a2)
input = [a1 , a2].sort.join('_').to_sym
rules = { :paper_rock => "paper", :rock_scissor => "rock", :paper_scissor => "scissor"}
rules[input]
end
I just think the simplest solution has to be something like:
@results = {
  'rock/paper' => 'paper',
  'rock/scissors' => 'rock',
  'paper/scissors' => 'scissors',
  'paper/rock' => 'paper',
  'scissors/paper' => 'scissors',
  'scissors/rock' => 'rock'
}
def winner p1, p2
  @results["#{p1}/#{p2}"]
end
WINNAHS = [[:rock, :scissors], [:scissors, :paper], [:paper, :rock]]
def winner(p1, p2)
(WINNAHS.include?([p1,p2]) && p1) || (WINNAHS.include?([p2,p1]) && p2) || :tie
end
winner(:rock, :paper) #=> :paper
winner(:scissors, :paper) #=> :scissors
winner(:scissors, :scissors) #=> :tie
I don't know much about Ruby, but I solved a problem like this long ago by assigning a value to each move (e.g. R = 1, P = 2, S = 3).
Actually, I just googled it after thinking about that, and someone solved the problem in Python using an array.
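A minimal Ruby sketch of that numeric idea, assuming array positions 0, 1, 2 instead of the values 1, 2, 3 mentioned above: each move beats the one just before it in the list, so the difference of the positions modulo 3 tells who wins. The names MOVES and winner_by_index are made up for illustration.

```ruby
# Each move beats the move directly before it (rock wraps around to scissors).
MOVES = [:rock, :paper, :scissors]

def winner_by_index(p1, p2)
  # 0 => same move, 1 => p1 is one step ahead (p1 wins), 2 => p2 wins.
  diff = (MOVES.index(p1) - MOVES.index(p2)) % 3
  [:draw, p1, p2][diff]
end

puts winner_by_index(:rock, :paper)        # paper
puts winner_by_index(:rock, :scissors)     # rock
puts winner_by_index(:scissors, :scissors) # draw
```

The result is picked by indexing into an array, so no conditional is needed. Ruby's % returns a non-negative result for a positive divisor, which is what makes the negative differences work.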
pguardiario's solution above can be modified per the below to show both (1) which player won (as opposed to the choice of object that won) and (2) the result when there is a draw:
def rps(p1, p2)
  @results = {
    'rock/paper' => "Player 2 won!",
    'rock/scissors' => "Player 1 won!",
    'paper/scissors' => "Player 2 won!",
    'paper/rock' => "Player 1 won!",
    'scissors/paper' => "Player 1 won!",
    'scissors/rock' => "Player 2 won!",
    'rock/rock' => "Draw!",
    'scissors/scissors' => "Draw!",
    'paper/paper' => "Draw!"
  }
  @results["#{p1}/#{p2}"]
end
rps("rock", "rock") # => "Draw!"
rps("rock", "scissors") # => "Player 1 won!"
rps("rock", "paper") # => "Player 2 won!"
# ...etc
Hey guys, I've got a couple of issues with my code.
I suspect that I am plotting the results very inefficiently, since the grouping by hour takes ages.
The DB is very simple: it contains the tweets, the created date and the username. It is fed by the Twitter gardenhose.
Thanks for your help!
require 'rubygems'
require 'sequel'
require 'gnuplot'
DB = Sequel.sqlite("volcano.sqlite")
tweets = DB[:tweets]
def get_values(keyword,tweets)
my_tweets = tweets.filter(:text.like("%#{keyword}%"))
r = Hash.new
start = my_tweets.first[:created_at]
my_tweets.each do |t|
hour = ((t[:created_at]-start)/3600).round
r[hour] == nil ? r[hour] = 1 : r[hour] += 1
end
x = []
y = []
r.sort.each do |e|
x << e[0]
y << e[1]
end
[x,y]
end
keywords = ["iceland", "island", "vulkan", "volcano"]
values = {}
keywords.each do |k|
values[k] = get_values(k,tweets)
end
Gnuplot.open do |gp|
Gnuplot::Plot.new(gp) do |plot|
plot.terminal "png"
plot.output "volcano.png"
plot.data = []
values.each do |k,v|
plot.data << Gnuplot::DataSet.new([v[0],v[1]]){ |ds|
ds.with = "linespoints"
ds.title = k
}
end
end
end
This is one of those cases where it makes more sense to use SQL. I'd recommend doing something like what is described in this other grouping question and just modify it to use SQLite date functions instead of MySQL ones.
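A rough sketch of what the SQL side might look like with SQLite's strftime function. It assumes created_at is stored in a format SQLite's date functions understand (for example ISO 8601 text); the constant HOURLY_COUNTS_SQL and the helper like_pattern are names invented for this example.

```ruby
# Assumed query: counts matching tweets per clock hour inside SQLite, so Ruby
# only receives one row per hour instead of one row per tweet.
HOURLY_COUNTS_SQL = <<~SQL
  SELECT strftime('%Y-%m-%d %H:00', created_at) AS hour,
         COUNT(*) AS tweet_count
  FROM tweets
  WHERE text LIKE ?
  GROUP BY hour
  ORDER BY hour
SQL

# Hypothetical helper: builds the LIKE pattern for a keyword.
def like_pattern(keyword)
  "%#{keyword}%"
end

# With the Sequel handle from the question this would be used roughly as:
#   rows = DB.fetch(HOURLY_COUNTS_SQL, like_pattern("volcano")).all
#   x = rows.map { |r| r[:hour] }
#   y = rows.map { |r| r[:tweet_count] }
puts HOURLY_COUNTS_SQL
```

Note that x then holds hour strings rather than hour offsets from the first tweet, so the plot would either need the x axis set to time data or a conversion back to offsets.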