The code below comes from the documentation for the Ruby gem rroc. I desperately need to calculate the AUC for my AI project. However, I have virtually no knowledge of Ruby file I/O, not having had occasion to learn it. The documentation says rroc expects an n by 2 array, but the line of code below that builds my_data suggests that the data is in a CSV file and will be formatted into my_data for ROC to calculate the AUC.
I have tried every conceivable combination of CSV data and arrays, both as files for that line to read and as direct input to the line calculating the AUC. At best the code runs without error but gives a useless output of 0. My hope is that with a fuller understanding of what that line does, I could either fix the problem or give up on the gem, since a previous version of this gem was shown to be obsolete and this one is 8 years old. I took the data from the article referenced by the gem author and am pretty sure it's not the problem, but then...
So, to refine the question: from that statement, can we tell what kind of data should be in 'some_data.csv'? And what will be done to it to make my_data?
require 'rroc'
my_data = open('some_data.csv').readlines.collect { |l| l.strip.split(",").map(&:to_f) }
auc = ROC.auc(my_data)
puts auc
Below I've copied the output of two runs: the first with array data read in, the second with CSV values (each in a separate file). I added a line to print the input file just to be sure.
RoyiMac:ruby $ ruby PDaucT.rb
[[90, 1], [80, 1], [70,-1], [60,1], [55,1], [54,1], [53,-1], [52,-1], [51,1], [50,-1], [40,1], [39,-1], [38,1], [37,-1], [36,-1], [35,-1], [34,1], [33,-1], [30,1], [10,-1]]
0.0
RoyiMac:ruby $ ruby PDaucT.rb
90,1,80,1,70,-1,60,1,55,1,54,1,53,-1,52,-1,51,1,50,-1,40,1,39,-1,38,1,37,-1,36,-1,35,-1,34,1,33,-1,30,1,10,-1
0.0
The explanation of the code:
open('some_data.csv')  # open the some_data.csv file
  .readlines           # returns an array with each element being a line
  .collect { |l|       # for each line, do the following transformation
    l.strip            # remove leading and trailing whitespace characters
      .split(',')      # split the line on the "," character (returning an array)
      .map(&:to_f)     # call .to_f on each element, converting it to a float value
  }
map/collect are aliases of each other.
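A quick way to see what this chain does is to run it on strings instead of a file. The sketch below applies the same transformation to three sample inputs (the `parse` lambda is only for illustration, not part of rroc). Only the one-pair-per-line layout produces the n by 2 array the documentation asks for, which probably explains the 0.0 results above: the file in Ruby array syntax turns every score into 0.0, and the file with all values on one line becomes a single long row.

```ruby
# Same transformation as the rroc example, applied to a String instead of a file.
parse = ->(text) { text.lines.collect { |l| l.strip.split(",").map(&:to_f) } }

# One "score,label" pair per line: an n x 2 array of floats.
good = parse.call("90,1\n80,1\n70,-1\n")
p good

# All values on a single line: one row with 6 columns.
one_line = parse.call("90,1,80,1,70,-1\n")
p one_line

# Ruby array syntax: "[[90".to_f and " [80".to_f are both 0.0.
literal = parse.call("[[90, 1], [80, 1]]\n")
p literal
```

So some_data.csv should hold plain lines such as `90,1`, then `80,1`, then `70,-1`, and so on, with no brackets.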
However, as tadman already said in the comments, you're better off using the csv standard library. The same can be achieved with:
require 'csv'
my_data = CSV.read('some_data.csv', converters: :float)
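# with one score,label pair per line in some_data.csv, my_data should be
#=> [[90.0, 1.0], [80.0, 1.0], [70.0, -1.0], [60.0, 1.0], [55.0, 1.0], [54.0, 1.0], [53.0, -1.0], [52.0, -1.0], [51.0, 1.0], [50.0, -1.0], [40.0, 1.0], [39.0, -1.0], [38.0, 1.0], [37.0, -1.0], [36.0, -1.0], [35.0, -1.0], [34.0, 1.0], [33.0, -1.0], [30.0, 1.0], [10.0, -1.0]]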
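To check the converter without creating a file, CSV.parse takes the same options as CSV.read but reads from a String. A small sketch, assuming the one-pair-per-line layout; it also shows that `:float` turns whole numbers into floats, while `:numeric` keeps them as Integers:

```ruby
require 'csv'

text = "90,1\n80,1\n70,-1\n"

# :float converts every field that Float() accepts, so 90 becomes 90.0.
as_floats = CSV.parse(text, converters: :float)
p as_floats

# :numeric tries Integer first, then Float.
as_numbers = CSV.parse(text, converters: :numeric)
p as_numbers
```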
When processing a file, I used to use the special variable $. to get the last line number being read. For instance, the following program
require 'csv'
IFS = ';'
CSV_OPTIONS = { col_sep: IFS, external_encoding: Encoding::ISO_8859_1, internal_encoding: Encoding::UTF_8 }
CSV.new($stdin, **CSV_OPTIONS).each do |row|
  puts "::::line #{$.} row=#{row}"
end
is supposed to dump a CSV file (whose fields are delimited by semicolons instead of commas, as is the case in our project) and prefix each output line with the line number.
After updating Ruby to ruby 2.6.3p62 (2019-04-16 revision 67580) [x86_64-cygwin], the lines are still dumped, but the line number is always displayed as zero.
What strikes me is that this Ruby wiki page on special Ruby variables, while still listing $., no longer has a description for it. So I wonder: is this variable gone, or was it never supposed to work with the CSV class and just worked for me by accident in earlier versions?
I'm not sure why $. isn't working for you, but it's also not the best solution here. When it works, $. gives you the number of lines read from input, but since quoted fields in a CSV file can span multiple lines the number you get from $. won't always be the number of rows that have been read.
each_with_index is a good alternative:
# ** passes the options hash as keyword arguments (required in Ruby 3)
CSV.new($stdin, **CSV_OPTIONS).each_with_index do |row, i|
  puts "::::row #{i} row=#{row}"
end
Another alternative is CSV#lineno:
lineno()
The line number of the last row read from this file. Fields with nested line-end characters will not affect this count.
You would use it like this:
csv = CSV.new($stdin, **CSV_OPTIONS)
csv.each do |row|
  puts "::::row #{csv.lineno} row=#{row}"
end
Note that each_with_index will start counting at 0, whereas lineno starts at 1.
You can see both approaches in action on repl.it: https://repl.it/@jrunning/LoudBlushingCharactercode
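For a self-contained comparison, here is a sketch that reads a small made-up semicolon-separated sample from a StringIO standing in for $stdin and records both counters for each row:

```ruby
require 'csv'
require 'stringio'

options = { col_sep: ';' }
input = "a;1\nb;2\nc;3\n"

# each_with_index: zero-based position of each row.
indexes = []
CSV.new(StringIO.new(input), **options).each_with_index do |_row, i|
  indexes << i
end
p indexes

# CSV#lineno: one-based number of the last row read.
csv = CSV.new(StringIO.new(input), **options)
linenos = []
csv.each { |_row| linenos << csv.lineno }
p linenos
```

The sample has no quoted fields containing line breaks, so both counters simply follow the rows here.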
I'm new to Ruby. I understand that, when I see a Ruby script, it usually contains lines similar to this:
#!/usr/bin/env ruby
require 'rubyfunction1'
require 'rubyfunction2'
I understand that the require lines are basically (to put it in simple basic terms), calling other scripts. That is really all there is to it. These other scripts are functions.
Now, suppose, I put the content of the rubyfunction1 and rubyfunction2 scripts into two different variables. How do I require the content of a variable?
Or, suppose I want to be able to do something like this:
require '`/home/swenson/rubyfunction1.rb`'
I understand this is a roundabout way of requiring gems/ruby functions, but I'm curious to know if it is at all possible in this manner.
Basically, if I were to run the /home/swenson/rubyfunction1.rb script by itself on the command line, it would output the content of the script. It would be equivalent to running "cat /home/swenson/rubyfunction1.rb".
I want to be able to do something like this:
require '`/home/swenson/rubyfunction1.rb`'
require '`/home/swenson/rubyfunction2.rb`'
or
specvar1 = `/home/swenson/rubyfunction1.rb`
specvar2 = `/home/swenson/rubyfunction2.rb`
require specvar1
require specvar2
Is this possible? Any suggestions I can apply to get it to work?
UPDATE:
So here's what I ended up doing.
Main Script called example.rb:
#!/usr/bin/env ruby
add = `./add.rb` # for my purposes, this will serve as require
subtract = `./subtract.rb` # for my purposes, this will serve as require
eval add
puts "I can add: #{add(3, 2)}"
eval subtract
puts "I can now subtract #{subtract(3, 2)}"
Content of add.rb:
#!/usr/bin/env ruby
puts <<-function
#!/usr/bin/env ruby
def add(a, b)
a + b
end
function
Content of subtract.rb:
#!/usr/bin/env ruby
puts <<-function
#!/usr/bin/env ruby
def subtract(a, b)
a - b
end
function
When run from the command line, I get no errors:
# ./example.rb
I can add: 5
I can now subtract 1
Basically, this is precisely what I want done. However, I know there's probably an optimized way of doing this (without having to directly require the relative file), so please feel free to help me update or optimize this.
I understand that the require lines are basically (to put it in simple basic terms), calling other scripts. That is really all there is to it.
Yes. load, require, and require_relative simply run a Ruby file. That's it.
These other scripts are functions.
No. They are scripts. There is no such thing as a function in Ruby.
Now, suppose, I put the content of the rubyfunction1 and rubyfunction2 scripts into two different variables. How do I require the content of a variable?
You can't. require runs a file. It takes the name of a file (more precisely, a relative path) as an argument. Ruby code is not the name of a file.
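If the code really is sitting in a String, the closest equivalents are Kernel#eval, or writing the String to a file and handing that path to load. A minimal sketch of both; the method names and code strings are made up for illustration, and only trusted code should be run this way:

```ruby
require 'tempfile'

# Code held in a String (made-up example).
add_code = "def add(a, b)\n  a + b\nend\n"

# Option 1: Kernel#eval runs the String directly. Only do this with trusted code.
eval(add_code)
p add(3, 2)

# Option 2: write the String to a temporary .rb file and load that *path*.
# load (unlike require) runs the file every time it is called.
Tempfile.create(['subtract', '.rb']) do |file|
  file.write("def subtract(a, b)\n  a - b\nend\n")
  file.flush
  load file.path
end
p subtract(3, 2)
```

Either way the argument to load or require is still a file name; only eval accepts the code itself.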
Or, suppose I want to be able to do something like this:
require '`/home/swenson/rubyfunction1.rb`'
I understand this is a roundabout way of requiring gems/Ruby functions, but I'm curious to know if it is at all possible in this manner.
This is possible. There's nothing special about this. It will simply run a file at the path `/home/swenson/rubyfunction1.rb`. That is a slightly unusual path, but there is nothing special about it. It's just a path like any other, with some funny characters in it.
So, to reiterate what I'm trying to do, I want to be able to do something like this:
require '`/home/swenson/rubyfunction1.rb`'
require '`/home/swenson/rubyfunction2.rb`'
or
specvar1 = `/home/swenson/rubyfunction1.rb`
specvar2 = `/home/swenson/rubyfunction2.rb`
require specvar1
require specvar2
Is this possible? Any suggestions I can apply to get it to work?
It's not quite clear what you want here. Those two code snippets are in no way equivalent, they do completely different things!
The first one passes the literal strings '`/home/swenson/rubyfunction1.rb`' and '`/home/swenson/rubyfunction2.rb`' as arguments to require. The second one executes two files named /home/swenson/rubyfunction1.rb and /home/swenson/rubyfunction2.rb using the default system shell (CMD.EXE on Windows, /bin/sh on POSIX), captures their standard output as Strings, and passes those strings to require.
Note that in the first case, the backticks ` are part of the filename, whereas in the second case, they are Ruby syntax for calling the Kernel#` method.
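A short sketch of the difference, assuming no file with that odd backticked name exists and a shell that provides `echo`:

```ruby
# Case 1: backticks inside a single-quoted string are ordinary characters,
# so require searches for a file literally named "`/home/swenson/rubyfunction1.rb`".
message =
  begin
    require '`/home/swenson/rubyfunction1.rb`'
    'loaded'
  rescue LoadError => e
    e.message
  end
puts message # a "cannot load such file" error naming the backticked path

# Case 2: bare backticks are a call to Kernel#`, which runs a shell command
# and returns its standard output as a String.
via_syntax = `echo hello`
via_method = Kernel.send('`', 'echo hello')
p via_syntax == via_method
```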
So, if I understand your question correctly, let's say we have 3 files:
add.rb
#!/usr/bin/env ruby
def add(a, b)
  a + b
end
subtract.rb
#!/usr/bin/env ruby
puts "def subtract(a, b)"
puts " a - b"
puts "end"
example.rb
require './add.rb'
subtract = `./subtract.rb`
puts "I can add: #{add(3, 2)}"
# can't call `subtract` yet: we've executed the file, but haven't `eval`ed its output
eval subtract
puts "I can now subtract #{subtract(3, 2)}"
And the output of running ruby example.rb on the command line is:
$ ruby example.rb
I can add: 5
I can now subtract 1
So, add.rb just defines a method add. When we require that file, it gets loaded, so we can use that method in our code with no problems.
But subtract.rb doesn't define a method; it just outputs some code, so running it on the command line looks like:
$ ./subtract.rb
def subtract(a, b)
a - b
end
So now, in our third file example.rb, we require add.rb and can start using add in our code as is. Then we want to execute subtract.rb (using backticks here) and capture its output. At this point we still can't subtract 2 numbers, because we haven't done anything with the output. Then we use eval to evaluate the output of subtract.rb, which defines a method for us, and then we can subtract the 2 numbers without a problem.
Note that eval is generally frowned upon because it allows arbitrary code to be executed. Never eval untrusted code unless you know how to tame it. In this case, as @JörgWMittag has pointed out in the comments, this code should be trusted; otherwise, you have already executed an untrusted file just to get this code. Be careful with user input, though, as that is not trusted.
I want to insert data at specific positions in a text file, e.g. in line 1 starting at position 10. How can I do that using Ruby?
I also want to put fake data into this file using the faker gem or any other way possible, such as a phone number, name, SSN, etc.
Here's a sample script that takes two arguments and writes a modified copy of the first file's contents to the second file:
require 'faker'
lines = File.readlines(ARGV[0])
lines[0].gsub!(/^(.{0,10})/, '\1' + Faker::Base.numerify('###').to_s)
# block form closes (and flushes) the output file automatically
File.open(ARGV[1], 'w') do |output|
  lines.each do |line|
    output.write(line)
  end
end
If you have an input file that looks like:
12345678901234567890
^^^ fake data
the output might look like:
12345678909451234567890
^^^ fake data
Since I opened the output file after reading the input file, you can pass the same file name as both the first and the second argument. That isn't exactly inserting the string into the file, but it's as close as you'll get.
The key line is:
lines[0].gsub!(/^(.{0,10})/, '\1' + Faker::Base.numerify('###').to_s)
It takes the first line and inserts a random 3-digit number after its first 10 characters (the captured prefix is replaced by itself plus the digits). If there are fewer than 10 characters in the first line, it'll append the random data to the end of the line. If you'd prefer to leave such short lines unchanged, remove the lower bound of the range in the regex so it needs exactly 10 characters:
/^(.{10})/
Or do something else if lines[0].length < 10.
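If the regex feels indirect, String#insert does the same job more explicitly. A sketch under the same assumptions as the script above (two file arguments, insert at index 10); insert_at is a hypothetical helper, and format('%03d', rand(1000)) stands in for Faker so the sketch runs without the gem. With faker installed, values such as Faker::Name.name or Faker::PhoneNumber.phone_number could be passed as the text instead:

```ruby
# Returns a copy of `line` with `text` inserted at character index `pos`.
# Lines shorter than `pos` get `text` appended before their line ending,
# like the {0,10} regex version.
def insert_at(line, pos, text)
  body = line.chomp
  ending = line[body.length..-1] # "\n", "\r\n" or "" (if there was none)
  if body.length >= pos
    body.insert(pos, text) + ending
  else
    body + text + ending
  end
end

if ARGV.length == 2
  lines = File.readlines(ARGV[0])
  # format('%03d', rand(1000)) stands in for Faker::Base.numerify('###').
  lines[0] = insert_at(lines[0], 10, format('%03d', rand(1000))) unless lines.empty?
  File.write(ARGV[1], lines.join)
end

p insert_at("12345678901234567890\n", 10, "945")
```

The length check matters because String#insert raises IndexError when the index is past the end of the string.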
I'm writing an import script that processes a file that has potentially hundreds of thousands of lines (log file). Using a very simple approach (below) took enough time and memory that I felt like it would take out my MBP at any moment, so I killed the process.
#...
File.open(file, 'r') do |f|
  f.each_line do |line|
    # do stuff here to line
  end
end
This file in particular has 642,868 lines:
$ wc -l nginx.log /code/src/myimport
642868 ../nginx.log
Does anyone know of a more efficient (memory/cpu) way to process each line in this file?
UPDATE
The code inside the f.each_line above simply matches a regex against the line. If the match fails, I add the line to a @skipped array. If it passes, I format the matches into a hash (keyed by the "fields" of the match) and append it to a @results array.
# regex built in `def initialize` (not on each line iteration)
@regex = /(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}) - (.{0})- \[([^\]]+?)\] "(GET|POST|PUT|DELETE) ([^\s]+?) (HTTP\/1\.1)" (\d+) (\d+) "-" "(.*)"/
# ... loop lines
match = line.match(@regex)
if match.nil?
  @skipped << line
else
  @results << convert_to_hash(match)
end
I'm completely open to this being an inefficient process. I could make the code inside of convert_to_hash use a precomputed lambda instead of figuring out the computation each time. I guess I just assumed it was the line iteration itself that was the problem, not the per-line code.
I just did a test on a 600,000-line file and it iterated over the file in less than half a second. I'm guessing the slowness is not in the file looping but in the line parsing. Can you paste your parse code too?
This blog post includes several approaches to parsing large log files; maybe that's an inspiration. Also have a look at the file-tail gem.
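On the memory side, one thing worth checking is that the update keeps every parsed line in @results (and every failure in @skipped) until the loop ends, which for 642,868 lines is a lot of hashes. A minimal sketch that streams each entry to a block instead, assuming the import can handle entries one at a time; FIELDS and save are hypothetical names, and convert_to_hash is replaced by a plain zip:

```ruby
require 'stringio'

# The question's regex, unchanged.
LOG_LINE = /(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}) - (.{0})- \[([^\]]+?)\] "(GET|POST|PUT|DELETE) ([^\s]+?) (HTTP\/1\.1)" (\d+) (\d+) "-" "(.*)"/

# Hypothetical names for the nine capture groups, in order.
FIELDS = %i[ip user time method path protocol status bytes agent].freeze

# Reads `io` one line at a time and yields a Hash per matching line,
# so nothing accumulates in memory. Returns the number of skipped lines.
def each_entry(io)
  skipped = 0
  io.each_line do |line|
    match = LOG_LINE.match(line)
    if match
      yield FIELDS.zip(match.captures).to_h
    else
      skipped += 1
    end
  end
  skipped
end

# Small in-memory sample; with a real log you would use something like
#   File.open('nginx.log') { |f| each_entry(f) { |entry| save(entry) } }
# where `save` is whatever the import does with one entry (hypothetical).
sample = StringIO.new([
  %q{1.2.3.4 - - [10/Oct/2020:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 512 "-" "curl/7.68.0"},
  'not a log line'
].join("\n") + "\n")
entries = []
skipped = each_entry(sample) { |entry| entries << entry }
p entries.first[:path], skipped
```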
If you are using bash (or similar) you might be able to optimize like this:
In input.rb (with -n, Ruby supplies the loop itself, so the script only handles the current line, available as $_; a second `while gets` loop inside it would swallow the first line):
# Parse $_ here
then in bash:
ruby -n input.rb nginx.log
The -n flag tells ruby to assume a 'while gets(); ... end' loop around your script, which might let it do something special to optimize.
You might also want to look into a prewritten solution to the problem, as that will likely be faster.
I've written a little Ruby script that requires some user input. I anticipate that users might be a little lazy at some point during the data entry where long entries are required and that they might cut and paste from another document containing newlines.
I've been playing with the HighLine gem and quite like it. I suspect I am just missing something in the docs, but is there a way to get variable-length multiline input?
Edit: The problem is that the newline terminates that input and the characters after the newline end up as the input for the next question.
Here's what the author uses in his example: (from highline-1.5.0/examples)
#!/usr/local/bin/ruby -w
# asking_for_arrays.rb
#
# Created by James Edward Gray II on 2005-07-05.
# Copyright 2005 Gray Productions. All rights reserved.
require "rubygems"
require "highline/import"
require "pp"
grades = ask( "Enter test scores (or a blank line to quit):",
              lambda { |ans| ans =~ /^-?\d+$/ ? Integer(ans) : ans} ) do |q|
  q.gather = ""
end
say("Grades:")
pp grades
General documentation on HighLine::Question#gather (from highline-1.5.0/lib/highline/question.rb)
# When set, the user will be prompted for multiple answers which will
# be collected into an Array or Hash and returned as the final answer.
#
# You can set _gather_ to an Integer to have an Array of exactly that
# many answers collected, or a String/Regexp to match an end input which
# will not be returned in the Array.
#
# Optionally _gather_ can be set to a Hash. In this case, the question
# will be asked once for each key and the answers will be returned in a
# Hash, mapped by key. The <tt>#key</tt> variable is set before each
# question is evaluated, so you can use it in your question.
#
attr_accessor :gather
These seem to be your main options within the library. Anything else, you'd have to do yourself.
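Doing it yourself is not much code. A minimal sketch without HighLine (ask_multiline is a hypothetical helper) that keeps reading until a blank line, the same terminator the gather = "" example uses, and joins what it read into one answer:

```ruby
require 'stringio'

# Reads lines from `input` until a blank line (or end of input) and returns
# them as one String, so pasted text with newlines stays in a single answer.
def ask_multiline(prompt, input: $stdin, output: $stdout)
  output.puts prompt
  lines = []
  while (line = input.gets)
    line = line.chomp
    break if line.empty?
    lines << line
  end
  lines.join("\n")
end

# Simulated paste: two lines, a blank terminator, then the next answer.
fake_stdin = StringIO.new("first line\nsecond line\n\nnext answer\n")
answer = ask_multiline("Description (blank line to finish):", input: fake_stdin, output: StringIO.new)
p answer
```

One trade-off: pasted text that itself contains a blank line ends the answer early, so a different terminator (for example a line containing only END) may suit long entries better. With HighLine itself, the Array returned by the gather example can be joined the same way with join("\n").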
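Wouldn't it be something like this (note that a single-quoted '\r\n' is a literal backslash-r backslash-n, not a line break, so use a regex or double quotes):
input.gsub!(/\r?\n/, '')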