So far, I have this code that reads a file and sorts it using Ruby. But this doesn't sort the numbers correctly and I think it will be inefficient, given that the file can be as big as 200GB and contains a number on each line. Can you suggest what else to do?
File.open("topN.txt", "w") do |file|
File.readlines("N.txt").sort.reverse.each do |line|
file.write(line.chomp<<"\n")
end
end
After everyone's help here, this is how my code looks so far:
begin
puts "What is the file name?"
file = gets.chomp
puts "What is the N number?"
myN = Integer(gets.chomp)
rescue ArgumentError
puts "That's not a number, try again"
retry
end
topN = File.open(file).each_line.max(myN){|a,b| a.to_i <=> b.to_i}
puts topN
Sorting 200GB of data in memory will not be very performant. I would write a little helper class which only remembers the N biggest elements added so far.
class SortedList
  attr_reader :list

  def initialize(size)
    @list = []
    @size = size
  end

  def add(element)
    return if @min && @min > element
    list.push(element)
    reorganize_list
  end

  private

  def reorganize_list
    @list = list.sort.reverse.first(@size)
    # only start rejecting small values once the list is full
    @min = list.last if list.size == @size
  end
end
Initialize an instance with the required N and then just add the values parsed from each line to this instance.
sorted_list = SortedList.new(n)
# File.foreach reads one line at a time instead of loading the whole file
File.foreach("N.txt") do |line|
  sorted_list.add(line.to_i)
end
puts sorted_list.list
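Re-sorting the whole list on every accepted value is simple but does more work than necessary. As a rough sketch of one possible variation (not part of the answer above), the list can be kept in descending order and the insertion point found with Array#bsearch_index. The class name TopN is made up for this example, the values are assumed to be integers, and a StringIO stands in for the large file.

```ruby
require 'stringio'

# Hypothetical variant of the helper above: keeps at most `size` values,
# largest first, without re-sorting on every insertion.
class TopN
  attr_reader :list

  def initialize(size)
    raise ArgumentError, "size must be positive" unless size.positive?
    @size = size
    @list = []
  end

  def add(element)
    # Skip values that cannot make it into an already full list.
    return if @list.size == @size && element <= @list.last

    # First position whose value is smaller than the new element
    # (the list is in descending order), or the end of the list.
    idx = @list.bsearch_index { |x| x < element } || @list.size
    @list.insert(idx, element)
    @list.pop if @list.size > @size
  end
end

# StringIO stands in for a large file; File.foreach would stream the same way.
input = StringIO.new("117\n106\n143\n147\n63\n118\n146\n93\n")
top = TopN.new(3)
input.each_line { |line| top.add(line.to_i) }
p top.list
```

With the sample input this leaves the three largest values, 147, 146 and 143, in the list.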
Suppose
str = File.read(in_filename)
#=> "117\n106\n143\n147\n63\n118\n146\n93\n"
You could convert that string to an enumerator that enumerates lines, use Enumerable#sort_by to sort those lines in descending order, join the resulting lines (that end in newlines) to form a string that can be written to file:
str.each_line.sort_by { |line| -line.to_i }.join
#=> "147\n146\n143\n118\n117\n106\n93\n63\n"
Another way is to convert the string to an array of integers, sort the array using Array#sort, reverse the resulting array and then join the elements of the array back into a string that can be written to file:
str.each_line.map(&:to_i).sort.reverse.join("\n") << "\n"
#=> "147\n146\n143\n118\n117\n106\n93\n63\n"
Let's do a quick benchmark.
require 'benchmark/ips'
(str = 1_000_000.times.map { rand(10_000) }.join("\n") << "\n").size
Benchmark.ips do |x|
x.report("sort_by") { str.each_line.sort_by { |line| -line.to_i }.join }
x.report("sort") { str.each_line.map(&:to_i).sort.reverse.join("\n") << "\n" }
x.compare!
end
Comparison:
sort: 0.4 i/s
sort_by: 0.3 i/s - 1.30x slower
The mighty sort wins again!
You left this comment on your question:
"Write a program, topN, that given a number N and an arbitrarily large file that contains individual numbers on each line (e.g. 200Gb file), will output the largest N numbers, highest first."
That problem seems to me somewhat different from the one described in the question, and it is also a more interesting problem. I have addressed that problem in this answer.
Code
def topN(fname, n, m=n)
raise ArgumentError, "m cannot be smaller than n" if m < n
f = File.open(fname)
best = Array.new(n)
n.times do |i|
break best.replace(best[0,i]) if f.eof?
best[i] = f.readline.to_i
end
best.sort!.reverse!
return best if f.eof?
new_best = Array.new(n)
cand = Array.new(m)
until f.eof?
rd(f, cand)
merge_arrays(best, new_best, cand)
end
f.close
best
end
def rd(f, cand)
cand.size.times { |i| cand[i] = (f.eof? ? -Float::INFINITY : f.readline.to_i) }
cand.sort!.reverse!
end
def merge_arrays(best, new_best, cand)
cand_largest = cand.first
best_idx = best.bsearch_index { |n| cand_largest > n }
return if best_idx.nil?
bi = best_idx
cand_idx = 0
nbr_to_compare = best.size-best_idx
nbr_to_compare.times do |i|
if cand[cand_idx] > best[bi]
new_best[i] = cand[cand_idx]
cand_idx += 1
else
new_best[i] = best[bi]
bi += 1
end
end
best[best_idx..-1] = new_best[0, nbr_to_compare]
end
Examples
Let's create a file with 10 million representations of integers, one per line.
require 'time'
FName = 'test'
(s = 10_000_000.times.with_object('') { |_,s| s << rand(100_000_000).to_s << "\n" }).size
s[0,27]
#=> "86752031\n84524374\n29347072\n"
File.write(FName, s)
#=> 88_888_701
Next, create a simple method to invoke topN with different arguments and to also show execution times.
def try_one(n, m=n)
t = Time.now
a = topN(FName, n, m)
puts "#{(Time.new-t).round(2)} seconds"
puts "top 5: #{a.first(5)}"
puts "bot 5: #{a[n-5..n-1]}"
end
In testing I found that setting m less than n was never desirable in terms of computational time. Requiring that m >= n allowed a small simplification to the code and a small efficiency improvement. I therefore made m >= n a requirement.
try_one 100, 100
9.44 seconds
top 5: [99999993, 99999993, 99999991, 99999971, 99999964]
bot 5: [99999136, 99999127, 99999125, 99999109, 99999078]
try_one 100, 1000
9.53 seconds
top 5: [99999993, 99999993, 99999991, 99999971, 99999964]
bot 5: [99999136, 99999127, 99999125, 99999109, 99999078]
try_one 100, 10_000
9.95 seconds
top 5: [99999993, 99999993, 99999991, 99999971, 99999964]
bot 5: [99999136, 99999127, 99999125, 99999109, 99999078]
Here I've tested producing the 100 largest values with different values of m, the number of lines read from the file at a time. As seen, the execution time is insensitive to m. As expected, the largest 5 values and the smallest 5 values (of the 100 returned) are the same in all cases.
try_one 1_000
9.31 seconds
top 5: [99999993, 99999993, 99999991, 99999971, 99999964]
bot 5: [99990425, 99990423, 99990415, 99990406, 99990399]
try_one 1000, 10_000
9.24 seconds
The time required to return the 1,000 largest values is, in fact, slightly less than the times for returning the largest 100. I expect that's not reproducible. The top 5 are of course the same as when returning the largest 100 values. I therefore will not display that line below. The smallest 5 values of the 1000 returned are of course smaller than when the largest 100 values are returned.
try_one 10_000
12.15 seconds
bot 5: [99898951, 99898950, 99898946, 99898932, 99898922]
try_one 100_000
13.2 seconds
bot 5: [98995266, 98995259, 98995258, 98995254, 98995252]
try_one 1_000_000
14.34 seconds
bot 5: [89999305, 89999302, 89999301, 89999301, 89999287]
Explanation
Notice that I reuse three arrays: best, cand and new_best. Specifically, I replace the contents of these arrays many times rather than continually creating new (potentially very large) arrays, leaving orphaned arrays to be garbage-collected. A little testing showed this approach improved performance.
We can create a small example and then step through the calculations.
fname = 'temp'
File.write(fname, 20.times.map { rand(100) }.join("\n") << "\n")
#=> 58
This file contains representations of integers in the following array.
arr = File.read(fname).lines.map(&:to_i)
#=> [9, 66, 80, 64, 67, 67, 89, 10, 62, 94, 41, 16, 0, 22, 68, 72, 41, 64, 87, 24]
Sorted, this is:
arr.sort_by! { |n| -n }
#=> [94, 89, 87, 80, 72, 68, 67, 67, 66, 64, 64, 62, 41, 41, 24, 22, 16, 10, 9, 0]
Let's assume we want the 5 largest values.
arr[0,5]
#=> [94, 89, 87, 80, 72]
First, set the two parameters: n, the number of largest values to return, and m, the number of lines to read from the file at a time.
n = 5
m = 5
The calculations follow.
m < n
#=> false, so do not raise ArgumentError
f = File.open(fname)
#=> #<File:temp>
best = Array.new(n)
#=> [nil, nil, nil, nil, nil]
n.times { |i| f.eof? ? (return best.replace(best[0,i])) : best[i] = f.readline.to_i }
best
#=> [9, 66, 80, 64, 67]
best.sort!.reverse!
#=> [80, 67, 66, 64, 9]
f.eof?
#=> false, so do not return
new_best = Array.new(n)
#=> [nil, nil, nil, nil, nil]
cand = Array.new(m)
#=> [nil, nil, nil, nil, nil]
puts "best=#{best}".rjust(52)
until f.eof?
rd(f, cand)
merge_arrays(best, new_best, cand)
puts "cand=#{cand}, best=#{best}"
end
f.close
best
#=> [94, 89, 87, 80, 72]
The following is displayed.
best=[80, 67, 66, 64, 9]
cand=[94, 89, 67, 62, 10], best=[94, 89, 80, 67, 67]
cand=[68, 41, 22, 16, 0], best=[94, 89, 80, 68, 67]
cand=[87, 72, 64, 41, 24], best=[94, 89, 87, 80, 72]
Enumerable#max takes an argument which specifies how many elements will be returned, and a block which specifies how elements are compared:
N = 5
p File.open("test.txt").each_line.max(N){|a,b| a.to_i <=> b.to_i}
This does not read the entire file in memory; the file is read line by line.
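A small sketch of the same idea with Enumerable#max_by, which takes a count in the same way. The StringIO and its sample lines are stand-ins for the real file, used here only so the example runs on its own. Note that the results are the original line strings, so they still need converting if integers are wanted.

```ruby
require 'stringio'

# StringIO stands in for File.open("test.txt"); both respond to each_line.
io = StringIO.new("117\n106\n143\n147\n63\n118\n146\n93\n")

# max_by(n) ranks lines by their integer value and returns the n largest,
# largest first. The results are still the original line strings.
top = io.each_line.max_by(3) { |line| line.to_i }
numbers = top.map(&:to_i)
p numbers
```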
I'm having problems with this practice question.
Write a Ruby script to display the number of Armstrong numbers in an array of numbers.
An Armstrong number is a number in which the sum of [the] cubes of the digits of the number is [the] same as the number. For instance, 153, 370 and 371 are Armstrong numbers. Example:
153 = 1^3 + 5^3 + 3^3 = 1 + 125 + 27 = 153
Sample input
numbers = [123, 124, 153, 370, 234, 23, 45]
Then the output will be
There are 2 Armstrong numbers in the list.
My code is the following:
def get
number = [123, 124, 153, 370, 234, 23, 45]
s = number.count{}
new_num = number
sum = 0
while new_num > 0
sum = sum + (new_num % 10) * (new_num % 10) * (new_num % 10)
new_num = new_num / 10
number.count(new_num)
end
if sum == number
puts "There are #{s} Armstrong"
end
end
It gives no output, and I'm not sure why.
As already suggested you can utilise digits. Used with reduce you can write something like this:
number.select { |n| n.digits.reduce(0) { |sum, d| sum + d**3 } == n }
#=> [153, 370]
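Since the exercise asks for a count rather than the numbers themselves, the same test can be passed to count. This is only a sketch: it assumes non-negative integers (Integer#digits raises an error for negative numbers) and uses the sum-of-cubes definition from the question.

```ruby
numbers = [123, 124, 153, 370, 234, 23, 45]

# Count the numbers equal to the sum of the cubes of their digits
# (the definition given in the question).
count = numbers.count { |n| n.digits.sum { |d| d**3 } == n }
puts "There are #{count} Armstrong numbers in the list."
```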
If the method must be used frequently, one can save time by defining a constant holding a set (rather than an array, for faster lookup) of the first so-many Armstrong numbers. For example,
require 'set'
FIRST_ARMSTRONG_NUMBERS = [1, 2, 3, 4, 5, 6, 7, 8, 9, 153, 370, 371, 407, 1634,
8208, 9474, 54748, 92727, 93084, 548834, 1741725, 4210818, 9800817, 9926315,
24678050, 24678051, 88593477, 146511208, 472335975, 534494836, 912985153,
4679307774, 32164049650, 32164049651].to_set
#=> #<Set: {1, 2, 3, 4, 5, 6, 7, 8, 9, 153, 370, 371, 407, 1634, 8208,
# 9474, 54748, 92727, 93084, 548834, 1741725, 4210818,
# 9800817, 9926315, 24678050, 24678051, 88593477, 146511208,
# 472335975, 534494836, 912985153, 4679307774, 32164049650,
# 32164049651}>
MAX_FIRST_ARMSTRONG_NUMBERS = FIRST_ARMSTRONG_NUMBERS.max
#=> 32164049651
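def count_armstrong_numbers(arr)
  arr.count do |n|
    if n <= MAX_FIRST_ARMSTRONG_NUMBERS
      FIRST_ARMSTRONG_NUMBERS.include?(n)
    else
      # the table above uses the general definition (each digit raised to
      # the number of digits), so the fallback must use it as well
      digits = n.digits
      digits.sum { |d| d**digits.size } == n
    end
  end
end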
Another solution, using a recent Ruby version (Integer#digits requires Ruby 2.4 or later):
def armstrong?(number)
number.digits.sum { |x| x**number.digits.size } == number
end
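Note that this version raises each digit to the number of digits, which matches the question's sum-of-cubes definition only for three-digit numbers. The sketch below shows the same check with the digit list computed once; the method name narcissistic? is an illustrative choice and the sample array is made up.

```ruby
# Same check as above, but the digit list is computed only once.
# The name narcissistic? is an illustrative choice, not from the answer.
def narcissistic?(number)
  digits = number.digits
  digits.sum { |d| d**digits.size } == number
end

p narcissistic?(153)   # three digits, so cubes are used
p narcissistic?(1634)  # four digits, so fourth powers are used
p [123, 153, 370, 1634].count { |n| narcissistic?(n) }
```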
Could someone tell me how I can achieve replacing an element in this 2D array? I tried each, include and replace and wasn't able to figure out where I am going wrong. Thank you in advance for any help.
class Lotto
  def initialize
    @lotto_slip = Array.new(5) {Array(6.times.map{rand(1..60)})}
  end

  def current_pick
    @number = rand(1..60).to_s
    puts "The number is #{@number}."
  end

  def has_number
    #prints out initial slip
    @lotto_slip.each {|x| p x}
    #Prints slip with an "X" replacing number if is on slip
    #Ex: @number equals 4th number on slip --> 1, 2, 3, X, 5, 6
    @lotto_slip.each do |z|
      if z.include?(@number)
        z = "X"
        p @lotto_slip
      else
        z = z
        p @lotto_slip
      end
    end
  end
end
test = Lotto.new
test.current_pick
test.has_number
Let me know if this works for you (I reduced the range from 1..60 to 1..10 to make it easier to test):
class Lotto
  def initialize
    @lotto_slip = Array.new(5) {Array(6.times.map{rand(1..10)})}
  end

  def current_pick
    @number = rand(1..10)
    puts "The number is #{@number}."
  end

  def has_number
    #prints out initial slip
    @lotto_slip.each {|x| p x}
    #Prints slip with an "X" replacing number if is on slip
    #Ex: @number equals 4th number on slip --> 1, 2, 3, X, 5, 6
    @lotto_slip.each do |z|
      if z.include?(@number)
        p "#{@number} included in #{z}"
        z.map! { |x| x == @number ? 'X' : x}
      end
    end
    @lotto_slip
  end
end
test = Lotto.new
test.current_pick
p test.has_number
The problems I saw with your code are:
You don't need the to_s on the line @number = rand(1..60).to_s; otherwise you are comparing the integers in the arrays with a string, and they will never be equal.
Assigning to the block variable (z = "X") only rebinds z; it does not change the array stored in @lotto_slip. You need to modify the inner array itself, which is why I replaced that code with z.map! { |x| x == @number ? 'X' : x}, which replaces the matching elements in place.
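The difference between rebinding the block variable and mutating the array it refers to can be seen in isolation. This is a minimal sketch with a made-up two-row slip and a fixed number, not the Lotto class itself.

```ruby
slip = [[1, 2, 3], [4, 5, 6]]

# Assigning to the block variable only rebinds the local name `row`;
# the arrays stored in `slip` are untouched.
slip.each { |row| row = "X" if row.include?(5) }
after_rebinding = slip.map(&:dup) # snapshot for comparison
p after_rebinding

# map! replaces the elements of the inner array in place,
# so the change is visible through `slip`.
slip.each { |row| row.map! { |x| x == 5 ? "X" : x } }
p slip
```

The first print shows the slip unchanged; only the second one contains the "X".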
It is not necessary to iterate with each; use map, which returns a new nested array:
@lotto_slip = Array.new(5) {Array(6.times.map{rand(1..60)})}
#=> [[25, 22, 10, 10, 57, 17], [37, 4, 8, 52, 55, 7], [44, 30, 58, 58, 50, 19], [49, 49, 24, 31, 26, 28], [24, 18, 39, 27, 8, 54]]
@number = 24
@lotto_slip.map{|row| row.map{|x| x == @number ? 'X' : x}}
#=> [[25, 22, 10, 10, 57, 17], [37, 4, 8, 52, 55, 7], [44, 30, 58, 58, 50, 19], [49, 49, "X", 31, 26, 28], ["X", 18, 39, 27, 8, 54]]
I want to increase my step value in each loop iteration, but my solution is not working.
n=1
(0..100).step(n) do |x|
puts x
n+=1
end
Is there any way to change n, or must I use a while loop or something else?
I am assuming you are trying to print 1, 3, 6, 10, 15, 21, etc.
step documentation says:
Iterates over range, passing each nth element to the block. If the range
contains numbers, n is added for each iteration.
So what you are trying to do can't be done with step. A while or traditional for loop should do the trick.
Here's a custom Enumerator based on Aurélien Bottazini's answer:
tri = Enumerator.new do |y|
n = 0
step = 1
loop do
y << n
n = n + step
step += 1
end
end
tri.take(10)
#=> [0, 1, 3, 6, 10, 15, 21, 28, 36, 45]
tri.take_while { |i| i < 100 }
#=> [0, 1, 3, 6, 10, 15, 21, 28, 36, 45, 55, 66, 78, 91]
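If only the values are needed, the same sequence can also be produced from the closed form for triangular numbers, k * (k + 1) / 2. This is offered as an alternative sketch, not as what the answer above does; the lambda name triangular is made up for the example.

```ruby
# k-th triangular number: 0 + 1 + 2 + ... + k = k * (k + 1) / 2
triangular = ->(k) { k * (k + 1) / 2 }

first_ten = (0...10).map(&triangular)
p first_ten

# A lazy enumerator over an endless range stops as soon as a value reaches 100.
below_100 = (0..Float::INFINITY).lazy.map(&triangular).take_while { |t| t < 100 }.to_a
p below_100
```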
I think the best way to do what you want is to use a while loop.
step = 1
last = 100
i = 0
while i < last
  puts i
  # advance by the current step first, then grow the step
  i += step
  step += 1
end
It might be possible to do it with step, but that requires you to fully understand how it works, and probably some hacky code to make it work. But why do that when you already have a simple solution available?
I would do it with loop and a break
n = 0
step = 1
loop do
puts n
n = n + step
step += 1
break if n > 100
end
What I think you are asking, now that I look at it again, is whether you can change the argument given to step from inside the block. This is not possible: the argument is evaluated once, when step is called, before the block runs for the first time, so later changes to n have no effect on the iteration.
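A short experiment makes this visible. The range 0..10 and the array seen are chosen only for the demonstration.

```ruby
n = 1
seen = []

# The value of n (1) is passed to step when the call is made;
# changing n inside the block does not affect the step size.
(0..10).step(n) do |x|
  seen << x
  n += 1
end

p seen # every integer from 0 to 10
p n    # n itself did change: 1 plus 11 iterations
```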
I have code to get a list of prime numbers:
def primes_numbers num
primes = [2]
3.step(Math.sqrt(num) + 1, 2) do |i|
is_prime = true
primes.each do |p| # (here)
if (p > Math.sqrt(i) + 1)
break
end
if (i % p == 0)
is_prime = false
break
end
end
if is_prime
primes << i
end
end
primes
end
Is it possible to rewrite the code using Array methods (select, collect and so on...)?
Something like:
s = (3..n)
s.select { |x| x % 2 == 1}.select{ |x| ..... }
The problem is that I need to iterate through the result array (comment 'here') in the select method.
Ruby 1.9 has a very nice Prime class:
http://www.ruby-doc.org/core-1.9/classes/Prime.html
But I'm assuming you don't care about any standard classes, but want to see some code, so here we go:
>> n = 100 #=> 100
>> s = (2..n) #=> 2..100
>> s.select { |num| (2..Math.sqrt(num)).none? { |d| (num % d).zero? }}
#=> [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97]
Note: I wrote it this way because you wanted Enumerable methods. For efficiency's sake, you probably want to read up on prime-finding methods.
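The question also asked how to consult the primes found so far while building the result. One way to sketch that with Enumerable methods is each_with_object, which passes the growing result array into the block. The method name primes_up_to is made up for this example.

```ruby
def primes_up_to(n)
  return [] if n < 2

  # Only odd candidates; `primes` is the result array built so far.
  3.step(n, 2).each_with_object([2]) do |i, primes|
    # Only primes q with q * q <= i can be the smallest factor of i.
    small = primes.take_while { |q| q * q <= i }
    primes << i if small.none? { |q| (i % q).zero? }
  end
end

p primes_up_to(30)
```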
You can list prime numbers like this as well.
Example Array: ar = (2..30).to_a
ar.select{ |n| (2..n).count{ |d| (n % d).zero? } == 1 }
Features:
Check whether a number is prime, get a number's factors and get a list of prime numbers. The code is also easy to translate into any other language.
Ruby has its own Prime class, so you normally don't need to write this yourself,
but if you want to do it on your own without using the Ruby standard library:
n = 100
def prime_numbers(n)
prime_numbers = []
(1..n).each do |number|
prime_numbers << number if is_prime(number)
end
prime_numbers
end
def is_prime(n)
  # a prime has exactly two factors: 1 and itself
  factors(n).count == 2
end
# find factors of a number
def factors(n)
factors = []
(1..n).each {|d| factors << d if (n%d).zero?}
factors
end
Note: There are three functions involved. I did this deliberately for beginners, so that they can easily understand it.
Optimization Guide:
1) To save iterations, the loop in factors can run from 2 to n-1; a prime then has no factors in that range, so is_prime must check for an empty result instead.
2) use Ruby core functions and enjoy things :)
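Going one step further than the first tip, a divisor search only needs to run up to the square root of n. The following is a sketch under that assumption; the method name prime_by_trial_division? is made up here, and Integer.sqrt requires Ruby 2.5 or later.

```ruby
# A number is prime if it is at least 2 and has no divisor d with
# 2 <= d <= sqrt(n).
def prime_by_trial_division?(n)
  return false if n < 2

  (2..Integer.sqrt(n)).none? { |d| (n % d).zero? }
end

primes = (1..30).select { |n| prime_by_trial_division?(n) }
p primes
```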