How to remove some items from a big array - ruby

I want to check the files that don't exist, so I write a code as follows:
MAX_ID = 43148178
def extract_ids
done = Dir['res/*.html'].map {|name| name[/\d+/].to_i}
all = (1..MAX_ID).to_a
all.delete_if { |i| done.include?(i) }
all.shuffle
end
ls res | wc -l returns 35854.
I find that this is slow. How do I do this effectively?

If 'done' is an array of items you wish to remove from the 'all' array, you can simply do this:
all = [1,2,3,4,5,6,7,8,9,10]
done = [1,3,5]
all - done
# => [2, 4, 6, 7, 8, 9, 10]
Or, as you want to change the all array
all -= done

Related

Reassign entire array to the same reference

I've searched extensively but sadly couldn't find a solution to this surely often-asked question.
In Perl I can reassign an entire array within a function and have my changes reflected outside the function:
#!/usr/bin/perl -w
use v5.20;
use Data::Dumper;
sub foo {
my ($ref) = #_;
#$ref = (3, 4, 5);
}
my $ref = [1, 2];
foo($ref);
say Dumper $ref; # prints [3, 4, 5]
Now I'm trying to learn Ruby and have written a function where I'd like to change an array items in-place by filtering out elements matching a condition and returning the removed items:
def filterItems(items)
removed, items = items.partition { ... }
After running the function, items returns to its state before calling the function. How should I approach this please?
I'd like to change an array items in-place by filtering out elements matching a condition and returning the removed items [...] How should I approach this please?
You could replace the array content within your method:
def filter_items(items)
removed, kept = items.partition { |i| i.odd? }
items.replace(kept)
removed
end
ary = [1, 2, 3, 4, 5]
filter_items(ary)
#=> [1, 3, 5]
ary
#=> [2, 4]
I would search for pass by value/reference in ruby. Here is one I found first https://mixandgo.com/learn/is-ruby-pass-by-reference-or-pass-by-value.
You pass reference value of items to the function, not the reference to items. Variable items is defined out of method scope and always refers to same value, unless you reassign it in the variable scope.
Also filterItems is not ruby style, see https://rubystyle.guide/
TL;DR
To access or modify an outer variable within a block, declare the variable outside the block. To access a variable outside of a method, store it in an instance or class variable. There's a lot more to it than that, but this covers the use case in your original post.
Explanation and Examples
In Ruby, you have scope gates and closures. In particular, methods and blocks represent scope gates, but there are certainly ways (both routine and meta) for accessing variables outside of your local scope.
In a class, this is usually handled by instance variables. So, as a simple example of String#parition (because it's easier to explain than Enumerable#partition on an Array):
def filter items, separator
head, sep, tail = items.partition separator
#items = tail
end
filter "foobarbaz", "bar"
#=> "baz"
#items
#=> "baz"
Inside a class or within irb, this will modify whatever's passed and then assign it to the instance variable outside the method.
Partitioning Arrays Instead of Strings
If you really don't want to pass things as arguments, or if #items should be an Array, then you can certainly do that too. However, Arrays behave differently, so I'm not sure what you really expect Array#partition (which is inherited from Enumerable) to yield. This works, using Enumerable#slice_after:
class Filter
def initialize
#items = []
end
def filter_array items, separator
#items = [3,4,5].slice_after { |i| i == separator }.to_a.pop
end
end
f = Filter.new
f.filter_array [3, 4, 5], 4
#=> [5]
Look into the Array class for any method which mutates the object, for example all the method with a bang or methods that insert elements.
Here is an Array#push:
ary = [1,2,3,4,5]
def foo(ary)
ary.push *[6, 7]
end
foo(ary)
ary
#=> [1, 2, 3, 4, 5, 6, 7]
Here is an Array#insert:
ary = [1,2,3,4,5]
def baz(ary)
ary.insert(2, 10, 20)
end
baz(ary)
ary
#=> [1, 2, 10, 20, 3, 4, 5]
Here is an example with a bang Array#reject!:
ary = [1,2,3,4,5]
def zoo(ary)
ary.reject!(&:even?)
end
zoo(ary)
ary
#=> [1, 3, 5]
Another with a bang Array#map!:
ary = [1,2,3,4,5]
def bar(ary)
ary.map! { |e| e**2 }
end
bar(ary)
ary
#=> [1, 4, 9, 16, 25]

How to extract multiple columns from a csv file with ruby?

Right now I can extract 1 column (column 6) from the csv file. How could I edit the script below to extract more than 1 column? Let's say I also want to extract column 9 and 10 as well as 6. I would want the output to be such that column 6 ends up in column 1 of the output file, 9 in the 2nd column of the output file, and column 10 in the 3rd column of the output file.
ruby -rcsv -e 'CSV.foreach(ARGV.shift) {|row| puts row [5]}' input.csv &> output.csv
Since row is an array, your question boils down to how to pick certain elements from an array; this is not related to CSV.
You can use values_at:
row.values_at(5,6,9,10)
returns the fields 5,6,9 and 10.
If you want to present these picked fields in a different order, it is however easier to map each index explicitly:
output_row = Array.new(row.size) # Or row.dup, depending on your needs
output_row[1] = row[6]
# Or, if you have used row.dup and want to swap the rows:
output_row[1],output_row[6] = row[6],row[1]
# and so on
out_csv.puts(output_row)
This assumes that you have defined before
out_csv=CSV.new(STDOUT)
since you want to have your new CSV be created on standard output.
Let's first create a (header-less) CSV file:
enum = 1.step
FNameIn = 't_in.csv'
CSV.open(FNameIn, "wb") { |csv| 3.times { csv << 5.times.map { enum.next } } }
#=> 3
I've assumed the file contains string representations of integers.
The file contains the three lines:
File.read(FNameIn).each_line { |line| p line }
"1,2,3,4,5\n"
"6,7,8,9,10\n"
"11,12,13,14,15\n"
Now let's extract the columns at indices 1 and 3. These columns are to be written to the output file in that order.
cols = [1, 3]
Now write to the CSV output file.
arr = CSV.read(FNameIn, converters: :integer).
map { |row| row.values_at(*cols) }
#=> [[2, 4], [7, 9], [12, 14]]
FNameOut = 't_out.csv'
CSV.open(FNameOut, 'wb') { |csv| arr.each { |row| csv << row } }
We have written three lines:
File.read(FNameOut).each_line { |line| p line }
"2,4\n"
"7,9\n"
"12,14\n"
which we can read back into an array:
CSV.read(FNameOut, converters: :integer)
#=> [[2, 4], [7, 9], [12, 14]]
A straightforward transformation of these operations is required to perform these operations from the command line.

How to get reference to object you're calling in method?

I am trying to create a method for objects that I create. In this case it's an extension of the Array class. The method below, my_uniq, works. When I call puts [1, 1, 2, 6, 8, 8, 9].my_uniq.to_s it outputs to [1, 2, 6, 8].
In a similar manner, in median, I'm trying to get the reference of the object itself and then manipulate that data. So far I can only think of using the map function to assign a variable arr as an array to manipulate that data from.
Is there any method that you can call that gets the reference to what you're trying to manipulate? Example pseudo-code that I could replace arr = map {|n| n} with something like: arr = self.
class Array
def my_uniq
hash = {}
each do |num|
hash[num] = 0;
end
hash.keys
end
end
class Array
def median
arr = map {|n| n}
puts arr.to_s
end
end
Thanks in advance!
dup
class Array
def new_self
dup
end
def plus_one
arr = dup
arr.map! { |i| i + 1 }
end
def plus_one!
arr = self
arr.map! { |i| i + 1 }
end
end
array = [1, 3, 5]
array.new_self # => [1, 3, 5]
array.plus_one # => [2, 4, 6]
array # => [1, 3, 5]
array.plus_one! # => [2, 4, 6]
array # => [2, 4, 6]
dup makes a copy of the object, making it a safer choice if you need to manipulate data without mutating the original object. You could use self i.e. arr = self, but anything you do that changes arr will also change the value of self. It's a good idea to just use dup.
If you do want to manipulate and change the original object, then you can use self instead of dup, but you should make it a "bang" ! method. It is a convention in ruby to put a bang ! at the end of a method name if it mutates the receiving object. This is particularly important if other developers might use your code. Most Ruby developers would be very surprised if a non-bang method mutated the receiving object.
class Array
def median
arr = self
puts arr.to_s
end
end
[1,2,3].median # => [1,2,3]

How to change more than one string variables to integer and minus 1?

I want to change 8 strings to integer type and minus 1. I wrote the code as:
foo1, foo2, too3 ... = foo1.to_i - 1, foo2.to_i - 1, foo3.to_i -1, ...
But I think it's too complex. Are there some better ways to achieve this goal?
[:foo1, :foo2, ... etc. ...].each { |foo| eval "#{foo} = #{foo}.to_i - 1" }
Although it's a bad idea if you've decided to do it.
Putting them in an Array would be the simplest thing.
%w{ 1 2 3 4 5 6 7 8 }.map!(&:to_i).map!(&:pred)
=> [0, 1, 2, 3, 4, 5, 6, 7]
You should not use this, eval is dangerous and dynamic. I would recommend moving your values into a hash where you can control the keys. If you want to do literally what you asked:
(1..2).map {|n | eval("foo#{n}").to_i - 1}
Example:
> foo1 = 2
=> 2
> foo2 = 3
=> 3
> (1..2).map {|n | eval("foo#{n}").to_i - 1}
=> [1, 2]
... non-terrifying way to store/process these values:
> hash = { :foo1 => 2, :foo2 => 3 }
=> {:foo1=>2, :foo2=>3}
hash.keys.inject({}) { |h, key| h[key] = hash[key].to_i - 1; h }
=> {:foo1=>1, :foo2=>2}
When you are working with 8 variables and need to do the same operation on them it usually means that they are linked somehow and can be grouped, for example in a hash:
data = {:foo1 => "1", :foo2 => "2", ...}
data2 = Hash[data.map { |key, value| [key, value.to_i - 1] }]
Note that instead of updating inplace values I create a new object, a functional approach is usually more clear.
Based on your comment to #Winfields solution this might be enough:
foo1 = "123"
foo2 = "222"
too3 = "0"
def one_less_int(*args) # the splat sign (#) puts all arguments in an array
args.map{|str| str.to_i-1}
end
p one_less_int(foo1, foo2, too3) #=> [122, 221, -1]
But putting everything in an array beforehand, as others are suggesting, is more explicit.

How to act differently on the first iteration in a Ruby loop?

I always use a counter to check for the first item (i==0) in a loop:
i = 0
my_array.each do |item|
if i==0
# do something with the first item
end
# common stuff
i += 1
end
Is there a more elegant way to do this (perhaps a method)?
You can do this:
my_array.each_with_index do |item, index|
if index == 0
# do something with the first item
end
# common stuff
end
Try it on ideone.
Using each_with_index, as others have described, would work fine, but for the sake of variety here is another approach.
If you want to do something specific for the first element only and something general for all elements including the first, you could do:
# do something with my_array[0] or my_array.first
my_array.each do |e|
# do the same general thing to all elements
end
But if you want to not do the general thing with the first element you could do:
# do something with my_array[0] or my_array.first
my_array.drop(1).each do |e|
# do the same general thing to all elements except the first
end
Arrays have an "each_with_index" method which is handy for this situation:
my_array.each_with_index do |item, i|
item.do_something if i==0
#common stuff
end
What fits best is depending on the situation.
Another option (if you know your array is not empty):
# treat the first element (my_array.first)
my_array.each do | item |
# do the common_stuff
end
each_with_index from Enumerable (Enumerable is already mixed in with Array, so you can call it on an array without any trouble):
irb(main):001:0> nums = (1..10).to_a
=> [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
irb(main):003:0> nums.each_with_index do |num, idx|
irb(main):004:1* if idx == 0
irb(main):005:2> puts "At index #{idx}, the number is #{num}."
irb(main):006:2> end
irb(main):007:1> end
At index 0, the number is 1.
=> [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
If you don't need the array afterwards:
ar = %w(reversed hello world)
puts ar.shift.upcase
ar.each{|item| puts item.reverse}
#=>REVERSED
#=>olleh
#=>dlrow
Ruby's Enumerable#inject provides an argument that can be used for doing something differently on the first iteration of a loop:
> l=[1,2,3,4]
=> [1, 2, 3, 4]
> l.inject(0) {|sum, elem| sum+elem}
=> 10
The argument is not strictly necessary for common things like sums and products:
> l.inject {|sum, elem| sum+elem}
=> 10
But when you want to do something different on the first iteration, that argument might be useful to you:
> puts fruits.inject("I like to eat: ") {|acc, elem| acc << elem << " "}
I like to eat: apples pears peaches plums oranges
=> nil
Here's a solution that doesn't need to be in an immediately enclosing loop and avoids the redundancy of specifying a status placeholder more than once unless you really need to.
do_this if ($first_time_only ||= [true]).shift
Its scope matches the holder: $first_time_only will be globally once; #first_time_only will be once for the instance, and first_time_only will be once for the current scope.
If you want the first several times, etc, you can easily put [1,2,3] if you need to distinguish which of the first iterations you're in, or even something fancy [1, false, 3, 4] if you need something weird.

Resources