Horizontal concat in Daru::DataFrame? - ruby

I know that by Daru::DataFrame#concat one can concatenate dataframes, appending the argument df to the bottom of the caller df.
Now I want to achieve what pd.concat([df, other], axis=1) does in pandas. In other words, given two dataframes with the same index, I want to append one df to the right of the other, so that the resulting df has the same index but the vectors of both.
Is there a method for this? Or do I need to iterate and add each column in a loop?

Is this what you are looking for possibly?
data_frame = data_frame.join(jobs_data_frame, how: :left, on: [:user_id])

You can use the add_vector method.
For example:
2.6.3 :001 > require 'daru'
2.6.3 :007 > df = Daru::DataFrame.new([[00,01,02], [10,11,12],[20,21,22]], order: ["a", "b", "c"])
=> #<Daru::DataFrame(3x3)>
a b c
0 0 10 20
1 1 11 21
2 2 12 22
2.6.3 :008 > df.add_vector("d", [30, 31, 32])
=> [30, 31, 32]
2.6.3 :009 > df
=> #<Daru::DataFrame(3x4)>
a b c d
0 0 10 20 30
1 1 11 21 31
2 2 12 22 32
Note that you'll have to add each vector separately.

Related

I keep getting an error that I don't understand

I am trying to create a function that takes a string as its parameter. It's supposed to determine the highest and lowest numeric values in the string and return them unchanged.
Here's my code:
def high_and_low(numbers)
numbers.split
numbers.each {|x| x.to_i}
return numbers.max().to_s, numbers.min().to_s
end
Here's the error:
main.rb:5:in `high_and_low': undefined method `each' for "4 5 29 54 4 0 -214 542 -64 1 -3 6 -6":String (NoMethodError)
from main.rb:8:in `<main>'
numbers.split returns a new array but does not modify numbers, so numbers is still a String when each is called on it.
Replace numbers.split with numbers = numbers.split.
You will also need to change numbers.each { |x| x.to_i } to numbers.map!(&:to_i). Otherwise the converted integers are not saved anywhere.
By the way, the parentheses after max and min are optional, and so is return on the last expression of a method, so you can write [numbers.max.to_s, numbers.min.to_s].
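To see why both changes are needed, here is a minimal sketch showing that split and each leave their receiver untouched, while map returns the converted values. The variable names parts and ints are illustrative and not from the question:

```ruby
numbers = "4 5 29"

numbers.split                 # builds a new Array, which is discarded here
puts numbers.class            # numbers itself is still a String

parts = numbers.split         # keep the Array by assigning it
parts.each { |x| x.to_i }     # each ignores the block's return values
puts parts.inspect            # the elements are still Strings

ints = parts.map(&:to_i)      # map collects the converted values
puts ints.inspect
```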
Something like this should work:
def high_and_low(numbers)
numbers = numbers.split.map(&:to_i)
[numbers.max, numbers.min].map(&:to_s)
end
high_and_low("4 5 29 54 4 0 -214 542 -64 1 -3 6 -6") #=> ["542", "-214"]
And a bonus (a one-liner, not that you should write code this way):
def high_and_low(numbers)
numbers.split.map(&:to_i).sort.values_at(-1, 0).map(&:to_s)
end
high_and_low("4 5 29 54 4 0 -214 542 -64 1 -3 6 -6") #=> ["542", "-214"]
The other answer is a good approach too so I include it here:
numbers.split.minmax_by { |n| -n.to_i }
Ruby has some nice methods available that make this much simpler:
"2 1 0 -1 -2".split.map(&:to_i).minmax
# => [-2, 2]
Breaking it down:
"2 1 0 -1 -2".split # => ["2", "1", "0", "-1", "-2"]
.map(&:to_i) # => [2, 1, 0, -1, -2]
.minmax # => [-2, 2]
If you want string versions of the values back, compare two integers in a block. minmax will return the values at the corresponding positions in the source array:
"2 1 0 -1 -2".split.minmax{ |a, b| a.to_i <=> b.to_i }
# => ["-2", "2"]
or:
"2 1 0 -1 -2".split.minmax_by{ |a| a.to_i }
# => ["-2", "2"]
minmax and minmax_by do the heavy lifting. The first is faster when there isn't a costly lookup to find the values being compared, as in this case, where the values are already in an array and only need to_i to be compared.
The *_by version performs a "Schwartzian transform", which basically remembers the block's result for each element so the costly lookup only occurs once per element. (Many of Enumerable's methods have *_by versions.) These versions of the methods can improve the speed when you want to compare two values that are nested, perhaps in arrays of hashes of hashes, or objects within objects within objects.
Note: When comparing string versions of numbers, it's important to convert them to numeric values first. Strings are ordered character by character, which differs from numeric order, hence the use of to_i.
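As a small illustration of that note, the sketch below compares plain string ordering with numeric ordering. The sample array is made up for the example, and this high_and_low variant is only one possible way to return the original strings unchanged:

```ruby
values = %w[2 10 -1 -10]

# Plain minmax compares the strings character by character.
string_order = values.minmax

# minmax_by compares the integers but returns the original strings.
numeric_order = values.minmax_by(&:to_i)

puts string_order.inspect
puts numeric_order.inspect

# Returns [highest, lowest] as the unchanged substrings of the input.
def high_and_low(numbers)
  low, high = numbers.split.minmax_by(&:to_i)
  [high, low]
end

puts high_and_low("4 5 29 54 4 0 -214 542 -64 1 -3 6 -6").inspect
```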

Drop the first n rows from daru dataframe

One can drop the first n elements of an array by using Array#drop.
a = [1,2,3]
a.drop(2) # => [3]
I want to drop the first n rows from a Daru::DataFrame object. It seems this class does not implement such a drop method.
How can I delete the first n rows from a Daru::DataFrame object?
You can use row_at to retrieve every row except the first four.
Example:
2.4.5 :001 > require 'daru'
=> true
2.4.5 :002 > df = Daru::DataFrame.new({
2.4.5 :003 > 'col0' => [1,2,3,4,5,6],
2.4.5 :004 > 'col2' => ['a','b','c','d','e','f'],
2.4.5 :005 > 'col1' => [11,22,33,44,55,66]
2.4.5 :006?> })
=> #<Daru::DataFrame(6x3)>
col0 col2 col1
0 1 a 11
1 2 b 22
2 3 c 33
3 4 d 44
4 5 e 55
5 6 f 66
Retrieve rows:
2.4.5 :010 > df.row_at(4..df.shape()[0])
=> #<Daru::DataFrame(2x3)>
col0 col2 col1
4 5 e 55
5 6 f 66
You can put this into a loop that runs n times:
df.delete_row(0)
https://www.rubydoc.info/gems/daru/0.1.4.1/Daru/DataFrame#delete_row-instance_method

Setting With Enlargement in daru

Is there a way to do setting with enlargement in daru? Something similar to what pandas offers with loc.
Yes you can.
For Daru::Vector objects use the #push method like so:
require 'daru'
v = Daru::Vector.new([1,2,3], index: [:a,:b,:c])
v.push(23, :r)
v
#=>
#<Daru::Vector:74005360 @name = nil @size = 4 >
# nil
# a 1
# b 2
# c 3
# r 23
For setting a new vector in Daru::DataFrame, call the #[]= method with your new name inside the []. You can either assign a Daru::Vector or an Array.
If you assign Daru::Vector, the data will be aligned so that the indexes of the DataFrame and Vector match.
For example,
require 'daru'
df = Daru::DataFrame.new({a: [1,2,3], b: [5,6,7]})
df[:r] = [11,22,33]
df
# =>
#<Daru::DataFrame:73956870 @name = c8a65ffe-217d-43bb-b6f8-50d2530ec053 @size = 3>
# a b r
# 0 1 5 11
# 1 2 6 22
# 2 3 7 33
You assign a row with the DataFrame#row[]= method. For example, using the previous dataframe df:
df.row[:a] = [23,35,2]
df
#=>
#<Daru::DataFrame:73956870 @name = c8a65ffe-217d-43bb-b6f8-50d2530ec053 @size = 4>
# a b r
# 0 1 5 11
# 1 2 6 22
# 2 3 7 33
# a 23 35 2
Assigning a Daru::Vector will align according to the names of the vectors of the Daru::DataFrame.
You can see further details in these notebooks.
Hope this answers your question.

matlab - turn arrays into index values

Given a = [1, 7] and b = [4, 10], I want to create a new vector [1:4,7:10]. I can do this with a loop, but I am looking for a vectorized solution. I tried bsxfun with the function fun = @(c,d) c:d, calling bsxfun(fun, a, b), but it generates 1:4 and not 7:10. Thanks.
See if this works for you -
lens = (b - a)+1; %// extents of each group
maxlens = max(lens) %// maximum extent
mask = bsxfun(@le,[1:maxlens]',lens) %// mask of valid numbers
vals = bsxfun(@plus,a,[0:maxlens-1]') %// all values
out = vals(mask).' %// only valid values are collected to form desired output
Sample run -
a =
1 7 15
b =
3 12 21
out =
1 2 3 7 8 9 10 11 12 15 16 17 18 19 20 21
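For comparison, assuming you only want to cross-check the expected output outside MATLAB, the same expansion can be sketched in Ruby. expand_ranges is a hypothetical helper name, not part of the answer above, and it pairs the bounds directly rather than reproducing the mask-based approach:

```ruby
# Expands each (start, stop) pair into its inclusive integer range
# and concatenates the results in order.
def expand_ranges(starts, stops)
  raise ArgumentError, "bounds must have equal length" unless starts.size == stops.size

  starts.zip(stops).flat_map { |lo, hi| (lo..hi).to_a }
end

puts expand_ranges([1, 7, 15], [3, 12, 21]).inspect
```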

How to find remainder of a division in Ruby?

I'm trying to get the remainder of a division using Ruby.
Let's say we're trying to divide 208 by 11.
The final result should be "18 with a remainder of 10"... what I ultimately need is that 10.
Here's what I've got so far, but it chokes in this use case (saying the remainder is 0).
division = 208.to_f / 11
rounded = (division*10).ceil/10.0
remainder = rounded.round(1).to_s.last.to_i
The modulo operator:
> 208 % 11
=> 10
If you need just the integer portion, use integers with the / operator, or the Numeric#div method:
quotient = 208 / 11
#=> 18
quotient = 208.0.div 11
#=> 18
If you need just the remainder, use the % operator or the Numeric#modulo method:
modulus = 208 % 11
#=> 10
modulus = 208.0.modulo 11
#=> 10.0
If you need both, use the Numeric#divmod method. This even works if either the receiver or argument is a float:
quotient, modulus = 208.divmod(11)
#=> [18, 10]
208.0.divmod(11)
#=> [18, 10.0]
208.divmod(11.0)
#=> [18, 10.0]
Also of interest is the Numeric#remainder method. The differences between all of these can be seen in the documentation for divmod.
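Tying this back to the question, a minimal sketch that uses divmod to build the "18 with a remainder of 10" message might look like this. describe_division is a hypothetical helper name chosen for the example:

```ruby
# Returns a human-readable description of an integer division.
def describe_division(dividend, divisor)
  quotient, remainder = dividend.divmod(divisor)
  "#{quotient} with a remainder of #{remainder}"
end

puts describe_division(208, 11)
```

For integers, divmod always satisfies dividend == quotient * divisor + remainder, which is a handy sanity check.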
Please use Numeric#remainder, because modulo is not the same as remainder.
Modulo:
5.modulo(3)
#=> 2
5.modulo(-3)
#=> -1
Remainder:
5.remainder(3)
#=> 2
5.remainder(-3)
#=> 2
Here is a link discussing the problem:
https://rob.conery.io/2018/08/21/mod-and-remainder-are-not-the-same/
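A quick way to see where the two methods differ is to try every sign combination. This is only a sketch, and the SIGN_TABLE constant is a name invented for the example:

```ruby
# Each row is [x, y, x.modulo(y), x.remainder(y)].
SIGN_TABLE = [[5, 3], [5, -3], [-5, 3], [-5, -3]].map do |x, y|
  [x, y, x.modulo(y), x.remainder(y)]
end

SIGN_TABLE.each do |x, y, mod, rem|
  puts "#{x}, #{y}: modulo #{mod}, remainder #{rem}"
end
```

The result of modulo takes the sign of the divisor, while the result of remainder takes the sign of the receiver, so the two agree only when both operands have the same sign.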
