Combining multiple array/hash selects - ruby

I have the following code:
sum = array_of_hashes.select{ |key| (date_range).include? Date.parse(key[:created])}.map { |h| h[:amount] }.sum
size = array_of_hashes.select{ |key| (date_range).include? Date.parse(key[:created])}.size
total = sum / size
sum selects all hashes with a date that is inside the date range, then adds up all the values of the :amount key.
size counts the number of hashes that are in the date range.
total divides the sum by the size.
How can I combine those so it's not 3 separate items?

I think it's as simple as:
selected = array_of_hashes.select { ... }
average = selected.map { ... }.sum / selected.size
Note: using include? with ranges of dates is pretty inefficient, since it needs to traverse the whole date range; I suggest using cover? instead.
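Putting both pieces together, a minimal sketch (assuming :amount is numeric and Ruby >= 2.4 for sum with a block; fdiv avoids integer division, but returns NaN for an empty selection, so guard that case if your range can be empty):

selected = array_of_hashes.select { |h| date_range.cover?(Date.parse(h[:created])) }
average = selected.sum { |h| h[:amount] }.fdiv(selected.size) # NaN if selected is empty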

There is really no nicer way of making this more compact. One alternative could be the following:
average = (selected = array_of_hashes.select { ... }.map { ... }).sum/selected.size.to_f

Related

Elixir/Phoenix sum of the column

I'm trying to get the sum of a particular column.
I have a schema of orders with the field total, which stores the total price.
Now I'm trying to create a query that will sum the total value of all the orders, but I'm not sure I'm doing it right.
Here is what I have so far:
def create(conn, %{"statistic" => %{"date_from" => %{"day" => day_from, "month" => month_from, "year" => year_from}}}) do
  date_from = Ecto.DateTime.cast!({{year_from, month_from, day_from}, {0, 0, 0, 0}})
  revenue = Repo.all(from p in Order, where: p.inserted_at >= ^date_from, select: sum(p.total))
  render(conn, "result.html", revenue: revenue)
end
And I'm just calling it like <%= @revenue %> in the html.eex.
As of right now, it doesn't return errors; it just renders a random symbol on the page instead of the total revenue.
I think my query is wrong, but couldn't find good information about how to make it work properly. Any help appreciated, thanks!
Your query returns just 1 value, and Repo.all wraps it in a list. When you print a list using <%= ... %>, it treats integers inside the list as Unicode codepoints, and you get the character with that codepoint as output on the page. The fix is to use Repo.one instead, which will return the value directly, which in this case is an integer.
revenue = Repo.one(from p in Order, where: p.inserted_at >= ^date_from, select: sum(p.total))
@Dogbert's answer is correct. It is worth noting that if you are using Ecto 2.0 (currently a release candidate), you can use Repo.aggregate/4:
revenue = Repo.aggregate(from(p in Order, where: p.inserted_at >= ^date_from), :sum, :total)

How to deal with DISTINCT with salt

I followed the advice in How to handle spill memory in pig from alexeipab, and it really works fine, but I have another question now. Same sample code:
pymt = LOAD 'pymt' USING PigStorage('|') AS ($pymt_schema);
pymt_grp_with_salt = GROUP pymt BY (key, salt);
results_with_salt = FOREACH pymt_grp_with_salt {
    -- distinct
    mid_set = FILTER pymt BY xxx == 'abc';
    mid_set_result = DISTINCT mid_set.yyy;
    GENERATE FLATTEN(group) AS (key, salt), COUNT(mid_set_result) AS result;
}
pymt_grp = GROUP results_with_salt BY key;
result = FOREACH pymt_grp GENERATE SUM(results_with_salt.result); -- it is WRONG!!
I can't just SUM in that outer group: the result will be very different from the result calculated without the salt.
Is there any solution? If I filter first, it will cost many JOIN jobs and slow down performance.
For this to work, you need a many-to-one relationship between mid_set.yyy and salt, so that the same value of mid_set.yyy from different rows is mapped to the same value of salt. If it is not, then that value of mid_set.yyy will appear in different bags produced by GROUP pymt BY (key, salt), survive DISTINCT in different salts, and thus be counted multiple times in the final rollup. That is why you can get wrong results when combining salts with COUNT of DISTINCT.
An easy way could be to replace salt with mid_set.yyy itself, or to write a UDF/static method that calculates the salt by taking a hash of mid_set.yyy and applying mod N, where N can be any positive integer; for the best distribution, N should be a prime number.
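For illustration only, here is that invariant sketched in Ruby rather than as a Pig UDF (salt_for is a hypothetical helper): the salt must be a pure function of yyy, so that duplicate yyy values always land in the same (key, salt) bucket.

require 'digest'
N = 31 # a prime gives the best distribution
def salt_for(yyy, n = N)
  Digest::MD5.hexdigest(yyy.to_s).to_i(16) % n # deterministic hash, mod N
end
salt_for("abc") == salt_for("abc") #=> true, so DISTINCT per (key, salt) bucket is safe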
Thanks alexeipab, you were a great help. What I did is below:
pymt = LOAD 'pymt' USING PigStorage('|') AS ($pymt_schema);
pymt = FOREACH pymt GENERATE *, (yyy%$prime_num) as salt;
pymt_grp_with_salt = GROUP pymt BY (key,salt);
It works!!
This works because yyy is an integer; if it is not, you can use a hash to convert a string or other type into an integer.

Looking up an array of keys on an array of hashes in ruby like excel vlookup

This post is very similar to my previous one, but the data structures are different here:
Joining an array of keys to a hash with key value pairs like excel vlookup
My data from my Mysql2::Result comes back like this array of hashes:
data = [{"isbn" => "1234", "title"=>"apple"},{"isbn" => "5678", "title"=>"banana"},{"isbn" => "2121", "title"=>"car"}]
And my original list of isbns that I would like to compare is this array:
isbns = ["1234","2121", "5454", "5678"]
I'm seeking a function which uses the isbns array and returns a result like this:
result = [{"isbn"=>"1234","title"=>"apple"}, {"isbn"=> "2121", "title"=>"car"}, nil, {"isbn"=>"5678","title"=>"banana"}]
The "driving" array is the isbns... imagine doing a vlookup from isbns to data ... any items that are not in data, but in isbns should return nil. The original order of isbns should be returned, and the return data should be an array of hashes.
isbns.map { |isbn| data.find { |h| h["isbn"] == isbn} }
#=> [{"isbn"=>"1234", "title"=>"apple"}, {"isbn"=>"2121", "title"=>"car"}, nil, {"isbn"=>"5678", "title"=>"banana"}]
@Michael Kohl's answer is succinct and correct. However, if these data sets are big, it's inefficient: O(n*m/2) on average. An alternative is to transform the data vector into a hash in O(m), then do the map in O(n), for a total runtime of O(n+m).
data_lookup = data.inject({}) {|m,v| m[v["isbn"]] = v; m} # O(data.size)
result = isbns.map { |isbn| data_lookup[isbn] } # O(isbns.size)
If your data and isbn collections were of size 1000 each, this would be faster by a factor of 250.
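For reference, running the hash-based version on the sample data from the question reproduces the expected result:

data = [{"isbn" => "1234", "title" => "apple"}, {"isbn" => "5678", "title" => "banana"}, {"isbn" => "2121", "title" => "car"}]
isbns = ["1234", "2121", "5454", "5678"]
data_lookup = data.inject({}) { |m, v| m[v["isbn"]] = v; m }
isbns.map { |isbn| data_lookup[isbn] }
#=> [{"isbn"=>"1234", "title"=>"apple"}, {"isbn"=>"2121", "title"=>"car"}, nil, {"isbn"=>"5678", "title"=>"banana"}]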

How to get rows as Arrays (not Hashes) in Sequel ORM?

In the Sequel ORM for Ruby, the Dataset class has an all method which produces an Array of row hashes: each row is a Hash with column names as keys.
For example, given a table T:
a  b   c
----------------
0  22  "Abe"
1  35  "Betty"
2  58  "Chris"
then:
ds = DB['select a, b, c from T']
ah = ds.all # Array of row Hashes
should produce:
[{"a":0,"b":22,"c":"Abe"},{"a":1,"b":35,"c":"Betty"},{"a":2,"b":58,"c":"Chris"}]
Is there a way built in to Sequel to instead produce an Array of row Arrays, where each row is an array of only the values in each row in the order specified in the query? Sort of how select_rows works in ActiveRecord? Something like this:
aa = ds.rows # Array of row Arrays
which would produce:
[[0,22,"Abe"],[1,35,"Betty"],[2,58,"Chris"]]
Note: the expression:
aa = ds.map { |h| h.values }
produces an array of arrays, but the order of values in the rows is NOT guaranteed to match the order requested in the original query. In this example, aa might look like:
[["Abe",0,22],["Betty",1,35],["Chris",2,58]]
Old versions of Sequel (pre 2.0) had the ability in some adapters to return arrays instead of hashes. But it caused numerous issues, nobody used it, and I didn't want to maintain it, so it was removed. If you really want arrays, you need to drop down to the connection level and use a connection specific method:
DB.synchronize do |conn|
  rows = conn.exec('SQL Here') # Hypothetical example code
end
The actual code you need will depend on the adapter you are using.
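For instance, with the postgres adapter backed by the pg gem, a sketch along these lines should work (PG::Result#values returns the rows as arrays in query column order, though as raw strings, since Sequel's type conversion is bypassed):

DB.synchronize do |conn|
  result = conn.exec('SELECT a, b, c FROM T') # PG::Result from the pg gem
  rows = result.values # [["0", "22", "Abe"], ...] -- note: string values
end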
If you want just an array of values for a single column, there is select_map:
DB[:table].select_map(:id)
If you want an array of arrays of values...
DB['select * from T'].map { |h| h.values }
seems to work, though as noted in the question, the order of values within each row is not guaranteed.
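A possible shortcut worth verifying against your Sequel version: select_map also accepts an array of columns, in which case it returns an array of arrays with the values in the order given:

DB[:T].select_map([:a, :b, :c])
#=> [[0, 22, "Abe"], [1, 35, "Betty"], [2, 58, "Chris"]]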
UPDATE: given the updated requirement that the column order match the query order...
cols = [:a, :c, :b]
DB[:T].select(*cols).map { |h| cols.map { |c| h[c] } }
Not very pretty, but the order is guaranteed to be the same as the select order.
There does not appear to be a builtin to do this.
You could make a request for the feature.
I haven't yet found a built-in method that returns an array of row arrays whose values are ordered by the column order of the original query. The following function does*, although I suspect an internal method could be more efficient:
def rows(ds)
  column_keys = ds.columns # guaranteed to match query order?
  ds.map { |row_hash| column_keys.map { |key| row_hash[key] } }
end
*This function depends on the order of the array returned by Dataset.columns. If this order is undefined, then this rows function isn't very useful.
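Hypothetical usage with the example table from the question:

ds = DB['select a, b, c from T']
rows(ds) #=> [[0, 22, "Abe"], [1, 35, "Betty"], [2, 58, "Chris"]]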
Have you tried this?
ds = DB['select a, b, c from T'].to_a
Not sure if it works, but give it a shot.

Lua - Sorting a table alphabetically

I have a table that is filled with random content that a user enters. I want my users to be able to rapidly search through this table, and one way of facilitating their search is by sorting the table alphabetically. Originally, the table looked something like this:
myTable = {
    Zebra = "black and white",
    Apple = "I love them!",
    Coin = "25cents"
}
I was able to implement a pairsByKeys() function which allowed me to output the table's contents in alphabetical order, but not to store them that way. Because of the way the searching is set up, the table itself needs to be in alphabetical order.
function pairsByKeys (t, f)
    local a = {}
    for n in pairs(t) do
        table.insert(a, n)
    end
    table.sort(a, f)
    local i = 0              -- iterator variable
    local iter = function () -- iterator function
        i = i + 1
        if a[i] == nil then
            return nil
        else
            return a[i], t[a[i]]
        end
    end
    return iter
end
After a time I came to understand (perhaps incorrectly - you tell me) that non-numerically indexed tables cannot be sorted alphabetically. So then I started thinking of ways around that - one way I thought of is sorting the table and then putting each value into a numerically indexed array, something like below:
myTable = {
    [1] = { Apple = "I love them!" },
    [2] = { Coin = "25cents" },
    [3] = { Zebra = "black and white" },
}
In principle, I feel this should work, but for some reason I am having difficulty with it. My table does not appear to be sorting. Here is the function I use, with the above function, to sort the table:
SortFunc = function ()
    local newtbl = {}
    local t = {}
    for title, value in pairsByKeys(myTable) do
        newtbl[title] = value
        tinsert(t, newtbl[title]) -- tinsert: an alias for table.insert (WoW-style Lua)
    end
    myTable = t
end
myTable still does not end up being sorted. Why?
Lua tables can be hybrid. For numerical keys starting at 1 they use a vector, and for other keys they use a hash.
For example, in {[1]="foo", [2]="bar", [4]="hey", my="name"}
1 and 2 will be placed in the vector, while 4 and my will be placed in the hash table: 4 breaks the sequence, and that is why it ends up in the hash part.
For information on how to sort Lua's table take a look here: 19.3 - Sort
Your new table needs consecutive integer keys and needs values themselves to be tables. So you want something on this order:
SortFunc = function (myTable)
    local t = {}
    for title, value in pairsByKeys(myTable) do
        table.insert(t, { title = title, value = value })
    end
    myTable = t
    return myTable
end
This assumes that pairsByKeys does what I think it does...
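As a cross-language aside (hedged, since this thread is Lua and the rest of this page is mostly Ruby): the array-of-pairs shape the answer builds is exactly what Ruby produces when you sort a hash, which may make the idea feel more familiar:

myTable = { "Zebra" => "black and white", "Apple" => "I love them!", "Coin" => "25cents" }
myTable.sort_by { |key, _| key }
#=> [["Apple", "I love them!"], ["Coin", "25cents"], ["Zebra", "black and white"]]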
