Coming from C#, I'm used to being able to do the following using LINQ:
var people = new List<Person>
{
    new Person(Race.Black, "Mitch", 30),
    new Person(Race.White, "Mike", 30),
    new Person(Race.Mexican, "Mel", 30),
};
var groups = people.GroupBy(p => p.Race)
                   .Select(g => new { race = g.Key, person = g });
Moving to Ruby, I would like to do the same grouping and projection into a hash. Is there an out-of-the-box method for this, or do I need to roll my own? Here's my implementation, but it would be great if this were offered by the language or a third-party library:
def group(arr, group_sym)
  groups = {}
  arr.each do |i|
    race = i[group_sym]
    groups[race] = [] unless groups.has_key?(race)
    i.delete(group_sym)
    groups[race].push(i)
  end
  groups
end
Edit: So what I'm expecting from this is the following:
input:
people = [{name: 'mike',  race: 'white', age: 30},
          {name: 'mel',   race: 'white', age: 31},
          {name: 'mitch', race: 'black', age: 30},
          {name: 'megan', race: 'asian', age: 30},
          {name: 'maebe', race: 'black', age: 30}]
function call:
groupedPeople = groupBy(people,'race')
returns:
{'white' => [{name: 'mike', age: 30},
             {name: 'mel', age: 31}],
 'black' => [{...black people}],
 'asian' => [{...asian people}]}
For this specific example, I want to get a hash where my people array is grouped by race.
Because C#'s query expressions are designed to look like SQL queries, the method names are a bit unusual compared to other languages: Select is usually called map, Aggregate is usually called fold or reduce, Where is usually called filter or select, etc.
If you simply translate the method names, you can almost literally translate your code to Ruby:
Person = Struct.new(:race, :name, :age)
people = [
Person.new(:black, 'Mitch', 30),
Person.new(:white, 'Mike', 30),
Person.new(:mexican, 'Mel', 30)
]
groups = people.group_by(&:race).map { |race, people| { race: race, person: people } }
I used a Hash as the closest replacement for IGrouping.
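If you want the exact hash shape from your edit, with the race key removed from each person, here's a minimal sketch, assuming the people are hashes and Ruby 2.4+ for Hash#transform_values:

people = [{name: 'mike',  race: 'white', age: 30},
          {name: 'mel',   race: 'white', age: 31},
          {name: 'mitch', race: 'black', age: 30}]

# group_by builds {race => [person, ...]}; transform_values then strips
# the race key from each person so it isn't repeated inside the groups.
grouped = people.group_by { |p| p[:race] }
                .transform_values { |group| group.map { |p| p.reject { |k, _| k == :race } } }
# => {"white"=>[{:name=>"mike", :age=>30}, {:name=>"mel", :age=>31}],
#     "black"=>[{:name=>"mitch", :age=>30}]}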
Summary
I want to display a bar chart whose dimension is days and which is stacked by a different category (i.e. x-axis = days and stack = category). I can do this "manually", in that I can write if-thens that either zero out or pass through the quantity, but I'm wondering if there's a systematic way to do this.
JSFiddle https://jsfiddle.net/wostoj/rum53tn2/
Details
I have data with dates, quantities, and other classifiers. For the purpose of this question I can simplify it to this:
data = [
    {day: 1, cat: 'a', quantity: 25},
    {day: 1, cat: 'b', quantity: 15},
    {day: 1, cat: 'b', quantity: 10},
    {day: 2, cat: 'a', quantity: 90},
    {day: 2, cat: 'a', quantity: 45},
    {day: 2, cat: 'b', quantity: 15},
]
I can set up a bar chart, by day, that shows total units and I can manually add the stacks for 'a' and 'b' as follows.
var dayDim = xf.dimension(_ => _.day);
var bar = dc.barChart("#chart");
bar
    .dimension(dayDim)
    .group(dayDim.group().reduceSum(
        _ => _.cat === 'a' ? _.quantity : 0
    ))
    .stack(dayDim.group().reduceSum(
        _ => _.cat === 'b' ? _.quantity : 0
    ));
This is easy when my data has only 2 categories, but I'm wondering how I'd scale it to 10, or to an unknown number of categories. The pseudo-code for what I'm trying to do would be something like:
dc.barChart("#chart")
.dimension(xf.dimension(_ => _.day))
.stackDim(xf.dimension(_ => _.cat))
.stackGroup(xf.dimension(_ => _.cat).group().reduceSum(_ => _.quantity));
I mentioned this in my answer to your other question, but why not expand on it a little bit here.
In the dc.js FAQ there is a standard pattern for custom reductions to reduce more than one value at once.
Say that you have a field named type which determines which type of value is in the row, and the value is in a field named value (in your case these are cat and quantity). Then
var group = dimension.group().reduce(
    function(p, v) { // add
        p[v.type] = (p[v.type] || 0) + v.value;
        return p;
    },
    function(p, v) { // remove
        p[v.type] -= v.value;
        return p;
    },
    function() { // initial
        return {};
    });
will reduce all the rows for each bin to an object where the keys are the types and the values are the sum of values with that type.
The way this works is that when crossfilter encounters a new key, it first uses the "initial" function to produce a new value. Here that value is an empty object.
Then for each row it encounters which falls into the bin labelled with that key, it calls the "add" function. p is the previous value of the bin, and v is the current row. Since we started with a blank object, we have to make sure we initialize each value; (p[v.type] || 0) will make sure that we start from 0 instead of undefined, because undefined + 1 is NaN and we hate NaNs.
We don't have to be as careful in the "remove" function, because the only way a row will be removed from a bin is if it was once added to it, so there must be a number in p[v.type].
Now that each bin contains an object with all the reduced values, the stack mixin has helpful extra parameters for .group() and .stack() which allow us to specify the name of the group/stack, and the accessor.
For example, if we want to pull items a and b from the objects for our stacks, we can use:
.group(group, 'a', kv => kv.value.a)
.stack(group, 'b', kv => kv.value.b)
It's not as convenient as it could be, but you can use these techniques to add stacks to a chart programmatically (see source).
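For example, here is a sketch of building the stacks in a loop, assuming data and dayDim from the question and the custom group reduction above (built on the day dimension); sel_stack is just a helper name, not a dc.js API:

// Collect the distinct categories from the raw data (order determines stack order).
var categories = data.reduce(function(cats, d) {
    if (cats.indexOf(d.cat) === -1) cats.push(d.cat);
    return cats;
}, []);

// Accessor factory: reads one category's sum out of the {type: total, ...}
// objects produced by the custom reduction.
function sel_stack(cat) {
    return function(kv) { return kv.value[cat] || 0; };
}

var chart = dc.barChart("#chart")
    .dimension(dayDim)
    .group(group, categories[0], sel_stack(categories[0]));

// Every category after the first becomes an additional stack.
categories.slice(1).forEach(function(cat) {
    chart.stack(group, cat, sel_stack(cat));
});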
This is my query to fetch data from two different tables.
$variant = Variant::with(['v_detail' => function($q){
$q->select('variant_dtl_name');
}])->where('product_id','=',$productId)->get();
There is output, but v_detail returns an empty list.
result:
created_at: "2015-11-07 12:37:26"
id: 1
product_id: 30
updated_at: "2015-11-07 12:37:26"
v_detail: []
variant_name: "Pricing"
But with this query:
$variant = Variant::with('v_detail')->where('product_id','=',$productId)->get();
The result is:
created_at: "2015-11-07 12:37:26"
id: 1
product_id: 30
updated_at: "2015-11-07 12:37:26"
v_detail: [
    {id: 1, variant_id: 1, variant_dtl_name: "Adult", variant_dtl_price: 25,…},
    {id: 2, variant_id: 1, variant_dtl_name: "Senior", variant_dtl_price: 15,…},
    {id: 3, variant_id: 1, variant_dtl_name: "Children", variant_dtl_price: 8,…}
]
variant_name: "Pricing"
Now, for the query that works, how can I fetch specific columns by name? Thanks!
You have this:
$variant = Variant::with(['v_detail' => function($q)
{
// Either add the related foreign key or select all
$q->select('related_foreign_key', 'variant_dtl_name');
}])->where('product_id','=',$productId)->get();
Since you are selecting only a single field, variant_dtl_name, it's not possible to match up the related models: the relation's foreign key is required. So you have to select that foreign key as well. Notice the related_foreign_key in the sub-query; use the right one for your schema. It is probably variant_id, but I can't be sure because you didn't mention anything about it.
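For example, assuming the foreign key is variant_id (which the second result dump in the question suggests), the sketch becomes:

$variant = Variant::with(['v_detail' => function ($q) {
    // variant_id lets Eloquent match each detail row back to its parent Variant
    $q->select('variant_id', 'variant_dtl_name');
}])->where('product_id', '=', $productId)->get();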
Maybe someone knows how to achieve this kind of query in LINQ (or with lambda syntax).
I have this set in a list
Filter: my input will be codes 100 and 101; I need to get the corresponding "values", in this example 1 and 2.
Problem: if you input 100 and 101, you'll get 3 results, because code 100 appears in both group 1 and group 2. I just need the pairs that match within the same group. (And you don't have the group as an input parameter.)
How can I solve this so that only a group whose codes fully match is returned?
Thanks!
Starting with a simple representation in code of what you have in a picture:
var list = new[]
{
    new { code = 100, value = 1, group = 1 },
    new { code = 101, value = 2, group = 1 },
    new { code = 100, value = 3, group = 2 },
    new { code = 103, value = 4, group = 2 },
};
var inp = new[] { 100, 103 };
Then we can do:
list
    .GroupBy(el => el.group)  // Group by the "group" field.
    .Where(grp => !inp.Except(grp.Select(el => el.code)).Any())  // Exclude groups that don't contain all input values
    .Single()  // Obtain the only such group (with a check that there is only one)
    .Select(el => el.value);  // Obtain the "value" fields.
If your input could be a proper subset of some group's codes, you can also require an exact match by excluding groups of a different size:
list
    .GroupBy(el => el.group)
    .Where(grp =>
        grp.Count() == inp.Count()
        && !inp.Except(grp.Select(el => el.code)).Any())
    .Single()
    .Select(el => el.value);
There are other variations that match other possible interpretations of your question (e.g. I'm assuming there can be only one matching group, but that wasn't clear).
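For instance, if more than one group could contain all the input codes, a sketch that collects the values from every matching group instead of asserting a single one:

var values = list
    .GroupBy(el => el.group)
    .Where(grp => !inp.Except(grp.Select(el => el.code)).Any())
    .SelectMany(grp => grp.Select(el => el.value)); // values from all fully matching groups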
I have a complex hierarchical query which looks like this:
MATCH (comp:Company {id: 7})-[:HAS_SPACE]->(s:Space)-[:HAS_BOARD]->(b),
(col)<-[:HAS_COLUMN]-(b)-[:HAS_LANE]->(lane)
OPTIONAL MATCH (b)-[:HAS_CARD]->(card {archived: false}),
(cardCol:Column)-[:HAS_CARD]->(card {archived: false})<-[:HAS_CARD]-(cardLane:Lane)
WITH s, b, col, lane, { id: card.id, title: card.title, sort_order: card.sort_order, column_id: cardCol.id, lane_id: cardLane.id } as crd
WITH s, { id: b.id, title: b.title, left: b.left, top: b.top,
columns: collect(distinct {id: col.id, title: col.title, col_count: col.col_count, sort_order: col.sort_order}),
lanes: collect(distinct {id: lane.id, title: lane.title, row_count: lane.row_count, sort_order: lane.sort_order}),
cards: collect(distinct crd)} as brd
RETURN {id: s.id, title: s.title, boards: collect(distinct brd)}
This query slows down to 10 seconds when the number of cards reaches about 200. What is the problem with it, and how can I profile it? There seems to be a PROFILE keyword, but its output doesn't look very informative. By the way, we are using GrapheneDB on Heroku.
I think one issue you have with this query is the combinatorial explosion along the paths; you can help Cypher a bit (the next version will be cleverer about this).
Also, where is your "optional" relationship? Between board and card?
create index on :Company(id);
MATCH (comp:Company {id: 7})-[:HAS_SPACE]->(s:Space)-[:HAS_BOARD]->(b)
WITH distinct s, b
MATCH (col)<-[:HAS_COLUMN]-(b)-[:HAS_LANE]->(lane)
OPTIONAL MATCH (b)-[:HAS_CARD]->(card {archived: false})
WITH distinct s, b, col, lane, card
MATCH (cardCol:Column)-[:HAS_CARD]->(card {archived: false})<-[:HAS_CARD]-(cardLane:Lane)
WITH s, b, col, lane,
{ id: card.id, title: card.title, sort_order: card.sort_order,
column_id: cardCol.id, lane_id: cardLane.id } as crd
WITH s,
{ id: b.id, title: b.title, left: b.left, top: b.top,
columns: collect(distinct {id: col.id, title: col.title,
col_count: col.col_count, sort_order: col.sort_order}),
lanes: collect(distinct {id: lane.id, title: lane.title, row_count: lane.row_count,
sort_order: lane.sort_order}),
cards: collect(distinct crd)} as brd
RETURN {id: s.id, title: s.title, boards: collect(distinct brd)}
It helps to analyse the different parts of the query separately and see where the combinatorial explosion kicks in. Then fix the cardinality of that piece back with distinct.
You can also try the new query planner by prefixing your query with cypher 2.1.experimental
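For profiling, you can run each piece under PROFILE (in the Neo4j shell or browser) and watch where the row counts blow up; a sketch for just the first hop:

PROFILE
MATCH (comp:Company {id: 7})-[:HAS_SPACE]->(s:Space)-[:HAS_BOARD]->(b)
// count(*) exposes the intermediate cardinality without returning the data itself
RETURN count(*)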
After some research, I found out that this query runs 20 times faster if we "denormalize" the nodes a little by adding lane_id and column_id to card. Still, it is not the fastest solution, and I don't like this denormalization, which eliminates relations. So I would appreciate any other solutions.
How to make an atomic update in RethinkDB if a document exists, and insert otherwise?
I want to do something like:
var tab = r.db('agflow').table('test');
r.expr([{id: 1, n: 0, x: 11}, {id: 2, n: 0, x: 12}]).forEach(function(row){
    var _id = row('id');
    return r.branch(
        tab.get(_id).eq(null),  // 1
        tab.insert(row),        // 2
        tab.get(_id).update(function(row2){ return {n: row2('n').add(row('n'))}; })  // 3
    );
})
However, this is not fully atomic: between the time we check whether the document exists (1) and insert it (2), some other thread may insert it.
How to make this query atomic?
I think the solution is passing
conflict="update"
to the insert method.
Also see the RethinkDB documentation on insert.
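A sketch of that, applied to the query from the question (assuming a RethinkDB version with the conflict optarg):

var tab = r.db('agflow').table('test');
r.expr([{id: 1, n: 0, x: 11}, {id: 2, n: 0, x: 12}]).forEach(function(row){
    // One atomic insert per row; an existing document is updated instead of raising an error
    return tab.insert(row, {conflict: 'update'});
})

Note that conflict: 'update' merges the new document's fields over the old ones; it does not add the n values together, so for the summing logic you still need a replace, as in the next answer.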
To make the inner part atomic, you can use replace:
tab.get(_id).replace(function(row2){
    return r.expr({id: _id, n: row2('n').add(row('n'))})
            .default(row)
})
Using replace is the only way at the moment to perform fully atomic upserts with non-trivial update operations.
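Plugged into the forEach from the question (and keeping the x field, which the snippet above drops), a self-contained sketch would be:

var tab = r.db('agflow').table('test');
r.expr([{id: 1, n: 0, x: 11}, {id: 2, n: 0, x: 12}]).forEach(function(row){
    return tab.get(row('id')).replace(function(row2){
        // row2 is null when the document doesn't exist yet; accessing a field
        // of null raises a non-existence error, which .default(row) turns
        // into inserting the original row, all within one atomic replace.
        return r.expr({id: row('id'), n: row2('n').add(row('n')), x: row('x')})
                .default(row);
    });
})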
This is a standard database operation called an "upsert".
Does RethinkDB support upserts?
Another question from Sergio Tulentsev. The first public release didn't include support for upserts, but now it can be done by passing an extra flag to insert:
r.table('marvel').insert({
superhero: 'Iron Man',
superpower: 'Arc Reactor'
}, {upsert: true}).run()
When set to true, the new document from insert will overwrite the existing one.
cite: Answers to common questions about RethinkDB
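Note that later RethinkDB releases replaced the upsert flag with a conflict option; a rough equivalent of the call above would be:

r.table('marvel').insert({
    superhero: 'Iron Man',
    superpower: 'Arc Reactor'
}, {conflict: 'replace'}).run()
// 'replace' overwrites the existing document like upsert: true did; 'update' merges instead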
OK, I figured out how to do it.
The solution is to first try the insert, and if it fails, do an update:
var tab = r.db('agflow').table('test');
tab.delete();
r.expr([{id: 1, n: 0, x: 11}, {id: 2, n: 0, x: 12}]).forEach(function(row){
    var _id = row('id');
    var res = tab.insert(row);
    return r.branch(
        res('inserted').eq(1),
        res,
        tab.get(_id).update(function(row2){ return {n: row2('n').add(1)}; })
    );
})