I have some (simplified) data as follows:
{ "PO": 1353901, "Qty": 1, "Levels": 3 },
{ "PO": 1353901, "Qty": 2, "Levels": 3 },
{ "PO": 50048309,"Qty": 1, "Levels": 1 },
{ "PO": 50048309,"Qty": 4, "Levels": 1 },
{ "PO": 50048309,"Qty": 1, "Levels": 1 }
You see here data for two purchase orders, each row representing a unique product and how much of it was used. You also see how many levels those products were spread out over.
One dimension that helps in understanding cost is material density, that is, how many items were used per level. For PO 1353901, three items were used on three levels (Qty gets aggregated, Levels does not), giving one item per level.
For 50048309, six items were used on one level, which is a much higher material density. This tells me a lot of work was focused in one place.
Filtering on flat data is easy, and it is not hard to group values into ranges. Take Levels for example:
var levels = ndx.dimension(function (d) {
var level = d.Levels;
if (level == 1) {
return 'One';
} else if (level == 2) {
return 'Two';
} else if (level == 3) {
return 'Three';
} else {
return 'Four +';
}
});
I can easily create groups and ranges within a dimension.
What I cannot seem to do is the exact same thing for aggregates. I want to filter POs by the number of materials used per level. That figure is not hard to get per purchase order, but it seems hard to look at in groups. Example below:
https://jsfiddle.net/efefdtcj/2/
Since I started with a Dimension based on aggregating at the PO, I'm getting a row back for each PO.
How do I get one row back per QtyPerLevel range?
I think you need to pre-calculate. That is, add a new property to each PO line holding the total quantity in the PO. While you're at it, you might as well calculate QtyPerLevel too:
{ "PO": 1353901, "Qty": 1, "Levels": 3, "TotalQty": 3, "QtyPerLevel": 1 },
{ "PO": 1353901, "Qty": 2, "Levels": 3, "TotalQty": 3, "QtyPerLevel": 1 },
{ "PO": 50048309,"Qty": 1, "Levels": 1, "TotalQty": 6, "QtyPerLevel": 6 },
{ "PO": 50048309,"Qty": 4, "Levels": 1, "TotalQty": 6, "QtyPerLevel": 6 },
{ "PO": 50048309,"Qty": 1, "Levels": 1, "TotalQty": 6, "QtyPerLevel": 6 }
Then create a Crossfilter dimension on QtyPerLevel and filter or group on that:
var ndx = crossfilter([
{ "PO": 1353901, "Qty": 1, "Levels": 3, "TotalQty": 3, "QtyPerLevel": 1 },
{ "PO": 1353901, "Qty": 2, "Levels": 3, "TotalQty": 3, "QtyPerLevel": 1 },
{ "PO": 50048309,"Qty": 1, "Levels": 1, "TotalQty": 6, "QtyPerLevel": 6 },
{ "PO": 50048309,"Qty": 4, "Levels": 1, "TotalQty": 6, "QtyPerLevel": 6 },
{ "PO": 50048309,"Qty": 1, "Levels": 1, "TotalQty": 6, "QtyPerLevel": 6 }]);
var qtyPerLevelDim = ndx.dimension(function(d) { return d.QtyPerLevel; });
var qtyPerLevelGrp = qtyPerLevelDim.group();
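The pre-calculation step itself is not shown above, so here is a minimal sketch of one way it could be done in plain JavaScript before the rows are handed to Crossfilter. The names totalQtyByPO and enriched are made up for this example, and it assumes Levels is the same on every row of a PO and is never zero.

```javascript
// PO lines from the question.
const rows = [
  { PO: 1353901, Qty: 1, Levels: 3 },
  { PO: 1353901, Qty: 2, Levels: 3 },
  { PO: 50048309, Qty: 1, Levels: 1 },
  { PO: 50048309, Qty: 4, Levels: 1 },
  { PO: 50048309, Qty: 1, Levels: 1 }
];

// First pass: sum Qty per PO.
const totalQtyByPO = new Map();
rows.forEach(function (d) {
  totalQtyByPO.set(d.PO, (totalQtyByPO.get(d.PO) || 0) + d.Qty);
});

// Second pass: copy each row and attach the PO-level figures.
// Assumption: Levels is constant within a PO and non-zero.
const enriched = rows.map(function (d) {
  const totalQty = totalQtyByPO.get(d.PO);
  return Object.assign({}, d, {
    TotalQty: totalQty,
    QtyPerLevel: totalQty / d.Levels
  });
});
```

The enriched array can then be passed to crossfilter(...) in place of the hand-written literal. Keep in mind that a default group() on QtyPerLevel counts rows, not POs, so a PO with many lines weighs more in the count.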
Is it possible to search using the results of another search? For example:
// index: A
{ "ID": 1, "status": "done" }
{ "ID": 2, "status": "processing" }
{ "ID": 3, "status": "done" }
{ "ID": 4, "status": "done" }
// index: B
{ "ID": 1, "user": 1, "value": 10 }
{ "ID": 1, "user": 2, "value": 3 }
{ "ID": 2, "user": 1,"value": 1 }
{ "ID": 3, "user": 1, "value": 3 }
{ "ID": 4, "user": 1, "value": 7 }
Q1: Search in index "A" status == "done" and return the ID
RES: 1,3,4
Q2: From the results in Q1 search value > 5 and return the ID
RES: 1,4
My current solution is to use two queries: download the results of "Q1" and then run a second search for "Q2" against them. That is very complicated because there are 30k results.
To me the problem looks more like a traditional combination of filters across two indexes, a sort of join like the ones in relational databases. I'm not sure of the exact solution, but I recently used a plug-in for joins that might help: https://siren.io/siren-federate-20-0-introducing-a-scalable-inner-join-for-elasticsearch/
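To pin down what such a join has to compute, here is the Q1/Q2 logic written as plain JavaScript over the sample documents. It only illustrates the semantics in memory; it is not an Elasticsearch or Siren query, and the variable names are made up for this example.

```javascript
// Sample documents from the question.
const indexA = [
  { ID: 1, status: 'done' },
  { ID: 2, status: 'processing' },
  { ID: 3, status: 'done' },
  { ID: 4, status: 'done' }
];
const indexB = [
  { ID: 1, user: 1, value: 10 },
  { ID: 1, user: 2, value: 3 },
  { ID: 2, user: 1, value: 1 },
  { ID: 3, user: 1, value: 3 },
  { ID: 4, user: 1, value: 7 }
];

// Q1: IDs in index A whose status is "done".
const doneIds = new Set(
  indexA
    .filter(function (a) { return a.status === 'done'; })
    .map(function (a) { return a.ID; })
);

// Q2: among those IDs, keep the ones with a value > 5 in index B.
// The Set removes duplicates when an ID matches more than once.
const result = Array.from(new Set(
  indexB
    .filter(function (b) { return doneIds.has(b.ID) && b.value > 5; })
    .map(function (b) { return b.ID; })
));
```

For the sample data this gives IDs 1 and 4, matching the expected RES of Q2.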
Hi, I want to use whereIn to get data from a table:
$values = DB::table('attribute_product')->whereIn('value_id' , [1,5])->get();
but I want to get only the rows whose product has all of the values in [1,5], not just one of them.
My table data:
{
"attribute_id": 1,
"product_id": 1,
"value_id": 1
},
{
"attribute_id": 12,
"product_id": 1,
"value_id": 2
},
{
"attribute_id": 13,
"product_id": 1,
"value_id": 3
},
{
"attribute_id": 14,
"product_id": 1,
"value_id": 4
},
{
"attribute_id": 1,
"product_id": 8,
"value_id": 1
},
{
"attribute_id": 12,
"product_id": 8,
"value_id": 5
},
{
"attribute_id": 13,
"product_id": 8,
"value_id": 10
},
{
"attribute_id": 14,
"product_id": 8,
"value_id": 11
}
I want to return just the rows for the product that has both value_ids [1,5]:
{
"attribute_id": 1,
"product_id": 8,
"value_id": 1
},
{
"attribute_id": 12,
"product_id": 8,
"value_id": 5
},
but the code I wrote above returns:
{
"attribute_id": 1,
"product_id": 1,
"value_id": 1
},
{
"attribute_id": 1,
"product_id": 8,
"value_id": 1
},
{
"attribute_id": 12,
"product_id": 8,
"value_id": 5
},
This should work:
$values = [1, 5];
$filtered = DB::table('attribute_product')
->whereIn('value_id', $values)
->get()
->groupBy('product_id')
->filter(function ($product) use ($values) {
return $product->pluck('value_id')
->intersect($values)
->count() === count($values);
})
->flatten();
PS: I don't like this solution too much since it does the calculation in memory. You should make use of relationships to do this at database level.
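For readers who want the group-then-check logic spelled out step by step, here is the same idea as a plain JavaScript sketch over the sample rows. It is an illustration only, not Laravel code; the names wanted, byProduct and filtered are made up for this example.

```javascript
// Sample rows from the question.
const rows = [
  { attribute_id: 1, product_id: 1, value_id: 1 },
  { attribute_id: 12, product_id: 1, value_id: 2 },
  { attribute_id: 13, product_id: 1, value_id: 3 },
  { attribute_id: 14, product_id: 1, value_id: 4 },
  { attribute_id: 1, product_id: 8, value_id: 1 },
  { attribute_id: 12, product_id: 8, value_id: 5 },
  { attribute_id: 13, product_id: 8, value_id: 10 },
  { attribute_id: 14, product_id: 8, value_id: 11 }
];
const wanted = [1, 5];

// Step 1 (whereIn): keep rows whose value_id is one of the wanted values.
const matching = rows.filter(function (r) {
  return wanted.includes(r.value_id);
});

// Step 2 (groupBy): bucket the matching rows by product_id.
const byProduct = new Map();
matching.forEach(function (r) {
  if (!byProduct.has(r.product_id)) byProduct.set(r.product_id, []);
  byProduct.get(r.product_id).push(r);
});

// Step 3 (filter + flatten): keep only groups containing every wanted value.
const filtered = [];
byProduct.forEach(function (group) {
  const present = new Set(group.map(function (r) { return r.value_id; }));
  if (wanted.every(function (v) { return present.has(v); })) {
    group.forEach(function (r) { filtered.push(r); });
  }
});
```

With the sample data, product 1 only matches value_id 1 and is dropped, so filtered holds the two product 8 rows.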
You can use Laravel's groupBy:
$values = DB::table('attribute_product')->orderBy('product_id', 'desc')->whereIn('value_id', [1, 5])->groupBy('value_id')->get();
I recreated the database sample you shared:
INSERT INTO
attribute_product(attribute_id, product_id, value_id)
VALUES
(1, 1, 1), (12, 1, 2), (13, 1, 3), (14, 1, 4), (1, 8, 1), (12, 8, 5), (13, 8, 10), (14, 8, 11);
I came up with this raw query:
SELECT
attribute_product.*,
SUM(value_id) as filter
FROM
`attribute_product`
WHERE
value_id IN(1, 5)
GROUP BY
product_id
HAVING
filter = 6;
Then I built the query with Illuminate\Database\Query\Builder:
$filter = [1, 5];
DB::table('attribute_product')
->groupBy('product_id')
->select([
'attribute_product.*',
DB::raw("SUM(value_id) as filter"),
])
->whereIn('value_id', $filter)
->having('filter', '=', array_sum($filter))
->get();
This solution is handled entirely by the database engine, which spares your server the load of manipulating Collections.
Opinion
I feel that this is a tricky way to reach your goal, which suggests to me that your database design doesn't fit the business logic/use cases very well.
I think a good database design makes complex data retrieval possible with simple queries (using joins, of course) or intuitive Eloquent relationships.
OLD ANSWER
$filter_value_id = [1, 5];
$values_by_product = DB::table('attribute_product')
->whereIn('value_id', $filter_value_id)
->get()
->groupBy('product_id');
foreach ($values_by_product as $product => $value) {
echo "product id: $product<br>";
if ($value->count() === sizeof($filter_value_id))
dump($value);
}
I have the following data
[{"devcount" : 1 , "dayofweek" :0, "hour" : 1 },
{"devcount" : 2 , "dayofweek" :0, "hour" : 2 },
{"devcount" : 3 , "dayofweek" :1, "hour" : 2 },
{"devcount" : 4 , "dayofweek" :1, "hour" : 3 },
{"devcount" : 6 , "dayofweek" :1, "hour" : 4 },
{"devcount" : 5 , "dayofweek" :1, "hour" : 5 },
{"devcount" : 7 , "dayofweek" :2, "hour" : 5 },
{"devcount" : 8 , "dayofweek" :2, "hour" : 6 },
{"devcount" : 9 , "dayofweek" :2, "hour" : 7 },
{"devcount" : 10 , "dayofweek" :2, "hour" : 9 }]
I need to compare each devcount with the group average of devcount for its dayofweek.
I.e., for the first row, devcount=1 is to be compared with the average device count for dayofweek 0 (= 1.5), and "yes" should be returned if the devcount is lower. Otherwise "No" should be returned.
I have coded it as below.
smry=d3.nest()
.key( function(d) { return d.dayofweek;})
.rollup(function(d) {return d3.mean(d, function(g) {return g.devcount; })})
.entries(result);
I am not sure how to compare the smry data with the original data.
The original data will be used in selectAll to create rectangles, and the result of the comparison is needed to determine the colour of each rectangle.
You can do it as shown in the snippet below.
test = [{
"devcount": 1,
"dayofweek": 0,
"hour": 1
}, {
"devcount": 2,
"dayofweek": 0,
"hour": 2
},
{
"devcount": 3,
"dayofweek": 1,
"hour": 2
}, {
"devcount": 4,
"dayofweek": 1,
"hour": 3
}, {
"devcount": 6,
"dayofweek": 1,
"hour": 4
}, {
"devcount": 5,
"dayofweek": 1,
"hour": 5
},
{
"devcount": 7,
"dayofweek": 2,
"hour": 5
}, {
"devcount": 8,
"dayofweek": 2,
"hour": 6
}, {
"devcount": 9,
"dayofweek": 2,
"hour": 7
}, {
"devcount": 10,
"dayofweek": 2,
"hour": 9
}
];
//make the summary using nest
smry = d3.nest()
.key(function(d) {
return d.dayofweek;
})
.rollup(function(d) {
return d3.mean(d, function(g) {
return g.devcount;
})
})
.entries(test);
test.forEach(function(t) {
//find the value from summary for dayofweek
var k = smry.find(function(s) {
return (s.key == t.dayofweek)
});
//compare devcount with the mean for that day of week; flag is true when devcount is above the mean
if(k.values<t.devcount){
t.flag = true;
} else {
t.flag = false;
}
});
console.log(test);//this now has the flag to determine the color
<script src="https://cdnjs.cloudflare.com/ajax/libs/d3/3.4.11/d3.min.js"></script>
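The snippet above sets flag to true when devcount is above the day's mean, while the question asks for "yes" when devcount is lower. As a hedged alternative, here is a minimal sketch in plain JavaScript (no d3) that follows the question's wording. The names sums, labelled, mean and belowMean are made up for this example.

```javascript
// Sample data from the question.
const data = [
  { devcount: 1, dayofweek: 0, hour: 1 },
  { devcount: 2, dayofweek: 0, hour: 2 },
  { devcount: 3, dayofweek: 1, hour: 2 },
  { devcount: 4, dayofweek: 1, hour: 3 },
  { devcount: 6, dayofweek: 1, hour: 4 },
  { devcount: 5, dayofweek: 1, hour: 5 },
  { devcount: 7, dayofweek: 2, hour: 5 },
  { devcount: 8, dayofweek: 2, hour: 6 },
  { devcount: 9, dayofweek: 2, hour: 7 },
  { devcount: 10, dayofweek: 2, hour: 9 }
];

// First pass: running total and count of devcount per dayofweek.
const sums = new Map();
data.forEach(function (d) {
  const s = sums.get(d.dayofweek) || { total: 0, count: 0 };
  s.total += d.devcount;
  s.count += 1;
  sums.set(d.dayofweek, s);
});

// Second pass: attach the day's mean and a "yes"/"No" label to a copy of each row.
const labelled = data.map(function (d) {
  const s = sums.get(d.dayofweek);
  const mean = s.total / s.count;
  return Object.assign({}, d, {
    mean: mean,
    belowMean: d.devcount < mean ? 'yes' : 'No'
  });
});
```

The labelled array can be bound with selectAll(...).data(labelled), and belowMean can drive the fill colour of each rectangle.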
I've noticed something strange with compound indexes in the between function in RethinkDB. It seems to retrieve results that don't match the query. It's all detailed below.
r.dbCreate('test')
r.db('test').tableCreate('numbers')
r.db('test').table('numbers').insert([
{ first: 1, second: 1 },
{ first: 1, second: 2 },
{ first: 1, second: 3 },
{ first: 1, second: 4 },
{ first: 1, second: 5 },
{ first: 2, second: 1 },
{ first: 2, second: 2 },
{ first: 2, second: 3 },
{ first: 2, second: 4 },
{ first: 2, second: 5 },
{ first: 3, second: 1 },
{ first: 3, second: 2 },
{ first: 3, second: 3 },
{ first: 3, second: 4 },
{ first: 3, second: 5 },
{ first: 4, second: 1 },
{ first: 4, second: 2 },
{ first: 4, second: 3 },
{ first: 4, second: 4 },
{ first: 4, second: 5 },
{ first: 5, second: 1 },
{ first: 5, second: 2 },
{ first: 5, second: 3 },
{ first: 5, second: 4 },
{ first: 5, second: 5 }
])
r.db('test').table('numbers').indexCreate(
"both", [r.row("first"), r.row("second")])
r.db('test').table('numbers').orderBy({index :'both'}).between(
[2, 3], [3, 5], {index: 'both', rightBound: 'closed'}).without('id')
// output
{ "first": 3 ,
"second": 3
} // ok
{ "first": 3 ,
"second": 4
} // ok
{ "first": 2 ,
"second": 5
} // ok
{ "first": 3 ,
"second": 1
} // not ok
{ "first": 3 ,
"second": 5
} // ok
{ "first": 3 ,
"second": 2
} // not ok
{ "first": 2 ,
"second": 3
} // ok
{ "first": 2 ,
"second": 4
} // ok
The array in the query doesn't appear to act like an AND or an OR. Am I missing something or is this a bug?
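OK, so thanks to some help from originalexe over on the Slack channel, I've figured this out. It's behaving as normal: each array is treated as a single value, the keys are compared element by element like entries in an ordered list, and the query returns every key that falls between the two bounds in that order. That is why [3, 1] and [3, 2] are returned: they sort after [2, 3] and before [3, 5].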
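To make that ordering concrete, here is a small plain JavaScript sketch that compares two-element keys element by element and picks the ones between [2, 3] and [3, 5] with both bounds closed. The compareKeys helper is written for this example and is not part of the RethinkDB driver.

```javascript
// Compare two array keys element by element; the first difference decides.
function compareKeys(a, b) {
  const n = Math.min(a.length, b.length);
  for (let i = 0; i < n; i++) {
    if (a[i] < b[i]) return -1;
    if (a[i] > b[i]) return 1;
  }
  return a.length - b.length;
}

// All [first, second] pairs from the table in the question.
const keys = [];
for (let first = 1; first <= 5; first++) {
  for (let second = 1; second <= 5; second++) {
    keys.push([first, second]);
  }
}

// Keys between the bounds, with both ends included.
const lower = [2, 3];
const upper = [3, 5];
const matched = keys.filter(function (k) {
  return compareKeys(k, lower) >= 0 && compareKeys(k, upper) <= 0;
});
```

Here matched contains the same eight keys as the query output, including [3, 1] and [3, 2]. To get "first between 2 and 3 AND second between 3 and 5" instead, the second condition would need its own filter.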
Let's say I have the following array
var data = [{ id: 0, points: 1 }, { id: 1, points: 2 }]
I would like to update my table which contains
{
"doc-1": {
"id": "abcxyz123",
"entries": [
{ "id": 0, "points": 5 },
{ "id": 1, "points": 3 },
{ "id": 2, "points": 0 }
]
}
}
I want to add the points value of each element in the data array to the points value of the element with the matching id in the "entries" array of "doc-1". The end result would look like:
{
"doc-1": {
"id": "abcxyz123",
"entries": [
{ "id": 0, "points": 6 },
{ "id": 1, "points": 4 },
{ "id": 2, "points": 0 }
]
}
}
How do I go about writing such a query in ReQL?
I assume for now that the actual document in the table looks like this:
{
"id": "abcxyz123",
"entries": [{
"id": 0,
"points": 5
}, {
"id": 1,
"points": 3
}, {
"id": 2,
"points": 0
}]
}
That is, without the doc-1 nesting.
Then your update can be done like this:
r.table('t1').update(
{
entries: r.row('entries').map(function(e) {
return r.do(r.expr(data)('id').indexesOf(e('id')), function(dataIndexes) {
return r.branch(
dataIndexes.isEmpty(),
e,
{
id: e('id'),
points: e('points').add(r.expr(data)(dataIndexes(0))('points'))
});
});
})
})
I'm using map to map over each entry in entries, and indexesOf to find the corresponding entry in data if it exists.
Note that this doesn't add new entries to the entries list, but only updates existing ones. Please let me know if you need to add new entries as well.
If your documents actually have the doc-1 field first, this query should do the job:
r.table('t1').update(
{ 'doc-1':
{
entries: r.row('doc-1')('entries').map(function(e) {
return r.do(r.expr(data)('id').indexesOf(e('id')), function(dataIndexes) {
return r.branch(
dataIndexes.isEmpty(),
e,
{
id: e('id'),
points: e('points').add(r.expr(data)(dataIndexes(0))('points'))
});
});
})
}
})