I've noticed something strange with compound indexes in the between function in RethinkDB. It seems to retrieve results that don't match the query. It's all detailed below.
r.dbCreate('test')
r.db('test').tableCreate('numbers')
r.db('test').table('numbers').insert([
{ first: 1, second: 1 },
{ first: 1, second: 2 },
{ first: 1, second: 3 },
{ first: 1, second: 4 },
{ first: 1, second: 5 },
{ first: 2, second: 1 },
{ first: 2, second: 2 },
{ first: 2, second: 3 },
{ first: 2, second: 4 },
{ first: 2, second: 5 },
{ first: 3, second: 1 },
{ first: 3, second: 2 },
{ first: 3, second: 3 },
{ first: 3, second: 4 },
{ first: 3, second: 5 },
{ first: 4, second: 1 },
{ first: 4, second: 2 },
{ first: 4, second: 3 },
{ first: 4, second: 4 },
{ first: 4, second: 5 },
{ first: 5, second: 1 },
{ first: 5, second: 2 },
{ first: 5, second: 3 },
{ first: 5, second: 4 },
{ first: 5, second: 5 }
])
r.db('test').table('numbers').indexCreate(
"both", [r.row("first"), r.row("second")])
r.db('test').table('numbers').orderBy({index :'both'}).between(
[2, 3], [3, 5], {index: 'both', rightBound: 'closed'}).without('id')
// output
{ "first": 3, "second": 3 } // ok
{ "first": 3, "second": 4 } // ok
{ "first": 2, "second": 5 } // ok
{ "first": 3, "second": 1 } // not ok
{ "first": 3, "second": 5 } // ok
{ "first": 3, "second": 2 } // not ok
{ "first": 2, "second": 3 } // ok
{ "first": 2, "second": 4 } // ok
The array in the query doesn't appear to act like an AND or an OR. Am I missing something or is this a bug?
OK, thanks to some help from originalexe on the Slack channel, I've figured this out. It's behaving as designed: the array is treated as a single key, and keys are compared element by element (lexicographically), so the query returns every key that sorts between the two bounds. That is why [3, 1] and [3, 2] are returned: they sort after [2, 3] and before [3, 5].
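To see why [3, 1] and [3, 2] come back, here is a small plain-JavaScript sketch (no RethinkDB involved) that orders compound keys element by element and applies a range closed at both ends, like the query above. The helper names are hypothetical. For comparison, it also bounds each field independently, which is the "AND" behaviour the question expected; in ReQL that would presumably need an extra step after between, such as a filter on second (my assumption, not part of the original answer).

```javascript
// Compare two compound keys element by element (lexicographic order),
// which is how an index on [first, second] orders its keys.
function compareKeys(a, b) {
  for (let i = 0; i < Math.min(a.length, b.length); i++) {
    if (a[i] < b[i]) return -1;
    if (a[i] > b[i]) return 1;
  }
  return a.length - b.length;
}

// Emulates between(lower, upper) with both bounds closed.
function betweenClosed(keys, lower, upper) {
  return keys.filter(k => compareKeys(k, lower) >= 0 && compareKeys(k, upper) <= 0);
}

// Bounds every field on its own: first in [2, 3] AND second in [3, 5].
function withinEachField(keys, lower, upper) {
  return keys.filter(k => k.every((v, i) => v >= lower[i] && v <= upper[i]));
}

// The 25 [first, second] keys inserted in the question.
const keys = [];
for (let first = 1; first <= 5; first++) {
  for (let second = 1; second <= 5; second++) {
    keys.push([first, second]);
  }
}

const range = betweenClosed(keys, [2, 3], [3, 5]);
const perField = withinEachField(keys, [2, 3], [3, 5]);
console.log(JSON.stringify(range));
console.log(JSON.stringify(perField));
```

The lexicographic version returns the same eight keys as the RethinkDB output, including [3, 1] and [3, 2]; the per-field version returns only six.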
Related
Is it possible to run a search on the results of another search? For example:
// index: A
{ "ID": 1, "status": "done" }
{ "ID": 2, "status": "processing" }
{ "ID": 3, "status": "done" }
{ "ID": 4, "status": "done" }
// index: B
{ "ID": 1, "user": 1, "value": 10 }
{ "ID": 1, "user": 2, "value": 3 }
{ "ID": 2, "user": 1,"value": 1 }
{ "ID": 3, "user": 1, "value": 3 }
{ "ID": 4, "user": 1, "value": 7 }
Q1: Search index "A" for status == "done" and return the IDs
RES: 1,3,4
Q2: Among the results of Q1, search index "B" for value > 5 and return the IDs
RES: 1,4
My current solution is to use two queries: download the results of Q1, then use them to run a second search for Q2. This is very cumbersome because Q1 returns 30k results.
To me the problem looks more like a join across two indexes, the kind of thing relational databases do. I'm not sure of the exact solution, but I recently used a plug-in for joins that might help: https://siren.io/siren-federate-20-0-introducing-a-scalable-inner-join-for-elasticsearch/
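The join logic itself is a semi-join: keep the documents from B whose ID appears in the result of Q1. As a sketch only, with both indexes modelled as in-memory arrays (the variable names are hypothetical, and this does not avoid transferring Q1's IDs), it looks like this:

```javascript
// Hypothetical in-memory copies of the two indexes from the question.
const indexA = [
  { ID: 1, status: "done" },
  { ID: 2, status: "processing" },
  { ID: 3, status: "done" },
  { ID: 4, status: "done" },
];
const indexB = [
  { ID: 1, user: 1, value: 10 },
  { ID: 1, user: 2, value: 3 },
  { ID: 2, user: 1, value: 1 },
  { ID: 3, user: 1, value: 3 },
  { ID: 4, user: 1, value: 7 },
];

// Q1: IDs whose status is "done", stored in a Set for constant-time lookups.
const doneIds = new Set(indexA.filter(d => d.status === "done").map(d => d.ID));

// Q2: B documents with value > 5 whose ID is in Q1 (a semi-join),
// de-duplicated because B can hold several documents per ID.
const result = [...new Set(
  indexB.filter(d => d.value > 5 && doneIds.has(d.ID)).map(d => d.ID)
)];

console.log(result);
```

With 30k IDs, the Set keeps each membership check cheap; whether the join can run server-side instead depends on the search engine, which the question does not name.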
I have the following data
[{"devcount" : 1 , "dayofweek" :0, "hour" : 1 },
{"devcount" : 2 , "dayofweek" :0, "hour" : 2 },
{"devcount" : 3 , "dayofweek" :1, "hour" : 2 },
{"devcount" : 4 , "dayofweek" :1, "hour" : 3 },
{"devcount" : 6 , "dayofweek" :1, "hour" : 4 },
{"devcount" : 5 , "dayofweek" :1, "hour" : 5 },
{"devcount" : 7 , "dayofweek" :2, "hour" : 5 },
{"devcount" : 8 , "dayofweek" :2, "hour" : 6 },
{"devcount" : 9 , "dayofweek" :2, "hour" : 7 },
{"devcount" : 10 , "dayofweek" :2, "hour" : 9 }]
I need to compare each devcount with the average devcount for its dayofweek.
For example, in the first row, devcount = 1 is compared with the average devcount for dayofweek 0 (= 1.5). "Yes" should be returned if the devcount is lower, otherwise "No".
This is my code so far:
smry=d3.nest()
.key( function(d) { return d.dayofweek;})
.rollup(function(d) {return d3.mean(d, function(g) {return g.devcount; })})
.entries(result);
I am not sure how to compare the smry data and the original data.
The original data will be bound with selectAll to create rectangles, and the result of the comparison is needed to determine each rectangle's colour.
You can do it as shown in the snippet below.
var test = [{
"devcount": 1,
"dayofweek": 0,
"hour": 1
}, {
"devcount": 2,
"dayofweek": 0,
"hour": 2
},
{
"devcount": 3,
"dayofweek": 1,
"hour": 2
}, {
"devcount": 4,
"dayofweek": 1,
"hour": 3
}, {
"devcount": 6,
"dayofweek": 1,
"hour": 4
}, {
"devcount": 5,
"dayofweek": 1,
"hour": 5
},
{
"devcount": 7,
"dayofweek": 2,
"hour": 5
}, {
"devcount": 8,
"dayofweek": 2,
"hour": 6
}, {
"devcount": 9,
"dayofweek": 2,
"hour": 7
}, {
"devcount": 10,
"dayofweek": 2,
"hour": 9
}
];
//make the summary using nest
var smry = d3.nest()
.key(function(d) {
return d.dayofweek;
})
.rollup(function(d) {
return d3.mean(d, function(g) {
return g.devcount;
})
})
.entries(test);
test.forEach(function(t) {
//find the value from summary for dayofweek
var k = smry.find(function(s) {
return (s.key == t.dayofweek)
});
//compare devcount with the mean for its dayofweek:
//flag is true ("yes") when devcount is lower than the mean
t.flag = t.devcount < k.values;
});
console.log(test);//this now has the flag to determine the color
<script src="https://cdnjs.cloudflare.com/ajax/libs/d3/3.4.11/d3.min.js"></script>
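For comparison, the same per-day mean and the "Yes"/"No" answer from the question can be computed without d3.nest, using a Map keyed by dayofweek. This is a dependency-free sketch; the properties mean and belowMean are hypothetical additions to each row.

```javascript
const data = [
  { devcount: 1, dayofweek: 0, hour: 1 },
  { devcount: 2, dayofweek: 0, hour: 2 },
  { devcount: 3, dayofweek: 1, hour: 2 },
  { devcount: 4, dayofweek: 1, hour: 3 },
  { devcount: 6, dayofweek: 1, hour: 4 },
  { devcount: 5, dayofweek: 1, hour: 5 },
  { devcount: 7, dayofweek: 2, hour: 5 },
  { devcount: 8, dayofweek: 2, hour: 6 },
  { devcount: 9, dayofweek: 2, hour: 7 },
  { devcount: 10, dayofweek: 2, hour: 9 },
];

// First pass: sum and count devcount per dayofweek.
const totals = new Map();
for (const d of data) {
  const t = totals.get(d.dayofweek) || { sum: 0, count: 0 };
  t.sum += d.devcount;
  t.count += 1;
  totals.set(d.dayofweek, t);
}

// Second pass: attach the group mean and the "Yes"/"No" answer to each row.
for (const d of data) {
  const t = totals.get(d.dayofweek);
  d.mean = t.sum / t.count;
  d.belowMean = d.devcount < d.mean ? "Yes" : "No";
}

console.log(data);
```

Each rectangle's fill can then be chosen from d.belowMean in the attr callback of the selection.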
I have some (simplified) data as follows:
{ "PO": 1353901, "Qty": 1, "Levels": 3 },
{ "PO": 1353901, "Qty": 2, "Levels": 3 },
{ "PO": 50048309,"Qty": 1, "Levels": 1 },
{ "PO": 50048309,"Qty": 4, "Levels": 1 },
{ "PO": 50048309,"Qty": 1, "Levels": 1 }
This is data for two purchase orders. Each row represents a unique product and how much of it was used, along with how many levels the products were spread over.
One dimension that helps explain cost is material density, that is, how many items were used per level. In the case of 1353901 there were three items used on three levels (Qty gets aggregated, Levels do not), resulting in one item per level.
For 50048309 there are six items used on one level, showing a much higher implant density. This tells me it was a lot of work focused in one place.
Filtering on flat data is easy, and so is grouping it into ranges. Take Levels for example:
var levels = ndx.dimension(function (d) {
var level = d.Levels;
if (level == 1) {
return 'One';
} else if (level == 2) {
return 'Two';
} else if (level == 3) {
return 'Three';
} else {
return 'Four +';
}
});
I can easily create groups and ranges within a dimension.
What I cannot seem to do is the exact same thing for aggregates. I want to look at (filter) POs by the number of materials used per level. That figure is not hard to get per purchase order, but it seems hard to look at in groups. Example below:
https://jsfiddle.net/efefdtcj/2/
Since I started with a dimension based on aggregating at the PO level, I'm getting a row back for each PO.
How do I get one row back per QtyPerLevel range?
I think you need to pre-calculate. That is, add a new property to each PO line holding the total quantity for that PO. While you're at it, you might as well calculate QtyPerLevel too:
{ "PO": 1353901, "Qty": 1, "Levels": 3, "TotalQty": 3, "QtyPerLevel": 1 },
{ "PO": 1353901, "Qty": 2, "Levels": 3, "TotalQty": 3, "QtyPerLevel": 1 },
{ "PO": 50048309,"Qty": 1, "Levels": 1, "TotalQty": 6, "QtyPerLevel": 6 },
{ "PO": 50048309,"Qty": 4, "Levels": 1, "TotalQty": 6, "QtyPerLevel": 6 },
{ "PO": 50048309,"Qty": 1, "Levels": 1, "TotalQty": 6, "QtyPerLevel": 6 }
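The pre-calculation can be done in plain JavaScript before the rows are handed to crossfilter. A minimal sketch, assuming (as in the sample) that Levels is the same on every row of a PO; the variable names are hypothetical:

```javascript
const rows = [
  { PO: 1353901, Qty: 1, Levels: 3 },
  { PO: 1353901, Qty: 2, Levels: 3 },
  { PO: 50048309, Qty: 1, Levels: 1 },
  { PO: 50048309, Qty: 4, Levels: 1 },
  { PO: 50048309, Qty: 1, Levels: 1 },
];

// First pass: sum Qty per purchase order.
const totalQtyByPO = new Map();
for (const r of rows) {
  totalQtyByPO.set(r.PO, (totalQtyByPO.get(r.PO) || 0) + r.Qty);
}

// Second pass: copy the PO-level figures onto every line item.
// Assumes Levels is identical on all rows of a PO.
const enriched = rows.map(r => {
  const totalQty = totalQtyByPO.get(r.PO);
  return Object.assign({}, r, { TotalQty: totalQty, QtyPerLevel: totalQty / r.Levels });
});

console.log(enriched);
```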
Then create a Crossfilter dimension on QtyPerLevel and filter or group on that:
var ndx = crossfilter([
{ "PO": 1353901, "Qty": 1, "Levels": 3, "TotalQty": 3, "QtyPerLevel": 1 },
{ "PO": 1353901, "Qty": 2, "Levels": 3, "TotalQty": 3, "QtyPerLevel": 1 },
{ "PO": 50048309,"Qty": 1, "Levels": 1, "TotalQty": 6, "QtyPerLevel": 6 },
{ "PO": 50048309,"Qty": 4, "Levels": 1, "TotalQty": 6, "QtyPerLevel": 6 },
{ "PO": 50048309,"Qty": 1, "Levels": 1, "TotalQty": 6, "QtyPerLevel": 6 }]);
var qtyPerLevelDim = ndx.dimension(function(d) { return d.QtyPerLevel; });
var qtyPerLevelGrp = qtyPerLevelDim.group();
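To get one row per QtyPerLevel range, the dimension accessor can return a range label, just like the Levels dimension in the question, e.g. ndx.dimension(function (d) { return qtyPerLevelRange(d.QtyPerLevel); }). Note that the default group() counts records, so it counts line items (2 and 3 here), not purchase orders. The plain-JavaScript sketch below shows the bucketing plus a distinct-PO count; qtyPerLevelRange and its boundaries are hypothetical.

```javascript
// Rows after pre-calculation, as in the answer above.
const rows = [
  { PO: 1353901, Qty: 1, Levels: 3, TotalQty: 3, QtyPerLevel: 1 },
  { PO: 1353901, Qty: 2, Levels: 3, TotalQty: 3, QtyPerLevel: 1 },
  { PO: 50048309, Qty: 1, Levels: 1, TotalQty: 6, QtyPerLevel: 6 },
  { PO: 50048309, Qty: 4, Levels: 1, TotalQty: 6, QtyPerLevel: 6 },
  { PO: 50048309, Qty: 1, Levels: 1, TotalQty: 6, QtyPerLevel: 6 },
];

// Hypothetical range boundaries; adjust to the ranges you need.
function qtyPerLevelRange(q) {
  if (q <= 1) return "up to 1";
  if (q <= 3) return "1 to 3";
  if (q <= 5) return "3 to 5";
  return "over 5";
}

const lineItemsPerRange = {}; // what a default group() would count
const posPerRange = {};       // distinct purchase orders per range
for (const r of rows) {
  const range = qtyPerLevelRange(r.QtyPerLevel);
  lineItemsPerRange[range] = (lineItemsPerRange[range] || 0) + 1;
  if (!posPerRange[range]) posPerRange[range] = new Set();
  posPerRange[range].add(r.PO);
}

const poCountPerRange = Object.fromEntries(
  Object.entries(posPerRange).map(([range, pos]) => [range, pos.size])
);

console.log(lineItemsPerRange, poCountPerRange);
```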
I have a query that uses group() function:
...group('a','b','c','na').count()
Right now the result is returned in the form of group and reduction pairs.
How can I get the result without group and reduction, in the form of:
{
"na": 1285,
"c" : 487,
"b" : 746,
"a" : 32
}
I'm not sure, but I think you're misunderstanding what group does.
The group command takes a property and groups documents by that property. For example, suppose you have the following documents and want to group them by the a property:
[
{ a: 1 },
{ a: 1 },
{ a: 1 },
{ a: 2 }
]
Then you would run the following query:
r.table(...).group('a').count().ungroup()
Which would result in:
[
{
"group": 1 ,
"reduction": 3
},
{
"group": 2 ,
"reduction": 1
}
]
By passing multiple arguments to group, you are telling it to create a distinct group for each combination of values of those properties. So if you have the following documents:
[ {
a: 1, b: 1
}, {
a: 1, b: 1
}, {
a: 1, b: 2
}, {
a: 2, b: 1
}]
And you group them by a and b:
r.table(...).group('a', 'b').count().ungroup()
You will get the following result:
[{
"group": [ 1 , 1 ] ,
"reduction": 2
},
{
"group": [ 1 , 2 ] ,
"reduction": 1
},
{
"group": [ 2 , 1 ] ,
"reduction": 1
}]
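To make the group/reduction shape concrete, here is a plain-JavaScript sketch (not RethinkDB code) that mimics group(...).count().ungroup() on the two sample document sets above. groupCount is a hypothetical helper: each distinct combination of the grouped fields becomes one group, and the group value is an array when more than one field is given.

```javascript
const docsA = [{ a: 1 }, { a: 1 }, { a: 1 }, { a: 2 }];
const docsAB = [{ a: 1, b: 1 }, { a: 1, b: 1 }, { a: 1, b: 2 }, { a: 2, b: 1 }];

// Counts documents per distinct value (or combination of values) of the fields.
function groupCount(documents, ...fields) {
  const counts = new Map();
  for (const doc of documents) {
    const key = fields.length === 1 ? doc[fields[0]] : fields.map(f => doc[f]);
    const mapKey = JSON.stringify(key); // arrays need a stable string key
    const entry = counts.get(mapKey) || { group: key, reduction: 0 };
    entry.reduction += 1;
    counts.set(mapKey, entry);
  }
  // Groups come out in first-appearance order, which may differ from RethinkDB's.
  return [...counts.values()];
}

const byA = groupCount(docsA, "a");
const byAB = groupCount(docsAB, "a", "b");
console.log(JSON.stringify(byA));
console.log(JSON.stringify(byAB));
```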
Your Answer
So, when you do .group('a','b','c','na').count(), you're grouping them by those 4 properties. If you want the following result:
{
"na": 1285,
"c" : 487,
"b" : 746,
"a" : 32
}
Then your documents should look something like this:
[{
property: 'a'
}, {
property: 'c'
}, {
property: 'na'
},
...
]
And then you would group them in the following way:
r.table(...).group('property').count().ungroup()
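Even then, group('property').count().ungroup() still returns an array of group/reduction objects, so producing the single object from the question is a separate reshaping step. In ReQL it should be possible to map each row to a [group, reduction] pair and call coerceTo('object'), which accepts an array of key-value pairs, e.g. .ungroup().map(function (g) { return [g('group'), g('reduction')]; }).coerceTo('object'). The same reshaping in plain JavaScript, using the counts from the question as sample input:

```javascript
// Sample input shaped like the output of group('property').count().ungroup().
const grouped = [
  { group: "a", reduction: 32 },
  { group: "b", reduction: 746 },
  { group: "c", reduction: 487 },
  { group: "na", reduction: 1285 },
];

// Turn each { group, reduction } row into a [key, value] pair, then into one object.
const counts = Object.fromEntries(grouped.map(g => [g.group, g.reduction]));

console.log(counts);
```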
Let's say I have the following array:
var data = [{ id: 0, points: 1 }, { id: 1, points: 2 }]
I would like to update my table, which contains:
{
"doc-1": {
"id": "abcxyz123",
"entries": [
{ "id": 0, "points": 5 },
{ "id": 1, "points": 3 },
{ "id": 2, "points": 0 }
]
}
}
so that, for each element of the "entries" array in "doc-1" whose id matches an id in the data array, the points value from data is added to that element's points. The end result would look like:
{
"doc-1": {
"id": "abcxyz123",
"entries": [
{ "id": 0, "points": 6 },
{ "id": 1, "points": 5 },
{ "id": 2, "points": 0 }
]
}
}
How do I write such a query in ReQL?
I assume that the actual document in the table looks like this for now:
{
"id": "abcxyz123",
"entries": [{
"id": 0,
"points": 5
}, {
"id": 1,
"points": 3
}, {
"id": 2,
"points": 0
}]
}
That is, without the doc-1 nesting.
Then your update can be done like this:
r.table('t1').update({
  entries: r.row('entries').map(function(e) {
    return r.do(r.expr(data)('id').indexesOf(e('id')), function(dataIndexes) {
      return r.branch(
        dataIndexes.isEmpty(),
        e,
        {
          id: e('id'),
          points: e('points').add(r.expr(data)(dataIndexes(0))('points'))
        });
    });
  })
})
I'm using map to map over each entry in entries, and indexesOf to find the corresponding entry in data if it exists.
Note that this doesn't add new entries to the entries list, but only updates existing ones. Please let me know if you need to add new entries as well.
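The per-entry logic of the query can be checked outside the database with a plain-JavaScript sketch: findIndex plays the role of indexesOf(...)(0), and entries without a match are returned unchanged, just as in the r.branch above. addPoints is a hypothetical helper, not part of ReQL.

```javascript
const data = [{ id: 0, points: 1 }, { id: 1, points: 2 }];
const doc = {
  id: "abcxyz123",
  entries: [
    { id: 0, points: 5 },
    { id: 1, points: 3 },
    { id: 2, points: 0 },
  ],
};

// For each entry, look up a matching id in updates and add its points.
function addPoints(entries, updates) {
  return entries.map(e => {
    const i = updates.findIndex(u => u.id === e.id);
    // No match: keep the entry unchanged (the dataIndexes.isEmpty() branch).
    return i === -1 ? e : { id: e.id, points: e.points + updates[i].points };
  });
}

const updated = Object.assign({}, doc, { entries: addPoints(doc.entries, data) });
console.log(JSON.stringify(updated));
```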
If your documents really do have the doc-1 field at the top level, this query should do the job:
r.table('t1').update({
  'doc-1': {
    entries: r.row('doc-1')('entries').map(function(e) {
      return r.do(r.expr(data)('id').indexesOf(e('id')), function(dataIndexes) {
        return r.branch(
          dataIndexes.isEmpty(),
          e,
          {
            id: e('id'),
            points: e('points').add(r.expr(data)(dataIndexes(0))('points'))
          });
      });
    })
  }
})