Using group and indexed max together - rethinkdb

Code:
r.db('dealdb').table('messages')
.filter({ dealId : request.dealId })
.group('conversationId')
.max({ index: 'sendDate' })
.ungroup()
.getField('reduction')
Error:
Expected type TABLE but found SEQUENCE
So my understanding is that group returns a sequence while max expects a table.
However, max without an index works as expected
r.db('dealdb').table('messages')
.filter({ dealId : request.dealId })
.group('conversationId')
.max('sendDate')
.ungroup()
.getField('reduction')
So why does an indexed max not work on a sequence while a non-indexed max does? And how do I get this working with an indexed max?

max with an index only works at the table level. That means this will work:
r.db('dealdb').table('messages').max({ index: 'sendDate' })
but once we call it on a stream or a selection, we can no longer use the index.
That's why your query fails: after filter and group, max receives a sequence rather than a table.
Without the index option, we can call max on arrays, streams, selections, and tables.
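To make the result of the non-indexed version concrete, here is a minimal in-memory sketch in plain JavaScript of what group('conversationId').max('sendDate') computes: one message per conversation, the one with the largest sendDate. The helper name latestPerConversation and the sample data are hypothetical, and this is not RethinkDB driver code.

```javascript
// Hypothetical helper: mimics group('conversationId').max('sendDate')
// on a plain array. Illustration only, not part of the RethinkDB API.
function latestPerConversation(messages) {
  const latest = new Map();
  for (const msg of messages) {
    const best = latest.get(msg.conversationId);
    // Keep the message with the largest sendDate seen so far for this group.
    if (best === undefined || msg.sendDate > best.sendDate) {
      latest.set(msg.conversationId, msg);
    }
  }
  return Array.from(latest.values());
}

// Made-up sample data.
const sample = [
  { conversationId: 'a', sendDate: 1, text: 'first' },
  { conversationId: 'a', sendDate: 3, text: 'latest in a' },
  { conversationId: 'b', sendDate: 2, text: 'only in b' },
];
const latestMessages = latestPerConversation(sample);
```

This also shows why no index is involved at this stage: the work happens on whatever rows survive the filter, not on the table's index.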

Related

How to return the N documents closest to a specific key from a couchdb view

I have a view on a couchdb database which exposes a certain document property as a key:
function (doc) {
if (doc.docType && doc.docType === 'CARD') {
if (doc.elo) {
emit(doc.elo, doc._id);
} else {
emit(1000, doc._id);
}
}
}
I'm interested in querying this db for the (say) 25 documents with keys closest to a given input. The only thing I can think to do is to set a search range and make repeated queries until I have enough results:
// pouchdb's query fcn
async function getNresultsClosestToK(key: number, limit: number) {
let range = 20;
let cards;
do {
cards = await this.db.query('elo', {
limit,
startkey: (key - range).toString(),
endkey: (key + range).toString()
});
range += 20;
} while (cards.rows.length < limit)
return cards;
}
But this may require several calls and is inefficient. Is there a way to pass a single key and a limit to couch and have it return the limit documents closest to the supplied key?

If I understand correctly, you want to query for a specific key, then return 12 results before the key, the key itself, and 12 results after the key, for a total of 25 results.
The most direct way to do this is with two queries against your view, with the proper combination of startkey, limit, and descending values.
For example, to get the key itself, and the 12 values following, query your view with these options:
startkey: <your key>
limit: 13
descending: false
Then to get the 12 entries before your key, perform a query with the following options:
startkey: <your key>
limit: 13
descending: true
This will give you two result sets, each with (a maximum of) 13 items. Note that your target key will be repeated (it's in each result set). You'll then need to combine the two result sets.
Note this does have a few limitations:
It returns a maximum of 26 results. If your data does not contain 12 values before or after your target key, you'll get fewer than 26 results.
If you have duplicate keys, you may get unexpected results. In particular:
If your target key is duplicated, you'll get 25 - N unique results (where N is the number of duplicates of your target key)
If your non-target keys are duplicated, you have no way of guaranteeing which of the duplicate keys will be returned, so performing the same query multiple times may result in different return values.
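The combining step can be sketched roughly as follows. This assumes each row has the usual view shape { id, key, value } with numeric keys. The helper name closestRows and the sample rows are hypothetical. It de-duplicates by document id (the target key shows up in both result sets) and then keeps the rows whose keys are nearest to the target.

```javascript
// Hypothetical helper: combine the ascending and descending result sets
// and keep the `limit` rows whose keys are nearest to `key`.
// Assumes rows look like { id, key, value } with numeric keys.
function closestRows(ascRows, descRows, key, limit) {
  const seen = new Set();
  const merged = [];
  for (const row of ascRows.concat(descRows)) {
    // The target key can appear in both result sets; de-duplicate by document id.
    if (!seen.has(row.id)) {
      seen.add(row.id);
      merged.push(row);
    }
  }
  // Sort by distance from the target key, nearest first.
  merged.sort((a, b) => Math.abs(a.key - key) - Math.abs(b.key - key));
  return merged.slice(0, limit);
}

// Made-up rows standing in for the two query results.
const asc = [
  { id: 'c', key: 1000, value: 'c' },
  { id: 'd', key: 1010, value: 'd' },
  { id: 'e', key: 1100, value: 'e' },
];
const desc = [
  { id: 'c', key: 1000, value: 'c' },
  { id: 'b', key: 995, value: 'b' },
  { id: 'a', key: 900, value: 'a' },
];
const nearest = closestRows(asc, desc, 1000, 3);
```

If you want the strictly closest N rather than a fixed split before and after, one option is to request N rows in each direction and let the final slice trim the result.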

How to get dynamic field count in dc.js numberDisplay?

I'm currently trying to figure out how to display a count of unique records using dc.js and D3.js.
The data set looks like this:
id,name,artists,genre,danceability,energy,key,loudness,mode,speechiness,acousticness,instrumentalness,liveness,valence,tempo,duration_ms,time_signature
6DCZcSspjsKoFjzjrWoCd,God's Plan,Drake,Hip-Hop/Rap,0.754,0.449,7,-9.211,1,0.109,0.0332,8.29E-05,0.552,0.357,77.169,198973,4
3ee8Jmje8o58CHK66QrVC,SAD!,XXXTENTACION,Hip-Hop/Rap,0.74,0.613,8,-4.88,1,0.145,0.258,0.00372,0.123,0.473,75.023,166606,4
There are 100 records in the data set, and I would expect the count to display 70 for the count of unique artists.
var ndx = crossfilter(spotifyData);
totalArtists(ndx);
....
function totalArtists(ndx) {
// Select the artists
var totalArtistsND = dc.numberDisplay("#unique-artists");
// Count them
var dim = ndx.dimension(dc.pluck("artists"));
var uniqueArtist = dim.groupAll();
totalArtistsND.group(uniqueArtist).valueAccessor(x => x);
totalArtistsND.render();
}
I am only getting 100 as a result when I should be getting 70.
Thanks a million, any help would be appreciated

You are on the right track - a groupAll object is usually the right kind of object to use with dc.numberDisplay.
However, dimension.groupAll doesn't use the dimension's key function. Like any groupAll, it looks at all the records and returns one value; the only difference between dimension.groupAll() and crossfilter.groupAll() is that the former does not observe the dimension's filters while the latter observes all filters.
If you were going to use dimension.groupAll, you'd have to write reduce functions that watch the rows as they are added and removed, and keep a count of how many unique artists they have seen. That sounds tedious and possibly buggy.
Instead, we can write a "fake groupAll", an object whose .value() method returns a value dynamically computed according to the current filters.
The ordinary group object already has a unique count: the number of bins. So we can create a fake groupAll which wraps an ordinary group and returns the length of the array returned by group.all():
function unique_count_groupall(group) {
return {
value: function() {
return group.all().filter(kv => kv.value).length;
}
};
}
Note that we also have to filter out any bins of value zero before counting.
Use the fake groupAll like this:
var uniqueArtist = unique_count_groupall(dim.group());
Demo fiddle.
I just added this to the FAQ.
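To see the wrapper's behaviour without loading crossfilter, here is a small sketch that feeds it a stand-in group. The stubGroup object and its bins are made up for illustration; a real dim.group() returns the same kind of { key, value } array from .all().

```javascript
// Same wrapper as in the answer above.
function unique_count_groupall(group) {
  return {
    value: function() {
      return group.all().filter(kv => kv.value).length;
    }
  };
}

// Hypothetical stand-in for dim.group(): .all() returns one bin per artist.
const stubGroup = {
  all: function() {
    return [
      { key: 'Drake', value: 3 },
      { key: 'XXXTENTACION', value: 2 },
      { key: 'Filtered Artist', value: 0 }, // bin emptied by a filter elsewhere
    ];
  }
};

// Only bins with a non-zero value are counted.
const uniqueArtistCount = unique_count_groupall(stubGroup).value();
```

The zero-valued bin is skipped, which is why the count follows the current filters instead of always reporting every artist in the data set.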

ArangoDb - How to count number of filtered results before limiting them

db.query(aql `
FOR doc IN ${collection}
FILTER REGEX_TEST(CONCAT(VALUES(doc, true)), ${queryStr}, true)
SORT doc[${sortBy}] ${dir}
LIMIT ${start}, ${count}
RETURN doc._key
`)
.then(cursor => {
cb(cursor._result)
}, err => console.log(err))
I have the AQL query above.
I want to count the total number of filtered results before limiting the results per page (for pagination purposes).
I think the issue is similar to these MySQL questions: "How to count rows before pagination?" and "Find total number of results in mySQL query with offset+limit".
I want to do the same in ArangoDB with AQL.
Part of the solution may be this: "How to count number of elements with AQL?"
So, what is the most efficient solution for my requirement with AQL?

You can set the fullCount flag to true in the options used to create the cursor. The result will then have an extra attribute with the sub-attributes stats and fullCount.
You can then read the fullCount attribute via cursor.extra.stats.fullCount. This attribute contains the number of documents in the result before the last LIMIT in the query was applied. See the HTTP documentation.
In addition, you should use the explain feature to analyse your query. In your case, the query will always perform a full collection scan, so it won't scale well.
Update:
I added the fullCount flag to your code. Keep in mind that the fullCount attribute only appears if the number of results before LIMIT is higher than the number of results after it.
db.query(aql `
FOR doc IN ${collection}
FILTER REGEX_TEST(CONCAT(VALUES(doc, true)), ${queryStr}, true)
SORT doc[${sortBy}] ${dir}
LIMIT ${start}, ${count}
RETURN {family: doc.family, group: doc.group} `, {count:true, options:{fullCount:true} })
.then(cursor => { console.log(cursor) }, err => console.log(err))
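For the pagination side, a rough sketch of turning that value into a page count might look like the following. The helper name pageInfo is hypothetical, and the cursor is treated as a plain object with the extra.stats.fullCount shape described above. Because the attribute may be absent, the sketch falls back to the number of rows actually returned.

```javascript
// Hypothetical helper: derive pagination info from a cursor-like object.
// Assumes the shape cursor.extra.stats.fullCount described in the answer.
function pageInfo(cursor, pageSize, returnedCount) {
  const stats = (cursor.extra && cursor.extra.stats) || {};
  // Fall back to the number of returned rows if fullCount is absent.
  const total = typeof stats.fullCount === 'number' ? stats.fullCount : returnedCount;
  return { total: total, pages: Math.ceil(total / pageSize) };
}

// Made-up cursor-like object for illustration.
const fakeCursor = { extra: { stats: { fullCount: 42 } } };
const info = pageInfo(fakeCursor, 10, 10);
```

With 42 matching documents and 10 per page, this gives 5 pages.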

How can I descending sort a grouping based on the count of the reduction array in rethinkdb

Importing this dataset as a table:
https://data.cityofnewyork.us/Housing-Development/Registration-Contacts/feu5-w2e2#revert
I use the following query to perform an aggregation and then attempt to sort in descending order based on the reduction field. My intention is to sort based on the count of that field, or to have the aggregation create a second field called count and sort the grouping results in descending order of the reduction array's count or length. How can this be done in RethinkDB?
query:
r.table("contacts").filter({"Type": "Agent","ContactDescription" : "CONDO"}).hasFields("CorporationName").group("CorporationName").ungroup().orderBy(r.desc('reduction'))

I don't quite understand what you're going for, but does this do what you want? If not, what do you want to be different in the output?
r.table("contacts")
.filter({"Type": "Agent","ContactDescription" : "CONDO"})
.hasFields("CorporationName")
.group("CorporationName")
.ungroup()
.merge(function(row){ return {count: row('reduction').count()}; })
.orderBy(r.desc('count'))

You are almost there:
r.table("contacts").filter({"Type": "Agent","ContactDescription" : "CONDO"}).hasFields("CorporationName").group("CorporationName").count().ungroup().orderBy(r.desc('reduction'))
See that .count()? That is a map-reduce operation to get the count of each group.
I haven't tested the query on your dataset. Please comment in case you had problems with it.
EDIT:
If you want to add a count field and preserve the original document, you need to use map and reduce. In your case, it should be something like:
r.table("contacts").filter({"Type": "Agent","ContactDescription" : "CONDO"})
.hasFields("CorporationName")
.group("CorporationName")
.map(r.row.merge({count:1}))
.reduce(function(left, right){
return {
count: left('count').add(right('count')),
<YOUR_OTHER_FIELDS>: left('<YOUR_OTHER_FIELDS>'),
...
};
})
.ungroup().orderBy(r.desc(r.row('reduction')('count')))
EDIT:
I am not sure if this can do the trick, but it is worth a try:
.reduce(function(left, right){
return left.merge({count: left('count').add(right('count'))})
})
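For reference, here is a plain JavaScript sketch of the result that group(...).count().ungroup().orderBy(r.desc('reduction')) is meant to produce: one { group, reduction } entry per distinct value, sorted by count in descending order. The helper name countByDescending and the sample rows are made up, and this runs in memory rather than through the RethinkDB driver.

```javascript
// Hypothetical helper: count rows per distinct value of `field`
// and return { group, reduction } entries, largest count first.
function countByDescending(rows, field) {
  const counts = new Map();
  for (const row of rows) {
    counts.set(row[field], (counts.get(row[field]) || 0) + 1);
  }
  return Array.from(counts, ([group, reduction]) => ({ group, reduction }))
    .sort((a, b) => b.reduction - a.reduction);
}

// Made-up rows for illustration.
const contacts = [
  { CorporationName: 'Acme' },
  { CorporationName: 'Beta' },
  { CorporationName: 'Beta' },
  { CorporationName: 'Beta' },
  { CorporationName: 'Acme' },
  { CorporationName: 'Gamma' },
];
const ranked = countByDescending(contacts, 'CorporationName');
```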

Order by random in RethinkDB

I want to order documents randomly in RethinkDB. The reason for this is that I return n groups of documents and each group must appear in order in the results (so all documents belonging to a group should be placed together); and I need to randomly pick a document, belonging to the first group in the results (you don't know which is the first group in the results - the first ones could be empty, so no documents are retrieved for them).
The solution I found to this is to randomly order each of the groups before concat-ing to the result, then always pick the first document from the results (as it will be random). But I'm having a hard time ordering these groups randomly. Would appreciate any hint or even a better solution if there is one.

If you want to order a selection of documents randomly you can just use .orderBy and return a random number using r.random.
r.db('test').table('table')
.orderBy(function (row) { return r.random(); })
If these documents are in a group and you want to randomize them inside the group, you can just call orderBy after the group statement.
r.db('test').table('table')
.group('property')
.orderBy(function (row) { return r.random(); })
If you want to randomize the order of the groups, you can just call orderBy after calling .ungroup:
r.db('test').table('table')
.group('property')
.ungroup()
.orderBy(function (row) { return r.random(); })

The accepted answer here should not be possible: as John mentioned, the sorting function must be deterministic, which r.random() is not.
The r.sample() function could be used to return a random order of the elements:
If the sequence has less than the requested number of elements (i.e., calling sample(10) on a sequence with only five elements), sample will return the entire sequence in a random order.
So, count the number of elements you have, and set that number as the sample number, and you'll get a randomized response.
Example:
var res = r.db("population").table("europeans")
.filter(function(row) {
return row('age').gt(18)
});
var num = res.count();
res.sample(num)

I'm not getting this to work. I tried to sort a table randomly and I'm getting the following error:
e: Sorting by a non-deterministic function is not supported in:
r.db("db").table("table").orderBy(function(var_33) { return r.random(); })
Also I have read in the rethink documentation that this is not supported. This is from the rethinkdb orderBy documentation:
Sorting functions passed to orderBy must be deterministic. You cannot, for instance, order rows using the random command. Using a non-deterministic function with orderBy will raise a ReqlQueryLogicError.
Any suggestions on how to get this to work?

One simple solution would be to give each document a random number:
r.db('db').table('table')
.merge(doc => ({
random: r.random(1, 10)
}))
.orderBy('random')
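If the result set is small enough to fetch in full, another option is to shuffle on the client after the query returns. This is a different technique from the answers above: a Fisher-Yates shuffle in plain JavaScript, with a hypothetical helper name and made-up documents.

```javascript
// Hypothetical helper: return a shuffled copy of an array (Fisher-Yates).
// The input array is left untouched.
function shuffled(items) {
  const copy = items.slice();
  for (let i = copy.length - 1; i > 0; i--) {
    // Pick a random position in 0..i and swap it into place.
    const j = Math.floor(Math.random() * (i + 1));
    const tmp = copy[i];
    copy[i] = copy[j];
    copy[j] = tmp;
  }
  return copy;
}

// Made-up documents standing in for a fetched query result.
const docs = [{ id: 1 }, { id: 2 }, { id: 3 }, { id: 4 }];
const randomOrder = shuffled(docs);
```

Taking randomOrder[0] then gives a uniformly random document from the fetched set.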
