I am using the geoNear command with Mongoid in order to retrieve a document collection ordered by distance. I need the distance for each document in the collection, which is why I am having to resort to the geoNear command.
Given the following command:
category_ids = ["list", "of", "ids"]
cmd = Hash.new
cmd[:geoNear] = :poi
cmd[:near] = [params[:location][:x], params[:location][:y]]
cmd[:query] = {
"$or" => [
{primary_category_id: {"$in" => category_ids}},
{category_ids: {"$in" => category_ids}}
]
}
cmd[:spherical] = true
cmd[:num] = num
res = Poi.collection.database.command cmd
My problem is that I require the total number of results in the collection. Sure, I could run another query that just counts the number of items that satisfy the query part of the command. However, that would be pretty inefficient and also not very extensible, as every change I make in the command would have to be reflected in the count query. Just adding a maxDistance would land me in a whole heap of trouble.
Another option would be to go with find and calculate the distance manually, but again I would like to avoid that.
So my question is: is there a clever way of getting the total number of documents matched by the command (ignoring the num limit) without having to run a separate query, or having to calculate the distance manually and go with find?
You can use $facet for this in an aggregation pipeline. After the $geoNear stage, add a $facet stage with two sub-pipelines: one projects the documents, and the other uses $group with _id: null and a count accumulator to count the total number of documents.
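The $geoNear followed by $facet approach described above can be sketched as a pipeline built from plain Ruby hashes. This is a minimal sketch, not a definitive implementation: the helper name geo_near_with_total, the facet names results and total, the output field name distance, and the sample coordinates are all assumptions made for illustration. It only builds the pipeline; nothing is sent to a database.

```ruby
# Builds a $geoNear + $facet aggregation pipeline as plain Ruby hashes.
# Hypothetical helper: the name, the facet names ("results", "total") and
# the "distance" field are illustrative choices, not part of any library.
def geo_near_with_total(near:, query:, limit:)
  [
    # $geoNear has to be the first stage of the pipeline.
    { "$geoNear" => {
        "near"          => near,
        "distanceField" => "distance", # each document gets its distance here
        "query"         => query,
        "spherical"     => true
    } },
    # $facet runs both sub-pipelines over the same $geoNear output.
    { "$facet" => {
        # the page of documents actually returned
        "results" => [{ "$limit" => limit }],
        # total number of matching documents, counted before the limit
        "total"   => [{ "$group" => { "_id" => nil, "count" => { "$sum" => 1 } } }]
    } }
  ]
end

category_ids = ["list", "of", "ids"]
query = {
  "$or" => [
    { "primary_category_id" => { "$in" => category_ids } },
    { "category_ids"        => { "$in" => category_ids } }
  ]
}

# Sample coordinates and limit are placeholders.
pipeline = geo_near_with_total(near: [10.0, 20.0], query: query, limit: 20)

# With a live connection, the pipeline would be handed to the driver's
# aggregate call (assumed here, not executed), for example:
#   res = Poi.collection.aggregate(pipeline).first
```

Because the count lives in the same pipeline, adding something like maxDistance to the $geoNear stage changes both the results and the total at once. Two things to check against your server version: $facet requires MongoDB 3.4 or later, and before MongoDB 4.2 the $geoNear stage applied its own default limit of 100 documents, which would cap the total as well. The $facet output is also a single document, so it is subject to the document size limit.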
Is there a way to get a total results count when calling Aggregate function?
Note that I'm not using Aggregate function to aggregate results, but as an advanced search query, because Search function does not allow to sort by multiple fields.
RediSearch returns total documents matched count, but I can't find a way to get this number using NRediSearch library.
With NRediSearch
Using NRediSearch, you need to build and execute an aggregation that runs a GROUPBY 0 with the COUNT reducer. Say you have a person-idx index and you want to count all the Person documents in Redis:
var client = new Client("person-idx", muxer.GetDatabase());
var result = await client.AggregateAsync(new AggregationBuilder().GroupBy(new List<string>(), new List<Reducer>{Reducers.Count()}));
Console.WriteLine(result.GetResults().First().Values.First());
This prints the count you are looking for.
With Redis.OM
There's a newer library, Redis.OM, which you can also use to make these aggregations a bit simpler. The same operation would be done with the following:
var peopleAggregations = provider.AggregationSet<Person>();
Console.WriteLine(peopleAggregations.Count());
db.query(aql `
FOR doc IN ${collection}
FILTER REGEX_TEST(CONCAT(VALUES(doc, true)), ${queryStr}, true)
SORT doc[${sortBy}] ${dir}
LIMIT ${start}, ${count}
RETURN doc._key
`)
.then(cursor => {
cb(cursor._result)
}, err => console.log(err))
I have the above AQL query.
I want to count the total number of filtered results before limiting the results per page (for pagination purposes).
I think the issue is similar to these MySQL questions: "How to count rows before pagination?" and "Find total number of results in mySQL query with offset+limit".
I want to do the same in ArangoDB with AQL,
and part of the solution may be in "How to count number of elements with AQL?"
So, what is the most efficient solution for my requirement with AQL?
You can set the flag fullCount in the options for creating the cursor to true. Then the result will have an extra attribute with the sub-attributes stats and fullCount.
You can then get the fullCount attribute via cursor.extra.stats.fullCount. This attribute contains the number of documents in the result before the last LIMIT in the query was applied. See the HTTP documentation.
In addition, you should use the explain feature to analyse your query. In your case, your query will always make a full collection scan, thus won't scale well.
Update
I added the fullCount flag to your code. Keep in mind that the fullCount attribute only appears if the number of results before LIMIT is higher than the number of results after it.
db.query(aql `
FOR doc IN ${collection}
FILTER REGEX_TEST(CONCAT(VALUES(doc, true)), ${queryStr}, true)
SORT doc[${sortBy}] ${dir}
LIMIT ${start}, ${count}
RETURN {family: doc.family, group: doc.group} `, {count:true, options:{fullCount:true} })
.then(cursor => { console.log(cursor) }, err => console.log(err))
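Once the full count is available, the remaining pagination arithmetic is small. The following is a minimal sketch in Ruby rather than JavaScript; the helper name page_info and its return keys are hypothetical, and full_count stands for whatever value was read from cursor.extra.stats.fullCount. The start and count arguments mirror the offset and page size used in the LIMIT clause.

```ruby
# Hypothetical helper: derives paging information from the full count.
# full_count - number of documents before the last LIMIT was applied
# start      - offset passed to LIMIT
# count      - page size passed to LIMIT
def page_info(full_count, start, count)
  raise ArgumentError, "count must be positive" unless count > 0

  {
    total_pages:  (full_count + count - 1) / count, # integer ceiling division
    current_page: start / count + 1,                # pages are numbered from 1
    has_next:     start + count < full_count
  }
end

# Illustrative numbers only: 95 matches, offset 20, 10 per page.
info = page_info(95, 20, 10)
```

Here info reports 10 pages in total, the current page as 3, and that a further page exists.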
I am beginning with ElasticSearch and really like it; however, I am stuck on quite a simple scenario.
I am indexing such structure of a Worker:
NAME SURENAME ID AGE SEX NAME_SURENAME BIRTH_DATE
NAME_SURENAME - not analyzed - this field is indexed for grouping purposes
NAME, SURENAME - analyzed
The task is simple: search for 5 unique workers sorted by birth_date (unique means the same name and surename, even if they have different ages and are different people).
I read about aggregation queries and, as I understand it, I can get only aggregations without documents. Unfortunately, I aggregate by name and surename, so I won't have other fields in the bucket results, not even the document ID field. But I also read about the TopHits aggregation, which returns documents, and I tried it (the second idea below).
I have two ideas
1) Don't use aggregations: just search for 5 workers, filter duplicates in Java, then search again and filter duplicates in Java until I reach 5 unique results.
2) Use aggregations. I even tried it as shown below, and it works on test data, but since this is my first time, please advise whether it works accidentally or is done correctly. Generally, I thought I could get 5 buckets with one TopHits document each. I have no idea how the TopHits document is chosen, but it seems to work. Below is the code:
String searchString = "test";
BoolQueryBuilder query = boolQuery().minimumNumberShouldMatch(1).should(matchQuery("name", searchString)).should(matchQuery("surename", searchString));
TermsBuilder terms = AggregationBuilders.terms("namesAgg").size(5);
terms.field("name_surename");
terms.order(Terms.Order.aggregation("birthAgg", false)).subAggregation(AggregationBuilders.max("birthAgg")
.field("birth_date"))
.subAggregation(AggregationBuilders.topHits("topHit").setSize(1).addSort("birth_date", SortOrder.DESC));
SearchRequestBuilder searchRequestBuilder = client.prepareSearch("workers")
.addAggregation(terms).setQuery(query).setSize(1).addSort(SortBuilders.fieldSort("birth_date")
.order(SortOrder.DESC));
Terms aggregations = searchRequestBuilder.execute().actionGet().getAggregations().get("namesAgg");
List<Worker> results = new ArrayList<>();
for (Terms.Bucket bucket : aggregations.getBuckets()) {
Optional<Aggregation> first = bucket.getAggregations().asList().stream().filter(aggregation -> aggregation instanceof TopHits).findFirst();
SearchHit searchHitFields = ((TopHits) first.get()).getHits().getHits()[0];
Transformer<SearchHit, Worker> transformer = transformers.get(Worker.class);
Worker transform = transformer.transform(searchHitFields);
results.add(transform);
}
return results;
I am using the Ruby Mongoid gem and trying to create a query to retrieve the last 100 documents from a collection. Rather than using Mongoid, I would like to create the query using the underlying driver (Moped). The Moped documentation only mentions how to retrieve the first 100 records:
session[:my_collection].find.limit(100)
How can I retrieve the last 100?
I have found a solution, but you will need to sort the collection in descending order. If you have an id or date field, you would use the sort method:
.sort({fieldName: 1 or -1})
A value of 1 sorts ascending (oldest to newest); -1 sorts descending (newest to oldest), which reverses the order of the entries in your collection.
session[:my_collection].find().sort({id:-1}) or
session[:my_collection].find().sort({date:-1})
If your collection has the _id field, that identifier has a date embedded in it, so you can use:
session[:my_collection].find().sort({_id:-1})
In accordance with your example using .limit() the complete query will be:
session[:my_collection].find().sort({id:-1}).limit(100);
Technically that query isn't finding the first 100; it's essentially finding 100 arbitrary documents because you haven't specified an order. If you want the first 100, you'd have to explicitly sort them:
session[:my_collection].find.sort(:some_field => 1).limit(100)
and to reverse the order to find the last 100 with respect to :some_field:
session[:my_collection].find.sort(:some_field => -1).limit(100)
# -----------------------------------------------^^
Of course, you have to decide what :some_field is going to be so that "first" and "last" make sense for you.
If you want them sorted by :some_field but want to peel off the last 100, then you could reverse them in Ruby:
session[:my_collection].find
.sort(:some_field => -1)
.limit(100)
.reverse
or you could use count to find out how many there are, then use skip to offset into the results:
total = session[:my_collection].find.count
session[:my_collection].find
.sort(:some_field => 1)
.skip(total - 100)
You'd have to check that total >= 100 and adjust the skip argument if it wasn't, of course. I suspect that the first solution would be faster, but you should benchmark it with your data to see what reality says.
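The adjustment to the skip argument mentioned above can be sketched by clamping the offset at zero. This is a minimal sketch using plain Ruby arrays in place of a real collection; the helper name last_n_skip and the sample data are hypothetical.

```ruby
# Hypothetical helper: how many documents to skip so that only the
# last n remain, never going below zero for small collections.
def last_n_skip(total, n)
  [total - n, 0].max
end

# Stand-in for a collection sorted ascending by :some_field.
docs = (1..250).to_a
skip = last_n_skip(docs.size, 100)
last_100 = docs.sort.drop(skip)

# A collection with fewer than 100 documents: nothing is skipped.
few = (1..30).to_a
few_skip = last_n_skip(few.size, 100)
```

With the driver, the same value would be passed as .skip(last_n_skip(total, 100)) in place of .skip(total - 100).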
I'm trying to filter a wildcard query in neo/lucene using numeric range.
I want to search for all nodes (documents) having key "actor" starting with "rob" and age > 20:
WildcardQuery luceneQuery = new WildcardQuery( new Term("actor", "rob*" ));
QueryContext qx = new QueryContext(luceneQuery)
.numericRange("age", 20, null)
.sortNumeric("age", true);
IndexHits<Node> hits = lucene.query(qx);
Once I add the numeric range, the wildcard query no longer works; the results are only filtered and ordered by the numeric range.
Is it possible to combine both wildcard and numeric?
Thanks,
Daniele
I suspect you want to use a BooleanQuery to combine the WildcardQuery with the numeric range query. (I normally use QueryParser, myself, rather than building the queries by hand.)
For your example query, the QueryParser syntax would look like:
+actor:rob* +age:{20 TO 123}
where +age:{20 TO 123} asks for age > 20 AND age < 123 (the oldest well-documented person lived to 122). The "+" operators force both of those terms to occur in the document.