I have the following document:
{
pmv: {
budgets: [
{
amount: 10
},
{
amount: 20
}
]
}
}
and I need to sum the amount field from every object in budgets. But it's also possible that the budget object doesn't exist so I need to check that.
How could I do this? I've seen many questions but with projections, I just need a integer number which in this case would be 30.
How can I do it?
Thanks.
EDIT 1 FOR PUNIT
This is the code I tried but its giving me and empty aray
AggregationOperation filter = match(Criteria.where("pmv.budgets").exists(true).not().size(0));
AggregationOperation unwind = unwind("pmv.budgets");
AggregationOperation sum = group().sum("budgets").as("amount");
Aggregation aggregation = newAggregation(filter, unwind, sum);
mongoTemplate.aggregate(aggregation,"Iniciativas",String.class);
AggregationResults<String> aggregationa = mongoTemplate.aggregate(aggregation,"Iniciativas",String.class);
List<String> results = aggregationa.getMappedResults();
You can do this with aggregation pipeline
db.COLLECTION_NAME.aggregate([
{"pmv.budgets":{$exists:true,$not:{$size:0}}},
{$unwind:"$pmv.budgets"},
{amount:{$sum:"$pmv.budgets"}}
]);
This pipeline contains three queries:
get document having non-null and non-empty budgets
$unwind basically open the array and create one document for each array element. e.g. if one document of budgets has 3 elements then it will create 3 document and fill budgets property from each of the array element. You can read more about it here
sum all the budgets property using $sum operator
You can read more about aggregation pipeline here
EDIT: as per comments, adding code for java as well.
AggregationOperation filter = match(Criteria.where("pmv.budgets").exists(true).not().size(0));
AggregationOperation unwind = unwind("pmv.budgets");
AggregationOperation sum = group().sum("pmv.budgets").as("amount");
Aggregation aggregation = newAggregation(filter, unwind, sum);
mongoTemplate.aggregate(aggregation,COLLECTION_NAME,Output.class);
You can do this in more inline way as well but I wrote it like this so that it will be easy to understand.
I hope this answer your question.
Related
I'm currently trying to figure out how to get a count of unique records to display using DJ.js and D3.js
The data set looks like this:
id,name,artists,genre,danceability,energy,key,loudness,mode,speechiness,acousticness,instrumentalness,liveness,valence,tempo,duration_ms,time_signature
6DCZcSspjsKoFjzjrWoCd,God's Plan,Drake,Hip-Hop/Rap,0.754,0.449,7,-9.211,1,0.109,0.0332,8.29E-05,0.552,0.357,77.169,198973,4
3ee8Jmje8o58CHK66QrVC,SAD!,XXXTENTACION,Hip-Hop/Rap,0.74,0.613,8,-4.88,1,0.145,0.258,0.00372,0.123,0.473,75.023,166606,4
There are 100 records in the data set, and I would expect the count to display 70 for the count of unique artists.
var ndx = crossfilter(spotifyData);
totalArtists(ndx);
....
function totalArtists(ndx) {
// Select the artists
var totalArtistsND = dc.numberDisplay("#unique-artists");
// Count them
var dim = ndx.dimension(dc.pluck("artists"));
var uniqueArtist = dim.groupAll();
totalArtistsND.group(uniqueArtist).valueAccessor(x => x);
totalArtistsND.render();
}
I am only getting 100 as a result when I should be getting 70.
Thanks a million, any help would be appreciated
You are on the right track - a groupAll object is usually the right kind of object to use with dc.numberDisplay.
However, dimension.groupAll doesn't use the dimension's key function. Like any groupAll, it looks at all the records and returns one value; the only difference between dimension.groupAll() and crossfilter.groupAll() is that the former does not observe the dimension's filters while the latter observes all filters.
If you were going to use dimension.groupAll, you'd have to write reduce functions that watch the rows as they are added and removed, and keeps a count of how many unique artists it has seen. Sounds kind of tedious and possibly buggy.
Instead, we can write a "fake groupAll", an object whose .value() method returns a value dynamically computed according to the current filters.
The ordinary group object already has a unique count: the number of bins. So we can create a fake groupAll which wraps an ordinary group and returns the length of the array returned by group.all():
function unique_count_groupall(group) {
return {
value: function() {
return group.all().filter(kv => kv.value).length;
}
};
}
Note that we also have to filter out any bins of value zero before counting.
Use the fake groupAll like this:
var uniqueArtist = unique_count_groupall(dim.group());
Demo fiddle.
I just added this to the FAQ.
When benchmarking Apache Lucene v7.5 I noticed a strange behavior:
I indexed the English Wikipedia dump (5,677,776 docs) using Lucene with the SimpleAnalyzer (No stopwords, no stemming)
Then I searched the index with the following queries:
the totalHits=5,382,873
who totalHits=1,687,254
the who totalHits=5,411,305
"the who" totalHits=8,827
The result number for the Boolean query the who is both larger than the result number for the single term the and the result number for the single term who, when it should be smaller than both.
Is there an explanation for that?
Code snippet:
analyzer = new SimpleAnalyzer();
MultiFieldQueryParser parser = new MultiFieldQueryParser(new String[]{"title", "content","domain","url"},analyzer);
// Parse
Query q = parser.parse(querystr);
// top-10 results
int hitsPerPage = 10;
IndexReader indexReader = DirectoryReader.open(index);
IndexSearcher searcher = new IndexSearcher(indexReader);
// Ranker
TopScoreDocCollector collector = TopScoreDocCollector.create(hitsPerPage);
// Search
searcher.search(q, collector);
// Retrieve the top-10 documents
TopDocs topDocs=collector.topDocs();
ScoreDoc[] hits = topDocs.scoreDocs;
totalHits=topDocs.totalHits;
System.out.println("query: "+querystr + " " + hits.length+" "+String.format("%,d",totalHits));
The explanation is that the default operator is OR and not AND as you assume. Searching for the who returns documents that have either the or who or both.
the - 5,382,873
who - 1,687,254
the OR who - 5,411,305
I.e. most documents that contain who also contains the, except for 28 432 documents which are added to the result set when you retrieve both.
You can change this behavior by changing the default operator:
parser.setDefaultOperator(QueryParserBase.AND_OPERATOR)
I am beginning with ElasticSearch and really like it, hovewer I am stuck with quite simple scenario.
I am indexing such structure of a Worker:
NAME SURENAME ID AGE SEX NAME_SURENAME BIRTH_DATE
NAME_SURENAME - not analyzed - this field is indexed for grouping purposes
NAME, SURENAME - analyzed
The task is simple - search 5 unique workers sorted by birth_date (unique means the same name and surename, even if they are in different age and are different people)
I read about aggregation queries and as I understand, I can get only aggregations without documents. Unfortunatelly I aggregate by name and surename so I won't have other fields in results in buckets, like for example document ID field at least. But I also read about TopHit aggregation, that it returns document, and i tried it - the second idea below.
I have two ideas
1) Not use aggregations, just search 5 workers, filter duplicates in java and again search workers and filter duplicates in Java till I reach 5 unique results
2) Use aggregations. I event tried it like below, it even works on test data but since it is my first time, please advice, whether it works accidentially or it is done correctly? So generally I thought I could get 5 buckets with one TopHit document. I have no idea how TopHit document is chosen but it seems to work. Below is the code
String searchString = "test";
BoolQueryBuilder query = boolQuery().minimumNumberShouldMatch(1).should(matchQuery("name", searchString).should(matchQuery("surename", searchString));
TermsBuilder terms = AggregationBuilders.terms("namesAgg").size(5);
terms.field("name_surename");
terms.order(Terms.Order.aggregation("birthAgg", false)).subAggregation(AggregationBuilders.max("birthAgg")
.field("birth_date")
.subAggregation(AggregationBuilders.topHits("topHit").setSize(1).addSort("birth_date", SortOrder.DESC));
SearchRequestBuilder searchRequestBuilder = client.prepareSearch("workers")
.addAggregation(terms).setQuery(query).setSize(1).addSort(SortBuilders.fieldSort("birth_date")
.order(SortOrder.DESC));
Terms aggregations = searchRequestBuilder.execute().actionGet().getAggregations().get("namesAgg");
List<Worker> results = new ArrayList<>();
for (Terms.Bucket bucket : aggregations.getBuckets()) {
Optional<Aggregation> first = bucket.getAggregations().asList().stream().filter(aggregation -> aggregation instanceof TopHits).findFirst();
SearchHit searchHitFields = ((TopHits) first.get()).getHits().getHits()[0];
Transformer<SearchHit, Worker> transformer = transformers.get(Worker.class);
Worker transform = transformer.transform(searchHitFields);
results.add(transform);
}
return results;//
I am using the geoNear commang with mongoid in order to retrive a document collection ordered by distance. I need the distance for each document in the collection which is why I am having to resort to the geoNear command.
Given the following command:
category_ids = ["list", "of", "ids"]
cmd = Hash.new
cmd[:geoNear] = :poi
cmd[:near] = [params[:location][:x], params[:location][:y]]
cmd[:query] = {
"$or" => [
{primary_category_id: {"$in" => category_ids}},
{category_ids: {"$in" => category_ids}}
]
}
cmd[:spherical] = true
cmd[:num] = num
res = Poi.collection.database.command cmd
My problem is that I require the total number of results in the collection. Sure I could just run another query that just counts the number of items that satisfy the query part of the command, however that would be pretty inefficient and also not very extendible as every change I make in the command would have to be reflected in the count query. Just adding a maxDistance would land me in a whole heap of trouble.
Another option would be to go with find and calculate the distance manually but again I would like to avoid that.
So my question is there a clever way of getting the number of documents returned by the command (minus the num) without having to run a separate query or having to calculate the distance manually and go with find.
You can use facet for the same after geoNear use facet one will project the documents and in other you can use group by _id null and use the count in group to count the total number of documents.
I'm trying to filter a wildcard query in neo/lucene using numeric range.
I want to search for all nodes (documents) having key "actor" starting with "rob" and age > 20:
WildcardQuery luceneQuery = new WildcardQuery( new Term("actor", "rob*" ));
QueryContext qx = new QueryContext(luceneQuery)
.numericRange("age", 20, null)
.sortNumeric("age", true);
IndexHits<Node> hits = lucene.query(qx);
Once I add numeric range the wildCard query does not works, it only orders by numeric range.
Is it possible to combine both wildcard and numeric?
Thanks,
Daniele
I suspect you want to use a BooleanQuery to combine the WildcardQuery with the numeric range query. (I normally use QueryParser, myself, rather than building the queries by hand.)
For your example query, the QueryParser syntax would look like:
+actor:rob* +age:{20 TO 123}
where +age:{20 TO 123} asks for age > 20 AND age < 123 (the oldest well-documented person lived to 122). The "+" operators force both of those terms to occur in the document.