ElasticSearch aggregation always returns only 10 buckets - elasticsearch

I am using NEST. The number of buckets returned from an ElasticSearch aggregation is always 10 (the default value), even though the size is set to 10000.

You need to set the size inside the Terms aggregation, not outside of it. Try this:
.Aggregations(a => a
    .Terms("category_agg", st => st
        .Field(o => o.categories.Select(x => x.id))
        .Size(10000)
    )
)
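To see where the two sizes end up, it can help to look at the request body itself. The following is a minimal sketch in Python, used here only to build and print the JSON (it is not NEST code). It assumes the field resolves to `categories.id`, which depends on your mapping and on how NEST infers the field name. The top-level `size` only limits the number of hits, while the `size` inside `terms` controls the number of buckets.

```python
import json


def build_search_body(bucket_count):
    """Build a search body whose terms aggregation returns up to bucket_count buckets."""
    return {
        # Top-level size limits the hits, not the buckets; 0 skips the hits entirely.
        "size": 0,
        "aggs": {
            "category_agg": {
                "terms": {
                    "field": "categories.id",  # assumed field name
                    "size": bucket_count,  # this is the setting that lifts the default of 10
                }
            }
        },
    }


body = build_search_body(10000)
print(json.dumps(body, indent=2))
```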

Related

Running DateRange with null values on ElasticSearch?

I am writing some queries in my ElasticSearch project that can be filtered by Date. I have written them like this:
var searchResponse = client.Search<mdl.Event>(s => s
    .Query(q => q
        .QueryString(qs => qs
            .Query(search.Space))
        && q
        .DateRange(r => r
            .Field(f => f.CreatedTimeStamp)
            .GreaterThanOrEquals(search.From)
            .LessThanOrEquals(search.To))));
However, search.From and search.To are optional inputs, so they might turn out to be null. In the event that they are null, does this break the query? Or will it continue as if the DateRange part of the query is not included?
NEST queries are conditionless. If an input is determined to be null or an empty string, the query is omitted from the request.
So you don't need to check whether each filter property is null; NEST performs this filtering by default.
In your case, if search.From and search.To are both null, the range check is removed from the final query.
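The effect on the generated request can be sketched as follows. This is plain Python that imitates the conditionless behaviour described above; it is not NEST's implementation. The field name `createdTimeStamp` and the sample values are assumptions for illustration.

```python
def build_query(text, date_from=None, date_to=None):
    """Return a bool query, leaving out any clause whose inputs are all empty."""
    must = []
    if text:  # skip the query_string clause for None or ""
        must.append({"query_string": {"query": text}})

    bounds = {}
    if date_from is not None:
        bounds["gte"] = date_from
    if date_to is not None:
        bounds["lte"] = date_to
    if bounds:  # skip the range clause only when both bounds are missing
        must.append({"range": {"createdTimeStamp": bounds}})  # assumed field name

    return {"bool": {"must": must}}


# Illustrative inputs only.
with_dates = build_query("error", "2016-01-01", "2016-01-31")
without_dates = build_query("error")
print(with_dates)
print(without_dates)
```

With both dates missing, only the query_string clause remains; with a single date, the range clause is kept with just that bound.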

Is it possible to query aggregations on NEST for multiple term fields (.NET)?

Below is the NEST query and aggregations:
Func<QueryContainerDescriptor<ConferenceWrapper>, QueryContainer> query =
    q =>
        q.Term(p => p.type, "conference")
        // && q.Term(p => p.conference.isWaitingAreaCall, true)
        && q.Range(d => d.Field("conference.lengthSeconds").GreaterThanOrEquals(minSeconds))
        && q.DateRange(qd => qd.Field("conference.firstCallerStart").GreaterThanOrEquals(from.ToUniversalTime().ToString("yyyy-MM-ddTHH:mm:ss.fffZ")))
        && q.DateRange(qd => qd.Field("conference.firstCallerStart").LessThanOrEquals(to.ToUniversalTime().ToString("yyyy-MM-ddTHH:mm:ss.fffZ")));
Func<AggregationContainerDescriptor<ConferenceWrapper>, IAggregationContainer> waitingArea =
    a => a
        .Terms("both", t => t
            .Field(p => p.conference.orgNetworkId) // seems ignore this field
            .Field(p => p.conference.isWaitingAreaCall)
            // .Field(new Field( p => p.conference.orgNetworkId + "-ggg-" + p.conference.networkId)
            .Size(300)
            .Aggregations(a2 => a2.Sum("sum-length", d2 => d2.Field("conference.lengthSeconds"))));
I have called .Field(p => p.conference.orgNetworkId) followed by .Field(p => p.conference.isWaitingAreaCall), but it seems the NEST client ignores the first field expression.
Is it possible to group the terms by multiple fields?
Elasticsearch doesn't support a terms aggregation on multiple fields directly; the calls to .Field(...) within NEST are assignative rather than additive, so the last call will overwrite any previously set values.
In order to aggregate on multiple fields, you can either
1) create a composite field at index time that incorporates the values that you wish to aggregate on, or
2) use a Script to generate the terms on which to aggregate at query time, by combining the two field values.
The performance of the first option will be better than the second.
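The first option can be sketched as follows. This is a language-neutral illustration in Python rather than NEST code: the field name `orgWaitingKey`, the `|` separator, and the sample documents are all hypothetical. Each document gets the combined key before it is indexed, and a single terms aggregation on that key (with the sum sub-aggregation) then groups by both values; the last function emulates that grouping in memory.

```python
from collections import defaultdict


def composite_key(doc):
    """Combine the two values into one term; the '|' separator is an arbitrary choice."""
    return "{}|{}".format(doc["orgNetworkId"], str(doc["isWaitingAreaCall"]).lower())


def prepare_for_indexing(doc):
    """Return a copy of doc with the hypothetical orgWaitingKey field added."""
    enriched = dict(doc)
    enriched["orgWaitingKey"] = composite_key(doc)
    return enriched


def sum_length_by_key(docs):
    """Emulate, in memory, a terms aggregation on orgWaitingKey with a sum sub-aggregation."""
    totals = defaultdict(int)
    for doc in docs:
        totals[doc["orgWaitingKey"]] += doc["lengthSeconds"]
    return dict(totals)


# Made-up sample documents, for illustration only.
docs = [
    {"orgNetworkId": 7, "isWaitingAreaCall": True, "lengthSeconds": 120},
    {"orgNetworkId": 7, "isWaitingAreaCall": True, "lengthSeconds": 30},
    {"orgNetworkId": 7, "isWaitingAreaCall": False, "lengthSeconds": 45},
]
indexed = [prepare_for_indexing(d) for d in docs]
totals = sum_length_by_key(indexed)
print(totals)
```

Pick a separator that cannot occur in either value, so that the key can be split back into its parts unambiguously.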

ElasticSearch .NET Sub Aggregation

var response = client.Search<Timeline>(
    x => x.Query(
            q => q.Bool(
                b => b.Must(queryContainer)))
        .Size(0)
        .Aggregations(a => a
            .DateRange("last_24_hours",
                f => f.Field(n => n.server_time)
                    .Ranges(z => z.From(DateMath.Now.Subtract("24h")).To(DateMath.Now))
                    .Aggregations(
                        agg => agg.DateHistogram("widget_clicked_by_hour",
                            p => p.Field(z => z.server_time)
                                .Interval(DateInterval.Hour)
                                .Format("yyyy-MM-dd hh:mm")
                                .OrderDescending("_key"))))
        )
);
I'm trying to get the items from the widget_clicked_by_hour aggregation, but the NEST .NET library doesn't seem to give me access to the items list, although I can see the list while debugging.
To get the date histogram buckets for each date range bucket, use:
var dateRange = response.Aggs.DateRange("last_24_hours");
foreach (var rangeBucket in dateRange.Buckets)
{
    var dateHistogram = rangeBucket.DateHistogram("widget_clicked_by_hour");
    foreach (var histogramBucket in dateHistogram.Buckets)
    {
        // do something with bucket
    }
}
Since the date histogram aggregation is a sub-aggregation of the date range aggregation, it can be accessed from each bucket in the date range aggregation.
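The same nesting is visible in the raw JSON response, which may explain why the items show up in the debugger. The sketch below is plain Python over a hand-written, abbreviated response with made-up values (not output from a real cluster); it only shows that each parent bucket carries the sub-aggregation under its own name.

```python
def histogram_buckets(response, range_name, histogram_name):
    """Return (range bucket key, histogram bucket) pairs from a raw search response."""
    pairs = []
    for range_bucket in response["aggregations"][range_name]["buckets"]:
        # The sub-aggregation is nested inside each parent bucket under its own name.
        for hist_bucket in range_bucket[histogram_name]["buckets"]:
            pairs.append((range_bucket["key"], hist_bucket))
    return pairs


# Hand-written, abbreviated response with made-up values, only to show the nesting.
sample_response = {
    "aggregations": {
        "last_24_hours": {
            "buckets": [
                {
                    "key": "example-range",
                    "doc_count": 5,
                    "widget_clicked_by_hour": {
                        "buckets": [
                            {"key_as_string": "2016-01-01 10:00", "doc_count": 3},
                            {"key_as_string": "2016-01-01 09:00", "doc_count": 2},
                        ]
                    },
                }
            ]
        }
    }
}

pairs = histogram_buckets(sample_response, "last_24_hours", "widget_clicked_by_hour")
for range_key, bucket in pairs:
    print(range_key, bucket["key_as_string"], bucket["doc_count"])
```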
I would suggest 2 things that helped me immensely.
1) I would install the Sense plugin for Chrome:
https://chrome.google.com/webstore/detail/sense-beta/lhjgkmllcaadmopgmanpapmpjgmfcfig?hl=en
This gives you a very user-friendly way to build and analyze your Elasticsearch queries right in the browser.
2) I would look into using the cardinality aggregation:
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-metrics-cardinality-aggregation.html
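Note that it returns an approximate count of distinct values, not the values themselves. If you are trying to get a list of items and their counts (which you can use or ignore), a terms aggregation is what returns one.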

Elasticsearch - Nest - 'Range' query on 'string'

Can you still do a range query on a string using NEST in the latest 2.0 alpha release? Or has this been dropped in Elasticsearch?
The documentation suggests it is still in Elasticsearch itself; however, Range seems to only accept 'double'.
E.g.
...
(sh => sh.Range(ra => ra.Field(of =>
of.Name).LessThanOrEquals(
!string.IsNullOrEmpty(textInputName)
? textInputName.ToString(): null
))
...
This used to work in NEST 1.7, but now it says the input for LessThanOrEquals must be a double.
How do I now get everything where 'name' is between, for example, 'a' and 'f'?
Edit:
I think it was removed here in file src/Nest/QueryDsl/TermLevel/Range/RangeQuery.cs... just can not find 'why'.... :S
Range queries on string fields are now available in the alpha2 release on NuGet, via TermRange:
(sh => sh
.TermRange(ra => ra
.Field(of => of.Name)
.LessThanOrEquals(!string.IsNullOrEmpty(textInputName)
? textInputName.ToString()
: null)
)
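For the "between 'a' and 'f'" case, both bounds go into the same range query. A minimal sketch of the request body, built in Python only to show the JSON (it is not NEST code; the lowercase field name `name` is an assumption about the mapping):

```python
import json


def string_range_query(field, lower=None, upper=None):
    """Build a range query on a string field; bounds that are None are left out."""
    bounds = {}
    if lower is not None:
        bounds["gte"] = lower
    if upper is not None:
        bounds["lte"] = upper
    return {"range": {field: bounds}}


query = string_range_query("name", lower="a", upper="f")
print(json.dumps(query))
```

String ranges compare terms lexicographically, so a term such as "fred" sorts after "f" and falls outside this range; use an exclusive upper bound of "g" if names starting with "f" should be included.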

How to increase the speed of this MongoDB query?

MongoDB 2.0.7 & PHP 5
I'm trying to count the length of each array. Every document has one array. I want to get the number of elements in each array and the ID of the document. There are no indexes except the one on _id.
Here's my code:
$map = new MongoCode("function() {
emit(this._id,{
'_id':this._id,'cd':this.cd,'msgCount':this.cs[0].msgs.length}
);
}");
$reduce = new MongoCode("function(k, vals) {
return vals[0];
}");
$cmmd = smongo::$db->command(array(
"mapreduce" => "sessions",
"map" => $map,
"reduce" => $reduce,
"out" => "result"));
These are the timings. As you can see, the query is very slow:
Array
(
[result] => result
[timeMillis] => 29452
[counts] => Array
(
[input] => 106026
[emit] => 106026
[reduce] => 0
[output] => 106026
)
[ok] => 1
)
How can I reduce the timings?
If you are going to frequently need the counts for your arrays, a better approach would be to include a count field in your actual documents. Otherwise you are going to be scanning all documents to do the count (as per your Map/Reduce example).
You can use an Atomic Operation such as $inc to increment/decrement this count at the same time as you are updating the arrays.
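As a rough sketch of that idea, the update document below appends a message and bumps the stored count in a single update, so the two cannot drift apart. It is plain Python that only builds the document; the top-level field name `msgCount` and the sample message are assumptions, and the array path `cs.0.msgs` is taken from the `this.cs[0].msgs` expression in the question. You would pass the result to your driver's update call together with a filter on `_id`.

```python
def push_message_update(message):
    """Build an update that appends a message and increments the stored count together."""
    return {
        "$push": {"cs.0.msgs": message},  # append to the first session's message array
        "$inc": {"msgCount": 1},  # assumed counter field kept on the document
    }


# Made-up message, for illustration only.
update = push_message_update({"text": "hello"})
print(update)
```

Reading the counts then becomes a plain query for `_id` and `msgCount`, with no Map/Reduce pass over every document.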
