elasticsearch aggregation PHP - elasticsearch

I am trying to get the unique values of the name field from my Elasticsearch index.
I am using a terms aggregation like this:
$paramss = [
'index' => 'myIndex',
'type' => 'myType',
'ignore_unavailable' => true,
'ignore' => [404, 500]
];
$paramss['body'] = <<<JSON
{
"size": 0,
"aggs" : {
"langs" : {
"terms" : { "field" : "name" }
}
}}
JSON;
$results = $client->search($paramss);
print_r(json_encode($results));
I get a result like this:
{
took: 3,
timed_out: false,
_shards: {
total: 5,
successful: 5,
failed: 0
},
hits: {
total: 1852,
max_score: 0,
hits: [
]
},
aggregations: {
langs: {
buckets: [
{
key: "aaaa.se",
doc_count: 430
},
{
key: "bbbb.se",
doc_count: 358
},
{
key: "cccc.se",
doc_count: 49
},
{
key: "eeee.com",
doc_count: 46
}
]
}
}
}
The problem is that I am not getting all the unique values. I only get 10 buckets, which is the default size for a terms aggregation.
How can I change that size?
I tried this:
$paramss = [
'index' => 'myIndex',
'type' => 'myType',
'size' => 1000,
'ignore_unavailable' => true,
'ignore' => [404, 500]
];
This returns some unexpected documents instead.
Does anyone know how to solve this? How can I get all the unique names from my Elasticsearch index?

You are doing everything right, except for the placement of size.
"size": 0 should go inside the terms aggregation, after the targeted field's name.
$client = new Elasticsearch\Client($params);
$query['body'] = '{
"aggs" : {
"all_sources" : {
"terms" : {
"field" : "source",
"order" : { "_term" : "asc" },
"size": 0
}
}
}
}';

You need to put the size parameter inside terms:
{
"aggs" : {
"langs" : {
"terms" : {
"field" : "name",
"size": 0
}
}
}}
Link to documentation where you can find more info:
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-terms-aggregation.html
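As a rough sketch of the same idea in Python (used here only for illustration), the body below keeps the top-level "size": 0 to suppress hits and sets the bucket limit inside terms. The helper names and the limit of 1000 are assumptions, not part of the answers above. Note also that "size": 0 inside terms was only accepted by older Elasticsearch releases (it was removed in 5.0), so an explicit upper bound is used instead.

```python
def unique_terms_body(field, max_buckets=1000):
    """Build a search body that returns up to max_buckets distinct values.

    max_buckets is an assumed upper bound; pick one that fits your data.
    """
    return {
        "size": 0,  # top-level size: return no hits, only aggregations
        "aggs": {
            "langs": {
                # size inside terms controls how many buckets come back
                "terms": {"field": field, "size": max_buckets}
            }
        },
    }


def bucket_keys(response, agg_name="langs"):
    """Extract the distinct values from a terms aggregation response."""
    return [b["key"] for b in response["aggregations"][agg_name]["buckets"]]


# Response shaped like the one in the question (truncated to two buckets).
sample = {
    "aggregations": {
        "langs": {
            "buckets": [
                {"key": "aaaa.se", "doc_count": 430},
                {"key": "bbbb.se", "doc_count": 358},
            ]
        }
    }
}
print(unique_terms_body("name"))
print(bucket_keys(sample))
```

The dictionary returned by unique_terms_body can be passed as the request body to whichever client you use; the top-level size and the terms size are independent settings.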

Related

how to convert elastic query aggregate filter into nest query in .Net core

How can I convert this Elasticsearch query into a NEST query?
The query is given below.
GET winlogbeat-6.6.0*/_search?size=0
{
"query": {
"match_all": {}
},
"aggs": {
"success ": {
"filter": {
"term": {
"event_id": 4624
}
}
},
"failed": {
"filter": {
"term": {
"event_id": 4625
}
}
}
}
}
The desired output in Kibana is as follows:
{
"took" : 13120,
"timed_out" : false,
"_shards" : {
"total" : 37,
"successful" : 37,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 299924794,
"max_score" : 0.0,
"hits" : [ ]
},
"aggregations" : {
"failed" : {
"doc_count" : 351643
},
"success " : {
"doc_count" : 40375274
}
}
}
This is my query, and I need to convert it to NEST to get the desired result.
Thanks
You are almost there; you just need to add another case by calling .Filter(..) on the aggregations descriptor:
var searchResponse = await client.SearchAsync<Document>(s => s
.Size(0) // aggregations only, matching ?size=0 in the original request
.Query(q => q.MatchAll())
.Aggregations(a => a
.Filter("success", success => success
.Filter(filter => filter
.Term(t => t.Field(f => f.EventId).Value(4624))))
.Filter("failed", failed => failed
.Filter(filter => filter
.Term(t => t.Field(f => f.EventId).Value(4625))))));
public class Document
{
public int Id { get; set; }
public int EventId { get; set; }
}
Hope that helps.
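For comparison, here is a small Python sketch of the request body that the NEST call is expected to serialize to, plus a helper that reads the counts back. The function names are hypothetical, and the aggregation name "success" is written without the trailing space that appears in the original query.

```python
def filter_count_aggs(field, named_values):
    """Build one filter aggregation per (name, value) pair on the same field."""
    return {
        name: {"filter": {"term": {field: value}}}
        for name, value in named_values.items()
    }


def doc_counts(response):
    """Map each aggregation name to its doc_count."""
    return {
        name: agg["doc_count"]
        for name, agg in response["aggregations"].items()
    }


body = {
    "size": 0,  # no hits, only aggregation results
    "query": {"match_all": {}},
    "aggs": filter_count_aggs("event_id", {"success": 4624, "failed": 4625}),
}

# Aggregation part of the response shown in the question.
sample = {
    "aggregations": {
        "failed": {"doc_count": 351643},
        "success": {"doc_count": 40375274},
    }
}
print(body)
print(doc_counts(sample))
```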

Let document match multiple buckets of date histogram

I have an index that has a mapping which is similar to
{
"id": {
"type": "long"
},
"start": {
"type": "date"
},
"end": {
"type": "date"
}
}
I want to create a date histogram in which each document falls into every bucket whose interval lies between "start" and "end".
For example, suppose one document has "start" = 12/01/2018 and "end" = 04/25/2019, the date histogram interval is one week, and the range is now-1y until now. I want the document to fall into every bucket from the week of 12/01/2018 through the week of 04/25/2019. With just this one document, the result should be 52 buckets, where the buckets from April to December have doc_count 0 and the buckets from December to April have doc_count 1.
As far as I can see, date_histogram only lets me assign a document to exactly one bucket based on a single field, either "start" or "end".
What I have tried so far:
Dynamically generating a query with 52 filters, each of which checks whether a document falls into that "bucket"
Using Painless scripts in each query
Both solutions were extremely slow. I am working with around 200k documents, and such queries took around 10 seconds.
EDIT: Here is a sample query that is generated dynamically. As can be seen, one filter is created per week. This query takes about 10 seconds, which is way too long.
%{
aggs: %{
count_chart: %{
aggs: %{
last_seen_over_time: %{
filters: %{
filters: %{
"2018-09-24T00:00:00Z" => %{
bool: %{
must: [
%{range: %{start: %{lte: "2018-09-24T00:00:00Z"}}},
%{range: %{end: %{gte: "2018-09-17T00:00:00Z"}}}
]
}
},
"2018-12-24T00:00:00Z" => %{
bool: %{
must: [
%{range: %{start: %{lte: "2018-12-24T00:00:00Z"}}},
%{range: %{end: %{gte: "2018-12-17T00:00:00Z"}}}
]
}
},
"2019-04-01T00:00:00Z" => %{
bool: %{
must: [
%{range: %{start: %{lte: "2019-04-01T00:00:00Z"}}},
%{range: %{end: %{gte: "2019-03-25T00:00:00Z"}}}
]
}
}, ...
}
}
}
},
size: 0
}
And a sample response:
%{
"_shards" => %{"failed" => 0, "skipped" => 0, "successful" => 5, "total" => 5},
"aggregations" => %{
"count_chart" => %{
"doc_count" => 944542,
"last_seen_over_time" => %{
"buckets" => %{
"2018-09-24T00:00:00Z" => %{"doc_count" => 52212},
"2018-12-24T00:00:00Z" => %{"doc_count" => 138509},
"2019-04-01T00:00:00Z" => %{"doc_count" => 119634},
...
}
}
}
},
"hits" => %{"hits" => [], "max_score" => 0.0, "total" => 14161812},
"timed_out" => false,
"took" => 2505
}
I hope this question is understandable. If not, I will explain it in more detail.
How about running two date_histogram queries and calculating the difference per week?
I'm assuming you just need the overall counts, given size: 0 in your query.
let start = await client.search({
index: 'dates',
size: 0,
body: {
"aggs" : {
"start": {
"date_histogram": {
"field": "start",
"interval": "week"
},
}
}
}
});
let end = await client.search({
index: 'dates',
size: 0,
body: {
"aggs" : {
"end": {
"date_histogram": {
"field": "end",
"interval": "week"
},
}
}
}
});
// Running totals: documents started so far minus documents that ended in earlier weeks.
let buckets = {};
let start_buckets = start.aggregations.start.buckets;
let end_buckets = end.aggregations.end.buckets;
let started = 0;
let ended = 0;
// Assumes both histograms return the same weeks in the same order.
for (let i = 0; i < start_buckets.length; i++) {
started += start_buckets[i].doc_count;
buckets[start_buckets[i].key_as_string] = started - ended;
ended += end_buckets[i].doc_count;
}
This test took less than 2 seconds on my local machine, at a scale similar to yours.
You can run both aggregations simultaneously to save more time.
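The running-difference step can also be written so that it does not depend on both histograms containing exactly the same weeks. The Python sketch below is one way to do that, under the assumption that each bucket carries a sortable "key" (the epoch-millisecond key that date_histogram returns) and a "doc_count"; the function name is hypothetical.

```python
def active_per_week(start_buckets, end_buckets):
    """Count documents whose [start, end] range covers each week.

    A document is counted in a week if it started in that week or earlier
    and ended in that week or later.
    """
    started_by_key = {b["key"]: b["doc_count"] for b in start_buckets}
    ended_by_key = {b["key"]: b["doc_count"] for b in end_buckets}

    active = {}
    started = 0  # documents started up to and including the current week
    ended = 0    # documents that ended strictly before the current week
    # Walk the union of weeks so a week missing from one histogram is fine.
    for key in sorted(set(started_by_key) | set(ended_by_key)):
        started += started_by_key.get(key, 0)
        active[key] = started - ended
        ended += ended_by_key.get(key, 0)
    return active


# Three documents: A covers weeks 1-2, B covers weeks 1-3, C covers weeks 2-3.
starts = [{"key": 1, "doc_count": 2}, {"key": 2, "doc_count": 1}]
ends = [{"key": 2, "doc_count": 1}, {"key": 3, "doc_count": 2}]
print(active_per_week(starts, ends))
```

Aligning by key rather than by array index matters when one histogram has a week that the other does not, for example the first week, in which documents start but none end.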

Elastic Search: Query string and number not always returning wanted result

We have an Elasticsearch 5.5 setup. We use NEST to perform our queries from C#.
When executing the following query:
{
"query": {
"bool": {
"must": [
{
"query_string": {
"query": "00917751"
}
}
]
}
}
}
We get the desired result: one result with that number as its identifier.
When executing the query:
{
"query": {
"bool": {
"must": [
{
"query_string": {
"query": "917751"
}
}
]
}
}
}
We get no results.
The value we are searching for is in the field searchIndentifier, and has the value "1-00917751".
We have a custom analyzer called "final"
.Custom("final", cu => cu
.Tokenizer("keyword").Filters(new List<string> { "lowercase" }))
The field searchIndentifier has no custom analyzer set on it. I tried adding the whitespace tokenizer in it but that made no difference.
Another field called "searchObjectNo" does work, when I try to search for the value "S328-25" with the query "S328". These fields are exactly the same.
Any ideas here?
Another related question:
When executing the query
{
"query": {
"bool": {
"must": [
{
"query_string": {
"query": "1-00917751"
}
}
]
}
}
}
we get a lot of results. I would like this to return only 1 result. How would we accomplish this?
Thank you
Schoof
Settings and mapping: https://jsonblob.com/9dbf33f6-cd3e-11e8-8f17-c9de91b6f9d1
The searchIndentifier field is mapped as a text datatype, which will undergo analysis and use the Standard Analyzer by default. Using the Analyze API, you can see what terms will be stored in the inverted index for 1-00917751
var client = new ElasticClient();
var analyzeResponse = client.Analyze(a => a
.Text("1-00917751")
);
which returns
{
"tokens" : [
{
"token" : "1",
"start_offset" : 0,
"end_offset" : 1,
"type" : "<NUM>",
"position" : 0
},
{
"token" : "00917751",
"start_offset" : 2,
"end_offset" : 10,
"type" : "<NUM>",
"position" : 1
}
]
}
You'll get a match for the query_string query with a query input of 00917751 as this matches one of the terms stored in the inverted index as a result of analysis at index time for the input 1-00917751.
You won't get a match for 917751 as there is not a term in the inverted index that will match. You could define an analysis chain that removes leading zeroes from numbers as well as preserving the original token e.g.
private static void Main()
{
var defaultIndex = "foobarbaz";
var pool = new SingleNodeConnectionPool(new Uri("http://localhost:9200"));
var settings = new ConnectionSettings(pool)
.DefaultIndex(defaultIndex);
var client = new ElasticClient(settings);
client.CreateIndex(defaultIndex, c => c
.Settings(s => s
.Analysis(a => a
.Analyzers(an => an
.Custom("trim_leading_zero", ca => ca
.Tokenizer("standard")
.Filters(
"standard",
"lowercase",
"trim_leading_zero",
"trim_zero_length")
)
)
.TokenFilters(tf => tf
.PatternReplace("trim_leading_zero", pr => pr
.Pattern("^0+(.*)")
.Replacement("$1")
)
.Length("trim_zero_length", t => t
.Min(1)
)
)
)
)
.Mappings(m => m
.Map<MyDocument>(mm => mm
.AutoMap()
.Properties(p => p
.Text(t => t
.Name(n => n.SearchIndentifier)
.Analyzer("trim_leading_zero")
.Fields(f => f
.Keyword(k => k
.Name("keyword")
.IgnoreAbove(256)
)
)
)
)
)
)
);
client.Index(new MyDocument { SearchIndentifier = "1-00917751" }, i => i
.Refresh(Refresh.WaitFor)
);
client.Search<MyDocument>(s => s
.Query(q => q
.QueryString(qs => qs
.Query("917751")
)
)
);
}
public class MyDocument
{
public string SearchIndentifier { get; set; }
}
The pattern_replace token filter trims leading zeroes from tokens, and the length filter then drops any token left empty.
The search query returns the indexed document:
{
"took" : 69,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 1,
"max_score" : 0.33310556,
"hits" : [
{
"_index" : "foobarbaz",
"_type" : "mydocument",
"_id" : "MVF4bmYBJZHQAT-BUx1K",
"_score" : 0.33310556,
"_source" : {
"searchIndentifier" : "1-00917751"
}
}
]
}
}
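To see why the query 917751 now matches, the analysis chain can be imitated in a few lines of Python. This is only an approximation: the regular-expression tokenizer below is an assumed stand-in for the standard tokenizer (it splits on anything that is not a letter or digit), while the two filters mirror the pattern_replace and length settings from the index definition.

```python
import re


def analyze(text):
    """Approximate the trim_leading_zero analyzer defined above."""
    # Simplified stand-in for the standard tokenizer.
    tokens = re.findall(r"[A-Za-z0-9]+", text)
    # lowercase filter
    tokens = [t.lower() for t in tokens]
    # pattern_replace filter: pattern "^0+(.*)", replacement "$1"
    tokens = [re.sub(r"^0+(.*)", r"\1", t) for t in tokens]
    # length filter with min 1: drop tokens that became empty
    return [t for t in tokens if len(t) >= 1]


indexed_terms = analyze("1-00917751")
query_terms = analyze("917751")
print(indexed_terms)
print(query_terms)
# The query matches when one of its terms is among the indexed terms.
print(any(t in indexed_terms for t in query_terms))
```

Because the same analyzer runs at index time and at query time, a query for 00917751 is also reduced to 917751 and still matches.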

elasticsearch -check if array contains a value

I want to check whether an array field of type long contains certain values.
The only way I found is to use a script (see "ElasticSearch Scripting: check if array contains a value"),
but it is still not working for me:
Query:
{
"query": {
"filtered": {
"query": {
"match_all": {}
},
"filter": {
"script": {
"script": "doc['Commodity'].values.contains(param1)",
"params": {
"param1": 50
}
}
}
}
}
}
But I get 0 hits, even though I have records such as:
{
"_index" : "aaa",
"_type" : "logs",
"_id" : "2zzlXEOgRtujWiCGtX6s9Q",
"_score" : 1,
"_source" : {
"Commodity" : [
50
],
"Type" : 1,
"SourceId" : "fsd",
"Id" : 123
}
}
Try this instead of that script:
{
"query": {
"filtered": {
"filter": {
"terms": {
"Commodity": [
55,
150
],
"execution": "and"
}
}
}
}
}
For those of you using the latest version of Elasticsearch (7.1.1), please note that
"filtered" and "execution" are deprecated, so @Andrei Stefan's answer may no longer help.
You can go through the below discussion for alternative approaches.
https://discuss.elastic.co/t/is-there-an-alternative-solution-to-terms-execution-and-on-es-2-x/41089
In the answer written by nik9000 in the above discussion, I just replaced "term" with "terms" (in PHP) and it started working with array inputs and AND was applied with respect to each of the "terms" keys that I used.
EDIT: Upon request I will post a sample query written in PHP.
'body' => [
'query' => [
'bool' => [
'filter' => [
['terms' => ['key1' => $array1]],
['terms' => ['key2' => $array2]],
['terms' => ['key3' => $array3]],
['terms' => ['key4' => $array4]],
]
]
]
]
key1 through key4 are fields in my Elasticsearch data, and each one is matched against the values in its respective array. AND is applied between the ['terms' => ['key' => array]] clauses.
For those of you who are using ES 6.x, this might help.
Here I am checking whether the user (rennish.joseph#gmail.com) has any of the given orders by passing in an array of orders:
GET user-orders/_search
{
"query": {
"bool": {
"filter": [
{
"terms":{
"orders":["123456","45678910"]
}
},
{
"term":{
"user":"rennish.joseph#gmail.com"
}
}
]
}
}
}
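The difference between "contains any of these values" and "contains all of these values" is easy to miss in the queries above. A terms clause matches when the field holds at least one of the listed values, while separate clauses inside bool.filter are combined with AND. The Python sketch below builds both forms; the helper names are hypothetical and the field name is taken from the question.

```python
def any_of(field, values):
    """One terms clause: the field must contain at least one of the values."""
    return [{"terms": {field: list(values)}}]


def all_of(field, values):
    """One term clause per value: the field must contain every value."""
    return [{"term": {field: v}} for v in values]


def bool_filter(*clause_lists):
    """Combine clause lists into a bool filter (clauses are ANDed)."""
    clauses = [c for clause_list in clause_lists for c in clause_list]
    return {"query": {"bool": {"filter": clauses}}}


# Documents whose Commodity array contains 55 or 150.
print(bool_filter(any_of("Commodity", [55, 150])))
# Documents whose Commodity array contains both 55 and 150.
print(bool_filter(all_of("Commodity", [55, 150])))
```

The second form is the closest replacement for the old "execution": "and" option on a single field.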

ElasticSearch : field not returned

I am new to ElasticSearch, please forgive my stupidity.
I can't seem to get the keepalive field out of ES.
{
"_index" : "2013122320",
"_type" : "log",
"_id" : "Y1M18ZItTDaap_rOAS5YOA",
"_score" : 1.0
}
I can get another field, cdn, out of it:
{
"_index" : "2013122320",
"_type" : "log",
"_id" : "2neLlVNKQCmXq6etTE6Kcw",
"_score" : 1.0,
"fields" : {
"cdn" : "-"
}
}
The mapping is there:
{
"log": {
"_timestamp": {
"enabled": true,
"store": true
},
"properties": {
"keepalive": {
"type": "integer"
}
}
}
}
EDIT
We create a new index every hour using the following Perl code:
create_index(
index => $index,
settings => {
_timestamp => { enabled => 1, store => 1 },
number_of_shards => 3,
number_of_replicas => 1,
},
mappings => {
varnish => {
_timestamp => { enabled => 1, store => 1 },
properties => {
content_length => { type => 'integer' },
age => { type => 'integer' },
keepalive => { type => 'integer' },
host => { type => 'string', index => 'not_analyzed' },
time => { type => 'string', store => 'yes' },
<SNIPPED>
location => { type => 'string', index => 'not_analyzed' },
}
}
}
);
With so little information, I can only guess:
In the mapping you gave, keepalive is not explicitly stored, and Elasticsearch defaults to not storing fields. If you do not store a field, you can only get it via the complete source, which is stored by default. Alternatively, change the mapping by adding "store" : "yes" to the field, and reindex.
Good luck with ES. It is well worth a few days of learning.
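On the client side, the two places a value can come from can be handled with a small helper. This Python sketch is only illustrative: the function name is made up, and the keepalive value in the second sample hit is invented for the example, since the question does not show the document source.

```python
def field_value(hit, name):
    """Return a field from a search hit.

    Looks in "fields" first (populated for stored fields that were
    requested), then falls back to "_source" if the hit includes it.
    Returns None when the value is in neither place.
    """
    fields = hit.get("fields", {})
    if name in fields:
        return fields[name]
    return hit.get("_source", {}).get(name)


# Hit shaped like the one in the question: cdn is returned under "fields".
stored_hit = {"_id": "2neLlVNKQCmXq6etTE6Kcw", "fields": {"cdn": "-"}}
# Hypothetical hit that carries the full source instead.
source_hit = {"_id": "Y1M18ZItTDaap_rOAS5YOA", "_source": {"keepalive": 1}}
# Hit with neither, like the keepalive result in the question.
bare_hit = {"_id": "Y1M18ZItTDaap_rOAS5YOA", "_score": 1.0}

print(field_value(stored_hit, "cdn"))
print(field_value(source_hit, "keepalive"))
print(field_value(bare_hit, "keepalive"))
```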
