Elasticsearch aggregation on multi fields - elasticsearch

Here's an example of a document in my ES index:
{
"src_ip": "192.168.1.1",
"dst_ip": "192.168.1.2"
}
I want to obtain the number of occurrences of each IP across documents (in either the src_ip or the dst_ip field). What I would like to get as a result of the query is an aggregation like this:
[
{"ip": "192.168.1.1", "count": 1},
{"ip": "192.168.1.2", "count": 1}
]
Any idea about that? Thanks in advance for your help.

You need to use a terms aggregation, which gives you the count for each group.
POST index_name/_search?size=0
{
"aggs": {
"src_ip_count": {
"terms": {
"field": "src_ip"
}
},
"dst_ip_count": {
"terms": {
"field": "dst_ip"
}
}
}
}
Here I am assuming that src_ip and dst_ip are mapped as the keyword type. If they are not, you need to index the values as keyword.
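The query above returns two separate bucket lists, one per field, while the question asks for a single count per IP. A minimal client-side sketch of merging them, assuming the response has the usual `aggregations -> <name> -> buckets` shape; the function name `merge_ip_counts` and the sample response are hypothetical:

```python
from collections import Counter

def merge_ip_counts(response):
    """Sum the buckets of both terms aggregations into one count per IP."""
    totals = Counter()
    for agg_name in ("src_ip_count", "dst_ip_count"):
        for bucket in response["aggregations"][agg_name]["buckets"]:
            totals[bucket["key"]] += bucket["doc_count"]
    # most_common() orders the IPs by descending total count.
    return [{"ip": ip, "count": count} for ip, count in totals.most_common()]

# Hypothetical response for the single example document from the question.
sample_response = {
    "aggregations": {
        "src_ip_count": {"buckets": [{"key": "192.168.1.1", "doc_count": 1}]},
        "dst_ip_count": {"buckets": [{"key": "192.168.1.2", "doc_count": 1}]},
    }
}
print(merge_ip_counts(sample_response))
```

Two caveats with this approach: a document that has the same IP in both fields is counted twice, and a terms aggregation only returns the top buckets (10 by default), so the sums only cover the IPs that were actually returned.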

Related

How to get documents that differ by field value

I'm using ElasticSearch 6.3.
Scenario: tens of thousands of documents have a "123" field, and most of them hold the value "blabla". A few have "blabla blo" in that field. These come last in the query results if I set size: 10000 (with the default size they don't appear at all). But what I really want is just the two unique records: one with "123": "blabla" and one with "123": "blabla blo".
I'm using a wildcard query and getting all 10000 documents, but I only need those two.
I'm going to populate an HTML select tag with those records, ideally only the two of them!
Query body:
{
"query": {
"wildcard":{
"324" : {
"value":"*b*"
}
}
},
"size": 10000,
"_source": ["324"]
}
How should I do it? I suppose the idea is similar to finding records whose value in that field is not a full duplicate of another.
Thank you
That's what aggs are for!
GET index_name/_search
{
"query": {
"wildcard": {
"324": {
"value": "*b*"
}
}
},
"size": 0,
"aggs": {
"324_uniques": {
"terms": {
"field": "324",
"size": 10
}
}
}
}
The field could be 324 or 324.keyword, depending on your mapping.
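Since the goal was to fill an HTML select tag, here is a rough sketch of turning the buckets of that aggregation into option elements. It assumes the standard response shape for a terms aggregation; the helper name `buckets_to_options` and the sample counts are made up for illustration:

```python
from html import escape

def buckets_to_options(response, agg_name="324_uniques"):
    """Turn each terms bucket into one HTML <option> element."""
    buckets = response["aggregations"][agg_name]["buckets"]
    return [
        '<option value="{0}">{0}</option>'.format(escape(str(bucket["key"])))
        for bucket in buckets
    ]

# Hypothetical response: two distinct values, however many documents hold them.
sample_response = {
    "aggregations": {
        "324_uniques": {
            "buckets": [
                {"key": "blabla", "doc_count": 9998},
                {"key": "blabla blo", "doc_count": 2},
            ]
        }
    }
}
for option in buckets_to_options(sample_response):
    print(option)
```

If the field is an analyzed text field rather than a keyword, the bucket keys would be individual tokens instead of whole values, which is why the mapping matters here.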

Elasticsearch ordering by field value which is not in the filter

Can somebody please help me write a query that orders the result items by the value of a field when that field is not part of the query in the request? I have this query:
{
"_source": [
"ico",
"name",
"city",
"status"
],
"sort": {
"_score": "desc",
"status": "asc"
},
"size": 20,
"query": {
"bool": {
"should": [
{
"match": {
"normalized": {
"query": "idona",
"analyzer": "standard",
"boost": 3
}
}
},
{
"term": {
"normalized2": {
"value": "idona",
"boost": 2
}
}
},
{
"match": {
"normalized": "idona"
}
}
]
}
}
}
The result is sorted alphabetically ascending by the status field. Status contains a few values such as [active, canceled, old, ...], and I need something like a boost for each possible value in the query, e.g. active boost 5, canceled boost 4, old boost 3, and so on. Is it possible to do this? Thanks.
You would need a custom sort using a script to achieve what you want.
I've just used a generic match_all query here; you can replace it with your own query logic. The solution you are looking for is in the sort section of the query below.
Make sure that status is a keyword type.
Custom Sorting Based on Values
POST <your_index_name>/_search
{
"query":{
"match_all":{
}
},
"sort":[
{ "_score": "desc" },
{
"_script":{
"type":"number",
"script":{
"lang":"painless",
"inline":"if(params.scores.containsKey(doc['status'].value)) { return params.scores[doc['status'].value];} return 100000;",
"params":{
"scores":{
"active":5,
"old":4,
"cancelled":3
}
}
},
"order":"desc"
}
}
]
}
In the above query, add your own values in the scores section. For example, if you also have the status new, give it the value that places it where you want in the ordering (6 in the example below, which ranks it above active):
{
"scores":{
"active":5,
"old":4,
"cancelled":3,
"new":6
}
}
So basically the documents are sorted by _score first, and the script sort then orders the documents that have an equal _score.
Note that the script sort uses desc order, since I understand that you want to show active documents at the top, followed by the other values. Feel free to play around with it.
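To see what ordering the two sort clauses produce, here is a small offline simulation in Python. It does not call Elasticsearch; it only mimics "sort by _score desc, then by the script value desc" on a few made-up hits (the hit names and scores are hypothetical):

```python
SCORES = {"active": 5, "old": 4, "cancelled": 3}
DEFAULT_SCORE = 100000  # value the script returns for statuses missing from SCORES

def sort_hits(hits, scores=SCORES, default=DEFAULT_SCORE):
    """Order hits by _score descending, then by the status score descending."""
    return sorted(
        hits,
        key=lambda hit: (hit["_score"], scores.get(hit["status"], default)),
        reverse=True,
    )

hits = [
    {"name": "a", "_score": 1.0, "status": "old"},
    {"name": "b", "_score": 1.0, "status": "active"},
    {"name": "c", "_score": 2.0, "status": "cancelled"},
    {"name": "d", "_score": 1.0, "status": "unknown"},
]
print([hit["name"] for hit in sort_hits(hits)])  # ['c', 'd', 'b', 'a']
```

The hit with the higher _score comes first regardless of status, and among equal scores the unlisted status sorts to the top because of the 100000 fallback. If unlisted statuses should go last instead, a fallback lower than every listed value would do that.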
Hope this helps!

How to aggregate on the same field:value which are specified in query in elasticsearch

My data in Elasticsearch looks like this: each document is one whole dict for one person id, and it contains a list of objects like
{
"dummy_name": "abc",
"dummy_id": "44850642"
}
as shown below. I am querying on the dummy_id field and getting a number of matching results. I want to aggregate on the dummy_id field so that I get the number of docs for each specific dummy_id. What happens instead is that I also get buckets for dummy_id values that are not mentioned in the query itself, because each person contains a list of objects in which dummy_id is present.
{
"person_id": 1234,
"Properties": {
"Property1": [
{
"dummy_name": "abc",
"dummy_id": "44850642"
},
{
},
{
}
]
}
},
{
"person_id": 1235,
.........
}
Query I am using:
{
"query": {
"bool": {
"must": [
{
"match": {
"Properties.Property1.dummy_id": "453041 23234324 124324 "
}
}
]
}
},
"aggregations": {
"group_by_concept": {
"terms": {
"field": "Properties.Property1.dummy_id",
"order": {
"_count": "desc"
},
"size": 10
}
}
}
}
The problem comes from how the data is stored.
For example, take this document:
{
"person_id": 1234,
"Properties": {
"Property1": [
{
"dummy_name": "abc",
"dummy_id": "44850642"
},
{
"dummy_name": "dfg",
"dummy_id": "876468"
},
{
}
]
}
}
The tokens generated for this document would be
dummy_id tokens: 44850642, 876468. This is how the data is kept by Lucene in the backend.
So when you query for dummy_id:44850642,
you get the document, but the aggregation runs over all terms produced by the documents matching the query.
As a result, you see buckets for 44850642 as well as 876468.
For more information on how Elasticsearch stores a list of objects, see https://www.elastic.co/guide/en/elasticsearch/reference/current/array.html
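The effect can be reproduced with a simplified Python model. This is only a sketch of the behaviour described above, not how Lucene is implemented: each document's dummy_id values are flattened into one list, a document matches if any value matches, and every value of a matching document gets a bucket. The second sample document is hypothetical.

```python
from collections import Counter

def flatten_ids(doc):
    """Collect every dummy_id of a document into one flat list."""
    return [obj["dummy_id"] for obj in doc["Properties"]["Property1"] if "dummy_id" in obj]

def terms_agg(docs, queried_ids):
    """Count, per dummy_id, the matching documents that contain it."""
    queried = set(queried_ids)
    counts = Counter()
    for doc in docs:
        ids = set(flatten_ids(doc))
        if ids & queried:       # the document matches if any of its ids was queried
            counts.update(ids)  # but every id of that document gets counted
    return dict(counts)

docs = [
    {"person_id": 1234, "Properties": {"Property1": [
        {"dummy_name": "abc", "dummy_id": "44850642"},
        {"dummy_name": "dfg", "dummy_id": "876468"},
    ]}},
    {"person_id": 1235, "Properties": {"Property1": [
        {"dummy_name": "dfg", "dummy_id": "876468"},
    ]}},
]

buckets = terms_agg(docs, ["44850642"])
print(buckets)  # buckets for both 44850642 and 876468

# One client-side workaround: keep only the buckets for the queried ids.
print({key: count for key, count in buckets.items() if key in {"44850642"}})
```

Person 1235 does not match the query, so its 876468 is not counted, but the 876468 that sits next to 44850642 in person 1234 still produces a bucket.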

Return distinct values in Elasticsearch

I am trying to solve an issue where I have to get distinct results in the search.
{
"name" : "ABC",
"favorite_cars" : [ "ferrari","toyota" ]
}, {
"name" : "ABC",
"favorite_cars" : [ "ferrari","toyota" ]
}, {
"name" : "GEORGE",
"favorite_cars" : [ "honda","Hyundae" ]
}
When I perform a term query on favorite_cars for "ferrari", I get two results whose name is ABC. I simply want one result to be returned in this case. So my requirement is to apply a distinct on the name field so that I receive only one result.
Thanks
One way to achieve what you want is to use a terms aggregation on the name field and then a top_hits sub-aggregation with size 1, like this:
{
"size": 0,
"query": {
"term": {
"favorite_cars": "ferrari"
}
},
"aggs": {
"names": {
"terms": {
"field": "name"
},
"aggs": {
"single_result": {
"top_hits": {
"size": 1
}
}
}
}
}
}
That way, you'll get a single term ABC and, nested inside it, a single matching document.
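Reading the result back is then a matter of walking the buckets. A minimal sketch, assuming the usual response layout for a top_hits sub-aggregation; the function name `one_hit_per_name` and the sample response (with a made-up `_id`) are only for illustration:

```python
def one_hit_per_name(response):
    """Return the _source of the single top hit in each name bucket."""
    results = []
    for bucket in response["aggregations"]["names"]["buckets"]:
        hits = bucket["single_result"]["hits"]["hits"]
        if hits:
            results.append(hits[0]["_source"])
    return results

# Hypothetical response for the three example documents and the "ferrari" query.
sample_response = {
    "aggregations": {
        "names": {
            "buckets": [
                {
                    "key": "ABC",
                    "doc_count": 2,
                    "single_result": {
                        "hits": {
                            "hits": [
                                {
                                    "_id": "1",
                                    "_source": {
                                        "name": "ABC",
                                        "favorite_cars": ["ferrari", "toyota"],
                                    },
                                }
                            ]
                        }
                    },
                }
            ]
        }
    }
}
print(one_hit_per_name(sample_response))
```

The bucket's doc_count is still 2 here, because two documents matched; only the top_hits part is limited to one document.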

How to query for many facets in single elasticsearch query

I'm looking for a way to query the distribution of the top n values for many object fields in a single query.
My object in Elasticsearch looks like:
obj: {
os: "Android",
device_model: "Samsung Galaxy S II (GT-I9100)",
device_brand: "Samsung",
os_version: "Android-2.3",
country: "BR",
interests: [1,2,3],
behavioral_segment: ["sport", "lifestyle"]
}
The following query returns the distribution of values for a specific field, with the number of occurrences of each value, for UK users only:
curl -XPOST http://<endpoint>/profiles/_search?search_type=count -d '
{
"query": {
"match": {
"country" : "UK"
}
},
"facets": {
"ItemsPerCategoryCount": {
"terms": {
"field": "behavioral_segment"
}
}
}
}'
How can I query for many fields? For example, I would like to get a result for behavioral_segment, device_brand and os in a single query. Is it possible?
In the facets section of the query, you should use the fields parameter.
"facets": {
"ItemsPerCategoryCount": {
"terms": {
"fields": ["behavioral_segment","device_brand"]
}
}
}
That should solve your problem, but of course it might not guarantee the coherence of the data.
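Facets were removed in Elasticsearch 2.0 in favor of aggregations, so on newer versions the same idea would be written with one terms aggregation per field, which also keeps the counts of each field separate. A rough sketch of building such a request body in Python; the helper name `build_terms_aggs` and the `<field>_counts` aggregation names are my own choices, not anything required by Elasticsearch:

```python
import json

def build_terms_aggs(fields, size=10):
    """Create one named terms aggregation per field."""
    return {
        field + "_counts": {"terms": {"field": field, "size": size}}
        for field in fields
    }

body = {
    "size": 0,  # return only aggregation results, no hits
    "query": {"match": {"country": "UK"}},
    "aggs": build_terms_aggs(["behavioral_segment", "device_brand", "os"]),
}
print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the body of a single _search request, and the response then contains one bucket list per field under aggregations.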

Resources