I'm looking for a way to query the distribution of the top n values for many object fields in a single query.
My object in Elasticsearch looks like:
obj: {
os: "Android",
device_model: "Samsung Galaxy S II (GT-I9100)",
device_brand: "Samsung",
os_version: "Android-2.3",
country: "BR",
interests: [1,2,3],
behavioral_segment: ["sport", "lifestyle"]
}
The following query returns the distribution of values for a specific field, with the number of occurrences of each value, only for UK users:
curl -XPOST http://<endpoint>/profiles/_search?search_type=count -d '
{
"query": {
"match": {
"country" : "UK"
}
},
"facets": {
"ItemsPerCategoryCount": {
"terms": {
"field": "behavioral_segment"
}
}
}
}'
How can I query for many fields? For example, I would like to get results for behavioral_segment, device_brand, and os in a single query. Is it possible?
In the facets section of the query, you should use the fields parameter.
"facets": {
"ItemsPerCategoryCount": {
"terms": {
"fields": ["behavioral_segment","device_brand"]
}
}
}
That should solve your problem, but note that the counts are then aggregated across all the listed fields into a single list, so it does not guarantee the coherence of the data per field.
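If the distributions need to stay separate per field, one option is to declare one named facet per field in the same request, since the facets section accepts several named entries. Below is a minimal sketch in Python that only builds the request body and does not contact a cluster. The function name and the `<field>_counts` facet names are made up for illustration.

```python
import json


def build_multi_facet_query(country, fields):
    """Build a search body with one terms facet per field (hypothetical helper)."""
    return {
        "query": {"match": {"country": country}},
        # One named facet per field keeps each distribution separate.
        "facets": {
            f"{field}_counts": {"terms": {"field": field}}
            for field in fields
        },
    }


body = build_multi_facet_query(
    "UK", ["behavioral_segment", "device_brand", "os"]
)
print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the body of the same _search request used in the question.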
Here's an example of a document in my ES index:
{
"src_ip": "192.168.1.1",
"dst_ip": "192.168.1.2"
}
I want to obtain the number of occurrences of each IP across documents (in either the src_ip or the dst_ip field). What I would like to get as a result of the query is an aggregation like this:
[
{"ip": "192.168.1.1", "count": 1},
{"ip": "192.168.1.2", "count": 1}
]
Any idea about that? Thanks in advance for your help.
You need to use a terms aggregation, which returns the document count for each group.
POST index_name/_search?size=0
{
"aggs": {
"src_ip_count": {
"terms": {
"field": "src_ip"
}
},
"dst_ip_count": {
"terms": {
"field": "dst_ip"
}
}
}
}
Here I am assuming that src_ip and dst_ip are mapped as keyword. If they are not, you need to index the values as keyword type.
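The query above returns two separate bucket lists, one per field, while the question asks for a single list of IP counts. One way to get there is to merge the buckets on the client side. The following is a sketch under that assumption; the helper name and the sample response fragment are hypothetical.

```python
from collections import Counter


def merge_ip_counts(aggregations, agg_names=("src_ip_count", "dst_ip_count")):
    """Sum the doc_count of each IP across several terms aggregations."""
    totals = Counter()
    for name in agg_names:
        for bucket in aggregations[name]["buckets"]:
            totals[bucket["key"]] += bucket["doc_count"]
    # Note: a document with the same IP in both fields is counted twice here.
    return [{"ip": ip, "count": count} for ip, count in totals.most_common()]


# Hypothetical "aggregations" part of a response for the example document.
sample = {
    "src_ip_count": {"buckets": [{"key": "192.168.1.1", "doc_count": 1}]},
    "dst_ip_count": {"buckets": [{"key": "192.168.1.2", "doc_count": 1}]},
}
result = merge_ip_counts(sample)
print(result)
```

Keep in mind that a terms aggregation returns only the top buckets (10 by default), so the merged totals are complete only if its size is large enough to cover all IPs.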
I'm using ElasticSearch 6.3.
Scenario: tens of thousands of documents have a "123" field, with the value "blabla" in most of them. A few have "blabla blo" in that field. These appear last in the query results if I set size: 10000 (with the default size, they don't appear at all). But I really want only the two unique records: one with "123": "blabla" and one with "123": "blabla blo".
I'm using a wildcard query and getting all 10000 documents, but I only need those two.
I'm going to populate an HTML select tag with those records, ideally only the two of them!
Query body:
{
"query": {
"wildcard":{
"324" : {
"value":"*b*"
}
}
},
"size": 10000,
"_source": ["324"]
}
How should I do it? I suppose the concept is similar to finding records whose value in that field isn't a full duplicate of another's.
Thank you
That's what aggs are for!
GET index_name/_search
{
"query": {
"wildcard": {
"324": {
"value": "*b*"
}
}
},
"size": 0,
"aggs": {
"324_uniques": {
"terms": {
"field": "324",
"size": 10
}
}
}
}
The field could be 324 or 324.keyword, depending on your mapping.
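To feed the select tag, the distinct values can be read from the bucket keys of the aggregation in the response. Here is a small sketch, assuming the aggregation name 324_uniques from the query above; the sample response and its counts are invented for illustration.

```python
def unique_values(response, agg_name="324_uniques"):
    """Return the distinct field values found by a terms aggregation."""
    buckets = response["aggregations"][agg_name]["buckets"]
    return [bucket["key"] for bucket in buckets]


# Hypothetical response; the doc_count values are made up.
sample_response = {
    "hits": {"total": 10000, "hits": []},
    "aggregations": {
        "324_uniques": {
            "buckets": [
                {"key": "blabla", "doc_count": 9998},
                {"key": "blabla blo", "doc_count": 2},
            ]
        }
    },
}

options = unique_values(sample_response)
print(options)
```

Each entry of options can then become one option element, regardless of how many documents share that value.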
My data in Elasticsearch looks like this: one whole dict with one person id is one doc, and it contains a list of objects like
`{
"dummy_name": "abc",
"dummy_id": "44850642"
}`
The full document is shown below. I am querying on the dummy_id field and getting some number of matching results. I want to aggregate on the dummy_id field so that I get the number of docs for a specific dummy_id, but I am also getting buckets for dummy_id values that are not mentioned in the query itself, because each person contains a list of objects in which dummy_id is present.
`{
"person_id": 1234,
"Properties": {
"Property1": [
{
"dummy_name": "abc",
"dummy_id": "44850642"
},
{
},
{
}
]
}
},
{
"person_id": 1235,
.........
}`
Query I am using:
`{
"query": {
"bool": {
"must": [
{
"match": {
"Properties.Property1.dummy_id": "453041 23234324 124324 "
}
}
]
}
},
"aggregations": {
"group_by_concept": {
"terms": {
"field": "Properties.Property1.dummy_id",
"order": {
"_count": "desc"
},
"size": 10
}
}
}
}`
The problem comes from how the data is stored.
For example, in this document:
{
"person_id": 1234,
"Properties": {
"Property1": [
{
"dummy_name": "abc",
"dummy_id": "44850642"
},
{
"dummy_name": "dfg",
"dummy_id": "876468"
},
{
}
]
}
}
The tokens generated for the dummy_id field of this document would be 44850642 and 876468. This is how the data is stored in Lucene on the backend.
So when you query for dummy_id:44850642, you get this document, but the aggregation runs over all the terms of the documents matching the query.
As a result, you see buckets for 44850642 as well as 876468.
For more information on how Elasticsearch stores a list of objects, see https://www.elastic.co/guide/en/elasticsearch/reference/current/array.html
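The behaviour described above can be imitated in a few lines of plain Python. This is only a rough model of the flattening, not how Elasticsearch is implemented; the helper names and the second document (person_id 1235 with dummy_id "111") are hypothetical.

```python
from collections import Counter


def flatten_ids(doc):
    """Collect every dummy_id of a document into one flat list of terms."""
    items = doc["Properties"]["Property1"]
    return [item["dummy_id"] for item in items if "dummy_id" in item]


def terms_buckets(docs, queried_id):
    """Count terms over all documents that contain the queried id."""
    counts = Counter()
    for doc in docs:
        ids = flatten_ids(doc)
        if queried_id in ids:
            # Every term of a matching document gets a bucket,
            # not only the term that was queried.
            counts.update(set(ids))
    return dict(counts)


docs = [
    {"person_id": 1234, "Properties": {"Property1": [
        {"dummy_name": "abc", "dummy_id": "44850642"},
        {"dummy_name": "dfg", "dummy_id": "876468"},
        {},
    ]}},
    # Hypothetical second document that does not match the query.
    {"person_id": 1235, "Properties": {"Property1": [
        {"dummy_name": "xyz", "dummy_id": "111"},
    ]}},
]

buckets = terms_buckets(docs, "44850642")
print(buckets)
```

The result contains a bucket for 876468 even though only 44850642 was queried, which mirrors the extra buckets seen in the question.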
I am trying to get the total number of tokens in documents that match a query. I haven't defined any custom mapping and the field for which I want to get the token count is of type 'string'.
I tried the following query, but it gives a very large number in the order of 10^20, which is not the correct answer for my dataset.
curl -XPOST 'localhost:9200/nodename/comment/_search?pretty' -d '
{
"query": {
"match_all": {}
},
"aggs": {
"tk_count": {
"sum": {
"script": "_index[\"body\"].sumttf()"
}
}
},
"size": 0
}'
Any idea how to get the correct count of all tokens? (I do not need counts for each term, only the total count.)
This worked for me; is it what you need?
Rather than getting the token count at query time (using the tk_count aggregation, as suggested in the other answer), my solution stores the token count at index time using the token_count datatype, so that the "name.stored_length" values can be returned in query results.
token_count is used as a multi-field, and it works on one field at a time (i.e. the "name" field or the "body" field). I modified the example slightly to store "name.stored_length".
Note that my example does not count the cardinality of tokens (i.e. distinct values); it counts total tokens. "John John Doe" has 3 tokens in it, so "name.stored_length" is 3 (even though its count of distinct tokens is only 2). Note also that I ask for specific "stored_fields": ["name.stored_length"].
Finally, you may need to re-index your documents (i.e. send a PUT again) to get the values you want. In this case I PUT "John John Doe" even though it had already been indexed; the tokens were not counted until I PUT it again, after adding token_count to the mapping.
PUT test_token_count
{
"mappings": {
"_doc": {
"properties": {
"name": {
"type": "text",
"fields": {
"stored_length": {
"type": "token_count",
"analyzer": "standard",
//------------------v
"store": true
}
}
}
}
}
}
}
PUT test_token_count/_doc/1
{
"name": "John John Doe"
}
Now we can query, or search for results, and configure results to include the name.stored_length field (which is both a multi-field and a stored field!):
GET test_token_count/_search
{
//------------------v
"stored_fields" : ["name.stored_length"]
}
And the search results should include the total token count as name.stored_length:
{
...
"hits": {
...
"hits": [
{
"_index": "test_token_count",
"_type": "_doc",
"_id": "1",
"_score": 1,
"fields": {
//------------------v
"name.stored_length": [
3
]
}
}
]
}
}
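The difference between total tokens and distinct tokens can be checked with a short Python sketch. It assumes that lowercasing and splitting on whitespace is close enough to the standard analyzer for this simple text; that is an approximation, not the analyzer itself.

```python
def tokenize(text):
    """Very rough stand-in for the standard analyzer: lowercase and split."""
    return text.lower().split()


tokens = tokenize("John John Doe")
total_tokens = len(tokens)          # what token_count stores
distinct_tokens = len(set(tokens))  # what a distinct count would report

print(total_tokens, distinct_tokens)
```

For "John John Doe" this gives 3 total tokens and 2 distinct ones, matching the stored_length value of 3 in the result above.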
It seems you want the cardinality of tokens in the body field, i.e. the number of distinct tokens rather than the total.
In that case you can use a cardinality aggregation like the one below, which returns an approximate distinct count.
curl -XPOST 'localhost:9200/nodename/comment/_search?pretty' -d '
{
"query": {
"match_all": {}
},
"aggs": {
"tk_count": {
"cardinality" : {
"field" : "body"
}
}
},
"size": 0
}'
For detailed information, see this official document
The object that I am indexing has both a UserId (a GUID) and a FullName property. I would like to do a faceted search using the UserId but display the FullName so that it's readable. I don't want to do the facet on the FullName since it technically doesn't have to be unique.
Right now I'm doing something like this:
{
"query": {
"match_all": {}
},
"facets": {
"userFacet": {
"terms": {
"field": "userId"
}
}
}
}
But then the response contains the GUIDs, and I would need to hit the database to look up each full name, which is obviously not a real solution.
So how can I use one field to do the facets with and then a different field for the display values to use?
I am using the lines below to display the fields.
This works for grids only, but not for facets.
The facet still takes its value from the id, not the label.
fields: [{
id: 'name_first',
'label': 'First Name'
}]
Try adding a fields clause to your query, as follows:
{
"query": {
"match_all": {}
},
"facets": {
"userFacet": {
"terms": {
"field": "userId"
}
}
},
"fields": [ "FullName" ]
}
Note that in order to return field values in search results, you need to actually store them in ES. So you need to either store the _source (this happens by default, unless you override it with "_source" : {"enabled" : false}) or set "store": "yes" for that field in your mapping.