Change the structure of ElasticSearch response json - elasticsearch

In some cases, I don't need all of the fields in response json.
For example,
// request json
{
"_source": false,
"aggs": { ... },
"query": { ... }
}
// response json
{
"took": 123,
"timed_out": false,
"_shards": { ... },
"hits": {
"total": 123,
"max_score": 123,
"hits": [
{
"_index": "foo",
"_type": "bar",
"_id": "123",
"_score": 123
}
],
...
},
"aggregations": {
"foo": {
"buckets": [
{
"key": 123,
"doc_count": 123
},
...
]
}
}
}
Actually, I don't need _index/_type every time, and when I run aggregations I don't need the hits block at all.
"_source": false or "_source": { "exclude": [ "foobar" ] } can ignore/exclude the _source fields in the hits block.
But can I change the structure of the ES response json in a more general way? Thanks.

I recently needed to "slim down" the Elasticsearch response, as it was well over 1MB of json, and I started using the filter_path request parameter.
It lets you include or exclude specific fields and supports several kinds of wildcards. Do read the docs on filter_path, as there is quite a lot of information there.
e.g.
_search?filter_path=aggregations.**.hits._source,aggregations.**.key,aggregations.**.doc_count
This reduced (in my case) the response size by half without significantly increasing the search duration, so it was well worth the effort.
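As a rough sketch of how such a request URL could be assembled in Python: the host, the index name and the search_url helper are illustrative assumptions, not part of any Elasticsearch client. Only the filter_path parameter itself comes from the answer above.

```python
from urllib.parse import urlencode


def search_url(base, index, filter_paths):
    """Build a _search URL whose filter_path lists the response fields to keep.

    `base` and `index` are placeholders chosen for this example.
    """
    # Keep '*' and ',' readable instead of percent-encoding them.
    query = urlencode({"filter_path": ",".join(filter_paths)}, safe="*,")
    return f"{base}/{index}/_search?{query}"


if __name__ == "__main__":
    print(search_url(
        "http://localhost:9200",
        "my_index",
        ["aggregations.**.key", "aggregations.**.doc_count"],
    ))
```

Joining the paths with commas mirrors the example above, where several patterns are passed in a single filter_path value.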

In the hits section, you will always have the _index, _type and _id fields. If you want to retrieve only some specific fields in your search results, you can use the fields parameter in the root object:
{
"query": { ... },
"aggs": { ... },
"fields":["fieldName1","fieldName2", etc...]
}
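If the metadata keys still have to go, one option is to drop them on the client once the response has arrived. A minimal sketch, assuming the response has already been parsed into a Python dict; the slim_hits helper is hypothetical and not part of any client library.

```python
def slim_hits(response, drop=("_index", "_type")):
    """Return the hits list with the metadata keys named in `drop` removed.

    Works on an already-parsed search response; the response itself is not modified.
    """
    return [
        {key: value for key, value in hit.items() if key not in drop}
        for hit in response.get("hits", {}).get("hits", [])
    ]


if __name__ == "__main__":
    sample = {"hits": {"hits": [
        {"_index": "foo", "_type": "bar", "_id": "123", "_score": 123},
    ]}}
    print(slim_hits(sample))
```

This only changes what your application sees; the payload sent over the wire stays the same size.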
When doing aggregations, you can use the search_type parameter with the count value, like this:
GET index/type/_search?search_type=count
It won't return any documents, only the result count, and your aggregations will be computed in exactly the same way.
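Note that search_type=count was deprecated in Elasticsearch 2.0 and later removed; setting "size": 0 in the request body gives the same aggregations-only result. A small sketch of building such a body in Python (the helper name and the sample query are assumptions made for illustration):

```python
import json


def aggregation_only_body(query, aggs):
    """Build a search body that asks for aggregations but no hits."""
    return {"size": 0, "query": query, "aggs": aggs}


if __name__ == "__main__":
    body = aggregation_only_body(
        {"match_all": {}},
        {"foo": {"terms": {"field": "foo"}}},
    )
    # This JSON string is what would be sent as the _search request body.
    print(json.dumps(body, indent=2))
```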

Related

Look for items that a field starts with (ElasticSearch) nodejs client

I'm trying to query my ElasticSearch index in order to retrieve the items in which the "toto" field starts with "hel".
The toto field is a text type with a keyword sub-field:
"toto": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
This is what I tried:
client.search({index: 'xxxxx', type: 'xxxxxx_type', body: {"query": {"regexp": {"toto": "hel.*"}}}},
function(err, resp, status) {
if (err)
res.send(err)
else {
console.log(resp);
res.send(resp.hits.hits)
}
});
I tried to find a solution here:
https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-regexp-query.html
and
https://www.elastic.co/guide/en/elasticsearch/guide/current/_wildcard_and_regexp_queries.html
or here
How to search for a part of a word with ElasticSearch
but nothing works.
This is what my data looks like:
{
"took": 1,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 4,
"max_score": 1,
"hits": [
{
"_index": "xxxxxx",
"_type": "xxxxx_type",
"_id": "1",
"_score": 1,
"_source": {
"toto": "hello"
}
}
]
}
}
Match phrase prefix query is what you are looking for.
Use the query below:
{
"query": {
"match_phrase_prefix": {
"toto": "hel"
}
}
}
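The same body can be built programmatically. A small Python sketch that produces the JSON to send as the request body; the helper name is made up for illustration, while the field and prefix values are the ones from the question.

```python
import json


def prefix_search_body(field, prefix):
    """Build a match_phrase_prefix query body for one field."""
    return {"query": {"match_phrase_prefix": {field: prefix}}}


if __name__ == "__main__":
    print(json.dumps(prefix_search_body("toto", "hel")))
```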
It sounds like you are looking for an auto-complete solution. Running regex searches for every character the user types is not very efficient.
I would suggest changing the index-time tokenizer and analyzer so that the prefix tokens are created in advance, which allows faster searches.
Some options on how to implement auto complete:
Elasticsearch Completion suggester: https://www.elastic.co/guide/en/elasticsearch/reference/6.0/search-suggesters-completion.html
or do it yourself:
https://hackernoon.com/elasticsearch-building-autocomplete-functionality-494fcf81a7cf
How to suggest (autocomplete) next word in elastic search?
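To illustrate what "creating the prefix tokens in advance" means, here is a toy Python sketch of edge n-gram generation. It only mimics the idea behind an edge n-gram tokenizer and is not how Elasticsearch implements it; the function names and gram limits are assumptions.

```python
def edge_ngrams(token, min_gram=1, max_gram=10):
    """Return the prefixes of `token` that are min_gram..max_gram characters long."""
    upper = min(len(token), max_gram)
    return [token[:n] for n in range(min_gram, upper + 1)]


def build_prefix_index(values, min_gram=1, max_gram=10):
    """Map every prefix to the set of original values that start with it."""
    index = {}
    for value in values:
        for gram in edge_ngrams(value.lower(), min_gram, max_gram):
            index.setdefault(gram, set()).add(value)
    return index


if __name__ == "__main__":
    index = build_prefix_index(["hello", "help", "world"])
    # Looking up a prefix is now a single dictionary access, no regex scan needed.
    print(sorted(index["hel"]))
```

The cost moves from query time to index time: more tokens are stored, but each keystroke becomes a plain lookup.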

How to apply exact match on single field and distinct on multiple fields together in ElasticSearch?

I recently started working with ElasticSearch, and I am trying to search with the following criteria.
I want to apply an exact match on ENAME and a distinct on both EID and ENAME on the above data.
Let's say the string to match is ABC.
So the result should look like this:
[
{"EID" :111, "ENAME" : "ABC"},
{"EID" : 444, "ENAME" : "ABC"}
]
You can achieve this via a combination of term query and terms aggregation.
Assuming that you have the following mapping:
PUT my_index
{
"mappings": {
"doc": {
"properties": {
"EID": {
"type": "keyword"
},
"ENAME": {
"type": "keyword"
}
}
}
}
}
And inserted the documents like this:
POST my_index/doc/3
{
"EID": "111",
"ENAME": "ABC"
}
POST my_index/doc/4
{
"EID": "222",
"ENAME": "XYZ"
}
POST my_index/doc/12
{
"EID": "444",
"ENAME": "ABC"
}
The query that will do the job might look like this:
POST my_index/doc/_search
{
"query": {
"term": { 1️⃣
"ENAME": "ABC"
}
},
"size": 0, 3️⃣
"aggregations": {
"by EID": {
"terms": { 2️⃣
"field": "EID"
}
}
}
}
Let me explain how it works:
1️⃣ - term query asks Elasticsearch to filter on exact value of a keyword field "ENAME";
2️⃣ - terms aggregation collects the list of all possible values of another keyword field "EID" and gives back the first N most frequent ones;
3️⃣ - "size": 0 tells Elasticsearch not to return any search hits (we are only interested in the aggregations).
The output of the query will look like this:
{
"hits": {
"total": 2,
"max_score": 0,
"hits": []
},
"aggregations": {
"by EID": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "111", <== Here is the first "distinct" value that we wanted
"doc_count": 1
},
{
"key": "444", <== Here is another "distinct" value
"doc_count": 1
}
]
}
}
}
The output does not look exactly like what you posted in the question, but I believe it is the closest you can get with Elasticsearch.
However, this output is equivalent:
"ENAME" is implicitly present (since its value was used for filtering)
"EID" is present under the "buckets" of the aggregations section.
Note that under "doc_count" you will find the number of documents having such "EID".
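If you need the exact shape from the question, the buckets can be reshaped on the client. A minimal sketch, assuming the response has been parsed into a Python dict; the buckets_to_rows helper is hypothetical. The EID values come back as strings here because the field is mapped as keyword.

```python
def buckets_to_rows(response, agg_name, ename):
    """Turn terms-aggregation buckets into a list of {"EID", "ENAME"} rows.

    `ename` is the value that was used in the term query, so it is the
    same for every row.
    """
    buckets = response["aggregations"][agg_name]["buckets"]
    return [{"EID": bucket["key"], "ENAME": ename} for bucket in buckets]


if __name__ == "__main__":
    sample = {"aggregations": {"by EID": {"buckets": [
        {"key": "111", "doc_count": 1},
        {"key": "444", "doc_count": 1},
    ]}}}
    print(buckets_to_rows(sample, "by EID", "ABC"))
```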
What if I want to do a DISTINCT on several fields?
For a more complex scenario (e.g. when you need to do a distinct on many fields) see this answer.
More information about aggregations is available here.
Hope that helps!

Elasticsearch 5.x.x cannot disable dynamic mapping

I'm simply trying to disable dynamic mapping for any fields not explicitly defined in the mapping at index creation time. Nothing worked, so I even tried the example from their docs:
PUT my_index
{
"mappings": {
"my_type": {
"dynamic": false,
"properties": {
"user": {
"type": "text"
}
}
}
}
}
Made a test insert:
POST my_index/my_type
{
"user": "tester",
"some_unknown_field": "lsdkfjsd"
}
Then searching the index shows:
{
"took": 1,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 1,
"hits": [
{
"_index": "my_index",
"_type": "my_type",
"_id": "AViPrfwVko8c8Q3co8Qz",
"_score": 1,
"_source": {
"user": "tester",
"some_unknown_field": "lsdkfjsd"
}
}
]
}
}
I'm expecting "some_unknown_field" to not be indexed, since it was not defined in the mapping. So why is it still being indexed? Am I missing something?
UPDATE
It turns out that it isn't currently possible in version 5.0.0 to do what I wanted, so I removed the fields in my app before sending to elasticsearch and achieved the same end result.
A mapping defines the type of each field you declare when you create the index. For a field that you did not declare in the mapping, ES with dynamic mapping enabled treats an inserted value as a new field and adds it to the index with a default mapping. Also note that, by default, _source holds the original JSON you sent, so the field shows up there whether or not it is indexed. So if you don't want to see a particular field within your _source, you could do some source filtering.
Workarounds:
If that's not the case, try disabling the default mapping when you're creating the index.
Try setting the dynamic property to strict:
PUT /test
{
"settings": {
"index.mapper.dynamic": false
},
"mappings": {
"testing_type": {
"dynamic":"strict",
"properties": {
"field1": {
"type": "string"
}
}
}
}
}
If the above two don't work out, try setting index_mapper_dynamic to false. This SO question could be handy. Hope it helps.
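The approach from the question's update (dropping unmapped fields in the application before sending the document) can be sketched in Python as follows. The helper name is an assumption, and this version only looks at top-level keys, so nested objects would need extra handling.

```python
def keep_mapped_fields(document, mapped_fields):
    """Return a copy of `document` containing only the explicitly mapped fields."""
    return {key: value for key, value in document.items() if key in mapped_fields}


if __name__ == "__main__":
    doc = {"user": "tester", "some_unknown_field": "lsdkfjsd"}
    # Only "user" is declared in the mapping from the question.
    print(keep_mapped_fields(doc, {"user"}))
```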

Spurious results from elasticsearch

I suspect I can't (or I'm just not quite desperate enough to try yet!) give you enough information to work on, but I'm hoping someone may be able to give me an idea of where to investigate...
I have an elastic search index which is in a live system and is working fine. I've added 3 attributes to the core entity in the index (productId). I'm getting the correct data back, but every now and then the results include spurious data.
So for example (I've cut the list of fields down, which is why it is a multi_match query).
Using Postman I am sending
{
"query" : {
"multi_match" : {
"query" : "FD41D359-1066-47C5-B930-C839F380FBDE",
"fields" : [ "softwareitem.productId" ]
}
}
}
I'm expecting 1 item to come back in this example and I'm getting 2. I've modified the result a little, but the key thing is the productId. You can see that in the 2nd item returned it is not the productId being searched for.
Can anyone give me an idea of where I should look next? Is there a fault in my query, or do you think the index might be corrupt in some way?
{
"took": 3,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 2,
"max_score": 27.424479,
"hits": [
{
"_index": "core_products",
"_type": "softwareitem",
"_id": "040EEEA1-4758-4F01-A55A-CAE710117C81",
"_score": 27.424479,
"_source": {
"id": "040EEEA1-4758-4F01-A55A-CAE710117C81",
"productId": "FD41D359-1066-47C5-B930-C839F380FBDE",
"softwareitem": {
"id": "040EEEA1-4758-4F01-A55A-CAE710117C81",
"title": "Code Library",
"description": "Blah Blah Blah",
"rmType": "Software",
"created": 1424445765000,
"updated": null
},
"searchable": true
}
},
{
"_index": "core_products",
"_type": "softwareitem",
"_id": "806B8F04-3E53-4278-BCC2-C2E1A17D2813",
"_score": 1.049637,
"_source": {
"id": "806B8F04-3E53-4278-BCC2-C2E1A17D2813",
"productId": "9FB80ABA-B09C-47C5-929A-9FB6C48BD5A8",
"softwareitem": {
"id": "806B8F04-3E53-4278-BCC2-C2E1A17D2813",
"title": "Video Game",
"description": "Blah Blah Blah",
"rmType": "Software",
"created": 1424445765000,
"updated": null
},
"searchable": true
}
}
]
}
}
It seems softwareitem.productId is a string field that is being analysed. To do exact matching on a string field, use a not_analyzed string field in your mapping, something like:
"productId" : {
"type" : "string",
"index" : "not_analyzed"
}
If your field is already not_analyzed, you probably still have to make one more change.
At query time you don't need a multi_match / match query. These types of queries analyze your input string and build a more complex query out of it, which is why you are seeing a second, unexpected result (it contains 47C5; the analyzer is probably tokenising the full string and building a query in which only one token needs to match). You should use term / terms queries instead.
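To see why the second document can match, here is a rough Python approximation of what an analyzer might do with the two productId values from the question. The splitting rule (lowercase, break on anything that is not a letter or digit) is an assumption standing in for the real analyzer, not its actual implementation.

```python
import re


def rough_tokens(text):
    """Crude stand-in for an analyzer: lowercase, split on non-alphanumerics."""
    return {token for token in re.split(r"[^0-9A-Za-z]+", text.lower()) if token}


searched = "FD41D359-1066-47C5-B930-C839F380FBDE"
other = "9FB80ABA-B09C-47C5-929A-9FB6C48BD5A8"

# Tokens that both ids share; one shared token is enough for a match query to hit.
shared = rough_tokens(searched) & rough_tokens(other)
print(shared)  # {'47c5'}
```

A term query against a not_analyzed field compares the whole string instead, so the partial overlap no longer counts as a match.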

Can I use ElasticSearch Facets as an equivalent to GROUP BY and how?

I'm wondering if I can use the ElasticSearch Facets feature to replace the GROUP BY feature used in relational databases or even in a Sphinx client?
If so, besides the official documentation, can someone point me to a good tutorial on how to do it?
EDIT :
Let's consider an SQL table products in which I have the following fields :
id
title
description
price
etc.
I omitted the other fields in the table because I don't want to put them into my ES index.
I've indexed my database with ElasticSearch.
A product is not unique in the index. We can have the same product with different price offers and I wish to group them by price range.
Facets give you the number of docs in which a particular term is present for a particular field.
Now let's suppose you have an index named tweets, with type tweet and field "name"...
A facet query for the field "name" would be:
curl -XPOST "http://localhost:9200/tweets/tweet/_search?search_type=count" -d'
{
"facets": {
"name": {
"terms": {
"field": "name"
}
}
}
}'
The response you get looks like this:
"hits": {
"total": 3475368,
"max_score": 0,
"hits": []
},
"facets": {
"name": {
"_type": "terms",
"total": 3539206,
"other": 3460406,
"terms": [
{
"term": "brickeyee",
"count": 9205
},
{
"term": "ken_adrian",
"count": 9160
},
{
"term": "rhizo_1",
"count": 9143
},
{
"term": "purpleinopp",
"count": 8747
}
....
....
This is called a terms facet, as it is a term-based count. There are other facet types as well; see the facets documentation.
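For the price-range grouping asked about in the question, the computation is conceptually the same as this small in-memory Python sketch. The sample products, the range boundaries and the helper name are all invented for illustration; on the server, a range-style facet or aggregation would do this counting.

```python
from collections import Counter


def count_by_price_range(products, ranges):
    """Count products per [low, high) price range, like a GROUP BY on the range."""
    counts = Counter()
    for product in products:
        for low, high in ranges:
            if low <= product["price"] < high:
                counts[(low, high)] += 1
                break  # each product belongs to at most one range
    return dict(counts)


if __name__ == "__main__":
    products = [
        {"title": "a", "price": 5},
        {"title": "a", "price": 15},
        {"title": "b", "price": 18},
    ]
    print(count_by_price_range(products, [(0, 10), (10, 20), (20, 100)]))
```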

Resources