Query with `field` returns nothing - elasticsearch

I'm new to Elasticsearch and am having trouble with my queries.
When I do a match_all I get this:
{
"took": 1,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"failed": 0
},
"hits": {
"total": 5,
"max_score": 1,
"hits": [{
"_index": "stations",
"_type": "station",
"_id": "4432",
"_score": 1,
"_source": {
"SiteName": "Abborrkroksvägen",
"LastModifiedUtcDateTime": "2015-02-13 10:34:20.643",
"ExistsFromDate": "2015-02-14 00:00:00.000"
}
},
{
"_index": "stations",
"_type": "station",
"_id": "9110",
"_score": 1,
"_source": {
"SiteName": "Abrahamsberg",
"LastModifiedUtcDateTime": "2012-03-26 23:55:32.900",
"ExistsFromDate": "2012-06-23 00:00:00.000"
}
}
]
}
}
My search query looks like this:
{
"query": {
"query_string": {
"fields": ["SiteName"],
"query": "a"
}
}
}
The problem is that when I run the query above I get empty results, which is strange. I should receive both of the documents from my index, right?
What am I doing wrong? Did I index my data wrong or is my query just messed up?
Appreciate any help I can get. Thanks guys!

There is nothing wrong with either your data or your query. The behaviour comes from how Elasticsearch stores data.
When you index "SiteName": "Abborrkroksvägen" and "SiteName": "Abrahamsberg", each value is analysed and stored as terms. With the default (standard) analyzer, each of these names becomes a single lowercased term, for example abrahamsberg.
When you query with "query": "a", you are looking for the exact term a. No such term exists in the index, so you get empty results.
When you query with "query": "a*", you are asking for all terms that start with a, so both documents are returned as you expected.
Hope this clarifies things!
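To make the difference between a term lookup and a prefix lookup concrete, here is a minimal Python sketch. It is not how Elasticsearch is implemented: the `analyze` function is a rough stand-in for an analyzer (lowercase, split on non-word characters), and the names `term_search` and `prefix_search` are made up for this illustration. Only the two SiteName values come from the question.

```python
import re

def analyze(text):
    """Rough stand-in for an analyzer: lowercase, split on non-word characters."""
    return [token for token in re.split(r"\W+", text.lower()) if token]

# The two documents shown in the question: id -> SiteName
docs = {"4432": "Abborrkroksvägen", "9110": "Abrahamsberg"}

# Toy inverted index: term -> set of document ids containing that term
index = {}
for doc_id, site_name in docs.items():
    for term in analyze(site_name):
        index.setdefault(term, set()).add(doc_id)

def term_search(term):
    """Return ids of documents that contain exactly this term."""
    return sorted(index.get(term.lower(), set()))

def prefix_search(prefix):
    """Return ids of documents that contain any term starting with the prefix."""
    prefix = prefix.lower()
    return sorted({doc_id
                   for term, ids in index.items() if term.startswith(prefix)
                   for doc_id in ids})

print(term_search("a"))    # no indexed term is exactly "a"
print(prefix_search("a"))  # both names start with "a"
```

In this toy index the only terms are the two full names, so the exact lookup for a finds nothing while the prefix lookup finds both documents.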

You may also want to look at an article about search that I found recently: https://www.timroes.de/2016/05/29/elasticsearch-kibana-queries-in-depth-tutorial/

Related

Look for items that a field starts with (ElasticSearch) nodejs client

I'm trying to query my Elasticsearch index to retrieve the items whose "toto" field starts with "hel".
The toto field is a text type with a keyword sub-field:
"toto": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
This is what I tried:
client.search({
  index: 'xxxxx',
  type: 'xxxxxx_type',
  // regexp query: match terms in "toto" that start with "hel"
  body: { query: { regexp: { toto: 'hel.*' } } }
}, function (err, resp, status) {
  if (err) {
    res.send(err);
  } else {
    console.log(resp);
    res.send(resp.hits.hits);
  }
});
I tried to find a solution here:
https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-regexp-query.html
and
https://www.elastic.co/guide/en/elasticsearch/guide/current/_wildcard_and_regexp_queries.html
or here
How to search for a part of a word with ElasticSearch
but nothing worked.
This is what my data looks like:
{
"took": 1,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 4,
"max_score": 1,
"hits": [
{
"_index": "xxxxxx",
"_type": "xxxxx_type",
"_id": "1",
"_score": 1,
"_source": {
"toto": "hello"
}
}
}
Match phrase prefix query is what you are looking for.
Use the query below:
{
"query": {
"match_phrase_prefix": {
"toto": "hel"
}
}
}
It sounds like you are looking for an autocomplete solution. Running a regex search for every character the user types is not very efficient.
I would suggest changing the index-time tokenizer and analyzer to create the prefix tokens in advance, which allows faster searches.
Some options on how to implement auto complete:
Elasticsearch Completion suggester: https://www.elastic.co/guide/en/elasticsearch/reference/6.0/search-suggesters-completion.html
or do it yourself:
https://hackernoon.com/elasticsearch-building-autocomplete-functionality-494fcf81a7cf
How to suggest (autocomplete) next word in elastic search?
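As a rough illustration of what "creating the prefix tokens in advance" means, the following Python sketch builds edge n-grams for each token at index time so that a prefix search becomes a plain dictionary lookup. It is only a sketch under assumptions: the function names are hypothetical, the sample values other than "hello" are invented, and real Elasticsearch analyzers do considerably more than this.

```python
def edge_ngrams(token, min_gram=1, max_gram=10):
    """Return the leading substrings of a token, e.g. 'hel' -> ['h', 'he', 'hel']."""
    upper = min(len(token), max_gram)
    return [token[:n] for n in range(min_gram, upper + 1)]

def build_prefix_index(values):
    """Index time: map every edge n-gram of every token to the values containing it."""
    index = {}
    for value in values:
        for token in value.lower().split():
            for gram in edge_ngrams(token):
                index.setdefault(gram, set()).add(value)
    return index

# "hello" comes from the question; the other values are invented sample data
prefix_index = build_prefix_index(["hello", "help desk", "world"])

def lookup(prefix):
    """Search time: a single dictionary lookup, no regex scan over all terms."""
    return sorted(prefix_index.get(prefix.lower(), set()))

print(lookup("hel"))
```

The trade-off is a larger index in exchange for cheaper searches, which is usually what you want when a query runs on every keystroke.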

Kibana including versioned documents in visualizations

I have a document with _id "123456", and when I do a GET in Elasticsearch for that ID in my index I can see that it is _version: 2 which makes sense because I updated it.
However in my Kibana visualizations it seems like it is picking up both versions of the same document when showing the results.
How do I exclude versioned documents from re-appearing in the visualization? For example, this record is showing up twice in my bar graph.
Please and thank you
Example GET response:
{
"_index": "censored",
"_type": "censored",
"_id": "123456",
"_version": 2,
"found": true,
"_source": {
... omitted
}
}
Also I am sure there is only one actual document with that ID because if I do a _search on the _id field I can see this:
{
"took": 1,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 7.53924,
"hits": [
{
"_index": "censored",
"_type": "censored",
"_id": "123456",
"_score": 7.53924,
"_source": {
... omitted
}
}
]
}
}
EDIT: Things I've tried are below:
"aggs": {
"latest": {
"terms": {
"field": "_id"
}
}
}
and
"aggs": {
"latest": {
"max": {
"field": "version"
}
}
}
Frankly, this is just a workaround; if someone finds a better solution I will mark that as the answer instead. This is how I prevented multiple records with the same _id from showing up in the visualizations on my dashboard:
I changed the "Y Axis - Count" on all the visualizations to "Y Axis - Unique Count on field _id".
Honestly, it seems silly that I have to do this, because I think different versions should automatically be excluded from my saved searches and visualizations. I couldn't find any information about why this was happening. I even tried a _forcemerge to try to delete previous versions of the records, but it didn't do anything.
It would be nice if someone found a real solution.
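The difference between the two metrics can be shown with a small Python sketch. The rows below are hypothetical (only the id 123456 comes from the question) and the function names are made up. As far as I know, Kibana's Unique Count is backed by the cardinality aggregation, which is approximate on large sets, so treat this as a picture of the idea and not of the exact numbers.

```python
def count(rows):
    """Plain count: every row is counted, duplicates included."""
    return len(rows)

def unique_count(rows, field="_id"):
    """Unique count: each distinct value of `field` is counted once."""
    return len({row[field] for row in rows})

# Hypothetical rows as a visualization might see them; 123456 appears twice
rows = [{"_id": "123456"}, {"_id": "123456"}, {"_id": "654321"}]

print(count(rows), unique_count(rows))
```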

elasticsearch, multi_match and filter by date

It seems I followed every similar answer I found, but I just can't figure out what is wrong...
This is a "match all" query:
{
"query": {
"match_all": {}
}
}
..and the results:
{
"took": 1,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 1,
"hits": [
{
"_index": "unittest_index_unittestdocument",
"_type": "unittestdocument",
"_id": "a.b",
"_score": 1,
"_source": {
"id": "a.b",
"docdate": "2018-01-24T09:45:44.4168345+02:00",
"primarykeywords": [
"keyword"
],
"primarytitles": [
"the title of a document"
]
}
}
]
}
}
but when I try to filter that with a date like this:
{
"query":{
"bool":{
"must":{
"multi_match":{
"type":"most_fields",
"query":"document",
"fields":[ "primarytitles","primarykeywords" ]
}
},
"filter": [
{"range":{ "docdate": { "gte":"1900-01-23T15:17:12.7313261+02:00" } } }
]
}
}
}
I have zero hits...
I tried to follow this https://www.elastic.co/guide/en/elasticsearch/reference/current/query-filter-context.html and this filtering by date in elasticsearch with no success at all..
Is there any difference that I cannot see?
Please note that when I remove the date filter and add a term filter on "primarykeywords", I get the results I want. The only problem is the range filter.
Apparently there was no error in my query; the problem was that the docdate field wasn't indexed... :/
I don't know why I initially skipped indexing that field (my mistake), but I do believe Elasticsearch should warn me when I try to query a field that has "index: false".
The fact that Elasticsearch simply returns no results without saying what is going on is, in my opinion, a major issue. I lost a day reading everything I could find on the web, just because I didn't get proper feedback from the engine.
Fail safe died for this reason...
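Since the engine gives no warning, one way to catch this earlier is to inspect the mapping yourself. The Python sketch below walks a mapping's "properties" dictionary and lists fields whose "index" setting is disabled. It assumes you have already fetched the mapping (for example with a GET on the index's _mapping endpoint) and passed in the "properties" part; the function name and the sample mapping are hypothetical, written to mirror the situation described above.

```python
def unindexed_fields(properties, prefix=""):
    """Return dotted paths of fields whose mapping disables indexing.

    Accepts False (newer mappings) as well as the strings "false" and "no"
    (older mappings) as the disabled value.
    """
    found = []
    for name, spec in properties.items():
        path = prefix + name
        if spec.get("index") in (False, "false", "no"):
            found.append(path)
        # Recurse into object fields, which nest their own "properties"
        found.extend(unindexed_fields(spec.get("properties", {}), path + "."))
    return sorted(found)

# Hypothetical mapping mirroring the question: docdate is stored but not indexed
sample_properties = {
    "id": {"type": "keyword"},
    "docdate": {"type": "date", "index": False},
    "primarykeywords": {"type": "keyword"},
    "primarytitles": {"type": "text"},
}

print(unindexed_fields(sample_properties))
```

Any field this reports cannot be used in a range, term or match query, even though it still appears in _source.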

Using scan and scroll for elasticSearch on sense

I am trying to iterate over several documents in Elasticsearch, and am using Sense (the Google Chrome plugin) to do so. Using scan and scroll for efficiency, I get the scroll ID with:
POST _search?scroll=10m&search_type=scan
{
"query": { "match_all": {}}
}
The result of which is:
{
"_scroll_id": "c2Nhbjs1OzE4ODY6N[...]c5NTU0NTs=",
"took": 10,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 20000,
"max_score": 0,
"hits": []
}
}
Then pass this to a GET as:
GET _search/scroll?scroll=1m&scroll_id="c2Nhbjs1OzE4ODY6N[...]c5NTU0NTs="
but I get 0 results, specifically:
{
"_index": "my_index",
"_type": "_search",
"_id": "scroll",
"found": false
}
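I found the problem: I had specified the index my_index in the server box in Sense, so the scroll request was apparently treated as a document lookup inside that index (note the "_type": "_search" and "_id": "scroll" in the response above). Removing this and re-executing the POST command as: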
POST /my_index/_search?scroll=10m&search_type=scan
{
"query": { "match_all": {}}
}
and passing the resulting scroll_id as:
GET _search/scroll?scroll=1m&scroll_id="c2Nhbjs1OzE4ODY6N[...]c5NTU0NTs="
worked!
This works in my Sense (replace the ID with the one from your own response, and don't wrap it in double quotes):
POST /test/_search?search_type=scan&scroll=1m
GET /_search/scroll?scroll=1m&scroll_id=c2Nhbjs1OzI[...]Tt0b3RhbF9oaXRzOjQ7
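If you build the scroll request in code instead of typing it into Sense, a small helper can guard against the stray quotes. The Python sketch below is an assumption-laden illustration: the function name is made up, and the scroll ID in the example is a short placeholder, not a real ID.

```python
from urllib.parse import urlencode

def scroll_request_path(scroll_id, keep_alive="1m"):
    """Build the path for a scroll request, without an index prefix.

    Surrounding whitespace and double quotes are stripped from the ID,
    and the ID is URL-encoded (for example '=' becomes '%3D').
    """
    clean_id = scroll_id.strip().strip('"')
    query = urlencode({"scroll": keep_alive, "scroll_id": clean_id})
    return "/_search/scroll?" + query

# Placeholder ID for illustration only
print(scroll_request_path('"abc="'))
```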

Spurious results from elasticsearch

I suspect I can't give you enough information to work with (or I'm just not quite desperate enough to try yet!), but I'm hoping someone may be able to give me an idea of where to investigate...
I have an Elasticsearch index which is in a live system and is working fine. I've added 3 attributes to the core entity in the index (productId). I'm getting the correct data back, but every now and then the results include spurious data.
So for example (I've cut the list of fields down, which is why it is a multi_match query).
Using Postman I am sending
{
"query" : {
"multi_match" : {
"query" : "FD41D359-1066-47C5-B930-C839F380FBDE",
"fields" : [ "softwareitem.productId" ]
}
}
}
I'm expecting 1 item to come back in this example and I'm getting 2. I've modified the result a little, but the key thing is the productId. You can see that the 2nd item returned does not have the product ID being searched for.
Can anyone give me an idea of where I should look next? Is there a fault with my query, or do you think the index might be corrupt in some way?
{
"took": 3,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 2,
"max_score": 27.424479,
"hits": [
{
"_index": "core_products",
"_type": "softwareitem",
"_id": "040EEEA1-4758-4F01-A55A-CAE710117C81",
"_score": 27.424479,
"_source": {
"id": "040EEEA1-4758-4F01-A55A-CAE710117C81",
"productId": "FD41D359-1066-47C5-B930-C839F380FBDE",
"softwareitem": {
"id": "040EEEA1-4758-4F01-A55A-CAE710117C81",
"title": "Code Library",
"description": "Blah Blah Blah",
"rmType": "Software",
"created": 1424445765000,
"updated": null
},
"searchable": true
}
},
{
"_index": "core_products",
"_type": "softwareitem",
"_id": "806B8F04-3E53-4278-BCC2-C2E1A17D2813",
"_score": 1.049637,
"_source": {
"id": "806B8F04-3E53-4278-BCC2-C2E1A17D2813",
"productId": "9FB80ABA-B09C-47C5-929A-9FB6C48BD5A8",
"softwareitem": {
"id": "806B8F04-3E53-4278-BCC2-C2E1A17D2813",
"title": "Video Game",
"description": "Blah Blah Blah",
"rmType": "Software",
"created": 1424445765000,
"updated": null
},
"searchable": true
}
}
]
}
}
It seems softwareitem.productId is a string field that is being analysed. For exact matching on a string field, use a not_analyzed string field in your mapping, something like:
"productId" : {
"type" : "string",
"index" : "not_analyzed"
}
Even if your field is already not_analyzed, you need one additional change.
At query time you don't need a multi_match / match query. These types of queries analyse your input string and build a more complex query out of it, which is why you are seeing a second, unexpected result: it contains 47C5, so the analyzer is probably tokenising the full string and building a query in which only one token needs to match. You should use term / terms queries instead.
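The token overlap can be reproduced with a small Python sketch using the two productId values from the question. The `analyze` function is a simplified stand-in for an analyzer (lowercase, split on anything that is not a letter or digit), and both search functions are hypothetical helpers for this illustration, not Elasticsearch APIs.

```python
import re

def analyze(text):
    """Simplified analyzer: lowercase, split on anything that is not a-z or 0-9."""
    return [token for token in re.split(r"[^0-9a-z]+", text.lower()) if token]

# Document _id -> productId, taken from the two hits in the question
docs = {
    "040EEEA1-4758-4F01-A55A-CAE710117C81": "FD41D359-1066-47C5-B930-C839F380FBDE",
    "806B8F04-3E53-4278-BCC2-C2E1A17D2813": "9FB80ABA-B09C-47C5-929A-9FB6C48BD5A8",
}

def match_any_token(query):
    """Match-style search: a document matches if it shares at least one token."""
    wanted = set(analyze(query))
    return [doc_id for doc_id, pid in docs.items() if wanted & set(analyze(pid))]

def exact_term(query):
    """Term-style search on an unanalysed field: the whole value must be equal."""
    return [doc_id for doc_id, pid in docs.items() if pid == query]

wanted_id = "FD41D359-1066-47C5-B930-C839F380FBDE"
print(match_any_token(wanted_id))  # both documents share the token 47c5
print(exact_term(wanted_id))       # only the first document
```

The shared token 47c5 is enough for the analysed match to return the second document, which also fits its much lower score in the response above.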
