Cannot get Elasticsearch Highlight to work

I am working on a project that involves Elasticsearch. So far I can get most functions to work except highlighting. I am using Laravel and the official Elasticsearch PHP client.
Previously I thought it was a problem with my PHP code, and asked a question here:
highlight field missing from Elasticsearch results, PHP
Later, when I tried with elasticsearch-head in the browser, I still could not see a highlight field in the results, so I guess there must be something wrong with either my Elasticsearch settings or the way I indexed the documents.
Here is the query I entered into elasticsearch-head:
{
  "query": {
    "match": {
      "combined": "DNA"
    }
  },
  "highlight": {
    "fields": {
      "combined": {}
    }
  }
}
And I don't see "highlight" after "_source" in the hits returned by Elasticsearch.
What might I have done wrong here?
Please advise,
Thanks.
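For reference, when highlighting works, each hit in the response carries a "highlight" object next to "_source", roughly like this (the snippet content is illustrative; by default matches are wrapped in <em> tags):
{
  "_source": { "combined": "... DNA ..." },
  "highlight": {
    "combined": ["... <em>DNA</em> ..."]
  }
}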
Update: I'm running Elasticsearch 2.3.3 on Ubuntu 16.04 LTS desktop, with JDK 1.8.
The documentation says "store" needs to be set to true in the mapping. I did so and re-indexed a bunch of documents, but this didn't fix the problem.

OK, after stopping and restarting the elasticsearch service, my code started to work as intended: I got the "highlight" field in the results.
The issue is that I needed to set "store" to true. Everything else being equal, including the following line
"store" => true
in the mapping ensured that "highlight" appeared in my results, and removing it made the field disappear again.
Not sure why doing this earlier didn't solve my problem.
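For reference, a minimal 2.x-style mapping sketch with storing enabled; the index and type names (articles, doc) are placeholders, not the ones from the actual project:
PUT articles
{
  "mappings": {
    "doc": {
      "properties": {
        "combined": {
          "type": "string",
          "store": true
        }
      }
    }
  }
}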

ElasticSearch: check how analyzers/tokenizers/filters applied to an index split text into tokens?

I'm quite new to Elasticsearch, so if I'm overlooking something obvious or basic, please forgive me.
I'm now using Elasticsearch at work, and I want to see how the complex analyzer/tokenizer/filter settings, which were set up by my predecessors, split text into tokens.
I did some research and found a way to do it:
GET /_analyze
{
  "tokenizer": "whitespace",
  "filter": ["lowercase", {"type": "stop", "stopwords": ["a", "is", "this"]}],
  "text": "this is a test"
}
However, as I said, the analyzer/tokenizer/filter settings are so complicated that writing out the details every time I test them would horribly slow me down.
So I want to analyze a text with the analyzer/tokenizer/filter settings that are already applied to an index. Is there a way to do that?
I would appreciate it if anyone could shed some light on this.
You don't have to supply the complete analyzer definition to the _analyze API every time; you can simply call the _analyze API on the index, like the following:
GET <your-index-name>/_analyze
{
  "analyzer": "standard",
  "text": "Quick Brown Foxes!"
}
So instead of using the _analyze API at the cluster level, you use it at the index level, where the analyzer definitions are already present; you just need to provide the analyzer name, not its full definition (tokenizer, filters, etc.), to get the tokens produced by that analyzer.
Refer to the official Elasticsearch documentation on using it on a specific index or on a specific field, with examples.
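For example, to test whatever analyzer is mapped to a particular field, you can pass "field" instead of "analyzer" (the field name title here is just a placeholder):
GET <your-index-name>/_analyze
{
  "field": "title",
  "text": "Quick Brown Foxes!"
}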
Hope this helps.

Elasticsearch 7: mapping types

I came across the following phrase, and I am under the impression that a valid 6.x query with a type might give an error. I am using an ES 7.10 cluster.
Note that in 7.0, _doc is a permanent part of the path, and represents
the endpoint name rather than the document type.
But, to my surprise, I am able to run the following query. Does it mean _doc is NOT a permanent part of the path? Specifically, what kinds of queries do I need to modify when moving from 6.x to 7.x?
PUT ecommercesite/product/1
{
  "product_name": "Men High Performance Fleece Jacket",
  "description": "Best Value. All season fleece jacket",
  "unit_price": 79.99,
  "reviews": 250,
  "release_date": "2016-08-16"
}
Only the following 6.x query fails on 7.10, with an error with respect to the type:
GET ecommercesite/product/_mapping
The PUT request currently (end of 2020) just throws a deprecation warning, but it will fail in 8.x.
For now, you could start replacing product with _doc:
PUT ecommercesite/product/1 --> PUT ecommercesite/_doc/1
GET ecommercesite/product/_mapping --> GET ecommercesite/_doc/_mapping?include_type_name
but it'd be best to ditch the types completely and adhere to the standards:
important: instead of PUT ecommercesite/product/1, either keep using PUT ecommercesite/_doc/1 or use PUT /ecommercesite/_create/1
GET ecommercesite/_mapping
no significant changes in GET ecommercesite/_search
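Put together, the typeless equivalents of the requests above would be:
PUT ecommercesite/_doc/1
{
  "product_name": "Men High Performance Fleece Jacket",
  "description": "Best Value. All season fleece jacket",
  "unit_price": 79.99,
  "reviews": 250,
  "release_date": "2016-08-16"
}

GET ecommercesite/_mapping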

how to copy id field during indexing (elasticsearch)

It's often useful to have the _id as a part of the document. In fact it's advised here: https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-id-field.html
But if you do not know the _id prior to document creation, how would you duplicate the _id during indexing? The only way I can think of doing it is using a pipeline but is there a simpler way?
Edit: according to the answer below, even a pipeline cannot achieve this.
Ingest pipelines (current version 7.9.2) cannot access the _id if it is auto-generated. There is a note in the documentation saying:
If you automatically generate document IDs, you cannot use the {{_id}} value in an ingest processor. Elasticsearch assigns auto-generated _id values after ingest.
The copy_to feature also doesn't work for _id when it is auto-generated. This information is a little bit hidden here: https://github.com/elastic/elasticsearch/issues/6730#issuecomment-103142553
Queries with script_fields using doc['_id'].value are deprecated too.
It seems to me that this is what many of us are looking for, for different reasons, but there is no solution that I am aware of.
The case is obviously completely different for self-generated document IDs.
In case someone is still looking for a solution to this issue:
You can do a reindex with a script and use the context object to grab the _id and copy it into the id field of the document source (matching the ID property in your POCO):
POST /_reindex?wait_for_completion=false
{
  "source": {
    "index": "data.dataitems",
    "query": {
      "match_all": {}
    }
  },
  "dest": {
    "index": "data.dataitems_new_index_with_id"
  },
  "script": {
    "source": "ctx._source.id = ctx._id"
  }
}
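Once the reindex completes, every document in data.dataitems_new_index_with_id should carry a copy of its _id in the id field; a quick sanity check (index name taken from the example above):
GET data.dataitems_new_index_with_id/_search
{
  "size": 1
}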

Elasticsearch Dynamic Field Mapping and JSON Dot Notation

I'm trying to write logs to an Elasticsearch index from a Kubernetes cluster. Fluent-bit is used to read stdout, and it enriches the logs with metadata, including pod labels. A simplified example log object is:
{
  "log": "This is a log message.",
  "kubernetes": {
    "labels": {
      "app": "application-1"
    }
  }
}
The problem is that a few other applications deployed to the cluster have labels of the following format:
{
  "log": "This is another log message.",
  "kubernetes": {
    "labels": {
      "app.kubernetes.io/name": "application-2"
    }
  }
}
These applications are installed via Helm charts, and the newer ones follow the label and selector conventions laid out here. The naming convention for labels and selectors was updated in Dec 2018, as seen here, and not all charts have been updated to reflect this.
The end result is that, depending on which label format makes it into an Elastic index first, trying to send in the other type will throw a mapping exception. If I create a new empty index and send in the namespaced label first, attempting to log the simple app label throws this exception:
object mapping for [kubernetes.labels.app] tried to parse field [kubernetes.labels.app] as object, but found a concrete value
The opposite situation, posting the namespaced label second, results in this exception:
Could not dynamically add mapping for field [kubernetes.labels.app.kubernetes.io/name]. Existing mapping for [kubernetes.labels.app] must be of type object but found [text].
What I suspect is happening is that Elasticsearch sees the periods in the field name as JSON dot notation and tries to flesh it out as an object. I was able to find this PR from 2015, which explicitly disallows periods in field names; however, it seems to have been reversed in 2016 by this PR. There is also a multi-year thread from 2015-2017 discussing this issue, but I was unable to find anything recent involving the latest versions.
My current plan for moving forward is to standardize the Helm charts we are using so that all of the labels follow the same convention. This seems like a band-aid on the underlying issue, though; I feel like I'm missing something obvious in the configuration of Elasticsearch and its dynamic field mappings.
Any help here would be appreciated.
I opted to use the Logstash mutate filter with the rename option as described here:
https://www.elastic.co/guide/en/logstash/current/plugins-filters-mutate.html#plugins-filters-mutate-rename
The end result looked something like this:
filter {
  mutate {
    # rename the plain label fields to their namespaced equivalents
    rename => {
      '[kubernetes][labels][app]' => '[kubernetes][labels][app.kubernetes.io/name]'
      '[kubernetes][labels][chart]' => '[kubernetes][labels][helm.sh/chart]'
    }
  }
}
Although I've personally never encountered this exact issue, I had similar problems when I indexed some test data and afterwards changed the structure of the documents being indexed (especially when "unflattening" data structures).
Your interpretation of the error message is correct. When you first index the document
{
  "log": "This is another log message.",
  "kubernetes": {
    "labels": {
      "app.kubernetes.io/name": "application-2"
    }
  }
}
Elasticsearch will recognize app as an object/structure due to dynamic mapping.
When you then try to index the document
{
  "log": "This is a log message.",
  "kubernetes": {
    "labels": {
      "app": "application-1"
    }
  }
}
the previously (dynamically) created mapping defines the field app as an object with sub-fields, but Elasticsearch encounters a concrete value, namely "application-1".
I suggest that you set up an index template to define the correct mappings. For the 'outdated' logging versions, I suggest pre-processing the particular documents, either through an Elasticsearch ingest pipeline or with e.g. Logstash, to get them into the correct format.
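As a rough sketch of such a template (the template name, the index pattern, and the choice of the flattened field type are assumptions on my part; this needs Elasticsearch 7.8+ for the composable template syntax and 7.3+ for flattened): mapping kubernetes.labels as flattened keeps dotted label names like app.kubernetes.io/name from being expanded into nested objects, so both label styles can coexist.
PUT _index_template/k8s-logs
{
  "index_patterns": ["fluent-bit-*"],
  "template": {
    "mappings": {
      "properties": {
        "kubernetes": {
          "properties": {
            "labels": {
              "type": "flattened"
            }
          }
        }
      }
    }
  }
}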
Hope that helps.

Fuzzy search by default in Kibana

I'm trying to do fuzzy searches in Kibana through its UI (ideally by default). I know how to make such a request in the DEV TOOLS section; the problem is having that option by default. Is that possible? I'd also like to save all the requests that I enter (by default).
Please find below the search I am trying to incorporate to get the results.
GET /_search
{
  "query": {
    "fuzzy": { "NOM": "COUT" }
  }
}
PS: I know that there is a Lucene syntax for sophisticated requests.
Thanks a lot for your help!
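For reference, with the Lucene query syntax selected in Kibana's search bar, a fuzzy match equivalent to the query above can be written with the ~ operator (a bare ~ defaults to an edit distance of 2; ~1 restricts it to 1):
NOM:COUT~
NOM:COUT~1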
