Elasticsearch - icu_folding and utf-8 searching

Even after hours of trying to understand Elasticsearch, I cannot figure out how to get the same results when searching for text with special characters.
What am I doing wrong with icu_folding? How can I get the same results for "Škoda" and "Skoda"? Is it even possible?
https://github.com/pavoltravnik/examples/blob/master/elastic_search_settings.sh

You're applying the icu_folding token filter on the name.sort sub-field and not on the name field itself, so your queries need to be like this instead:
# 1 result as expected
curl -XGET 'localhost:9200/my_index/_search?pretty' -d'
{
"query": { "match": { "name.sort": "Škoda" } }
}'
# should also return 1 result, provided name.sort is analyzed with icu_folding
curl -XGET 'localhost:9200/my_index/_search?pretty' -d'
{
"query": { "match": { "name.sort": "Skoda" } }
}'
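To see why folding makes both spellings match, here is a rough Python approximation of what a folding filter does to a token. It is only a sketch: the real icu_folding filter applies ICU's own folding rules, whereas the hypothetical fold helper below just strips combining marks with unicodedata and case-folds the result.

```python
import unicodedata


def fold(token):
    """Approximate accent and case folding (NOT the real icu_folding rules)."""
    # Decompose characters so "Š" becomes "S" plus a combining caron.
    decomposed = unicodedata.normalize("NFKD", token)
    # Drop the combining marks and keep only the base characters.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


# Both spellings end up as the same token, so a query on a folded field
# can match either one.
print(fold("Škoda"), fold("Skoda"))  # prints: skoda skoda
```

The same folding has to happen at index time and at query time, which is why the query must target the sub-field that actually uses the filter.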

Related

Elastic search query not returning results

I have an Elastic Search query that is not returning data. Here are 2 examples of the query - the first one works and returns a few records but the second one returns nothing - what am I missing?
Example 1 works:
curl -X GET "localhost:9200/_search?pretty" -H 'Content-Type: application/json' -d'
{
"query": {
"match": {
"data.case.field1": "ABC123"
}
}
}
'
Example 2 not working:
curl -X GET "localhost:9200/_search?pretty" -H 'Content-Type: application/json' -d'
{
"query": {
"bool": {
"must": {
"term" : { "data.case.field1" : "ABC123" }
}
}
}
}
'
This happens because of the difference between match and term queries. A match query is analyzed: the search term goes through the same analyzer that was applied to the field at index time. A term query is not analyzed: it is meant for exact matching, so the search term does not go through the analysis process and is used as-is.
Official doc of term query
Returns documents that contain an exact term in a provided field.
Official doc of match query
Returns documents that match a provided text, number, date or boolean
value. The provided text is analyzed before matching.
If data.case.field1 is a text field without an explicit analyzer, then the default analyzer (standard) is applied, which lowercases the text and stores the resulting token.
For your text, the standard analyzer would produce the token below. Please refer to the Analyze API for more details.
{
"text" : "ABC123",
"analyzer" : "standard"
}
And the generated token:
{
"tokens": [
{
"token": "abc123",
"start_offset": 0,
"end_offset": 6,
"type": "<ALPHANUM>",
"position": 0
}
]
}
Now, when you use a term query, the search term is not analyzed and is used as it is. Since it is in capital letters (ABC123), it doesn't match the token in the index (abc123), hence no result is returned.
PS: refer to this SO answer of mine for more details on term and match queries.
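The difference can be sketched in a few lines of Python. This is a toy model, not Elasticsearch: standard_analyze is a hypothetical stand-in that only splits on non-alphanumeric characters and lowercases, and the "index" is just a set of tokens.

```python
import re


def standard_analyze(text):
    """Rough stand-in for the standard analyzer: split on non-alphanumerics, lowercase."""
    return [tok.lower() for tok in re.findall(r"[A-Za-z0-9]+", text)]


# Index time: a text field stores the analyzed tokens, here {"abc123"}.
indexed_tokens = set(standard_analyze("ABC123"))


def match_query(search_text):
    # match: the search text is analyzed before it is looked up.
    return any(tok in indexed_tokens for tok in standard_analyze(search_text))


def term_query(search_term):
    # term: the search term is looked up exactly as given.
    return search_term in indexed_tokens


print(match_query("ABC123"))  # True
print(term_query("ABC123"))   # False
print(term_query("abc123"))   # True
```

In this model a term query only succeeds when the search term already has the form the analyzer produced at index time.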
What is your mapping for data.case.field1? If it is of type text, you should use a match query instead of term.
See the warning at the top of this page: https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-term-query.html#query-dsl-term-query
Without knowing whether the field is mapped as text or keyword, we are answering somewhat in the dark. If the field is mapped as keyword, you can try a term filter like the following:
curl -X GET "localhost:9200/_search?pretty" -H 'Content-Type: application/json' -d'
{
"query": {
"bool": {
"filter": {
"term" : { "data.case.field1" : "ABC123" }
}
}
}
}
'

Delete Data from Elasticsearch Maxed out index

I have a single Elasticsearch index which has reached the maximum number of documents. I want to delete some data from this index. I have tried running a delete_by_query but it fails. Any suggestions on how I can keep the index and its data while deleting some old data from it?
This is the request I was running:
curl -X GET "localhost:9200/produdtion-index/_search?pretty" -H 'Content-Type: application/json' -d'
{
"query": {
"bool": {
"must": [
{ "range": { "submitDate": { "lte": "20200421" } } }
]
}
}
}
'
Try index lifecycle management: https://www.elastic.co/guide/en/elasticsearch/reference/current/index-lifecycle-management.html
You can split the index based on the number of documents and then delete them

ElasticsearchIllegalArgumentException No feature for name

I have an Elasticsearch node setup. When I query the index via curl command I get the expected output.
curl -XPOST 'http://localhost:9200/one/employee/_search?pretty=true' -d '{
"query": {
"term": {
"emp_id":"4318W01149"
}
}
}'
But when I run a similar query in the browser, I get this error:
http://localhost:9200/one/employee/?q=emp_id:4318W01149
{"error":"ElasticsearchIllegalArgumentException[No feature for name [employee]]","status":400}
I'm on ES version 1.5.2
Thanks
You forgot _search in http://localhost:9200/one/employee/?q=emp_id:4318W01149
It should be:
http://localhost:9200/one/employee/_search?q=emp_id:4318W01149
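As a small illustration, the corrected URL can be assembled programmatically. This is only a sketch: uri_search_url is a hypothetical helper, and the host, index and type are the ones from the question.

```python
from urllib.parse import urlencode, urlunsplit


def uri_search_url(host, index, doc_type, query):
    """Build a URI search URL; the _search endpoint is what the failing URL lacked."""
    path = f"/{index}/{doc_type}/_search"
    # urlencode percent-encodes the query string, so ":" becomes "%3A".
    return urlunsplit(("http", host, path, urlencode({"q": query}), ""))


print(uri_search_url("localhost:9200", "one", "employee", "emp_id:4318W01149"))
# prints: http://localhost:9200/one/employee/_search?q=emp_id%3A4318W01149
```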

Delete all documents from index/type without deleting type

I know one can delete all documents from a certain type via deleteByQuery.
Example:
curl -XDELETE 'http://localhost:9200/twitter/tweet/_query' -d '{
"query" : {
"term" : { "user" : "kimchy" }
}
}'
But I have NO term and simply want to delete all documents of that type, no matter what term. What is the best practice to achieve this? An empty term does not work.
Link to deleteByQuery
I believe if you combine the delete by query with a match all it should do what you are looking for, something like this (using your example):
curl -XDELETE 'http://localhost:9200/twitter/tweet/_query' -d '{
"query" : {
"match_all" : {}
}
}'
Or you could just delete the type:
curl -XDELETE http://localhost:9200/twitter/tweet
Note: deleting a type with DELETE like this is no longer supported in later versions of Elasticsearch (2.0 and up).
The Delete-By-Query plugin has been removed in favor of a new Delete By Query API implementation in core. Read here
curl -XPOST 'localhost:9200/twitter/tweet/_delete_by_query?conflicts=proceed&pretty' -H 'Content-Type: application/json' -d'
{
"query": {
"match_all": {}
}
}'
From Elasticsearch 5.x, the delete_by_query API is available by default:
POST: http://localhost:9200/index/type/_delete_by_query
{
"query": {
"match_all": {}
}
}
You can delete documents from type with following query:
POST /index/type/_delete_by_query
{
"query" : {
"match_all" : {}
}
}
I tested this query in Kibana and Elastic 5.5.2
Torsten Engelbrecht's comment in John Petrones answer expanded:
curl -XDELETE 'http://localhost:9200/twitter/tweet/_query' -d '{
"query":
{
"match_all": {}
}
}'
(I did not want to edit John's reply, since it got upvotes and is set as answer, and I might have introduced an error)
Starting from Elasticsearch 2.x, deleting a type is no longer allowed, since documents would remain in the index, causing index corruption.
Since Elasticsearch 5.x, the delete-by-query plugin has been removed in favor of the new Delete By Query API.
The curl option:
curl -X POST "localhost:9200/my-index/_delete_by_query" -H 'Content-Type: application/json' -d' { "query": { "match_all":{} } } '
Or in Kibana
POST /my-index/_delete_by_query
{
"query": {
"match_all":{}
}
}
The above answers no longer work with ES 6.2.2 because of Strict Content-Type Checking for Elasticsearch REST Requests. The curl command which I ended up using is this:
curl -H'Content-Type: application/json' -XPOST 'localhost:9200/yourindex/_doc/_delete_by_query?conflicts=proceed' -d' { "query": { "match_all": {} }}'
In Kibana Console:
POST calls-xin-test-2/_delete_by_query
{
"query": {
"match_all": {}
}
}
(Reputation not high enough to comment)
The second part of John Petrone's answer works - no query needed. It will delete the type and all documents contained in that type, but that can just be re-created whenever you index a new document to that type.
Just to clarify:
$ curl -XDELETE 'http://localhost:9200/twitter/tweet'
Note: this does delete the mapping! But as mentioned before, it can be easily re-mapped by creating a new document.
Note for ES2+
Starting with ES 1.5.3 the delete-by-query API is deprecated, and is completely removed since ES 2.0
Instead of the API, the Delete By Query is now a plugin.
In order to use the Delete By Query plugin you must install the plugin on all nodes of the cluster:
sudo bin/plugin install delete-by-query
All of the nodes must be restarted after the installation.
The usage of the plugin is the same as the old API. You don't need to change anything in your queries - this plugin will just make them work.
*For complete information regarding WHY the API was removed you can read more here.
You have these alternatives:
1) Delete a whole index:
curl -XDELETE 'http://localhost:9200/indexName'
example:
curl -XDELETE 'http://localhost:9200/mentorz'
For more details, see https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-delete-index.html
2) Delete by Query to those that match:
curl -XDELETE 'http://localhost:9200/mentorz/users/_query' -d '{
"query":
{
"match_all": {}
}
}'
*Here mentorz is an index name and users is a type
I'm using elasticsearch 7.5 and when I use
curl -XPOST 'localhost:9200/materials/_delete_by_query?conflicts=proceed&pretty' -d'
{
"query": {
"match_all": {}
}
}'
it throws the error below:
{
"error" : "Content-Type header [application/x-www-form-urlencoded] is not supported",
"status" : 406
}
I also needed to add an extra -H 'Content-Type: application/json' header to the request to make it work:
curl -XPOST 'localhost:9200/materials/_delete_by_query?conflicts=proceed&pretty' -H 'Content-Type: application/json' -d'
{
"query": {
"match_all": {}
}
}'
{
"took" : 465,
"timed_out" : false,
"total" : 2275,
"deleted" : 2275,
"batches" : 3,
"version_conflicts" : 0,
"noops" : 0,
"retries" : {
"bulk" : 0,
"search" : 0
},
"throttled_millis" : 0,
"requests_per_second" : -1.0,
"throttled_until_millis" : 0,
"failures" : [ ]
}
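The same header requirement applies to any HTTP client, not just curl. As a hedged sketch, here is how the request could be built with Python's standard library. The delete_all_request helper is hypothetical, the base URL is an assumption, and the request is only constructed here, not sent.

```python
import json
from urllib.request import Request


def delete_all_request(base_url, index):
    """Build (but do not send) a delete-by-query request matching every document."""
    body = json.dumps({"query": {"match_all": {}}}).encode("utf-8")
    return Request(
        url=f"{base_url}/{index}/_delete_by_query?conflicts=proceed",
        data=body,
        # Without this header the body would be sent as
        # application/x-www-form-urlencoded, which triggers the 406 above.
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = delete_all_request("http://localhost:9200", "materials")
print(req.get_method(), req.full_url)
print(req.get_header("Content-type"))  # urllib stores header names capitalized
```

Sending it against a live cluster would be a matter of passing req to urllib.request.urlopen.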
Just to add a couple of cents to this.
The "delete_by_query" mentioned at the top is still available as a plugin in Elasticsearch 2.x.
In the upcoming version 5.x, however, it will be replaced by the "delete by query API".
In Elasticsearch 2.3, the option
action.destructive_requires_name: true
in elasticsearch.yml does the trick:
curl -XDELETE http://localhost:9200/twitter/tweet
For future readers:
in Elasticsearch 7.x there's effectively one type per index - types are hidden
you can delete by query, but if you want to remove everything you'll be much better off removing and re-creating the index. That's because deletes are only soft deletes under the hood, until they trigger Lucene segment merges*, which can be expensive if the index is large. Meanwhile, removing an index is almost instant: remove some files on disk and a reference in the cluster state.
* The video/slides are about Solr, but things work exactly the same in Elasticsearch, this is Lucene-level functionality.
If you want to delete documents according to a date, you can use the Kibana console (v6.1.2):
POST index_name/_delete_by_query
{
"query" : {
"range" : {
"sendDate" : {
"lte" : "2018-03-06"
}
}
}
}

Elastic Search Hyphen issue with term filter

I have the following Elastic Search query with only a term filter. My query is much more complex but I am just trying to show the issue here.
{
"filter": {
"term": {
"field": "update-time"
}
}
}
When I pass a hyphenated value to the filter, I get zero results back. But if I try with an unhyphenated value, I get results back. I am not sure if the hyphen is the issue here, but my scenario makes me believe so.
Is there a way to escape the hyphen so the filter would return results? I have tried escaping the hyphen with a back slash which I read from the Lucene forums but that didn't help.
Also, if I pass in a GUID value into this field which is hyphenated and surrounded by curly braces, something like - {ASD23-34SD-DFE1-42FWW}, would I need to lower case the alphabet characters and would I need to escape the curly braces too?
Thanks
I would guess that your field is analyzed, which is the default setting for string fields in Elasticsearch. As a result, the value is not indexed as the single term "update-time" but as two terms: "update" and "time". That's why your term search cannot find it. If your field will always contain values that have to be matched completely as-is, it would be best to define the field as not analyzed in the mapping. You can do that by recreating the index with a new mapping:
curl -XPUT http://localhost:9200/your-index -d '{
"mappings" : {
"your-type" : {
"properties" : {
"field" : { "type": "string", "index" : "not_analyzed" }
}
}
}
}'
curl -XPUT http://localhost:9200/your-index/your-type/1 -d '{
"field" : "update-time"
}'
curl -XPOST http://localhost:9200/your-index/your-type/_search -d'{
"filter": {
"term": {
"field": "update-time"
}
}
}'
Alternatively, if you want some flexibility in finding records based on this field, you can keep this field analyzed and use text queries instead (the text query was renamed to match in later Elasticsearch versions):
curl -XPOST http://localhost:9200/your-index/your-type/_search -d'{
"query": {
"text": {
"field": "update-time"
}
}
}'
Please, keep in mind that if your field is analyzed then this record will be found by searching for just word "update" or word "time" as well.
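The effect of analysis on a hyphenated value can be illustrated with a toy model in Python. This is an approximation, not Elasticsearch itself: the hypothetical analyze function simply splits on non-alphanumeric characters and lowercases, which is roughly what the default analyzer does to this input.

```python
import re


def analyze(text):
    """Rough stand-in for the default analyzer: split on non-alphanumerics, lowercase."""
    return [tok.lower() for tok in re.findall(r"[A-Za-z0-9]+", text)]


value = "update-time"
analyzed_terms = set(analyze(value))  # what an analyzed field indexes
not_analyzed_terms = {value}          # what a not_analyzed field indexes

print(sorted(analyzed_terms))               # ['time', 'update']
print("update-time" in analyzed_terms)      # False: the term filter finds nothing
print("update-time" in not_analyzed_terms)  # True: exact match works
print("update" in analyzed_terms)           # True: partial words match when analyzed
```

No escaping of the hyphen helps in the analyzed case, because the hyphenated term never exists in the index in the first place.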
The accepted answer didn't work for me with elastic 6.1. I solved it using the "keyword" field that elastic provides by default on string fields.
{
"filter": {
"term": {
"field.keyword": "update-time"
}
}
}
Based on the answer by @imotov: if you're using spring-data-elasticsearch, then all you need to do is mark your field as:
@Field(type = FieldType.String, index = FieldIndex.not_analyzed)
instead of
@Field(type = FieldType.String)
The problem is you need to drop the index though and re-instantiate it with new mappings.
