Update "keyword" to "text" field type of an index for inexact words matching in elasticsearch - elasticsearch

{
  "myindex": {
    "mappings": {
      "properties": {
        "city": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        }
      }
    }
  }
}
I tried to update the mapping with the PUT request below on the index, but I still get the above output from _mapping:
{
  "_doc": {
    "properties": {
      "city": { "type": "text" }
    }
  }
}
I am not able to query with inexact words because its type is "keyword". For the query below, the actual value in the record is "Mumbai":
{
  "query": {
    "bool": {
      "must": {
        "match": {
          "city": {
            "query": "Mumbi",
            "minimum_should_match": "10%"
          }
        }
      }
    }
  }
}

The mapping below (what is shared in the question) will store city as text and city.keyword as a keyword.
{
  "myindex": {
    "mappings": {
      "properties": {
        "city": {
          "type": "text", // ==========> stores city as text
          "fields": {
            "keyword": {
              "type": "keyword", // =========> stores city.keyword as a keyword
              "ignore_above": 256
            }
          }
        }
      }
    }
  }
}
Yours is a use case for fuzzy search, not minimum_should_match.
Elastic's article on fuzzy search: https://www.elastic.co/blog/found-fuzzy-search
Try the query below:
{
  "query": {
    "match": {
      "city": {
        "query": "mubai",
        "fuzziness": "AUTO"
      }
    }
  }
}

minimum_should_match
The minimum number of clauses that must match for a document to be returned.
It refers to the percentage of clauses, not the percentage of the string. Go through the documentation to frame a query that returns the expected results; a query built on the wrong assumption will return misleading results.
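To illustrate the difference (a hypothetical sketch, not from the question): a match query on a multi-word string is rewritten into one clause per analyzed term, and minimum_should_match counts those clauses. With three terms and "67%", at least two of the three words must match:

```json
{
  "query": {
    "match": {
      "city": {
        "query": "Mumbai Pune Delhi",
        "minimum_should_match": "67%"
      }
    }
  }
}
```

With a single-word query like "Mumbi" there is only one clause, so any percentage still requires that one clause to match exactly, which is why "10%" does not help with misspellings.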

Related

How to boost documents matching one of the query_string

Elasticsearch newbie here. I'm trying to look up documents that have foo in their name, but I want to prioritize the ones that also have bar, i.e. those with bar should be at the top of the list. The result doesn't have the ones with bar at the top. boost doesn't seem to have any effect here; likely I'm not understanding how boost works. Appreciate any help.
query: {
  bool: {
    should: [
      {
        query_string: {
          query: `name:foo*bar*`,
          boost: 5
        }
      },
      {
        query_string: {
          query: `name:*foo*`
        }
      }
    ]
  }
}
Sample document structure:
{
  "name": "foos, one two three",
  "type": "car",
  "age": 10
}
{
  "name": "foos, one two bar three",
  "type": "train",
  "age": 30
}
Index mapping
{
  "detail": {
    "mappings": {
      "properties": {
        "category": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        },
        "name": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        },
        "servings": {
          "properties": {
            "name": {
              "type": "text",
              "fields": {
                "keyword": {
                  "type": "keyword",
                  "ignore_above": 256
                }
              }
            }
          }
        }
      }
    }
  }
}
Try switching the order for the query like so:
query: {
  bool: {
    should: [
      {
        query_string: {
          query: `name:*foo*`
        }
      },
      {
        query_string: {
          query: `name:foo*bar*`,
          boost: 5
        }
      }
    ]
  }
}
It should work, but if not you might need to do a nested search.
Search against the keyword field.
If you run only the first part of the query ("query": "name:foo*bar*"), you will see that it returns nothing. It searches against the generated tokens rather than the whole string.
The text "foos, one two bar three" generates tokens like ["foos","one","two","bar","three"], and the query looks for "foo*bar*" inside individual tokens, hence no result. Keyword fields are stored as-is, so there the search happens against the entire text.
{
  "query": {
    "bool": {
      "should": [
        {
          "query_string": {
            "query": "name.keyword:foo*bar*",
            "boost": 5
          }
        },
        {
          "query_string": {
            "query": "name.keyword:*foo*"
          }
        }
      ]
    }
  }
}
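You can verify how a string is tokenized with the _analyze API. With the default standard analyzer, a request like this returns the tokens foos, one, two, bar, three:

```json
GET _analyze
{
  "analyzer": "standard",
  "text": "foos, one two bar three"
}
```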
Wildcard queries take a huge amount of memory and don't scale well, so it is better to avoid them. If foo and bar appear at the start of words, you can use prefix queries:
{
  "query": {
    "bool": {
      "should": [
        {
          "prefix": {
            "name": "foo"
          }
        },
        {
          "prefix": {
            "name": "bar"
          }
        }
      ]
    }
  }
}
You can also explore ngrams.
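As a sketch of the ngram route (the index, tokenizer, and analyzer names here are made up, and depending on your Elasticsearch version the mapping may need a type level), you can define an ngram analyzer at index-creation time so that substrings of each token are indexed and plain match queries find partial words:

```json
PUT my_ngram_index
{
  "settings": {
    "analysis": {
      "tokenizer": {
        "my_ngram_tokenizer": {
          "type": "ngram",
          "min_gram": 3,
          "max_gram": 4
        }
      },
      "analyzer": {
        "my_ngram_analyzer": {
          "type": "custom",
          "tokenizer": "my_ngram_tokenizer",
          "filter": ["lowercase"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "name": {
        "type": "text",
        "analyzer": "my_ngram_analyzer"
      }
    }
  }
}
```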

Update field in a document based on the condition in Kibana/Elasticsearch

I am trying to update a particular field in a document based on some condition. In SQL terms, I want to do the following:
UPDATE indexname
SET name = 'XXXXXX'
WHERE source = 'file' AND name = 'YYYYYY'
I am using the below to update all the documents, but I am not able to add any condition:
POST indexname/_update_by_query
{
  "query": {
    "term": {
      "name": "XXXXX"
    }
  }
}
Here is the template I am using:
{
  "indexname": {
    "mappings": {
      "idxname123": {
        "_all": {
          "enabled": false
        },
        "properties": {
          "name": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          },
          "date1": {
            "type": "date",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          },
          "source": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          }
        }
      }
    }
  }
}
Could someone guide me on how to add a condition on source and name, as described above?
Thanks,
Babu
You can make use of the query below to do what you are looking for. I'm assuming name and source are fields in your index.
POST <your_index_name>/_update_by_query
{
  "script": {
    "inline": "ctx._source.name = 'XXXXX'",
    "lang": "painless"
  },
  "query": {
    "bool": {
      "must": [
        {
          "term": {
            "name": {
              "value": "YYYYY"
            }
          }
        },
        {
          "term": {
            "source": {
              "value": "file"
            }
          }
        }
      ]
    }
  }
}
You can probably make use of any of the Full Text Queries or Term Queries inside the Bool Query for searching, updating, or deleting.
Do spend some time going through them.
Note: Make use of Term Queries only if your field's datatype is keyword.
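For example, since the template above maps name as text with a keyword subfield, an exact-value term query would normally target that subfield (a sketch against the mapping shown, not a tested query):

```json
{
  "query": {
    "term": {
      "name.keyword": {
        "value": "YYYYY"
      }
    }
  }
}
```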
Hope this helps!

Elasticsearch discard documents that contain superset of query

Let's say I have 3 documents:
{ "cities": "Paris Zurich Milan" }
{ "cities": "Paris Zurich" }
{ "cities": "Zurich"}
cities is just text, I'm not using any custom analyzer.
I want to query for documents whose cities contains both Paris and Zurich, in this order, and no other city. So I want to get only the second document.
This is what I'm trying so far:
{
  "query": {
    "match_phrase": {
      "cities": "Paris Zurich"
    }
  }
}
But this also returns the first document.
What should I do instead?
If you do not care about case sensitivity, just use a term query:
{
  "query": {
    "term": {
      "cities.keyword": "Paris Zurich"
    }
  }
}
It will only match the exact value of the field.
On the other hand, you can create a custom analyzer that still stores the exact value of the field (just like keyword), with one exception: the stored value is converted to lowercase, so you will be able to find Paris Zurich as well as paris Zurich. Here is an example:
{
  "settings": {
    "analysis": {
      "analyzer": {
        "lowercase_analyzer": {
          "type": "custom",
          "tokenizer": "keyword",
          "char_filter": [],
          "filter": ["lowercase"]
        }
      }
    }
  },
  "mappings": {
    "doc": {
      "properties": {
        "cities": {
          "type": "text",
          "fields": {
            "lowercased": {
              "type": "text",
              "analyzer": "lowercase_analyzer"
            }
          }
        }
      }
    }
  }
}
{
  "query": {
    "term": {
      "cities.lowercased": "paris zurich" // the query string should also be lowercase
    }
  }
}

How to search a elasticsearch index by partial text of a field in the indexed document?

I have an Elasticsearch index where I keep certain data. Each document in the index has a field named file_name in a nested document. So a doc looks like:
{
  ...
  "file_data": {
    "file_name": "sample_filename_acp24_20180223_1222.json"
  }
  ...
}
I want my search to return the above document if I search for sample, filename, acp24, 20180223, and the like.
So far I have tried the following analyzers and full-text search queries, but it still doesn't return the above doc if I search for acp24 or 20180223.
Index Mapping
{
  "index_name": {
    "mappings": {
      "type": {
        "properties": {
          "file_data": {
            "type": "nested",
            "properties": {
              "file_name": {
                "type": "text",
                "analyzer": "keyword_analyzer",
                "fields": {
                  "keyword": {
                    "type": "keyword",
                    "ignore_above": 256
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}
Analyzer
{
  "analysis": {
    "analyzer": {
      "keyword_analyzer": {
        "type": "pattern",
        "pattern": "\\W|_",
        "lowercase": true
      }
    }
  }
}
Search Query
{
  "query": {
    "match_phrase_prefix": {
      "_all": {
        "query": "20180223",
        "analyzer": "keyword_analyzer"
      }
    }
  }
}
Any help on how to achieve this is very much appreciated. I have spent so many hours on this and still couldn't find a solution.
If I understand right, you could use a wildcard query:
POST /my_index/_search
{
  "query": {
    "wildcard": {
      "file_data.file_name": {
        "wildcard": "sample_*filename_acp24*",
        "boost": 2.0
      }
    }
  }
}
(tested with elasticsearch 6.1, might need to change the syntax for other versions)
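Note that file_data is mapped as nested in the question, so depending on your version and mapping you may need to wrap the query in a nested query (a sketch, assuming the mapping above; the index name is made up):

```json
POST /my_index/_search
{
  "query": {
    "nested": {
      "path": "file_data",
      "query": {
        "wildcard": {
          "file_data.file_name": "*acp24*"
        }
      }
    }
  }
}
```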

Aggregating over _field_names in elasticsearch 5

I'm trying to aggregate over field names in ES 5 as described in Elasticsearch aggregation on distinct keys, but the solution described there no longer works.
My goal is to get the keys across all the documents. Mapping is the default one.
Data:
PUT products/product/1
{
  "param": {
    "field1": "data",
    "field2": "data2"
  }
}
Query:
GET _search
{
  "aggs": {
    "params": {
      "terms": {
        "field": "_field_names",
        "include": "param.*",
        "size": 0
      }
    }
  }
}
I get the following error: Fielddata is not supported on field [_field_names] of type [_field_names]
After looking around, it seems the only way in ES >= 5.x to get the unique field names is through the mappings endpoint. Since you cannot aggregate on _field_names, you may need to slightly change your data format, because the mapping endpoint returns every field regardless of nesting.
My personal problem was getting unique keys for various child/parent documents.
I found that if you prefix your field names in the format prefix.field, the mapping endpoint will automatically nest the information for you.
PUT products/product/1
{
  "param.field1": "data",
  "param.field2": "data2",
  "other.field3": "data3"
}
GET products/product/_mapping
{
  "products": {
    "mappings": {
      "product": {
        "properties": {
          "other": {
            "properties": {
              "field3": {
                "type": "text",
                "fields": {
                  "keyword": {
                    "type": "keyword",
                    "ignore_above": 256
                  }
                }
              }
            }
          },
          "param": {
            "properties": {
              "field1": {
                "type": "text",
                "fields": {
                  "keyword": {
                    "type": "keyword",
                    "ignore_above": 256
                  }
                }
              },
              "field2": {
                "type": "text",
                "fields": {
                  "keyword": {
                    "type": "keyword",
                    "ignore_above": 256
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}
Then you can grab the unique fields based on the prefix.
This is probably because setting "size": 0 is no longer allowed in ES 5. You now have to set a specific size:
POST _search
{
  "aggs": {
    "params": {
      "terms": {
        "field": "_field_names",
        "include": "param.*",
        "size": 100 <--- change this
      }
    }
  }
}
