How to search an Elasticsearch index by partial text of a field in the indexed document?

I have an Elasticsearch index where I keep certain data. Each document in the index has a field named file_name in a nested document. So a doc looks like:
{
  ...
  "file_data": {
    "file_name": "sample_filename_acp24_20180223_1222.json"
  }
  ...
}
I want my search to return the above document if I search for sample, filename, acp24, 20180223, and so on.
So far I have tried the following analyzer and full-text search queries, but the above document is still not returned when I search for acp24 or 20180223.
Index Mapping
{
  "index_name": {
    "mappings": {
      "type": {
        "properties": {
          "file_data": {
            "type": "nested",
            "properties": {
              "file_name": {
                "type": "text",
                "analyzer": "keyword_analyzer",
                "fields": {
                  "keyword": {
                    "type": "keyword",
                    "ignore_above": 256
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}
Analyzer
{
  "analysis": {
    "analyzer": {
      "keyword_analyzer": {
        "type": "pattern",
        "pattern": "\\W|_",
        "lowercase": true
      }
    }
  }
}
Search Query
{
  "query": {
    "match_phrase_prefix": {
      "_all": {
        "query": "20180223",
        "analyzer": "keyword_analyzer"
      }
    }
  }
}
Any help on how to achieve this is very much appreciated. I have spent many hours on this and still haven't found a solution.

If I understand correctly, you could use a wildcard query (note that search requests go to the _search endpoint):
POST /my_index/_search
{
  "query": {
    "wildcard": {
      "file_data.file_name": {
        "wildcard": "sample_*filename_acp24*",
        "boost": 2.0
      }
    }
  }
}
(Tested with Elasticsearch 6.1; the syntax might need to change for other versions.)
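One caveat, since file_data is declared as nested in the question's mapping: queries that reach into file_data.file_name generally have to be wrapped in a nested query. A sketch of what that could look like with a plain match query (the index name and search term here are just placeholders):
GET /index_name/_search
{
  "query": {
    "nested": {
      "path": "file_data",
      "query": {
        "match": {
          "file_data.file_name": "acp24"
        }
      }
    }
  }
}
Because the question's pattern analyzer splits the file name on underscores and non-word characters, a match query should find the individual fragments (sample, filename, acp24, 20180223) without needing wildcards.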

Related

Update "keyword" to "text" field type of an index for inexact words matching in elasticsearch

{
  "myindex": {
    "mappings": {
      "properties": {
        "city": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        }
      }
    }
  }
}
I tried to update the mapping by using the PUT request below on the index, but I am still getting the above output from _mapping:
{
  "_doc": {
    "properties": {
      "city": { "type": "text" }
    }
  }
}
I am not able to query with inexact words because the field type is "keyword". For the query below, the actual value in the record is "Mumbai":
{
  "query": {
    "bool": {
      "must": {
        "match": {
          "city": {
            "query": "Mumbi",
            "minimum_should_match": "10%"
          }
        }
      }
    }
  }
}
The mapping below (the one shared in the question) will store city as text and city.keyword as a keyword.
{
  "myindex": {
    "mappings": {
      "properties": {
        "city": {
          "type": "text", // ==========> stores city as text
          "fields": {
            "keyword": {
              "type": "keyword", // =========> stores city.keyword as a keyword
              "ignore_above": 256
            }
          }
        }
      }
    }
  }
}
Yours is a use case for fuzzy search, not for minimum_should_match.
ES docs on fuzzy search: https://www.elastic.co/blog/found-fuzzy-search
Try the query below:
{
  "query": {
    "match": {
      "city": {
        "query": "mubai",
        "fuzziness": "AUTO"
      }
    }
  }
}
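With "fuzziness": "AUTO" the allowed edit distance depends on term length: terms of one or two characters must match exactly, terms of three to five characters allow one edit, and longer terms allow two. "mubai" is a single insertion away from "mumbai", so it matches.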
minimum_should_match
Minimum number of clauses that must match for a document to be returned.
It refers to the number (or percentage) of query clauses, not to a percentage of the string's characters. Go through the documentation to frame the query so it returns the expected results.
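To make the clause counting concrete, here is a sketch (the query string is only an illustration): a match query analyzes "navi mumbai city" into three term clauses, so "minimum_should_match": 2 requires at least two of those terms to be present in the field.
GET myindex/_search
{
  "query": {
    "match": {
      "city": {
        "query": "navi mumbai city",
        "minimum_should_match": 2
      }
    }
  }
}
A single-term query like "Mumbi" produces only one clause, so it either matches the indexed term or it doesn't; no minimum_should_match value can make "Mumbi" match "Mumbai".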

How to create and add values to a standard lowercase analyzer in Elasticsearch

I've been around the houses with this for the past few days, trying things in various orders, but I can't figure out why it's not working.
I am trying to create an index in Elasticsearch with an analyzer which is the same as the "standard" analyzer but retains upper case characters when records are stored.
I create my analyzer and index as follows:
PUT /upper
{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "rebuilt_standard": {
            "tokenizer": "standard",
            "filter": [
              "standard"
            ]
          }
        }
      }
    }
  },
  "mappings": {
    "doc": {
      "properties": {
        "title": {
          "type": "text",
          "analyzer": "rebuilt_standard"
        }
      }
    }
  }
}
Then I add two records to test, like this...
POST /upper/doc
{
  "text": "TEST"
}
Add a second record...
POST /upper/doc
{
  "text": "test"
}
Using /upper/_settings gives the following:
{
  "upper": {
    "settings": {
      "index": {
        "number_of_shards": "5",
        "provided_name": "upper",
        "creation_date": "1537788581060",
        "analysis": {
          "analyzer": {
            "rebuilt_standard": {
              "filter": [
                "standard"
              ],
              "tokenizer": "standard"
            }
          }
        },
        "number_of_replicas": "1",
        "uuid": "s4oDgdsFTxOwsdRuPAWEkg",
        "version": {
          "created": "6030299"
        }
      }
    }
  }
}
But when I search with the following query I still get two matches! Both the upper- and lower-case records match, which must mean the analyzer is not applied when I store the records.
Search like so...
GET /upper/_search
{
  "query": {
    "term": {
      "text": {
        "value": "test"
      }
    }
  }
}
Thanks in advance!
First things first: you set your analyzer on the title field instead of on the text field (your search is on the text property, and you are indexing documents that only have a text property).
"properties": {
"title": {
"type": "text",
"analyzer": "rebuilt_standard"
}
}
try
"properties": {
"text": {
"type": "text",
"analyzer": "rebuilt_standard"
}
}
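Once the mapping targets the right field, you can check that the analyzer really preserves case with the _analyze API (a quick verification, not part of the original answer):
GET /upper/_analyze
{
  "analyzer": "rebuilt_standard",
  "text": "TEST"
}
This should return the single token TEST with its case intact, because the rebuilt analyzer omits the lowercase filter that the built-in standard analyzer applies.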
and keep us posted ;)

Custom analyzer, use case: zip code [Elasticsearch]

Let there be an index/type named customers/customer.
Each document in this set has a zip code as a property.
Basically, a zip code can look like:
String-String (e.g. 8907-1009)
String String (e.g. 211-20)
String (e.g. 30200)
I'd like to set up my index analyzer to match as many documents as possible. Currently I work like this:
PUT /customers/
{
  "mappings": {
    "customer": {
      "properties": {
        "zip-code": {
          "type": "string",
          "index": "not_analyzed"
        },
        some string properties ...
      }
    }
  }
}
When I search for a document I use this request:
GET /customers/customer/_search
{
  "query": {
    "prefix": {
      "zip-code": "211-20"
    }
  }
}
That works if you search rigorously. But if, for instance, the zip code is "200 30", then searching for "200-30" will not give any results.
I'd like to configure my index analyzer so that I don't have this problem.
Can someone help me?
Thanks.
P.S. If you want more information, please let me know ;)
As soon as you want to find variations, you don't want to use not_analyzed.
Let's try this with a different mapping:
PUT zip
{
  "settings": {
    "number_of_shards": 1,
    "analysis": {
      "analyzer": {
        "zip_code": {
          "tokenizer": "standard",
          "filter": [ ]
        }
      }
    }
  },
  "mappings": {
    "_doc": {
      "properties": {
        "zip": {
          "type": "text",
          "analyzer": "zip_code"
        }
      }
    }
  }
}
We're using the standard tokenizer; strings will be broken up into tokens at whitespace and punctuation marks (including dashes). You can see the actual tokens if you run the following query:
POST zip/_analyze
{
  "analyzer": "zip_code",
  "text": ["8907-1009", "211-20", "30200"]
}
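For these three inputs the standard tokenizer should emit the tokens 8907, 1009, 211, 20, and 30200 (the full response also includes offsets and positions, omitted here).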
Add your examples:
POST zip/_doc
{
  "zip": "8907-1009"
}
POST zip/_doc
{
  "zip": "211-20"
}
POST zip/_doc
{
  "zip": "30200"
}
Now the query seems to work fine:
GET zip/_search
{
  "query": {
    "match": {
      "zip": "211-20"
    }
  }
}
This will also work if you just search for "211". However, this might be too lenient, since it will also find "20", "20-211", "211-10",...
What you probably want is a phrase search where all the tokens in your query need to be in the field and also in the right order:
GET zip/_search
{
  "query": {
    "match_phrase": {
      "zip": "211-20"
    }
  }
}
Addition:
If the ZIP codes have a hierarchical meaning (if you have "211-20" you want this to be found when searching for "211", but not when searching for "20"), you can use the path_hierarchy tokenizer.
So changing the mapping to this:
PUT zip
{
  "settings": {
    "number_of_shards": 1,
    "analysis": {
      "analyzer": {
        "zip_code": {
          "tokenizer": "zip_tokenizer",
          "filter": [ ]
        }
      },
      "tokenizer": {
        "zip_tokenizer": {
          "type": "path_hierarchy",
          "delimiter": "-"
        }
      }
    }
  },
  "mappings": {
    "_doc": {
      "properties": {
        "zip": {
          "type": "text",
          "analyzer": "zip_code"
        }
      }
    }
  }
}
Using the same 3 documents from above, you can now use the match query:
GET zip/_search
{
  "query": {
    "match": {
      "zip": "1009"
    }
  }
}
"1009" won't find anything, but "8907" or "8907-1009" will.
If you want to also find "1009", but with a lower score, you'll have to analyze the zip code with both variations I have shown (combine the 2 versions of the mapping):
PUT zip
{
  "settings": {
    "number_of_shards": 1,
    "analysis": {
      "analyzer": {
        "zip_hierarchical": {
          "tokenizer": "zip_tokenizer",
          "filter": [ ]
        },
        "zip_standard": {
          "tokenizer": "standard",
          "filter": [ ]
        }
      },
      "tokenizer": {
        "zip_tokenizer": {
          "type": "path_hierarchy",
          "delimiter": "-"
        }
      }
    }
  },
  "mappings": {
    "_doc": {
      "properties": {
        "zip": {
          "type": "text",
          "analyzer": "zip_standard",
          "fields": {
            "hierarchical": {
              "type": "text",
              "analyzer": "zip_hierarchical"
            }
          }
        }
      }
    }
  }
}
Add a document with the inverse order to properly test it:
POST zip/_doc
{
  "zip": "1009-111"
}
Then search both fields, but boost the one with the hierarchical tokenizer by 3:
GET zip/_search
{
  "query": {
    "multi_match": {
      "query": "1009",
      "fields": [ "zip", "zip.hierarchical^3" ]
    }
  }
}
Then you can see that "1009-111" has a much higher score than "8907-1009".

How can I find the locations that are within a certain range of my input location?

I have a mapping like below:
{
  "my_locations": {
    "aliases": {},
    "mappings": {
      "_doc": {
        "properties": {
          "location": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          },
I know that if the field type of location were "geo_point" then I could use the following geo-distance query:
GET /my_locations/_search
{
  "query": {
    "bool": {
      "must": {
        "match_all": {}
      },
      "filter": {
        "geo_distance": {
          "distance": "200km",
          "location": {
            "lat": 40,
            "lon": -70
          }
        }
      }
    }
  }
}
I read (in the Elasticsearch documentation and on Stack Overflow) that I cannot change the field type of location from text to geo_point, and I already have a lot of data. So how can I find the locations that are within a certain range of my input location?
First, you need to create a new index with the correct data type:
PUT my_locations_2
{
  "mappings": {
    "_doc": {
      "properties": {
        "location": {
          "type": "geo_point"
        }
      }
    }
  }
}
Then you can use the reindex API in order to copy the data from the old index to the new one:
POST _reindex
{
  "source": {
    "index": "my_locations"
  },
  "dest": {
    "index": "my_locations_2"
  }
}
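Assuming the existing location values are in a format geo_point can parse (for example a "lat,lon" string or an object with lat and lon; otherwise the reindex will report mapping failures), the geo-distance query from the question should then work against the new index:
GET /my_locations_2/_search
{
  "query": {
    "bool": {
      "must": {
        "match_all": {}
      },
      "filter": {
        "geo_distance": {
          "distance": "200km",
          "location": {
            "lat": 40,
            "lon": -70
          }
        }
      }
    }
  }
}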

Is there a way to make Elasticsearch case-insensitive without altering the existing documents?

Does Elasticsearch allow us to query documents case-insensitively? Or should I save them in a case-insensitive form before querying? Or is there some setting that I should set for the whole index to make it case-insensitive?
Can you clarify this, please?
By default, the fields are case-insensitive because of the mapping Elasticsearch applies.
Try the following:
PUT myindex/doc/1
{
  "name": "TEST"
}
GET myindex/_mapping
It should return:
{
  "myindex": {
    "mappings": {
      "doc": {
        "properties": {
          "name": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          }
        }
      }
    }
  }
}
Now if you query with the request below, it will return a match (notice the mapping: text plus a keyword sub-field):
POST myindex/_search
{
  "query": {
    "match": {
      "name": "test"
    }
  }
}
Now, if you explicitly map a field as keyword, the search on it will be case-sensitive. Try the below and see; it will not return any results:
PUT myindex/_mapping/doc
{
  "properties": {
    "name2": {
      "type": "keyword"
    }
  }
}
PUT myindex/doc/1
{
  "name2": "TEST"
}
POST myindex/_search
{
  "query": {
    "match": {
      "name2": "test"
    }
  }
}
TL;DR: use the default mapping or the text type; if you map the field as keyword only, searches against it will be case-sensitive.
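If you do need keyword semantics (exact values, aggregations, sorting) but want case-insensitive matching, one option not shown above is a lowercase normalizer on the keyword field. A sketch, assuming a fresh index (existing field mappings cannot be changed in place, so this requires reindexing):
PUT myindex2
{
  "settings": {
    "analysis": {
      "normalizer": {
        "lowercase_normalizer": {
          "type": "custom",
          "filter": [ "lowercase" ]
        }
      }
    }
  },
  "mappings": {
    "doc": {
      "properties": {
        "name2": {
          "type": "keyword",
          "normalizer": "lowercase_normalizer"
        }
      }
    }
  }
}
With this mapping both "TEST" and "test" normalize to the same term at index and query time, so lookups on name2 match regardless of case.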
