Searching in all fields, case-insensitive, and not analyzed - Elasticsearch

In Elasticsearch, how can I define a dynamic default mapping for any field (the fields are not predefined) so that values are searchable with spaces and case-insensitive?
For example, if I have two documents:
PUT myindex/mytype/1
{
  "transaction": "test"
}
and
PUT myindex/mytype/2
{
  "transaction": "test SPACE"
}
I'd like to perform the following queries:
Querying: "test", Expected result: "test"
Querying: "test space", Expected result "test SPACE"
I've tried to use:
PUT myindex
{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "analyzer_keyword": {
            "tokenizer": "keyword",
            "filter": "lowercase"
          }
        }
      }
    }
  },
  "mappings": {
    "test": {
      "properties": {
        "title": {
          "analyzer": "analyzer_keyword",
          "type": "string"
        }
      }
    }
  }
}
But it gives me both documents as results when searching for "test".
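(Most likely the mapping above never applied: it targets a type "test" and a field "title", while my documents live in "mytype" with a field "transaction", so "transaction" falls back to the standard analyzer, which splits "test SPACE" into the tokens "test" and "space". The analyze API shows this; the request is a sketch assuming default settings:
GET myindex/_analyze?text=test%20SPACE
It returns "test" and "space", so a query for "test" matches both documents.)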

Apparently there was a mistake in how I ran my query. Here's a solution I found to this problem, when using a multi-field query:
#any field mapping - not analyzed and case insensitive
PUT /test_index
{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "analyzer_keyword": {
            "tokenizer": "keyword",
            "filter": ["lowercase"]
          }
        }
      }
    }
  },
  "mappings": {
    "doc": {
      "dynamic_templates": [
        {
          "notanalyzed": {
            "match_mapping_type": "string",
            "mapping": {
              "type": "string",
              "analyzer": "analyzer_keyword"
            }
          }
        }
      ]
    }
  }
}
#index test data
POST /test_index/doc/_bulk
{"index":{"_id":3}}
{"name":"Company Solutions", "a" : "a1"}
{"index":{"_id":4}}
{"name":"Company", "a" : "a2"}
#search for the document with name "company" and a "a2"
POST /test_index/doc/_search
{
  "query": {
    "filtered": {
      "filter": {
        "and": {
          "filters": [
            {
              "query": {
                "match": {
                  "name": "company"
                }
              }
            },
            {
              "query": {
                "match": {
                  "a": "a2"
                }
              }
            }
          ]
        }
      }
    }
  }
}
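To double-check that the keyword analyzer behaves as intended (one lowercased token per value), you can use the analyze API; this request assumes the test_index created above:
GET /test_index/_analyze?analyzer=analyzer_keyword&text=Company%20Solutions
It should return the single token "company solutions", which is why a match on "company" finds only document 4 and not "Company Solutions".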

Related

How to query for phrases (shingles) in Elasticsearch

I have the following string: "Word1 Word2 StopWord1 StopWord2 Word3 Word4".
When I query for this string using ["bool"]["must"]["match"], I would like to return all text that matches "Word1Word2" and/or "Word3Word4".
I have created an analyzer that I would like to use for indexing and searching.
Using the analyze API, I have confirmed that indexing is done correctly: the shingles returned are "Word1Word2" and "Word3Word4".
I want to query so that text matching "Word1Word2" and/or "Word3Word4" is returned. How can I do this dynamically? Meaning, I don't know up front how many shingles will be generated, so I don't know how many match_phrase clauses to code up in a query:
"should":[
{ "match_phrase" : {"content": phrases[0]}},
{ "match_phrase" : {"content": phrases[1]}}
]
To query for shingles (and unigrams), you could set up your mappings to handle them cleanly in separate fields. In the example below, the field "shingles" will be used to analyze and retrieve shingles, while the implicit "title" field will handle unigrams.
PUT /my_index
{
  "settings": {
    "number_of_shards": 1,
    "analysis": {
      "filter": {
        "my_shingle_filter": {
          "type": "shingle",
          "min_shingle_size": 2,
          "max_shingle_size": 2,
          "output_unigrams": false
        }
      },
      "analyzer": {
        "my_shingle_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "my_shingle_filter"
          ]
        }
      }
    }
  }
}
PUT /my_index/_mapping/my_type
{
  "my_type": {
    "properties": {
      "title": {
        "type": "string",
        "fields": {
          "shingles": {
            "type": "string",
            "analyzer": "my_shingle_analyzer"
          }
        }
      }
    }
  }
}
GET /my_index/my_type/_search
{
  "query": {
    "bool": {
      "must": {
        "match": {
          "title": "<your query string>"
        }
      },
      "should": {
        "match": {
          "title.shingles": "<your query string>"
        }
      }
    }
  }
}
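To confirm what the shingle field is matched against, you can run the analyzer directly (a sketch against the index above; substitute your own text):
GET /my_index/_analyze?analyzer=my_shingle_analyzer&text=Word1%20Word2%20Word3
With output_unigrams set to false, this should return only the bigrams "word1 word2" and "word2 word3".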
Ref. Elasticsearch: The Definitive Guide....

Custom analyzer, use case: zip-code [Elasticsearch]

Consider an index/type named customers/customer.
Each document in this set has a zip code as a property.
Basically, a zip code can look like:
String-String (ex: 8907-1009)
String String (ex: 211-20)
String (ex: 30200)
I'd like to set up my index analyzer so that as many documents as possible match. Currently, I do it like this:
PUT /customers/
{
  "mappings": {
    "customer": {
      "properties": {
        "zip-code": {
          "type": "string",
          "index": "not_analyzed"
        }
        ... some string properties ...
      }
    }
  }
}
When I search for a document, I use this request:
GET /customers/customer/_search
{
  "query": {
    "prefix": {
      "zip-code": "211-20"
    }
  }
}
That works if you search rigorously. But if, for instance, the zip code is "200 30", then searching with "200-30" will not return any results.
I'd like to configure my index analyzer so that I don't have this problem.
Can someone help me? Thanks.
P.S. If you want more information, please let me know ;)
As soon as you want to find variations you don't want to use not_analyzed.
Let's try this with a different mapping:
PUT zip
{
  "settings": {
    "number_of_shards": 1,
    "analysis": {
      "analyzer": {
        "zip_code": {
          "tokenizer": "standard",
          "filter": [ ]
        }
      }
    }
  },
  "mappings": {
    "_doc": {
      "properties": {
        "zip": {
          "type": "text",
          "analyzer": "zip_code"
        }
      }
    }
  }
}
We're using the standard tokenizer; strings will be broken up at whitespace and punctuation marks (including dashes) into tokens. You can see the actual tokens if you run the following query:
POST zip/_analyze
{
  "analyzer": "zip_code",
  "text": ["8907-1009", "211-20", "30200"]
}
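If I'm reading the standard tokenizer right, the response should contain the tokens "8907", "1009", "211", "20", and "30200": each dash-separated part becomes its own token.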
Add your examples:
POST zip/_doc
{
  "zip": "8907-1009"
}
POST zip/_doc
{
  "zip": "211-20"
}
POST zip/_doc
{
  "zip": "30200"
}
Now the query seems to work fine:
GET zip/_search
{
  "query": {
    "match": {
      "zip": "211-20"
    }
  }
}
This will also work if you just search for "211". However, it might be too lenient, since it will also find "20", "20-211", "211-10", ...
What you probably want is a phrase search, where all the tokens in your query need to be present in the field, in the right order:
GET zip/_search
{
  "query": {
    "match_phrase": {
      "zip": "211-20"
    }
  }
}
Addition:
If the ZIP codes have a hierarchical meaning (if you have "211-20", you want it to be found when searching for "211", but not when searching for "20"), you can use the path_hierarchy tokenizer.
So change the mapping to this:
PUT zip
{
  "settings": {
    "number_of_shards": 1,
    "analysis": {
      "analyzer": {
        "zip_code": {
          "tokenizer": "zip_tokenizer",
          "filter": [ ]
        }
      },
      "tokenizer": {
        "zip_tokenizer": {
          "type": "path_hierarchy",
          "delimiter": "-"
        }
      }
    }
  },
  "mappings": {
    "_doc": {
      "properties": {
        "zip": {
          "type": "text",
          "analyzer": "zip_code"
        }
      }
    }
  }
}
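You can verify what the hierarchical tokenizer emits with the same analyze request as before (assuming the index above):
POST zip/_analyze
{
  "analyzer": "zip_code",
  "text": ["211-20"]
}
path_hierarchy with delimiter "-" should produce the tokens "211" and "211-20": the prefixes of the code, but never the trailing "20" on its own.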
Using the same 3 documents from above, you can now use the match query:
GET zip/_search
{
  "query": {
    "match": {
      "zip": "1009"
    }
  }
}
"1009" won't find anything, but "8907" or "8907-1009" will.
If you also want "1009" to be found, but with a lower score, you'll have to analyze the zip code with both variations I have shown (combining the two versions of the mapping):
PUT zip
{
  "settings": {
    "number_of_shards": 1,
    "analysis": {
      "analyzer": {
        "zip_hierarchical": {
          "tokenizer": "zip_tokenizer",
          "filter": [ ]
        },
        "zip_standard": {
          "tokenizer": "standard",
          "filter": [ ]
        }
      },
      "tokenizer": {
        "zip_tokenizer": {
          "type": "path_hierarchy",
          "delimiter": "-"
        }
      }
    }
  },
  "mappings": {
    "_doc": {
      "properties": {
        "zip": {
          "type": "text",
          "analyzer": "zip_standard",
          "fields": {
            "hierarchical": {
              "type": "text",
              "analyzer": "zip_hierarchical"
            }
          }
        }
      }
    }
  }
}
Add a document with the inverse order to properly test it:
POST zip/_doc
{
  "zip": "1009-111"
}
Then search both fields, but boost the one with the hierarchical tokenizer by 3:
GET zip/_search
{
  "query": {
    "multi_match": {
      "query": "1009",
      "fields": [ "zip", "zip.hierarchical^3" ]
    }
  }
}
Then you can see that "1009-111" has a much higher score than "8907-1009".

How to escape a backslash in an Elasticsearch query

I have the following field in Elasticsearch:
"type": "doc\doc1"
My goal is to select a specific type. I tried:
GET /my_index/my_type/_search
{
  "query": {
    "bool": {
      "must": {
        "term": {
          "type": "doc\\doc1"
        }
      }
    }
  }
}
but it does not work. I also tried:
"type": "doc\\\\doc1"
"type": "\"doc\\\\doc1\""
"type": "\"doc\\doc1\""
but the query returns no results.
I tried with:
GET /my_index/my_type/_search
{
  "query": {
    "query_string": {
      "query": "doc\\doc1",
      "analyzer": "keyword"
    }
  }
}
But it's the same output.
Any help would be greatly appreciated. Thanks!
You need to escape the backslash in your field value (in JSON, \\ stands for a single literal backslash).
Then you can search for the exact value with a term query on the not-analyzed keyword sub-field, or instead use a match query with the analyzed value:
PUT test
{
  "settings": {
    "number_of_shards": 1
  },
  "mappings": {
    "type1": {
      "properties": {
        "field1": {
          "type": "text",
          "fields": {
            "raw": {
              "type": "keyword"
            }
          }
        }
      }
    }
  }
}
PUT test/type1/1
{
  "field1": "doc\\doc1"
}
POST test/_search?pretty
{
  "query": {
    "bool": {
      "must": {
        "term": {
          "field1.raw": "doc\\doc1"
        }
      }
    }
  }
}
POST test/_search?pretty
{
  "query": {
    "bool": {
      "must": {
        "match": {
          "field1": "doc\\doc1"
        }
      }
    }
  }
}
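If you prefer query_string, remember that the backslash is also a reserved character in the Lucene query syntax, so it needs one more level of escaping on top of the JSON escaping. A sketch against the same test index (the field choice is mine):
POST test/_search?pretty
{
  "query": {
    "query_string": {
      "query": "field1.raw:doc\\\\doc1"
    }
  }
}
Here "doc\\\\doc1" in JSON becomes doc\\doc1 for the Lucene parser, which unescapes it to the actual term doc\doc1. The term query above is still the more robust option.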

Wildcard on different tokens in Elasticsearch

I have a document that looks like this:
Name
Thomy tyson Olando Magua
Using ngrams, I was able to achieve the wildcard search, so that if I type in "omy tyson" it returns the above document, pretty much like this SQL query:
select name from table where name like '%omy tyson%'
PUT sample
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_ngram_analyzer": {
          "tokenizer": "my_ngram_tokenizer"
        }
      },
      "tokenizer": {
        "my_ngram_tokenizer": {
          "type": "nGram",
          "min_gram": "2",
          "max_gram": "15"
        }
      }
    }
  },
  "mappings": {
    "typename": {
      "properties": {
        "name": {
          "type": "string",
          "fields": {
            "search": {
              "type": "string",
              "analyzer": "my_ngram_analyzer"
            }
          }
        }
      }
    }
  }
}
PUT sample/typename/2
{
  "name": "Thomy tyson Olando Magua"
}
POST sample/typename/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "term": {
            "name.search": "omy tyson"
          }
        }
      ]
    }
  }
}
Is there a way in Elasticsearch to perform a wildcard search on two different words separated by other words, like:
select name from table where name like '%omy Magua%'
So in this case I would like to perform a partial search on the first and fourth words.
Any feedback would be helpful.
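One option (a sketch, not verified against your setup): since every gram is indexed as its own term and term queries are not analyzed, you can require several grams at once with a bool/must of term queries on name.search. Note that your ngram analyzer has no lowercase filter, so the grams keep their original case:
POST sample/typename/_search
{
  "query": {
    "bool": {
      "must": [
        { "term": { "name.search": "omy" } },
        { "term": { "name.search": "Magua" } }
      ]
    }
  }
}
This behaves roughly like name like '%omy%' and name like '%Magua%' combined; unlike '%omy Magua%', it does not enforce the order of the two fragments.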

Elasticsearch index search for currency $ and £ signs

In some of my documents I have $ or £ symbols. I want to search for £ and retrieve documents containing that symbol. I've gone through the documentation, but I'm getting some cognitive dissonance.
# Delete the `my_index` index
DELETE /my_index
# Create a custom analyzer
PUT /my_index
{
  "settings": {
    "analysis": {
      "char_filter": {
        "&_to_and": {
          "type": "mapping",
          "mappings": [
            "&=> and ",
            "$=> dollar "
          ]
        }
      },
      "analyzer": {
        "my_analyzer": {
          "type": "custom",
          "char_filter": [
            "html_strip",
            "&_to_and"
          ],
          "tokenizer": "standard",
          "filter": [
            "lowercase"
          ]
        }
      }
    }
  }
}
This returns "the", "quick", "and", "brown", "fox" just as the documentation states:
# Test out the new analyzer
GET /my_index/_analyze?analyzer=my_analyzer&text=The%20quick%20%26%20brown%20fox
This returns "the", "quick", "dollar", "brown", "fox"
GET /my_index/_analyze?analyzer=my_analyzer&text=The%20quick%20%24%20brown%20fox
Adding some records:
PUT /my_index/test/1
{
  "title": "The quick & fast fox"
}
PUT /my_index/test/2
{
  "title": "The daft fox owes me $100"
}
I would have thought that if I searched for "dollar", I would get a result. Instead I get no results:
GET /my_index/test/_search
{
  "query": {
    "simple_query_string": {
      "query": "dollar"
    }
  }
}
Or even using '$' with an analyzer:
GET /my_index/test/_search
{
  "query": {
    "query_string": {
      "query": "dollar100",
      "analyzer": "my_analyzer"
    }
  }
}
Your problem is that you define a custom analyzer but never use it. You can verify this with term vectors. So follow these steps:
When creating the index, set the custom analyzer for the `title` field:
PUT /my_index
{
  "settings": {
    "analysis": {
      "char_filter": {
        "&_to_and": {
          "type": "mapping",
          "mappings": [
            "&=> and ",
            "$=> dollar "
          ]
        }
      },
      "analyzer": {
        "my_analyzer": {
          "type": "custom",
          "char_filter": [
            "html_strip",
            "&_to_and"
          ],
          "tokenizer": "standard",
          "filter": [
            "lowercase"
          ]
        }
      }
    }
  },
  "mappings": {
    "test": {
      "properties": {
        "title": {
          "type": "string",
          "analyzer": "my_analyzer"
        }
      }
    }
  }
}
Insert data:
PUT my_index/test/1
{
  "title": "The daft fox owes me $100"
}
Check for term vectors:
GET /my_index/test/1/_termvectors?fields=title
Response:
{
  "_index": "my_index",
  "_type": "test",
  "_id": "1",
  "_version": 1,
  "found": true,
  "took": 3,
  "term_vectors": {
    "title": {
      "field_statistics": {
        "sum_doc_freq": 6,
        "doc_count": 1,
        "sum_ttf": 6
      },
      "terms": {
        "daft": {
          "term_freq": 1,
          "tokens": [ { "position": 1, "start_offset": 4, "end_offset": 8 } ]
        },
        "dollar100": { <-- You can see it here
          "term_freq": 1,
          "tokens": [ { "position": 5, "start_offset": 21, "end_offset": 25 } ]
        },
        "fox": {
          "term_freq": 1,
          "tokens": [ { "position": 2, "start_offset": 9, "end_offset": 12 } ]
        },
        "me": {
          "term_freq": 1,
          "tokens": [ { "position": 4, "start_offset": 18, "end_offset": 20 } ]
        },
        "owes": {
          "term_freq": 1,
          "tokens": [ { "position": 3, "start_offset": 13, "end_offset": 17 } ]
        },
        "the": {
          "term_freq": 1,
          "tokens": [ { "position": 0, "start_offset": 0, "end_offset": 3 } ]
        }
      }
    }
  }
}
Now search:
GET /my_index/test/_search
{
  "query": {
    "match": {
      "title": "dollar100"
    }
  }
}
That will find the match. But searching with simple_query_string, as in:
GET /my_index/test/_search
{
  "query": {
    "simple_query_string": {
      "query": "dollar100"
    }
  }
}
won't find anything, because it searches the special _all field, and as far as I can see, _all aggregates the field values without your custom analyzer:
GET /my_index/test/_search
{
  "query": {
    "match": {
      "_all": "dollar100"
    }
  }
}
does not find a result. But:
GET /my_index/test/_search
{
  "query": {
    "match": {
      "_all": "$100"
    }
  }
}
finds it. I am not sure, but the reason may be that the _all field uses the default analyzer rather than your custom one. To set a custom analyzer as the default, check:
Changing the default analyzer in ElasticSearch or LogStash
http://elasticsearch-users.115913.n3.nabble.com/How-we-can-change-Elasticsearch-default-analyzer-td4040411.html
http://grokbase.com/t/gg/elasticsearch/148kwsxzee/overriding-built-in-analyzer-and-set-it-as-default
http://elasticsearch-users.115913.n3.nabble.com/How-to-set-the-default-analyzer-td3935275.html
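For instance, on the Elasticsearch versions used here you could apply the custom analyzer to the _all field directly in the mapping (a sketch; merge it into the index creation above rather than running it standalone):
PUT /my_index
{
  "mappings": {
    "test": {
      "_all": { "analyzer": "my_analyzer" },
      "properties": {
        "title": { "type": "string", "analyzer": "my_analyzer" }
      }
    }
  }
}
With _all analyzed the same way as title, the simple_query_string search for "dollar100" should also find the document.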
