Elasticsearch: map case-insensitive search to not_analyzed documents

I have a type with the following mapping:
PUT /testindex
{
  "mappings" : {
    "products" : {
      "properties" : {
        "category_name" : {
          "type" : "string",
          "index" : "not_analyzed"
        }
      }
    }
  }
}
I wanted to search for an exact word; that's why I set this field to not_analyzed.
But the problem is that I want the search to be case-insensitive (lower or upper case).
I searched around and found a way to make a field case-insensitive:
curl -XPOST localhost:9200/testindex -d '{
  "mappings" : {
    "products" : {
      "properties" : {
        "category_name" : { "type": "string", "index": "analyzed", "analyzer": "lowercase_keyword" }
      }
    }
  }
}'
Is there any way to apply both of these mappings to the same field?
Thanks.

I think this example meets your needs:
$ curl -XPUT localhost:9200/testindex/ -d '
{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "analyzer_keyword": {
            "tokenizer": "keyword",
            "filter": "lowercase"
          }
        }
      }
    }
  },
  "mappings": {
    "test": {
      "properties": {
        "title": {
          "analyzer": "analyzer_keyword",
          "type": "string"
        }
      }
    }
  }
}'
Taken from here: How to setup a tokenizer in elasticsearch
It uses both the keyword tokenizer and the lowercase filter on a string field, which I believe does what you want.
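For example (a sketch against the index above; the document ID and values are made up), indexing a mixed-case title and querying it in lower case should now match, because both sides are reduced to the same single lowercase token:
$ curl -XPUT localhost:9200/testindex/test/1 -d '{ "title": "Machine Learning" }'
$ curl -XPOST localhost:9200/testindex/test/_search -d '
{
  "query": {
    "match": { "title": "machine learning" }
  }
}'
Note that a query for just "machine" would not match, since the keyword tokenizer keeps the whole value as one token.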

If you want case-insensitive queries ONLY, consider normalizing both your data AND your queries to one case (all lower or all upper) before you go about doing your business.
That way you keep the field not_analyzed and simply index and query in a single case.
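A minimal sketch of that approach, assuming the not_analyzed mapping from the question and a client that lowercases values on both sides:
# Index with a value the client has already lowercased
curl -XPUT localhost:9200/testindex/products/1 -d '{ "category_name": "electronics" }'
# Query with a term the client has also lowercased
curl -XPOST localhost:9200/testindex/products/_search -d '
{
  "query": {
    "term": { "category_name": "electronics" }
  }
}'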

I believe this Gist answers your question best:
* https://gist.github.com/mtyaka/2006966
You can map a field to several sub-fields, and we do this all the time: one version not_analyzed and another analyzed. We typically name the not_analyzed version .raw.
Like John P. wrote, you can set up the analyzer at runtime, or you can set one up in the config at server start, as in the link above:
# Register the custom 'lowercase_keyword' analyzer. It doesn't do anything else
# other than changing everything to lower case.
index.analysis.analyzer.lowercase_keyword.type: custom
index.analysis.analyzer.lowercase_keyword.tokenizer: keyword
index.analysis.analyzer.lowercase_keyword.filter: [lowercase]
Then you define your mapping for your field(s) with both the not_analyzed version and the analyzed one:
# Map the 'tags' property to two fields: one that isn't analyzed,
# and one that is analyzed with the 'lowercase_keyword' analyzer.
curl -XPUT 'http://localhost:9200/myindex/images/_mapping' -d '{
  "images": {
    "properties": {
      "tags": {
        "type": "multi_field",
        "fields": {
          "tags": {
            "index": "not_analyzed",
            "type": "string"
          },
          "lowercased": {
            "index": "analyzed",
            "analyzer": "lowercase_keyword",
            "type": "string"
          }
        }
      }
    }
  }
}'
And finally your query (note the lowercased value used when building the query, to help find a match):
# Issue queries against the index. The search query must be manually lowercased.
curl -XPOST 'http://localhost:9200/myindex/images/_search?pretty=true' -d '{
  "query": {
    "terms": {
      "tags.lowercased": [
        "event:battle at the boardwalk"
      ]
    }
  },
  "facets": {
    "tags": {
      "terms": {
        "field": "tags",
        "size": "500",
        "regex": "^team:.*"
      }
    }
  }
}'

Just create your custom analyzer with the keyword tokenizer and the lowercase token filter.

For this scenario, I suggest combining the lowercase filter and the keyword tokenizer into a custom analyzer, and lowercasing your search keywords.
1. Create the index with an analyzer combining the lowercase filter and the keyword tokenizer:
curl -XPUT localhost:9200/test/ -d '
{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "your_custom_analyzer": {
            "tokenizer": "keyword",
            "filter": ["lowercase"]
          }
        }
      }
    }
  }
}'
2. Put the mapping and set the field's properties to use the analyzer:
curl -XPUT localhost:9200/test/_mapping/twitter -d '
{
  "twitter": {
    "properties": {
      "content": { "type": "string", "analyzer": "your_custom_analyzer" }
    }
  }
}'
3. Then you can search for what you want with a wildcard query:
curl -XPOST localhost:9200/test/twitter/_search -d '{
  "query": {
    "wildcard": { "content": "*the words you want to search*" }
  }
}'
Another way to search a field in different ways is to use the multi_field type. You could map the field as multi_field:
curl -XPUT localhost:9200/test/_mapping/twitter -d '
{
  "properties": {
    "content": {
      "type": "multi_field",
      "fields": {
        "default": { "type": "string" },
        "search": { "type": "string", "analyzer": "your_custom_analyzer" }
      }
    }
  }
}'
Then you can index data with the above mapping and search it in two ways (content.default and content.search).
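For example (a sketch; with the mapping above, the two variants are addressed as content.default and content.search):
# Standard-analyzed sub-field
curl -XPOST localhost:9200/test/twitter/_search -d '
{
  "query": { "match": { "content.default": "Some Words" } }
}'
# Whole-value, case-insensitive sub-field; the match query analyzes the
# input with your_custom_analyzer, so it is lowercased automatically
curl -XPOST localhost:9200/test/twitter/_search -d '
{
  "query": { "match": { "content.search": "Some Words" } }
}'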

We can achieve case-insensitive searching on not_analyzed strings using Elasticsearch scripting.
Example Query Using Inline Scripting:
{
  "query" : {
    "bool" : {
      "must" : [{
        "query_string" : {
          "query" : "\"apache\"",
          "default_field" : "COLLECTOR_NAME"
        }
      }, {
        "script" : {
          "script" : "if(doc['verb'].value != null) {doc['verb'].value.equalsIgnoreCase(\"geT\")}"
        }
      }]
    }
  }
}
You need to enable scripting in the elasticsearch.yml file. Using scripts in search queries can reduce your overall search performance; if you want scripts to perform better, make them "native" using a Java plugin.
Example Plugin Code:
import java.util.Map;

import org.elasticsearch.common.Nullable;
import org.elasticsearch.index.fielddata.ScriptDocValues;
import org.elasticsearch.plugins.Plugin;
import org.elasticsearch.script.AbstractSearchScript;
import org.elasticsearch.script.ExecutableScript;
import org.elasticsearch.script.NativeScriptFactory;
import org.elasticsearch.script.ScriptModule;

public class MyNativeScriptPlugin extends Plugin {
    @Override
    public String name() {
        return "Indexer scripting Plugin";
    }

    @Override
    public String description() {
        return "Native script for case-insensitive comparison of doc values";
    }

    // Registers the native script under the name "my_script".
    public void onModule(ScriptModule scriptModule) {
        scriptModule.registerScript("my_script", MyNativeScriptFactory.class);
    }

    public static class MyNativeScriptFactory implements NativeScriptFactory {
        @Override
        public ExecutableScript newScript(@Nullable Map<String, Object> params) {
            return new MyNativeScript(params);
        }

        @Override
        public boolean needsScores() {
            return false;
        }
    }

    public static class MyNativeScript extends AbstractSearchScript {
        Map<String, Object> params;

        MyNativeScript(Map<String, Object> params) {
            this.params = params;
        }

        // Compares the doc value named by params["key"] against
        // params["value"], ignoring case.
        @Override
        public Object run() {
            ScriptDocValues<?> docValue = (ScriptDocValues<?>) doc().get(params.get("key"));
            if (docValue instanceof ScriptDocValues.Strings) {
                return ((String) params.get("value"))
                        .equalsIgnoreCase(((ScriptDocValues.Strings) docValue).getValue());
            }
            return false;
        }
    }
}
Example Query Using Native Script:
{
  "query" : {
    "bool" : {
      "must" : [{
        "query_string" : {
          "query" : "\"apache\"",
          "default_field" : "COLLECTOR_NAME"
        }
      }, {
        "script" : {
          "script" : "my_script",
          "lang" : "native",
          "params" : {
            "key" : "verb",
            "value" : "GET"
          }
        }
      }]
    }
  }
}

It is quite simple: just create the mapping as follows.
{
  "mappings" : {
    "products" : {
      "properties" : {
        "category_name" : {
          "type" : "string"
        }
      }
    }
  }
}
There is no need to set the index option if you want case-insensitive matching, because the default analyzer is the standard analyzer, which lowercases terms and therefore takes care of case insensitivity.
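For example (a sketch; note that the standard analyzer also splits the value into tokens, so this gives case-insensitive matching on individual words rather than on the exact whole value):
curl -XPUT localhost:9200/testindex/products/1 -d '{ "category_name": "Home Appliances" }'
curl -XPOST localhost:9200/testindex/products/_search -d '
{
  "query": {
    "match": { "category_name": "HOME appliances" }
  }
}'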

I wish I could add a comment, but I can't, so: the answer to this question is that it is not possible.
Analyzers are composed of a single tokenizer and zero or more token filters.
I wish I could tell you something else, but after spending four hours researching this, that's the answer. I'm in the same situation. You can't skip tokenization; it's either all on or all off.

Related

Can only use wildcard queries on keyword, text and wildcard fields - not on [id] which is of type [long]

Elasticsearch version 7.13.1
GET test/_mapping
{
  "test" : {
    "mappings" : {
      "properties" : {
        "id" : {
          "type" : "long"
        },
        "name" : {
          "type" : "text"
        }
      }
    }
  }
}
POST test/_doc/101
{
  "id": 101,
  "name": "hello"
}

POST test/_doc/102
{
  "id": 102,
  "name": "hi"
}
Wildcard Search pattern
GET test/_search
{
  "query": {
    "query_string": {
      "query": "*101* *hello*",
      "default_operator": "AND",
      "fields": [
        "id",
        "name"
      ]
    }
  }
}
The error is: "reason" : "Can only use wildcard queries on keyword, text and wildcard fields - not on [id] which is of type [long]"
This was working fine in version 7.6.0.
What changed in the latest ES, and what is the resolution for this issue?
It's not directly possible to run wildcard queries against numeric data types; it is better to store those integers as strings. You need to modify your index mapping along these lines:
PUT /my-index
{
  "mappings": {
    "properties": {
      "code": {
        "type": "text"
      }
    }
  }
}
Otherwise, if you want to perform a partial search, you can use an edge n-gram tokenizer.
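A sketch of how this plays out, assuming the documents are reindexed so the numeric value is also stored as a string in the code field of the new index:
POST my-index/_doc/101
{
  "code": "101"
}
GET my-index/_search
{
  "query": {
    "query_string": {
      "query": "*101*",
      "fields": ["code"]
    }
  }
}
Since code is a text field, the wildcard query is now legal and matches the indexed token 101.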

Only getting results when elasticsearch is case sensitive

I currently have a problem with my multi-match search in ES.
It's as simple as this: if I search for the city "Sachsen", I get results; if I search for "sachsen" (lowercase), I get no results.
How do I avoid this?
QUERY with no results
{
  "match" : {
    "City" : {
      "query" : "sachsen"
    }
  }
}
My analyzer is analyzer_keyword. Should I add anything?
MAPPING
"City": {
  "type": "string",
  "analyzer": "analyzer_keyword"
}
Your analyzer_keyword analyzer is most probably of type keyword, which means you can only perform exact matches on it.
It's standard practice to map multiple "variants" of a field: one that matches lowercase, possibly ASCII-folded tokens (think München -> munchen), and one that is not tokenized in any way (which is what you have in your analyzer_keyword).
Since you intend to search the lowercase version of Sachsen, your mapping could look something like:
PUT sachsen
{
  "mappings": {
    "properties": {
      "City": {
        "type": "keyword",           <----
        "fields": {
          "standard": {
            "type": "text",
            "analyzer": "standard"   <----
          }
        }
      }
    }
  }
}
After indexing a doc
POST sachsen/_doc
{
  "City": "Sachsen"
}
The following will work for exact matches:
GET sachsen/_search
{
  "query": {
    "match": {
      "City": "Sachsen"
    }
  }
}
and this for lowercase
GET sachsen/_search
{
  "query": {
    "match": {
      "City.standard": "sachsen"
    }
  }
}
Note that I'm using the default, standard analyzer here but you can choose any one you deem appropriate.

ElasticSearch: preserve_position_increments not working

According to the docs
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-suggesters-completion.html
preserve_position_increments=false is supposed to make consecutive keywords in a string searchable. But for me it's not working. Is this a bug? Steps to reproduce in Kibana:
PUT /example-index/
{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 0
  },
  "mappings": {
    "_doc": {
      "properties": {
        "example-suggest-field": {
          "type": "completion",
          "analyzer": "stop",
          "preserve_position_increments": false,
          "max_input_length": 50
        }
      }
    }
  }
}
PUT /example-index/_doc/1
{
  "example-suggest-field": [
    {
      "input": "Nevermind Nirvana",
      "weight": 10
    }
  ]
}
POST /example-index/_search
{
  "suggest": {
    "bib-suggest": {
      "prefix": "nir",
      "completion": {
        "field": "example-suggest-field"
      }
    }
  }
}

POST /example-index/_search
{
  "suggest": {
    "bib-suggest": {
      "prefix": "nev",
      "completion": {
        "field": "example-suggest-field"
      }
    }
  }
}
If it is, I will file a bug report.
It's not a bug. preserve_position_increments is only useful when you are removing stopwords and would like to search for the token coming after the stopword (i.e. search for "Beat" and find "The Beatles").
In your case, you should probably index ["Nevermind", "Nirvana"] instead, i.e. an array of tokens.
If you index "The Nirvana" instead, you'll find it by searching for "nir".
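For instance (a sketch of the array-of-tokens approach against the example index above), each array element becomes its own completion input, so both the "nev" and "nir" prefixes return the document:
PUT /example-index/_doc/2
{
  "example-suggest-field": [
    {
      "input": ["Nevermind", "Nirvana"],
      "weight": 10
    }
  ]
}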

ElasticSearch filtering for a tag in array

I've got a bunch of events that are tagged for their audience:
{ id = 123, audiences = ["Public", "Lecture"], ... }
I'm trying to do an Elasticsearch query with filtering, so that the search will only return events that have an exact entry of "Public" in that audiences array (and won't return events that are "Not Public").
How do I do that?
This is what I have so far, but it returns zero results, even though I definitely have "Public" events:
curl -XGET 'http://localhost:9200/events/event/_search' -d '
{
  "query" : {
    "filtered" : {
      "filter" : {
        "term" : {
          "audiences": "Public"
        }
      },
      "query" : {
        "match" : {
          "title" : "[searchterm]"
        }
      }
    }
  }
}'
You could use this mapping for your content type:
{
  "your_index": {
    "mappings": {
      "your_type": {
        "properties": {
          "audiences": {
            "type": "string",
            "index": "not_analyzed"
          }
        }
      }
    }
  }
}
not_analyzed: "Index this field, so it is searchable, but index the value exactly as specified. Do not analyze it."
If you instead keep the default analyzed mapping, use a lowercase term value in the search query (the standard analyzer indexes "Public" as the token "public", which is why the capitalized term filter found nothing).
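For example (a sketch reusing the question's index and type names): once audiences is mapped not_analyzed and the events are reindexed, an exact term filter on the stored value matches:
curl -XGET 'http://localhost:9200/events/event/_search' -d '
{
  "query" : {
    "filtered" : {
      "filter" : {
        "term" : { "audiences" : "Public" }
      },
      "query" : { "match_all" : {} }
    }
  }
}'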

Indexing a comma-separated value field in Elastic Search

I'm using Nutch to crawl a site and index it into Elasticsearch. My site has meta tags, some of them containing a comma-separated list of IDs (which I intend to use for search). For example:
contentTypeIds="2,5,15". (note: no square brackets).
When ES indexes this, I can't search for contentTypeIds:5 and find documents whose contentTypeIds contain 5; this query returns only the documents whose contentTypeIds is exactly "5". However, I do want to find documents whose contentTypeIds contain 5.
In Solr, this is solved by setting the contentTypeIds field to multiValued="true" in the schema.xml. I can't find how to do something similar in ES.
I'm new to ES, so I probably missed something. Thanks for your help!
Create a custom analyzer that splits the indexed text into tokens on commas.
Then you can search it. If you don't care about relevance, you can use a filter to search through your documents; my example shows how to attempt the search with a term filter.
Below you can find how to do this with the Sense plugin.
DELETE testindex

PUT testindex
{
  "index" : {
    "analysis" : {
      "tokenizer" : {
        "comma" : {
          "type" : "pattern",
          "pattern" : ","
        }
      },
      "analyzer" : {
        "comma" : {
          "type" : "custom",
          "tokenizer" : "comma"
        }
      }
    }
  }
}

PUT /testindex/_mapping/yourtype
{
  "properties" : {
    "contentType" : {
      "type" : "string",
      "analyzer" : "comma"
    }
  }
}

PUT /testindex/yourtype/1
{
  "contentType" : "1,2,3"
}

PUT /testindex/yourtype/2
{
  "contentType" : "3,4"
}

PUT /testindex/yourtype/3
{
  "contentType" : "1,6"
}

GET /testindex/_search
{
  "query": { "match_all": {} }
}

GET /testindex/_search
{
  "filter": {
    "term": {
      "contentType": "6"
    }
  }
}
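To verify that the analyzer splits on commas as intended, the old-style _analyze API can be used (a quick check; it should return the tokens 1, 2 and 3):
GET /testindex/_analyze?analyzer=comma&text=1,2,3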
Hope it helps.
In newer Elasticsearch versions, the char_group tokenizer achieves the same comma splitting:
POST _analyze
{
  "tokenizer": {
    "type": "char_group",
    "tokenize_on_chars": [
      "whitespace",
      "-",
      "\n",
      ","
    ]
  },
  "text": "QUICK,brown, fox"
}
