Case-insensitive search - Elasticsearch

I want to search for documents whose field contains a given substring, like LIKE in SQL.
I tried using a regexp query:
$params["query"]["bool"]["filter"][]["regexp"][$item_key] = '.*'.$search_pattern.'.*';
It only matches lowercase search text; for uppercase text it returns nothing.
Example:
my title is : ABC
if search text is : abc -> has result
if search text is : ABC -> no result
My mapping config is :
`'mappings' => [
    'items' => [
        "title" => [
            "type" => "text",
            "fields" => [
                "keyword" => [
                    "type" => "keyword",
                ]
            ],
            "fielddata" => true,
            "index" => "not_analyzed",
        ]
    ]
]`
Does anyone have an idea how to make this search case-insensitive?
Thank you very much.

You are using a regexp query inside the bool filter. regexp is a term-level query, so Elasticsearch does not analyze its pattern; it is matched verbatim against the terms stored in the inverted index. Elasticsearch keeps index-time and search-time terms comparable by applying the same analyzer to both, but for a term-level query you must normalize the pattern yourself. With a lowercasing analyzer such as standard on the field, all indexed terms are lowercase, so a lowercase pattern will match regardless of the original casing.
You can apply the standard analyzer:
{
  "mappings": {
    "items": {
      "properties": {
        "title": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword"
            }
          },
          "fielddata": true,
          "analyzer": "standard"
        }
      }
    }
  }
}
Query
{
  "query": {
    "regexp": {
      "title": "ab.*"
    }
  }
}
Hope this helps
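For illustration, a client-side sketch of the fix (Python with hypothetical field names, mirroring the PHP above): with a lowercasing analyzer such as standard on the field, all indexed terms are lowercase, so lowercasing the pattern makes the regexp match regardless of input casing. (On Elasticsearch 7.10+ the regexp query also accepts a case_insensitive flag, if upgrading is an option.)

```python
def contains_query(field, text):
    """Build a 'contains' regexp query body; the pattern is lowercased
    to match terms produced by a lowercasing analyzer such as standard."""
    return {"query": {"regexp": {field: ".*" + text.lower() + ".*"}}}

body = contains_query("title", "ABC")  # matches titles containing abc/ABC/Abc
```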

Related

Updating Documents in Elasticsearch is not Applying the Custom Analyzer to the Fields Data

I have a field with a custom analyzer that lowercases the data.
The analyzer is defined as:
"analysis" : {
  "analyzer" : {
    "custom_keyword_analyzer" : {
      "filter" : [
        "lowercase"
      ],
      "type" : "custom",
      "tokenizer" : "keyword"
    }
  }
}
With the mapping on the field like:
"Field" : {
  "type" : "text",
  "fields" : {
    "raw" : {
      "type" : "keyword"
    }
  },
  "copy_to" : [
    "all_field"
  ],
  "analyzer" : "custom_keyword_analyzer",
  "fielddata" : true
}
When documents are created normally with data in Field, the analyzer works correctly: Field holds the lowercased data and Field.raw holds the original, un-analyzed data.
However, if documents are created without anything in Field and are later updated, the analyzer is not applied: Field holds the un-analyzed data and Field.raw is empty.
I have tried manually scripting bulk updates in Python, and also using _update_by_query to perform the updates. In no case can I get the analyzer to apply to the updated data.
I don't know how you are verifying that the analyzer has no effect on your updated doc. Below is a complete example showing that it works and how you can check it.
Index mapping following your definition:
{
  "settings": {
    "analysis": {
      "analyzer": {
        "custom_keyword_analyzer": {
          "filter": [
            "lowercase"
          ],
          "type": "custom",
          "tokenizer": "keyword"
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "fields": {
          "raw": {
            "type": "keyword"
          }
        },
        "copy_to": [
          "all_field"
        ],
        "analyzer": "custom_keyword_analyzer",
        "fielddata": true
      }
    }
  }
}
Index a sample doc:
{
  "title" : "Hello world"
}
Check the analyzed value in the inverted index using the _search endpoint with the query below:
{
  "docvalue_fields": [
    "title",
    "title.raw"
  ],
  "query": {
    "term": {
      "_id": 1
    }
  }
}
Result of the above query:
{
  "_source": {
    "title": "Hello world" // the original document content
  },
  "fields": {
    "title.raw": [
      "Hello world" // un-analyzed keyword sub-field
    ],
    "title": [
      "hello world" // notice the lowercased `h`
    ]
  }
}
Now update the doc using the PUT API:
{
  "title" : "Hello world Updated" // note the `U` in `Updated`
}
And again use the same _search query:
{
  "_source": {
    "title": "Hello world Updated"
  },
  "fields": {
    "title.raw": [
      "Hello world Updated"
    ],
    "title": [
      "hello world updated" // note the lowercase
    ]
  }
}
As you can see, the analyzer is still applied even after the document is updated. This is core functionality and can't be broken; most likely something in your verification method is off, and the approach above should help you identify the mistake.
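The behavior can be modeled in a couple of lines of plain Python (a stand-in, not Elasticsearch code): _source always keeps the original text, while the inverted index stores the analyzer output, here a keyword tokenizer followed by a lowercase filter.

```python
def custom_keyword_analyzer(text):
    """Model of the analyzer above: the keyword tokenizer emits the whole
    input as one token, then the lowercase filter lowercases it."""
    return [text.lower()]

source = "Hello world Updated"          # what _source keeps, unchanged
tokens = custom_keyword_analyzer(source)  # what the inverted index holds
```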

Undesired Stopwords in Elastic Search

I am using Elasticsearch 6. This is the query:
PUT /semtesttest
{
  "settings": {
    "index": {
      "analysis": {
        "filter": {
          "my_stop": {
            "type": "stop",
            "stopwords_path": "analysis1/stopwords.csv"
          },
          "synonym": {
            "type": "synonym",
            "synonyms_path": "analysis1/synonym.txt"
          }
        },
        "analyzer": {
          "my_analyzer": {
            "tokenizer": "standard",
            "filter": ["synonym", "my_stop"]
          }
        }
      }
    }
  },
  "mappings": {
    "all_questions": {
      "dynamic": "strict",
      "properties": {
        "kbaid": {
          "type": "integer"
        },
        "answer": {
          "type": "text"
        },
        "question": {
          "type": "text",
          "analyzer": "my_analyzer"
        }
      }
    }
  }
}
PUT /semtesttest/all_questions/1
{
  "question": "this is hippie"
}
GET /semtesttest/all_questions/_search
{
  "query": {
    "fuzzy": { "question": { "value": "hippie", "fuzziness": 2 } }
  }
}
GET /semtesttest/all_questions/_search
{
  "query": {
    "fuzzy": { "question": { "value": "this is", "fuzziness": 2 } }
  }
}
synonym.txt contains:
this, that, money => sainai
stopwords.csv contains:
hello
how
are
you
The first GET ('hippie') returns nothing; only the second GET ('this is') returns results.
What is the problem? It looks like "this is" is being filtered out as stop words in the first query, but I have specified my stop words explicitly?
fuzzy is a term-level query. It does not analyze the input, so your query was looking for the exact term this is (with some fuzzy matching applied).
So you either need to build a query from those two terms, or use a full-text query instead. If fuzziness is important, the main full-text query that supports it is match:
GET /semtesttest/all_questions/_search?pretty
{
  "query": {
    "match": { "question": { "query": "this is", "fuzziness": 2 } }
  }
}
If matching phrases is important, you may want to look at this answer and work with span queries.
This might also help you see how your analyzer is being used:
GET /semtesttest/_analyze?analyzer=my_analyzer&field=question&text=this is
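To see why the term-level fuzzy query cannot match here: the whole input "this is" is kept as a single term, and fuzziness 2 allows at most two single-character edits. A quick sketch of the edit distance involved (plain Python, not Elasticsearch code; the real matching uses automata, but the distance semantics are the same):

```python
def levenshtein(a, b):
    """Plain dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,        # deletion
                           cur[j - 1] + 1,     # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

# fuzzy treats "this is" as ONE unanalyzed term; with fuzziness 2 it can
# only match indexed terms within edit distance 2 -- "this" is 3 edits away.
distance = levenshtein("this is", "this")  # -> 3, out of fuzzy range
```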

Elasticsearch query_string query with multiple default fields

I would like to use the query_string query, but I need it to search by default across a subset of fields (not all of them, but also not just one). When I try to pass multiple default fields, the query fails. Any suggestions?
I am not specifying a field in the query, so I want these three fields searched by default:
{
  "query": {
    "query_string" : {
      "query" : "some search using advanced operators OR dog",
      "default_field": ["Title", "Description", "DesiredOutcomeDescription"]
    }
  }
}
If you want to query those 3 specific fields as above, just use the fields parameter:
{
  "query": {
    "query_string" : {
      "query" : "some search using advanced operators OR dog",
      "fields": ["Title", "Description", "DesiredOutcomeDescription"]
    }
  }
}
Alternatively, if you want to search by default on those 3 fields without specifying them, you will have to use the copy_to parameter when you set up the mapping. Then set the default field to be the concatenated field.
PUT my_index
{
  "settings": {
    "index.query.default_field": "full_name"
  },
  "mappings": {
    "my_type": {
      "properties": {
        "first_name": {
          "type": "text",
          "copy_to": "full_name"
        },
        "last_name": {
          "type": "text",
          "copy_to": "full_name"
        },
        "full_name": {
          "type": "text"
        }
      }
    }
  }
}
I have used this and don't recommend it, because control over tokenization can be limiting: you can only specify one tokenizer for the concatenated field.
Here is the page on copy_to.
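As a toy model of what copy_to does at index time (plain Python using the hypothetical field names from the mapping above, not Elasticsearch code): values of the source fields are copied into the target field, which is then analyzed by its own single analyzer, the limitation mentioned above.

```python
def apply_copy_to(doc, copy_to_map):
    """Sketch: copy source-field values into their copy_to target field.
    copy_to_map maps source field -> target field."""
    indexed = dict(doc)
    for src, target in copy_to_map.items():
        if src in doc:
            indexed.setdefault(target, []).append(doc[src])
    return indexed

doc = {"first_name": "John", "last_name": "Smith"}
indexed = apply_copy_to(doc, {"first_name": "full_name",
                              "last_name": "full_name"})
# indexed["full_name"] now holds both values for the default-field search
```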

Elasticsearch: lowercase search doesn't work

I am trying to search content using a prefix query, and if I search for diode I get results that differ from Diode. How do I get ES to return the same results for both diode and Diode? These are the mappings and settings I am using in ES.
"settings": {
  "analysis": {
    "analyzer": {
      "lowercasespaceanalyzer": {
        "type": "custom",
        "tokenizer": "whitespace",
        "filter": [
          "lowercase"
        ]
      }
    }
  }
},
"mappings": {
  "articles": {
    "properties": {
      "title": {
        "type": "text"
      },
      "url": {
        "type": "keyword",
        "index": "true"
      },
      "imageurl": {
        "type": "keyword",
        "index": "true"
      },
      "content": {
        "type": "text",
        "analyzer": "lowercasespaceanalyzer",
        "search_analyzer": "whitespace"
      },
      "description": {
        "type": "text"
      },
      "relatedcontentwords": {
        "type": "text"
      },
      "cmskeywords": {
        "type": "text"
      },
      "partnumbers": {
        "type": "keyword",
        "index": "true"
      },
      "pubdate": {
        "type": "date"
      }
    }
  }
}
Here is an example of the query I use:
POST _search
{
  "query": {
    "bool": {
      "must": {
        "prefix": { "content": "capacitance" }
      }
    }
  }
}
It happens because you use two different analyzers at search time and at index time.
When you type the query "Diode" at search time, the "whitespace" search analyzer leaves it as "Diode".
However, because you use "lowercasespaceanalyzer" at index time, "Diode" is indexed as "diode". Use the same analyzer at both search and index time, or at least a search analyzer that lowercases, because the default "whitespace" analyzer doesn't: https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-whitespace-analyzer.html
There will be no term Diode in your index. So if you want the same results, you should have your query text analyzed by the same analyzer.
You can use a query_string query like:
"query_string" : {
  "default_field" : "content",
  "query" : "Diode",
  "analyzer" : "lowercasespaceanalyzer"
}
UPDATE
You can analyze your text before querying:
AnalyzeResponse resp = client.admin().indices()
    .prepareAnalyze(index, text)
    .setAnalyzer("lowercasespaceanalyzer")
    .get();
String analyzedContext = resp.getTokens().get(0).getTerm();
...
Then use analyzedContext as the new query text.
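The mismatch can be sketched with plain-Python stand-ins for the two analyzers (a simplification, not Elasticsearch code): the index-time analyzer lowercases tokens, the search-time whitespace analyzer does not, so the searched term "Diode" never equals the indexed term "diode".

```python
def lowercasespaceanalyzer(text):
    """Index-time stand-in: whitespace tokenizer + lowercase filter."""
    return [t.lower() for t in text.split()]

def whitespace_analyzer(text):
    """Search-time stand-in: whitespace tokenizer only, case preserved."""
    return text.split()

indexed = lowercasespaceanalyzer("Diode content")  # ['diode', 'content']
searched = whitespace_analyzer("Diode")            # ['Diode']
# 'Diode' is not among the indexed terms, so prefix/term lookups miss
```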

Elasticsearch indexing multi_field with array field

I'm new to Elasticsearch and I'm trying to create a multi_field index with string fields and an array-of-strings field. The plain string fields work great, but when I try to match values inside the array, I get an empty result.
My data:
{
  "string_field_one": "one",
  "string_field_two": "two",
  "array_of_strings_field": [
    "2010", "2011", "2012", "2013"
  ]
}
Mappings:
{
"string_field_one" : {
"type" : "string",
"analyzer": "snowball",
"copy_to": "group"
},
"string_field_two" : {
"type" : "string",
"analyzer": "snowball",
"copy_to": "group"
},
"array_of_strings_field" : {
"type" : "string",
"analyzer": "keyword",
"copy_to": "group"
}
"group" : {
"type": "multi_field"
}
}
Search:
"body": {
  "query": {
    "multi_match": {
      "query": "one two 2010",
      "type": "cross_fields",
      "operator": "and",
      "fields": [
        "string_field_one",
        "string_field_two",
        "array_of_strings_field",
        "group"
      ]
    }
  }
}
Expecting:
Searching by one, two, or 2010 should return the result
Searching by one two should return the result
Searching by one two 2010 should return the result
Searching by one two 2008 should not return the result
What am I missing?
cross_fields requires that all fields share the same search analyzer (more precisely, all query terms must occur in fields with the same search analyzer), which is not the case here.
You would need to use query_string for the above case:
Example:
"body": {
  "query": {
    "query_string": {
      "query": "one two 2010",
      "default_operator": "AND",
      "fields": [
        "string_field_one",
        "string_field_two",
        "array_of_strings_field",
        "group"
      ]
    }
  }
}
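The analyzer-group constraint can be illustrated with rough Python stand-ins for the two analyzers (simplifications, not the real snowball/keyword analyzers): cross_fields analyzes the query once per analyzer group, and the two groups here produce different term lists, so the fields cannot be blended into one virtual field.

```python
def snowball_like(text):
    """Crude stand-in for a snowball-analyzed field: whitespace
    tokens, lowercased (real snowball also stems)."""
    return [t.lower() for t in text.split()]

def keyword_analyzer(text):
    """The keyword analyzer emits the whole input as one token."""
    return [text]

query = "one two 2010"
snowball_terms = snowball_like(query)    # ['one', 'two', '2010']
keyword_terms = keyword_analyzer(query)  # ['one two 2010']
# different term lists per analyzer group -> cross_fields can't line them up
```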
