Wildcard doesn't work as expected when querying by more than a word - elasticsearch

If I search for documents containing, e.g., "called" in the "message" field, I get the expected result, but when I search for "was called", "was called*" or "*was called*" I get nothing, although many documents have a message field containing "Application was called by REST API".
Here is part of the query I send:
"wildcard": {
"message": {
"wildcard": "was called",
"boost": 1.0
}
}
Here is a part of the mapping:
"mappings": {
"doc": {
"dynamic_templates": [
{
"message_field": {
"path_match": "message",
"match_mapping_type": "string",
"mapping": {
"norms": false,
"type": "text"
}
}
},
{
"string_fields": {
"match": "*",
"match_mapping_type": "string",
"mapping": {
"fields": {
"keyword": {
"ignore_above": 256,
"type": "keyword"
}
},
"norms": false,
"type": "text"
}
}
}
],
"properties": {
...
"message": {
"type": "text",
"norms": false
}
}
}
}
The indices I search in are created automatically by Logstash.
I have a similar problem with another field that holds the value "NP-00121": *00121 works, but *-00121 doesn't.
Edit: one more example. I have a "requestUri" field containing "/api/v1/log/rest", "/api/v1/log/notification", etc. When I send the wildcard query "/api/v1*" I get nothing.
So it looks like the problem appears when the pattern contains spaces or dashes. Could anyone help me solve this?

Wildcards are used within tokens. Your message field is indexed as text, and so will be tokenized into words.
Basically, you don't need wildcards for a query like "was called". Simply use a phrase query like:
"query": {
"match_phrase" : {
"message" : "was called"
}
}
or if you prefer a query string query:
"query": {
"query_string" : {
"query" : "message:\"was called\""
}
}
A wildcard query would be useful for searching for partial terms, something like:
"query": {
"wildcard" : { "message" : "call*" }
}
This would find all docs that contain "call", "called" or "calling".
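For values like NP-00121, or for URIs, it would likely be more useful if those fields were not analyzed. As it stands, they are split into tokens ('np' and '00121'), hence the problem you are seeing. You can index these fields as the "keyword" type instead of "text" to have the whole value indexed as a single, unanalyzed token. If those fields were mapped dynamically through your string_fields template, they should already have a .keyword sub-field (for values up to 256 characters) that you can run the wildcard against.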
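To see why the multi-word and dashed patterns find nothing, here is a rough Python model of the behaviour described above. It is only a sketch: analyze is a simplified stand-in for the standard analyzer (lowercase, split on anything that is not a letter or digit), and wildcard_matches is a hypothetical helper that compares the pattern with each indexed term separately, which is the key point.

```python
import re
from fnmatch import fnmatchcase

def analyze(text):
    """Simplified stand-in for the standard analyzer (an assumption, not the real one)."""
    return [t for t in re.split(r"[^0-9a-z]+", text.lower()) if t]

def wildcard_matches(pattern, terms):
    """The pattern is tested against each indexed term on its own, never across terms."""
    return any(fnmatchcase(term, pattern) for term in terms)

text_terms = analyze("Application was called by REST API")
print(text_terms)                                    # ['application', 'was', 'called', 'by', 'rest', 'api']
print(wildcard_matches("call*", text_terms))         # True: matches the term 'called'
print(wildcard_matches("*was called*", text_terms))  # False: no single term contains a space

id_terms = analyze("NP-00121")
print(wildcard_matches("*00121", id_terms))          # True: matches the term '00121'
print(wildcard_matches("*-00121", id_terms))         # False: the dash was dropped at index time

# A keyword field keeps the whole value as one term, so the same patterns match.
print(wildcard_matches("*-00121", ["NP-00121"]))           # True
print(wildcard_matches("/api/v1*", ["/api/v1/log/rest"]))  # True
```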

Related

Elasticsearch exact multiword (array) query for one field

I am trying to write a query with multiple exact search terms, say an array of strings
like
["Q4 Test WC Schüssel", "Q4_18 Bankerlampen", "MORE_SEARCHTERMS"]
I have an index with a property data.name, and I want to search this one field for the exact value of each string in the array and get back all entries where any one of them matches.
{
"mappings": {
"_doc": {
"properties": {
"country": {
"type": "keyword"
},
"data": {
"properties": {
"name": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword"
}
}
}
}
}
}
}
}
}
I thought this would be an easy task, but I am not sure I am using the right search terms on Google to find an example query.
Use a terms query on the keyword sub-field. With the mapping above, its full path is data.name.keyword:
GET /_search
{
"query": {
"terms": {
"data.name.keyword": ["Q4 Test WC Schüssel", "Q4_18 Bankerlampen", "MORE_SEARCHTERMS"]
}
}
}
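If the list of names comes from application code, the request body can be assembled programmatically. The following is a minimal sketch: build_terms_query is a hypothetical helper, not part of any client library, and the field path data.name.keyword is taken from the mapping in the question.

```python
import json

def build_terms_query(field, values):
    """Build a search body that matches documents whose `field` equals any of `values` exactly."""
    return {"query": {"terms": {field: list(values)}}}

names = ["Q4 Test WC Schüssel", "Q4_18 Bankerlampen", "MORE_SEARCHTERMS"]
body = build_terms_query("data.name.keyword", names)

# ensure_ascii=False keeps the umlaut readable in the printed JSON.
print(json.dumps(body, ensure_ascii=False, indent=2))
```

The resulting dictionary can be sent as the body of a _search request with whichever HTTP or Elasticsearch client you already use.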

Difference between keyword and text in ElasticSearch

Can someone explain the difference between keyword and text in ElasticSearch with an example?
keyword type:
Suppose you define a field of type keyword like this:
PUT products
{
"mappings": {
"_doc": {
"properties": {
"name": {
"type": "keyword"
}
}
}
}
}
Then a search on this field has to supply the whole value, because a keyword field is indexed as a single term. For example, index this document:
POST products/_doc
{
"name": "washing machine"
}
When you execute a search like this:
GET products/_search
{
"query": {
"match": {
"name": "washing"
}
}
}
it will not match any docs. You have to search with the whole value "washing machine".
The text type, on the other hand, is analyzed, so you can search using individual tokens from the field value, i.e. a full-text search:
PUT products
{
"mappings": {
"_doc": {
"properties": {
"name": {
"type": "text"
}
}
}
}
}
and the search:
GET products/_search
{
"query": {
"match": {
"name": "washing"
}
}
}
will return the matching document.
For more details, see: keyword vs. text
The primary difference between the text datatype and the keyword datatype is that text fields are analyzed at the time of indexing, and keyword fields are not.
What that means is, text fields are broken down into their individual terms at indexing to allow for partial matching, while keyword fields are indexed as is.
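The difference can be illustrated with a toy inverted index in Python. This is only a sketch under simplifying assumptions: analyze_text is a crude stand-in for the default analyzer, and the second document ("coffee machine") is made up for the example.

```python
import re
from collections import defaultdict

def analyze_text(value):
    """text field: lowercase the value and split it into individual terms (simplified)."""
    return [t for t in re.split(r"[^0-9a-z]+", value.lower()) if t]

def analyze_keyword(value):
    """keyword field: the whole value is kept as one term, unchanged."""
    return [value]

def build_index(docs, analyzer):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, value in docs.items():
        for term in analyzer(value):
            index[term].add(doc_id)
    return index

docs = {1: "washing machine", 2: "coffee machine"}  # hypothetical documents
text_index = build_index(docs, analyze_text)
keyword_index = build_index(docs, analyze_keyword)

print(sorted(text_index.get("washing", set())))             # [1]
print(sorted(text_index.get("machine", set())))             # [1, 2]
print(sorted(keyword_index.get("washing", set())))          # []
print(sorted(keyword_index.get("washing machine", set())))  # [1]
```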
Keyword Mapping
"channel" : {
"type" : "keyword"
},
"product_image" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
}
Along with the other advantages of the keyword type in Elasticsearch, one more is that it accepts values of other JSON types as well, such as numbers or date strings. Note that each value is indexed as a single string term, not as a numeric or date value.
PUT /demo-index/
{
"mappings": {
"properties": {
"name": { "type": "keyword" }
}
}
}
POST /demo-index/_doc
{
"name": "2021-02-21"
}
POST /demo-index/_doc
{
"name": 100
}
POST /demo-index/_doc
{
"name": "Jhon"
}

Undesired Stopwords in Elastic Search

I am using Elasticsearch 6. These are my requests:
PUT /semtesttest
{
"settings": {
"index" : {
"analysis" : {
"filter": {
"my_stop": {
"type": "stop",
"stopwords_path": "analysis1/stopwords.csv"
},
"synonym" : {
"type" : "synonym",
"synonyms_path" : "analysis1/synonym.txt"
}
},
"analyzer" : {
"my_analyzer" : {
"tokenizer" : "standard",
"filter" : ["synonym","my_stop"]
}
}
}
}
},
"mappings": {
"all_questions": {
"dynamic": "strict",
"properties": {
"kbaid":{
"type": "integer"
},
"answer":{
"type": "text"
},
"question": {
"type": "text",
"analyzer": "my_analyzer"
}
}
}
}
}
PUT /semtesttest/all_questions/1
{
"question":"this is hippie"
}
GET /semtesttest/all_questions/_search
{
"query":{
"fuzzy":{"question":{"value":"hippie","fuzziness":2}}
}
}
GET /semtesttest/all_questions/_search
{
"query":{
"fuzzy":{"question":{"value":"this is","fuzziness":2}}
}
}
synonym.txt contains:
this, that, money => sainai
stopwords.csv contains:
hello
how
are
you
The first GET ('hippie') returns nothing;
only the second GET ('this is') returns results.
What is the problem? It looks as if "this is" is being filtered as a stop word in the first query, even though I have specified my stop words explicitly.
fuzzy is a term-level query. It does not analyze its input, so your second query was looking for the single term "this is" (with some fuzziness applied), not for the two terms "this" and "is".
So you either want to build a query from those two terms, or use a full-text query instead. If fuzziness is important, I think the only full-text query that supports it is match:
GET /semtesttest/all_questions/_search?pretty
{
"query":{
"match":{"question":{"query":"this is","fuzziness":2}}
}
}
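The difference between the two query styles can be sketched in Python. This is an illustration only: the indexed terms are made up, whitespace splitting stands in for the analyzer, and plain Levenshtein distance is used (Elasticsearch by default also counts a transposition of adjacent characters as a single edit).

```python
def edit_distance(a, b):
    """Plain Levenshtein distance: insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def fuzzy_term(value, indexed_terms, max_edits=2):
    """Term-level behaviour: the whole input string is compared with each indexed term."""
    return [t for t in indexed_terms if edit_distance(value, t) <= max_edits]

def fuzzy_match(text, indexed_terms, max_edits=2):
    """Full-text behaviour: the input is split into tokens first, then each token is fuzzed."""
    hits = []
    for token in text.lower().split():
        hits.extend(fuzzy_term(token, indexed_terms, max_edits))
    return hits

terms = ["quick", "brown", "fox"]        # hypothetical indexed terms
print(fuzzy_term("quick brwn", terms))   # []
print(fuzzy_match("quick brwn", terms))  # ['quick', 'brown']
```

In the term-level case the input "quick brwn" is at least five edits away from every indexed term, so nothing is within the allowed two edits; once the input is tokenized, each token is compared on its own.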
If phrase matching is important, you may want to look at this answer and work with span queries.
This might also help, since it shows how your analyzer processes the input:
GET /semtesttest/_analyze
{ "analyzer": "my_analyzer", "text": "this is" }

Elasticsearch query_string query with multiple default fields

I would like to use the features of a query_string query, but I need the query to search by default across a subset of fields (not all, but also not just one). When I try to pass several default fields, the query fails. Any suggestions?
I am not specifying a field in the query itself, so I want three fields to be searched by default:
{
"query": {
"query_string" : {
"query" : "some search using advanced operators OR dog",
"default_field": ["Title", "Description", "DesiredOutcomeDescription"]
}
}
}
If you want to create a query on 3 specific fields as above, just use the fields parameter.
{
"query": {
"query_string" : {
"query" : "some search using advanced operators OR dog",
"fields": ["Title", "Description", "DesiredOutcomeDescription"]
}
}
}
Alternatively, if you want to search by default on those 3 fields without specifying them, you will have to use the copy_to parameter when you set up the mapping. Then set the default field to be the concatenated field.
PUT my_index
{
"settings": {
"index.query.default_field": "full_name"
},
"mappings": {
"my_type": {
"properties": {
"first_name": {
"type": "text",
"copy_to": "full_name"
},
"last_name": {
"type": "text",
"copy_to": "full_name"
},
"full_name": {
"type": "text"
}
}
}
}
}
I have used this and don't recommend it, because control over the analysis can be limiting: you can only specify one analyzer for the concatenated field.
See the documentation page on copy_to.
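What copy_to does at index time can be sketched in a few lines of Python. This is a simplified model, not Elasticsearch code: index_terms is a hypothetical helper, the sample document is made up, and lowercasing plus whitespace splitting stands in for the analyzer of each field.

```python
def index_terms(doc, mapping):
    """Return field -> indexed terms, copying raw values to any copy_to target first."""
    values = {}
    for field, value in doc.items():
        values.setdefault(field, []).append(value)
        target = mapping.get(field, {}).get("copy_to")
        if target:
            # The original value is copied; the target field then analyzes it itself.
            values.setdefault(target, []).append(value)
    return {
        field: [token for value in vals for token in value.lower().split()]
        for field, vals in values.items()
    }

mapping = {
    "first_name": {"copy_to": "full_name"},
    "last_name": {"copy_to": "full_name"},
    "full_name": {},
}
terms = index_terms({"first_name": "John", "last_name": "Smith"}, mapping)
print(terms["full_name"])   # ['john', 'smith']
print(terms["first_name"])  # ['john']
```

Because every copied value goes through the single analyzer of the target field, the combined field cannot treat the source fields differently, which is the limitation mentioned above.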

Dynamic Mapping for an object field that unwraps the parent path

I am evaluating whether ElasticSearch can meet the needs of a new system I'm building. It looks amazing, so I'm really hopeful I can figure out a mapping strategy that works.
In this system, administrators can define fields to be associated with documents dynamically. So a given type (in the elasticsearch sense of the word) can have any number of fields, which I do not know the name of ahead of time. And each field can be of any type: int, date, string, etc.
An example document may look like:
{
"name": "bob",
"age": 22,
"title": "Vice Intern",
"tagline": "Ask not what your company can do for you, but..."
}
Notice that there are 2 string fields. Awesome. My problem though is that I want the "tagline" to be analyzed, but I do not want "title" to be analyzed.
Remember I don't know the names of these fields ahead of time. And there could be multiple fields of each type. So there could be 10 string fields of various names, 3 of which should be analyzed and 7 of which should not.
Another requirement I have is that the name the administrator gives the field should also be what they can search by. So, for example, if they want to find all the Vice Interns who have something to say, the lucene query may be:
+title:"Vice Intern" +tagline:"company"
So my thought was that I could define a dynamic mapping. Since I don't know the names of the fields ahead of time, it seems like a great approach. The key though is coming up with a way of differentiating string fields that should be analyzed and ones that shouldn't be!
I thought, hey, I'll just put all the fields that need analyzing into a nested object, like this:
{
"name": "bob",
"age": 22,
"title": "Vice Intern",
"textfields": {
"tagline": "Ask not what your company can do for you, but...",
"somethingelse": "lorem ipsum"
}
}
Then, in my dynamic mapping, I have a way of mapping those fields differently:
{
"mytype": {
"dynamic_templates": {
"nested_textfields": {
"match": "textfields",
"match_mapping_type": "string",
"mapping": {
"index": "analyzed",
"analyzer": "default"
}
}
}
}
}
I know that isn't right, I actually need some kind of nested mapping, but no matter, because if I understand it correctly, even if I got that working, it would mean those fields are searched for (via lucene syntax) like this:
+title:"Vice Intern" +textfields.tagline:"company"
And I don't want the "textfields" prefix. Since I'm the one providing the textfields object that wraps the text fields, I know that the fields within it are still uniquely named across the entire document.
I thought of using a pattern match instead. So instead of wrapping them in a "textfields" object, I could prefix them, like "textfield_tagline". But when doing that, the {name} token in the dynamic mapping includes the prefix, and I don't see a way to pull out just the "*" portion.
Any solution which gets me the necessary behavior is a correct answer. Even if that involves nested mapping information into the documents themselves (can you do that? I've seen something like that, I think...).
EDIT:
I've attempted the following dynamic template. I'm trying to use index_name to remove the 'textfields.' in the index. This dynamic template just doesn't seem to match though, because after putting a document and looking at the mapping I see no analyzer specified.
{
"mytype" : {
"dynamic_templates":
[
{
"textfields": {
"path_match": "textfields.*",
"match_mapping_type" : "string",
"mapping": {
"type": "string",
"index": "analyzed",
"analyzer": "default",
"index_name": "{name}",
"fields": {
"sort": {
"type": "string",
"index": "not_analyzed",
"index_name": "{name}_sort"
}
}
}
}
}
]
}
}
I was able to duplicate the results that you asked for specifically with the following index creation (with mappings), document, and search query. The type does vary a bit, but it serves the purpose of the example.
Index Settings
PUT http://localhost:9200/sandbox
{
"settings": {
"index": {
"number_of_shards": 1,
"number_of_replicas": 0
}
},
"mappings": {
"mytype": {
"dynamic_templates": [
{
"indexedfields": {
"path_match": "indexedfields.*",
"match_mapping_type" : "string",
"mapping": {
"type": "string",
"index": "analyzed",
"analyzer": "default",
"index_name": "{name}",
"fields": {
"sort": {
"type": "string",
"index": "not_analyzed",
"index_name": "{name}_sort"
}
}
}
}
},
{
"textfields": {
"path_match": "textfields.*",
"match_mapping_type" : "string",
"mapping": {
"type": "string",
"index": "not_analyzed",
"index_name": "{name}"
}
}
},
{
"strings": {
"path_match": "*",
"match_mapping_type" : "string",
"mapping": {
"type": "string",
"index": "not_analyzed"
}
}
}
]
}
}
}
Document
PUT http://localhost:9200/sandbox/mytype/1
{
"indexedfields":{
"hello":"Hello world",
"message":"The great balls of the world are on fire"
},
"textfields":{
"username":"User Name",
"projectname":"Project Name"
}
}
Search
POST http://localhost:9200/sandbox/mytype/_search
{
"query": {
"query_string": {
"query": "message:\"great balls\""
}
},
"filter":{
"query":{
"query_string":{
"query":"username:\"User Name\""
}
}
},
"from":0,
"size":10,
"sort":[
]
}
The search returns the following response:
{
"took":2,
"timed_out":false,
"_shards":{
"total":1,
"successful":1,
"failed":0
},
"hits":{
"total":1,
"max_score":0.19178301,
"hits":[
{
"_index":"sandbox",
"_type":"mytype",
"_id":"1",
"_score":0.19178301,
"_source":{
"indexedfields":{
"hello":"Hello world",
"message":"The great balls of the world are on fire"
},
"textfields":{
"username":"User Name",
"projectname":"Project Name"
}
}
}
]
}
}
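The order of the three dynamic templates matters, because the first template whose conditions all match a new field is the one applied. A rough Python sketch of that selection logic follows; pick_template is a hypothetical helper, only the match, path_match and match_mapping_type conditions are modelled, and fnmatchcase approximates the simple wildcard matching.

```python
from fnmatch import fnmatchcase

def pick_template(templates, path, json_type):
    """Return the name of the first template whose conditions all match, else None."""
    field_name = path.rsplit(".", 1)[-1]
    for template in templates:
        (name, rule), = template.items()  # each entry holds exactly one named template
        if "match_mapping_type" in rule and rule["match_mapping_type"] != json_type:
            continue
        if "path_match" in rule and not fnmatchcase(path, rule["path_match"]):
            continue  # path_match is tested against the full dotted path
        if "match" in rule and not fnmatchcase(field_name, rule["match"]):
            continue  # match is tested against the field name only
        return name
    return None

templates = [
    {"indexedfields": {"path_match": "indexedfields.*", "match_mapping_type": "string"}},
    {"textfields": {"path_match": "textfields.*", "match_mapping_type": "string"}},
    {"strings": {"path_match": "*", "match_mapping_type": "string"}},
]
print(pick_template(templates, "indexedfields.message", "string"))  # indexedfields
print(pick_template(templates, "textfields.username", "string"))    # textfields
print(pick_template(templates, "title", "string"))                  # strings
print(pick_template(templates, "age", "long"))                      # None
```

If the catch-all "strings" template were listed first, it would win for every string field and the two more specific templates would never apply.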

Resources