Elasticsearch query_string query with multiple default fields - elasticsearch

I would like to avail myself of the feature of a query_string query, but I need the query to search by default across a subset of fields (not all, but also not just one). When I try to pass many default fields, the query fails. Any suggestions?
Not specifying a specific field in the query, so I want to search three fields by default:
{
"query": {
"query_string" : {
"query" : "some search using advanced operators OR dog",
"default_field": ["Title", "Description", "DesiredOutcomeDescription"]
}
}
}

If you want to create a query on 3 specific fields as above, just use the fields parameter.
{
"query": {
"query_string" : {
"query" : "some search using advanced operators OR dog",
"fields": ["Title", "Description", "DesiredOutcomeDescription"]
}
}
}
Alternatively, if you want to search by default on those 3 fields without specifying them, you will have to use the copy_to parameter when you set up the mapping. Then set the default field to be the concatenated field.
PUT my_index
{
"settings": {
"index.query.default_field": "full_name"
},
"mappings": {
"my_type": {
"properties": {
"first_name": {
"type": "text",
"copy_to": "full_name"
},
"last_name": {
"type": "text",
"copy_to": "full_name"
},
"full_name": {
"type": "text"
}
}
}
}
}
I have used this and don't recommend it because the control over the tokenization can be limiting, as you can only specify one tokenizer for the concatenated field.
Here is the page on copy_to.

Related

Difference between keyword and text in ElasticSearch

Can someone explain the difference between keyword and text in ElasticSearch with an example?
keyword type:
if you define a field to be of type keyword like this.
PUT products
{
"mappings": {
"_doc": {
"properties": {
"name": {
"type": "keyword"
}
}
}
}
}
Then when you make a search query on this field you have to insert the whole value (keyword search) so keyword field.
POST products/_doc
{
"name": "washing machine"
}
when you execute search like this:
GET products/_search
{
"query": {
"match": {
"name": "washing"
}
}
}
it will not match any docs. You have to search with the whole word "washing machine".
text type on the other hand is analyzed and you can search using tokens from the field value. a full text search in the whole value:
PUT products
{
"mappings": {
"_doc": {
"properties": {
"name": {
"type": "text"
}
}
}
}
}
and the search :
GET products/_search
{
"query": {
"match": {
"name": "washing"
}
}
}
will return a matching documents.
You can check this to more details keyword Vs. text
The primary difference between the text datatype and the keyword datatype is that text fields are analyzed at the time of indexing, and keyword fields are not.
What that means is, text fields are broken down into their individual terms at indexing to allow for partial matching, while keyword fields are indexed as is.
Keyword Mapping
"channel" : {
"name" : "keyword"
},
"product_image" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
}
Along with the other advantages of keyword type in elastic search, one more is that you can store any data type inside of it. Be it string, numeric, date, etc.
PUT /demo-index/
{
"mappings": {
"properties": {
"name": { "type": "keyword" }
}
}
}
POST /demo-index/_doc
{
"name": "2021-02-21"
}
POST /demo-index/_doc
{
"name": 100
}
POST /demo-index/_doc
{
"name": "Jhon"
}

Wildcard doesn't work as expected when querying by more than a word

If I search documents containing e.g "called" in "message" field I get an expected result, but when I search for "was called", "was called*" or
"*was called*"
I get nothing, although I have a lot of documents whose message field contains the following content "Application was called by REST API".
Here is a part of a query I send:
"wildcard": {
"message": {
"wildcard": "was called",
"boost": 1.0
}
}
Here is a part of the mapping:
"mappings": {
"doc": {
"dynamic_templates": [
{
"message_field": {
"path_match": "message",
"match_mapping_type": "string",
"mapping": {
"norms": false,
"type": "text"
}
}
},
{
"string_fields": {
"match": "*",
"match_mapping_type": "string",
"mapping": {
"fields": {
"keyword": {
"ignore_above": 256,
"type": "keyword"
}
},
"norms": false,
"type": "text"
}
}
}
],
"properties": {
...
"message": {
"type": "text",
"norms": false
}
}
}
}
Indexes I search in are automatically created by Logstash.
I have a similar problem with another field; I have the following value in the field: "NP-00121". *00121 works, but *-00121 doesn't.
edit: and one example more: I have a "requestUri" field containing "/api/v1/log/rest", "/api/v1/log/notification" etc. when I send the following wildcard query I get nothing "/api/v1*".
So it looks like problem appears when using spaces and dashes. Could anyone help me to solve this problem?
Wildcards are used within tokens. Your message field is indexed as text, and so will be tokenized into words.
Basically, you don't need wildcards for a query like "was called". Simply use a phrase query like:
"query": {
"match_phrase" : {
"message" : "was called"
}
}
or if you prefer a query string query:
"query": {
"query_string" : {
"query" : "message:\"was called\""
}
}
A wildcard query would be useful for searching for partial terms, something like:
"query": {
"wildcard" : { "message" : "call*" }
}
If you wanted to find all docs that contain "call", "called" or "calling".
For values like NP-00121, or for URIs, it would likely be more useful if those fields were not analyzed. As it is these are getting separated into tokens ('np' and '00121'), thus the problem you are seeing. You can index these fields as the "keyword" type instead of "text", to have the whole field indexed as a single, unanalyzed token.

Elasticsearch 5.X Percolate: How to autogenerate copy_to fields?

In ES 2.3.3, many queries in the system I'm working on use the _all field. Sometimes these are registered to a percolate index, and when running percolator on the doc, _all is generated automatically.
In converting to ES 5.X _all is being deprecated and so _all has been replaced with a copy_to field that contains the components that we actually care about, and it works great for those searches.
Registering the same query to a percolate index with the same document mapping including copy_to fields works fine. Sending a percolate query with the document never results in a hit for a copy_to field however.
Manually building the copy_to field via simple string concatenation seems to work, it's just that I'd expect to be able to Query -> DocIndex and get the same result as Doc -> PercolateQuery... So I'm just looking for a way to have ES generate the copy_to fields automatically on a document being percolated.
Ended up there was nothing wrong with ES of course, posting here in case it helps someone else. Figured it out while attempting to generate a simpler example to post here with details... Basically the issue came down to the fact that attempting to percolate a document of a type that doesn't exist in the percolate index doesn't give any errors back, but seems to apply all percolate queries without applying any mappings which was just confusing as it worked for simple test cases, but not complex ones. Here's an example:
From the copy_to docs, generate an index with a copy_to mapping. See that a query to the copy_to field works.
PUT my_index
{
"mappings": {
"my_type": {
"properties": {
"first_name": {
"type": "text",
"copy_to": "full_name"
},
"last_name": {
"type": "text",
"copy_to": "full_name"
},
"full_name": {
"type": "text"
}
}
}
}
}
PUT my_index/my_type/1
{
"first_name": "John",
"last_name": "Smith"
}
GET my_index/_search
{
"query": {
"match": {
"full_name": {
"query": "John Smith",
"operator": "and"
}
}
}
}
Create a percolate index with the same type
PUT /my_percolate_index
{
"mappings": {
"my_type": {
"properties": {
"first_name": {
"type": "text",
"copy_to": "full_name"
},
"last_name": {
"type": "text",
"copy_to": "full_name"
},
"full_name": {
"type": "text"
}
}
},
"queries": {
"properties": {
"query": {
"type": "percolator"
}
}
}
}
}
Create a percolate query that matches our other percolate query on the copy_to field, and a second query that just queries on a basic unmodified field
PUT /my_percolate_index/queries/1?refresh
{
"query": {
"match": {
"full_name": {
"query": "John Smith",
"operator": "and"
}
}
}
}
PUT /my_percolate_index/queries/2?refresh
{
"query": {
"match": {
"first_name": {
"query": "John"
}
}
}
}
Search, but with the wrong type... there will be a hit on the basic field (first_name: John) even though no document mappings match the request
GET /my_percolate_index/_search
{
"query" : {
"percolate" : {
"field" : "query",
"document_type" : "non_type",
"document" : {
"first_name": "John",
"last_name": "Smith"
}
}
}
}
{"took":7,"timed_out":false,"_shards":{"total":5,"successful":5,"skipped":0,"failed":0},"hits":{"total":1,"max_score":0.2876821,"hits":[{"_index":"my_percolate_index","_type":"queries","_id":"2","_score":0.2876821,"_source":{
"query": {
"match": {
"first_name": {
"query": "John"
}
}
}
}}]}}
Send in the correct document_type and see both matches as expected
GET /my_percolate_index/_search
{
"query" : {
"percolate" : {
"field" : "query",
"document_type" : "my_type",
"document" : {
"first_name": "John",
"last_name": "Smith"
}
}
}
}
{"took":7,"timed_out":false,"_shards":{"total":5,"successful":5,"skipped":0,"failed":0},"hits":{"total":2,"max_score":0.51623213,"hits":[{"_index":"my_percolate_index","_type":"queries","_id":"1","_score":0.51623213,"_source":{
"query": {
"match": {
"full_name": {
"query": "John Smith",
"operator": "and"
}
}
}
}},{"_index":"my_percolate_index","_type":"queries","_id":"2","_score":0.2876821,"_source":{
"query": {
"match": {
"first_name": {
"query": "John"
}
}
}
}}]}}

Why prefix returns documents without the specific prefix?

I want to return only documents which their name start with "pizza". this is what I've done:
{
"query": {
"filtered": {
"filter": {
"prefix": {
"name": "pizza"
}
}
}
}
}
But I've got these 3 documents:
{
"name": "Viana Pizza",
"city": "Mashhad",
"address": "Vakil abad",
"foods": ["Pizza"],
"salad": true,
"rate": 5.0
}
{
"name": "Pizza Pizza",
"city": "Mashhad",
"address": "Bahar st",
"foods": ["Pizza"],
"salad": true,
"rate": 8.5
}
{
"name": "Reza Pizza",
"city": "Tehran",
"address": "Vali Asr",
"foods": ["Pizza"],
"salad": true,
"rate": 7.5
}
As you can see, Only one of them has "pizza" in the beginning of the name field.
What's wrong?
Probably, the simplest explanation given that you didn't provide the actual mapping, is that you have th e "name" field as "string" and "analyzed" (the default). Which means that "Reza Pizza" will be transformed to "reza" and "pizza" terms.
And your filter will match against terms, not against entire fields. Because ES analyzes the fields and forms terms when the standard mapping is used.
You need to either change your "name" field to "not_analyzed" or add another field to mirror the "name" but this mirror field to be "not_analyzed". Also, for text "pizza" (lowercase) to work in this case you need to create a custom analyzer.
Below you have the solution with the mirror field:
PUT /pizza
{
"settings": {
"analysis": {
"analyzer": {
"my_keyword_lowercase_analyzer": {
"type": "custom",
"tokenizer": "keyword",
"filter": ["lowercase"]
}
}
}
},
"mappings": {
"restaurant": {
"properties": {
"name": {
"type": "string",
"fields": {
"raw": {
"type": "string",
"analyzer": "my_keyword_lowercase_analyzer"
}
}
}
}
}
}
}
And in searching you need to use the mirror field:
GET /pizza/restaurant/_search
{
"query": {
"filtered": {
"filter": {
"prefix": {
"name.raw": "pizza"
}
}
}
}
}
That's all about Elasticsearch analyzers. Let's read the documentation on prefix filter:
Filters documents that have fields containing terms with a specified prefix (not analyzed).
Here we can see that this filter matches terms, not the whole field value. When you index the document, ES splits your field values to terms using analyzers. Default analyzer splits value by whitespace and convert parts to lowercse. So all three results have term pizza in the name field and pizza term perfectly matches pizza prefix. If you want to match field value as is - I'd suggest you to map name field as not_analyzed

Dynamic Mapping for an object field that unwraps the parent path

I am evaluating whether ElasticSearch can meet the needs of a new system I'm building. It looks amazing, so I'm really hopeful I can figure out a mapping strategy that works.
In this system, administrators can define fields to be associated with documents dynamically. So a given type (in the elasticsearch sense of the word) can have any number of fields, which I do not know the name of ahead of time. And each field can be of any type: int, date, string, etc.
An example document may look like:
{
"name": "bob",
"age": 22,
"title": "Vice Intern",
"tagline": "Ask not what your company can do for you, but..."
}
Notice that there are 2 string fields. Awesome. My problem though is that I want the "tagline" to be analyzed, but I do not want "title" to be analyzed.
Remember I don't know the names of these fields ahead of time. And there could be multiple fields of each type. So there could be 10 string fields of various names, 3 of which should be analyzed and 7 of which should not.
Another requirement I have is that the name the administrator gives the field should also be what they can search by. So, for example, if they want to find all the Vice Interns who have something to say, the lucene query may be:
+title:"Vice Intern" +tagline:"company"
So my thought was that I could define a dynamic mapping. Since I don't know the names of the fields ahead of time, it seems like a great approach. The key though is coming up with a way of differentiating string fields that should be analyzed and ones that shouldn't be!
I thought, hey, I'll just put all the fields that need analyzing into a nested object, like this:
{
"name": "bob",
"age": 22,
"title": "Vice Intern",
"textfields": {
"tagline": "Ask not what your company can do for you, but...",
"somethingelse": "lorem ipsum",
}
}
Then, in my dynamic mapping, I have a way of mapping those fields differently:
{
"mytype": {
"dynamic_templates": {
"nested_textfields": {
"match": "textfields",
"match_mapping_type": "string",
"mapping": {
"index": "analyzed",
"analyzer": "default"
}
}
}
}
}
I know that isn't right, I actually need some kind of nested mapping, but no matter, because if I understand it correctly, even if I got that working, it would mean those fields are searched for (via lucene syntax) like this:
+title:"Vice Intern" +textfields.tagline:"company"
And I don't want the "textfields" prefix. Since I'm the one providing the textfields object that wraps the text fields, I know that the fields within it are still uniquely named across the entire document.
I thought of using a pattern match instead. So instead of wrapping them in a "textfields" object, I could prefix them, like "textfield_tagline". But when doing that, the {name} token in the dynamic mapping includes the prefix, I don't see a way to just pull out the "*" portion.
Any solution which gets me the necessary behavior is a correct answer. Even if that involves nested mapping information into the documents themselves (can you do that? I've seen something like that, I think...).
EDIT:
I've attempted the following dynamic template. I'm trying to use index_name to remove the 'textfields.' in the index. This dynamic template just doesn't seem to match though, because after putting a document and looking at the mapping I see no analyzer specified.
{
"mytype" : {
"dynamic_templates":
[
{
"textfields": {
"path_match": "textfields.*",
"match_mapping_type" : "string",
"mapping": {
"type": "string",
"index": "analyzed",
"analyzer": "default",
"index_name": "{name}",
"fields": {
"sort": {
"type": "string",
"index": "not_analyzed",
"index_name": "{name}_sort"
}
}
}
}
}
]
}
}
I was able to duplicate the results that you asked for specifically with the following index creation (with mappings), document, and search query. The type does vary a bit, but it serves the purpose of the example.
Index Settings
PUT http://localhost:9200/sandbox
{
"settings": {
"index": {
"number_of_shards": 1,
"number_of_replicas": 0
}
},
"mappings": {
"mytype": {
"dynamic_templates": [
{
"indexedfields": {
"path_match": "indexedfields.*",
"match_mapping_type" : "string",
"mapping": {
"type": "string",
"index": "analyzed",
"analyzer": "default",
"index_name": "{name}",
"fields": {
"sort": {
"type": "string",
"index": "not_analyzed",
"index_name": "{name}_sort"
}
}
}
}
},
{
"textfields": {
"path_match": "textfields.*",
"match_mapping_type" : "string",
"mapping": {
"type": "string",
"index": "not_analyzed",
"index_name": "{name}"
}
}
},
{
"strings": {
"path_match": "*",
"match_mapping_type" : "string",
"mapping": {
"type": "string",
"index": "not_analyzed"
}
}
}
]
}
}
}
Document
PUT http://localhost:9200/sandbox/mytype/1
{
"indexedfields":{
"hello":"Hello world",
"message":"The great balls of the world are on fire"
},
"textfields":{
"username":"User Name",
"projectname":"Project Name"
}
}
Search
POST http://localhost:9200/sandbox/mytype/_search
{
"query": {
"query_string": {
"query": "message:\"great balls\""
}
},
"filter":{
"query":{
"query_string":{
"query":"username:\"User Name\""
}
}
},
"from":0,
"size":10,
"sort":[
]
}
The search returns the following response:
{
"took":2,
"timed_out":false,
"_shards":{
"total":1,
"successful":1,
"failed":0
},
"hits":{
"total":1,
"max_score":0.19178301,
"hits":[
{
"_index":"sandbox",
"_type":"mytype",
"_id":"1",
"_score":0.19178301,
"_source":{
"indexedfields":{
"hello":"Hello world",
"message":"The great balls of the world are on fire"
},
"textfields":{
"username":"User Name",
"projectname":"Project Name"
}
}
}
]
}
}

Resources