Elasticsearch 5.X Percolate: How to autogenerate copy_to fields? - elasticsearch

In ES 2.3.3, many queries in the system I'm working on use the _all field. Sometimes these are registered to a percolate index, and when running percolator on the doc, _all is generated automatically.
In converting to ES 5.X _all is being deprecated and so _all has been replaced with a copy_to field that contains the components that we actually care about, and it works great for those searches.
Registering the same query to a percolate index with the same document mapping including copy_to fields works fine. Sending a percolate query with the document never results in a hit for a copy_to field however.
Manually building the copy_to field via simple string concatenation seems to work, it's just that I'd expect to be able to Query -> DocIndex and get the same result as Doc -> PercolateQuery... So I'm just looking for a way to have ES generate the copy_to fields automatically on a document being percolated.

Ended up there was nothing wrong with ES of course, posting here in case it helps someone else. Figured it out while attempting to generate a simpler example to post here with details... Basically the issue came down to the fact that attempting to percolate a document of a type that doesn't exist in the percolate index doesn't give any errors back, but seems to apply all percolate queries without applying any mappings which was just confusing as it worked for simple test cases, but not complex ones. Here's an example:
From the copy_to docs, generate an index with a copy_to mapping. See that a query to the copy_to field works.
PUT my_index
{
"mappings": {
"my_type": {
"properties": {
"first_name": {
"type": "text",
"copy_to": "full_name"
},
"last_name": {
"type": "text",
"copy_to": "full_name"
},
"full_name": {
"type": "text"
}
}
}
}
}
PUT my_index/my_type/1
{
"first_name": "John",
"last_name": "Smith"
}
GET my_index/_search
{
"query": {
"match": {
"full_name": {
"query": "John Smith",
"operator": "and"
}
}
}
}
Create a percolate index with the same type
PUT /my_percolate_index
{
"mappings": {
"my_type": {
"properties": {
"first_name": {
"type": "text",
"copy_to": "full_name"
},
"last_name": {
"type": "text",
"copy_to": "full_name"
},
"full_name": {
"type": "text"
}
}
},
"queries": {
"properties": {
"query": {
"type": "percolator"
}
}
}
}
}
Create a percolate query that matches our other percolate query on the copy_to field, and a second query that just queries on a basic unmodified field
PUT /my_percolate_index/queries/1?refresh
{
"query": {
"match": {
"full_name": {
"query": "John Smith",
"operator": "and"
}
}
}
}
PUT /my_percolate_index/queries/2?refresh
{
"query": {
"match": {
"first_name": {
"query": "John"
}
}
}
}
Search, but with the wrong type... there will be a hit on the basic field (first_name: John) even though no document mappings match the request
GET /my_percolate_index/_search
{
"query" : {
"percolate" : {
"field" : "query",
"document_type" : "non_type",
"document" : {
"first_name": "John",
"last_name": "Smith"
}
}
}
}
{"took":7,"timed_out":false,"_shards":{"total":5,"successful":5,"skipped":0,"failed":0},"hits":{"total":1,"max_score":0.2876821,"hits":[{"_index":"my_percolate_index","_type":"queries","_id":"2","_score":0.2876821,"_source":{
"query": {
"match": {
"first_name": {
"query": "John"
}
}
}
}}]}}
Send in the correct document_type and see both matches as expected
GET /my_percolate_index/_search
{
"query" : {
"percolate" : {
"field" : "query",
"document_type" : "my_type",
"document" : {
"first_name": "John",
"last_name": "Smith"
}
}
}
}
{"took":7,"timed_out":false,"_shards":{"total":5,"successful":5,"skipped":0,"failed":0},"hits":{"total":2,"max_score":0.51623213,"hits":[{"_index":"my_percolate_index","_type":"queries","_id":"1","_score":0.51623213,"_source":{
"query": {
"match": {
"full_name": {
"query": "John Smith",
"operator": "and"
}
}
}
}},{"_index":"my_percolate_index","_type":"queries","_id":"2","_score":0.2876821,"_source":{
"query": {
"match": {
"first_name": {
"query": "John"
}
}
}
}}]}}

Related

How to exclude certain fields from the _source field in ElasticSearch

I have following simple snippet:
PUT /lib36
{
"mappings": {
"_source": {"enabled": false},
"properties": {
"name": {
"type": "text"
},
"description":{
"type": "text"
}
}
}
}
PUT /lib36/_doc/1
{
"name":"abc",
"description":"xyz"
}
POST /lib36/_search
{
"query": {
"match": {
"name": "abc"
}
}
}
With "_source": {"enabled": false}, the queried result doesn't include _source field.
I would to know how to write the query that the query result has the _source field ,but only contain name field, but not description field.
Thanks!
You can use the source filtering of elasticsearch, but for that first you need to have _source enabled in your mapping.
You need to have below key-value in your search query JSON. below search will exclude all other fields apart from name.
{
"query": {
"match": {
"name": "abc"
}
},
"_source": [
"name"
],
}

Elasticsearch - Mapping fields from other indices

How can I define mapping in Elasticsearch 7 to index a document with a field value from another index? For example, if I have a users index which has a mapping for name, email and account_number but the account_number value is actually in another index called accounts in field number.
I've tried something like this without much success (I only see "name", "email" and "account_id" in the results):
PUT users/_mapping
{
"properties": {
"name": {
"type": "text"
},
"email": {
"type": "text"
},
"account_id": {
"type": "integer"
},
"accounts": {
"properties": {
"number": {
"type": "text"
}
}
}
}
}
The accounts index has the following mapping:
{
"properties": {
"name": {
"type": "text"
},
"number": {
"type": "text"
}
}
}
As I understand it, you want to implement field joining as is usually done in relational databases. In elasticsearch, this is possible only if the documents are in the same index. (Link to doc). But it seems to me that in your case you need to work differently, I think your Account object needs to be nested for User.
PUT /users/_mapping
{
"mappings": {
"properties": {
"account": {
"type": "nested"
}
}
}
}
You can further search as if it were a separate document.
GET /users/_search
{
"query": {
"nested": {
"path": "account",
"query": {
"bool": {
"must": [
{ "match": { "account.number": 1 } }
]
}
}
}
}
}

Elasticsearch: Look for a term at the end of text

I'm indexing the following documents in Elasticsearch 5.x:
{
"id": 1
"description": "The quick brown fox"
}
{
"id": 2
"description": "The fox runs fast"
}
If I search for the word "fox" in these documents, I will get both of them.
How can I find documents that their description field ends with the word "fox"? In the above example, I am looking the one with id=1
If possible, I prefer to do this with a Query String.
Well regex should work. Note if you have to look for endwith a lot of time, reindex using a analyzer will be the best (https://discuss.elastic.co/t/elasticsearch-ends-with-word-in-phrases/60473)
"regexp":{
"id": {
"value": "*fox",
}
}
Make sure your index mapping includes a keyword field for the description. For example:
PUT my_index
{
"mappings": {
"_doc": {
"properties": {
"description": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"id": {
"type": "long"
}
}
}
}
}
Add the documents:
POST my_index/_doc
{
"id": 1,
"description": "The quick brown fox"
}
POST my_index/_doc
{
"id": 2,
"description": "The fox runs fast"
}
And use a query string like this:
GET /my_index/_search
{
"query": {
"query_string" : {
"default_field" : "description.keyword",
"query" : "*fox"
}
}
}
Elastic uses Lucene's Regex engine, which doesn't support everything. To solve your particular problem however, you can use wildcard search or regexp search like so:
GET /products/_search
{
"query": {
"wildcard": {
"description.keyword": {
"value": "*able"
}
}
}
}
GET /products/_search
{
"query": {
"regexp":{
"description.keyword": {
"value": "[a-zA-Z0-9]+fox",
"flags": "ALL",
"case_insensitive": true
}
}
}
}

How to force a terms filter to ignore stopwords?

I have an Elasticsearch index with a bunch of fields, some of which I want to use along with the default stopword list. On the other hand, I have a username field which should return results for users called the, be etc.
Of course, when I run the following query:
{
"query": {
"constant_score": {
"filter": {
"terms": {
"username": [
"be"
]
}
}
}
}
}
nothing is returned. I have seen various solutions for changing the standard analyzer to remove stopwords, but am struggling to find how I would do so for this one field only. Thanks for any pointers.
You can do it like the following: add a custom analyzer that shouldn't use stopwords and then explicitly specify this analyzer just for those fields you want stopwords to be recognized (like your username field).
PUT /stopwords
{
"settings": {
"analysis": {
"analyzer": {
"my_english": {
"type": "english",
"stopwords": "_none_"
}
}
}
},
"mappings": {
"text": {
"properties": {
"title": {
"type": "string"
},
"content": {
"type": "string"
},
"username": {
"type": "string",
"analyzer": "my_english"
}
}
}
}
}

Why prefix returns documents without the specific prefix?

I want to return only documents which their name start with "pizza". this is what I've done:
{
"query": {
"filtered": {
"filter": {
"prefix": {
"name": "pizza"
}
}
}
}
}
But I've got these 3 documents:
{
"name": "Viana Pizza",
"city": "Mashhad",
"address": "Vakil abad",
"foods": ["Pizza"],
"salad": true,
"rate": 5.0
}
{
"name": "Pizza Pizza",
"city": "Mashhad",
"address": "Bahar st",
"foods": ["Pizza"],
"salad": true,
"rate": 8.5
}
{
"name": "Reza Pizza",
"city": "Tehran",
"address": "Vali Asr",
"foods": ["Pizza"],
"salad": true,
"rate": 7.5
}
As you can see, Only one of them has "pizza" in the beginning of the name field.
What's wrong?
Probably, the simplest explanation given that you didn't provide the actual mapping, is that you have th e "name" field as "string" and "analyzed" (the default). Which means that "Reza Pizza" will be transformed to "reza" and "pizza" terms.
And your filter will match against terms, not against entire fields. Because ES analyzes the fields and forms terms when the standard mapping is used.
You need to either change your "name" field to "not_analyzed" or add another field to mirror the "name" but this mirror field to be "not_analyzed". Also, for text "pizza" (lowercase) to work in this case you need to create a custom analyzer.
Below you have the solution with the mirror field:
PUT /pizza
{
"settings": {
"analysis": {
"analyzer": {
"my_keyword_lowercase_analyzer": {
"type": "custom",
"tokenizer": "keyword",
"filter": ["lowercase"]
}
}
}
},
"mappings": {
"restaurant": {
"properties": {
"name": {
"type": "string",
"fields": {
"raw": {
"type": "string",
"analyzer": "my_keyword_lowercase_analyzer"
}
}
}
}
}
}
}
And in searching you need to use the mirror field:
GET /pizza/restaurant/_search
{
"query": {
"filtered": {
"filter": {
"prefix": {
"name.raw": "pizza"
}
}
}
}
}
That's all about Elasticsearch analyzers. Let's read the documentation on prefix filter:
Filters documents that have fields containing terms with a specified prefix (not analyzed).
Here we can see that this filter matches terms, not the whole field value. When you index the document, ES splits your field values to terms using analyzers. Default analyzer splits value by whitespace and convert parts to lowercse. So all three results have term pizza in the name field and pizza term perfectly matches pizza prefix. If you want to match field value as is - I'd suggest you to map name field as not_analyzed

Resources