Elasticsearch: Look for a term at the end of text

I'm indexing the following documents in Elasticsearch 5.x:
{
"id": 1
"description": "The quick brown fox"
}
{
"id": 2
"description": "The fox runs fast"
}
If I search for the word "fox" in these documents, I will get both of them.
How can I find documents whose description field ends with the word "fox"? In the above example, I am looking for the one with id=1.
If possible, I prefer to do this with a Query String.

A regexp query should work. Note that if you have to run "ends with" searches a lot, reindexing with a dedicated analyzer is the better option (https://discuss.elastic.co/t/elasticsearch-ends-with-word-in-phrases/60473):
"regexp": {
"description.keyword": {
"value": ".*fox"
}
}
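If you do go the analyzer route, here is a minimal sketch of that idea (the index, sub-field and analyzer names are my own assumptions, not taken from the linked thread): index the description into a sub-field analyzed with the keyword tokenizer plus the built-in reverse token filter, so that "ends with fox" becomes a cheap prefix query for the reversed term "xof".
PUT my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "reversed_keyword": {
          "tokenizer": "keyword",
          "filter": ["lowercase", "reverse"]
        }
      }
    }
  },
  "mappings": {
    "_doc": {
      "properties": {
        "description": {
          "type": "text",
          "fields": {
            "reversed": {
              "type": "text",
              "analyzer": "reversed_keyword"
            }
          }
        }
      }
    }
  }
}
GET my_index/_search
{
  "query": {
    "prefix": {
      "description.reversed": "xof"
    }
  }
}
You have to reverse the search term yourself ("fox" becomes "xof"), but in exchange the query avoids a leading wildcard entirely.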

Make sure your index mapping includes a keyword field for the description. For example:
PUT my_index
{
"mappings": {
"_doc": {
"properties": {
"description": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"id": {
"type": "long"
}
}
}
}
}
Add the documents:
POST my_index/_doc
{
"id": 1,
"description": "The quick brown fox"
}
POST my_index/_doc
{
"id": 2,
"description": "The fox runs fast"
}
And use a query string like this:
GET /my_index/_search
{
"query": {
"query_string" : {
"default_field" : "description.keyword",
"query" : "*fox"
}
}
}

Elasticsearch uses Lucene's regex engine, which doesn't support everything; also note that Lucene regular expressions are anchored, so the pattern has to match the entire value. To solve your particular problem, however, you can use a wildcard or a regexp query against the keyword sub-field like so:
GET /my_index/_search
{
"query": {
"wildcard": {
"description.keyword": {
"value": "*able"
}
}
}
}
GET /my_index/_search
{
"query": {
"regexp":{
"description.keyword": {
"value": "[a-zA-Z0-9]+fox",
"flags": "ALL",
"case_insensitive": true
}
}
}
}

Related

How to boost documents matching one of the query_string clauses

Elasticsearch newbie here. I'm trying to look up documents that have foo in their name, but I want to prioritize the ones that also have bar, i.e. those with bar should be at the top of the list. The result doesn't have the ones with bar at the top; boost doesn't seem to have any effect here, so I'm likely not understanding how boost works. I appreciate any help.
query: {
bool: {
should: [
{
query_string: {
query: `name:foo*bar*`,
boost: 5
}
},
{
query_string: {
query: `name:*foo*`,
}
}
]
}
}
Sample document structure:
{
"name": "foos, one two three",
"type": "car",
"age": 10
}
{
"name": "foos, one two bar three",
"type": "train",
"age": 30
}
Index mapping
{
"detail": {
"mappings": {
"properties": {
"category": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"name": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"servings": {
"properties": {
"name": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
}
}
}
}
}
Try switching the order of the clauses in the query, like so:
query: {
bool: {
should: [
{
query_string: {
query: `name:*foo*`,
}
},
{
query_string: {
query: `name:foo*bar*`,
boost: 5
}
}
]
}
}
It should work, but if it doesn't, you might need to do a nested search.
Search against the keyword field.
If you run only the first part of the query ("query": "name:foo*bar*"), you will see that it does not return anything. It searches against the generated tokens rather than the whole string.
The text "foos, one two bar three" produces tokens like ["foos","one","two","bar","three"], and the query looks for "foo*bar*" in individual tokens, hence no result. Keyword fields are stored as-is, so there the search happens against the entire text.
{
"query": {
"bool": {
"should": [
{
"query_string": {
"query": "name.keyword:foo*bar*",
"boost": 5
}
},
{
"query_string": {
"query": "name.keyword:*foo*"
}
}
]
}
}
}
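To see the generated tokens for yourself, you can run the text through the _analyze API (the index name detail is taken from the mapping above; the standard analyzer is the assumed default):
POST detail/_analyze
{
  "analyzer": "standard",
  "text": "foos, one two bar three"
}
It returns foos, one, two, bar and three as separate tokens, which is why foo*bar* can only match against name.keyword.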
Wildcard queries consume a lot of memory and don't scale well, so it is better to avoid them. If foo and bar appear at the start of words, you can use a prefix query:
{
"query": {
"bool": {
"should": [
{
"prefix": {
"name": "foo"
}
},
{
"prefix": {
"name": "bar"
}
}
]
}
}
}
You can also explore n-grams.
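For completeness, here is a rough sketch of the n-gram idea (the index, filter, analyzer and sub-field names are assumptions, and it uses a separate index so the existing detail mapping stays untouched): with an ngram sub-field, substring matches become ordinary match queries instead of wildcards.
PUT detail_ngram
{
  "settings": {
    "analysis": {
      "filter": {
        "trigrams": { "type": "ngram", "min_gram": 3, "max_gram": 3 }
      },
      "analyzer": {
        "trigram_analyzer": {
          "tokenizer": "standard",
          "filter": ["lowercase", "trigrams"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "name": {
        "type": "text",
        "fields": {
          "ngram": { "type": "text", "analyzer": "trigram_analyzer" }
        }
      }
    }
  }
}
GET detail_ngram/_search
{
  "query": {
    "bool": {
      "should": [
        { "match": { "name.ngram": { "query": "bar", "boost": 5 } } },
        { "match": { "name.ngram": "foo" } }
      ]
    }
  }
}
Since foo and bar are exactly three characters, each of them only needs to hit a single indexed trigram; longer search terms are split into several trigrams and scored accordingly.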

How can I get auto-suggestions for synonym matches in Elasticsearch

I'm using the code below, and it does not give an auto-suggestion such as curd when I type "cu".
But it does match the document with yogurt, which is correct.
How can I get both auto-completion for synonym words and a document match for the same?
PUT products
{
"settings": {
"index": {
"analysis": {
"analyzer": {
"synonym_analyzer": {
"tokenizer": "standard",
"filter": [
"lowercase",
"synonym_graph"
]
}
},
"filter": {
"synonym_graph": {
"type": "synonym_graph",
"synonyms": [
"yogurt, curd, dahi"
]
}
}
}
}
}
}
PUT products/_mapping
{
"properties": {
"description": {
"type": "text",
"analyzer": "synonym_analyzer"
}
}
}
POST products/_doc
{
"description": "yogurt"
}
GET products/_search
{
"query": {
"match": {
"description": "cu"
}
}
}
When you provide a list of synonyms in a synonym_graph filter, it simply means that ES will treat any of the synonyms interchangeably. But when they're analyzed via the standard analyzer, only full-word tokens will be produced:
POST products/_analyze?filter_path=tokens.token
{
"text": "yogurt",
"field": "description"
}
yielding:
{
"tokens" : [
{
"token" : "curd"
},
{
"token" : "dahi"
},
{
"token" : "yogurt"
}
]
}
As such, a regular match query won't cut it here because the standard analyzer hasn't provided it with enough context in terms of matchable substrings (n-grams).
In the meantime you can replace match with match_phrase_prefix, which does exactly what you're after -- matching an ordered sequence of characters while taking the synonyms into account:
GET products/_search
{
"query": {
"match_phrase_prefix": {
"description": "cu"
}
}
}
But that, as the query name suggests, is only going to work for prefixes. If you fancy an autocomplete that suggests terms regardless of where the substring matches occur, have a look at my other answer where I talk about leveraging n-grams.
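As a rough illustration of that n-gram route (a sketch with assumed index, filter and analyzer names, not the linked answer itself): expand the synonyms at index time with a plain synonym filter, chop each expanded term into edge n-grams, and search with a simple lowercase analyzer, so a partial input like cu can match the indexed curd token. This works cleanly here because all the synonyms are single words.
PUT products_autocomplete
{
  "settings": {
    "analysis": {
      "filter": {
        "description_synonyms": {
          "type": "synonym",
          "synonyms": ["yogurt, curd, dahi"]
        },
        "autocomplete_edge": {
          "type": "edge_ngram",
          "min_gram": 2,
          "max_gram": 10
        }
      },
      "analyzer": {
        "autocomplete_index": {
          "tokenizer": "standard",
          "filter": ["lowercase", "description_synonyms", "autocomplete_edge"]
        },
        "autocomplete_search": {
          "tokenizer": "standard",
          "filter": ["lowercase"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "description": {
        "type": "text",
        "analyzer": "autocomplete_index",
        "search_analyzer": "autocomplete_search"
      }
    }
  }
}
GET products_autocomplete/_search
{
  "query": {
    "match": {
      "description": "cu"
    }
  }
}
After indexing {"description": "yogurt"} into this index, searching for cu, da or yog should all return that document.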

Elasticsearch, find exact phrase in a field

I want to search for an exact phrase in a single field, and this is my approach:
"query": {
"match_phrase": {
"word": "developer"
}
}
but the point is that this query will find any document that has the keyword developer:
like "word": "developer" and "word": "php developer".
How can I create a query that, when I search for developer, returns only the "word": "developer" doc,
and when I search for php developer, returns the "word": "php developer" doc?
Thanks
Put simply, if your field word is of type keyword, you can make use of a Term Query as shown below:
POST <your_index_name>/_search
{
"query": {
"term" : { "word" : "developer" }
}
}
If you have word only as type text, I'd suggest you add a keyword sub-field as a multi-field; that way you can make use of word for text matches and word.keyword for exact matches.
PUT <your_index_name>
{
"mappings": {
"_doc": {
"properties": {
"word": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword"
}
}
}
}
}
}
}
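With that multi-field mapping in place, the exact match then goes against the keyword sub-field (the index name is a placeholder, as above):
POST <your_index_name>/_search
{
  "query": {
    "term": { "word.keyword": "php developer" }
  }
}
This returns only the document whose word field is exactly php developer, while a match query on word keeps the analyzed, per-token behaviour.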

Elasticsearch 5.X Percolate: How to autogenerate copy_to fields?

In ES 2.3.3, many queries in the system I'm working on use the _all field. Sometimes these are registered to a percolate index, and when running percolator on the doc, _all is generated automatically.
In converting to ES 5.X, _all is deprecated, so it has been replaced with a copy_to field that contains the components we actually care about, and that works great for those searches.
Registering the same query to a percolate index with the same document mapping, including copy_to fields, works fine. Sending a percolate query with the document, however, never results in a hit for a copy_to field.
Manually building the copy_to field via simple string concatenation seems to work; it's just that I'd expect to be able to go Query -> DocIndex and get the same result as Doc -> PercolateQuery... So I'm just looking for a way to have ES generate the copy_to fields automatically on a document being percolated.
It turned out there was nothing wrong with ES, of course; I'm posting this in case it helps someone else. I figured it out while attempting to put together a simpler example to post here. Basically, the issue came down to the fact that percolating a document of a type that doesn't exist in the percolate index doesn't return any errors, but seems to apply all percolate queries without applying any mappings. That was confusing because it worked for simple test cases but not for complex ones. Here's an example:
From the copy_to docs, generate an index with a copy_to mapping. See that a query to the copy_to field works.
PUT my_index
{
"mappings": {
"my_type": {
"properties": {
"first_name": {
"type": "text",
"copy_to": "full_name"
},
"last_name": {
"type": "text",
"copy_to": "full_name"
},
"full_name": {
"type": "text"
}
}
}
}
}
PUT my_index/my_type/1
{
"first_name": "John",
"last_name": "Smith"
}
GET my_index/_search
{
"query": {
"match": {
"full_name": {
"query": "John Smith",
"operator": "and"
}
}
}
}
Create a percolate index with the same type
PUT /my_percolate_index
{
"mappings": {
"my_type": {
"properties": {
"first_name": {
"type": "text",
"copy_to": "full_name"
},
"last_name": {
"type": "text",
"copy_to": "full_name"
},
"full_name": {
"type": "text"
}
}
},
"queries": {
"properties": {
"query": {
"type": "percolator"
}
}
}
}
}
Create a percolate query that matches on the copy_to field (the same query used in the search above), and a second query that just matches on a basic, unmodified field
PUT /my_percolate_index/queries/1?refresh
{
"query": {
"match": {
"full_name": {
"query": "John Smith",
"operator": "and"
}
}
}
}
PUT /my_percolate_index/queries/2?refresh
{
"query": {
"match": {
"first_name": {
"query": "John"
}
}
}
}
Search, but with the wrong type: there will be a hit on the basic field (first_name: John) even though no document mapping matches the request
GET /my_percolate_index/_search
{
"query" : {
"percolate" : {
"field" : "query",
"document_type" : "non_type",
"document" : {
"first_name": "John",
"last_name": "Smith"
}
}
}
}
{"took":7,"timed_out":false,"_shards":{"total":5,"successful":5,"skipped":0,"failed":0},"hits":{"total":1,"max_score":0.2876821,"hits":[{"_index":"my_percolate_index","_type":"queries","_id":"2","_score":0.2876821,"_source":{
"query": {
"match": {
"first_name": {
"query": "John"
}
}
}
}}]}}
Send in the correct document_type and see both matches as expected
GET /my_percolate_index/_search
{
"query" : {
"percolate" : {
"field" : "query",
"document_type" : "my_type",
"document" : {
"first_name": "John",
"last_name": "Smith"
}
}
}
}
{"took":7,"timed_out":false,"_shards":{"total":5,"successful":5,"skipped":0,"failed":0},"hits":{"total":2,"max_score":0.51623213,"hits":[{"_index":"my_percolate_index","_type":"queries","_id":"1","_score":0.51623213,"_source":{
"query": {
"match": {
"full_name": {
"query": "John Smith",
"operator": "and"
}
}
}
}},{"_index":"my_percolate_index","_type":"queries","_id":"2","_score":0.2876821,"_source":{
"query": {
"match": {
"first_name": {
"query": "John"
}
}
}
}}]}}

Elasticsearch - Multi Word Exact Match

I want to adjust the following query so it exactly matches multiple words. Whenever I try this, it seems to tokenize the strings and then search. How can I specify that a particular substring must be an exact match?
{
"query": {
"query_string": {
"query": "string OR string2 OR this is my multi word string",
"fields": ["title","description"]
}
}
}
My mapping is as follows:
{
"indexname": {
"properties": {
"title": {
"type": "multi_field",
"fields": {
"title": {"type": "string"},
"original": {"type" : "string", "index" : "not_analyzed"}
}
},
"location": {
"type": "geo_point"
}
}
}
}
By default, the query_string and match queries are analyzed, so use a term query. Unfortunately, we cannot use multiple fields in a term query, so use a bool query for that. Please try the query below:
{
"query": {
"bool": {
"must": [
{
"term": {
"title": {
"value": "string OR string2 OR this is my multi word string"
}
}
},
{
"term": {
"description": {
"value": "string OR string2 OR this is my multi word string"
}
}
}
]
}
}
}
Hope it helps!
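One caveat, since title is analyzed: a term query against it has to match a single token, so a long multi-word value like the one above will not match there. Given that the question's mapping already defines a not_analyzed original sub-field, a sketch of an alternative (field names taken from that mapping; the values are only examples) is to send the loose words to the analyzed field and the exact phrase to title.original:
{
  "query": {
    "bool": {
      "should": [
        { "match": { "title": "string string2" } },
        { "term": { "title.original": "this is my multi word string" } }
      ]
    }
  }
}
The same pattern would apply to description if it also had a not_analyzed sub-field.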
