Elasticsearch completion suggester matching multiple inputs

I have an issue with the ES completion suggester. I have the following index mapping:
curl -XPUT localhost:9200/test_index/ -d '{
"mappings": {
"item": {
"properties": {
"test_suggest": {
"type": "completion",
"index_analyzer": "whitespace",
"search_analyzer": "whitespace",
"payloads": false
}
}
}
}
}'
I index some names like so:
curl -X PUT 'localhost:9200/test_index/item/1?refresh=true' -d '{
"test_suggest" : {
"input": [ "John", "Smith" ],
"output": "John Smith",
"weight" : 34
}
}'
curl -X PUT 'localhost:9200/test_index/item/2?refresh=true' -d '{
"test_suggest" : {
"input": [ "John", "Doe" ],
"output": "John Doe",
"weight" : 34
}
}'
Now, if I call suggest with only the first name, John, it works fine:
curl -XPOST localhost:9200/test_index/_suggest -d '{
"test_suggest":{
"text":"john",
"completion": {
"field" : "test_suggest"
}
}
}'
Same works for last names:
curl -XPOST localhost:9200/test_index/_suggest -d '{
"test_suggest":{
"text":"doe",
"completion": {
"field" : "test_suggest"
}
}
}'
Even searching for parts of last or first names works fine:
curl -XPOST localhost:9200/test_index/_suggest -d '{
"test_suggest":{
"text":"sm",
"completion": {
"field" : "test_suggest"
}
}
}'
However, when I search for text that includes part or all of the second word (the last name), I get no suggestions. None of the calls below work:
curl -XPOST localhost:9200/test_index/_suggest -d '{
"test_suggest":{
"text":"john d",
"completion": {
"field" : "test_suggest"
}
}
}'
curl -XPOST localhost:9200/test_index/_suggest -d '{
"test_suggest":{
"text":"john doe",
"completion": {
"field" : "test_suggest"
}
}
}'
curl -XPOST localhost:9200/test_index/_suggest -d '{
"test_suggest":{
"text":"john smith",
"completion": {
"field" : "test_suggest"
}
}
}'
How can I achieve this without having to put the input into a single text field? I want completion to match on first and/or last names.

You should do this:
curl -X PUT 'localhost:9200/test_index/item/1?refresh=true' -d '{
"test_suggest" : {
"input": [ "John", "Smith", "John Smith" ],
"output": "John Smith",
"weight" : 34
}
}'
That is, add every combination of terms you want to match to the input.
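With many names, the inputs can be generated rather than written by hand. Below is a minimal sketch of a hypothetical Bash helper, build_suggest_input (not part of Elasticsearch), that emits each word of a name plus the full name as a JSON array, i.e. the same input list as above. It assumes names contain no double quotes or backslashes, since it does no JSON escaping.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the completion "input" array for a name.
# Assumes the name contains no double quotes or backslashes (no JSON escaping).
build_suggest_input() {
  local full="$1" json="[" sep="" e
  local -a words entries
  # Split on whitespace without glob expansion.
  read -r -a words <<< "$full"
  entries=("${words[@]}")
  # Also add the whole name (words re-joined by single spaces) if it has 2+ words.
  if (( ${#words[@]} > 1 )); then
    entries+=("${words[*]}")
  fi
  for e in "${entries[@]}"; do
    json+="${sep}\"${e}\""
    sep=","
  done
  printf '%s]\n' "$json"
}

# Build the request body for document 1 of the example.
body=$(printf '{"test_suggest":{"input":%s,"output":"%s","weight":34}}' \
  "$(build_suggest_input "John Smith")" "John Smith")
printf '%s\n' "$body"
# Send it to a running node:
# curl -X PUT 'localhost:9200/test_index/item/1?refresh=true' -d "$body"
```

A single-word name produces just that word, so no duplicate input is added.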

I faced the same problem and used something like this:
curl -XPOST localhost:9200/test_index/_suggest -d '{
"test_suggest":{
"text":["john", "smith"],
"completion": {
"field" : "test_suggest"
}
}
}'

Related

Making aggregations in two different types and return it grouped in Elasticsearch

Having this mapping with two types, items_one and items_two:
curl -XPUT 'localhost:9200/tester?pretty=true' -d '{
"mappings": {
"items_one": {
"properties" : {
"type" : {"type": "string",
"index": "not_analyzed"}
}},
"items_two": {
"properties" : {
"other_type" : { "type": "string",
"index": "not_analyzed"}
}}}}'
I put two items into items_one:
curl -XPUT 'localhost:9200/tester/items_one/1?pretty=true' -d '{
"type": "Bank transfer"
}'
curl -XPUT 'localhost:9200/tester/items_one/2?pretty=true' -d '{
"type": "PayPal"
}'
... and another two in items_two:
curl -XPUT 'localhost:9200/tester/items_two/1?pretty=true' -d '{
"other_type": "Cash"
}'
curl -XPUT 'localhost:9200/tester/items_two/2?pretty=true' -d '{
"other_type": "No pay"
}'
How can I run the aggregation over two different fields and return the results grouped together?
I know I can get it from one field doing:
curl -XGET 'localhost:9200/tester/_search?pretty=true' -d '{
"size": 0,
"aggs": {
"paying_types": {
"terms": {
"field": "type"
}
}
}
}'
But I can't make it "multi-field" with something like this (which does not work):
curl -XGET 'localhost:9200/tester/_search?pretty=true' -d '{
"size": 0,
"aggs": {
"paying_types": {
"terms": {
"field": ["type", "other_type"]
}
}
}
}'
My desired output should be:
"aggregations" : {
"paying_types" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [ {
"key" : "Bank transfer",
"doc_count" : 1
}, {
"key" : "PayPal",
"doc_count" : 1
}, {
"key" : "Cash",
"doc_count" : 1
}, {
"key" : "No pay",
"doc_count" : 1
} ]
}
}
}
Thanks in advance
Finally solved it. A script will do the trick:
curl -XGET 'localhost:9200/tester/_search?pretty=true' -d '{
"size": 0,
"aggs": {
"paying_types": {
"terms": {
"script": "doc[\"type\"].values + doc[\"other_type\"].values"
}
}
}
}'
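A single quote inside a single-quoted shell string ends that string early, which is easy to trip over when a script uses doc['field']. One way to sidestep this, sketched below under the assumption that the request is sent from Bash, is to read the body from a quoted heredoc so the shell passes it through untouched.

```shell
#!/usr/bin/env bash
# A quoted delimiter ('EOF') disables all expansion inside the heredoc, so the
# single quotes in doc['type'] reach Elasticsearch unchanged.
body=$(cat <<'EOF'
{
  "size": 0,
  "aggs": {
    "paying_types": {
      "terms": {
        "script": "doc['type'].values + doc['other_type'].values"
      }
    }
  }
}
EOF
)
printf '%s\n' "$body"
# Send it to a running node (inline scripting may have to be enabled first,
# depending on the Elasticsearch version):
# curl -XGET 'localhost:9200/tester/_search?pretty=true' -d "$body"
```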

Exclude a field from an Elasticsearch query

Having the following mapping:
curl -XPUT 'localhost:9200/testidx?pretty=true' -d '{
"mappings": {
"items": {
"dynamic": "strict",
"properties" : {
"title" : { "type": "string" },
"body" : { "type": "string" }
}}}}'
I put two items into it:
curl -XPUT 'localhost:9200/testidx/items/1' -d '{
"title": "Titulo anterior",
"body": "blablabla blablabla blablabla blablabla blablabla blablabla"
}'
curl -XPUT 'localhost:9200/testidx/items/2' -d '{
"title": "Joselr",
"body": "Titulo stuff more stuff"
}'
Now I want to search for the word Titulo in every field except body, so (following this post) I do:
curl -XGET 'localhost:9200/testidx/items/_search?pretty=true' -d '{
"query" : {
"query_string": {
"query": "Titulo"
}},
"_source" : {
"exclude" : ["*.body"]
}
}'
It's supposed to return only item 1, since the second item contains the word Titulo only in body, which is the field I want to ignore. How can I achieve this?
PS: This is just a simple example; my real mapping has a lot of properties, and I want to ignore some of them in some searches.
PS2: I'm using ES 2.3.2
The _source/exclude setting only controls whether the body field is returned in the response; it does not exclude that field from being searched.
What you can do instead is list every field you want to search in the "fields" parameter (a whitelist approach):
curl -XGET 'localhost:9200/testidx/items/_search?pretty=true' -d '{
"query" : {
"query_string": {
"fields": ["title", "field2", "field3"],
"query": "Titulo"
}},
"_source" : {
"exclude" : ["*.body"]
}
}'
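With a large mapping, typing the whitelist by hand gets tedious. A minimal sketch, assuming you already have the list of field names (for example, copied from the mapping): the hypothetical Bash helper whitelist_fields below drops the excluded names and prints the "fields" array for query_string.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not an Elasticsearch feature): print a JSON array of every
# field in $1 except those listed in $2. Both arguments are space-separated names.
whitelist_fields() {
  local -a all skip
  local json="[" sep="" f s keep
  read -r -a all <<< "$1"
  read -r -a skip <<< "$2"
  for f in "${all[@]}"; do
    keep=1
    for s in "${skip[@]}"; do
      # Quoted right-hand side: literal comparison, not a glob pattern.
      if [[ "$f" == "$s" ]]; then keep=0; break; fi
    done
    if (( keep )); then
      json+="${sep}\"${f}\""
      sep=","
    fi
  done
  printf '%s]\n' "$json"
}

# Build the query_string request for the example, searching everything but body.
fields=$(whitelist_fields "title body" "body")
body=$(printf '{"query":{"query_string":{"fields":%s,"query":"Titulo"}}}' "$fields")
printf '%s\n' "$body"
# curl -XGET 'localhost:9200/testidx/items/_search?pretty=true' -d "$body"
```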
Another thing you can do is explicitly specify that body must not match, by adding -body:Titulo to the query:
curl -XGET 'localhost:9200/testidx/items/_search?pretty=true' -d '{
"query" : {
"query_string": {
"query": "Titulo AND -body:Titulo"
}},
"_source" : {
"exclude" : ["*.body"]
}
}'
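Before Elasticsearch 6.0.0, you can also set "include_in_all": false on the field's properties in the mapping, see e.g. https://www.elastic.co/guide/en/elasticsearch/reference/5.5/include-in-all.html. This keeps the field out of _all, which query_string searches by default when no fields are given.
(This, of course, requires reindexing the data.)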

Elasticsearch parent with the same type

Sorry if this is a duplicate (I did try searching), or if this is a silly question. New to posting questions.
I am trying to do parent child relations and queries in ElasticSearch with the following:
#!/bin/bash
curl -XDELETE 'http://localhost:9200/test/'
echo
curl -XPUT 'http://localhost:9200/test/' -d '{
"settings" : {
"index" : {
"number_of_shards" : 1
}
}
}'
echo
curl -XPUT localhost:9200/test/_mapping/nelement -d '{
"nelement" : {
"_id" : { "path" : "nid", "store" : true, "index" : "not_analyzed"},
"_parent" : { "type" : "nelement"},
"properties" : {
"name" : { "type" : "string", "index" : "not_analyzed" },
"nid": { "type" : "string", "copy_to" : "_id" }
}
}
}'
echo
#curl -s -XPOST localhost:9200/_bulk --data-binary @test_data.json
test_data.json is as follows:
{"index":{"_index":"test","_type":"element", "_parent":"abc"}}
{"nid":"1a","name":"parent1"}
{"index":{"_index":"test","_type":"element", "_parent":"1a"}}
{"nid":"2b","name":"child1"}
{"index":{"_index":"test","_type":"element", "_parent":"2b"}}
{"nid":"2c","name":"child2"}
curl -XGET 'localhost:9200/test/nelement/_search?pretty=true' -d '{
"query": {
"has_child": {
"child_type": "nelement",
"query": {
"match": {
"nid": "2c"
}
}
}
}
}'
echo
echo
curl -XGET 'localhost:9200/test/nelement/_search?pretty=true' -d '{
"query": {
"has_parent": {
"type": "nelement",
"query": {
"term": {
"nid": "2b"
}
}
}
}
}'
For some reason, my search queries return no results. I have confirmed that the objects are indexed.
Your queries return nothing because you are using a self-referential parent/child relationship (the parent and the child are the same type, and you query that same type).
Elasticsearch does not currently support this.
See "Explore parent/child self referential support".

Best way to search/index the data - with and without whitespace

I am having a problem indexing and searching for words that may or may not contain whitespace. Below is an example.
Here is how the mappings are set up:
curl -s -XPUT 'localhost:9200/test' -d '{
"mappings": {
"name": {
"properties": {
"street": {
"type": "string",
"index_analyzer": "index_ngram",
"search_analyzer": "search_ngram"
}
}
}
},
"settings": {
"analysis": {
"filter": {
"desc_ngram": {
"type": "edgeNGram",
"min_gram": 3,
"max_gram": 20
}
},
"analyzer": {
"index_ngram": {
"type": "custom",
"tokenizer": "keyword",
"filter": [ "desc_ngram", "lowercase" ]
},
"search_ngram": {
"type": "custom",
"tokenizer": "keyword",
"filter": "lowercase"
}
}
}
}
}'
This is how I built the index:
curl -s -XPUT 'localhost:9200/test/name/1' -d '{ "street": "Lakeshore Dr" }'
curl -s -XPUT 'localhost:9200/test/name/2' -d '{ "street": "Sunnyshore Dr" }'
curl -s -XPUT 'localhost:9200/test/name/3' -d '{ "street": "Lake View Dr" }'
curl -s -XPUT 'localhost:9200/test/name/4' -d '{ "street": "Shore Dr" }'
Here is an example of the query that is not working correctly:
curl -s -XGET 'localhost:9200/test/_search?pretty=true' -d '{
"query":{
"bool":{
"must":[
{
"match":{
"street":{
"query":"lake shore dr",
"type":"boolean"
}
}
}
]
}
}
}';
If a user attempts to search for "Lake Shore Dr", I want to only match to document 1/"Lakeshore Dr"
If a user attempts to search for "Lakeview Dr", I want to only match to document 3/"Lake View Dr"
So is the issue with how I set up the mappings (tokenizer? edge n-grams vs. n-grams? n-gram sizes?) or with the query? I have tried things like setting minimum_should_match and the analyzer to use, but I have not been able to get the desired results.
Thanks all.

How to match on prefix in Elasticsearch

Let's say that my Elasticsearch index has a field called "dots" which contains a string of punctuation-separated words (e.g. "first.second.third").
I need to search for e.g. "first.second" and then get all entries whose "dots" field contains a string being exactly "first.second" or starting with "first.second.".
I have a problem understanding how the text querying works, at least I have not been able to create a query which does the job.
Elasticsearch has the Path Hierarchy Tokenizer, which was created exactly for this use case. Here is an example of how to set it up for your index:
# Create a new index with custom path_hierarchy analyzer
# See http://www.elasticsearch.org/guide/reference/index-modules/analysis/pathhierarchy-tokenizer.html
# Note: the old "index_analyzer" setting is deprecated; here "analyzer" is combined with "search_analyzer".
curl -XPUT "localhost:9200/prefix-test" -d '{
"settings": {
"analysis": {
"analyzer": {
"prefix-test-analyzer": {
"type": "custom",
"tokenizer": "prefix-test-tokenizer"
}
},
"tokenizer": {
"prefix-test-tokenizer": {
"type": "path_hierarchy",
"delimiter": "."
}
}
}
},
"mappings": {
"doc": {
"properties": {
"dots": {
"type": "string",
"analyzer": "prefix-test-analyzer",
"search_analyzer": "keyword"
}
}
}
}
}'
echo
# Put some test data
curl -XPUT "localhost:9200/prefix-test/doc/1" -d '{"dots": "first.second.third"}'
curl -XPUT "localhost:9200/prefix-test/doc/2" -d '{"dots": "first.second.foo-bar"}'
curl -XPUT "localhost:9200/prefix-test/doc/3" -d '{"dots": "first.baz.something"}'
curl -XPOST "localhost:9200/prefix-test/_refresh"
echo
# Test searches.
curl -XPOST "localhost:9200/prefix-test/doc/_search?pretty=true" -d '{
"query": {
"term": {
"dots": "first"
}
}
}'
echo
curl -XPOST "localhost:9200/prefix-test/doc/_search?pretty=true" -d '{
"query": {
"term": {
"dots": "first.second"
}
}
}'
echo
curl -XPOST "localhost:9200/prefix-test/doc/_search?pretty=true" -d '{
"query": {
"term": {
"dots": "first.second.foo-bar"
}
}
}'
echo
curl -XPOST "localhost:9200/prefix-test/doc/_search?pretty=true&q=dots:first.second"
echo
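To see why the term queries above work, it helps to look at the tokens stored for each document. The following is a rough local emulation in Bash, not Elasticsearch's implementation, of what a path_hierarchy tokenizer with "delimiter": "." emits; the function name path_hierarchy_tokens is made up for illustration, and edge cases such as leading or repeated delimiters are ignored.

```shell
#!/usr/bin/env bash
# Simplified local emulation (not Elasticsearch code) of the tokens a
# path_hierarchy tokenizer with "delimiter": "." produces for one value.
path_hierarchy_tokens() {
  local value="$1" prefix="" part
  local -a parts
  IFS='.' read -r -a parts <<< "$value"
  for part in "${parts[@]}"; do
    prefix="${prefix:+$prefix.}$part"   # append ".part" after the first segment
    printf '%s\n' "$prefix"
  done
}

# Each indexed value gets all of its prefixes as tokens, so an exact-token
# match on "first.second" hits documents 1 and 2 but not 3.
for doc in first.second.third first.second.foo-bar first.baz.something; do
  if path_hierarchy_tokens "$doc" | grep -qxF 'first.second'; then
    printf 'match: %s\n' "$doc"
  fi
done
```

Because search_analyzer is keyword, the search text "first.second" stays a single token and must equal one of these stored prefixes.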
There is also a much easier way, as pointed out in the Elasticsearch documentation.
Just use:
{
"text_phrase_prefix" : {
"fieldname" : "yourprefix"
}
}
or since 0.19.9:
{
"match_phrase_prefix" : {
"fieldname" : "yourprefix"
}
}
instead of:
{
"prefix" : {
"fieldname" : "yourprefix"
}
}
Have a look at prefix queries.
$ curl -XGET 'http://localhost:9200/index/type/_search' -d '{
"query" : {
"prefix" : { "dots" : "first.second" }
}
}'
You should use wildcard characters in your query, something like this:
$ curl -XGET http://localhost:9200/myapp/index/_search -d '{
"query": { "query_string": { "query": "dots:first.second*" } }
}'
More examples of the syntax are at: http://lucene.apache.org/core/old_versioned_docs/versions/2_9_1/queryparsersyntax.html
I was looking for a similar solution, but one that matches only a prefix. @imtov's answer got me almost there, with one change: switching the analyzers around:
"mappings": {
"doc": {
"properties": {
"dots": {
"type": "string",
"analyzer": "keyword",
"search_analyzer": "prefix-test-analyzer"
}
}
}
}
instead of
"mappings": {
"doc": {
"properties": {
"dots": {
"type": "string",
"index_analyzer": "prefix-test-analyzer",
"search_analyzer": "keyword"
}
}
}
}
This way, adding:
'{"dots": "first.second"}'
'{"dots": "first.third"}'
indexes only these full values as tokens, without also indexing prefix tokens such as first.
Yet searching for either
first.second.anyotherstring
first.second
will correctly return only the first entry:
'{"dots": "first.second"}'
It's not exactly what you asked for, but it's related, so I thought it could help someone.
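The reversed setup can be checked the same way. Below is a rough Bash emulation, again not Elasticsearch code and with a made-up function name (reverse_prefix_match): stored values are kept whole, as the keyword analyzer would, while the search text is split into path_hierarchy prefixes, and a stored value matches when it equals one of those prefixes. It assumes an OR-style match query, where any one query token is enough.

```shell
#!/usr/bin/env bash
# Simplified local emulation (not Elasticsearch code) of the reversed setup:
# values are indexed whole (keyword analyzer), while the search text is split
# into path_hierarchy prefixes. A stored value matches when it equals a prefix.
# Requires Bash 4+ for the associative array.
reverse_prefix_match() {
  local query="$1" prefix="" part value
  shift
  local -a parts
  local -A wanted=()
  IFS='.' read -r -a parts <<< "$query"
  for part in "${parts[@]}"; do
    prefix="${prefix:+$prefix.}$part"
    wanted["$prefix"]=1
  done
  # Remaining arguments are the stored values; print those that match.
  for value in "$@"; do
    if [[ -n "${wanted[$value]:-}" ]]; then
      printf '%s\n' "$value"
    fi
  done
}

reverse_prefix_match first.second.anyotherstring first.second first.third
```

Searching for just first matches nothing here, because no stored value is the bare token first.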
