Unexpected result of Elastic term query - elasticsearch

I have Elasticsearch 2.4 running on http://localhost:9200, for testing only.
Setup
As a fresh start, I created exactly one document in the index.
$ curl -s -XPUT "http://localhost:9200/movies/movie/1" -d'
{
"title": "The Godfather",
"director": "Francis Ford Coppola",
"year": 1972,
"genres": ["Crime", "Drama"]
}'
Returns
{"_index":"movies","_type":"movie","_id":"1","_version":3,"_shards":{"total":2,"successful":1,"failed":0},"created":false}
I then run this command to confirm the index works:
$ curl -s -XPOST "http://localhost:9200/movies/_search" -d'
{
"query": {
"query_string": {
"query": "Godfather"
}
}
}'
Returns
{"took":8,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},"hits":{"total":1,"max_score":0.095891505,"hits":[{"_index":"movies","_type":"movie","_id":"1","_score":0.095891505,"_source":
{
"title": "The Godfather",
"director": "Francis Ford Coppola",
"year": 1972,
"genres": ["Crime", "Drama"]
}}]}}
The Problem
I tried to run a term query like this:
$ curl -s -XPOST "http://localhost:9200/movies/_search" -d'
{
"query": {
"term": {"title": "The Godfather"}
}
}'
I expected to get 1 result; instead I got this:
{"took":1,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},"hits":{"total":0,"max_score":null,"hits":[]}}
What did I get wrong?

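Your title field is analyzed by the standard analyzer, so "The Godfather" is indexed as the two lowercase terms "the" and "godfather". A term query is not analyzed: it looks for the exact term "The Godfather", which does not exist in the index. Either use match_phrase, as jay suggested, or create a not_analyzed sub-field (e.g. title.raw), like this: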
$ curl -s -XPUT "http://localhost:9200/movies/_mapping/movie" -d'
{
"properties": {
"title": {
"type": "string",
"fields": {
"raw": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
}'
Then reindex your document so that title.raw gets populated:
$ curl -s -XPUT "http://localhost:9200/movies/movie/1" -d'
{
"title": "The Godfather",
"director": "Francis Ford Coppola",
"year": 1972,
"genres": ["Crime", "Drama"]
}'
And finally, your term query will work on the title.raw sub-field:
$ curl -s -XPOST "http://localhost:9200/movies/_search" -d'
{
"query": {
"term": {"title.raw": "The Godfather"}
}
}'
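The mismatch between the analyzed title field and the unanalyzed term query can be reproduced without a cluster. The Python sketch below approximates the standard analyzer as "split on non-word characters and lowercase", which is a simplification of the real Unicode-aware tokenizer; the helper names are hypothetical.

```python
import re


def approx_standard_analyzer(text):
    """Rough stand-in for the standard analyzer: split on non-word chars, lowercase."""
    return [token.lower() for token in re.findall(r"\w+", text)]


def not_analyzed(text):
    """A not_analyzed field indexes the whole value as one unchanged term."""
    return [text]


def term_query_matches(indexed_terms, query_term):
    """A term query is not analyzed: it looks for the exact term as given."""
    return query_term in indexed_terms


title = "The Godfather"
title_terms = approx_standard_analyzer(title)  # terms in the analyzed "title" field
title_raw_terms = not_analyzed(title)          # terms in the "title.raw" sub-field

print(title_terms)
print(term_query_matches(title_terms, "The Godfather"))      # no such term in "title"
print(term_query_matches(title_raw_terms, "The Godfather"))  # exact term in "title.raw"
print(term_query_matches(title_terms, "godfather"))          # lowercase single term
```

The same model explains why match_phrase works: unlike term, it runs the query text through the field's analyzer before looking up the terms.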

Related

Elasticsearch does not filter as expected

I am using Elasticsearch 1.4.
I have an index:
curl -XPUT "http://localhost:49200/customer" -d '{"mappings": {"venues": {"properties": {"party_id": {"type": "string"},"sup_party_id": {"type": "string"},"location": {"type": "geo_point"} } } }}'
And I put in some data, for instance:
curl -XPOST "http://localhost:49200/customer/venues/RO2" -d '{ "party_id":"RO2", "sup_party_id": "SUP_GT_R1A_0001","location":{ "lat":"21.030347","lon":"105.842896" }}'
curl -XPOST "http://localhost:49200/customer/venues/RO3" -d '{ "party_id":"RO3", "sup_party_id": "SUP_GT_R1A_0004","location":{ "lat":"20.9602051","lon":"105.78709179999998" }}'
And my filter is:
{"constant_score":
{"filter":
{"and":
[{"terms":
{"sup_party_id":["SUP_GT_R1A_0004","SUP_GT_R1A_0001","RO2","RO3","RO4"]
}
},{"geo_bounding_box":
{"location":
{"top_left":{"lat":25.74546096707413,"lon":70.43503197075188},
"bottom_right":{"lat":6.342579199578783,"lon":168.96042259575188}
}
}
}]
}
}
}
The above query returns no data, but it does return data when I remove the following terms filter:
{"terms":
{"sup_party_id":["SUP_GT_R1A_0004","SUP_GT_R1A_0001","RO2","RO3","RO4"]
}
}
Please show me the problem; any suggestions are appreciated!
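That's because the sup_party_id field is an analyzed string: the standard analyzer lowercases the value (and may split it into several terms), while the terms filter is not analyzed and looks for the exact values you pass, such as SUP_GT_R1A_0004. Add "index": "not_analyzed" to sup_party_id in your mapping, like this, and it will work. An existing field cannot be switched to not_analyzed in place, so delete and recreate the index with this mapping, then reindex your data: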
curl -XPUT "http://localhost:49200/customer" -d '{
"mappings": {
"venues": {
"properties": {
"party_id": {
"type": "string"
},
"sup_party_id": {
"type": "string",
"index": "not_analyzed"
},
"location": {
"type": "geo_point"
}
}
}
}
}'
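The effect of analysis on these IDs can be modeled in a few lines of Python. This is a simplified sketch, not Elasticsearch: it assumes lowercasing is the only analysis step that matters for these values, and the helper names are hypothetical.

```python
def analyzed_terms(value):
    """Simplified analyzed string field: only the lowercasing step is modeled."""
    return {value.lower()}


def not_analyzed_terms(value):
    """A not_analyzed field stores the exact value as a single term."""
    return {value}


def terms_filter(indexed_terms, wanted):
    """The terms filter is not analyzed: a doc matches if any exact term is indexed."""
    return any(term in indexed_terms for term in wanted)


wanted = ["SUP_GT_R1A_0004", "SUP_GT_R1A_0001", "RO2", "RO3", "RO4"]
docs = {"RO2": "SUP_GT_R1A_0001", "RO3": "SUP_GT_R1A_0004"}

for doc_id, sup_party_id in docs.items():
    print(
        doc_id,
        terms_filter(analyzed_terms(sup_party_id), wanted),      # analyzed mapping
        terms_filter(not_analyzed_terms(sup_party_id), wanted),  # not_analyzed mapping
    )
```

Because the and filter requires every clause to match, the failing terms clause alone is enough to make the whole filter return nothing, which is why removing it brings the results back.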

sort result by term frequency count

Suppose there are 2 documents that contain the word "world" 5 times and 2 times, respectively.
I want the document with "world" 5 times to be listed first, followed by the document with "world" 2 times.
How do I sort this?
Thanks.
I don't think you need to sort explicitly. When you search for a word that appears several times in a document (two or three times in the example below), Elasticsearch takes that term frequency into account when computing the relevance score, and by default hits are returned sorted by score.
To try this, index some documents:
curl -XPUT "http://localhost:9200/movies/movie/1" -d'
{
"title": "The Godfather",
"director": "Francis Ford Coppola",
"year": 1972,
"genres": [
"Crime",
"Drama"
]
}'
curl -XPUT "http://localhost:9200/movies/movie/2" -d'
{
"title": "The Godfather Godfather",
"director": "Francis Ford Coppola",
"year": 1972,
"genres": [
"Crime",
"Drama"
]
}'
curl -XPUT "http://localhost:9200/movies/movie/3" -d'
{
"title": "The Godfather Godfather Godfather",
"director": "Francis Ford Coppola",
"year": 1972,
"genres": [
"Crime",
"Drama"
]
}'
After indexing, run this query and look at the result:
curl -XPOST "http://localhost:9200/movies/_search" -d'
{
"explain": true,
"query": {
"filtered": {
"query": {
"query_string": {
"query": "godfather"
}
}
}
}
}'
This returns document 3 on top because it contains "godfather" the most times; with "explain": true, the response also shows how term frequency contributes to each score.
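The ordering can be approximated with Lucene's classic TF/IDF formula, the default similarity in Elasticsearch 1.x and 2.x, where term frequency contributes sqrt(freq) and the field-length norm is 1/sqrt(number of terms). The sketch below is simplified: it leaves out IDF and the query norm (the same for all three documents here), does not apply the lossy one-byte encoding Lucene uses for norms, and scores the title text only, whereas query_string searches the _all field by default. The function names are hypothetical.

```python
import math


def classic_tf(freq):
    """Classic Lucene TF: square root of the raw term frequency."""
    return math.sqrt(freq)


def length_norm(num_terms):
    """Classic Lucene field-length norm: shorter fields score higher."""
    return 1.0 / math.sqrt(num_terms)


def simplified_score(text, term):
    # IDF, query norm and boosts are omitted: identical for every doc here.
    tokens = text.lower().split()
    return classic_tf(tokens.count(term)) * length_norm(len(tokens))


titles = {
    1: "The Godfather",
    2: "The Godfather Godfather",
    3: "The Godfather Godfather Godfather",
}

ranking = sorted(
    titles,
    key=lambda doc_id: simplified_score(titles[doc_id], "godfather"),
    reverse=True,
)
for doc_id in ranking:
    print(doc_id, round(simplified_score(titles[doc_id], "godfather"), 3))
```

The higher term frequency outweighs the longer field, so document 3 ranks first. On a real index with 5 shards and only three documents, per-shard IDF can distort scores; adding search_type=dfs_query_then_fetch to the search request computes global term statistics first.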

Elasticsearch completion suggester matching multiple inputs

I have an issue with the ES completion suggester. I have the following index mapping:
curl -XPUT localhost:9200/test_index/ -d '{
"mappings": {
"item": {
"properties": {
"test_suggest": {
"type": "completion",
"index_analyzer": "whitespace",
"search_analyzer": "whitespace",
"payloads": false
}
}
}
}
}'
I index some names like so:
curl -X PUT 'localhost:9200/test_index/item/1?refresh=true' -d '{
"test_suggest" : {
"input": [ "John", "Smith" ],
"output": "John Smith",
"weight" : 34
}
}'
curl -X PUT 'localhost:9200/test_index/item/2?refresh=true' -d '{
"test_suggest" : {
"input": [ "John", "Doe" ],
"output": "John Doe",
"weight" : 34
}
}'
Now if I call the suggester with only the first name, John, it works fine:
curl -XPOST localhost:9200/test_index/_suggest -d '{
"test_suggest":{
"text":"john",
"completion": {
"field" : "test_suggest"
}
}
}'
The same works for last names:
curl -XPOST localhost:9200/test_index/_suggest -d '{
"test_suggest":{
"text":"doe",
"completion": {
"field" : "test_suggest"
}
}
}'
Even searching for parts of first or last names works fine:
curl -XPOST localhost:9200/test_index/_suggest -d '{
"test_suggest":{
"text":"sm",
"completion": {
"field" : "test_suggest"
}
}
}'
However, when I search for something that includes part or all of the second word (the last name), I get no suggestions; none of the calls below work:
curl -XPOST localhost:9200/test_index/_suggest -d '{
"test_suggest":{
"text":"john d",
"completion": {
"field" : "test_suggest"
}
}
}'
curl -XPOST localhost:9200/test_index/_suggest -d '{
"test_suggest":{
"text":"john doe",
"completion": {
"field" : "test_suggest"
}
}
}'
curl -XPOST localhost:9200/test_index/_suggest -d '{
"test_suggest":{
"text":"john smith",
"completion": {
"field" : "test_suggest"
}
}
}'
How can I achieve this without having to put the input in a single text field? I want completion to match first names, last names, or both.
You should do this:
curl -X PUT 'localhost:9200/test_index/item/1?refresh=true' -d '{
"test_suggest" : {
"input": [ "John", "Smith", "John Smith" ],
"output": "John Smith",
"weight" : 34
}
}'
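i.e. add every combination of terms you want to match to the input. The completion suggester matches your text as a prefix of a whole input, so "john d" can only match an input that starts with "john d", such as "John Doe".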
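To illustrate, here is a simplified Python model of completion matching: a suggestion is returned when the text is a prefix of one of the document's inputs. Elasticsearch actually uses an FST built from the analyzed inputs; this sketch scans a list instead and assumes case-insensitive comparison for simplicity. The function name is hypothetical, and document 2 gets the same treatment the answer applies to document 1.

```python
def suggest(docs, text):
    """Return outputs of docs having an input that starts with `text` (simplified)."""
    prefix = text.casefold()
    hits = []
    for doc in docs:
        if any(inp.casefold().startswith(prefix) for inp in doc["input"]):
            hits.append(doc["output"])
    return hits


original = [
    {"input": ["John", "Smith"], "output": "John Smith"},
    {"input": ["John", "Doe"], "output": "John Doe"},
]
with_full_names = [
    {"input": ["John", "Smith", "John Smith"], "output": "John Smith"},
    {"input": ["John", "Doe", "John Doe"], "output": "John Doe"},
]

print(suggest(original, "john"))           # both names
print(suggest(original, "john d"))         # nothing: no input starts with "john d"
print(suggest(with_full_names, "john d"))  # only "John Doe"
print(suggest(with_full_names, "sm"))      # single-word inputs still work
```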
I faced the same problem and used something like this:
curl -XPOST localhost:9200/test_index/_suggest -d '{
"test_suggest":{
"text":["john", "smith"],
"completion": {
"field" : "test_suggest"
}
}
}'

Best way to search/index the data - with and without whitespace

I am having a problem indexing and searching for words that may or may not contain whitespace. Below is an example.
Here is how the mappings are set up:
curl -s -XPUT 'localhost:9200/test' -d '{
"mappings": {
"name": {
"properties": {
"street": {
"type": "string",
"index_analyzer": "index_ngram",
"search_analyzer": "search_ngram"
}
}
}
},
"settings": {
"analysis": {
"filter": {
"desc_ngram": {
"type": "edgeNGram",
"min_gram": 3,
"max_gram": 20
}
},
"analyzer": {
"index_ngram": {
"type": "custom",
"tokenizer": "keyword",
"filter": [ "desc_ngram", "lowercase" ]
},
"search_ngram": {
"type": "custom",
"tokenizer": "keyword",
"filter": "lowercase"
}
}
}
}
}'
This is how I built the index:
curl -s -XPUT 'localhost:9200/test/name/1' -d '{ "street": "Lakeshore Dr" }'
curl -s -XPUT 'localhost:9200/test/name/2' -d '{ "street": "Sunnyshore Dr" }'
curl -s -XPUT 'localhost:9200/test/name/3' -d '{ "street": "Lake View Dr" }'
curl -s -XPUT 'localhost:9200/test/name/4' -d '{ "street": "Shore Dr" }'
Here is an example of the query that is not working correctly:
curl -s -XGET 'localhost:9200/test/_search?pretty=true' -d '{
"query":{
"bool":{
"must":[
{
"match":{
"street":{
"query":"lake shore dr",
"type":"boolean"
}
}
}
]
}
}
}';
If a user searches for "Lake Shore Dr", I want to match only document 1 ("Lakeshore Dr").
If a user searches for "Lakeview Dr", I want to match only document 3 ("Lake View Dr").
Is the issue with how I am setting up the mappings (tokenizer? edge n-grams vs. n-grams? n-gram size?) or with the query? I have tried setting minimum_should_match and the search analyzer, but I have not been able to get the desired results.
Thanks all.
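A quick way to see why "lake shore dr" finds nothing: with the keyword tokenizer, each street is one token, so index_ngram emits prefixes of the whole string, while search_ngram turns the whole query into a single lowercase term. The Python sketch below simulates those two analyzers as configured above (min_gram 3, max_gram 20); it is a model, not Elasticsearch, and the function names are hypothetical.

```python
def edge_ngrams(token, min_gram=3, max_gram=20):
    """Prefixes of one token, as an edge n-gram filter with these settings emits."""
    return [token[:n] for n in range(min_gram, min(max_gram, len(token)) + 1)]


def index_ngram(text):
    # keyword tokenizer -> whole string is one token; then edge n-grams, then lowercase
    return {gram.lower() for gram in edge_ngrams(text)}


def search_ngram(text):
    # keyword tokenizer + lowercase -> the whole query string becomes one term
    return {text.lower()}


streets = {1: "Lakeshore Dr", 2: "Sunnyshore Dr", 3: "Lake View Dr", 4: "Shore Dr"}


def matching_docs(query):
    query_terms = search_ngram(query)
    return [doc_id for doc_id, street in streets.items() if query_terms & index_ngram(street)]


print(matching_docs("lake shore dr"))  # not a prefix of any indexed street
print(matching_docs("lakeshore dr"))   # exact full-length gram of document 1
print(matching_docs("lake"))           # prefix shared by documents 1 and 3
```

The whitespace is part of every term on both sides, so a query only matches when its spacing agrees with a prefix of the stored street name. The real token streams can be checked with the _analyze API, e.g. curl 'localhost:9200/test/_analyze?analyzer=index_ngram&pretty' -d 'Lakeshore Dr'.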

accessing _id or _parent fields in script query in elasticsearch

When writing a search query with a script, I can access fields using doc['myfield']:
curl -XPOST 'http://localhost:9200/index1/type1/_search' -d '
{
"query": {
"filtered": {
"query": {
"match_all": {}
},
"filter": {
"script": {
"script": "doc[\"myfield\"].value>0",
"params": {},
"lang":"python"
}
}
}
}
}'
How do I access the _id or _parent fields?
The ctx object does not seem to be available in a search query, although it is accessible in an update API request. Why?
Note that I am using Python as the script language instead of MVEL, but the question is the same for both.
By default, both the document id and the parent id are indexed in uid format: type#id. Elasticsearch provides a few methods that can be used to extract the type and id from a uid string. Here is an example of using these methods in MVEL:
curl -XDELETE localhost:9200/test
curl -XPUT localhost:9200/test -d '{
"settings": {
"index.number_of_shards": 1,
"index.number_of_replicas": 0
},
"mappings": {
"doc": {
"properties": {
"name": {
"type": "string"
}
}
},
"child_doc": {
"_parent": {
"type": "doc"
},
"properties": {
"name": {
"type": "string"
}
}
}
}
}'
curl -XPUT "localhost:9200/test/doc/1" -d '{"name": "doc 1"}'
curl -XPUT "localhost:9200/test/child_doc/1-1?parent=1" -d '{"name": "child 1-1 of doc 1"}'
curl -XPOST "localhost:9200/test/_refresh"
echo
curl "localhost:9200/test/child_doc/_search?pretty=true" -d '{
"script_fields": {
"uid_in_script": {
"script": "doc[\"_uid\"].value"
},
"id_in_script": {
"script": "org.elasticsearch.index.mapper.Uid.idFromUid(doc[\"_uid\"].value)"
},
"parent_uid_in_script": {
"script": "doc[\"_parent\"].value"
},
"parent_id_in_script": {
"script": "org.elasticsearch.index.mapper.Uid.idFromUid(doc[\"_parent\"].value)"
},
"parent_type_in_script": {
"script": "org.elasticsearch.index.mapper.Uid.typeFromUid(doc[\"_parent\"].value)"
}
}
}'
echo
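The uid handling above comes down to splitting the type#id string. This Python sketch mirrors what the answer says typeFromUid and idFromUid return; it illustrates the format only and is not the Elasticsearch implementation. The helper names are hypothetical, and splitting on the first '#' assumes, as Elasticsearch enforces, that type names contain no '#'.

```python
def type_from_uid(uid):
    """Part before the first '#': the document type."""
    return uid.partition("#")[0]


def id_from_uid(uid):
    """Part after the first '#': the document id (which may itself contain '#')."""
    return uid.partition("#")[2]


child_uid = "child_doc#1-1"  # what doc["_uid"] holds for the child document
parent_uid = "doc#1"         # what doc["_parent"] holds for that child

print(type_from_uid(child_uid), id_from_uid(child_uid))
print(type_from_uid(parent_uid), id_from_uid(parent_uid))
```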
