Prefix Query not working in ElasticSearch - elasticsearch

I've loaded the 'accounts.json' data from the following link into an ES instance on my machine:
https://www.elastic.co/guide/en/elasticsearch/reference/1.5/_exploring_your_data.html
This adds 1000 docs to the index 'bank' with the type 'account'. Simple enough!
Every doc has this structure:
{
  "account_number": 0,
  "balance": 16623,
  "firstname": "Bradshaw",
  "lastname": "Mckenzie",
  "age": 29,
  "gender": "F",
  "address": "244 Columbus Place",
  "employer": "Euron",
  "email": "bradshawmckenzie@euron.com",
  "city": "Hobucken",
  "state": "CO"
}
Now I'm trying to run a simple 'prefix' query on this index.
Here is one that works just fine (comes back with plenty of correct results):
curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
{
"query": { "prefix" : { "address" : "963" } }
}'
Here is another one (this one doesn't work):
curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
{
"query": { "prefix" : { "firstname" : "Op" } }
}'
But there is definitely a record present that should be matched by the previous query. The following works:
curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
{
"query": { "match" : { "firstname" : "Opal" } }
}'
I have also verified the mapping, and there doesn't seem to be any difference between the two fields, 'firstname' and 'address':
curl -XGET 'localhost:9200/bank/_mapping/account?pretty'
Here is the relevant mapping portion for those 2:
"address": {
  "type": "string"
},
"firstname": {
  "type": "string"
}
Can't figure out why one prefix query works and the other one doesn't. Any pointers on what I'm missing?

I think you'll find that this will do what you expect:
curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
{
"query": { "prefix" : { "firstname" : "op" } }
}'
The reason is that, since you have not specified an analyzer, the standard analyzer is used at index time, which converts tokens to lower-case ("Opal" is indexed as "opal"). A prefix query, however, does not analyze its search term, so the capitalized prefix "Op" never matches the lower-cased indexed term.
Here is some code that I used to test out my suspicion:
http://sense.qbox.io/gist/a4f087cee78fd694dd4223eb56e842e1cd1d5847
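To make the analysis-time behavior concrete, here is a small self-contained Python sketch (a simulation of the behavior, not Elasticsearch itself) of why the capitalized prefix fails: the standard analyzer lowercases terms at index time, while a prefix query leaves its input untouched.

```python
# Illustrative simulation: the standard analyzer lowercases tokens at
# index time, while a prefix query does NOT analyze its input, so the
# prefix must match the lower-cased indexed term exactly.

def standard_analyze(text):
    """Rough stand-in for the standard analyzer: split and lowercase."""
    return [token.lower() for token in text.split()]

def prefix_match(indexed_tokens, prefix):
    """A prefix query compares the raw prefix against indexed terms."""
    return any(token.startswith(prefix) for token in indexed_tokens)

tokens = standard_analyze("Opal")    # -> ["opal"]

print(prefix_match(tokens, "Op"))    # False: "opal" does not start with "Op"
print(prefix_match(tokens, "op"))    # True
```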

Related

Don't make some fields searchable when using query_string or term/terms in Elasticsearch

Having this mapping:
curl -XPUT 'localhost:9200/testindex?pretty=true' -d '{
  "mappings": {
    "items": {
      "dynamic": "strict",
      "properties" : {
        "title" : { "type": "string" },
        "body" : { "type": "string" },
        "tags" : { "type": "string" }
      }
    }
  }
}'
I add two simple items:
curl -XPUT 'localhost:9200/testindex/items/1' -d '{
"title": "This is a test title",
"body" : "This is the body of the java",
"tags" : "csharp"
}'
curl -XPUT 'localhost:9200/testindex/items/2' -d '{
"title": "Another text title",
"body": "My body is great and Im super handsome",
"tags" : ["cplusplus", "python", "java"]
}'
If I search the string java:
curl -XGET 'localhost:9200/testindex/items/_search?q=java&pretty=true'
... it will match both items. The first item will match on the body and the other one on the tags.
How can I avoid searching in some fields? In the example, I don't want the query to match on the tags field. But I want to keep tags indexed, since I use them for aggregations.
I know I can do it using this:
{
  "query" : {
    "query_string": {
      "query": "java AND -tags:java"
    }
  },
  "_source" : {
    "exclude" : ["*.tags"]
  }
}
But is there any other more elegant way, like putting something in the mapping?
PS: My searches are always query_strings and term / terms and I'm using ES 2.3.2
You can specify the fields option if you only want to match against certain fields:
{
  "query_string" : {
    "fields" : ["body"],
    "query" : "java"
  }
}
EDIT 1
You could use the "include_in_all": false parameter in the mapping (check the documentation). The query_string query defaults to the _all field, so if you add "include_in_all": false to every field you don't want to match, a query like the following would then only look in the body field:
{
  "query_string" : {
    "query" : "java"
  }
}
Does this help?
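For reference, a mapping along those lines might look like the following (illustrative only; "items" and the field names are taken from the question, and this targets the ES 2.x mapping syntax used there):

```json
{
  "mappings": {
    "items": {
      "properties": {
        "title": { "type": "string" },
        "body":  { "type": "string" },
        "tags":  { "type": "string", "include_in_all": false }
      }
    }
  }
}
```

With this mapping, `?q=java` (which targets _all) would no longer hit documents whose only occurrence of "java" is in tags, while tags stays indexed for aggregations.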

null_value mapping in Elasticsearch

I have created a mapping for a tweetb type in a twitter index:
curl -XPUT http://www.mydomain:9200/twitter/tweetb/_mapping -d '{
  "twitter": {
    "mappings": {
      "tweetb": {
        "properties": {
          "message": {
            "type": "string",
            "null_value": "NA"
          }
        }
      }
    }
  }
}'
Then, I put one document:
curl -XPUT http://www.mydomain.com:9200/twitter/tweetb/1 -d '{"message": null}'
Then, I tried to get the inserted doc back:
curl -XGET http://www.mydomain:9200/twitter/tweetb/1
And that returned:
{
"_index": "twitter",
"_type": "tweetb",
"_id": "1",
"_version": 2,
"found" : true,
"_source" : { "message": null }
}
I was expecting "message" : "NA" in the _source field. However, it looks like "null_value" isn't working. Am I missing something?
The "null_value" field mapping does not change the value stored, rather it changes the value that is used in searches.
If you try searching for your "message" using "NA", then it should appear in the results:
curl -XPOST http://www.mydomain.com:9200/twitter/tweetb/_search -d '{
  "query" : {
    "match" : { "message" : "NA" }
  }
}'
Note that the _source in the response will still show the actual value, null. Now, if you add a new document whose raw value is literally "NA" and run the search again, both documents will be returned by the above query: one with a value and the other with null.
Similarly, this applies to other queries based on how the value is indexed, which is why a lowercase "n.*" regexp matches, but "N.*" somewhat surprisingly will not:
curl -XPOST http://www.mydomain.com:9200/twitter/tweetb/_search -d '{
  "query" : {
    "regexp" : { "message" : "n.*" }
  }
}'
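The split between the stored _source and the indexed value can be sketched in a few lines of Python (a simulation of the behavior described above, not Elasticsearch code):

```python
# Illustrative simulation: null_value affects only what gets indexed
# (and therefore what searches see), never the stored _source.

NULL_VALUE = "NA"

def index_doc(source, null_value=NULL_VALUE):
    stored_source = dict(source)  # _source is kept verbatim
    # At index time, null field values are replaced by the null_value.
    indexed = {f: (null_value if v is None else v) for f, v in source.items()}
    return stored_source, indexed

stored, indexed = index_doc({"message": None})
print(stored)    # {'message': None}  -- what GET /twitter/tweetb/1 returns
print(indexed)   # {'message': 'NA'}  -- what the match query sees
```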

How to avoid cross object search behavior with nested types in elastic search

I am trying to determine the best way to index a document in elastic search. I have a document, Doc, which has some fields:
Doc
created_at
updated_at
field_a
field_b
But Doc will also have some fields specific to individual users. For example, field_x will have value 'A' for user 1, and field_x will have value 'B' for user 2. For each doc, there will be a very limited number of users (typically 2, up to ~10). When a user searches on field_x, they must search on the value that belongs to them. I have been exploring nested types in ES.
Doc
created_at
updated_at
field_x: [{
user: 1
field_x: A
},{
user: 2
field_x: B
}]
When user 1 searches on field_x for value 'A', this doc should result in a hit. However, it should not when user 1 searches by value 'B'.
However, according to the docs:
One of the problems when indexing inner objects that occur several
times in a doc is that “cross object” search match will occur
Is there a way to avoid this behavior with nested types or should I explore another type?
Additional information regarding the performance of such queries would be very valuable. The docs state that nested queries are not too different in performance from regular queries. If anyone has real experience with this, I would love to hear it.
Nested type is what you are looking for, and don't worry too much about performance.
Before indexing your documents, you need to set the mapping for your documents:
curl -XDELETE localhost:9200/index
curl -XPUT localhost:9200/index
curl -XPUT localhost:9200/index/type/_mapping -d '{
  "type": {
    "properties": {
      "field_x": {
        "type": "nested",
        "include_in_parent": false,
        "include_in_root": false,
        "properties": {
          "user": {
            "type": "string"
          },
          "field_x": {
            "type": "string",
            "index" : "not_analyzed"
          }
        }
      }
    }
  }
}'
Note: the "index": "not_analyzed" setting matters if your field really contains only single letters like "A" and "B": you don't want to analyze the field, otherwise elasticsearch will remove these single-letter "words".
If that was just your example, and in your real documents you are searching for proper words, remove that line and let elasticsearch analyze the field.
Then, index your documents:
curl -XPUT http://localhost:9200/index/type/1 -d '{
  "field_a": "foo",
  "field_b": "bar",
  "field_x" : [
    { "user" : "1", "field_x" : "A" },
    { "user" : "2", "field_x" : "B" }
  ]
}'
And run your query:
curl -XGET localhost:9200/index/type/_search -d '{
  "query": {
    "nested" : {
      "path" : "field_x",
      "score_mode" : "avg",
      "query" : {
        "bool" : {
          "must" : [
            { "term": { "field_x.user": "1" } },
            { "term": { "field_x.field_x": "A" } }
          ]
        }
      }
    }
  }
}'
This will result in
{"took":13,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},"hits":{"total":1,"max_score":1.987628,"hits":[{"_index":"index","_type":"type","_id":"1","_score":1.987628, "_source" :
{
"field_a": "foo",
"field_b": "bar",
"field_x" : [{
"user" : "1",
"field_x" : "A"
},
{
"user" : "2",
"field_x" : "B"
}]
}}]}}
However, querying
curl -XGET localhost:9200/index/type/_search -d '{
  "query": {
    "nested" : {
      "path" : "field_x",
      "score_mode" : "avg",
      "query" : {
        "bool" : {
          "must" : [
            { "term": { "field_x.user": "1" } },
            { "term": { "field_x.field_x": "B" } }
          ]
        }
      }
    }
  }
}'
won't return any results
{"took":6,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},"hits":{"total":0,"max_score":null,"hits":[]}}
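The difference between the cross-object matching of plain inner objects and the per-object matching of nested docs can be illustrated with a short self-contained Python simulation (not Elasticsearch code):

```python
# Illustrative simulation: plain "object" mappings flatten inner objects,
# so users and values lose their pairing; "nested" mappings match one
# inner object at a time.

doc = {"field_x": [{"user": "1", "field_x": "A"},
                   {"user": "2", "field_x": "B"}]}

def object_match(doc, user, value):
    # Flattened: the doc matches if ANY object has the user and ANY
    # object has the value, even if they are different objects.
    users  = {o["user"] for o in doc["field_x"]}
    values = {o["field_x"] for o in doc["field_x"]}
    return user in users and value in values

def nested_match(doc, user, value):
    # Nested: both conditions must hold within the SAME inner object.
    return any(o["user"] == user and o["field_x"] == value
               for o in doc["field_x"])

print(object_match(doc, "1", "B"))  # True  -- the unwanted cross-object hit
print(nested_match(doc, "1", "B"))  # False -- nested avoids it
print(nested_match(doc, "1", "A"))  # True
```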

Elasticsearch index aliases (w/ routing) and parent/child docs

I'm trying to set up an index with the following characteristics:
The index houses data for many projects. Most work is project-specific, so I set up aliases for each project, using project_id as the routing field. (And as an associated term filter.)
The data in question have a parent/child structure. For simplicity, let's call the doc types "mama" and "baby."
So we create the index and aliases:
curl -XDELETE http://localhost:9200/famtest
curl -XPOST http://localhost:9200/famtest -d '{
  "mappings": {
    "mama": {
      "properties": {
        "project_id": { "type": "string", "index": "not_analyzed" }
      }
    },
    "baby": {
      "_parent": { "type": "mama" },
      "properties": {
        "project_id": { "type": "string", "index": "not_analyzed" }
      }
    }
  }
}'
curl -XPOST "http://localhost:9200/_aliases" -d '{
  "actions": [
    { "add": {
        "alias": "family1",
        "index": "famtest",
        "routing": "100",
        "filter": { "term": { "project_id": "100" } }
    } }
  ]
}'
curl -XPOST "http://localhost:9200/_aliases" -d '{
  "actions": [
    { "add": {
        "alias": "family2",
        "index": "famtest",
        "routing": "200",
        "filter": { "term": { "project_id": "200" } }
    } }
  ]
}'
Now let's make some mamas:
curl -XPOST localhost:9200/family1/mama/1 -d '{ "name" : "Family 1 Mom", "project_id" : "100" }'
curl -XPOST localhost:9200/family2/mama/2 -d '{ "name" : "Family 2 Mom", "project_id" : "200" }'
These documents are now available via /familyX/_search. So now we want to add a baby:
curl -XPOST localhost:9200/family1/baby/1?parent=1 -d '{ "name": "Fam 1 Baby","project_id" : "100" }'
Unfortunately, ES doesn't like that:
{"error":"ElasticSearchIllegalArgumentException[Alias [family1] has index routing associated with it [100], and was provided with routing value [1], rejecting operation]","status":400}
So... any idea how to use alias routing and still set the parent id? If I understand this right, it shouldn't be a problem: all project operations ("family1", in this case) go through the alias, so parent and child docs will wind up on the same shard anyway. Is there some alternative way to set the parent id, without interfering with the routing?
Thanks. Please let me know if I can be more specific.
Interesting question! As you already know, the parent id is used for routing too, since children must be indexed on the same shard as their parent documents. What you're trying to do is fine: parent and children belong to the same family, thus they would end up on the same shard anyway, since you configured the routing in the family alias.
But I'm afraid the parent id takes priority over the routing defined in the alias, which would effectively be overwritten; that isn't allowed, and that's why you get the error. In fact, if you try again providing the routing explicitly in your index request, it works:
curl -XPOST 'localhost:9200/family1/baby/1?parent=1&routing=100' -d '{ "name": "Fam 1 Baby","project_id" : "100" }'
I would file a github issue with a curl recreation.
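The collision can be pictured with a tiny Python sketch of shard selection (illustrative only: crc32 modulo the shard count stands in for Elasticsearch's real routing hash):

```python
# Illustrative sketch: a document's shard is derived from ONE routing
# value. Alias routing and parent-id routing are two competing inputs
# to the same step, which is why ES rejects the conflicting request.
import zlib

NUM_SHARDS = 5  # hypothetical shard count

def pick_shard(routing_value, num_shards=NUM_SHARDS):
    # crc32 is a stand-in for ES's internal routing hash
    return zlib.crc32(routing_value.encode()) % num_shards

# With explicit routing=100 on both requests, mama and baby land together:
mama_shard = pick_shard("100")  # mama indexed via alias routing "100"
baby_shard = pick_shard("100")  # baby indexed with &routing=100
print(mama_shard == baby_shard)  # True

# Without explicit routing, the baby would be routed by its parent id
# ("1"), which need not hash to the mama's shard.
print(pick_shard("1"))
```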

No query registered for [match]

I'm working through some examples in the ElasticSearch Server book and trying to write a simple match query
{
  "query" : {
    "match" : {
      "displayname" : "john smith"
    }
  }
}
This gives me the error:
{"error":"SearchPhaseExecutionException[Failed to execute phase [query],
....
SearchParseException[[scripts][4]: from[-1],size[-1]: Parse Failure [Failed to parse source
....
QueryParsingException[[kb.cgi] No query registered for [match]]; }
I also tried
{
  "match" : {
    "displayname" : "john smith"
  }
}
as per examples on http://www.elasticsearch.org/guide/reference/query-dsl/match-query/
EDIT: I think the remote server I'm using is not the latest 0.20.5 version because using "text" instead of "match" seems to allow the query to work
I've seen a similar issue reported here: http://elasticsearch-users.115913.n3.nabble.com/Character-escaping-td4025802.html
It appears the remote server I'm using is not the latest 0.20.5 version of ElasticSearch; consequently the "match" query is not supported. Instead it is "text", which works.
I came to this conclusion after seeing a similar issue reported here: http://elasticsearch-users.115913.n3.nabble.com/Character-escaping-td4025802.html
Your first query looks fine, but perhaps the way you use it in the request is not correct. Here is a complete example that works:
curl -XDELETE localhost:9200/test-idx
curl -XPUT localhost:9200/test-idx -d '{
  "settings": {
    "index": {
      "number_of_shards": 1,
      "number_of_replicas": 0
    }
  },
  "mappings": {
    "doc": {
      "properties": {
        "name": {
          "type": "string", "index": "analyzed"
        }
      }
    }
  }
}'
curl -XPUT localhost:9200/test-idx/doc/1 -d '{
"name": "John Smith"
}'
curl -XPOST localhost:9200/test-idx/_refresh
echo
curl "localhost:9200/test-idx/_search?pretty=true" -d '{
  "query": {
    "match" : {
      "name" : "john smith"
    }
  }
}'
echo
