I'm trying to set up an index with the following characteristics:
The index houses data for many projects. Most work is project-specific, so I set up aliases for each project, using project_id as the routing field. (And as an associated term filter.)
The data in question have a parent/child structure. For simplicity, let's call the doc types "mama" and "baby."
So we create the index and aliases:
curl -XDELETE http://localhost:9200/famtest
curl -XPOST http://localhost:9200/famtest -d '
{ "mappings" :
{ "mama" :
{ "properties" :
{ "project_id" : { "type" : "string", "index" : "not_analyzed" } }
},
"baby" :
{ "_parent" :
{ "type" : "mama" },
"properties" :
{ "project_id" : { "type" : "string", "index" : "not_analyzed" } }
}
}
}'
curl -XPOST "http://localhost:9200/_aliases" -d '
{ "actions":
[ { "add":
{ "alias": "family1",
"index": "famtest",
"routing": "100",
"filter":
{ "term": { "project_id": "100" } }
}
} ]
}'
curl -XPOST "http://localhost:9200/_aliases" -d '
{ "actions":
[ { "add":
{ "alias": "family2",
"index": "famtest",
"routing": "200",
"filter":
{ "term": { "project_id": "200" } }
}
} ]
}'
Now let's make some mamas:
curl -XPOST localhost:9200/family1/mama/1 -d '{ "name" : "Family 1 Mom", "project_id" : "100" }'
curl -XPOST localhost:9200/family2/mama/2 -d '{ "name" : "Family 2 Mom", "project_id" : "200" }'
These documents are now available via /familyX/_search. For example, a quick check through the alias (which applies both the routing and the term filter) should return only Family 1's mom:
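curl -XGET 'localhost:9200/family1/_search?pretty'
So now we want to add a baby: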
curl -XPOST localhost:9200/family1/baby/1?parent=1 -d '{ "name": "Fam 1 Baby","project_id" : "100" }'
Unfortunately, ES doesn't like that:
{"error":"ElasticSearchIllegalArgumentException[Alias [family1] has index routing associated with it [100], and was provided with routing value [1], rejecting operation]","status":400}
So... any idea how to use alias routing and still set the parent id? If I understand this right, it shouldn't be a problem: all project operations ("family1", in this case) go through the alias, so parent and child docs will wind up on the same shard anyway. Is there some alternative way to set the parent id, without interfering with the routing?
Thanks. Please let me know if I can be more specific.
Interesting question! As you already know, the parent id is used for routing too, since children must be indexed on the same shard as their parent documents. What you're trying to do is fine: parent and children belong to the same family, thus to the same shard anyway, since you configured the routing in the family alias.
But I'm afraid the parent id takes priority over the routing defined in the alias: it would effectively overwrite the alias routing, which is not allowed, and that's why you get the error. In fact, if you try again providing the routing explicitly in your index request, it works:
curl -XPOST 'localhost:9200/family1/baby/1?parent=1&routing=100' -d '{ "name": "Fam 1 Baby","project_id" : "100" }'
I would file a GitHub issue with a curl recreation.
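As a sanity check (a sketch using the names from the mapping above; on older versions you may need the has_child filter instead of the query), a has_child search through the alias should now return the Family 1 mom, confirming parent and child landed on the same shard:
curl -XPOST 'localhost:9200/family1/mama/_search?pretty' -d '
{ "query": { "has_child": { "type": "baby", "query": { "match": { "name": "baby" } } } } }'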
Having this mapping:
curl -XPUT 'localhost:9200/testindex?pretty=true' -d '{
"mappings": {
"items": {
"dynamic": "strict",
"properties" : {
"title" : { "type": "string" },
"body" : { "type": "string" },
"tags" : { "type": "string" }
}}}}'
I add two simple items:
curl -XPUT 'localhost:9200/testindex/items/1' -d '{
"title": "This is a test title",
"body" : "This is the body of the java",
"tags" : "csharp"
}'
curl -XPUT 'localhost:9200/testindex/items/2' -d '{
"title": "Another text title",
"body": "My body is great and Im super handsome",
"tags" : ["cplusplus", "python", "java"]
}'
If I search for the string java:
curl -XGET 'localhost:9200/testindex/items/_search?q=java&pretty=true'
... it will match both items. The first item will match on the body and the other one on the tags.
How can I avoid searching in some fields? In the example, I don't want it to match on the tags field. But I want to keep tags indexed, as I use it for aggregations.
I know I can do it using this:
{
"query" : {
"query_string": {
"query": "java AND -tags:java"
}},
"_source" : {
"exclude" : ["*.tags"]
}
}
But is there any other more elegant way, like putting something in the mapping?
PS: My searches are always query_strings and term / terms and I'm using ES 2.3.2
You can specify the fields option if you only want to match against certain fields:
{
"query_string" : {
"fields" : ["body"],
"query" : "java"
}
}
EDIT 1
You could use the "include_in_all": false parameter in the mapping. Check the documentation. The query_string query defaults to the _all field, so if you add "include_in_all": false to all the fields you don't want to match on, then a query like this would only look in the body field:
{
"query_string" : {
"query" : "java"
}
}
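For example, recreating the index with a mapping along these lines (a sketch for ES 2.x, reusing the field names from the question; include_in_all can't be changed on an existing field) keeps tags indexed and usable for aggregations, but out of _all:
curl -XPUT 'localhost:9200/testindex' -d '{
"mappings": {
"items": {
"dynamic": "strict",
"properties" : {
"title" : { "type": "string" },
"body" : { "type": "string" },
"tags" : { "type": "string", "include_in_all": false }
}}}}'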
Does this help?
I've loaded the 'accounts.json' data from the following link into an ES instance on my machine:
https://www.elastic.co/guide/en/elasticsearch/reference/1.5/_exploring_your_data.html
This adds 1000 docs to the index 'bank' with the type 'account'. Simple enough!
Every doc has this structure:
{
"account_number": 0,
"balance": 16623,
"firstname": "Bradshaw",
"lastname": "Mckenzie",
"age": 29,
"gender": "F",
"address": "244 Columbus Place",
"employer": "Euron",
"email": "bradshawmckenzie#euron.com",
"city": "Hobucken",
"state": "CO"
}
Now I'm trying to run a simple 'prefix' query on this index.
Here is one that works just fine (comes back with plenty of correct results):
curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
{
"query": { "prefix" : { "address" : "963" } }
}'
Here is another one (this one doesn't work):
curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
{
"query": { "prefix" : { "firstname" : "Op" } }
}'
But there is definitely a record present which should be returned in the previous request. The following works:
curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
{
"query": { "match" : { "firstname" : "Opal" } }
}'
I have also verified the mapping and there doesn't seem to be any difference in the 2 fields, 'firstname' and 'address':
curl -XGET 'localhost:9200/bank/_mapping/account?pretty'
Here is the relevant mapping portion for those 2:
"address": {
"type": "string"
}
"firstname": {
"type": "string"
}
Can't figure out why one prefix query works and the other one doesn't. Any pointers on what I'm missing?
I think you'll find that this will do what you expect:
curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
{
"query": { "prefix" : { "firstname" : "op" } }
}'
The reason is that, since you have not specified an analyzer, the standard analyzer is used, which converts tokens to lower-case at index time, so "Opal" is indexed as the term "opal". Prefix queries are not analyzed, so the prefix "Op" (with a capital O) matches nothing, while "op" does; the all-digit prefix "963" is unaffected by lower-casing, which is why the address query worked.
Here is some code that I used to test out my suspicion:
http://sense.qbox.io/gist/a4f087cee78fd694dd4223eb56e842e1cd1d5847
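If you'd rather not lower-case the input yourself, the match_phrase_prefix query analyzes its input with the field's analyzer before doing the prefix match, so the capitalized "Op" works too (a sketch against the same index; note it is a phrase query, so multi-word input behaves differently from a plain prefix):
curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
{
"query": { "match_phrase_prefix" : { "firstname" : "Op" } }
}'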
This issue seems to be related to using XDCR in Couchbase. If I have the following simple objects
1: { "name" : "Mark", "age" : 30}
2: { "name" : "Bill", "age" : "forty"}
and set up an Elasticsearch mapping as follows:
curl -XPUT 'http://localhost:9200/test/couchbaseDocument/_mapping' -d '
{
"couchbaseDocument" : {
"dynamic_templates": [
{
"store_generic": {
"match": "*",
"mapping": {
"store": "yes"
}
}
}
]
}
}'
I can then add the two objects to this index using the REST API
curl -XPUT localhost:9200/test/couchbaseDocument/1 -d '{
"name" : "Mark",
"age" : 30
}'
curl -XPUT localhost:9200/test/couchbaseDocument/2 -d '{
"name" : "Bill",
"age" : "forty"
}'
They are now both searchable (despite the fact that "age" is a long for one and a string for the other).
If, however, I store these two objects in a Couchbase bucket (rather than writing straight to Elasticsearch) and set up XDCR, the first object replicates fine but the second fails with the following error:
failed to execute bulk item (index) index {[test][couchbaseDocument][2], source[{"doc":{"name":"Bill","age":"forty"},"meta":{"id":"2","rev":"8-00000b9360d0a0bf0000000000000000","expiration":0,"flags":0}}]}
org.elasticsearch.index.mapper.MapperParsingException: failed to parse [doc.age]
I can't figure out why it works via the REST API but not when couchbase replicates the same objects.
I followed the answer and used the following mapping to get things to work via XDCR
curl -XPUT 'http://localhost:9200/test/couchbaseDocument/_mapping' -d '
{
"couchbaseDocument" : {
"properties" : {
"doc": {
"properties" : {
"name" : {"type" : "string", "store" : "yes"},
"age" : {"type" : "string", "store" : "yes"}
}
}
}
}
}'
Now all the objects (despite having different types for the same fields) are replicated and searchable. I don't think there was any need to include the dynamic_templates approach I initially tried. The mapping works.
It's something you have to solve on the Elasticsearch side.
If the same field name can contain both numeric and string values, you should create a mapping up front which says that age is a string.
That way Elasticsearch won't try to auto-guess the type for this field.
Hope this helps
I am trying to determine the best way to index a document in Elasticsearch. I have a document, Doc, which has some fields:
Doc
created_at
updated_at
field_a
field_b
But Doc will also have some fields specific to individual users. For example, field_x will have value 'A' for user 1, and field_x will have value 'B' for user 2. For each doc, there will be a very limited number of users (typically 2, up to ~10). When a user searches on field_x, they must search on the value that belongs to them. I have been exploring nested types in ES.
Doc
created_at
updated_at
field_x: [{
user: 1
field_x: A
},{
user: 2
field_x: B
}]
When user 1 searches on field_x for value 'A', this doc should result in a hit. However, it should not when user 1 searches by value 'B'.
However, according to the docs:
One of the problems when indexing inner objects that occur several
times in a doc is that “cross object” search match will occur
Is there a way to avoid this behavior with nested types or should I explore another type?
Additional information regarding the performance of such queries would be very valuable. From reading the docs, it's stated that nested queries are not too different in performance from regular queries. If anyone has real experience with this, I would love to hear it.
Nested type is what you are looking for, and don't worry too much about performance.
Before indexing your documents, you need to set the mapping for your documents:
curl -XDELETE localhost:9200/index
curl -XPUT localhost:9200/index
curl -XPUT localhost:9200/index/type/_mapping -d '{
"type": {
"properties": {
"field_x": {
"type": "nested",
"include_in_parent": false,
"include_in_root": false,
"properties": {
"user": {
"type": "string"
},
"field_x": {
"type": "string",
"index" : "not_analyzed" // NOTE*
}
}
}
}
}
}'
Note: "index": "not_analyzed" is used above because, if your field really contains only single letters like "A" and "B", you don't want to analyze the field; otherwise Elasticsearch will remove these single-letter "words".
If that was just your example, and in your real documents you are searching for proper words, remove that setting and let Elasticsearch analyze the field.
Then, index your documents:
curl -XPUT http://localhost:9200/index/type/1 -d '
{
"field_a": "foo",
"field_b": "bar",
"field_x" : [{
"user" : "1",
"field_x" : "A"
},
{
"user" : "2",
"field_x" : "B"
}]
}'
And run your query:
curl -XGET localhost:9200/index/type/_search -d '{
"query": {
"nested" : {
"path" : "field_x",
"score_mode" : "avg",
"query" : {
"bool" : {
"must" : [
{
"term": {
"field_x.user": "1"
}
},
{
"term": {
"field_x.field_x": "A"
}
}
]
}
}
}
}
}'
This will result in
{"took":13,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},"hits":{"total":1,"max_score":1.987628,"hits":[{"_index":"index","_type":"type","_id":"1","_score":1.987628, "_source" :
{
"field_a": "foo",
"field_b": "bar",
"field_x" : [{
"user" : "1",
"field_x" : "A"
},
{
"user" : "2",
"field_x" : "B"
}]
}}]}}
However, querying
curl -XGET localhost:9200/index/type/_search -d '{
"query": {
"nested" : {
"path" : "field_x",
"score_mode" : "avg",
"query" : {
"bool" : {
"must" : [
{
"term": {
"field_x.user": "1"
}
},
{
"term": {
"field_x.field_x": "B"
}
}
]
}
}
}
}
}'
won't return any results
{"took":6,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},"hits":{"total":0,"max_score":null,"hits":[]}}
I'm working through some examples in the ElasticSearch Server book and trying to write a simple match query:
{
"query" : {
"match" : {
"displayname" : "john smith"
}
}
}
This gives me the error:
{\"error\":\"SearchPhaseExecutionException[Failed to execute phase [query],
....
SearchParseException[[scripts][4]: from[-1],size[-1]: Parse Failure [Failed to parse source
....
QueryParsingException[[kb.cgi] No query registered for [match]]; }
I also tried
{
"match" : {
"displayname" : "john smith"
}
}
as per examples on http://www.elasticsearch.org/guide/reference/query-dsl/match-query/
EDIT: I think the remote server I'm using is not running the latest version (0.20.5), because using "text" instead of "match" allows the query to work.
I've seen a similar issue reported here: http://elasticsearch-users.115913.n3.nabble.com/Character-escaping-td4025802.html
It appears the remote server I'm using is not running the latest 0.20.5 version of ElasticSearch; consequently the "match" query is not supported, and its older equivalent is "text", which works.
I came to this conclusion after seeing a similar issue reported here: http://elasticsearch-users.115913.n3.nabble.com/Character-escaping-td4025802.html
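For reference, the older "text" form of the same query (a sketch; "text" was the original name of what later became "match"):
{
"query" : {
"text" : {
"displayname" : "john smith"
}
}
}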
Your first query looks fine, but perhaps the way you use it in the request is not correct. Here is a complete example that works:
curl -XDELETE localhost:9200/test-idx
curl -XPUT localhost:9200/test-idx -d '{
"settings": {
"index": {
"number_of_shards": 1,
"number_of_replicas": 0
}
},
"mappings": {
"doc": {
"properties": {
"name": {
"type": "string", "index": "analyzed"
}
}
}
}
}
'
curl -XPUT localhost:9200/test-idx/doc/1 -d '{
"name": "John Smith"
}'
curl -XPOST localhost:9200/test-idx/_refresh
echo
curl "localhost:9200/test-idx/_search?pretty=true" -d '{
"query": {
"match" : {
"name" : "john smith"
}
}
}
'
echo