How to add a subdocument to an Elasticsearch index

In Elasticsearch, given the following document, is it possible to add items to the "lists" sub-document without passing the parent attributes (i.e. message and tags)?
I have several attributes in the parent document which I don't want to pass every time I want to add one item to the sub-document.
{
"tweet" : {
"message" : "some arrays in this tweet...",
"tags" : ["elasticsearch", "wow"],
"lists" : [
{
"name" : "prog_list",
"description" : "programming list"
},
{
"name" : "cool_list",
"description" : "cool stuff list"
}
]
}
}

What you are looking for is how to insert nested documents.
In your case, you can use the Update API to append a nested document to your list.
curl -XPOST localhost:9200/index/tweets/1/_update -d '{
"script" : "ctx._source.tweet.lists += new_list",
"params" : {
"new_list" : {"name": "fun_list", "description": "funny list" }
}
}'
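The partial update above only sends the new list item. The rest of `_source` stays on the server. As a rough local illustration (plain Python, not Elasticsearch code), the hypothetical helpers below build that request body and imitate what the script does to the stored document. This assumes that `+=` appends the map as a single element of the list.

```python
import copy

def build_update_body(new_item):
    """Body for the _update call: only the new item is sent."""
    return {
        "script": "ctx._source.tweet.lists += new_list",
        "params": {"new_list": new_item},
    }

def simulate_append(source, body):
    """Local imitation of the script: append params.new_list to tweet.lists.

    Works on a deep copy so the original dict is left untouched.
    """
    updated = copy.deepcopy(source)
    updated["tweet"]["lists"].append(body["params"]["new_list"])
    return updated

stored = {
    "tweet": {
        "message": "some arrays in this tweet...",
        "tags": ["elasticsearch", "wow"],
        "lists": [
            {"name": "prog_list", "description": "programming list"},
            {"name": "cool_list", "description": "cool stuff list"},
        ],
    }
}

body = build_update_body({"name": "fun_list", "description": "funny list"})
after = simulate_append(stored, body)
print([item["name"] for item in after["tweet"]["lists"]])
# prints ['prog_list', 'cool_list', 'fun_list']
```

The parent attributes (`message`, `tags`) never appear in the request body. They are read from the stored document on the server side.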
To support nested documents, you have to define a mapping for them (see the nested type in the Elasticsearch mapping reference).
Assuming your type is tweets, the following mapping should work:
curl -XDELETE http://localhost:9200/index
curl -XPUT http://localhost:9200/index -d'
{
"settings": {
"index.number_of_shards": 1,
"index.number_of_replicas": 0
},
"mappings": {
"tweets": {
"properties": {
"tweet": {
"properties": {
"lists": {
"type": "nested",
"properties": {
"name": {
"type": "string"
},
"description": {
"type": "string"
}
}
}
}
}
}
}
}
}'
Then add a first entry:
curl -XPOST http://localhost:9200/index/tweets/1 -d '
{
"tweet": {
"message": "some arrays in this tweet...",
"tags": [
"elasticsearch",
"wow"
],
"lists": [
{
"name": "prog_list",
"description": "programming list"
},
{
"name": "cool_list",
"description": "cool stuff list"
}
]
}
}'
And then add your element with:
curl -XPOST http://localhost:9200/index/tweets/1/_update -d '
{
"script": "ctx._source.tweet.lists += new_list",
"params": {
"new_list": {
"name": "fun_list",
"description": "funny list"
}
}
}'
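If you want to issue the same update from code rather than curl, here is a minimal stdlib-only sketch. It assumes a node at `http://localhost:9200` and the `index`/`tweets`/`1` names used above. The helper `make_update_request` is hypothetical: it only builds the request, and the commented-out lines show how it could be sent.

```python
import json
import urllib.request

def make_update_request(host, index, doc_type, doc_id, new_item):
    """Build (but do not send) the POST .../_update request shown above."""
    url = f"{host}/{index}/{doc_type}/{doc_id}/_update"
    body = {
        "script": "ctx._source.tweet.lists += new_list",
        "params": {"new_list": new_item},
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = make_update_request(
    "http://localhost:9200", "index", "tweets", 1,
    {"name": "fun_list", "description": "funny list"},
)
# To actually send it against a running node:
# with urllib.request.urlopen(req) as resp:
#     print(resp.read().decode("utf-8"))
```

Depending on your Elasticsearch version, dynamic scripting may be disabled by default. In that case the script has to be allowed in the node configuration before this call succeeds.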

Related

How do you set a field that contains spaces to be not_analyzed?

I have a 'grade' field in an Elasticsearch index that contains text and numbers. I have set the field mapping to 'not_analyzed', but I can't search for grade == 'Year 1'.
I have read the finding exact values section of the docs but it doesn't seem to work for me.
Create the index.
curl -XPUT http://localhost:9200/my_test_index
Create the mapping template.
curl -XPUT http://localhost:9200/_template/my_test_index_mapping -d '
{
"template" : "my_test_index",
"mappings" : {
"my_type": {
"properties": {
"grade": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
}
'
Create some documents.
curl -XPUT 'http://localhost:9200/my_test_index/my_type/1' -d '{
"title" : "some title",
"grade" : "Year 1"
}'
curl -XPUT 'http://localhost:9200/my_test_index/my_type/3' -d '{
"title" : "some title",
"grade" : "preschool"
}'
Query for "Year 1" returns 0 results.
curl -XPOST http://localhost:9200/my_test_index/_search -d '{
"query": {
"filtered" : {
"filter" : {
"term": {
"grade": "Year 1"
}
}
}
}
}'
Query for 'preschool' returns 1 result.
curl -XPOST http://localhost:9200/my_test_index/_search -d '{
"query": {
"filtered" : {
"filter" : {
"term": {
"grade": "preschool"
}
}
}
}
}'
Checking the mapping shows that the 'grade' field is not 'not_analyzed':
curl -XGET http://localhost:9200/my_test_index/_mapping
{
"my_test_index" : {
"mappings" : {
"my_type" : {
"properties" : {
"grade" : {
"type" : "string"
},
"title" : {
"type" : "string"
}
}
}
}
}
}
The template will only impact newly created indices.
Re-create the index after the template has been created.
Alternatively, specify the mappings when creating the index instead of relying on a template for a single index.
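To make the ordering issue concrete, here is a toy model in Python. It is not Elasticsearch code. In it, templates are consulted only when an index is created, as the answer describes. The class and method names are hypothetical, `fnmatch` is only a rough stand-in for the template's wildcard matching, and the merge is a shallow dict update rather than Elasticsearch's real merge logic.

```python
import fnmatch

class ToyCluster:
    """Toy model, not Elasticsearch code: templates are only applied
    at index-creation time."""

    def __init__(self):
        self.templates = {}  # template name -> (index pattern, field mappings)
        self.indices = {}    # index name -> field mappings actually applied

    def put_template(self, name, pattern, mappings):
        # Registering a template does not touch existing indices.
        self.templates[name] = (pattern, mappings)

    def create_index(self, index, mappings=None):
        applied = {}
        for pattern, tmpl in self.templates.values():
            if fnmatch.fnmatch(index, pattern):  # rough stand-in for wildcard matching
                applied.update(tmpl)
        applied.update(mappings or {})  # in this toy, explicit mappings win
        self.indices[index] = applied

    def delete_index(self, index):
        self.indices.pop(index, None)

grade_mapping = {"grade": {"type": "string", "index": "not_analyzed"}}

# Order used in the question: index first, template second.
cluster = ToyCluster()
cluster.create_index("my_test_index")
cluster.put_template("my_test_index_mapping", "my_test_index", grade_mapping)
before = dict(cluster.indices["my_test_index"])

# Fix: re-create the index after the template exists.
cluster.delete_index("my_test_index")
cluster.create_index("my_test_index")
after = cluster.indices["my_test_index"]
```

In the toy, `before` has no `grade` mapping, while `after` picks up `not_analyzed` from the template.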
If you don't want the field to be analysed you can specify "index" : "not_analyzed" in the mapping. You'll then be able to search for exact matches as desired.
See: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-core-types.html#string
In your case, please try to re-create your mapping.
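Why did 'preschool' match while 'Year 1' did not? The field was still analyzed, and a term filter does not analyze its input. The sketch below is a very rough approximation. It treats an analyzed field as "split into word tokens and lowercase" and is not the real standard analyzer. The helper names are hypothetical.

```python
import re

def standard_like(text):
    """Rough stand-in for an analyzed string field:
    split into word tokens and lowercase them."""
    return [tok.lower() for tok in re.findall(r"\w+", text)]

def not_analyzed(text):
    """not_analyzed: the whole value is indexed as one exact term."""
    return [text]

def term_filter_matches(indexed_terms, term):
    """A term filter does not analyze its input; it needs an exact term."""
    return term in indexed_terms

print(standard_like("Year 1"))  # prints ['year', '1']
print(term_filter_matches(standard_like("Year 1"), "Year 1"))    # prints False
print(term_filter_matches(not_analyzed("Year 1"), "Year 1"))     # prints True
print(term_filter_matches(standard_like("preschool"), "preschool"))  # prints True
```

"preschool" survives analysis unchanged as a single lowercase token, so the exact term is found. "Year 1" is split into two lowercase tokens, so only a not_analyzed mapping keeps it findable as one term.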

How to specify ElasticSearch copy_to order?

ElasticSearch has the ability to copy values to other fields (at index time), enabling you to search on multiple fields as if it were one field (Core Types: copy_to).
However, there doesn't seem to be any way to specify the order in which these values should be copied. This could be important when phrase matching:
curl -XDELETE 'http://10.11.12.13:9200/helloworld'
curl -XPUT 'http://10.11.12.13:9200/helloworld'
# copy_to is ordered alphabetically!
curl -XPUT 'http://10.11.12.13:9200/helloworld/_mapping/people' -d '
{
"people": {
"properties": {
"last_name": {
"type": "string",
"copy_to": "full_name"
},
"first_name": {
"type": "string",
"copy_to": "full_name"
},
"state": {
"type": "string"
},
"city": {
"type": "string"
},
"full_name": {
"type": "string"
}
}
}
}
'
curl -X POST "10.11.12.13:9200/helloworld/people/dork" -d '{"first_name": "Jim", "last_name": "Bob", "state": "California", "city": "San Jose"}'
curl -X POST "10.11.12.13:9200/helloworld/people/face" -d '{"first_name": "Bob", "last_name": "Jim", "state": "California", "city": "San Jose"}'
curl "http://10.11.12.13:9200/helloworld/people/_search" -d '
{
"query": {
"match_phrase": {
"full_name": {
"query": "Jim Bob"
}
}
}
}
'
Only "Jim Bob" is returned; it seems that the fields are copied in field-name alphabetical order.
How would I switch the copy_to order such that the "Bob Jim" person would be returned?
This is more deterministically controlled by registering a transform script in your mapping.
Something like this:
"transform" : [
{"script": "ctx._source['full_name'] = [ctx._source['first_name'] + \" \" + ctx._source['last_name'], ctx._source['last_name'] + \" \" + ctx._source['first_name']]"}
]
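The effect of that transform can be checked locally with a small Python sketch. `full_name_transform` mirrors the script by indexing both name orders. `phrase_matches` is a deliberately naive match_phrase: whitespace tokens, lowercased, matched within one value at a time. Both helpers are hypothetical. Whether a real phrase query can span two array values depends on the position gap configured for the field, and the sketch ignores that by treating each value separately.

```python
def full_name_transform(source):
    """Mimics the transform script: index both name orders."""
    first, last = source["first_name"], source["last_name"]
    doc = dict(source)
    doc["full_name"] = [f"{first} {last}", f"{last} {first}"]
    return doc

def phrase_matches(values, phrase):
    """Naive match_phrase: the phrase's tokens must appear consecutively
    within a single value (whitespace tokenization, lowercased)."""
    wanted = phrase.lower().split()
    for value in values:
        tokens = value.lower().split()
        for i in range(len(tokens) - len(wanted) + 1):
            if tokens[i:i + len(wanted)] == wanted:
                return True
    return False

people = {
    "dork": {"first_name": "Jim", "last_name": "Bob"},
    "face": {"first_name": "Bob", "last_name": "Jim"},
}
indexed = {pid: full_name_transform(p) for pid, p in people.items()}
hits = sorted(pid for pid, doc in indexed.items()
              if phrase_matches(doc["full_name"], "Jim Bob"))
print(hits)  # prints ['dork', 'face']
```

Because both orders are indexed, the phrase "Jim Bob" now finds both people, regardless of the order in which copy_to would have concatenated the fields.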
Also, transform scripts can be "native", i.e. Java code, made available to all nodes in the cluster by putting your custom classes on the Elasticsearch classpath and registering them as native scripts with the setting:
script.native.<name>.type=<fully.qualified.class.name>
in which case in your mapping you'd register the native script as a transform like so:
"transform" : [
{
"script" : "<name>",
"params" : {
"param1": "val1",
"param2": "val2"
},
"lang": "native"
}
],

Sort an Elasticsearch result set based on a filter term

For an e-commerce site, I am implementing Elasticsearch in order to get a sorted and paginated result set of product IDs for a category.
I have a product document which looks like this:
PUT /products_test/product/1
{
"id": "1",
"title": "foobar",
"sort": 102,
"categories": [
"28554568",
"28554577",
"28554578"
]
}
To get the resultset I filter and sort like this:
POST /products/_search
{
"filter": {
"term": {
"categories": "28554666"
}
},
"sort" : [
{ "sort" : {"order" : "asc"}}
]
}
However, I have now learned that the product sorting depends on the category. Looking at the example above, this means that I need to add a different sort value for each value in the categories array, and, depending on the category I filter by, sort by the corresponding sort value.
The document should look something like this:
PUT /products_test/product/1
{
"id": "1",
"title": "foobar",
"categories": [
{ "id": "28554568", "sort": "102" },
{ "id": "28554577", "sort": "482" },
{ "id": "28554578", "sort": "2" }
]
}
My query now should be able to sort something like this:
POST /products/_search
{
"filter": {
"term": {
"categories.id": "28554666"
}
},
"sort" : [
{ "categories.{filtered_category_id}.sort" : {"order" : "asc"}}
]
}
Is it somehow possible to accomplish this?
To achieve this, you will have to store your categories as nested documents. Otherwise, Elasticsearch will not know which sort value is associated with which category ID, because the fields of a plain array of objects are indexed as independent multi-valued fields.
Then you sort on the nested field, using a nested filter to choose the right nested document.
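To see what "Elasticsearch will not know which sort value belongs to which category" means, here is a rough Python picture of how a plain (non-nested) array of objects ends up in the index. Each key becomes one multi-valued field, and the link between an `id` and its `sort` is gone. The helper name is hypothetical, and the list order in the sketch is just insertion order.

```python
def flatten_object_array(field, objects):
    """Rough picture of a plain (non-nested) object array in the index:
    one multi-valued field per key, with no link between values."""
    flat = {}
    for obj in objects:
        for key, value in obj.items():
            flat.setdefault(f"{field}.{key}", []).append(value)
    return flat

categories = [
    {"id": "28554568", "sort": 102},
    {"id": "28554577", "sort": 482},
    {"id": "28554578", "sort": 2},
]

flat = flatten_object_array("categories", categories)
print(flat)
# Nothing in `flat` says that 482 belongs to 28554577.

# With nested documents each object stays intact, so the pairing survives:
nested_sort = next(c["sort"] for c in categories if c["id"] == "28554577")
print(nested_sort)  # prints 482
```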
Here's a runnable example you can play with: https://www.found.no/play/gist/47282a07414e1432de6d
curl -XPUT "$ELASTICSEARCH_ENDPOINT/play" -d '{
"mappings": {
"type": {
"properties": {
"categories": {
"type": "nested"
}
}
}
}
}'
curl -XPOST "$ELASTICSEARCH_ENDPOINT/_bulk?refresh=true" -d '
{"index":{"_index":"play","_type":"type"}}
{"id":1,"title":"foobar","categories":[{"id":"28554568","sort":102},{"id":"28554577","sort":482},{"id":"28554578","sort":2}]}
{"index":{"_index":"play","_type":"type"}}
{"id":2,"title":"barbaz","categories":[{"id":"28554577","sort":0}]}
'
curl -XPOST "$ELASTICSEARCH_ENDPOINT/_search?pretty" -d '
{
"query": {
"nested": {
"path": "categories",
"query": {
"term": {
"categories.id": {
"value": 28554577
}
}
}
}
},
"sort": {
"categories.sort": {
"order": "asc",
"nested_filter": {
"term": {
"categories.id": 28554577
}
}
}
}
}
'
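The question asked for something like `categories.{filtered_category_id}.sort`. With nested documents, the equivalent is to pass the same category ID into both the nested query and the `nested_filter` of the sort. The Python sketch below has two hypothetical helpers. `nested_sort_query` builds that body for any category. `simulate_nested_sort` imitates the resulting order locally on the two bulk-indexed documents, taking the minimum matching value as an ascending sort does. Unlike the curl example, it passes the category ID as a string, on the assumption that the IDs are stored as strings.

```python
def nested_sort_query(category_id):
    """Request body for the search above, parameterized on the category."""
    term = {"term": {"categories.id": category_id}}
    return {
        "query": {"nested": {"path": "categories", "query": term}},
        "sort": {
            "categories.sort": {"order": "asc", "nested_filter": term}
        },
    }

def simulate_nested_sort(docs, category_id):
    """Local imitation: keep docs having the category, order them by that
    category's sort value (min if several match, like an asc sort)."""
    def sort_key(doc):
        return min(c["sort"] for c in doc["categories"] if c["id"] == category_id)
    hits = [d for d in docs if any(c["id"] == category_id for c in d["categories"])]
    return [d["id"] for d in sorted(hits, key=sort_key)]

docs = [
    {"id": 1, "title": "foobar", "categories": [
        {"id": "28554568", "sort": 102},
        {"id": "28554577", "sort": 482},
        {"id": "28554578", "sort": 2},
    ]},
    {"id": 2, "title": "barbaz", "categories": [{"id": "28554577", "sort": 0}]},
]

print(simulate_nested_sort(docs, "28554577"))  # prints [2, 1]
```

For category 28554577, document 2 (sort 0) comes before document 1 (sort 482), even though document 1 has a lower sort value (2) for a different category.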

Indexing a website/URL in Elasticsearch

I have a website field in a document indexed in Elasticsearch. Example value: http://example.com. The problem is that when I search for "example", the document is not included. How do I correctly map the website/URL field?
I created the index below:
{
"settings":{
"index":{
"analysis":{
"analyzer":{
"analyzer_html":{
"type":"custom",
"tokenizer": "standard",
"filter":"standard",
"char_filter": "html_strip"
}
}
}
}
},
"mappings":{
"blogshops": {
"properties": {
"category": {
"properties": {
"name": {
"type": "string"
}
}
},
"reviews": {
"properties": {
"user": {
"properties": {
"_id": {
"type": "string"
}
}
}
}
}
}
}
}
}
I guess you are using the standard analyzer, which splits http://example.com into two tokens: http and example.com. You can check this with http://localhost:9200/_analyze?text=http://example.com&analyzer=standard.
If you want to split the URL further, you need to use a different analyzer or define your own custom analyzer.
You can see how the URL would be indexed with the simple analyzer at http://localhost:9200/_analyze?text=http://example.com&analyzer=simple. As you can see, the URL is now indexed as three tokens: ['http', 'example', 'com']. If you don't want to index tokens like ['http', 'www'], you can define your own analyzer with the lowercase tokenizer (the one used by the simple analyzer) and a stop filter. For example, something like this:
# Delete index
#
curl -s -XDELETE 'http://localhost:9200/url-test/' ; echo
# Create index with mapping and custom index
#
curl -s -XPUT 'http://localhost:9200/url-test/' -d '{
"mappings": {
"document": {
"properties": {
"content": {
"type": "string",
"analyzer" : "lowercase_with_stopwords"
}
}
}
},
"settings" : {
"index" : {
"number_of_shards" : 1,
"number_of_replicas" : 0
},
"analysis": {
"filter" : {
"stopwords_filter" : {
"type" : "stop",
"stopwords" : ["http", "https", "ftp", "www"]
}
},
"analyzer": {
"lowercase_with_stopwords": {
"type": "custom",
"tokenizer": "lowercase",
"filter": [ "stopwords_filter" ]
}
}
}
}
}' ; echo
curl -s -XGET 'http://localhost:9200/url-test/_analyze?text=http://example.com&analyzer=lowercase_with_stopwords&pretty'
# Index document
#
curl -s -XPUT 'http://localhost:9200/url-test/document/1?pretty=true' -d '{
"content" : "Small content with URL http://example.com."
}'
# Refresh index
#
curl -s -XPOST 'http://localhost:9200/url-test/_refresh'
# Try to search document
#
curl -s -XGET 'http://localhost:9200/url-test/_search?pretty' -d '{
"query" : {
"query_string" : {
"query" : "content:example"
}
}
}'
NOTE: If you would rather not use stop words, there is an interesting article: "Stop Stopping Stop Words: a Look at Common Terms Query".
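Here is a rough local approximation of the `lowercase_with_stopwords` analyzer defined above, useful for predicting tokens before running `_analyze`. It assumes the lowercase tokenizer can be modelled as "split on anything that is not a letter, then lowercase". The function names are hypothetical and this is not Lucene code.

```python
import re

STOPWORDS = {"http", "https", "ftp", "www"}  # same list as stopwords_filter above

def lowercase_tokenizer(text):
    """Approximation of the lowercase tokenizer: split on anything that is
    not a letter, then lowercase."""
    return [tok.lower() for tok in re.findall(r"[^\W\d_]+", text)]

def lowercase_with_stopwords(text):
    """Lowercase tokenizer followed by the custom stop filter."""
    return [tok for tok in lowercase_tokenizer(text) if tok not in STOPWORDS]

print(lowercase_tokenizer("http://example.com"))
# prints ['http', 'example', 'com']
print(lowercase_with_stopwords("Small content with URL http://example.com."))
# prints ['small', 'content', 'with', 'url', 'example', 'com']
```

Because the stop filter here uses only the custom list, ordinary English stop words such as "with" are kept. Only the URL scheme and "www" parts are dropped.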

Multiple properties in facet (elasticsearch)

I have the following index:
curl -XPUT "http://localhost:9200/test/" -d '
{
"mappings": {
"files": {
"properties": {
"name": {
"type": "string",
"index": "not_analyzed"
},
"owners": {
"type": "nested",
"properties": {
"name": {
"type":"string",
"index":"not_analyzed"
},
"mail": {
"type":"string",
"index":"not_analyzed"
}
}
}
}
}
}
}
'
With sample documents:
curl -XPUT "http://localhost:9200/test/files/1" -d '
{
"name": "first.jpg",
"owners": [
{
"name": "John Smith",
"mail": "js@example.com"
},
{
"name": "Joe Smith",
"mail": "joes@example.com"
}
]
}
'
curl -XPUT "http://localhost:9200/test/files/2" -d '
{
"name": "second.jpg",
"owners": [
{
"name": "John Smith",
"mail": "js@example.com"
},
{
"name": "Ann Smith",
"mail": "as@example.com"
}
]
}
'
curl -XPUT "http://localhost:9200/test/files/3" -d '
{
"name": "third.jpg",
"owners": [
{
"name": "Kate Foo",
"mail": "kf@example.com"
}
]
}
'
And I need to find all owners that match some query, let's say "mit":
curl -XGET "http://localhost:9200/test/files/_search" -d '
{
"facets": {
"owners": {
"terms": {
"field": "owners.name"
},
"facet_filter": {
"query": {
"query_string": {
"query": "*mit*",
"default_field": "owners.name"
}
}
},
"nested": "owners"
}
}
}
'
This gives me the following result:
{
"facets" : {
"owners" : {
"missing" : 0,
"_type" : "terms",
"other" : 0,
"total" : 4,
"terms" : [
{
"count" : 2,
"term" : "John Smith"
},
{
"count" : 1,
"term" : "Joe Smith"
},
{
"count" : 1,
"term" : "Ann Smith"
}
]
}
},
"timed_out" : false,
"hits" : {...}
}
That part works.
But what I exactly need is to get the owners with their email addresses (for each entry in the facet I need an additional field in the results).
Is it achievable?
I don't think this is possible directly. Depending on your needs, I would:
Create a composite field containing both name and email and facet on that field, or
Run the query in addition to the facet and extract the emails from the query result, but this is obviously not scalable, or
Use a two-step operation: get the facet, build the needed queries, and merge the results.
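The first option (a composite field) can be sketched locally in Python. The field name `name_mail` and the `" | "` separator are hypothetical. In a real mapping, the composite would be a not_analyzed string inside the `owners` nested type. The terms facet would then run on that field instead of `owners.name`, with the facet_filter still on `owners.name`. The sketch only imitates the counting on the sample documents.

```python
from collections import Counter

SEP = " | "  # hypothetical separator; choose one that never occurs in names or mails

def with_composite(doc):
    """Index-time step: add a hypothetical owners.name_mail field."""
    out = dict(doc)
    out["owners"] = [dict(o, name_mail=f"{o['name']}{SEP}{o['mail']}")
                     for o in doc["owners"]]
    return out

def facet_with_mail(docs, fragment):
    """Count matching owners on the composite field, then split it back."""
    counts = Counter(
        o["name_mail"]
        for doc in docs
        for o in doc["owners"]
        if fragment in o["name"]  # stand-in for the *mit* facet_filter
    )
    return {tuple(key.split(SEP)): n for key, n in counts.items()}

files = [
    {"name": "first.jpg", "owners": [
        {"name": "John Smith", "mail": "js@example.com"},
        {"name": "Joe Smith", "mail": "joes@example.com"}]},
    {"name": "second.jpg", "owners": [
        {"name": "John Smith", "mail": "js@example.com"},
        {"name": "Ann Smith", "mail": "as@example.com"}]},
    {"name": "third.jpg", "owners": [
        {"name": "Kate Foo", "mail": "kf@example.com"}]},
]

result = facet_with_mail([with_composite(f) for f in files], "mit")
print(result)
```

The counts match the facet in the question (John Smith 2, Joe Smith 1, Ann Smith 1), but each entry now carries the email as well. The trade-off is an extra field per owner and a separator that must never appear in the data.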
