Elasticsearch join-like query within same index - elasticsearch

I have an index with a following structure (mapping)
{
"properties": {
"content": {
"type": "text",
},
"prev_id": {
"type": "text",
},
"next_id": {
"type": "text",
}
}
}
where prev_id and next_id are IDs of documents in this index (may be null values).
I want to perform _search query and get prev.content and next.content fields.
Now I use two queries: the first for searching by content field
curl -X GET 'localhost:9200/idx/_search' -H 'content-type: application/json' -d '{
"query": {
"match": {
"content": "yellow fox"
}
}
}'
and the second to get next and prev records.
curl -X GET 'localhost:9200/idx/_search' -H 'content-type: application/json' -d '{
"query": {
"ids": {
"values" : ["5bb93552e42140f955501d7b77dc8a0a", "cd027a48445a0a193bc80982748bc846", "9a5b7359d3081f10d099db87c3226d82"]
}
}
}'
Then I join results on application-side.
Can I achieve my goal with one query only?
PS: the purpose to store next-prev as IDs is to safe disk space. I have a lot of records and content field is quite large.

What you are doing is the way to go. But how large is the content? - Maybe you can consider not storing content ( source = false)?

Related

elasticsearch mapping is empty after creating

I'm trying to create an autocomplete index for my elasticsearch using the search_as_you_type datatype.
My first command I run is
curl --request PUT 'https://elasticsearch.company.me/autocomplete' \
'{
"mappings": {
"properties": {
"company_name": {
"type": "search_as_you_type"
},
"serviceTitle": {
"type": "search_as_you_type"
}
}
}
}'
which returns
{"acknowledged":true,"shards_acknowledged":true,"index":"autocomplete"}curl: (3) nested brace in URL position 18:
{
"mappings": {
"properties": etc.the rest of the json object I created}}
Then I reindex using
curl --silent --request POST 'http://elasticsearch.company.me/_reindex?pretty' --data-raw '{
"source": {
"index": "existing_index"
},
"dest": {
"index": "autocomplete"
}
}' | grep "total\|created\|failures"
I expect to see some "total":1000,"created":5etc but some kind of response from the terminal, but I get nothing. Also, when I check the mapping of my autocomplete index, by running curl -u thething 'https://elasticsearch.company.me/autocomplete/_mappings?pretty',
I get an empty mapping result:
{
"autocomplete" : {
"mappings" : { }
}
}
Is my error in the creation of my index or the reindexing? I'm expecting the autocomplete mappings to show the two fields I'm searching for, ie: "company_name" and "serviceTitle". Any ideas how to fix?

elastic-search query all fields including nested fields

I am using ES version 5.6.
I have a document like below stored in ES.
{
"swType": "abc",
"swVersion": "xyz",
"interfaces": [
{
"autoneg": "enabled",
"loopback": "disabled",
"duplex": "enabled"
},
{
"autoneg": "enabled",
"loopback": "disabled",
"duplex": "enabled"
}
]
}
I want to search on all fields that has "enabled".
I tried the below queries, but they did not work.
curl -XGET "http://esserver:9200/comcast/inventory/_search" -H 'Content-Type: application/json' -d'
{
"query": {
"match":{
"_all": "enabled"
}
}
}'
curl -XGET "http://esserver:9200/comcast/inventory/_search" -H 'Content-Type: application/json' -d'
{
"query": {
"query_string": {
"query": "enabled",
"fields": ["*"]
}
}
}'
But the below query worked
curl -XGET "http://esserver:9200/comcast/inventory/_search" -H 'Content-Type: application/json' -d'
{
"query": {
"match":{
"_all": "abc"
}
}
}'
So, looks _all is matching only top level fields and not nested fields.
Is there any way to query for a text contained in all fields including nested ones. I don't want to specify the nested field names explicitly.
I am looking for kind of global search where I want to search for "text"
anywhere in the document.
Thanks.
OK. Got it working.
I had set mapping dynamic:false. Looks like ES will search only in the fields
specified in mappings and I was having my search words in dynamically added fields.
Making dynamic:'strict' helped me in narrowing the issue.

Document Count in keyword buckets from list in document as aggregation in Elasticsearch

The situation:
I am a starter in Elasticsearch and cannot wrap my head around how to use the aggregations go get what I need.
I have documents with the following structure:
{
...
"authors" : [
{
"name" : "Bob",
"#type" : "Person"
}
],
"resort": "Politics",
...
}
I want to use an aggregation to get the documents count for every author. Since there may be more than one author for some documents, these documents should be counted for every author individually.
What I've tried:
Since the terms aggregation worked with the resort field I tried using it with authors or the name field inside, but always getting no buckets at all. For this I used the following curl request:
curl -X POST 'localhost:9200/news/_doc/_search?pretty' -H 'Content-Type: application/json' -d'
{
"_source": false,
"aggs": {
"author_agg": { "terms": {"field": "authors.keyword" } }
}
}'
I concluded, that the terms aggregation doesn't work with fields, that are contained by a list.
Next I thought about the nested aggregation, but the documentation says, it is a
single bucket aggregation
so not what I am searching for. Because I ran out of ideas I tried it, but was getting the error
"type" : "aggregation_execution_exception",
"reason" : "[nested] nested path [authors] is not nested"
I found this answer and tried use it for my data. I had the following request:
curl -X GET "localhost:9200/news/_search?pretty" -H 'Content-Type: application/json' -d'
{
"size": 0,
"aggs": {
"nest": {
"nested": {
"path": "authors"
},
"aggs": {
"authorname": {
"terms" : {
"field": "name.keyword"
}
}
}
}
}
}'
which gave me the error
"type" : "aggregation_execution_exception",
"reason" : "[nested] nested path [authors] is not nested"
I searched for how to make my path nested using mappings, but I couldn't find out how to accomplish that. I don't even know, if this actually makes sense or not.
So how can I aggregate the documents into buckets based on a key, that lies in elements of a list inside the documents?
Maybe this question have been answered somewhere else, but then I'm not able to state my problem in the right way, since I'm still confused by all the new information. Thank you for your help in advance.
I finally solved my problem:
The idea of getting the authors key mapping nested was totally right. But unfortunately Elasticsearch does not let you change the type from un-nested to nested directly, because all items in this key then have to be indexed too. So you have to go the following way:
Create a new index with a custom mapping. Here we go into the document type _doc, into it's properties and then into the documents field authors. There we set type to nested.
~
curl -X PUT "localhost:9200/new_index?pretty" -H 'Content-Type: application/json' -d'
{
"mappings": {
"_doc" : {
"properties" : {
"authors": { "type": "nested" }
}
}
}
}'
Then we reindex our dataset and set the destination to our newly created index. This will index the data from the old index into the new index, essentially copying the pure data, but taking the new mapping (since settings and mappings are not copied this way).
~
curl -X POST "localhost:9200/_reindex" -H 'Content-Type: application/json' -d'
{
"source": {
"index": "old_index"
},
"dest": {
"index": "new_index"
}
}'
Now we can do the nested aggregation here, to sort the documents into buckets based on the authors:
curl -X GET 'localhost:9200/new_index/_doc/_search?pretty' -H 'Content-Type: application/json' -d'
{
"size": 0,
"aggs": {
"authors": {
"nested": {
"path": "authors"
},
"aggs": {
"authors_by_name": {
"terms": { "field": "authors.name.keyword" }
}
}
}
}
}'
I don't know how to rename indices until now, but surely you can just simple delete the old index and then do the described procedure to create another new index with the name of the old one but the custom mapping.

How to denormalize hierarchy in ElasticSearch?

I am new to ElasticSearch and I have a tree, which describes a path to a certain document (not real filesystem paths, just simple text fields categorizing articles, images, documents as one). Each path entry has a type, like.: Group Name, Assembly name or even Unknown. The types could be used in queries to skip certain entries in the path for example.
My source data is stored in SQL Server, the schema looks something like this:
Tree builds up by connecting the Tree.Id to Tree.ParentId, but each node must have a type. The Documents are connected to a leaf in the Tree.
I am not worried about querying the structure in SQL Server, however I should find an optimal approach to denormalize and search them in Elastic. If I flatten the paths and make a list of "descriptors" for a document, I can store each of the Document entries as an Elastic Document.:
{
"path": "NodeNameRoot/NodeNameLevel_1/NodeNameLevel_2/NodeNameLevel_3/NodeNameLevel_4",
"descriptors": [
{
"name": "NodeNameRoot",
"type": "type1"
},
{
"name": "NodeNameLevel_1",
"type": "type1"
},
{
"name": "NodeNameLevel_2",
"type": "type2"
},
{
"name": "NodeNameLevel_3",
"type": "type2"
},
{
"name": "NodeNameLevel_4",
"type": "type3"
}
],
"document": {
...
}
}
Can I query such a structure in ElasticSearch? Or Should I denormalize the paths in a different way?
My main questions:
Can query them based on type or text value (regex matching for example). For example: Give me all the type2->type3 paths (practically leave the type1 out), where the path contains X?
Is it possible to query based on levels? Like I would like the paths where there are 4 descriptors.
Can I do the searching with the built-in functionality or do I need to write an extension?
Edit
Based on G Quintana 's anwser, I made an index like this.:
curl -X PUT \
http://localhost:9200/test \
-H 'cache-control: no-cache' \
-H 'content-type: application/json' \
-d '{
"mappings": {
"path": {
"properties": {
"names": {
"type": "text",
"fields": {
"raw": {
"type": "keyword"
},
"tokens": {
"type": "text",
"analyzer": "pathname_analyzer"
},
"depth": {
"type": "token_count",
"analyzer": "pathname_analyzer"
}
}
},
"types": {
"type": "text",
"fields": {
"raw": {
"type": "keyword"
},
"tokens": {
"type": "text",
"analyzer": "pathname_analyzer"
}
}
}
}
}
},
"settings": {
"analysis": {
"analyzer": {
"pathname_analyzer": {
"type": "pattern",
"pattern": "#->>",
"lowercase": true
}
}
}
}
}'
And could query the depth like this.:
curl -X POST \
http://localhost:9200/test/path/_search \
-H 'content-type: application/json' \
-d '{
"query": {
"bool": {
"should": [
{"match": { "names.depth": 5 }}
]
}
}
}'
Which return correct results. I will test it a little more.
First of all you should identify all your query patterns to design how you will index your data.
From the example you gave, I would index documents of the form:
{
"path": "NodeNameRoot/NodeNameLevel_1/NodeNameLevel_2/NodeNameLevel_3/NodeNameLevel_4",
"types: "type1/type1/type2/type2/type3",
"document": {
...
}
}
Before indexing, you must configure mapping and analysis:
Field path:
use type text + analyzer based on pattern analyzer to split at / characters
use type token_count + same analyzer to compute path depth. Create a multi field (path.depth)
Field types
use type text + analyzer based on pattern analyzer to split at / characters
Configure index mappings and analysis to split the path and types fields and the , use a or a
Give me all the type2->type3 paths use a match_phrase query on the types field
where the path contains X use match query on the path field
where there are 4 descriptors use term query on path.depth sub field
Your descriptors field is not interesting.
The Path tokenizer might be interesting for some usecases.
You can apply multiple analyzer on the same field using multi-fields and then query if sub fields.

Why Elasticsearch "not_analyzed" field is split into terms?

I have the following field in my mapping definition:
...
"my_field": {
"type": "string",
"index":"not_analyzed"
}
...
When I index a document with value of my_field = 'test-some-another' that value is split into 3 terms: test, some, another.
What am I doing wrong?
I created the following index:
curl -XPUT localhost:9200/my_index -d '{
"index": {
"settings": {
"number_of_shards": 5,
"number_of_replicas": 2
},
"mappings": {
"my_type": {
"_all": {
"enabled": false
},
"_source": {
"compressed": true
},
"properties": {
"my_field": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
}
}'
Then I index the following document:
curl -XPOST localhost:9200/my_index/my_type -d '{
"my_field": "test-some-another"
}'
Then I use the plugin https://github.com/jprante/elasticsearch-index-termlist with the following API:
curl -XGET localhost:9200/my_index/_termlist
That gives me the following response:
{"ok":true,"_shards":{"total":5,"successful":5,"failed":0},"terms": ["test","some","another"]}
Verify that mapping is actually getting set by running:
curl localhost:9200/my_index/_mapping?pretty=true
The command that creates the index seems to be incorrect. It shouldn't contain "index" : { as a root element. Try this:
curl -XPUT localhost:9200/my_index -d '{
"settings": {
"number_of_shards": 5,
"number_of_replicas": 2
},
"mappings": {
"my_type": {
"_all": {
"enabled": false
},
"_source": {
"compressed": true
},
"properties": {
"my_field": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
}'
In ElasticSearch a field is indexed when it goes within the inverted index, the data structure that lucene uses to provide its great and fast full text search capabilities. If you want to search on a field, you do have to index it. When you index a field you can decide whether you want to index it as it is, or you want to analyze it, which means deciding a tokenizer to apply to it, which will generate a list of tokens (words) and a list of token filters that can modify the generated tokens (even add or delete some). The way you index a field affects how you can search on it. If you index a field but don't analyze it, and its text is composed of multiple words, you'll be able to find that document only searching for that exact specific text, whitespaces included.
You can have fields that you only want to search on, and never show: indexed and not stored (default in lucene).
You can have fields that you want to search on and also retrieve: indexed and stored.
You can have fields that you don't want to search on, but you do want to retrieve to show them.

Resources