What are concrete indices - elasticsearch

In relation to Elasticsearch, what are concrete indices?
The Elasticsearch docs mention them hundreds of times, but I can't find a definition anywhere.
For example:
count - allowNoIndices:
Whether to ignore if a wildcard indices expression resolves into no ~concrete indices~.

A concrete index is simply a real index that is stored in Elasticsearch and that you can list with a /_cat/indices command such as
curl 'localhost:9200/_cat/indices?v'
As you probably know, when searching you can specify any of the following:
1. one concrete index: /my_index/_search
2. several concrete indices: /my_index1,my_index2/_search
3. one alias: /my_alias/_search
4. several aliases: /my_alias1,my_alias2/_search
5. an index wildcard: /my_*/_search
In cases 1 and 2, you specify concrete indices, i.e. indices that you would see listed by the /_cat/indices command above.
In cases 3 and 4, the alias(es) you specify resolve to concrete indices, so if my_alias is an alias for my_index1 and my_index2, then case 3 is equivalent to case 2.
In case 5, the wildcard is just a shortcut so that you don't have to list all concrete indices whose names start with the prefix my_. You often use that when you have time-based indices, such as logstash-2015* for all logstash indices of the year 2015.
To sum it up, a concrete index is an index that you have created one way or another and that shows up when you list all indices present in your Elasticsearch instance.
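To make the resolution step concrete, here is a minimal sketch in plain Python that does not talk to a cluster. The index names, the alias table and the `resolve` function are all hypothetical, and the matching is simplified (for example, it does not expand wildcards against alias names), so treat it as an illustration of the idea rather than of Elasticsearch's actual resolver.

```python
from fnmatch import fnmatchcase

# Hypothetical cluster state: the concrete indices and an alias table.
CONCRETE = ["my_index1", "my_index2", "logstash-2015.01.01", "logstash-2016.01.01"]
ALIASES = {"my_alias": ["my_index1", "my_index2"]}


def resolve(expression, concrete=CONCRETE, aliases=ALIASES):
    """Resolve a comma-separated index expression to sorted concrete index names."""
    resolved = set()
    for part in expression.split(","):
        if part in aliases:
            # An alias expands to the concrete indices it points to.
            resolved.update(aliases[part])
        elif "*" in part:
            # A wildcard expands to every concrete index whose name matches.
            resolved.update(name for name in concrete if fnmatchcase(name, part))
        elif part in concrete:
            # A plain name is already a concrete index.
            resolved.add(part)
    return sorted(resolved)


print(resolve("my_alias"))        # alias
print(resolve("my_*"))            # wildcard
print(resolve("logstash-2015*"))  # time-based wildcard
print(resolve("nothing_*"))       # wildcard that resolves to no concrete indices
```

The last call is the situation the allowNoIndices option from the question is about: a wildcard expression that resolves to no concrete indices.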

Related

Query a text/keyword field in Elasticsearch that contains at least one item not matching a set

I have a document that has a "bag.contents" field (indexed as text with a .keyword derivative) holding a comma-separated list of the items it contains. Below are some samples:
`Apple, Apple, Apple`
`Apple, Orange`
`Car, Apple` <--
`Orange`
`Bus` <--
`Grape, Car` <--
`Car, Bus` <--
The desired query results should be all documents where there is at least one instance of something other than 'Apple', 'Orange', 'Grape', as per the arrows above.
I'm sure the DSL is a combination of must and not but after 20 or so iterations it seems very difficult to get Elasticsearch to return the correct result set short of one that doesn't contain any of those 3 things.
It is also worth noting that this field in the original document is a JSON array and Kibana shows it as a single field with the elements as a comma-separated field. I suspect this may be complicating it.
1 - If it shows up as a single field, it is probably not indexed as an array. Please make sure the document you index is formed properly, i.e. it needs to be
{ "contents": ["apple","orange","grape"]}
and not
{"contents": "apple,orange,grape"}
2 - Regarding the query: if you know all the possible terms at query time, you can build a terms_set query with all the other terms except apple, orange and grape. The terms_set query lets you control the minimum number of matches required (1 in your case).
If you don't know all the possible terms, you could create a separate field that holds all the other words (everything except apple, orange and grape) and query against that field.
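The second suggestion (a separate field for everything outside the known set) could be sketched like this. The helper field name `other_contents` and the function `with_other_items` are made up for illustration, and the documents are only built in memory here. The `exists` query relies on an empty array being treated as a missing value, which is my understanding of how `exists` behaves.

```python
ALLOWED = {"apple", "orange", "grape"}


def with_other_items(contents, allowed=ALLOWED):
    """Return a document body with a helper field listing items outside `allowed`."""
    others = sorted({item for item in contents if item.lower() not in allowed})
    return {"contents": contents, "other_contents": others}


# Hypothetical query body: match documents whose helper field has at least one value.
OTHER_ITEMS_QUERY = {"query": {"exists": {"field": "other_contents"}}}

# The samples from the question, written as proper arrays.
samples = [
    ["Apple", "Apple", "Apple"],
    ["Apple", "Orange"],
    ["Car", "Apple"],
    ["Orange"],
    ["Bus"],
    ["Grape", "Car"],
    ["Car", "Bus"],
]

docs = [with_other_items(sample) for sample in samples]
# Client-side check of which documents the query is expected to match.
matches = [doc["contents"] for doc in docs if doc["other_contents"]]
print(matches)
```

The printed list corresponds to the four samples marked with arrows in the question.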

Is there a way to add newly added field in one of the indexes to be included in index pattern?

I have an alias set up for rolling indices in Elasticsearch. Let's call the alias "alias" for now. It points to a number of indices and rolls over after every 100 GB. Now, let's say the previous indices associated with the alias have 100 fields, and I've added one more field while writing to the latest index, so the number of fields becomes 101.
I've set up an index pattern named "alias" and I can see all the indices listed via that index pattern, but I am unable to visualize the 101st field I just added in the recent indices. Is there a way to do it?
Please let me know if more details are needed regarding the same.
I hope you added the new field to the write index that your alias points to. An alias can have only one write index but many read indices, and if you added the new field to a read index of your alias, you will not be able to visualise it using your alias.

Can you have an index pattern with a field with multiple field types?

Currently I have an Elasticsearch index that rolls over periodically. We have an index mapping applied to a certain index pattern. We want to update the field type of the index for subsequent indices that get rolled over.
If we change the mapping of a field from a string type to number for new rolled over indices, what happens in the index pattern when refreshed?
Would the index pattern have the field as one type over the other?
There is only one version of an index pattern at any given time. When you update it (i.e. change some mapping type), all the existing indices matching that index pattern remain unchanged. All future indices created out of that index pattern will get the modification (i.e. new field mapping type).
What you need to be aware of is that you'll end up with (old) indices containing documents with a field having the old mapping type and (new) indices containing documents with a field having the new mapping type. Depending on the change you make, some of your queries running on both old and new indices might not run correctly afterwards. Make sure that your queries still work with that mapping change.
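One way to spot such a mismatch before it breaks queries is to compare the field types across the old and new indices. The sketch below works on a simplified, hypothetical structure (index name mapped to a flat field-to-type dictionary), not on the exact shape of a real mapping response, and the index and field names are invented.

```python
from collections import defaultdict


def find_type_conflicts(mappings):
    """Return {field: sorted types} for fields mapped with more than one type.

    `mappings` is a simplified structure: {index_name: {field_name: type}}.
    """
    types_by_field = defaultdict(set)
    for fields in mappings.values():
        for field, field_type in fields.items():
            types_by_field[field].add(field_type)
    return {
        field: sorted(types)
        for field, types in types_by_field.items()
        if len(types) > 1
    }


# Hypothetical rolled-over indices: `status` changed from a string type to a number.
mappings = {
    "logs-000001": {"status": "keyword", "message": "text"},
    "logs-000002": {"status": "long", "message": "text"},
}

print(find_type_conflicts(mappings))
```

Any field reported here is one where a query or aggregation spanning both indices deserves a second look.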

What is the use of maintaining two aliases for a single Elastic Search Index

I have been exploring Elastic Search lately.
I have been going through aliases. I see ES provides an API to create multiple aliases to a single index like below:
{ "actions" : [{ "add" : { "indices" : ["test1", "test2"], "alias" : "alias1" } }] }
Refer: https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-aliases.html#indices-aliases
I'm wondering what is the use case of this.
Won't the queries on aliases get split if an alias point to multiple indices?
I have tried to find this information, but failed, as everywhere I look it is explained how to achieve this but not what the use case is.
Directing me to a resource where I could get more info would also help.
A possible use case is when your application has to switch from an old index to a new index with zero downtime.
Let's say you want to reindex an index for some reason. If you're not using an alias for your index, you then need to update your application to use the new index name.
How is this helpful?
Assume that your application is using the alias instead of an index name.
Let's create an index:
PUT /my_index
Create its alias:
PUT /my_index/_alias/my_index_alias
Now you've decided to reindex your index (maybe you want to change the existing mapping).
Once documents have been reindexed correctly, you can switch your alias to point to the new index.
Note: You need to remove the alias from the old index at the same time as you add it to the new index. You can do this atomically using the _aliases endpoint.
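As a rough sketch, the request body for that atomic switch can be built like this. It only constructs the JSON that would be sent to the _aliases endpoint and sends nothing. The new index name `my_index_v2` and the helper `swap_alias_actions` are assumptions made for the example.

```python
import json


def swap_alias_actions(alias, old_index, new_index):
    """Build an _aliases body that moves `alias` from `old_index` to `new_index`."""
    return {
        "actions": [
            # Both actions go in the same request, so the switch happens in one step.
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


# `my_index_v2` is a hypothetical name for the reindexed copy of `my_index`.
body = swap_alias_actions("my_index_alias", "my_index", "my_index_v2")
print(json.dumps(body, indent=2))
```

The printed body would be sent as POST /_aliases. Because the application only ever uses my_index_alias, it needs no change when the alias moves.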
A good read : elastic
As for your question about maintaining two aliases for a single index:
It lets you create “views” on a subset of the documents in an index.
As for multiple indices sharing the same alias:
It groups multiple indices under the same name, which is helpful if you want to run a single query on multiple indices at the same time.
But you can't insert/index data using this strategy.
Let's say that you have two types of events, eventA and eventB. You want to "partition" them by time, so you use an alias to map multiple indices (e.g. eventA-20220920) to one alias ('eventA' in this case). You also want one alias for all the event types, so you give all the eventA-* and eventB-* indices another alias, 'event'.
That way, when you add a third type of event (eventC), you can just add its indices to the 'event' alias and don't have to change your queries.
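The event grouping described above might be expressed as one _aliases body along these lines. The function name `group_alias_actions` is hypothetical and nothing is sent to a cluster. Note that an alias added through a wildcard only covers indices that exist when the request runs, so newly created daily indices would still need the alias added (for example through an index template).

```python
def group_alias_actions(event_types, umbrella="event"):
    """Build _aliases actions: one alias per event type plus a shared umbrella alias."""
    actions = []
    for event_type in event_types:
        pattern = f"{event_type}-*"  # e.g. eventA-20220920
        # Per-type alias, e.g. 'eventA' -> all eventA-* indices.
        actions.append({"add": {"index": pattern, "alias": event_type}})
        # Shared alias, e.g. 'event' -> all indices of every event type.
        actions.append({"add": {"index": pattern, "alias": umbrella}})
    return {"actions": actions}


body = group_alias_actions(["eventA", "eventB", "eventC"])
for action in body["actions"]:
    print(action)
```

Queries written against 'event' keep working unchanged once the eventC actions have been applied.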

elasticsearch: decide which query should run first

We have a simple web page, where the user can provide some input and query the database. We currently use mongodb but want to migrate to elasticsearch, since the queries are faster.
There are some required search fields, like start and end date, and some optional ones, like a search string to match an entry, or a parent search string, to match parent entries. Parent-child relations are just described through fields containing each entry's ancestors ids.
The question is the following: If both search and parent search string are provided, is there a way to know before executing the queries, which query should be executed first, in order to provide results faster and to be more performant?
For example, it could be that a specific parent search results in only 2 docs/parent entries, and then we can fetch all children matching the search string. In that case we should execute the parent query first and then the entry query.
One option would be to get the count of both queries and then execute first the one with the smallest count, but isn't this solution worse, since the queries are going to be executed twice? Once for the count and once for the actual query.
Are there any other options to solve this?
PS. We use elasticsearch v1.7
Example
Let's say the user wants to search for all entries matching the following fields.
searchString: type:BLOCK AND name:test
parentSearchString: name:parentTest AND NOT type:BLOCK
This means that we either have to
fetch all entries (parents) matching the parentSearchString and store their ids. Then, we have to fetch all entries that match the searchString and also have to contain any of the parent ids in the ancestors field.
OR
fetch all entries that match the searchString and store all ancestors ids. Then fetch all entries that match the parentSearchString and their id is one of the ancestors ids.
Just to clarify, both parent and children entries have the exact same structure and reside in the same index. We cannot have different indices since the parent-child relation can be nested up to 10 levels deep, so an entry can be both a parent and a child. An entry looks more or less like:
{
  id: "e32452365321",
  name: "name",
  type: "type",
  ancestors: "id1 id2 id3" // stored in node as an array of ids
}
First of all, I would advise you to upgrade your Elasticsearch version, if possible. A lot has happened since 1.7 and, to be honest, I can't tell whether everything in the article linked below is valid for such an old version (probably it isn't).
But to your actual question: hopefully I understand you correctly, and you are trying to estimate how costly a query is for Elasticsearch? Well, you don't have to. If you provide all 'queries' in one combined query, Elasticsearch will do that for you: https://www.elastic.co/blog/elasticsearch-query-execution-order
Regarding speed, there is one other thing I can mention: calculating the score takes time. So if sorting is not based on the Elasticsearch _score, you want to use boolean filter queries. This also applies if you want to sort only by the _score of parent matches: in that case you could put the query for children into a filter.
update
Thanks to your example, I now see the problem. Self-referential parent-child relations are unfortunately not supported by Elasticsearch, so your approach is probably right. You might want to check out the short chapter of the documentation about application-side joins.
So yes, in general you want to send the second query with the smallest possible number of ids/terms. Getting counts for both queries is not as bad as you might think, because the results are most likely still cached, but does it actually help? If you're going from child to parent, you would have to count the ancestors (field values), not the actual document count.
I would argue that the most expensive operation is very often fetching the result source from disk. So whichever way you go, you should probably fetch only what you need in the first query. So your options are:
Fetch only the id of parent matches, and then use a terms filter on ancestors in the second query.
Or, fetch only the ancestors field of child matches, and use an ids filter in your second query.
Unfortunately, I can't help you more than that, since I don't have enough experience in comparing speed of those approaches. My guess would be, that an id filter might be faster in general. But that's just a guess...
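For illustration, the first option (parents first, then children) could look roughly like the sketch below. It only builds request bodies and parses a response-shaped dictionary; it does not talk to a cluster. The function names are made up, and the second query is written in the `bool`/`filter` form used by later Elasticsearch versions; on 1.x the equivalent would be a `filtered` query wrapping a `terms` filter. It also assumes that ancestors is indexed as an array of untokenized ids.

```python
def ids_only_query(query_string):
    """Step 1: match parent entries, but do not fetch their source from disk."""
    return {"_source": False, "query": {"query_string": {"query": query_string}}}


def extract_ids(response):
    """Collect the document ids from a search response."""
    return [hit["_id"] for hit in response["hits"]["hits"]]


def children_of_query(query_string, parent_ids):
    """Step 2: children matching `query_string` with any of `parent_ids` as ancestor."""
    return {
        "query": {
            "bool": {
                "must": {"query_string": {"query": query_string}},
                # Filter clause: no scoring is done for the ancestor match.
                "filter": {"terms": {"ancestors": parent_ids}},
            }
        }
    }


# Hypothetical response to the parent query, reduced to the fields used here.
parent_response = {"hits": {"hits": [{"_id": "p1"}, {"_id": "p2"}]}}

first = ids_only_query("name:parentTest AND NOT type:BLOCK")
second = children_of_query("type:BLOCK AND name:test", extract_ids(parent_response))
print(first)
print(second)
```

The second option would mirror this: fetch only the ancestors field of the child matches first, then filter the parent query by the collected ids.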
