I have an index with a mapping of three fields, say f1, f2 and f3.
I want a new keyword field containing the concatenation of the values of f1, f2 and f3, so that I can aggregate on it and avoid lots of nested loops when checking the search results.
I've seen that this could be achieved with source transformation, but that feature was removed in Elasticsearch 5.
Elasticsearch version used: 6.5
Q: How can I achieve the concatenation in Elasticsearch 6.5?
There was indeed source transformation prior to ES 5, but as of ES 5 there is a more powerful feature called ingest nodes, which lets you easily achieve what you need.
First, define an ingest pipeline with a set processor that concatenates the three fields into one:
PUT _ingest/pipeline/concat
{
  "processors": [
    {
      "set": {
        "field": "field4",
        "value": "{{field1}} {{field2}} {{field3}}"
      }
    }
  ]
}
You can then index a document using that pipeline:
PUT index/doc/1?pipeline=concat
{
  "field1": "1",
  "field2": "2",
  "field3": "3"
}
And the indexed document will look like:
{
  "field1": "1",
  "field2": "2",
  "field3": "3",
  "field4": "1 2 3"
}
Just make sure to create the index with the appropriate mapping for field4 prior to indexing the first document.
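As a minimal sketch of that last step, the two request bodies could look like the following, written here as Python dictionaries so they can be inspected or serialized. The index name index and the type doc come from the example above. Mapping field4 as keyword is an assumption based on the question's requirement, and the aggregation name by_field4 is a hypothetical label.

```python
import json

# Body for "PUT index" (Elasticsearch 6.x still uses a mapping type, here "doc").
# Assumption: field4 is mapped as keyword so each concatenated value is one term.
create_index_body = {
    "mappings": {
        "doc": {
            "properties": {
                "field4": {"type": "keyword"}
            }
        }
    }
}

# Body for "GET index/_search": a terms aggregation on the concatenated field.
# "by_field4" is a hypothetical aggregation name.
search_body = {
    "size": 0,
    "aggs": {
        "by_field4": {
            "terms": {"field": "field4"}
        }
    }
}

print(json.dumps(create_index_body, indent=2))
print(json.dumps(search_body, indent=2))
```

The index has to be created with this mapping before the first document goes through the pipeline; otherwise field4 would be mapped dynamically.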
Related
I'm pretty new to Elasticsearch and all its concepts. I would like to understand how I could accomplish what I have in my relational DB in an Elasticsearch architecture.
The scenario is the following.
I have an index "data":
{
  "id": "00001",
  "content": "some text here ..",
  "type": "T1",
  "categories": ["A", "A1", "B"]
}
The requirement says that data can be queried by:
some text search in the content field
that belongs to a specific type or category
So far, so simple, so good.
The data will not be complete at creation time. Categories might be added to or removed from the data later, so many data uploads/re-indexes might happen along the way.
For example:
Create the data:
{
  "id": "00001",
  "content": "some text here ..",
  "type": "T1",
  "categories": ["A"]
}
Then it was decided that all data with type=T1 must belong to both A & B categories.
{
  "id": "00001",
  "content": "some text here ..",
  "type": "T1",
  "categories": ["A", "B"]
}
If I have a billion hits for type=T1, I would have to update/re-index a billion entries. Maybe that is how things should work, and this is where my question comes in.
Is it OK to re-index all the data just to add or remove a category, or would it be possible to have a second, much smaller index just for this association and somehow join both indexes at query time?
Something like this:
Data:
{
  "id": "00001",
  "content": "some text here ..",
  "type": "T1"
}
DataCategories:
{
  "type": "T1",
  "categories": ["A", "B"]
}
Is it acceptable/possible?
This is a common scenario, but unfortunately there is no 1:1 mapping of RDBMS features in text search engines like Lucene/Elasticsearch.
Possible options:
1 - For the best performance, reindex. This may not be practical, depending on how frequently your data changes.
2 - Consider parent-child. It is a slower option, but it will often meet performance requirements. The category could be a parent document, each having several thousand children.
3 - If it is only category renaming, consider using IDs for the categories and translating them to text in the application.
4 - Update the documents in place, depending on how many need updating: for a few thousand, run an update query; for more, reindex.
Suggested reading - https://www.elastic.co/blog/managing-relations-inside-elasticsearch
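To illustrate option 4, here is a minimal sketch of the body such an update query could use, built as a Python dictionary. It assumes the Update By Query API (POST data/_update_by_query) with a Painless script on Elasticsearch 6.x, where the script text is passed under source. It also assumes that type is indexed as an exact value so that a term query matches T1. The helper name add_category_body is hypothetical.

```python
import json

def add_category_body(doc_type, category):
    """Build a body for POST data/_update_by_query (hypothetical helper).

    The script adds `category` to each matching document's categories
    list unless it is already present.
    """
    return {
        # Assumption: "type" is an exact-value (keyword) field.
        "query": {"term": {"type": doc_type}},
        "script": {
            "lang": "painless",
            "source": (
                "if (!ctx._source.categories.contains(params.category)) {"
                " ctx._source.categories.add(params.category) }"
            ),
            "params": {"category": category},
        },
    }

body = add_category_body("T1", "B")
print(json.dumps(body, indent=2))
```

Passing the category through params rather than concatenating it into the script keeps the script text constant, so it can be compiled once and reused.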
I'm looking for a way to search across a tokenized field in Elasticsearch so that, instead of returning the documents that match my search, it returns a unique set of the values that matched best.
{
  "id": 1,
  "brand": [
    "word1",
    "another"
  ]
},
{
  "id": 2,
  "brand": [
    "word2",
    "word3",
    "yet_another"
  ]
}
So when searching for wo, I would receive a list of the words word1, word2 and word3, scored, of course.
Should I create a new index for that with these values?
Is there a way I can do that work by reusing the tokenization of my index?
If I run a search query, I want to be able to select just the distinct fields from the hits/sources. To be clear, I don't want to limit the fields that are returned, I want to select the available fields as a list. Is this possible?
For example, if I ran a typical search and there are three results each with different fields:
R1
"Field1": "foo"
"Field2": "bar"
R2
"Field2": "bara"
R3
"Field1": "fooa"
"Field3": "baz"
The result would be ["Field1", "Field2", "Field3"]
Given the following three User entries in an Elasticsearch index:
"user": [
  {
    "userId": "100",
    "hobby": "chess"
  }
]
"user": [
  {
    "userId": "200",
    "hobby": "music"
  }
]
"user": [
  {
    "userId": "300",
    "hobby": ""
  }
]
I want to create a vertical bar chart to compare the number of users who have a hobby as opposed to those who do not. Individual hobbies should not be shown separately, but grouped together.
If split along the Y axis, one block would take up two thirds of the height (the two users with hobbies) and one block one third of the height (the one user with no hobbies).
How could one achieve this grouping in Kibana?
Thanks
You'll need to choose Split Bars and then Filters aggregation. Once you have that selected you should see Query 1 with * in it. Change the * to hobby:*. Next hit Add Filter and put in NOT hobby:*
The filters aggregation lets you bucket things pretty much any way you can search for things.
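For reference, the same grouping can be expressed directly against Elasticsearch. The sketch below builds a filters aggregation body as a Python dictionary. It assumes Elasticsearch 2.0 or later, where each filter is written as a query. The aggregation name hobby_groups and the bucket names has_hobby and no_hobby are hypothetical labels.

```python
import json

# Two named buckets that mirror the two Kibana filter queries above.
filters_agg_body = {
    "size": 0,  # only the bucket counts are needed, not the hits
    "aggs": {
        "hobby_groups": {  # hypothetical aggregation name
            "filters": {
                "filters": {
                    "has_hobby": {"query_string": {"query": "hobby:*"}},
                    "no_hobby": {"query_string": {"query": "NOT hobby:*"}},
                }
            }
        }
    },
}

print(json.dumps(filters_agg_body, indent=2))
```

Whether an empty string such as "" matches hobby:* depends on how the field is mapped and analyzed, so the bucket counts are worth checking against real data.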
Using ES 1.2.1
My aggregation
{
  "size": 0,
  "aggs": {
    "cities": {
      "terms": {
        "field": "city",
        "size": 300000
      }
    }
  }
}
The issue is that some city names have spaces in them and aggregate separately.
For instance Los Angeles
{
  "key": "Los",
  "doc_count": 2230
},
{
  "key": "Angeles",
  "doc_count": 2230
},
I assume it has to do with the analyzer? Which one would I use to not split on spaces?
For fields that you want to perform aggregations on, I would recommend either using the keyword analyzer or not analyzing the field at all. From the keyword analyzer documentation:
An analyzer of type keyword that "tokenizes" an entire stream as a single token. This is useful for data like zip codes, ids and so on. Note, when using mapping definitions, it might make more sense to simply mark the field as not_analyzed.
However, if you still want to analyze the field so that it can be used in other searches, consider using the fields setting of ES 1.x, as described in the field/multi_field documentation. This allows you to have one version of the field for searching and one for aggregations.
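A minimal sketch of such a multi-field mapping for ES 1.x, written as Python dictionaries, might look like this. The type name location and the sub-field name raw are hypothetical. The aggregation is the one from the question, pointed at the unanalyzed sub-field.

```python
import json

# Mapping body for ES 1.x: "city" stays analyzed for search, while
# "city.raw" keeps the whole value as a single term for aggregations.
mapping_body = {
    "mappings": {
        "location": {  # hypothetical type name
            "properties": {
                "city": {
                    "type": "string",
                    "fields": {
                        "raw": {"type": "string", "index": "not_analyzed"}
                    },
                }
            }
        }
    }
}

# The original aggregation, now run against the unanalyzed sub-field.
agg_body = {
    "size": 0,
    "aggs": {
        "cities": {
            "terms": {"field": "city.raw", "size": 300000}
        }
    },
}

print(json.dumps(mapping_body, indent=2))
print(json.dumps(agg_body, indent=2))
```

Existing documents would need to be reindexed after the mapping change before city.raw contains any values.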
There are two approaches to solving this:
The not_analyzed way - but this treats values that differ only in letter case as different terms.
The keyword tokenizer way - here we can map terms that differ only in case to a single term.
These two concepts with working code examples are illustrated in this blog.
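As a rough illustration of the difference, the sketch below pairs a settings body for a custom analyzer (keyword tokenizer plus lowercase filter) with a small pure-Python simulation of how each approach would bucket city names. The analyzer name lowercase_keyword, the helper functions, and the sample values are all hypothetical. The simulation only mimics the term output and does not call Elasticsearch.

```python
from collections import Counter

# Settings body: custom analyzer that keeps the whole value as one token
# and lowercases it. "lowercase_keyword" is a hypothetical name.
settings_body = {
    "settings": {
        "analysis": {
            "analyzer": {
                "lowercase_keyword": {
                    "type": "custom",
                    "tokenizer": "keyword",
                    "filter": ["lowercase"],
                }
            }
        }
    }
}

def not_analyzed(value):
    """Simulate not_analyzed: the value is indexed exactly as given."""
    return [value]

def keyword_lowercase(value):
    """Simulate keyword tokenizer + lowercase filter: one lowercased token."""
    return [value.lower()]

def bucket(values, analyze):
    """Count documents per indexed term, like a terms aggregation would."""
    counts = Counter()
    for value in values:
        counts.update(analyze(value))
    return dict(counts)

cities = ["Los Angeles", "los angeles", "Boston"]  # made-up sample data
exact_buckets = bucket(cities, not_analyzed)
folded_buckets = bucket(cities, keyword_lowercase)
print(exact_buckets)
print(folded_buckets)
```

With not_analyzed, "Los Angeles" and "los angeles" land in separate buckets. With the lowercasing analyzer they share one bucket, at the cost of the bucket keys being lowercased.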