Use an ingest pipeline to split between two indexes - elasticsearch

I have documents containing the field "status", which can have one of three values: "Draft", "In Progress", or "Approved". I am trying to pass these documents through an ingest pipeline: every document should be indexed into index A irrespective of its status value, and if the status equals "Approved" the document should additionally be added to index B.
For example:
1.
{
  "id": "123",
  "status": "Draft"
}
2.
{
  "id": "1234",
  "status": "InProgress"
}
3.
{
  "id": "12345",
  "status": "Approved"
}
Documents 1, 2, and 3 should all go to index A, and only document 3 should also go to index B.
Is it possible to do this via an ingest pipeline?

In your ingest pipeline, you can change the _index field very easily like this:
{
  "set": {
    "if": "ctx.status == 'Approved'",
    "field": "_index",
    "value": "index-b"
  }
},
{
  "set": {
    "if": "ctx.status != 'Approved'",
    "field": "_index",
    "value": "index-a"
  }
}
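For completeness, those two processors would sit inside a pipeline definition along these lines (the pipeline name route-by-status is only an illustrative assumption):

PUT _ingest/pipeline/route-by-status
{
  "processors": [
    {
      "set": {
        "if": "ctx.status == 'Approved'",
        "field": "_index",
        "value": "index-b"
      }
    },
    {
      "set": {
        "if": "ctx.status != 'Approved'",
        "field": "_index",
        "value": "index-a"
      }
    }
  ]
}

You would then reference the pipeline when indexing (e.g. PUT index-a/_doc/1?pipeline=route-by-status) or set it as the index's index.default_pipeline setting.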
It is worth noting, though, that you cannot send a document to two different indexes from within the same pipeline: it goes to either index-a or index-b, but not both.
However, this can easily be worked around by querying both indexes through an alias that spans both index-a and index-b.
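For example, a minimal sketch of such an alias; the alias name all-statuses is an assumption:

POST _aliases
{
  "actions": [
    { "add": { "index": "index-a", "alias": "all-statuses" } },
    { "add": { "index": "index-b", "alias": "all-statuses" } }
  ]
}

Searches against all-statuses will then transparently cover both indexes.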

How to enrich data in Elasticsearch when the data to be enriched is inside an array

I have used information from the link below to create a pipeline for enriching data using lookups from other indices:
Enriching the Data in Elastic Search
The problem that I am facing is this:
My payload has this structure:
{
  "field1": "value1",
  "field2": "value2",
  "field3": [
    {
      "field3.1.1": "value1",
      "field3.1.2": "value2"
    },
    {
      "field3.2.1": "value1",
      "field3.2.2": "value2"
    }
  ]
}
I created an ingest pipeline for this, and I am able to enrich the data correctly at the parent level, i.e. field1 and field2.
However, since field3 is an array element, enrichment doesn't work straight away, so I applied a foreach processor in the pipeline whose inner processor is an enrich processor:
PUT _ingest/pipeline/test-data-lookup
{
  "processors": [
    {
      "foreach": {
        "field": "field3",
        "processor": {
          "enrich": {
            "policy_name": "field3-policy",
            "field": "_ingest._value.field3.1.1",
            "target_field": "{{{_ingest._value.field3.1.1}}}"
          }
        }
      }
    }
  ]
}
The target field is generated correctly. However, to set field3.1.1 to the lookup value stored in target_field, I have to use a set processor like this:
{
  "set": {
    "if": "_ingest._value.field3.1.1 != null",
    "field": "field1",
    "value": "{{_ingest._value.field3.1.1.codevalue}}"
  }
}
The problem is that the if condition doesn't accept _ingest._value and fails with a compilation error. Because of this, I am not able to compare the value of the target field with the incoming value, so all the elements of the array end up with the same codevalue.
I am new to Elasticsearch and have read almost all of the documentation that I can understand right now. Is what I am trying to do even possible?

ElasticSearch query returns wrong results

I'm relatively new to Elasticsearch and have encountered an issue that I can't figure out.
For this particular field, Elasticsearch seems to be treating all the values as zero, even though the individual records hold non-zero values. This only seems to happen to this numeric field and not to other similar fields (such as cpu pct, mem pct, etc.).
The records only show up when I query for records with 'system.filesystem.used.pct == 0', whereas none of them show up when I do something like 'system.filesystem.used.pct > 0'.
I also ran the query in the Dev Tools console in Kibana like so, yet I don't get any results:
GET metricbeat-*/_search
{
  "query": {
    "range": {
      "system.filesystem.used.pct": {
        "gt": 0
      }
    }
  }
}
However, if I run this instead, I get all the non-zero results, just like in Discover:
GET metricbeat-*/_search
{
  "query": {
    "term": {
      "system.filesystem.used.pct": 0
    }
  }
}
As pointed out by @Ron Serruya, there is a mapping issue. The mapping for system.filesystem.used.pct was detected as an integer type. Since you are getting the expected search results for the cpu.pct field, the mapping for cpu.pct must have been of float type.
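One quick way to confirm the detected type is the field mapping API, for example:

GET metricbeat-*/_mapping/field/system.filesystem.used.pct

If the returned type is long or integer rather than float, every fractional pct value was indexed as 0.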
CASE 1:
If you index the two sample documents in this order:
{
  "count": 0.45
}
{
  "count": 0
}
then the float data type is detected by Elasticsearch (if you are using dynamic mapping). This is because the field type is detected from the first document in which the field appears.
CASE 2:
Now, if you index the data in this order:
{
  "count": 0
}
{
  "count": 0.45
}
Elasticsearch will detect count to be of the long data type.
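You can reproduce this detection with a scratch index; a minimal sketch, assuming a throwaway index named mapping-test:

PUT mapping-test/_doc/1
{
  "count": 0
}

GET mapping-test/_mapping

The returned mapping will show "count" as type long, and fractional values indexed afterwards (such as 0.45) will be indexed as 0, even though _source keeps the original value.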
You need to recreate the index with the new mapping, reindex the data, and then run the search query on system.filesystem.used.pct again.
The modified index mapping will be:
{
  "mappings": {
    "properties": {
      "system": {
        "properties": {
          "filesystem": {
            "properties": {
              "used": {
                "properties": {
                  "pct": {
                    "type": "float"
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}
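A sketch of the reindex step, assuming the old index is metricbeat-old and that metricbeat-fixed has already been created with the corrected mapping above (both names are placeholders):

POST _reindex
{
  "source": {
    "index": "metricbeat-old"
  },
  "dest": {
    "index": "metricbeat-fixed"
  }
}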

How to change the field type in an ElasticSearch Index?

I have index_A, which includes a number field "foo".
I copied the mapping from index_A and made a Dev Tools call PUT /index_B with the field foo changed to text, so the mapping portion of that is:
"foo": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword"
}
}
I then reindex index_A to index_B with:
POST _reindex
{
  "source": {
    "index": "index_A"
  },
  "dest": {
    "index": "index_B"
  }
}
When I view any document in index_B, the entry for the "foo" field is still a number. (I was expecting, for example, "foo": 30 to become "foo": "30" in the new document's source.)
As much as I've read on mappings and reindexing, I'm still at a loss as to how to accomplish this. What specifically do I need to run to get this new index with "foo" as a text field, and all the number entries for foo in the original index changed to text entries in the new index?
There's a distinction between how a field is stored vs. indexed in ES. What you see inside _source is what's stored, i.e. the "original" document that you've ingested. But there's no explicit casting based on the mapping type: ES stores what it receives, then proceeds to index it as defined in the mapping.
In order to verify how a field was indexed, you can inspect the script stack returned by:
GET index_b/_search
{
  "script_fields": {
    "debugging_foo": {
      "script": {
        "source": "Debug.explain(doc['foo'])"
      }
    }
  }
}
as opposed to how a field was stored:
GET index_b/_search
{
  "script_fields": {
    "debugging_foo": {
      "script": {
        "source": "Debug.explain(params._source['foo'])"
      }
    }
  }
}
So in other words, rest assured that foo was indeed indexed as text + keyword.
If you'd like to explicitly cast a field value into a different data type in the _source, you can apply a script along the lines of:
POST _reindex
{
  "source": {
    "index": "index_a"
  },
  "dest": {
    "index": "index_b"
  },
  "script": {
    "source": "ctx._source.foo = '' + ctx._source.foo"
  }
}
I'm not overly familiar with Java, but I think ... = ctx._source.foo.toString() would work too.
FYI there's a coerce mapping parameter which sounds like it could be of use here, but it only works the other way around: casting/parsing from strings to numerical types, etc.
FYI #2: there's an ingest pipeline processor called convert that does exactly what the above script does, and more. (An ingest pipeline is a pre-processor that runs before the fields are indexed in ES.) The good thing about pipelines is that they can be run as part of the _reindex process too.
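A sketch of that convert-based approach; the pipeline name cast-foo-to-string is only illustrative:

PUT _ingest/pipeline/cast-foo-to-string
{
  "processors": [
    {
      "convert": {
        "field": "foo",
        "type": "string"
      }
    }
  ]
}

POST _reindex
{
  "source": {
    "index": "index_a"
  },
  "dest": {
    "index": "index_b",
    "pipeline": "cast-foo-to-string"
  }
}

With this, both the indexed value and the stored _source end up as strings, so no reindex script is needed.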

ElasticSearch: append non-matched docs at the end of the search result

Is there any way to append non-matched docs at the end of the search result?
I have been working on a project where we need to search docs by geolocation data, but some docs don't have geolocation data available. As a result, these docs are not returned in the search results.
Example mapping:
PUT /my_locations
{
  "mappings": {
    "_doc": {
      "properties": {
        "address": {
          "properties": {
            "city": {
              "type": "text"
            },
            "location": {
              "type": "geo_point"
            }
          }
        }
      }
    }
  }
}
Data with geo location:
PUT /my_locations/_doc/1
{
  "address": {
    "city": "XYZ",
    "location": {
      "lat": 40.12,
      "lon": -71.34
    }
  }
}
Data without geo location:
PUT /my_locations/_doc/2
{
  "address": {
    "city": "ABC"
  }
}
Is there any way to perform a geo-distance query that selects the docs with geolocation data and appends the non-geo docs at the end of the result?
https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-geo-distance-query.html#query-dsl-geo-distance-query
You have two separate queries:
1. Get documents within the area.
2. Get the other documents.
Getting both of these in one search would mean all of the documents appear in one result and share a ranking. It would be difficult to create a relevancy model that returns, say, the first 9 documents with a location and then one without.
But you can run two queries at once with the _msearch API: one for, say, the first 9 documents with a location, and one for those without any.
Example:
GET my_locations/_msearch
{}
{"size":9,"query":{"geo_distance":{"distance":"200km","address.location":{"lat":40,"lon":-70}}}}
{}
{"size":1,"query":{"bool":{"must_not":[{"exists":{"field":"address.location"}}]}}}

How can I get unique suggestions without duplicates when I use the completion suggester?

I am using Elasticsearch 5.1.1 in my environment. I have set up a completion suggester on a field named post_hashtags, which holds an array of strings. I am getting the response below for the prefix "inv".
Request:
POST hashtag/_search?pretty&filter_path=suggest.hash-suggest.options.text,suggest.hash-suggest.options._source
{
  "_source": ["post_hashtags"],
  "suggest": {
    "hash-suggest": {
      "prefix": "inv",
      "completion": {
        "field": "post_hashtags"
      }
    }
  }
}
Response:
{
  "suggest": {
    "hash-suggest": [
      {
        "options": [
          {
            "text": "invalid",
            "_source": {
              "post_hashtags": [
                "invalid"
              ]
            }
          },
          {
            "text": "invalid",
            "_source": {
              "post_hashtags": [
                "invalid",
                "coment_me",
                "daya"
              ]
            }
          }
        ]
      }
    ]
  }
}
Here "invalid" is returned twice because it is also a input string for same field "post_hashtags" in other document.
Problems is if same "invalid" input string present in 1000 documents in same index then i would get 1000 duplicated suggestions which is huge and not needed.
Can I apply an aggregation on a field of type completion ?
Is there any way I can get unique suggestion instead of duplicated text field, even though if i have same input string given to a particular field in multiple documents of same index ?
Elasticsearch 6.1 introduced the skip_duplicates option. Example usage:
{
  "suggest": {
    "autocomplete": {
      "prefix": "MySearchTerm",
      "completion": {
        "field": "name",
        "skip_duplicates": true
      }
    }
  }
}
Edit: this answer only applies to Elasticsearch 5.
No, you cannot de-duplicate suggestion results. The completion suggester is document-oriented in Elasticsearch 5 and will thus return suggestions for all documents that match.
In Elasticsearch 1 and 2, the completion suggester automatically de-duplicated suggestions. There is an open GitHub ticket to bring back this functionality, and it looks like it may be possible in a future version.
For now, you have two options:
1. Use Elasticsearch version 1 or 2.
2. Use a different suggestion implementation not based on the completion suggester. The only semi-official suggestion I have seen so far involves putting your suggestion strings in a separate index; one such workaround is sketched below.
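For the second option, a common workaround is to emulate prefix suggestions with a terms aggregation, which de-duplicates by design. A minimal sketch, assuming post_hashtags is also mapped with a keyword sub-field (an assumption; a field of type completion alone cannot be aggregated on):

GET hashtag/_search
{
  "size": 0,
  "aggs": {
    "unique_suggestions": {
      "terms": {
        "field": "post_hashtags.keyword",
        "include": "inv.*",
        "size": 10
      }
    }
  }
}

Each bucket key is then a unique hashtag starting with "inv", no matter how many documents contain it.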
