How to prepend a string to a field with _update_by_query in Elasticsearch

I have this document:
{
  "_index" : "name_index",
  "_type" : "_doc",
  "_id" : "45db3736bcccb55f28b9162b20d0c3",
  "_score" : 1.0,
  "_source" : {
    "path" : {
      "virtual" : "/2014/01/01/filename.pdf"
    }
  }
}
How can I prepend a string to path.virtual so that it becomes:
"virtual" : "Uploads/2014/01/01/filename.pdf"

If you want to update all documents of your index (or a subset thereof), you can do it with _update_by_query coupled with an ingest pipeline. First, define your ingest pipeline:
PUT _ingest/pipeline/modify-path
{
  "processors": [
    {
      "set": {
        "field": "path.virtual",
        "value": "Uploads{{{path.virtual}}}"
      }
    }
  ]
}
And then run it over your index, like this:
POST name_index/_update_by_query?pipeline=modify-path
{
  "query": {
    "match_all": {}
  }
}
If you want to do it just for that one document, you can use a normal partial update instead (note that in ES 7 the update endpoint is POST <index>/_update/<id>):
POST name_index/_update/45db3736bcccb55f28b9162b20d0c3
{
  "doc": {
    "path": {
      "virtual": "Uploads/2014/01/01/filename.pdf"
    }
  }
}
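To see what the set processor will do before touching real data, its {{{...}}} Mustache substitution can be mimicked locally. This is only an illustrative Python sketch (render_template and lookup are my own hypothetical helpers, not Elasticsearch APIs):

```python
import re

def lookup(doc, dotted_path):
    """Walk a dotted path like 'path.virtual' through nested dicts."""
    value = doc
    for key in dotted_path.split("."):
        value = value[key]
    return value

def render_template(template, doc):
    """Replace every {{{field}}} placeholder with that field's value,
    mimicking the triple-mustache (unescaped) substitution the set
    processor performs."""
    return re.sub(r"\{\{\{([^}]+)\}\}\}",
                  lambda m: str(lookup(doc, m.group(1))),
                  template)

doc = {"path": {"virtual": "/2014/01/01/filename.pdf"}}
doc["path"]["virtual"] = render_template("Uploads{{{path.virtual}}}", doc)
print(doc["path"]["virtual"])  # Uploads/2014/01/01/filename.pdf
```

You can also verify the real pipeline with the _simulate endpoint before running _update_by_query over the whole index.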

Related

How to extract matching groups when searching regex in elasticsearch

I am using Elasticsearch to index some data (a text article), mapped as:
"article": {
  "type": "text",
  "fields": {
    "keyword": {
      "type": "keyword"
    }
  }
}
Then I use regexp queries to search for matches. The result returns the correct documents, but is there a way to also return the matching groups/text that triggered the regex hit?
You can use the highlighting functionality of Elasticsearch.
Let's say the below is your sample document:
{
  "article": "Elasticsearch Documentation"
}
Query:
{
  "query": {
    "regexp": {
      "article": "el.*ch"
    }
  },
  "highlight": {
    "fields": {
      "article": {}
    }
  }
}
Response:
{
  "_index" : "index1",
  "_type" : "_doc",
  "_id" : "cHzAH4IBgPd6xUeLm9QF",
  "_score" : 1.0,
  "_source" : {
    "article" : "Elasticsearch Documentation"
  },
  "highlight" : {
    "article" : [
      "<em>Elasticsearch</em> Documentation"
    ]
  }
}
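Note that highlighting returns the highlighted fragment, not regex capture groups. If you need the actual matched text or groups, one workaround (my suggestion, not an Elasticsearch feature) is to re-apply the pattern client-side to the returned _source, e.g. with Python's re module:

```python
import re

# Re-apply the same pattern to the returned source text. The regexp query
# matched the analyzed (lowercased) token, so match case-insensitively here.
source = {"article": "Elasticsearch Documentation"}
match = re.search(r"el.*ch", source["article"], re.IGNORECASE)
print(match.group(0))  # Elasticsearch
```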

How to Order Completion Suggester with Fuzziness

When using a completion suggester with fuzziness defined, the ordering of results for suggestions is alphabetical instead of most relevant. It seems that whatever fuzziness is set to is trimmed from the end of the search/query term. This is not what I expected from reading the Completion Suggester Fuzziness docs, which state:
Suggestions that share the longest prefix to the query prefix will be scored higher.
But that is not true. Here is a use case that proves this:
PUT test
{
  "mappings": {
    "properties": {
      "id": {
        "type": "integer"
      },
      "title": {
        "type": "keyword",
        "fields": {
          "suggest": {
            "type": "completion"
          }
        }
      }
    }
  }
}
POST test/_bulk
{ "index" : {"_id": "1"}}
{ "title": "HOLARAT" }
{ "index" : {"_id": "2"}}
{ "title": "HOLBROOK" }
{ "index" : {"_id": "3"}}
{ "title": "HOLCONNEN" }
{ "index" : {"_id": "4"}}
{ "title": "HOLDEN" }
{ "index" : {"_id": "5"}}
{ "title": "HOLLAND" }
The above creates an index and adds some data.
If a suggestion query is done on said data:
POST test/_search
{
  "_source": {
    "includes": [
      "title"
    ]
  },
  "suggest": {
    "title-suggestion": {
      "completion": {
        "fuzzy": {
          "fuzziness": "1"
        },
        "field": "title.suggest",
        "size": 3
      },
      "prefix": "HOLL"
    }
  }
}
It returns the first 3 results in alphabetical order, instead of ranking the longest prefix match (which would be HOLLAND) first:
{
  ...
  "suggest" : {
    "title-suggestion" : [
      {
        "text" : "HOLL",
        "offset" : 0,
        "length" : 4,
        "options" : [
          {
            "text" : "HOLARAT",
            "_index" : "test",
            "_type" : "_doc",
            "_id" : "1",
            "_score" : 3.0,
            "_source" : {
              "title" : "HOLARAT"
            }
          },
          {
            "text" : "HOLBROOK",
            "_index" : "test",
            "_type" : "_doc",
            "_id" : "2",
            "_score" : 3.0,
            "_source" : {
              "title" : "HOLBROOK"
            }
          },
          {
            "text" : "HOLCONNEN",
            "_index" : "test",
            "_type" : "_doc",
            "_id" : "3",
            "_score" : 3.0,
            "_source" : {
              "title" : "HOLCONNEN"
            }
          }
        ]
      }
    ]
  }
}
If the size param is removed, we can see that the score is the same for all entries, instead of the longest prefix being scored higher as stated.
With this being the case, how can results from completion suggesters with fuzziness defined be ordered with the longest prefix at the top?
This has been reported in the past, and this behavior is actually by design.
What I usually do in this case is send two suggest queries (similar to what has been suggested here): one for the exact match and another for the fuzzy match. If the exact match contains a suggestion, I use it; otherwise I resort to the fuzzy ones.
With the suggest query below, you'll get HOLLAND in exact-suggestion and the fuzzy matches in fuzzy-suggestion:
POST test/_search
{
  "_source": {
    "includes": [
      "title"
    ]
  },
  "suggest": {
    "fuzzy-suggestion": {
      "completion": {
        "fuzzy": {
          "fuzziness": "1"
        },
        "field": "title.suggest",
        "size": 3
      },
      "prefix": "HOLL"
    },
    "exact-suggestion": {
      "completion": {
        "field": "title.suggest",
        "size": 3
      },
      "prefix": "HOLL"
    }
  }
}
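The "use exact if present, else fall back to fuzzy" selection described above happens client-side. A minimal Python sketch of it (pick_suggestions is my own hypothetical helper; the response dict mirrors the suggest section of a search response):

```python
def pick_suggestions(suggest_response):
    """Prefer the exact-match suggestions; fall back to the fuzzy ones.

    `suggest_response` mirrors the "suggest" section of a search response
    containing the two named suggestions from the query above.
    """
    exact = suggest_response["exact-suggestion"][0]["options"]
    if exact:
        return [opt["text"] for opt in exact]
    fuzzy = suggest_response["fuzzy-suggestion"][0]["options"]
    return [opt["text"] for opt in fuzzy]

# Truncated response shape: an exact match exists, so fuzzy is ignored.
response = {
    "exact-suggestion": [{"options": [{"text": "HOLLAND"}]}],
    "fuzzy-suggestion": [{"options": [{"text": "HOLARAT"},
                                      {"text": "HOLBROOK"}]}],
}
print(pick_suggestions(response))  # ['HOLLAND']
```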

Elasticsearch Aggregation on array of single objects

My query search result has the following structure:
[
  {
    "_index" : "xxxx",
    "_type" : "status",
    "_id" : "01xxxxxxxxxxx",
    "_score" : 6.297049,
    "_source" : {
      "messageDetail" : {
        "errors" : [
          {
            "errorMessage" : ".metaData should have required property 'schemaVersion'"
          }
        ]
      }
    }
  },
  {
    "_index" : "xxxx",
    "_type" : "status",
    "_id" : "076XXXXxxx",
    "_score" : 6.297049,
    "_source" : {
      "messageDetail" : {
        "errors" : [
          {
            "errorMessage" : ".metaData should have required property 'scenarioName'"
          }
        ]
      }
    }
  },
  ...
]
I would like to aggregate over messageDetail.errors.errorMessage and build a map-like structure that holds the different error messages and their number of occurrences as key-value pairs.
P.S. messageDetail.errors is an array containing a single object.
Can someone please provide a query for this?
Adding a working example with index data (same as that given in the question), index mapping, search query, and search result.
Index Mapping:
{
  "mappings": {
    "properties": {
      "messageDetail": {
        "properties": {
          "errors": {
            "properties": {
              "errorMessage": {
                "type": "keyword"
              }
            }
          }
        }
      }
    }
  }
}
Search Query:
{
  "size": 0,
  "aggs": {
    "states": {
      "terms": {
        "field": "messageDetail.errors.errorMessage"
      }
    }
  }
}
Search Result:
"aggregations": {
  "states": {
    "doc_count_error_upper_bound": 0,
    "sum_other_doc_count": 0,
    "buckets": [
      {
        "key": ".metaData should have required property 'scenarioName'",
        "doc_count": 1
      },
      {
        "key": ".metaData should have required property 'schemaVersion'",
        "doc_count": 1
      }
    ]
  }
}
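The terms buckets in the response can then be folded client-side into the map-like structure the question asks for. A minimal Python sketch (variable names are mine):

```python
# Turn the terms-aggregation buckets into a {errorMessage: count} map.
aggregations = {
    "states": {
        "buckets": [
            {"key": ".metaData should have required property 'scenarioName'",
             "doc_count": 1},
            {"key": ".metaData should have required property 'schemaVersion'",
             "doc_count": 1},
        ]
    }
}
error_counts = {bucket["key"]: bucket["doc_count"]
                for bucket in aggregations["states"]["buckets"]}
print(len(error_counts))  # 2
```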

How to automatically add a @timestamp value to documents in Elasticsearch 7?

I have a problem setting up Elasticsearch 7.
My goal is to have a @timestamp field value added automatically whenever a new doc is created in ES.
I found some answers to similar questions, but they can't be the solution here because they target a different version.
I tried the _default_ object in the mappings object, but it seems it is no longer supported in ES 7:
"_default_": {
  "_timestamp": {
    "enabled": true,
    "store": true
  }
}
And this is the case where I want the @timestamp value to be added:
PUT /locations
{
  "mappings": {
    "properties": {
      "location": {
        "type": "geo_point"
      },
      "id": {
        "type": "text"
      }
    }
  }
}
PUT /locations/_doc/1
{
  "location" : "31.387593,121.123446",
  "id" : "xxxxxxxxxxxxxxxxxxxxxx"
}
Expected result:
{
  "@timestamp" : "2019-10-23 10:23:50",
  "location" : "31.387593,121.123446",
  "id" : "xxxxxxxxxxxxxxxxxxxxxx"
}
You can create an ingest pipeline
PUT _ingest/pipeline/timestamp
{
  "description": "Adds timestamp to documents",
  "processors": [
    {
      "set": {
        "field": "_source.timestamp",
        "value": "{{_ingest.timestamp}}"
      }
    }
  ]
}
And call it while inserting documents
POST index39/_doc?pipeline=timestamp
{
  "id": 1
}
Response:
{
  "_index" : "index39",
  "_type" : "_doc",
  "_id" : "KWF6920BpmJq35glEsr3",
  "_score" : 1.0,
  "_source" : {
    "id" : 1,
    "timestamp" : "2019-10-23T07:17:15.639200400Z"
  }
}
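What the pipeline does amounts to stamping each incoming document with the ingest time. A minimal Python sketch of that behavior (add_timestamp is my own hypothetical helper; {{_ingest.timestamp}} itself produces a UTC ISO-8601 value like the one in the response above):

```python
from datetime import datetime, timezone

def add_timestamp(doc):
    """Mimic the ingest pipeline: set a 'timestamp' field at index time."""
    doc = dict(doc)  # don't mutate the caller's dict
    doc["timestamp"] = datetime.now(timezone.utc).isoformat()
    return doc

indexed = add_timestamp({"id": 1})
print(sorted(indexed))  # ['id', 'timestamp']
```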

How can I find all documents in elasticsearch that contain a number in a certain field?

I have a keyword-typed field that can contain either a number or a string. If the field does not contain any letters, I would like to hit on that document. How can I do this?
My index mapping looks like:
{
  "mappings": {
    "Entry": {
      "properties": {
        "testField": {
          "type": "keyword"
        }
      }
    }
  }
}
My documents look like this:
{
  "testField": "123abc"
}
or
{
  "testField": "456789"
}
I've tried the query:
{
  "query": {
    "range": {
      "testField": {
        "gte": 0,
        "lte": 2000000
      }
    }
  }
}
but it still hits on 123abc. How can I design this so that I only hit on the documents with a number in that particular field?
There is another, more optimal option for achieving exactly what you want. You can leverage ingest pipelines and, using a script processor, create another numeric field at indexing time that you can then use more efficiently at search time.
The ingest pipeline below contains a single script processor which creates another field called numField that will only contain the numeric characters of testField.
POST _ingest/pipeline/_simulate
{
  "pipeline": {
    "processors": [
      {
        "script": {
          "source": """
            ctx.numField = /\D/.matcher(ctx.testField).replaceAll("");
          """
        }
      }
    ]
  },
  "docs": [
    {
      "_source": {
        "testField": "123"
      }
    },
    {
      "_source": {
        "testField": "abc123"
      }
    },
    {
      "_source": {
        "testField": "123abc"
      }
    },
    {
      "_source": {
        "testField": "abc"
      }
    }
  ]
}
Simulating this pipeline with four different documents containing a mix of alphanumeric content yields this:
{
  "docs" : [
    {
      "doc" : {
        "_index" : "_index",
        "_type" : "_type",
        "_id" : "_id",
        "_source" : {
          "numField" : "123",
          "testField" : "123"
        },
        "_ingest" : {
          "timestamp" : "2019-05-09T04:14:51.448Z"
        }
      }
    },
    {
      "doc" : {
        "_index" : "_index",
        "_type" : "_type",
        "_id" : "_id",
        "_source" : {
          "numField" : "123",
          "testField" : "abc123"
        },
        "_ingest" : {
          "timestamp" : "2019-05-09T04:14:51.448Z"
        }
      }
    },
    {
      "doc" : {
        "_index" : "_index",
        "_type" : "_type",
        "_id" : "_id",
        "_source" : {
          "numField" : "123",
          "testField" : "123abc"
        },
        "_ingest" : {
          "timestamp" : "2019-05-09T04:14:51.448Z"
        }
      }
    },
    {
      "doc" : {
        "_index" : "_index",
        "_type" : "_type",
        "_id" : "_id",
        "_source" : {
          "numField" : "",
          "testField" : "abc"
        },
        "_ingest" : {
          "timestamp" : "2019-05-09T04:14:51.448Z"
        }
      }
    }
  ]
}
After indexing your documents through this pipeline, you can run your range query on numField instead of testField. Compared to the other solution (sorry @Kamal), it shifts the scripting burden so that it runs only once per document at indexing time, instead of on every document at search time.
{
  "query": {
    "range": {
      "numField": {
        "gte": 0,
        "lte": 2000000
      }
    }
  }
}
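The Painless one-liner in the script processor simply strips every non-digit character. Its effect can be previewed client-side with an equivalent Python sketch (strip_non_digits is my own name for it):

```python
import re

def strip_non_digits(value):
    r"""Equivalent of the Painless /\D/.matcher(value).replaceAll("")."""
    return re.sub(r"\D", "", value)

for test_field in ["123", "abc123", "123abc", "abc"]:
    print(test_field, "->", repr(strip_non_digits(test_field)))
```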
AFAIK, Elasticsearch does not have a direct solution for this.
Instead, you would need to write a script query. Below is what you are looking for:
POST <your_index_name>/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "script": {
            "script": {
              "lang": "painless",
              "source": """
                try {
                  Integer.parseInt(doc['testField'].value);
                  return true;
                } catch (NumberFormatException e) {
                  return false;
                }
              """
            }
          }
        }
      ]
    }
  }
}
Hope it helps!
