I have to upsert records in bulk into an Elasticsearch index, with _id being a combination of more than one field from the message. Can I do that? If so, please give me a sample JSON for it.
Regards
The _id field I am looking for is something like the one below:
{
"_index": "kpi_aggr",
"_type": "KPIBackChannel",
"_id": "<<<combination of name , period_type>>>",
"_score": 1,
"_source": {
"name": "kpi-v1",
"period_type": "w",
"country": "AL",
"pg_name": "DENTAL CARE",
"panel_type": "retail",
"number_of_records_with_proposal": 10000,
"number_of_proposals": 80000,
"overall_number_of_records": 2000,
"#timestamp": 1442162810
}
}
Naturally, you can specify your own Elasticsearch document ids during a call to the Index API:
PUT kpi_aggr/KPIBackChannel/kpi-v1,w
{
"name": "kpi-v1",
"period_type": "w",
"country": "AL",
"pg_name": "DENTAL CARE",
"panel_type": "retail",
"number_of_records_with_proposal": 10000,
"number_of_proposals": 80000,
"overall_number_of_records": 2000,
"#timestamp": 1442162810
}
You can also do so during a _bulk API call:
POST _bulk
{ "index" : { "_index" : "kpi_aggr", "_type" : "KPIBackChannel", "_id" : "kpi-v1,w" } }
{"name":"kpi-v1","period_type":"w","country":"AL","pg_name":"DENTAL CARE","panel_type":"retail","number_of_records_with_proposal":10000,"number_of_proposals":80000,"overall_number_of_records":2000,"#timestamp":1442162810}
Note that if a document with the same _id already exists, Elasticsearch replaces it with the new version.
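If the bulk body is generated from code, the composite _id can be derived from the message fields. The following is only a minimal sketch: the helper names composite_id and build_bulk_body, the comma separator and the trimmed sample document are assumptions made for illustration, not part of any Elasticsearch client.

```python
import json


def composite_id(doc, id_fields, sep=","):
    """Join the chosen field values into one _id string, e.g. 'kpi-v1,w'."""
    return sep.join(str(doc[field]) for field in id_fields)


def build_bulk_body(docs, index, id_fields, doc_type=None):
    """Return an NDJSON _bulk body with one 'index' action per document."""
    lines = []
    for doc in docs:
        meta = {"_index": index, "_id": composite_id(doc, id_fields)}
        if doc_type is not None:  # mapping types only exist on older clusters
            meta["_type"] = doc_type
        lines.append(json.dumps({"index": meta}))  # action line
        lines.append(json.dumps(doc))              # source line
    # The _bulk API expects the body to end with a newline.
    return "\n".join(lines) + "\n"


# Hypothetical, shortened sample message.
docs = [{"name": "kpi-v1", "period_type": "w", "country": "AL"}]
body = build_bulk_body(docs, "kpi_aggr", ["name", "period_type"],
                       doc_type="KPIBackChannel")
print(body)
```

The resulting string can be sent as the body of POST _bulk. Pick a separator that cannot occur inside the field values, otherwise two different field combinations could produce the same _id.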
If you execute these two queries on an empty index, then querying by document id:
GET kpi_aggr/KPIBackChannel/kpi-v1,w
will give you the following:
{
"_index": "kpi_aggr",
"_type": "KPIBackChannel",
"_id": "kpi-v1,w",
"_version": 2,
"found": true,
"_source": {
"name": "kpi-v1",
"period_type": "w",
"country": "AL",
"pg_name": "DENTAL CARE",
"panel_type": "retail",
"number_of_records_with_proposal": 10000,
"number_of_proposals": 80000,
"overall_number_of_records": 2000,
"#timestamp": 1442162810
}
}
Notice "_version": 2, which in our case indicates that the document has been indexed twice, i.e. the second request performed an "upsert" (in general, though, _version is meant to be used for Optimistic Concurrency Control).
Hope that helps!
Related
My events in Elasticsearch look something like that (simplified version):
{
"_index": "greatest_index-2023.01",
"_type": "_doc",
"_id": "5BQ8yIUBtpR1CBn8kFyo",
"_version": 1,
"_score": 0,
"_source": {
"#version": "1",
"#timestamp": "2023-01-18T09:35:50.251Z",
"id": "4e80c00dd8e003c8",
"action": "action1"
},
"fields": {
"#timestamp": [
"2023-01-18T09:35:50.251Z"
]
}
}
Basically, the "id" field is common to multiple events. Each id goes through a few "action" field values through time (action1, action2, action3) - only once for each action value.
I'm trying to create a visualization in Kibana that would display the actions each id went through.
If it were a table, it could look something like this:
id | actions
5BQ8yIUBtpR1CBn8kFyo | action1, action2
pISQ9VDSJVlkqklv9VQ9 | action1
cohqBHSQC85AHB67AB2h | action1, action2, action3
I tried to use Transforms in the Elasticsearch section of Kibana (v 7.5.0), but it doesn't seem to be the right way.
How would you recommend doing that?
This is the payload:
{
"videourl": "*****",
"name": "ABCqq",
"description": "AAAnb",
"tags": "#AAAzx",
"uploadedtime": "2020-02-24T05:48:37.527Z",
"uploadedby": "Dr AAAgh",
"thumbnail": "http://",
"duration": "5:32",
"postedby": "AAAdf",
"doctorimage": "AAA12",
"doctorname": "nnn"
}
The result has the form:
{"_index": "rwe",
"_type": "_doc",
"_id": "8wEed3ABcYN_H8khP4hB",
"_score": 1,
"_source": {
"videourl": "*****",
"name": "ABCqq",
"description": "AAAnb",
"tags": "#AAAzx",
"uploadedtime": "2020-02-24T05:48:37.527Z",
"uploadedby": "Dr AAAgh",
"thumbnail": "http://",
"duration": "5:32",
"postedby": "AAAdf",
"doctorimage": "AAA12",
"doctorname": "nnn"
}
}
This is a document in which I want to increment a counter field every time the doc gets updated.
We have to add a new field named counter_value.
Expected result:
{"_index": "rwe",
"_type": "_doc",
"_id": "8wEed3ABcYN_H8khP4hB",
"_score": 1,
"_source": {
"videourl": "*****",
"name": "ABCqq",
"description": "AAAnb",
"tags": "#AAAzx",
"uploadedtime": "2020-02-24T05:48:37.527Z",
"uploadedby": "Dr AAAgh",
"thumbnail": "http://",
"duration": "5:32",
"postedby": "AAAdf",
"doctorimage": "AAA12",
"doctorname": "nnn",
"counter_value": 1
}
}
You can just increment the counter via scripting, see here and here. However, Elasticsearch already has a version field. Depending on your use case, it might be enough to add the version parameter to your query, as described here:
curl -XGET 'http://localhost:9200/rwe/_search?version=true'
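For the scripting route, a scripted update could look roughly like the sketch below. It only builds the request body. The helper name build_increment_update is made up for illustration, and the endpoint mentioned in the comment (POST rwe/_update/<id>) is the 7.x form of the Update API; older versions use a different path, so treat it as an assumption about your cluster.

```python
import json


def build_increment_update(step=1):
    """Build an Update API body that adds `step` to counter_value.

    The Painless script creates the field on the first update, because
    the existing documents do not have counter_value yet.
    """
    script = (
        "if (ctx._source.counter_value == null) "
        "{ ctx._source.counter_value = params.step } "
        "else { ctx._source.counter_value += params.step }"
    )
    return {
        "script": {
            "lang": "painless",
            "source": script,
            "params": {"step": step},
        }
    }


# Would be sent as the body of e.g. POST rwe/_update/8wEed3ABcYN_H8khP4hB
# (7.x-style path; an assumption about the cluster version).
update_body = build_increment_update()
print(json.dumps(update_body, indent=2))
```

Passing the step through params rather than hard-coding it in the script source lets Elasticsearch reuse the compiled script across requests.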
I search for the keyword machine4 in my ES index. My Python client call is simply:
result = es.search('machine4', index='machines')
The result looks like this:
[
{
"_score": 0.13424811,
"_type": "person",
"_id": "2",
"_source": {
"date": "20180601",
"deleted": [],
"changed": [
"machine1",
"machine2",
"machine3"
],
"added": [
"machine4",
"machine5"
]
},
"_index": "contacts"
},
{
"_score": 0.13424811,
"_type": "person",
"_id": "3",
"_source": {
"date": "20180701",
"deleted": [
"machine2"
],
"changed": [
"machine1",
"machine4",
"machine3"
],
"added": [
"machine7"
]
},
"_index": "contacts"
}
]
So we can easily see:
On 20180601, machine4 belonged to added.
On 20180701, machine4 belonged to changed.
I can write another function to analyze the result. Basically, loop through every key/value pair of each item and check whether the searched keyword is in it, like this:
for result in search_results['hits']['hits']:
    source_result = result['_source']
    for key, value in source_result.items():
        # value is either a list of machine names or the date string
        if 'machine4' in value:
            print(key)
However, I wonder whether ES has an API to detect which key/mapping/field the searched keyword belongs to. In this case it is added in the 1st result and changed in the 2nd result.
Thank you so much
Alex
The simple answer seems to be that no, Elasticsearch doesn't have a way to do this out of the box, because Lucene doesn't have it, as per this thread.
Elasticsearch has the concept of highlights, however. These could be useful, but they do require you to have some idea about which fields the match may be in.
The ES Python search documentation suggests there's no way to do that as a simple parameter to search, but you could create a custom query with a highlight section and pass it on as the body argument (q only accepts a query string, not a dict). It would look something like:
body = {"query": {"multi_match": {"query": "machine4", "fields": ["added", "changed", "deleted"]}}, "highlight": {"fields": {"added": {}, "changed": {}, "deleted": {}}}}
result = es.search(index='machines', body=body)
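When highlighting is requested, each hit that matched in a highlighted field carries a highlight section keyed by field name, so the matching fields can be read from the response instead of looping over _source. A minimal sketch follows; the function name matched_fields is made up, and the sample response is hand-written to mirror the question's data, not real cluster output.

```python
def matched_fields(response):
    """Map each hit's _id to the sorted field names found in its highlight section."""
    return {
        hit["_id"]: sorted(hit.get("highlight", {}))
        for hit in response["hits"]["hits"]
    }


# Hand-written sample shaped like a search response with highlighting enabled.
sample_response = {
    "hits": {
        "hits": [
            {"_id": "2", "highlight": {"added": ["<em>machine4</em>"]}},
            {"_id": "3", "highlight": {"changed": ["<em>machine4</em>"]}},
        ]
    }
}

print(matched_fields(sample_response))
```

Hits without a highlight section simply map to an empty list, which makes it easy to spot matches that came from a field outside the highlighted set.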
Hope this is helpful!
I've been using Elasticsearch for about one month and I've found one thing about fuzzy queries that I can't understand.
The scenario: I have a set of users in one type and index, almost 10,000 items, and I want to search by username and return all the items that match the search string in a fuzzy mode. For example, my user is "masterviana"; if I search with only the text "mastervi", I expect to see masterviana at the top of the results when using a fuzzy query, right?
"fuzzy" : {
"public_name" : {
"value" : "mastervi",
"boost" : 1.0,
"fuzziness" : 2,
"prefix_length" : 0,
"max_expansions": 100
}
}
However, I'm not seeing my username (masterviana) on the first page, and I also see usernames that are "less similar" to my query string. I'll show only the first 5 hits so as not to make the post too long:
{
"_index": "username",
"_type": "username",
"_id": "2061|FZ4y1t042482S3EqobiVllmv00",
"_score": 9.198499,
"_source": {
"public_name": "masterv",
"bbid": "FZ4y1t042482S3EqobiVllmv00",
"hash": 2061,
"avata": "http://goo.gl/4CRt3v"
}
},
{
"_index": "username",
"_type": "username",
"_id": "2048|r0I5XZ31076phruMS1gu9Hjv00",
"_score": 5.9688096,
"_source": {
"public_name": "project--master",
"bbid": "r0I5XZ31076phruMS1gu9Hjv00",
"hash": 2048,
"avata": "http://goo.gl/4CRt3vr"
}
},
{
"_index": "username",
"_type": "username",
"_id": "1980|W5Wal166832UV5oCqUH9Vjcv00",
"_score": 5.7984095,
"_source": {
"public_name": "masterjv",
"bbid": "W5Wal166832UV5oCqUH9Vjcv00",
"hash": 1980,
"avata": "http://goo.gl/4CRt3v"
}
},
{
"_index": "username",
"_type": "username",
"_id": "2108|Kufhm899338GPWHsuoei1HOv00",
"_score": 5.7984095,
"_source": {
"public_name": "master25",
"bbid": "Kufhm899338GPWHsuoei1HOv00",
"hash": 2108,
"avata": "http://goo.gl/4CRt3v"
}
},
{
"_index": "username",
"_type": "username",
"_id": "1952|AtPw2a97575sC5JT406msOXv00",
"_score": 5.7984095,
"_source": {
"public_name": "masterpiz",
"bbid": "AtPw2a97575sC5JT406msOXv00",
"hash": 1952,
"avata": "http://goo.gl/4CRt3v"
}
},
As you can see, at the top I'm getting 1. masterv and 2. project-master. I think my query "mastervi" is closer to "masterviana" than to, for example, "masterv" or "project-master".
One more thing: if I search with exactly the same text, "masterviana", I get only this item.
The ranking is a blend of edit distance and (often unhelpfully) how rare a term is.
I'm not sure which of these is to blame in this case, but the term scarcity ranking is a long-standing Lucene issue. There is a workaround in Elasticsearch with FuzzyLikeThisQuery, but that might not be around for much longer, so this has accelerated the need to fix Lucene (see https://github.com/elastic/elasticsearch/pull/10391 for background).
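To see how much of the ranking could come from edit distance alone, it may help to compute the distances by hand. The fuzzy query matches indexed terms within the configured fuzziness, which is 2 in the question and is also the maximum Elasticsearch allows. The sketch below uses a plain Levenshtein distance; Elasticsearch counts a transposition as a single edit by default, which makes no difference for these particular strings. Treating "master" as a separate term assumes the field is analyzed so that "project--master" is split into "project" and "master".

```python
def edit_distance(a, b):
    """Plain Levenshtein distance (insert, delete, substitute), row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete from a
                            curr[j - 1] + 1,      # insert into a
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


query = "mastervi"
terms = ["masterv", "master", "masterjv", "master25", "masterpiz", "masterviana"]
for term in terms:
    print(term, edit_distance(query, term))
```

Under these assumptions "masterviana" is three edits away from "mastervi", so a fuzzy query with fuzziness 2 cannot match it at all, while every returned hit is within two edits. A prefix-style query may be closer to what the question is after.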
I am posting a query to http://localhost:9200/movie_db/movie/_search but the _source attribute is always empty in the returned response. I enabled it in the mapping, but that doesn't help.
Movie DB:
TRY DELETE /movie_db
PUT /movie_db {"mappings": {"movie": {"properties": {"title": {"type": "string", "analyzer": "snowball"}, "actors": {"type": "string", "position_offset_gap" : 100, "analyzer": "standard"}, "genre": {"type": "string", "index": "not_analyzed"}, "release_year": {"type": "integer", "index": "not_analyzed"}, "description": {"_source": true, "type": "string", "analyzer": "snowball"}}}}}
BULK INDEX movie_db/movie
{"_id": 1, "title": "Hackers", "release_year": 1995, "genre": ["Action", "Crime", "Drama"], "actors": ["Johnny Lee Miller", "Angelina Jolie"], "description": "High-school age computer expert Zero Cool and his hacker friends take on an evil corporation's computer virus with their hacking skills."}
{"_id": 2, "title": "Johnny Mnemonic", "release": 1995, "genre": ["Science Fiction", "Action"], "actors": ["Keanu Reeves", "Dolph Lundgren"], "description": "A guy with a chip in his head shouts incomprehensibly about room service in this dystopian vision of our future."}
{"_id": 3, "title": "Swordfish", "release_year": 2001, "genre": ["Action", "Crime"], "actors": ["John Travolta", "Hugh Jackman", "Halle Berry"], "description": "A cast of characters challenge society's commonly held view that computer experts are not the beautiful people. Somehow, the CIA is hacked in under 5 minutes."}
{"_id": 4, "title": "Tomb Raider", "release_year": 2001, "genre": ["Adventure", "Action", "Fantasy"], "actors": ["Angelina Jolie", "Jon Voigt"], "description": "The story of a girl and her quest for antiquities in the face of adversity. This epic is adapter from its traditional video-game format to the big screen"}
Query:
{
"query" :
{
"term" : { "genre" : "Crime" }
}
}
Results:
{
"took": 4,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 2,
"max_score": 0.30685282,
"hits": [
{
"_index": "movie_db",
"_type": "movie",
"_id": "3",
"_score": 0.30685282,
"_source": {}
},
{
"_index": "movie_db",
"_type": "movie",
"_id": "1",
"_score": 0.30685282,
"_source": {}
}
]
}
}
I had the same problem: despite enabling _source in my query as well as in my mappings, _source would always be {}.
Your proposed solution of setting cluster.name in elasticsearch.yml gave me the hint that the problem must be some hidden setting in the old cluster.
I found out that I had an index template definition that came with a plugin I installed (in my case elasticsearch-transport-couchbase), which said
"_source" : {
"includes" : [ "meta.*" ]
},
thereby implicitly excluding all fields other than meta.* from _source.
Check your templates like this:
curl -XGET localhost:9200/_template/?pretty
I deleted the couchbase template like so:
curl -XDELETE localhost:9200/_template/couchbase
and created a new, almost identical one but with source enabled.
Here is how:
https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-templates.html
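If there are many templates, the output of the _template call can also be scanned programmatically for anything that restricts _source. This is only a sketch: find_source_filters is a made-up helper, the sample dictionary is hand-written, and the type name some_type in it is hypothetical. It assumes the response has been parsed into a dict of template name to template definition.

```python
def find_source_filters(templates):
    """Yield (template_name, _source settings) for templates that disable
    _source or restrict it with includes/excludes."""
    def walk(node):
        # Recursively look for "_source" objects anywhere under the mappings.
        if isinstance(node, dict):
            for key, value in node.items():
                if key == "_source" and isinstance(value, dict):
                    yield value
                else:
                    yield from walk(value)

    for name, template in templates.items():
        for source_cfg in walk(template.get("mappings", {})):
            restricted = (
                source_cfg.get("enabled") is False
                or source_cfg.get("includes")
                or source_cfg.get("excludes")
            )
            if restricted:
                yield name, source_cfg


# Hand-written sample; "some_type" is a hypothetical mapping type name.
sample_templates = {
    "couchbase": {
        "template": "*",
        "mappings": {"some_type": {"_source": {"includes": ["meta.*"]}}},
    },
    "plain": {"template": "logs-*", "mappings": {}},
}

for name, cfg in find_source_filters(sample_templates):
    print(name, cfg)
```

Any template reported this way is a candidate for the kind of hidden setting described above.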
Solution:
In the Elasticsearch config folder, open elasticsearch.yml and set cluster.name to a different value, then restart elasticsearch.bat.
I once accidentally passed a single field in the _source array, and that field didn't exist, for example "_source": ["bazinga"], and in the aggregations result _source was empty.
So maybe you could simply pass a totally unrelated string in the _source array. This can be a better solution than making changes to the elasticsearch.yml file.