Bulk action to remove values in Elasticsearch

I have these documents:
curl -XPOST -H "Content-Type: application/x-ndjson" "http://localhost:9200/_bulk?pretty" -d'
{"index":{"_index":"test","_type":"product"}}
{"id_product":"1", "categories":[1,2], "enums":[10,11,20,21] }
{"index":{"_index":"test","_type":"product"}}
{"id_product":"2", "categories":[1,2], "enums":[10,15,25,26] }
{"index":{"_index":"test","_type":"product"}}
{"id_product":"3", "categories":[2,3], "enums":[11,12,13,21,22,23,24] }
'
I need to remove the numbers 10 and 11 from the enums field in all documents with a bulk action.
So the output after this update will be:
{"id_product":"1", "categories":[1,2], "enums":[20,21] }
{"id_product":"2", "categories":[1,2], "enums":[15,25,26] }
{"id_product":"3", "categories":[2,3], "enums":[12,13,21,22,23,24] }
Is it possible to do it?

Yes, it is possible.
curl -XPOST -H "Content-Type: application/x-ndjson" "http://localhost:9200/_bulk?pretty" -d'
{ "update" : {"_id" : "1", "_type" : "product", "_index" : "test"} }
{ "doc" : {"enums" : [20,21]} }
{ "update" : {"_id" : "2", "_type" : "product", "_index" : "test"} }
{ "doc" : {"enums" : [15,25,26]} }
{ "update" : {"_id" : "3", "_type" : "product", "_index" : "test"} }
{ "doc" : {"enums" : [12,13,21,22,23,24]} }
'
Using this you can update all the documents in a single request; for further details, read up on the Bulk API.
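Rather than computing each document's new array by hand, a scripted `_update_by_query` (available since Elasticsearch 5.x) can strip the values from every matching document in one request. This is an untested sketch using ES 6.x syntax, assuming a cluster on localhost:9200:

```shell
curl -XPOST -H "Content-Type: application/json" "http://localhost:9200/test/_update_by_query" -d'
{
  "script": {
    "source": "ctx._source.enums.removeAll([10, 11])",
    "lang": "painless"
  },
  "query": {
    "terms": { "enums": [10, 11] }
  }
}'
```

The `terms` query restricts the update to documents that actually contain 10 or 11, so untouched documents are not reindexed.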

Elasticsearch Bulk API using curl and text file

I'm a beginner with Elasticsearch and am following an "Essential Training" course on LinkedIn Learning. I'm trying to follow along with the bulk loading API. The instructor is on Linux and created a text file of data with vi; I'm on Windows, so I just created a text file, pasted the data in, and removed the ".txt" extension. The contents of the file, called reqs, are:
{
"index":{
"_index":"my-test",
"_type":"my-type",
"_id":"1"
}
}{
"col1":"val1"
}{
"index":{
"_index":"my-test",
"_type":"my-type",
"_id":"2"
}
}{
"col1":"val2"
}{
"index":{
"_index":"my-test",
"_type":"my-type",
"_id":"3"
}
}{
"col1":"val3"
}
I've tried saving it with a carriage return (new line) after the last line and without. I saved this into my elasticsearch folder (C:\elasticsearch-7.12.0) which is the same directory I'm running the following command from:
c:\elasticsearch-7.12.0>curl -s -H "Content-Type: application/x-ndjson" -XPOST localhost:9200/_bulk --data-binary "@reqs"; echo
When I do this, I'm getting the following error:
{"error":{"root_cause":[{"type":"parse_exception","reason":"request body is required"}],"type":"parse_exception","reason":"request body is required"},"status":400}
Use the curl command below:
curl -H 'Content-Type: application/x-ndjson' -XPOST 'localhost:9200/index-name/_bulk?pretty' --data-binary @reqs.json
reqs.json should look like this:
{"index" : {"_index" : "my-test", "_type" : "my-type", "_id" : "1"}}
{"col1" : "val1"}
{"index" : {"_index" : "my-test", "_type" : "my-type", "_id" : "2"}}
{"col1" : "val2"}
{"index" : {"_index" : "my-test", "_type" : "my-type", "_id" : "3"}}
{"col1" : "val3"}
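Note that the asker is on Elasticsearch 7.12, where mapping types are deprecated and index actions with `_type` produce deprecation warnings. A 7.x-friendly version of the file simply drops `_type` (a sketch, not from the original answer):

```json
{"index" : {"_index" : "my-test", "_id" : "1"}}
{"col1" : "val1"}
{"index" : {"_index" : "my-test", "_id" : "2"}}
{"col1" : "val2"}
{"index" : {"_index" : "my-test", "_id" : "3"}}
{"col1" : "val3"}
```

As with any bulk payload, every action and source line must be on its own line, and the file must end with a newline.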

Elasticsearch completion : strange behavior when multiple matches per document

When I use the completion type inside a suggest, as described in the Elasticsearch documentation (https://www.elastic.co/guide/en/elasticsearch/reference/6.7/search-suggesters-completion.html), I don't manage to get all the matching words: I only get one matching word per document.
I tested the following commands on Elasticsearch 6.7.2 (the latest available on AWS at the time):
Deleting the index in case it exists
curl http://localhost:9200/test -H 'Content-Type: application/json' -X DELETE
Creating the index
curl http://localhost:9200/test -H 'Content-Type: application/json' -X PUT -d '
{
"mappings": {
"page": {
"properties": {
"completion_terms": {
"type": "completion"
}
}
}
}
}
'
Indexing a document
curl http://localhost:9200/test/_doc/1 -H 'Content-Type: application/json' -X PUT -d '
{
"completion_terms": ["restaurant", "restauration", "réseau"]
}'
Checking that the document exists
curl http://localhost:9200/test/_doc/1 -H 'Content-Type: application/json'
Using the completion suggester
curl -X GET "localhost:9200/test/_search?pretty=true" -H 'Content-Type: application/json' -d'
{
"_source": ["suggestExact"],
"suggest": {
"suggestExact" : {
"prefix" : "res",
"completion" : {
"field" : "completion_terms"
}
}
}
}
'
The result is :
{
"took" : 6,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 0,
"max_score" : 0.0,
"hits" : [ ]
},
"suggest" : {
"suggestExact" : [
{
"text" : "res",
"offset" : 0,
"length" : 3,
"options" : [
{
"text" : "restaurant",
"_index" : "test",
"_type" : "page",
"_id" : "1",
"_score" : 1.0,
"_source" : { }
}
]
}
]
}
}
I'd like to get ALL the matching words (here, I get at most one result per document); in the example, "restauration" and "réseau" are missing.
Am I doing something wrong?
After many searches, I found that this is the intended behavior: the completion suggester "suggests documents", not "terms".
In particular, see https://github.com/elastic/elasticsearch/issues/31738
However, I still do not manage to "suggest terms", even with the term suggester, which seems to be the intended tool for it (https://www.elastic.co/guide/en/elasticsearch/reference/6.7/search-suggesters-term.html)
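For reference, the term suggester's request shape is sketched below. Note that it suggests corrections for misspelled whole words based on edit distance against indexed terms, rather than completing prefixes, which may be why it cannot reproduce "suggest terms by prefix". The `terms_text` field here is hypothetical; the sketch assumes the same words are also indexed as a regular analyzed text field (the term suggester cannot run against a `completion`-typed field):

```shell
curl -X GET "localhost:9200/test/_search?pretty=true" -H 'Content-Type: application/json' -d'
{
  "suggest": {
    "suggestTerms": {
      "text": "restorant",
      "term": {
        "field": "terms_text"
      }
    }
  }
}'
```

With such a mapping, a misspelling like "restorant" could be corrected to "restaurant", but a prefix like "res" would not expand to "restaurant" and "restauration".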

Return document on update elasticsearch

Let's say I'm updating user data:
curl -XPOST 'localhost:9200/test/type1/1/_update' -d '{
"doc" : {
"name" : "new_name"
},
"fields": ["_source"]
}'
Here's an example of what I get back when I perform an update:
{
"_index" : "test",
"_type" : "type1",
"_id" : "1",
"_version" : 4
}
How do I perform an update that returns the given document post update?
The documentation is a little misleading with regard to returning fields when performing an Elasticsearch update. It actually uses the same approach as the Index API: the parameter is passed on the URL, not as a field in the update body.
In your case you would submit:
curl -XPOST 'localhost:9200/test/type1/1/_update?fields=_source' -d '{
"doc" : {
"name" : "new_name"
}
}'
In my testing in Elasticsearch 1.2.1 it returns something like this:
{
"_index":"test",
"_type":"testtype",
"_id":"1","_version":9,
"get": {
"found":true,
"_source": {
"user":"john",
"body":"testing update and return fields",
"name":"new_name"
}
}
}
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/docs-index_.html
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/docs-update.html
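For readers on more recent versions: the `fields` URL parameter on updates was later replaced by `_source`. An untested sketch of the equivalent request on Elasticsearch 5.x and later (the returned document still appears under the `get._source` key):

```shell
curl -XPOST 'localhost:9200/test/type1/1/_update?_source=true' -H 'Content-Type: application/json' -d'
{
  "doc" : {
    "name" : "new_name"
  }
}'
```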

Elasticsearch TermsFacet giving wrong count

I have a problem with the Elasticsearch terms facet.
I indexed data as follows:
curl -X DELETE "http://localhost:9200/articles"
curl -X POST "http://localhost:9200/articles/article" -d '{"title" : "One", "tags" : "foo","datetime":"2005-12-23 23:10:52"}'
curl -X POST "http://localhost:9200/articles/article" -d '{"title" : "Two", "tags" : "bar","datetime":"2005-12-23 23:10:53"}'
curl -X POST "http://localhost:9200/articles/article" -d '{"title" : "Three", "tags" : "baz","datetime":"2005-12-23 23:10:54"}'
curl -X POST "http://localhost:9200/articles/article" -d '{"title" : "four", "tags" : "baz","datetime":"2005-12-23 23:10:55"}'
curl -X POST "http://localhost:9200/articles/article" -d '{"title" : "five", "tags" : "foo","datetime":"2005-12-23 23:10:56"}'
Whenever I query the terms facet it gives the correct result. Here is my Elasticsearch query:
curl 'http://localhost:9200/articles/article/_search?pretty=true' -d '{
"query": {
"match_all": {}
},
"facets" : { "myfacet" : { "terms" : {"field" : "tags"}}
}
}'
But when I add a filter to the facet, it doesn't show any facet counts. Here is the query:
curl 'http://localhost:9200/articles/article/_search?pretty=true' -d '{
"query": {
"match_all": {}
},
"facets" : {
"myfacet" : { "terms" : {"field" : "tags"},
"filter" : { "range" :{
"datetime" : {"from" : "2005-12-23 3:10:52","to" : "2005-12-23 23:10:56" }
}
}
}
}
}'
I get this result:
"facets" : {
"myfacet" : {
"_type" : "filter",
"count" : 0
}
}
Does anyone know why it gives such a count?
The dates are in an invalid format; have a look at the date/time formats that Elasticsearch supports (tl;dr: any date supported by Joda-Time is supported by Elasticsearch).
http://www.elasticsearch.org/guide/reference/mapping/date-format.html
With that being said, you just have to modify your dates in your insert statements and put them in a valid date format, like 2005-12-23T23:10:55Z. Then just change your query to the proper time range in that time format, and that should give you the result.
Also be careful when writing these queries, as I noticed the date you used in your from clause is not valid.
Here are the modified curl scripts:
curl -X POST "http://localhost:9200/articles/article" -d '{"title" : "One", "tags" : "foo","datetime":"2005-12-23T23:10:52Z"}'
curl -X POST "http://localhost:9200/articles/article" -d '{"title" : "Two", "tags" : "bar","datetime":"2005-12-23T23:10:53Z"}'
curl -X POST "http://localhost:9200/articles/article" -d '{"title" : "Three", "tags" : "baz","datetime":"2005-12-23T23:10:54Z"}'
curl -X POST "http://localhost:9200/articles/article" -d '{"title" : "four", "tags" : "baz","datetime":"2005-12-23T23:10:55Z"}'
curl -X POST "http://localhost:9200/articles/article" -d '{"title" : "five", "tags" : "foo","datetime":"2005-12-23T23:10:56Z"}'
and the modified search:
curl 'http://localhost:9200/articles/article/_search?pretty=true' -d '{
"query": {
"match_all": {}
},
"facets" : {
"myfacet" : {
"terms" : {"field" : "tags"},
"filter" : { "range" :{
"datetime" : {
"from" : "2005-12-23T23:10:52Z",
"to" : "2005-12-23T23:10:54Z"
}
}
}
}
}
}'
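For readers on newer versions: facets were removed in Elasticsearch 2.0 in favor of aggregations. The equivalent of the filtered facet above is a `filter` aggregation wrapping a `terms` sub-aggregation; an untested sketch against the same data:

```shell
curl 'http://localhost:9200/articles/_search?pretty=true' -H 'Content-Type: application/json' -d'
{
  "size": 0,
  "aggs": {
    "in_range": {
      "filter": {
        "range": {
          "datetime": { "gte": "2005-12-23T23:10:52Z", "lte": "2005-12-23T23:10:54Z" }
        }
      },
      "aggs": {
        "myfacet": {
          "terms": { "field": "tags" }
        }
      }
    }
  }
}'
```

`"size": 0` suppresses the hits themselves, mirroring the facet-only behavior of the original query.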
Hope this helps,
Matt

Sort on the basis of number of documents in ElasticSearch

I am saving user relations in an ES index, e.g.:
{'id' => 1, 'User_id_1' => '2001', 'relation' => 'friend', 'User_id_2' => '1002'}
{'id' => 2, 'User_id_1' => '2002', 'relation' => 'friend', 'User_id_2' => '1002'}
{'id' => 3, 'User_id_1' => '2002', 'relation' => 'friend', 'User_id_2' => '1001'}
{'id' => 4, 'User_id_1' => '2003', 'relation' => 'friend', 'User_id_2' => '1003'}
Now suppose I want to get the user_id_2 who has the most friends; in the above case it's 1002, as 2001 and 2002 are its friends (count = 2).
I just can't figure out the query.
Thanks.
EDIT:
As suggested by @imotov, a terms facet is a very good choice, but the problem is that I have 2 indexes: the 1st index stores the main docs and the 2nd stores the relations.
Now the problem is: suppose I have 100 user docs in my main index and only 50 of them have made relations, so I'll have only 50 user docs in my relationships index.
So when I implement the terms facet, it sorts the results and gives the correct output I want, but I am missing those remaining 50 users who don't have any relations yet; I need them in my final output after the 50 sorted users.
First of all, we need to ensure that relationships saved in ES are unique. It can be done by replacing arbitrary ids with ids constructed from user_id_1, relation and user_id_2. We also need to make sure that analyzer for user_ids doesn't produce multiple tokens. If ids are strings, they have to be indexed not_analyzed. With these two conditions satisfied, we can simply use terms facet query for the field user_id_2 on the result list limited by relation:friend. This query will retrieve top user_id_2 ids sorted by number of occurrences in the index. All together it could look something like this:
curl -XPUT http://localhost:9200/relationships -d '{
"mappings" : {
"relation" : {
"_source" : {"enabled" : false },
"properties" : {
"user_id_1": { "type": "string", "index" : "not_analyzed"},
"relation": { "type": "string", "index" : "not_analyzed"},
"user_id_2": { "type": "string", "index" : "not_analyzed"}
}
}
}
}'
curl -XPUT http://localhost:9200/relationships/relation/2001-friend-1002 -d '{"user_id_1": "2001", "relation":"friend", "user_id_2": "1002"}'
curl -XPUT http://localhost:9200/relationships/relation/2002-friend-1002 -d '{"user_id_1": "2002", "relation":"friend", "user_id_2": "1002"}'
curl -XPUT http://localhost:9200/relationships/relation/2002-friend-1001 -d '{"user_id_1": "2002", "relation":"friend", "user_id_2": "1001"}'
curl -XPUT http://localhost:9200/relationships/relation/2003-friend-1003 -d '{"user_id_1": "2003", "relation":"friend", "user_id_2": "1003"}'
curl -XPOST http://localhost:9200/relationships/_refresh
echo
curl -XGET 'http://localhost:9200/relationships/relation/_search?pretty=true&search_type=count' -d '{
"query": {
"term" : {
"relation" : "friend"
}
},
"facets" : {
"popular" : {
"terms" : {
"field" : "user_id_2"
}
}
}
}'
Please note that due to the distributed nature of facet calculation, the counts reported by the facet query might be lower than the actual number of records if multiple shards are used. See Elasticsearch issue 1832.
EDIT:
There are two solutions for the edited question. One solution is to use facet on two fields:
curl -XPUT http://localhost:9200/relationships -d '{
"mappings" : {
"relation" : {
"_source" : {"enabled" : false },
"properties" : {
"user_id_1": { "type": "string", "index" : "not_analyzed"},
"relation": { "type": "string", "index" : "not_analyzed"},
"user_id_2": { "type": "string", "index" : "not_analyzed"}
}
}
}
}'
curl -XPUT http://localhost:9200/users -d '{
"mappings" : {
"user" : {
"_source" : {"enabled" : false },
"properties" : {
"user_id": { "type": "string", "index" : "not_analyzed"}
}
}
}
}'
curl -XPUT http://localhost:9200/users/user/1001 -d '{"user_id": 1001}'
curl -XPUT http://localhost:9200/users/user/1002 -d '{"user_id": 1002}'
curl -XPUT http://localhost:9200/users/user/1003 -d '{"user_id": 1003}'
curl -XPUT http://localhost:9200/users/user/1004 -d '{"user_id": 1004}'
curl -XPUT http://localhost:9200/users/user/1005 -d '{"user_id": 1005}'
curl -XPUT http://localhost:9200/relationships/relation/2001-friend-1002 -d '{"user_id_1": "2001", "relation":"friend", "user_id_2": "1002"}'
curl -XPUT http://localhost:9200/relationships/relation/2002-friend-1002 -d '{"user_id_1": "2002", "relation":"friend", "user_id_2": "1002"}'
curl -XPUT http://localhost:9200/relationships/relation/2002-friend-1001 -d '{"user_id_1": "2002", "relation":"friend", "user_id_2": "1001"}'
curl -XPUT http://localhost:9200/relationships/relation/2003-friend-1003 -d '{"user_id_1": "2003", "relation":"friend", "user_id_2": "1003"}'
curl -XPOST http://localhost:9200/relationships/_refresh
curl -XPOST http://localhost:9200/users/_refresh
echo
curl -XGET 'http://localhost:9200/relationships,users/_search?pretty=true&search_type=count' -d '{
"query": {
"indices" : {
"indices" : ["relationships"],
"query" : {
"filtered" : {
"query" : {
"term" : {
"relation" : "friend"
}
},
"filter" : {
"type" : {
"value" : "relation"
}
}
}
},
"no_match_query" : {
"filtered" : {
"query" : {
"match_all" : { }
},
"filter" : {
"type" : {
"value" : "user"
}
}
}
}
}
},
"facets" : {
"popular" : {
"terms" : {
"fields" : ["user_id", "user_id_2"]
}
}
}
}'
Another solution is to add a "self" relation to the relationships index for every user when the user is created. I would prefer this second solution since it seems less complicated.
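A sketch of that second approach (untested): index a "self" relation whenever a user is created, then run the facet query over both relation types. A user with N friends then gets a count of N+1, and a user with no friends still appears with a count of 1, so the relative ordering by friend count is preserved:

```shell
curl -XPUT http://localhost:9200/relationships/relation/1004-self-1004 -d '{"user_id_1": "1004", "relation":"self", "user_id_2": "1004"}'
curl -XPOST http://localhost:9200/relationships/_refresh
curl -XGET 'http://localhost:9200/relationships/relation/_search?pretty=true&search_type=count' -d '{
  "query": {
    "terms" : { "relation" : ["friend", "self"] }
  },
  "facets" : {
    "popular" : {
      "terms" : { "field" : "user_id_2" }
    }
  }
}'
```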
