Using field instead of "_id" for more-like-this query - elasticsearch

I have a slug field that I want to use to identify object to use as a reference instead of "_id" field. But instead of using it as a reference, doc seems to use it as query to comapre against. Since slug is a unique field with a simple analyzer, it just returns exactly one result like the following. As far as I know, there is no way to use a custom field as _id field:
https://github.com/elastic/elasticsearch/issues/6730
So is double look up, finding out elasticsearch's id first then doing more_like_this the only way to achieve what I am looking for? Someone seems to have asked a similar question three years ago, but it doesn't have an answer.
ArticleDocument.search().query("bool",
should=Q("more_like_this",
fields= ["slug", "text"],
like={"doc": {"slug": "OEXxySDEPWaUfgTT54QvBg",
}, "_index":"article", "_type":"doc"},
min_doc_freq=1,
min_term_freq=1
)
).to_queryset()
Returns:
<ArticleQuerySet [<Article: OEXxySDEPWaUfgTT54QvBg)>]>

You can make some of your documents field as "default" _id while ingesting data.
Logstash
output {
elasticsearch {
hosts => ["localhost:9200"]
index => "my_name"
document_id => "%{some_field_id}"
}
}
Spark (Scala)
DF.saveToEs("index_name" + "/some_type", Map("es.mapping.id" -> "some_field_id"))
Index API
PUT twitter/_doc/1
{
"user" : "kimchy",
"post_date" : "2009-11-15T14:12:12",
"message" : "trying out Elasticsearch"
}
{
"_shards" : {
"total" : 2,
"failed" : 0,
"successful" : 2
},
"_index" : "twitter",
"_type" : "_doc",
"_id" : "1",
"_version" : 1,
"_seq_no" : 0,
"_primary_term" : 1,
"result" : "created"
}

Related

Get from ElasticSearch why a result is a hit

In the ElasticSearch below I search for the word Balances in two fields name and notes:
GET /_search
{ "query": {
"multi_match": { "query": "Balances",
"fields": ["name","notes"]
}
}
}
And the result in the name field:
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 1.673515,
"hits" : [
{
"_index" : "idx",
"_type" : "_doc",
"_id" : "25",
"_score" : 1.673515,
"_source" : {
"name" : "Deposits checking accounts balances",
"notes" : "These are the notes",
"#timestamp" : "2019-04-18T21:05:00.387Z",
"id" : 25,
"#version" : "1"
}
}
]
}
Now, I want to know in which field ElasticSearch found the value. I could evaluate the result and see if the searched text is in name or notes, but I cannot do that if it's a fuzzy search.
Can ElasticSearch tell me in which field the text was found, and in addition provide a snippet with 5 words to the left and to the right of the result to tell the user why the result is a hit?
What I want to achieve is similar to Google highlighting in bold the text that was found within a phrase.
I think the 2 solutions in Find out which fields matched in a multi match query are still the valid solutions:
Highlight to find it.
Split the query up into multiple named match queries.

Querying against an array only works for first element

I am having an issue searching for items within an array in a document; a simple tagging system in my case. I have a relatively simple document representing a recipe. This is a truncated version of the data in the index:
{
"_index" : "recipes",
"_type" : "recipe",
"_id" : "37",
"_version" : 1,
"found" : true,
"_source" : {
"id" : 37,
"title" : "Crab Cakes",
"tags" : [
"seafood",
"appetizer"
]
}
}
When I search for the tag seafood it matches this recipe. However, when I search for the tag appetizer, I get nothing. Here is the explain for a very basic appetizer query:
curl -XGET 'http://localhost:9200/recipes/recipe/37/_explain?pretty' -H 'Content-Type: application/json' -d'{"query":{"term":{"tags":"appetizer"}}}'
Which results in this:
{
"_index" : "recipes",
"_type" : "recipe",
"_id" : "37",
"matched" : false,
"explanation" : {
"value" : 0.0,
"description" : "no matching term",
"details" : [ ]
}
}
Correct answer came in comments from sramalingam24. It was to change the query to a match instead of a term.
[updated]
I also tested switching the tags to be a keyword field and that works as well. This is the solution I ended up going with.

Elasticsearch - How to update document

How does elasticsearch update document? It will delete original document and make new one? I've heard this is how nosql's updating method. does elasticsearch do, same as any other nosql db? or It will replace/insert just field which need to be?
For example, I'm running with Elasticsearh 7.0.0.
First, I created one document,
PUT /employee/_doc/1
{
"first_name" : "John",
"last_name" : "Snow",
"age" : 19,
"about" : "King in the north",
"sex" : "male"
}
Then I updated it via
POST /employee/_update/1/
{
"doc": {
"first_name" : "Aegon",
"last_name" : "Targaryen",
"skill": "fighting and leading"
}
}
Finally, I got correct result when
GET /employee/_doc/1
{
"_index" : "employee",
"_type" : "_doc",
"_id" : "1",
"_version" : 9,
"_seq_no" : 11,
"_primary_term" : 1,
"found" : true,
"_source" : {
"first_name" : "Aegon",
"last_name" : "Targaryen",
"age" : 19,
"about" : "King in the north",
"sex" : "male",
"skill" : "fighting and leading"
}
}
Document in elasticsearch are immutable object. Updating a document is always a reindexing and it consist of the following steps:
Retrieve the JSON (that you want to reindex)
Change it
Delete the old document
Index a new document
Elasticsearch documentation
For the answer you can check the documentation:
In addition to being able to index and replace documents, we can also
update documents. Note though that Elasticsearch does not actually do
in-place updates under the hood. Whenever we do an update,
Elasticsearch deletes the old document and then indexes a new document
with the update applied to it in one shot.

Search elasticsearch all content from especific source

i want to know if it is possible to search all content from a specific _source in elasticsearch.
for example i have this:
{
"_shards":{
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits":{
"total" : 1,
"hits" : [
{
"_index" : "twitter",
"_type" : "tweet",
"_id" : "1",
"_source" : {
"user" : "kimchy",
"postDate" : "2009-11-15T14:12:12",
"message" : "trying out Elastic Search"
}
}
]
}
}
i want query all users from source without specifying the name.
for eg: something similar in SQL is like this
SELECT * user from twitter
and with that give all users
thanks and sorry for my bad english
edit:
i want search only for the source.
i give you an example, i have a source who store random word, sometimes store, sometimes not. i want to search for this source only when have new words.
the plan is verify from last 10 minutes if in my specific source have something new, if not, i don't care
You can just:
$ curl -XGET 'http://localhost:9200/twitter/_search'
That by default will return 10 documents. You can sipecify a size:
$ curl -XGET 'http://localhost:9200/twitter/_search?size=BIGNUM'
Or you can use scrool: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-scroll.html

_mget and _search differences on ElasticSearch

I've indexed 2 documents:
As you can see, after having indexed those ones, I can see them in a search result:
[root#centos7 ~]# curl 'http://ESNode01:9201/living/fuas/_search?pretty'
{
"took" : 20,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 2, <<<<<<<<<<<<<<<<
"max_score" : 1.0,
"hits" : [ {
"_index" : "living",
"_type" : "fuas",
"_id" : "idFuaMerge1", <<<<<<<<<<<<<<<
"_score" : 1.0,
"_source":{"timestamp":"2015-10-14T16:13:49.004Z","matter":"null","comment":"null","status":"open","backlogStatus":"unknown","metainfos":[],"resources":[{"resourceId":"idResourceMerge1","noteId":"null"},{"resourceId":"idResourceMerge2","noteId":null}]}
}, {
"_index" : "living",
"_type" : "fuas",
"_id" : "idFuaMerge2", <<<<<<<<<<<<<<<<<<
"_score" : 1.0,
"_source":{"timestamp":"2015-10-14T16:13:49.004Z","matter":"null","comment":"null","status":"open","backlogStatus":"unknown","metainfos":[],"resources":[{"resourceId":"idResourceMerge3","noteId":null}]}
} ]
}
}
After that, I perform a multiget request setting the document ids:
[root#centos7 ~]# curl 'http://ESNode01:9201/living/fuas/_mget?pretty' -d '
{
"ids": ["idFuaMerge1", "idFuaMerge2"]
}
'
{
"docs" : [ {
"_index" : "living",
"_type" : "fuas",
"_id" : "idFuaMerge1",
"found" : false <<<<<<<<<<<<<<<<<<<<!!!!!!!!!!!!!!
}, {
"_index" : "living",
"_type" : "fuas",
"_id" : "idFuaMerge2",
"_version" : 4,
"found" : true, <<<<<<<<<<<<<<<!!!!!!!!!!!!!!!!!
"_source":{"timestamp":"2015-10-14T16:13:49.004Z","matter":"null","comment":"null","status":"open","backlogStatus":"unknown","metainfos":[],"resources":[{"resourceId":"idResourceMerge3","noteId":null}]}
} ]
}
How on earth, on a multiget request, the first document is NOT found and the other one does?
This can only happen if you have used routing key to index your document. Or even parent child relation can also imply the same.
When a document is given for indexing , that document is mapped to a unique shard using the mechanism of routing. In this mechanism the docID is converted to a hash and modulas operation of that hash is taken to determine to which shard the document should go.
So in short
for documentA by default the shard might be 1. Default shard is computed based on routing key.
But then because you applied the routing key yourself , this document is mapped to a different shard , tell 0.
Now when you try to get the document without the routing key , it expects the document to be in shard 1 and not shard 0 and hence your multi get fails as it directly looks in shard 1 to get the document.
The search works because search operation happens across all shards/

Resources