Can a terms lookup mechanism query be nested? - elasticsearch

I want to know whether I can nest a terms lookup mechanism query inside another terms lookup mechanism query.
For instance:
curl -XPUT localhost:9200/users/user/2 -d '{
"tweets" : ["1", "3"]
}'
curl -XPUT localhost:9200/tweets/tweet/1 -d '{
"uuid" : "1",
"comments":["1","2","3"]
}'
curl -XPUT localhost:9200/comments/comment/1 -d '{
"uuid" : "1"
}'
As you know, we can use a terms lookup mechanism query to get tweets which belong to the user:
curl -XGET localhost:9200/tweets/tweet/_search -d'{
"query" : {
"terms" : {
"uuid" : {
"index" : "users",
"type" : "user",
"id" : "2",
"path" : "tweets"
}
}
}
}'
But if I want to get the comments, I must run another query.
I have a very large number of documents, so this is not a good approach.
So I want to nest terms lookup queries in order to get the comments in a single query by the user's id. Can I?
I would really appreciate any help. Thank you! :)

At the moment, this is not possible as far as I know, because you expect data from three different indices to be returned in one query, which would equate to a JOIN. The terms lookup query sort of implements JOINs between two indices "only" (which is already quite cool considering the fact that ES does not want to support JOINs in the first place).
One way out of this would be to refactor your data model to get rid of the comments index and use either parent/child and/or nested relationships within the tweet mapping type. Since a comment can only belong to a single tweet and there aren't usually hundreds of comments on a tweet (I'm pretty comfortable with the idea that 99% of the time there are fewer than half a dozen comments per tweet, if any at all), you could add comments either as child documents or as nested documents (my preference), instead of just storing their ids in the comments array. That way you'd get your comments right away with your existing query, without the need for a second query.
curl -XPUT localhost:9200/tweets/tweet/1 -d '{
"uuid" : "1",
"comments":[{
"id": 1,
"content": "Nice tweet!"
},{
"id": 2,
"content": "Way to go!"
},{
"id": 3,
"content": "Sucks!"
}]
}'
Or you can wait for this pull request (#3278) (Terms Lookup by Query/Filter (aka. Join Filter)) to be merged, which would effectively allow you to do what you're asking for, but that PR was created more than 2 years ago and there are still conflicts to be resolved.
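To make the "JOIN across three indices" point concrete, here is a minimal Python sketch of the two hops done in application code. The dictionaries are in-memory stand-ins for the users, tweets and comments indices, not real Elasticsearch calls; tweet 3 and comments 2 and 3 are hypothetical documents added so the example has something to traverse.

```python
# In-memory stand-ins for the three indices from the question.
# A real application would send requests to Elasticsearch instead.
USERS = {"2": {"tweets": ["1", "3"]}}
TWEETS = {
    "1": {"uuid": "1", "comments": ["1", "2", "3"]},
    "3": {"uuid": "3", "comments": []},  # hypothetical document
}
COMMENTS = {
    "1": {"uuid": "1"},
    "2": {"uuid": "2"},  # hypothetical document
    "3": {"uuid": "3"},  # hypothetical document
}


def comments_for_user(user_id):
    """Resolve user -> tweets -> comments in two explicit hops."""
    # Hop 1: what the existing terms lookup already does (users -> tweets).
    tweet_ids = USERS[user_id]["tweets"]
    tweets = [TWEETS[t] for t in tweet_ids if t in TWEETS]

    # Hop 2: the step a single terms lookup cannot perform (tweets -> comments).
    comment_ids = []
    for tweet in tweets:
        for comment_id in tweet["comments"]:
            if comment_id not in comment_ids:  # keep order, drop duplicates
                comment_ids.append(comment_id)
    return [COMMENTS[c] for c in comment_ids if c in COMMENTS]


print(comments_for_user("2"))
```

The second hop is the part that has no equivalent in a single terms lookup, which is why embedding the comments in the tweet document removes the need for it.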

Related

How do I search by java enums

I have data stored in Elasticsearch. One of the fields is the logging level, whose values are defined in a Java enum.
The enum values are:
0 => undefined
1 => info
2 => low
3 => high
4 => fatal
EDIT:
This is what I am trying, but keep getting Variable [level] is not defined error.
curl -H 'Content-Type: application/json' "http://localhost:33206/_search" -d'
{
"sort" : {
"_script" : {
"type" : "number",
"script" : {
"lang": "painless",
"source": "params.mapping[doc['level'].value]",
"params" : {
"UNDEFINED": 0,
"INFO": 1,
"LOW": 2,
"HIGH": 3,
"FATAL": 4
}
},
"order" : "asc"
}
}
}
'
In elastic search we are storing Strings rather than the number.
If I wanted to query elastic search and have it ordered by the corresponding numbers, how do I do that? Of course sorting by string will produce wrong results.
This is not possible without scripting, and scripting is not recommended because it performs poorly.
Instead, store the integer value in a separate field and sort on that field.
Reasons not to have scripts:
If possible, avoid using scripts or scripted fields in searches.
Because scripts can’t make use of index structures, using scripts in
search queries can result in slower search speeds.
If you often use scripts to transform indexed data, you can speed up
search by making these changes during ingest instead. However, that
often means slower index speeds.
Another consideration is security: scripts can introduce vulnerabilities.
Reference
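My script was correct, except for the single quotes around level. Inside the single-quoted curl body they terminate the shell string, so Elasticsearch receives doc[level] and reports that the variable is not defined. Changing them to escaped double quotes (\"level\") makes it work.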
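One way to sidestep the shell quoting problem entirely is to build the request body in code and let a JSON serializer handle the escaping. A minimal Python sketch follows; it only builds and prints the body, it does not send it. Nesting the lookup table under a mapping key is an assumption made here so that params.mapping in the script resolves, and whether doc['level'].value is usable depends on how the level field is mapped.

```python
import json

# Enum names and ordinals taken from the question.
LEVEL_ORDER = {"UNDEFINED": 0, "INFO": 1, "LOW": 2, "HIGH": 3, "FATAL": 4}


def build_level_sort(order="asc"):
    """Return a search body that sorts by the enum ordinal of `level`."""
    return {
        "sort": {
            "_script": {
                "type": "number",
                "script": {
                    "lang": "painless",
                    # Single quotes are safe here: no shell is involved.
                    "source": "params.mapping[doc['level'].value]",
                    # Assumption: the table is nested under `mapping`
                    # so that `params.mapping[...]` has something to index.
                    "params": {"mapping": LEVEL_ORDER},
                },
                "order": order,
            }
        }
    }


body = json.dumps(build_level_sort())
print(body)
```

The printed string can be saved to a file and passed to curl with -d @file, which avoids inline shell quoting altogether.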

Elastic Search - Sort by multiple fields with the missing parameter

I am trying to apply a sort to an Elastic Search query by two different fields:
price_sold and price_list
I would like to first sort on price_sold, but if that value is null, I would like to then sort by price_list
Would the query be correct if I just set the sorts to:
"sort": [
{ "price_sold": { "order": "desc"}},
{ "price_list": { "order": "desc"}}
]
I have executed the query, and I do not get any errors, and it seems like the results are correct, however I am curious if I have overlooked something.
I have been reading about the missing filter, along with possibly using a custom value. This may not be required, but I am not quite sure.
Would there be a way to define a second field to sort on if the first field is missing, or is that not necessary? Something like:
"sort": [{"price_sold": {"order": "desc", "missing": "doc['field_name']"}}]
Would simply adding these two sorts give me the desired result?
Thanks.
I think I understand what you're asking. In SQL terms, you'd like to ORDER BY COALESCE(price_sold, price_list) DESC.
The first sort you listed is a little different. It's similar to ORDER BY price_sold DESC, price_list DESC - in other words, primary sort is by price_sold, and for entries where price_sold is equal, secondary sort is by price_list.
Your second sort attempt would be great if "missing" worked that way. Unfortunately, missing's "custom" option appears to allow you to specify a constant value only.
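The difference between the two orderings is easy to see on a small data set. This is a plain Python sketch with made-up items, not an Elasticsearch query; it assumes missing values sort last in the two-field case, which is the default behaviour of missing.

```python
# Hypothetical items; None stands for a missing price_sold.
items = [
    {"id": "a", "price_sold": None, "price_list": 300},
    {"id": "b", "price_sold": 200, "price_list": 500},
    {"id": "c", "price_sold": None, "price_list": 100},
    {"id": "d", "price_sold": 250, "price_list": 50},
]


def coalesce_key(item):
    """ORDER BY COALESCE(price_sold, price_list)."""
    if item["price_sold"] is not None:
        return item["price_sold"]
    return item["price_list"]


def two_field_key(item):
    """ORDER BY price_sold, price_list with missing price_sold last."""
    has_sold = item["price_sold"] is not None
    return (has_sold, item["price_sold"] or 0, item["price_list"])


coalesce_order = [i["id"] for i in sorted(items, key=coalesce_key, reverse=True)]
two_field_order = [i["id"] for i in sorted(items, key=two_field_key, reverse=True)]

print(coalesce_order)   # ['a', 'd', 'b', 'c']
print(two_field_order)  # ['d', 'b', 'a', 'c']
```

With the two-field sort, items without price_sold are grouped together at the end; with the COALESCE-style sort, they are interleaved according to their price_list.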
If you don't need to limit your search using from and size, you should be able to use sort's _script option to write some logic that works for you. I ended up here because I do use from and size to retrieve batches, and when I sort by _script, the items I'm getting don't make sense - the items are sorted correctly, but I'm not getting the right set of items. So, I added a new analyzer and expanded my fields to use the new analyzer, and I was hoping to be able to sort using the new field or, if the new field doesn't exist (for previously-indexed items), use the old field's value instead. But that doesn't seem to be possible. I think I'm going to have to reindex my items so my new field is populated.
In case someone is still looking, I ended up creating a script similar to this:
curl -XGET 'localhost:9200/_search?pretty&size=10&from=0' -H 'Content-Type: application/json' -d'
{
"sort" : {
"_script" : {
"type" : "number",
"script" : {
"lang": "painless",
"inline": "doc[\u0027price_sold\u0027] == null ? doc[\u0027price_list\u0027].value : doc[\u0027price_sold\u0027].value"
},
"order" : "desc"
}
}
}
'
For sorting dates, the type still has to remain number but you replace .value with .date.getMillisOfDay() as discussed here.
The from and size worked fine in my version of ElasticSearch (5.1.1).
To make sure your algorithm is working fine check the generated value in the response, e.g.: "sort" : [ 5.0622E7 ].

Can a terms lookup mechanism query use a field other than id?

Here is the official Elasticsearch documentation on the terms query:
https://www.elastic.co/guide/en/elasticsearch/reference/2.1/query-dsl-terms-query.html
As we can see, if we want to do terms lookup mechanism query, we should use command like this:
curl -XGET localhost:9200/tweets/_search -d '{
"query" : {
"terms" : {
"user" : {
"index" : "users",
"type" : "user",
"id" : "2",
"path" : "followers"
}
}
}
}'
But what if I want to query by another field of users?
Assume that users has other fields such as name: can I use the terms lookup mechanism to find tweets by giving the user's name instead of the id?
I have tried to use command like this:
curl -XGET localhost:9200/tweets/_search -d '{
"query" : {
"terms" : {
"user" : {
"index" : "users",
"type" : "user",
"name" : "Jane",
"path" : "followers"
}
}
}
}'
but it returns an error.
Looking forward to your help. Thank you!
The terms lookup mechanism is basically a built-in optimization to not have to make two queries to JOIN two indices, i.e. one in index A to get the ids to lookup and a second to fetch the documents with those ids in index B.
Contrary to SQL, such a JOIN can only work on the id field, since this is the only way to uniquely retrieve a document from Elasticsearch via a GET call, which is exactly what Elasticsearch does in the terms lookup.
So to answer your question, the terms lookup mechanism will not work on any other field than the id field since the first document to be retrieved must be unique. In your case, ES would not know how to fetch the document for the user with name Jane since name is just a field present in the user document, but in no way a unique identifier for user Jane.
I think you did not understand exactly how this works. Terms lookup query works by reading values from a field of a document with the given id. In this case, you are trying to match the value of field user in tweets index with values of field followers in document with id "2" present in users index and user type.
If you want to read from any other field then simply mention that in "path".
What you mainly need to understand is that the lookup values are all fetched from a field of a single document and not multiple documents.
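If you only know the user's name, one workaround is to do the lookup in two steps: search users by name to find the id, then pass that id to the terms lookup. Below is a minimal Python sketch that only builds the two request bodies. The name field and the sample response are hypothetical, and the first step assumes the name matches the user you want, since nothing guarantees it is unique.

```python
def find_user_by_name(name):
    """Step 1 body: search the users index for a document by name."""
    return {"query": {"match": {"name": name}}, "size": 1}


def first_hit_id(search_response):
    """Extract the _id of the first hit, or None if nothing matched."""
    hits = search_response["hits"]["hits"]
    return hits[0]["_id"] if hits else None


def tweets_for_followers_of(user_id):
    """Step 2 body: the terms lookup from the question, keyed on the id."""
    return {
        "query": {
            "terms": {
                "user": {
                    "index": "users",
                    "type": "user",
                    "id": user_id,
                    "path": "followers",
                }
            }
        }
    }


# Hypothetical response to the step 1 search, trimmed to the relevant part.
sample_response = {"hits": {"hits": [{"_id": "2", "_source": {"name": "Jane"}}]}}

user_id = first_hit_id(sample_response)
lookup_body = tweets_for_followers_of(user_id)
print(lookup_body)
```

This costs one extra request, but it keeps the terms lookup itself keyed on a unique id, which is what the mechanism requires.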

Substring and similarity matching in elasticsearch

I am learning to use Elasticsearch as an alternative to database queries, and I am not able to perform substring matches on the index I built.
The mapping I have used to create index is
"mappings" : {
"user" : {
"properties" : {
"name" : {"type": "string"},
"specialty" : {"type": "string" ,"analyzer":"snowball"},
"address" : {"type": "string" ,"analyzer":"snowball"}
}
}
}
The document I am indexing is
{
"name" : "John Doe",
"speciality": ["pediatrician","Child Doctor"],
"address": ["#123 park road Abbeyville","#423 park road AbbeyTown" ]
}
When I perform a search like
curl -XGET localhost:9200/test/user/_search?q=speciality:pediatrician
I get the correct document
However when I search strings like
curl -XGET localhost:9200/test/user/_search?q=speciality:pedia
curl -XGET localhost:9200/test/user/_search?q=speciality:pediatricians
No results are retrieved
P.S I know that wild cards can be used for matching but I need to be able to search for both the whole word and parts of the words based on user input so as to return the most relevant documents.
Did you try reindexing after changing the mapping? Also try setting the search analyzer to snowball in the settings.
UPDATE:
You can use a wildcard search. Prefer a trailing wildcard alone over combining leading and trailing wildcards, since leading wildcards are expensive to evaluate.
curl -XGET localhost:9200/test/user/_search?q=speciality:pedia*
curl -XGET localhost:9200/test/user/_search?q=speciality:pediatricians*
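To see why the plain terms fail and what a trailing wildcard changes, here is a small Python sketch that imitates matching a search term against indexed tokens. The token list is an assumption about what the analyzer produced for the speciality values; the actual tokens depend on the analyzer that was really applied to the field.

```python
from fnmatch import fnmatchcase

# Assumed tokens produced at index time for
# ["pediatrician", "Child Doctor"] (lowercased, split on whitespace).
indexed_terms = ["pediatrician", "child", "doctor"]


def matches(pattern):
    """Return the indexed tokens that the pattern matches in full."""
    return [term for term in indexed_terms if fnmatchcase(term, pattern)]


print(matches("pediatrician"))    # ['pediatrician']
print(matches("pedia"))           # []
print(matches("pedia*"))          # ['pediatrician']
print(matches("pediatricians*"))  # []
```

Under this assumption, a trailing wildcard helps for pedia, but pediatricians* still finds nothing, because the indexed token is shorter than the prefix. Matching the plural relies on the query term being stemmed to the same token as the indexed one.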

Queries vs. Filters

I can't see any description of when I should use a query or a filter or some combination of the two. What is the difference between them? Can anyone please explain?
The difference is simple: filters are cached and don't influence the score, so they are faster than queries. Let's say a query is usually something that users type and is pretty much unpredictable, while filters help users narrow down the search results, for example using facets.
This is what official documentation says:
As a general rule, filters should be used instead of queries:
for binary yes/no searches
for queries on exact values
As a general rule, queries should be used instead of filters:
for full text search
where the result depends on a relevance score
An example (try it yourself)
Say index myindex contains three documents:
curl -XPOST localhost:9200/myindex/mytype -d '{ "msg": "Hello world!" }'
curl -XPOST localhost:9200/myindex/mytype -d '{ "msg": "Hello world! I am Sam." }'
curl -XPOST localhost:9200/myindex/mytype -d '{ "msg": "Hi Stack Overflow!" }'
Query: How well a document matches the query
Query hello sam (using keyword must)
curl localhost:9200/myindex/_search?pretty -d '
{
"query": { "bool": { "must": { "match": { "msg": "hello sam" }}}}
}'
Document "Hello world! I am Sam." is assigned a higher score than "Hello world!", because the former matches both words in the query. Documents are scored.
"hits" : [
...
"_score" : 0.74487394,
"_source" : {
"msg" : "Hello world! I am Sam."
}
...
"_score" : 0.22108285,
"_source" : {
"msg" : "Hello world!"
}
...
Filter: Whether a document matches the query
Filter hello sam (using keyword filter)
curl localhost:9200/myindex/_search?pretty -d '
{
"query": { "bool": { "filter": { "match": { "msg": "hello sam" }}}}
}'
Documents that contain either hello or sam are returned. Documents are NOT scored.
"hits" : [
...
"_score" : 0.0,
"_source" : {
"msg" : "Hello world!"
}
...
"_score" : 0.0,
"_source" : {
"msg" : "Hello world! I am Sam."
}
...
Unless you need full text search or scoring, filters are preferred because frequently used filters will be cached automatically by Elasticsearch, to speed up performance. See Elasticsearch: Query and filter context.
Filters -> Does this document match? a binary yes or no answer
Queries -> Does this document match? How well does it match? uses scoring
A few additions to the above.
A filter is applied first and then the query is processed over its results. To store the binary true/false match per document, a structure called a bitset is used.
This bitset is held in memory and is reused from the second time the filter is queried. This way, the bitset data structure lets us take advantage of the cached result.
One more point to note here: the filter cache is created only when the request is executed, so we only get the advantage of caching from the second hit onwards.
You can use the warmer API to overcome this. When you register a query with a filter against the warmer API, it makes sure the filter is executed against each new segment as it comes live. Hence we get consistent speed from the first execution itself.
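The caching idea can be illustrated with a toy Python sketch. This is not how Elasticsearch implements its filter cache; it only shows the principle that the per-document true/false results are computed on the first request and reused afterwards. The documents and the rating field are hypothetical.

```python
# Hypothetical documents; list position plays the role of the document id.
docs = [{"rating": 4.5}, {"rating": 3.0}, {"rating": 4.0}]

bitset_cache = {}            # filter name -> list of booleans, one per document
stats = {"evaluations": 0}   # how many times a filter was actually computed


def cached_filter(name, predicate):
    """Return the bitset for a filter, computing it only on first use."""
    if name not in bitset_cache:
        stats["evaluations"] += 1
        bitset_cache[name] = [predicate(doc) for doc in docs]
    return bitset_cache[name]


def is_highly_rated(doc):
    return doc["rating"] >= 4.0


first = cached_filter("rating_gte_4", is_highly_rated)   # computed
second = cached_filter("rating_gte_4", is_highly_rated)  # served from cache

print(first, stats["evaluations"])  # [True, False, True] 1
```

The second call returns the stored bitset without touching the documents again, which is the benefit described above; a warmer would correspond to making the first call ahead of time.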
Basically, a query is used when you want to search your documents with scoring.
Filters are used to narrow down the set of results obtained by the query. Filters are boolean.
For example, say you have an index of restaurants, something like Zomato.
Now you want to search for restaurants that serve 'pizza', which is basically your search keyword.
So you use a query to find all the documents containing "pizza", and some results are returned.
Now say you want a list of restaurants that serve pizza and have a rating of at least 4.0.
To do that, you use the keyword "pizza" in your query and apply a filter for a rating of at least 4.0.
Conceptually, the filter narrows down the results obtained by querying your index.
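The restaurant example could be expressed as a bool query with a scored must clause and an unscored filter clause. Here is a minimal Python sketch that builds such a request body; the field names menu and rating are hypothetical, since the example does not define a mapping.

```python
import json


def restaurant_search(keyword, min_rating):
    """Build a bool query: score on the keyword, filter on the rating."""
    return {
        "query": {
            "bool": {
                # Query context: contributes to the relevance score.
                "must": {"match": {"menu": keyword}},
                # Filter context: yes/no only, no scoring.
                "filter": {"range": {"rating": {"gte": min_rating}}},
            }
        }
    }


body = restaurant_search("pizza", 4.0)
print(json.dumps(body, indent=2))
```

Only the match clause affects the ordering of the results; the range clause just removes restaurants rated below 4.0.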
Since version 2 of Elasticsearch, filters and queries have been merged and any query clause can be used as either a filter or a query (depending on the context). As with version 1, filters are cached and should be used if scoring does not matter.
Source: https://logz.io/blog/elasticsearch-queries/
Queries: calculate a score, so they can return results sorted by relevance.
Filters: don't calculate a score, making them faster and easier to cache.

Resources