can terms lookup mechanism query by other field but id? - elasticsearch

here is elasticsearch official website about terms:
https://www.elastic.co/guide/en/elasticsearch/reference/2.1/query-dsl-terms-query.html
As we can see, if we want to do terms lookup mechanism query, we should use command like this:
curl -XGET localhost:9200/tweets/_search -d '{
"query" : {
"terms" : {
"user" : {
"index" : "users",
"type" : "user",
"id" : "2",
"path" : "followers"
}
}
}
}'
But what if i want to do query by other field of users.
Assume that users has some other fields such as name and can i use terms lookup mechanism finding the tweets by giving users name but not id.
I have tried to use command like this:
curl -XGET localhost:9200/tweets/_search -d '{
"query" : {
"terms" : {
"user" : {
"index" : "users",
"type" : "user",
"name" : "Jane",
"path" : "followers"
}
}
}
}'
but it occurs error.
Looking forward to your help. Thank you!

The terms lookup mechanism is basically a built-in optimization to not have to make two queries to JOIN two indices, i.e. one in index A to get the ids to lookup and a second to fetch the documents with those ids in index B.
In contrary to SQL, such a JOIN can only work on the id field since this is the only way to uniquely retrieve a document from Elasticsearch via a GET call, which is exactly what Elasticsearch will do in the terms lookup.
So to answer your question, the terms lookup mechanism will not work on any other field than the id field since the first document to be retrieved must be unique. In your case, ES would not know how to fetch the document for the user with name Jane since name is just a field present in the user document, but in no way a unique identifier for user Jane.

I think you did not understand exactly how this works. Terms lookup query works by reading values from a field of a document with the given id. In this case, you are trying to match the value of field user in tweets index with values of field followers in document with id "2" present in users index and user type.
If you want to read from any other field then simply mention that in "path".
What you mainly need to understand is that the lookup values are all fetched from a field of a single document and not multiple documents.

Related

Search in multiple indexes in elastica

I am looking for a way to search in more than one index at the same time using Elastica.
I have an index products, and an index user.
products contains {product_id, product_name, price} and user contains {product_id, user_name, date}. Knowing that the product_id in both of them is the same, in products each products_id is unique but in user they're not as a user can buy the same product multiple times.
Anyway, I want to automatically get the price of a product from the products index while searching through the user index.
I know that we can search over multiple indexes like so (correct me if I'm wrong) :
$search = new \Elastica\Search($client);
$search->addIndex('users')
->addType('user')
->addIndex('products')
->addType('product');
But the problem is, when I write an aggregation on the products_id for example and then create a new query with some filters :
$products_agg = new \Elastica\Aggregation\Terms('products_id');
$products_agg->setField('products_id')->setSize(0);
$query = new \Elastica\Query();
$query->addAggregation($products_agg);
$query->setQuery($bool);
$search->setQuery($query);
How does elastica know in which index to search? How can I link this products_id to the other index?
The Elastica library has support for Multi Search API, The multi search API allows to execute several search requests within the same API. The endpoint for it is _msearch.
The format of the requests is similar to the bulk API, The first line
is header part that includes which index / indices to search on, The second line includes the typical search body requests.
{"index" : "products", "type": "products"}
{"query" : {"match_all" : {}}, "from" : 0, "size" : 10} // write your own query to get price
{"index" : "uesrs", "type" : "user"}
{"query" : {"match_all" : {}}} // query for user
Check test case in Multi/SearchTest.php to see how to use.
Basically you want to join two indexes based on a common field as in sql.
What you can do is model you data in the same index using join datatype
https://www.elastic.co/guide/en/elasticsearch/reference/master/parent-join.html
Index all documents in the same index ,
Make all product documents - parent.
Make all user documents as child
And the use parent-child aggregations and queries
https://www.elastic.co/guide/en/elasticsearch/reference/master/parent-join.html#_parent_join_queries_and_aggregations
NOTE: make sure of the performance implication of parent-child mapping
https://www.elastic.co/guide/en/elasticsearch/reference/master/parent-join.html#_parent_join_and_performance
One more thing you can do is put all the information of the product with every user that buys it.
But this can unnecessarily waste you space and is not a good practice as per data rules are concerned.
But since this is a search engine and elasticsearch suggests that best is to normalise and duplicate data rather that using parent-child.
you can try the following:
1- naming indexes with specific name like the following
myFirstIndex-myProjectName
mySecIndex-myProjectName
myThirdIndex-myProjectName
and so on.
2- that's give me the ability using * in the field of indexes to search because it accepts wildcard so i can search across multiple fields like this using kibana Dev Tools
GET *-myProjectName/_search
{
"_source": {
"excludes": [ "*" ]
},
"query": { "match_all": {} },
}
this will search on each index includes -myProjectName.
You can't query two indices with different mappings. Best way to solve your problem is to just do two queries (application-side joins). First query you do the aggregations on the user and the second you get the prices.
Another option would be to add the price to the user index. Sometimes you have to sacrifice a little space for better usability.

elastic search fetch the exact match first followed by others

I am newbie to elastic search
I have an education index in es
index creation
when i search 'btech' with match query as
"match" : { "name" : "btech" }
the result is like
result json object
but i need btech(exact match word) as the first document and remaining documents followed by it.
so for that what i have to change in my index creation
can anybody please help me
You can use term query
"term" : { "name" : "btech" }
Or regexp query
"regexp" : { "name" : "btech" }
You are using text type, make sure to check keyword type too
from documentation
If you need to index structured content such as email addresses,
hostnames, status codes, or tags, it is likely that you should rather
use a keyword field.

can terms lookup mechanism query be nested

I want to know can I nest a terms lookup mechanism query in anther terms lookup mechanism.
For instance:
curl -XPUT localhost:9200/users/user/2 -d '{
"tweets" : ["1", "3"]
}'
curl -XPUT localhost:9200/tweets/tweet/1 -d '{
"uuid" : "1",
"comments":["1","2","3"]
}'
curl -XPUT localhost:9200/comments/comment/1 -d '{
"uuid" : "1"
}'
As you know, we can use a terms lookup mechanism query to get tweets which belong to the user:
curl -XGET localhost:9200/tweets/tweet/_search -d'{
"query" : {
"terms" : {
"uuid" : {
"index" : "users",
"type" : "user",
"id" : "2",
"path" : "tweets"
}
}
}
}'
But if i want to get comments, i must do anther query.
However my documents is so many, it is not a good method.
So i want to nest terms lookup query in order to get comments in only one query by user's id, can i?
I will so appreciate it, if you can give me some help. Thank you! :)
At the moment, this is not possible as far as I know, because you expect data from three different indices to be returned in one query, which would equate to a JOIN. The terms lookup query sort of implements JOINs between two indices "only" (which is already quite cool considering the fact that ES does not want to support JOINs in the first place).
One way out of this would be to refactor your data model to get rid of the comments index and use either parent/child and/or nested relationships within the tweet mapping type. Since a comment can only belong to a single tweet and there aren't usually hundreds of comments on a tweet (I'm pretty confortable with the idea that 99% of the time there are less than half a dozen comments per tweet, if any at all), you could add comments either as a child documents or as a nested document (my preference), instead of just storing their ids in the comments array. That way you'd get your comments right away with your existing query, without the need for a second query.
curl -XPUT localhost:9200/tweets/tweet/1 -d '{
"uuid" : "1",
"comments":[{
"id": 1,
"content": "Nice tweet!"
},{
"id": 2,
"content": "Way to go!"
},{
"id": 3,
"content": "Sucks!"
}]
}'
Or you can wait for this pull request (#3278) (Terms Lookup by Query/Filter (aka. Join Filter)) to be merged, which will effectively allow to do what you're asking for, but that PR has been created more than 2 years ago and there still are conflicts to be resolved.

Finding fields Elasticsearch has matched on

I am using Elasticsearch to search for a group a user should join. I have the user data nested into the search query. On return I get back the closest matched group that user should be in.
The field I am searching on is a nested field as follows:
`{"interests": [
{"topics":["python", "stackoverflow", "elasticsearch"]},
{"topics":["arts", "textiles"]}
]}`
However if you want an understanding of a match - how do you do this?
Elasticsearch does have an explain function which says what the scoring is made up of using tfidf, but not specifically what terms were used.
For example, if I search for 'Textile', the doc should match on 'textiles'. Thus I want the term 'textiles' to be returned in explain or some other way.
The only way I see that provides this need, is to store the search and the document retrieved and then process both to discover words ES has most likely matched on.
EDIT - for some more clarity of the question
An example in my index of a group which has "interests": ['arts', 'fine arts', 'art painting', 'arts and crafts', 'sports']
Now my search, I am looking for Arts and many other things. Now the term I am searching for comes up in this list many times, thus should always be a contributor.
What I want in the response is to say these words were matched ['arts', 'fine arts', 'art painting', 'arts and crafts']along with the degree to which they match i..e 'arts' should be higher than the others, but all others are also relevant
Elasticsearch allows you to specify the _name field for all queries and
filters. This means that you can separate your query into different parts with
separate names, which will allow you to determine which parts matched.
For example:
{
"query" : {
"bool" : {
"should" : [
{"match" : { "interests.topics" : {"query" : "python", "_name" : "py-topic"} }},
{"match" : { "interests.topics" : {"query" : "arts", "_name" : "arts-topic"} }}
]
}
}
}
Then, in your response, you will get back any array of which queries (or
filters) matched and you can determine if the py-topic query and/or the
arts-topic query matched above.

one-to-many relationships in Elastic Search

Suppose I have 2 tables called "twitter_user" and "twitter_comments".
twitter_users has the fields: username and bio
twitter_comments has the fields: username and comment
Obviously, an user has 1 entry in twitter_users and potentially many in twitter_comments
I want to model both twitter_users and twitter_comments in Elastic Search, have ES search both models when I query, knowing that a comment counts towards the overall relevancy score for a twitter user.
I know I can mimic this with just 1 model, by creating a single extra field (in addition to username and bio) with all the comments concatenated. But is there another "cleaner" way?
It depends.
If you just want to be able to search for a users comments ,full-text and over all fields, simply store all comments within the user object (no need to concatenate anything):
{
"user" : {
"username" : "TestUser",
"bio" : "whatever",
"comments" : [
{
"title" : "First comment",
"text" : "My 1st comment"
},
{
"title" : "Second comment",
"text" : "My 2nd comment"
}
]
}
}
If you need per-comment-based queries you need to map the comments as nested (before submitting any data), so that every comment gets treated as a single item.
For your scoring, simply add another field "comment_count" and use this for your boost/scoring.
As Thorsten already suggested you can use nested query and it's a good approach.
Alternatively, you can index comments as children of users. Then you can can search users as you do now, search comments using top_children query to find all relevant to your search comments, and finally combine scores from both of them together using bool or dis_max queries.
Nested approach would be more efficient during search, but you will have to reindex the user and all comments every time an additional comment is added. With child/parent approach you will need to index only new comments, but search will be slower and it will require more memory.

Resources