Elasticsearch query based on properties of another document - elasticsearch

Is there a way in ES to do a single query that finds documents that are based on values "close" (whose logic I determine) to values in another document?
Example: i have document like this:
{
"myId": 10,
"price": 200
}
Now I want to run a query that finds documents that are within 100 either side of the price of the above document (but I don't know the price of document on the client..all I have is the myId)
In other words, i want to write a client method like this:
GetSimilarDocuments(int myId);
Is that possible to do in a single ES query? Or do I need two round trips? (get the document, then do another query based on the values of the document)

Related

Search inside _id field Elasticsearch

recently I made a change to the way ids were being generated in my ES index. Previously, we were generating the ids in the code, using a format like: uuid_WEEKDAY_COUNTRY_TIMESTAMP
I removed this and instead let the value of this field be auto-generated by ES (as i guess it should be)
How can i write a query that checks none of the old-format ids are still being generated? I tried something like
GET /_search
{
"query": {
"query_string": {
"query": "*WEDNESDAY*",
"default_field": "_id"
}
}
}
But got errors saying i can't query _id field, only text or keyword
how can i do this otherwise?
thanks
The _id field is special field handled in elastic search as the ID of the document. It is not indexed field like other text fields, though we can set the value , for documents where we do not specify this field it is actually "generated" based on the UID of the document (see: https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-id-field.html 2.8k).
The drop side of this is that , this field only supports a limited subset of the query functionality. One way to get over this is to add a field called id_field (as a text / keyword) into the document itself and then term queries on this field

Avoid ranking all matching documents in elasticsearch search query

I am having Elasticsearch index with multi-millions of documents. I am running a following search query.
POST testIndex/_search?size=200
{
"query": {
"query_string": {
"query": "(title:QA Manager OR title:QA Lead) AND (skills:JIRA OR skills:Software Development OR skills:Test Case)"
}
}
}
Even if we have passed the limit with size=200, it seems Elasticsearch is doing ranking for all the matching documents and bringing the top 200 with the highest rank.
Is there a way we can limit ranking? meaning do ranking on max 1000 matching documents only?
ES will consider your all data for search and ranking that is how Elasticsearch work. What basically do is, It executes your query in 2 phases, one is query and the second is fetch.
In Query Phase, it executes your query in all shades and get document id and score from each shard and return to requesting node. So in your scenario as size is set to 200, it will get 200 documents id from each shard and return to requesting node.
On requesting node, all the document id and score are merged and sorted based on score and select top document based on size param.
In Fetch phase, the actual docs are retrieved from individual shards where they reside based on ID which are selected in Query Phase and Results are returned to the client.
If you don't want to calculate score for some of your query, then you can move that query to the filter clause in bool query.

Search in multiple indexes in elastica

I am looking for a way to search in more than one index at the same time using Elastica.
I have an index products, and an index user.
products contains {product_id, product_name, price} and user contains {product_id, user_name, date}. Knowing that the product_id in both of them is the same, in products each products_id is unique but in user they're not as a user can buy the same product multiple times.
Anyway, I want to automatically get the price of a product from the products index while searching through the user index.
I know that we can search over multiple indexes like so (correct me if I'm wrong) :
$search = new \Elastica\Search($client);
$search->addIndex('users')
->addType('user')
->addIndex('products')
->addType('product');
But the problem is, when I write an aggregation on the products_id for example and then create a new query with some filters :
$products_agg = new \Elastica\Aggregation\Terms('products_id');
$products_agg->setField('products_id')->setSize(0);
$query = new \Elastica\Query();
$query->addAggregation($products_agg);
$query->setQuery($bool);
$search->setQuery($query);
How does elastica know in which index to search? How can I link this products_id to the other index?
The Elastica library has support for Multi Search API, The multi search API allows to execute several search requests within the same API. The endpoint for it is _msearch.
The format of the requests is similar to the bulk API, The first line
is header part that includes which index / indices to search on, The second line includes the typical search body requests.
{"index" : "products", "type": "products"}
{"query" : {"match_all" : {}}, "from" : 0, "size" : 10} // write your own query to get price
{"index" : "uesrs", "type" : "user"}
{"query" : {"match_all" : {}}} // query for user
Check test case in Multi/SearchTest.php to see how to use.
Basically you want to join two indexes based on a common field as in sql.
What you can do is model you data in the same index using join datatype
https://www.elastic.co/guide/en/elasticsearch/reference/master/parent-join.html
Index all documents in the same index ,
Make all product documents - parent.
Make all user documents as child
And the use parent-child aggregations and queries
https://www.elastic.co/guide/en/elasticsearch/reference/master/parent-join.html#_parent_join_queries_and_aggregations
NOTE: make sure of the performance implication of parent-child mapping
https://www.elastic.co/guide/en/elasticsearch/reference/master/parent-join.html#_parent_join_and_performance
One more thing you can do is put all the information of the product with every user that buys it.
But this can unnecessarily waste you space and is not a good practice as per data rules are concerned.
But since this is a search engine and elasticsearch suggests that best is to normalise and duplicate data rather that using parent-child.
you can try the following:
1- naming indexes with specific name like the following
myFirstIndex-myProjectName
mySecIndex-myProjectName
myThirdIndex-myProjectName
and so on.
2- that's give me the ability using * in the field of indexes to search because it accepts wildcard so i can search across multiple fields like this using kibana Dev Tools
GET *-myProjectName/_search
{
"_source": {
"excludes": [ "*" ]
},
"query": { "match_all": {} },
}
this will search on each index includes -myProjectName.
You can't query two indices with different mappings. Best way to solve your problem is to just do two queries (application-side joins). First query you do the aggregations on the user and the second you get the prices.
Another option would be to add the price to the user index. Sometimes you have to sacrifice a little space for better usability.

Apollo GraphQl Query get data from two documents

i have a query and i want to go through two different documents, is it possible to do it ?
Document A { _id, senders_ID[sender_id]}
Document B { sender_id, firstname}
// i want to loop through document A and get each sender_id and go to document B to get the firstname.
Is it what you are looking for ?

Group by field in found document

The best way to explain what I want to accomplish is by example.
Let us say that I have an object with fields name and color and transaction_id. I want to search for documents where name and color match the specified value and that I can accomplish easily with boolean queries.
But, I do not want only documents which were found with search query. I also want transaction to which those documents belong, and that is specified with transaction_id. For example, if a document has been found with transaction_idequal to 123, I want my query to return all documents with transaction_idequal to 123.
Of course, I can do that with two queries, first one to fetch all documents that match criteria, and the second one that will return all documents that have one of transaction_idvalues found in first query.
But is there any way to do it in a single query?
You can use parent-child relation ship between transaction and your object. Or nest the denormalize your data to include the objects in the transactions. Otherwise you'll have to do an application side join, meaning 2 queries.
Try an index mapping similar to the following, and include a parent_id in the objects.
{
"mappings": {
"transaction": {},
"object": {
"_parent": {
"type": "transaction"
}
}
}
}
Further reading:
https://www.elastic.co/guide/en/elasticsearch/guide/current/parent-child-mapping.html

Resources