Group by field in found document - elasticsearch

The best way to explain what I want to accomplish is by example.
Let us say that I have an object with fields name, color, and transaction_id. I want to search for documents where name and color match specified values, which I can easily accomplish with bool queries.
But I do not want only the documents found by the search query. I also want the transaction those documents belong to, which is specified by transaction_id. For example, if a document with transaction_id equal to 123 is found, I want my query to return all documents with transaction_id equal to 123.
Of course, I can do that with two queries: the first fetches all documents that match the criteria, and the second returns all documents that have one of the transaction_id values found in the first query.
But is there any way to do it in a single query?

You can use a parent-child relationship between the transaction and your object. Or denormalize your data by nesting the objects inside the transactions. Otherwise you'll have to do an application-side join, meaning two queries.
Try an index mapping similar to the following, and index each object with its transaction as its parent. Note that this _parent mapping applies to Elasticsearch versions before 6.0; newer versions use the join field type instead.
{
  "mappings": {
    "transaction": {},
    "object": {
      "_parent": {
        "type": "transaction"
      }
    }
  }
}
Further reading:
https://www.elastic.co/guide/en/elasticsearch/guide/current/parent-child-mapping.html
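As a sketch of how the parent-child approach answers the question in one request, assuming the transaction/object names from the mapping above: nesting a has_child query inside a has_parent query returns every object whose transaction contains at least one object matching the criteria. The Python below only builds the request body; the values "foo" and "red" are hypothetical, and using match queries is an assumption about how name and color are mapped.

```python
import json


def same_transaction_query(name: str, color: str) -> dict:
    """Query body returning all objects whose parent transaction has at
    least one child object matching both name and color."""
    # Object-level criteria from the question.
    criteria = {
        "bool": {
            "must": [
                {"match": {"name": name}},
                {"match": {"color": color}},
            ]
        }
    }
    return {
        "query": {
            # Outer: return child objects whose parent matches the inner query...
            "has_parent": {
                "parent_type": "transaction",
                "query": {
                    # ...where the parent has at least one matching child.
                    "has_child": {"type": "object", "query": criteria}
                },
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(same_transaction_query("foo", "red"), indent=2))
```

Parent-child queries like this are comparatively expensive, so it is worth measuring against the two-query approach on realistic data.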

Related

Search inside _id field Elasticsearch

Recently I changed the way IDs are generated in my ES index. Previously, we generated the IDs in code, using a format like: uuid_WEEKDAY_COUNTRY_TIMESTAMP
I removed this and now let Elasticsearch auto-generate the value of this field (as I guess it should be).
How can I write a query that checks that none of the old-format IDs are still being generated? I tried something like
GET /_search
{
  "query": {
    "query_string": {
      "query": "*WEDNESDAY*",
      "default_field": "_id"
    }
  }
}
But I got errors saying I can't query the _id field, only text or keyword fields.
How can I do this otherwise?
Thanks
The _id field is a special field that Elasticsearch uses as the ID of the document. It is not indexed like regular text fields. Although you can set its value yourself, for documents where you do not specify it, the ID is generated automatically (see: https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-id-field.html).
The downside is that this field supports only a limited subset of query functionality. One way around this is to add a regular field, such as id_field (mapped as text or keyword), to the document itself, copy the ID into it, and query that field instead (for a substring match like *WEDNESDAY*, use a wildcard query on the keyword field).
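A minimal sketch of that workaround, assuming Elasticsearch 7+ typeless mapping syntax and the hypothetical field name id_field. The ID is duplicated into a keyword field at index time, which is only possible for IDs the application chooses itself (exactly the old format), and a wildcard query then finds any remaining old-format IDs.

```python
# Sketch only: "id_field" is a hypothetical field name, and the mapping uses
# Elasticsearch 7+ typeless syntax.
ID_FIELD_MAPPING = {
    "mappings": {
        "properties": {
            # keyword: the whole ID is stored as a single, un-analyzed term.
            "id_field": {"type": "keyword"}
        }
    }
}


def with_id_field(doc_id: str, source: dict) -> dict:
    """Return a copy of the document source with the ID duplicated into
    id_field, leaving the original dict untouched."""
    doc = dict(source)
    doc["id_field"] = doc_id
    return doc


def old_format_id_query(weekday: str) -> dict:
    """Wildcard query for IDs containing the given weekday, e.g. WEDNESDAY.
    The leading * makes this slow on large indices."""
    return {"query": {"wildcard": {"id_field": {"value": f"*{weekday}*"}}}}
```

If zero hits come back for old_format_id_query("WEDNESDAY") (and the other weekdays), that suggests no old-format documents remain, provided they were all indexed with id_field.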

Search for parents only (with joins)

I'm relatively new to elasticsearch and I'm trying to follow this documentation page:
https://www.elastic.co/guide/en/elasticsearch/reference/current/parent-join.html
It is mentioned here that a plain match_all query will return everything, both the "parents" and the "children" - https://www.elastic.co/guide/en/elasticsearch/reference/current/parent-join.html#_searching_with_parent_join .
What isn't mentioned there (maybe because it's basic knowledge) is how to get only the parents. I simply want to get all the parents, without the children.
What would such a query look like?
You already know the relations of your join field, since you defined them in the mapping. Using the example from the documentation, the parent relation is question and the child relation is answer. You simply need to query your join field for question only:
{
  "query": {
    "term": {
      "my_join_field": "question"
    }
  },
  "sort": ["my_id"]
}
If you use this query, only the documents whose my_join_field is question are returned (these are your parent documents). If you want only the children, do the same with answer.
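The same query shape works for either side of the relation. A small sketch that builds the request body for a given relation name, using the my_join_field, question/answer, and my_id names from the documentation example:

```python
def relation_only_query(join_field: str, relation: str, sort_field: str = "my_id") -> dict:
    """Return only documents of one relation (e.g. parents or children)
    by matching the relation name stored in the join field."""
    return {
        "query": {"term": {join_field: relation}},
        "sort": [sort_field],
    }


parents_only = relation_only_query("my_join_field", "question")
children_only = relation_only_query("my_join_field", "answer")
```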

Type of field for prefix search in Elasticsearch

I'm confused about which field type I should use for prefix search. Many examples show search_as_you_type, but I don't think autocomplete is what I'm going for.
I have a UUID field:
id: 34y72ca1-3739-41ff-bbec-f6d17479384c
The following terms should return the doc above:
3
34
34y72ca1
34y72ca1-3739
34y72ca1-3739-41ff-bbec-f6d17479384c
Searching for 3739 should not return it, since the ID doesn't start with 3739. Initially I wanted partial (substring) matching, but the wildcard field type is not supported on Amazon AWS, so I settled for prefix search instead.
I tried the search_as_you_type field, but it doesn't return the result when I use the whole ID. My use case is that results are shown when the user presses Enter, not live as they type, so some loss of speed is OK; I just hope for something that works well with many rows of data.
Thanks
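If you have not explicitly defined an index mapping, dynamic mapping creates id as a text field with an id.keyword sub-field, so you need to use the id.keyword field instead of id for the prefix query to return the expected results. The keyword field stores the whole value as a single term, whereas the text field is split into tokens (here, at the hyphens) by the standard analyzer.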
{
  "query": {
    "prefix": {
      "id.keyword": {
        "value": "34y72ca1"
      }
    }
  }
}
Otherwise, you can modify your index mapping to add a keyword multi-field for the id field.
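A sketch of that multi-field mapping and the matching prefix query, assuming Elasticsearch 7+ typeless mapping syntax. The keyword_prefix_matches helper is only a local emulation of prefix matching on a keyword field (the whole stored value compared with str.startswith), used here to check the examples from the question; it is not an Elasticsearch API.

```python
DOC_ID = "34y72ca1-3739-41ff-bbec-f6d17479384c"

# "id" stays a text field; the "keyword" sub-field keeps the whole value
# as one un-analyzed term, which is what prefix matching needs.
ID_MAPPING = {
    "mappings": {
        "properties": {
            "id": {
                "type": "text",
                "fields": {"keyword": {"type": "keyword"}},
            }
        }
    }
}


def prefix_query(value: str) -> dict:
    """Prefix query against the keyword sub-field."""
    return {"query": {"prefix": {"id.keyword": {"value": value}}}}


def keyword_prefix_matches(stored: str, value: str) -> bool:
    """Local emulation only: a prefix query on a keyword field behaves like
    a case-sensitive startswith on the full stored value."""
    return stored.startswith(value)
```

With this emulation, every term listed in the question matches the example ID, while 3739 does not.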

Search in multiple indexes in Elastica

I am looking for a way to search in more than one index at the same time using Elastica.
I have an index products, and an index user.
products contains {product_id, product_name, price} and user contains {product_id, user_name, date}. The product_id is the same in both; in products each product_id is unique, but in user it is not, since a user can buy the same product multiple times.
Anyway, I want to automatically get the price of a product from the products index while searching through the user index.
I know that we can search over multiple indexes like so (correct me if I'm wrong):
$search = new \Elastica\Search($client);
$search->addIndex('users')
->addType('user')
->addIndex('products')
->addType('product');
But the problem is: when I write an aggregation on products_id, for example, and then create a new query with some filters:
$products_agg = new \Elastica\Aggregation\Terms('products_id');
$products_agg->setField('products_id')->setSize(0);
$query = new \Elastica\Query();
$query->addAggregation($products_agg);
$query->setQuery($bool);
$search->setQuery($query);
How does elastica know in which index to search? How can I link this products_id to the other index?
The Elastica library supports the Multi Search API, which lets you execute several search requests in a single API call. Its endpoint is _msearch.
The request format is similar to the bulk API: the first line is a header that specifies which index or indices to search, and the second line contains the usual search request body.
{"index" : "products", "type" : "product"}
{"query" : {"match_all" : {}}, "from" : 0, "size" : 10}
{"index" : "users", "type" : "user"}
{"query" : {"match_all" : {}}}
Replace the first match_all with your own query to get the price, and the second with your query on users.
See the test case in Multi/SearchTest.php for how to use it.
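For reference, the newline-delimited body can also be built by hand. This is a plain-Python sketch, not Elastica's API: each header and body is serialized onto its own line, and the payload ends with a trailing newline, which _msearch requires. The type key is omitted here on the assumption of a version where mapping types are deprecated; add it back if your cluster still uses types.

```python
import json
from typing import List, Tuple


def build_msearch_body(searches: List[Tuple[dict, dict]]) -> str:
    """Serialize (header, body) pairs into the _msearch NDJSON format:
    one compact JSON object per line, plus a final trailing newline."""
    lines = []
    for header, body in searches:
        lines.append(json.dumps(header))  # json.dumps is single-line by default
        lines.append(json.dumps(body))
    return "\n".join(lines) + "\n"


body = build_msearch_body([
    ({"index": "products"}, {"query": {"match_all": {}}, "from": 0, "size": 10}),
    ({"index": "users"}, {"query": {"match_all": {}}}),
])
print(body, end="")
```

The resulting string would be sent to /_msearch with the Content-Type application/x-ndjson.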
Basically, you want to join two indexes on a common field, as in SQL.
What you can do is model your data in the same index using the join datatype:
https://www.elastic.co/guide/en/elasticsearch/reference/master/parent-join.html
Index all documents in the same index.
Make all product documents parents.
Make all user documents children.
Then use parent-child aggregations and queries:
https://www.elastic.co/guide/en/elasticsearch/reference/master/parent-join.html#_parent_join_queries_and_aggregations
NOTE: be aware of the performance implications of parent-child mapping:
https://www.elastic.co/guide/en/elasticsearch/reference/master/parent-join.html#_parent_join_and_performance
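A sketch of that single-index layout, assuming Elasticsearch 7+ syntax, a hypothetical join field named relation, and product documents indexed with _id equal to their product_id. Child (user) documents name their parent and must be indexed with routing set to the parent ID so they land on the same shard. Combining a user filter with a has_parent query that requests inner_hits returns user documents together with their parent product, including its price.

```python
# Hypothetical join field "relation": product is the parent, user the child.
JOIN_MAPPING = {
    "mappings": {
        "properties": {
            "relation": {"type": "join", "relations": {"product": "user"}}
        }
    }
}


def product_doc(product_id: str, name: str, price: float) -> dict:
    """Parent document; assumed to be indexed with _id == product_id."""
    return {"product_id": product_id, "product_name": name,
            "price": price, "relation": "product"}


def user_doc(product_id: str, user_name: str, date: str) -> dict:
    """Child document; index it with routing=product_id (the parent's _id)."""
    return {"product_id": product_id, "user_name": user_name, "date": date,
            "relation": {"name": "user", "parent": product_id}}


def users_with_product_query(user_filter: dict) -> dict:
    """User documents matching user_filter, with the parent product
    (and thus its price) returned in inner_hits."""
    return {
        "query": {
            "bool": {
                "must": [
                    user_filter,
                    {"has_parent": {"parent_type": "product",
                                    "query": {"match_all": {}},
                                    "inner_hits": {}}},
                ]
            }
        }
    }
```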
One more thing you can do is put all the product information into every user document that buys it.
This can waste space and is not good practice as far as normalization rules are concerned.
But this is a search engine, and Elasticsearch suggests that it is best to denormalize and duplicate data rather than use parent-child.
You can try the following:
1- Name your indexes following a specific pattern, like this:
myFirstIndex-myProjectName
mySecIndex-myProjectName
myThirdIndex-myProjectName
and so on.
2- That lets you use * in the index part of the search request, since it accepts wildcards, so you can search across multiple indexes like this in Kibana Dev Tools:
GET *-myProjectName/_search
{
  "_source": {
    "excludes": [ "*" ]
  },
  "query": { "match_all": {} }
}
This searches every index whose name ends with -myProjectName.
You can't join two indices with different mappings in a single query. The best way to solve your problem is to do two queries (an application-side join): in the first query you do the aggregations on the user index, and in the second you get the prices.
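A sketch of that application-side join, assuming a terms aggregation named product_ids (a hypothetical name) on the user index, product_id mapped as a keyword field, and the field names from the question. Only the request bodies and the response handling are shown; sending the two requests is left to whatever client you use.

```python
from typing import Dict, List


def user_agg_query(size: int = 100) -> dict:
    """Step 1: aggregate the user index by product_id (no hits needed)."""
    return {
        "size": 0,
        "aggs": {"product_ids": {"terms": {"field": "product_id", "size": size}}},
    }


def product_ids_from_agg(user_response: dict) -> List[str]:
    """Extract the bucket keys from the step 1 response."""
    buckets = user_response["aggregations"]["product_ids"]["buckets"]
    return [bucket["key"] for bucket in buckets]


def price_query(product_ids: List[str]) -> dict:
    """Step 2: fetch only the products found in step 1, returning just
    product_id and price."""
    return {
        "size": len(product_ids),
        "_source": ["product_id", "price"],
        "query": {"terms": {"product_id": product_ids}},
    }


def prices_by_id(product_response: dict) -> Dict[str, float]:
    """Join step: map each product_id to its price."""
    return {
        hit["_source"]["product_id"]: hit["_source"]["price"]
        for hit in product_response["hits"]["hits"]
    }
```

The design keeps each index independent; the cost is a second round trip, which is usually acceptable when the number of distinct products per page is small.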
Another option would be to add the price to the user index. Sometimes you have to sacrifice a little space for better usability.

Relative Performance of ElasticSearch on inner fields vs outer fields

All other things being equal, including indexing, I'm wondering if it is more performant to search on fields closer to the root of the document.
For example, let's say we have a document with a customer ID. There are two ways to store this:
{
  "customer_id": "xyz"
}
and
{
  "customer": {
    "id": "xyz"
  }
}
Will it be any slower to search for documents where "customer.id = 'xyz'" than to search for documents where "customer_id = 'xyz'"?
That's pure syntactic sugar. The second form, i.e. using object type, will be flattened out and internally stored as
"customer.id": "xyz"
Hence, both forms you described are semantically equivalent as far as what gets indexed into ES, i.e.:
"customer_id": "xyz"
"customer.id": "xyz"
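To illustrate what "flattened" means here, a small sketch (not Elasticsearch's actual implementation) that turns nested object fields into the dotted paths by which object fields are addressed in queries:

```python
from typing import Any, Dict


def flatten(obj: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Illustration only: turn nested dicts into dotted field paths.
    Lists and other values are kept as leaf values."""
    flat: Dict[str, Any] = {}
    for key, value in obj.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, path))  # recurse into the object
        else:
            flat[path] = value
    return flat


print(flatten({"customer": {"id": "xyz"}}))  # {'customer.id': 'xyz'}
print(flatten({"customer_id": "xyz"}))       # {'customer_id': 'xyz'}
```

Either way, the result is a single field holding "xyz"; only the field name differs.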
