Joining two indexes in Elastic Search like a table join - elasticsearch

I am relatively new to Elasticsearch. I have an index called post which contains documents like this:
{
"id": 1,
"link": "https://www.instagram.com/p/XXXXX/",
"profile_id": 11,
"like_count": 100,
"comment_count": 12
}
I have another index called profile which contains documents like this:
{
"id": 11,
"username": "superman",
"name": "Superman",
"followers": 12312
}
So, as you guys can see, all profile data lives in the index called profile and all post data lives in the index called post. The "profile_id" in a post document refers to the "id" of a profile document.
Is there any way, when I query the post index and filter the post documents, to have the matching profile data returned along with each post, based on the post's "profile_id"? Or can I somehow fetch both with a multi-index search?
Thank you guys in advance, any help will be appreciated.

For the sake of performance, Elasticsearch encourages you to denormalize your data and model your documents according to the responses you want from your queries. However, in your case, I would suggest defining the post-profile relation with a join datatype and running your searches with the parent-join queries (has_child, has_parent); both are described in the Elastic documentation. Note that a join field only relates documents within the same index, so posts and profiles would have to be indexed together.
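
The denormalization mentioned above can be pictured as copying the profile fields into each post before indexing. This is only a minimal sketch in plain Python, assuming the documents are already loaded as dicts; the function name `denormalize_posts` and the nested `profile` key are hypothetical choices, not anything Elasticsearch prescribes.

```python
def denormalize_posts(posts, profiles):
    """Embed the matching profile into each post (application-side join)."""
    # Build a lookup table keyed by profile id.
    profiles_by_id = {p["id"]: p for p in profiles}
    merged = []
    for post in posts:
        doc = dict(post)  # shallow copy so the input is not mutated
        # None if no profile matches the post's profile_id.
        doc["profile"] = profiles_by_id.get(post["profile_id"])
        merged.append(doc)
    return merged


posts = [{"id": 1, "profile_id": 11, "like_count": 100, "comment_count": 12}]
profiles = [{"id": 11, "username": "superman", "name": "Superman", "followers": 12312}]
merged = denormalize_posts(posts, profiles)
```

Each merged document could then be indexed as a single post document, at the cost of re-indexing the affected posts whenever a profile changes.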

Related

How indexing work for dictionary in Elastic Search?

I have an elastic index containing a dictionary in each document.
Docs:
{
"name" : "name1",
"paymentDict":
{
"card1": { "CardType": "Credit", "CardName": "Axis"},
"card2": { "CardType": "Debit", "CardName": "Axis"}
}
}
Dictionary Type: Dictionary<int,object>
I am expecting a good amount of writes on this index and want to test the performance aspect, but I didn't find anything useful in the Elastic docs that explicitly explains dictionary indexing. I need help with the questions below:
How does indexing work for the dictionary?
Will this indexing be the same as for List<object>?
That would be an object in Elasticsearch - https://www.elastic.co/guide/en/elasticsearch/reference/7.14/object.html
You could also make this super simple and just have a document per card; that way you flatten things out.
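
The "document per card" idea can be sketched as follows. This is an illustration only, assuming the document shape from the question; `flatten_cards` and the `cardKey` field are hypothetical names introduced here.

```python
def flatten_cards(doc):
    """Turn one document with a paymentDict into one document per card."""
    flat = []
    for card_key, card in doc["paymentDict"].items():
        flat.append({
            "name": doc["name"],
            "cardKey": card_key,  # hypothetical field holding the dictionary key
            "CardType": card["CardType"],
            "CardName": card["CardName"],
        })
    return flat


doc = {
    "name": "name1",
    "paymentDict": {
        "card1": {"CardType": "Credit", "CardName": "Axis"},
        "card2": {"CardType": "Debit", "CardName": "Axis"},
    },
}
cards = flatten_cards(doc)
```

With this layout the mapping has a fixed set of fields, rather than one new object field per dictionary key.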

Why does ES recommend to use single mapping per index and doesn't provide any "Join" functionality for this?

As you know, starting from version 6, the Elasticsearch team deprecated multiple types per index as well as parent-child relationships. Proof is here
They recommend using the join datatype instead of parent-child. But let's look at this join datatype here. They write:
The join datatype is a special field that creates parent/child
relation within documents of the same index.
They tell us to use multiple indexes and to restrict each index to a single mapping, _doc, but the join datatype is designed to work only within the bounds of a single index.
How to live on? How could I create parent-child relationships across separate indexes?
Example:
Index: "City"
{
"name": "Moscow",
"id": 1
}
Index: "Product"
{
"name": "Shirt",
"city": 1,
"id": 1
}
How could I get the "Shirt" above if I know only the city name "Moscow"?
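
One common workaround is a two-step lookup done by the application: first find the city by name, then search the products by that city's id. The sketch below only builds the two request bodies as Python dicts; the helper names are hypothetical, and sending the bodies to the "City" and "Product" indexes is left out.

```python
def city_by_name_query(name):
    """Step 1: find the city document by its name."""
    return {"query": {"match": {"name": name}}, "size": 1}


def products_in_city_query(city_id):
    """Step 2: find products whose "city" field holds that city's id."""
    return {"query": {"term": {"city": city_id}}}


# Pretend step 1 returned the "Moscow" document from the question.
city_hit = {"name": "Moscow", "id": 1}
step1 = city_by_name_query("Moscow")
step2 = products_in_city_query(city_hit["id"])
```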

Search in multiple indexes in elastica

I am looking for a way to search in more than one index at the same time using Elastica.
I have an index products, and an index user.
products contains {product_id, product_name, price} and user contains {product_id, user_name, date}. The product_id is the same in both: in products each product_id is unique, but in user it is not, since a user can buy the same product multiple times.
Anyway, I want to automatically get the price of a product from the products index while searching through the user index.
I know that we can search over multiple indexes like so (correct me if I'm wrong) :
$search = new \Elastica\Search($client);
$search->addIndex('users')
->addType('user')
->addIndex('products')
->addType('product');
But the problem is, when I write an aggregation on the products_id, for example, and then create a new query with some filters:
$products_agg = new \Elastica\Aggregation\Terms('products_id');
$products_agg->setField('products_id')->setSize(0);
$query = new \Elastica\Query();
$query->addAggregation($products_agg);
$query->setQuery($bool);
$search->setQuery($query);
How does Elastica know which index to search? How can I link this products_id to the other index?
The Elastica library supports the Multi Search API, which lets you execute several search requests in a single API call. Its endpoint is _msearch.
The request format is similar to the bulk API: for each search, the first line is a header that names the index or indices to search, and the second line contains the usual search request body.
{"index" : "products", "type": "products"}
{"query" : {"match_all" : {}}, "from" : 0, "size" : 10}
{"index" : "users", "type" : "user"}
{"query" : {"match_all" : {}}}
Replace the first match_all with your own query to get the price, and the second with your query for the user.
Check the test case in Multi/SearchTest.php to see how to use it.
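
To make the header/body line format concrete, here is a small sketch that serializes a list of searches into an _msearch payload using only the Python standard library. `build_msearch_body` is a hypothetical helper, and the headers here omit "type" on the assumption that a recent Elasticsearch version without mapping types is used.

```python
import json


def build_msearch_body(searches):
    """Serialize (header, body) pairs into the newline-delimited _msearch format."""
    lines = []
    for header, body in searches:
        lines.append(json.dumps(header))  # which index to search
        lines.append(json.dumps(body))    # the usual search request body
    # The payload must end with a newline character.
    return "\n".join(lines) + "\n"


payload = build_msearch_body([
    ({"index": "products"}, {"query": {"match_all": {}}, "from": 0, "size": 10}),
    ({"index": "users"}, {"query": {"match_all": {}}}),
])
```

The responses come back in the same order as the searches, so the application still has to combine the two result sets itself.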
Basically, you want to join two indexes on a common field, as in SQL.
What you can do is model your data in a single index using the join datatype:
https://www.elastic.co/guide/en/elasticsearch/reference/master/parent-join.html
Index all documents in the same index, make every product document a parent and every user document a child, and then use the parent-child aggregations and queries:
https://www.elastic.co/guide/en/elasticsearch/reference/master/parent-join.html#_parent_join_queries_and_aggregations
NOTE: be aware of the performance implications of a parent-child mapping:
https://www.elastic.co/guide/en/elasticsearch/reference/master/parent-join.html#_parent_join_and_performance
Another thing you can do is store all of the product's information with every user document that buys it.
This duplicates data and costs extra space, and it goes against the usual normalization rules.
But since this is a search engine, Elasticsearch suggests that it is better to denormalise and duplicate data than to use parent-child.
You can try the following:
1- Give your indexes names with a common suffix, like the following:
myFirstIndex-myProjectName
mySecIndex-myProjectName
myThirdIndex-myProjectName
and so on.
2- This lets me use * in the index name, since it accepts wildcards, so I can search across multiple indices like this using Kibana Dev Tools:
GET *-myProjectName/_search
{
"_source": {
"excludes": [ "*" ]
},
"query": { "match_all": {} }
}
This will search every index whose name ends with -myProjectName.
You can't join two indices in a single query. The best way to solve your problem is to run two queries (an application-side join): the first does the aggregations on the user index and the second gets the prices.
Another option would be to add the price to the user index. Sometimes you have to sacrifice a little space for better usability.
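
The merge step of such an application-side join might look like the sketch below. It assumes the first query returned terms-aggregation buckets (each with "key" and "doc_count") and the second returned the matching product documents; `attach_prices` and the sample values are made up for illustration.

```python
def attach_prices(buckets, products):
    """Add the price from the products index to each terms-aggregation bucket."""
    price_by_id = {p["product_id"]: p["price"] for p in products}
    return [
        {
            "product_id": b["key"],
            "purchases": b["doc_count"],
            "price": price_by_id.get(b["key"]),  # None if the product is unknown
        }
        for b in buckets
    ]


# Shape of the buckets returned by a terms aggregation (sample values are made up).
buckets = [{"key": 7, "doc_count": 3}, {"key": 9, "doc_count": 1}]
products = [{"product_id": 7, "product_name": "shirt", "price": 20.0}]
rows = attach_prices(buckets, products)
```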

how elastic search find document content by doc id

There are many articles about the inverted index and posting lists in Elasticsearch, but I did not find any that explains how Elasticsearch finds a document's content by doc id.
Could anyone explain this to me?
thx.
Ragav is correct. However, I do have a bit to add that may help you work with document IDs.
When you index documents that don't have an ID, an ID is generated for you by Elasticsearch. That field name is "_id".
If you know the Id value of the document you wish to find, you can simply perform the query like this:
GET my_index/_search
{
"query": {
"terms": {
"_id": [ "1", "2" ]
}
}
}
https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-id-field.html
The above query returns documents whose _id equals 1 OR 2.
As Ragav said in his answer, if you created documents in the way described with id 1 or 2, this sample query, which I pulled from the Elasticsearch documentation, would return them.
Hope this helps.
Elasticsearch is built on top of Lucene.
When you index a new document into Elasticsearch, it indexes _index, _type and _id as part of the document along with the actual content (_source).
So, when you get a document using the get API _index/_type/_id, the request is basically converted into a query which searches for the doc matching that _index, _type and _id.
This is how Elasticsearch is able to return you the document.

How can I query/filter an elasticsearch index by an array of values?

I have an elasticsearch index with numeric category ids like this:
{
"id": "50958",
"name": "product name",
"description": "product description",
"upc": "00302590602108",
"categories": [
"26",
"39"
],
"price": "15.95"
}
I want to be able to pass an array of category ids (a parent id with all of its children, for example) and return only results that match one of those categories. I have been trying to get it to work with a term query, but no luck yet.
Also, as a new user of elasticsearch, I am wondering if I should use a filter/facet for this...
ANSWERED!
I ended up using a terms query (as opposed to term). I'm still interested in knowing if there would be a benefit to using a filter or facet.
As you already discovered, a terms query works. I would suggest a terms filter though, since filters are faster and cacheable.
Facets won't limit the results, but they are excellent tools. They count the hits for specific terms within your total results, and can be used for faceted navigation.
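
As a rough illustration of the filter approach, the sketch below builds a search body that matches any of the given category ids. It assumes a recent Elasticsearch version, where filters are written as the filter clause of a bool query and facets have been replaced by aggregations; `categories_filter` is a hypothetical helper name.

```python
def categories_filter(category_ids):
    """Build a search body matching documents in any of the given categories."""
    return {
        "query": {
            "bool": {
                # Filter context: no scoring, and results can be cached.
                "filter": [
                    # The ids are stored as strings in the sample document.
                    {"terms": {"categories": [str(c) for c in category_ids]}}
                ]
            }
        }
    }


body = categories_filter([26, 39])
```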
