Elasticsearch: can you use the results of aggregates in the same search? - elasticsearch

We have an Elasticsearch index with about 50,000 "product" entries per user, over which our app performs complex queries. Each entry has a corresponding "supplier" and "supervisor". Suppliers and supervisors are stored in their own indices, with only ~200 of each per user. They are big documents, so the product index stores just their name and ID, which are the only fields used in product queries. However, on each product query we would also like to return aggregate information about suppliers and supervisors. Example: if the query returns 800 products that share 10 different suppliers and 12 different supervisors, I want the full information on those suppliers and supervisors. I know how to use bucket aggregations over their IDs (or over names mapped as keyword), but the buckets return only the ID or name. Is there any way to retrieve the full supplier and supervisor documents for those bucket IDs in the same query, or do I have to perform a second query?
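As far as I know, a terms aggregation returns only bucket keys and counts, and the search API cannot join those keys against another index, so a second lookup is the usual route. Below is a minimal Python sketch of that two-step flow. It only builds request bodies and reads a response dict, so no client library is assumed. The field names `supplier_id` and `supervisor_id`, the bucket size of 200, the helper names, and the assumption that the stored ID equals the `_id` of the supplier document are all hypothetical.

```python
from typing import Any


def product_search_body(user_query: dict[str, Any], max_buckets: int = 200) -> dict[str, Any]:
    """Product search: the user's query plus one terms aggregation
    per referenced entity (field names are assumed)."""
    return {
        "query": user_query,
        "aggs": {
            "suppliers": {"terms": {"field": "supplier_id", "size": max_buckets}},
            "supervisors": {"terms": {"field": "supervisor_id", "size": max_buckets}},
        },
    }


def bucket_keys(response: dict[str, Any], agg_name: str) -> list[str]:
    """Collect the bucket keys (the IDs) of one terms aggregation."""
    buckets = response.get("aggregations", {}).get(agg_name, {}).get("buckets", [])
    return [str(b["key"]) for b in buckets]


def lookup_body(ids: list[str]) -> dict[str, Any]:
    """Second request: fetch the full documents whose _id is in `ids`."""
    return {"query": {"ids": {"values": ids}}, "size": len(ids)}


# Hand-written stand-in for the part of a search response used here.
fake_response = {
    "aggregations": {
        "suppliers": {
            "buckets": [
                {"key": "s1", "doc_count": 500},
                {"key": "s2", "doc_count": 300},
            ]
        }
    }
}
supplier_ids = bucket_keys(fake_response, "suppliers")
supplier_lookup = lookup_body(supplier_ids)
```

With only ~200 suppliers and supervisors per user, the second step stays small, and the two lookups could probably be sent together in one multi-search request.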

Related

Fast fulltext comparison of two databases

I have two databases with product data. The data in both is in third normal form, and the tables have the following fields:
id, FullName, AttributeName, AttributeValue
So, there are many rows (attributes) for every id (product).
I need to find relevant products (with a relevance value) from the first DB for every product in the second DB. The comparison should be structured (I need to compare both names and attributes).
The comparison by FullName and AttributeName (both are strings) between two products should use fulltext search or some kind of fuzzy comparison (maybe embeddings).
I have tens of millions of products in the first database and millions in the second. Products can be added to or deleted from both databases. When a new product is added to the first database, we need to calculate its relevance against every product in the second database; when a new product is added to the second one, we can run a search query over all records in the first one.
Because of the number of products, I am looking at fulltext search engines such as Sphinx, Elasticsearch, or Apache Solr.
The question is: can I calculate the relevance of all products in the second DB against a new product in the first DB without "brute-force querying" (running a search with every product from the second DB as the query)? Maybe such engines offer some kind of "inverted relevance search", or maybe another engine does.
I use Python in my system, so the engine should have an API I can use from Python.
More than a month late, but if you are still on this, maybe you can check this - Manticore Percolate
I am not sure if I understand your question properly.
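The "inverted relevance search" asked about is usually called percolation: queries are stored as documents, and a new document is matched against them. The answer above points to Manticore's percolate feature; the Python sketch below uses the Elasticsearch percolator instead, because its requests are plain JSON. It only builds request bodies. The field names `query` and `full_name` and the helper names are assumptions, and whether the resulting scores are usable as a relevance value would need to be checked for this use case.

```python
from typing import Any


def percolator_mapping() -> dict[str, Any]:
    """Index mapping: one field holds the stored query, the other mirrors
    the product field that incoming documents are matched on."""
    return {
        "mappings": {
            "properties": {
                "query": {"type": "percolator"},
                "full_name": {"type": "text"},
            }
        }
    }


def stored_query_for(product_name: str) -> dict[str, Any]:
    """Document to index for one product of the second DB:
    a match query on that product's name."""
    return {"query": {"match": {"full_name": product_name}}}


def percolate_body(new_product_name: str) -> dict[str, Any]:
    """Search body that finds every stored query matching
    a new product from the first DB."""
    return {
        "query": {
            "percolate": {
                "field": "query",
                "document": {"full_name": new_product_name},
            }
        }
    }


example_search = percolate_body("Peanut butter and jelly sandwich")
```

Each product of the second DB is indexed once as a stored query, so a new product in the first DB needs a single percolate search rather than one search per product.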

In Elasticsearch, how can I retrieve products grouped by the store that sells them?

I've got a bunch of stores, each of which sells several products, and those products have descriptions. I would like to build a search experience where the user can search for products by words in the description, and have a search result page where matching products are shown, grouped by the store that sells them. My question is:
How can I design an efficient Elasticsearch schema and query scheme that will let me query for products with the results grouped by store, with the guarantee that every store in the search results contains a complete list of items that match the query?
For instance, suppose I had the following data:
Store 1
  Product 1a, description: "Peanut butter and jelly sandwich"
  Product 1b, description: "Taco"
  Product 1c, description: "Sandwich holder"
Store 2
  Product 2a, description: "Burrito bowl"
Store 3
  Product 3a, description: "Sandwich maker"
  Product 3b, description: "Sandwich bread"
  Product 3c, description: "Salad tongs"
In my overall application, I want a query for "sandwich" to return something like:
Store 1
  product 1a
  product 1c
Store 3
  product 3a
  product 3b
Whenever I show a store, I always want to show all hits for that store. In the domain I'm working in, there are lots of stores but each store only has a small number of products (max of around 10-20, with most stores only having 2 or 3).
I can see two ways to implement this, and both seem bad to me.
Approach #1
Index each product as a separate document. Then at query time, I could fetch every matching document and post-process them in Java to group them by store, and finally return that result. The problems I see with this approach are:
I can't use any kind of ranking, since I'm going to re-sort the results.
I also can't do any limiting; I have to fetch every single document, no matter how many there may be, since otherwise I can't guarantee that I have every product for a particular store. This will result in lots of wasted work.
Approach #2
Index each store as a separate document, with a nested field holding each product. At query time, I could retrieve stores where the product description nested field has a match on the search term. Then, once I have the stores I want to show, I'd have to run a separate query to fetch the matching products from those stores. The problems with this approach are:
I'm asking Elasticsearch to do more work than necessary; internally, it had to find everything I needed in the first query, but I'm issuing a second query anyway.
Issuing two related queries complicates the code and requires me to keep two queries in sync (e.g. I need to make sure that the documents matched in query 1 as subfields are the same documents that query 2 matches).
Can anyone more experienced with Elasticsearch than I am see a better option?
With Approach #2 I see two options:
Nested inner hits.
You could use top_hits with the reverse_nested aggregator. You search for the products in the query and group the documents by store in the aggregation. The top_hits aggregation returns regular search hits, meaning you get the children (products) along with the parent (store).
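The first option, nested inner hits, could look roughly like the Python sketch below, which builds the search body and regroups a response. It assumes each store document has a nested field named `products` with a `description` subfield; those names, the helper names, and the `max_products` parameter are hypothetical. Inner hits return only a few matches per parent by default, so the sketch sets an explicit size based on the 10-20 products per store mentioned in the question.

```python
from typing import Any


def stores_with_matching_products(term: str, max_products: int = 20) -> dict[str, Any]:
    """Match stores through their nested products and ask for the
    matching products themselves via inner_hits."""
    return {
        "query": {
            "nested": {
                "path": "products",
                "query": {"match": {"products.description": term}},
                "inner_hits": {"size": max_products},
            }
        }
    }


def group_hits(response: dict[str, Any]) -> dict[str, list[dict[str, Any]]]:
    """Map each store _id to the sources of its matching nested products."""
    grouped: dict[str, list[dict[str, Any]]] = {}
    for hit in response["hits"]["hits"]:
        inner = hit["inner_hits"]["products"]["hits"]["hits"]
        grouped[hit["_id"]] = [product["_source"] for product in inner]
    return grouped


# Hand-written stand-in for the relevant part of a search response.
fake_response = {
    "hits": {
        "hits": [
            {
                "_id": "store-1",
                "inner_hits": {
                    "products": {
                        "hits": {
                            "hits": [
                                {"_source": {"description": "Peanut butter and jelly sandwich"}},
                                {"_source": {"description": "Sandwich holder"}},
                            ]
                        }
                    }
                },
            }
        ]
    }
}
grouped = group_hits(fake_response)
```

Because stores are the top-level hits, ranking and paging apply to stores, and each store carries its own list of matching products in the same response.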

How to pre-filter data before execute ElasticSearch query?

I'm going to use MS SQL Server and Elasticsearch together. MSSQL is the main DB and Elasticsearch is the search DB.
My DB structure looks like this:
User (id, name, companyId) - 120k records
Company (id, name) - 10k records
Company_matrix (parent_company_id, child_company_Id) - 20k records
Company_share_data (company_id, share_to_company_Id) - 50k records
Product (id, name, allow_to_be_shared, companyId) - 1m records
I will create Elasticsearch indices for the product and company tables to improve search speed. A logged-in user with companyId = X can see all products that have companyId = X, all products of the parent companies defined in Company_matrix, and all products shared by other companies through Company_share_data, provided allow_to_be_shared is not false.
This query is very simple in SQL Server but painful in a NoSQL store like Elasticsearch. Is there a way to compute the list of "available product IDs" in SQL Server, pass it to Elasticsearch, and combine it with the user's search conditions? Or is there a better idea?
Thanks
If I understood correctly, you want to retrieve all documents in Elasticsearch that match some criteria. If that is the case, you can use a bool query in Elasticsearch.
If that is too painful, the other way is to get the list of product IDs from your SQL query and pass those IDs into a terms query in Elasticsearch.
Hope this is helpful.
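The second suggestion can be sketched in Python as follows: the IDs that SQL Server reports as visible go into a `terms` filter, and the user's own search goes into the `must` clause of the same bool query. The function name is hypothetical, and the sketch assumes the product documents carry the SQL `id` column in a field named `id`.

```python
from typing import Any, Iterable


def filtered_product_search(user_query: dict[str, Any], allowed_ids: Iterable[int]) -> dict[str, Any]:
    """Combine the user's search with a pre-computed ID whitelist.
    The whitelist sits in filter context, so it does not affect scoring."""
    return {
        "query": {
            "bool": {
                "must": [user_query],
                "filter": [{"terms": {"id": sorted(set(allowed_ids))}}],
            }
        }
    }


# IDs as they might come back from the SQL Server permission query.
body = filtered_product_search({"match": {"name": "laptop"}}, [42, 7, 42, 13])
```

Elasticsearch caps the number of values in a single terms query (the `index.max_terms_count` setting, 65,536 by default as far as I know), so with 1m products the size of the ID list per user is worth checking.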

Elasticsearch extract/add id's from multiple queries

I have multiple queries that filter data in Elasticsearch. These queries return the IDs of the documents that match the filter.
However, depending on the user's selection, I need to perform another operation: subtract the current query's unique document IDs from, or add them to, the combined result of the previous queries. The maximum number of queries is 5.
Is there an option in Elasticsearch to subtract/add document IDs from a previous query? Right now I do this part in PHP with a foreach loop, which takes a lot of time.
Edit
Example:
OK, let's say we have one query on an index:
{"query":{"bool":{"filter":[{"wildcard":{"182_empanalyzed":"example"}}]}}}
We need to subtract from its results the document IDs returned by the following query on the same index:
{"query":{"bool":{"must_not":[{"nested":{"path":"184","query":{"exists":{"field":"184.*"}}}}]}}}
Keep in mind that these are example queries with only one condition each; real queries may be more complex, with many fields searched in each. Each subsequent query can either subtract or add document IDs.
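One way to avoid the PHP loop might be to let Elasticsearch do the set arithmetic by combining the individual queries into a single bool query before sending it: subtraction maps to `must_not` and addition to `should`. The Python sketch below shows this on plain dicts. The helper names are hypothetical, and it assumes all queries target the same index.

```python
from typing import Any

Query = dict[str, Any]


def subtract(base: Query, other: Query) -> Query:
    """Documents matching `base` but not `other` (set difference)."""
    return {"bool": {"filter": [base], "must_not": [other]}}


def add(base: Query, other: Query) -> Query:
    """Documents matching `base` or `other` (set union)."""
    return {"bool": {"should": [base, other], "minimum_should_match": 1}}


# The two example queries from the question, without the outer "query" key.
first: Query = {"bool": {"filter": [{"wildcard": {"182_empanalyzed": "example"}}]}}
second: Query = {
    "bool": {
        "must_not": [
            {"nested": {"path": "184", "query": {"exists": {"field": "184.*"}}}}
        ]
    }
}

# One request instead of two result sets merged in application code.
body = {"query": subtract(first, second)}
```

Since each helper returns a query, up to five steps can be chained, for example `add(subtract(q1, q2), q3)`, and only the final body is sent.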

Sort by a different index's values

Given two indexes, I'm trying to sort the first based on values of the second.
For example, Index 1 ('Products') has fields id, name. Index 2 ('Prices') has fields id, price.
I'm struggling to figure out how to sort 'Products' by 'Prices'.price, assuming the IDs match. The reason for this question is that the 'Products' index could hypothetically become very large (with duplicate IDs), and updating all documents would become expensive.
Elasticsearch is a document-based store rather than a column-based store. What you're looking for is a way to JOIN the two indices; however, this is not supported in Elasticsearch. The 'Elasticsearch way' of storing these documents is to have one index that contains all relevant data. If you're worried about update procedures taking very long, look into creating an index with an alias. When you need to do a major update, write it to a new index, and only when you're done switch the alias target to the new index. This will allow you to update your data seamlessly.
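The alias switch described above can be sketched in Python as the body of a single request to the `_aliases` endpoint. The alias and index names (`products`, `products_v1`, `products_v2`) and the function name are hypothetical.

```python
from typing import Any


def alias_swap_body(alias: str, old_index: str, new_index: str) -> dict[str, Any]:
    """Body for POST /_aliases: detach the alias from the old index and
    attach it to the new one. Both actions travel in one request."""
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


# After the rebuilt index is fully populated, point the alias at it.
swap = alias_swap_body("products", "products_v1", "products_v2")
```

Searches keep using the alias name throughout, so clients do not need to know which physical index is currently behind it.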
