Join-like query for multiple indexes - elasticsearch

I have 2 entities stored in separate indexes:
City index has 2 fields mapping: name:keyword and url:text.
Product index has 2 fields mapping: name:keyword and city:text
I would like to query all Products by City's url.
Example:
Given: Search all shirts by url "http://shirts-shop.com/frankfurt"
Then (step 1): Search all cities where url is "http://shirts-shop.com/frankfurt" — it will return "Frankfurt" city
Then (step 2): Search all shirts by city "Frankfurt"
In SQL databases this is simple to write: we just use a join. How can I write such a query in Elasticsearch 6.5?
Note: the entities are in separate indexes because, according to the documentation, Elasticsearch from version 6 onward recommends one index per mapping.

As I understand it, the URL contains the name of the city,
i.e. http://shirts-shop.com/<_city_>
so the city name can be extracted from the URL.
In the Product index I would suggest mapping city as keyword instead of text, so that it is not analyzed. Keyword matching is exact and case-sensitive, so the extracted value has to equal the stored one ("Frankfurt", not "frankfurt").
To get the shirts in <_city_>, use a term query:
{
  "query": {
    "bool": {
      "must": [
        {
          "term": {
            "city": "<_city_>"
          }
        }
      ]
    }
  }
}
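If the city name cannot be derived from the URL, the two steps from the question can be run from the application, since Elasticsearch does not offer a SQL-style join across indices. The following is a minimal sketch under stated assumptions: the index names `city` and `product` are hypothetical, `search` stands in for whatever client call sends a body to `<index>/_search` and returns the parsed response, and both `url` and `city` are assumed to be mapped as keyword (with the text mapping from the question, an exact term lookup on the full URL would likely not match).

```python
from typing import Callable, List

# Hypothetical transport: takes (index, body) and returns the parsed _search response.
SearchFn = Callable[[str, dict], dict]


def city_names_by_url(search: SearchFn, url: str) -> List[str]:
    """Step 1: look up the cities whose url equals the given value."""
    body = {"query": {"term": {"url": url}}, "_source": ["name"]}
    hits = search("city", body)["hits"]["hits"]
    return [hit["_source"]["name"] for hit in hits]


def products_by_cities(search: SearchFn, cities: List[str]) -> List[dict]:
    """Step 2: fetch the products whose city is one of the names from step 1."""
    if not cities:
        return []  # nothing matched in step 1, so skip the second request
    body = {"query": {"bool": {"filter": [{"terms": {"city": cities}}]}}}
    hits = search("product", body)["hits"]["hits"]
    return [hit["_source"] for hit in hits]


def products_by_city_url(search: SearchFn, url: str) -> List[dict]:
    """Application-side 'join': chain step 1 into step 2."""
    return products_by_cities(search, city_names_by_url(search, url))
```

Note that a search returns only the first 10 hits unless `size` is set, so a real implementation would probably need to set it or page through the results.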

Related

elastic search fulltext search on multiple index

Design Query for elasticsearch:
I have 10 tables in my MySQL database (news, emails, etc.) which I would sync into Elasticsearch, and I want to search across all of them in one go.
There are no relationships between the tables, and each has a txt field. I only want to search in the txt field, so should I have multiple indices or just one?
How should I organize my indices?
Option 1: a single Elasticsearch index for all tables (with an attribute holding the table type)
OR
Option 2: a separate Elasticsearch index for each table
Considering that I want to:
run a combined query across multiple data sources, ordered by hits, e.g. search all emails + news
or run a single query that searches only emails or only news
Have multiple indices and query any number of them at any given time:
POST emails/_doc
{
  "txt": "abc"
}
POST news/_doc
{
  "txt": "ab"
}
GET emails,news/_search
{
  "query": {
    "query_string": {
      "default_field": "txt",
      "query": "ab OR abc"
    }
  }
}
Wildcard index names are supported too in case you've got, say, timestamp-bucketed names such as emails_2020, emails_2019 etc:
GET em*,ne*/_search
...
You could also use the multi search API (_msearch) to run several searches, each against its own index, in a single request:
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-multi-search.html
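As a rough illustration of the multi search request format, the sketch below builds the newline-delimited body that `_msearch` expects: one header line naming the index, followed by one line with the search body, repeated per search and terminated by a newline. The helper name `build_msearch_body` is made up for this example, and the index and field names are taken from the sample above.

```python
import json
from typing import List, Tuple


def build_msearch_body(searches: List[Tuple[str, dict]]) -> str:
    """Serialize (index, search body) pairs into an NDJSON _msearch payload."""
    lines = []
    for index, body in searches:
        lines.append(json.dumps({"index": index}))  # header line
        lines.append(json.dumps(body))              # search body line
    return "\n".join(lines) + "\n"  # the payload must end with a newline


query = {"query": {"match": {"txt": "ab"}}}
payload = build_msearch_body([("emails", query), ("news", query)])
```

Each search in an `_msearch` request gets its own entry in the response, so the hits are not merged into one ranked list. For a combined result ordered by score, the comma-separated `GET emails,news/_search` form shown earlier is probably the better fit.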

Elasticsearch: Constant score applied within match query, but after search terms have been analysed?

Imagine I have some documents, with the following values contained within a text field called name
Document1: abc xyz group
Document2: group x/group y
Document3: group 1, group 2, group 3, group 4
Now imagine I'm sending a simple match query to ES for the term 'group':
{
  "query": {
    "match": {
      "name": "group"
    }
  }
}
My desired outcome would be that all 3 documents would return with the same score, no matter how often the term appears, where it appears, etc.
Now, I already know that I can do this by wrapping my match with a constant_score, like so:
{
  "query": {
    "constant_score": {
      "filter": {
        "match": {
          "name": "group"
        }
      },
      "boost": 1
    }
  }
}
BUT, say I now want to query using the search term abc group. In this case, what I want to happen is that Document2 and Document3 will return the same score (matches group), but Document1 to have a better score as it matches both abc and group.
With a constant_score wrapping my match query, documents that contain any of the terms return the same score (i.e. Document1, 2 and 3 return the same score for abc group). If I remove the constant_score, then Document3 has the best score, presumably because it contains more matches with the search text (group appears 4 times).
It seems as though I need a way of moving the constant_score query to after the match query has analyzed my search text. Effectively causing a query of abc group to be two constant_score queries - one for abc and one for group.
Does anyone know of a way to achieve this?
I've managed to solve this by utilising Elasticsearch's unique token filter: https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-unique-tokenfilter.html
I've added that to my name field in the index mappings, and it looks to be retrieving the desired results without having to worry about constant_score.
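For illustration, index settings along these lines would attach the `unique` token filter to the name field. This is only a sketch: the analyzer name `unique_terms` is hypothetical, the choice of the standard tokenizer plus lowercase filter is an assumption, and the typeless `mappings` layout assumes Elasticsearch 7 or later (6.x expects a type name level under `mappings`).

```python
def unique_terms_index_body(field: str = "name") -> dict:
    """Build a create-index body whose analyzer drops repeated tokens."""
    return {
        "settings": {
            "analysis": {
                "analyzer": {
                    "unique_terms": {  # hypothetical analyzer name
                        "type": "custom",
                        "tokenizer": "standard",
                        # "unique" removes duplicate tokens from the stream
                        "filter": ["lowercase", "unique"],
                    }
                }
            }
        },
        "mappings": {
            "properties": {
                field: {"type": "text", "analyzer": "unique_terms"}
            }
        },
    }


index_body = unique_terms_index_body()
```

Because the analyzer is applied at index time, existing documents would need to be reindexed before the change has any effect.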
Note, however, that all this does is stop term frequencies from having any effect on the _score; other metrics (such as fieldLength) still affect the results. It is therefore not equivalent to the post-analysis constant_score I hypothesized in the question, but it will suffice for my current requirements.
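The idea hypothesized in the question, one constant_score per search term, can also be approximated by splitting the search text in the application and combining the pieces in a bool query. The sketch below is an assumption-laden illustration rather than a built-in feature: the whitespace split is a naive stand-in for the field's analyzer, and the function name is invented for this example.

```python
def per_term_constant_score(field: str, text: str) -> dict:
    """Build a bool query with one constant_score clause per distinct term."""
    terms = text.lower().split()  # naive stand-in for the field's analyzer
    should = [
        {"constant_score": {"filter": {"match": {field: term}}, "boost": 1}}
        for term in dict.fromkeys(terms)  # drop duplicate terms, keep order
    ]
    return {"query": {"bool": {"should": should, "minimum_should_match": 1}}}


query = per_term_constant_score("name", "abc group")
```

Since a bool query adds up the scores of its matching should clauses, Document1 should score about 2 for abc group (both clauses match) while Document2 and Document3 score about 1, regardless of how often group occurs.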

Exclude results from Elasticsearch / Kibana based on aggregation value

Is it possible to exclude results based on the outcome of an aggregation?
In other words, I have aggregated on a term, and a whole bunch of results appear in a data table ordered by count, descending. Is it possible to configure Kibana / Elasticsearch to exclude results where the count is 1 or less? (Here, count is an aggregation.)
I realise I can export the raw data from the data table visualization and delete those records manually through a text editor or excel. But I am trying to convince my organization that elasticsearch is a cool new thing and this is one of their 1st requirements...
You can exclude results from the search by applying a filter. Here is a sample that may be helpful:
{
  "query": {
    "bool": {
      "filter": {
        "range": {
          "Your_term": {
            "gte": 1
          }
        }
      }
    }
  }
}
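The range filter above acts on the values of a document field before aggregation. If the goal is to drop buckets whose document count is 1 or less, the `min_doc_count` parameter of the terms aggregation may be closer to what the question asks for. A minimal sketch, where the aggregation name `by_term` and the bucket limit are assumptions and `Your_term` is the placeholder field from the sample above:

```python
def terms_agg_min_count(field: str, min_count: int = 2, buckets: int = 100) -> dict:
    """Build a terms aggregation that omits buckets with fewer than min_count docs."""
    return {
        "size": 0,  # return only the aggregation, no search hits
        "aggs": {
            "by_term": {  # hypothetical aggregation name
                "terms": {
                    "field": field,
                    "size": buckets,
                    "min_doc_count": min_count,
                }
            }
        },
    }


request = terms_agg_min_count("Your_term")
```

With `min_doc_count` set to 2, buckets with a count of 1 are left out of the response, so nothing needs to be removed by hand afterwards.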

How does elasticsearch fetch AND operator query from its indexes

Suppose I have an AND/MUST query in Elasticsearch on two different indexed fields,
as follows:
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "first": {
              "query": "Will",
              "minimum_should_match": "100%" // assuming this is q1
            }
          }
        },
        {
          "match": {
            "last": {
              "query": "Smith",
              "minimum_should_match": "100%" // assuming this is q2
            }
          }
        }
      ]
    }
  }
}
Now I want to know how Elasticsearch fetches the documents in the background.
Does it get the ids of all documents that match q1 and then iterate over them to check which also match q2,
or
does it intersect the two sets, and if so, how?
How can I index my data to optimize AND queries on two separate fields?
First, some basics: Elasticsearch uses Lucene behind the scenes. In Lucene, a query returns a scorer, and that scorer is responsible for returning the list of documents matching the query.
Your boolean query will internally be translated to a Lucene BooleanQuery, which in this case will return a ConjunctionScorer, as it has only must clauses.
Each of the clauses is a TermQuery that returns a TermScorer which, when advanced, gives the next matching document in increasing order of document id.
The ConjunctionScorer computes the intersection of the matching documents returned by the scorers for each clause by simply advancing each scorer in turn.
So you can think of a TermScorer as returning an ordered list of documents, and of a ConjunctionScorer as simply intersecting two ordered lists.
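The intersection of two ordered lists can be modeled in a few lines. This is a simplified sketch of the idea only, not Lucene's actual code: real scorers are iterators that can skip ahead to a target document id instead of stepping one entry at a time.

```python
from typing import List


def intersect_sorted(a: List[int], b: List[int]) -> List[int]:
    """Intersect two ascending doc-id lists by advancing whichever side is behind."""
    i = j = 0
    matches = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            matches.append(a[i])  # both clauses match this document
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1  # first list is behind, advance it
        else:
            j += 1  # second list is behind, advance it
    return matches


# Doc ids matching q1 and q2 respectively (made-up values).
both = intersect_sorted([1, 3, 5, 8, 13], [2, 3, 8, 9, 13, 21])
```

Each list is walked at most once, so the cost is linear in the combined length of the two lists rather than their product.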
There's not much you can do to optimize it. Maybe, since you're not really interested in scores, you could use a filter query instead and let ElasticSearch cache it.
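A filter-context version of the query might look like the sketch below. The field names `first` and `last` come from the question; using `operator: and` to require every term is an assumption about the intended matching behavior.

```python
def name_filter_query(first: str, last: str) -> dict:
    """Build a bool query whose clauses run in filter context (no scoring)."""
    return {
        "query": {
            "bool": {
                "filter": [
                    {"match": {"first": {"query": first, "operator": "and"}}},
                    {"match": {"last": {"query": last, "operator": "and"}}},
                ]
            }
        }
    }


query = name_filter_query("Will", "Smith")
```

Clauses under `filter` must all match, just like `must`, but they do not contribute to the score, which is what makes them candidates for caching.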

Elastic search - change the relevance according to an external factor

My use-case is a bit complicated so I'm simplifying it by using products and purchases:
The application has a big database with various tables, among them products and purchases (many-to-many: user_id:product_id).
Elastic has an index for the products only, as this is the only entity that needs advanced, high-scale search.
What I'm trying to achieve is as follows:
The more times the current user bought a product, the more relevant I want it to be.
The tricky part is the fact that Elastic has an index of the products only, not the purchases.
I can execute a query in the DB and get the info of how many times a user bought a product, and pass the results to Elastic, the question is how to do it.
Thanks.
If you can produce a reasonably bounded purchase history for each searcher, you could implement this inside a bool query by adding a list of optional term queries to its should block, each boosted by the purchase count.
E.g.
"bool": {
  "must": [ <existing query logic> ],
  "should": [
    {
      "term": {
        "product_id": { "value": 654321, "boost": 3 } <e.g. purchased 3 times>
      }
    },
    {
      ...
    }
  ]
}
As a heads-up, evaluating large numbers of these optional Boolean clauses will degrade your query performance, so you might also consider using a rescore request to apply your boosting logic to only, say, the top 100 unboosted search hits, if that would satisfy your requirement.
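A rescore-based variant could be assembled roughly as follows. This is a sketch under assumptions: `purchases` is a hypothetical mapping of product id to purchase count fetched from the database, the weights of 1.0 are arbitrary starting points, and `product_id` is the field name used in the example above.

```python
from typing import Dict


def boosted_rescore_request(base_query: dict, purchases: Dict[int, int],
                            window: int = 100) -> dict:
    """Wrap a query with a rescore phase that boosts previously purchased products."""
    should = [
        {"term": {"product_id": {"value": product_id, "boost": float(count)}}}
        for product_id, count in purchases.items()
        if count > 0  # products never purchased need no clause
    ]
    request = {"query": base_query}
    if should:
        request["rescore"] = {
            "window_size": window,  # only the top hits per shard are rescored
            "query": {
                "rescore_query": {"bool": {"should": should}},
                "query_weight": 1.0,
                "rescore_query_weight": 1.0,
            },
        }
    return request


request = boosted_rescore_request({"match": {"title": "shirt"}}, {654321: 3, 111: 0})
```

The boost multiplies the term query's own relevance score, so the final contribution is not exactly the purchase count. Wrapping each term in a constant_score query would be one way to make the contribution equal to the boost, if that matters.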
