Elasticsearch full-text search on multiple indices - elasticsearch

Design question for Elasticsearch:
I have 10 tables in my MySQL database (news, emails, etc.) which I would sync into Elasticsearch, and I want to search across all of these tables in one go.
There are no relationships between the tables and all of them have a txt field. I just want to search in the txt field, so should I have multiple indices or just one index?
How should I organize my indices?
Option 1: Should I have just one Elasticsearch index (with an attribute holding the table type) for all the tables,
OR
Option 2: Should I have multiple Elasticsearch indices, one per table?
Considering:
I want to run a combined query over multiple data sources, ordered by hits. Example: search all emails + news.
Or a single query that searches only emails or only news.

Have multiple indices and query any number of them at any given time:
POST emails/_doc
{
"txt": "abc"
}
POST news/_doc
{
"txt": "ab"
}
GET emails,news/_search
{
"query": {
"query_string": {
"default_field": "txt",
"query": "ab OR abc"
}
}
}
Wildcard index names are supported too in case you've got, say, timestamp-bucketed names such as emails_2020, emails_2019 etc:
GET em*,ne*/_search
...
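As a rough sketch of how an application might assemble such a request, the snippet below builds the URL path and body for a search over several indices. The helper name `build_search_request` is hypothetical (it is not part of any client library), and the code only constructs the request without sending it to a cluster.

```python
import json

def build_search_request(indices, text, field="txt"):
    """Return (path, body) for a search over one or more indices.

    Hypothetical helper for illustration; it does not contact a cluster.
    """
    # Index names or wildcard patterns are joined with commas in the URL path.
    path = "/" + ",".join(indices) + "/_search"
    body = {"query": {"query_string": {"default_field": field, "query": text}}}
    return path, body

path, body = build_search_request(["emails", "news"], "ab OR abc")
print("GET", path)
print(json.dumps(body, indent=2))
```

Passing a single name such as `["emails"]` restricts the search to that one index, which covers the "only emails or only news" case from the question.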

You could also use the Multi Search API (_msearch) to run searches against multiple indices in one request:
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-multi-search.html
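The _msearch request body is newline-delimited JSON: a header line naming the index, followed by a line with the search body, repeated per search. A minimal sketch of building that payload follows; `build_msearch_payload` is a hypothetical helper name and the two queries are only placeholders.

```python
import json

def build_msearch_payload(searches):
    """Build an NDJSON body for _msearch from (index_name, query_body) pairs.

    Hypothetical helper for illustration; it does not contact a cluster.
    """
    lines = []
    for index_name, query_body in searches:
        lines.append(json.dumps({"index": index_name}))  # header line
        lines.append(json.dumps(query_body))             # search body line
    # The payload must end with a newline character.
    return "\n".join(lines) + "\n"

payload = build_msearch_payload([
    ("emails", {"query": {"match": {"txt": "abc"}}}),
    ("news", {"query": {"match": {"txt": "ab"}}}),
])
print(payload, end="")
```

Each search gets its own entry in the response, so results are not merged into one ranked list the way a single comma-separated multi-index search merges them.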

Related

Join-like query for multiple indexes

I have 2 entities stored in separate indexes:
City index has 2 fields mapping: name:keyword and url:text.
Product index has 2 fields mapping: name:keyword and city:text
I would like to query all Products by City's url.
Example:
Given: Search all shirts by url "http://shirts-shop.com/frankfurt"
Then (step 1): Search all cities where url is "http://shirts-shop.com/frankfurt" — it will return "Frankfurt" city
Then (step 2): Search all shirts by city "Frankfurt"
In SQL databases this is simple to write: we just use a join. How can I write such a query in Elasticsearch 6.5?
Note: the entities are in separate indexes because, as the documentation says, Elasticsearch from version 6 onwards recommends one index per mapping.
As I understand it, the URL contains the name of the city,
i.e. http://shirts-shop.com/<_city_>
so the city name can be extracted from it.
In the Product index I would suggest mapping city as keyword instead of text (so that it doesn't get analyzed).
To get shirts in <_city_>, use the term query:
{
"bool": {
"must": [
{
"term": {
"city": "<_city_>"
}
}
]
}
}
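If the city name cannot be derived from the URL, the two steps from the question can be run as an application-side join. The sketch below assumes a hypothetical `search(index, body)` callable that returns a list of document sources, and it assumes that both url and city are mapped as keyword so that term-level queries match exactly. An in-memory stand-in replaces the cluster so the flow can be followed end to end.

```python
def find_products_by_city_url(search, url):
    """Two-step lookup: resolve the URL to city names, then fetch products.

    `search(index, body)` is a hypothetical callable returning a list of
    document sources; it is an assumption, not a real client API.
    """
    # Step 1: find the cities that have this URL (assumes url is a keyword field).
    cities = search("city", {"query": {"term": {"url": url}}})
    names = [city["name"] for city in cities]
    if not names:
        return []
    # Step 2: find products whose city is one of those names.
    return search("product", {"query": {"terms": {"city": names}}})

# In-memory stand-in for a cluster, only to demonstrate the flow.
CITIES = [{"name": "Frankfurt", "url": "http://shirts-shop.com/frankfurt"}]
PRODUCTS = [
    {"name": "shirt", "city": "Frankfurt"},
    {"name": "shirt", "city": "Berlin"},
]

def fake_search(index, body):
    query = body["query"]
    if index == "city":
        return [c for c in CITIES if c["url"] == query["term"]["url"]]
    return [p for p in PRODUCTS if p["city"] in query["terms"]["city"]]

print(find_products_by_city_url(fake_search, "http://shirts-shop.com/frankfurt"))
```

Note that the second step uses a terms query with a list, since step 1 may return more than one city.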

Search in multiple indexes in elastica

I am looking for a way to search in more than one index at the same time using Elastica.
I have an index products, and an index user.
products contains {product_id, product_name, price} and user contains {product_id, user_name, date}. Knowing that the product_id in both of them is the same, in products each products_id is unique but in user they're not as a user can buy the same product multiple times.
Anyway, I want to automatically get the price of a product from the products index while searching through the user index.
I know that we can search over multiple indexes like so (correct me if I'm wrong) :
$search = new \Elastica\Search($client);
$search->addIndex('users')
->addType('user')
->addIndex('products')
->addType('product');
But the problem is, when I write an aggregation on the products_id for example and then create a new query with some filters :
$products_agg = new \Elastica\Aggregation\Terms('products_id');
$products_agg->setField('products_id')->setSize(0);
$query = new \Elastica\Query();
$query->addAggregation($products_agg);
$query->setQuery($bool);
$search->setQuery($query);
How does elastica know in which index to search? How can I link this products_id to the other index?
The Elastica library supports the Multi Search API, which executes several search requests within a single API call. Its endpoint is _msearch.
The request format is similar to the bulk API: the first line of each pair is a header that names the index or indices to search on,
and the second line is the usual search request body.
{"index" : "products", "type": "products"}
{"query" : {"match_all" : {}}, "from" : 0, "size" : 10} // write your own query to get price
{"index" : "users", "type" : "user"}
{"query" : {"match_all" : {}}} // query for user
Check the test case in Multi/SearchTest.php to see how to use it.
Basically you want to join two indexes on a common field, as in SQL.
What you can do is model your data in the same index using the join datatype:
https://www.elastic.co/guide/en/elasticsearch/reference/master/parent-join.html
Index all documents in the same index.
Make all product documents parents.
Make all user documents children.
Then use parent-child aggregations and queries:
https://www.elastic.co/guide/en/elasticsearch/reference/master/parent-join.html#_parent_join_queries_and_aggregations
NOTE: be aware of the performance implications of a parent-child mapping:
https://www.elastic.co/guide/en/elasticsearch/reference/master/parent-join.html#_parent_join_and_performance
Another thing you can do is store all the product information with every user document that buys it.
This can waste space and is not good practice as far as data normalization rules are concerned.
But since this is a search engine, Elasticsearch suggests that it is better to denormalise and duplicate data than to use parent-child.
You can try the following:
1- Name the indexes with a common suffix, like the following
myFirstIndex-myProjectName
mySecIndex-myProjectName
myThirdIndex-myProjectName
and so on (real Elasticsearch index names must be lowercase; the mixed case here is only for readability).
2- That gives me the ability to use * in the index part of the request, because it accepts wildcards, so I can search across multiple indexes like this using Kibana Dev Tools
GET *-myProjectName/_search
{
"_source": {
"excludes": [ "*" ]
},
"query": { "match_all": {} }
}
This will search every index whose name ends with -myProjectName.
You can't join two indices with different mappings in a single query. The best way to solve your problem is to do two queries (an application-side join): the first query runs the aggregation on the user index and the second fetches the prices.
Another option would be to add the price to the user index. Sometimes you have to sacrifice a little space for better usability.
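The merge step of such an application-side join could look roughly like the sketch below. It assumes the first query returned terms-aggregation buckets (each with `key` and `doc_count`) and the second returned product sources containing `product_id` and `price`. The function name `attach_prices` and the sample data are hypothetical.

```python
def attach_prices(buckets, product_hits):
    """Combine per-product purchase counts with prices from a second query.

    `buckets` are terms-aggregation buckets from the user index;
    `product_hits` are document sources from the products index.
    """
    price_by_id = {hit["product_id"]: hit["price"] for hit in product_hits}
    return [
        {
            "product_id": bucket["key"],
            "purchases": bucket["doc_count"],
            # None marks a product that the second query did not return.
            "price": price_by_id.get(bucket["key"]),
        }
        for bucket in buckets
    ]

# Sample data standing in for the two query responses.
buckets = [{"key": 7, "doc_count": 3}, {"key": 9, "doc_count": 1}]
products = [{"product_id": 7, "product_name": "shirt", "price": 20.0}]
rows = attach_prices(buckets, products)
print(rows)
```

The second query would typically be a terms query on product_id using the bucket keys from the first response.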

Group by field in found document

The best way to explain what I want to accomplish is by example.
Let us say that I have an object with fields name and color and transaction_id. I want to search for documents where name and color match the specified value and that I can accomplish easily with boolean queries.
But I do not want only the documents found by the search query. I also want the whole transaction to which those documents belong, which is identified by transaction_id. For example, if a document has been found with transaction_id equal to 123, I want my query to return all documents with transaction_id equal to 123.
Of course, I can do that with two queries: the first one fetches all documents that match the criteria, and the second one returns all documents that have one of the transaction_id values found by the first query.
But is there any way to do it in a single query?
You can use a parent-child relationship between the transaction and your object. Or denormalize your data by nesting the objects inside the transactions. Otherwise you'll have to do an application-side join, meaning 2 queries.
Try an index mapping similar to the following, and include a parent_id in the objects.
{
"mappings": {
"transaction": {},
"object": {
"_parent": {
"type": "transaction"
}
}
}
}
Further reading:
https://www.elastic.co/guide/en/elasticsearch/guide/current/parent-child-mapping.html

Exclude results from Elasticsearch / Kibana based on aggregation value

Is it possible to exclude results based on the outcome of an aggregation?
In other words, I have aggregated on a Term and a whole bunch of results appear in a data table ordered in descending order by the count. Is it possible to configure kibana / elasticsearch to exclude results where count is 1 or less. (Where count is an aggregation).
I realise I can export the raw data from the data table visualization and delete those records manually through a text editor or excel. But I am trying to convince my organization that elasticsearch is a cool new thing and this is one of their 1st requirements...
You can exclude results from the search by applying a filter. Here is a sample that may be helpful:
{
"query": {
"bool": {
"filter": {
"range": {
"Your_term": {
"gte": 1
}
}
}
}
}
}
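If the goal is to drop buckets by their document count rather than to filter on a field value, the terms aggregation's `min_doc_count` parameter is one option. The sketch below only builds the request body. The helper name `terms_with_min_count` and the aggregation name `by_term` are hypothetical, and `Your_term` stands in for the real field.

```python
import json

def terms_with_min_count(field, min_count, size=10):
    """Build a search body whose terms aggregation omits small buckets."""
    return {
        "size": 0,  # return aggregation results only, no hits
        "aggs": {
            "by_term": {
                "terms": {
                    "field": field,
                    # Buckets with fewer documents than this are left out.
                    "min_doc_count": min_count,
                    "size": size,
                }
            }
        },
    }

# min_doc_count of 2 excludes buckets whose count is 1 or less.
body = terms_with_min_count("Your_term", 2)
print(json.dumps(body, indent=2))
```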

Elasticsearch sorting the data by using keywords

I am a newbie in Elasticsearch.
Recently I have been researching keyword search. I have already built a version with MySQL and PHP, but I have no idea how to do it in Elasticsearch using its default functions.
Here is the data format:
[{"id":"1","keyword":["A","B"]},
{"id":"2","keyword":["A","C"]}
]
Basically those keywords work as hashtags for finding the data.
I have to find the records with the most keyword hits and sort them by how many keywords they hit.
In this example, if I input "A B" as the search, I will get the result:
[{"id":"1"},{"id":"2"}]
The id 1 record hits two keywords and becomes the first record in the ordering;
the id 2 record hits only one keyword and becomes the second record.
How can I do it in Elasticsearch?
Try this query
{
"query": {
"match": {
"keyword": "A B"
}
}
}
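The ordering the question asks for can be illustrated locally. This is a plain Python sketch of the intended ranking (more matching keywords first), not Elasticsearch's relevance scoring, which uses BM25 by default and may order documents differently in edge cases. The helper `rank_by_keyword_hits` is hypothetical.

```python
def rank_by_keyword_hits(docs, query):
    """Return ids of docs matching at least one query keyword, most hits first."""
    wanted = set(query.split())
    scored = [(len(wanted & set(doc["keyword"])), doc["id"]) for doc in docs]
    matched = [pair for pair in scored if pair[0] > 0]
    # Stable sort: documents with equal hit counts keep their input order.
    matched.sort(key=lambda pair: pair[0], reverse=True)
    return [doc_id for _, doc_id in matched]

docs = [
    {"id": "1", "keyword": ["A", "B"]},
    {"id": "2", "keyword": ["A", "C"]},
]
print(rank_by_keyword_hits(docs, "A B"))  # prints ['1', '2']
```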

Resources