Elasticsearch routing based on array field

I've tried to google my question, but the search results are flooded with articles covering very basic routing implementations, and I didn't manage to find anything useful.
Let's say I have an object "product":
{ price: 100, category: 1 }
Routing based on the "category" field works as expected.
Now I change my "product" to:
{ price: 100, categories: [1,2,3] }
How will routing behave based on the "categories" field? Is it safe to do this? Are there any side effects (besides possible product duplication in search results)?
Will the ES engine split the array and put the product into different shards? Or will it combine all values into one (for example, "1_2_3") and put the product into a single shard?
I would be really thankful for any thoughts on this topic.
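The question hinges on how a routing value becomes a shard. The Elasticsearch documentation gives the default formula as shard_num = hash(_routing) % num_primary_shards, with one _routing string per document. The sketch below illustrates only that formula: it uses zlib.crc32 as a stand-in hash (an assumption; Elasticsearch uses murmur3), and it does not show how any particular ES version treats an array-valued routing source.

```python
import zlib


# Assumption: zlib.crc32 is a stand-in for the hash Elasticsearch really
# uses (murmur3); only the shape of the formula is illustrated here.
def shard_for(routing: str, num_primary_shards: int) -> int:
    """shard_num = hash(_routing) % num_primary_shards"""
    return zlib.crc32(routing.encode("utf-8")) % num_primary_shards


NUM_SHARDS = 5  # hypothetical index with 5 primary shards

# Each candidate routing string maps to exactly one shard.
for value in ["1", "2", "3", "1_2_3"]:
    print(value, "->", shard_for(value, NUM_SHARDS))
```

Because the whole string is hashed, a document routed with "1_2_3" would be found by a search routed with "1" only if both strings happen to map to the same shard.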

Related

Painless script with Spring Data Elasticsearch

We are using Spring Data Elasticsearch to build a 'fan out on read' user content feed. Our first attempt currently shows content based on keyword matching and recency, using NativeSearchQueryBuilder.
We want to further improve the relevancy order of what is shown to the user based on additional factors (e.g. user engagement, what the user is currently working on, etc.).
Can this custom ordering be done using NativeSearchQueryBuilder, or do we get more control using a painless script? If it's a painless script, can we call it from Spring Data Elasticsearch?
Any examples or recommendations would be most welcome.
Elasticsearch orders its results by relevance score (_score), which measures how relevant each result is to your search query: think of each document in the result set as carrying a number that signifies how relevant it is to the given query.
If the data you want to base your ordering on is part of your indexed data (document fields, for example), you can use the Query DSL to boost the _score. A few options I can think of:
boost a query clause depending on its criteria: a user searches for a 3-room flat, but a 4-room flat at the same price would be a much better match, so we can add a clause such as { "range": { "rooms": { "gte": 4, "boost": 2 }}} (a boost above the default of 1).
field_value_factor: favor results by a field's value, e.g. more clicks or more likes from users.
random_score: add randomness to your results, e.g. a different result every time a user refreshes the page; it can also be mixed with the existing scoring.
decay functions (e.g. gauss): boost results that are close to a central point and penalize those far from it. Let's say we are searching for apartments and our budget is 1,700. { "gauss": { "price": { "origin": "1700", "scale": "300" } } } gives a sense of how close each flat is to our budget: a flat with a much higher price (say 2,300) is penalized much more by the gauss function, as it is far from the origin. The decay behavior of the gauss function thus separates the results according to their distance from the origin.
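To make "penalized much more" concrete, here is a sketch of the gauss decay formula as documented for function_score: score = exp(-max(0, |value - origin| - offset)^2 / (2σ^2)), where σ is chosen so that the score equals decay at distance offset + scale. It computes scores locally in Python purely for illustration; Elasticsearch evaluates this server-side.

```python
import math


def gauss_decay(value: float, origin: float, scale: float,
                offset: float = 0.0, decay: float = 0.5) -> float:
    """Gaussian decay score following the function_score documentation
    (defaults: offset=0, decay=0.5)."""
    # sigma^2 is picked so the score equals `decay` at distance offset + scale
    sigma_sq = -scale ** 2 / (2 * math.log(decay))
    distance = max(0.0, abs(value - origin) - offset)
    return math.exp(-distance ** 2 / (2 * sigma_sq))


# Budget example from the answer: origin 1700, scale 300.
for price in [1700, 2000, 2300]:
    print(price, round(gauss_decay(price, origin=1700, scale=300), 4))
```

A flat at 2,000 (one scale from the origin) scores 0.5, the default decay, while one at 2,300 (two scales away) scores 0.5^4 = 0.0625.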
I don't think spring-data-es has any abstraction for this; I would use FunctionScoreQueryBuilder together with NativeSearchQueryBuilder.
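Whichever client builds it (FunctionScoreQueryBuilder in Java, in this case), what reaches Elasticsearch is a function_score query. A minimal sketch of such a request body, written as a Python dict for brevity; the field names description, likes and price are hypothetical, and the score_mode/boost_mode values are illustrative rather than a recommendation.

```python
import json


def feed_query(keywords: str, budget: float) -> dict:
    """function_score request body combining two of the options above.
    Field names (description, likes, price) are hypothetical."""
    return {
        "query": {
            "function_score": {
                "query": {"match": {"description": keywords}},
                "functions": [
                    # decay: prefer prices close to the budget
                    {"gauss": {"price": {"origin": budget, "scale": 300}}},
                    # field_value_factor: prefer documents with more likes
                    {"field_value_factor": {"field": "likes",
                                            "modifier": "log1p",
                                            "missing": 0}},
                ],
                "score_mode": "sum",       # how the function scores combine
                "boost_mode": "multiply",  # how they combine with the query _score
            }
        }
    }


print(json.dumps(feed_query("flat", 1700), indent=2))
```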

What is the best way to index data on elasticsearch?

I have 4 tables:
country
state
city
address
These tables are related by ids where country is the top parent:
state.countryId
city.stateId
address.cityId
I want to integrate Elasticsearch into my application and would like to know the best way to index these tables.
Should I create one index per table, so that I have one index each for country, state, city and address?
Or should I denormalize the tables, create only one index and store all the data redundantly?
ES is not afraid of redundancy in your data, so I would clearly denormalize so that each document represents one address like this:
{
"country_id": 1,
"country_name": "United States of America",
"state_id": 1,
"state_name": "California",
"state_code": "CA",
"city_id": 1,
"city_name": "San Mateo",
"zip_code": 94402,
"address": "400 N El Camino Real"
}
You can then aggregate your data on whichever city, state or country field you wish.
Your mileage may vary as it ultimately depends on how you want to query/aggregate your data, but it's much easier to query address data like this in a single index instead of hitting several indices.
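The denormalization itself is a join done once, at indexing time. A minimal sketch, assuming the rows are already loaded into Python dicts keyed by id (a hypothetical layout; in practice they would come from your database), that produces documents shaped like the example above:

```python
def denormalize(countries, states, cities, addresses):
    """Join the four tables into one flat document per address.
    The dict-keyed-by-id table layout is an assumption for illustration."""
    docs = []
    for addr in addresses:
        city = cities[addr["cityId"]]
        state = states[city["stateId"]]
        country = countries[state["countryId"]]
        docs.append({
            "country_id": state["countryId"],
            "country_name": country["name"],
            "state_id": city["stateId"],
            "state_name": state["name"],
            "state_code": state["code"],
            "city_id": addr["cityId"],
            "city_name": city["name"],
            "zip_code": addr["zip_code"],
            "address": addr["address"],
        })
    return docs


countries = {1: {"name": "United States of America"}}
states = {1: {"name": "California", "code": "CA", "countryId": 1}}
cities = {1: {"name": "San Mateo", "stateId": 1}}
addresses = [{"cityId": 1, "zip_code": 94402, "address": "400 N El Camino Real"}]

docs = denormalize(countries, states, cities, addresses)
print(docs[0])
```

Each resulting dict is one document to send to the index (for example through the bulk API); the redundancy is simply the country, state and city fields repeated across addresses.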
I like Val's answer; it is the most straightforward option. But if you really want to reduce duplication (for example, to minimize size on disk), you could use parent-child mapping. It will make indexing and querying a bit more verbose, though. I still suggest going with a "flat" mapping.
You asked, "what if you need the individual country, state or city records?" I'd recommend adding an additional field (not_analyzed or integer) that indicates which level of the hierarchy the document represents. It is fine for such a document to lack the fields that correspond to lower levels of the hierarchy. This way you could easily filter to search only states or only countries.
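A sketch of that suggestion: higher-level records are stored as their own documents with a discriminator field, and a term filter on that field restricts a search to one level. The field name level and its values are assumptions; on current Elasticsearch versions a keyword field plays the role that not_analyzed strings did in older ones.

```python
from typing import Optional


def state_level_doc(state_id, state_name, state_code, country_id, country_name):
    """A state-level document: it simply omits the city/address fields.
    The 'level' field and its values are hypothetical."""
    return {
        "level": "state",
        "country_id": country_id,
        "country_name": country_name,
        "state_id": state_id,
        "state_name": state_name,
        "state_code": state_code,
    }


def search_level(level: str, text: Optional[str] = None) -> dict:
    """Request body restricting results to one hierarchy level,
    optionally matching text against that level's name field."""
    bool_query = {"filter": [{"term": {"level": level}}]}
    if text:
        bool_query["must"] = [{"match": {f"{level}_name": text}}]
    return {"query": {"bool": bool_query}}


doc = state_level_doc(1, "California", "CA", 1, "United States of America")
body = search_level("state", "California")
```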
Here is a very useful article by Adrien Grand that elaborates on the trade-offs between creating many indices versus fewer indices with many types.
Hope it helps!

How to structure Elasticsearch indices/types?

How would you structure indices/types for an eshop application? Such an eshop would consist of domain objects like product, category, tag, manufacturer, etc. The full-text search results page should display an intermixed list of all domain objects.
I can think of two options:
One index per whole application, every domain object as a type.
Every domain object has its own index, the type is the same - "item".
Which option will scale better?
Most of the "items" in the database are products. Some products aren't available yet, or aren't available anymore. How can I boost currently available products?
The full-text search should prefer to show categories/manufacturers at the top of the page. How can I boost certain types, or objects from a certain index?
For better performance, I suggest the first option:
1) "One index per whole application, every domain object as a type."
2) Say you create an index named "eshop" with types such as mobile, book, etc.
3) This lets you query according to the user's input. Consider a shopping website like Flipkart: users can search with a plain keyword.
4) You can then search in Elasticsearch by mentioning only the index name. If the user applies a filter, such as mobile with a price range of 1000-10000, you search only inside the mobile type, and such filtering is easy in Elasticsearch. This reduces memory and CPU usage during execution.
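The keyword-plus-filter search from point 4 can be sketched as a bool query body. Since mapping types were later deprecated and removed from Elasticsearch, this sketch assumes a hypothetical category field in place of the mobile type; name, price and category are placeholder field names.

```python
def eshop_search(keywords, category=None, min_price=None, max_price=None):
    """Full-text match on the user's keywords, optionally narrowed by
    category and price range. Field names are hypothetical."""
    filters = []
    if category is not None:
        filters.append({"term": {"category": category}})
    if min_price is not None or max_price is not None:
        price_range = {}
        if min_price is not None:
            price_range["gte"] = min_price
        if max_price is not None:
            price_range["lte"] = max_price
        filters.append({"range": {"price": price_range}})
    return {
        "query": {
            "bool": {
                "must": [{"match": {"name": keywords}}],
                # filter clauses don't contribute to _score and can be cached
                "filter": filters,
            }
        }
    }


body = eshop_search("samsung", category="mobile", min_price=1000, max_price=10000)
```

With no filters supplied, the same function yields a plain keyword search, which matches the "search with only the index name" case.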
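To boost available products, add a field called "available" to your documents and, when searching, give available products a boost. Example (the match clause on "name" stands in for the user's actual query):
{
"query": {
"bool": {
"must": { "match": { "name": "user keywords" } },
"should": {
"term": {
"available": { "value": true, "boost": 1.5 }
}
}
}
}
}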
For more on boosting, refer to:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-boosting-query.html
http://jontai.me/blog/2013/01/advanced-scoring-in-elasticsearch/
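For the question's last point (ranking categories and manufacturers above products), two possible sketches, each an assumption about your layout: with a single index, an optional should clause on a hypothetical kind field; with one index per domain object, the indices_boost search parameter. Index and field names here are placeholders.

```python
def prefer_kinds(user_query: dict, kinds: list, boost: float = 3.0) -> dict:
    """Wrap the user's query so documents whose (hypothetical) 'kind'
    field is one of `kinds` score higher; other documents still match."""
    return {
        "query": {
            "bool": {
                "must": [user_query],
                "should": [{"terms": {"kind": kinds, "boost": boost}}],
            }
        }
    }


def prefer_indices(user_query: dict, index_boosts: dict) -> dict:
    """One-index-per-object layout: indices_boost raises the _score of
    documents coming from the listed indices."""
    return {
        "indices_boost": [{name: b} for name, b in index_boosts.items()],
        "query": user_query,
    }


by_kind = prefer_kinds({"match": {"name": "nokia"}}, ["category", "manufacturer"])
by_index = prefer_indices({"match": {"name": "nokia"}},
                          {"categories": 2.0, "manufacturers": 2.0, "products": 1.0})
```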

different field according to categories

I'm trying to use Elasticsearch to search through products. If a product is a car, for instance, it will have fields like "color", "brand", "model", "km", ...
If it is clothing, it will only have "color", "size", ...
I would like to index all this info in Elasticsearch so that I can then search for cars with km between aaa and bbb, and/or model xxxx, and the same for clothes or any other products.
How can I create such field(s) in Elasticsearch? I want all products to be in the same index so users can search through all products, but if a user searches for a certain type of product, they should also be able to specify more details specific to that kind of product.
I was thinking about an array field, but does that mean all products will have all the fields for every type of product, even fields irrelevant to them (i.e. will clothes have a km field?)? Or is it possible, at indexing time, to include only the info relevant to each product?
Thanks
You could use types. Create a type called car with fields color, brand, model, km, etc., and then a type called cloth with fields color, size, etc.
A single index can have many types. The following two links might help you in this:
Creating indices
Creating types and mapping to the index
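You can easily search across types, so you could issue a search like this to return all documents from all types (as written, the URL targets all indices; put an index name before /_search to search only that index):
curl -XGET 'http://localhost:9200/_search?pretty=true' -H 'Content-Type: application/json' -d '{"query":{"match_all":{}}}'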
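Mapping types were deprecated and later removed in newer Elasticsearch releases, so there the same layout is usually approximated with one index and a discriminator field. A sketch, assuming a hypothetical product_type field: each document carries only the fields relevant to its kind (no km on clothes), and a type-specific search filters on product_type plus that kind's own fields.

```python
# Each document holds only the fields that make sense for its kind.
car = {"product_type": "car", "color": "red", "brand": "Fiat",
       "model": "Panda", "km": 42000}
shirt = {"product_type": "clothes", "color": "blue", "size": "M"}


def typed_search(product_type, ranges=None, terms=None) -> dict:
    """Filter on the hypothetical product_type field, then on that type's
    own fields. Term filters assume those fields are mapped as keyword."""
    filters = [{"term": {"product_type": product_type}}]
    for field, (low, high) in (ranges or {}).items():
        filters.append({"range": {field: {"gte": low, "lte": high}}})
    for field, value in (terms or {}).items():
        filters.append({"term": {field: value}})
    return {"query": {"bool": {"filter": filters}}}


body = typed_search("car", ranges={"km": (10000, 50000)}, terms={"model": "Panda"})
```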
Additional information - Searching across types
Having an array field is not a good idea since you would not be utilizing the ability of elasticsearch to index semi structured documents.
All the best.

many indexes for mongodb refined searches

Referring to this question here:
I am working on a similar site using MongoDB as my main database. As you can imagine, each user object has a lot of fields that need to be searchable, for example mood, city, age, sex, smoker, drinker, etc.
Now, apart from the problem that there cannot be more than 64 indexes per collection, is it wise to assign an index to every one of my fields?
There might be another viable way of doing it: tags (refer to this other question). If I set the index on an array of predetermined tags and then text-search over them, would that be better, since I would be using only ONE index? What do you think? E.g.:
{
name: "john",
tags: ["happy", "new-york", "smoke0", "drink1"]
}
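A sketch of how such tag documents and a multi-tag query might be built as Python dicts (e.g. for pymongo). The profile-to-tag mapping mirrors the example above but is hypothetical, as is the users collection named in the comment.

```python
def to_tags(user: dict) -> list:
    """Flatten a profile into tags like the example above
    (booleans become e.g. 'smoke0' / 'drink1'). The scheme is hypothetical."""
    tags = [user["mood"], user["city"].replace(" ", "-")]
    tags.append("smoke" + ("1" if user["smoker"] else "0"))
    tags.append("drink" + ("1" if user["drinker"] else "0"))
    return tags


john = {"name": "john", "mood": "happy", "city": "new york",
        "smoker": False, "drinker": True}
doc = {"name": john["name"], "tags": to_tags(john)}
# doc == {"name": "john", "tags": ["happy", "new-york", "smoke0", "drink1"]}

# With a multikey index on tags (e.g. db.users.createIndex({tags: 1}) in the
# mongo shell), a query requiring several tags at once uses $all:
query = {"tags": {"$all": ["happy", "drink1"]}}
```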
MongoDB doesn't (yet) support index intersection, so the rule is: one index per query. Some of your query parameters have extremely low selectivity, the extreme example being the boolean ones, and indexing those will usually slow things down rather than speed them up.
As a simple approximation, you could create a compound index that starts with the highest-selectivity fields, for instance {"city", "age", "mood", ... }. However, then you will always have to use a city constraint. If you query for {age, mood}, the above index wouldn't be used.
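A toy model of why {age, mood} can't use a {city, age, mood} index: the index narrows the search only along its leading fields that the query constrains. This is a simplification (it ignores range predicates and sort order, and MongoDB can still check later index fields during the scan); it only illustrates the leading-field rule described above.

```python
def usable_prefix(index_fields, query_fields):
    """Leading index fields the query constrains, stopping at the first
    field the query leaves open (simplified compound-index model)."""
    prefix = []
    for field in index_fields:
        if field not in query_fields:
            break
        prefix.append(field)
    return prefix


index = ["city", "age", "mood"]
print(usable_prefix(index, {"city", "age"}))   # ['city', 'age']
print(usable_prefix(index, {"age", "mood"}))   # [] -> index not used
print(usable_prefix(index, {"city", "mood"}))  # ['city']
```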
If you can narrow down your result set to a reasonable size using indexes, a scan within that set won't be a performance hog. More precisely, if you say limit(100) and MongoDB has to scan 200 items to fill up those 100, it won't be critical.
The danger lies in very narrow searches across the database: if you have to scan the entire dataset to find the only unhappy, drinking non-smoker older than 95, things get ugly.
If you want to allow very fine-grained searches, a dedicated search database such as Solr might be a better option.
EDIT: The tags suggestion looks a bit like using a crowbar to me; maybe the key/value multikey index recommended in the MongoDB FAQ is a cleaner solution:
{ _id : ObjectId(...),
attrib : [
{ k: "mood", v: "happy" },
{ k: "city", v: "new york" },
{ k: "smoker", v: false },
{ k: "drinker", v: true }
]
}
However, YMMV and 'clean' and 'fast' often don't point in the same direction, so the tags approach might not be bad at all.
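To query the key/value layout, each criterion has to match a single array element on both k and v, which is what $elemMatch expresses; $all then requires several such elements. A sketch that builds the document and query filter as Python dicts (e.g. for pymongo's find); the users collection named in the comment is hypothetical.

```python
def to_attrib(profile: dict) -> list:
    """Key/value list in the shape of the FAQ-style document above."""
    return [{"k": k, "v": v} for k, v in profile.items()]


def attrib_query(criteria: dict) -> dict:
    """Each criterion must match ONE array element on both k and v
    ($elemMatch); $all requires every such element to be present."""
    return {"attrib": {"$all": [
        {"$elemMatch": {"k": k, "v": v}} for k, v in criteria.items()
    ]}}


doc = {"attrib": to_attrib({"mood": "happy", "city": "new york",
                            "smoker": False, "drinker": True})}
query = attrib_query({"city": "new york", "smoker": False})
# A single compound multikey index serves every attribute, e.g. in the shell:
# db.users.createIndex({"attrib.k": 1, "attrib.v": 1})
```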
