How to structure Elasticsearch indices/types?

How would you structure indices/types for an eshop application? Such an eshop would consist of domain objects like product, category, tag, manufacturer etc. The fulltext search results page should display an intermixed list of all domain objects.
I can think of two options:
One index per whole application, every domain object as a type.
Every domain object has its own index, the type is the same - "item".
Which option will scale better?
Most of the "items" in the database are products. Some products aren't available yet or anymore. How do I boost currently available products?
The fulltext search should prefer to show categories/manufacturers at the top of the page. How do I boost certain types, or objects from a certain index?

For better performance, I suggest the first option.
1) "One index per whole application, every domain object as a type."
2) For example, create an index named "eshop" with types such as mobile, book, etc.
3) This lets you query according to the user's input. Consider a shopping website like Flipkart, where a user can search with a plain keyword.
4) For a plain keyword search, you can query Elasticsearch by index name only. If the user applies some filter, such as mobile with a range of 1000-10000, you only need to search inside the mobile type. Filtering is easy in Elasticsearch, and restricting the search this way reduces memory and CPU usage.
To boost available products, add a field called "available" to your documents and give it a boost value when searching. Example:
{
"query": {
"term": {
"available": {
"value": true,
"boost": 1.5
}
}
}
}
For more on boosting, see:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-boosting-query.html
http://jontai.me/blog/2013/01/advanced-scoring-in-elasticsearch/
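One caveat worth noting: a term query used on its own also filters out documents where "available" is false. A minimal sketch of one way to keep unavailable products in the results while ranking available ones higher is to put the boosted term in the should clause of a bool query. The helper name and the "name" field are assumptions for illustration, not part of the original answer.

```python
import json


def build_boosted_search(keyword, boost=1.5):
    """Build a search body that matches on keyword and boosts available products.

    The "name" field is a hypothetical text field; "available" is the boolean
    field suggested in the answer above.
    """
    return {
        "query": {
            "bool": {
                # must: every hit has to match the user's keyword
                "must": [{"match": {"name": keyword}}],
                # should: optional clause; matching it only raises the score
                "should": [
                    {"term": {"available": {"value": True, "boost": boost}}}
                ],
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(build_boosted_search("phone"), indent=2))
```

Because the should clause is optional when a must clause is present, unavailable products still match; they simply score lower than available ones.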

Related

Elastic Search and Search Ranking Models

I am new to Elastic Search. I would like to know if the following steps are how typically people use ES to build a search engine.
Use Elastic Search to get a list of qualified documents/results based on a user's input.
Build and use a search ranking model to sort this list.
Use this sorted list as the output of the search engine to the user.
I would probably add a few steps
Think about your information model.
What kinds of documents are you indexing?
What are the important fields and what field types are they?
What fields should be shown in the search result?
All this becomes part of your mapping
Index documents
Are the underlying data changing or can you index it just once?
How are you detecting new documents, deletes, and updates?
This logic lives in your connectors, which can be set up in multiple ways, for example using the Documents API
A bit of trial and error to sort out your ranking model
Depending on your use case, the default ranking may be enough.
Have a look at the Search API to try out different ranking options.
Use the search result list to present the results to the end user

Painless script with Spring Data Elasticsearch

We are using Spring Data Elasticsearch to build a 'fan out on read' user content feed. Our first attempt is currently showing content based on keyword matching and latest content using NativeSearchQueryBuilder.
We want to further improve the relevancy order of what is shown to the user based on additional factors (e.g. user engagement, what currently the user is working on etc).
Can this custom ordering be done using NativeSearchQueryBuilder or do we get more control using a painless script? If it's a painless script, can we call this from Spring Data ElasticSearch?
Any examples, recommendations would be most welcome.
Elasticsearch orders its results by relevance score, which marks how relevant a result is to your search query. Think of each document in the result set as carrying a number that signifies how relevant the document is to the given query.
If the data you want to base your ordering on is part of your indexed data (document fields, for example), you can use the Query DSL to boost the _score. A few options I can think of:
Boost a query clause depending on its criteria: a user searches for a 3-room flat, but a 4-room flat at the same price would be a much better match, so we can add: { "range": { "rooms": { "gte": 4, "boost": 1 }}}
field_value_factor: favor results by the value of a field, such as more 'clicks' by users, more 'likes', etc.
random_score: use it if you want randomness in your results, such as a different result every time a user refreshes your page, or mix it with the existing scoring.
Decay functions (Gauss!): boost or penalize results depending on how close they are to a central point. Let's say we are searching for apartments and our budget is 1700. { "gauss": { "price": { "origin": "1700", "scale": "300" } } } scores each flat by how close its price is to our budget of 1,700. Any flat with a much higher price (let's say 2,300) is penalized much more by the gauss function because it is far from our origin. The decay and the shape of the gauss function spread out our results according to their distance from the origin.
I don't think Spring Data Elasticsearch has an abstraction for this, so I would use FunctionScoreQueryBuilder together with the NativeSearchQueryBuilder.
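To make the decay example concrete, here is a minimal Python sketch of the Gaussian decay formula as described in the Elasticsearch function score documentation, assuming the default decay of 0.5 and an offset of 0. The function name gauss_decay is made up for illustration; Elasticsearch computes this internally.

```python
import math


def gauss_decay(value, origin, scale, decay=0.5, offset=0.0):
    """Score multiplier in (0, 1]: 1.0 at the origin, `decay` at distance `scale`."""
    # sigma^2 is chosen so that the score equals `decay` exactly `scale` away
    sigma_sq = -(scale ** 2) / (2.0 * math.log(decay))
    # values within `offset` of the origin are not penalized at all
    distance = max(0.0, abs(value - origin) - offset)
    return math.exp(-(distance ** 2) / (2.0 * sigma_sq))


if __name__ == "__main__":
    for price in (1700, 2000, 2300):
        print(price, round(gauss_decay(price, origin=1700, scale=300), 4))
```

Under these assumptions a flat priced at the origin keeps its full score, one at 2000 (one scale away) is multiplied by about 0.5, and one at 2300 (two scales away) by about 0.0625, which is why far-off prices drop so quickly.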

Elastic Search document modeling for history

I want to store products in elastic search
Each product has some fields (description, quantity, price, name). But every day the price and quantity could change.
How can I store this in elastic search so that I will be able to search for any product for all the past prices?
Should I have a document for the current value fields and another document which will have the product document as parent, and there will be some daily task to add the date and changed value in an array ?
Unfortunately, there's no built in way to deal with versioning in ElasticSearch. The built-in versioning isn't designed for the retrieval of previous versions. You will need to control versioning at the application layer.
What we've ultimately elected to do is store all the old copies of the documents like this:
{
"unversioned_prop1": "prop1",
"unversioned_prop2": "prop2",
...
"versions": [
{
"version": "version_x",
"version_metadata": { ... },
"document": {
"versioned_prop3": "prop3",
"versioned_prop4": "prop4"
...
}
},
{ "version": "version_y", "document": { ... versioned props ... } },
...
],
"current": { ... current versioned props ... }
}
Unversioned Properties
Having the unversioned properties outside of the array is useful because you may want to update some properties for ALL versions of the document. Additionally, it ensures that search weights behave predictably.
The downside is that we have to stitch some of the information together in the application layer.
Current Version
Breaking out the current version into a separate property allows you to use search filtering to only return the most recent version of the document.
Version metadata
This includes any versioning information that you might want to search on, such as dates.
Search
You can search the versioned properties just as you would any other subproperty, so a search ends up looking like this:
...
{
"match": {"versions.document.versioned_prop": "query string"}
}
This will search across ALL versions of the document, and return the combined document if there's a match.
Updates
When you need to create a new version, you can use a partial update to insert the new document and update the current document.
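As a rough sketch of such a partial update, the request body for the Update API could be built like this. It assumes the document layout shown above (a "versions" array plus "current"); the helper name and the Painless script are illustrative assumptions, not the author's actual code.

```python
import json

# Painless script (an assumption for illustration): append the new version
# to the history array and overwrite the "current" snapshot.
APPEND_VERSION_SCRIPT = (
    "ctx._source.versions.add(params.entry); "
    "ctx._source.current = params.entry.document"
)


def build_version_update(version, document, metadata=None):
    """Return a scripted-update request body for a single document."""
    entry = {"version": version, "document": document}
    if metadata is not None:
        entry["version_metadata"] = metadata
    return {
        "script": {
            "lang": "painless",
            "source": APPEND_VERSION_SCRIPT,
            # params keep the script text constant across calls
            "params": {"entry": entry},
        }
    }


if __name__ == "__main__":
    body = build_version_update(
        "version_z", {"versioned_prop3": "new value"}, {"date": "2015-01-10"}
    )
    print(json.dumps(body, indent=2))
```

The body would be sent to the document's update endpoint, whose exact URL form depends on the Elasticsearch version in use.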
Alternative
The major downside with this approach is that you can't easily filter down some of the search results based on things inside of versions - you will likely want to filter them on the application side.
If you need your documents to behave independently, you will likely need to index them independently. To achieve that you can include a "collection id" on all the versions. The collection ID is unique to the document, and is shared across all versions.
The collection ID approach ended up having too many issues, and we moved to the approach outlined above, and have had a much higher level of success.
As a side note, I personally wouldn't recommend that you use ElasticSearch as the primary storage of important records. Only do it if you can live with the occasional data loss.
First, you should not update the existing document with the new quantity/price.
I suggest that whenever the quantity or price changes, you insert a new document. Some fields will be duplicated, but each document then holds all the information about that product on a given date.
You can also retrieve all documents for that product, each with its own values (prices). Data is duplicated in this model, but I don't see that as an issue.

Cost of adding field mapping in elasticsearch type

I have a use-case, where I have got a set of predefined fields and also need to support adding dynamic fields to ElasticSearch with some basic searching on them. I am able to achieve this using dynamic template mapping. However, the frequency of adding such dynamic fields is quite high.
Consider this ES document for the Event type:
{
"name":"Youth Conference",
"venue":"Ahmedabad",
"date":"10/01/2015",
"organizer":"Invincible",
"extensions":{
"about": {
"vision":"Visualizes the image of an ideal Country. ",
"mission":"Encapsulates the gravity of the top reformative solutions for betterment of Country."
}
// Any thing can go here..
}
}
In the example above, each event document may have any unknown/new fields. Hence, for every such new dynamic field introduced, ES will update the mapping of the type. My concern is what is the cost of adding new field mapping in the existing type?
I am planning to separate out all dynamic mappings (inside extensions) from the Event type by introducing another type, say EventExtensions, and using a parent/child relationship to map it to the Event type. I believe this may limit the cost (if any) of frequently adding dynamic fields to the type. However, to my knowledge, using a parent/child relationship will need more memory.
The first thing to remember here is that fields are defined per index, not per type.
So wherever you add new fields, they are added to the same index, whether in another type or in a parent or child.
So moving the new fields to another type in the same index is not going to change anything.
Second, adding a field is not a very expensive operation. I know people who use thousands of fields and are fine with it. That being said, you should keep a check on the number of fields so that it doesn't grow out of control.
There are multiple approaches to solve the problem:
1) Let's assume that the new field data need not be searchable. In this case, you can serialize the entire JSON as a string and store it in a single field. Also make sure this field is not indexed. This way you can search on the other fields and, on retrieval of the document, get back the information that was serialized.
2) Let's say the new fields look like this:
{
"newInfo1" : "log Of Info",
"newInfo2" : "A lot more info"
}
Instead of this, you can use:
{
"newInfo" : [
{
"fieldName" : "newInfo1",
"fieldValue" : "log Of Info"
},
{
"fieldName" : "newInfo2",
"fieldValue" : "A lot more info"
}
]
}
This way, the number of fields won't increase. But to run a field-specific search, such as "give me all documents with fieldName newInfo2 and the word more in its value", you will need to make the newInfo field nested.
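A minimal Python sketch of this second approach follows. It flattens arbitrary dynamic fields into fieldName/fieldValue pairs and builds the matching nested query. The function names are hypothetical, and the query assumes newInfo is mapped as a nested field with fieldName mapped for exact matching.

```python
def to_name_value_pairs(dynamic_fields):
    """Turn {"a": "x", "b": "y"} into a list of fieldName/fieldValue objects."""
    return [
        {"fieldName": name, "fieldValue": value}
        for name, value in dynamic_fields.items()
    ]


def nested_field_query(field_name, text, path="newInfo"):
    """Match documents where a single pair has this name and contains the text."""
    return {
        "query": {
            "nested": {
                "path": path,
                "query": {
                    "bool": {
                        # both clauses must hold for the same nested object
                        "must": [
                            {"term": {f"{path}.fieldName": field_name}},
                            {"match": {f"{path}.fieldValue": text}},
                        ]
                    }
                },
            }
        }
    }


if __name__ == "__main__":
    extra = {"newInfo1": "log Of Info", "newInfo2": "A lot more info"}
    print({"newInfo": to_name_value_pairs(extra)})
    print(nested_field_query("newInfo2", "more"))
```

The mapping then only ever contains the two sub-fields, no matter how many distinct dynamic field names show up in the data.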
Hope this helps.

different field according to categories

I'm trying to use Elasticsearch to search through products. If the product is a car, for instance, it will have fields like "color", "brand", "model", "km", ...
If it is clothes, it will only have "color", "size", ...
I would like to index all this info in Elasticsearch so that I can search for cars with km between aaa km and bbb km, and/or model xxxx, and likewise for clothes or any other products.
How can I create such field(s) in Elasticsearch? I want all products to be in the same index so that users can search through all products, but if a user searches for one type of product, they should be able to specify more details for that kind of product.
I was thinking about an array field, but does that mean that all products will have all fields for all types of products, even if some fields are not relevant to some products (i.e. clothes will have a km field)? Or is it possible, when indexing, to store just the info needed for each product?
thanks
You could use types. Create a type called car with fields color, brand, model, km, etc., and then a type called cloth with fields color, size, etc.
A single index can have many types. The following two links might help you in this:
Creating indices
Creating types and mapping to the index
You can easily search across types. For example, a search like this returns all documents from all types (with no index name in the URL, it covers every index):
curl -XGET 'http://localhost:9200/_search?pretty=true' -H 'Content-Type: application/json' -d '{"query":{"match_all":{}}}'
Additional information - Searching across types
Having an array field is not a good idea since you would not be utilizing the ability of elasticsearch to index semi structured documents.
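For the range search from the question (cars with km between two bounds, optionally a given model), a minimal sketch of the request body could look like the following. It assumes "km" is mapped as a numeric field, "model" is mapped for exact matching, and the Elasticsearch version supports the bool filter clause; the helper name is made up for illustration.

```python
def car_search(km_min, km_max, model=None):
    """Build a filter-only search body for cars within a km range."""
    # range filter: km_min <= km <= km_max
    filters = [{"range": {"km": {"gte": km_min, "lte": km_max}}}]
    if model is not None:
        # exact match on the model when the user specifies one
        filters.append({"term": {"model": model}})
    return {"query": {"bool": {"filter": filters}}}


if __name__ == "__main__":
    print(car_search(10000, 50000, model="xxxx"))
```

Documents that have no "km" field, such as clothes, simply never match the range filter, so each product only needs to carry the fields that are relevant to it.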
All the best.

Resources