different field according to categories - elasticsearch

I'm trying to use Elasticsearch to search through products. If a product is a car, for instance, it will have fields like "color", "brand", "model", "km", ...
If it is clothes, it will only have "color", "size", ...
I would like to index all this information in Elasticsearch so that I can search for cars with km between aaa and bbb, and/or a given model, and likewise for clothes or any other product.
How can I create such field(s) in Elasticsearch? I want all products to be in the same index so a user can search across all of them, but if the user searches for one type of product, they should be able to specify further details specific to that kind of product.
I was thinking about an array field, but does that mean every product will have the fields of every product type, even those not relevant to it (i.e. clothes would have a km field)? Or is it possible at indexing time to store only the information relevant to each product?
Thanks

You could use types. Create a type called car with fields color, brand, model, km, etc., and then a type called cloth with fields color, size, etc.
A single index can have many types. The following two links might help you in this:
Creating indices
Creating types and mapping to the index
You can easily search across types, so a search like this returns all documents from all types:
curl -XGET 'http://localhost:9200/_search?pretty=true' -d '{"query":{"match_all":{}}}'
Additional information - Searching across types
Having an array field is not a good idea, since you would not be taking advantage of Elasticsearch's ability to index semi-structured documents.
All the best.
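As a rough illustration of the point that each product only needs to carry the fields relevant to it, here is a minimal Python sketch that builds search request bodies as plain dictionaries. The `category` field, the placeholder values, and the helper `build_product_query` are assumptions made for this example and do not come from the question. The `bool`/`filter` form assumes a reasonably recent Elasticsearch version; older releases expressed filters differently.

```python
import json

# Two example documents for the same index: each one only carries the
# fields that make sense for its kind of product (no "km" on clothes).
# All values are placeholders.
car = {"category": "car", "color": "red", "brand": "brand-x", "model": "model-y", "km": 42000}
shirt = {"category": "clothes", "color": "blue", "size": "M"}


def build_product_query(category=None, km_min=None, km_max=None, model=None):
    """Build a search body; every argument is optional (hypothetical helper)."""
    filters = []
    if category is not None:
        filters.append({"term": {"category": category}})
    if model is not None:
        filters.append({"term": {"model": model}})
    if km_min is not None or km_max is not None:
        km_range = {}
        if km_min is not None:
            km_range["gte"] = km_min
        if km_max is not None:
            km_range["lte"] = km_max
        filters.append({"range": {"km": km_range}})
    if not filters:
        # No constraint at all: search across every product.
        return {"query": {"match_all": {}}}
    return {"query": {"bool": {"filter": filters}}}


if __name__ == "__main__":
    print(json.dumps(build_product_query("car", km_min=10000, km_max=50000), indent=2))
```

The `term` filters only behave as exact matches if `category` and `model` are indexed without analysis, which is another assumption of this sketch.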

Related

Elasticsearch - what is better: query several fields or single combined field?

Disclaimer: Possibly a duplicate of this SO question, not sure...
Let's assume I have something similar to IMDB (e.g. catalog of movies) and I want to store it in Elasticsearch.
A single movie record contains a Title, a Description, and Categories (strings, e.g. "Children", "Action", etc.).
Let's assume that users are allowed to search with free text, which can be anything: words from the title, from the description, or from the categories (e.g. "movie for children").
I am wondering which is more efficient from a search performance perspective: querying each of the fields, or creating one big field that is a concatenation of all the fields and querying only that.

What does the "Type" mean in Elasticsearch?

I am totally confused by Elasticsearch's documentation.
In Basic Concepts: Type, types are described as somewhat like collections in MongoDB:
In this index, you may define a type for user data, another type for blog data, and yet another type for comments data.
But in Types and Mappings: Type Takeaways, it says:
Types are not as well suited for entirely different types of data. If your two types have mutually exclusive sets of fields, that means half your index is going to contain "empty" values (the fields will be sparse), which will eventually cause performance problems.
Don't the "user" and "blog" types mentioned above have mutually exclusive sets of fields?
For example: there are "name", "age" fields for "user", and "createdAt", "content" for "blog".
I used to believe the mapping between Elasticsearch and MongoDB was:
index <=> database
type <=> collection
Isn't that right?
If not, what is the recommended mapping style between them?
Types are not as well suited for entirely different types of data. If your two types have mutually exclusive sets of fields, that means half your index is going to contain "empty" values (the fields will be sparse), which will eventually cause performance problems.
At the most basic level, the type is just another field in Elasticsearch. When you do GET /my_index/my_type/_search, ES runs a pre-filter on the _type field for the value my_type; it works like an automatic filter.
Don't think of indices and types as databases and tables in the SQL world, because that is not what they are.
If you have type1 with fields f1 and f2 and type2 with fields f1 and f3, the index as a whole will hold documents with fields f1, f2 and f3. This matters for scoring: when a query searches for values in f1, the term frequencies for f1 are global (covering both type1 and type2), so if you search for some value in f1 of type1, the score you get back is also slightly influenced by the values of f1 in type2.
Also, please don't translate a set of SQL tables to ES by simply following the primary key/foreign key approach to define parent/child relationships in ES.
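The effect of shared field statistics can be mimicked with a toy model in plain Python. This is only a sketch of the idea that document frequencies for f1 are counted over the whole index; it is not how Elasticsearch computes scores, and the documents and the helper `doc_frequency` are made up for illustration.

```python
from collections import Counter

# Documents from two "types" living in the same index; both types use field f1.
docs = [
    {"_type": "type1", "f1": "red", "f2": "a"},
    {"_type": "type1", "f1": "blue", "f2": "b"},
    {"_type": "type2", "f1": "red", "f3": "x"},
    {"_type": "type2", "f1": "red", "f3": "y"},
]


def doc_frequency(documents, field):
    """Count in how many documents each value of `field` occurs."""
    return Counter(d[field] for d in documents if field in d)


# Statistics as the index sees them: one shared f1 field for every type.
index_wide = doc_frequency(docs, "f1")

# Statistics one might expect when thinking of type1 as a separate table.
type1_only = doc_frequency([d for d in docs if d["_type"] == "type1"], "f1")

if __name__ == "__main__":
    print("index-wide:", dict(index_wide))
    print("type1 only:", dict(type1_only))
```

In this toy data "red" looks rare inside type1 but common across the index, which is the kind of difference that can shift relevance scores.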
You're right, index == database and type == collection for Elasticsearch. In RDBMS terms, an index is a database and a type can be a table which contains many rows (documents in Elasticsearch).
You could have a different index maintaining user information, with the "name", "age" and other such fields generally attributed to a person, and a different one for blogs with "createdAt", "content", etc. Yet, you might want to have a "user" field inside each blog document to be able to identify the person who posted it. Later, you can apply application-side joins, if need be.

How to tabulate data without doing any aggregation in Kibana?

How can I tabulate data from events in Kibana without doing any aggregations?
I want to prepare a table containing 3 columns:
Hotel Name
No. of Rooms
Zipcode of Hotel
I want to extract this info from events and populate the table with above three values. How can I do this in Kibana?
You may be able to accomplish this by saving a search in the discover application and adding it to a dashboard directly (skipping the visualize step).
At the top of the "Add" panel in dashboard there is a "Searches" tab:
This tab lists all of the searches that you've saved from Discover and allows you to visualize the raw field values of documents as a table.
Hope that helps!
You can't make a table without aggregating, but (depending on your data) you may be able to get what you want by aggregating first on hotel name (Terms, Field=name, Order=Top, Size=100) then by zip code (Terms, Field=zip). The aggregation is so narrow that there is never more than one hotel in any given bucket.
Then use a Sum metric on the number of rooms.
This assumes there are no two same-named hotels in the same zip code. If there are, you'll need to add a third column with some unique identifier.
I tried this using the following mapping
{"name": {"type":"string","index":"not_analyzed"},
"number-of-rooms":{"type":"integer"},
"zip": {"type":"string","index":"not_analyzed"}}
It worked fine, with the drawback that the table column header labels are "Top 100 name", "Top 100 zip" and "Sum of number-of-rooms", which isn't very user friendly.
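For reference, the nested bucketing described above roughly corresponds to an aggregation request like the one built below. This is a sketch under assumptions: the helper `hotel_table_aggs` and the aggregation names `by_name`, `by_zip` and `rooms` are hypothetical, and the request Kibana actually sends may differ in detail. The field names come from the mapping above.

```python
import json


def hotel_table_aggs(name_field="name", zip_field="zip",
                     rooms_field="number-of-rooms", size=100):
    """Build a terms -> terms -> sum aggregation body (hypothetical helper)."""
    return {
        "size": 0,  # only the buckets are needed, not the matching documents
        "aggs": {
            "by_name": {
                "terms": {"field": name_field, "size": size},
                "aggs": {
                    "by_zip": {
                        "terms": {"field": zip_field},
                        "aggs": {
                            # One hotel per bucket, so the sum is that hotel's room count.
                            "rooms": {"sum": {"field": rooms_field}}
                        },
                    }
                },
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(hotel_table_aggs(), indent=2))
```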

Elasticsearch indexed database table column structure

I have a question regarding the setup of my Elasticsearch index. I have created a table which I index into Elasticsearch through a river. The table is built by a script that queries multiple tables to denormalize the data, making it easier to index by a unique id at a 1:1 ratio.
An example of a set of fields I have is street, city, state, zip, which I can query on. My question is: should I keep those fields individually indexed, or concatenate them into one big field, such as address, that contains all of them? Or should I put in the extra time to set up parent-child indexes?
The use case is that I have a customer with billing info coming in from one direction, and I want to query Elasticsearch to see if that customer already exists, or at least return the closest result.
I know this question is more conceptual than programming; I just can't find any information on best practices.
Concatenation
For the first part of your question: I wouldn't concatenate the different fields into one field containing all the information. Having multiple fields gives you the advantage of calculating facets and aggregates on those fields, e.g. how many customers are from a specific city or have a specific zip. You can still use a match or multi_match query to search for information across different fields.
In addition to keeping the information in separate fields, I would use multi-fields with an analyzed and a not_analyzed part (fieldname.raw). This again allows for aggregates, facets and sorting.
http://www.elasticsearch.org/guide/en/elasticsearch/reference/0.90/mapping-multi-field-type.html
Think of 'New York': if you analyze it, it will be stored as separate terms (['new', 'york'] with the standard analyzer) and a facet or aggregate will not show you all people from 'New York'. What you'd see are all people from 'new' and all people from 'york'.
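The bucket counts for an analyzed field versus a not_analyzed sub-field can be imitated in a few lines of Python. This is a toy stand-in, assuming an analyzer that simply lowercases and splits on whitespace; the city list and the helper `analyzed_terms` are invented for the example.

```python
from collections import Counter

# City values of four hypothetical customer documents.
cities = ["New York", "New York", "New Orleans", "York"]


def analyzed_terms(value):
    """Very rough stand-in for an analyzer: lowercase and split on whitespace."""
    return value.lower().split()


# Aggregating on the analyzed field: one bucket per token, counted per document.
analyzed_buckets = Counter(t for c in cities for t in set(analyzed_terms(c)))

# Aggregating on the not_analyzed (raw) sub-field: one bucket per whole value.
raw_buckets = Counter(cities)

if __name__ == "__main__":
    print("analyzed:", dict(analyzed_buckets))
    print("raw:     ", dict(raw_buckets))
```

Only the raw buckets answer "how many customers are from New York"; the analyzed buckets mix New York with New Orleans and York.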
_all field
There is a special _all field in elasticsearch which does the concatenation in the background. You don't have to do it yourself. It is possible to enable/disable it.
Parent Child relationship
Concerning whether to use nested objects or a parent child relationship: I think that a parent child relationship is more appropriate for your case. Arrays of plain inner objects (fields not mapped with the nested type) are stored in a 'flattened' way, i.e. the information from the objects in an array is stored as part of one object. Consider the following example:
You have orders for two clients:
client: 'Samuel Thomson'
orderline: 'Strong Thinkpad'
orderline: 'Light Macbook'
client: 'Jay Rizzi'
orderline: 'Strong Macbook'
With flattened inner objects, if you search for clients who ordered 'Strong Macbook' you'd get both clients. This is because 'Samuel Thomson' and his orders are stored all together, i.e. ['Strong' 'Thinkpad' 'Light' 'Macbook'], with no distinction between the two orderlines.
With parent child documents, the orderlines of the same client are not mixed and preserve their identity.
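The difference between merging all order lines into one bag of terms and keeping each order line separate can be mimicked in plain Python. This is only a simulation of the matching behaviour using the example data above; the helper names are hypothetical and no Elasticsearch API is involved.

```python
clients = [
    {"client": "Samuel Thomson", "orderlines": ["Strong Thinkpad", "Light Macbook"]},
    {"client": "Jay Rizzi", "orderlines": ["Strong Macbook"]},
]


def tokens(text):
    """Lowercase and split on whitespace, as a crude analyzer."""
    return set(text.lower().split())


def match_flattened(doc, words):
    """All order lines are merged into one bag of terms per client."""
    bag = set()
    for line in doc["orderlines"]:
        bag |= tokens(line)
    return words <= bag


def match_per_orderline(doc, words):
    """Each order line keeps its identity; a single line must contain every word."""
    return any(words <= tokens(line) for line in doc["orderlines"])


wanted = {"strong", "macbook"}
flattened_hits = [c["client"] for c in clients if match_flattened(c, wanted)]
separate_hits = [c["client"] for c in clients if match_per_orderline(c, wanted)]

if __name__ == "__main__":
    print("flattened:", flattened_hits)
    print("separate: ", separate_hits)
```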

How to structure Elasticsearch indices/types?

How would you structure indices/types for an eshop application? Such an eshop would consist of domain objects like product, category, tag, manufacturer etc. The fulltext search results page should display intermixed list of all domain objects.
I can think of two options:
One index per whole application, every domain object as a type.
Every domain object has its own index, the type is the same - "item".
Which option will scale better?
Most of the "items" in the database are products. Some products aren't available yet or anymore. How can I boost currently available products?
The full-text results should prefer to show categories/manufacturers at the top of the page. How can I boost certain types, or objects from a certain index?
For better performance, I suggest the first option: "One index per whole application, every domain object as a type."
Say you create an index named "eshop" with types such as mobile, book, etc.
This lets you query according to the user's input. Consider a shopping website like Flipkart, where the user can search with a plain keyword.
For a plain keyword, you can search Elasticsearch by specifying only the index name. If the user applies some filter, such as mobile with a range of 1000-10000, you only need to search inside the mobile type. Filtering is easy in Elasticsearch, and it reduces the memory and CPU used during execution.
To boost available products, add a field called "available" to your documents and specify a boost value for available products when searching. Example:
{
"query": {
"term": {
"available": {
"value": true,
"boost": 1.5
}
}
}
}
For more on boosting, refer to:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-boosting-query.html
http://jontai.me/blog/2013/01/advanced-scoring-in-elasticsearch/
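One way to combine such a boost with a keyword search is to put the availability clause in the `should` part of a `bool` query, so that unavailable products still match but rank lower. The sketch below builds that request body in Python; the `title` field and the helper `boosted_search` are assumptions made for illustration and are not part of the answer above.

```python
import json


def boosted_search(text, boost=1.5):
    """Keyword match on a hypothetical `title` field, ranking available products higher."""
    return {
        "query": {
            "bool": {
                # Every hit must match the user's keywords.
                "must": [{"match": {"title": text}}],
                # Optional clause: documents with available == true get extra score.
                "should": [
                    {"term": {"available": {"value": True, "boost": boost}}}
                ],
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(boosted_search("phone"), indent=2))
```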
