Elasticsearch sequential dates - elasticsearch

I am creating a hotel booking system using Elasticsearch and am trying to find a way to return hotels that have a variable number of sequential dates available (for example 7 days) across a range of dates.
I am currently storing dates and prices as a child document of the hotel, but I am unsure how to undertake the search, or whether it is even possible with my current setup.
Edit: added mappings
Hotel Mapping
{
"hotel":{
"properties":{
"city":{
"type":"string"
},
"hotelid":{
"type":"long"
},
"lat":{
"type":"double"
},
"long":{
"type":"double"
},
"name":{
"type":"multi_field",
"fields":{
"name":{
"type":"string"
},
"name.exact":{
"type":"string",
"index":"not_analyzed",
"omit_norms":true,
"index_options":"docs",
"include_in_all":false
}
}
},
"star":{
"type":"double"
}
}
}
}
Date Mapping
{
"dates": {
"_parent": {
"type": "hotel"
},
"_routing": {
"required": true
},
"properties": {
"date": {
"type": "date",
"format": "dateOptionalTime"
},
"price": {
"type": "double"
}
}
}
}
I am currently using a date range to select the available dates and then a field query to match the city - the other fields will be used later.

I had to do something similar and I ended up storing every day the hotel (room) was booked during indexing (so a list like [2014-02-15, 2014-02-16, 2014-02-17, etc.]).
After that it was fairly trivial to write a query to find all hotel rooms that were free during a certain date range.
It still seems like there should be a more elegant solution, but this ended up working great for me.
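A minimal sketch of that approach (the `booked_dates` field name is an assumption; the idea is that a room qualifies only when none of the requested nights appear in its list of booked days):

```python
from datetime import date, timedelta

def free_rooms_query(check_in, check_out):
    """Build a query for rooms with none of the requested nights booked.

    Assumes each room document carries a `booked_dates` array of the
    days it is already taken (field name is hypothetical).
    """
    # Enumerate the nights of the stay; the check-out day itself is not needed.
    nights = [(check_in + timedelta(days=i)).isoformat()
              for i in range((check_out - check_in).days)]
    return {
        "query": {
            "bool": {
                # A single booked night inside the window disqualifies the room.
                "must_not": [{"terms": {"booked_dates": nights}}]
            }
        }
    }
```

A 7-night search is then just `free_rooms_query(check_in, check_in + timedelta(days=7))`.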

In my project we have a limited set of stay lengths, so I created an object in the mapping and specified the available booking dates for each period, like:
list of available booking dates for 2 days stay
list of available booking dates for 3 days stay
....
list of available booking dates for 10 days stay
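For illustration, such a document and the resulting search might look like this (field names are made up for the sketch); the precomputation trades index size for a trivially cheap query:

```python
# One array of valid check-in dates per supported stay length
# (hypothetical field names).
hotel_doc = {
    "hotelid": 42,
    "checkins_2_nights": ["2014-05-01", "2014-05-02", "2014-05-03"],
    "checkins_7_nights": ["2014-05-01"],
}

def stay_query(check_in, nights):
    # A 7-night availability search collapses to a single term filter.
    return {"query": {"term": {f"checkins_{nights}_nights": check_in}}}
```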

Store Hotel Availabilities with daily information

I have to store some millions of hotel rooms with some requirements:
Hotels give the number of identical rooms available, daily
Prices can change daily; this data is only stored in ES, not indexed
The index will only be used for search (not for monitoring), using the hotel's geolocation
Size: let's say about 50k hotels, 10 rooms each, 1+ year of availability => 200 million entries
So we have to manage things at a "daily" level.
Each time a room is booked in our application, the number of rooms has to be updated. We also store a "cache" from partners (other hotel providers) working worldwide; we request them at a regular interval to update our cache.
I am pretty familiar with Elasticsearch, but I still hesitate between 2 mappings. I removed some fields (breakfast, amenities, smoking...) to keep it simple.
The first one: 1 document per room, each of them containing 365 children (one per day)
{
"mappings": {
"room": {
"properties": {
"room_code": {
"type": "keyword"
},
"hotel_id": {
"type": "keyword"
},
"isCancellable": {
"type": "boolean"
},
"location": {
"type": "geo_point"
},
"price_summary": {
"type": "keyword",
"index": false
}
}
},
"availability": {
"_parent": {
"type": "room"
},
"properties": {
"date": {
"type": "date",
"format": "date"
},
"number_available": {
"type": "integer"
},
"overwrite_price_summary": {
"type": "keyword",
"index": false
}
}
}
}
}
pros:
Updates and reindexing are isolated at the child level
Only one index
Adding future availabilities is easy (just add child documents to a room)
cons:
Queries will be a little slower because of the join (looping over the availability children)
Children AND parents need to be returned, so the query has to include inner_hits.
A lot of hotels create temporary rooms (for vacations, local events...) that are only available 1 month a year; this adds useless rooms to the index for the 11 remaining months.
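With one availability child per day, the join query for the first option could be sketched like this (an untested sketch, not the definitive implementation): `min_children` demands a matching child for every night of the stay, and `inner_hits` returns the matching days alongside the room.

```python
from datetime import date

def available_rooms_query(check_in, check_out):
    """Rooms whose availability children cover every night of the stay."""
    nights = (check_out - check_in).days
    return {
        "query": {
            "has_child": {
                "type": "availability",
                # Each child is one day, so the room qualifies only if
                # every night of the window produces a matching child.
                "min_children": nights,
                "query": {
                    "bool": {
                        "filter": [
                            {"range": {"date": {
                                "gte": check_in.isoformat(),
                                "lt": check_out.isoformat()}}},
                            {"range": {"number_available": {"gt": 0}}},
                        ]
                    }
                },
                "inner_hits": {"size": nights},
            }
        }
    }
```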
The second: I create one index per month (Jan, Feb...) using nested documents instead of children.
{
"mappings": {
"room": {
"properties": {
"room_code": {
"type": "keyword"
},
"hotel_id": {
"type": "keyword"
},
"isCancellable": {
"type": "boolean"
},
"location": {
"type": "geo_point"
},
"price_summary": {
"type": "keyword",
"index": false
},
"availability": {
"type": "nested"
}
}
},
"availability": {
"properties": {
"day_of_month": {
"type": "integer"
},
"number_available": {
"type": "integer"
},
"overwrite_price_summary": {
"type": "keyword",
"index": false
}
}
}
}
}
pros:
Faster: no join, smaller indices
Resolves the issue of temporary rooms, thanks to the 12 monthly indices
cons:
Updates: booking a room for 1 night forces a reindex of the room document (in the matching month)
If a customer is looking for a room with a check-in on the 31st of March, for example, we have to query 2 indices, March and April
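A month-spanning stay under the second option could be handled roughly like this (a sketch; the index naming and client-side intersection are assumptions): one nested clause per required night, one query per monthly index, intersected on room_code afterwards.

```python
def month_query(days):
    """Rooms available on every listed day of the month.

    Each nested availability document covers a single day, so one
    nested clause per night is AND-ed at the root bool filter.
    """
    return {
        "_source": ["room_code"],
        "query": {"bool": {"filter": [
            {"nested": {
                "path": "availability",
                "query": {"bool": {"filter": [
                    {"term": {"availability.day_of_month": day}},
                    {"range": {"availability.number_available": {"gt": 0}}},
                ]}},
            }}
            for day in days
        ]}},
    }

# Check-in 31 March, 2 nights: query the (hypothetical) rooms-03 and
# rooms-04 indices, then keep room_codes present in both result sets.
march = month_query([31])
april = month_query([1])
```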
For the search/query, the second option is better in theory.
The main problem is about the updates of the rooms:
According to my production data, about 30 million daily availabilities change every 24 hours.
I also have to read, compare and update (if needed) the cache from the partners: about 130 million reads, with roughly one update per 10 reads, every 12 hours on average.
I have 6 other indexed fields at the room level in my mappings, which is not a lot, so maybe the nested solution is OK...
So, which one is the best?
note: I read this How to store date range data in elastic search (aws) and search for a range?
But my case is a little different because of the daily information.
Any help/advice is welcome.

How to index date ranges with ElasticSearch 5.1

I have documents that I want to index/search with ElasticSearch. These documents may contain multiple dates, and in some cases, the dates are actually date ranges. I'm wondering if someone can help me figure out how to write a query that does the right thing (or how to properly index my document so I can query it).
An example is worth a thousand words. Suppose the document contains two marriage date ranges: 2005-05-05 to 2007-07-07 and 2012-12-12 to 2014-03-03.
If I index each date range in start and end date fields, and write a typical range query, then a search for 2008-01-01 will return this record because one marriage will satisfy one of the inequalities and the other will satisfy the other. I don't know how to get ES to keep the two date ranges separate. Obviously, having marriage1 and marriage2 fields would resolve this particular problem, but in my actual data set I have an unbounded number of dates.
I know that ES 5.2 supports the date_range data type, which I believe would resolve this issue, but I'm stuck with 5.1 because I'm using AWS's managed ES.
Thanks in advance.
You can use nested objects for this purpose.
PUT /records
{
"mappings": {
"record": {
"properties": {
"marriage": {
"type": "nested",
"properties": {
"start": { "type": "date" },
"end": { "type": "date" },
"person1": { "type": "string" },
"person2": { "type": "string" }
}
}
}
}
}
}
PUT /records/record/1
{
"marriage": [
{ "start": "2005-05-05", "end": "2007-07-07", "person1": "", "person2": "" },
{ "start": "2012-12-12", "end": "2014-03-03", "person1": "", "person2": "" }
]
}
POST /records/record/_search
{
"query": {
"nested": {
"path": "marriage",
"query": {
"range": {
"marriage.start": { "gte": "2008-01-01", "lte": "2015-02-03"}
}
}
}
}
}
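Note that the range above matches marriages that *started* inside the window. For the point-in-time question in the post (does 2008-01-01 fall inside any marriage?), both bounds have to be constrained within the same nested object, which is exactly what nested documents make possible:

```python
point_in_time = "2008-01-01"
query = {
    "query": {
        "nested": {
            "path": "marriage",
            "query": {"bool": {"filter": [
                # Both conditions apply to the SAME marriage object;
                # with flat fields they could match different marriages.
                {"range": {"marriage.start": {"lte": point_in_time}}},
                {"range": {"marriage.end": {"gte": point_in_time}}},
            ]}},
        }
    }
}
```

Neither of the two example marriages matches here, since 2008-01-01 falls in the gap between them.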

Full Text Search as well as Terms Search on the same field in Elasticsearch

I'm from a MySQL background, so I don't know much about Elasticsearch and how it works.
Here are my requirements:
There will be a table of resulting records with a sorting option on every column. There will be a filter option where the user selects multiple values for multiple columns (e.g., City should be one of City1, City2, City3 and Category one of Cat2, Cat22, Cat6). There will also be a search bar where the user enters some text, and full-text search will be applied to some fields (i.e., City, Area, etc.).
Where I'm facing a problem is full-text search. I have tried some mappings, but every time I have to compromise on either full-text search or terms search, so I started to think there is no way to apply both kinds of search to the same field. But as I said, I don't know much about Elasticsearch, so if anyone has a solution, it will be appreciated.
Here is what I have applied currently, which enables sorting and terms search, but full-text search is not working.
{
"mappings":{
"my_type":{
"properties":{
"city":{
"type":"string",
"index":"not_analyzed"
},
"category":{
"type":"string",
"index":"not_analyzed"
},
"area":{
"type":"string",
"index":"not_analyzed"
},
"zip":{
"type":"string",
"index":"not_analyzed"
},
"state":{
"type":"string",
"index":"not_analyzed"
}
}
}
}
}
You can update the mapping with multi-fields: two mappings for the same value, one for full-text search and another for terms search. Here's a sample mapping for city.
{
"city": {
"type": "string",
"index": "not_analyzed",
"fields": {
"fulltext": {
"type": "string"
}
}
}
}
The default mapping is for terms search, so when a terms search is required, you can simply query the "city" field. When you need full-text search, the query must be performed on "city.fulltext". Hope this helps.
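Combining all three requirements in one request might then look like this (a sketch; the field names follow the question and the sample values are made up):

```python
search_body = {
    "query": {
        "bool": {
            # Full-text part goes against the analyzed sub-field.
            "must": [{"match": {"city.fulltext": "new york"}}],
            # Exact filters use the not_analyzed original fields.
            "filter": [{"terms": {"category": ["Cat2", "Cat22", "Cat6"]}}],
        }
    },
    # Sorting also uses the not_analyzed original.
    "sort": [{"city": "asc"}],
}
```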
Full-text search won't work on not_analyzed fields and sorting won't work on analyzed fields.
You need to use multi-fields.
It is often useful to index the same field in different ways for different purposes. This is the purpose of multi-fields. For instance, a string field could be mapped as a text field for full-text search, and as a keyword field for sorting or aggregations:
For example :
{
"mappings": {
"my_type": {
"properties": {
"city": {
"type": "text",
"fields": {
"raw": {
"type": "keyword"
}
}
} ...
}
}
}
}
Use the dot notation to sort by city.raw :
{
"query": {
"match": {
"city": "york"
}
},
"sort": {
"city.raw": "asc"
}
}

How to create a month over month new companies bar chart

Let's say I create an index in Elasticsearch and import data
PUT test
{
"mappings": {
"orders": {
"properties": {
"devices": {
"type": "integer"
},
"location": {
"type":"geo_point"
},
"time" : {
"type":"date",
"format": "epoch_second"
},
"company" : {
"type":"keyword"
}
}
}
}
}
What is fairly simple to do in Kibana is getting a unique count of companies per month, which is fine, but not good enough to get the number of companies that had their first order in that month. Is this possible in Kibana or Timelion? If not, any idea how I should save the data to Elasticsearch so I could get this number? I am using Kibana 5.1.2.
I have added a property to the index called sequence, which is the number of the sequential order: if it is a first order I give it 1, if second 2, and so on...
Mapping looks like this
PUT test
{
"mappings": {
"orders": {
"properties": {
"devices": {
"type": "integer"
},
"email": {
"type":"keyword"
},
"company": {
"type":"keyword"
},
"location": {
"type":"geo_point"
},
"sequence":{
"type":"integer"
},
"time": {
"type":"date",
"format": "strict_date_optional_time||epoch_second"
}
}
}
}
}
So for this operation I chose Timelion, which is cool and lets you calculate values. The formula for MoM % of new users is...
(new users this month / cumulative users till previous month)
This look in timelion expression like this
.es(q=sequence:1,index=index,timefield=time,metric=cardinality:company)
.divide(.es(index=index,timefield=time,metric=cardinality:company,offset=-1M)
.cusum()).bars()
Note that this is all one query; I only cut it into 3 pieces for easier readability. And I added bars() just for fun.
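For completeness, the sequence number has to be computed before indexing; a minimal sketch (the order shape is hypothetical, keyed by the company and time fields from the mapping):

```python
from collections import defaultdict

def number_orders(orders):
    """Attach a per-company sequence number, 1 for the first order.

    Orders are visited in chronological order so the earliest order
    of each company gets sequence == 1.
    """
    counters = defaultdict(int)
    for order in sorted(orders, key=lambda o: o["time"]):
        counters[order["company"]] += 1
        order["sequence"] = counters[order["company"]]
    return orders
```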

Date_histogram Elasticsearch facet can't find field

I am using the date_histogram facet to find results based on a Epoch timestamp. The results are displayed on a histogram, with the date on the x-axis and count of events on the y-axis. Here is the code that I have that doesn't work:
angular.module('controllers', [])
.controller('FacetsController', function($scope, $http) {
var payload = {
query: {
match: {
run_id: '9'
}
},
facets: {
date: {
date_histogram: {
field: 'event_timestamp',
factor: '1000',
interval: 'second'
}
}
}
};
});
It works if I am using
field: '#timestamp'
which is in ISO8601 format; however, I need it to now work with Epoch timestamps.
Here is an example of what's in my Elasticsearch, maybe this can lead to some answers:
{
"#version": "1",
"#timestamp": "2014-07-04T13:13:35.372Z",
"type": "automatic",
"installer_version": "0.3.0",
"log_type": "access.log",
"user_id": "1",
"event_timestamp": "1404479613",
"run_id": "9"
}
When I run this, I receive this error:
POST 400 (Bad Request)
Any ideas as to what could be wrong here? I don't understand why I'd have such a difference from using the two different fields, as the only difference is the format. I researched as best I could and discovered I should be using 'factor', but that didn't seem to solve my problem. I am probably making a silly beginner mistake!
You need to set up the mapping initially. Elasticsearch is good at defaults, but it is not possible for it to determine whether a provided value is a timestamp, an integer or a string, so it's your job to tell Elasticsearch.
Let me explain by example. Let's consider that the following document is what you are trying to index:
{
"#version": "1",
"#timestamp": "2014-07-04T13:13:35.372Z",
"type": "automatic",
"installer_version": "0.3.0",
"log_type": "access.log",
"user_id": "1",
"event_timestamp": "1404474613",
"run_id": "9"
}
So initially you don't have an index and you index your document by making an HTTP request like so:
POST /test/date_experiments
{
"#version": "1",
"#timestamp": "2014-07-04T13:13:35.372Z",
"type": "automatic",
"installer_version": "0.3.0",
"log_type": "access.log",
"user_id": "1",
"event_timestamp": "1404474613",
"run_id": "9"
}
This creates a new index called test and a new doc type in index test called date_experiments.
You can check the mapping of this doc type date_experiments by doing so:
GET /test/date_experiments/_mapping
And what you get in the result is an auto-generated mapping that was generated by Elasticsearch:
{
"test": {
"date_experiments": {
"properties": {
"#timestamp": {
"type": "date",
"format": "dateOptionalTime"
},
"#version": {
"type": "string"
},
"event_timestamp": {
"type": "string"
},
"installer_version": {
"type": "string"
},
"log_type": {
"type": "string"
},
"run_id": {
"type": "string"
},
"type": {
"type": "string"
},
"user_id": {
"type": "string"
}
}
}
}
}
Notice that the type of the event_timestamp field is set to string, which is why your date_histogram is not working. Also notice that the type of the #timestamp field is already date, because you pushed the date in a standard format, which made it easy for Elasticsearch to recognize that your intention was to push a date in that field.
Drop this mapping by sending a DELETE request to /test/date_experiments and let's start from the beginning.
This time instead of pushing the document first, we will make the mapping according to our requirements so that our event_timestamp field is considered as a date.
Make the following HTTP request:
PUT /test/date_experiments/_mapping
{
"date_experiments": {
"properties": {
"#timestamp": {
"type": "date"
},
"#version": {
"type": "string"
},
"event_timestamp": {
"type": "date"
},
"installer_version": {
"type": "string"
},
"log_type": {
"type": "string"
},
"run_id": {
"type": "string"
},
"type": {
"type": "string"
},
"user_id": {
"type": "string"
}
}
}
}
Notice that I have changed the type of event_timestamp field to date. I have not specified a format because Elasticsearch is good at understanding a few standard formats like in the case of #timestamp field where you pushed a date. In this case, Elasticsearch will be able to understand that you are trying to push a UNIX timestamp and convert it internally to treat it as a date and allow all date operations on it. You can specify a date format in the mapping just in case the dates you are pushing are not in any standard formats.
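As an example of such an explicit format, newer Elasticsearch versions accept epoch format names in the mapping (availability depends on your version; this is a sketch, not a drop-in fix for 1.x facets):

```python
# Explicit date format for the event_timestamp field, so raw UNIX
# timestamps in seconds are parsed as dates at index time.
event_timestamp_mapping = {
    "event_timestamp": {
        "type": "date",
        "format": "epoch_second",  # use "epoch_millis" for millisecond stamps
    }
}
```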
Now you can start indexing your documents and running your date queries and facets the same way as you were doing earlier.
You should read more about mapping and date format.
