Issue with Elasticsearch sort

We have 3 sorting options in our application:
Recommended, Distance and Rating.
Users can search by a Service and/or a Location. By default, if a location is given we show results within 50 miles of the search location; if a service is given, we show all professionals who provide that service.
When both are selected, we query all professionals within 50 miles of the location who provide that service.
Now the problem:
I want to rank the results of the query above by assigning a weight to each of the parameters (recommended, distance, rating).
E.g.:
Recommended sort: 50% recommended, 30% rating, 20% distance
Distance sort: 70% distance, 20% recommended, 10% rating
Rating sort: 70% rating, 20% distance, 10% recommended
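For reference, Elasticsearch's function_score query with a script_score is the usual mechanism for this kind of blended ranking. As a minimal sketch, assuming recommended and rating are already normalized to [0, 1] (an assumption, not something stated above), the per-document arithmetic would look like:

public final class BlendedScore {

    // Normalize distance so that 0 miles maps to 1.0 and the 50-mile search
    // radius maps to 0.0 (an assumed linear decay).
    static double distanceScore(double miles, double radiusMiles) {
        return Math.max(0.0, 1.0 - miles / radiusMiles);
    }

    // recommended and rating are assumed to be pre-normalized to [0, 1].
    static double blend(double wRecommended, double wRating, double wDistance,
                        double recommended, double rating, double miles) {
        return wRecommended * recommended
             + wRating * rating
             + wDistance * distanceScore(miles, 50.0);
    }

    public static void main(String[] args) {
        // "Recommended" sort: 50% recommended, 30% rating, 20% distance.
        System.out.println(blend(0.5, 0.3, 0.2, 0.8, 0.9, 10.0)); // ~0.83
    }
}

Each sort option then just swaps in a different weight triple; the document with the highest blended score sorts first.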

Related

AWS Quicksight - Top10 offices by REVENUE divided by REVENUE

Is there a way to make a metric in QuickSight that is the ratio between the TOP 10 offices by REVENUE divided by REVENUE?
Thanks
The problem is that if I apply a filter to select the TOP 10 offices by revenue in the numerator, the same filter is also applied to the denominator.
It was not clear from your question exactly what you meant by "ratio between TOP10 offices by REVENUE divided by REVENUE", but I have assumed you want the real total revenue of the TOP N stores alongside the "ratio", i.e. the percent of ALL revenue, not just of the TOP N stores' revenue.
To do this you can use the following calculated fields.
Make a calculated filler field to 'partition' by; that is you can use it to make a single partition of the whole data set, e.g. "single_partition_filler":
ifelse(isNotNull(store),1,0)
Make the ratio calculation you want, "Revenue over Total Revenue". The trick here is to use the "PRE_FILTER" aggregation level in the Table calculations so you are getting the sum of revenue by store PRE_FILTER divided by the sum of revenue by all stores (using the filler column) PRE_FILTER:
sumOver(revenue,[store],PRE_FILTER) / sumOver(revenue, [{single_partition_filler}], PRE_FILTER)
Make a table with "Store", "Revenue (Sum)" and "Revenue over Total Revenue (Min)", using a TOP N filter for Store by Revenue (Sum), and compare it with the same table unfiltered. (The QuickSight screenshots of the two tables are omitted here.)
Dataset used:
store,revenue
A,100
B,50
C,40
D,70
E,60
A,35
C,80
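For illustration with this dataset: the per-store totals are A = 135, B = 50, C = 120, D = 70, E = 60, for a grand total of 435. So A's "Revenue over Total Revenue" is 135/435 ≈ 31%, and because both sumOver calls use PRE_FILTER, the 435 denominator stays fixed even when the TOP N filter hides some stores.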

Algorithm for price computing based on periods

I'm creating a system for a company renting apartments. All pricing setup is based on periods. For example, for an apartment in category 'Junior Studio' there are these price periods:
30.05.2016 - 31.01.2017: 3000 EUR
01.02.2017 - Infinity: 4000 EUR
There are also additional periods, such as taxes, seasonal prices (plus/minus some percentage), and fees, each based on their own periods. So prices can change often, for example:
31.05.2016 - 30.06.2016 (3500EUR because of some seasonal price period)
01.07-31.08.2016 (5000EUR other seasonal price period)
01.09.2016 - 31.01.2017 (3000 EUR)
01.02.2017 - Infinity: 4000 EUR
Also, if someone wants to rent an apartment for, say, less than 15 days, there is an additional fee, let's say 15% - all of this is set up dynamically.
Now the problem: on our page we should let users find apartments by price. For example, some users want to find only apartments where the price is between 3000 and 4000 EUR for a 6-month rental. As I said, the price can change, say, 5 times within such a period, so I'm looking to calculate an average price.
Do you have any idea how to implement this algorithm so it incorporates all the specified periods? We assume there can be around 500 records, so computing this dynamically could cause performance issues.
UPDATE
Here is some code to take periods related to one apartment category for one building:
private RentPriceAggregatedPeriodsDto prepareRentPriceAggregator(Long buildingId, Long categoryId,
        LocalDate dateFrom, LocalDate dateTo, Integer duration) {
    List<CategoryPricePeriod> pricePeriods =
            categoryPricePeriodRepository.findCategoryPricePeriods(buildingId, categoryId, dateFrom, dateTo);
    List<SeasonalPricePeriod> seasonalPricePeriods =
            seasonalPricePeriodRepository.findSeasonalPricePeriods(buildingId, categoryId, dateFrom, dateTo);
    List<LastMinuteRatePeriod> lastMinuteRatePeriods =
            lastMinuteRatePeriodRepository.findLastMinuteRatePeriods(buildingId, categoryId, dateFrom, dateTo);
    List<TaxesDefinitionPeriodDto> taxesDefinition =
            taxesDefinitionService.findTaxPeriodsForBuildingAndCategory(buildingId, categoryId, TaxTypeCode.VAT, dateFrom, dateTo);
    Optional<SurchargePolicy> surcharge =
            surchargePolicyRepository.findForDurationAndRentalObjectCategoryIds(categoryId, buildingId, duration);
    return new RentPriceAggregatedPeriodsDto(pricePeriods, seasonalPricePeriods, lastMinuteRatePeriods, taxesDefinition, surcharge);
}
Based on all those periods I prepare a list of unique price periods: dateFrom, dateTo, currency, value. After those steps I have a list of unique prices for one category. Then I need to compute how many days of the booking fall within each of those unique price periods, multiply by the price, possibly round, apply the tax, and sum everything to get the final price for the booking. Then I re-run those steps, let's say, 500 times (multiple categories in multiple buildings).
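As a minimal sketch of that per-booking computation (the PricePeriod record, daily prices, and exclusive end dates are assumptions for illustration):

import java.math.BigDecimal;
import java.math.RoundingMode;
import java.time.LocalDate;
import java.time.temporal.ChronoUnit;
import java.util.List;

// One already-merged "unique price period"; 'to' is treated as exclusive.
record PricePeriod(LocalDate from, LocalDate to, BigDecimal dailyPrice) {}

final class BookingPricer {
    // Total price of a booking [checkIn, checkOut), given non-overlapping periods.
    static BigDecimal totalPrice(List<PricePeriod> periods,
                                 LocalDate checkIn, LocalDate checkOut,
                                 BigDecimal taxRate) {
        BigDecimal total = BigDecimal.ZERO;
        for (PricePeriod p : periods) {
            // Clamp the period to the booking window and count the overlapping days.
            LocalDate from = checkIn.isAfter(p.from()) ? checkIn : p.from();
            LocalDate to = checkOut.isBefore(p.to()) ? checkOut : p.to();
            long days = ChronoUnit.DAYS.between(from, to);
            if (days > 0) {
                total = total.add(p.dailyPrice().multiply(BigDecimal.valueOf(days)));
            }
        }
        // Apply tax and round to cents.
        return total.multiply(BigDecimal.ONE.add(taxRate))
                    .setScale(2, RoundingMode.HALF_UP);
    }
}

The average price the search needs is then totalPrice divided by the number of nights.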
As mentioned in the comments, averaging 6 numbers 500 times on the fly should not cause any performance issues.
Even then, if you want O(1) computation of the price (i.e. a calculation that does not depend on the number of price changes in the requested period), you could preprocess: define some date as day 0 and compute, for every day after it, the cumulative rent from day 0 up to that day. When a user requests the average rent over a period, subtract the cumulative total at the period's start from the cumulative total at its end; that gives the rent for the period in between. Dividing this by the number of days gives the average rent. You can also apply suitable multipliers depending on the duration of stay (to add the 15% charge), etc. This is analogous to finding the sum of values between two indices of an array in O(1) with prefix sums. It is not a memory-friendly suggestion, although it can be modified to use less memory.
The advantage is that the computation to give results will not depend on the number of price switches. However, every additional change in apartment rents will cause some amount of preprocessing.
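A minimal sketch of that prefix-sum idea, assuming prices have already been expanded to one value per day:

final class RentPrefixSums {
    private final double[] cumulative; // cumulative[i] = total rent for days [0, i)

    RentPrefixSums(double[] dailyPrice) {
        cumulative = new double[dailyPrice.length + 1];
        for (int i = 0; i < dailyPrice.length; i++) {
            cumulative[i + 1] = cumulative[i] + dailyPrice[i];
        }
    }

    // Total rent for days [start, end), answered with just two lookups.
    double totalRent(int start, int end) {
        return cumulative[end] - cumulative[start];
    }

    double averageDailyRent(int start, int end) {
        return totalRent(start, end) / (end - start);
    }
}

Whenever a price period changes, the cumulative array must be rebuilt from the day of the change onward, which is the preprocessing cost mentioned above.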
I think you actually need two algorithms. One for representing and querying the object price at any given time. And another one for computing the price for renting an object for a given time period.
As for the representation of the object price, you should decide on the temporal granularity you want to support, e.g. days or months. Then create a lookup table (or a decision tree, a neural network, or anything else) to look up the price at a given day or month for a given object or object class. You can factor in all the variables you'd like. If you want to support special prices for renting full calendar months, keep another data structure for that coarser granularity, which you query with months instead of dates.
Then, given a period of time, you need to generate the corresponding series of dates or months, query for the individual daily or monthly prices and then compute the sum to get the total price. If you want to, you can then compute an average daily/monthly price.
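A minimal sketch of such a lookup table at daily granularity, here a TreeMap keyed by the date each price takes effect (the structure is an assumption; anything supporting a floor lookup works):

import java.time.LocalDate;
import java.util.Map;
import java.util.TreeMap;

final class DailyPriceTable {
    private final TreeMap<LocalDate, Double> priceFrom = new TreeMap<>();

    void setPriceFrom(LocalDate date, double price) {
        priceFrom.put(date, price);
    }

    // Price on a given day = price of the latest change on or before that day.
    double priceOn(LocalDate day) {
        Map.Entry<LocalDate, Double> entry = priceFrom.floorEntry(day);
        if (entry == null) throw new IllegalArgumentException("no price defined for " + day);
        return entry.getValue();
    }

    // Total price over [checkIn, checkOut), summing the daily prices.
    double totalFor(LocalDate checkIn, LocalDate checkOut) {
        double total = 0;
        for (LocalDate d = checkIn; d.isBefore(checkOut); d = d.plusDays(1)) {
            total += priceOn(d);
        }
        return total;
    }
}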
I don't think performance will be an issue here. At least no issue you should address before coming up with an actual solution (because, premature optimization). If it is, consider scaling up your database.

Scoring categories from web logs

I am building a scorer that gives each user an individual score per category on a website.
Input : userid, category
Output : user id, score_cat_1, score_cat_2 etc...
Scores are given out of 10.
My plan is to first count, for each user, the number of clicks in each category, then divide the results into quantiles (maybe a thousand), and finally run a clustering algorithm on each category's quantiles to cluster them into 10 clusters, which will be ordered to give the rating. (See the sketch below.)
The idea is to group quantiles that are close together into the same cluster and get a more interesting score than just saying "the 10% best clickers get a 10, the next 10% get a 9, etc."
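A minimal sketch of that pipeline, using a basic 1-D k-means as the clustering step (k-means is only one possible choice; the input is assumed to be per-user click counts for a single category):

import java.util.Arrays;

final class CategoryScorer {
    // Reduce sorted click counts to q quantile values.
    static double[] quantiles(double[] sortedCounts, int q) {
        double[] result = new double[q];
        for (int i = 0; i < q; i++) {
            result[i] = sortedCounts[(int) ((long) i * sortedCounts.length / q)];
        }
        return result;
    }

    // Basic 1-D k-means; returns k centroids in ascending order.
    static double[] kmeans1d(double[] values, int k, int iterations) {
        double min = Arrays.stream(values).min().getAsDouble();
        double max = Arrays.stream(values).max().getAsDouble();
        double[] centroids = new double[k];
        for (int j = 0; j < k; j++) {
            centroids[j] = min + (max - min) * j / (k - 1); // evenly spread start
        }
        for (int it = 0; it < iterations; it++) {
            double[] sum = new double[k];
            int[] count = new int[k];
            for (double v : values) {       // assignment step
                int best = nearest(centroids, v);
                sum[best] += v;
                count[best]++;
            }
            for (int j = 0; j < k; j++) {   // update step
                if (count[j] > 0) centroids[j] = sum[j] / count[j];
            }
        }
        Arrays.sort(centroids);
        return centroids;
    }

    static int nearest(double[] centroids, double v) {
        int best = 0;
        for (int j = 1; j < centroids.length; j++) {
            if (Math.abs(centroids[j] - v) < Math.abs(centroids[best] - v)) best = j;
        }
        return best;
    }

    // Score out of 10 = 1 + index of the nearest of 10 ordered centroids.
    static int score(double[] centroids, double clickCount) {
        return 1 + nearest(centroids, clickCount);
    }
}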
My problems are the following:
1- Do you think this is a good idea? Is there a more natural and accurate way to do it?
2- The clusters may be too small, and I can't guarantee the cardinality of each cluster.

How to implement a real estate recommendation engine?

I am talking about something like movie/item recommendation, but it seems that real estate is trickier. When visiting a website and doing some search for RE, the user should be presented with some suggestions. Let's separate the task into two tasks:
a) the user has still not entered any personal info - item based recommendation
b) the user has already entered his/hers details such as income, location, etc. - item/user based recommendation
The first thing that comes to my mind for task a) is to start modeling RE features, but using some ranges instead of exact values. For example:
Area in m2
40 - 50 we can mark as "1"
50 - 70 is "2"
etc ...
Price:
20 - 30 thousands € will be marked as 1
30 - 40 will be 2
etc ...
Proximity to city center:
1 for the RE being within the city center
2 for Zone 2 or up to 2/3 kilometers from center
3 for Zone 3 or 7 kilometers from center
So having ranges lets us assign a vector to each RE property, which allows us to use Euclidean distance, Pearson correlation, and nearest-neighbor algorithms.
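A minimal sketch of that encoding, with hypothetical bucket boundaries:

final class PropertyVectors {
    // Encode a value into a 1-based bucket given ascending upper bounds.
    static int bucket(double value, double[] upperBounds) {
        for (int i = 0; i < upperBounds.length; i++) {
            if (value < upperBounds[i]) return i + 1;
        }
        return upperBounds.length + 1;
    }

    // Area in m2, price in thousands of EUR, distance to center in km.
    static double[] encode(double areaM2, double priceThousands, double kmToCenter) {
        return new double[] {
            bucket(areaM2, new double[] {50, 70, 100}),
            bucket(priceThousands, new double[] {30, 40, 60}),
            bucket(kmToCenter, new double[] {1, 3, 7}),
        };
    }

    static double euclidean(double[] a, double[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }
}

The nearest neighbors of a property under this distance are then its most similar listings.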
Please comment on my approach or suggest a new one.
If you already have a website with enough traffic, you can try a pure collaborative filtering approach, i.e people who viewed this property also viewed these other properties. You could use the Pearson correlation there for good results.
Similarity between two REs can be defined as:

sim = (number of people who viewed both RE1 and RE2) / (number of people who viewed either one or both)
When a user is viewing property RE you can sort all other RE properties based on the similarity score with the property being shown and show the top few.
You could add some obvious filters on top of this like the location of the property, the price range etc.
You can also define the similarity as you have suggested and mix the results from both, to get good representation for new RE entries, which have little chance of surfacing if a pure collaborative filtering algorithm is used.
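A minimal sketch of the similarity defined above (it is the Jaccard index over the sets of viewers; the view logs are assumed inputs):

import java.util.HashSet;
import java.util.Set;

final class PropertySimilarity {
    // |viewers of both| / |viewers of either one or both|
    static double similarity(Set<String> viewersOfRe1, Set<String> viewersOfRe2) {
        Set<String> intersection = new HashSet<>(viewersOfRe1);
        intersection.retainAll(viewersOfRe2);
        Set<String> union = new HashSet<>(viewersOfRe1);
        union.addAll(viewersOfRe2);
        return union.isEmpty() ? 0.0 : (double) intersection.size() / union.size();
    }
}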

How to calculate scores?

This question is more related to logic than any programming language. If the question is not apt for the forum please do let me know and I will delete this.
I have to write a logic to calculate scores for blogs for a Blog Award website. A blog may be nominated for multiple award categories and is peer-reviewed or rated by a Jury on a -1 to 5 scale (-1 to indicate a blog they utterly dislike). Now, a blog can be rated by one or more Jurors. One criterion while calculating final score for a blog is that if a blog is rated positively by more people it should get more weightage (and vice-versa). Similarly a blog rated -1 even by one Juror should have its score affected (-1 is sort of a Veto here). Lastly, I also want to have an additional score based on the Technorati rank of the blog (so that the final score is based on a mix of Juror rating + Technorati ranking).
Example: A blog is rated in category A by 6 Jurors in total. 2 rate it at 3, 3 rate it at 2, and 1 rates it at 4. (I used to calculate the score as (2*3 + 3*2 + 1*4)/6 = 16/6 = 2.67 to get a weighted average, but I am not satisfied with this, primarily because it doesn't work well when a Juror rating is -1. Moreover, I need to add the Technorati ranking criterion too.)
Could you help me decide the best way to calculate the final scores (keeping the rating method same as above as that cannot be changed now)?
If you want to weight the effect of a -1 rating more strongly, use the same average score calculation but substitute -10 whenever you see -1. You can choose a value other than -10 if you don't want a negative rating to weight as strongly.
You might look at using the lower bound of the Wilson score interval for your ratings.
See http://www.evanmiller.org/how-not-to-sort-by-average-rating.html for more details. Although, there, it is used for the simpler Bernoulli case.
The gist is if you have a lot of ratings you have a higher degree of confidence in your scoring. You can then combine the scores from your local ratings and the Technorati ratings, by weighting the scores by the number of voters locally and on Technorati.
As for wanting a single -1 vote to have high impact, just remap it to a large negative value proportional to your desired impact before feeding it into your scoring formula.
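A minimal sketch of that lower bound for the Bernoulli (up/down) case described in the linked article; the -1..5 ratings would first have to be mapped onto successes and failures:

final class WilsonScore {
    // positive = up votes, n = total votes, z = 1.96 for 95% confidence.
    static double lowerBound(int positive, int n, double z) {
        if (n == 0) return 0.0;
        double phat = (double) positive / n;
        return (phat + z * z / (2.0 * n)
                - z * Math.sqrt((phat * (1 - phat) + z * z / (4.0 * n)) / n))
               / (1 + z * z / n);
    }
}

With few votes the lower bound stays low even if all votes are positive, which captures the "more raters means more confidence" requirement.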
Calculating a score based on votes will be pretty easy. Adding the technorati rank will be the tricky part.
I made a quick script that calculates some scores based on this algorithm:
score = ( vote_sum - ( vetos * veto_weight ) ) / number_of_votes
You can change the URL parameters to get different values.
There are a lot of ties, so maybe you could use the Technorati blog rank as a tie breaker.
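Read literally, that formula could be implemented as below (veto_weight is a tunable; whether vote_sum should include the -1 votes themselves is an assumption here):

final class VetoScore {
    static double score(int[] votes, double vetoWeight) {
        double voteSum = 0;
        int vetos = 0;
        for (int v : votes) {
            voteSum += v;              // -1 votes are kept in the sum
            if (v == -1) vetos++;      // ...and additionally penalized
        }
        return (voteSum - vetos * vetoWeight) / votes.length;
    }
}

For the example above, votes {3, 3, 2, 2, 2, 4} with no vetos give 16/6 ≈ 2.67, matching the question's weighted average.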
You could internally work with scores from 0 to 6: just shift by one, calculate the score, and shift back. I guess the -1 has some disruptive effect on your calculation.
