I'm hoping that this is a fairly simple solution, but I'm fairly new to Dynamics 365 development and the documentation and prior questions I've seen so far haven't been too helpful.
I need to create a specifically rounded calculated field within an entity. Currently the field has been set up as a decimal type and I have the precision value set to 0 to produce a whole number.
The calculation I am currently using in the field calculation editor is simply x + y / 100. However whatever result comes from that needs to be always rounded up to the next whole number rather than the nearest one. Is there a way to force the field logic to always round upwards?
If a direct answer isn't available, any resources would be appreciated.
I would say it is working as expected since less than 0.5 will round off to lower nearest number (floor) and greater than 0.5 will round off to higher nearest number (ceiling).
To handle your scenario, to always round off to higher nearest number (ceiling) - I may recommend this: Add 49 like (x + y + 49)/100, in other words 0.49 to formula. This quirk came from top of my head but this may be the best option in calculated field perspective.
Related
I am in the process of designing an algorithm that will calculate regions in a candlestick chart where strong areas of support exist. An "area of support" in this case is defined as an area in the chart where the price of a stock rises by a large amount in a short period of time. (Please see the diagram below, the blue dots represent these strong areas of support)
The data I am working with is a list of over 6000 TOHLC (timestamp, open price, high price, low price, close price) values. For example, the first entry in this list of data is:
[1555286400, 83.7, 84.63, 83.7, 84.27]
The way I have structured the algorithm to work is as follows:
1.) The list of 6000+ TOHLC values are split into sub-lists of 30 TOHLC values (30 is a number that I arbitrarily chose). The lowest low price (LLP) is then obtained from each of these sub-lists. The purpose behind using this method is to find areas in the chart where prices dip.
2.) The next step is to determine how high the price rose from each of these lows. For this, I take the next 30 candlestick values from the low and determine what the highest high price (HHP) is. Then, if HHP / LLP >= 1.03, the low price is accepted, otherwise it is discarded. Again, 1.03 is a value that I arbitrarily chose, by analysing the stock chart manually and determining how much the price rose on average from these lows.
The blue dots in the chart above represent the accepted areas of support by the algorithm. It appears to be working well, in terms of that I am trying to achieve.
So the question I have is: does anyone have any improvements they can suggest for this algorithm, or point out any faults in it?
Thanks!
I may have understood wrong, however, from your explanation it seems like you are doing your calculation in separate 30-ish sub lists and then combining them together.
So, what if the LLP is the 30th element of sublist N and HHP is 1st element of sublist N+1 ? If you have taken that into account, then it's fine.
If you haven't taken that into account, I would suggest doing a moving-window type of approach in reading those data. So, you would start from 0th element of 6000+ TOHLC and start with a window size of 30 and slide it 1 by 1. This way, you won't miss any values.
Some of the selected blue dots have higher dip than others. Why is that? I would separate them into another classifier. If you will store them into an object, store the dip rate as well.
Floating point numbers are not suggested in finance. If possible, I'd use a different approach and perhaps classifier, solely using integers. It may not bother you or your project as of now, but surely, it will begin to create false results when the numbers add up in the future.
I'm creating an app to monitor water quality. The temperature data is updated every 2 min to firebase real-time database. App has two requirements
1) It should alert the user when temperature exceed 33 degree or drop below 23 degree - This part is done
2) It should alert user when it has big temperature fluctuation after analysing data every 30min - This part i'm confused.
I don't know what algorithm to use to detect big temperature fluctuation over a period of time and alert the user. Can someone help me on this?
For a period of 30 minutes, your app would give you 15 values.
If you want to figure out a big change in this data, then there is one way to do so.
You can use implement the following method:
Calculate the mean and the standard deviation of the values.
Subtract the data you have from the mean and then take the absolute value of the result.
Compare if the absolute value is greater than one standard deviation, if it is greater then you have a big data.
See this example for better understanding:
Lets suppose you have these values for 10 minutes:
25,27,24,35,28
First Step:
Mean = 27 (apprx)
One standard deviation = 3.8
Second Step: Absolute(Data - Mean)
abs(25-27) = 2
abs(27-27) = 0
abs(24-27) = 3
abs(35-27) = 8
abs(28-27) = 1
Third Step
Check if any of the subtraction is greater than standard deviation
abs(35-27) gives 8 which is greater than 3.8
So, there is a big fluctuation. If all the subtracted results are less than standard deviation, then there is no fluctuation.
You can still improvise the result by selecting two or three standard deviation instead of one standard deviation.
Start by defining what you mean by fluctuation.
You don't say what temperature scale you're using. Fahrenheit, Celsius, Rankine, or Kelvin?
Your sampling rate is a new data value every two minutes. Do you define fluctuation as the absolute value of the difference between the last point and current value? That's defensible.
If the max allowable absolute value is some multiple of your 33-23 = 10 degrees you're in business.
I have some pages on a website and I have to create an ordering based on "popularity"/"activity"
The parameters that I have to use are:
views to the page
comments made on the page (there is a form at the bottom where uses can make comments)
clicks made to the "like it" icon
Are there any standards for what a formula for popularity would be? (if not opinions are good too)
(initially I thought of views + 10*comments + 10*likeit)
Actually there is an accepted best way to calculate this:
http://www.evanmiller.org/how-not-to-sort-by-average-rating.html
You may need to combine 'likes' and 'comments' into a single score, assigning your own weighting factor to each, before plugging it into the formula as the 'positive vote' value.
from the link above:
Score = Lower bound of Wilson score confidence interval for a
Bernoulli parameter
We need to balance the proportion of positive ratings with
the uncertainty of a small number of observations. Fortunately, the
math for this was worked out in 1927 by Edwin B. Wilson. What we want
to ask is: Given the ratings I have, there is a 95% chance that the
"real" fraction of positive ratings is at least what? Wilson gives the
answer. Considering only positive and negative ratings (i.e. not a
5-star scale), the lower bound on the proportion of positive ratings
is given by:
(Use minus where it says plus/minus to calculate the lower bound.)
Here p̂ is the observed fraction of positive ratings, zα/2 is the
(1-α/2) quantile of the standard normal distribution, and n is the
total number of ratings. The same formula implemented in Ruby:
require 'statistics2'
def ci_lower_bound(pos, n, confidence)
if n == 0
return 0
end
z = Statistics2.pnormaldist(1-(1-confidence)/2)
phat = 1.0*pos/n
(phat + z*z/(2*n) - z * Math.sqrt((phat*(1-phat)+z*z/(4*n))/n))/(1+z*z/n)
end
pos is the number of positive ratings, n is the total number of
ratings, and confidence refers to the statistical confidence level:
pick 0.95 to have a 95% chance that your lower bound is correct, 0.975
to have a 97.5% chance, etc. The z-score in this function never
changes, so if you don't have a statistics package handy or if
performance is an issue you can always hard-code a value here for z.
(Use 1.96 for a confidence level of 0.95.)
The same formula as an SQL query:
SELECT widget_id, ((positive + 1.9208) / (positive + negative) -
1.96 * SQRT((positive * negative) / (positive + negative) + 0.9604) /
(positive + negative)) / (1 + 3.8416 / (positive + negative))
AS ci_lower_bound FROM widgets WHERE positive + negative > 0
ORDER BY ci_lower_bound DESC;
There is no standard formula for this (how could there be?)
What you have looks like a fairly normal solution, and would probably work well. Of course, you should play around with the 10's to find values that suit your needs.
Depending on your requirements, you might also want to add in a time factor (i.e. -X points per week) so that old pages become less popular. Alternatively, you could change your "page views" to "page views in the last month". Again, this depends on your needs, it may not be relevant.
You could do something like what YouTube does - just have it sorted by largest count per category. For example - most viewed, most commented, most liked. In each category a different page could come first, though the rankings might likely be correlated. If you only need a single ranking, then you would have to come up with a formula of some sort, preferably derived empirically by analyzing a bunch of data you already have and deciding what should be calculated as good/bad, and working backwards to fit an equation that fits your decision.
You could even attempt a machine learning approach to "learn" what a good weighting is for combining each of these numbers as in your example formula. Doing it manually might also not be too hard.
I use,
(C*comments + L*likeit)*100/views
where you must use C and L depending on how much you value each attribute.
I use C=1 and L=1.
This gives you the percentage of views that generated a positive action, making the items with
higher percentage the most "popular".
I like this because it makes it possible for newer items to be very popular at first, showing up first and getting more views and thus becoming less popular (or more) until stabilizing.
Anyway,
i hope it helps.
PS: Of it would work just the same without the "*100" but i like percentages.
I would value comments more than 'like it's if the content invites a discussion. If it's just stating facts, an equal ration for comments and the like count seems ok (though 10 is a bit too much, I think...)
Does visit take into account the time the user spent somehow? You might use that, as well, as a 2 second view means less than a 3 minute one.
Java code for Anentropic's answer:
public static double getRank(double thumbsUp, double thumbsDown) {
double totalVotes = thumbsUp + thumbsDown;
if (totalVotes > 0) {
return ((thumbsUp + 1.9208) / totalVotes -
1.96 * Math.sqrt((thumbsUp * thumbsDown) / totalVotes + 0.9604) /
totalVotes) / (1 + (3.8416 / totalVotes));
} else {
return 0;
}
}
I need an algorithm that allows me to determine an appropriate <priority> field for my website's sitemap based on the page's views and comments count.
For those of you unfamiliar with sitemaps, the priority field is used to signal the importance of a page relative to the others on the same website. It must be a decimal number between 0 and 1.
The algorithm will accept two parameters, viewCount and commentCount, and will return the priority value. For example:
GetPriority(100000, 100000); // Damn, a lot of views/comments! The returned value will be very close to 1, for example 0.995
GetPriority(3, 2); // Ok not many users are interested in this page, so for example it will return 0.082
You mentioned doing this in an SQL query, so I'll give samples in that.
If you have a table/view Pages, something like this
Pages
-----
page_id:int
views:int - indexed
comments:int - indexed
Then you can order them by writing
SELECT * FROM Pages
ORDER BY
(0.3+LOG10(10+views)/LOG10(10+(SELECT MAX(views) FROM Pages))) +
(0.7+LOG10(10+comments)/LOG10(10+(SELECT MAX(comments) FROM Pages)))
I've deliberately chosen unequal weighting between views and comments. A problem that can arise with keeping an equal weighting with views/comments is that the ranking becomes a self-fulfilling prophecy - a page is returned at the top of the list, so it's visited more often, and thus gets more points, so it's shown at the stop of the list, and it's visited more often, and it gets more points.... Putting more weight on on the comments reflects that these take real effort and show real interest.
The above formula will give you ranking based on all-time statistics. So an article that amassed the same number of views/comments in the last week as another article amassed in the last year will be given the same priority. It may make sense to repeat the formula, each time specifying a range of dates, and favoring pages with higher activity, e.g.
0.3*(score for views/comments today) - live data
0.3*(score for views/comments in the last week)
0.25*(score for views/comments in the last month)
0.15*(score for all views/comments, all time)
This will ensure that "hot" pages are given higher priority than similarly scored pages that haven't seen much action lately. All values apart from today's scores can be persisted in tables by scheduled stored procedures so that the database isn't having to aggregate many many comments/view stats. Only today's stats are computed "live". Taking it one step further, the ranking formula itself can be computed and stored for historical data by a stored procedure run daily.
EDIT: To get a strict range from 0.1 to 1.0, you would motify the formula like this. But I stress - this will only add overhead and is unecessary - the absolute values of priority are not important - only their relative values to other urls. The search engine uses these to answer the question, is URL A more important/relevant than URL B? It does this by comparing their priorities - which one is greatest - not their absolute values.
// unnormalized - x is some page id
un(x) = 0.3*log(views(x)+10)/log(10+maxViews()) +
0.7*log(comments(x)+10)/log(10+maxComments())
// the original formula (now in pseudo code)
The maximum will be 1.0, the minimum will start at 1.0 and move downwards as more views/comments are made.
we define un(0) as the minimum value, i.e. (where views(x) and comments(x) are both 0 in the above formula)
To get a normalized formula from 0.1 to 1.0, you then compute n(x), the normalized priority for page x
(1.0-un(x)) * (un(0)-0.1)
n(x) = un(x) - ------------------------- when un(0) != 1.0
1.0-un(0)
= 0.1 otherwise.
Priority = W1 * views / maxViewsOfAllArticles + W2 * comments / maxCommentsOfAllArticles
with W1+W2=1
Although IMHO, just use 0.5*log_10(10+views)/log_10(10+maxViews) + 0.5*log_10(10+comments)/log_10(10+maxComments)
What you're looking for here is not an algorithm, but a formula.
Unfortunately, you haven't really specified the details of what you want, so there's no way we can provide the formula to you.
Instead, let's try to walk through the problem together.
You've got two incoming parameters, the viewCount and the commentCount. You want to return a single number, Priority. So far, so good.
You say that Priority should range between 0 and 1, but this isn't really important. If we were to come up with a formula we liked, but resulted in values between 0 and N, we could just divide the results by N-- so this constraint isn't really relevant.
Now, the first thing we need to decide is the relative weight of Comments vs Views.
If page A has 100 comments and 10 views, and page B has 10 comments and 100 views, which should have a higher priority? Or, should it be the same priority? You need to decide what's right for your definition of Priority.
If you decide, for example, that comments are 5 times more valuable than views, then we can begin with a formula like
Priority = 5 * Comments + Views
Obviously, this can be generalized to
Priority = A * Comments + B * Views
Where A and B are relative weights.
But, sometimes we want our weights to be exponential instead of linear, like
Priority = Comment ^ A + Views ^ B
which will give a very different curve than the earlier formula.
Similarly,
Priority = Comment ^ A * Views ^ B
will give higher value to a page with 20 comments and 20 views than one with 1 comment and 40 views, if the weights are equal.
So, to summarize:
You really ought to make a spreadsheet with sample values for Views and Comments, and then play around with various formulas until you get one that has the distribution that you are hoping for.
We can't do it for you, because we don't know how you want to value things.
I know it has been a while since this was asked, but I encountered a similar problem and had a different solution.
When you want to have a way to rank something, and there are multiple factors that you're using to perform that ranking, you're doing something called multi-criteria decision analysis. (MCDA). See: http://en.wikipedia.org/wiki/Multi-criteria_decision_analysis
There are several ways to handle this. In your case, your criteria have different "units". One is in units of comments, the other is in units of views. Futhermore, you may want to give different weight to these criteria based on whatever business rules you come up with.
In that case, the best solution is something called a weighted product model. See: http://en.wikipedia.org/wiki/Weighted_product_model
The gist is that you take each of your criteria and turn it into a percentage (as was previously suggested), then you take that percentage and raise it to the power of X, where X is a number between 0 and 1. This number represents your weight. Your total weights should add up to one.
Lastly, you multiple each of the results together to come up with a rank. If the rank is greater than 1, than the numerator page has a higher rank than the denominator page.
Each page would be compared against every other page by doing something like:
p1C = page 1 comments
p1V = page 1 view
p2C = page 2 comments
p2V = page 2 views
wC = comment weight
wV = view weight
rank = (p1C/p2C)^(wC) * (p1V/p2V)^(wV)
The end result is a sorted list of pages according to their rank.
I've implemented this in C# by performing a sort on a collection of objects implementing IComparable.
What several posters have essentially advocated without conceptual clarification is that you use linear regression to determine a weighting function of webpage view and comment counts to establish priority.
This technique is pretty easy to implement for your problem, and the basic concept is described well in this Wikipedia article on linear regression models.
A quick summary of how to apply it to your problem is:
Determine the parameters of the line which best fits the view and comment count data for all your site's webpages, i.e., use linear regression.
Use the line parameters to derive your priority function for the view/count parameters.
Code examples for basic linear regression should not be hard to track down if you don't want to implement it from scratch from basic math formulas (use the web, Numerical Recipes, etc.). Also, any general math software package like Matlab, R, etc., comes with linear regression functions.
The most naive approach would be the following:
Let v[i] the views of page i, c[i] the number of comments for page i, then define the relative view weight for page i to be
r_v(i) = v[i]/(sum_j v[j])
where sum_j v[j] is the total of the v[.] over all pages. Similarly define the relative comment weight for page i to be
r_c(i) = c[i]/(sum_j c[j]).
Now you want some constant parameter p: 0 < p < 1 which indicates the importance of views over comments: p = 0 means only comments are significant, p = 1 means only views are significant, and p = 0.5 gives equal weight.
Then set the priority to be
p*r_v(i) + (1-p)*r_c(i)
This might be over-simplistic but its probably the best starting point.
Background: I receive a long and lat as parameters to a web service. They are typically up to 6 decimal places. When a new request is received, I calculate the distance between the last recorded loc and the long/lat in the params of the request. If the distance is greater than a certain threshold of miles apart, I update the current loc.
Problem: I use the geokit gem/plugin to calculate the distance between the locs. Very rarely, a bug shows up (the zero distance bug mentioned on the author's site - I'm using 1.4.1 which claims the bug is fixed, but I still see it occurring shrug) that causes the distance calc to return something wildly inaccurate when calculating the distance between two points that are identical (this occurs if the user is not moving). This is causing updates to the user's current loc that should not be happening. You're probably wondering - well if it's just updating the loc to be exactly the same coordinates, who cares? Well, the answer is that a bunch of other crap occurs when the loc is updated that makes it an actual issue.
Attempted Solution: I tried to add in logic to manually check if the two locs are identical before calculating the distance and then just skipping the calc and not updating if that is the case. The incoming parameters are long/lats with 6 decimal precision; whereas, in my database, I store the values as floats, which appears to only store 4 decimal places. This is causing my float comparison to always fail and the inappropriate loc updates continue to occur.
Phew, ok so the actual question is: How should I perform this comparison? Should I truncate 2 of the decimal places from the incoming lat/longs, round up somehow so the fourth digit is correct and then compare? Or, should I do a "within a certain range" sort of comparison (e.g. reported_loc.long > current_loc.long - .0001 && reported_loc.long < current_loc.long + .0001)? Also any recommendations for existing ruby gem/plugins or built in functions to do this sort of thing would be much appreciated.
Here is sample output from the log:
[update_loc] Last location history record at lat: 41.5024, long: -81.6816
[update_loc] Current loc at lat: 41.502467, long: -81.681623
[update_loc] Distance from current loc and last loc history: 5795.10615113555 miles
[update_loc] Locs not identical and distance greater than threshold, inserting new loc history
[update_loc] Location update complete
Thank you
Tom
The usual way to test if two numbers are close is to use abs, ie,
(reported_loc.long - current_loc.long).abs <= tol
where tol is some pre-specified tolerance, eg, 0.0001.
A GPS receiver can give you a location with a precision of many decimals, but that doesn't mean that the measurement is actually that accurate.
Usually about 95% of the measurements lie within a circle of a couple of meters, which is about the same accuracy that you can store with a 32bits float.
However, you will clearly notice rounding errors when you plot a gps log as floats on a map.
Anyway, to do this comparison, I'd use a range instead of rounding the incoming value and comparing that to the database value. You won't be able to detect the smallest movements anymore, but at least you won't get false positives anymore either.
As you're using floats to store stuff, you clearly don't care about millimeters or centimeters anyway.