Suggest an algorithm/method for finding a proper value - algorithm

I have a bunch of values, for example: [1,2,14,51,100,103,107,110,300,505,1034].
I also have a set of pattern values, for example [1,10,20,100,500,1000].
I need to pick the most suitable value from the pattern. In my example it is 100. How can I detect this value?
A real-life example: the app has a bunch of distances between the user's position and some objects. The app also has a preset filter by distance: [1 meter, 10 meters, 20 meters, 100 meters]. I need to set the default filter not just to the first value (1 meter in my example), but to the value that best matches the bunch of distances (100 meters in my example). I need to detect one value.
Thank you for help and any ideas.

I would create a function like this (this is not real code):
var ratio1 = 0.66
var ratio2 = 1.5
function Score(currentPatternValue, arrayOfValues)
{
    count = 0
    for each value in arrayOfValues
        if value > ratio1 * currentPatternValue AND value < ratio2 * currentPatternValue
            count++
    return count
}
Then run this for each value in your pattern values and pick the one with the highest score returned from that function.
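The scoring idea above can be sketched in Python roughly as follows. The function names and the tie-breaking rule (the first pattern value with the highest score wins) are assumptions for illustration; the ratios 0.66 and 1.5 are the ones from the pseudocode.

```python
def score(pattern_value, values, low=0.66, high=1.5):
    """Count the values that fall strictly inside (low * p, high * p)."""
    return sum(1 for v in values if low * pattern_value < v < high * pattern_value)


def best_pattern_value(pattern, values):
    """Return the pattern value with the highest score (first one wins ties)."""
    return max(pattern, key=lambda p: score(p, values))


values = [1, 2, 14, 51, 100, 103, 107, 110, 300, 505, 1034]
pattern = [1, 10, 20, 100, 500, 1000]
print(best_pattern_value(pattern, values))  # 100 for the question's data
```

With the question's data, four values (100, 103, 107, 110) fall in the window around 100, while every other pattern value scores 1.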

Related

How do I add noise/variability to a dataset in Python, given the CV?

Given a dataset of blood results, say cholesterol level, and knowing that the instrument that produced those results is subject to a known degree of variability, how would I add that variability back into the dataset? i.e. I want to assume the result in the original dataset is the true/mean value, and then produce new results that are subject to the known variability of the instrument.
In Excel you use =NORM.INV(RAND(), mean, std_dev), where RAND() provides a random value between 0 and 1, "mean" will be the original value and I have the CV so I can calculate the SD. NORM.INV then provides the inverse of the cumulative normal distribution function.
I've done the following to create a new column with my new values, but would like to know if it is valid, i.e., will each row have a different random number between 0 and 1 as the probability, and is this formula equivalent to NORM.INV?
df8000['HDL_1'] = norm.ppf(random(), loc = df8000['HDL_0'], scale = TAE_df.loc[0,'HDL'])
Thanks in advance!
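As a point of comparison, here is a minimal standard-library sketch of the Excel formula applied row by row. It assumes the CV is given as a fraction (0.05 for 5%) and that the SD is CV times the original value. The function name `add_instrument_noise` is hypothetical. `statistics.NormalDist.inv_cdf` plays the role of NORM.INV. Note that a call like `random()` without a size argument yields a single number, so the sketch draws a fresh probability for every row.

```python
import random
from statistics import NormalDist


def add_instrument_noise(values, cv, rng=None):
    """Return one noisy result per input value, treating each value as the mean.

    cv is the coefficient of variation as a fraction (assumption); values
    must be positive so that the derived SD is positive.
    """
    rng = rng or random.Random()
    noisy = []
    for mean in values:
        sd = cv * mean            # SD derived from the CV
        p = rng.random()          # fresh probability in [0, 1) for this row
        while p == 0.0:           # inv_cdf requires 0 < p < 1
            p = rng.random()
        noisy.append(NormalDist(mu=mean, sigma=sd).inv_cdf(p))
    return noisy


print(add_instrument_noise([50.0, 60.0, 70.0], cv=0.05))
```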

ElasticSearch Script Field return incorrect value

I have a document whose cd field is 44.4.
When I use script_fields as below:
"doc['cd'].value + 1"
the script field returns 45.400001525878906.
How can I make the script field return 45.4?
To round your floating point values, use the Math.round() function.
The following returns the value rounded to two decimal places:
Math.round(doc['your_custom_type_var'].value * 100.0)/100.0
To round to three decimal places, change the factor like this:
Math.round(doc['your_custom_type_var'].value * 1000.0)/1000.0
For your case, do the following:
Math.round((doc['cd'].value + 1) * 10.0 - 0.5 )/10.0 // the -0.5 makes this round down: 45.400001525878906 returns 45.4, and 45.601 and 45.691 both return 45.6
Notes
Math.round() returns the closest integer to its argument. For example:
Math.round(45.40000152) // will return the value 45
Math.round(45.60000152) // will return the value 46
Subtracting 0.5 from the number before rounding makes the result round down instead of to the nearest value.
First we multiply the value by 10.0 to move the decimal point one place to the right (454.0000152 for the value above). Rounding then cuts off the fractional part (giving 454), and dividing by 10.0 moves the decimal point back, which gives the value with one decimal place (45.4).
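The reported value is consistent with the field being stored as a 32-bit float, which is an assumption here since the mapping is not shown. The following Python sketch reproduces the effect and the multiply, round, divide steps. The helper names are hypothetical, and `math.floor(x + 0.5)` is used to mirror round-half-up behavior because Python's built-in `round` rounds halves to even.

```python
import math
import struct


def as_float32(x):
    """Round-trip x through a 32-bit float, as a float-typed field would store it."""
    return struct.unpack("f", struct.pack("f", x))[0]


def round_half_up(x, places):
    """Multiply, round to the nearest integer, then divide back."""
    factor = 10 ** places
    return math.floor(x * factor + 0.5) / factor


stored = as_float32(44.4)   # no longer exactly 44.4 once stored as float32
result = stored + 1         # the extra digits show up after the addition
print(result, round_half_up(result, 1))
```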
I hope this helps.

Hashing a long integer ID into a smaller string

Here is the problem: I need to transform an ID (defined as a long integer) into a shorter alphanumeric identifier. The details are the following:
Each individual in the problem has a unique ID, a long integer of size 13 (something like 123123412341234).
I need to generate a smaller representation of this unique ID, an alphanumeric string, something like A1CB3X. The problem is that a length of 5 or 6 characters will not be enough to represent such a large integer.
The new ID (eg A1CB3X) should be valid in a context where we know that only a small number of individuals are present (less than 500). The new ID should be unique within that small set of individuals.
The new ID (eg A1CB3X) should be the result of a calculation made over the original ID. This means that taking the original ID elsewhere and applying the same calculation, we should get the same new ID (eg A1CB3X).
This calculation should occur when the individual is added to the set, meaning that not all individuals belonging to that set will be known at that time.
Any directions on how to solve such a problem?
Assuming that you don't need a formula that goes in both directions (which is impossible if you are reducing a 13-digit number to a 5 or 6-character alphanum string):
If you can have up to 6 alphanumeric characters, that gives you 36^6 = 2,176,782,336 possibilities, assuming only numbers and uppercase letters.
To map your larger 13-digit number onto this space, you can take it modulo some prime number slightly smaller than that, for example 2,176,782,317, then encode it with base-36 encoding.
alphanum_id = base36encode(longnumber_id % 2176782317)
For a set of 500, and assuming the IDs are spread evenly over the modulus, this gives you a
1 - (2176782317 P 500) / 2176782317^500 chance of a collision
(P is permutation), which is roughly 0.006%.
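A minimal Python sketch of the modulo-then-encode idea, together with the collision estimate, might look like this. `base36encode` is written out here because the answer leaves it undefined. The modulus is the value quoted above (its primality is not checked in this sketch), and the estimate assumes the reduced IDs behave like uniform random draws.

```python
import math

ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"
MODULUS = 2176782317  # value from the answer; just below 36**6


def base36encode(n):
    """Encode a non-negative integer using digits and uppercase letters."""
    if n == 0:
        return "0"
    digits = []
    while n > 0:
        n, remainder = divmod(n, 36)
        digits.append(ALPHABET[remainder])
    return "".join(reversed(digits))


def short_id(long_id):
    """Reduce the long ID modulo MODULUS, then encode (at most 6 characters)."""
    return base36encode(long_id % MODULUS)


def collision_probability(k, n):
    """1 - P(n, k) / n**k, computed via logarithms to avoid huge integers."""
    log_no_collision = sum(math.log1p(-i / n) for i in range(k))
    return -math.expm1(log_no_collision)


print(short_id(123123412341234), collision_probability(500, MODULUS))
```

Because the mapping is many-to-one, two different long IDs can still produce the same short ID; the estimate only says this is unlikely within a set of 500.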
Best option is to change the base to 62 using case sensitive characters
If you want it to be shorter, you can add unicode characters. See below.
Here is javascript code for you: https://jsfiddle.net/vewmdt85/1/
function compress(n) {
    // Alphabet of single-character symbols; its length is the numeric base.
    var symbols = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyzÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïð'.split('');
    var d = Number(n); // the input field delivers a string
    var compressed = '';
    while (d >= 1) {
        // Prepend the symbol for the current least significant digit.
        compressed = symbols[d % symbols.length] + compressed;
        d = Math.floor(d / symbols.length);
    }
    return compressed;
}
// Recompute the compressed value whenever the input changes (uses jQuery).
$('input').keyup(function() {
    $('span').html(compress($(this).val()));
});
$('span').html(compress($('input').val()));
How about using some base-X conversion, for example 123123412341234 becomes 17N644R7CI in base-36 and 9999999999999 becomes 3JLXPT2PR?
If you need a mapping that works in both directions, you can simply go for a larger base.
Meaning: using base 16, you can reduce the values 0 to 15 to a single character.
So, with only digits and uppercase letters, base 36 is the "maximum" and gives the shortest strings when a 1-1 mapping is required.
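To illustrate the reversible variant, here is a small Python sketch of a generic base conversion with a matching decoder. The alphabet ordering (digits, uppercase, lowercase) and the function names are assumptions; any fixed alphabet works as long as both sides agree on it.

```python
import string

BASE62 = string.digits + string.ascii_uppercase + string.ascii_lowercase
BASE36 = BASE62[:36]  # digits and uppercase letters only


def encode(n, alphabet):
    """Convert a non-negative integer to a string in base len(alphabet)."""
    if n == 0:
        return alphabet[0]
    base = len(alphabet)
    digits = []
    while n > 0:
        n, remainder = divmod(n, base)
        digits.append(alphabet[remainder])
    return "".join(reversed(digits))


def decode(text, alphabet):
    """Inverse of encode: rebuild the integer from its digits."""
    base = len(alphabet)
    n = 0
    for ch in text:
        n = n * base + alphabet.index(ch)
    return n


print(encode(123123412341234, BASE36), encode(123123412341234, BASE62))
```

A 13-digit number needs 9 characters in base 36 and 8 in base 62, so a reversible mapping cannot reach the 5 or 6 characters the question asks for with these alphabets.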

calculate standard deviation of daily data within a year

I have a question,
In Matlab, I have a vector of 20 years of daily data (X) and a vector of the relevant dates (DATES). In order to find the mean value of the daily data per year, I use the following script:
A = fints(DATES,X); %convert to financial time series
B = toannual(A,'CalcMethod', 'SimpAvg'); %calculate average value per year
C = fts2mat(B); %Convert fts object to vector
C is a vector of 20 values, showing the average value of the daily data for each of the 20 years. So far, so good. Now I am trying to do the same thing, but instead of calculating mean values annually, I need to calculate the standard deviation annually, and it seems there is no such option in the function "toannual".
Any ideas on how to do this?
THANK YOU IN ADVANCE
I'm assuming that X is the financial information and that it is evenly distributed across the years. You'll have to modify this if that isn't the case. Just to clarify, by evenly distributed I mean that if there are 20 years and X has 200 values, each year has 10 values.
You should be able to do something like this:
num_years = length(C);
span_size = length(X)/num_years;
for n = 0:num_years-1
std_dev(n+1,1) = std(X(1+(n*span_size):(n+1)*span_size));
end
The idea is that you simply pass the data for the given year (the day-to-day values) into MATLAB's standard deviation function. That will return the standard deviation for that year. std_dev should be a column vector that corresponds 1:1 with your C vector of yearly averages.
If the years do not all contain the same number of values, you can group by year instead:
dates_year = year(DATES); %Extract the year of each daily date.
unique_years = unique(dates_year); %This should return a vector of 20 elements since you have 20 years.
std_dev = zeros(size(unique_years)); %Just preallocating the standard deviation vector.
for n = 1:length(unique_years)
std_dev(n) = std(X(dates_year==unique_years(n)));
end
This assumes that your DATES vector can be passed to the year function and that unique then returns the expected list of years. If you have the dates in numeric form this should work; I'm just concerned about the dates being in string form.
If they are in string form, you can look at using regexp to parse out the year, replace matching dates with a numeric identifier, and use the above code. Or you can take the basic idea behind this and adapt it to what works best for you!
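For readers outside MATLAB, the group-by-year idea can be sketched in Python with the standard library only. The function name is hypothetical, the dates are assumed to be `datetime.date` objects, and `statistics.stdev` uses the N-1 normalization, which matches the default of MATLAB's std.

```python
from collections import defaultdict
from datetime import date
from statistics import stdev


def yearly_std(dates, values):
    """Group daily values by calendar year and return {year: sample std}."""
    by_year = defaultdict(list)
    for d, v in zip(dates, values):
        by_year[d.year].append(v)
    # stdev needs at least two observations per year
    return {year: stdev(vs) for year, vs in sorted(by_year.items())}


dates = [date(2000, 1, 1), date(2000, 1, 2), date(2000, 1, 3),
         date(2001, 1, 1), date(2001, 1, 2)]
values = [1.0, 2.0, 3.0, 10.0, 14.0]
print(yearly_std(dates, values))
```

Unlike the fixed-span loop, this does not require every year to hold the same number of observations, so leap years and missing days are handled naturally.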

Conditional Count inside of Group in .rdlc?

I have a .rdlc report, grouped.
Inside each group, I have an Id. Some of them will be positive, and others will be negative.
I need the difference between the number of positive Ids and the number of negative Ids.
Something like
=CountDistinct(Fields!Id.Value) where Fields!Id.Value > 0 - CountDistinct(Fields!Id.Value) where Fields!Id.Value < 0
How can I do that? I'm thinking of a function, but I want to know if there is a simpler way.
Edit: An Id can appear more than once in each group, which is why I use CountDistinct.
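You can try this for the count of positive Ids:
CountDistinct(IIf(Fields!Id.Value > 0, Fields!Id.Value, Nothing))
Use the same expression with < 0 for the negative Ids and subtract the second count from the first.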
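To make the intended result concrete, here is a small Python sketch of the same calculation for one group's Ids. The function name is hypothetical; zero is counted as neither positive nor negative, matching the > 0 and < 0 conditions in the question.

```python
def positive_negative_difference(ids):
    """Distinct positive Ids minus distinct negative Ids within one group."""
    distinct = set(ids)  # duplicates inside the group count once
    positives = sum(1 for i in distinct if i > 0)
    negatives = sum(1 for i in distinct if i < 0)
    return positives - negatives


print(positive_negative_difference([1, 1, 2, -3, -3, 0, 5]))  # 3 - 1 = 2
```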
Create 2 global variables, one for positive and one for negative.
Then create a new formula that counts them like the following:
WhilePrintingRecords;
IF (GroupName ({your_group_name}) > 0) THEN
Positive var = Positive var + 1;
ELSE
Negative var = Negative var + 1;
You can actually look for your group in the formulas and drag it to the editor while writing the formula.
Since it's an operation at the group level, the records should be read first. That's why we use WhilePrintingRecords rather than WhileReadingRecords.
I hope I understood your question correctly.
