Negative Boost in Solr - sorting

I have a 'charges' field in my index. I want to boost results whose charges value is not equal to 0. I tried using the bq parameter for this, but it did not work.
&bq=charges:"0"^-1
With the above, however, I got a 400 error response.

In addition to the answer by #harmstyler:
Instead of boosting negatively, you can boost the non-zero values positively (if charges is an integer field), e.g.
bq=charges:[1 TO *]^10
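The range query contains spaces, brackets and a caret, so it is safer to URL-encode it when building the request by hand. A minimal Python sketch, assuming a local Solr at localhost:8983, a hypothetical collection named my_collection, and the edismax parser (none of these come from the question):

```python
from urllib.parse import urlencode

def build_select_url(base_url, params):
    """Return a Solr /select URL with all parameter values percent-encoded."""
    return base_url + "/select?" + urlencode(params)

# Host, collection name and defType are assumptions made for illustration only.
url = build_select_url(
    "http://localhost:8983/solr/my_collection",
    {"q": "*:*", "defType": "edismax", "bq": "charges:[1 TO *]^10"},
)
print(url)
```

The printed URL can be pasted into a browser or passed to any HTTP client.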

This is an old post, but the answers are out of date: negative boosts are now supported.
From the Solr documentation on negative boosts:
Negative query boosts have been supported at the "Query" object level for a long time (resulting in negative scores for matching documents). Now the QueryParsers have been updated to handle this too.
The part about "resulting in negative scores for matching documents" might not always be true, as explained below.
Example usage: suppose your collection is named product_collection and you want to bury (negatively boost) products of a specific brand:
http://localhost:8983/solr/product_collection/select?q=shoes&bq=brand:puma^-2&defType=dismax
This query will be parsed to:
"parsedquery_toString": "+((keyword:shoes)^1.0) () (brand:puma)^-2.0"
In this case, the factor -2 is multiplied into the relevance score of the (brand:puma) match, resulting in a lower score for documents containing the brand puma.
However, adding a negative factor to a boost query does not mean the document's final score will always be negative. For instance, if a document's score for the keyword:shoes match is 3.0 and its boosted score for brand:puma is -1.5, the overall score is still 1.5 (positive). So choose the negative boost factor accordingly.
One such example from my own collection:
"\n3.4329534 = sum of:\n 6.151505 = weight(keyword:shoes in 5786) [SchemaSimilarity], result of:\n 6.151505 = score(doc=5786,freq=1.0 = termFreq=1.0\n), product of:\n 4.2804184 = idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)) from:\n 199.0 = docFreq\n 14417.0 = docCount\n 1.437127 = tfNorm, computed as (freq * (k1 + 1)) / (freq + k1 * (1 - b + b * fieldLength / avgFieldLength)) from:\n 1.0 = termFreq=1.0\n 1.2 = parameter k1\n 0.75 = parameter b\n 7.7978773 = avgFieldLength\n 2.0 = fieldLength\n -2.7185516 = weight(brand:puma in 5786) [SchemaSimilarity], result of:\n -2.7185516 = score(doc=5786,freq=1.0 = termFreq=1.0\n), product of:\n -2.0 = boost\n 1.3592758 = idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)) from:\n 3704.0 = docFreq\n 14422.0 = docCount\n 1.0 = tfNorm, computed as (freq * (k1 + 1)) / (freq + k1) from:\n 1.0 = termFreq=1.0\n 1.2 = parameter k1\n 0.0 = parameter b (norms omitted for field)\n",
Score of keyword:shoes = 6.151505
Score of brand:puma = -2.7185516
This results in a positive overall score of 3.4329534.
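The arithmetic in that explain output can be replayed by hand. The sketch below only multiplies and adds the numbers copied from the output above; it is not how Solr computes scores internally, just a check of how the pieces combine:

```python
# Values copied from the explain output above.
idf_shoes, tf_norm_shoes = 4.2804184, 1.437127
boost_puma, idf_puma, tf_norm_puma = -2.0, 1.3592758, 1.0

score_shoes = idf_shoes * tf_norm_shoes             # main query clause
score_puma = boost_puma * idf_puma * tf_norm_puma   # negatively boosted bq clause
total = score_shoes + score_puma                    # "sum of" in the explain output

print(round(score_shoes, 4), round(score_puma, 4), round(total, 4))
```

The boosted clause contributes a negative amount, but the sum stays positive because the main clause is larger in magnitude.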

Negative boosts are not supported by Solr. That said, you can boost your content with a very low number to give it the effect of a negative boost. Remember that ^1 is the default boost, so &bq=charges:"0"^1 is the same as &bq=charges:"0".
If you want to create a 'negative boost', try &bq=charges:"0"^0.8. For full documentation, view this article.


How the property of modulus (A*B)%m = (A%m * B%m) %m is used to find the mod of very large numbers

I saw this property of mod:
(A*B)%m = (A%m * B%m) %m
This property is used in the algorithm below to find the mod of very large numbers:
Initialize a variable for the answer to zero.
Scan the string from left to right.
At each digit, multiply the answer by 10, add the digit, take the result modulo m, and store it as the new answer.
But I'm unable to understand this algorithm. How is the property connected to it?
It would help to see an example that shows the math behind the algorithm, for example 12345%100.
Using this algorithm, 23 % k is computed as
(2%k * 10 + 3)%k
((2%k * 10)%k + 3)%k // because (a+b)%k = (a%k + b)%k (1)
(((2%k)%k * 10%k)%k + 3)%k // because (a*b)%k = (a%k * b%k)%k (2)
((2%k * 10%k)%k + 3)%k // because (a%k)%k = a%k (trivial)
((2 * 10)%k + 3)%k // because (a%k * b%k)%k = (a*b)%k (2)
(2 * 10 + 3)%k // because (a%k + b)%k = (a+b)%k (1)
23%k
In other words, (a%k * p + b)%k = (a * p + b)%k thanks to that property (2). b is the last digit of a number in base p (p = 10 in your example), and a is the rest of the number (all the digits but the last).
In my example, a is just 2, but if you apply this recursively, you have your algorithm. The point is that a * p + b might be too big to handle, but a%k * p + b probably isn't.
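Applied digit by digit, the identity gives the loop described in the question. A minimal Python sketch (the function name is made up for illustration; Python integers never overflow, so this only demonstrates the idea that matters in fixed-width languages):

```python
def mod_of_digit_string(digits, k):
    """Return int(digits) % k while only ever storing values below 10 * k."""
    answer = 0
    for ch in digits:
        # answer already equals (prefix % k), so by the identity above
        # (answer * 10 + d) % k equals (prefix * 10 + d) % k.
        answer = (answer * 10 + int(ch)) % k
    return answer

print(mod_of_digit_string("12345", 100))
```

For 12345 % 100 the running answer goes 1, 12, 23, 34, 45, which matches the last two digits of the number.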

Why does finding the eigenvalues of a 4*4 matrix by z3py take so much time and not give any solutions?

I'm trying to calculate the eigenvalues of a 4*4 matrix called A in my code (I know that the eigenvalues are real). All the elements of A are z3 expressions that have to be calculated from earlier constraints. The code below is the last part of a longer program that first calculates the matrix A and then its eigenvalues. The program is written as a whole, but I've split it into two parts in order to debug it: part 1 finds the matrix A, and part 2 calculates the eigenvalues. Part 1 runs very fast and calculates A in less than a second, but when I add part 2, the program no longer gives me any solution.
What could be the reason? Is it the order of the polynomial (which is 4), or something else? I would appreciate it if anyone could suggest an alternative way to calculate the eigenvalues, or give me some hints on how to rewrite the code so that it can solve the problem.
(Note that A2 in the actual code is a matrix whose elements are all z3 expressions defined by earlier constraints. Here I've defined the elements as real values just to make the code executable. With these values the code returns a solution quickly, but in the real situation it takes very long, on the order of days.
For example, one of the elements of A looks roughly like this:
0 +
1*Vq0__1 +
2 * -Vd0__1 +
0 +
((5.5 * Iq0__1 - 0)/64/5) *
(0 +
0 * (Vq0__1 - 0) +
-521702838063439/62500000000000 * (-Vd0__1 - 0)) +
((.10 * Id0__1 - Etr_q0__1)/64/5) *
(0 +
521702838063439/62500000000000 * (Vq0__1 - 0) +
0.001 * (-Vd0__1 - 0)) +
0 +
0 + 0 +
0 +
((100 * Iq0__1 - 0)/64/5) * 0 +
((20 * Id0__1 - Etr_q0__1)/64/5) * 0 +
0 +
-5/64
All the variables in this example are z3 variables.)
from z3 import *
import numpy as np

def sub(*arg):
    counter = 0
    for matrix in arg:
        if counter == 0:
            counter += 1
            Sub = []
            for i in range(len(matrix)):
                Sub1 = []
                for j in range(len(matrix[0])):
                    Sub1 += [matrix[i][j]]
                Sub += [Sub1]
        else:
            row = len(matrix)
            colmn = len(matrix[0])
            for i in range(row):
                for j in range(colmn):
                    Sub[i][j] = Sub[i][j] - matrix[i][j]
    return Sub

Landa = RealVector('Landa', 2)  # Eigenvalues considered as real values
LandaI0 = np.diag([Landa[0] for i in range(4)]).tolist()
ALandaz3 = RealVector('ALandaz3', 4 * 4)

############# Building ( A - \lambda * I ) to find the eigenvalues ############
A2 = [[1, 2, 3, 4],
      [5, 6, 7, 8],
      [3, 7, 4, 1],
      [4, 9, 7, 1]]

s = Solver()
for i in range(4):
    for j in range(4):
        s.add(ALandaz3[4 * i + j] == sub(A2, LandaI0)[i][j])

ALanda = [[ALandaz3[0], ALandaz3[1], ALandaz3[2], ALandaz3[3]],
          [ALandaz3[4], ALandaz3[5], ALandaz3[6], ALandaz3[7]],
          [ALandaz3[8], ALandaz3[9], ALandaz3[10], ALandaz3[11]],
          [ALandaz3[12], ALandaz3[13], ALandaz3[14], ALandaz3[15]]]

Determinant = (
    ALandaz3[0] * ALandaz3[5] * (ALandaz3[10] * ALandaz3[15] - ALandaz3[14] * ALandaz3[11]) -
    ALandaz3[1] * ALandaz3[4] * (ALandaz3[10] * ALandaz3[15] - ALandaz3[14] * ALandaz3[11]) +
    ALandaz3[2] * ALandaz3[4] * (ALandaz3[9] * ALandaz3[15] - ALandaz3[13] * ALandaz3[11]) -
    ALandaz3[3] * ALandaz3[4] * (ALandaz3[9] * ALandaz3[14] - ALandaz3[13] * ALandaz3[10]))

tol = 0.001
s.add(And(Determinant >= -tol, Determinant <= tol))  # giving some flexibility instead of equalling to zero

print(s.check())
print(s.model())
Note that you seem to be using Z3 for a type of equation it isn't meant for. Z3 is a SAT/SMT solver. Such a solver works internally with a huge number of boolean equations. Integers and fractions can be converted to boolean expressions, but with general floats Z3 quickly reaches its limits. See here and here for many typical examples, and note how floats are avoided.
Z3 can work with floats in a limited way by converting them to fractions, but it doesn't work with approximations and accuracies as is needed in numerical algorithms. Therefore, the results are usually not what you are hoping for.
Finding eigenvalues is a typical numerical problem, where accuracy issues are very tricky. Python has libraries such as numpy and scipy to deal with those efficiently. See e.g. numpy.linalg.eig.
If, however, your A2 matrix contains symbolic expressions (and uses fractions instead of floats), sympy's matrix functions could be an interesting alternative.
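As a rough illustration of the numerical route, here is a minimal numpy sketch. It assumes the entries of A have already been reduced to concrete numbers (for instance, read out of the model found in part 1); the matrix used is the placeholder A2 from the question:

```python
import numpy as np

A2 = np.array([[1, 2, 3, 4],
               [5, 6, 7, 8],
               [3, 7, 4, 1],
               [4, 9, 7, 1]], dtype=float)

# eigvals returns only the eigenvalues; np.linalg.eig also returns eigenvectors.
eigenvalues = np.linalg.eigvals(A2)
print(eigenvalues)

# Sanity checks: eigenvalues sum to the trace and multiply to the determinant.
print(np.isclose(eigenvalues.sum(), np.trace(A2)))
print(np.isclose(np.prod(eigenvalues), np.linalg.det(A2)))
```

For a general non-symmetric matrix the result may contain complex values, so it is worth checking the imaginary parts before treating the eigenvalues as real.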

15 digit floating variable calculation in microcontroller

I want to evaluate an equation on a microcontroller (Arduino):
y = -0.0000000104529251928664x^3 + 0.0000928316793270531x^2 - 0.282333029643959x + 297.661280719026
The decimal places of the coefficients matter because "x" varies in the thousands, so the cubic term cannot be ignored. I have tried manipulating the equation in Excel to shorten the coefficients, but R^2 degrades in the process and I would like to avoid that.
The largest variable size available on Arduino is 4 bytes, and I was not able to find an appropriate solution through a Google search.
Thank you for your time.
Since
-0.0000000104529251928664 ^ (1/3) = - 0.0021864822
0.0000928316793270531 ^ (1/2) = 0.00963491978
The formula
y = -0.0000000104529251928664x^3 + 0.0000928316793270531x^2 - 0.282333029643959x + 297.661280719026
Can be rewritten:
y = -(0.0021864822 * x)^3 + (0.00963491978 * x)^2 - 0.282333029643959 * x + 297.661280719026
Rounding all coefficients to 10 decimal places, we get:
y = -(0.0021864822 * x)^3 + (0.00963491978 * x)^2 - 0.2823330296 * x + 297.6612807
However, I don't know Arduino, so I'm not sure what the correct number of decimal places is, or what the compiler will accept or refuse.
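The rewritten formula can be compared against the original off-device. The sketch below is Python, whose floats are 64-bit doubles, so it only checks the algebra and the rounding of the coefficients; it says nothing about how a 4-byte float on the Arduino would behave. The x range of 1000 to 10000 is an assumption based on "x varies in thousands":

```python
def y_original(x):
    return (-0.0000000104529251928664 * x**3 + 0.0000928316793270531 * x**2
            - 0.282333029643959 * x + 297.661280719026)

def y_rewritten(x):
    # Coefficients moved inside the powers, then rounded as in the answer.
    return (-(0.0021864822 * x)**3 + (0.00963491978 * x)**2
            - 0.2823330296 * x + 297.6612807)

# Largest absolute disagreement over the assumed range of x.
max_diff = max(abs(y_original(x) - y_rewritten(x)) for x in range(1000, 10001, 100))
print(max_diff)
```

A small max_diff confirms that the two forms agree for the chosen range; repeating the comparison with single-precision arithmetic would be the next step before trusting it on the device.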

Calculate cash flows given a target IRR

I apologize if the answer for this is somewhere already, I've been searching for a couple of hours now and I can't find what I'm looking for.
I'm building a simple financial calculator to calculate the cash flows given the target IRR. For example:
I have an asset worth $18,000,000 (which depreciates at $1,000,000/year)
I have a target IRR of 10% after 5 years
This means that the initial investment is $18,000,000, and in year 5, I will sell this asset for $13,000,000
To reach my target IRR of 10%, the annual cash flows have to be $2,618,875. Right now, I calculate this by hand in an Excel sheet through guess-and-check.
There are other variables and functionality, but they're not important for what I'm trying to do here. I've found plenty of libraries and functions that calculate the IRR for a given series of cash flows, but nothing that gives the cash flow for a given IRR.
At this point, I think the only solution is to basically run a loop to plug in the values, check to see if the IRR is higher or lower than the target IRR, and keep calculating the IRR until I get the cash flow that I want.
Is this the best way to approach this particular problem? Or is there a better way to tackle it that I'm missing? Help greatly appreciated!
Also, as an FYI, I'm building this in Ruby on Rails.
EDIT:
IRR Function:
NPV = -(I) + CF[1]/(1 + R)^1 + CF[2]/(1 + R)^2 + ... + CF[n]/(1 + R)^n
NPV = the Net Present Value (this value needs to be as close to 0 as possible)
I = Initial investment (in this example, $18,000,000)
CF = Cash Flow (this is the value I'm trying to calculate - it would end up being $2,618,875 if I calculated it by hand. In my financial calculator, all of the cash flows would be the same since I'm solving for them.)
R = Target rate of return (10%)
n = the year (so this example would end at 5)
I'm trying to calculate the Cash Flows to within a .005% margin of error, since the numbers we're working with are in the hundreds of millions.
Let
v0 = initial value
vn = value after n periods
n = number of periods
r = annual rate of return
y = required annual net income
The one period discount factor is:
j = 1/(1+r)
The present value of the investment is:
pv = - v0 + j*y + j^2*y + j^3*y +..+ j^n*y + j^n*vn
= - v0 + y*(j + j^2 + j^3 +..+ j^n) + j^n*vn
= - v0 + y*sn + j^n*vn
where
sn = j + j^2 + j^3 + j^4 +..+ j^n
We can calculate sn as follows:
sn = j + j^2 + j^3 + j^4 +..+ j^n
j*sn = j^2 + j^3 + j^4 +..+ j^n + j^(n+1)
sn -j*sn = j*(1 - j^n)
sn = j*(1 - j^n)/(1-j)
= (1 - j^n)/[(1+r)(r/(1+r))]
= (1 - j^n)/r
Set pv = 0 and solve for y:
y*sn = v0 - vn * j^n
y = (v0 - vn * j^n)/sn
= r * (v0 - vn * j^n)/(1 - j^n)
Our Ruby method:
def ann_ret(v0, vn, n, r)
  j = 1/(1+r)
  (r * (v0 - vn * j**n)/(1 - j**n)).round(2)
end
With annual compounding:
ann_ret(18000000, 13000000, 5, 0.1) # => 2618987.4
With semi-annual compounding:
2 * ann_ret(18000000, 13000000, 10, 0.05) # => 2595045.75
With daily compounding:
365 * ann_ret(18000000, 13000000, 5*365, 0.10/365) # => 2570881.20
These values differ slightly from the required annual return you calculate. You should be able to explain the difference by comparing present value formulae.
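As a cross-check of the closed form, the result can be plugged back into the present value formula. A minimal Python sketch mirroring the derivation above (function names are made up for illustration; r must be non-zero):

```python
def annual_cash_flow(v0, vn, n, r):
    """Level cash flow y that makes the present value zero at rate r."""
    j = 1.0 / (1.0 + r)  # one-period discount factor
    return r * (v0 - vn * j**n) / (1.0 - j**n)

def present_value(v0, vn, n, r, y):
    """pv = -v0 + y*(j + j^2 + ... + j^n) + j^n * vn, summed term by term."""
    j = 1.0 / (1.0 + r)
    return -v0 + sum(y * j**t for t in range(1, n + 1)) + vn * j**n

y = annual_cash_flow(18_000_000, 13_000_000, 5, 0.10)
pv = present_value(18_000_000, 13_000_000, 5, 0.10, y)
print(round(y, 2), round(pv, 6))
```

Because the formula is exact, no guess-and-check loop is needed when all cash flows are equal; pv comes out as zero up to floating-point error.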
There's a module called Newton in Ruby; it uses the Newton-Raphson method.
I've used this module to implement the IRR function in this library:
https://github.com/Noverde/exonio
If you need the IRR, you can use it like this:
Exonio.irr([-100, 39, 59, 55, 20]) # ==> 0.28095

Suggest Ranking algorithm for Multi User Sortable Lists

I'm building a site where I am giving users the ability to drag and drop to order a list of items to rank them for their "personal view." They can optionally remove an item to hide it from their "personal view."
My question is how can I fairly implement a ranking algorithm to determine the ordering of the items for a shared view that doesn't penalize new items.
It would also help if that can also be used to rank where new items would show up in a users personal list.
So if a new item comes along that is highly ranked by other users, we could display it where we predict the user would rank it related to their other rankings.
My initial thought is to give each item ranked by a user points based on its position in that user's list (e.g. if there are 10 items, give rank 1 10 pts, rank 2 9 pts, etc., with negative points for items hidden by the user), and to sort the shared view by total points. But this would not work well for new items, which are largely unranked and would not easily move up the ladder.
So any thoughts on a fair algorithm that can be predictive for new items?
So I think I have a working solution. By combining the approach I mentioned in the question comment, with the lower bound of Wilson's score confidence interval for a Bernoulli parameter the score seems to align to my expectations.
So to rehash the approach from my comment: user item score = (count of items + 1 - rank in the list) / count of items. (1 of 3 = 1, 2 of 3 = .667, 2 of 5 = .8).
To give an overall item score, I plug these into the Wilson formula:
(phat + z*z/(2*n) - z * Math.sqrt((phat*(1-phat)+z*z/(4*n))/n))/(1+z*z/n)
Where phat = average of scores, n is number of rankings, z=1.96 (for a 95% confidence ranking).
I mocked up some data in Excel and played around with different scenarios and liked the results. Will move to implementation. Thanks for the help
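The two steps above can be sketched as follows. This is a hypothetical Python rendering of the formulas quoted in this answer, not the author's actual implementation; returning 0 for an unranked item is an added assumption:

```python
import math

def user_item_score(rank, count):
    """Score in (0, 1]: rank 1 of `count` items maps to 1.0."""
    return (count + 1 - rank) / count

def wilson_lower_bound(scores, z=1.96):
    """Lower bound of the Wilson interval, using the mean score as phat."""
    n = len(scores)
    if n == 0:
        return 0.0  # assumption: an item nobody has ranked scores zero
    phat = sum(scores) / n
    return ((phat + z * z / (2 * n)
             - z * math.sqrt((phat * (1 - phat) + z * z / (4 * n)) / n))
            / (1 + z * z / n))

few = [user_item_score(1, 5)] * 2     # two users, both rank the item first
many = [user_item_score(2, 5)] * 50   # fifty users, all rank the item second
print(wilson_lower_bound(few), wilson_lower_bound(many))
```

The item with fifty consistent second-place rankings ends up above the one with only two first-place rankings, which is the effect of the confidence bound: few rankings mean a wide interval and a low lower bound.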
How about implementing something similar to the 9gag ranking system? You can have a shared page where the highest-ranking items show up, and a voting page where users can see new items and rank them accordingly.
I think the important point here is to look at how other users rank an item relative to other items.
"This item is often ranked 3rd" is not useful, I think, whereas "the item under consideration (which we shall call A) is ranked better than item B most of the time" is, because it allows you to create a (maybe fuzzy) ordering of the items under consideration.
Essentially, for a new item in a user's list, you would implement a kind of insertion sort, where the comparison of two elements is determined by their average order within other people's lists. In fact, any sort algorithm would work, as long as it depends only on having an order between two given elements.
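A minimal Python sketch of that pairwise idea, under the assumption that each user's list is stored best-first and that "ranked better most of the time" means a simple majority of the lists containing both items (function names and data are made up for illustration):

```python
from collections import defaultdict

def pairwise_wins(rankings):
    """Count, for each ordered pair (a, b), how many lists rank a above b."""
    wins = defaultdict(int)
    for ranking in rankings:
        for i, a in enumerate(ranking):
            for b in ranking[i + 1:]:
                wins[(a, b)] += 1
    return wins

def insert_new_item(user_list, new_item, wins):
    """Place new_item before the first item it usually beats elsewhere."""
    for pos, item in enumerate(user_list):
        if wins[(new_item, item)] > wins[(item, new_item)]:
            return user_list[:pos] + [new_item] + user_list[pos:]
    return user_list + [new_item]

others = [["A", "N", "B"], ["N", "A", "B"], ["A", "N", "B"]]
wins = pairwise_wins(others)
print(insert_new_item(["A", "B"], "N", wins))
```

Here N loses to A in two of three lists but beats B in all three, so it is predicted to sit between A and B in the user's personal list.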
Here is my implementation of Wilson's score confidence interval for a Bernoulli parameter in node.js:
var wilson = {};

// Approximate inverse of the standard normal CDF (quantile for probability qn).
wilson.normaldist = function(qn) {
    var b = [1.570796288, 0.03706987906, -0.0008364353589, -0.0002250947176, 0.000006841218299, 0.000005824238515, -0.00000104527497, 0.00000008360937017, -0.000000003231081277, 0.00000000003657763036, 0.0000000000006936233982];
    if (qn < 0.0 || 1.0 < qn) return 0;
    if (qn == 0.5) return 0;
    var w1 = qn;
    if (qn > 0.5) w1 = 1.0 - w1;
    var w3 = -Math.log(4.0 * w1 * (1.0 - w1));
    w1 = b[0];
    function loop(i) {
        w1 += b[i] * Math.pow(w3, i);
        if (i < b.length - 1) loop(++i);
    };
    loop(1);
    if (qn > 0.5) return Math.sqrt(w1 * w3);
    else return -Math.sqrt(w1 * w3);
}

// Lower bound of the Wilson interval, scaled by 10000.
wilson.rank = function(up_votes, down_votes) {
    var confidence = 0.95;
    var pos = up_votes;
    var n = up_votes + down_votes;
    if (n == 0) return 0;
    var z = this.normaldist(1 - (1 - confidence) / 2);
    var phat = 1.0 * pos / n;
    return ((phat + z * z / (2 * n) - z * Math.sqrt((phat * (1 - phat) + z * z / (4 * n)) / n)) / (1 + z * z / n)) * 10000;
}
