How to get histogram data object from matlab - image

Let's say I have a matrix x=[ 1 2 1 2 1 2 1 2 3 4 5 ]. To look at its histogram, I can do h=hist(x).
Now, h will hold a matrix containing only the number of occurrences; it does not store the original values those counts belong to.
What I want is something like a function which takes a value from x and returns the number of occurrences of it. Having said that, one thing histeq does that I admire is that it automatically scales nearby values accordingly!
How should I solve this? How do people usually do it?
My interest is in images:
Let's say I have an image. I want to find the number of occurrences of each chrominance value in the image.

I'm not really sure what you are looking for, but if you want to use hist to count the number of occurrences, use:
[h,c]=hist(x,sort(unique(x)))
Otherwise hist groups the data into bins defined by equally spaced centers. The second output argument, c, returns the value that each count in h belongs to.

hist has a second return value, the bin centers xc corresponding to the counts n returned as the first return value: [n, xc] = hist(x). You should have a careful look at the reference, which describes a large number of optional arguments that control the behavior of hist. However, hist is far more powerful than you need for your specific problem.
To simply count the number of occurrences of a specific value, you could simply use something like sum(x(:) == 42). The colon operator will linearize your image matrix, the equals operator will yield a list of boolean values with 1 for each element of x that was 42, and thus sum will yield the total number of these occurrences.

An alternative to hist / histc is to use bsxfun:
n = unique(x(:)).'; %'// values contained in x. x can have any number of dims
y = sum(bsxfun(@eq, x(:), n)); %// count for each value
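For comparison, here is a minimal Python sketch of the "value in, count out" lookup the question asks for. It is not MATLAB and not part of any answer above; the function name value_counts is made up for illustration, and an image would first have to be flattened into a sequence of pixel values.

```python
from collections import Counter

def value_counts(values):
    """Map each distinct value to the number of times it occurs."""
    return Counter(values)

x = [1, 2, 1, 2, 1, 2, 1, 2, 3, 4, 5]
counts = value_counts(x)
print(counts[1])   # 4
print(counts[5])   # 1
print(counts[42])  # 0, values that never occur count as zero

# A 2-D image (list of rows) can be flattened on the fly:
image = [[10, 20], [20, 20]]
pixel_counts = value_counts(v for row in image for v in row)
print(pixel_counts[20])  # 3
```

A Counter behaves like a dictionary, so it keeps the original values as keys, which is exactly what the plain count vector from hist loses.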

Related

A greedy solution for a matrix rearrangement

I am working on something which I feel is an NP-hard problem. So I am not looking for the optimal solution, only for a good heuristic. An integer matrix (matrix A in the following example) is given as input, and I have to produce an integer output matrix (matrix B in the following example) that has fewer rows than the input matrix and obeys the following two conditions:
1) Each column of the output matrix should contain integers in the same order as they appear in the input matrix. (In the example below, first column of the matrix A and matrix B have the same integers 1,3 in the same order.)
2) The same integer must not appear more than once in the same row. (In the example below, the first row of matrix B contains the integers 1, 3 and 2, which are all different from each other.)
Note that the input matrix always obeys the 2nd condition.
What would a greedy algorithm for this problem look like?
Example:
In this example the output matrix 'Matrix B' contains all the integers as they appear in the input matrix 'Matrix A', but the output matrix has 5 rows and the input matrix has 6 rows. So the output 'Matrix B' is a valid solution for the input 'Matrix A'.
I would produce the output one row at a time. When working out what to put in the row I would consider the next number from each input column, starting from the input column which has the most numbers yet to be placed, and considering the columns in decreasing order of numbers yet to be placed. Where a column can put a number in the current output row when its turn comes up it should do so.
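A rough Python sketch of this row-at-a-time idea follows. The representation is an assumption: each input column is given as a list of its integers in order, with empty cells already removed. The function name greedy_rows and the sample data are made up for illustration.

```python
def greedy_rows(columns):
    """Build the output one row at a time.

    columns: list of lists, each holding one input column's integers in order.
    Returns the output rows; None marks an empty cell.
    """
    pos = [0] * len(columns)  # index of the next unplaced number per column
    rows = []
    while any(pos[c] < len(columns[c]) for c in range(len(columns))):
        row = [None] * len(columns)
        used = set()  # integers already placed in this row
        # Columns with the most numbers still to place get first pick.
        order = sorted(range(len(columns)),
                       key=lambda c: len(columns[c]) - pos[c],
                       reverse=True)
        for c in order:
            if pos[c] < len(columns[c]):
                value = columns[c][pos[c]]
                if value not in used:
                    row[c] = value
                    used.add(value)
                    pos[c] += 1
        rows.append(row)
    return rows

# Hypothetical input: three columns holding (1, 3), (3, 1) and (2, 1).
print(greedy_rows([[1, 3], [3, 1], [2, 1]]))
# [[1, 3, 2], [3, 1, None], [None, None, 1]]
```

Every pass places at least one number, because the first column considered never conflicts with an empty row, so the loop always terminates.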
You could extend this to a branch and bound solution to find the exact best answer. Recursively try all possible rows at each stage, except when you can see that the current row cannot possibly improve on the best answer so far. You know that if you have a column with k entries yet to be placed, in the best possible case you will need at least k more rows.
In practice I would expect that this will be too expensive to be practical, so you will need to ignore some possible rows which you cannot rule out, and so cannot guarantee to find the best answer. You could try using a heuristic search such as Limited Discrepancy search.
Another non-exact speedup is to multiply the estimate for the number of rows that the best possible answer derived from a partial solution will require by some factor F > 1. This will allow you to rule out some solutions earlier than branch and bound. The answer you find can be no more than F times more expensive than the best possible answer, because you only discard possibilities that cannot improve on the current answer by more than a factor of F.
A greedy solution to this problem would involve placing the numbers column by column, top down, as they appear.
Pseudocode:
For each column c in A:
    r = 0        // row index of next element in A
    nextRow = 0  // row index of next element to be placed in B
    while r < A.NumRows():
        while r < A.NumRows() && A[r, c] is null:
            r++  // skip empty cells in A
        if r < A.NumRows():  // we found a non-null entry in A
            while nextRow < A.NumRows() && not CheckConstraints(A[r, c], B[nextRow, c]):
                nextRow++  // try the next output row in B
            if nextRow >= A.NumRows():
                return unsolvable  // couldn't find valid position in B
            B[nextRow, c] = A[r, c]  // successfully found position in B
            nextRow++  // the next element of this column goes in a later row
            r++        // move on to the next input row
If there are no conflicts you end up "packing" B as tightly as possible. Otherwise you greedily search for the next non-conflicting row position in B. If none can be found, the problem is unsolvable.
The helper function CheckConstraints checks backwards in columns for the same row value in B to ensure the same value hasn't already been placed in a row.
If the problem statement is relaxed such that the output row count in B is <= the row count in A, then if we are unable to pack B any tighter, then we can return A as a solution.
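The column-by-column packing can be sketched in Python as follows. This is one illustrative reading of the pseudocode, not a reference implementation: A is assumed to be a list of rows with None for empty cells, the name greedy_pack is made up, and the constraint check is inlined as a scan of the earlier columns in the candidate row of B.

```python
def greedy_pack(A):
    """Pack each column of A, top down, into the first non-conflicting rows of B.

    A is a list of rows; None marks an empty cell. Returns B with trailing
    empty rows dropped, or None if a value cannot be placed within len(A) rows.
    """
    n_rows, n_cols = len(A), len(A[0])
    B = [[None] * n_cols for _ in range(n_rows)]
    for c in range(n_cols):
        next_row = 0  # next candidate output row for this column
        for r in range(n_rows):
            value = A[r][c]
            if value is None:
                continue
            # Skip rows of B that already hold this value in an earlier column.
            while next_row < n_rows and value in B[next_row][:c]:
                next_row += 1
            if next_row >= n_rows:
                return None  # unsolvable with this greedy order
            B[next_row][c] = value
            next_row += 1
    while B and all(v is None for v in B[-1]):
        B.pop()  # drop unused rows at the bottom
    return B

# Hypothetical 4-row input that packs into 3 rows.
A = [[1, None, 2],
     [None, 3, None],
     [3, 1, None],
     [None, None, 1]]
print(greedy_pack(A))
# [[1, 3, 2], [3, 1, None], [None, None, 1]]
```

A caller that gets None back can fall back to returning A itself, as described for the relaxed problem statement.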

What is a fast way to calculate this summation in MATLAB?

So I have the following constraints:
How to write this in MATLAB in an efficient way? The inputs are x_mn, M, and N. The set B={1,...,N} and the set U={1,...,M}
I did it like this (because I write x as the following vector)
x=[x_11, x_12, ..., x_1N, x_21, x_22, ..., x_M1, x_M2, ..., x_MN]:
%# first constraint
function R1 = constraint_1(M, N)
    ee = eye(N);
    R1 = zeros(N, N*M);
    for m = 1:M
        R1(:, (m-1)*N+1:m*N) = ee;
    end
end
%# second constraint
function R2 = constraint_2(M, N)
    ee = ones(1, N);
    R2 = zeros(M, N*M);
    for m = 1:M
        R2(m, (m-1)*N+1:m*N) = ee;
    end
end
With the above code I get a 0-1 matrix A=[R1; R2], and the constraints become A*x<=1.
For example, M=N=2, I will have something like this:
I will also create a function test(x) which returns true or false according to x.
I would like some help optimizing my code.
You should place your x_mn values in a matrix. After that, you can sum along each dimension to get what you want. Looking at your constraints, you would place these values in an M x N matrix, where M is the number of rows and N is the number of columns.
You can certainly place your values in a vector and construct your summations in the way you intended earlier, but you would have to write for loops to select the proper elements in each iteration, which is very inefficient. Instead, use a matrix, and use sum to sum over the dimensions you want.
For example, let's say your values of x_mn range from 1 to 20, with B = {1,...,5} and U = {1,...,4}. As such:
X = vec2mat(1:20, 5)
X =
1 2 3 4 5
6 7 8 9 10
11 12 13 14 15
16 17 18 19 20
vec2mat takes a vector and reshapes it into a matrix. You specify the number of columns you want as the second element, and it will create the right amount of rows to ensure that a proper matrix is built. In this case, I want 5 columns, so this should create a 4 x 5 matrix.
The first constraint can be achieved by doing:
first = sum(X,1)
first =
34 38 42 46 50
sum works for vectors as well as matrices. If you supply a matrix to sum, you can specify a second parameter that tells it which dimension to sum along. In this case, specifying 1 sums over all of the rows for each column, because the first dimension runs down the rows.
In terms of your constraints, this sums x_mn over every m in U for each fixed n in B, which is what the first constraint asks for. You are simply summing every column individually.
The second constraint can be achieved by doing:
second = sum(X,2)
second =
15
40
65
90
Here we specify 2 as the second parameter so that we sum over all of the columns for each row; the second dimension runs across the columns. This sums x_mn over every n in B for each fixed m in U. In other words, you are summing every row individually.
BTW, your code is not achieving what you think it's achieving. All you're doing is simply replicating the identity matrix a set number of times over groups of columns in your matrix. You are actually not performing any summations as per the constraint. What you are doing is you are simply ensuring that this matrix will have the conditions you specified at the beginning of your post to be enforced. These are the ideal matrices that are required to satisfy the constraints.
Now, if you want to check to see if the first condition or second condition is satisfied, you can do:
%// First condition satisfied?
firstSatisfied = all(first <= 1);
%// Second condition satisfied
secondSatisfied = all(second <= 1);
This checks every element of first and second, the sums computed above, to see whether they are all <= 1. If every element satisfies the constraint, the result is true; otherwise it is false.
Please let me know if you need anything further.
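For readers who want to try the same idea outside MATLAB, here is a small pure-Python sketch of the reshape-then-sum check. The helper names (to_rows, column_sums, row_sums, constraints_satisfied) are made up for illustration, and the flat vector is assumed to be laid out as x_11, ..., x_1N, x_21, ..., x_MN as in the question.

```python
def to_rows(x, n_cols):
    """Split the flat vector [x_11, ..., x_1N, x_21, ..., x_MN] into M rows."""
    return [x[i:i + n_cols] for i in range(0, len(x), n_cols)]

def column_sums(X):
    """One sum per column n: x_mn summed over all rows m."""
    return [sum(col) for col in zip(*X)]

def row_sums(X):
    """One sum per row m: x_mn summed over all columns n."""
    return [sum(row) for row in X]

def constraints_satisfied(X):
    """True when every column sum and every row sum is at most 1."""
    return (all(s <= 1 for s in column_sums(X))
            and all(s <= 1 for s in row_sums(X)))

X = to_rows(list(range(1, 21)), 5)
print(column_sums(X))                           # [34, 38, 42, 46, 50]
print(row_sums(X))                              # [15, 40, 65, 90]
print(constraints_satisfied(X))                 # False
print(constraints_satisfied([[1, 0], [0, 1]]))  # True
```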

Generate a number in range (1,n) but not in a list (i,j)

How can I generate a random number that is in the range (1,n) but not in a certain list (i,j)?
Example: range is (1,500), list is [1,3,4,45,199,212,344].
Note: The list may not be sorted
Rejection Sampling
One method is rejection sampling:
1. Generate a number x in the range (1, 500).
2. Is x in your list of disallowed values? (Can use a hash-set for this check.)
3. If yes, return to step 1.
4. If no, x is your random value, done.
This will work fine if your set of allowed values is significantly larger than your set of disallowed values: if there are G possible good values and B possible bad values, then the expected number of times you'll have to sample x from the G + B values until you get a good value is (G + B) / G (the expectation of the associated geometric distribution). (You can sanity-check this. As G goes to infinity, the expectation goes to 1. As B goes to infinity, the expectation goes to infinity.)
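A minimal Python sketch of the rejection loop, assuming the range is the inclusive integers 1..n. The function name sample_excluding and the up-front guard against an empty allowed set are additions made for this illustration.

```python
import random

def sample_excluding(n, disallowed):
    """Return a uniform random integer in [1, n] that is not in disallowed."""
    banned = set(disallowed)  # hash-set gives O(1) membership checks
    if sum(1 for v in banned if 1 <= v <= n) >= n:
        raise ValueError("every value in [1, n] is disallowed")
    while True:
        x = random.randint(1, n)  # inclusive on both ends
        if x not in banned:
            return x

disallowed = [1, 3, 4, 45, 199, 212, 344]  # the list may be unsorted
print(sample_excluding(500, disallowed))
```

With 493 good values out of 500, the expected number of draws per call is 500 / 493, or about 1.014.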
Sampling a List
Another method is to make a list L of all of your allowed values, then sample L[rand(L.count)].
The technique I usually use when the list is length 1 is to generate a random
integer r in [1,n-1], and if r is greater than or equal to that single illegal
value then increment r.
This can be generalised for a list of length k for small k (drawing r from [1,n-k]) but requires
sorting that list (you can't do your compare-and-increment in random order). If the list is moderately long, then after the sort you can start with a bsearch, and add the number of values skipped to r, and then recurse into the remainder of the list.
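The compare-and-increment idea can be sketched in Python like this. It is a simplified illustration: it walks the sorted list linearly rather than using the binary search mentioned above, and the names skip_banned and sample_by_skipping are made up.

```python
import random

def skip_banned(r, banned_sorted):
    """Map r in [1, n-k] to the r-th allowed value, given the sorted illegal values."""
    for v in banned_sorted:
        if r >= v:
            r += 1  # one more illegal value lies at or below r, so step over it
        else:
            break   # all remaining illegal values are larger than r
    return r

def sample_by_skipping(n, disallowed):
    """Uniform random integer in [1, n] that is not in disallowed, in one draw."""
    banned = sorted({v for v in disallowed if 1 <= v <= n})
    if len(banned) >= n:
        raise ValueError("every value in [1, n] is disallowed")
    return skip_banned(random.randint(1, n - len(banned)), banned)

# With n = 10 and illegal values {2, 3}, r = 1..8 maps onto the 8 legal values.
print([skip_banned(r, [2, 3]) for r in range(1, 9)])
# [1, 4, 5, 6, 7, 8, 9, 10]
```

Processing the illegal values in ascending order is what makes the mapping one-to-one, which is why the list has to be sorted first.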
For a list of length k, containing no value greater than or equal to n-k, you
can do a more direct substitution: generate random r in [1,n-k], and
then iterate through the list testing if r is equal to list[i]. If it is
then set r to n-k+i+1 (this assumes list is zero-based, so the replacements are n-k+1 through n, which r cannot otherwise reach) and quit.
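A short Python sketch of the direct substitution. One detail here is an assumption of the sketch rather than a quote from the text: with a zero-based index i, the replacement used is n-k+i+1, so that the k replacements are exactly n-k+1 through n and never collide with a value that r can take directly. The function name substitute is made up.

```python
import random

def substitute(r, banned, n):
    """Map r in [1, n-k] to an allowed value in [1, n].

    banned holds k distinct illegal values, all at most n-k, in any order.
    """
    k = len(banned)
    for i, v in enumerate(banned):  # no sorting needed
        if r == v:
            return n - k + i + 1    # the i-th illegal value maps to a slot above n-k
    return r

n, banned = 10, [3, 1]              # unsorted on purpose
r = random.randint(1, n - len(banned))
print(substitute(r, banned, n))

# Every r in [1, 8] lands on a distinct legal value:
print(sorted(substitute(r, banned, n) for r in range(1, 9)))
# [2, 4, 5, 6, 7, 8, 9, 10]
```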
That second approach fails if some of the list elements are in [n-k,n].
I could try to invent something clever at this point, but what I have so far
seems sufficient for uniform distributions with values of k much less than
n...
1. Create two lists -- one of illegal values no greater than n-k, and the other the rest (this can be done in place).
2. Generate random r in [1,n-k].
3. Apply the direct substitution approach for the first list (if r is list[i] then set r to n-k+i+1, with zero-based i, and go to step 5).
4. If r was not altered in step 3 then we're finished.
5. Sort the list of larger values and use the compare-and-increment method.
Observations:
If all values are in the lower list, there will be no sort because there is nothing to sort.
If all values are in the upper list, there will be no sort because there is no occasion on which r is moved into the hazardous area.
As k approaches n, the maximum size of the upper (sorted) list grows.
For a given k, the more values appear in the upper list (the bigger the sort), the smaller the chance of getting a hit in the lower list, which reduces the likelihood of needing to do the sort.
Refinement:
Obviously things get very sorty for large k, but in such cases the list has comparatively few holes into which r is allowed to settle. This could surely be exploited.
I might suggest something different if many random values with the same
list and limits were needed. I hope that the list of illegal values is not the
list of results of previous calls to this function, because if it is then you
wouldn't want any of this -- instead you would want a Fisher-Yates shuffle.
Rejection sampling, as already described, would be the simplest option where it is practical. However, if you didn't want to use that, you could convert the range and the disallowed values to sets and take the difference. Then you could choose a random value from the result.
This assumes you want values in [1,n] but not in [i,j], uniformly distributed.
In Python
import random

def pick_allowed(n, i, j):
    total = range(1, n + 1)
    disallowed = range(i, j + 1)
    allowed = list(set(total) - set(disallowed))  # values left after exclusion
    return allowed[random.randrange(len(allowed))]
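(Note that if the index were instead computed as rand() % len(allowed), the result would not be EXACTLY uniform, since in all likelihood max_rand % len(allowed) != 0, though in most practical applications it would be very close. Python's random.randrange does not have this bias.)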
I assume that you know how to generate a random number in [1, n) and that your list is sorted, as in the example above.
Let's say that you have a list with k elements. Build a map (with O(log n) lookup), which will keep things fast as k grows. Put all elements from the list in the map, where the element's value is the key and its "good" value is the value. I'll explain the "good" value below. Once we have the map, generate a random number in [1, n - k - p) (p is also explained below), and if this number is in the map, replace it with its "good" value.
"GOOD" value -> Let's start from the k-th element. Its good value is its own value + 1, because the very next element is "good" for us. Now let's look at the (k-1)-th element. We assume that its good value is again its own value + 1. If this value is equal to the k-th element, then the "good" value for the (k-1)-th element is the k-th "good" value + 1. You will also have to store the largest "good" value. If the largest value exceeds n then p (from above) will be p = largest - n.
Of course I recommend this only if k is a big number; otherwise @Timothy Shields' method is perfect.

octave matrix for loop performance

I am new to Octave. I have two matrices, and I have to compare a particular column of one matrix with the other (my matrix A contains more than 5 variables, and so does matrix B). If the elements in column one of matrix A are equal to the elements in the second matrix B, then I have to use the third column of B to compute certain values. I am doing this in Octave with a for loop, but it takes a lot of time to do the computation for a single day, and I have to do this for a year, because the matrices are very large. Please suggest an alternative way to reduce the computation time.
Thank you in advance.
Thanks for your quick response -hfs
Continuing the same problem:
Thank you, but this will work only if the elements in both rows are equal. For example, my matrices are like this:
A=[1 2 3;4 5 6;7 8 9;6 9 1]
B=[1 2 4; 4 2 6; 7 5 8;3 8 4]
Here, column 1 of the first row of A is equal to column 1 of the first row of B, and so is the second column, hence I can take the third element of B. For the second row, column 1 is equal in A and B but column 2 is different; here it should search for that element and print the element in the third column. I am doing this with a for loop, which is very slow because of the larger dimensions. In my actual problem I have written the for loop as below:
for k=1:37651
    for j=1:26018
        if (s(k,1:2)==l(j,1:2))
            z=sin((90-s(k,3))*pi/180), break
        end
    end
end
I want an alternative way to do this which should be faster than this.
You should work with complete matrices or vectors whenever possible. You should try commands and inspect intermediate results in the interactive shell to see how they fit together.
A(:,1)
selects the first column of a matrix. You can compare matrices/vectors and the result is a matrix/vector of 0/1 again:
> A(:,1) == B(:,1)
ans =
1
1
0
If you assign the result you can use it again to index into matrices:
I = A(:,1) == B(:,1)
B(I, 3)
This selects the third column of B of those rows where the first column of A and B is equal.
I hope this gets you started.
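As a language-neutral illustration of how to avoid the nested loop from the question, the sketch below (in Python, not Octave) builds a hash set of the (column 1, column 2) pairs of one matrix and then makes a single pass over the other. The function name matched_values is made up, and the sketch assumes, as in the loop from the question, that the value computed for a matching row comes from column 3 of s.

```python
import math

def matched_values(s, l):
    """For each row of s whose first two entries also appear together in some
    row of l, compute sin((90 - s[row][2]) * pi / 180)."""
    keys = {(row[0], row[1]) for row in l}  # built once, O(len(l))
    return [math.sin(math.radians(90 - row[2]))
            for row in s
            if (row[0], row[1]) in keys]    # O(1) lookup per row of s

A = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [6, 9, 1]]
B = [[1, 2, 4], [4, 2, 6], [7, 5, 8], [3, 8, 4]]
print(matched_values(A, B))  # one value, for the row (1, 2, 3)
```

This replaces roughly len(s) * len(l) comparisons with len(s) + len(l) operations, which matters at the sizes mentioned in the question (37651 by 26018 rows).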

How to balance the number of items across multiple columns

I need to find out a method to determine how many items should appear per column in a multiple column list to achieve the most visual balance. Here are my criteria:
1. The list should only be split into multiple columns if the item count is greater than 10.
2. If multiple columns are required, they should contain no less than 5 (except for the last column in case of a remainder) and no more than 10 items.
3. If all columns cannot contain an equal number of items:
   - All but the last column should be equal in number.
   - The number of items in each column should be optimized to achieve the smallest difference between the last column and the other column(s).
Well, your requirements and your examples appear a bit contradictory. For instance, your second example could be divided into two columns with 11 items in each, and satisfy your criteria. Let's assume that for rule #2 you meant that there should be <= 10 items / column.
In addition, I think you need to add another rule to make the requirements sensible:
The number of columns must not be greater than what is required to accommodate overflow.
Otherwise, you will often end up with degenerate solutions where you have far more columns than you need. For example, in the case of 26 items you probably don't want 13 columns of 2 items each.
If that's the case, here's a simple calculation that should work well and is easy to understand:
int numberOfColumns = CEILING(numberOfItems / 10.0);
int numberOfItemsPerColumn = CEILING((double) numberOfItems / numberOfColumns);
Now you'll create N-1 columns of items (having numberOfItemsPerColumn each) and the overflow will go in the last column. By this definition, the overflow should be minimized in the last column.
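The two-step calculation can be tried out with a small Python sketch. The function name column_sizes and the max_per_column parameter are made up for illustration; Python's / is floating-point division, so the ceiling is applied to the exact quotient.

```python
import math

def column_sizes(number_of_items, max_per_column=10):
    """Return the item count of each column, the last one holding any remainder."""
    if number_of_items <= 0:
        return []
    number_of_columns = math.ceil(number_of_items / max_per_column)
    per_column = math.ceil(number_of_items / number_of_columns)
    full, remainder = divmod(number_of_items, per_column)
    return [per_column] * full + ([remainder] if remainder else [])

print(column_sizes(26))  # [9, 9, 8]
print(column_sizes(77))  # [10, 10, 10, 10, 10, 10, 10, 7]
print(column_sizes(10))  # [10]
```

For 26 items this gives three columns of 9, 9 and 8 rather than 10, 10 and 6, which is the balancing effect of the second ceiling.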
If you want to automatically determine the appropriate number of columns, and have no restrictions on its limits, I would suggest the following:
Calculate the square root of the total number of items. That would give a square layout.
Divide that number by 1.618, and assign that to the total number of rows.
Multiply that same number by 1.618, and assign that to the total number of columns.
All columns but the rightmost one will have the same number of items.
By the way, the constant 1.618 is the Golden Ratio. That will achieve a more pleasant layout than a square one.
Divide and multiply the other way round for vertical displays.
Hope this algorithm helps anyone with a similar problem.
Here's what you're trying to solve:
minimize y - z where n = xy + z and 5 <= y <= 10 and 0 <= z <= y
where you have n items split into x full columns of y items and one remainder column of z items.
There is almost certainly a smart way of doing this, but given these constraints a brute force implementation exploring all 6 + 7 + 8 + 9 + 10 + 11 = 51 possible combinations for y and z would take no time at all (only assignments where (n - z) mod y = 0 are solutions).
I think a brute force solution is easy, given the constraint on the number of items per column: let v be the number of items per column (except the last one), then v belongs to [5,10] and can thus take a whopping 6 different values.
Evaluating 6 values is easy enough. A Python one-liner (or close to it) to prove it:
# compute the difference between the number of items for the normal columns
# and for the last column, lesser is better
def helper(n,v):
    modulo = n % v
    if modulo == 0: return 0
    else: return v - modulo
# values can only be in [5,10]
# we compute the difference with the last column for each
# build a list of tuples (difference, - number of items)
# (because the greater the value the better, it means less columns)
# extract the min automatically (in case of equality, less is privileged)
# and then pick the number of items from the tuple and re-inverse it
def compute(n): return - min([(helper(n,v), -v) for v in [5,6,7,8,9,10]])[1]
For 77 this yields 7, meaning 7 items per column.
For 22 this yields 8, meaning 8 items per column.
