How to calculate precision and recall

Given the database and the classification rules below, how do I calculate precision and recall?
MinSupp = 3% and MinConf = 30%
No.  outlook   temperature  humidity  windy  play
1    sunny     hot          high      FALSE  no
2    sunny     hot          high      TRUE   no
3    overcast  hot          high      FALSE  yes
4    rainy     mild         high      FALSE  yes
5    rainy     cool         normal    FALSE  yes
6    rainy     cool         normal    TRUE   no
7    overcast  cool         normal    TRUE   yes
8    sunny     mild         high      FALSE  no
9    sunny     cool         normal    FALSE  yes
10   rainy     mild         normal    FALSE  yes
11   sunny     mild         normal    TRUE   yes
12   overcast  mild         high      TRUE   yes
13   overcast  hot          normal    FALSE  yes
14   rainy     mild         high      TRUE   no
Rules found:
1: (outlook,overcast) -> (play,yes)
[Support=0.29 , Confidence=1.00 , Correctly Classify= 3, 7, 12, 13]
2: (humidity,normal), (windy,FALSE) -> (play,yes)
[Support=0.29 , Confidence=1.00 , Correctly Classify= 5, 9, 10]
3: (outlook,sunny), (humidity,high) -> (play,no)
[Support=0.21 , Confidence=1.00 , Correctly Classify= 1, 2, 8]
4: (outlook,rainy), (windy,FALSE) -> (play,yes)
[Support=0.21 , Confidence=1.00 , Correctly Classify= 4]
5: (outlook,sunny), (humidity,normal) -> (play,yes)
[Support=0.14 , Confidence=1.00 , Correctly Classify= 11]
6: (outlook,rainy), (windy,TRUE) -> (play,no)
[Support=0.14 , Confidence=1.00 , Correctly Classify= 6, 14]
Thanks.

I think that all you need to know about precision and recall can be found here.
In plain English: precision is the number of actually correct results your system retrieved, divided by the total number of results your system marked as correct. Likewise, recall is the number of actually correct results your system retrieved, divided by the total number of actually correct results available in your dataset.
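Applied to the question's data: a rule's precision equals its confidence, while its recall measures how much of the positive class the rule covers. A quick sketch of the computation in Python (my own illustration; the row encoding and function names are not from the question):

```python
# One row per instance: (outlook, humidity, windy, play).
# Temperature is omitted because rule 1 does not use it.
data = [
    ("sunny", "high", False, "no"), ("sunny", "high", True, "no"),
    ("overcast", "high", False, "yes"), ("rainy", "high", False, "yes"),
    ("rainy", "normal", False, "yes"), ("rainy", "normal", True, "no"),
    ("overcast", "normal", True, "yes"), ("sunny", "high", False, "no"),
    ("sunny", "normal", False, "yes"), ("rainy", "normal", False, "yes"),
    ("sunny", "normal", True, "yes"), ("overcast", "high", True, "yes"),
    ("overcast", "normal", False, "yes"), ("rainy", "high", True, "no"),
]

def precision_recall(rule, target, rows):
    """rule: predicate saying the rule fires on a row;
    target: predicate for the positive class (play = yes)."""
    retrieved = [r for r in rows if rule(r)]          # rows the rule classifies
    relevant = [r for r in rows if target(r)]         # all actual positives
    tp = [r for r in retrieved if target(r)]          # true positives
    return len(tp) / len(retrieved), len(tp) / len(relevant)

# Rule 1: (outlook, overcast) -> (play, yes)
p, r = precision_recall(lambda row: row[0] == "overcast",
                        lambda row: row[3] == "yes", data)
```

Here rule 1 covers rows 3, 7, 12, 13 (all "yes"), so its precision is 4/4 = 1.0, matching its confidence, and its recall is 4/9 since the dataset contains nine "yes" instances.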


algorithm to maximize sum of unique set with multiple contributors

I'm looking for an approach to maximize the value of a common set composed of contributions from multiple sources, with a fixed number of contributions from each.
Example problem: 3 people each have a hand of cards. Each hand contains a unique set, but the 3 sets may overlap. Each player can pick three cards to contribute to the middle. How can I maximize the sum of the 9 contributed cards, where:
each player contributes exactly 3 cards
all 9 cards are unique (when possible)
I need a solution that can scale to around 200 possible "cards", 40 contributors, and 6 contributions each.
Integer programming sounds like a viable approach. Without guaranteeing it, this problem also feels NP-hard, meaning there is probably no general algorithm that beats brute force (without assumptions about the possible input; IP solvers actually do assume a lot and are tuned for real-world problems).
(Alternative off-the-shelf approaches: constraint programming and SAT solvers. CP is easy to formulate and fast at combinatorial search, but less good at branch-and-bound style maximization. SAT is hard to formulate, since counters need to be built; it is very fast at combinatorial search, but again has no concept of maximization and needs a decision-problem-like transform.)
Here is a complete Python-based example solving this problem (in the hard-constraint version: each player has to play all his cards). As I'm using cvxpy, the code is quite math-style and should be easy to read even without knowing Python or the library!
Before presenting the code, some remarks:
General remarks:
The IP approach is heavily dependent on the underlying solver!
Commercial solvers (Gurobi and co.) are the best
Good open-source solvers: CBC, GLPK, lpsolve
The default solver in cvxpy is not ready for this (when the problem size increases)!
In my experiments, with my data, commercial solvers scale very well!
A popular commercial solver needs a few seconds for:
N_PLAYERS = 40, CARD_RANGE = (0, 400), N_CARDS = 200, N_PLAY = 6
Using cvxpy is not best practice, as it was created for very different use-cases; this induces some penalty in model-creation time
I'm using it because I'm familiar with it and I love it
Improvements: Problem
We are solving the each-player-plays-exactly-n_cards variant here
Sometimes there is no solution
Your model description does not formally describe how to handle this
General idea to improve the code:
bigM-style penalty-based objective, e.g. Maximize(n_unique * bigM + classic_score)
(where bigM is a very big number)
Improvements: Performance
We build all those pairwise conflicts and use a classic not-both constraint
The number of conflicts can grow a lot, depending on the task
Improvement idea (too lazy to add):
Calculate the set of maximal cliques and add these as constraints
This will be much more powerful, but:
For general conflict graphs this problem should be NP-hard too, so an approximation algorithm needs to be used
(as opposed to other applications like time-intervals, where this set can be calculated in polynomial time because the graphs are chordal)
Code:
import numpy as np
import cvxpy as cvx  # NOTE: uses the pre-1.0 cvxpy API (Bool, sum_entries, ...)

np.random.seed(1)

""" Random problem """
N_PLAYERS = 5
CARD_RANGE = (0, 20)
N_CARDS = 10
N_PLAY = 3

card_set = np.arange(*CARD_RANGE)
p = np.empty(shape=(N_PLAYERS, N_CARDS), dtype=int)
for player in range(N_PLAYERS):
    p[player] = np.random.choice(card_set, size=N_CARDS, replace=False)

print('Players and their cards')
print(p)

""" Preprocessing:
    Conflict-constraints
    -> if p[i, j] == p[x, y] => don't allow both
    Could be made more efficient
"""
conflicts = []
for p_a in range(N_PLAYERS):
    for c_a in range(N_CARDS):
        for p_b in range(p_a + 1, N_PLAYERS):  # symmetry-reduction
            for c_b in range(N_CARDS):
                if p[p_a, c_a] == p[p_b, c_b]:
                    conflicts.append(((p_a, c_a), (p_b, c_b)))
# print(conflicts)  # debug

""" Solve """
# Decision-vars
x = cvx.Bool(N_PLAYERS, N_CARDS)

# Constraints
constraints = []

# -> Conflicts
for (p_a, c_a), (p_b, c_b) in conflicts:
    # don't allow both -> linearized
    constraints.append(x[p_a, c_a] + x[p_b, c_b] <= 1)

# -> N to play
constraints.append(cvx.sum_entries(x, axis=1) == N_PLAY)

# Objective: 2d -> 1d flattening; mind C vs. Fortran storage order!
objective = cvx.sum_entries(cvx.mul_elemwise(p.flatten(order='F'), cvx.vec(x)))
# print(objective)  # debug

# Problem
problem = cvx.Problem(cvx.Maximize(objective), constraints)
problem.solve(verbose=False)

print('MIP solution')
print(problem.status)
print(problem.value)
print(np.round(x.T.value))

sol = x.value
nnz = np.where(abs(sol - 1) <= 0.01)  # being careful with fp-math
sol_p = p[nnz]
assert sol_p.shape[0] == N_PLAYERS * N_PLAY

""" Output solution """
for player in range(N_PLAYERS):
    print('player: ', player, 'with cards: ', p[player, :])
    print(' plays: ', sol_p[player*N_PLAY:player*N_PLAY+N_PLAY])
Output:
Players and their cards
[[ 3 16 6 10 2 14 4 17 7 1]
[15 8 16 3 19 17 5 6 0 12]
[ 4 2 18 12 11 19 5 6 14 7]
[10 14 5 6 18 1 8 7 19 15]
[15 17 1 16 14 13 18 3 12 9]]
MIP solution
optimal
180.00000005500087
[[ 0. 0. 0. 0. 0.]
[ 0. 1. 0. 1. 0.]
[ 1. 0. 0. -0. -0.]
[ 1. -0. 1. 0. 1.]
[ 0. 1. 1. 1. 0.]
[ 0. 1. 0. -0. 1.]
[ 0. -0. 1. 0. 0.]
[ 0. 0. 0. 0. -0.]
[ 1. -0. 0. 0. 0.]
[ 0. 0. 0. 1. 1.]]
player: 0 with cards: [ 3 16 6 10 2 14 4 17 7 1]
plays: [ 6 10 7]
player: 1 with cards: [15 8 16 3 19 17 5 6 0 12]
plays: [ 8 19 17]
player: 2 with cards: [ 4 2 18 12 11 19 5 6 14 7]
plays: [12 11 5]
player: 3 with cards: [10 14 5 6 18 1 8 7 19 15]
plays: [14 18 15]
player: 4 with cards: [15 17 1 16 14 13 18 3 12 9]
plays: [16 13 9]
Looks like a packing problem, where you want to pack 3 disjoint subsets of your original sets, each of size 3, and maximize the sum. You can formulate it as an ILP. Without loss of generality, we can assume the cards represent natural numbers ranging from 1 to N.
Let a_i in {0,1} indicate if player A plays card with value i, where i is in {1,...,N}. Notice that if player A doesn't have card i in his hand, a_i is set to 0 in the beginning.
Similarly, define b_i and c_i variables for players B and C.
Also, similarly, let m_i in {0,1} indicate if card i will appear in the middle, i.e., one of the players will play a card with value i.
Now you can say:
Maximize Sum(m_i . i), subject to:
For each i in {1,...,N}:
a_i, b_i, c_i, m_i are in {0, 1}
m_i = a_i + b_i + c_i
Sum(a_i) = 3, Sum(b_i) = 3, Sum(c_i) = 3
Discussion
Notice that constraints 1 and 2 force the uniqueness of each card in the middle.
I'm not sure how big of a problem can be handled by commercial or non-commercial solvers with this program, but notice that this is really a binary linear program, which might be simpler to solve than the general ILP, so it might be worth trying for the size you are looking for.
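For small instances, the ILP formulations above can be checked against a plain brute-force reference. This is my own sketch (function name and hand data are illustrative): it enumerates, for each hand, every way to contribute k distinct cards, rejecting any card value already in the middle.

```python
from itertools import combinations

def best_play(hands, k=3):
    """Brute-force reference: each hand contributes exactly k cards and
    no card value may appear twice in the middle.
    Returns (best_sum, chosen_cards) or None if infeasible."""
    best = None
    def rec(i, chosen):
        nonlocal best
        if i == len(hands):                    # all hands have contributed
            total = sum(chosen)
            if best is None or total > best[0]:
                best = (total, sorted(chosen))
            return
        for combo in combinations(set(hands[i]), k):
            if not (set(combo) & chosen):      # uniqueness across hands
                rec(i + 1, chosen | set(combo))
    rec(0, set())
    return best

hands = [[13, 12, 11, 9, 7, 6],
         [13, 11, 10, 9, 8, 5, 3],
         [13, 12, 11, 10, 8, 7, 6, 4, 3]]
```

On these hands the optimum is 81, the sum of the nine highest distinct values across all hands; this only scales to toy sizes, which is exactly why the IP/CP formulations matter.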
Sort each hand, dropping duplicate values. Delete anything past the 10-th highest card of any hand (3 hands * 3 cards/hand, plus 1): nobody can contribute a card that low.
For accounting purposes, make a directory by card value, showing which hands hold each value. For instance, given players A, B, C and these hands
A [1, 1, 1, 6, 4, 12, 7, 11, 13, 13, 9, 2, 2]
B [13, 2, 3, 1, 5, 5, 8, 9, 11, 10, 5, 5, 9]
C [13, 12, 11, 10, 6, 7, 2, 4, 4, 12, 3, 10, 8]
We would sort and de-dup the hands. 2 is the 10th-highest card of hand C, so we drop all values 2 and below. Then build the directory:
A [13, 12, 11, 9, 7, 6]
B [13, 11, 10, 9, 8, 5, 3]
C [13, 12, 11, 10, 8, 7, 6, 4, 3]
Directory:
13 A B C
12 A C
11 A B C
10 B C
9 A B
8 B C
7 A B
6 A C
5 B
4 C
3 B C
Now, you need to implement a backtracking algorithm to choose cards in some order, get the sum of that order, and compare with the best so far. I suggest that you iterate through the directory, choosing a hand from which to obtain the highest remaining card, backtracking when you run out of contributors entirely, or when you get 9 cards.
I recommend that you maintain a few parameters to allow you to prune the investigation, especially when you get into the lower values.
Make a maximum possible value, the sum of the top 9 values in the directory. If you hit this value, stop immediately, as you've found an optimum solution.
Make a high starting target: cycle through the hands in sequence, taking the highest usable card remaining in the hand. In this case, cycling A-B-C, we would have
13, 11, 12, 9, 10, 8, 7, 5, 6 => 81
// Note: because of the values I picked,
// this happens to provide an optimum solution.
// It will do so for a lot of the bridge-hand problem space.
Keep count of how many cards have been contributed by each hand; when one has given its 3 cards, disqualify it in some way: have a check in the choice code, or delete it from the local copy of the directory.
As you walk down the choice list, prune the search any time the remaining cards are insufficient to reach the best-so-far total. For instance, if you have a total of 71 after 7 cards, and the highest remaining card is 5, stop: you can't get to 81 with 5+4.
Does that get you moving?
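The sort/dedup-and-directory step described above can be sketched in Python (the labels and data layout are my own choices, using the already-pruned hands from the example):

```python
from collections import defaultdict

def build_directory(hands):
    """Map each card value to the labels of the hands holding it,
    highest value first.
    hands: dict of hand label -> iterable of card values (duplicates OK)."""
    directory = defaultdict(list)
    for label, hand in sorted(hands.items()):
        for value in set(hand):               # de-dup each hand
            directory[value].append(label)
    # dict insertion order keeps values descending for easy iteration
    return dict(sorted(directory.items(), reverse=True))

hands = {"A": [13, 12, 11, 9, 7, 6],
         "B": [13, 11, 10, 9, 8, 5, 3],
         "C": [13, 12, 11, 10, 8, 7, 6, 4, 3]}
directory = build_directory(hands)
```

Iterating over `directory` then visits card values from highest to lowest, which is the order the backtracking search wants to try them in.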

Euler18 dynamic algorithm

Given the array [5, 4, 12, 3, 11, 7, 2, 8, 1, 9] that forms a triangle like so:
5
4 12
3 11 7
2 8 1 9
Result should be 5 + 12 + 7 + 9 = 31.
Write a function that will traverse the triangle and find the largest possible sum of values when you can go from one point to either directly bottom left, or bottom right.
Referring to the dynamic algorithm in that link:
http://www.mathblog.dk/project-euler-18/
Result is 36.
5
4 12
3 11 7
2 8 1 9
5
4 12
11 19 16
5
23 31
36
Where is my mistake?
The description of Problem 18 starts with an example where the optimal path is “left-right-right”. So you get a new choice of direction after every step, which means that after taking the first step to the right, you are still free to take the second step to the left and eventually come up with 5+12+11+8=36 as the optimal solution in your example, larger than the 31 you assumed. So the computation is correct in solving the problem as described. Your assumption about choosing a direction only once and then sticking with that choice would lead to a different (and rather boring) problem.
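For reference, the bottom-up computation shown in the question (the 11/19/16, then 23/31, then 36 rows) can be written compactly; each cell collects the better of its two children. This is my own sketch, not code from the linked article:

```python
def max_path_sum(triangle):
    """Bottom-up DP for the triangle path problem: from each cell you may
    step to either of the two cells directly below it."""
    rows = [row[:] for row in triangle]          # don't mutate the input
    for i in range(len(rows) - 2, -1, -1):       # second-to-last row upward
        for j in range(len(rows[i])):
            rows[i][j] += max(rows[i + 1][j], rows[i + 1][j + 1])
    return rows[0][0]
```

On the example triangle this reproduces 36 via the path 5 + 12 + 11 + 8.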

How to solve this ILP/CP matrix puzzle

I'm studying algorithms and recently found an interesting challenge.
We are given row and column sums, and the task is to fill the table with the integers 1..N, each appearing exactly once, so that the row and column sums equal the given values.
A simple example of the challenge:
[ ] [ ] [ ] 13
[ ] [ ] [ ] 8
[ ] [ ] [ ] 24
14 14 17
answer:
[2] [6] [5] 13
[3] [1] [4] 8
[9] [7] [8] 24
14 14 17
Thanks
As far as I know there is no straightforward algorithm to solve this specific problem more efficiently than using a backtracking approach.
You can however do this more intelligently than simply enumerating over all possible solutions. An efficient way to do this is Constraint Programming (CP) (or derived paradigms like Constraint Logic Programming (CLP)). Basically it comes down to reasoning about the constraints you have put on your problem, trying to reduce the domains of the variables.
After reducing the domains, you make a choice on which you can later backtrack. After making such choice you again reduce domains and possibly have to make additional choices.
You can for instance use ECLiPSe (not the IDE, but a constraint logic programming tool) for this:
:- lib(ic).
:- import alldifferent/1 from ic_global.
:- import sumlist/2 from ic_global.

solve(Problem) :-
    problem(Problem,N,LA,LB),
    puzzle(N,LA,LB,Grid),
    print_Grid(Grid).

puzzle(N,LA,LB,Grid) :-
    N2 is N*N,
    dim(Grid,[N,N]),
    Grid[1..N,1..N] :: 1..N2,
    ( for(I,1,N), param(N,Grid,LA,LB) do
        Sc is nth1(I,LA),
        Lc is Grid[1..N,I],
        sumlist(Lc,Sc),
        Sr is nth1(I,LB),
        Lr is Grid[I,1..N],
        sumlist(Lr,Sr)
    ),
    term_variables(Grid,Vars),
    alldifferent(Vars),
    labeling(Vars).

print_Grid(Grid) :-
    dim(Grid,[N,N]),
    ( for(I,1,N), param(Grid,N) do
        ( for(J,1,N), param(Grid,I) do
            X is Grid[I,J],
            ( var(X) -> write(" _") ; printf(" %2d", [X]) )
        ), nl
    ), nl.

nth1(1,[H|_],H) :- !.
nth1(I,[_|T],H) :-
    I1 is I-1,
    nth1(I1,T,H).

problem(1,3,[14,14,17],[13,8,24]).
The program is vaguely based on my implementation for multi-sudoku. Now you can solve the problem using ECLiPSe:
ECLiPSe Constraint Logic Programming System [kernel]
Kernel and basic libraries copyright Cisco Systems, Inc.
and subject to the Cisco-style Mozilla Public Licence 1.1
(see legal/cmpl.txt or http://eclipseclp.org/licence)
Source available at www.sourceforge.org/projects/eclipse-clp
GMP library copyright Free Software Foundation, see legal/lgpl.txt
For other libraries see their individual copyright notices
Version 6.1 #199 (x86_64_linux), Sun Mar 22 09:34 2015
[eclipse 1]: solve(1).
lists.eco loaded in 0.00 seconds
WARNING: module 'ic_global' does not exist, loading library...
queues.eco loaded in 0.00 seconds
ordset.eco loaded in 0.00 seconds
heap_array.eco loaded in 0.00 seconds
graph_algorithms.eco loaded in 0.03 seconds
max_flow.eco loaded in 0.00 seconds
flow_constraints_support.eco loaded in 0.00 seconds
ic_sequence.eco loaded in 0.01 seconds
ic_global.eco loaded in 0.05 seconds
2 5 6
3 1 4
9 8 7
Yes (0.05s cpu, solution 1, maybe more) ? ;
5 2 6
1 3 4
8 9 7
Yes (0.05s cpu, solution 2, maybe more) ? ;
2 6 5
3 1 4
9 7 8
Yes (0.05s cpu, solution 3, maybe more) ? ;
3 6 4
2 1 5
9 7 8
Yes (0.05s cpu, solution 4, maybe more) ? ;
6 2 5
1 3 4
7 9 8
Yes (0.05s cpu, solution 5, maybe more) ? ;
6 3 4
1 2 5
7 9 8
Yes (0.05s cpu, solution 6, maybe more) ? ;
2 6 5
4 1 3
8 7 9
Yes (0.05s cpu, solution 7, maybe more) ? ;
4 6 3
2 1 5
8 7 9
Yes (0.05s cpu, solution 8, maybe more) ?
6 2 5
1 4 3
7 8 9
Yes (0.05s cpu, solution 9, maybe more) ? ;
6 4 3
1 2 5
7 8 9
Yes (0.05s cpu, solution 10, maybe more) ? ;
No (0.06s cpu)
One simply queries solve(1) and the constraint logic programming tool does the rest. There are thus a total of 10 solutions.
Note that the program works for an arbitrary N, although, since in the worst case it performs backtracking, evidently it can only solve problems for a reasonable N.
Oh, I just love it when these little optimisation problems pop up. They always remind me of that one time in my very first year when I built a thing that would solve Sudokus, and had a ton of fun with it! You may guess how many Sudokus I've solved ever since :).
Now, your problem is an ILP (Integer Linear Program). Even before you read up on that article, you should take note that ILPs are hard. Restricting the solution space to N or Z is severely limiting, and oftentimes such a solution does not exist!
For your problem, the task at hand essentially boils down to solving this,
Minimise 0 (arbitrary objective function)
Subject to,
x1 + x2 + x3 = 13
x4 + x5 + x6 = 8
x7 + x8 + x9 = 24
x1 + x4 + x7 = 14
x2 + x5 + x8 = 14
x3 + x6 + x9 = 17
And,
x_i in N, x_i distinct.
In matrix form, these equations become,
    |1 1 1 0 0 0 0 0 0|
    |0 0 0 1 1 1 0 0 0|
A = |0 0 0 0 0 0 1 1 1|
    |1 0 0 1 0 0 1 0 0|
    |0 1 0 0 1 0 0 1 0|
    |0 0 1 0 0 1 0 0 1|
And,
    |13|
    | 8|
B = |24|
    |14|
    |14|
    |17|
Such that the constraints reduce to A*x = B. So the problem we want to solve can now equivalently be written as,
Minimise 0
Subject to,
A * x = B
And,
x in N^9, x_i distinct.
Does this look hard to you? If not, think about this: the real line is huge, and on that line, every once in a while, is a tiny dot. That's an integer. We need some of those. We do not know which ones. So essentially, a perfect analogy would be looking for a needle in a haystack.
Now, do not despair, we are surprisingly good at finding these ILP needles! I just want you to appreciate the nontrivial difficulty of the field this problem stems from.
I want to give you working code, but I do not know your preferred choice of language/toolkit. If this is just a hobbyist approach, even Excel's solver would work beautifully. If it is not, I do not think I could've phrased it any better than Willem Van Onsem already has, and I would like to direct you to his answer for an implementation.
Below is another Constraint Programming model, using a similar approach as Willem Van Onsem's solution, i.e. using the global constraint "all_different", which is an efficient method to ensure that the numbers in the matrix are assigned only once. (The concept of "global constraints" is very important in CP and there is much research finding fast implementations for different kind of common constraints.)
Here's a MiniZinc model: http://hakank.org/minizinc/matrix_puzzle.mzn
include "globals.mzn";
% parameters
int: rows;
int: cols;
array[1..rows] of int: row_sums;
array[1..cols] of int: col_sums;
% decision variables
array[1..rows,1..cols] of var 1..rows*cols: x;
solve satisfy;
constraint
  all_different(array1d(x)) /\
  forall(i in 1..rows) (
    all_different([x[i,j] | j in 1..cols]) /\
    sum([x[i,j] | j in 1..cols]) = row_sums[i]
  )
  /\
  forall(j in 1..cols) (
    all_different([x[i,j] | i in 1..rows]) /\
    sum([x[i,j] | i in 1..rows]) = col_sums[j]
  );
output [
  if j = 1 then "\n" else " " endif ++
  show_int(2,x[i,j])
  | i in 1..rows, j in 1..cols
];
% Problem instance
rows = 3;
cols = 3;
row_sums = [13,8,24];
col_sums = [14,14,17];
Here are the first two (of 10) solutions:
2 5 6
3 1 4
9 8 7
----------
5 2 6
1 3 4
8 9 7
----------
...
An additional comment: A fun thing with CP - as well as an important concept - is that it is possible to generate new problem instances using almost the identical model: http://hakank.org/minizinc/matrix_puzzle2.mzn
The only difference is the following lines, i.e. change "row_sums" and "col_sums" to decision variables and comment the hints.
array[1..rows] of var int: row_sums; % add "var"
array[1..cols] of var int: col_sums; % add "var"
% ...
% row_sums = [13,8,24];
% col_sums = [14,14,17];
Here are three generated problem instances (of 9!=362880 possible):
row_sums: [21, 15, 9]
col_sums: [19, 15, 11]
5 9 7
8 4 3
6 2 1
----------
row_sums: [20, 16, 9]
col_sums: [20, 14, 11]
5 8 7
9 4 3
6 2 1
----------
row_sums: [22, 14, 9]
col_sums: [18, 15, 12]
5 9 8
7 4 3
6 2 1
----------
I think the backtracking algorithm would work very well here.
Although backtracking is still "brute force", it can be really fast in the average case. For example, solving Sudoku with backtracking usually takes only 1000-10000 iterations (which is really fast considering that the O-complexity is O(9^n), where n is the number of empty spaces; an average Sudoku therefore has about ~9^60 possibilities, which would take years to enumerate on an average computer).
This task has a lot of rules (uniqueness of numbers and the sums at rows/cols), which is quite good for backtracking. More rules = more checking after each step, and more branches that can't lead to a solution are thrown away early.
This can help : https://en.wikipedia.org/wiki/Sudoku_solving_algorithms
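As a sanity check on the solvers above, a plain brute-force enumeration over all 9! placements can be written in a few lines of Python (my own sketch, not from any of the answers); it confirms the count of 10 solutions reported by the CLP run:

```python
from itertools import permutations

def solve_matrix_puzzle(row_sums, col_sums, n=3):
    """Place the integers 1..n*n, each exactly once, into an n x n grid
    so that row and column sums match the given targets. Brute force:
    only viable for tiny n (9! = 362880 candidates for n = 3)."""
    solutions = []
    for perm in permutations(range(1, n * n + 1)):
        grid = [perm[i * n:(i + 1) * n] for i in range(n)]
        if all(sum(grid[i]) == row_sums[i] for i in range(n)) and \
           all(sum(row[j] for row in grid) == col_sums[j] for j in range(n)):
            solutions.append(grid)
    return solutions

sols = solve_matrix_puzzle([13, 8, 24], [14, 14, 17])
```

For larger grids this explodes immediately, which is where the CP propagation-plus-backtracking approach pays off.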

codejam 2015 qualification round: Infinite House of Pancakes

The problem description and solutions after contest analysis
There is one case I can't figure out: If there is one plate, 9 pancakes, that is the test case
1
9
the "correct answer" is 5
But how? Here is my "faulty" thinking:
9 -> {4, 5} -> {4,3,2} -> {3,2,2,2}
So it totals 3 + 3 = 6 minutes, not 5
Anything obvious that I misunderstood?
I managed to fail on this one during the competition too because I assumed that the best way was to split things in half (to get the maximum height reduction possible).
But, in viewing your question I can see a way that would do better than halving:
9 -> {6, 3} -> {3, 3, 3}
Two swaps plus three minutes to eat: 5 minutes
This has already been answered here:
Infinite House of Pancakes
Basically this is a case where dividing unevenly will give a better solution, for example:
9
3 6
3 3 3
2 2 2
1 1 1
0 0 0
Which is better than dividing evenly
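The uneven-split idea generalizes: fix a target maximum pile height t; a plate of p pancakes then needs ceil(p/t) - 1 special (splitting) minutes, and afterwards everyone finishes eating in t minutes. Trying every t gives the answer. A short sketch of that approach in Python (my own summary; the function name is illustrative):

```python
import math

def min_minutes(plates):
    """Each special minute splits one plate in two; in an eating minute
    every diner with a non-empty plate eats one pancake. For a target
    maximum height t, plate p needs ceil(p/t) - 1 splits, then eating
    takes t minutes. Take the best t."""
    return min(sum(math.ceil(p / t) - 1 for p in plates) + t
               for t in range(1, max(plates) + 1))
```

For the single plate of 9 this picks t = 3 (two splits into {3, 3, 3}, then three minutes of eating), giving the expected 5 rather than 6 from halving.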

applying linear filters to images ( implementing imfilter) without influencing speed and performance

Based on this code and this one, I'm familiar with implementing the imfilter function. But as you know, this kind of code (using sequential for loops) is very slow in MATLAB, especially for high-resolution images, while such loops are more efficient in other programming languages. In MATLAB it's better to vectorize your code as much as possible.
Can anyone suggest a way to vectorize an imfilter implementation?
Note: I know that I can use edit('imfilter') to study the code that the developers used for implementing the imfilter function, but it's pretty hard for me. I don't understand much of the code. I'm pretty new to MATLAB.
Note: I know that some parts of the example code could be vectorized very easily; for example, the padding section in this code could be implemented more easily.
But I'm thinking of a way to vectorize the main part of the code (the part that applies the filter). I mean the parts shown in the pictures:
I don't know how to vectorize these parts.
Oh, I forgot to mention that I wrote the accepted answer for this question. Is there a way to do it if I also don't want to use the conv2 function?
There is a function just for you... it's called im2col. See http://www.mathworks.com/help/images/ref/im2col.html for a description. It allows you to turn "blocks" of the image into "columns" - if you are looking for 3x3 blocks to filter, each column will be 9 elements long. After that, the filter operation can be very simple. Here is an example:
n = 20; m = 30;
myImg = rand(n, m)*255;
myImCol = im2col(myImg, [3 3], 'sliding');
myFilter = [1 2 1 2 4 2 1 2 1]';
myFilter = myFilter / sum(myFilter(:)); % to normalize
filteredImage = reshape( myImCol' * myFilter, n-2, m-2);
Didn't use conv2, and didn't use any explicit loops. This does, however, create an intermediate matrix which is a good deal bigger than the image (in this case, almost 9x). That could be a problem in its own right.
Disclaimer: I usually test Matlab code before posting, but could not connect to the license server. Let me know if you run into issues!
edit some further clarifications for you
1) Why reshape with n-2 and m-2? Well, the im2col function only returns "complete" columns for the blocks that it can create. When I create 3x3 blocks, the first one I can make is centered on (2,2), and the last one on (end-1, end-1). Thus the result is a bit smaller than the original image, "like padding" in reverse. This is in fact the exact opposite of what happens when you use conv2: in that case things get expanded. If you want to avoid that, you could first expand your image with
paddedIm = zeros(n+2, m+2);
paddedIm(2:end-1, 2:end-1) = myImg;
and run the filter on the padded image.
2) The difference between 'sliding' and 'distinct' is best explained with an example:
>> M = magic(4)
M =
16 2 3 13
5 11 10 8
9 7 6 12
4 14 15 1
>> im2col(M,[2 2], 'distinct')
ans =
16 9 3 6
5 4 10 15
2 7 13 12
11 14 8 1
xx-- --xx ---- ----
xx-- --xx ---- ----
---- ---- xx-- --xx
---- ---- xx-- --xx
>> im2col(M,[2 2], 'sliding')
ans =
16 5 9 2 11 7 3 10 6
5 9 4 11 7 14 10 6 15
2 11 7 3 10 6 13 8 12
11 7 14 10 6 15 8 12 1
xx-- ---- ---- -xx-
xx-- xx-- ---- -xx- ... etc ...
---- xx-- xx-- ----
---- ---- xx-- ----
As you can see, the 'distinct' option returns non-overlapping blocks: the 'sliding' option returns "all blocks that fit" even though some will overlap.
3) The implementation of conv2 is likely some lower level code for speed - you may know about .mex files which allow you to write your own C code that can be linked with Matlab and gives you a big speed advantage? This is likely to be something like that. They do claim on their website that they use a "straightforward implementation" - so the speed is most likely just a matter of implementing in a fast manner (not "efficient Matlab").
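For readers more at home in Python, the same im2col idea can be sketched with NumPy (this block is my own addition, not from the answer; `sliding_window_view` requires NumPy >= 1.20). Gathering every 3x3 block as a row lets the whole filter become one matrix product, just like the MATLAB version:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def filter_im2col(img, kernel):
    """Vectorized 'valid' correlation: collect all kxk sliding blocks
    (the im2col idea) and apply the kernel with a single matrix product.
    Note: this is correlation; flip the kernel for true convolution."""
    k = kernel.shape[0]
    windows = sliding_window_view(img, (k, k))   # shape (n-k+1, m-k+1, k, k)
    cols = windows.reshape(-1, k * k)            # each row = one block
    return (cols @ kernel.reshape(-1)).reshape(windows.shape[:2])

img = np.arange(16, dtype=float).reshape(4, 4)
kernel = np.ones((3, 3)) / 9.0                   # 3x3 box blur
out = filter_im2col(img, kernel)
```

As with im2col in MATLAB, the intermediate `cols` matrix is roughly k*k times the image size, so memory is the trade-off for avoiding loops.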
The two inner loops can be vectorized by:
orignalFlip = flipud(fliplr(orignal(i-1:i+1, j-1:j+1)));
temp = orignalFlip .* filter;
But what is the problem with conv2? It seems to be exactly what you need...
In any case, you should not be doing 4 nested loops in MATLAB.
