How do these matrices work? - matrix

I am reading an article about pattern recognition. I don't understand where the 8 columns come from, or how the output is generated.
I tried to grasp the concept, but I still can't see why the first matrix has 8 columns, or how the output is calculated.
The network of figure 1 is trained to recognise the patterns T and H.
The associated patterns are all black and all white respectively as
shown below.
If we represent black squares with 0 and white squares with 1 then the
truth tables for the 3 neurones after generalisation are:

Each table represents one line of your image.
Each column (Xij) of the table represents one possible combination of the pixels in that line (of your input image), and OUT indicates whether that combination evaluates to true or false.
There are 8 columns because there are 8 ways of combining 3 values of 1 and 0 (2 to the power of 3).
I think it's easier if you look at those tables transposed (read them vertically).
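The 2³ = 8 count is easy to verify by enumerating the combinations; a quick Python sketch (the OUT values themselves depend on the trained patterns and are not reproduced here):

```python
from itertools import product

# Every possible combination of 3 binary pixels (one line of the image).
# 2 choices per pixel, 3 pixels -> 2**3 = 8 columns in the truth table.
combinations = list(product([0, 1], repeat=3))

for combo in combinations:
    print(combo)

print(len(combinations))  # 8
```

Each printed tuple corresponds to one column Xij of a table; the trained network assigns an OUT value to each.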


Convolutional Neural Networks - Theory

I am sorry for asking this stupid question, but even after some thought, I still don't get it:
According to Jordi Torres (see here), if we look at an image with 28x28 = 784 pixels, then one way to implement this is to let one neuron of a hidden layer learn about 5x5 = 25 pixels of the input layer:
However, as he explains it:
Analyzing a little bit the concrete case we have proposed, we note that, if we have an input of 28×28 pixels and a window of 5×5, this defines a space of 24×24 neurons in the first hidden layer because we can only move the window 23 neurons to the right and 23 neurons to the bottom before hitting the right (or bottom) border of the input image. We would like to point out to the reader that the assumption we have made is that the window moves forward 1 pixel away, both horizontally and vertically when a new row starts. Therefore, in each step, the new window overlaps the previous one except in this line of pixels that we have advanced.
I really don't get why we need a space of 24x24 neurons in the first hidden layer. Since I take 5x5 windows (which cover 25 of the 784 pixels each), I thought we would need about 784/25 ≈ 32 neurons in total. I mean, doesn't one neuron of the hidden layer learn the property of 25 pixels?
Apparently not, but I am really confused.
You're assuming non-overlapping 5x5 segments, but that's not the case. In this example, the first output is derived from rows 1-5, columns 1-5 of the input. The next one uses rows 1-5, columns 2-6, and so on up to rows 1-5, columns 24-28; then rows 2-6, columns 1-5, etc., until rows 24-28, columns 24-28. This is referred to as a "stride" of 1.
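The 24x24 figure follows from the usual sliding-window arithmetic; a small Python sketch of the calculation (the formula is standard, not quoted from the text above):

```python
def conv_output_size(input_size, window_size, stride=1):
    # Number of valid window positions along one dimension (no padding).
    return (input_size - window_size) // stride + 1

side = conv_output_size(28, 5, stride=1)
print(side, "x", side)  # 24 x 24 -> 24*24 neurons in the first hidden layer

# With non-overlapping windows (stride = window size) you would instead get
# only 5 positions per side, which is where the ~32-neuron intuition breaks down:
print(conv_output_size(28, 5, stride=5))  # 5
```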

how to create an image in matlab (if possible) from matrix? [duplicate]

The problem is the following: I have a .txt file containing 3 columns of numbers. The first 2 columns are the x, y coordinates of the points. The third column (the z vector) contains numbers that express the luminosity of each point. (The .txt files have been generated by software used to study pictures of a combustion process.) Each vector (x, y, z) has 316920 elements (all integers). Now: is there a way to create an image in MATLAB from these 3 vectors, relating the luminosity value to the coordinates of each point?
Thanks for your time!
Consider a file image.txt that contains y, x and intensity values, one point per line, like this:
1 1 0
1 2 12
1 3 10
....
....
255 255 0
Open the text file using the fopen function:
fid = fopen('image.txt','r');
im = [];
Then read one line at a time with fgetl, convert the string into a numeric vector with sscanf, and put the intensity value into the (y, x) position of the image matrix im. Repeat until the end of the file:
tline = fgetl(fid);
while ischar(tline)
    rd = sscanf(tline,'%d');
    im(rd(1),rd(2)) = rd(3);
    tline = fgetl(fid);
end
Finally, close the file handle:
fclose(fid);
I am going to assume that the three columns in your text file are comma-separated (the code will need to be slightly different if they are not). Since you said all numbers are integers, I am also going to assume that your x and y columns contain all the data needed to fill a 2D grid, though not necessarily in order. With these assumptions the code will look like:
data = csvread(filename);
for i = 1:length(data)
    matrix(data(i,2)+1, data(i,1)+1) = data(i,3); % +1 is added since the indices may start from 0 and MATLAB needs them to start from 1
end
image(matrix)
For other delimiters use
data = dlmread(filename,delimiter)
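For readers not using MATLAB, the same reconstruction can be sketched in Python with NumPy, under the same assumptions (integer coordinates that cover the grid; the helper name image_from_xyz is made up for illustration):

```python
import numpy as np

def image_from_xyz(rows):
    """Build a 2D intensity image from (x, y, intensity) triples."""
    data = np.array(rows)
    x, y, z = data[:, 0], data[:, 1], data[:, 2]
    # Row index = y, column index = x, assuming 0-based coordinates.
    im = np.zeros((y.max() + 1, x.max() + 1), dtype=data.dtype)
    im[y, x] = z
    return im

# Tiny example: four points filling a 2x2 grid.
points = [(0, 0, 10), (1, 0, 20), (0, 1, 30), (1, 1, 40)]
im = image_from_xyz(points)
print(im)
# [[10 20]
#  [30 40]]
```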

Feature Vector Representation Neural Networks

Objective: Digit recognition by using Neural Networks
Description: images are normalized to 8 x 13 pixels. In each row, every black pixel is represented by 1 and every white pixel by 0. Every image is thus represented by a vector of vectors as follows:
Problem: is it possible to use a vector of vectors in Neural Networks? If not, how can the image be represented?
Combine the rows into 1 vector?
Convert every row to its decimal value? Example: Row 1: 11111000 = 248, etc.
Combining them into one vector simply by concatenation is certainly possible. In fact, you should notice that arbitrary reordering of the data doesn't change the results, as long as it's consistent between training and classification.
As to your second approach, I think (I am really not sure) you might lose some information that way.
To use multidimensional input, you'd need multidimensional neurons (which I suppose your formalism doesn't support). Sadly you didn't give any info on your network structure, which I think is your main source of problems and confusion. Whenever you evaluate a feature representation, you need to know how the input layer will be structured: if it's impractical, you probably need a different representation.
Your multidimensional vector:
A network that accepts 1 image as input has only 1 (!) input node containing multiple vectors (one per row). This is the worst possible representation of your data. If we:
flatten the input hierarchy: we get 1 input neuron for every row.
flatten the input hierarchy completely: we get 1 input neuron for every pixel.
Think about all 3 approaches and what each does to your data. The last approach is almost always as bad as the first. Neural networks work best with features. Features are not restructurings of the pixels (your row vectors). They should be meta-data you can derive from the pixels: brightness, locations where we go from black to white, bounding boxes, edges, shapes, centers of gravity, ... there's tons of stuff that can be chosen as features in image processing. You have to think about your problem and choose one (or more).
In the end, when you ask about how to "combine rows into 1 vector": You're just rephrasing "finding a feature vector for the whole image". You definitely don't want to "concatenate" your vectors and feed raw data into the network, you need to find information before you use the network. This is critical for pre-processing.
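The plain concatenation mentioned above is trivial to do in code; a Python sketch for the 8 x 13 images described in the question (the dummy image content is made up):

```python
# Flatten a 13x8 binary image (13 rows of 8 pixels each)
# into a single 104-element input vector by concatenating the rows.
image = [[1, 1, 1, 1, 1, 0, 0, 0]] * 13  # dummy image: every row is 11111000

flat = [pixel for row in image for pixel in row]
print(len(flat))  # 104 -> one input neuron per pixel
```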
For further information on which features might be viable for OCR, read some papers. The most successful network at the moment is the Convolutional Neural Network. A starting point for the topic of feature extraction is here.
1) Yes, combining into one vector is suitable; I use this approach: http://vimeo.com/52775200
2) No, it is not suitable, because after normalization from the range (0-255) to the range (0-1), different rows give approximately the same values, so you lose data.
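The information loss described in 2) is easy to check numerically; a small Python sketch:

```python
# Two rows that differ in a single pixel...
row_a = '11111000'  # = 248
row_b = '11111001'  # = 249

# ...map to nearly identical scalars after decimal conversion + normalization,
a, b = int(row_a, 2) / 255, int(row_b, 2) / 255
print(a, b)  # ~0.9725 vs ~0.9765 -- almost indistinguishable as network inputs

# ...while as pixel vectors they remain clearly distinct in one coordinate.
print([int(p) for p in row_a])
print([int(p) for p in row_b])
```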

Matlab 3-layered Image Matrix

I'm currently learning Matlab and having troubles understand the image section. I have this powerpoint slide about matlab image:
Image Matrices
3-layered image matrices -- Read ‘rainbow.jpg’ into im
Subset im to im2 -- e.g. (155:184, 145:164, :)
1 layer of an image -- Get the red layer in im2
I would like to ask: what does (155:184, 145:164, :) represent? What does each value in the parentheses represent? Also, what does the colon represent?
Thank you!
Say you have a 3-dimensional matrix A and you are indexing into it. I will use the example above:
A(155:184,145:164,:)
155:184 in the first entry means take rows 155 to 184
145:164 in the second entry means take columns 145 to 164
and the colon in the last entry means take EVERY element along the 3rd dimension. So if A is 200x200x3, the colon takes all 3 layers along that 3rd dimension.
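If it helps intuition, the same slicing exists in NumPy; a sketch (note that NumPy is 0-based and end-exclusive, so MATLAB's inclusive range 155:184 becomes 154:184):

```python
import numpy as np

A = np.zeros((200, 200, 3))    # a 200x200 RGB image
im2 = A[154:184, 144:164, :]   # NumPy equivalent of MATLAB's A(155:184, 145:164, :)
print(im2.shape)               # (30, 20, 3)

red = im2[:, :, 0]             # "1 layer of an image": the red channel only
print(red.shape)               # (30, 20)
```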

identifying custom shapes in a 2D grid

Dear stackoverflowers.
So, let's say there's a grid, whose values at certain x and y represent whether there is a tile there (1) or it's missing (0).
For example,
100110100010
100100111110
111110000000
000010000000
And there are some already known shapes A, B and C, for example,
(A)     (B)     (C)
  1     1       1
111     111     11
So what I am trying to achieve is to identify which 1's on a grid belong to which shape.
All the 1's should be used up, and the exact number of each shape is known. Rotation is allowed (but no mirroring), so I guess it's better to add the rotated versions as separate shapes and accept that some of them won't be found on the grid.
So, the expected result would be (it's known that it should be exactly 1xA, 2xB, 2xC):
A00CC0B000C0
A00C00BBBCC0
AABBB0000000
0000B0000000
If there are several possible matches, any would suit, as long as every tile gets allocated to its own shape.
Moreover, finding out whether a tile is present or not ("uncovering" it) is an expensive operation (but results are cached, and tiles don't appear out of nowhere), so I am actually looking for a way to identify the shapes with as few "uncoverings" as possible.
(It's okay if it's not optimum, just identifying shapes would be great).
Obviously, the set of known shapes might change (but it will be known at implementation time and will stay constant, so it's possible to tune the code for a particular set of shapes or develop search strategies). It won't be large (~5-6 shapes), and the grid is quite small too (~15x15).
Thanks!
Using the ideas here and/or here (I guess using this one, the object types would be 0 and 1), one way to do it might be to try and match your own patterns against the catalog of collected objects. To take you own example,
100110100010
100100111110
111110000000
000010000000
Shapes A, B and C, each with a rotated version:
(A)
  1        1
111   or   1
           11
(B)
1          111
111   or     1
(C)
1         11
11   or   1
The first collected object might be,
1  11
1  1
11111
    1
=> represented as a set of numbers: [(0,0),(0,1),(0,2),(1,2)..etc]
(the objects need not start or include (0,0) but object
bounds seem needed to calibrate the pattern matching)
Testing object A against the top left of the object would match [(0,0),(0,1),(0,2),(1,2)]. After A is matched, the program must find a way to calibrate the remaining points - the bottom right corner will effectively be measured as (2,3) rather than (4,3) - testing the bottom right of the remaining points in the object would match object B. Continue in a similar vein to match all, trying different combinations if a total match is not found.
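A brute-force version of this matching (ignoring the uncovering cost) can be sketched in Python as backtracking over all rotations and placements; the shape definition used below (the C tromino) is my reading of the diagrams above:

```python
def rotations(shape):
    """All distinct 90-degree rotations of a shape given as a set of (row, col) cells."""
    variants = []
    cells = shape
    for _ in range(4):
        # rotate 90 degrees: (r, c) -> (c, -r), then shift back to non-negative coords
        cells = {(c, -r) for r, c in cells}
        minr = min(r for r, c in cells)
        minc = min(c for r, c in cells)
        norm = frozenset((r - minr, c - minc) for r, c in cells)
        if norm not in variants:
            variants.append(norm)
    return variants

def solve(remaining, shapes):
    """Assign every cell in `remaining` to one shape from `shapes` (list of cell-sets)."""
    if not shapes:
        return [] if not remaining else None
    anchor = min(remaining)  # always fill the top-left-most uncovered cell first
    for i, shape in enumerate(shapes):
        for variant in rotations(shape):
            for (vr, vc) in variant:
                # place `variant` so that its cell (vr, vc) lands on `anchor`
                placed = {(anchor[0] + r - vr, anchor[1] + c - vc) for r, c in variant}
                if placed <= remaining:
                    rest = solve(remaining - placed, shapes[:i] + shapes[i + 1:])
                    if rest is not None:
                        return [placed] + rest
    return None

# Shape C from the question: an L-tromino (1 / 11).
C = {(0, 0), (1, 0), (1, 1)}
# A tiny 2x3 grid of tiles, fully coverable by two C pieces:
grid = {(r, c) for r in range(2) for c in range(3)}
print(solve(grid, [C, C]))  # two placements covering all six cells
```

The same search works for the full grid and shape set, though for minimum uncoverings you would interleave the search with lazy tile queries rather than reading the whole grid up front.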
