Is it still considered a generative adversarial network if the random vector z is not used? - random

I am using a model very similar to the pix2pix model (https://arxiv.org/pdf/1611.07004.pdf). I don't feed a random noise vector to the "generator", as it works just fine without. But I condition on some input parameters.
Is it wrong to call it a generative model and the first network a generator when I don't use this vector, as usually generative models are defined by this vector? Should it be instead called adversarial training?
I use dropouts in the generator network, which could be considered a source of stochasticity. And also, in my mind, it is called a generator, as it is the network that generates fakes images, as opposed to the discriminator. I would be happy to hear what you think.
It is more of a theoretical question.

Related

Multiple inputs to a convolutional neural network with one output?

Is there a standard way to input multiple images into a CNN and condense the information down into a singular image in the end? I'm working on an HDR dataset where you have multiple images of the same scene and put them together to form an image with less noise.
What I have tried to do is set the separate images as channels, but I'm unsure if this is appropriate as the outputs were weird.
Is this implementable?
Yes! Any neural network, regardless of number or type of layers, is nothing more than some function f: I → O, so as long as your different inputs belong to the same type you can freely pass as many of them through your network as you please, getting equally many outputs on the other side of the black box. These can be in turn used in any way you desire through any other function g: O x O → O' (could be addition, tensor multiplication, concatenation or something fancier like another neural network).
Does this make sense?
No answer here; this largely depends on what you are expecting your function to be doing, what your data are and what your end-goal is. Are you assuming that all of your inputs obey the same statistical properties or have similar underlying patterns? In that case, you could argue that the same function can model them. Keep in mind that by applying them in parallel (i.e. asynchronously) you're keeping your function blind to potential cross-dependencies between the two inputs, which in some cases makes complete sense. To make this a bit clearer, if your different inputs are the different channels of a single image, I think this is the wrong approach; each channel may convey different information, and keeping your network unaware of how each channel affects each other, while forcing it to create meaningful abstractions using the same function over all of them doesn't sound like a great idea. On the other hand, if your different images are e.g. photos of an object from different angles and the sub-network you're applying them to is some sort of classifier over their features (already obtained through another CNN, for instance), then it could make sense to model both using the same function.

Layers and Neurons of a Neural Network

I would like to know a bit more about Neural Network, I'm developing a C++ program to make a NN but I'm stuck with the BackPropagation algorithm, sorry for not offering some working code.
I know that there are so many libraries for creating a NN in many languages, but I prefer to make one from my self. The point is that I don't know how many layers and how many neurons should be necessary for achieving a particular goal such as pattern recognition, or functions approximations, or whatever.
My questions are: if I'd like to recognize some particulars patterns, like in image detection, how many layers and neurons-per-layer should be necessary? Let's say my images are all 8x8 pixels, I would start naturally with an input layer of 64 neurons, but I don't have any idea of how many neurons I have to put in hidden layers, and also in output layer. Let's say I have to distinguish from cats and dogs, or whatever you may think, how could be the output layer? I can imagine an output layer with only-one neuron outputting a value between 0 and 1 with the classical logistic function (1/(1+exp(-x)) and when it is near 0 the input was a cat and when approaches 1 it was a dog, but ... is it correct? What if I add a new pattern like a fish? and what if the input contains a dog and a cat ( ..and a fish)? This make me thinking that the logistic function in the output layer is not very suitable for pattern recognition like this, only because 1/(1+exp(-x)) has a range in (0,1). Do I have to change the activation function or maybe add some other neurons to the output layer? Are there some other activations function more accurate to do this? Do every neurons in every layers have the same activation function, or it is different from layer to layer?
Sorry for all of this questions, but this topic is not very clear to me.
I read a lot around internet, and I found libraries all-yet-implemented and hard to read from, and many explanations to what a NN can do, but not how it can do.
I read a lot from https://mattmazur.com/2015/03/17/a-step-by-step-backpropagation-example/ and http://neuralnetworksanddeeplearning.com/chap1.html, and here I understood how to approximate a function (because every neurons in a layer can be thought as a step-function with a particular step for weights and bias) and how back-propagation algorithm works, but other tutorials and similars were more focused on preexisting libraries. I also read this question Determining the proper amount of Neurons for a Neural Network but I would like to involve also the activation functions of a NN, which is the best and for what is the best.
Thanks in advance for your answers!
Your questions are quite general, so I can only give some general recommendations:
The number of layers you need depends on the complexity of the problem you want to solve. The more calculation is required to obtain an output from a given input, the more layers you need.
Only very simple problems can be solved with a single layer network. These are called linearly separable and are usually trivial. With two layers it gets better and with three layers, at least in theory, all kinds of classification tasks can be performed if you have enough cells within the layers. In practice, however it is often better to add a 4th or 5th layer to the network while reducing the number of cells within a single layer.
Be aware that the standard backpropagation algorithm performs badly with more than 4 or 5 layers. If you need more layers, have a look at Deep Learning.
The numbers of cells within each layer mainly depends on the number of inputs and, if you solve a classification task, the number of classes you want to detect. In practice it is quite common to reduce the number of cells from layer to layer, but there are exceptions.
Concerning your question about the output function: In most cases you should stick with one type of sigmoid function. The case you describe is not really an issue because you could add another output cell for your "fish" class. The choice of a specific activation function is not that critical. Basically you use one whose values and derivative can be calculated efficiently.
#Frank Puffer has already provided some nice information, but let me add my two cents. First off, much of what you're asking is in the area of hyperparameter optimization. Although there are various "rules of thumb", the reality is that determining the optimal architecture (number/size of layers, connectivity structure, etc.) and other parameters like the learning rate typically requires extensive experimentation. The good news is that the parameterization of these hyperparameters is among the simplest aspects of the implementation of a neural network. So I would recommend focusing on building your software such that the number of layers, size of layers, learning rate, etc., are all easily configurable.
Now you specifically asked about detecting patterns in an image. It's worth mentioning that using standard multi-layer perceptrons (MLPs) to perform classification on raw image data can be computationally expensive, especially for larger images. It's common to use architectures that are designed to extract useful, spacially-local features (i.e.: Convolutional Neural Networks or CNNs).
You could still use standard MLPs for this, but the computational complexity can make it an untenable solution. The sparse connectivity of CNNs for example dramatically reduce the number of parameters requiring optimization and simultaneously build a conceptual hierarchy of representations better suited for classification of images.
Regardless, I would recommend implementing backpropagation using stochastic gradient descent for optimization. This is still the approach typically used for training neural nets, CNNs, RNNs, etc.
Regarding the number of output neurons, this is one question that does have a simple answer: use "one-hot" encoding. For each class you want to recognize, you have an output neuron. In your example of the dog, cat, and fish classes, you have three neurons. For an input image representing a dog, you would expect a value of 1 for the "dog" neuron, and 0 for all the others. Then, during inference, you can interpret the output as a probability distribution reflecting the confidence of the NN. For example, if you get output dog:0.70, cat:0.25, fish:0.05, then you have a 70% confidence that the image is a dog, and so on.
For activation functions, the most recent research I've seen seems to indicate that Rectified Linear Units are generally a good choice since they're easy to differentiate and compute, and they avoid a problem that plagues deeper networks called the "vanishing gradient problem".
Best of luck!

Artificial neural network image transformation

I have a pairs of images (input-output) but I don't know the transformation to going from A (input) to B (output). I want to record image A and get image B. Physically I can change the setup to get A or B, but I want to do it by software.
If I understood well, a trained Artificial Neural Network is able to do that, having an input can give the corresponding output, is it right?
Is there any software/ANN that just "training" it with entering a number of input-output pairs will be able to provide the correct output if the input is a new (but similar to the others) image?
Thanks
If you have some relevant amount of image pairs (input/output pair) and you don't know transformation between input and output you could train ANN on that training set to imitate that unknown transformation. You will be able to well train your ANN only if you have sufficient amount of training image pairs, but it could be pretty impossible when that unknown transformation is complicated.
For example if that transformation simply increases intensity values of pixels at input image by given value, ANN will very fast learn to imitate that behavior, but if that unknown transformation is some complicated convolution or few serial convolutions or something more complicated it will be very hard, near impossible to train ANN to imitate that transformation. So, more complex transformation will need bigger training set and more complex ANN design.
There are plenty of free opensource ANN libraries implemented in many languages. You could start for example with that tutorial: http://www.codeproject.com/Articles/13091/Artificial-Neural-Networks-made-easy-with-the-FANN
What you are asking is possible in principle -- in theory, an ANN with sufficiently many hidden units can learn an arbitrary function to map inputs to outputs. However, as the comments and other answers have mentioned, there may be many technical issues with your particular problem that could make it impractical. I would classify these problems as (a) mapping complexity, (b) model complexity, (c) scaling complexity, and (d) implementation complexity. They are all somewhat related, but hopefully this is a useful way to break things down.
Mapping complexity
As mentioned by Springfield762, there are many possible functions that map from one image to another image. If the relationship between your input images and your output images is relatively simple -- like increasing the intensity of each pixel by a constant amount -- then an ANN would be able to learn this mapping without much difficulty. There are probably many more transformations that would be similarly easy to learn, such as skewing, flipping, rotating, or translating an image -- basically any affine transformation would be easy to learn. Other, nonlinear transformations could also be feasible, such as squaring the intensity of each pixel.
As a general rule, the more complicated the relationship between your input and output images, the more difficult it will be to get a model to learn this mapping for you.
Model complexity
The more complex the mapping from inputs to outputs, the more complex your ANN model will be to be able to capture this mapping. Models with many hidden layers have been shown in the past 10 years to perform quite well on tasks that people had previously thought impossible, but often these state-of-the-art models have millions or even billions of parameters and take weeks to train on GPU hardware. A simple model can capture many simple mappings, but if you have a complex input-output map to learn, you'll need a large, complex model.
Scaling complexity
Yves mentioned in the comments that it can be difficult to scale models up to typical image sizes. If your images are relatively small (currently the state of the art is to model images on the order of 100x100 pixels), then you can probably just throw a bunch of raw pixel data at an ANN model and see what happens. But if you're using 6000x4000 images from your shiny Nikon DSLR, it's going to be quite difficult to process those in a reasonable amount of time. You'd be better off compressing your image data somehow (PCA is a common technique) and then trying to learn the mapping in the compressed space.
In addition, larger images will have a larger space of possible mappings between them, so you'll need more of your larger images as training data than you would if you had small images.
Springfield762 also mentioned this: If the mapping between your input and output images is simple, then you'll only need a few examples to learn the mapping successfully. But if you have a complicated mapping, then you'll need much more training data to have a chance at learning the mapping properly.
Implementation complexity
It's unlikely that a tool already exists that would let you just throw image data into an ANN model and have a mapping appear. Most likely you'll need, at a minimum, to implement some code that will pre-process your image data. In addition, if you have lots of large images you'll probably need to write code to handle loading data from disk, etc. (There are a lot of "big data" tools for things like this, but they all require some amount of work to get set up.)
There are many, many open source ANN toolkits out there nowadays. FANN (already mentioned) is a popular one in C++ with bindings in other languages. Caffe is quite popular, and is also implemented in C++ with bindings. There seem to be many toolkits that use Python and Theano or some other GPU acceleration library -- Keras, Lasagne, Hebel, Pylearn2, neon, and Theanets (I wrote this one). Many people use Torch, written in Lua. Matlab has at least one neural network toolbox. I'm less familiar with other ecosystems, but Java seems to have Deeplearning4j, C# has Accord, and even R has darch.
But with any of these neural network toolkits, you're going to have to write some code to load the data, process it into the appropriate input format, construct (or load) a network model, train the model, etc.
The problem you're trying to solve is a canonical classification problem that neural networks can help you solve. You treat the B images as a set of labels that you match to A, and once trained, the neural network will be able to match the B images to new input based on where the network locates new input in a high-dimensional vector space. I assume you'd use some combination of convolutional networks to create your features, and softmax for multinomial classification on the output layer. More here: http://deeplearning4j.org/convolutionalnets.html
Since this has been written there has been a lot of work in the realm of cgans ( conditional generative adversarial networks ) please refer to:
https://arxiv.org/pdf/1611.07004.pdf

Binary classification of sensor data

My problem is the following: I need to classify a data stream coming from an sensor. I have managed to get a baseline using the
median of a window and I subtract the values from that baseline (I want to avoid negative peaks, so I only use the absolute value of the difference).
Now I need to distinguish an event (= something triggered the sensor) from the noise near the baseline:
The problem is that I don't know which method to use.
There are several approaches of which I thought of:
sum up the values in a window, if the sum is above a threshold the class should be EVENT ('Integrate and dump')
sum up the differences of the values in a window and get the mean value (which gives something like the first derivative), if the value is positive and above a threshold set class EVENT, set class NO-EVENT otherwise
combination of both
(unfortunately these approaches have the drawback that I need to guess the threshold values and set the window size)
using SVM that learns from manually classified data (but I don't know how to set up this algorithm properly: which features should I look at, like median/mean of a window?, integral?, first derivative?...)
What would you suggest? Are there better/simpler methods to get this task done?
I know there exist a lot of sophisticated algorithms but I'm confused about what could be the best way - please have a litte patience with a newbie who has no machine learning/DSP background :)
Thank you a lot and best regards.
The key to evaluating your heuristic is to develop a model of the behaviour of the system.
For example, what is the model of the physical process you are monitoring? Do you expect your samples, for example, to be correlated in time?
What is the model for the sensor output? Can it be modelled as, for example, a discretized linear function of the voltage? Is there a noise component? Is the magnitude of the noise known or unknown but constant?
Once you've listed your knowledge of the system that you're monitoring, you can then use that to evaluate and decide upon a good classification system. You may then also get an estimate of its accuracy, which is useful for consumers of the output of your classifier.
Edit:
Given the more detailed description, I'd suggest trying some simple models of behaviour that can be tackled using classical techniques before moving to a generic supervised learning heuristic.
For example, suppose:
The baseline, event threshold and noise magnitude are all known a priori.
The underlying process can be modelled as a Markov chain: it has two states (off and on) and the transition times between them are exponentially distributed.
You could then use a hidden Markov Model approach to determine the most likely underlying state at any given time. Even when the noise parameters and thresholds are unknown, you can use the HMM forward-backward training method to train the parameters (e.g. mean, variance of a Gaussian) associated with the output for each state.
If you know even more about the events, you can get by with simpler approaches: for example, if you knew that the event signal always reached a level above the baseline + noise, and that events were always separated in time by an interval larger than the width of the event itself, you could just do a simple threshold test.
Edit:
The classic intro to HMMs is Rabiner's tutorial (a copy can be found here). Relevant also are these errata.
from your description a correctly parameterized moving average might be sufficient
Try to understand the Sensor and its output. Make a model and do a Simulator that provides mock-data that covers expected data with noise and all that stuff
Get lots of real sensor data recorded
visualize the data and verify your assuptions and model
annotate your sensor data i. e. generate ground truth (your simulator shall do that for the mock data)
from what you learned till now propose one or more algorithms
make a test system that can verify your algorithms against ground truth and do regression against previous runs
implement your proposed algorithms and run them against ground truth
try to understand the false positives and false negatives from the recorded data (and try to adapt your simulator to reproduce them)
adapt your algotithm(s)
some other tips
you may implement hysteresis on thresholds to avoid bouncing
you may implement delays to avoid bouncing
beware of delays if implementing debouncers or low pass filters
you may implement multiple algorithms and voting
for testing relative improvements you may do regression tests on large amounts data not annotated. then you check the flipping detections only to find performance increase/decrease

What are some algorithms for symbol-by-symbol handwriting recognition?

I think there are some algorithms that evaluate difference between drawn symbol and expected one, or something like that. Any help will be appreciated :))
You can implement a simple Neural Network to recognize handwritten digits. The simplest type to implement is a feed-forward network trained via backpropagation (it can be trained stochastically or in batch-mode). There are a few improvements that you can make to the backpropagation algorithm that will help your neural network learn faster (momentum, Silva and Almeida's algorithm, simulated annealing).
As far as looking at the difference between a real symbol and an expected image, one algorithm that I've seen used is the k-nearest-neighbor algorithm. Here is a paper that describes using the k-nearest-neighbor algorithm for character recognition (edit: I had the wrong link earlier. The link I've provided requires you to pay for the paper; I'm trying to find a free version of the paper).
If you were using a neural network to recognize your characters, the steps involved would be:
Design your neural network with an appropriate training algorithm. I suggest starting with the simplest (stochastic backpropagation) and then improving the algorithm as desired, while you train your network.
Get a good sample of training data. For my neural network, which recognizes handwritten digits, I used the MNIST database.
Convert the training data into an input vector for your neural network. For the MNIST data, you will need to binarize the images. I used a threshold value of 128. I started with Otsu's method, but that didn't give me the results I wanted.
Create your network. Since the images from MNIST come in an array of 28x28, you have an input vector with 784 components and 1 bias (so 785 inputs), to your neural network. I used one hidden layer with the number of nodes set as per the guidelines outlined here (along with a bias). Your output vector will have 10 components (one for each digit).
Randomly present training data (so randomly ordered digits, with random input image for each digit) to your network and train it until it reaches a desired error-level.
Run test data (MNIST data comes with this as well) against your neural network to verify that it recognizes digits correctly.
You can check out an example here (shameless plug) that tries to recognize handwritten digits. I trained the network using data from MNIST.
Expect to spend some time getting yourself up to speed on neural network concepts, if you decide to go this route. It took me at least 3-4 days of reading and writing code before I actually understood the concept. A good resource is heatonresearch.com. I recommend starting with trying to implement neural networks to simulate the AND, OR, and XOR boolean operations (using a threshold activation function). This should give you an idea of the basic concepts. When it actually comes down to training your network, you can try to train a neural network that recognizes the XOR boolean operator; it's a good place to start for an introduction to learning algorithms.
When it comes to building the neural network, you can use existing frameworks like Encog, but I found it to be far more satisfactory to build the network myself (you learn more that way I think). If you want to look at some source, you can check out a project that I have on github (shameless plug) that has some basic classes in Java that help you build and train simple neural-networks.
Good luck!
EDIT
I've found a few sources that use k-nearest-neighbors for digit and/or character recognition:
Bangla Basic Character Recognition Using Digital Curvelet Transform (The curvelet coefficients of an
original image as well as its morphologically altered versions are used to train separate k–
nearest neighbor classifiers. The output values of these classifiers are fused using a simple
majority voting scheme to arrive at a final decision.)
The Homepage of Nearest Neighbors and Similarity Search
Fast and Accurate Handwritten Character Recognition using Approximate Nearest Neighbors Search on Large Databases
Nearest Neighbor Retrieval and Classification
For resources on Neural Networks, I found the following links to be useful:
CS-449: Neural Networks
Artificial Neural Networks: A neural network tutorial
An introduction to neural networks
Neural Networks with Java
Introduction to backpropagation Neural Networks
Momentum and Learning Rate Adaptation (this page goes over a few enhancements to the standard backpropagation algorithm that can result in faster learning)
Have you checked Detexify. I think it does pretty much what you want
http://detexify.kirelabs.org/classify.html
It is open source, so you could take a look at how it is implemented.
You can get the code from here (if I do not recall wrongly, it is in Haskell)
https://github.com/kirel/detexify-hs-backend
In particular what you are looking for should be in Sim.hs
I hope it helps
Addendum
If you have not implemented machine learning algorithms before you should really check out: www.ml-class.org
It's a free class taught by Andrew Ng, Director of the Stanford Machine Learning Centre. The course is an entirely online-taught course specifically on implementing a wide range of machine learning algorithms. It does not go too much into the theoretical intricacies of the algorithms but rather teaches you how to choose, implement, use the algorithms and how diagnose their performance. - It is unique in that your implementation of the algorithms is checked automatically! It's great for getting started in machine learning at you have instantaneous feedback.
The class also includes at least two exercises on recognising handwritten digits. (Programming Exercise 3: with multinomial classification and Programming Exercise 4: with feed-forward neural networks)
The class has started a while ago but it should still be possible to sign up. If not, a new run should start early next year. If you want to be able to check your implementations you need to sign up for the "Advanced Track".
One way to implement handwriting recognition
The answer to this question depends on a number of factors, including what kind of resource constraints you have (embedded platform) and whether you have a good library of correctly labelled symbols: i.e. different examples of a handwritten letter for which you know what letter they represent.
If you have a decent sized library, implementation of a quick and dirty standard machine learning algorithm is probably the way to go. You can use multinomial classifiers, neural networks or support vector machines.
I believe a support vector machine would be fastest to implement as there are excellent libraries out there who handle the machine learning portion of the code for you, e.g. libSVM. If you are familiar with using machine learning algorihms, this should take you less than 30 minutes to implement.
The basic procedure you would probably want to implement is as follows:
Learning what symbols "look like"
Binarise the images in your library.
Unroll the images into vectors / 1-D arrays.
Pass the "vector representation" of the images in your library and their labels to libSVM to get it to learn how the pixel coverage relates to the represented symbol for the images in the library.
The algorithm gives you back a set of model parameters which describe the recognition algorithm that was learned.
You should repeat 1-4 for each character you want to recognise to get an appropriate set of model parameters.
Note: steps 1-4 you only have to carry out once for your library (but once for each symbol you want to recognise). You can do this on your developer machine and only include the parameters in the code you ship / distribute.
If you want to recognise a symbol:
Each set of model parameters describes an algorithm which tests whether a character represents one specific character - or not. You "recognise" a character by testing all the models with the current symbol and then selecting the model that best fits the symbol you are testing.
This testing is done by again passing the model parameters and the symbol to test in unrolled form to the SVM library which will return the goodness-of-fit for the tested model.

Resources