Optimization with only non-linear objective and all linear constraints - solver

I am using the Lindo API to solve a non-linear optimization problem with non-linearity only in the objective. I am loading the constraint coefficients using LSloadLPData and computing the objective value in the callback function set via LSsetFuncalc. Is it necessary to call LSloadNLPData? If so, what should the values be for the indices of non-linear variables in each column, given that all constraints are linear?

You should call LSloadNLPData to load the indices of the nonlinear variables in the objective function. See the following sample under your LINDO API installation folder:
lindoapi\samples\c\ex_nlp9_uc


Question about activation functions for image tasks in Deep Learning

Let me ask about image tasks in deep learning (here, image classification).
I understand that a deep learning network can be divided into three kinds of layers: an input layer, intermediate (hidden) layers, and an output layer, with activation functions applied at two transitions:
① Input layer → intermediate layer
② Intermediate layer → output layer
I understand that it is normal for ① and ② to use an activation function, and my understanding is as follows:
Regarding ①, the ReLU function and the sigmoid function are used.
Regarding ②, the softmax function is used.
I would like to know why ① and ② each use a specific function by convention.
Also, are there cases where other activation functions are used, and are there any published results comparing the various functions?
If anyone knows anything about the above, please let me know.
Also, if you have a reference web page or paper, please share it.
The choice of activation function in the hidden layer will control how well the network model learns the training dataset. The choice of activation function in the output layer will define the type of predictions the model can make.
An activation function in a neural network defines how the weighted sum of the input is transformed into an output from a node or nodes in a layer of the network. Many activation functions are nonlinear and may be referred to as the “nonlinearity” in the layer or the network design. Nonlinear activation functions are preferred as they allow the nodes to learn more complex structures in the data.
Hidden Layer
The ReLU (rectified linear unit) activation function is nowadays the most common choice for hidden layers because it is both simple to implement and effective at overcoming the limitations of previously popular activation functions such as sigmoid and tanh. Specifically, it is less susceptible to the vanishing gradients that prevent deep models from being trained, although it can suffer from other problems, such as saturated or "dead" units.
A general problem with both the sigmoid and tanh functions is that they saturate: large values snap to 1.0, and small values snap to -1 or 0 for tanh and sigmoid respectively. Further, the functions are only really sensitive to changes around the mid-point of their input, such as 0.5 for sigmoid and 0.0 for tanh.
The limited sensitivity and saturation of the function happen regardless of whether the summed activation from the node provided as input contains useful information or not. Once saturated, it becomes challenging for the learning algorithm to continue to adapt the weights to improve the performance of the model.
Because rectified linear units are nearly linear, they preserve many of the properties that make linear models easy to optimize with gradient-based methods. They also preserve many of the properties that make linear models generalize well.
Because the rectified function is linear for half of the input domain and nonlinear for the other half, it is referred to as a piecewise linear function or a hinge function. However, the function remains very close to linear, in the sense that it is a piecewise linear function with two linear pieces.
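As a quick illustration of the saturation behaviour described above, here is a minimal NumPy sketch (the sample values are arbitrary) comparing ReLU with tanh and sigmoid:

import numpy as np

def relu(z):
    # linear for positive inputs, zero otherwise (the "hinge")
    return np.maximum(0.0, z)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

z = np.array([-10.0, -1.0, 0.0, 1.0, 10.0])
print(relu(z))      # [ 0.  0.  0.  1. 10.] -- no upper saturation
print(np.tanh(z))   # values near -1 and 1 at the extremes (saturated)
print(sigmoid(z))   # values near 0 and 1 at the extremes (saturated)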
Output Layer
Common activation functions to consider for use in the output layer are: Linear, Logistic (Sigmoid) and Softmax.
The linear activation function is also called “identity” (multiplied by 1.0) or “no activation.” This is because the linear activation function does not change the weighted sum of the input in any way and instead returns the value directly.
The softmax function outputs a vector of values that sum to 1.0 that can be interpreted as probabilities of class membership. It is related to the argmax function that outputs a 0 for all options and 1 for the chosen option. Softmax is a “softer” version of argmax that allows a probability-like output of a winner-take-all function. As such, the input to the function is a vector of real values and the output is a vector of the same length with values that sum to 1.0 like probabilities.
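For concreteness, here is a minimal NumPy sketch of the softmax function described above (the score values are made up):

import numpy as np

def softmax(z):
    # subtract the max for numerical stability; the result is unchanged
    e = np.exp(z - np.max(z))
    return e / e.sum()

scores = np.array([2.0, 1.0, 0.1])   # made-up raw outputs (logits)
print(softmax(scores))               # approx. [0.659 0.242 0.099], sums to 1.0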
Choose the activation function for your output layer based on the type of prediction problem that you are solving. Specifically, the type of variable that is being predicted.
For example, you may divide prediction problems into two main groups, predicting a categorical variable (classification) and predicting a numerical variable (regression).
If your problem is a regression problem, you should use a linear activation function.
Regression: One node, linear activation.
If your problem is a classification problem, then there are three main types of classification problems and each may use a different activation function.
Predicting a probability is not a regression problem; it is classification. In all cases of classification, your model will predict the probability of class membership (e.g. probability that an example belongs to each class) that you can convert to a crisp class label by rounding (for sigmoid) or argmax (for softmax).
If there are two mutually exclusive classes (binary classification), then your output layer will have one node and a sigmoid activation function should be used. If there are more than two mutually exclusive classes (multiclass classification), then your output layer will have one node per class and a softmax activation should be used. If there are two or more mutually inclusive classes (multilabel classification), then your output layer will have one node for each class and a sigmoid activation function is used.
Binary Classification: One node, sigmoid activation.
Multi-class Classification: One node per class, softmax activation.
Multi-label Classification: One node per class, sigmoid activation.
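As a sketch only (assuming Keras; the layer sizes and the n_classes variable are placeholders, not from the original post), the recommendations above map onto output layers like this:

from tensorflow.keras.layers import Dense

n_classes = 5   # placeholder number of classes/labels

regression_output = Dense(1, activation='linear')            # regression: one node, linear
binary_output     = Dense(1, activation='sigmoid')           # binary classification: one node, sigmoid
multiclass_output = Dense(n_classes, activation='softmax')   # multi-class: one node per class, softmax
multilabel_output = Dense(n_classes, activation='sigmoid')   # multi-label: one node per class, sigmoid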
The softmax function is used as the activation function in the output layer of neural network models that predict a multinomial probability distribution. That is, softmax is used as the activation function for multi-class classification problems where class membership is required on more than two class labels.
The function can be used as an activation function for a hidden layer in a neural network, although this is less common. It may be used when the model internally needs to choose or weight multiple different inputs at a bottleneck or concatenation layer.
Reference: machinelearningmastery.com
ReLU, leaky ReLU, and tanh activation functions in the input and hidden layers are used for numeric (regression) prediction. Leaky ReLU and tanh find the signal better for equations and linear trends; I used leaky ReLU and tanh for linear problems. Sigmoid activation is used for classification and binary cross-entropy problems; tanh worked well for a binary cross-entropy problem in credit loan risk.
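For reference, leaky ReLU differs from ReLU only in keeping a small slope for negative inputs; a minimal NumPy sketch (the slope 0.01 is a common default, not a value from the answer):

import numpy as np

def leaky_relu(z, alpha=0.01):
    # like ReLU, but negative inputs are scaled by alpha instead of zeroed
    return np.where(z > 0, z, alpha * z)

print(leaky_relu(np.array([-2.0, 0.0, 3.0])))   # [-0.02  0.    3.  ]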
You can use softmax when you are outputting multiple labels as probabilities. In this example, I use the UFO text description to predict the probable shape of the UFO.
https://github.com/dnishimoto/python-deep-learning/blob/master/UFO%20.ipynb
['Egg','Cross','Sphere','Triangle','Disk','Oval','Rectangle','Teardrop']
Softmax returns a probability for each output label:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, Flatten, Dense

model = Sequential()
model.add(Embedding(vocab_size, 8, input_length=max_length))  # vocab_size and max_length come from tokenizing the descriptions
model.add(Flatten())
model.add(Dense(len(LABELS), activation='softmax'))  # LABELS is the list of shape names above; one output node per label
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
Egg-shaped UFOs are the most common sightings.

How do I intuitively interpret a sigmoidal neural network model?

There are multiple sources, but they explain it at a bit too high a level for me to actually understand.
Here is my knowledge of how this model works;
We feed information forward from the prior layer's nodes using weight * value. We do NOT use the sigmoid function here. This is because any hidden layer would force the value to be POSITIVE if we used the sigmoid function there, and if it is always positive, then subsequent values can never be less than 0.5.
When we have fed forward to the output, we then apply the sigmoid function to the output.
So in total we use the sigmoid function on the output layer values only.
I will try to include a hopefully not-terrible diagram:
https://imgur.com/a/4EzkpH5
I have tested this with my own code, and evidently the sigmoid function should not be applied to every value and weight, but I am unsure whether it is just the sum of weight * value.
So basically you have a set of features for your model. These features are independent variables which are responsible for producing the output. So the features are the inputs and the predicted values are the outputs. This is indeed a function.
It is easy to understand neural networks if we study them in terms of functions.
First, multiply the feature vector with the vector of weights; that is, take the dot product of the two vectors.
The dot product is a scalar if you have a single node (neuron). Apply the sigmoid function to the product. The output is the final prediction.
The whole model could be expressed as a single composite function like,
y = sigmoid( dot( w , x ) )
Understanding back propagation (gradient descent) for a neural network also becomes more intuitive if we treat the network as a function.
In the above function,
sigmoid : applies sigmoid activation function to the argument.
dot : returns the dot product of two vectors.
Also, use vector notation as far as possible. It saves you from the confusion related to summations.
Hope it helps.
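A minimal NumPy sketch of that composite function, with made-up weights and features:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

w = np.array([0.4, -0.2, 0.1])   # made-up weight vector
x = np.array([1.0, 2.0, 3.0])    # made-up feature vector

y = sigmoid(np.dot(w, x))        # y = sigmoid( dot( w , x ) )
print(y)                         # approx. 0.574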
Activation functions serve an important role in neural network models: they can, given the choice of activation function, grant the network the capability to model non-linear datasets.
The example illustrated in the figure you posted will be limited to modelling linear problems where the output value is between 0 and 1 (the range of the sigmoid function). However, the model would support non-linear datasets if the sigmoid were also applied to the two nodes in the middle. Stack Overflow is not the place to discuss the theoretical foundation of why this works; instead I recommend some light reading like this ebook: Neural Networks and Deep Learning (no affiliation).
As a side note: the final output layer of a network is sometimes instantiated as a simple sum, or as a ReLU. This widens the range of the network's output.

Optimal parameters for genetic algorithm

I am solving an optimization problem in MATLAB. The optimization is over 10 variables, with a search space consisting of 30*21*30*21*15*21*15*21*13*13 ≈ 6.7e12 combinations.
I have currently set the following parameters for ga optimization.
CrossoverFraction=0.4;
PopulationSize=500;
EliteCount=4;
Generations=25;
The rest of the values are set to the defaults from gaoptimset, as follows:
options = gaoptimset('PopInitRange', Bound, 'PopulationSize', PopulationSize, ...
    'EliteCount', EliteCount, 'Generations', Generations, 'StallGenLimit', 25, ...
    'Display', 'iter');
Now, I understand the search space is large, but given the time constraints and the number of times I have to run this GA for various instruments, I cannot increase PopulationSize*Generations. I am running the optimization as a single-threaded application, so I am not using the migration options.
Please suggest ways to improve the optimisation capability of my problem by tweaking other parameters in the options. Alternative ways of optimization are also welcome.
To increase the speed of the algorithm, try specifying bounds for your 10 variables. This forces the algorithm to explore values within a smaller region and leads to faster convergence to a suitable answer. You will have to make educated guesses for these bounds based on your specific problem.
This leaves you with additional time to try increasing other parameters, such as the number of generations.
One way to specify bounds is when calling the ga function:
nvars = 10;                               % 10 variables
lower = [0,0,0,0,0,0,0,0,0,0];            % lower bounds for each variable
upper = [10,10,10,10,10,10,10,10,10,10];  % upper bounds for each variable
[x, fval] = ga(@objectiveFunction, nvars, [], [], [], [], lower, upper, [], integers, options);  % 'integers' lists the indices of integer-constrained variables

Performant approach to modifying specific Incanter matrix elements?

I have a problem where I look at a row of elements, and if there are no non-zero elements in the row, I want to set one to a random value. My difficulty is the update strategy. I just attempted to get a version working that used slice from Clatrix, but that does not work with Incanter matrices. So should I construct another matrix of "modifications" and perform element-wise addition? Or build some other form of matrix and perform a multiplication (I am not sure how to build that one yet)? Or maybe I could somehow map over the rows? My trouble there is that assoc does not work on Incanter matrices, which is apparently what sel returns.
It seems that there are no easy ways to do this in Incanter, so I ended up directly using set from Clatrix and avoiding Incanter usage.

Principal Component Analysis with very high-dimensional data

I have a set of samples (vectors), each with dimension about M (10000), and the size of the set is also about N (10000). I want to find the first 10 PCs (those with the biggest eigenvalues) of this set. Due to the large dimension of the samples, I cannot compute the covariance matrix in reasonable time. Are there any methods to find the PCs without computing the full covariance matrix, or methods that can effectively handle high-dimensional data? These methods should require fewer operations than O(M*M*N).
NIPALS -- Non-linear iterative partial least squares
see for example here: http://en.wikipedia.org/wiki/NIPALS
Maybe it could help somehow: I have found a solution in the family of EM-PCA methods (see, for example, http://www.cmlab.csie.ntu.edu.tw/~cyy/learning/papers/PCA_RoweisEMPCA.pdf).
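As an alternative sketch (not the NIPALS or EM-PCA algorithms referenced above, and assuming scikit-learn is available): randomized SVD also finds the leading PCs without ever forming the M x M covariance matrix, at a cost roughly proportional to N*M*k for k components:

import numpy as np
from sklearn.decomposition import PCA

N, M = 1000, 1000                # smaller placeholder; the question has N and M around 10000
X = np.random.rand(N, M)         # substitute your real samples here

pca = PCA(n_components=10, svd_solver='randomized', random_state=0)
scores = pca.fit_transform(X)    # (N, 10) projections onto the top 10 PCs
components = pca.components_     # (10, M) principal directions
print(pca.explained_variance_ratio_)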
