I know that Pregel's graph computation framework is based on BSP (Bulk Synchronous Parallel) and that GraphLab is based on GAS (Gather-Apply-Scatter). What is the difference between BSP and GAS, and what are the advantages and disadvantages of each?
I am a beginner in convolutional networks. I use DIGITS to implement them and have a few doubts.
When trying out a basic image classification problem, how do we decide on the number of layers: how many conv layers, how many fully connected layers, etc.?
DIGITS implements 3 standard architectures from papers. For a particular dataset, is there any way to find out which architecture to use, or when we should use our own architecture?
How can the hidden layers help in solving the problem, i.e. what decisions can we make by looking at the results in the hidden layers?
There has never been a clear rule for deciding how many layers or neurons are needed, or which architecture is best, when building a neural network. The usual procedure is to build the network with some set of parameters, measure its performance on the training set and the test set to check that it neither underfits (high bias) nor overfits the data, and keep the best parameters. Alternatively, you can search the parameter space with another algorithm, such as a genetic algorithm.
In conclusion, either you start from scratch every time and measure the network's performance, or you use an approach that does not need to start from scratch and can build incrementally, such as transfer learning and fine-tuning of an existing network architecture.
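The trial-and-error procedure described above can be sketched as a small selection loop. This is only an illustration under stated assumptions, not a DIGITS feature: `train_and_evaluate` is a hypothetical callback that trains one candidate configuration and returns its training and held-out accuracy, and the numbers in the stub are made-up placeholders rather than real results.

```python
from typing import Callable, Iterable, Tuple

# Hypothetical configuration: (number of conv layers, number of fully connected layers)
Config = Tuple[int, int]


def select_architecture(
    candidates: Iterable[Config],
    train_and_evaluate: Callable[[Config], Tuple[float, float]],
    max_gap: float = 0.10,
) -> Tuple[Config, float]:
    """Return the candidate with the best held-out accuracy.

    Candidates whose training accuracy exceeds their held-out accuracy by
    more than `max_gap` are treated as overfitting and skipped.
    """
    best_config, best_val = None, float("-inf")
    for config in candidates:
        train_acc, val_acc = train_and_evaluate(config)
        if train_acc - val_acc > max_gap:
            continue  # large train/held-out gap: likely overfitting
        if val_acc > best_val:
            best_config, best_val = config, val_acc
    if best_config is None:
        raise ValueError("every candidate exceeded the allowed train/held-out gap")
    return best_config, best_val


# Placeholder (train accuracy, held-out accuracy) pairs standing in for real runs.
_fake_results = {
    (1, 1): (0.80, 0.78),
    (2, 1): (0.90, 0.87),
    (4, 2): (0.99, 0.82),  # fits the training set well but generalises worse
}

best, val_acc = select_architecture(_fake_results, _fake_results.__getitem__)
print(best, val_acc)
```

In practice the callback would wrap an actual training run, and the held-out set used for selection should be separate from the final test set.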
The core philosophy that makes deep learning so democratic and amazing is simple: "Don't be a hero".
What this means is that in most cases the best deep learning models take millions of data points and weeks to train, something most of us cannot achieve with our low-performance PCs (yes, a single-GPU system is low performance). So why would you want to waste your time building and training NN architectures from scratch? Simple: you don't.
Transfer learning is your solution! Try to find models that were trained on data similar to your problem and use their pre-trained weights as the starting point for fine-tuning on your data set. Doing this, you get not only an already proven NN architecture but also a major head start in training.
The best place to find pre-trained models is the Caffe Model Zoo, so go have a look at it.
I'm studying Reinforcement Learning and reading Sutton's book for a university course. Besides the classic DP, MC, TD and Q-Learning algorithms, I'm reading about policy gradient methods and genetic algorithms for solving decision problems.
I have no previous experience with this topic and I'm having trouble understanding when one technique should be preferred over another. I have a few ideas, but I'm not sure about them. Can someone briefly explain, or point me to a source that describes the typical situations in which a given method should be used? As far as I understand:
Dynamic Programming and Linear Programming should be used only when the MDP has few actions and states and the model is known, since they are very expensive. But when is DP better than LP?
Monte Carlo methods are used when I don't have a model of the problem but I can generate samples. They are unbiased but have high variance.
Temporal Difference methods should be used when MC methods need too many samples to reach low variance. But when should I use TD and when Q-Learning?
Policy Gradient and Genetic algorithms are good for continuous MDPs. But when is one better than the other?
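To make the Monte Carlo versus Temporal Difference contrast above concrete, here is a minimal tabular sketch. The two-state episode, the step size `alpha` and the function names are illustrative assumptions: MC moves each value toward the full observed return, while TD(0) moves it toward a one-step bootstrapped target.

```python
from typing import Dict, List, Tuple

# One episode as (state, reward received on leaving that state) pairs.
Episode = List[Tuple[str, float]]


def mc_update(values: Dict[str, float], episode: Episode,
              alpha: float = 0.1, gamma: float = 1.0) -> None:
    """Every-visit Monte Carlo: move V(s) toward the full observed return."""
    g = 0.0
    for state, reward in reversed(episode):
        g = reward + gamma * g  # return from this state to the end of the episode
        old = values.get(state, 0.0)
        values[state] = old + alpha * (g - old)


def td0_update(values: Dict[str, float], state: str, reward: float, next_state: str,
               terminal: bool, alpha: float = 0.1, gamma: float = 1.0) -> None:
    """TD(0): move V(s) toward reward + gamma * V(next_state)."""
    bootstrap = 0.0 if terminal else values.get(next_state, 0.0)
    old = values.get(state, 0.0)
    values[state] = old + alpha * (reward + gamma * bootstrap - old)


# Toy episode: A -> B -> end, with reward 1 only on the last step.
episode = [("A", 0.0), ("B", 1.0)]

mc_values: Dict[str, float] = {}
mc_update(mc_values, episode)  # needs the whole episode before updating

td_values: Dict[str, float] = {}
td0_update(td_values, "A", 0.0, "B", terminal=False)  # can update after every step
td0_update(td_values, "B", 1.0, "end", terminal=True)

print(mc_values, td_values)
```

After this single episode MC has already credited state A with part of the final reward, whereas TD(0) has not yet, because its target for A relied on the still-zero estimate of B.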
More precisely, I think that to choose a learning method, a programmer should ask himself the following questions:
does the agent learn online or offline?
can we separate exploring and exploiting phases?
can we perform enough exploration?
is the horizon of the MDP finite or infinite?
are states and actions continuous?
But I don't know how these details of the problem affect the choice of a learning method.
I hope that someone who already has experience with RL methods can help me better understand their applications.
Briefly:
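does the agent learn online or offline? This helps you decide between on-line and off-line algorithms. (A closely related distinction is on-policy versus off-policy: SARSA is on-policy, while Q-learning is off-policy and can therefore also learn from previously collected data.) On-line methods have more limitations and need more care.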
can we separate exploring and exploiting phases? These two phases are normally kept in balance. For example, in epsilon-greedy action selection you explore with probability epsilon and exploit with probability (1 - epsilon). You can also separate the two and ask the algorithm to just explore first (e.g. by choosing random actions) and then exploit. But this is only possible when you are learning off-line, probably using a model of the system dynamics, and it normally means collecting a lot of sample data in advance.
can we perform enough exploration? The level of exploration depends on the definition of the problem. For example, if you have a simulation model of the problem in memory, you can explore as much as you want. But exploration in the real world is limited by the resources you have (e.g. energy, time, ...).
are states and actions continuous? Answering this helps you choose the right approach (algorithm). Both discrete and continuous algorithms have been developed for RL. Some of the "continuous" algorithms internally discretize the state or action spaces.
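The points above about epsilon-greedy selection and about SARSA versus Q-learning can be sketched in a few lines of tabular code. This is a minimal illustration, not a complete agent: it uses the common convention that epsilon is the exploration probability, and the states, actions, step size `alpha` and discount `gamma` are assumed values chosen only for the example.

```python
import random
from collections import defaultdict

ACTIONS = ["left", "right"]  # hypothetical action set for the example


def epsilon_greedy(q, state, actions, epsilon, rng=random):
    """Explore with probability epsilon, otherwise exploit the best-known action."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: q[(state, a)])


def sarsa_update(q, s, a, r, s_next, a_next, alpha=0.5, gamma=0.9):
    """SARSA: bootstrap from the action the agent will actually take next."""
    target = r + gamma * q[(s_next, a_next)]
    q[(s, a)] += alpha * (target - q[(s, a)])


def q_learning_update(q, s, a, r, s_next, actions, alpha=0.5, gamma=0.9):
    """Q-learning: bootstrap from the greedy action, whatever is taken next."""
    target = r + gamma * max(q[(s_next, b)] for b in actions)
    q[(s, a)] += alpha * (target - q[(s, a)])


# Same starting table and same transition for both rules.
start = {("s1", "left"): 1.0, ("s1", "right"): 2.0}

sarsa_q = defaultdict(float, start)
sarsa_update(sarsa_q, "s0", "right", 0.0, "s1", "left")  # next action was exploratory

ql_q = defaultdict(float, start)
q_learning_update(ql_q, "s0", "right", 0.0, "s1", ACTIONS)

greedy_choice = epsilon_greedy(ql_q, "s1", ACTIONS, epsilon=0.0)
print(sarsa_q[("s0", "right")], ql_q[("s0", "right")], greedy_choice)
```

The only difference between the two updates is the bootstrap term: SARSA's estimate is pulled down by the exploratory next action, while Q-learning's is not.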
Do you know where I can find source code (in any language) for an information retrieval system based on the probabilistic model?
I searched the web and found an algorithm named BM25 (or BM25F), but I don't know if it is useful.
Basically, I'm trying to compare the performance of 3 IR algorithms: the vector space model, the boolean model and the probabilistic model. So far I have found implementations of the vector space and boolean models. Depending on the results, we will use the best of them to develop a question-answering system.
Thanks in advance
If you are looking for an IR engine that has BM25 implemented, you can try the Terrier IR Platform.
The language is Java. You can either use the engine itself or look into the source code for implementations of BM25 or other term weighting models.
The confusion here is that there are several probabilistic IR models (e.g. 2-Poisson, Binary Independence Model, language modeling variants), so the question is ambiguous. But in my experience, when people say "the probabilistic model" they usually mean some variant of the Binary Independence Model due to Robertson and Spärck Jones. BM25 (quite roughly) approximates this model, and that's what I'd use in this case. A canonical implementation of BM25 is included in the Lemur Toolkit. See:
http://www.lemurproject.org/doxygen/lemur/html/OkapiRetMethod_8hpp-source.html
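For a quick comparison against the vector space and boolean models, BM25 is also small enough to write directly. The following is a minimal sketch of the commonly cited Okapi BM25 formula, not the Terrier or Lemur code: it assumes pre-tokenized documents, the usual default parameters `k1 = 1.2` and `b = 0.75`, and the IDF variant with `+ 1` inside the logarithm so that weights stay non-negative. The three toy documents are placeholders.

```python
import math
from collections import Counter
from typing import List


def bm25_scores(query: List[str], docs: List[List[str]],
                k1: float = 1.2, b: float = 0.75) -> List[float]:
    """Score every tokenized document in `docs` against the tokenized `query`."""
    n_docs = len(docs)
    if n_docs == 0:
        return []
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for d in docs for term in set(d))

    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if tf[term] == 0:
                continue  # terms absent from the document contribute nothing
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            length_norm = k1 * (1.0 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1.0) / (tf[term] + length_norm)
        scores.append(score)
    return scores


# Toy collection (placeholder documents, already tokenized).
docs = [
    ["probabilistic", "retrieval", "model"],
    ["boolean", "retrieval", "model"],
    ["vector", "space", "model"],
]
scores = bm25_scores(["probabilistic", "retrieval"], docs)
print(scores)
```

Here the first document matches both query terms, the second matches one, and the third matches none, so it scores zero.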
Are there any faster alternatives to calcOpticalFlowSF? It's just so slow, and I want to run it on a sequence of frames coming from a video. How can I do that?
There are several methods for optical-flow-based motion estimation, but you have to consider several things:
are you restricted to a CPU implementation? / a GPU implementation could drastically decrease the run-time
do you need dense motion fields or just a set of sparse motion vectors? / sparse OF methods are more scalable and thus need less run-time
accuracy / the very high accuracy of dense methods is mostly critical only at motion boundaries. In many applications you can approximate a dense motion field by a grid of sparse motion vectors and can therefore use sparse methods such as the pyramidal Lucas Kanade (OpenCV)
current libraries / methods are:
Dense Methods:
OpenCV 2.4.4 provides BroxOpticalFlow on the GPU, which is fast too
the FlowLib of the GPU4Vision Group provides a highly accurate GPU implementation
GPU implementation of the TV-L1 on GPU is provided by
Sparse Methods:
OpenCV since 2.4.2 provides the pyramidal Lucas Kanade on the GPU / earlier versions also have a very fast implementation on the CPU
the RLOFLib provides a more accurate implementation for GPU / CPU and Matlab
the Gain Adaptive Lucas Kanade / KLT is also available for GPU
You could also take a look at the current optical flow benchmarks, where researchers sometimes provide links to their implementations. Common optical flow benchmarks are Middlebury and KITTI
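The grid-of-sparse-vectors approximation mentioned above can be sketched as follows. This is a rough illustration under assumptions, not part of any of the listed libraries: `make_grid` produces the sample points you would hand to a sparse tracker (for instance OpenCV's pyramidal Lucas Kanade, `cv2.calcOpticalFlowPyrLK` in the Python bindings), and `densify` spreads each tracked vector over its grid cell. The motion vectors in the example are placeholders, not tracker output.

```python
from typing import Dict, List, Tuple

Point = Tuple[int, int]        # (x, y) pixel position
Vector = Tuple[float, float]   # (dx, dy) motion vector


def make_grid(width: int, height: int, step: int) -> List[Point]:
    """Regularly spaced sample points, one at the centre of each step x step cell."""
    half = step // 2
    return [(x, y) for y in range(half, height, step) for x in range(half, width, step)]


def densify(sparse: Dict[Point, Vector], width: int, height: int, step: int) -> List[List[Vector]]:
    """Give every pixel the vector of its cell's grid point (cell-wise fill).

    Cells without a tracked vector (lost points, or cell centres that fall
    outside the image) are filled with zero motion.
    """
    half = step // 2
    dense = []
    for y in range(height):
        gy = (y // step) * step + half
        row = []
        for x in range(width):
            gx = (x // step) * step + half
            row.append(sparse.get((gx, gy), (0.0, 0.0)))
        dense.append(row)
    return dense


# Tiny 8x4 "image" with a 4-pixel grid: two sample points.
grid = make_grid(width=8, height=4, step=4)
# Placeholder vectors standing in for the output of a sparse tracker.
sparse_flow = {grid[0]: (1.0, 0.0), grid[1]: (0.0, -1.0)}
dense_flow = densify(sparse_flow, width=8, height=4, step=4)
print(grid, dense_flow[0][0], dense_flow[3][7])
```

A smaller `step` trades run-time for a finer approximation; interpolating between neighbouring grid vectors instead of the cell-wise fill would give a smoother field.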
For example, I have an array of (x, y) points and I want to organize them in a kd-tree.
Building a kd-tree involves sorting and computing bounding boxes. These algorithms work fine on CUDA, but is there any way to build a kd-tree utilizing as many threads as possible?
I think there should be some tricks:
Usually a kd-tree is implemented with recursion, but as far as I know CUDA processors don't have a hardware stack, so recursion should be avoided.
How can I build a kd-tree in CUDA efficiently?
You might want to have a look at the following papers:
Stackless KD-Tree Traversal for High Performance GPU Ray Tracing
Real-Time KD-Tree Construction on Graphics Hardware
They might help you along. Google them and you'll find them available online.
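On the recursion concern raised in the question, one general way around it is to build the tree level by level with an explicit work queue. The following is a CPU-only sketch of that control flow in Python, not code from either paper and not CUDA: the `Node` class, the median split and the 2-D points are assumptions made for illustration. The relevant property is that all work items at the same depth are independent of each other, which is what a GPU builder could exploit by assigning them to separate threads.

```python
from collections import deque
from typing import List, Optional, Tuple

Point = Tuple[float, float]


class Node:
    def __init__(self, point: Point, axis: int):
        self.point = point      # splitting point stored at this node
        self.axis = axis        # 0 = split on x, 1 = split on y
        self.left: Optional["Node"] = None
        self.right: Optional["Node"] = None


def build_kdtree(points: List[Point]) -> Optional[Node]:
    """Build a 2-D kd-tree breadth-first with an explicit queue (no recursion)."""
    if not points:
        return None
    root: Optional[Node] = None
    # Work item: (points of this subtree, depth, parent node, side to attach to).
    queue = deque([(list(points), 0, None, "")])
    while queue:
        pts, depth, parent, side = queue.popleft()
        axis = depth % 2
        pts.sort(key=lambda p: p[axis])
        mid = len(pts) // 2                 # median split
        node = Node(pts[mid], axis)
        if parent is None:
            root = node
        else:
            setattr(parent, side, node)
        if pts[:mid]:
            queue.append((pts[:mid], depth + 1, node, "left"))
        if pts[mid + 1:]:
            queue.append((pts[mid + 1:], depth + 1, node, "right"))
    return root


root = build_kdtree([(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)])
print(root.point, root.left.point, root.right.point)
```

Re-sorting at every node keeps the sketch short; a real builder would typically pre-sort or use a selection algorithm to find the median.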