Map Reduce OR other distributed/parallel design pattern?

I have code that concatenates/combines a collection of images. I want to restructure this sequential code into a parallel/distributed application, as my image collection is quite large (big data :-) ). I'm contemplating Map/Reduce, but I'm not sure whether this kind of computation is possible under Map/Reduce.
# Sequential code
Result.Image <- NULL
foreach (Image in Image.Collection) {
    Result.Image <- CombineImage(Result.Image, Image)
}
Note: order does not matter; Combining Images 1,2,3,4,5 is as good as combining Images 2,3,1,4,5.
Ideally I would like something like this (it looks more like classic divide and conquer than like map/reduce):
1,2,3,4 are the original images. One node concatenates image #1 and image #2 into a new image called image #5. A second node concatenates image #3 and image #4 into image #6 and finally a node concatenates image #5 and image #6 into the final result.
Any ideas on what framework / parallel or distributed design pattern I should use to do things like this?
Cheers!

From your initial description (the foreach code) it seems that you cannot process image #3 until you have processed #1 and #2, since you accumulate intermediate results in Result.Image. Your diagram, however, tells a different story: sibling nodes can be processed in parallel, and since you say order does not matter, even arbitrary pairs can be combined in parallel. So I think you can put all the initial images in a FIFO queue and throw at it as many processors (threads, machines, or nodes) as you can afford. Each processor picks up two images, combines them, and puts the result back in the queue. You keep processing like this until only one image is left in the queue: the final result.
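A minimal sketch of that scheme in Python, with the queue collapsed into round-by-round pairwise merging; combine here is a hypothetical stand-in for your CombineImage:

from concurrent.futures import ProcessPoolExecutor

def combine(a, b):
    # Stand-in for CombineImage: any order-independent merge works here,
    # e.g. pixel-wise accumulation of two arrays.
    return a + b

def parallel_combine(images):
    # Pairwise tree reduction: each round halves the number of images,
    # exactly like the 1+2 -> 5, 3+4 -> 6, 5+6 -> result diagram.
    with ProcessPoolExecutor() as pool:
        while len(images) > 1:
            merged = list(pool.map(combine, images[0::2], images[1::2]))
            if len(images) % 2:        # odd image out joins the next round
                merged.append(images[-1])
            images = merged
    return images[0]

This is exactly the structure map/reduce exploits on the reduce side: because combining is order-independent (associative and commutative), a framework like Hadoop can apply it hierarchically in combiners and reducers, with the map step being a no-op.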

Related

Getting started with screenshot inventory recognition

I'd like to work on something out of personal interest, but I've hit a bit of a brick wall on the theory side due to lack of experience and would appreciate any help with it.
(I marked the main questions with 1) and 2) since it got a bit messy writing this; I apologize for that.)
Here's what I want to do:
Load up a screenshot from a phone game inventory, which will have multiple items in squares and, below each, a count of how many you own.
Divide all of the items into smaller images, compare those images with item images on my PC, and if one matches, add the count of the item into a container along with the item name.
The end result would be a log of my in-game inventory, saved to a file on my PC that I can use from then on.
I've had a basic course in coding before, so I think I can manage the value comparison, the loop that compares the processed smaller images, the saving, and so on.
What I'm stumped on, however, is the initial process: loading the image, cutting it up into multiple smaller ones based on rectangles, and then comparing those smaller ones with the images of the same items I prepared beforehand.
1) Not so much the process itself, but rather: what tools could I use? What libraries or existing functions could help with that?
I would appreciate any hints towards stuff that could be used for this.
If it helps, I have some familiarity with Java, JS, C, and Python, though I'm not opposed to picking up something new if it would help me here.
So the process, in my head, would look something akin to:
Add screenshot -> run a function to cut the image into smaller images based on rectangles (top left to bottom right) -> save the smaller images to something like an array -> via a loop, compare the array of cut-up items with the array of item images on my PC -> if there is a match, add it to an exportable list along with its Name and Count, which I want to do some processing with later.
(As a side process, presumably via OCR: read all the item count numbers into an array too, which is then merged into the final list, each count matched to its corresponding item.)
2) Would this be feasible? Would the precision of the image comparison be a problem when doing this?
(Maybe it's my way of googling, but the results that came up seemed to be about comparing full images rather than dividing one image into multiple pieces and then comparing those smaller ones.)
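OpenCV is the usual toolset for this kind of thing, and it has bindings for Python, Java, and C++. A rough sketch of the cutting-and-matching steps in Python; the grid coordinates and cell size are hypothetical placeholders you would measure from your own screenshots:

import cv2

# Hypothetical layout of the inventory grid; measure these from a screenshot.
CELL_W, CELL_H = 96, 96      # size of one item square, in pixels
GRID_X, GRID_Y = 40, 200     # top-left corner of the grid
COLS, ROWS = 5, 4

def crop_cells(screenshot_path):
    # Cut the screenshot into one small image per grid cell,
    # top left to bottom right.
    img = cv2.imread(screenshot_path)
    cells = []
    for r in range(ROWS):
        for c in range(COLS):
            y, x = GRID_Y + r * CELL_H, GRID_X + c * CELL_W
            cells.append(img[y:y + CELL_H, x:x + CELL_W])
    return cells

def best_match(cell, references):
    # references: dict of item name -> reference image of the same size.
    # Normalized correlation is robust to small brightness differences;
    # a score near 1.0 means a near-perfect match.
    scores = {name: cv2.matchTemplate(cell, ref, cv2.TM_CCOEFF_NORMED).max()
              for name, ref in references.items()}
    name = max(scores, key=scores.get)
    return name, scores[name]

For the counts below each square, an OCR library such as Tesseract (via pytesseract in Python) can read the digits from a second set of crops. Precision is rarely a problem as long as the screenshots come from the same device and resolution as your reference images; if they don't, rescale everything to a common size first.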

ConvLSTM2D to predict the second image from the first image

I have sequences of images (two images in each sequence). I am trying to use ConvLSTM2D to train on these sequences.
Question:
Can I train an LSTM model on just two images per sequence? The goal would be to predict the second image from the first.
Thanks!
You can, but is this the best thing to do? (I don't know either.)
I don't think a sequence of two steps will bring a lot of extra intelligence; it's just an input -> output pair in the end.
You could also simply put one image as input and the other as output in a sort of U-Net.
But many of these things have to be tested, and the results can surprise you. Maybe the gating machinery inside the LSTM adds some interesting behavior?
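For reference, a minimal sketch of the ConvLSTM2D variant in Keras; the image dimensions and layer sizes are illustrative, not a recommendation:

from tensorflow import keras
from tensorflow.keras import layers

H, W, C = 64, 64, 1   # illustrative image dimensions

model = keras.Sequential([
    keras.Input(shape=(1, H, W, C)),   # a "sequence" of a single frame
    layers.ConvLSTM2D(32, (3, 3), padding="same", return_sequences=False),
    layers.Conv2D(C, (3, 3), padding="same", activation="sigmoid"),
])
model.compile(optimizer="adam", loss="mse")

# first_images, second_images: arrays shaped (N, H, W, C); [:, None] adds
# the length-1 time axis that ConvLSTM2D expects.
# model.fit(first_images[:, None], second_images, epochs=10)

Swapping the ConvLSTM2D layer for a stack of Conv2D layers gives the plain image-to-image alternative; comparing validation loss between the two is the only real way to know whether the gates buy you anything.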

Performing different tasks for different data items in OpenCL?

In summary, I'm looking for ways to deal with a situation where the very first step in the calculation is a conditional branch between two computationally expensive branches.
I'm essentially trying to implement a graphics filter that operates on an image and a mask - the mask is a bitmap array the same size as the image, and the filter performs different operations according to the value of the mask. So I basically want to do something like this for each pixel:
if (mask == 1) {
    foo();
} else {
    bar();
}
where both foo and bar are fairly expensive operations. As I understand it, when I run this code on the GPU it will have to calculate both branches for every pixel. (This gets even more expensive if there are more than two possible values for the mask.) Is there any way to avoid this?
One option I can think of would be, in the host code, to sort all the pixels into two one-dimensional arrays based on the value of the mask at each point, and then run entirely different kernels on them; the image would be reconstructed from the two datasets afterwards. The problem is that, in my case, I want to run the filter iteratively, and both the image and the mask change with each iteration (the mask is actually calculated from the image). If I split the image into two buckets in the host code, I have to transfer the image and mask from the GPU on every iteration and then send the new buckets back to the GPU, introducing a new bottleneck to replace the old one.
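(For concreteness, the bucketing I mean looks like this, sketched host-side with NumPy; foo and bar are stand-ins for the real vectorized branches:)

import numpy as np

def foo(px): return px * 0.5        # stand-in for one expensive branch
def bar(px): return 255.0 - px      # stand-in for the other branch

def filter_iteration(img, mask):
    # Split the pixels into two buckets by mask value, run each bucket
    # through its own operation, then scatter the results back in place.
    out = np.empty_like(img)
    sel = (mask == 1)
    out[sel] = foo(img[sel])
    out[~sel] = bar(img[~sel])
    return out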
Is there any other way to avoid this bottleneck?
Another approach might be to do a simple bucket sort within each work-group using the mask.
Add a local-memory array and an atomic counter for each value of the mask. Each work-item first reads a pixel (or a set of pixels may be better), increments the appropriate atomic counter, and writes the pixel's address into the corresponding slot of that array.
Then perform a work-group barrier.
As a final stage, assign some set of work-items, ideally a multiple of the underlying vector width, to each of those arrays and iterate through it. The operations will then be largely efficient, barring some loss at the ends, and if each work-item looks at enough pixels you may lose very little efficiency even if you assign the entire group to one mask value and then the other in turn.
Given that your description has only two mask values, fitting two arrays into local memory should be pretty simple and should scale well.
Push each thread's demanding task into shared/local memory (synchronization slows the process) and execute the light ones until all light ones finish (so the slow synchronization latency is hidden), then execute the heavier ones:
if (mask == 1) {
    uploadFoo();   // heavy: queue the work item in a __local buffer
} else {
    processBar();  // light: compute bar() now
    downloadFoo(); // then pick up a queued foo() from local memory, if any
}
Essentially a producer-consumer approach.
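The same idea in host-side Python terms rather than OpenCL, with a Queue standing in for the __local buffer and foo/bar as hypothetical stand-ins:

import queue

def foo(p): return p * 2     # stand-in for the heavy branch
def bar(p): return p + 1     # stand-in for the light branch

heavy_q = queue.Queue()      # plays the role of the __local buffer

def process(pixel, mask):
    # Defer heavy work; do light work immediately, draining one piece
    # of deferred heavy work along the way (producer-consumer).
    if mask == 1:
        heavy_q.put(pixel)               # "uploadFoo()"
        return None
    out = bar(pixel)                     # "processBar()"
    try:
        foo(heavy_q.get_nowait())        # "downloadFoo()"
    except queue.Empty:
        pass
    return out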

Building a histogram faster

I am working with a large dataset that I need to build a histogram of. I feel like my method of just walking the entire list and tallying frequencies in a second array is slow. Any suggestions on how to speed the process up?
Given that a histogram is a graph containing the counts of all items in each bin, you can't make one without visiting all the items.
However, you can:
1. Create the histogram as you collect the data; then it takes no extra time to generate.
2. Break the data into N parts and count each part in parallel; when every part is done, sum the per-bin results (see the sketch below). You can also combine this with #1.
3. Sample the data. Looking at only a fraction of it, you can estimate the shape of the whole, with sampling error that shrinks as the sample grows.
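A minimal sketch of option 2 with NumPy and multiprocessing; the bin edges and worker count are illustrative:

import numpy as np
from multiprocessing import Pool

EDGES = np.linspace(0.0, 1.0, 101)   # 100 equal-width bins (illustrative)

def partial_hist(chunk):
    counts, _ = np.histogram(chunk, bins=EDGES)
    return counts

def parallel_hist(data, n_parts=8):
    # Count each part independently, then sum per bin: histograms over
    # identical bin edges add exactly.
    chunks = np.array_split(np.asarray(data), n_parts)
    with Pool(n_parts) as pool:
        partials = pool.map(partial_hist, chunks)
    return np.sum(partials, axis=0)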

matrix-vector-multiplication with hadoop: vector and matrix in different files

I want to do matrix-vector multiplication with Hadoop. I've got a small working example now: there is only one input file, containing the rows of the matrix, each followed by the vector it is multiplied with. So each map task gets one row plus the vector from this single file.
Now I would like to have two input files: one containing the matrix and the other the vector. But I can't think of a Hadoop way to let the mapper access both files.
What would be the best approach here?
Thanks for your help!
The easiest and most efficient solution is to read the vector into memory in the Mapper, directly from HDFS (not as map() input); presumably it is small enough to fit. Then map() over only the matrix data, row by row. As you receive each row, dot it with the vector to produce one element of the output. Emit (index, value) pairs and, if needed, assemble the result vector in the Reducer.
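A minimal sketch of that mapper for Hadoop Streaming in Python. The file name and record format are assumptions: the vector lives in vector.txt (one value per line, shipped to every node with -files vector.txt), and each matrix record is a row index followed by the row's values:

#!/usr/bin/env python
# mapper.py: emits one element of the output vector per matrix row.
import sys

# Load the shared vector once per mapper task.
with open("vector.txt") as f:
    vector = [float(line) for line in f]

for line in sys.stdin:
    parts = line.split()
    row_index, row = parts[0], [float(x) for x in parts[1:]]
    value = sum(a * b for a, b in zip(row, vector))
    print("%s\t%s" % (row_index, value))

With no reducer (or an identity reducer), the job's output is one (index, value) pair per row; sorting by index reconstructs the result vector.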
