I have sequences of images (2 images in each sequence). I am trying to use ConvLSTM2D to train on these sequences.
Question:
Can I train an LSTM model on just 2 images per sequence? The goal would be to predict the second image from the first image.
Thanks!
You can, but is this the best thing to do? (I don't know either.)
I think that using a sequence of two steps won't bring a lot of extra intelligence, it's just an input -> output pair in the end.
You could also simply put one image as input and the other as output in a sort of U-Net.
But many of these things have to be tested, and the results may surprise us. Maybe the way things are built inside the LSTM, with gates and such, could add some interesting behavior?
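To make the "input -> output pair" point concrete, here is a minimal sketch of how the same data could be arranged for either approach. The array sizes (100 sequences of 64x64 single-channel images) are made up for illustration. The 5D layout (samples, time, rows, cols, channels) is the channels-last layout that ConvLSTM2D expects.

```python
import numpy as np

# Hypothetical data: 100 sequences, 2 frames each, 64x64 grayscale images.
sequences = np.random.rand(100, 2, 64, 64, 1).astype("float32")

# Recurrent layout: the input is a sequence of length 1,
# and the target is the frame that follows it.
x_seq = sequences[:, :1]   # shape (100, 1, 64, 64, 1)
y = sequences[:, 1]        # shape (100, 64, 64, 1)

# Plain image-to-image layout (e.g. for a U-Net-like model):
# the same first frame, just without the time axis.
x_img = sequences[:, 0]    # shape (100, 64, 64, 1)
```

Both layouts carry exactly the same information, so the recurrent model sees only one time step per sample.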
I kinda wanna work on something for personal interest but I've hit a bit of a brick wall on the theory aspect due to lack of experience and would appreciate any help with it.
(I marked the main questions with 1) and 2) since it got a bit messy writing this, I apologize for that)
Here's what I want to do:
Load up a screenshot from a phone game inventory, which will have multiple items in squares and below it a count of how many you own.
Divide all of the items into smaller images, compare those images with item images on my PC and if it matches, add the count of the item into a container with the item name.
So the end result would be logging the inventory I have in the game into a file on my PC, which I can use from then on.
I've had a basic course in coding before so I think I can do the value comparison, loop to compare the processed smaller images and then saving it etc.
What I'm stumped on, however, is the initial process of loading up the image, then cutting that image up into multiple smaller ones based on rectangles, and then comparing those smaller ones with images I prepared beforehand of the same items.
1) Not so much on the process itself, but more on what tools I could use. What libraries, existing functions, etc. could help with that?
I would appreciate any hints towards stuff that could be used for this.
If it helps, I have some familiarity with Java, JS, C and Python, though I'm not really opposed to picking up something new if it would help me here.
So the process, in my head would look something akin to:
Add screenshot -> run function to cut up image into smaller images based on rectangles (top left to bottom right) -> save smaller images to something like an array -> via loop compare array of cut up items with array of item images on pc -> if match, add it into an exportable list along with its Name and Count which I want to do some processing with later..
(process on the side, via OCR presumably? Add all the item count numbers into an array too which will then be fed into the final list at the end of it to the corresponding item)
2) Would this be feasible? Would precision of image comparison be a problem when doing this?
(maybe it's my way of googling, but the results that came up seemed to be more about full image comparison rather than dividing one image into multiple and then comparing those smaller ones.)
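As a rough illustration of the cut-and-compare pipeline described above, here is a minimal NumPy sketch. It assumes the inventory is a regular grid with a known number of rows and columns, and that the reference images have exactly the same size as the cut-out tiles. The function names `split_into_tiles` and `best_match` and the `max_error` threshold are hypothetical. Loading the screenshot (for example with Pillow or OpenCV) and reading the counts via OCR are not covered.

```python
import numpy as np

def split_into_tiles(image, rows, cols):
    """Cut an image array into rows*cols equal tiles, top-left to bottom-right."""
    h, w = image.shape[:2]
    th, tw = h // rows, w // cols
    return [image[r * th:(r + 1) * th, c * tw:(c + 1) * tw]
            for r in range(rows) for c in range(cols)]

def best_match(tile, references, max_error=10.0):
    """Return the name of the closest reference image, or None if nothing is close.

    Closeness is the mean absolute pixel difference (an assumption; other
    measures are possible). References with a different shape are skipped.
    """
    best_name, best_err = None, float("inf")
    for name, ref in references.items():
        if ref.shape != tile.shape:
            continue
        err = np.mean(np.abs(tile.astype(float) - ref.astype(float)))
        if err < best_err:
            best_name, best_err = name, err
    return best_name if best_err <= max_error else None
```

Allowing a small error instead of demanding an exact match is one way to tolerate compression artifacts in screenshots. The right threshold would have to be found by experiment.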
Foreword: I am aware there is another question like this, however mine has very specific restrictions. I have done my best to make this question applicable to many, as it is a generic grid issue, but if it still does not belong here, then I am sorry, and please be nice about it. I have found in the past stackoverflow to be a very picky and hostile environment to question askers, but I'm hoping that was just a couple of bad people.
Goal(abstract): Check all connected grid squares in a 3D grid that are of the same type and touching on one face.
Goal(specific/implementation): Create a "fill bucket" tool in Minecraft with command blocks.
Knowledge of Minecraft not really necessary to answer, this is more of an algorithm question, and I will be staying away from Minecraft specifics.
Restrictions: I can do this in code with recursive functions, but in Minecraft there are some limitations, and I am wondering whether it is possible to get around them.
1: No arrays (data structure) permitted. In Minecraft I can store an integer variable and do basic calculations with it (+, -, *, /, % (mod), =, ==), but that's it. I cannot dynamically create variables or have the program create anything with a name that I did not set out ahead of time.
2: I can do "IF" and "OR" statements, and everything that derives from them.
3: I CANNOT have multiple program pointers - that is, I can't have things like recursive functions, which require a program to stop executing, execute itself from beginning to end, and then resume executing where it was - I have minimal control over the program flow.
4: I can use loops and conditional exits (so FOR loops).
5: I can have a marker on the grid in 3D space that can move regardless of the presence of blocks (I'm using an armour stand, for those who know), and I can test grid squares relative to that marker.
So say my grid is full of empty spaces only. There are separate clusters of filled squares in opposite corners, not touching each other. If I "use" my fillbucket tool on one block / filled grid square, I want it to use a single marker to check and identify all the connected grid squares - basically, I need to be sure that it traverses the entire shape, all the nooks and crannies, but not the squares that are not connected to that shape. So in the end, one of the two clusters, from me only selecting a single square of it, will be erased/replaced by another kind of block, without affecting the other blocks around it.
Again, apologies if this doesn't belong here. And only answer this if you WANT to tackle the challenge - it's not important or anything, I just want to do this. You don't have to answer it if you don't want to. Or if you can solve this problem for a 2D grid, that would be helpful as well, as I could possibly extend that to work for 3D.
Thank you, and if I get nobody degrading me for how I wrote this post or the fact that I did, then I will consider this a success :)
With help from this and other sources, I figured it out! It turns out that, since all recursive functions (or at least most of them) can be written as FOR loops, I can make a recursive function in Minecraft. So I did, and the general idea of it is as follows:
For explaining the program, you may assume the situation is a largely empty grid with a grouping of filled squares in one part of it, and the goal is to replace the kind of block that that grouping is made of with a different block. We'll say the grouping currently consists of red blocks, and we want to change them to blue blocks.
Initialization:
IDs - An objective (data structure) for holding each marker's ID (score)
numIDs - An integer variable for holding number of IDs/markers active
Create one marker at selected grid position with ID [1] (aka give it a score of 1 in the "IDs" objective). This grid position will be a filled square from which to start replacing blocks.
Increment numIDs
Main program:
FOR loop that goes from 1 to numIDs
{
at marker with ID [1], fill grid square with blue block
step 1. test block one to the +x for a red block
step 2. if found, increment numIDs
step 3. create marker there with ID [numIDs]
[//repeat steps 1, 2 and 3 for the other five adjacent grid squares: +z, -x, -z, +y, and -y]
delete marker [1] (the armour stand)
numIDs -= 1
subtract 1 from every marker's ID, so that the next marker to evaluate, which was [2], now has ID [1].
} (end loop)
So that's what I came up with, and it works like a charm. Sorry if my explanation is hard to understand, I'm trying to explain in a way that might make sense to both coders and Minecraft players, and maybe achieving neither :P
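For readers more used to ordinary code, the marker scheme above is essentially a breadth-first flood fill, with the numbered markers acting as a queue. Below is a minimal Python sketch of that idea, not a translation of the command-block setup. It uses a real queue, which Minecraft does not have, and it recolours a block as soon as it is queued rather than when it is processed, so no block is queued twice. The dictionary-based grid and the name `flood_fill` are assumptions made for the example.

```python
from collections import deque

# The six face-adjacent directions in a 3D grid.
NEIGHBOURS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]

def flood_fill(grid, start, new_block):
    """Replace the connected region containing `start` with `new_block`.

    `grid` maps (x, y, z) -> block type; missing positions are empty.
    Returns the number of blocks replaced.
    """
    old_block = grid.get(start)
    if old_block is None or old_block == new_block:
        return 0
    queue = deque([start])        # plays the role of the numbered markers
    grid[start] = new_block
    filled = 1
    while queue:                  # keep going while any marker is left
        x, y, z = queue.popleft() # the marker with ID [1]
        for dx, dy, dz in NEIGHBOURS:
            pos = (x + dx, y + dy, z + dz)
            if grid.get(pos) == old_block:
                grid[pos] = new_block   # recolour on enqueue to avoid duplicates
                queue.append(pos)
                filled += 1
    return filled
```

Starting the fill on one cluster leaves any cluster that is not face-connected to it untouched, which is the behaviour asked for in the question.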
I've got a ~1600 line program that reads in images (either tiff or raw), performs a whole bunch of different mathematical and statistical analyses, and then outputs graphs and data tables at the end.
Almost two-thirds of my processing time is due to looping 16 times over the following code:
h = figure('Visible','off','units','normalized','outerposition',[0 0 1 1]);
set(h,'PaperPositionMode','auto');
imagesc(picdata); colormap(hot);
imgtmp = hardcopy(h,'-dzbuffer','-r0');
imwrite(imgtmp,hot,'picname.png');
Naturally, 'picname.png' and picdata are changing each time around.
Is there a better way to invisibly plot and save these pictures? The processing time mostly takes place inside imwrite, with hardcopy coming second. The whole purpose of the pictures is just to get a general idea of what the data looks like; I'm not going to need to load them back into Matlab to do future processing of any sort.
Try to place the figure off-screen (e.g., Position=[-1000,-1000,500,500]). This will make it "Visible" and yet no actual rendering will need to take place, which should make things faster.
Also, try to reuse the same figure for all images - no need to recreate the figure and image axes and colormap every time.
Finally, try using my ScreenCapture utility rather than hardcopy+imwrite. It uses a different method for taking a "screenshot" which may possibly be faster.
I have a long time series with some repeating and similar looking signals in it (not entirely periodical). The length of the time series is about 60000 samples. To identify the signals, I take out one of them, having a length of around 1000 samples and move it along my timeseries data sample by sample, and compute cross-correlation coefficient (in Matlab: corrcoef). If this value is above some threshold, then there is a match.
But this is excruciatingly slow (using 'for loop' to move the window).
Is there a way to speed this up, or is there maybe already some mechanism in Matlab for this?
Many thanks
Edited: added information, regarding using 'xcorr' instead:
If I use 'xcorr', or at least the way I have used it, I get the wrong picture. Looking at the data (first plot), there are two types of repeating signals. One is marked by red rectangles, whereas the other, which has much larger amplitudes (this is coherent noise), is marked by a black rectangle. I am interested in the first type. The second plot shows the signal I am looking for, blown up.
If I use 'xcorr', I get the third plot. As you see, 'xcorr' gives me the wrong signal (there is in fact high cross correlation between my signal and coherent noise).
But using 'corrcoef' and moving the window, I get the last plot, which is the correct one.
There may be a problem of normalization when using 'xcorr', but I don't know.
I can think of two ways to speed things up.
1) Make your template 1024 elements long (a power of 2). Correlation can then be done using the FFT, which is significantly faster than a direct DFT or element-by-element multiplication at every position.
2) Ask yourself what it is about your template shape that you really care about. Do you really need the very high frequencies, or are you really after lower frequencies? If you could re-sample your template and signal so it no longer contains any frequencies you don't care about, it will make the processing very significantly faster. Steps to take would include
determine the highest frequency you care about
filter your data so higher frequencies are blocked
resample the resulting data at a lower sampling frequency
Now combine that with a template whose size is a power of 2
You might find this link interesting reading.
Let us know if any of the above helps!
Your problem seems like a textbook example of cross-correlation. Therefore, there's no good reason to use any solution other than xcorr. A few technical comments:
xcorr assumes that the mean has been removed from the two cross-correlated signals. Furthermore, by default it does not scale by the signals' standard deviations. Both of these issues can be solved by z-scoring your two signals: c=xcorr(zscore(longSig,1),zscore(shortSig,1)); c=c/n; where n is the length of the shorter signal. This should produce results equivalent to your sliding window method.
xcorr's output is ordered according to lags, which can be obtained as a second output argument ([c,lags]=xcorr(..)). Always plot xcorr results with plot(lags,c). I recommend trying a synthetic signal to verify that you understand how to interpret this chart.
xcorr's implementation already uses the Discrete Fourier Transform, so unless you have unusual conditions it will be a waste of time to code a frequency-domain cross-correlation again.
Finally, a comment about terminology: Correlating corresponding time points between two signals is plain correlation. That's what corrcoef does (its name stands for correlation coefficient, no 'cross-correlation' there). Cross-correlation is the result of shifting one of the signals and calculating the correlation coefficient for each lag.
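To illustrate the normalization issue, here is a minimal sketch of the sliding correlation coefficient computed without an explicit loop. It is written in Python/NumPy rather than MATLAB, and the function name `sliding_corrcoef` is made up for this example. Each window of the long signal is normalized by its own mean and standard deviation, which is what calling corrcoef on every window does. Note that `np.correlate` computes the sums directly rather than via an FFT.

```python
import numpy as np

def sliding_corrcoef(signal, template):
    """Pearson correlation between `template` and every same-length window of `signal`."""
    x = np.asarray(signal, dtype=float)
    t = np.asarray(template, dtype=float)
    m = t.size
    tz = t - t.mean()                      # zero-mean template

    # Numerator: sum over each window of x * tz. Because tz sums to zero,
    # subtracting the window mean from x would not change this value.
    num = np.correlate(x, tz, mode="valid")

    # Per-window sum and sum of squares from cumulative sums.
    c1 = np.concatenate(([0.0], np.cumsum(x)))
    c2 = np.concatenate(([0.0], np.cumsum(x * x)))
    win_sum = c1[m:] - c1[:-m]
    win_sq = c2[m:] - c2[:-m]
    win_ss = np.clip(win_sq - win_sum ** 2 / m, 0.0, None)  # squared deviations

    denom = np.sqrt(win_ss * np.sum(tz ** 2))
    out = np.zeros_like(num)
    np.divide(num, denom, out=out, where=denom > 0)  # flat windows give 0
    return out
```

Because each window is scaled by its own standard deviation, a high-amplitude but differently shaped segment does not get a large score just because of its amplitude.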
Can anyone clarify how multiple image descriptors can be combined? I mean, if I do a normal SIFT, it gives me a 128xN matrix, where N is the number of descriptors. Now, to add the HOG descriptor matrix, which can be of a different dimension, what is the procedure (because simply concatenating them does not sound meaningful)? The final output of the combination would be used to create the bag of words model using k-means clustering.
Concatenating features does not sound meaningful, but you should try it. It is called "early fusion", and it can work.
Usually late fusion works better (learning the features separately and then merging the results/outputs of the two machine learning models).
I tested it for combining BoVW and BoW, you should have a look in the paper, at section II, part C "multimodal fusion techniques".
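The two fusion strategies can be sketched as follows. This assumes each image has already been reduced to one fixed-length vector per descriptor type (for example a bag-of-words histogram built from SIFT and a HOG vector), and that two separately trained classifiers output scores on the same scale. The function names and the L2 normalization step are illustrative choices, not something taken from the paper.

```python
import numpy as np

def l2_normalize(v, eps=1e-12):
    """Scale each row vector to unit length so neither descriptor dominates."""
    v = np.asarray(v, dtype=float)
    return v / (np.linalg.norm(v, axis=-1, keepdims=True) + eps)

def early_fusion(feat_a, feat_b):
    """Early fusion: concatenate the normalized feature vectors of each image."""
    return np.concatenate([l2_normalize(feat_a), l2_normalize(feat_b)], axis=-1)

def late_fusion(scores_a, scores_b, weight_a=0.5):
    """Late fusion: weighted average of the scores from two separate models."""
    scores_a = np.asarray(scores_a, dtype=float)
    scores_b = np.asarray(scores_b, dtype=float)
    return weight_a * scores_a + (1.0 - weight_a) * scores_b
```

With early fusion a single model is trained on the concatenated vectors. With late fusion each descriptor gets its own model and only the outputs are combined.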