In a normal image classification task using CNNs, what should be the value of the units in the dense layer?

I am just creating a normal image classifier for rock-paper-scissors. I am using my local GPU, which is not a high-end one. When I began training the model it kept giving this error:
ResourceExhaustedError: OOM when allocating tensor with shape.
I googled the error and the suggestion was to decrease my batch size, which I did. That alone did not solve anything, but when I later changed my image size from 200*200 to 50*50 the model started training and reached an accuracy of 99%.
Later I wanted to see if I could do it with 150*150 images, since I found a tutorial on the official TensorFlow YouTube channel. I followed their exact code and it still did not work. I reduced the batch size; still no solution. Then I reduced the number of units in the dense layer from 512 to 200 and it trained fine, but now the accuracy is pretty bad. Is there any way I can tune my model to fit my GPU without hurting accuracy? And how does the number of units in the dense layer matter? It would really help me a lot.
# Assuming the standard tf.keras imports for this model:
from tensorflow.keras.layers import (Input, Conv2D, BatchNormalization,
                                     MaxPool2D, Flatten, Dropout, Dense)
from tensorflow.keras.models import Model

i = Input(shape=X_train[0].shape)

# Two 64-filter conv blocks, then 2x2 pooling
x = Conv2D(64, (3, 3), padding='same', activation='relu')(i)
x = BatchNormalization()(x)
x = Conv2D(64, (3, 3), padding='same', activation='relu')(x)
x = BatchNormalization()(x)
x = MaxPool2D((2, 2))(x)

# Two 128-filter conv blocks, then 2x2 pooling
x = Conv2D(128, (3, 3), padding='same', activation='relu')(x)
x = BatchNormalization()(x)
x = Conv2D(128, (3, 3), padding='same', activation='relu')(x)
x = BatchNormalization()(x)
x = MaxPool2D((2, 2))(x)

# Classifier head: flatten, dense, 3-way softmax (rock/paper/scissors)
x = Flatten()(x)
x = Dropout(0.2)(x)
x = Dense(512, activation='relu')(x)
x = Dropout(0.2)(x)
x = Dense(3, activation='softmax')(x)

model = Model(i, x)
Now, when I run this with an image size of 150*150 it throws that error. If I change the image size to 50*50 and reduce the batch size to 8 it works and gives me an accuracy of 99%, but if I use 150*150 and reduce the number of units in the dense layer to 200 (an arbitrary choice) it runs fine, only with very bad accuracy.
I am using a low-end NVIDIA GeForce MX230 GPU with 4 GB of VRAM.

For 200x200 input images the output of the last MaxPool has a shape of (50, 50, 128), which is then flattened and fed into the Dense layer, giving 50*50*128*512 = 163,840,000 parameters in that single layer. This is a lot.
To reduce the number of parameters you can do one of the following:
- reduce the amount of filters in the last Conv2D layer
- do a MaxPool of more than 2x2
- reduce the size of the Dense layer
- reduce the size of the input images.
You have already tried the latter two options. Only trial and error will tell you which method ultimately gives the best accuracy, and you were already at 99%, which is good.
If you want a platform with more VRAM, you can use Google Colab: https://colab.research.google.com/
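As a hedged illustration of the first two options above (not the original poster's code; it assumes TensorFlow/Keras and 150*150 RGB inputs): a third pooling stage plus GlobalAveragePooling2D shrinks the Dense layer's input from tens of thousands of values down to 128, which keeps the parameter count small even on a 4 GB GPU.

from tensorflow.keras.layers import (Input, Conv2D, BatchNormalization,
                                     MaxPool2D, GlobalAveragePooling2D,
                                     Dropout, Dense)
from tensorflow.keras.models import Model

i = Input(shape=(150, 150, 3))
x = Conv2D(64, (3, 3), padding='same', activation='relu')(i)
x = BatchNormalization()(x)
x = MaxPool2D((2, 2))(x)
x = Conv2D(128, (3, 3), padding='same', activation='relu')(x)
x = BatchNormalization()(x)
x = MaxPool2D((2, 2))(x)
x = Conv2D(128, (3, 3), padding='same', activation='relu')(x)
x = BatchNormalization()(x)
x = MaxPool2D((2, 2))(x)     # third pooling stage: feature map is now ~18x18

# GlobalAveragePooling2D reduces (18, 18, 128) to just 128 values, so the
# following Dense(512) has only 128*512 weights instead of tens of millions.
x = GlobalAveragePooling2D()(x)
x = Dropout(0.2)(x)
x = Dense(512, activation='relu')(x)
x = Dense(3, activation='softmax')(x)

model = Model(i, x)
model.summary()              # compare the parameter count before training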

Related

Test Intel Extension for PyTorch (IPEX) in the multiple-choice example from huggingface/transformers

I am trying out a Hugging Face example with the SWAG dataset:
https://github.com/huggingface/transformers/tree/master/examples/pytorch/multiple-choice
I would like to use Intel Extension for PyTorch in my code to increase performance.
Here I am using the variant without the Trainer (run_swag_no_trainer).
In run_swag_no_trainer.py, I made some changes to use IPEX.
# Code before the change:
device = accelerator.device
model.to(device)

# After adding IPEX:
import intel_pytorch_extension as ipex
device = ipex.DEVICE
model.to(device)
When I run the command below, it takes far too much time.
export DATASET_NAME=swag
accelerate launch run_swag_no_trainer.py \
--model_name_or_path bert-base-cased \
--dataset_name $DATASET_NAME \
--max_seq_length 128 \
--per_device_train_batch_size 32 \
--learning_rate 2e-5 \
--num_train_epochs 3 \
--output_dir /tmp/$DATASET_NAME/
Is there any other way to test the same thing with Intel IPEX?
First you have to understand which factors actually increase the running time. The main ones are:
Large input size.
Unpreprocessed input data: a shifted (non-zero) mean and unnormalized values.
Large network depth and/or width.
A large number of epochs.
A batch size that does not fit the physically available memory.
A very small or very high learning rate.
For fast running, work on those factors:
Reduce the input size to the smallest dimensions that preserve the important features.
Always preprocess the input to make it zero mean, and normalize it by dividing by the standard deviation or by the max-min range (a sketch follows this list).
Keep the network depth and width neither too high nor too low, or use a standard architecture that is known to work well.
Watch the number of epochs: if the error or accuracy no longer improves beyond a defined threshold, there is no need to run more epochs.
Choose the batch size based on the available memory and the number of CPUs/GPUs. If a batch cannot be fully loaded into memory, processing slows down because of constant paging between memory and the filesystem.
Determine an appropriate learning rate by trying several and keeping the one that gives the best reduction in error per epoch.
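As an illustration of the preprocessing point above, here is a minimal sketch (plain NumPy; the array shapes and names are made up for the example) of zero-mean, unit-variance normalization:

import numpy as np

def normalize(batch):
    """Shift to zero mean and scale to unit variance, per channel."""
    mean = batch.mean(axis=(0, 1, 2), keepdims=True)        # per-channel mean
    std = batch.std(axis=(0, 1, 2), keepdims=True) + 1e-8   # avoid division by zero
    return (batch - mean) / std

# Example: a batch of 32 RGB images, 128x128, pixel values in [0, 255]
images = np.random.randint(0, 256, size=(32, 128, 128, 3)).astype(np.float32)
images = normalize(images)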

Cache blocking brings no improvement for image filter on ARM

I'm experimenting with cache blocking. To do that, I implemented two convolution-based smoothing algorithms with a 5x5 Gaussian kernel.
The first algorithm is the simple double for loop, sweeping the image from left to right, top to bottom.
(Code shown as an image in the original post; source: https://people.engr.ncsu.edu/efg/521/f02/common/lectures/notes/lec9.html)
In the second algorithm I tried to play with cache blocking by splitting the loops into chunks, using a BLOCK size of 512x512, which became something like the following.
(Code shown as an image in the original post; source: https://people.engr.ncsu.edu/efg/521/f02/common/lectures/notes/lec9.html)
I'm running the code on a Raspberry Pi 3B+, which has a Cortex-A53 with 32KB of L1 and 256KB of L2 cache, I believe. I ran the two algorithms on images of different sizes (2048x1536, 6000x4000, 12000x8000, 16000x12000; 8-bit grayscale), but for every image size the run times of the two algorithms were very similar.
The question is: shouldn't the first algorithm suffer from cache-miss latency that the second one avoids, especially on large images (like 12000x8000)? Based on the description of cache blocking in this link, when the first algorithm processes data at the end of an image row, the data from the beginning of that row should already have been evicted from L1. Take the 12000x8000 image as an example: since we are using a 5x5 kernel, 5 rows of data are needed, which is 12000x5 = 60KB, already larger than the 32KB L1. When we start processing a new row, 4 rows of the previous data are still needed, but they are likely gone from L1 and have to be re-fetched. The second algorithm shouldn't have this problem because the block size is small. Can anyone tell me what I am missing?
I also profiled both algorithms using oprofile and got the following counts:
Algorithm 1

event               count
---------------------------------
L1D_CACHE_REFILL    13,933,254
PREFETCH_LINEFILL   13,281,559

Algorithm 2

event               count
---------------------------------
L1D_CACHE_REFILL    9,456,369
PREFETCH_LINEFILL   8,725,250
So it looks like the first algorithm does have more cache misses than the second, as reflected by the L1D_CACHE_REFILL counts. But it also prefetches more, which may be due to the simpler access pattern of its loops. So does the usual story about cache blocking simply not take data prefetching into account?
Conceptually you're right: blocking will reduce cache misses by keeping the input window in cache.
I suspect the main reason you're not seeing a speedup is that the cache is prefetching from all 5 input rows. Your performance counters show more prefetch loads in the unblocked implementation. I suspect many textbook examples are out of date, since cache prefetching has kept getting better; Intel's L2 cache could already detect and prefetch from up to 16 linear streams about 10 years ago, I think.
Assume the filter takes 5 * 5 = 25 cycles per output pixel. On the RPi 3 that is 25 / 1.2 GHz = 20.8 ns. The IO cost per output pixel is reading a 5-pixel-high column of new input, so the amortized IO rate is 5 bytes / 20.8 ns = 229 MiB/s, much less than the ~2 GiB/s DRAM bandwidth. So in theory the relatively slow computation, combined with prefetching (I'm not certain how effective it is), means that memory access isn't the bottleneck.
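A quick back-of-the-envelope check of those numbers in Python (the 25-cycles-per-pixel figure is an assumption, not a measurement):

# Rough throughput estimate for the 5x5 filter on a 1.2 GHz Cortex-A53
CLOCK_HZ = 1.2e9
CYCLES_PER_PIXEL = 5 * 5                          # assumed: one cycle per multiply-accumulate
time_per_pixel = CYCLES_PER_PIXEL / CLOCK_HZ      # seconds per output pixel
bytes_per_pixel = 5                               # new 5-high column of 8-bit input
io_rate = bytes_per_pixel / time_per_pixel        # bytes per second
print(f"{time_per_pixel * 1e9:.1f} ns per pixel")   # 20.8 ns
print(f"{io_rate / 2**20:.0f} MiB/s required")      # ~229 MiB/s, vs ~2 GiB/s DRAM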
Try increasing the filter height: the cache can only detect and prefetch from a limited number of streams. Or try vectorizing the computation so that memory access becomes the bottleneck.

What would be the effect of increasing the number of bytes?

One byte is used to store each of the three color channels in a pixel. This gives 256 different levels each of red, green and blue. What would be the effect of increasing the number of bytes per channel to 2 bytes?
2^16 = 65536 values per channel.
The raw image size doubles.
Processing the file takes roughly 2 times longer ("roughly", because you have more data, but the new size may also suit your CPU and/or memory alignment better than the previous 3-byte pixels -- "3" is an awkward data size for CPUs).
Displaying the image on a typical screen may take more time (where "a typical screen" is 24- or 32-bit and as yet would not have hardware acceleration for this particular job).
Chances are you cannot store the image back in the original file format. (Currently, TIFF is the only file format I know of that routinely uses 16 bits per channel. There may be more. Can yours?)
The image quality may degrade. If you add bytes you cannot set them to a sensible value: if 3 bytes of 0xFF signified 'white' in your original image, what would the comparable 16-bit value be, 0xFFFF or 0xFF00? And why? (For either choice; and remember, you have to make a similar decision for black. See the sketch after this answer.)
Common library routines may stop working correctly. Only the very best libraries are data-size-agnostic (and even they would need to be rewritten to make use of the new size).
If this is a real-world scenario -- say, I just finished writing a fully antialiased 2D graphics library, and then my boss offhandedly adds this "requirement" -- it would have a particular graphic effect on me as well.
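A minimal sketch of the white-point choice mentioned above, using NumPy; multiplying by 257 is a common convention, but it is exactly the kind of decision the answer says you have to make:

import numpy as np

img8 = np.array([[0x00, 0x80, 0xFF]], dtype=np.uint8)   # black, mid gray, white

# Option A: shift left by 8 bits -> 0xFF becomes 0xFF00, full white is unreachable
img16_shift = img8.astype(np.uint16) << 8

# Option B: multiply by 257 (0x0101) -> 0xFF maps exactly to 0xFFFF
img16_scale = img8.astype(np.uint16) * 257

print(img16_shift)   # [[    0 32768 65280]]
print(img16_scale)   # [[    0 32896 65535]]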

What are the dimensions of the largest usable JPEG image in GAE?

There are a couple of limits on image size when you talk about Google App Engine:
10 MB -- the upload limit
1 MB -- the manipulation limit (I do not know what else to call it)
But folks have reported exceeding the manipulation limit while working with images that are smaller than 1 MB...
So it seems there is another limit coming into play. My guess is that there is some limit on the size of the image after it has been decoded into 24- or 32-bit pixels.
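A small arithmetic sketch of that guess (the 4-bytes-per-pixel figure is my assumption, not something from the GAE docs):

# A JPEG far smaller than 1 MB on disk can decode to many MB in memory.
width, height = 4000, 3000                 # e.g. a 12-megapixel photo
bytes_per_pixel = 4                        # assumed RGBA after decoding
decoded_size = width * height * bytes_per_pixel
print(f"{decoded_size / 2**20:.1f} MiB decoded")   # ~45.8 MiB, even if the JPEG file is < 1 MB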
Here is the documentation for Java, and here for Python.

How to visually represent file size

This will be a bit subjective, I'm afraid, but I'd value the advice of the Collective.
Our web application lists documents that users can download; standard file navigator stuff:
Type Name Created Size
-----------------------------------
PDF Doc 1 01/04/2010 15 KB
PDF Doc 2 01/04/2010 15 MB
Currently we list the file size as text, but I'd like to improve this by having some way of showing visually whether the file is tiny, normal or huge.
The reason for this is so that users can scan the list quickly and spot files that are likely to take a long time downloading.
My options currently are:
Bigger font sizes for bigger files (drawback: the layout can become untidy)
Icons (like a wi-fi signal strength indicator; drawback: harder to scan)
Keep all sizes in KB so the number of zeroes indicates size (drawback: users have to calculate the "friendly" size in their heads)
I know this is quite a minor thing, but I'd appreciate anyone's thoughts on the matter!
Edit: Thanks for the answers!
From what you've said, I think that:
I really like Robert's idea of telling users roughly how long it will take to download the file
As someone pointed out, if I use a bar or "signal strength" icon, that gives the impression of a "maximum" file size
I like shading the text - stronger for larger files
I'm going to go with a combination approach:
Uniform font size
Darker text for larger files
A tooltip telling users roughly how long it will take to download
A tiny piece of text, in brackets, after the size, describing how big it is, e.g.:
15 KB (tiny)
2 MB (small)
20 MB (big)
300 MB (huge)
I'll see if I can put a screenshot on here of how it looks when I've got a prototype. Again, thanks for the feedback!
If it were me, I would show the size of the file in the usual way, but also display an estimated time to download (assume 1.5 Mbit DSL for your calculations).
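A minimal sketch of such an estimate (the 1.5 Mbit/s line speed is the assumption above; the formatting choices are made up for the example):

def estimated_download_time(size_bytes, mbit_per_s=1.5):
    """Rough download duration at the given line speed, as human-readable text."""
    seconds = size_bytes * 8 / (mbit_per_s * 1_000_000)
    if seconds < 60:
        return "under a minute"
    minutes = round(seconds / 60)
    return "about %d minute%s" % (minutes, "" if minutes == 1 else "s")

print(estimated_download_time(15 * 1024))        # 15 KB  -> under a minute
print(estimated_download_time(20 * 1024**2))     # 20 MB  -> about 2 minutes
print(estimated_download_time(300 * 1024**2))    # 300 MB -> about 28 minutes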
How about a bar whose length depends on the size? This is similar to the wi-fi signal icon idea, but scanning would be easier.
The colors would start out at green and go to red as the length increases.
If you have the space, I’d go with your idea of keeping all file sizes in constant units so the order of magnitude is indicated by the number of places consumed. With right-aligned numbers, that will make it easy enough to scan for a particular order of magnitude.
Keep in mind you gain about three places of space with this approach because you eliminate the units column, instead putting the units in the file size column header, so this won't be that much of a space hog. To save a little more space, consider showing sizes in MB resolved to 0.1 MB. For downloading duration with today’s broadband, once you account for server response time and variation, anything under 0.1 MB is going to seem to have about the same duration. It'll take no longer than loading a new web page, and users don't expect/need duration estimates for that. You can write it as “under 0.1” for files less than 50kB. Maybe even resolving to 1 MB is good enough if you really need the space.
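A minimal sketch of that formatting rule (the function name and column width are made up; the "under 0.1" cut-off follows the 50 kB threshold mentioned above):

def format_size_mb(size_bytes):
    """Constant units (MB), resolved to 0.1 MB, right-aligned for easy scanning."""
    mb = size_bytes / 1_000_000
    text = "under 0.1" if mb < 0.05 else f"{mb:,.1f}"
    return text.rjust(9)

print(format_size_mb(15 * 1000))        # 15 kB  -> 'under 0.1'
print(format_size_mb(2 * 1000**2))      # 2 MB   -> '      2.0'
print(format_size_mb(300 * 1000**2))    # 300 MB -> '    300.0'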
A linear graphical representation of file size (e.g., a bar graph) is better for assessing relative download times. However, I can’t see it working well when your download durations span three or more orders of magnitude. Users will likely want to distinguish a 5 versus 10 minute download, so you need a visually noticeable difference of about 2 MB. I'd say you need at least 3 pixels for 2 MB for a bar graph, which pretty much rules out representing files of a GB or more.
You could try to linearly represent GB, MB, and kB with separate graphics, but such displays can be notoriously hard to read and harder to scan (e.g., multi-hand altimeters have been largely abandoned in aircraft because of reading errors). I wouldn’t try something like that unless your users get training or a lot of experience with it.
Trying to rank or categorize files sizes with icons, colors, font size, or number of symbols is problematic unless you know the proper breakpoints for your users. However, you probably can’t know because the threshold of acceptable duration is going to vary by the user, their equipment, and their situation (how much time they have). I wouldn’t use red for any file size unless you want some users thinking the file is so large that downloading might damage their computer or cause some other technical problem.
Codes good for ranking, like font size and number of symbols, may also be problematic because users might assume they are linearly related to time, when you’d probably need to use a logarithmic transform. Writing out the sizes in constant units doesn’t have this issue because it’s clear the number of places is logarithmically related to size, even for users who don't know what a logarithm is. If you want to try some sort of ranking symbol, I suggest representing size by the volume of 3-D solids (e.g., various sizes of cubes). This may help users understand that one step means a nonlinear increase in size. Of course, any graphic coding using more than one dimension can have row spacing issues in your table.
If you can’t use constant units, then graphically distinguishing the kB, MB, GB symbols is a good alternative. I’d consider using font weight for that. It’s somewhat scanable, but its real function is to increase the chance of users noticing the different units, not to help scan for files of a particular size range. This is fine if users are going to download the file anyway, but just want to be able to plan for the download time.
Actually, if the task really is about users finding files of a particular size range, sorting or filtering the list by file size (by default or as a user option) is probably the best solution.
If you only have a single range step (i.e. only kB or MB, or only MB or GB) then I would use size + the lowest unit, e.g. 15000 kB, 15 kB. If you have to cover kB, MB and GB then that isn't going to work.
What about a simple + ++ +++ or $ $$ $$$ after the size to show kb, mb, gb?
What about a size indicator in the style of a progress bar?
Another way would be gray-scaling: Tiny files light-gray, big ones black.
What about using colors? Something like:
green if the file is less than 1MB
yellow if the file is between 1MB and 10MB
red if the file is more than 10MB
or any other scale that fits the kind of files you have to deal with...
You could put those colors either on the background or text color of the line describing your file, or on an icon near the size...
There are lots and lots of ways of doing it, although a logarithmic scale is definitely going to be necessary. I suggest using a field with a character that is larger and repeated more times for each power of 1000 or 1024, like this:
Type Name Created Size
-------------------------------------
PDF Doc 0 01/04/2010 15 B
PDF Doc 1 01/04/2010 15 KB .
PDF Doc 2 01/04/2010 15 MB ::
PDF Doc 3 01/04/2010 15 GB |||
PDF Doc 4 01/04/2010 15 TB TTTT
PDF Doc 5 01/04/2010 15 PB PPPPP
PDF Doc 6 01/04/2010 15 EB EEEEEE
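A minimal sketch of how such an indicator column could be generated (the symbols per magnitude follow the table above; the function name is made up):

import math

# one symbol per power of 1024, repeated once per magnitude step
SYMBOLS = ['', '.', ':', '|', 'T', 'P', 'E']

def size_indicator(size_bytes):
    """Return a character run whose length grows with the order of magnitude."""
    if size_bytes <= 0:
        return ''
    magnitude = min(int(math.log(size_bytes, 1024)), len(SYMBOLS) - 1)
    return SYMBOLS[magnitude] * magnitude

print(size_indicator(15 * 1024))        # KB -> '.'
print(size_indicator(15 * 1024**2))     # MB -> '::'
print(size_indicator(15 * 1024**3))     # GB -> '|||'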
I like the KB idea because bigger numbers will stand out. Then set limits and possibly use CSS colors to highlight: green for shorter times, and darker shades of red for longer ones, etc.