What are the dimensions of the largest usable JPEG image in GAE?

There are a couple of limits on the size of images when you start to talk about Google App Engine:
10 MB -- the upload limit
1 MB -- the manipulation limit (I do not know what else to call it)
But folks have reported exceeding the manipulation limit while working with images that are smaller than 1 MB...
So it seems there is another limit coming into play. My guess is that there is some limit on the size of the image after it has been decoded into 24/32-bit pixels.
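As a purely back-of-the-envelope sketch (my own addition, not a documented formula), you can estimate the decoded in-memory size of an image and see how quickly it outgrows the file size on disk:

# Rough estimate only: decoded size at 32 bits per pixel (an assumption,
# not an official App Engine figure).
def decoded_size_mb(width, height, bytes_per_pixel=4):
    return width * height * bytes_per_pixel / (1024 * 1024)

# A 2000 x 2000 JPEG can easily be under 1 MB on disk...
print(decoded_size_mb(2000, 2000))  # ...but decodes to roughly 15 MB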

Here is the documentation for Java, and here for Python.

Related

In a normal image classification task using CNNs, what should the number of units in the dense layer be?

I am creating a normal image classifier for rock-paper-scissors. I am using my local GPU, which is not a high-end one. When I began training the model it kept giving the error:
ResourceExhaustedError: OOM when allocating tensor with shape.
I googled this error and the suggestion was to decrease my batch size, which I did. That still did not solve anything, but when I later changed my image size from 200*200 to 50*50 it started training with an accuracy of 99%.
Later I wanted to see if I could do it with 150*150 images, since I found a tutorial on the official TensorFlow channel on YouTube. I followed their exact code and it still did not work. I reduced the batch size; still no solution. Then I changed the number of units in the dense layer from 512 to 200 (picked at random) and it ran fine, but now the accuracy is pretty bad. Is there any way I can tune my model to fit my GPU without hurting accuracy? How does the number of units in the dense layer matter? It would really help me a lot.
from tensorflow.keras.layers import (Input, Conv2D, BatchNormalization,
                                     MaxPool2D, Flatten, Dropout, Dense)
from tensorflow.keras.models import Model

i = Input(shape=X_train[0].shape)                       # e.g. (150, 150, 3)
x = Conv2D(64, (3, 3), padding='same', activation='relu')(i)
x = BatchNormalization()(x)
x = Conv2D(64, (3, 3), padding='same', activation='relu')(x)
x = BatchNormalization()(x)
x = MaxPool2D((2, 2))(x)
x = Conv2D(128, (3, 3), padding='same', activation='relu')(x)
x = BatchNormalization()(x)
x = Conv2D(128, (3, 3), padding='same', activation='relu')(x)
x = BatchNormalization()(x)
x = MaxPool2D((2, 2))(x)
x = Flatten()(x)
x = Dropout(0.2)(x)
x = Dense(512, activation='relu')(x)                    # this layer dominates the parameter count
x = Dropout(0.2)(x)
x = Dense(3, activation='softmax')(x)                   # rock / paper / scissors
model = Model(i, x)
OK, now when I run this with an image size of 150*150 it throws that error. If I change the image size to 50*50 and reduce the batch size to 8 it works and gives me an accuracy of 99%, but if I use 150*150 and reduce the number of units in the dense layer to 200 (picked at random) it runs fine and the accuracy is very, very bad.
I am using a low-end NVIDIA GeForce MX230 GPU with 4 GB of VRAM.
For 200x200 input images the output of the last MaxPool has a shape of (50, 50, 128), which is then flattened and serves as the input of the Dense layer, giving you 50*50*128*512 = 163,840,000 weights in that layer alone. This is a lot.
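To make that arithmetic concrete, here is a small sketch (my own addition, assuming the two 2x2 pooling stages and 128 filters of the model above) of how the Dense layer's weight count scales with the input size:

# Weight count of the Dense(512) layer after two 2x2 MaxPool stages over
# 128 feature maps ('same' padding assumed, so only pooling shrinks the
# spatial size).
def dense_weights(side, filters=128, units=512, pools=2):
    for _ in range(pools):
        side //= 2
    return side * side * filters * units

print(dense_weights(200))  # 163,840,000
print(dense_weights(150))  # 89,718,784
print(dense_weights(50))   # 9,437,184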
To reduce the amount of parameters you can do one of the following:
- reduce the amount of filters in the last Conv2D layer
- do a MaxPool of more than 2x2
- reduce the size of the Dense layer
- reduce the size of the input images.
You have already tried the latter two options. Only trial and error will tell which method ultimately gives you the best accuracy; you were already at 99%, which is good.
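For illustration only, here is a sketch that combines the first two options (fewer filters in the second block and more aggressive pooling before Flatten); it is not the asker's exact architecture, and only training it will show whether the accuracy holds up:

from tensorflow.keras.layers import (Input, Conv2D, BatchNormalization,
                                     MaxPool2D, Flatten, Dropout, Dense)
from tensorflow.keras.models import Model

i = Input(shape=(150, 150, 3))
x = Conv2D(64, (3, 3), padding='same', activation='relu')(i)
x = BatchNormalization()(x)
x = MaxPool2D((2, 2))(x)                                        # 150 -> 75
x = Conv2D(64, (3, 3), padding='same', activation='relu')(x)    # 64 filters instead of 128
x = BatchNormalization()(x)
x = MaxPool2D((4, 4))(x)                                        # 75 -> 18: much smaller feature map
x = Flatten()(x)
x = Dropout(0.2)(x)
x = Dense(512, activation='relu')(x)                            # input is now 18*18*64 = 20,736 values
x = Dropout(0.2)(x)
x = Dense(3, activation='softmax')(x)
model = Model(i, x)

That brings the Dense layer down to roughly 18*18*64*512, about 10.6 million weights, instead of about 90 million at 150*150 with the original layout.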
If you want a platform with more VRAM available, you can use Google Colab https://colab.research.google.com/

How can I speed up gzip compression with h5py?

I'm trying to store the frames of an MP4 video in HDF5 using h5py. At first I tried simply not compressing the data; this turned a 5000 MB video into about 500 GB of HDF5. I'm experimenting with gzip compression to make the dataset more manageable, but with compression enabled it takes about a minute to store a single frame of the video. Here is a minimal code example:
import h5py
import numpy as np

hdf5 = h5py.File(file, mode='a')
dset = hdf5.create_dataset(dset_name, shape=(70000, 1080, 1920, 3),
                           dtype=np.uint8, chunks=True, compression='gzip')
for i, frame in enumerate(video_stream):
    dset[i] = frame
Each video has about 70,000 1080p RGB frames. video_stream is an object that yields (1080, 1920, 3) arrays when iterated over (you can look at it here if you think that's important). So how can I store this data in HDF5 at a reasonable speed and end up with a reasonable file size? Is it possible to get close to MP4 compression?
MP4 is a quite advanced standard, specifically designed to store video, often with hardware acceleration. You see its efficiency when it manages to pack more than 400 billion raw pixel values into just 5 billion bytes.
HDF5 is not a video standard, and gzip isn't well suited to video either. Python probably doesn't matter much, since the gzip compression is done in C anyway, but it should be noted that the code is single-threaded. In short, you're not going to get anywhere close to MP4.
To be honest, why are you even trying? I suspect you don't have much affinity with video data yet.
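If HDF5 is still a requirement, one practical mitigation (a sketch only, reusing the question's file, dset_name and video_stream placeholders; the per-frame chunk shape is my assumption) is to chunk the dataset per frame and use a lighter compressor such as h5py's built-in 'lzf', or gzip at its lowest level. It will still be nowhere near MP4 ratios, but each write then only compresses a single frame:

import h5py
import numpy as np

hdf5 = h5py.File(file, mode='a')
dset = hdf5.create_dataset(dset_name, shape=(70000, 1080, 1920, 3),
                           dtype=np.uint8,
                           chunks=(1, 1080, 1920, 3),   # one chunk per frame
                           compression='lzf')           # or compression='gzip', compression_opts=1
for i, frame in enumerate(video_stream):
    dset[i] = frame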

What would be the effect of increasing the number of bytes?

One byte is used to store each of the three color channels in a pixel. This gives 256 different levels each of red, green and blue. What would be the effect of increasing the number of bytes per channel to 2 bytes?
2^16 = 65536 values per channel.
The raw image size doubles.
Processing the file takes roughly 2 times more time ("roughly", because you have more data, but then again this new data size may be better suited for your CPU and/or memory alignment than the previous sections of 3 bytes -- "3" is an awkward data size for CPUs).
Displaying the image on a typical screen may take more time (where "a typical screen" is 24- or 32-bit and would as yet not have hardware acceleration for this particular job).
Chances are you cannot use the original file format to store the image back in. (Currently, TIFF is the only file format I know that routinely uses 16 bits/channel. There may be more. Can yours?)
The image quality may degrade. If you add bytes, you cannot simply set them to a sensible value: if 3 bytes of 0xFF signified 'white' in your original image, what would be the comparable 16-bit channel value -- 0xFFFF or 0xFF00, and why? For either choice, remember you have to make a similar decision for black. (A small sketch of both choices follows below.)
Common library routines may stop working correctly. Only the very best libraries are data size-ignorant (and they'd still need to be rewritten to make use of this new size.)
If this is a real world scenario -- say, I just finished writing a fully antialiased graphics 2D library, and then my boss offhandedly adds this "requirement" -- it'd have a particular graphic effect on me as well.
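As an aside, here is a minimal sketch (my own illustration, not part of the answer above) of the two 8-bit-to-16-bit conversion choices mentioned in the list:

import numpy as np

channel_8bit = np.array([0x00, 0x80, 0xFF], dtype=np.uint8)

# Choice 1: shift left by 8 bits -- 0xFF becomes 0xFF00, so 'white' is no
# longer the maximum representable value.
shifted = channel_8bit.astype(np.uint16) << 8

# Choice 2: multiply by 257 (0x0101) -- 0x00 maps to 0x0000 and 0xFF to 0xFFFF,
# preserving both ends of the range.
scaled = channel_8bit.astype(np.uint16) * 257

print(shifted)  # [    0 32768 65280]
print(scaled)   # [    0 32896 65535]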

memory_limit=80M. What is the maximum image size for imagecreatefromjpeg()?

I have web hosting that gives a maximum memory_limit of 80M (i.e. ini_set("memory_limit","80M");).
I'm using a photo upload that uses the function imagecreatefromjpeg().
When I upload large images it gives the error:
"Fatal error: Allowed memory size of 8388608 bytes exhausted"
What maximum size (in bytes) for the image can I restrict users to?
Or does the memory_limit depend on some other factor?
The memory size of 8388608 bytes is 8 megabytes, not 80, which suggests the ini_set is not actually taking effect. You may want to check whether you can still increase the value somewhere.
Other than that, the general rule for image manipulation is that it will take at least
image width x image height x 3
bytes of memory to load or create an image. (One byte for red, one byte for green, one byte for blue, possibly one more for alpha transparency)
By that rule, a 640 x 480 pixel image will need at least 0.9 megabytes of space -- not including overhead and the space occupied by the script itself.
It's impossible to determine a limit on the JPG file size in bytes because JPG is a compressed format with variable compression rates. You will need to go by image resolution and set a limit on that.
If you don't have that much memory available, you may want to look into more efficient methods of doing what you want, e.g. tiling (processing one part of an image at a time) or, if your provider allows it, using an external tool like ImageMagick (which consumes memory as well, but outside the PHP script's memory limit).
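To put rough numbers on that rule of thumb (sketched in Python purely for the arithmetic; these are estimates, not exact GD memory usage):

# How many pixels fit into an 80 MB limit at ~3 bytes per pixel?
# GD often needs closer to 5 bytes per pixel plus overhead, so treat
# this as an optimistic upper bound.
memory_limit = 80 * 1024 * 1024          # 83,886,080 bytes
bytes_per_pixel = 3
max_pixels = memory_limit // bytes_per_pixel
print(max_pixels)                        # ~28 million pixels
print(int(max_pixels ** 0.5))            # ~5287, i.e. roughly 5300 x 5300 before overhead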
Probably your script uses more memory than just the image itself. Try debugging your memory consumption.
One quick-and-dirty way is to call memory_get_usage and memory_get_peak_usage at certain points in your code, and especially in a custom error handler and shutdown function. This can tell you which exact operation causes the memory exhaustion.

How to visually represent file size

This will be a bit subjective, I'm afraid, but I'd value the advice of the Collective.
Our web application lists documents that users can download; standard file navigator stuff:
Type Name Created Size
-----------------------------------
PDF Doc 1 01/04/2010 15 KB
PDF Doc 2 01/04/2010 15 MB
Currently we list the file size as text, but I'd like to improve this by having some way of showing visually whether the file is tiny, normal or huge.
The reason for this is so that users can scan the list quickly and spot files that are likely to take a long time downloading.
My options currently are:
Bigger font sizes for bigger files (drawback: the layout can become untidy)
Icons (like a wi-fi signal strength indicator; drawback: harder to scan)
Keep all sizes in KB so the number of zeroes indicates size (drawback: users have to calculate the "friendly" size in their heads)
I know this is quite a minor thing, but I'd appreciate anyone's thoughts on the matter!
Edit: Thanks for the answers!
From what you've said, I think that:
I really like Robert's idea of telling users roughly how long it will take to download the file
As someone pointed out, if I use a bar or "signal strength" icon, that gives the impression of a "maximum" file size
I like shading the text - stronger for larger files
I'm going to go with a combination approach:
Uniform font size
Darker text for larger files
A tooltip telling users roughly how long it will take to download
A tiny piece of text, in brackets, after the size, describing how big it is, e.g.:
15 KB (tiny)
2 MB (small)
20 MB (big)
300 MB (huge)
I'll see if I can put a screenshot on here of how it looks when I've got a prototype. Again, thanks for the feedback!
If it were me, I would show the size of the file in the usual way, but also display an estimated time to download (Assume 1.5 MBit DSL for your calculations).
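For reference, a rough sketch of that estimate (my own illustration; the 1.5 Mbit/s figure is the assumption from this answer, and overhead and latency are ignored):

def estimated_download_seconds(size_bytes, mbit_per_s=1.5):
    bytes_per_second = mbit_per_s * 1_000_000 / 8   # 1.5 Mbit/s ~= 187,500 bytes/s
    return size_bytes / bytes_per_second

print(round(estimated_download_seconds(15 * 1024)))         # 15 KB -> well under a second
print(round(estimated_download_seconds(15 * 1024 * 1024)))  # 15 MB -> about 84 seconds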
How about a bar whose length depends on the size? This is similar to the wi-fi signal icon idea, but scanning would be easier.
The colors would start out at green and go to red as the length increases.
If you have the space, I’d go with your idea of keeping all file sizes in constant units so the order of magnitude is indicated by the number of places consumed. With right-aligned numbers, that will make it easy enough to scan for a particular order of magnitude.
Keep in mind you gain about three places of space with this approach because you eliminate the units column, instead putting the units in the file size column header, so this won't be that much of a space hog. To save a little more space, consider showing sizes in MB resolved to 0.1 MB. For downloading duration with today’s broadband, once you account for server response time and variation, anything under 0.1 MB is going to seem to have about the same duration. It'll take no longer than loading a new web page, and users don't expect/need duration estimates for that. You can write it as “under 0.1” for files less than 50kB. Maybe even resolving to 1 MB is good enough if you really need the space.
A linear graphical representation of file size (e.g., a bar graph) is better for assessing relative download times. However, I can’t see it working well when your download durations span three or more orders of magnitude. Users will likely want to distinguish a 5 versus 10 minute download, so you need a visually noticeable difference of about 2 MB. I'd say you need at least 3 pixels for 2 MB for a bar graph, which pretty much rules out representing files of a GB or more.
You could try to linearly represent GB, MB, and kB with separate graphics, but such displays can be notoriously hard to read and harder to scan (e.g., multi-hand altimeters have been largely abandoned in aircraft because of reading errors). I wouldn’t try something like that unless your users get training or a lot of experience with it.
Trying to rank or categorize files sizes with icons, colors, font size, or number of symbols is problematic unless you know the proper breakpoints for your users. However, you probably can’t know because the threshold of acceptable duration is going to vary by the user, their equipment, and their situation (how much time they have). I wouldn’t use red for any file size unless you want some users thinking the file is so large that downloading might damage their computer or cause some other technical problem.
Codes good for ranking, like font size and number of symbols, may also be problematic because users might assume they are linearly related to time, when you’d probably need to use a logarithmic transform. Writing out the sizes in constant units doesn’t have this issue because it’s clear the number of places is logarithmically related to size, even for users who don't know what a logarithm is. If you want to try some sort of ranking symbol, I suggest representing size by the volume of 3-D solids (e.g., various sizes of cubes). This may help users understand that one step means a nonlinear increase in size. Of course, any graphic coding using more than one dimension can have row spacing issues in your table.
If you can’t use constant units, then graphically distinguishing the kB, MB, GB symbols is a good alternative. I’d consider using font weight for that. It’s somewhat scanable, but its real function is to increase the chance of users noticing the different units, not to help scan for files of a particular size range. This is fine if users are going to download the file anyway, but just want to be able to plan for the download time.
Actually, if the task really is about users finding files of a particular size range, sorting or filtering the list by file size (by default or as a user option) is probably the best solution.
If you only have a single range step (i.e. only KB or MB, or only MB or GB) then I would use the size plus the lowest unit, e.g. 15000 KB, 15 KB. If you have to cover KB, MB and GB, then that isn't going to work.
What about a simple + ++ +++ or $ $$ $$$ after the size to show kb, mb, gb?
What about a size indicator in the style of a progress bar?
Another way would be gray-scaling: Tiny files light-gray, big ones black.
What about using colors? Something like:
green if the file is less than 1 MB
yellow if the file is between 1 MB and 10 MB
red if the file is more than 10 MB
or any other scale that fits the kind of files you have to deal with...
You could put those colors either on the background or on the text color of the line describing your file, or on an icon near the size...
There are lots and lots of ways of doing it, although a logarithmic scale is definitely going to be necessary. I suggest adding a field whose character gets heavier and is repeated once more for each power of 1000 or 1024. Like this:
Type Name Created Size
-------------------------------------
PDF Doc 0 01/04/2010 15 B
PDF Doc 1 01/04/2010 15 KB .
PDF Doc 2 01/04/2010 15 MB ::
PDF Doc 3 01/04/2010 15 GB |||
PDF Doc 4 01/04/2010 15 TB TTTT
PDF Doc 5 01/04/2010 15 PB PPPPP
PDF Doc 6 01/04/2010 15 EB EEEEEE
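A sketch of how such a marker column could be generated (my own illustration of the scheme above, using powers of 1000):

# One symbol per power of 1000 above bytes, repeated once per step,
# mirroring the table above.
SYMBOLS = ['', '.', ':', '|', 'T', 'P', 'E']

def size_marker(size_bytes):
    step = 0
    while size_bytes >= 1000 and step < len(SYMBOLS) - 1:
        size_bytes //= 1000
        step += 1
    return SYMBOLS[step] * step

print(size_marker(15 * 10**3))   # '.'
print(size_marker(15 * 10**6))   # '::'
print(size_marker(15 * 10**9))   # '|||'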
I like the KB idea because bigger numbers will stand out. Then set limits and possibly use CSS colors to highlight: green for shorter times, and the darker the red, the longer, etc.
