This will be a bit subjective, I'm afraid, but I'd value the advice of the Collective.
Our web application lists documents that users can download; standard file navigator stuff:
Type   Name    Created      Size
---------------------------------
PDF    Doc 1   01/04/2010   15 KB
PDF    Doc 2   01/04/2010   15 MB
Currently we list the file size as text, but I'd like to improve this by having some way of showing visually whether the file is tiny, normal or huge.
The reason for this is so that users can scan the list quickly and spot files that are likely to take a long time downloading.
My options currently are:
Bigger font sizes for bigger files (drawback: the layout can become untidy)
Icons (like a wi-fi signal strength indicator; drawback: harder to scan)
Keep all sizes in KB so the number of zeroes indicates size (drawback: users have to calculate the "friendly" size in their heads)
I know this is quite a minor thing, but I'd appreciate anyone's thoughts on the matter!
Edit: Thanks for the answers!
From what you've said, I think that:
I really like Robert's idea of telling users roughly how long it will take to download the file
As someone pointed out, if I use a bar or "signal strength" icon, that gives the impression of a "maximum" file size
I like shading the text - stronger for larger files
I'm going to go with a combination approach:
Uniform font size
Darker text for larger files
A tooltip telling users roughly how long it will take to download
A tiny piece of text, in brackets, after the size, describing how big it is (see the sketch after these examples), e.g.:
15 KB (tiny)
2 MB (small)
20 MB (big)
300 MB (huge)
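Roughly the mapping I have in mind for that descriptive text, sketched in C here (the function name and the exact thresholds are placeholders I would still tune):

/* Sketch of the size-descriptor mapping; thresholds are placeholders. */
const char *size_descriptor(unsigned long long bytes)
{
    if (bytes < 100ULL * 1024)        return "tiny";   /* under ~100 KB */
    if (bytes < 5ULL * 1024 * 1024)   return "small";  /* under ~5 MB   */
    if (bytes < 100ULL * 1024 * 1024) return "big";    /* under ~100 MB */
    return "huge";
}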
I'll see if I can put a screenshot on here of how it looks when I've got a prototype. Again, thanks for the feedback!
If it were me, I would show the size of the file in the usual way, but also display an estimated time to download (assume a 1.5 Mbit/s DSL connection for your calculations).
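The arithmetic behind that estimate is trivial; here is a minimal sketch in C, assuming the 1.5 Mbit/s figure above and ignoring protocol overhead (the function name is mine):

/* Rough download-time estimate at an assumed 1.5 Mbit/s line speed. */
double estimated_seconds(unsigned long long bytes)
{
    const double line_bits_per_second = 1.5e6;   /* assumed DSL speed */
    return (double)bytes * 8.0 / line_bits_per_second;
}

A 15 MB file comes out at roughly 80 seconds, which is the kind of number users can actually plan around.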
How about a bar whose length depends on the size? This is similar to the wi-fi signal icon idea, but scanning would be easier.
The colors would start out at green and go to red as the length increases.
If you have the space, I’d go with your idea of keeping all file sizes in constant units so the order of magnitude is indicated by the number of places consumed. With right-aligned numbers, that will make it easy enough to scan for a particular order of magnitude.
Keep in mind you gain about three places of space with this approach because you eliminate the units column, instead putting the units in the file size column header, so this won't be that much of a space hog. To save a little more space, consider showing sizes in MB resolved to 0.1 MB. For downloading duration with today’s broadband, once you account for server response time and variation, anything under 0.1 MB is going to seem to have about the same duration. It'll take no longer than loading a new web page, and users don't expect/need duration estimates for that. You can write it as “under 0.1” for files less than 50kB. Maybe even resolving to 1 MB is good enough if you really need the space.
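A small sketch of that formatting rule in C (the function name is mine, and 1 MB is taken as 1,000,000 bytes here, which is itself a choice you'd have to make):

#include <stdio.h>

/* Format a size in constant MB units at 0.1 MB resolution.
 * Files under 50 kB are shown as "under 0.1", as described above. */
void format_size_mb(unsigned long long bytes, char *out, size_t outlen)
{
    if (bytes < 50000ULL)
        snprintf(out, outlen, "under 0.1");
    else
        snprintf(out, outlen, "%.1f", bytes / 1.0e6);
}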
A linear graphical representation of file size (e.g., a bar graph) is better for assessing relative download times. However, I can’t see it working well when your download durations span three or more orders of magnitude. Users will likely want to distinguish a 5 versus 10 minute download, so you need a visually noticeable difference of about 2 MB. I'd say you need at least 3 pixels for 2 MB for a bar graph, which pretty much rules out representing files of a GB or more.
You could try to linearly represent GB, MB, and kB with separate graphics, but such displays can be notoriously hard to read and harder to scan (e.g., multi-hand altimeters have been largely abandoned in aircraft because of reading errors). I wouldn’t try something like that unless your users get training or a lot of experience with it.
Trying to rank or categorize files sizes with icons, colors, font size, or number of symbols is problematic unless you know the proper breakpoints for your users. However, you probably can’t know because the threshold of acceptable duration is going to vary by the user, their equipment, and their situation (how much time they have). I wouldn’t use red for any file size unless you want some users thinking the file is so large that downloading might damage their computer or cause some other technical problem.
Codes good for ranking, like font size and number of symbols, may also be problematic because users might assume they are linearly related to time, when you’d probably need to use a logarithmic transform. Writing out the sizes in constant units doesn’t have this issue because it’s clear the number of places is logarithmically related to size, even for users who don't know what a logarithm is. If you want to try some sort of ranking symbol, I suggest representing size by the volume of 3-D solids (e.g., various sizes of cubes). This may help users understand that one step means a nonlinear increase in size. Of course, any graphic coding using more than one dimension can have row spacing issues in your table.
If you can’t use constant units, then graphically distinguishing the kB, MB, and GB symbols is a good alternative. I’d consider using font weight for that. It’s somewhat scannable, but its real function is to increase the chance of users noticing the different units, not to help them scan for files of a particular size range. This is fine if users are going to download the file anyway and just want to be able to plan for the download time.
Actually, if the task really is about users finding files of a particular size range, sorting or filtering the list by file size (by default or as a user option) is probably the best solution.
If you only have a single range step (i.e. only KB or MB, or only MB or GB), then I would use the size in the lowest unit, e.g. 15000 KB vs. 15 KB. If you have to cover KB, MB and GB, then that isn't going to work.
What about a simple + ++ +++ or $ $$ $$$ after the size to show KB, MB, GB?
What about a size indicator in the style of a progress bar?
Another way would be gray-scaling: Tiny files light-gray, big ones black.
What about using colors? Something like:
green if the file is less than 1MB
yellow if the file is between 1MB and 10MB
red if the file is more than 10MB
or any other scale that fits the kind of files you have to deal with...
You could put those colors either on the background or text color of the line describing your file, or on an icon near the size...
There are lots and lots of ways of doing it, although a logarithmic scale is definitely going to be necessary. I suggest adding a field whose symbol gets visually heavier and is repeated once more for each power of 1000 or 1024. Like this:
Type   Name    Created      Size
------------------------------------------
PDF    Doc 0   01/04/2010   15 B
PDF    Doc 1   01/04/2010   15 KB   .
PDF    Doc 2   01/04/2010   15 MB   ::
PDF    Doc 3   01/04/2010   15 GB   |||
PDF    Doc 4   01/04/2010   15 TB   TTTT
PDF    Doc 5   01/04/2010   15 PB   PPPPP
PDF    Doc 6   01/04/2010   15 EB   EEEEEE
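A sketch of how that column could be generated (the function name and the particular symbols are arbitrary choices of mine):

#include <stddef.h>

/* Emit one symbol per power of 1024, using a heavier character for each
 * magnitude, as in the table above. */
void magnitude_column(unsigned long long bytes, char *out, size_t outlen)
{
    static const char symbols[] = { ' ', '.', ':', '|', 'T', 'P', 'E' };
    int order = 0;
    if (outlen == 0)
        return;
    while (bytes >= 1024 && order < 6) {
        bytes /= 1024;
        order++;
    }
    size_t i = 0;
    for (; i < (size_t)order && i + 1 < outlen; i++)
        out[i] = symbols[order];
    out[i] = '\0';
}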
I like the KB idea because bigger numbers will stand out. Then set limits and possibly use CSS colors to highlight: green for shorter times, and the darker the red, the longer the download, etc.
I'm experimenting with cache blocking. To do that, I implemented two convolution-based smoothing algorithms, both using a 5x5 Gaussian kernel.
The first algorithm is just a simple double for loop over the whole image, going left to right, top to bottom, as shown below.
Image source: (https://people.engr.ncsu.edu/efg/521/f02/common/lectures/notes/lec9.html)
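Since the code itself is only shown as an image, here is a rough reconstruction of the structure I mean (the function and variable names are mine; I assume an 8-bit grayscale image and a 5x5 float kernel, and skip the border pixels for brevity):

/* Algorithm 1: plain double loop over the whole image. */
void smooth_simple(const unsigned char *src, unsigned char *dst,
                   int width, int height, const float kernel[5][5])
{
    for (int y = 2; y < height - 2; y++) {        /* top to bottom  */
        for (int x = 2; x < width - 2; x++) {     /* left to right  */
            float acc = 0.0f;
            for (int ky = -2; ky <= 2; ky++)
                for (int kx = -2; kx <= 2; kx++)
                    acc += kernel[ky + 2][kx + 2] *
                           src[(y + ky) * width + (x + kx)];
            dst[y * width + x] = (unsigned char)(acc + 0.5f);
        }
    }
}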
In the second algorithm I tried to play with cache blocking by splitting the loops into chunks, which became something like the following. I used a BLOCK size of 512x512.
Image source: (https://people.engr.ncsu.edu/efg/521/f02/common/lectures/notes/lec9.html)
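Again a rough reconstruction rather than the exact code, with the same assumptions as above (BLOCK is the 512 tile size I mentioned):

#define BLOCK 512

/* Algorithm 2: the same convolution, processed in BLOCK x BLOCK tiles. */
void smooth_blocked(const unsigned char *src, unsigned char *dst,
                    int width, int height, const float kernel[5][5])
{
    for (int by = 2; by < height - 2; by += BLOCK) {
        for (int bx = 2; bx < width - 2; bx += BLOCK) {
            int ymax = (by + BLOCK < height - 2) ? by + BLOCK : height - 2;
            int xmax = (bx + BLOCK < width - 2) ? bx + BLOCK : width - 2;
            for (int y = by; y < ymax; y++) {
                for (int x = bx; x < xmax; x++) {
                    float acc = 0.0f;
                    for (int ky = -2; ky <= 2; ky++)
                        for (int kx = -2; kx <= 2; kx++)
                            acc += kernel[ky + 2][kx + 2] *
                                   src[(y + ky) * width + (x + kx)];
                    dst[y * width + x] = (unsigned char)(acc + 0.5f);
                }
            }
        }
    }
}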
I'm running the code on a Raspberry Pi 3B+, which has a Cortex-A53 with 32 KB of L1 and 256 KB of L2, I believe. I ran the two algorithms on different image sizes (2048x1536, 6000x4000, 12000x8000, 16000x12000; 8-bit grayscale images), but across all of them the run times were very similar.
My question is: shouldn't the first algorithm suffer access latency that the second one avoids, especially for large images (like 12000x8000)? Based on the description of cache blocking in this link, when the first algorithm processes data at the end of an image row, the data from the beginning of that row should already have been evicted from L1. Taking the 12000x8000 image as an example: since we are using a 5x5 kernel, 5 rows of data are needed, which is 12000x5 = 60 KB, already larger than the 32 KB L1. When we start processing a new row, 4 rows of previous data are still needed, but they have likely been evicted from L1 and must be re-fetched. The second algorithm shouldn't have this problem because the block size is small. Can anyone tell me what I am missing?
I also profiled the two algorithms using oprofile, with the following results:
Algorithm 1

Event               Count
-------------------------------
L1D_CACHE_REFILL    13,933,254
PREFETCH_LINEFILL   13,281,559

Algorithm 2

Event               Count
-------------------------------
L1D_CACHE_REFILL     9,456,369
PREFETCH_LINEFILL    8,725,250
So it looks like the first algorithm does have more cache misses than the second, as reflected by the L1D_CACHE_REFILL counts. But it also has a higher prefetch rate, which may be due to the simpler access pattern of its loop. So does the usual story about cache blocking simply not take data prefetching into account?
Conceptually, you're right: blocking will reduce cache misses by keeping the input window in cache.
I suspect the main reason you're not seeing a speedup is that the hardware prefetcher is already streaming in all 5 input rows. Your performance counters show more prefetch line fills in the unblocked implementation. I suspect many textbook examples are out of date, since cache prefetching has kept getting better; Intel's L2 prefetcher could already track something like 16 independent linear streams about 10 years ago, I think.
Assume the filter takes 5 * 5 = 25 cycles per output pixel. On the RPi 3 that is 25 / 1.2 GHz = 20.8 ns. The I/O cost per output pixel is reading a new 5-pixel-high column of input, so the amortized I/O rate is 5 bytes / 20.8 ns, about 229 MiB/s, which is much less than the ~2 GiB/s of DRAM bandwidth. So in theory, the relatively slow computation combined with prefetching (I'm not certain how effective it is) means memory access isn't the bottleneck.
Try increasing the filter height; the prefetcher can only detect and track a limited number of streams. Or try vectorizing the computation so that memory access becomes the bottleneck.
I probably won't pursue this but I had this idea of generating a procedural universe in the most memory efficient way possible.
Like in the game Elite, you could use a random number generator based on a seed, so each star system can be represented by a single seed number instead of lists of stats and other info. But if each star system is a 64-bit number, then for the Milky Way's 100 billion stars that is 800 gigabytes of memory. And if you use only 8 bits per star system, you'll only have 256 unique star systems in your game. So my other idea was to have each star system represented by 8 bits, but simply grab the next 7 star systems' bytes in memory and use that combination to form a 64-bit seed. Obviously there would need to be 7 extra padding bytes at the end so the last star system in memory can still read a full 8 bytes.
So is there any way to organize the values in these bytes such that every 8-byte window over the entire file gives a distinct 64-bit value, with no repeats? Or is that impossible and I should just accept repeats? Or could I use the address of the byte itself as part of the seed, and how would that work in C? Also, if I have a file of 100 billion bytes, does it actually take up exactly 100 billion bytes in memory, or more, and how are the addresses for those bytes stored? And is accessing large files like that (100 GB+) in a client-server setup practical? Thank you.
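To make the "use the address itself" idea concrete, here's roughly what I'm imagining in C. It's just a sketch: the seed is derived from the star's index with a splitmix64-style mixer (the constants are the published splitmix64 ones), so nothing has to be stored per star at all:

#include <stdint.h>

/* Derive a 64-bit seed from a star's index (or file offset) alone. */
uint64_t star_seed(uint64_t star_index)
{
    uint64_t z = star_index + 0x9E3779B97F4A7C15ULL;
    z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9ULL;
    z = (z ^ (z >> 27)) * 0x94D049BB133111EBULL;
    return z ^ (z >> 31);
}

As far as I understand, every step of that mixer is invertible, so different indices always give different seeds, which would sidestep the repeats question entirely.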
I am creating an ordinary image classifier for rock-paper-scissors. I am training on my local GPU, which isn't a high-end one. When I began training the model, it kept giving this error:
ResourceExhaustedError: OOM when allocating tensor with shape.
I googled this error and the suggestions were to decrease my batch size, which I did. That still did not solve anything, but when I later changed my image size from 200x200 to 50x50, it started training and reached an accuracy of 99%.
Later I wanted to see if I could do it with 150x150 images, since I found a tutorial on the official TensorFlow YouTube channel. I followed their exact code and it still did not work. I reduced the batch size; still no luck. Then I reduced the number of units in the dense layer from 512 to 200 and it trained fine, but now the accuracy is pretty poor. Is there any way I could tune my model to fit my GPU without hurting accuracy? And how does the number of units in the dense layer matter? It would really help me a lot.
from tensorflow.keras.layers import (Input, Conv2D, BatchNormalization,
                                     MaxPool2D, Flatten, Dropout, Dense)
from tensorflow.keras.models import Model

i = Input(shape=X_train[0].shape)
x = Conv2D(64, (3, 3), padding='same', activation='relu')(i)
x = BatchNormalization()(x)
x = Conv2D(64, (3, 3), padding='same', activation='relu')(x)
x = BatchNormalization()(x)
x = MaxPool2D((2, 2))(x)
x = Conv2D(128, (3, 3), padding='same', activation='relu')(x)
x = BatchNormalization()(x)
x = Conv2D(128, (3, 3), padding='same', activation='relu')(x)
x = BatchNormalization()(x)
x = MaxPool2D((2, 2))(x)
x = Flatten()(x)
x = Dropout(0.2)(x)
x = Dense(512, activation='relu')(x)  # the dense layer I later reduced to 200 units
x = Dropout(0.2)(x)
x = Dense(3, activation='softmax')(x)
model = Model(i, x)
So, when I run this with an image size of 150x150 it throws that error. If I change the image size to 50x50 and reduce the batch size to 8, it works and gives me 99% accuracy. But if I use 150x150 and reduce the number of units in the dense layer to 200 (chosen at random), it runs without the error but the accuracy is very, very bad.
I am using a low-end NVIDIA GeForce MX230 GPU with 4 GB of VRAM.
For 200x200 input images, the output of the last MaxPool has shape (50, 50, 128), which is then flattened and fed to the Dense layer, giving 50*50*128*512 = 163,840,000 weights in that one layer. This is a lot.
To reduce the amount of parameters you can do one of the following:
- reduce the amount of filters in the last Conv2D layer
- do a MaxPool of more than 2x2
- reduce the size of the Dense layer
- reduce the size of the input images.
You have already tried the latter two options. Only trial and error will tell you which method ultimately gives you the best accuracy. You were already at 99%, which is good.
If you want a platform with more VRAM available, you can use Google Colab https://colab.research.google.com/
One byte is used to store each of the three color channels in a pixel. This gives 256 different levels each of red, green and blue. What would be the effect of increasing the number of bytes per channel to 2 bytes?
2^16 = 65536 values per channel.
The raw image size doubles.
Processing the file takes roughly twice as long ("roughly", because you have more data, but the new data size may suit your CPU and/or memory alignment better than the previous 3-byte groups; 3 is an awkward data size for CPUs).
Displaying the image on a typical screen may take more time (where "a typical screen" is 24- or 32-bit and would as yet not have hardware acceleration for this particular job).
Chances are you cannot use the original data format to store the image back into. (Currently, TIFF is the only file format I know that routinely uses 16 bits/channel. There may be more. Can yours?)
The image quality may degrade. If you add bytes, you cannot automatically set them to a sensible value: if 3 bytes of 0xFF signified 'white' in your original image, what is the comparable 16-bit value, 0xFFFF or 0xFF00, and why? You have to make a similar choice for black. (One common widening convention is sketched at the end of this answer.)
Common library routines may stop working correctly. Only the very best libraries are data size-ignorant (and they'd still need to be rewritten to make use of this new size.)
If this is a real world scenario -- say, I just finished writing a fully antialiased graphics 2D library, and then my boss offhandedly adds this "requirement" -- it'd have a particular graphic effect on me as well.
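To expand on the white-point question above: one common convention when widening is simply to replicate the byte, which maps 0x00 to 0x0000 and 0xFF to 0xFFFF and keeps black and white at the ends of the new range. A sketch (the function name is mine):

#include <stdint.h>

/* Widen an 8-bit channel to 16 bits by replicating the byte:
 * v * 257 == (v << 8) | v, so 0x00 -> 0x0000 and 0xFF -> 0xFFFF. */
uint16_t widen_channel(uint8_t v)
{
    return (uint16_t)(v * 257u);
}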
I have web hosting that allows a maximum memory_limit of 80M (i.e. ini_set("memory_limit", "80M");).
My photo upload feature uses the function imagecreatefromjpeg().
When I upload large images it gives the error:
"Fatal error: Allowed memory size of 83886080 bytes exhausted"
What maximum image size (in bytes) can I restrict users to?
Or does the memory limit depend on some other factor?
That memory size of 83886080 bytes is exactly 80 Megabytes, so your limit is being applied as you set it. You may want to check whether you can increase the value even further somewhere.
Other than that, the general rule for image manipulation is that it will take at least
image width x image height x 3
bytes of memory to load or create an image. (One byte for red, one byte for green, one byte for blue, possibly one more for alpha transparency)
By that rule, a 640 x 480 pixel image will need at least 0.9 Megabytes of space - not including overhead and the space occupied by the script itself.
It's impossible to determine a limit on the JPG file size in bytes because JPG is a compressed format with variable compression rates. You will need to go by image resolution and set a limit on that.
If you don't have that much memory available, you may want to look into more efficient methods of doing what you want, e.g. tiling (processing one part of an image at a time) or, if your provider allows it, using an external tool like ImageMagick (which consumes memory as well, but outside the PHP script's memory limit).
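If you want to pin down a number for that resolution limit, the arithmetic is only this (a sketch; the function name and the safety factor are mine, since GD's real per-pixel cost is higher than 3 bytes):

/* Rough upper bound on the number of pixels that fit in a memory limit,
 * using the width x height x 3 rule above with a 2x safety margin. */
unsigned long max_pixels(unsigned long memory_limit_bytes)
{
    const unsigned long bytes_per_pixel = 3;
    const unsigned long safety_divisor = 2;   /* headroom for overhead */
    return memory_limit_bytes / (bytes_per_pixel * safety_divisor);
}

With the 80M limit above, that works out to roughly 14 megapixels, i.e. somewhere around 4300 x 3200 for a 4:3 image.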
Probably your script uses more memory than just the image itself. Try debugging your memory consumption.
One quick-and-dirty way is to call memory_get_usage and memory_get_peak_usage at certain points in your code, and especially in a custom error handler and shutdown function. This can tell you which operations cause the memory exhaustion.