FPDF with lots of images, any way to speed it up? - performance

I'm using FPDF with PHP and need to print an order manifest. This manifest will have up to 200-300 products with images, which are stored on Amazon S3. Creating it is quite slow at this point. Any idea if this could be sped up?
Right now, with images of only about 15 x 15 mm, it generates a file of about 16 MB and takes 3.5 to 4 minutes; without the images the file is only about 52 KB and comes up almost instantly.
Of course, it may simply be the time spent downloading that many images, about which there's not much I can do.

I suggest you try img2pdf.
While this module offers far fewer options for interacting with PDFs than FPDF, if all you need is to combine images into a PDF file, it is probably the best module you can use. It is fast and easy to use.
Here is some example code:
import img2pdf

filename = "mypdf.pdf"
images = ["image1.jpg", "image2.jpg"]

# convert() takes the list of image paths and returns the finished PDF as bytes
with open(filename, "wb") as f:
    f.write(img2pdf.convert(images))
I used it to combine 400 images - it only took a second or so.
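If the images live on S3, as in the original question, one possible approach is to download them to local files first and then hand the file list to img2pdf, so the network fetches are separated from the PDF generation. This is only a sketch; the URLs and file names below are placeholders, not anything from the original setup.

import img2pdf
import urllib.request

# Placeholder URLs: in practice this list would come from the order manifest.
urls = [
    "https://example-bucket.s3.amazonaws.com/products/img1.jpg",
    "https://example-bucket.s3.amazonaws.com/products/img2.jpg",
]

local_files = []
for i, url in enumerate(urls):
    path = "img_%03d.jpg" % i
    urllib.request.urlretrieve(url, path)   # fetch once, reuse the local copy
    local_files.append(path)

with open("manifest.pdf", "wb") as f:
    f.write(img2pdf.convert(local_files))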

I found the extension I mentioned in my comment above:
http://fpdf.org/en/script/script76.php
This seems to reduce the generation time a little for me; you may see better results, since your document is much larger than mine.

Related

Can tesseract be trained for non-font symbols?

I'm curious how I might more reliably recognise the value and the suit of playing-card images. Here are two examples:
There may be some noise in the images, but I have a large dataset of images that I could use for training (roughly 10k PNGs, covering all values and suits).
I can reliably recognise images that I've manually classified, as long as there's an exact match, using a hashing method. But since I'm hashing images based on their content, the slightest noise changes the hash and causes an image to be treated as unknown. This is what I'm looking to address more reliably with further automation.
I've been reviewing the 3.05 documentation on training tesseract:
https://github.com/tesseract-ocr/tesseract/wiki/Training-Tesseract#automated-method
Can tesseract only be trained with images found in fonts? Or could I use it to recognise the suits for these cards?
I was hoping that I could say that all images in this folder correspond to 4c (e.g. the example images above), and that tesseract would see the similarity in any future instances of that image (regardless of noise) and also read that as 4c. Is this possible? Does anyone here have experience with this?
This has been my non-Tesseract solution, until someone proves there's a better way. I've set up:
Caffe: http://caffe.berkeleyvision.org/install_osx.html
Digits: https://github.com/NVIDIA/DIGITS/blob/master/docs/BuildDigits.md
Getting these running was the hardest part. Next, I used my dataset to train a new Caffe network. I prepared my dataset in a single-depth folder structure:
./card
./card/2c
./card/2d
./card/2h
./card/2s
./card/3c
./card/3d
./card/3h
./card/3s
./card/4c
./card/4d
./card/4h
./card/4s
./card/5c
./card/5d
./card/5h
./card/5s
./card/6c
./card/6d
./card/6h
./card/6s
./card/7c
./card/7d
./card/7h
./card/7s
./card/8c
./card/8d
./card/8h
./card/8s
./card/9c
./card/9d
./card/9h
./card/9s
./card/_noise
./card/_table
./card/Ac
./card/Ad
./card/Ah
./card/As
./card/Jc
./card/Jd
./card/Jh
./card/Js
./card/Kc
./card/Kd
./card/Kh
./card/Ks
./card/Qc
./card/Qd
./card/Qh
./card/Qs
./card/Tc
./card/Td
./card/Th
./card/Ts
Within DIGITS, I chose:
Datasets tab
New Dataset Images
Classification
I pointed it to my card folder, e.g. /path/to/card
I set the validation percentage to 13.0%, based on the discussion here: https://stackoverflow.com/a/13612921/880837
After creating the dataset, I opened the Models tab
Chose my new dataset.
Chose GoogLeNet under Standard Networks, and left it to train.
I did this several times, each time with new images added to the dataset. Each training session took 6-10 hours, but at this stage I can use my caffemodel to programmatically estimate what each image is likely to be, using this logic:
https://github.com/BVLC/caffe/blob/master/examples/cpp_classification/classification.cpp
The result is either a card (2c, 7h, etc.), noise, or table. Any estimate with a confidence above 90% is most likely correct. The latest run correctly recognised 300 out of 400 images, with only 3 mistakes. I'm adding new images to the dataset and retraining the existing model to further tune the accuracy. Hope this is valuable to others!
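For completeness, here is a rough sketch of that classification step using pycaffe rather than the linked C++ example. The file names, image size, and labels file are placeholders for whatever your own DIGITS run produced, not anything from the original post.

import caffe

# Placeholder paths: use the deploy prototxt, caffemodel and labels file
# from your DIGITS training job; labels.txt lists the classes
# (2c, 2d, ..., _noise, _table), one per line.
net = caffe.Classifier("deploy.prototxt", "snapshot.caffemodel",
                       raw_scale=255.0, channel_swap=(2, 1, 0),
                       image_dims=(256, 256))

labels = [line.strip() for line in open("labels.txt")]

img = caffe.io.load_image("some_card.png")
probs = net.predict([img])[0]            # one probability per class
best = probs.argmax()
print(labels[best], probs[best])         # e.g. "4c 0.97"; treat >0.90 as reliable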
While I only wanted to capture the high-level steps here, this was all done with great thanks to David Humphrey and his GitHub post; I really recommend reading it and trying it out if you're interested in learning more: https://github.com/humphd/have-fun-with-machine-learning

Web site loading speed is slow

My website http://theminimall.com is taking longer to load than before.
Initially my server was in the US, and at that time my website loaded in around 5 seconds.
I have since moved the server to Singapore, and the loading time has increased to about 10 seconds.
Most of the waiting time is spent getting results from a stored procedure (SQL Server database),
but when I execute the stored procedure directly in SQL Server it returns results very quickly.
So I assume the time taken is not due to query execution but to transferring the data from the SQL server to the web server. How can I eliminate or reduce this time? Any help or advice will be appreciated.
Thanks in advance.
I took a look at your site on websitetest.com. You can see the test here: http://www.websitetest.com/ui/tests/50c62366bdf73026db00029e.
I can see what you mean about the performance. In Singapore it's definitely fastest, but even there it's pretty slow, and elsewhere around the world it's even worse. There are a few things I would look at.
First, pick any sample, such as http://www.websitetest.com/ui/tests/50c62366bdf73026db00029e/samples/50c6253a0fdd7f07060012b6. You can get some of this info in Chrome DevTools or Firebug, but the advantage here is seeing the measurements from different locations around the world.
Scroll down to the waterfall. All the way on the right side of the Timeline column heading is a drop down. Choose to sort descending. Here we can see the real bottlenecks. The first thing in the view is GetSellerRoller.json. It looks like hardly any time is spent downloading the file. Almost all the time is spent waiting for the server to generate the file. I see the site is using IIS and ASP.net. I would definitely look at taking advantage of some server-side caching to speed this up.
The same is true for the main HTML, though a bit more time is spent downloading that file. It looks like it's taking so long to download because it's a huge file (for HTML). I would take the inline CSS and JS out of there.
Go back to the natural order for the timeline; then you can try changing the type of file to show. It looks like you are loading 10 CSS files, so take a look at concatenating those CSS files and compressing them.
I see your site has to make 220+ connections to download everything. That's a huge number. Try to eliminate some of those.
Next down the list I see some big JPG files. Most of these again are waiting on the server, but some are taking a while to download. I looked at one of a laptop and was able to convert it to a highly compressed PNG, saving 30% on the size while getting a file that looked the same. Then I noticed that there are well over 100 images, many of which are really small. One of the big drags on your site is that there are so many connections for the browser to manage. Take a look at implementing CSS sprites for those small images; you can probably take 30-50 of them down to a single image download.
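Purely as an illustration of the spriting idea (the file names below are placeholders, not files from your site), a small script can stitch the tiny images into one sheet; the printed offsets become the background-position values in your CSS.

from PIL import Image

# Placeholder file names; in practice these would be the site's small icons.
files = ["icon1.png", "icon2.png", "icon3.png"]
images = [Image.open(f) for f in files]

# Lay the icons out in a single horizontal strip.
sheet_w = sum(im.size[0] for im in images)
sheet_h = max(im.size[1] for im in images)
sheet = Image.new("RGBA", (sheet_w, sheet_h))

x = 0
for f, im in zip(files, images):
    sheet.paste(im, (x, 0))
    # x is the offset to use as "background-position: -Xpx 0" in the CSS
    print(f, "offset", x)
    x += im.size[0]

sheet.save("sprites.png")   # one download instead of len(files) downloads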
The final thing I noticed is that you have a lot of JavaScript loading right up near the top of the page. Try moving some of that (where possible) to later in the page, and also look into loading the JS asynchronously where you can.
I think that's a lot of suggestions for you to try. After you solve those issues, take a look at leveraging a CDN and other caching services to help speed things up for most visitors.
You can find a lot of these recommendations in a bit more detail in Steve Souders' book High Performance Web Sites. The book is 5 years old and still as relevant today as ever.
I've just taken a look at websitetest.com and that site is not right at all: my site is among the 97% fastest, yet according to that tool it's at 26% when testing from 13 locations. Their servers must be overloaded; I recommend you use a more reputable testing site such as http://www.webpagetest.org, which is backed by many big companies.
Looking at your contact details, it looks like your main audience is in India? If that is correct, you should host wherever your main audience is, or in the closest neighbouring region.

Creating a SWC image/file library at runtime using external files

I'm working on a Flash GUI project which has many images that need to be dynamically loaded at runtime.
Problem:
Currently, every time a class initializes, it loads its assets (images) from disk, and that usually takes too long. For example: I have a list of 100 items, each with the same background, which is a PNG image stored on disk; the image gets loaded from disk 100 times to render the list, because the item's class is initialized 100 times. Also, I want the assets to be hidden from users, so I want to pack them up somehow into a single file.
Solution:
I'm thinking of SWC. I heard it's a sort of library for Flash, but I have almost no experience working with SWC. There are too many images; it would take very long to manually import each one and assign a class name in the FLA library. However, I already have an XML file that stores the class names and the path to each class's assets. So I can load all the images into a variable, but I don't know how to actually write that variable out to a SWC file on HDD so I can load it later as a library.
[MyButton.png] --load to RAM--> [myButton:Bitmap] --write to SWC file on HDD--> [Assets.swc] --import the SWC file at runtime--> [addChild(assets.myButton)]
The middle steps (writing the loaded assets out to a SWC file and importing it at runtime) are the part I'm missing.
Thanks for your time! Any help is greatly appreciated.
A SWC is a file that you "precompile"; it's pretty much the same as a SWF, but not something you "create on the fly". The biggest difference is that a SWC is "compiled into" a SWF rather than loaded dynamically. That is, you can't load a SWC file at runtime; it is provided at compile time.
So every picture added to the SWC will increase its size; the good thing is that it can be shared between different SWF files.
Now, correct me if I've understood you wrong, but it seems like you reload the picture from the hard drive whenever that picture is used? So 100 instances of "Ball", which is linked to the picture "Ball.png", would load that file 100 times?
If that is the case, why not just create an ImageManager and let it keep one instance of each loaded image, then share that among all the instances that use the image?
AFAIK there is no easy way to do this; however, I wrote a blog post (since I couldn't find a better way to give you that solution) if you are interested in an example of caching loaded images.
It's pretty naive and revolves around a static ImageManager, loading only images, caching them by their URL id, and then providing a new instance of the BitmapData if they are already loaded. However, it works like a charm and is way more efficient than always loading the image from the hard drive.
You can find the blog post here: http://messer1024.blogspot.se/2012/12/caching-loaded-images-in-as3.html
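Purely as a language-agnostic sketch of that caching idea (load once, cache by URL, hand out fresh copies), and not the ActionScript from the blog post, it might look something like this in Python:

from PIL import Image

class ImageManager:
    _cache = {}  # path/URL -> decoded image, loaded at most once

    @classmethod
    def get(cls, path):
        if path not in cls._cache:
            cls._cache[path] = Image.open(path)
            cls._cache[path].load()        # force the decode now, not lazily
        return cls._cache[path].copy()     # fresh instance from the cached source

# 100 list items asking for the same background now hit the disk only once:
# background = ImageManager.get("Ball.png")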

Fastest? ClientBundle vs plain URL images

Right now a large application I'm working on downloads all small images separately, usually on demand: about 1000 images ranging from 20 bytes to 40 KB. I'm trying to figure out whether there will be any client performance improvement from using a ClientBundle for the smaller, most-used ones.
I'm putting the 'many connections, high latency' issue to the side for now and just concentrating on JavaScript/CSS/browser performance.
Some of the images are used directly within CSS. Are there any performance improvements from "spriting" them versus using them as usual?
Some images are created as new Image(url). Is it better to leave them this way, move them into CSS and apply styles dynamically, or load them from a ClientBundle?
Some actions result in a setURL on an image. I've seen that the same thing can be done with a ClientBundle, which will probably set a data URI for that image. Will doing so improve performance, or is the current way faster?
I'm specifically talking about runtime more than startup time, since this is an application that sees long usage sessions and all images will probably be cached within the first 10 minutes, so round trips are not an issue (for now).
The short answer is: not really (for Firefox, Chrome, Safari, Opera), BUT sometimes for IE (<9)!
Let's look at what ClientBundle does.
ClientBundle packages every image into one bundle, so all you need is one HTTP connection to get all of them, and it requires only one freshness lookup the next time you load your application (rather than n lookups, n being the number of your tiny images, which is really wasteful).
So it's clear that ClientBundle greatly improves your app's load time.
Runtime Performance
There may be times when a particular image fails to download or gets lost somewhere on the internet. If you make 1000 connections, the probability of something going wrong increases (however little). Firefox, Chrome, Safari, and Opera simply show the image-not-found icon and carry on. IE <9, however, will keep trying to get those particular images, using up one of the two connections it is allowed. That really hurts performance in IE.
Other than that, there will be some performance improvement if you keep loading new widgets asynchronously and they end up downloading images at a later stage.
Jai

Benchmark for using a base64 string instead of an image to reduce HTTP requests

I am looking at replacing the source of my images, currently set to an image file in my CSS, with a base64 string. Instead of the browser needing to make several requests, one for the CSS file and then one for each image, base64 embedding means that all of the images are embedded within the CSS file itself.
So I am currently investigating introducing this. However, I have an issue I would like advice on, a known problem with this approach: in my tests, a base64-encoded image is somewhere around 150% of the size of the regular image file. This makes it unusable for large images. While I am not too concerned about larger images, I am not sure when I should and shouldn't use it.
Is there a benchmark I should use, e.g. if the base64 version is more than 150% larger I should not use it?
What are others' views on this, and what from your own experience might help with the decision of when to use it and when not to?
Base64 encoding always uses 4 output bytes for every 3 input bytes. It works by using essentially 6 bits of each output byte, mapped to characters that are safe to use. So anything you base64 encode consistently comes out at about 133% of its original size (a 33% increase), rounded up for the last 4-byte chunk. You can use gzip compression on your responses to gain some of this loss back.
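As a quick worked check (random bytes standing in here for already-compressed image data), you can verify both the ratio and the gzip recovery yourself:

import base64
import gzip
import os

raw = os.urandom(30000)                     # stand-in for an already-compressed image
b64 = base64.b64encode(raw)

print(len(b64) / len(raw))                  # ~1.33: 4 output bytes per 3 input bytes
print(len(gzip.compress(b64)) / len(raw))   # gzip typically claws back most of that overhead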
This works in only a handful of browsers. I would not recommend it, especially for mobile browsers.
Images get cached by the browser if you configure your web server properly, so they don't get downloaded over and over again; they come from the cache and are thus super fast. There are various easy performance configurations you can make on your web server that work better than base64-encoding images into your CSS.
Take a look at this for some easy ways to boost website performance:
http://omaralzabir.com/making_best_use_of_cache_for_high_performance_website/
You are hopefully serving your HTML and CSS files gzipped. I tested this on a JPEG photo: I base64 encoded and gzipped it and the result was pretty close to the original image file size. So no difference there.
If you're doing it right, you end up with fewer requests per page but approximately the same page size with base64 encoding.
The problem is with caching when you change something. Let's say you have 10 images embedded in a single CSS file. If you make any change to the CSS styles or to any single image, the users need to download that whole CSS file with all the embedded images again. You really need to judge yourself if this works for your site.
Base64 encoding requires very close to 4/3 of the original number of bytes, so a fair amount less than 150%, more like 133%.
I can only suggest that you benchmark this yourself and see whether your particular needs are better satisfied with the more complex approach, or whether you're better served sticking with the norm.
