Xamarin.Mac PdfView performance problems on macOS

I'm creating an app to visualize experimental data (x,y plots with multiple series) where the primary goal is to generate a PDF file that conforms to scientific publication rules. The PDF generation works well, and I can also display the PDF in a PdfKit PdfView. However, when I try to show large PDF files in the app (data series with lots of points, as in 20000), a background thread spends 3-4 minutes generating some sort of cached version that the PdfView uses for faster zoom rendering.
I can get around the issue by setting PdfView.Document to an empty PdfDocument and then adding a page afterwards. The PdfView will cache a blank page, so I don't spend minutes at 100% CPU usage generating a cached version. However, this introduces another issue: without a cached version, zooming is less pleasant than desired.
In addition, I have some sliders which affect the PDF result, and they require very fast updates of the displayed PDF.
Is there a better way of displaying PDF data? Ideally the method should still allow zooming in the document.
Very slow for large pdf files:
PDFPlotView.Document = pdf;
Fast, but no cached version:
PDFPlotView.Document = new PdfDocument();
PDFPlotView.Document.InsertPage(pdf.GetPage(0), 0);
I've also tried generating a low-res image from the PDF and then using that image for cache generation, but it appears there is a known bug which causes the PDF to render blank:
using (var context = new CGBitmapContext(null, (int)rect.Width, (int)rect.Height, 8, 0, CGColorSpace.CreateDeviceRGB(), CGBitmapFlags.NoneSkipLast)
{
    InterpolationQuality = CGInterpolationQuality.None
})
{
    context.SetFillColor(new CGColor(1, 1, 1, 1));
    context.FillRect(rect);
    context.ScaleCTM(1, 1);
    context.DrawPDFPage(pdf.Document.GetPage(0));
    img = context.ToImage();
}
PdfDocument lowrespdf = new PdfDocument();
NSImage cache = new NSImage(img, rect.Size);
PdfPage page = new PdfPage(cache);
lowrespdf.InsertPage(page, 0);
[UPDATE] The performance issue appears to be related not to the number of data points, but to the length of a single path in the PDF file. Splitting a problematic data series into new lines every 1000 data points results in a cache time of a few seconds (max), compared to a few minutes for the 25000-point line.
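For reference, a minimal sketch of that workaround, assuming the series is stroked into the PDF's CGContext; the chunk size and the AddLines-based drawing are illustrative, not the exact plotting code:

using System;
using System.Linq;
using CoreGraphics;

static class SeriesDrawing
{
    const int ChunkSize = 1000;

    // Stroke the series as many short sub-paths instead of one very long path;
    // per the update above, this is what keeps PdfView's cache generation fast.
    public static void DrawSeries(CGContext ctx, CGPoint[] points)
    {
        for (int start = 0; start + 1 < points.Length; start += ChunkSize)
        {
            // Overlap one point so consecutive chunks join without gaps.
            int count = Math.Min(ChunkSize + 1, points.Length - start);
            ctx.BeginPath();
            ctx.AddLines(points.Skip(start).Take(count).ToArray());
            ctx.StrokePath();
        }
    }
}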

Related

Adding Images Efficiently to a Google Spreadsheet

I am exploring using a GAS script to build a human-readable product catalogue as a Google Spreadsheet; it's easy to generate a PDF or print from there. The product data is all quickly accessible via an API, including image URLs for each product.
I'm running into issues because inserting an image which references a URL and then re-sizing it takes 3-4 seconds in my prototype, and I might have around 150 products. Runtime is capped at 6 minutes. Here's a simplified example of the image processing loop that I'm imagining:
function insertImages(sheet, array_of_urls) {
  for (var i = 0; i < array_of_urls.length; i++) {
    let image = sheet.insertImage(array_of_urls[i], 1, (i + 1) * 3);
    image.setWidth(90);
    image.setHeight(90);
  }
}
I think it takes so long because of the interaction with the UI. Can anyone recommend a way that I could make the script functionally efficient?
Insert images over cells:
If you want the images over cells (that is, not contained in a specific cell), I don't think there's a way to make this significantly faster. There's no method to insert multiple images at once.
You could at most try to retrieve the image blobs, resize the images through some third party before inserting them, and finally insert them via insertImage(blobSource, column, row).
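A small sketch of that blob route (the batched fetch and the placement math are assumptions, and the third-party resize step is omitted):

function insertImageBlobs(sheet, array_of_urls) {
  // fetchAll issues the HTTP requests in one batch instead of one at a time
  const responses = UrlFetchApp.fetchAll(array_of_urls);
  responses.forEach((response, i) => {
    const blob = response.getBlob();
    // resize the blob via an external service here if needed, then insert it
    const image = sheet.insertImage(blob, 1, (i + 1) * 3);
    image.setWidth(90);
    image.setHeight(90);
  });
}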
In any case, there are ways to get through the 6 minute execution time limit. See, for example, this answer.
Insert image in cells:
If you're fine with the images being in specific cells rather than over cells, I'd suggest adding the images via the IMAGE formula, using setFormulas.
The image size can be set through the IMAGE formula, the following way:
=IMAGE("URL", 4, [height in pixels], [width in pixels])
Also, to make sure the cells' height is large enough for the images to be seen, you can use setRowHeights.
Code snippet:
function insertImages(sheet, array_of_urls) {
  const formulas = array_of_urls.map(url => ["=IMAGE(\"" + url + "\", 4, 90, 90)"]);
  const firstRow = 1;
  sheet.getRange(firstRow, 1, formulas.length, formulas[0].length).setFormulas(formulas);
  sheet.setRowHeights(firstRow, formulas.length, 90);
}

Best way to preload SVG image tags?

I'm building an SVG-based visualization that (partially) relies on showing many images in quick succession. The images can't be fetched fast enough from the network, so they must be preloaded.
My understanding is that SVG doesn't properly cache image tags, at least in major browsers. So JavaScript preloading libraries and techniques (e.g. this SO question) won't work. (I could resort to using layered HTML img tags, but because of the specifics of my application, I would like to stick to pure SVG as much as possible.)
I see two options:
Encoding the PNG image data as base64, storing it in memory as strings and using the strings to iteratively populate image tags using data:image/png;base64.
Layering many SVG groups on top of each other with all but one set to display: none or visibility: hidden, and iteratively unhiding the appropriate group. However, I believe it won't be possible to programmatically detect that all images have finished preloading.
What's the best way to preload the image data? Perhaps I've missed an easier option.
I'm not familiar enough with the underlying mechanics of web browsers to know if this will work with svg image tags, but I had success caching images with new Image():
// download low quality images
var imageArray = [];
for (var i = 0; i < 600; i++) {
    imageArray[i] = new Image();
    imageArray[i].src = moviePath + (i + 1) + '.jpg.t';
    imageArray[i].onload = function () {
        // recover the frame index from the src and mark its bar as loaded
        var i = this.src.split(movieName + '/')[1].split(".")[0];
        d3.select("#bar" + i).style("stroke", 'rgb(' + colors[i].rgb + ')');
    };
}
To show an image, I just set the src of the displayed image to one that was already loaded and the browser loads it from its cache.
There is another small trick used later in the source: show a low quality image first and start loading a high quality one only after a short timeout passes without another image being selected. Then, after the high quality image has loaded, show it only if the same image is still selected.
No idea if these are best practices or anything, but it worked reasonably well.
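Building on that, here is a hedged sketch with explicit load tracking, which addresses the earlier concern about knowing when preloading has finished; the URL pattern, frame count and SVG element selector are placeholders:

// Preload every frame with new Image(), then resolve once all have loaded.
function preload(urls) {
  return Promise.all(urls.map(url => new Promise((resolve, reject) => {
    const img = new Image();
    img.onload = () => resolve(url);
    img.onerror = reject;
    img.src = url;
  })));
}

// Once loaded, swapping the href of a single SVG <image> element should be
// served from the browser cache rather than the network.
const urls = Array.from({ length: 600 }, (_, i) => 'frames/' + (i + 1) + '.jpg');
preload(urls).then(() => {
  const frame = document.querySelector('#viz image');
  let i = 0;
  setInterval(() => {
    frame.setAttribute('href', urls[i]);   // use 'xlink:href' for older browsers
    i = (i + 1) % urls.length;
  }, 40);
});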

FPDF: Image is too large for page

I'm using FPDF to create PDFs full of images. Some of these images are far too long for a page, and I need them to spread onto the next page. Scaling them down to the default page height won't do.
What I'm trying to achieve is to automatically insert a page break at some position. I could split the image into parts and insert every part on a new page, but I would very much like to avoid that. Is there a way I haven't found yet in which FPDF does that for me?
$pdf = new FPDF("P", "mm", "A4");
$pdf->SetAutoPageBreak(true);
$pdf->SetDisplayMode('real');
$pdf->Image($picUrl, 12, $pdf->GetY(), 185);
$pdf->Output($project->getName().".pdf", "D");
Why exactly would you not like to split the image in parts and insert every part on a new page? This seems like the logical way (and it can be automated in PHP if you have the proper libraries).
What you could do is insert the same image over and over again, with different offsets. Then you will have visually the same result, but I don't know if FPDF will be smart enough to not store the image several times.
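A rough sketch of that offset trick (the image height in page units is an assumption; in practice you would compute it from the image's pixel size):

<?php
require('fpdf.php');

$pdf = new FPDF("P", "mm", "A4");
$pdf->SetAutoPageBreak(false);       // we place the slices manually

$picUrl     = "long-image.jpg";      // placeholder
$imgWidth   = 185;                   // mm, as in the question
$imgHeight  = 600;                   // mm at that width -- assumed, compute from the pixel size
$pageHeight = 297;                   // A4 page height in mm

// Insert the same image on every page, shifted further up each time,
// so each page shows the next vertical slice of it.
for ($offset = 0; $offset < $imgHeight; $offset += $pageHeight) {
    $pdf->AddPage();
    $pdf->Image($picUrl, 12, -$offset, $imgWidth);
}
$pdf->Output("long-image.pdf", "D");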

Serving Images with on-the-fly resize

My company has recently started to have problems with the image handling for our websites.
We have several websites (adult entertainment) that display images like DVD covers, snapshots and similar. We have about 100'000 movies, and for each movie we have an average of 30 snapshots plus covers. Almost every image has an additional version with blurring and overlay for non-members, which results in about 50 images per movie, or a total of 5 million base images. Each of the images is available in several versions, depending on where it's placed on the page (thumbnail, original, small preview, not-so-small preview, small image in the top-list, etc.), which results in more images than I cared to count.
Now I had the idea to use a server to generate the images on-the-fly, since it has become quite clumsy to generate all the different images for all the different pages (different pages sometimes even need different image sizes for basically the same task).
Does anyone know of an image processing server that can scale down images on-the-fly so we only need to provide the original images and the web guys can just request whatever size they need?
Requirements:
Very High performance (Several thousand users per day)
On-the-fly blurring and overlay creation
On-the-fly resize (with and without keeping aspect ratio)
Can handle millions of images
Must be able to read JPG, GIF, PNG and BMP and convert between them
Security is not that much of a concern; for example, the unblurred images can already be reached by URL manipulation. More security would be nice, but it's not required, and frankly I stopped caring (after failing to get into my coworkers' heads why, for our small reseller page, it's a bad idea to use http://example.com/view_image.php?filename=/data/images/01020304.jpg to display the images).
We tried PHP scripts to do this but the performance was too slow for this many users.
Thanks in advance for any suggestions you have.
I suggest you set up a dedicated web server to handle image resize and serve the final result. I have done something similar, although on a much smaller scale. It basically eliminates the process of checking for the cache.
It works like this:
you request the image appending the required size to the filename like http://imageserver/someimage.150x120.jpg
if the image exists, it will be returned with no other processing (this is the main point, the cache check is implicit)
if the image does not exist, handle the 404 not found via .htaccess and reroute the request to the script that generates the image of the required size (see the sketch after this list)
in the script specify the list of allowed sizes to avoid attacks like scripts requesting every possible size to shut your server down
keep this on a cookieless domain to minimize unnecessary traffic
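A minimal sketch of the fallback script from the steps above, assuming GD and a flat directory layout; the rewrite rule shown in the comment, the paths and the size whitelist are all illustrative:

<?php
// resize.php - hypothetical target of an .htaccess rewrite such as:
//   RewriteCond %{REQUEST_FILENAME} !-f
//   RewriteRule ^(.+)\.(\d+)x(\d+)\.jpg$ /resize.php?name=$1&w=$2&h=$3 [L]
// Generates the missing size variant once, writes it next to the original,
// and serves it; the next request hits the static file directly.

$allowed = [[150, 120], [320, 200], [800, 600]];   // whitelist of allowed sizes
$name = basename($_GET['name']);                   // e.g. "someimage"
$w = (int)$_GET['w'];
$h = (int)$_GET['h'];

if (!in_array([$w, $h], $allowed, true)) {
    http_response_code(403);
    exit('size not allowed');
}

$source = __DIR__ . "/originals/$name.jpg";        // assumed storage layout
$target = __DIR__ . "/$name.{$w}x{$h}.jpg";

$src = imagecreatefromjpeg($source);
$dst = imagecreatetruecolor($w, $h);
imagecopyresampled($dst, $src, 0, 0, 0, 0, $w, $h, imagesx($src), imagesy($src));
imagejpeg($dst, $target, 85);                      // cache to disk for next time

header('Content-Type: image/jpeg');
readfile($target);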
EDIT: I don't think that PHP itself would slow the process down much, as PHP scripting in this case is reduced to a minimum: the image scaling is done by a built-in library written in C. Whatever you do, you'll have to use a library like this (GD or libmagick or so), so that's unavoidable. With my system you at least skip the overhead of checking the cache entirely, thus further reducing PHP interaction. You can implement this on your existing server, so I guess it's a solution well suited to your budget.
Based on
We tried PHP scripts to do this but the performance was too slow for this many users.
I'm going to assume you weren't caching the results. I'd recommend caching the resulting images for a day or two (i.e. have your script check to see if the thumbnail has already been generated, if so use it, if it hasn't generate it on the fly).
This would improve performance dramatically, as I'd imagine the main/start page probably has a lot more hits than random video X; when viewing the main page, no images have to be created because they're cached. When user Y views movie X, they won't notice the delay as much, since it only has to generate that one page.
For the "On-the-fly resize" aspect: how important is bandwidth to you? I'd assume you're already pushing so much data with the videos that a few extra KB per image request wouldn't do too much harm. If that's the case, you could just use larger images, set the width and height, and let the browser do the scaling for you.
The ImageCache and Image Exact Sizes solutions from the Drupal community might do this, and like most OSS solutions they use the ImageMagick libraries.
There are some AMIs for Amazon's EC2 service that do image scaling. They use Amazon S3 for image storage (originals and scaled versions) and can feed them through to Amazon's CDN service (CloudFront). Check the EC2 site for what's available.
Another option is Google. Google docs now supports all file types, so you can load the images up to a Google docs folder, and share the folder for public access. The URL's are kind of long e.g.
http://lh6.ggpht.com/VMLEHAa3kSHEoRr7AchhQ6HEzHVTn1b7Mf-whpxmPlpdrRfPW216UhYdQy3pzIe4f8Q7PKXN79AD4eRqu1obC7I
Add the =s parameter to scale the image, cool! E.g. for 200 pixels wide:
http://lh6.ggpht.com/VMLEHAa3kSHEoRr7AchhQ6HEzHVTn1b7Mf-whpxmPlpdrRfPW216UhYdQy3pzIe4f8Q7PKXN79AD4eRqu1obC7I=s200
Google only charges USD 5/year for 20 GB. There is a full API for uploading docs, etc.
Other answers on SO
How best to resize images off-server
OK, the first problem is that resizing an image in any language takes a little processing time. So how do you support thousands of clients? Well, you cache the result so you only have to generate each image once. The next time someone asks for that image, check whether it has already been generated; if it has, just return that. If you have multiple app servers, you'll want to cache to a central file system to increase your cache-hit ratio and reduce the amount of space you need.
In order to cache properly, you need to use a predictable naming convention that takes into account all the different ways you want your image displayed, i.e. use something like myimage_blurred_320x200.jpg to save a JPEG that has been blurred and resized to 320 width and 200 height, etc.
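A short sketch of that cache lookup in C# (names, paths and the quick Bitmap resize are illustrative; you could swap in a routine like the ResizeImage method shown further below):

using System.Drawing;
using System.Drawing.Imaging;
using System.IO;

static class ImageCache
{
    // Predictable cache-file name: <id>_<variant>_<width>x<height>.jpg
    static string CachedPath(string cacheRoot, string imageId, bool blurred, int width, int height)
    {
        string variant = blurred ? "blurred" : "plain";
        return Path.Combine(cacheRoot, $"{imageId}_{variant}_{width}x{height}.jpg");
    }

    // Return the cached variant if it exists; otherwise create it once so later
    // requests are a plain file read.
    public static Image GetOrCreate(string cacheRoot, string imageId, bool blurred, int width, int height)
    {
        string path = CachedPath(cacheRoot, imageId, blurred, width, height);
        if (File.Exists(path))
            return Image.FromFile(path);   // cache hit: no processing needed

        using (var source = Image.FromFile(Path.Combine(cacheRoot, "originals", imageId + ".jpg")))
        {
            // Quick-and-dirty resize; a blur/overlay step for the non-member
            // variant would go here as well.
            var resized = new Bitmap(source, width, height);
            resized.Save(path, ImageFormat.Jpeg);
            return resized;
        }
    }
}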
Another approach is to sit your image server behind a proxy server that way all the caching logic is done automatically for you and your images are served by a fast, native web server.
You're not going to be able to serve millions of resized images any other way. That's how Google and Bing maps do it: they pre-generate all the images they need for the world at different pre-set extents, so they can provide adequate performance and return pre-generated static images.
If PHP is too slow, you should consider using the 2D graphics libraries from Java or .NET, as they are very rich and can support all your requirements. To get a flavour of the Graphics API, here is a method in .NET that will resize any image to the new width or height specified. If you omit a height or width, it will resize maintaining the right aspect ratio. Note: the Image can be created from a JPG, GIF, PNG or BMP:
// Creates a re-sized image from the SourceImage provided that retains the same aspect ratio.
// - If either the width or the height is not provided, then the resized image will use the
//   proportion of the provided dimension to calculate the missing one.
// - If both the width and height are provided, then the resized image will have the dimensions provided,
//   with the sides of the excess portions clipped from the center of the image.
public static Image ResizeImage(Image sourceImage, int? newWidth, int? newHeight)
{
    bool doNotScale = newWidth == null || newHeight == null;

    if (newWidth == null)
    {
        newWidth = (int)(sourceImage.Width * ((float)newHeight / sourceImage.Height));
    }
    else if (newHeight == null)
    {
        newHeight = (int)(sourceImage.Height * ((float)newWidth) / sourceImage.Width);
    }

    var targetImage = new Bitmap(newWidth.Value, newHeight.Value);
    Rectangle srcRect;
    var desRect = new Rectangle(0, 0, newWidth.Value, newHeight.Value);

    if (doNotScale)
    {
        srcRect = new Rectangle(0, 0, sourceImage.Width, sourceImage.Height);
    }
    else
    {
        if (sourceImage.Height > sourceImage.Width)
        {
            // clip the height
            int delta = sourceImage.Height - sourceImage.Width;
            srcRect = new Rectangle(0, delta / 2, sourceImage.Width, sourceImage.Width);
        }
        else
        {
            // clip the width
            int delta = sourceImage.Width - sourceImage.Height;
            srcRect = new Rectangle(delta / 2, 0, sourceImage.Height, sourceImage.Height);
        }
    }

    using (var g = Graphics.FromImage(targetImage))
    {
        g.SmoothingMode = SmoothingMode.HighQuality;
        g.InterpolationMode = InterpolationMode.HighQualityBicubic;
        g.DrawImage(sourceImage, desRect, srcRect, GraphicsUnit.Pixel);
    }

    return targetImage;
}
In the time that this question has been asked, a few companies have sprung up to deal with this exact issue. It is not an issue that's isolated to you or your company. Many companies reach the point where they need to look for a more permanent solution for their image processing needs.
Services like imgix serve as a proxy and CDN for image operations like resizing and applying overlays. By manipulating the URL, you can apply different transformations to each image. imgix serves billions of requests per day.
You can also stand up services on your own and put them behind a CDN. Open source projects like imageproxy are good for this. This puts the burden of maintenance on your operations team.
(Disclaimer: I work for imgix.)
What you are looking for is best matched by Thumbor http://thumbor.readthedocs.org/en/latest/index.html, which is open source, backed by a huge company (meaning it will not disappear tomorrow), and ships with a lot of nice features, like detecting what is important in an image when cropping.
For low cost plus CDN, I'd suggest combining it with CloudFront and AWS storage, or a comparable solution with a free CDN like Cloudflare. These might not be the best performing CDN providers, but they still perform better than a single server and also offload your image server on the cheap. Plus, it will save you a TON of bandwidth cost.
If each different image is uniquely identifiable by a single URL then I'd simply use a CDN such as AKAMAI. Let your PHP script do the job and let AKAMAI handle the load.
Since this kind of business doesn't usually have budget problems, that'd be the only place I'd look at.
Edit: that works only if you do find a CDN that will serve this kind of content for you.
This exact problem is now being solved by image resize services dedicated to this task. They provide the following features:
Built-in CDN - you need not worry about image distribution
Image resize on the fly - any size needed is available
No storage needed - you just store the base image and all variants are handled by the service
Ecosystem libraries - you can just include the JavaScript and your job is done for all devices and all browsers.
One such service is Gumlet. You can also try an open source alternative like an nginx plugin which can resize images on the fly.
(I work for Gumlet.)

DynamicPDF image quality loss

We are using a product called DynamicPDF to generate PDFs on the fly from dynamic data in a database. Their documentation says that their software leaves the image bytes intact and doesn't make any changes. Despite this, we have observed that the images we add seem to suffer quality loss in the resulting PDF output (at least that's how they look). So my question is: what do I need to do with the DynamicPDF API to ensure that the output image quality is equal or close to what I put in?
We are using Version 5.1.2 Build 13650, below is the code that we use to add the image.
private void plcImageMain_LaidOut(object sender, PlaceHolderLaidOutEventArgs e)
{
    if (e.LayoutWriter.RecordSets.Current.HasData)
    {
        string productId = e.LayoutWriter.RecordSets.Current["ProductId"].ToString();
        string imgUrl = base.SetImageUrlParams(e.LayoutWriter.RecordSets.Current["ImageUrl"] as string, e.ContentArea.Width, e.ContentArea.Height);
        System.Drawing.Bitmap bm = base.GetBitmap(imgUrl);
        ceTe.DynamicPDF.PageElements.Image img = new ceTe.DynamicPDF.PageElements.Image(bm, 0, 0);
        img.Height = e.ContentArea.Height;
        img.Width = e.ContentArea.Width;
        e.ContentArea.Add(img);
    }
}
/// <summary>
/// Gets a bitmap from the requested image url
/// </summary>
/// <param name="imgUrl"></param>
protected System.Drawing.Bitmap GetBitmap(string imgUrl)
{
    // TODO: Add some validation to ensure the url is an image.
    System.Net.WebRequest httpRequest = System.Net.HttpWebRequest.Create(imgUrl);
    using (System.Net.HttpWebResponse httpResponse = httpRequest.GetResponse() as System.Net.HttpWebResponse)
    using (Stream imgStream = httpResponse.GetResponseStream())
    {
        System.Drawing.Bitmap bm = System.Drawing.Bitmap.FromStream(imgStream) as System.Drawing.Bitmap;
        return bm;
    }
}
[Edit]
Here is the before and after screenshot.
[Edit]
Code using GetImage (why so slow?)
protected ceTe.DynamicPDF.Imaging.ImageData GetImageData(string imgUrl)
{
    ImageData imgData = null;
    using (System.Net.WebClient wc = new System.Net.WebClient())
    {
        imgData = ImageData.GetImage(wc.DownloadData(imgUrl));
    }
    return imgData;
}
GetImageData ("http://s7d2.scene7.com/is/image/SwissArmy/cm_vm_53900E--111mm_sol_front_a?fmt=jpeg&wid=400&hei=640");
All right, this looks like a poor effort at resizing, but it could just as well be your Acrobat Reader doing it on screen, with the actual data being perfectly fine.
You should be able to select an image by clicking it in Reader (so it's highlighted blue) and then copy and paste it into an image editing program of your choice. That way, you should get the resource at its original resolution no matter what it's scaled down to.
There are also tools to extract images and other resources from PDFs, but I don't know one I can recommend offhand.
In regards to the DynamicPDF product, there is not any resizing or resampling done to the image as it is added to the PDF document. Pekka is actually right on with this. It is the reader that is visually representing the image with differing clarity (at different zoom levels).
If you are able to pull the image out of the PDF (as Pekka recommends above) you will see the image data is completely original and not modified.
One additional thing you can do to demonstrate this would be to take your original image, right click on it and select "Convert To Adobe PDF" (requires full Acrobat Pro). In that newly created PDF you would also visually see the same results.
One final thing worth noting is a small inefficiency in the code you displayed above. Right now you are pulling the image content as a Stream, creating a Bitmap out of that Stream object, and then using that Bitmap to create the DynamicPDF Image object. The recommended way to accomplish this is to take the Stream object of the image that you are pulling from the URL and pass it into DynamicPDF's static ImageData method GetImage. This GetImage method will return an ImageData object; then use that ImageData to create your DynamicPDF Image object.
There are two clear advantages to loading the image this way. First is that you do not have the overhead involved with the System.Drawing.Bitmap object needing to separately process the image content (so in theory the app would run faster without this). And the second advantage is that the image content is added to the PDF in whatever native compression that it was originally in. As in the case of JPEG images, using the image’s native compression as opposed to the bitmap’s compression will result in a smaller output PDF file size. None of this will have any influence on the image quality of the output PDF but it could affect the efficiency and output PDF file size.
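A hedged sketch of that recommendation, reusing the GetImageData helper posted above; the Image constructor taking an ImageData is assumed from the answer's description:

private void plcImageMain_LaidOut(object sender, PlaceHolderLaidOutEventArgs e)
{
    if (e.LayoutWriter.RecordSets.Current.HasData)
    {
        string imgUrl = base.SetImageUrlParams(
            e.LayoutWriter.RecordSets.Current["ImageUrl"] as string,
            e.ContentArea.Width, e.ContentArea.Height);

        // Feed the downloaded bytes straight to DynamicPDF instead of round-tripping
        // through System.Drawing.Bitmap, so the native (e.g. JPEG) compression is kept.
        ceTe.DynamicPDF.Imaging.ImageData imgData = GetImageData(imgUrl);
        var img = new ceTe.DynamicPDF.PageElements.Image(imgData, 0, 0)
        {
            Height = e.ContentArea.Height,
            Width = e.ContentArea.Width
        };
        e.ContentArea.Add(img);
    }
}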
You were both right that it was Acrobat causing the fuzzy display. There is a setting in Preferences called Resolution; instead of using the system DPI setting by default, Acrobat decided to use a custom setting of 110 dpi (I have no idea why!). After setting it to the system setting (in my case 96 dpi), the images were crystal clear.
