ABCpdf: alter image compression when carrying out doc.AddImageUrl

Using ABCpdf, I am currently generating PDFs that are about 1 MB.
When we open the PDF in Acrobat Pro and simply change the image quality, the size drops to 100 KB.
I have looked at the ABCpdf documentation, but I cannot find a simple example of lowering the image quality before saving the document to get a smaller PDF.

It appears to be simple:
doc.HtmlOptions.ImageQuality = 33;

Without reviewing the code you used to create the PDF, I would strongly suggest taking a second look at it to ensure you are using the Flatten method on your document.
http://www.websupergoo.com/helppdf7net/default.html
The ABCpdf library adds objects to a page, each typically represented as an individual layer. When you call the Flatten method, the layers on the current page are removed and replaced by a single compressed representation.
Example C# code snippet:
// Flatten acts on the current page, so select each page in turn before calling it
for (int i = 1; i <= yourPdfDocument.PageCount; i++) {
yourPdfDocument.PageNumber = i;
yourPdfDocument.Flatten();
}

Related

Snapshot testing PDFs [duplicate]

I am generating and storing PDFs in a database.
The PDF data is stored in a text field using Convert.ToBase64String(pdf.ByteArray).
If I generate the exact same PDF that already exists in the database and compare the two base64 strings, they are not the same. A big portion matches, but about 5-10% of the text appears to differ each time.
What would make two PDFs different if both were generated using the same method?
This is a problem because I can't tell whether the PDF was modified since it was last saved to the db.
Edit: The two PDFs look exactly the same when viewed, but the base64 strings of their bytes are different.
Two PDFs that look 100% the same visually can be completely different under the covers. PDF producing programs are free to write the word "hello" as a single word or as five individual letters written in any order. They are also free to draw the lines of a table first followed by the cell contents, or the cell contents first, or any combination of these such as one cell at a time.
If you are actually creating the PDFs programmatically, two PDFs produced by completely identical code still won't be 100% identical files. There are a couple of reasons for this. The most obvious is that PDFs support creation and modification dates, which will obviously change depending on when the files are created. With iText you can override these (though it will confuse everyone else, so I don't recommend it) using something like this:
var info = writer.Info;
info.Put(PdfName.CREATIONDATE, new PdfDate(new DateTime(2001,01,01)));
info.Put(PdfName.MODDATE, new PdfDate(new DateTime(2001,01,01)));
However, PDFs also support a unique identifier in the trailer's /ID entry. To the best of my knowledge iText has no support for overriding this parameter. You could duplicate your PDF, change this manually and then calculate your differences and you might get closer to a comparison.
Then there's fonts. When subsetting fonts, producers create a unique internal name based on the original name and an arbitrary selection of six uppercase ASCII letters. So for the font Calibri the font's name could be JLXWHD+Calibri one time and SDGDJT+Calibri another time. iText doesn't support overriding of this because you'd probably do more harm than good. These internal names are used to avoid font subset collisions.
So the short answer is that unless you are comparing two files that are physical duplicates of each other you can't perform a direct comparison on their binary contents. The long answer is that you can tweak some of the PDF entries to remove unique parts for comparison only but you'd probably be doing more work than it would take to just re-store the file in the database.
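The "long answer" above can be sketched roughly as follows, in Python rather than C#. This is an illustration under assumptions, not something iText provides: it blanks out the /CreationDate and /ModDate strings, a hex-string trailer /ID array and any six-letter font subset prefix using regular expressions on the raw bytes, then hashes what is left. It only sees entries that are stored uncompressed; PDF 1.5+ files can hide them inside compressed object or cross-reference streams, where this approach will not work. The name normalized_pdf_digest is hypothetical.

```python
import hashlib
import re

# Parts of a PDF that legitimately differ between otherwise identical runs.
_DATE_RE = re.compile(rb"/(CreationDate|ModDate)\s*\((?:[^()\\]|\\.)*\)")
_ID_RE = re.compile(rb"/ID\s*\[\s*<[0-9A-Fa-f]*>\s*<[0-9A-Fa-f]*>\s*\]")  # hex-string form only
_SUBSET_RE = re.compile(rb"/[A-Z]{6}\+")  # e.g. /JLXWHD+Calibri


def normalized_pdf_digest(pdf_bytes: bytes) -> str:
    """SHA-256 hex digest of the PDF with volatile entries blanked out.

    Assumes the dates, /ID and font names are stored uncompressed in the file.
    """
    data = _DATE_RE.sub(lambda m: b"/" + m.group(1) + b"()", pdf_bytes)
    data = _ID_RE.sub(b"/ID[<><>]", data)
    data = _SUBSET_RE.sub(b"/XXXXXX+", data)
    return hashlib.sha256(data).hexdigest()
```

Storing this digest alongside the Base64 data would let you check whether a regenerated PDF differs in anything other than these volatile entries, with the caveat about compressed streams above.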

Use image in a SAS Stored Process's HTML Stream

I am creating a report with a SAS STP and I want to display an image (a logo) on the report. Here is what is happening:
data _null_;
file _webout;
put '<html>';
put '</html>';
run;
I am PUTing HTML because I have complex table formats to display, and I am not using %STPBEGIN and %STPEND because they open an ODS stream, which frankly I do not know how to handle and which has given me problems. Not using %STPBEGIN means the code above. This has been a very successful mechanism for me: I can show beautiful reports with CSS and everything. The only problem is images. A client recently asked for a logo on every report. I thought this was going to be easy, but it has not been. I tried to use an <img src="" /> tag, thinking I could use a relative path and my image would show. This technique partly succeeded and partly failed:
I added an image to a folder location using SAS Management Console
and used its relative path '/Products/SAS Enterprise GRC/****' (didn't work).
I copied an image to the default theme's images folder under Web/Staging/*** and tried to use its relative path (didn't work). So I tried using other images from the default theme, and that worked.
I am stuck: how can I use a custom image here?
If your image is static, you can embed it into your results using a datastep without having to copy files to the server.
The trick is to encode the image in Base64; you can then embed it in an <img src="" /> tag by using this magical notation:
<img src="data:image/png;base64,...." />
You can see that the src= attribute contains metadata telling the browser that the value contains image data, that it represents a PNG file (I used a PNG file when testing this post; you may have a JPG/BMP etc.), and that the value is Base64 encoded. The 4 periods at the end would be replaced by your image data in Base64 notation. This would look something like this:
<img src="data:image/png;base64,
... much much more base64 content here ...
HSLyz+h9xy+7HbHRL83L1tv9h8+4d/+Ic/Gf8DiYav3mpqHAMAAAAASUVORK5CYII=" />
Converting your image to Base64 is simple. You can search for an "online base64 image converter"; drag and drop your image and it will produce the Base64 code for you.
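If you would rather not depend on an online converter, the encoding step takes only a few lines. A minimal Python sketch (the function names are hypothetical) that builds the data: URI described above from raw image bytes:

```python
import base64


def to_data_uri(image_bytes: bytes, mime_type: str = "image/png") -> str:
    """Build a data: URI for use in an <img src="..."> attribute."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def img_tag(image_bytes: bytes, mime_type: str = "image/png") -> str:
    """Wrap the data: URI in a complete <img> tag."""
    return f'<img src="{to_data_uri(image_bytes, mime_type)}" />'
```

For example, reading a placeholder file with open("logo.png", "rb") and passing its contents to img_tag gives a line you can paste into the PUT statement; pass "image/jpeg" for a JPG.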
To get this into a datastep in sas, it's simply a case of:
data _null_;
file _webout;
put '<html>';
put '<img src="......etc..." />';
put '</html>';
run;
If your image is particularly big (say greater than ~32k), you may run into issues trying to output it from a datastep. I probably need to test this to clarify. You can work around this by reading the Base64 image from a file in SAS and streaming it directly to _webout, using code similar to below:
data _null_;
file _webout;
infile '\path\to\base64\file.ext';
input;
put _infile_;
run;
If you want to get really tricky, you can take any image you like (such as a chart generated in SAS) and convert it to base64 on the fly, then stream it out. Here is some SAS code that will take an image file and convert it to Base64:
data _null_;
length base64_format $20 base64_string $32767;
infile "\your_sasdir\hi.png" recfm=n;
file "\your_sasdir\hi.base64";
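* READ 15999 BYTES (A MULTIPLE OF 3) AT A TIME SO '=' PADDING CAN ONLY APPEAR AFTER THE LAST CHUNK;
input byte $15999. ;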
* FORMAT LENGTH NEEDS TO BE 4n/3 ROUNDED UP TO NEAREST MULTIPLE OF 4;
format_length = 4*(lengthn(byte)/3);
mod = mod(format_length,4);
if mod ne 0 then do;
format_length = format_length - mod + 4;
end;
base64_format = cats("$base64x",format_length,".");
base64_string = putc(cats(byte), base64_format);
put base64_string;
run;
Converted to Base64, the small PNG I tested this with should look like:
iVBORw0KGgoAAAANSUhEUgAAABQAAAAUCAIAAAAC64paAAAAAXNSR0IArs4c6QAAAARnQU1BAACxjwv8YQUAAAAJcEhZcwAADsMAAA7DAcdvqGQAAABaSURBVDhP5YxbCsAgDAS9/6XTvJTWNUSIX3ZAYXcdGxW4QW6Khw42Axne81LG0shlRvVVLyeTI2aZ2fcPyXwPdBI8B999NK/gKTaGyxaMX8gTJRkpyREFmegBTt8lFJjOey0AAAAASUVORK5CYII=
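One detail in the conversion step matters when chunks are written out one after another: Base64 turns every 3 input bytes into 4 output characters, so only a chunk whose length is a multiple of 3 encodes without trailing = padding. With a chunk size that is not a multiple of 3 (16000, for example), every full chunk ends in padding, and a browser's Base64 decoder rejects = in the middle of the data. The Python sketch below is only an illustration of that rule (the function name is hypothetical; 15999 is simply the largest multiple of 3 not above 16000). Each yielded line can be written separately, because browsers strip line breaks from a data: URI before decoding.

```python
import base64
from typing import BinaryIO, Iterator


def base64_lines(stream: BinaryIO, chunk_size: int = 15999) -> Iterator[str]:
    """Yield the Base64 encoding of a binary stream, one chunk per line.

    Assumes read() returns a full chunk except at end of file, as buffered
    files and io.BytesIO do.
    """
    if chunk_size % 3 != 0:
        raise ValueError("chunk_size must be a multiple of 3")
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        yield base64.b64encode(chunk).decode("ascii")
```

Joining the lines gives exactly the same text as encoding the whole file in one go, which is what the SAS step needs to produce.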
I'm going to see if I can find a way to streamline this as this is something we do frequently at work.
EDIT: Interestingly, SAS 9.4 seems to support doing this directly using ODS HTML5 in conjunction with the inline option. See the ODS HTML5 documentation.
See also this post, by Don Henderson, that provides a similar way to approach this problem. Thanks to Vasilij for the link.
When you define pictures in SAS metadata, they can be accessed via the SAS Content Server.
To get the picture URL, log into 'https://severhost/SASContentServer/repository/default/sasfolders' and search for your picture.
If you defined your picture in the folder /Products/SAS Enterprise GRC/PictureName.gif, it should be accessible from the address 'https://severhost/SASContentServer/repository/default/sasfolders/Products/SAS Enterprise GRC/PictureName.gif(Report)'.
Of course, remember that the customer's users need access permission in SAS Metadata to read the picture object.
If this doesn't solve your problem, please tell us which version of SAS you are using.
I had a similar problem to you once. I added the image to our intranet, which happened to be SharePoint at the time, gave that image public access, and then referenced it in all my reports.
The idea is that since the reports are only for an internal audience, they will all have access to the intranet, but not necessarily to the Content Server, so this circumvents the problem that Bagin mentioned.
If you don't have a suitable intranet, you could always reference a logo on your public website, which is probably available to all of your audience even if they are external. However, you then don't have control over that logo file, and one day it might change in some undesirable way.
Regards,
Vasilij
Using SASjs you can compile ANY binary content into a SAS web service (Stored Process or Viya Job).
Here's an example using an MP3 file: https://github.com/allanbowe/sasrap

Finding Duplicate or Similar Images on a specific directory on a database

I am new to this. My objective is to build a web application that lets users store images in a database, and I want to avoid the same image being stored twice or more.
So what I need is a way to find duplicate or similar images already stored in the database, or even better, to check when the user first tries to upload an image: if it is similar to an image already stored in the database, the system can warn them not to store it.
I want to learn how to find similar or duplicate images in a specific directory or database. Can you explain from the beginning how to build this, and what I should learn to accomplish it starting from the basics, like a tutorial? I'd like to learn a lot, if possible.
Thanks in advance, I really need this help.
The solution for finding similar images is much more complex, so I will stick to finding duplicate images first. The easiest thing to do is to take a SHA-1 hash of the image bits. Here is some code in C# to accomplish this (see below; note that it only hashes the first 3,840,000 bytes of each file). As for storing the hash in a database, I would recommend a binary(20) datatype for the result of the hash. This allows your SQL server to index and query much faster than storing the hash as a string or some other format.
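private static byte[] GetHashCodeForFile(string file)
{
// Only the first 3,840,000 bytes are hashed, so files that differ only after that point get the same hash.
const int maxNumberOfBytesToUse = 3840000;
using (Stream sr = File.OpenRead(file))
using (System.Security.Cryptography.HashAlgorithm hasher = System.Security.Cryptography.SHA1.Create())
{
int bytesToReadIn = (int)System.Math.Min(sr.Length, maxNumberOfBytesToUse);
byte[] buffer = new byte[bytesToReadIn];
// Stream.Read may return fewer bytes than requested, so keep reading until the buffer is full.
int offset = 0;
while (offset < bytesToReadIn)
{
int read = sr.Read(buffer, offset, bytesToReadIn - offset);
if (read == 0) break; // unexpected end of file
offset += read;
}
return hasher.ComputeHash(buffer, 0, offset); // 20-byte SHA-1 digest
}
}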
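To show how such a hash could be used to reject a duplicate at upload time, here is a minimal Python sketch. A dictionary stands in for the database table with its binary(20) hash column (an assumption for illustration; a real version would query that indexed column), and DuplicateIndex is a hypothetical name. Unlike the C# above, it hashes the whole file.

```python
import hashlib
from typing import Dict, Optional


def sha1_digest(data: bytes) -> bytes:
    """Raw 20-byte SHA-1 digest, the size a binary(20) column holds."""
    return hashlib.sha1(data).digest()


class DuplicateIndex:
    """In-memory stand-in for a table keyed on the image hash."""

    def __init__(self) -> None:
        self._by_hash: Dict[bytes, str] = {}

    def find_duplicate(self, data: bytes) -> Optional[str]:
        """Return the id of an identical stored image, or None."""
        return self._by_hash.get(sha1_digest(data))

    def add(self, image_id: str, data: bytes) -> bool:
        """Store the image unless an identical one exists; True if stored."""
        digest = sha1_digest(data)
        if digest in self._by_hash:
            return False  # exact duplicate: warn the user instead of storing
        self._by_hash[digest] = image_id
        return True
```

Note that this only catches byte-for-byte duplicates; re-saving or resizing an image changes its hash, which is where the similarity methods come in.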
Searching for similar images is a difficult problem currently undergoing much research. And it kind of depends on how you define similar. Some prominent methods for finding similar images are:
Check the metadata (EXIF or similar) tags in the image file for the creation date; similar images may have been taken at similar times. This may not be the best approach for what you want.
Calculate the relative histogram of both images and compare them for deltas in each color channel. This has the benefit of allowing an SQL query to be written and is invariant to image size, so an image that has been converted to a thumbnail will be found with this method.
Perform an image subtraction between the two images and see how close the result gets to pure black (all zeros). I don't know of a way to do this with a T-SQL query, and this code can get tricky with images that need to be resized.
Calculate the contours of each image (through Sobel, Canny or other edge detectors), then subtract the two contour images to see how many of their contours overlap. Again, I don't think this can be handled in SQL.
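The histogram and subtraction methods can be sketched in a few lines of Python. This is an in-memory illustration rather than SQL, and it assumes the images have already been decoded into lists of (r, g, b) tuples by some imaging library (not shown); the 16-bin count and the function names are arbitrary choices for the example.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]  # (r, g, b), each 0-255


def relative_histogram(pixels: Sequence[Pixel], bins: int = 16) -> List[List[float]]:
    """Per-channel histogram divided by pixel count, so image size cancels out."""
    hist = [[0.0] * bins for _ in range(3)]
    for pixel in pixels:
        for channel, value in enumerate(pixel):
            hist[channel][value * bins // 256] += 1
    total = len(pixels)
    return [[count / total for count in channel] for channel in hist]


def histogram_distance(a: Sequence[Pixel], b: Sequence[Pixel], bins: int = 16) -> float:
    """0.0 = identical colour distributions, 1.0 = completely disjoint."""
    ha, hb = relative_histogram(a, bins), relative_histogram(b, bins)
    per_channel = [
        0.5 * sum(abs(x - y) for x, y in zip(ca, cb)) for ca, cb in zip(ha, hb)
    ]
    return sum(per_channel) / 3


def subtraction_score(a: Sequence[Pixel], b: Sequence[Pixel]) -> float:
    """Mean absolute per-channel difference; 0.0 means the difference is pure black."""
    if len(a) != len(b):
        raise ValueError("images must have the same size; resize one first")
    total = sum(abs(x - y) for pa, pb in zip(a, b) for x, y in zip(pa, pb))
    return total / (3 * len(a))
```

Because the histogram is relative, an image and an enlarged or shrunk copy with the same colour proportions score 0.0, while the subtraction score needs both images at the same size.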

Loading files during run time in XNA 4.0

I made a content pipeline extension (using this tutorial) for an XNA 4.0 game.
I altered some aspects so it serves my needs better, but the basic idea still applies. Now I want to go a step further and allow my game to be changed at run time. The file I am loading through my content pipeline extension is very simple: it only contains decimal numbers, so I want to let the user change that file at will and reload it while the game is running (without recompiling, as I have had to do so far). This file is a very simplified version of a level editor file, meaning that it contains rows like:
1 1,5 1,78 -3,6
Here, the first number determines the object that will be drawn to the scene, and the other 3 numbers are coordinates where that object will be placed.
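Parsing a row like the one above is straightforward once you account for the comma being used as the decimal separator. A minimal Python sketch, assuming the three coordinates are x, y and z (the question only says they are coordinates) and using the hypothetical names LevelObject and parse_level; in C# the equivalent would be to parse with a culture that uses commas, or to normalize the separator, rather than relying on the machine's current culture.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LevelObject:
    kind: int  # first number: which object to draw
    x: float   # assumed coordinate order
    y: float
    z: float


def parse_level(text: str) -> List[LevelObject]:
    """Parse rows like '1 1,5 1,78 -3,6' (comma as the decimal separator)."""
    objects = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != 4:
            raise ValueError(f"line {line_number}: expected 4 fields, got {len(fields)}")
        kind = int(fields[0])
        x, y, z = (float(f.replace(",", ".")) for f in fields[1:])
        objects.append(LevelObject(kind, x, y, z))
    return objects
```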
So, how can I change the file that contains these numbers so that the game loads it and redraws the scene accordingly?
Thanks
Considering you've created a custom content pipeline extension, I presume you know how to load data using StreamReader? You could just clear your level data and load new data by reading through the text file line by line.
I mention this because, as far as I am aware, it's not possible to load data through the content pipeline at runtime, especially because the XNA redistributable does not contain the content pipeline.
Another option could be to switch to XML for the level file and use XElement, which I found quite recently and is my current method.
Here is a commented example of using StreamReader to load in simple level data from a .txt file. http://pastebin.com/fFXnziKv
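The answers here cover how to re-read the file, but not how to notice that it has changed. One simple approach, swapped in here rather than taken from the answers, is to poll the file's last-write time on each update and reload only when it changes (in C#, File.GetLastWriteTime would be the natural counterpart). A Python sketch with the hypothetical class name HotReloadingFile:

```python
import os
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")


class HotReloadingFile(Generic[T]):
    """Re-parse a file whenever its modification time changes."""

    def __init__(self, path: str, parse: Callable[[str], T]) -> None:
        self._path = path
        self._parse = parse
        self._mtime: Optional[float] = None
        self.value: Optional[T] = None

    def poll(self) -> bool:
        """Call once per update; returns True if the file was (re)loaded."""
        mtime = os.path.getmtime(self._path)
        if mtime == self._mtime:
            return False  # unchanged since the last load
        with open(self._path, encoding="utf-8") as f:
            self.value = self._parse(f.read())
        self._mtime = mtime
        return True
```

If an editor saves the file while it is being read, the parser may see a half-written file, so a real version should catch parse errors and keep the previous level until the next successful load.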
In XNA 4, if you are using StorageContainer, you can do something like:
(...)
StorageContainer storageContainer = //get your container
Stream stream = storageContainer.OpenFile("Level.txt", FileMode.OpenOrCreate);
StreamReader sr = new StreamReader(stream);
while (!sr.EndOfStream)
{
String line = sr.ReadLine();
//use line to do something meaningful
}
stream.Close();
storageContainer.Dispose();
(...)
From personal experience, if you go for a raw TextReader, the only problem is getting the path of your Content folder, which is relatively easy to retrieve (on Windows only!).

Reading HGT Files (SRTM)

Presently I am having problems obtaining elevation point data from NASA's SRTM3 format (.hgt). I want to use the data in a program that creates a 2D panoramic illustration of a given area based on the extracted elevation points.
I've exhausted a lot of resources on the Net, but still to no avail.
What I am asking for is some pseudocode showing how to read .hgt files and obtain data from them, so I can feed something to my program.
Thanks a lot!
You could use UniboGeoTools, a very small Java library that provides elevation info in two ways: SRTM and the Google Elevation API.
Take a look at the tests to understand how it works.
A pseudocode version is:
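file a = '~/S41W072.hgt'
size = 1201*1201
for(int i=0;i<size;i++){
// each sample is a big-endian signed 16-bit integer in metres; -32768 marks a void
int elevation = a.readShort();
int row = i/1201; // row 0 is the northern edge of the tile
int col = i%1201; // col 0 is the western edge of the tile
printScreen(elevation,col,row);
}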
I have Java code somewhere; if I find it, I'll upload it.
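A Python sketch of the same idea, based on the commonly documented .hgt layout: a square grid of big-endian signed 16-bit elevations in metres (1201 x 1201 for SRTM3, 3601 x 3601 for SRTM1), stored row by row from north to south and west to east, with -32768 marking voids and the file name giving the tile's south-west corner. The function names are hypothetical.

```python
import math
import struct
from typing import List, Optional, Tuple

VOID = -32768  # SRTM marks missing samples with this value


def parse_hgt(data: bytes) -> List[List[Optional[int]]]:
    """Return rows of elevations in metres, northernmost row first; voids become None."""
    count = len(data) // 2
    size = math.isqrt(count)  # 1201 for SRTM3, 3601 for SRTM1
    if size * size * 2 != len(data):
        raise ValueError("not a square grid of 16-bit samples")
    values = struct.unpack(f">{count}h", data)  # '>' = big-endian, 'h' = signed 16-bit
    return [
        [None if v == VOID else v for v in values[r * size:(r + 1) * size]]
        for r in range(size)
    ]


def sample_position(name: str, row: int, col: int, size: int) -> Tuple[float, float]:
    """(latitude, longitude) of a sample; the name gives the SW corner, e.g. 'S41W072'."""
    lat0 = int(name[1:3]) * (1 if name[0] == "N" else -1)
    lon0 = int(name[4:7]) * (1 if name[3] == "E" else -1)
    # Row 0 lies on the northern edge, one degree above the SW corner.
    return lat0 + 1.0 - row / (size - 1), lon0 + col / (size - 1)
```

For example, reading S41W072.hgt in binary mode and passing its bytes to parse_hgt gives grid[row][col], and sample_position("S41W072", row, col, len(grid)) gives the matching coordinates for placing points in the panorama.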
