The following code is very good at putting a single page into a PDF.
It does not work for subsequent pages.
If the stream is an existing PDF file, the image is replaced. How do I get NewPage() to actually create a new page and add the image at the end?
using (Stream ms = GetStream()) {
    Document doc = new Document(PageSize.A4);
    var writer = PdfWriter.GetInstance(doc, ms);
    doc.Open();
    if (!doc.NewPage())
        throw new InvalidOperationException("NewPage failed.");
    PDFImage jpg = PDFImage.GetInstance(image, ImageFormat.Jpeg);
    jpg.Alignment = Element.ALIGN_CENTER;
    jpg.ScaleToFit(PageSize.A4.Width, PageSize.A4.Height);
    doc.Add(jpg);
    doc.Close();
}
Calling doc.NewPage() doesn't do anything when there's nothing on the current page. There are at least 3 options:
1) Add something invisible to the current page: an empty paragraph, some white space written to the PdfContentByte, whatever.
2) Tell your PDF document "no, it's really not empty, take my word": PdfDocument.PageEmpty = false;
3) Don't throw when NewPage returns false. That's perfectly acceptable under the circumstances.
I'd go with #3 personally.
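The behaviour described in this answer can be sketched with a toy model. This is plain Python and not iText's implementation; ToyDocument, page_empty and new_page are hypothetical names chosen only to mirror NewPage() and the PageEmpty flag mentioned above.

```python
class ToyDocument:
    """Toy stand-in for a PDF document; not iText's implementation."""

    def __init__(self):
        self.pages = [[]]        # start with one empty page
        self.page_empty = True   # mirrors the "page is empty" flag

    def add(self, element):
        # Adding any element marks the current page as non-empty.
        self.pages[-1].append(element)
        self.page_empty = False

    def new_page(self):
        # Nothing on the current page: do nothing and report False.
        if self.page_empty:
            return False
        self.pages.append([])
        self.page_empty = True
        return True


doc = ToyDocument()
first = doc.new_page()    # current page is empty, so no page is added
doc.add("image")
second = doc.new_page()   # page has content, so a new page starts
doc.page_empty = False    # option 2: claim the blank page is not empty
third = doc.new_page()    # the blank page is kept and another one starts
```

In this model, option 3 simply means treating a False return from new_page() as harmless instead of throwing.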
Related
Is there an equivalent of deleteProperty(XMPConst.NS_DC, "description") or some way to clear out EXIF:ImageDescription, XMP-dc:Description and IPTC:Caption-Abstract with a Photoshop script (i.e., JavaScript or AppleScript)?
I am trying to remove the tags/descriptions below from TIF, PSD and PSB images:
[EXIF:IFD0] ImageDescription
[XMP:XMP-dc] Description
[IPTC] Caption-Abstract
I can do this with Exiftool with this code:
exiftool -m -overwrite_original_in_place -EXIF:ImageDescription= -XMP-dc:Description= -IPTC:Caption-Abstract= FILE
While that works great for me, I have lots of vendors that would need this in their workflows, so it would be easier for them to use an action with the Photoshop Events Manager "On Document Open", or an Automator script (JavaScript or AppleScript), than to install ExifTool. Looking for some help to do this...
I don’t have much coding experience, but I found the JavaScript code below on PS-Scripts as a starting point. This code doesn't require Photoshop which I like and could be done with Automator, but it only references the one tag. Also, I don’t need to write anything to the tags as this code does (I’d prefer just to delete or wipe the content and/or tags so they don’t show up).
var f = File("/c/captures/a.jpg");
setDescription(f, "My new description");
function setDescription(file, descStr) {
    if (!ExternalObject.AdobeXMPScript) ExternalObject.AdobeXMPScript = new ExternalObject('lib:AdobeXMPScript');
    var xmpf = new XMPFile(File(file).fsName, XMPConst.UNKNOWN, XMPConst.OPEN_FOR_UPDATE);
    var xmp = xmpf.getXMP();
    xmp.deleteProperty(XMPConst.NS_DC, "description");
    xmp.setLocalizedText(XMPConst.NS_DC, "description", null, "x-default", descStr);
    if (xmpf.canPutXMP(xmp)) {
        xmpf.putXMP(xmp);
    }
    xmpf.closeFile(XMPConst.CLOSE_UPDATE_SAFELY);
}
And below is an attempt at the JavaScript that would be used as a Photoshop Event on "Open Document"; but again, I don't know how to amend it to ensure all 3 tags referenced above are cleared:
function removeDescription() {
    whatApp = String(app.name);
    // Only run inside Photoshop (the opening brace was missing here)
    if (whatApp.search("Photoshop") > 0) {
        if (!documents.length) {
            alert("There are no open documents. Please open a file to run this script.");
            return;
        }
        if (ExternalObject.AdobeXMPScript == undefined) ExternalObject.AdobeXMPScript = new ExternalObject("lib:AdobeXMPScript");
        var xmp = new XMPMeta(activeDocument.xmpMetadata.rawData);
        xmp.deleteProperty(XMPConst.NS_DC, "description");
        app.activeDocument.xmpMetadata.rawData = xmp.serialize();
    }
}
removeDescription();
Finally, below is an alternative I tried. It wipes Description, ImageDescription and Caption-Abstract on TIFFs and PNGs on the first try, but has to be run twice to work on a PSD/PSB/JPG. I think it has to do with the interaction between Description, ImageDescription and Caption-Abstract, and the solution possibly lies in using xmp.setLocalizedText to set the value to nothing?
function removeMetadata() {
    whatApp = String(app.name);
    if (whatApp.search("Photoshop") > 0) {
        if (!documents.length) {
            alert("There are no open documents. Please open a file to run this script.");
            return;
        }
        if (ExternalObject.AdobeXMPScript == undefined) ExternalObject.AdobeXMPScript = new ExternalObject("lib:AdobeXMPScript");
        var xmp = new XMPMeta(activeDocument.xmpMetadata.rawData);
        if (xmp.doesArrayItemExist(XMPConst.NS_DC, "description", 1)) {
            xmp.deleteArrayItem(XMPConst.NS_DC, "description", 1);
        }
        app.activeDocument.xmpMetadata.rawData = xmp.serialize();
        debugger
    }
}
removeMetadata();
Here is an example Python script that uses the Pillow library to remove the EXIF ImageDescription tag. Pillow does not expose XMP or IPTC fields under the ExifTool names, so those two tags are not handled here.
from PIL import Image
# Open the image file
image = Image.open('example.jpg')
# Read the EXIF data and remove ImageDescription (tag 0x010E) if present
exif = image.getexif()
exif.pop(0x010E, None)
# 'XMP-dc:Description' and 'IPTC:Caption-Abstract' are ExifTool tag names,
# not keys in image.info, so popping them from image.info would do nothing
# Save the modified image file with the edited EXIF data (this re-encodes the JPEG)
image.save('example_modified.jpg', exif=exif)
Change "example.jpg" to suit your needs.
There may be other metadata fields that contain descriptions, depending on the specific image file format and how it was created. You may need to modify the script to remove additional fields if necessary.
I have several PDF documents that supposedly contain scanned images, but upon inspection in Acrobat Pro, each page contains a huge number of tiny "inline images". From what I understand these are not regular images inside XObjects, but rather images embedded directly inside content streams.
How could I go about extracting and merging these images?
The only code I could find online starts out like this:
var reader = new PdfReader(@"path\to\file.pdf");
PdfDocument document = new PdfDocument(reader);
for (var i = 1; i <= document.GetNumberOfPages(); i++)
{
    PdfDictionary obj = (PdfDictionary)document.GetPdfObject(i);
    // ... more code goes here
}
...but the rest of the code doesn't work because the PdfDictionary returned from GetPdfObject is not a stream, only a dictionary. I don't know how to access the images inside it.
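As background on what "images embedded directly inside content streams" looks like: in the PDF format an inline image is written between the operators BI, ID and EI, with a short dictionary after BI and the raw image bytes after ID. The sketch below is plain Python, not iText, and only a naive illustration. It assumes the content stream has already been decoded (for example, inflated), and find_inline_images and the sample bytes are hypothetical. A real parser cannot rely on a pattern like this, because the image bytes can contain the sequence "EI" by chance.

```python
import re

# BI <key/value pairs> ID <one whitespace byte> <image data> <whitespace> EI
INLINE_IMAGE = re.compile(
    rb"(?:^|\s)BI\s+(.*?)\s+ID\s(.*?)\sEI(?=\s|$)", re.DOTALL
)


def find_inline_images(content):
    """Return (dictionary_bytes, data_bytes) pairs from a decoded content stream."""
    return [(m.group(1), m.group(2)) for m in INLINE_IMAGE.finditer(content)]


# Hypothetical, hand-written content stream holding one 2x2 grayscale image.
sample = b"q 10 0 0 10 0 0 cm\nBI /W 2 /H 2 /CS /G /BPC 8 ID \x00\xff\xff\x00 EI\nQ"
images = find_inline_images(sample)
```

Each pair holds the image's parameters (width, height, colour space, bits per component) and its pixel data, which is what would need to be collected and stitched together when merging the tiny images.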
I want to use iText to add an image to an existing PDF and give it a unique name, so that the next time I read the PDF I can find that image and replace it with another one.
My code:
final PdfName key = new PdfName("MY_SIGN_KEY");
final PdfName val = new PdfName("MY_SIGN_VAL");
Image signImage=Image.getInstance(signPngFile.getAbsolutePath());
signImage.setAlignment(1);
signImage.scaleAbsolute(newWidth, newHeight);
signImage.setAbsolutePosition(200,200);
PdfContentByte over = stamper.getOverContent(1);
PdfImage stream = new PdfImage(signImage, "", null);
stream.put(key, val); // set a unique identifier for it
//PdfIndirectObject ref=over.getPdfWriter().addToBody(stream);
//signImage.setDirectReference(ref.getIndirectReference());
over.addImage(signImage);
I have tried your code and it works for me. See the AddImageWithID example:
public void manipulatePdf(String src, String dest) throws IOException, DocumentException {
    PdfReader reader = new PdfReader(src);
    PdfStamper stamper = new PdfStamper(reader, new FileOutputStream(dest));
    Image image = Image.getInstance(IMG);
    PdfImage stream = new PdfImage(image, "", null);
    stream.put(new PdfName("ITXT_SpecialId"), new PdfName("123456789"));
    PdfIndirectObject ref = stamper.getWriter().addToBody(stream);
    image.setDirectReference(ref.getIndirectReference());
    image.setAbsolutePosition(36, 400);
    PdfContentByte over = stamper.getOverContent(1);
    over.addImage(image);
    stamper.close();
    reader.close();
}
In this example, I take a file named hello.pdf and I add an image named bruno.jpg with the file hello_with_image_id.pdf as result.
The image doesn't look black:
The ID is added:
Can you try the code I shared and see if the problem persists?
I can think of one reason why you'd get a black image: in our code, we assume that a single image is added. In the case of JPEG, this is always the case. In the case of PNG or GIF though, adding one source image could result in two images being added. Strictly speaking, PDF doesn't support transparent images (depending on how you interpret the concept of transparent images). Whenever you add a single source image with transparent parts, two images will be added to the PDF: one opaque image and one image mask. The combination of the opaque image and the image mask results in something that is perceived as a transparent image. Maybe this is what happens in your case.
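The "opaque image plus image mask" idea can be illustrated in plain Python. This is not iText code: split_rgba and objects_needed are hypothetical helpers, and the rule that a fully opaque source needs no mask is an assumption of this sketch, not a statement about what any library does.

```python
def split_rgba(pixels):
    """Split RGBA tuples into opaque RGB pixels and a list of alpha values."""
    rgb = [(r, g, b) for r, g, b, _ in pixels]
    mask = [a for _, _, _, a in pixels]
    return rgb, mask


def objects_needed(pixels):
    """Return 1 if every pixel is fully opaque, else 2 (image plus mask)."""
    _, mask = split_rgba(pixels)
    return 1 if all(a == 255 for a in mask) else 2


opaque = [(255, 0, 0, 255), (0, 255, 0, 255)]   # like a JPEG: no transparency
see_through = [(255, 0, 0, 255), (0, 0, 0, 0)]  # second pixel fully transparent
```

If only the opaque part is registered or displayed without its mask, the transparent pixels show their underlying colour values, which are often black. That would be consistent with the black image described in the question.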
With your help I have managed to build a very nice PDF generation tool. It builds a PDF based on a 5-page template. On the 3rd and 5th pages, additional pages may need to be added, pushing the following pages down. The 5th page is even landscape. Everything works perfectly except for one small additional feature that I am looking for.
The template that I have built has form fields on the fifth page. Therefore, I use the following code to fill the field:
var pdfReader = new PdfReader(existingFileStream);
var stamper = new PdfStamper(pdfReader, newFileStream);
var form = stamper.AcroFields;
form.SetField("fkClientName", clientName);
The field gets filled just fine, but not on the additional pages, which is weird because I do call this line:
PdfImportedPage templatePage = stamper.GetImportedPage(pdfReader, 5);
I feel like it should see that there are form fields on that fifth page. However, I read that stamper.GetImportedPage does not retrieve form fields. I don't really care if it's a form field or text; I just need the client name at the top of each generated additional page. Here is the ColumnText code that builds the additional pages:
while (true)
{
    ct.SetSimpleColumn(-75, 75, PageSize.A4.Height + 25, PageSize.A4.Width - 200);
    if (!ColumnText.HasMoreText(ct.Go()))
        break;
    pageNum++;
    stamper.InsertPage(pageNum, new Rectangle(792f, 612f));
    stamper.GetOverContent(pageNum).AddTemplate(templatePage, 0, -1f, 1f, 0, 0, PageSize.A4.Width);
    ct.Canvas = stamper.GetOverContent(pageNum);
}
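For readers unfamiliar with the six numbers in the AddTemplate call above: they form a PDF transformation matrix [a b c d e f], which maps a point (x, y) to (a*x + c*y + e, b*x + d*y + f). The plain-Python sketch below is not iText code; apply_matrix and the corner list are illustrative names, and 595 x 842 points is assumed as the A4 size. It applies the values from the question to the corners of a portrait A4 box.

```python
def apply_matrix(a, b, c, d, e, f, x, y):
    """Map (x, y) with the PDF transformation matrix [a b c d e f]."""
    return (a * x + c * y + e, b * x + d * y + f)


A4_WIDTH = 595   # assumed value of PageSize.A4.Width, in points
A4_HEIGHT = 842  # assumed value of PageSize.A4.Height, in points

# The matrix values from the AddTemplate call in the question.
m = (0, -1, 1, 0, 0, A4_WIDTH)

# Corners of a portrait A4 box, counter-clockwise from the origin.
corners = [(0, 0), (A4_WIDTH, 0), (A4_WIDTH, A4_HEIGHT), (0, A4_HEIGHT)]
mapped = [apply_matrix(*m, x, y) for x, y in corners]
```

The mapped corners span a box 842 points wide and 595 points high: the template is rotated 90 degrees clockwise and shifted up by the A4 width so that it lands in landscape orientation.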
If you had company stationery with some kind of background and you wanted to create a document that has flowing text (a column that can flow over to the next page) that also has a repeating header, then I would prefer using PdfWriter.
I'd use PdfWriter to add the content (without using ColumnText, just use the page size and the margins to define the column) and I would add the background and the header using page events. See for instance the Stationery example from my book.
I'd create a subclass for PdfPageEventHelper and I'd load the page you want to see repeated into a PdfImportedPage instance named page:
PdfReader reader = new PdfReader(STATIONERY);
page = writer.getImportedPage(reader, 1);
You may also want to initialize a Phrase with the name of your customer:
header = new Phrase(customerName);
Then you override the onEndPage() method like this:
public void onEndPage(PdfWriter writer, Document document) {
    writer.getDirectContentUnder().addTemplate(page, 0, 0);
    ColumnText.showTextAligned(writer.getDirectContent(),
        Element.ALIGN_RIGHT, header, 36, 806, 0);
}
Now you don't have to worry about ColumnText and new pages. Every time a new page is created, the background and the header will be added automatically.
However, you are using PdfStamper because your original document isn't company stationery: it's a 5 page document. If this document doesn't contain any interactive elements (you've created it using iTextSharp, so you know if it's a flat document or not), I'd still try the PdfWriter approach and change the page instance in the event whenever a new page is needed.
If you want to keep on using PdfStamper, you'll have to add the header in a different way. For instance using a different ColumnText instance, or, if it's a single line, using ColumnText.showTextAligned(). If you don't know the coordinates for the header, you can retrieve the position of the field using the getFieldPositions() method.
I am new to iText and am faced with a really interesting case about adding external images to a paragraph. Here is the thing:
Document document = new Document();
PdfWriter.getInstance(document, new FileOutputStream("out2.pdf"));
document.open();
Paragraph p = new Paragraph();
Image img = Image.getInstance("blablabla.jpg");
img.setAlignment(Image.LEFT| Image.TEXTWRAP);
// Notice the image added to the Paragraph through a Chunk
p.add(new Chunk(img, 0, 0, true));
document.add(p);
Paragraph p2 = new Paragraph("Hello Worlddd!");
document.add(p2);
gives me the picture and "Hello Worlddd!" string below. However,
Document document = new Document();
PdfWriter.getInstance(document, new FileOutputStream("out2.pdf"));
document.open();
Paragraph p = new Paragraph();
Image img = Image.getInstance("blablabla.jpg");
img.setAlignment(Image.LEFT| Image.TEXTWRAP);
// Notice the image added directly to the Paragraph
p.add(img);
document.add(p);
Paragraph p2 = new Paragraph("Hello Worlddd!");
document.add(p2);
gives me the picture and the string "Hello Worlddd!" located on the right-hand side of the picture and one line above it.
What is the logic behind that difference?
The behaviour you described occurs because in the second code snippet the Paragraph doesn't adjust its leading, but adjusts its width. If in the second snippet you add the line
p.add("Hello world 1")
just before
p.add(img)
you'll see the string "Hello world 1" on the left and a little bit above the string "Hello Worlddd!". If you output the leading of p (System.out.println(p.getLeading())), you can see it's a low number (typically 16) and not the height of the image.
In the first example you use the chunk constructor with 4 arguments
new Chunk(img, 0, 0, true)
with the last (true) saying to adjust the leading, so it prints as you expected.
If you add an image directly, its alignment properties (set with setAlignment()) are taken into account. So the image is on the left (Image.LEFT) and the text is wrapped around (Image.TEXTWRAP).
If you wrap the image in a Chunk it is handled as if it were a chunk of text. So the alignment properties, specific to images, are lost. This results in the text being below the image.
If you try Image.RIGHT, this becomes more apparent. Nothing changes in the first example: the image is still on the left. In the second example, the image is aligned to the right and the text is wrapped left of it.