I have code that takes a bunch of PDFs and combines them into one file...
PdfWriter writer = PdfWriter.getInstance(document, baos)
....
PdfFileSpecification fs = PdfFileSpecification.fileEmbedded(writer, null, af.fileName, af.documentData.data)
fs.addCollectionItem(item)
writer.addFileAttachment(fs)
This works great and I see all of the pages as I would expect. Now I am looking to split it up and copy it to multiple sub-files based on size. However, when I consume the generated byte[], iText only seems to see the first page...
PdfCopy copy = null
PdfReader reader = new PdfReader(before) // Add the byte array here
int pages = reader.getNumberOfPages() // Returns only 1; I would expect more pages.
It seems to only be recognizing the cover page which was added before the other documents. Is there a way I can get the count of ALL pages? If I take before and just send it as an attachment it shows up with all 1xx pages.
You are combining the files by attaching them to a new PDF that has only a cover page, and making this file a Portfolio, as Adobe calls it:
PdfFileSpecification fs = PdfFileSpecification.fileEmbedded(writer, null, af.fileName, af.documentData.data);
fs.addCollectionItem(item);
writer.addFileAttachment(fs);
Thus, your code is not one of the typical merge functions (using PdfCopy or PdfWriter with getImportedPage; correct ones use PdfCopy, fishy ones use PdfWriter) which generate a single PDF containing all source pages as genuine document pages. Instead your code creates a single page document with other PDFs as attachments and some extra information making PDF viewers display the attached PDFs inline.
If you want to access the pages of these attached PDFs, you have to extract them again and open them in separate PdfReader instances.
You can find more information on extracting PDF attachments in this answer.
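Once the attachments have been extracted as byte arrays, the size-based split the question asks about comes down to grouping those arrays. The following is only a minimal sketch of that grouping step, not iText code: the class name AttachmentBatcher, the method groupBySize, and the maxBytes limit are hypothetical, and the sizes are those of the source attachments, so a merged output file may differ somewhat in size.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: groups extracted attachment bytes into size-limited batches. */
class AttachmentBatcher {

    /**
     * Greedy, order-preserving grouping. A new batch is started whenever adding
     * the next file would push the current batch over maxBytes. A file that is
     * larger than maxBytes on its own still gets a batch to itself.
     */
    static List<List<byte[]>> groupBySize(List<byte[]> files, long maxBytes) {
        List<List<byte[]>> batches = new ArrayList<>();
        List<byte[]> current = new ArrayList<>();
        long currentSize = 0;
        for (byte[] file : files) {
            if (!current.isEmpty() && currentSize + file.length > maxBytes) {
                batches.add(current);
                current = new ArrayList<>();
                currentSize = 0;
            }
            current.add(file);
            currentSize += file.length;
        }
        if (!current.isEmpty()) {
            batches.add(current);
        }
        return batches;
    }
}
```

Each resulting batch could then be opened file by file in its own PdfReader and merged into one sub-file, for example with PdfCopy as mentioned above.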
Related
I'm trying to manipulate a PDF. I tried some open-source libraries (e.g. pdfSharp, pdfjet) but could not achieve what I need, because pdfSharp adds the text on a new page and pdfjet puts an advert into the PDF. So I cannot use those libraries.
What I must achieve is:
I must put a string at the end of the last page of the PDF. If the last page has enough space for the string, there is no need to add a new page; otherwise, the string can be split or a new page can be added to the PDF.
Here is a code sample I tried:
HtmlToPdf renderer = new IronPdf.HtmlToPdf();
IronPdf.PdfDocument doc = IronPdf.PdfDocument.FromFile(existingPdfPath);
doc.AppendPdf(renderer.RenderHtmlAsPdf(stringAddToPdf));
doc.SaveAs(newPdfPath);
Thanks for the help.
From what I can understand, AppendPdf is actually designed to always insert a new page.
https://ironpdf.com/c%23-pdf-documentation/html/M_IronPdf_PdfDocument_AppendPdf.htm
My suggestion would be to add the string as a footer, which should be possible with one of AddHeaders/AddFooters/AddHTMLHeaders/AddHTMLFooters:
https://ironpdf.com/c%23-pdf-documentation/html/Methods_T_IronPdf_PdfDocument.htm
I am using the ActivePDF tool to convert a few different file formats to PDF. Before the conversion, I need to find out how many PDF pages I will end up with. For example, if my Word document converts to a 4-page PDF, I need that page count before the actual conversion.
How can I best achieve this?
I have developed PHP code to dynamically load the files contained in a directory into a gallery / slideshow. I have many (40-50) of these gallery web pages, which display images grouped by content. With hundreds of images, the dynamic gallery code allows me to add images to a directory without having to write code for each web page each time.
However, I've realized that these files will be invisible to search engines since there isn't any HTML to index (e.g. the 'alt' attribute). Does anyone have any suggestions on how to get these images indexed? Two ideas I've had:
1) Write a program to automatically generate a single web page for every jpeg file which will display the image when found with the search engine and contain a link to the gallery page where the user can see more content. The benefit to this method is not having to modify my live web pages. The downside is hundreds of additional files only to be found by a search engine.
2) Write a program to generate hidden links that can be pasted into my gallery HTML page, using the alt attribute. The benefit of this method is that users would find my main gallery page with a search. The downside is having to cut and paste code into my live gallery web pages, which somewhat defeats the purpose of a dynamic gallery.
I'm new at this, so any suggestions would be appreciated.
If I understand you correctly:
I would have one page that just lists thumbnails of pages, and then one page for each of the images that shows a bigger version of the image and all the metadata you have. Best of all would be to add a short, unique snippet of text to each image, describing what is in it.
I'm having trouble figuring out how to add an external image (referenced by a URL) to a PDF using iText. Is this kind of thing possible?
The PDF spec in 7.1.5 says you should be able to reference a PDF via a URL by using a URL specification. This is what I've got so far:
PdfFileSpecification pdfSpec =
PdfFileSpecification.url(writer, "http://www.someurl.com/test.jpg");
StringBufferInputStream sbis = new StringBufferInputStream("");
PdfStream dict = new PdfStream(sbis, writer);
dict.put(PdfName.FILTER, PdfName.DCTDECODE);
dict.put(PdfName.TYPE, PdfName.XOBJECT);
dict.put(PdfName.SUBTYPE, PdfName.IMAGE);
dict.put(PdfName.WIDTH, new PdfNumber(100));
dict.put(PdfName.HEIGHT, new PdfNumber(100));
dict.put(PdfName.BITSPERCOMPONENT, new PdfNumber(8));
dict.put(PdfName.LENGTH, new PdfNumber(0));
dict.put(PdfName.F, pdfSpec);
PdfIndirectObject img = writer.addToBody(dict);
I know I still need to make sure the color space is added and stuff, but my main concern right now is actually getting this image into the body of the document. I can't figure out how to do this... it seems I can't get a reference to a PdfPage or the resources dictionary or anything. Is this possible using iText?
As a side note, this exercise is useless if I'm going to be presented with a security warning when the viewer tries to load the image. Does anyone know if that is the case?
External content is described in the PDF spec, but almost no PDF processor actually supports it. Acrobat 9 now has support for it, but I would be very cautious with that feature: your clients or users may not be able to see the referenced content.
I am checking whether a PDF document is searchable by trying to extract text from every single page in the PDF.
But checking every page seems to take forever when I am extracting text from a PDF that contains 500 to 2000 pages or more.
Is it possible for a PDF to contain text on one page but not on the rest?
What I am trying to do is this: if the first page of the PDF contains text, then it is a searchable PDF; otherwise it is not.
Yes, it is very possible for a PDF to contain text on one page but not on the rest. You could very well have a 500-page PDF that contains only images on the first 499 pages but contains text on the last page.
Unless you want to open the PDF file yourself and scan it for text/text operations, you will need to use an existing third-party PDF library that allows you to extract text from a PDF.
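Because text can appear on any page, checking only the first page can give a wrong answer, but the scan can still stop at the first page that yields text. Below is a minimal sketch of that early-exit loop, assuming "searchable" means that at least one page has extractable text. The class SearchableCheck and the pageText hook are hypothetical: the hook stands in for whatever per-page text extraction call your PDF library provides.

```java
import java.util.function.IntFunction;

/** Hypothetical helper: early-exit scan for the first page with extractable text. */
class SearchableCheck {

    /**
     * Returns the 1-based number of the first page whose text is non-blank,
     * or -1 if no page yields text. pageText is an assumed hook that wraps
     * the per-page text extraction of the PDF library in use.
     */
    static int firstPageWithText(int pageCount, IntFunction<String> pageText) {
        for (int page = 1; page <= pageCount; page++) {
            String text = pageText.apply(page);
            if (text != null && !text.trim().isEmpty()) {
                return page; // stop early: one page with text is enough
            }
        }
        return -1;
    }

    /** Treats a document as searchable if at least one page has text. */
    static boolean isSearchable(int pageCount, IntFunction<String> pageText) {
        return firstPageWithText(pageCount, pageText) != -1;
    }
}
```

A PDF with text near the front is classified after a few pages. A purely scanned document still requires visiting every page, so the worst case stays the same.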
Also, see Ferruccio's response to a related question, which is to use the IFilter interface, specifically made for search indexing and text extraction.
Try this version of Searcharoo, which lets you search Word and PDF documents.