How can I list and merge inline-images in a PDF file using IText7-dotnet? - itext7

I have several PDF documents that supposedly contain scanned images, but upon inspection in Acrobat Pro, each page contains a huge number of tiny "inline images". From what I understand these are not regular images inside XObjects, but rather images embedded directly inside content streams.
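For reference, an inline image is written directly into a page's content stream between three operators. `BI` starts an abbreviated image dictionary (keys such as `/W`, `/H`, `/BPC`, `/CS`). `ID` starts the raw image data. `EI` ends it. In iText 7 the decoded content of a page is available from `PdfPage.GetContentBytes()` (`getContentBytes()` in Java), with the page itself obtained from `PdfDocument.GetPage(i)`. Note that `GetPdfObject(i)` looks up an object by its object number, not a page by its page number.

The following is a minimal Java sketch, not iText code, that scans an already-decoded content stream for those operators. The class and method names are hypothetical. It assumes two things that a real content-stream parser such as iText's `PdfCanvasProcessor` does not need to assume: the bytes `EI` surrounded by whitespace never occur inside the image data, and `BI` never appears inside a string literal.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public final class InlineImageScanner {
    /** Raw pieces of one inline image: the text between BI and ID, and the bytes between ID and EI. */
    public static final class InlineImage {
        public final String dictionary;
        public final byte[] data;

        InlineImage(String dictionary, byte[] data) {
            this.dictionary = dictionary;
            this.data = data;
        }
    }

    // PDF whitespace characters: space, LF, CR, tab, form feed, NUL
    private static boolean isWhite(byte b) {
        return b == ' ' || b == '\n' || b == '\r' || b == '\t' || b == '\f' || b == 0;
    }

    // True if the two-letter operator op starts at index i and is delimited by whitespace or the buffer ends
    private static boolean isOperator(byte[] s, int i, String op) {
        if (i + 2 > s.length || s[i] != op.charAt(0) || s[i + 1] != op.charAt(1)) {
            return false;
        }
        boolean before = i == 0 || isWhite(s[i - 1]);
        boolean after = i + 2 == s.length || isWhite(s[i + 2]);
        return before && after;
    }

    /** Naively splits a decoded content stream into its inline images (assumptions in the lead-in). */
    public static List<InlineImage> scan(byte[] content) {
        List<InlineImage> images = new ArrayList<>();
        int i = 0;
        while (i < content.length) {
            if (!isOperator(content, i, "BI")) {
                i++;
                continue;
            }
            // The image dictionary runs from just after BI up to the ID operator
            int id = i + 2;
            while (id < content.length && !isOperator(content, id, "ID")) {
                id++;
            }
            if (id >= content.length) {
                break; // truncated stream: BI without ID
            }
            String dictionary = new String(content, i + 2, id - (i + 2), StandardCharsets.ISO_8859_1).trim();
            // A single whitespace byte follows ID; the data starts right after it
            int dataStart = Math.min(id + 3, content.length);
            int ei = dataStart;
            while (ei < content.length && !isOperator(content, ei, "EI")) {
                ei++;
            }
            if (ei >= content.length) {
                break; // truncated stream: no EI
            }
            // isOperator guarantees a whitespace byte before EI; leave it out of the data
            int dataEnd = Math.max(dataStart, ei - 1);
            images.add(new InlineImage(dictionary, Arrays.copyOfRange(content, dataStart, dataEnd)));
            i = ei + 2;
        }
        return images;
    }
}
```

Each `dictionary` string then tells you how to interpret the matching `data` bytes (width, height, bits per component, colour space, and any `/F` filter).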
How could I go about extracting and merging these images?
The only code I could find online starts out like this:
var reader = new PdfReader(@"path\to\file.pdf");
PdfDocument document = new PdfDocument(reader);
for (var i = 1; i <= document.GetNumberOfPages(); i++)
{
PdfDictionary obj = (PdfDictionary)document.GetPdfObject(i);
// ... more code goes here
}
...but the rest of the code doesn't work because the PdfDictionary returned from GetPdfObject is not a stream, only a dictionary. I don't know how to access the images inside it.
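For the merging step, one possible approach is to decode each tile and work out where it lands on the page from the transformation matrix in effect when it is drawn. You then paint all tiles onto a single canvas. This is an assumption about how these scans were split up; iText does not do this for you. The Java sketch below uses only `java.awt`. The `Tile` type and its pixel coordinates are hypothetical. It assumes the tiles are axis-aligned and share one resolution. It also assumes their positions have already been converted from PDF user space (origin bottom-left, y up) to image space (origin top-left, y down).

```java
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;
import java.util.List;

public final class TileMerger {
    /** A decoded tile and the pixel position of its top-left corner (hypothetical input type). */
    public static final class Tile {
        public final int x;
        public final int y;
        public final BufferedImage image;

        public Tile(int x, int y, BufferedImage image) {
            this.x = x;
            this.y = y;
            this.image = image;
        }
    }

    /** Paints all tiles onto one canvas just large enough to hold them. */
    public static BufferedImage merge(List<Tile> tiles) {
        if (tiles.isEmpty()) {
            throw new IllegalArgumentException("no tiles to merge");
        }
        // Bounding box of all tiles
        int minX = Integer.MAX_VALUE, minY = Integer.MAX_VALUE;
        int maxX = Integer.MIN_VALUE, maxY = Integer.MIN_VALUE;
        for (Tile t : tiles) {
            minX = Math.min(minX, t.x);
            minY = Math.min(minY, t.y);
            maxX = Math.max(maxX, t.x + t.image.getWidth());
            maxY = Math.max(maxY, t.y + t.image.getHeight());
        }
        BufferedImage canvas = new BufferedImage(maxX - minX, maxY - minY, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = canvas.createGraphics();
        try {
            // Draw each tile at its offset relative to the bounding box
            for (Tile t : tiles) {
                g.drawImage(t.image, t.x - minX, t.y - minY, null);
            }
        } finally {
            g.dispose();
        }
        return canvas;
    }
}
```

Writing the result with `javax.imageio.ImageIO.write(merged, "png", file)` keeps the merged page lossless.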

Related

How do I embed an image uploaded by form to google doc/auto generating PDF?

I have a Google Form (a medical report). Whenever someone answers the form, the answers go to a sheet, and the script creates a copy of a Docs template, replaces keywords, and then generates a PDF. I've created separate folders to organize the response PDFs. This is working well so far.
However, the form requires the user to upload an image as part of some of the answers. I would like two separate images, Before Treatment and After Treatment, to appear automatically in the document, replacing the sample text in the Google Doc template.
The images upload to Drive fine, but I don't know how to make an uploaded image appear in the Google Doc and the PDF. So far I only get a Drive link instead of the actual image in the generated PDF.
Here's a sample of my script:
function afterFormSubmit(e) {
  const info = e.namedValues;
  // Run createPDF on the form submission
  createPDF(info);
}
// Function to create the PDF
function createPDF(info) {
  const pdfFolder = DriveApp.getFolderById("pdfFolderId");
  const tempFolder = DriveApp.getFolderById("tempFolderId");
  const templateDoc = DriveApp.getFileById("docTemplateId");
  // Copy the template into the temp folder
  const newTempFile = templateDoc.makeCopy(tempFolder);
  // Open the copy as a document
  const OpenDoc = DocumentApp.openById(newTempFile.getId());
  const body = OpenDoc.getBody();
  // Replace the keywords in the document body with the submitted answers
  body.replaceText("{name}", info['Patient Name'][0]);
  body.replaceText("{report no}", info['Report No'][0]);
  body.replaceText("{before}", info['Before Treatment Image'][0]);
  body.replaceText("{after}", info['After Treatment Image'][0]);
  OpenDoc.saveAndClose();
  // Convert the copy to PDF and name it after the report number
  const blobPDF = newTempFile.getAs(MimeType.PDF);
  const PdfFile = pdfFolder.createFile(blobPDF).setName(info['Report No'][0]);
  tempFolder.removeFile(newTempFile);
}

How do I insert a slide into another PowerPoint slide using OpenXML?

I would like to take a PowerPoint slide (the "source") and insert it into another PowerPoint slide (the "target") that already contains some content, at a specific position in the target slide.
I've searched for code that does this in several ways, but I keep getting results for merging slides into PowerPoint presentations, which is not what I want. I want to take an existing slide and insert it into another, much like one would insert a picture into an existing slide.
I have code that another coworker wrote that clones all of the elements from the source slide, but it is convoluted and uses different code variations for different element types. Here is a representative sample of that code:
foreach (OpenXmlElement element in sourceSlide.CommonSlideData.ShapeTree.ChildElements.ToList())
{
string elementTypeName = element.GetType().ToString();
if (elementTypeName.EndsWith(".Picture"))
{
// Deep clone the element.
elementClone = element.CloneNode(true);
// Adjust the offsets so it is positioned correctly.
((Picture)elementClone).ShapeProperties.Transform2D.Offset.X += (Int64)shapeStruct.OffsetX;
((Picture)elementClone).ShapeProperties.Transform2D.Offset.Y += (Int64)shapeStruct.OffsetY;
// Get the shape tree that we're adding the clone to and append to it.
ShapeTree shapeTree = slideCard.CommonSlideData.ShapeTree;
shapeTree.Append(elementClone);
string rId = ((Picture)element).BlipFill.Blip.Embed.Value;
ImagePart imagePart = (ImagePart)slideInstProc.SlidePart.GetPartById(rId);
string contentType = imagePart.ContentType;
// Locate the same object we cloned over to the slide.
var blip = ((Picture)elementClone).BlipFill.Blip;
slidePart = slideCard.SlidePart;
try
{
ImagePart imagePart1 = slidePart.AddImagePart(contentType, rId);
imagePart1.FeedData(imagePart.GetStream());
}
catch (XmlException)
{
//Console.WriteLine(xe.ToString());
Console.WriteLine("Duplicate rId (" + rId + ")");
}
}
if (elementTypeName.EndsWith(".GroupShape"))
{
... etc
The code continues with an else-if ladder containing blocks of code for element type names ending with .GroupShape, .GraphicFrame, .Shape, and .ConnectionShape, concluding with a catchall else at the bottom.
The problem is this code doesn't process some types of objects properly. For one thing, it doesn't process drawings at all (perhaps because some of them originated from an older version of PowerPoint), and when it does, it does things like change the color of the drawing.
What I was hoping is that there was a more fundamental way (i.e. simpler, generic code) to embed a source PowerPoint slide into another, treating it like a single object, without looking at element types within the source PowerPoint specifically.
Alternatively, what would be the way to process drawings or images in ordinary "shapes" that don't identify themselves specifically as images?
This is the code that solved the specific problem I was describing above:
using A = DocumentFormat.OpenXml.Drawing;
foreach(A.BlipFill blipFill in shape.Descendants<A.BlipFill>())
{
string rId = blipFill.Blip.Embed.Value;
ImagePart imagePart = (ImagePart)slideInstProc.SlidePart.GetPartById(rId);
string contentType = imagePart.ContentType;
try
{
ImagePart imagePart1 = slidePart.AddImagePart(contentType, rId);
imagePart1.FeedData(imagePart.GetStream());
}
catch (XmlException)
{
Console.WriteLine("Duplicate rId (" + rId + ")");
}
}
When applied to elements whose type name ends with ".Shape" (elementTypeName.EndsWith(".Shape"); note that EndsWith is case-sensitive), this produces exactly the result I want.
For composing complete slides into a presentation (which doesn't require some of the generation mechanics that we do), OpenXmlPowerTools is a much better approach.

Google Advanced Drive API fails on insert of some PDFs but not others

function extractTextFromPDF() {
// PDF File URL
// You can also pull PDFs from Google Drive
// this Fall2019_LLFullCatalog.pdf will not insert - "internal error on insert" is all the feedback that gets logged
// doesn't matter if I retrieve it from the university website or if I first copy it to my google drive and then retrieve it from there
//var url = "https://uwf.edu/media/university-of-west-florida/offices/continuing-ed/leisure-learning/docs/Fall2019_LLFullCatalog.pdf";
//var url = "https://drive.google.com/drive/u/0/my-drive/Fall2019_LLFullCatalog.pdf";
// both of these pdfs will insert just fine. Size is not the issue because this one is much larger than the one I need to insert
var url = "https://eloquentjavascript.net/Eloquent_JavaScript_small.pdf";
//var url = "https://img.labnol.org/files/Most-Useful-Websites.pdf";
var blob = UrlFetchApp.fetch(url).getBlob();
var size = blob.getBytes().length;
var resource = {
title: blob.getName(),
mimeType: blob.getContentType()
};
// Enable the Advanced Drive API Service
var file = Drive.Files.insert(resource, blob, {ocr: true, ocrLanguage: "en"});
// Extract Text from PDF file
var doc = DocumentApp.openById(file.id);
var text = doc.getBody().getText();
return text;
}
See comments in code above that describe the problem.
The PDF that I need to insert with OCR is not working - regardless of whether I retrieve it from the original site or retrieve a copy that I put on google drive. However, two other PDF urls will insert just fine and one of them is considerably larger than the one that fails.
What else could be the issue, if not size limitation?
Thanks,
Steve
It could very well be a bug in the Drive API. Not all PDF software is created equal; as a simple test, check whether the PDF can be read in Adobe Acrobat.

Save multiple images with one function - AS3 ADOBE AIR

I've got an Array of 17 image URLs:
var products:Array;
trace(products)
// Output:
"http://www.myWebsite.com/zootopia.jpg"
"http://www.myWebsite.com/james.jpg"
"http://www.myWebsite.com/tom.jpg"
..etc
If I do products[10].movieimage; the output will be the 9th link (something like : "http://www.myWebsite.com/lalaland.jpg")
I'd like to download every image without showing a dialog box.
I've managed to do this for one image with a specific link, like this:
function saveImage(event:Event):void {
    var stream:URLStream = new URLStream();
    var image1:File = File.documentsDirectory.resolvePath("test.jpg");
    var fileStream:FileStream = new FileStream();
    stream.load(new URLRequest("http://www.myWebsite.com/lalaland.jpg"));
    stream.addEventListener(Event.COMPLETE, writeComplete);

    function writeComplete(evt:Event):void {
        var fileData:ByteArray = new ByteArray();
        stream.readBytes(fileData, 0, stream.bytesAvailable);
        fileStream.openAsync(image1, FileMode.UPDATE);
        fileStream.writeBytes(fileData, 0, fileData.length);
        fileStream.close();
        trace("writeComplete");
        trace(image1.url);
    }
}
Question: Is there a way to download all the images using the web links in my products array? (And if the images already exist, replace them. I could use if (image1.exists){ if (image2.exists){ ... etc. for each image, but maybe there is a simpler solution.)
If you could show me how with a bit of code, I'd appreciate it.
Also, note that I'd then like to load the images into UILoader components, like this:
function loadImages():void {
    uiloader1.source = image1.url;
    uiloader2.source = image2.url;
    // etc.
}
Don't over think it. You have your array of images. You have your tested routine for saving one image. Now put it together:
Some function initializes things and kicks it off.
Either splice (or pop) an item off the array, or use an index variable to access an item in the array.
Pass that to your download function.
When a download completes, check whether you are done: array.length == 0 if you are popping items, or index >= array.length after incrementing the index. If so, stop; otherwise pop the next item (or use the new index) and download it.
If you want to get fancy you can show a progress bar and update it each time your download completes.
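The answer's index-based loop is sketched below in Java rather than AS3. In that loop, each completed download triggers the next one until the index runs past the end of the array. The `Downloader` interface is hypothetical. It stands in for the asker's `URLStream`/`FileStream` code and must call `onComplete` once the file has been written. Deriving the local file name from the last path segment of each URL (`zootopia.jpg`, `tom.jpg`, ...) gives every image its own file. On the AS3 side, opening each file with `FileMode.WRITE` rather than `FileMode.UPDATE` overwrites any existing file, so no per-image `exists` check is needed.

```java
import java.util.List;

public final class SequentialDownloader {
    /** Hypothetical stand-in for the URLStream/FileStream code; must call onComplete once the file is saved. */
    public interface Downloader {
        void download(String url, String fileName, Runnable onComplete);
    }

    private final List<String> urls;
    private final Downloader downloader;
    private final Runnable onAllDone;
    private int index;

    public SequentialDownloader(List<String> urls, Downloader downloader, Runnable onAllDone) {
        this.urls = urls;
        this.downloader = downloader;
        this.onAllDone = onAllDone;
    }

    /** Kicks off the chain at the first URL. */
    public void start() {
        index = 0;
        next();
    }

    // Downloads urls[index]; its completion callback advances the index and starts the next one
    private void next() {
        if (index >= urls.size()) { // past the last element: everything is downloaded
            onAllDone.run();
            return;
        }
        String url = urls.get(index);
        downloader.download(url, fileNameFor(url), () -> {
            index++;
            next();
        });
    }

    /** Last path segment of the URL, e.g. "http://www.myWebsite.com/tom.jpg" -> "tom.jpg". */
    public static String fileNameFor(String url) {
        return url.substring(url.lastIndexOf('/') + 1);
    }
}
```

Because each download only starts after the previous one reports completion, the same `onAllDone` hook is a natural place to call something like the asker's `loadImages()` or to update a progress bar.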

pdfbox - rotation issue

As part of a project I am working on, I'm given PDF documents whose A4 pages contain forms stored as JPEG images. I have to extract those JPGs from the PDFs; later on, the JPGs are used to build new PDF documents.
When I open these documents in any PDF viewer, they don't appear to be rotated at all (at least not visibly): the pages are shown in portrait (vertical) format.
But when I use this sample code to extract the images:
PDDocument doc = PDDocument.load("/path/to/file");
List pages = doc.getDocumentCatalog().getAllPages();
Iterator iter = pages.iterator();
int i = 0;
while (iter.hasNext()) {
    PDPage page = (PDPage) iter.next();
    System.out.println("ROTATION = " + page.getRotation());
    PDResources resources = page.getResources();
    Map pageImages = resources.getXObjects();
    if (pageImages != null) {
        Iterator imageIter = pageImages.keySet().iterator();
        while (imageIter.hasNext()) {
            String key = (String) imageIter.next();
            Object xobject = pageImages.get(key);
            // Form XObjects are not images, so check the type before casting
            if (xobject instanceof PDXObjectImage) {
                PDXObjectImage image = (PDXObjectImage) xobject;
                image.write2file("/path/to/file" + i);
            }
            i++;
        }
    }
}
doc.close();
all extracted JPGs are in landscape (horizontal) format. Furthermore, printing page.getRotation() shows that the rotation is set to 270°.
How is that possible? The rotation is 270, yet the PDF is displayed in portrait (I am no expert on PDF). I even called page.setRotation(0) before extracting the JPGs, but the images still remain landscape. I read a thread explaining how to rotate images before drawing them onto a PDF, but I need to rotate them before writing them to the file system. What is the best way to achieve that?
Unfortunately, I cannot attach any of the documents since they are confidential.
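A page's `/Rotate` entry only tells the viewer to turn the page clockwise by that many degrees when displaying it. It does not change the pixels of the image XObjects on the page. That is why the extracted JPGs keep their stored landscape orientation, and why resetting the page rotation before extraction has no effect on them. A minimal sketch, assuming each scan is drawn onto its page without any extra rotation in its own transformation matrix, is to rotate the decoded image clockwise by the page's rotation before saving it.

The helper name `PageRotation.rotateClockwise` is hypothetical, and the code uses only `java.awt`. The `BufferedImage` itself would come from the image object, for example `PDXObjectImage.getRGBImage()` in PDFBox 1.8 or `PDImageXObject.getImage()` in PDFBox 2.x.

```java
import java.awt.image.BufferedImage;

public final class PageRotation {
    /**
     * Returns a copy of img rotated clockwise by the given angle, which must be a
     * multiple of 90 (the same constraint PDF places on a page's /Rotate value).
     */
    public static BufferedImage rotateClockwise(BufferedImage img, int degrees) {
        if (degrees % 90 != 0) {
            throw new IllegalArgumentException("rotation must be a multiple of 90: " + degrees);
        }
        int turns = Math.floorMod(degrees / 90, 4); // quarter turns, 0..3
        int w = img.getWidth();
        int h = img.getHeight();
        boolean swap = turns % 2 == 1; // 90 and 270 swap width and height
        int type = img.getColorModel().hasAlpha() ? BufferedImage.TYPE_INT_ARGB : BufferedImage.TYPE_INT_RGB;
        BufferedImage out = new BufferedImage(swap ? h : w, swap ? w : h, type);
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                int nx;
                int ny;
                switch (turns) {
                    case 1:  nx = h - 1 - y; ny = x;         break; // 90 clockwise
                    case 2:  nx = w - 1 - x; ny = h - 1 - y; break; // 180
                    case 3:  nx = y;         ny = w - 1 - x; break; // 270 clockwise
                    default: nx = x;         ny = y;         break; // 0
                }
                out.setRGB(nx, ny, img.getRGB(x, y));
            }
        }
        return out;
    }
}
```

The rotated image can then be saved with `ImageIO.write(rotated, "jpg", file)`. Keep in mind that decoding and re-encoding a JPEG this way is lossy.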
