Rotated image extracted from pdfsharp - image

I can successfully extract images from a PDF using PDFsharp. The images use the CCITTFaxDecode filter. However, in the TIFF file I create, the image comes out rotated. Any idea what might be going wrong?
This is the code I'm using:
byte[] data = xObject.Stream.Value;
Tiff tiff = BitMiracle.LibTiff.Classic.Tiff.Open("D:\\clip_TIFF.tif", "w");
tiff.SetField(TiffTag.IMAGEWIDTH, (uint)(width));
tiff.SetField(TiffTag.IMAGELENGTH, (uint)(height));
tiff.SetField(TiffTag.COMPRESSION, (uint)BitMiracle.LibTiff.Classic.Compression.CCITTFAX4);
tiff.SetField(TiffTag.BITSPERSAMPLE, (uint)(bpp));
tiff.WriteRawStrip(0,data,data.Length);
tiff.Close();

Since the question is still tagged with iTextSharp, I might as well add some code, even though it doesn't look like you're using the library here. PDF parsing support was added starting in iText[Sharp] 5.
I didn't have a test PDF with the image type you're using, but found one here (see the attachment). Here's a very simple working example in ASP.NET (HTTP handler .ashx) using that test PDF document to get you going:
<%@ WebHandler Language="C#" Class="CCITTFaxDecodeExtract" %>
using System;
using System.Collections.Generic;
using System.IO;
using System.Web;
using iTextSharp.text;
using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;
using Dotnet = System.Drawing.Image;
using System.Drawing.Imaging;
public class CCITTFaxDecodeExtract : IHttpHandler {
public void ProcessRequest (HttpContext context) {
HttpServerUtility Server = context.Server;
HttpResponse Response = context.Response;
string file = Server.MapPath("~/app_data/CCITTFaxDecode.pdf");
PdfReader reader = new PdfReader(file);
PdfReaderContentParser parser = new PdfReaderContentParser(reader);
MyImageRenderListener listener = new MyImageRenderListener();
for (int i = 1; i <= reader.NumberOfPages; i++) {
parser.ProcessContent(i, listener);
}
for (int i = 0; i < listener.Images.Count; ++i) {
string path = Server.MapPath("~/app_data/" + listener.ImageNames[i]);
using (FileStream fs = new FileStream(
path, FileMode.Create, FileAccess.Write
))
{
fs.Write(listener.Images[i], 0, listener.Images[i].Length);
}
}
}
public bool IsReusable { get { return false; } }
/*
* see: TextRenderInfo & RenderListener classes here:
* http://api.itextpdf.com/itext/
*
* and Google "itextsharp extract images"
*/
public class MyImageRenderListener : IRenderListener {
public void RenderText(TextRenderInfo renderInfo) { }
public void BeginTextBlock() { }
public void EndTextBlock() { }
public List<byte[]> Images = new List<byte[]>();
public List<string> ImageNames = new List<string>();
public void RenderImage(ImageRenderInfo renderInfo) {
PdfImageObject image = renderInfo.GetImage();
PdfName filter = image.Get(PdfName.FILTER) as PdfName;
if (filter == null) {
PdfArray pa = (PdfArray) image.Get(PdfName.FILTER);
for (int i = 0; i < pa.Size; ++i) {
filter = (PdfName) pa[i];
}
}
if (PdfName.CCITTFAXDECODE.Equals(filter)) {
using (Dotnet dotnetImg = image.GetDrawingImage()) {
if (dotnetImg != null) {
ImageNames.Add(string.Format(
"{0}.tiff", renderInfo.GetRef().Number)
);
using (MemoryStream ms = new MemoryStream()) {
dotnetImg.Save(
ms, ImageFormat.Tiff);
Images.Add(ms.ToArray());
}
}
}
}
}
}
}
If the image(s) is/are being rotated, see this thread on the iText mailing list; perhaps some of the pages in the PDF document have been rotated.

By the way, this is the complete code that extracts the image from the PDF but rotates it. Sorry about the length of the code.
PdfDocument document = PdfReader.Open("D:\\Sample.pdf");
// Resources of the first page only; loop over document.Pages to cover every page
PdfDictionary resources = document.Pages[0].Elements.GetDictionary("/Resources");
PdfDictionary xObjects = resources.Elements.GetDictionary("/XObject");
if (xObjects != null)
{
ICollection<PdfItem> items = xObjects.Elements.Values;
// Iterate references to external objects
foreach (PdfItem item in items)
{
PdfReference reference = item as PdfReference;
if (reference != null)
{
PdfDictionary xObject = reference.Value as PdfDictionary;
// Is external object an image?
if (xObject != null && xObject.Elements.GetString("/Subtype") == "/Image")
{
string filter = xObject.Elements.GetName("/Filter");
if (filter.Equals("/CCITTFaxDecode"))
{
int width = xObject.Elements.GetInteger(PdfImage.Keys.Width);
int height = xObject.Elements.GetInteger(PdfImage.Keys.Height);
int bpp = xObject.Elements.GetInteger(PdfImage.Keys.BitsPerComponent);
byte[] data = xObject.Stream.Value;
Tiff tiff = BitMiracle.LibTiff.Classic.Tiff.Open("D:\\sample.tif", "w");
tiff.SetField(TiffTag.IMAGEWIDTH, (uint)(width));
tiff.SetField(TiffTag.IMAGELENGTH, (uint)(height));
tiff.SetField(TiffTag.COMPRESSION, (uint)BitMiracle.LibTiff.Classic.Compression.CCITTFAX4);
tiff.SetField(TiffTag.BITSPERSAMPLE, (uint)(bpp));
tiff.SetField(TiffTag.STRIPOFFSETS, 187);
tiff.WriteRawStrip(0,data,data.Length);
tiff.Close();
}
}
}
}
}


Combining forms while retaining form fonts in itext7

I am trying to fill and combine multiple forms without flattening them (they need to stay interactive for users). However, I noticed a problem. I have PDF files that contain the forms I am trying to fill, and the form fields have their fonts set in Adobe Acrobat. After I combine the forms, the fields lose their original fonts. Here is my program.
using iText.Forms;
using iText.Kernel.Pdf;
using System;
using System.Collections.Generic;
using System.IO;
using System.Runtime.CompilerServices;
using System.Threading.Tasks;
namespace PdfCombineTest
{
class Program
{
static void Main(string[] args)
{
Stream file1;
Stream file2;
using (var stream = new FileStream("./pdf-form-1.pdf", FileMode.Open, FileAccess.Read))
{
file1 = Program.Fill(stream, new[] { KeyValuePair.Create("Text1", "TESTING"), KeyValuePair.Create("CheckBox1", "Yes") });
}
using (var stream = new FileStream("./pdf-form-2.pdf", FileMode.Open, FileAccess.Read))
{
file2 = Program.Fill(stream, new[] { KeyValuePair.Create("Text2", "text 2 text") });
}
using (Stream output = Program.Combine(new[] { file1, file2 }))
{
using (var fileStream = File.Create("./output.pdf"))
{
output.CopyTo(fileStream);
}
}
}
public static Stream Combine(params Stream[] streams)
{
MemoryStream copyStream = new MemoryStream();
PdfWriter writer = new PdfWriter(copyStream);
writer.SetSmartMode(true);
writer.SetCloseStream(false);
PdfPageFormCopier formCopier = new PdfPageFormCopier();
using (PdfDocument combined = new PdfDocument(writer))
{
combined.InitializeOutlines();
foreach (var stream in streams)
{
using (PdfDocument document = new PdfDocument(new PdfReader(stream)))
{
document.CopyPagesTo(1, document.GetNumberOfPages(), combined, formCopier);
}
}
}
copyStream.Seek(0, SeekOrigin.Begin);
return copyStream;
}
public static Stream Fill(Stream inputStream, IEnumerable<KeyValuePair<string, string>> keyValuePairs)
{
MemoryStream outputStream = new MemoryStream();
PdfWriter writer = new PdfWriter(outputStream);
writer.SetCloseStream(false);
using (PdfDocument document = new PdfDocument(new PdfReader(inputStream), writer))
{
PdfAcroForm acroForm = PdfAcroForm.GetAcroForm(document, true);
acroForm.SetGenerateAppearance(true);
IDictionary<string, iText.Forms.Fields.PdfFormField> fields = acroForm.GetFormFields();
foreach (var kvp in keyValuePairs)
{
fields[kvp.Key].SetValue(kvp.Value);
}
}
outputStream.Seek(0, SeekOrigin.Begin);
return outputStream;
}
}
}
After several hours of debugging I noticed that PdfPageFormCopier excludes the default resources, which contain the fonts, when merging form fields. Is there a way around this? The project I'm working on currently does this process in iTextSharp and it works as intended, but we are looking to migrate to iText7.
Here are links to some sample PDFs I made. I can't upload the actual PDFs I'm working with, but these display the same problem.
https://www.dropbox.com/s/pukt91d4xe8gmmo/pdf-form-1.pdf?dl=0
https://www.dropbox.com/s/c52x6bc99gnrvo6/pdf-form-2.pdf?dl=0
So my solution was to modify the PdfPageFormCopier class from iText. The main issue is in the function below.
public virtual void Copy(PdfPage fromPage, PdfPage toPage) {
if (documentFrom != fromPage.GetDocument()) {
documentFrom = fromPage.GetDocument();
formFrom = PdfAcroForm.GetAcroForm(documentFrom, false);
}
if (documentTo != toPage.GetDocument()) {
documentTo = toPage.GetDocument();
formTo = PdfAcroForm.GetAcroForm(documentTo, true);
}
if (formFrom == null) {
return;
}
//duplicate AcroForm dictionary
IList<PdfName> excludedKeys = new List<PdfName>();
excludedKeys.Add(PdfName.Fields);
excludedKeys.Add(PdfName.DR);
PdfDictionary dict = formFrom.GetPdfObject().CopyTo(documentTo, excludedKeys, false);
formTo.GetPdfObject().MergeDifferent(dict);
IDictionary<String, PdfFormField> fieldsFrom = formFrom.GetFormFields();
if (fieldsFrom.Count <= 0) {
return;
}
IDictionary<String, PdfFormField> fieldsTo = formTo.GetFormFields();
IList<PdfAnnotation> annots = toPage.GetAnnotations();
foreach (PdfAnnotation annot in annots) {
if (!annot.GetSubtype().Equals(PdfName.Widget)) {
continue;
}
CopyField(toPage, fieldsFrom, fieldsTo, annot);
}
}
Specifically the line here.
excludedKeys.Add(PdfName.DR);
If you walk through the code in the CopyField() function, you will eventually end up in the PdfFormField class. You can see the constructor below.
public PdfFormField(PdfDictionary pdfObject)
: base(pdfObject) {
EnsureObjectIsAddedToDocument(pdfObject);
SetForbidRelease();
RetrieveStyles();
}
The function RetrieveStyles() will try to set the font for the field based on the default appearance. However, that will not work because of the function below.
private PdfFont ResolveFontName(String fontName) {
PdfDictionary defaultResources = (PdfDictionary)GetAcroFormObject(PdfName.DR, PdfObject.DICTIONARY);
PdfDictionary defaultFontDic = defaultResources != null ? defaultResources.GetAsDictionary(PdfName.Font) :
null;
if (fontName != null && defaultFontDic != null) {
PdfDictionary daFontDict = defaultFontDic.GetAsDictionary(new PdfName(fontName));
if (daFontDict != null) {
return GetDocument().GetFont(daFontDict);
}
}
return null;
}
You see it is trying to see if the font exists in the default resources which was explicitly excluded in the PdfPageFormCopier class. It will never find the font.
So my solution was to create my own class that implements the IPdfPageExtraCopier interface. I copied the code from the PdfPageFormCopier class and removed the one line excluding the default resources. Then I use my own copier class in my code. Not the prettiest solution but it works.

How to select multiple picture from gallery using GMImagePicker in xamarin IOS?

I am following this blog for selecting multiple pictures from the gallery. For iOS I am using GMImagePicker to select multiple pictures from the gallery. (The blog suggests ELCImagePicker, but that is no longer available on NuGet.)
I went through the GMImagePicker usage section but didn't find how to add the selected images to a List and pass that value through MessagingCenter (as in the Android implementation). The usage section only covers the picker settings. Could anyone give me sample code for this feature?
Hi Lucas Zhang - MSFT, I tried your code, but I have one question. You are passing only one file path through the MessagingCenter, so should I use a List to send multiple file paths?
I am passing the picture paths as a string List from Android. Please have a look at the Android code added below. Should I do the same on iOS?
protected override void OnActivityResult(int requestCode, Result resultCode, Intent data)
{
base.OnActivityResult(requestCode, resultCode, data);
if (resultCode == Result.Ok)
{
List<string> images = new List<string>();
if (data != null)
{
ClipData clipData = data.ClipData;
if (clipData != null)
{
for (int i = 0; i < clipData.ItemCount; i++)
{
ClipData.Item item = clipData.GetItemAt(i);
Android.Net.Uri uri = item.Uri;
var path = GetRealPathFromURI(uri);
if (path != null)
{
//Rotate Image
var imageRotated = ImageHelpers.RotateImage(path);
var newPath = ImageHelpers.SaveFile("TmpPictures", imageRotated, System.DateTime.Now.ToString("yyyyMMddHHmmssfff"));
images.Add(newPath);
}
}
}
else
{
Android.Net.Uri uri = data.Data;
var path = GetRealPathFromURI(uri);
if (path != null)
{
//Rotate Image
var imageRotated = ImageHelpers.RotateImage(path);
var newPath = ImageHelpers.SaveFile("TmpPictures", imageRotated, System.DateTime.Now.ToString("yyyyMMddHHmmssfff"));
images.Add(newPath);
}
}
MessagingCenter.Send<App, List<string>>((App)Xamarin.Forms.Application.Current, "ImagesSelected", images);
}
}
}
Also, I am getting an error; the screenshot is added below:
GMImagePicker returns a list of PHAsset objects, so you can first get the file paths of the images and then pass them to Forms using MessagingCenter and DependencyService. Refer to the following code.
In Forms, create an interface:
using System;
namespace app1
{
public interface ISelectMultiImage
{
void SelectedImage();
}
}
In the iOS project:
using System;
using Xamarin.Forms;
using UIKit;
using GMImagePicker;
using Photos;
using Foundation;
[assembly:Dependency(typeof(SelectMultiImageImplementation))]
namespace xxx.iOS
{
public class SelectMultiImageImplementation:ISelectMultiImage
{
public SelectMultiImageImplementation()
{
}
string Save(UIImage image, string name)
{
var documentsDirectory = Environment.GetFolderPath
(Environment.SpecialFolder.Personal);
string jpgFilename = System.IO.Path.Combine(documentsDirectory, name); // full path for the given file name
NSData imgData = image.AsJPEG();
if (imgData.Save(jpgFilename, false, out NSError err))
{
return jpgFilename;
}
else
{
Console.WriteLine("NOT saved as " + jpgFilename + " because" + err.LocalizedDescription);
return null;
}
}
public void SelectedImage()
{
var picker = new GMImagePickerController();
picker.FinishedPickingAssets += (s, args) => {
PHAsset[] assets = args.Assets;
foreach (PHAsset asset in assets)
{
PHImageManager.DefaultManager.RequestImageData(asset, null, (NSData data, NSString dataUti, UIImageOrientation orientation, NSDictionary info) =>
{
NSUrl url = NSUrl.FromString(info.ValueForKey(new NSString("PHImageFileURLKey")).ToString());
// NSUrl has no Split method, so split its string form to get the file name
string[] strs = url.ToString().Split('/');
UIImage image = UIImage.LoadFromData(data);
string file = Save(image, strs[strs.Length - 1]);
MessagingCenter.Send<Object, string>(this, "ImagesSelected", file);
}
);
}
};
UIApplication.SharedApplication.KeyWindow.RootViewController.PresentViewController(picker, true,null);
}
}
}
In your content page:
...
List<string> selectedImages;
...
public MyPage()
{
selectedImages = new List<string>();
InitializeComponent();
MessagingCenter.Subscribe<Object,string>(this, "ImagesSelected",(object arg1,string arg2) =>
{
string source = arg2;
selectedImages.Add(source);
});
}
When you want to select the images, call the method:
DependencyService.Get<ISelectMultiImage>().SelectedImage();

Converting PDF to Grayscale pdf using ABC PDF

I am trying to convert a PDF to a grayscale (black/white) PDF using WebSupergoo ABCpdf.
I am referring to
http://www.websupergoo.com/helppdfnet/source/8-abcpdf.operations/3-recoloroperation/1-methods/recolor.htm?q=recoloroperation
Doc theDoc = new Doc();
theDoc.Read(Server.MapPath("src.pdf"));
int pages = theDoc.PageCount;
MyOp.Recolor(theDoc, (WebSupergoo.ABCpdf8.Objects.Page)theDoc.ObjectSoup[theDoc.Page]); //Here problem
theDoc.Save(Server.MapPath("greyscale1.pdf"));
theDoc.Clear();
The above code works fine for a single-page PDF, but for a multi-page PDF it converts only the first page.
When I tried to use a loop, the error below occurred:
Page Number is not the same as Page in abcPDF, so you cannot use the page number as an index into the object soup.
Try something like this instead (untested):
int pages = theDoc.PageCount;
for(int i=0; i < pages; i++)
{
theDoc.PageNumber = i;
MyOp.Recolor(theDoc, (WebSupergoo.ABCpdf8.Objects.Page)theDoc.ObjectSoup[theDoc.Page]);
}
Edit: The above apparently didn't work, but as the documentation you linked to shows, there's a method that takes a Doc object instead of a Page object. This should work if you change your MyOp.Recolor() method to this:
public class MyOp
{
public static void Recolor(Doc doc) {
RecolorOperation op = new RecolorOperation();
op.DestinationColorSpace = new ColorSpace(doc.ObjectSoup, ColorSpaceType.DeviceGray);
op.ConvertAnnotations = false;
op.ProcessingObject += Recoloring;
op.ProcessedObject += Recolored;
op.Recolor(doc);
}
}
I am not sure what you are doing (or need to do) in the Recoloring() method or Recolored() method, but that should not matter for the changes here.
I went crazy trying to convert a PDF to grayscale here:
c# printing through PDF drivers, print to file option will output PS instead of PDF
I then found the answer above (thank you), but it needs a small correction for anyone else who may need it:
Doc theDoc = new Doc();
theDoc.Read("test.pdf");
//doc.Rendering.ColorSpace = XRendering.ColorSpaceType.Gray;
//doc.SaveOptions.
//MyOp.Recolor(theDoc, (Page)theDoc.ObjectSoup[theDoc.Page]);
int pages = theDoc.PageCount;
for (int i = 0; i < pages; i++)
{
theDoc.PageNumber = i+1; // this is because numbering is from 1 :)
MyOp.Recolor(theDoc, (Page)theDoc.ObjectSoup[theDoc.Page]);
}
theDoc.Save("out.pdf");
theDoc.Clear();
The class remains as in their example
public class MyOp
{
public static void Recolor(Doc doc, Page page)
{
RecolorOperation op = new RecolorOperation();
op.DestinationColorSpace = new ColorSpace(doc.ObjectSoup, ColorSpaceType.DeviceGray);
op.ConvertAnnotations = false;
op.ProcessingObject += Recoloring;
op.ProcessedObject += Recolored;
op.Recolor(page);
}
//public static void Recolor(Doc doc)
//{
// RecolorOperation op = new RecolorOperation();
// op.DestinationColorSpace = new ColorSpace(doc.ObjectSoup, ColorSpaceType.DeviceGray);
// op.ConvertAnnotations = false;
// op.ProcessingObject += Recoloring;
// op.ProcessedObject += Recolored;
// op.Recolor(doc);
//}
public static void Recoloring(object sender, ProcessingObjectEventArgs e)
{
PixMap pm = e.Object as PixMap;
if (pm != null)
{
ColorSpaceType cs = pm.ColorSpaceType;
if (cs == ColorSpaceType.DeviceCMYK)
e.Cancel = true;
e.Tag = cs;
}
}
public static void Recolored(object sender, ProcessedObjectEventArgs e)
{
if (e.Successful)
{
PixMap pm = e.Object as PixMap;
if (pm != null)
{
ColorSpaceType cs = (ColorSpaceType)e.Tag;
if (pm.Width > 1000)
pm.CompressJpx(30);
else if (cs == ColorSpaceType.DeviceRGB)
pm.CompressJpeg(30);
else
pm.Compress(); // Flate
}
}
}
}
Don't forget to use the following namespaces (this version, not another one), and it works like a charm.
using WebSupergoo.ABCpdf9.Objects;
using WebSupergoo.ABCpdf9.Operations;

CodeFluent Image-Type with Windows-Forms

How can I bind a Windows Forms PictureBox control to a binary CFE type, i.e. the image type? Am I supposed to use a different type for this?
Regards,
Mykola
You can load the image using the GetInputStream method:
using (var stream = _customer.Photo.GetInputStream())
{
pictureBox1.Image = Image.FromStream(stream);
}
With extension methods from an ImageConverter class, saving and loading an image value can be very easy, for example:
pictureBoxLogo.Image.saveImage(obj.Photo);
pictureBoxLogo.Image = ((Image)null).loadImage(obj, obj.Photo);
Here is how the converter class could look:
...
using System.IO;
using CodeFluent.Runtime.BinaryServices;
public static class ImageConverter
{
public static byte[] toByteArray(this Image image)
{
using (var ms = new System.IO.MemoryStream())
{
image.Save(ms, image.RawFormat);
return ms.ToArray();
}
}
public static Image toImage(this byte[] bytesArr)
{
MemoryStream memstr = new MemoryStream(bytesArr);
Image img = Image.FromStream(memstr);
return img;
}
public static Image loadImage(object entity, BinaryLargeObject image)
{
if (entity != null && image != null)
{
using (var stream = image.GetInputStream())
{
if (stream.Length > 0)
return Image.FromStream(stream);
else
return null;
}
}
else
return null;
}
public static Image loadImage(this Image owner, object entity, BinaryLargeObject image)
{
return loadImage(entity, image);
}
public static void saveImage(this Image owner, BinaryLargeObject image)
{
if (owner != null && image != null)
image.Save(owner.toByteArray());
}
}

Extract Images from PDF coordinates using iText

I found some examples for how to extract images from PDF using iText. But what I am looking for is to get the images from PDF by coordinates.
Is it possible? If yes then how it can be done.
Along the lines of the iText example ExtractImages, you can extract images with code like this:
PdfReader reader = new PdfReader(resourceStream);
PdfReaderContentParser parser = new PdfReaderContentParser(reader);
ImageRenderListener listener = new ImageRenderListener("testpdf");
for (int i = 1; i <= reader.getNumberOfPages(); i++) {
parser.processContent(i, listener);
}
The ImageRenderListener is defined like this:
class ImageRenderListener implements RenderListener
{
final String name;
int counter = 100000;
public ImageRenderListener(String name)
{
this.name = name;
}
public void beginTextBlock() { }
public void renderText(TextRenderInfo renderInfo) { }
public void endTextBlock() { }
public void renderImage(ImageRenderInfo renderInfo)
{
try
{
PdfImageObject image = renderInfo.getImage();
if (image == null) return;
int number = renderInfo.getRef() != null ? renderInfo.getRef().getNumber() : counter++;
String filename = String.format("%s-%s.%s", name, number, image.getFileType());
FileOutputStream os = new FileOutputStream(filename);
os.write(image.getImageAsBytes());
os.flush();
os.close();
PdfDictionary imageDictionary = image.getDictionary();
PRStream maskStream = (PRStream) imageDictionary.getAsStream(PdfName.SMASK);
if (maskStream != null)
{
PdfImageObject maskImage = new PdfImageObject(maskStream);
filename = String.format("%s-%s-mask.%s", name, number, maskImage.getFileType());
os = new FileOutputStream(filename);
os.write(maskImage.getImageAsBytes());
os.flush();
os.close();
}
}
catch (IOException e)
{
e.printStackTrace();
}
}
}
As you can see, the ImageRenderListener method renderImage receives an ImageRenderInfo argument. This argument has the methods
getStartPoint, which gives you a vector in user space representing the start point of the XObject, and
getImageCTM, which gives you the coordinate transformation matrix that was active when the image was rendered. Coordinates are in user space.
The latter tells you exactly which transformations are applied to a 1x1 user space unit square to draw the image. As you are aware, an image may be rotated, stretched, skewed, and moved (the former method actually extracts its result from the translation part of that matrix).
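To make the "by coordinates" part more concrete, here is a minimal sketch in plain Java (no iText dependency) of how the six matrix entries could be turned into a bounding box and tested against a region of interest. The class name ImageBounds and its methods are made up for illustration and are not part of iText. It assumes the usual PDF convention that an image occupies the unit square, so a point (x, y) is mapped to (a*x + c*y + e, b*x + d*y + f). In iText 5 the six values should be readable inside renderImage via renderInfo.getImageCTM().get(Matrix.I11) through Matrix.I32, but please check that against the version you use.

```java
/**
 * Hypothetical helper (not an iText class): the axis-aligned bounding box,
 * in user space, of an image drawn with the matrix [a b 0; c d 0; e f 1].
 */
final class ImageBounds {
    final double minX, minY, maxX, maxY;

    ImageBounds(double minX, double minY, double maxX, double maxY) {
        this.minX = minX;
        this.minY = minY;
        this.maxX = maxX;
        this.maxY = maxY;
    }

    /** Maps the four corners of the unit square through the matrix. */
    static ImageBounds fromCtm(double a, double b, double c, double d,
                               double e, double f) {
        // Corners (0,0), (1,0), (0,1), (1,1) after transformation
        double[] xs = { e, a + e, c + e, a + c + e };
        double[] ys = { f, b + f, d + f, b + d + f };
        return new ImageBounds(min(xs), min(ys), max(xs), max(ys));
    }

    /** True if this box overlaps the rectangle (x0, y0) to (x1, y1). */
    boolean intersects(double x0, double y0, double x1, double y1) {
        return minX < x1 && maxX > x0 && minY < y1 && maxY > y0;
    }

    private static double min(double[] values) {
        double result = values[0];
        for (double v : values) {
            result = Math.min(result, v);
        }
        return result;
    }

    private static double max(double[] values) {
        double result = values[0];
        for (double v : values) {
            result = Math.max(result, v);
        }
        return result;
    }
}
```

For example, ImageBounds.fromCtm(200, 0, 0, 100, 50, 300) describes a 200 x 100 image whose lower-left corner sits at (50, 300). In renderImage you could then skip every image whose box does not intersect the rectangle you are interested in, and only write the remaining ones to disk. Because all four corners are transformed, the box stays correct for rotated or skewed images as well.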
