I'm trying to extract text from a rotated PDF page: the page dictionary contains a "/Rotate 90" entry. This means the page is rotated when displayed, but it does not seem to be rotated when extracting text with PdfTextExtractor and LocationTextExtractionStrategy.
I followed the example by Mr. Lowagie at
this link
I tried rotating the area instead of the page, but it seems to extract the whole text block as one piece instead of only the selected area.
I'm using iText 5.5.12 with Java 1.8
How can I rotate the page for extraction?
Update
The code I use is like this:
PdfReader reader = null;
try {
reader = new PdfReader("C:\\Temp\\rotated.pdf");
Rectangle rect = new Rectangle(480, 484, 576, 525);
final Rectangle pageRect = reader.getPageSize(1);
RenderFilter regionFilter = new RegionTextRenderFilter(rect);
TextExtractionStrategy strategy = new FilteredTextRenderListener(new LocationTextExtractionStrategy(),
regionFilter);
System.out.println(">>" + PdfTextExtractor.getTextFromPage(reader, 1, strategy).trim());
} catch (IOException e) {
e.printStackTrace();
} finally {
if (reader != null)
reader.close();
}
I can't find a way to upload an example PDF here, so I'm including this image taken from GIMP with the selected area. The PDF was created with the LibreOffice export function and then manually edited to add the /Rotate entry.
The given coordinates assume the zero point is at the lower-right corner.
The program output is an empty string.
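One way to approach "rotating the area instead of the page" is to convert the rectangle from the coordinates you see on screen back into the unrotated page coordinates before building the region filter. The sketch below is plain Java with no iText dependency. It assumes that the extraction strategy reports text positions in unrotated user space (which matches the behaviour described above but is not verified here), that the rectangle is measured from the lower-left corner of the page as displayed, and that /Rotate turns the page clockwise. The class name RotatedRegion and the 595 x 842 page size are made up for illustration; in iText 5 the real values would come from reader.getPageSize(1) and reader.getPageRotation(1).

```java
import java.util.Arrays;

final class RotatedRegion {

    /**
     * Converts a rectangle {llx, lly, urx, ury} given in displayed (rotated)
     * coordinates into unrotated page coordinates.
     * pageWidth and pageHeight are the dimensions of the UNROTATED page.
     * rotation is the /Rotate value in degrees (clockwise, multiple of 90).
     */
    static float[] toUnrotated(float[] rect, float pageWidth, float pageHeight, int rotation) {
        float llx = rect[0], lly = rect[1], urx = rect[2], ury = rect[3];
        int normalized = ((rotation % 360) + 360) % 360;
        switch (normalized) {
            case 0:
                return new float[] {llx, lly, urx, ury};
            case 90:
                // displayed (x', y') corresponds to unrotated (W - y', x')
                return new float[] {pageWidth - ury, llx, pageWidth - lly, urx};
            case 180:
                // displayed (x', y') corresponds to unrotated (W - x', H - y')
                return new float[] {pageWidth - urx, pageHeight - ury,
                                    pageWidth - llx, pageHeight - lly};
            case 270:
                // displayed (x', y') corresponds to unrotated (y', H - x')
                return new float[] {lly, pageHeight - urx, ury, pageHeight - llx};
            default:
                throw new IllegalArgumentException("Rotation must be a multiple of 90: " + rotation);
        }
    }

    public static void main(String[] args) {
        // Hypothetical A4 portrait page (595 x 842 pt) carrying /Rotate 90.
        float[] displayed = {480f, 484f, 576f, 525f};
        float[] unrotated = toUnrotated(displayed, 595f, 842f, 90);
        System.out.println(Arrays.toString(unrotated));
    }
}
```

The four returned values could then be passed to new Rectangle(llx, lly, urx, ury) in place of the hard-coded one. Note that a region filter accepts or rejects whole text chunks, so a chunk that only partly overlaps the rectangle may still come back in full.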
I've got this script:
function testchartbuild() {
var sheet = SpreadsheetApp.getActiveSheet();
var chart = sheet.newChart()
.setChartType(Charts.ChartType.LINE)
.asLineChart()
.setOption("useFirstColumnAsDomain", false)
.addRange(sheet.getRange("BA24:BD231"))
.setNumHeaders(1)
.setPosition(5, 5, 0, 0)
.setOption("legend", {position: "bottom"})
.setOption("width", 690)
.setOption("height", 195)
.build();
// create an image from that EmbeddedChart
sheet.insertImage(chart.getBlob(),58,2); // this works
};
It currently works great to create a chart, and display it as an image on my sheet. I am doing it this way because I can't create a chart above a locked row. Each time I run the script, it creates a new image on top of the old image (if the script was run previously). I would like to edit the script so that it looks to see if another image is already there first, deletes it, then replaces it with the new chart image. I've been looking everywhere to find a way to delete the old image first but can't figure it out. Any help would be great! Thanks
In your situation, in order to remove the existing image with the anchor cell "BF2", how about the following modification?
From:
.build();
// create an image from that EmbeddedChart
sheet.insertImage(chart.getBlob(),58,2); // this works
To:
.build();
sheet.getImages().forEach(e => {
if (e.getAnchorCell().getA1Notation() == "BF2") {
e.remove();
}
});
// create an image from that EmbeddedChart
sheet.insertImage(chart.getBlob(),58,2); // this works
Note:
If you want to remove all existing images, please modify the above modification as follows.
sheet.getImages().forEach(e => e.remove());
References:
getImages()
remove()
In C# I'm trying to pass in a simple HTML string and have the string parsed and added to a PDF document. In the below examples, I'm adding the string to an iText7 Paragraph.
I read this article and managed to write the below code.
https://itextpdf.com/en/resources/books/itext-7-converting-html-pdf-pdfhtml/chapter-1-hello-html-pdf
The first paragraph (p1), Example 1, renders the correct font face, Helvetica. Of course, I'm using the SetAction method, which is a completely different approach from the article. This is for demo purposes only.
The second paragraph (p2), Example 2, converts the HTML just fine but the font for the word "link" is rendered differently than Helvetica. It seems that when HTML is rendered, it ignores the font face of the document.
Sample Screenshot
How can I get the font face of "link" to be Helvetica and use the approach in Example 2? I think I'm missing something minor here. Do I need to define a CSS class since we're in HTML land?
Thank you for any suggestions.
class Program
{
static void Main(string[] args)
{
var pdfWriter = new PdfWriter(@"c:\temp\test.pdf");
var pdfDocument = new PdfDocument(pdfWriter);
var document = new Document(pdfDocument);
// Example 1
var p1 = new Paragraph("p1: this is a test url")
.SetFont(PdfFontFactory.CreateFont(StandardFonts.HELVETICA))
.SetFontSize(12f)
.SetFontColor(new DeviceCmyk(1f, .31f, 0, 0))
.SetFixedPosition(35, 600, UnitValue.CreatePercentValue(100f))
.SetAction(PdfAction.CreateURI("www.google.com"));
document.Add(p1);
// Example 2
var html = @"p2: this is a test url";
var elements = HtmlConverter.ConvertToElements(html);
var p2 = new Paragraph()
.SetFont(PdfFontFactory.CreateFont(StandardFonts.HELVETICA))
.SetFontSize(12f)
.SetFontColor(new DeviceCmyk(1f, .31f, 0, 0))
.SetFixedPosition(35, 550, UnitValue.CreatePercentValue(100f));
foreach (var element in elements)
{
p2.Add((IBlockElement)element);
}
document.Add(p2);
document.Close();
pdfDocument.Close();
pdfWriter.Close();
}
}
The default font-family in pdfHTML is Times, and you are overriding it only for the top-level elements, while (almost) all the elements at all nesting levels have their font family explicitly specified after the ConvertToElements invocation. The easiest way to change the font family is indeed to apply some CSS to your initial HTML. You can set font-family directly in a style declaration:
var html = @"<p style=""font-family: Helvetica"">p2: this is a test url</p>";
Then you don't even have to set font to your paragraph and the paragraph creation code simplifies to
var p2 = new Paragraph()
.SetFontSize(12f)
.SetFontColor(new DeviceCmyk(1f, .31f, 0, 0))
.SetFixedPosition(35, 550, UnitValue.CreatePercentValue(100f));
foreach (var element in elements)
{
p2.Add((IBlockElement)element);
}
I am currently working on a UWP application. I captured the ink strokes into an image. Now the same image needs to be displayed as a preview, so the original image needs to be resized to a smaller size or a thumbnail needs to be generated.
I tried using the larger image directly as the source for a smaller image canvas --> not working (visible image quality degradation)
I also tried transcoding the image programmatically --> same result as above
I tested with the same image: I resized it using Paint, and interestingly the quality of the resized image remained good.
Please help me solve this issue.
I'm not sure, and I would need to see some code, but could it be how you are saving the image? Here is an example:
try
{
Windows.Storage.Pickers.FileSavePicker save = new Windows.Storage.Pickers.FileSavePicker();
save.SuggestedStartLocation = Windows.Storage.Pickers.PickerLocationId.Desktop;
save.DefaultFileExtension = ".png";
save.FileTypeChoices.Add("PNG", new string[] { ".png" });
StorageFile filesave = await save.PickSaveFileAsync();
using (IOutputStream fileStream = await filesave.OpenAsync(FileAccessMode.ReadWrite))
{
if (fileStream != null)
{
await m_InkManager.SaveAsync(fileStream);
}
}
}
catch (Exception ex)
{
var dialog = new MessageDialog(ex.Message);
await dialog.ShowAsync();
}
Here is a great tutorial series by Can Bilgin that explains inking thoroughly:
Drawing / Inking API in WinRT (C#) – I
Drawing / Inking API in WinRT (C#) – II
Drawing / Inking API in WinRT (C#) – III
I use the Helvetica font at 14 px for text. The problem is that if a page does not have any image on it the text is very clear, but on a page with at least one image the text becomes slightly bold. You can see what I mean in the images below:
* Without image on page
* With image on page
The correct font is the one that appears in picture #1. How can I make the text look the same on all pages, regardless of whether the page contains an image?
Thanks.
Sample code:
Document document = new Document(PageSize.LETTER);
document.SetMargins(docMargin, docMargin, docMargin, 25);
writer = PdfWriter.GetInstance(document, new FileStream(filename, FileMode.Create));
document.Open();
Font defaultFont = FontFactory.GetFont("Helvetica", 7.8f, Font.NORMAL, new Color(75, 75, 75));
document.Add(new Paragraph("Lorem ipsum lorem ipsum lorem ipsum", defaultFont));
document.Add(Chunk.NEWLINE);
Image img = Image.GetInstance("my png image path");
document.Add(img);
document.Close();
I was finally able to reproduce your problem. The first PNG that I tested with which didn't reproduce your problem I created from Photoshop and used the Save For Web command. The second PNG that I tested and was able to reproduce your problem I created from MSPAINT.EXE. I tried various combinations within Save For Web and none of them have the same problem as Paint.
According to this thread from the official iText mailing list it appears to be something about the color profile of the image.
What are you seeing is the impact of newly placed transparency into a
PDF that had not previously contained it, when consideration isn't
given for the blending colorspace of the final output document.
You have an RGB document that upon adding transparency is forced into
CMYK due to lack of explicit blending space. If you were to specify
RGB as your explicit blending space at the same time you added your
transparency, all would be well.
One thing they recommend is setting the following property on your PdfWriter before adding anything:
writer.RgbTransparencyBlending = true;
When I do that I still see a very minor shift, but nowhere near as pronounced as without it.
This isn't an answer, I just need to be able to post code.
I'm unable to reproduce your results but if I were to guess it has something to do with your PDF renderer. You can confirm this by zooming in on the text, does it look the same when zoomed in? If so, that's your renderer trying to apply visual hints to a print document. If not, can you post a simplified version of your code that does this? Does this do this for all images or just one specific one? How are you creating your text, with Paragraphs, Tables, HTML parsing or something else? What version of iTextSharp are you using?
Below is a full working WinForms C# 2010 app targeting iTextSharp 5.1.2.0 that creates a two-page PDF. The first page has just text and the second page has text followed by an image loaded from the desktop. On my machine, using Adobe Acrobat Pro 9.1.3, I don't see any difference in fonts when I view it on screen.
using System;
using System.IO;
using System.Windows.Forms;
using iTextSharp.text;
using iTextSharp.text.pdf;
namespace WindowsFormsApplication1 {
public partial class Form1 : Form {
public Form1() {
InitializeComponent();
}
private void Form1_Load(object sender, EventArgs e) {
string pdfFile = Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.Desktop), "Test.pdf");
string imgFile = Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.Desktop), "Test.png");
using (FileStream fs = new FileStream(pdfFile, FileMode.Create, FileAccess.Write, FileShare.None)) {
using (Document doc = new Document(PageSize.LETTER)) {
using (PdfWriter writer = PdfWriter.GetInstance(doc, fs)) {
doc.Open();
BaseFont bf = BaseFont.CreateFont(BaseFont.HELVETICA, BaseFont.CP1250, BaseFont.NOT_EMBEDDED);
iTextSharp.text.Font f = new iTextSharp.text.Font(bf, 14);
doc.NewPage();
doc.Add(new Paragraph("This is a test", f));
doc.NewPage();
doc.Add(new Paragraph("This is a test", f));
iTextSharp.text.Image img = iTextSharp.text.Image.GetInstance(imgFile);
img.ScaleAbsolute(100, 100);
doc.Add(img);
doc.Close();
}
}
}
this.Close();
}
}
}
I am using ASP.NET MVC and I have an action that uploads a file. The file is uploaded properly, but I want the width and height of the image. I think I need to convert the HttpPostedFileBase to an Image first and then proceed. How do I do that?
And please let me know if there is a better way to get the width and height of the image.
I use Image.FromStream as follows:
Image.FromStream(httpPostedFileBase.InputStream, true, true)
Note that the returned Image is IDisposable.
You'll need a reference to System.Drawing.dll for this to work, and Image is in the System.Drawing namespace.
Resizing the Image
I'm not sure what you're trying to do, but if you happen to be making thumbnails or something similar, you may be interested in doing something like...
Bitmap bitmap = null; // declared outside the try so the catch block can dispose it
try {
bitmap = new Bitmap(newWidth,newHeight);
using (Graphics g = Graphics.FromImage(bitmap)) {
g.SmoothingMode = SmoothingMode.HighQuality;
g.PixelOffsetMode = PixelOffsetMode.HighQuality;
g.CompositingQuality = CompositingQuality.HighQuality;
g.InterpolationMode = InterpolationMode.HighQualityBicubic;
g.DrawImage(oldImage,
new Rectangle(0,0,newWidth,newHeight),
clipRectangle, GraphicsUnit.Pixel);
}//done with drawing on "g"
return bitmap;//transfer IDisposable ownership
} catch { //error before IDisposable ownership transfer
if (bitmap != null) bitmap.Dispose();
throw;
}
where clipRectangle is the rectangle of the original image you wish to scale into the new bitmap (you'll need to deal with the aspect ratio manually). The catch block is typical IDisposable usage inside a constructor; you maintain ownership of the new IDisposable object until it is returned (you may want to document that with code comments).
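The aspect-ratio handling mentioned above is plain arithmetic, so here is a minimal sketch of it, written in Java only to keep it self-contained and runnable; it ports line for line to C#. It assumes the whole source image is used as the clip rectangle and that the thumbnail must fit inside a bounding box without upscaling. The class name ThumbnailSize and the method fit are hypothetical, not part of any library.

```java
final class ThumbnailSize {

    /**
     * Returns {width, height} of the largest size that fits inside
     * maxWidth x maxHeight while keeping the source aspect ratio.
     * Images already smaller than the box are left at their original size.
     */
    static int[] fit(int srcWidth, int srcHeight, int maxWidth, int maxHeight) {
        if (srcWidth <= 0 || srcHeight <= 0 || maxWidth <= 0 || maxHeight <= 0) {
            throw new IllegalArgumentException("All dimensions must be positive");
        }
        // Use the tighter of the two constraints, and never scale up.
        double scale = Math.min(1.0,
                Math.min((double) maxWidth / srcWidth, (double) maxHeight / srcHeight));
        // Clamp to at least one pixel so extreme ratios stay valid.
        int width = Math.max(1, (int) Math.round(srcWidth * scale));
        int height = Math.max(1, (int) Math.round(srcHeight * scale));
        return new int[] {width, height};
    }

    public static void main(String[] args) {
        int[] size = fit(4000, 3000, 200, 200);
        System.out.println(size[0] + " x " + size[1]);
    }
}
```

The two returned values would play the role of newWidth and newHeight in the snippet above, with clipRectangle covering the full source image.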
Saving as Jpeg
Unfortunately, the default "save as jpeg" encoder doesn't expose any quality controls, and chooses a terribly low default quality.
You can manually select the encoder as well, however, and then you can pass arbitrary parameters:
ImageCodecInfo jpgInfo = ImageCodecInfo.GetImageEncoders()
.Where(codecInfo => codecInfo.MimeType == "image/jpeg").First();
using (EncoderParameters encParams = new EncoderParameters(1))
{
encParams.Param[0] = new EncoderParameter(Encoder.Quality, (long)quality);
//quality should be in the range [0..100]
image.Save(outputStream, jpgInfo, encParams);
}
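For readers on the JVM, the same idea (pick the JPEG encoder explicitly and pass it a quality parameter) can be sketched with the standard javax.imageio API. This is an analogue for comparison, not part of the System.Drawing answer above; note that Java expects the quality as a float in the range 0 to 1 rather than 0 to 100. The class name JpegQuality and the noise helper are made up for the demonstration.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Random;
import javax.imageio.IIOImage;
import javax.imageio.ImageIO;
import javax.imageio.ImageWriteParam;
import javax.imageio.ImageWriter;
import javax.imageio.stream.MemoryCacheImageOutputStream;

final class JpegQuality {

    /** Encodes the image as JPEG with an explicit quality between 0.0 and 1.0. */
    static byte[] encode(BufferedImage image, float quality) {
        ImageWriter writer = ImageIO.getImageWritersByFormatName("jpeg").next();
        try {
            ImageWriteParam param = writer.getDefaultWriteParam();
            // Quality can only be set once the compression mode is explicit.
            param.setCompressionMode(ImageWriteParam.MODE_EXPLICIT);
            param.setCompressionQuality(quality);
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (MemoryCacheImageOutputStream out = new MemoryCacheImageOutputStream(bytes)) {
                writer.setOutput(out);
                writer.write(null, new IIOImage(image, null, null), param);
            }
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            writer.dispose();
        }
    }

    /** Builds a deterministic random-noise test image (demo helper only). */
    static BufferedImage noise(int size, long seed) {
        BufferedImage image = new BufferedImage(size, size, BufferedImage.TYPE_INT_RGB);
        Random random = new Random(seed);
        for (int y = 0; y < size; y++) {
            for (int x = 0; x < size; x++) {
                image.setRGB(x, y, random.nextInt(0x1000000));
            }
        }
        return image;
    }

    public static void main(String[] args) {
        BufferedImage image = noise(64, 1L);
        System.out.println("low quality bytes:  " + encode(image, 0.1f).length);
        System.out.println("high quality bytes: " + encode(image, 0.95f).length);
    }
}
```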
If you are sure that the source is an image and doesn't need editing, you can do it easily as described here
[HttpPost]
public void Index(HttpPostedFileBase file)
{
if (file.ContentLength > 0)
{
var filename = Path.GetFileName(file.FileName);
System.Drawing.Image sourceimage =
System.Drawing.Image.FromStream(file.InputStream);
}
}
To ensure the file is an image, add JavaScript validation to the view by adding the accept attribute with a MIME type to the input tag
<input type="file" accept="image/*">
and jQuery validation script
$.validator.addMethod('accept', function () { return true; });
The whole solution can be found here