Issue with generating a PDF from a long HTML string using iTextSharp

I have a requirement to generate a PDF from HTML that contains a table with almost 2000 rows. When I try to generate the PDF, it renders about 250 rows, then skips the remaining rows and HTML content, producing only 6 pages. What might cause this?
string HtmlStream = "Some large content";
string FileName = "abcd.pdf";
object TargetFile = FileName;
string ModifiedFileName = string.Empty;
string FinalFileName = string.Empty;
ModifiedFileName = TargetFile.ToString();
ModifiedFileName = ModifiedFileName.Insert(ModifiedFileName.Length - 4, "1");
SelectPdf.HtmlToPdf converter = new SelectPdf.HtmlToPdf();
// set converter options
string pdf_page_size = "A4";
SelectPdf.PdfPageSize pageSize = (SelectPdf.PdfPageSize)Enum.Parse(typeof(SelectPdf.PdfPageSize),
pdf_page_size, true);
converter.Options.PdfPageSize = pageSize;
converter.Options.PdfPageOrientation = SelectPdf.PdfPageOrientation.Portrait;
converter.Options.MarginLeft = 10;
converter.Options.MarginRight = 10;
converter.Options.MarginTop = 10;
converter.Options.MarginBottom = 10;
converter.Options.MaxPageLoadTime = 1000;
// create a new PDF document by converting the HTML string
SelectPdf.PdfDocument doc = converter.ConvertHtmlString(HtmlStream);
doc.Save(ModifiedFileName.ToString());

Related

Issue viewing image created using iTextSharp

I have been successful in creating images from a PDF using iTextSharp. It creates as many images as there are pages in the PDF, but the generated images do not open in any image viewer. The viewer says the image is corrupted. Below is the code I have written.
try
{
PdfReader reader = null;
int currentPage = 1;
int pageCount = 0;
string destinationFolderPath = string.Format(@"{0}PageImages\{1}", BaseDataPath, Convert.ToString(documentId));
if (!Directory.Exists(destinationFolderPath))
{
Directory.CreateDirectory(destinationFolderPath);
}
reader = new PdfReader(filePath);
reader.RemoveUnusedObjects();
pageCount = reader.NumberOfPages;
string ext = ".png";
for (int i = 1; i <= pageCount; i++)
{
PdfReader reader1 = new PdfReader(filePath);
string destinationFilePath = string.Format(@"{0}/{1}{2}", destinationFolderPath, Convert.ToString(i), ext);
reader1.RemoveUnusedObjects();
Document doc = new Document(reader1.GetPageSizeWithRotation(currentPage));
PdfCopy pdfCpy = new PdfCopy(doc, new FileStream(destinationFilePath, FileMode.Create));
doc.Open();
for (int j = 1; j <= 1; j++)
{
PdfImportedPage page = pdfCpy.GetImportedPage(reader1, currentPage);
//pdfCpy.SetFullCompression();
pdfCpy.AddPage(page);
currentPage += 1;
}
doc.Close();
pdfCpy.Close();
reader1.Close();
reader.Close();
}
}
catch (Exception ex)
{
throw ex;
}
Could someone please suggest what is wrong here?
Thanks
You are creating a PDF file using PdfCopy, but you are storing that PDF as if you were creating a PNG file:
string ext = ".png";
string destinationFilePath =
string.Format(@"{0}/{1}{2}",
destinationFolderPath, Convert.ToString(i), ext);
PdfCopy pdfCpy = new PdfCopy(doc,
new FileStream(destinationFilePath, FileMode.Create));
You can't open a .png file in a PDF viewer. Your operating system will try to open the file you're creating as if it were an image, but the bytes of that "image" will be PDF bytes and your image viewer won't recognize it.
Change this line:
string ext = ".png";
To this:
string ext = ".pdf";
And you'll be able to open your file in a PDF viewer.
By the way: your code is awkward. For instance, I don't understand why you'd create a loop to execute something only once:
for (int j = 1; j <= 1; j++)
Also: if it's your intention to convert PDF pages to PNG, reconsider. iTextSharp doesn't convert PDF to images.
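One way to confirm this diagnosis is to look at the first bytes of a generated file instead of its extension. The following is a minimal sketch and not part of the original answer; the names FileSignature, Detect and DetectFile are made up for illustration. It relies only on the published file signatures: a PDF starts with the ASCII characters %PDF-, and a PNG starts with the bytes 89 50 4E 47 0D 0A 1A 0A.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Text;

// Hypothetical helper (not from the original post): identifies a file by its
// leading bytes instead of trusting the file extension.
public static class FileSignature
{
    private static readonly byte[] Pdf = Encoding.ASCII.GetBytes("%PDF-");
    private static readonly byte[] Png = { 0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A };

    // Returns "pdf", "png", or "unknown" for the given header bytes.
    public static string Detect(byte[] header)
    {
        if (StartsWith(header, Pdf)) return "pdf";
        if (StartsWith(header, Png)) return "png";
        return "unknown";
    }

    // Reads up to the first 8 bytes of a file and classifies them.
    public static string DetectFile(string path)
    {
        using (var stream = File.OpenRead(path))
        {
            var header = new byte[8];
            int read = stream.Read(header, 0, header.Length);
            return Detect(header.Take(read).ToArray());
        }
    }

    // True when data begins with every byte of signature, in order.
    private static bool StartsWith(byte[] data, byte[] signature)
    {
        if (data == null || data.Length < signature.Length) return false;
        for (int i = 0; i < signature.Length; i++)
        {
            if (data[i] != signature[i]) return false;
        }
        return true;
    }
}
```

Called on one of the files written by the loop in the question, DetectFile would be expected to report "pdf" even though the file name ends in .png.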

How to retrieve column names from a excel sheet?

Using EPPlus I'm writing data to multiple sheets. If a sheet does not exist yet, I add it; otherwise I retrieve the used rows, append data after the last used row, and save the file.
FileInfo newFile = new FileInfo("Excel.xlsx");
using (ExcelPackage xlPackage = new ExcelPackage(newFile))
{
var ws = xlPackage.Workbook.Worksheets.FirstOrDefault(x => x.Name == language.Culture);
if (ws == null)
{
worksheet = xlPackage.Workbook.Worksheets.Add(language.Culture);
//writing data
}
else
{
worksheet = xlPackage.Workbook.Worksheets[language.Culture];
colCount = worksheet.Dimension.End.Column;
rowCount = worksheet.Dimension.End.Row;
//write data
}
worksheet.Cells[worksheet.Dimension.Address].AutoFitColumns();
xlPackage.Save();
}
This works well.
Now I want to retrieve the column names of each sheet in the Excel file using LinqToExcel. This is my code:
string sheetName = language.Culture;
var excelFile = new ExcelQueryFactory(excelPath);
IQueryable<Row> excelSheetValues = from workingSheet in excelFile.Worksheet(sheetName) select workingSheet;
string[] headerRow = excelFile.GetColumnNames(sheetName).ToArray();
The headerRow line throws an exception:
An OleDbException exception was caught
External table is not in the expected format.
But I don't want to use OleDb; I want to work with LinqToExcel.
Note: When I'm working with a single sheet rather than multiple sheets, it works fine and retrieves all columns. Where am I going wrong?
(Based on OP's Comments)
The AutoFitColumns function has always been a little touchy. The important thing to remember is to call it AFTER you load the cell data.
But if you want to enforce a minimum width (for when columns are very narrow), I find EPPlus to be unreliable. It seems to always use the DefaultColWidth of the worksheet, even if you pass a minimum width to one of the function overloads.
Here is how I get around it:
[TestMethod]
public void Autofit_Column_Range_Test()
{
//http://stackoverflow.com/questions/31165959/how-to-retrieve-column-names-from-a-excel-sheet
//Throw in some data
var datatable = new DataTable("tblData");
datatable.Columns.Add(new DataColumn("Nar", typeof(int))); //Without the workaround this column would not be autofitted, because it is narrower than the default width of a new worksheet (usually 8.43)
datatable.Columns.Add(new DataColumn("Wide Column", typeof(int)));
datatable.Columns.Add(new DataColumn("Really Wide Column", typeof(int)));
for (var i = 0; i < 20; i++)
{
var row = datatable.NewRow();
row[0] = i;
row[1] = i * 10;
row[2] = i * 100;
datatable.Rows.Add(row);
}
var existingFile2 = new FileInfo(@"c:\temp\temp.xlsx");
if (existingFile2.Exists)
existingFile2.Delete();
using (var package = new ExcelPackage(existingFile2))
{
//Add the data
var ws = package.Workbook.Worksheets.Add("Sheet1");
ws.Cells.LoadFromDataTable(datatable, true);
//Keep track of the original default of 8.43 (excel default unless the user has changed it in their local Excel install)
var orginaldefault = ws.DefaultColWidth;
ws.DefaultColWidth = 15;
//Even if you pass in a minimum width as the first parameter, like '.AutoFitColumns(15)', EPPlus usually ignores it and goes with DefaultColWidth
ws.Cells[ws.Dimension.Address].AutoFitColumns();
//Set it back to what it was so it respects the user's local setting
ws.DefaultColWidth = orginaldefault;
package.Save();
}
}

How can I add disparate chunks to a PdfPCell using iTextSharp?

How can I concatenate disparate chunks and add them to a paragraph, the paragraph to a cell, then the cell to a table using iTextSharp (in generating a PDF file)?
I am able to get to a certain "place" in my PDF file generation, so that it looks like so (the right side of the page is blank, as it should be):
This is the code I'm using for that:
using (var ms = new MemoryStream())
{
using (var doc = new Document(PageSize.A4, 50, 50, 25, 25))
{
//Create a writer that's bound to our PDF abstraction and our stream
using (var writer = PdfWriter.GetInstance(doc, ms))
{
//Open the document for writing
doc.Open();
var courierBold11Font = FontFactory.GetFont(FontFactory.COURIER_BOLD, 11, BaseColor.BLACK);
var docTitle = new Paragraph("Mark Twain", courierBold11Font);
doc.Add(docTitle);
var timesRoman9Font = FontFactory.GetFont("Times Roman", 9, BaseColor.BLACK);
var subTitle = new Paragraph("Roughing It", timesRoman9Font);
doc.Add(subTitle);
var courier9RedFont = FontFactory.GetFont("Courier", 9, BaseColor.RED);
var importantNotice = new Paragraph("'All down but nine; set 'em up on the other alley, pard' - Scotty Briggs", courier9RedFont);
importantNotice.Leading = 0;
importantNotice.MultipliedLeading = 0.9F; // reduce the width between lines in the paragraph with these two settings
PdfPTable table = new PdfPTable(1);
PdfPCell cellImportantNote = new PdfPCell(importantNotice);
cellImportantNote.BorderWidth = PdfPCell.NO_BORDER;
table.WidthPercentage = 50;
table.HorizontalAlignment = Element.ALIGN_LEFT;
table.AddCell(cellImportantNote);
doc.Add(table);
doc.Close();
}
var bytes = ms.ToArray();
String PDFTestOutputFileName = String.Format("iTextSharp_{0}.pdf", DateTime.Now.ToShortTimeString());
PDFTestOutputFileName = PDFTestOutputFileName.Replace(":", "_");
var testFile = Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.Desktop), PDFTestOutputFileName);
File.WriteAllBytes(testFile, bytes);
MessageBox.Show(String.Format("{0} written", PDFTestOutputFileName));
}
}
However, I need to break up the red text so that part of it is bolded, parts of it are anchor tags/hrefs, etc.
I thought I could do it this way:
var courier9RedBoldFont = FontFactory.GetFont(FontFactory.COURIER_BOLD, 9, BaseColor.RED);
// Build up chunkified version of "important notice"
Chunk boldpart = new Chunk("All down but nine - set 'em up on the other alley, pard", courier9RedBoldFont);
Chunk attribution = new Chunk("Scotty Briggs", courier9RedFont);
PdfPTable tbl = new PdfPTable(1);
tbl.WidthPercentage = 50;
tbl.HorizontalAlignment = Element.ALIGN_LEFT;
var par = new Paragraph();
par.Chunks.Add(boldpart);
par.Chunks.Add(attribution );
PdfPCell chunky = new PdfPCell(par);
chunky.BorderWidth = PdfPCell.NO_BORDER;
tbl.AddCell(chunky);
doc.Add(tbl);
...but that's not adding anything at all to the PDF file, but why not? Doesn't a cell take a paragraph, and cannot a paragraph be comprised of Chunks?
Instead of par.Chunks.Add(), just use par.Add(). The Chunks collection that Paragraph returns actually comes from the base class Phrase. If you look at the code for that property, you'll see that the collection returned is a temporary collection created on the fly, so it is effectively read-only.
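The pattern described in this answer can be reproduced without iTextSharp. The sketch below is an illustration only: SketchParagraph and SketchDemo are hypothetical stand-ins, not iTextSharp types, and the real Phrase implementation may differ in detail. It shows why adding to a collection that a property rebuilds on every call has no lasting effect, while calling Add on the object itself does.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a class like Phrase/Paragraph. This is NOT
// iTextSharp code, only an illustration of the pattern described above.
public class SketchParagraph
{
    private readonly List<string> _content = new List<string>();

    // Builds a brand-new list on every call, so callers only ever get a copy.
    public IList<string> Chunks
    {
        get { return new List<string>(_content); }
    }

    // The supported way to add content: it changes the internal list.
    public void Add(string chunk)
    {
        _content.Add(chunk);
    }
}

public static class SketchDemo
{
    public static void Run()
    {
        var par = new SketchParagraph();

        par.Chunks.Add("lost");               // modifies a temporary copy that is then discarded
        Console.WriteLine(par.Chunks.Count);  // prints 0

        par.Add("kept");                      // modifies the paragraph itself
        Console.WriteLine(par.Chunks.Count);  // prints 1
    }
}
```

In the question's code the corresponding change would be to replace par.Chunks.Add(boldpart) and par.Chunks.Add(attribution) with par.Add(boldpart) and par.Add(attribution).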

Loading a PNG image in HTML file

In my Windows Phone 7.1 app I am loading an HTML file from a local path in a WebBrowser. For this I converted a PNG image to base64 using the code below, but the base64-encoded image does not load in the WebBrowser.
Please help me find my mistake.
string s = "data:image/jpg;base64,";
imgStream = Assembly.GetExecutingAssembly().GetManifestResourceStream("NewUIChanges.Htmlfile.round1.png");
byte[] data = new byte[(int)imgStream.Length];
int offset = 0;
while (offset < data.Length)
{
int bytesRead = imgStream.Read(data, offset, data.Length - offset);
if (bytesRead <= 0)
{
throw new EndOfStreamException("Stream wasn't as long as it claimed");
}
offset += bytesRead;
}
base64 = Convert.ToBase64String(data);
Stream htmlStream = Assembly.GetExecutingAssembly().GetManifestResourceStream("NewUIChanges.Htmlfile.equity_built.html");
StreamReader reader = new StreamReader(htmlStream);
string htmlcontent = reader.ReadToEnd();
htmlcontent = htmlcontent.Replace("round1.png", s + base64);
wb.NavigateToString(htmlcontent);
If there is no error, data contains your image, and round1.png exists in htmlcontent, then it is probably just an image type error. Try this:
string s = "data:image/png;base64,";
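To keep the declared type and the actual image format from drifting apart, the prefix can be built in one place. This is a small sketch under that assumption; DataUri.Build is a hypothetical helper, not something from the original post, and the caller is still responsible for passing the MIME type that matches the bytes.

```csharp
using System;

// Hypothetical helper (not from the original post): builds a data URI whose
// MIME type is passed explicitly alongside the image bytes.
public static class DataUri
{
    public static string Build(byte[] imageBytes, string mimeType)
    {
        if (imageBytes == null) throw new ArgumentNullException("imageBytes");
        if (string.IsNullOrEmpty(mimeType)) throw new ArgumentException("MIME type is required", "mimeType");

        // Format: data:<mime type>;base64,<base64-encoded bytes>
        return "data:" + mimeType + ";base64," + Convert.ToBase64String(imageBytes);
    }
}

// Possible use with the variables from the question:
// htmlcontent = htmlcontent.Replace("round1.png", DataUri.Build(data, "image/png"));
```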

Abcpdf copyable/selectable text

I'm using WebSupergoo's ABCpdf to convert HTML pages to PDF via AddImageUrl.
It works great, but the resulting PDF does not allow the user to select and copy text. Everything is one 'image'.
Is it possible to do this? Which settings should I use?
This is my current code. The commented-out Flatten call does not seem to do anything relevant. HttpStream simply forwards the PDF to users as a document.
var doc = new Doc();
doc.HtmlOptions.UseScript = true;
doc.Units = "mm";
doc.MediaBox.String = "0 0 210 297";
doc.Rect.String = doc.MediaBox.String;
doc.Rect.Inset(10.0, 10.0);
doc.SetInfo(0, "License", abcpdfkey);
doc.HtmlOptions.UseScript = true;
doc.HtmlOptions.AddMovies = true;
doc.HtmlOptions.RetryCount = 0;
doc.HtmlOptions.ContentCount = 1;
doc.Page = doc.AddPage();
for (int i = doc.AddImageUrl(url); doc.Chainable(i); i = doc.AddImageToChain(i))
{
doc.Page = doc.AddPage();
}
int pageCount = doc.PageCount;
for (int j = 1; j <= pageCount; j++)
{
doc.PageNumber = j;
// doc.Flatten();
}
this.HttpStream(doc.GetData(), filename);
Before sending the PDF to the HTTP stream, you can set the encryption properties.
The CanCopy property sets whether the user can copy text from the PDF.
To set it, add the following code:
doc.Encryption.CanCopy = true;
You may need to set doc.Encryption.CanExtract as well.
