Table width not set in iTextSharp when converting HTML to PDF

I am trying to convert HTML to PDF, but the widths of the HTML table cells are not being set correctly.
This is my HTML:
<table cellpadding='4' cellspacing='4' border='0' width='100%' style='width:100%'>
<tr style='background-color:#000000'>
<td colspan='2' align='center' valign='middle' width='100%'>
<font face='Calibri' size='6' color='#FFFFFF'>Retail Natural Gas Deal Sheet</font>
</td>
</tr>
<tr>
<td colspan='2' width='100%'> </td>
</tr>
<tr>
<td width='90%' style='width:90%'>
<table cellpadding='0' cellspacing='0' border='0' width='100%'>
<tr>
<td width='42%'>
<font face='Calibri' size='4'>
<b>Deal Number</b>
</font>
</td>
<td width='1%'> </td>
<td width='57%'>
<font face='Calibri' size='4'>
<b>15RTLG7149</b>
</font>
</td>
</tr>
<tr>
<td colspan='3' width='100%'> </td>
</tr>
<tr>
<td width='42%'>
<font face='Calibri' size='2'>
<b>Trade Date</b>
</font>
</td>
<td width='1%'> </td>
<td width='57%'>
<font face='Calibri' size='2'>February 09, 2015</font>
</td>
</tr>
<tr>
<td width='42%'>
<font face='Calibri' size='2'>
<b>Price Date</b>
</font>
</td>
<td width='1%'> </td>
<td width='57%'>
<font face='Calibri' size='2'>February 09, 2015</font>
</td>
</tr>
<tr>
<td width='42%'>
<font face='Calibri' size='2'>
<b>Authorize Date</b>
</font>
</td>
<td width='1%'> </td>
<td width='57%'>
<font face='Calibri' size='2'>February 09, 2015</font>
</td>
</tr>
<tr>
<td colspan='3' width='100%'> </td>
</tr>
</table>
</td>
<td width='10%' style='width:10%' valign='top'>
<table cellpadding='0' cellspacing='0' border='0' width='100%'>
<tr>
<td colspan='2' align='center' width='100%'>
<font face='Calibri' size='2'>
<b>Volumes (MMMBtu)</b>
</font>
</td>
</tr>
</table>
</td>
</tr>
</table>
This is the C# code I am using to generate the PDF:
using System.IO;
using iTextSharp.text;         // Document
using iTextSharp.text.pdf;     // PdfWriter
using iTextSharp.tool.xml;     // XMLWorkerHelper (XML Worker add-on)

Document pdfDoc = new Document();
//Document pdfDoc = new Document(PageSize.A4, 10f, 10f, 10f, 0f);
//HTMLWorker htmlparser = new HTMLWorker(pdfDoc);
using (MemoryStream memoryStream = new MemoryStream())
{
    PdfWriter writer = PdfWriter.GetInstance(pdfDoc, memoryStream);
    pdfDoc.Open();
    // Parse the HTML string into the open document with XML Worker
    XMLWorkerHelper.GetInstance().ParseXHtml(writer, pdfDoc, new StringReader(HTML));
    pdfDoc.Close();
    byte[] bytes = memoryStream.ToArray();
    memoryStream.Close();
    return bytes;
}
But this is how it is rendered in the PDF, and I haven't been able to find an answer myself:
http://i.stack.imgur.com/8WyBh.jpg

I copied and pasted your HTML into a text editor (Notepad++; marked 1 in the screenshot below), opened it in a browser (Chrome; marked 2), and converted it to PDF using XML Worker (the PDF is marked 3).
When I compare what I see in the browser with what I see in the PDF, I have the impression that iText's XML Worker is doing a great job. There isn't that much difference between what I see in the browser and what I see in the PDF.
However, when I look at your HTML, I see inconsistencies. Have you tried viewing your HTML in a browser? It doesn't look the way you expected, does it? It seems the problem isn't caused by iText but by the way you create your HTML. Please tell us whether the HTML looks the way you expect in a browser. If not, please explain what you expect. Right now, the problem is hard to understand, because what I see in the PDF corresponds very closely to what I see in a browser.
Update:
In your question, you didn't add any borders (border='0'), so it was hard to see what you meant. I've now added borders, so the HTML looks like this:
You want the PDF to look like this:
This is very easy if you simplify your HTML like this:
<table cellpadding='4' cellspacing='4' border='1' width='100%' style='width:100%'>
<tr style='background-color:#000000'>
<td colspan='2' align='center' valign='middle'>
<font face='Calibri' size='6' color='#FFFFFF'>XXXX XXXXX XXXXX</font>
</td>
</tr>
<tr>
<td colspan='2'> </td>
</tr>
<tr>
<td width='90%' style='width:90%'>
<table cellpadding='0' cellspacing='0' border='1' width='100%'>
<tr>
<td width='42%'>
<font face='Calibri' size='4'>
<b>Deal Number</b>
</font>
</td>
<td width='1%'> </td>
<td width='57%'>
<font face='Calibri' size='4'>
<b>XXXXXXXXXX</b>
</font>
</td>
</tr>
<tr>
<td colspan='3' width='100%'> </td>
</tr>
<tr>
<td width='42%'>
<font face='Calibri' size='2'>
<b>Trade Date</b>
</font>
</td>
<td width='1%'> </td>
<td width='57%'>
<font face='Calibri' size='2'>February 09, 2015</font>
</td>
</tr>
<tr>
<td width='42%'>
<font face='Calibri' size='2'>
<b>Price Date</b>
</font>
</td>
<td width='1%'> </td>
<td width='57%'>
<font face='Calibri' size='2'>February 09, 2015</font>
</td>
</tr>
<tr>
<td width='42%'>
<font face='Calibri' size='2'>
<b>Authorize Date</b>
</font>
</td>
<td width='1%'> </td>
<td width='57%'>
<font face='Calibri' size='2'>February 09, 2015</font>
</td>
</tr>
<tr>
<td colspan='3' width='100%'> </td>
</tr>
</table>
</td>
<td width='10%' style='width:10%' valign='top'>
<table cellpadding='0' cellspacing='0' border='1' width='100%'>
<tr>
<td colspan='2' align='center' width='100%'>
<font face='Calibri' size='2'>
<b>Xxxxxxx (XXXXXXX)</b>
</font>
</td>
</tr>
</table>
</td>
</tr>
</table>
What did I change? I removed width='100%' from the <td> tags that have colspan='2'. This information is ambiguous: you are saying that the two columns together should take 100% of the width. However:
You already defined this in the <table> tag where you also have width='100%', and
If a cell has colspan 2 and you say that this cell should take 100% of the width, there is no way to tell the width of each column. It doesn't make sense to put width='100%' there.
iTextSharp defines the width of the columns based on the first row where it finds information about the width. In this case, the first row with such information is a row with colspan 2 in a table with 2 columns. You define the combined width of these 2 columns as 100%, and iTextSharp interprets this as each column taking 50% (100% / 2) of the width.
If you remove this ambiguous information, iText will define the width of the columns based on the widths defined in the third row (which is what you expect).
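The rule described above can be modeled in a few lines. This JavaScript is a simplified sketch under stated assumptions, not iTextSharp's actual code: each row is a hypothetical array of { colspan, width } objects (width in percent), the first row carrying any width wins, and a spanning cell's width is divided evenly over the columns it spans. The helper name deriveColumnWidths is made up for illustration.

```javascript
// Simplified model (an assumption, not iText's source): column widths come
// from the first row that carries any width information.
function deriveColumnWidths(rows, numCols) {
  for (const row of rows) {
    // Skip rows where no cell carries a width hint.
    if (!row.some((cell) => cell.width !== undefined)) continue;
    const widths = [];
    for (const cell of row) {
      const span = cell.colspan || 1;
      for (let i = 0; i < span; i++) {
        // A spanning cell's width is split evenly over its columns;
        // a cell without a width leaves its columns undetermined.
        widths.push(cell.width === undefined ? undefined : cell.width / span);
      }
    }
    return widths.slice(0, numCols);
  }
  // No row specifies widths: fall back to equal columns.
  return Array(numCols).fill(100 / numCols);
}

// Outer table from the question: rows 1 and 2 are colspan='2' cells with width='100%'.
const original = [
  [{ colspan: 2, width: 100 }],
  [{ colspan: 2, width: 100 }],
  [{ width: 90 }, { width: 10 }],
];
// Same table after removing width='100%' from the colspan='2' cells.
const simplified = [
  [{ colspan: 2 }],
  [{ colspan: 2 }],
  [{ width: 90 }, { width: 10 }],
];

console.log(deriveColumnWidths(original, 2));   // [ 50, 50 ]
console.log(deriveColumnWidths(simplified, 2)); // [ 90, 10 ]
```

In this model, the ambiguous header row wins and forces a 50/50 split; once it carries no width, the third row's 90%/10% is used, which matches the behavior described in the answer.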

Related

HTML signature leaves big spaces between rows (Gmail App from Outlook)

This is how it looks in the Gmail mobile app when sent from Outlook:
How can I avoid those big gaps?
My code is as follows:
<table id="sig" width='320' cellspacing='0' cellpadding='0' border-spacing='0' style="width:320px;margin:0;padding:0;">
<tr>
<td valign='top' width="120" height="48" style="width:120px;height:48px;margin:0;padding:0;vertical-align:top;">
<a style="border:none;text-decoration:none;">
<img moz-do-not-send="true" src="https://s3.amazonaws.com/media_crisalix/signatures/logo.jpg" alt="Crisalix" width='120' height='48' style="border:none;width:120px;height:48px;display:block;">
</a>
</td>
</tr>
<tr>
<td>
<table id="sig1" cellspacing='0' width='320' cellpadding='0' border-spacing='0' style="padding:0;margin:0;font-family:sans-serif,Arial,'Helvetica Neue',Helvetica;mso-line-height-rule:exactly;line-height:11px;color:#b0b0b0;border-collapse:collapse;-webkit-text-size-adjust:none;width:320px;">
<tr style="margin:0;padding:0;">
<td style="width:320px;margin:0;padding:0;font-family:sans-serif,Arial,'Helvetica Neue',Helvetica;white-space:nowrap;font-weight:600;line-height:1.6;font-size:13px;">
<span style="color:#137191">Jaime</span>
</td>
</tr>
<tr style="margin:0;padding:0;">
<td style="width:320px;margin:0;padding:0;font-family:sans-serif,Arial,'Helvetica Neue',Helvetica;white-space:nowrap;font-size:12px;line-height:1">
<span style="color:#555555">Chief Executive Officer</span>
</td>
</tr>
<tr>
<td valign='top' width="27" height='21' style="width:27px;height:1px;margin:0;padding:0;vertical-align:top;">
<img moz-do-not-send="true" src="https://s3.amazonaws.com/media_crisalix/signatures/separator.jpg" alt="Crisalix" width='27' height='21' style="border:none;width:27px;height:21px;display:block;">
</td>
</tr>
<tr style="margin:0;padding:0;">
<td style="width:320px;margin:0;padding:0;font-family:sans-serif,Arial,'Helvetica Neue',Helvetica;white-space:nowrap;font-size:12px;line-height:1.4;">
<div><span style="color:#137191;font-weight:bold">P / </span><span style="color:#555555;"></span></div>
<div><span style="color:#137191;font-weight:bold">A / </span><span style="color:#555555">Parc Scientifique (PSE-A) - EPFL1015</span></div>
<div> <span style="color:#555555"> Lausanne | Switzerland </span></div>
</td>
</tr>
<tr style="margin:0;padding:0;">
<td style="margin:0;padding:0;font-family:sans-serif,Arial,'Helvetica Neue',Helvetica;white-space:nowrap;font-weight:600;line-height:1.6;font-size:13px;width:320px;">
<span style="color:#137191;border:none;text-decoration:none!important;color:#137191;">www.crisalix.com</span>
</td>
</tr>
<tr>
<td valign='top' width="27" height='21' style="width:27px;height:1px;margin:0;padding:0;vertical-align:top;">
<img moz-do-not-send="true" src="https://s3.amazonaws.com/media_crisalix/signatures/separator.jpg" alt="Crisalix" width='27' height='21' style="border:none;width:27px;height:21px;display:block;">
</td>
</tr>
</table>
</td>
</tr>
<tr>
<td valign='top' width="230" height="225" style="width:230px;height:225px;margin:0;padding:0;vertical-align:top;">
<a href='http://www.crisalix.com' title="Crisalix" style="border:none;text-decoration:none;">
<img moz-do-not-send="true" src="https://s3.amazonaws.com/media_crisalix/signatures/signature-banner.jpg" alt="Crisalix" width='230' height='225' style="border:none;width:230px;height:225px;display:block;">
</a>
</td>
</tr>
</table>

How to create a two column email newsletter

I am trying to create a two-column email flyer, but I'm having trouble with the coding because Outlook hates CSS.
I'm using tables to keep it as simple as possible, but I want two separate tables, one on the left and one on the right, so I can add data to them as I wish.
I tried using float: left and float: right on the two tables, but Outlook ignores this style.
I know the two grey tables at the bottom are each in their own separate "holder" table, but this is so I can duplicate the grey "data" tables when I add new articles.
<table class="all" width="auto" height="auto" border="0" cellspacing="0"><tr><td height="504">
<table width="750" height="140" border="0" cellspacing="0">
<tr>
<td width="200" valign="bottom" bgcolor="#E6E6E6"> </td>
<td width="345" align="center" valign="bottom" bgcolor="#E6E6E6"> </td>
<td width="152" align="center" valign="bottom" bgcolor="#E6E6E6"> </td>
<td width="45" align="center" valign="bottom" bgcolor="#E6E6E6"> </td>
</tr>
<tr>
<td width="200" valign="bottom" bgcolor="#E6E6E6"> </td>
<td align="center" valign="bottom" bgcolor="#E6E6E6"><font color="#111111" face="Arial Narrow" size="+2">DECEMBER NEWSLETTER</font></td>
<td width="152" align="center" valign="bottom" bgcolor="#E6E6E6"><font size="2"><strong>#4 - <span class="orange">04.12.13</span></strong></font></td>
<td width="45" align="center" valign="bottom" bgcolor="#E6E6E6"> </td>
</tr>
</table>
<table width="750" border="0" cellspacing="0" cellpadding="0">
<tr>
<td width="75" height="50" bgcolor="#E6E6E6" scope="row"> </td>
<td width="600" rowspan="2" scope="row"><img src="http://placehold.it/600x200"/></td>
<td width="75" bgcolor="#E6E6E6" scope="row"> </td>
</tr>
<tr>
<td width="75" height="81" scope="row"> </td>
<td scope="row"> </td>
</tr>
</table>
<table class="holder" width="750" border="0" cellspacing="0" cellpadding="0">
<tr>
<td valign="top" scope="row">
<table class="inlinetableleft" width="360">
<tr>
<td width="371" align="left">
<!------------LEFT COLUMN------------------>
<table width="360" border="0" cellspacing="0" cellpadding="0">
<tr>
<th height="103" colspan="4" align="left" valign="middle" bgcolor="#CCCCCC" scope="row"> </th>
</tr>
</table>
<!--------------LEFT COLUMN END------------->
</td>
</tr>
</table>
<table class="inlinetableright" width="360">
<tr>
<td align="left">
<!------------RIGHT COLUMN------------------>
<table width="360" border="0" cellspacing="0" cellpadding="0">
<tr>
<td height="106" align="left" bgcolor="#CCCCCC" scope="row"> </td>
</tr>
</table>
<!-----------RIGHT COLUMN END-------------->
</td></tr>
</table>
</td>
</tr>
</table>
Here is a fiddle of my newsletter so far; it's the bottom two grey tables that I want side by side.
Fiddle
For HTML emails, nested tables are your friend :)
JSFiddle
Note: the border around the table is just to show you where the tables are.
<table border="0" width="600" cellpadding="0" cellspacing="0" align="center">
<tr>
<td colspan="2">
header content here
</td>
</tr>
<tr>
<td width="300">
<table border="0" width="300" cellpadding="1" cellspacing="0" align="left">
<tr>
<td>Left Content</td>
</tr>
</table>
</td>
<td width="300">
<table border="0" width="300" cellpadding="1" cellspacing="0" align="left">
<tr>
<td>Right content</td>
</tr>
</table>
</td>
</tr>
</table>

Prototype: update tbody content dynamically

I have the HTML below, and I am trying to update the content of the tbody dynamically using Ajax. I already have the response HTML; all I want is to update the tbody content using Prototype. So far I have tried $('table-body').innerHTML = "html content here";
<tbody class="table-body">
<tr>
<td id="11" class="consumables model" width="15%">Aficion SP 20022</td>
<td id="12" class="consumables type" width="15%">Print Cartridge</td>
<td class="consumables" width="15%">Black </td>
<td class="consumables" width="15%">15000 </td>
<td class="consumables" width="15%">
<td class="consumables" width="25%">
</tr>
<tr>
<td id="10" class="consumables model" width="15%">Aficion SP 2002</td>
<td id="12" class="consumables type" width="15%">Print Cartridge</td>
<td class="consumables" width="15%">Black </td>
<td class="consumables" width="15%">15000 </td>
<td class="consumables" width="15%">
<td class="consumables" width="25%">
</tr>
<tr>
<td id="2" class="consumables model" width="15%">Aficion SP C242SF</td>
<td id="14" class="consumables type" width="15%">Print cartridge SP 4100</td>
<td class="consumables" width="15%">Magenta </td>
<td class="consumables" width="15%">50000 </td>
<td class="consumables" width="15%">
<td class="consumables" width="25%">
</tr>
</tbody>
There are two ways to solve this. In Prototype, $('table-body') looks up an element by id, but your <tbody> only has a class, so nothing is found:
Change the class to an id on the <tbody> tag and then call $('table-body').update("html content here"), or
Use the class to select the first element matching that selector: $$('.table-body').first().update("html content here")

How do I retrieve multiple row node data from an HTML table in XPath?

Sometime during the dark ages, a script was built that outputs the following HTML:
...
<TABLE BORDER=0 FRAME=ALL_FRAMES RULES=ALL_RULES ALIGN=CENTER BGCOLOR="ffffe5">
<CAPTION ALIGN=TOP>
<FONT COLOR=009594 SIZE=-1><B>Access Information</B></FONT>
</CAPTION>
<TR>
<TD ALIGN=RIGHT VALIGN=MIDDLE>
<FONT COLOR=black SIZE=-1><B>Access Circuit(s):</B></FONT>
</TD>
<TD ALIGN=LEFT VALIGN=MIDDLE>
**DATA TO COLLECT 111**
</TD>
<TD ALIGN=RIGHT VALIGN=MIDDLE>
<FONT COLOR=black SIZE=-1><B>Other Circuit(s):</B></FONT>
</TD>
<TD ALIGN=LEFT VALIGN=MIDDLE>
&nbsp
</TD>
</TR>
<TR>
<TD ALIGN=RIGHT VALIGN=MIDDLE>
&nbsp
</TD>
<TD ALIGN=LEFT VALIGN=MIDDLE>
**DATA TO COLLECT AAA**
</TD>
<TD ALIGN=RIGHT VALIGN=MIDDLE>
&nbsp
</TD>
<TD ALIGN=LEFT VALIGN=MIDDLE>
&nbsp
</TD>
</TR>
<TR>
<TD ALIGN=RIGHT VALIGN=MIDDLE>
&nbsp
</TD>
<TD ALIGN=LEFT VALIGN=MIDDLE>
**DATA TO COLLECT BBB**
</TD>
<TD ALIGN=RIGHT VALIGN=MIDDLE>
&nbsp
</TD>
<TD ALIGN=LEFT VALIGN=MIDDLE>
&nbsp
</TD>
</TR>
<TR>
<TD ALIGN=RIGHT VALIGN=MIDDLE>
&nbsp
</TD>
<TD ALIGN=LEFT VALIGN=MIDDLE>
**DATA TO COLLECT CCC**
</TD>
<TD ALIGN=RIGHT VALIGN=MIDDLE>
&nbsp
</TD>
<TD ALIGN=LEFT VALIGN=MIDDLE>
&nbsp
</TD>
</TR>
<TR>
<TD ALIGN=RIGHT VALIGN=MIDDLE>
<FONT COLOR=black SIZE=-1><B>Customer:</B></FONT>
</TD>
...
Sorry, I would show you the rendered table layout, but I don't know how to do that on SO without using <table>.
How can I use XPath (in PHP) to collect only the DATA TO COLLECT cells? So far I've been able to retrieve the first one with //*[*='Access Circuit(s):']/following-sibling::td[1].
Things to note:
This is only a small section of a large document.
I cannot change the scripts output.
I won't know how many rows there will be (figure 0 to 6).
The data can be expected to always be in the same "column".
I may only have XPath 1.0, but XPath 2.0 answers are still welcome.
The expression I came up with is this:
//TR[(.//B[.='Access Circuit(s):']) or ((./preceding-sibling::TR//B[.='Access Circuit(s):']) and (./following-sibling::TR//B[.='Customer:']))]//TD[2]
returns
<TD ALIGN="LEFT" VALIGN="MIDDLE">**DATA TO COLLECT 111**</TD>
<TD ALIGN="LEFT" VALIGN="MIDDLE">**DATA TO COLLECT AAA**</TD>
<TD ALIGN="LEFT" VALIGN="MIDDLE">**DATA TO COLLECT BBB**</TD>
<TD ALIGN="LEFT" VALIGN="MIDDLE">**DATA TO COLLECT CCC**</TD>
It relies on the first row containing Access Circuit(s): and on the first row that should not be collected containing Customer:. If you can't rely on either of those, then I don't think it can be done with a single XPath expression.
Step-by-step
1. //TR[
2. (.//B[.="Access Circuit(s):"])
3. or ( (./preceding-sibling::TR//B[.="Access Circuit(s):"])
4. and (./following-sibling::TR//B[.="Customer:"]) )
5. ]//TD[2]
Means
1. all TR nodes
2. that either contain "Access Circuit(s):"
3. or
- (3.) are positioned after "Access Circuit(s):"
- (4.) and are positioned before "Customer:"
5. all TD nodes that are the second TD of their parents
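To sanity-check the logic of this predicate outside PHP, here is a plain JavaScript model of it. It is a sketch under an assumption: each TR has already been reduced to an array of its cells' text, with empty strings standing in for the &nbsp cells (both this representation and the helper name selectDataCells are hypothetical). In PHP itself you would still run the XPath expression, for example with DOMXPath::query().

```javascript
// Sketch: each TR is modeled as an array of its cells' text (hypothetical
// representation). Mirrors the XPath predicate step by step.
function selectDataCells(rows) {
  const has = (row, label) => row.includes(label);
  return rows
    .filter(
      (row, i) =>
        // step 2: the row itself contains "Access Circuit(s):"
        has(row, "Access Circuit(s):") ||
        // steps 3-4: an earlier row has it AND a later row has "Customer:"
        (rows.slice(0, i).some((r) => has(r, "Access Circuit(s):")) &&
          rows.slice(i + 1).some((r) => has(r, "Customer:")))
    )
    // step 5: keep the second cell (TD[2]) of rows that have one
    .filter((row) => row.length > 1)
    .map((row) => row[1]);
}

const rows = [
  ["Access Circuit(s):", "DATA TO COLLECT 111", "Other Circuit(s):", ""],
  ["", "DATA TO COLLECT AAA", "", ""],
  ["", "DATA TO COLLECT BBB", "", ""],
  ["", "DATA TO COLLECT CCC", "", ""],
  ["Customer:"], // remaining cells of this row are elided in the question
];

console.log(selectDataCells(rows));
```

The "Customer:" row itself is excluded because no later row contains "Customer:", which is exactly why the expression stops there.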

Need query for XPath that finds all <tr> elements that contain 7 <td> elements

Hello, and thanks in advance for the help.
Honestly, I am not very experienced with XPath, and I am hoping a guru out there will have a quick answer for me.
I am scraping a web page for data. What defines the data I want is that it is contained in a row (<tr>) that has 7 <td> elements. Each <td> element holds one of the pieces of data I need to import. I am using the HTML Agility Pack from CodePlex to grab the data, but I can't figure out how to define the query.
Contained in the web page is a section like this:
<table border="0" cellpadding="3" cellspacing="1" width="100%">
<tr class="bgWhite" xmlns:msxsl="urn:schemas-microsoft-com:xslt">
<td class="dataHdrText02" valign="top" width="50" align="center"><nobr>SYMBOL</nobr></td>
<td class="dataHdrText02" valign="top" align="center">PERIOD</td>
<td class="dataHdrText02" valign="top" align="center" width="*">EVENT TITLE</td>
<td class="dataHdrText02" valign="top" align="center">EPS ESTIMATE</td>
<td class="dataHdrText02" valign="top" align="center">EPS ACTUAL</td>
<td class="dataHdrText02" valign="top" align="center">PREV. YEAR ACTUAL</td>
<td class="dataHdrText02" valign="top" align="center"><nobr>DATE/TIME (ET)</nobr></td>
</tr>
<tr class="bgWhite">
<td align="center" width="50"><nobr>CSCO </nobr></td>
<td align="center">Q4 2011</td>
<td align="left" width="*">Q4 2011 CISCO Systems Inc Earnings Release</td>
<td align="center">$ 0.38 </td>
<td align="center">n/a </td>
<td align="center">$ 0.43 </td>
<td align="center"><nobr>10-Aug-11</nobr></td>
</tr>
<tr class="bgWhite">
<td align="center" width="50"><nobr>CSCO  </nobr></td>
<td align="center">Q3 2011</td>
<td align="left" width="*">Q3 2011 Cisco Systems Earnings Release</td>
<td align="center">$ 0.37 </td>
<td align="center">$ 0.42 </td>
<td align="center">$ 0.42 </td>
<td align="center"><nobr>11-May-11 AMC</nobr></td>
</tr>
<tr class="bgWhite" xmlns:msxsl="urn:schemas-microsoft-com:xslt">
<td align="center" colspan="7"><img src="/format/cb/images/spacer.gif" width="1" height="4"></td>
</tr>
</table>
My goal is to grab the earnings event data and place it into a database for analysis. My original idea was to grab all <tr> elements with 7 <td> elements and then work with that data. Any advice or alternative suggestions are welcome.
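This should do it for you:
//tr[count(td)=7]
Note that this also matches the header row, which has seven <td> cells too; the spacer row at the end has a single <td colspan="7"> and is excluded.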
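To see which rows count(td)=7 picks up in this table, here is a rough JavaScript illustration. It is deliberately not a parser: the hypothetical rowsWithCellCount helper uses regular expressions, which only cope with simple, non-nested markup like the trimmed-down sample below. With HTML Agility Pack you would instead pass the XPath expression to SelectNodes.

```javascript
// Rough illustration only: regexes are not an HTML parser and will break on
// nested tables or "<tr" inside comments/attributes.
function rowsWithCellCount(html, n) {
  const rows = [];
  const rowPattern = /<tr\b[^>]*>([\s\S]*?)<\/tr>/gi;
  for (const match of html.matchAll(rowPattern)) {
    // Count opening <td> tags in this row's markup.
    const cellCount = (match[1].match(/<td\b/gi) || []).length;
    if (cellCount === n) rows.push(match[1]);
  }
  return rows;
}

// Trimmed-down version of the table from the question.
const html = `
<table>
  <tr class="bgWhite"><td>SYMBOL</td><td>PERIOD</td><td>EVENT TITLE</td><td>EPS ESTIMATE</td><td>EPS ACTUAL</td><td>PREV. YEAR ACTUAL</td><td>DATE/TIME (ET)</td></tr>
  <tr class="bgWhite"><td>CSCO</td><td>Q4 2011</td><td>Q4 2011 CISCO Systems Inc Earnings Release</td><td>$ 0.38</td><td>n/a</td><td>$ 0.43</td><td>10-Aug-11</td></tr>
  <tr class="bgWhite"><td colspan="7"><img src="/format/cb/images/spacer.gif" width="1" height="4"></td></tr>
</table>`;

console.log(rowsWithCellCount(html, 7).length); // 2 (header row + data row)
```

The spacer row is skipped because it has one <td> with colspan="7", not seven <td> elements; count() counts elements, not spanned columns.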
