pandoc: convert HTML table to DOCX

I have a very simple HTML document with a table:
<!DOCTYPE html>
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>
<title>Analysis</title>
</head>
<body>
<TABLE border=1>
<TR> <TD> 18.365 </TD> <TD> 1 </TD> </TR>
<TR> <TD> 23.465 </TD> <TD> 1 </TD> </TR>
<TR> <TD> 26.020 </TD> <TD> 1 </TD> </TR>
<TR> <TD> 14.371 </TD> <TD> 1 </TD> </TR>
<TR> <TD> 17.258 </TD> <TD> 1 </TD> </TR>
</TABLE>
</body>
</html>
and I would like to create a DOCX file from it using pandoc. In the resulting file, however, the table is completely messed up. Can anyone please help me with a working example? This is the last step in a complex workflow, and I assume that converting a table should be possible.
Pandoc version: 1.12.4.2

It's a regression that has already been fixed in the development version
(https://github.com/jgm/pandoc/issues/1341). You can install the development
version from source or revert to a package for 1.12.3.3. This will be fixed in the next pandoc release.
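Once a pandoc build without the regression is installed, the conversion itself is a single command. Below is a minimal sketch of how it could be driven from Python. The file names `analysis.html` and `analysis.docx` and both helper function names are assumptions made for illustration; only the standard `--from`, `--to` and `--output` options of pandoc are used.

```python
import subprocess


def build_pandoc_command(html_path, docx_path):
    """Return the argument list for converting an HTML file to DOCX with pandoc."""
    return [
        "pandoc",
        "--from", "html",       # input format
        "--to", "docx",         # output format
        "--output", docx_path,  # file to write
        html_path,              # file to read
    ]


def convert_html_to_docx(html_path, docx_path):
    """Run pandoc; raises CalledProcessError if pandoc reports a failure."""
    subprocess.run(build_pandoc_command(html_path, docx_path), check=True)
```

Calling `convert_html_to_docx("analysis.html", "analysis.docx")` requires the `pandoc` executable to be on the PATH.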

Related

Outlook can't parse multiple tables from HTML with pywin32

I am trying to convert an HTML file to a .msg file, but the conversion stops when it reaches the third table tag in the HTML.
I searched for this problem but didn't find anything; it seems I am the only one who has run into it.
This is the example HTML code:
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title>Title</title>
</head>
<body>
<h1>This Email is using for understand outlook mail synthesis</h1>
<h2>0001</h2>
<table>
<tr>
<td>This is image001.jpg</td>
</tr>
<tr>
<td>
// stop parsing after parse this table
<table>
<tr>
<td>
<img src="cid:image001.jpg" alt="">
</td>
</tr>
</table>
</td>
</tr>
<tr>
<td>This is image002.jpg</td>
</tr>
<tr>
<td>
<table>
<tr>
<td>
<img src="cid:image002.jpg" alt="">
</td>
</tr>
</table>
</td>
</tr>
<tr>
<td>This is image003.jpg</td>
</tr>
<tr>
<td>
<table>
<tr>
<td>
<img src="cid:image003.jpg" alt="">
</td>
</tr>
</table>
</td>
</tr>
<tr>
<td>This is image004.jpg</td>
</tr>
<tr>
<table>
<tr>
<td>
<img src="cid:image004.png" alt="">
</td>
</tr>
</table>
</tr>
</table>
<h1>No!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!</h1>
</body>
</html>
And this is my Python code:
from win32com import client as win32
import os
outlook = win32.Dispatch("outlook.application")
mail = outlook.CreateItem(0)
mail.Subject = "This is a subject"
with open(".\\mix.html", "r", encoding="utf-8") as f:
    html = f.read()
mail.HtmlBody = html
current_path = os.getcwd()
at = mail.Attachments.Add(current_path + "\\image001.jpg")
at.PropertyAccessor.SetProperty("http://schemas.microsoft.com/mapi/proptag/0x3712001F", "image001.jpg")
at = mail.Attachments.Add(current_path + "\\image002.jpg")
at.PropertyAccessor.SetProperty("http://schemas.microsoft.com/mapi/proptag/0x3712001F", "image002.jpg")
at = mail.Attachments.Add(current_path + "\\image003.jpg")
at.PropertyAccessor.SetProperty("http://schemas.microsoft.com/mapi/proptag/0x3712001F", "image003.jpg")
at = mail.Attachments.Add(current_path + "\\image004.png")
at.PropertyAccessor.SetProperty("http://schemas.microsoft.com/mapi/proptag/0x3712001F", "image004.png")
mail.SaveAs(current_path + "\\rst.msg")
This is what I see when I open the "rst.msg" file:
(Screenshot: parsing stops after the table.)
I then deleted the table in the second tr tag and ran the Python script again. This is what I got:
(Screenshot: parsing stops again after the table.)
This is the HTML code with the table in the second tr tag deleted:
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title>Title</title>
</head>
<body>
<h1>This Email is using for understand outlook mail synthesis</h1>
<h2>0001</h2>
<table>
<tr>
<td>This is image001.jpg</td>
</tr>
<tr>
<td>
</td>
</tr>
<tr>
<td>This is image002.jpg</td>
</tr>
<tr>
<td>
// stop parsing after parse this table
<table>
<tr>
<td>
<img src="cid:image002.jpg" alt="">
</td>
</tr>
</table>
</td>
</tr>
<tr>
<td>This is image003.jpg</td>
</tr>
<tr>
<td>
<table>
<tr>
<td>
<img src="cid:image003.jpg" alt="">
</td>
</tr>
</table>
</td>
</tr>
<tr>
<td>This is image004.jpg</td>
</tr>
<tr>
<table>
<tr>
<td>
<img src="cid:image004.png" alt="">
</td>
</tr>
</table>
</tr>
</table>
<h1>No!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!</h1>
</body>
</html>
Hope you can help me! Thank you very much!
This problem was solved by upgrading Outlook. In the newer version, Outlook generates the email as expected. Note that if you add the pictures referenced by the HTML to the email and then save it, the pictures are added to the email as attachments. If you send the email right after generating it, without saving it first, the recipient does not see those attachments, which is probably what you want.
In short, upgrading Outlook is one way to solve this problem.
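Independently of the Outlook version, the four near-identical attachment calls in the question's script can be folded into a loop. The following is only a sketch: the helper name `attach_inline_images` and the constant name are made up for illustration, and the function assumes it receives the `mail` object returned by `outlook.CreateItem(0)`. The property tag is the one already used in the question.

```python
import os

# MAPI property tag for the attachment content ID, as used in the question's script.
PR_ATTACH_CONTENT_ID = "http://schemas.microsoft.com/mapi/proptag/0x3712001F"


def attach_inline_images(mail, folder, filenames):
    """Attach each file and set its content ID so <img src="cid:NAME"> can refer to it."""
    for name in filenames:
        attachment = mail.Attachments.Add(os.path.join(folder, name))
        attachment.PropertyAccessor.SetProperty(PR_ATTACH_CONTENT_ID, name)
```

With pywin32 this would be called as `attach_inline_images(mail, os.getcwd(), ["image001.jpg", "image002.jpg", "image003.jpg", "image004.png"])` before `mail.SaveAs(...)`.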

Laravel email verification sent with HTML tags

I'm trying to send an email, but when it is received it shows its HTML tags:
<tr> <td class="header"> TikTak </td> </tr> <tr> <td> <table
class="footer" align="center" width="570" cellpadding="0" cellspacing="0"> <tr> <td class="content-
cell" align="center"> <p>é 2019 êÃÂçÃÂàíÃÂÃÂÃÂ
èñçàRabter ÃÂíÃÂÃÂø ÃÂàèçôï</p> </td> </tr>
</table> </td> </tr>
I do have \Blade::setEchoFormat('e(utf8_encode(%s))'); in my AppServiceProvider, and I also changed {{ }} to {!! !!} in the markdown folder and in the html folder, but unfortunately that did NOT fix it.
It was working properly, and suddenly its output turned into this mess.
Thanks for any help.
You need to put a <meta charset="utf-8"> tag in your email template.
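The repeated "Ã" sequences in the garbled footer are typical of UTF-8 text whose bytes were read as ISO-8859-1 (Latin-1) one or more times. Which layer does that in this setup is an assumption: it could be a mail client guessing the wrong charset, or a `utf8_encode` call applied to text that is already UTF-8. The small Python sketch below only reproduces the symptom and shows that it is reversible; the function names and the sample string are invented for illustration.

```python
def misread_utf8_as_latin1(text):
    """Encode text as UTF-8, then decode those bytes as ISO-8859-1 (Latin-1)."""
    return text.encode("utf-8").decode("latin-1")


def undo_misread(text):
    """Reverse one round of misreading; raises if text is not that kind of mojibake."""
    return text.encode("latin-1").decode("utf-8")


original = "\u00a9 2019"                  # copyright sign followed by " 2019"
once = misread_utf8_as_latin1(original)   # one stray character appears before the sign
twice = misread_utf8_as_latin1(once)      # the "Ã..." pattern seen in the broken mail

print(ascii(once))
print(ascii(twice))
print(undo_misread(undo_misread(twice)) == original)
```

Each extra round of misreading adds more stray characters in front of every non-ASCII character, so the length of the garbage hints at how many times the text was re-encoded.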

Get a cell that is in a table before the current table

See the HTML below. I have a series of tables that include rows with the attribute name="laneStop". I can select those rows like this in the Chrome dev console:
$x("/html[1]/body[1]//TR[@name='laneStop']")
However, I also need the 2nd cell of the 2nd row of the 1st table ABOVE these rows, e.g. the value
abc_123_florida-45
Here is the HTML. What is a way to refer to the value above, given that I'm getting the "laneStop" rows first?
<!DOCTYPE html>
<html lang="en" xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta charset="utf-8" />
<title></title>
</head>
<body>
<table border="1">
<tbody>
<tr>
<td>
<table>
<tbody>
<tr>
<td>
<table>
<tbody>
<tr>
<td>Date</td>
<td>11/15/2019</td>
</tr>
<tr>
<td>shipment number</td>
<td>abc_123_florida-45</td>
</tr>
<tr>
<td>Departure time:</td>
<td>0430</td>
</tr>
</tbody>
</table>
</td>
<td>
<table>
<tbody>
<tr>
<td>Time arrival</td>
<td>1715</td>
</tr>
<tr>
<td>customer</td>
<td>bob smith</td>
</tr>
<tr>
<td>box type</td>
<td>square</td>
</tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
<table border="1">
<tbody>
<tr>
<td>
<table>
<tbody>
<tr name="laneStop">
<td>box1</td>
<td>23.45</td>
<td>lane1</td>
<td>south</td>
</tr>
<tr name="laneStop">
<td>box2</td>
<td>17.14</td>
<td>lane1</td>
<td>south</td>
</tr>
<tr name="laneStop">
<td>box3</td>
<td>17.18</td>
<td>lane1</td>
<td>north</td>
</tr>
<tr name="laneStop">
<td>box2</td>
<td>199.14</td>
<td>lane1</td>
<td>west</td>
</tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
</body>
</html>
Try the following xpath.
//td[text()='shipment number']/following::td[1]
If you want to travel from your current node (i.e., the "laneStop" rows), one way to do that is to use this xpath expression:
./preceding-sibling::*/ancestor::*[6]/preceding-sibling::table[1]//tr[1]/td[1]/table[1]//td[1]//tr[2]/td[2]
I'm curious to see if it works for you.
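For readers who want to try the label-based lookup outside the browser, here is a minimal sketch using Python's standard `xml.etree.ElementTree`. It assumes the markup is well-formed and has no XML namespace. The `SAMPLE` string is a shortened stand-in for the page in the question. ElementTree supports only a subset of XPath (no `following::` or `preceding-sibling::` axes), so the value cell is picked by index from the row that holds the label.

```python
import xml.etree.ElementTree as ET

# Shortened, namespace-free stand-in for the document in the question.
SAMPLE = """<body>
<table><tbody><tr><td>
  <table><tbody>
    <tr><td>Date</td><td>11/15/2019</td></tr>
    <tr><td>shipment number</td><td>abc_123_florida-45</td></tr>
  </tbody></table>
</td></tr></tbody></table>
<table><tbody><tr><td>
  <table><tbody>
    <tr name="laneStop"><td>box1</td><td>23.45</td></tr>
    <tr name="laneStop"><td>box2</td><td>17.14</td></tr>
  </tbody></table>
</td></tr></tbody></table>
</body>"""

root = ET.fromstring(SAMPLE)

# Rows carrying the name attribute, as in the question.
lane_rows = root.findall(".//tr[@name='laneStop']")

# The row with a td child whose text is exactly the label; its 2nd cell is the value.
label_row = root.find(".//tr[td='shipment number']")
shipment_number = label_row.findall("td")[1].text

print(shipment_number, len(lane_rows))
```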

How to get the table immediately previous to current table row

Say I get a list of rows like this:
var table_stop_rows = (from r in doc.Descendants("TR").Cast<HtmlNode>()
where r.Attributes["name"]?.Value == "laneStop"
select r).ToList();
Now, for each of those "laneStop" rows, I want to refer back to the smaller table containing the "shipment_number" field and read its corresponding node value, e.g. "abc_123_florida-45". However, I can't simply get a list of all rows that contain a shipment_number: each one has to be in the table that precedes the "laneStop" row in the row collection I'm getting.
I suppose my question is this: if I have a collection of rows, can I use an XPath expression relative to each row to get back to the shipment_number field in the preceding table?
Here is the HTML document. Note that there would be dozens of these "table pairs". Since I can't control the structure of these files, I need a way to extract the data from the existing structure.
<html lang="en" xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta charset="utf-8" />
<title></title>
</head>
<body>
<table border="1">
<tbody>
<tr>
<td>
<table>
<tbody>
<tr>
<td>
<table>
<tbody>
<tr>
<td>Date</td>
<td>11/15/2019</td>
</tr>
<tr>
<td>shipment number</td>
<td>abc_123_florida-45</td>
</tr>
<tr>
<td>Departure time:</td>
<td>0430</td>
</tr>
</tbody>
</table>
</td>
<td>
<table>
<tbody>
<tr>
<td>Time arrival</td>
<td>1715</td>
</tr>
<tr>
<td>customer</td>
<td>bob smith</td>
</tr>
<tr>
<td>box type</td>
<td>square</td>
</tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
<table border="1">
<tbody>
<tr>
<td>
<table>
<tbody>
<tr name="laneStop">
<td>box1</td>
<td>23.45</td>
<td>lane1</td>
<td>south</td>
</tr>
<tr name="laneStop">
<td>box2</td>
<td>17.14</td>
<td>lane1</td>
<td>south</td>
</tr>
<tr name="laneStop">
<td>box3</td>
<td>17.18</td>
<td>lane1</td>
<td>north</td>
</tr>
<tr name="laneStop">
<td>box2</td>
<td>199.14</td>
<td>lane1</td>
<td>west</td>
</tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
</body>
</html>
Try this XPath expression:
(//tr[@name="laneStop"]/ancestor::table/preceding-sibling::table//tr[2]/td[2])[1]
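When a file contains dozens of such table pairs, an alternative to a relative XPath is to walk the rows in document order and remember the most recent shipment number. The sketch below shows that idea with Python's standard `xml.etree.ElementTree` rather than HtmlAgilityPack. It assumes well-formed, namespace-free markup. The function name `pair_rows_with_shipment`, the shortened `SAMPLE`, and the second shipment number `xyz_999_texas-01` are all invented for illustration.

```python
import xml.etree.ElementTree as ET

# Two shortened "table pairs"; the second shipment number is a made-up value.
SAMPLE = """<body>
<table><tbody><tr><td><table><tbody>
  <tr><td>Date</td><td>11/15/2019</td></tr>
  <tr><td>shipment number</td><td>abc_123_florida-45</td></tr>
</tbody></table></td></tr></tbody></table>
<table><tbody><tr><td><table><tbody>
  <tr name="laneStop"><td>box1</td><td>23.45</td></tr>
  <tr name="laneStop"><td>box2</td><td>17.14</td></tr>
</tbody></table></td></tr></tbody></table>
<table><tbody><tr><td><table><tbody>
  <tr><td>shipment number</td><td>xyz_999_texas-01</td></tr>
</tbody></table></td></tr></tbody></table>
<table><tbody><tr><td><table><tbody>
  <tr name="laneStop"><td>box9</td><td>5.00</td></tr>
</tbody></table></td></tr></tbody></table>
</body>"""


def pair_rows_with_shipment(root):
    """Return (shipment number, first cell) for every laneStop row, in document order."""
    current_shipment = None
    pairs = []
    for tr in root.iter("tr"):  # iter() visits elements in document order
        cells = [(td.text or "").strip() for td in tr.findall("td")]
        if len(cells) >= 2 and cells[0] == "shipment number":
            current_shipment = cells[1]  # remember the latest header value
        elif tr.get("name") == "laneStop":
            pairs.append((current_shipment, cells[0]))
    return pairs


pairs = pair_rows_with_shipment(ET.fromstring(SAMPLE))
print(pairs)
```

The single pass avoids evaluating an ancestor/preceding-sibling expression once per row, at the cost of relying on the header table always appearing before its rows.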

Ruby list files in remote http server

I have a files listing page on a remote server, say http://myserver.com/uploads. How can I get the list of files using Ruby, preferably with net-http only?
This is the HTML code of the page:
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
<!-- saved from url=(0025)http://myserver.com/uploads/ -->
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<title>Index of /uploads</title>
</head>
<body>
<h1>Index of /uploads</h1>
<table>
<tbody>
<tr>
<th><img src="./Index of uploads_files/blank.gif" alt="[ICO]"></th>
<th>Name</th>
<th>Last modified</th>
<th>Size</th>
<th>Description</th></tr><tr><th colspan="5"><hr></th>
</tr>
<tr>
<td valign="top"><img src="./Index of uploads_files/back.gif" alt="[DIR]"></td>
<td>Parent Directory</td>
<td> </td>
<td align="right"> - </td>
<td> </td>
</tr>
<tr>
<td valign="top"><img src="./Index of uploads_files/compressed.gif" alt="[ ]"></td>
<td>Backup_201305281256.tar.gz</td>
<td align="right">28-May-2013 18:00 </td>
<td align="right"> 13M</td><td> </td>
</tr>
<tr><th colspan="5"><hr></th></tr>
</tbody>
</table>
<address>Apache/2.2.22 (Ubuntu) Server at myserver.com Port 80</address>
</body>
</html>
What you see is an HTML page, generated by the HTTP server, that links to the files.
You'll need to parse this HTML to get the list of files, or use a regex to match the URIs.
Take a look at the URI regex.
