How can I have Sphinx tables fit to width? - python-sphinx

Consider this table, where pyeval is a macro that evaluates an expression and replaces it with its value (so I can avoid hardcoding values in the documentation):
======================= ===========================================
Subsystem               Default path
======================= ===========================================
:pyeval:`constants.FOO` :pyeval:`pathutils.DEFAULT_FOO_STORAGE_DIR`
:pyeval:`constants.BAR` :pyeval:`pathutils.DEFAULT_BAR_STORAGE_DIR`
:pyeval:`constants.BAZ` :pyeval:`pathutils.DEFAULT_BAZ_STORAGE_DIR`
======================= ===========================================
This renders with this HTML:
<table border="1" class="docutils">
<colgroup>
<col width="40%">
<col width="60%">
</colgroup>
<thead valign="bottom">
<tr class="row-odd">
<th class="head">Subsystem</th>
<th class="head">Default storage path</th>
</tr>
</thead>
<tbody valign="top">
<tr class="row-even">
<td><tt class="docutils literal"><span class="pre">foo</span></tt></td>
<td><tt class="docutils literal"><span class="pre">/srv/badp/foo-path/</span></tt></td>
</tr>
<tr class="row-odd">
<td><tt class="docutils literal"><span class="pre">bar</span></tt></td>
<td><tt class="docutils literal"><span class="pre">/srv/badp/bar-path/</span></tt></td>
</tr>
<tr class="row-even"><td><tt class="docutils literal">
<span class="pre">baz</span></tt></td>
<td><tt class="docutils literal"><span class="pre">/var/run/badp/baz-path/</span></tt></td>
</tr>
</tbody>
</table>
Because of the macro, the width I have to give the Subsystem column in the source is only slightly smaller than the width the Default path column gets, even though its rendered contents are much shorter. Since Sphinx tries to be "helpful", it transfers the ratio of widths in the source file to the HTML page (notice the colgroup tag) and the result is quite uneven:
Notice that Chrome (just like Firefox) "helpfully" breaks at the hyphenation point and, since this is a path, I don't get to change hyphens to non-breaking hyphens; people are just too likely to copy-paste these values.
If I remove the colgroup element, however, I get the table I want.
How can I tell Sphinx to please be less smart with my table?

I too have run into this problem. Reading the docutils source, it appears that the colgroup widths are calculated using the number of dashes for the column in the separator lines for grid tables and the number of characters in the longest column entry for the column in simple tables as used here.
An attempt to write a custom directive to generate a table without a colgroup ran into what appears to be a bug in docutils in that later processing of the generated elements expects a colgroup to be present.
One technique I have used is to use aliases to create data items that are closer in length to their real text. For example:
.. |FOO| replace:: :pyeval:`constants.FOO`
which helps but isn't perfect.
An experiment disabling the colgroup element using the following CSS
colgroup { display: none; }
worked perfectly in Firefox but hid the entire table in IE9, so clearly this isn't an acceptable solution either.

What seems to work (at least in Firefox) is to reset the col widths:
table.docutils col {
    width: auto;
}
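To get a rule like this into the generated pages, one option is to ship it in a custom stylesheet. The following is a minimal sketch of the relevant conf.py lines, assuming the rule is saved as _static/custom.css (an illustrative file name, not something from the question) and a Sphinx version that supports html_css_files (1.8 or later):

```python
# conf.py (fragment)
# Assumption: the CSS rule lives in _static/custom.css next to conf.py.
# The file name is only illustrative.

# Directories whose contents are copied into the HTML output's _static folder.
html_static_path = ["_static"]

# Extra stylesheets linked from every generated page
# (paths are relative to the entries in html_static_path).
html_css_files = ["custom.css"]
```

Because the stylesheet is loaded by every page, the rule would apply to all docutils tables in the project, which may or may not be what you want.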

Related

How to get the count number under tbody, use xpath in UFT

I am using UFT to write an automation script, and I want to count the 'tr' elements under one tbody element and return the result.
<tbody role="rowgroup">
<tr _calss="01">
<tr _calss="02">
<tr _calss="03">
<tr _calss="04">
<tr _calss="05">
</tbody>
How to write the code in UFT?
Since it is unclear whether there can be more than one tbody, you could use this for the first one:
count((//tbody)[1]/tr)
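To see what that expression counts, here is the same idea in Python rather than UFT, purely as an illustration. It uses only the standard library's xml.etree.ElementTree, which supports a subset of XPath and has no count() function, so the rows are counted with len(). The sample markup is a well-formed stand-in for the snippet in the question, and the function name is hypothetical:

```python
import xml.etree.ElementTree as ET

# Well-formed stand-in for the markup in the question (rows closed,
# attribute name kept exactly as posted).
SAMPLE = """
<table>
  <tbody role="rowgroup">
    <tr _calss="01"/>
    <tr _calss="02"/>
    <tr _calss="03"/>
    <tr _calss="04"/>
    <tr _calss="05"/>
  </tbody>
</table>
"""


def count_rows_in_first_tbody(markup: str) -> int:
    """Count the direct <tr> children of the first <tbody> in the markup."""
    root = ET.fromstring(markup)
    first_tbody = root.find(".//tbody")  # first match in document order
    if first_tbody is None:
        return 0
    return len(first_tbody.findall("tr"))  # direct children only


row_count = count_rows_in_first_tbody(SAMPLE)
print(row_count)  # 5
```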

correct way to scrape this table (using scrapy / xpath)

Given a table (unknown number of <tr> but always three <td>, and sometimes containing a strikethrough (<s>) of the first element which should be captured as additional item (with value 0 or 1))
<table id="my_id">
<tr>
<td>A1</td>
<td>A2</td>
<td>A3</td>
</tr>
<tr>
<td><s>B1</s></td>
<td>B2</td>
<td>B3</td>
</tr>
...
</table>
Where scraping should yield [[A1,A2,A3,0],[B1,B2,B3,1], ...], I currently try along those lines:
my_xpath = response.xpath("//table[@id='my_id']")
for my_cell in my_xpath.xpath(".//tr"):
    print('record 0:', my_cell.xpath(".//td")[0])
    print('record 1:', my_cell.xpath(".//td")[1])
    print('record 2:', my_cell.xpath(".//td")[2])
In principle it works (e.g. by adding a pipeline after add_xpath()), but I am sure there is a more natural and elegant way to do this.
Try contains:
my_xpath = response.xpath("//table[contains(@id, 'my_id')]").getall()
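For the row-by-row part of the question, a minimal sketch of one way to build the [[A1,A2,A3,0],[B1,B2,B3,1], ...] structure is shown below. It deliberately uses the standard library's xml.etree.ElementTree instead of Scrapy so that it runs on its own. The sample markup is a well-formed version of the table from the question, and scrape_rows is a hypothetical helper name:

```python
import xml.etree.ElementTree as ET

# Well-formed version of the table from the question (the "..." rows omitted).
SAMPLE = """
<html><body>
<table id="my_id">
  <tr><td>A1</td><td>A2</td><td>A3</td></tr>
  <tr><td><s>B1</s></td><td>B2</td><td>B3</td></tr>
</table>
</body></html>
"""


def scrape_rows(markup, table_id="my_id"):
    """Return one list per row: the cell texts plus a 0/1 strikethrough flag."""
    root = ET.fromstring(markup)
    table = root.find(f".//table[@id='{table_id}']")
    if table is None:
        return []
    rows = []
    for tr in table.findall(".//tr"):
        cells = tr.findall("td")
        if not cells:
            continue  # skip rows without data cells
        # itertext() also collects text nested inside <s>.
        texts = ["".join(td.itertext()).strip() for td in cells]
        # Flag is 1 when the first cell contains an <s> element.
        struck = 1 if cells[0].find(".//s") is not None else 0
        rows.append(texts + [struck])
    return rows


rows = scrape_rows(SAMPLE)
print(rows)  # [['A1', 'A2', 'A3', 0], ['B1', 'B2', 'B3', 1]]
```

With Scrapy selectors the same structure should carry over (loop over the rows, read each cell's text, test the first cell for an <s> descendant), although the individual calls would differ.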

XPath to select specific text inside of text block

I am trying to figure out a way to pull specific values out of a big long text block.
So far I have //td[@class="PadLeft10"], which returns a big long value starting with the company name and ending with the "View More Info" piece.
I am trying to break my results up into segments, so for example I want my code to look for the words "Primary Contact:" and then return the text that follows that, ending at the <br/>.
I need to get the Company Name, which is always the first bit of text, then the Primary Contact, then the Address, then the Phone and Fax, then the Website, and the Organization type.
The problem is that not every record has all the values. As you can see, the second entry has the address and website, but the first one doesn't.
I am using the Dataminer Chrome Plugin, for anyone familiar with that. It has separate xpath for rows and columns, so I am going to try to make a bunch of different columns that correspond to each of the fields that I am looking for.
Any direction would be greatly appreciated.
<td align="left" valign="top" width="2%">
<script>
if (0 == 1) document.write('<img src="https://website.com" border="0" alt=""/>');
</script>
<br/><br/></td>
<td class="PadLeft10" align="left" valign="top" width="32%" style="padding-left: 15px;">
<span style="font-weight: bold;font-size: 12pt;"><br/>Company Name Here</span><br/>Primary Contact: Mr. Eric Cartman <br/>Phone: (555) 555-5555<br/>Fax: (333) 333-3333<span style="text-decoration: underline;color: #0000ff"></span><br/>Organization Type: Distributor Branch
<br/>
» View More Info<br/>
<br/>
</td>
<td align="left" valign="top" width="2%">
<script>
if (0 == 1) document.write('<img src="https://website.com" border="0" alt=""/>');
</script>
<br/><br/></td>
<td class="PadLeft10" align="left" valign="top" width="32%" style="padding-left: 15px;">
<span style="font-weight: bold;font-size: 12pt;"><br/>Other Company</span><br/>Primary Contact: Mr. Jimmy Valmer<br/>100 N Ohio St 2rd Fl<br/>Rochester, IN 54225<br/>United States<br/>Phone: (888) 888-8888<br/>Fax: (999) 999-9999<span style="text-decoration: underline;color: #0000ff"><br/>Web Site: http://www.companywebsite.com</span><br/>Organization Type: Financial Service
<br/>
» View More Info<br/>
<br/>
</td>
</tr>
<tr>
I am new to XPath, but the least I can say is this: if you are the creator of the HTML, you really should change it to be more structured,
like: Primary Contact:<span id/class='primaryContact'>..</span>
Otherwise, you can get the elements with this selector (adjust as needed) //td[@class="PadLeft10"]//child::span//following-sibling::text()[1], split the result on ':' and proceed from there, but this is only a makeshift solution.
Any direction would be greatly appreciated.
As for a direction: the sections within the table cell that you mention are neither nested DOM items nor sibling-type DOM nodes. They are sequential HTML elements that require special processing.
<br/>Company Name Here</span>
<br/>Primary Contact: Mr. Eric Cartman
<br/>Phone: (555) 555-5555
<br/>...
Both xpath and regex can be leveraged for such a case.
You can select the text node you're looking for using a predicate and the contains function:
//td[@class="PadLeft10"]/text()[contains(., "Primary Contact:")]
Then you can get the substring using the substring-after function:
substring-after(
    //td[@class="PadLeft10"]/text()[contains(., "Primary Contact:")],
    'Primary Contact:'
)
And remove leading and trailing whitespace using normalize-space:
normalize-space(
    substring-after(
        //td[@class="PadLeft10"]/text()[contains(., "Primary Contact:")],
        'Primary Contact:'
    )
)
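The three steps (find the text node, take what follows the label, trim whitespace) can also be sketched outside XPath. The snippet below is an illustration in Python using only the standard library. The sample is a trimmed, well-formed copy of the first cell from the question, and field_after is a hypothetical helper name. A missing field yields an empty string, which is also what substring-after returns when there is no match:

```python
import xml.etree.ElementTree as ET

# Trimmed, well-formed copy of the first cell from the question.
SAMPLE = """
<td class="PadLeft10">
<span><br/>Company Name Here</span><br/>Primary Contact: Mr. Eric Cartman <br/>Phone: (555) 555-5555<br/>Fax: (333) 333-3333<br/>Organization Type: Distributor Branch
<br/>
</td>
"""


def direct_text_nodes(cell):
    """Text that is a direct child of the cell, like td/text() in XPath."""
    chunks = [cell.text] + [child.tail for child in cell]
    return [chunk for chunk in chunks if chunk]


def field_after(cell, label):
    """Return the text following `label` in the first chunk that contains it."""
    for chunk in direct_text_nodes(cell):
        if label in chunk:                     # contains(., label)
            value = chunk.split(label, 1)[1]   # substring-after(., label)
            return " ".join(value.split())     # normalize-space(...)
    return ""


cell = ET.fromstring(SAMPLE)
contact = field_after(cell, "Primary Contact:")
phone = field_after(cell, "Phone:")
website = field_after(cell, "Web Site:")  # not present in this record
print(contact, phone, repr(website))
```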

XPath matching text in a table - Ruby - Nokogiri

I have a table that looks like this
<table cellpadding="1" cellspacing="0" width="100%" border="0">
<tr>
<td colspan="9" class="csoGreen"><b class="white">Bill Statement Detail</b></td>
</tr>
<tr style="background-color: #D8E4F6;vertical-align: top;">
<td nowrap="nowrap"><b>Bill Date</b></td>
<td nowrap="nowrap"><b>Bill Amount</b></td>
<td nowrap="nowrap"><b>Bill Due Date</b></td>
<td nowrap="nowrap"><b>Bill (PDF)</b></td>
</tr>
</table>
I am trying to create the XPath to find this table where it contains the text Bill Statement Detail. I want the entire table and not just the td.
Here is what I have tried so far:
page.parser.xpath('//table[contains(text(),"Bill")]')
page.parser.xpath('//table/tbody/tr[contains(text(),"Bill Statement Detail")]')
Any Help is appreciated
Thanks!
Your first XPath example is the closest in that you're selecting table. The second example, if it ever matched, would select tr—this one will not work mainly because, according to your example, the text you want is in a b node, not a tr node.
This solution is as vague as I could make it, because of *. If the target text will always be under b, change it to descendant::b:
//table[contains(descendant::*, 'Bill Statement Detail')]
This is as specific, given the example, as I can make:
//table[tr[1]/td/b = 'Bill Statement Detail']
You might want
//table[contains(descendant::text(),"Bill Statement Detail")]
The suggested expressions don't work well if the matching text is not in the first row. See the related post "Find a table containing specific text".
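To illustrate a match that does not depend on the row, here is a small sketch in Python (standard library only, not Nokogiri) that checks every piece of text under each table. The sample markup and its id values are made up for the example, with the target text placed in the second row on purpose, and tables_containing is a hypothetical helper name:

```python
import xml.etree.ElementTree as ET

# Made-up sample: the target text sits in the SECOND row of the second table.
SAMPLE = """
<body>
  <table id="other">
    <tr><td>Nothing relevant here</td></tr>
  </table>
  <table id="bills" cellpadding="1" cellspacing="0" width="100%" border="0">
    <tr><td><b>Some other heading</b></td></tr>
    <tr><td colspan="9" class="csoGreen"><b class="white">Bill Statement Detail</b></td></tr>
  </table>
</body>
"""


def tables_containing(root, needle):
    """Return every <table> element whose descendant text contains `needle`.

    Note: with nested tables, an outer table also matches when the text
    lives in an inner one.
    """
    return [
        table
        for table in root.iter("table")
        if needle in "".join(table.itertext())
    ]


root = ET.fromstring(SAMPLE)
matches = tables_containing(root, "Bill Statement Detail")
print([table.get("id") for table in matches])  # ['bills']
```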

<td> does not display full contents (Mozilla Firefox)

The code goes like this
<div id='blogbook'></div>
...
<script>
...
var z="<table>
<td>Blog title and date<br><hr></td>
<tr>
<td>A very long string consisting of many paragraphs, say, a blog</td></table>";
function disp(){
document.getElementById('blogbook').innerHTML=z;
}
disp();
</script>
The display comes out like this..
Blog title and date
A very long string consisting of
...(many many lines)...
many paragraphs, sa
The whole of the blog does not display, instead stops long before the actual end of the blog. Questions:
Why does this happen?
How does one solve this?
This problem occurs in Firefox (I'm using v7), but IE displays the complete blog just fine.
Your HTML markup is incorrect.
var z="<table>
<td>Blog title and date<br><hr></td>
<tr>
<td>A very long string consisting of many paragraphs, say, a blog</td></table>";
That string contains this markup:
<table>
<td>Blog title and date<br><hr></td>
<tr>
<td>A very long string consisting of many paragraphs, say, a blog</td>
</table>
It should be:
<table>
<tr>
<td>Blog title and date<br><hr></td>
</tr>
<tr>
<td>A very long string consisting of many paragraphs, say, a blog</td>
</tr>
</table>
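A quick way to spot this kind of mistake is to scan the markup for cells that are not inside a row. The following is a rough sketch in Python using the standard library's html.parser. The class and function names are hypothetical, and it ignores nested tables for simplicity:

```python
from html.parser import HTMLParser


class CellOutsideRowFinder(HTMLParser):
    """Counts <td>/<th> start tags that appear while no <tr> is open."""

    def __init__(self):
        super().__init__()
        self.in_row = False
        self.stray_cells = 0

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.in_row = True
        elif tag in ("td", "th") and not self.in_row:
            self.stray_cells += 1

    def handle_endtag(self, tag):
        # Closing a row or the whole table ends the current row.
        if tag in ("tr", "table"):
            self.in_row = False


def count_stray_cells(markup):
    finder = CellOutsideRowFinder()
    finder.feed(markup)
    finder.close()
    return finder.stray_cells


BROKEN = ("<table><td>Blog title and date<br><hr></td>"
          "<tr><td>A very long string</td></table>")
FIXED = ("<table><tr><td>Blog title and date<br><hr></td></tr>"
         "<tr><td>A very long string</td></tr></table>")

print(count_stray_cells(BROKEN))  # 1
print(count_stray_cells(FIXED))   # 0
```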
What's going on with this line: <div id='blogbook'></td>? You need to close the div. It's not semantically correct and may cause the browser to display the page incorrectly, e.g.
<div id='blogbook'></div></td>
Plus, you're not closing the table above, or you're not opening a new td if you're nesting tables.
