Nokogiri without DIV - ruby

I am trying to scrape information from an HTML table. There are multiple tables on the page. Before each table there is a paragraph with text. I want to key off of this text field ("CONSOLIDATED" text in the pastebin below) to identify the table since there are no DIV tags on the page and therefore no other way to uniquely identify the table. How would I do this? What XPath statement would I use? Here's a link to the HTML page: http://pastebin.com/HeapZvPV
Thanks!

Related

Sejda HTML to PDF : how to keep thead on all pages where a table is displayed?

I'm using SEJDA API with PHP to convert HTML to PDF.
In this HTML, I have multiple tables with a variable number of rows and columns, and each table has a variable height.
Sometimes one or more tables are long enough to span more than one page, so I can end up with a page that shows only tbody rows and no thead labels to tell what each column represents.
Is there a proper way to force the thead to be shown on each page where the table is displayed? The CSS "position: fixed" solution is not a good fit because there may be more than one table, and not every page contains a table.
Add this to your print styles:
thead {display: table-header-group;}
If you're still having issues, try https://docamatic.com. There is a dynamic table template (JSON to PDF).

download data from website using uipath RPA

I have automated the login and the navigation to the download page, which lists some PDFs that I want to download. The number of PDFs is dynamic: sometimes there are 10, sometimes 100, and it changes every day. I want to download all of them.
Please find the attached image.
I want to download each PDF by clicking the elements in column 3 (the hyperlinks highlighted in blue). The number of rows in the table is dynamic. How can I do this using UiPath?
Off the top of my head, and without knowing the application you are working in, I see a few different approaches you can try:
Approach 1: Extract table as Data Table
Perhaps you can extract the table as a Data Table, enumerate the rows and find the individual link selectors you then can pass to a click activity.
Approach 2: Dynamically manipulating the selector
Use UIExplorer to find the selector of the link in the third column. Typically the attribute idx is the unique identifier. You can create your own variable idx and increment it in a while loop, passing it into the Click activity's selector each time: "<your normal selector here someAttr='something' idx='" + idx.ToString + "' />"
This way, when the click fails with "selector not found", you have gone past the last row of the column and can exit the while loop.
Approach 3: Using Find Children
Another approach is to use the Find Children activity on the column or table to get its children, i.e. the rows. You need to know which filter to use; it is basically the selector.
Find Children outputs an IEnumerable<UIElement> that you can iterate over, passing each element to a Click activity.
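To illustrate Approach 2, here is a minimal Python sketch of the loop logic only; it is not UiPath code. The names SelectorNotFound, build_selector, click_all_links and fake_click are hypothetical, as are the selector template and the max_rows safety cap, and fake_click merely stands in for the real Click activity.

```python
class SelectorNotFound(Exception):
    """Hypothetical error: no element matches the selector."""


def build_selector(idx):
    # Hypothetical template; the tag name, someAttr and its value are placeholders.
    return "<elem someAttr='something' idx='" + str(idx) + "' />"


def click_all_links(click, start_idx=1, max_rows=10000):
    """Click idx = start_idx, start_idx + 1, ... until the selector is not found.

    `click` is any callable that raises SelectorNotFound when nothing matches.
    Returns the number of links clicked.
    """
    clicked = 0
    idx = start_idx
    while clicked < max_rows:  # safety cap so a faulty selector cannot loop forever
        try:
            click(build_selector(idx))
        except SelectorNotFound:
            break  # we have gone past the last row
        clicked += 1
        idx += 1
    return clicked


def fake_click(selector, rows=3):
    # Stand-in for the real click: pretend the table has `rows` rows.
    idx = int(selector.split("idx='")[1].split("'")[0])
    if idx > rows:
        raise SelectorNotFound(selector)


print(click_all_links(fake_click))  # prints 3
```

In a real workflow the try/except would correspond to a Try Catch around the Click activity, with the caught exception type depending on how the activity is configured.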
The shared image is a perfect case for scraping a table from a web page, which can be done with UiPath's Data Scraping Wizard; refer to this tutorial. The wizard converts your HTML table into a DataTable and takes care of the dynamic number of rows as well as pagination (if any).
Afterwards, you have to iterate over the DataTable (ForEach activity) and click each link to download the PDF files.

Display HTML content inside a table cell - BIRT report designer

I have a dataset in which one of the columns contains HTML tags. When I try to bind the data column to a cell inside a table, the data is displayed as it is: I see the HTML tags, like <br>, in the cell. Is there a way I can get rid of the tags and display the data with proper formatting?
Yes.
But a data item does not work for this.
You'll have to use a text item and, inside the text item, reference the data as row["MY_COLUMN"] (you know what I mean). It is important to switch the text item's Content Type from the default Plain to HTML.

Using schema.org microdata for Database Content

I was wondering if one can mark text fields with schema.org microdata for fields which will contain values retrieved from a database, but which are initially empty upon loading the web page. Basically, I have some fields that I would like to mark using microdata, which will not contain any data until the values are retrieved from a database, the population of which would be initiated by users.
The microdata format is about making data machine-readable.
You 'could' mark up the text fields but I'm not sure why you'd want to.
A search engine will only see the empty field, which it won't fill in, so that's no help.
It's possible that a browser could do something with that data after the user has filled it in, but I can't imagine what.
If your form is saved and presented on another page, then that page is a perfect candidate for microdata markup.

Sorting Dynamically generated HTML Table

I am generating a dynamic HTML table string and displaying it inside a div. I am assigning an ID and runat="server" in that string, e.g.
string s = "<table id='tblAll' runat='server'></table>";
This string is generated on a different page and is passed back as the XMLHttpRequest responseText.
I want to apply sorting to this table. It would be great if anyone could help me out.
Thanks
You can use the jQuery tablesorter plugin.
Use the jQuery tablesorter plugin, and instead of selecting the table via its ID, you can select it via the containing div and a descendant selector:
$("div#container table").tablesorter();
