Using XPATH to select the row *after* a row containing certain text - xpath

I can't work out whether this is possible. I have a basic table with a varying number of rows and varying data in it.
Assuming the table is one column wide and an arbitrary number of rows long, I can select a row containing the text "COW" with something very simple like:
table/tbody/tr[contains(td[1],"COW")]/td[1]
But let's say this table contains two types of data: a list of animals and, underneath each animal, a list of attributes, all in the same column, looking something like this:
COW
Horns = 2
Hooves = 4
Tail = 1
CHICKEN
Horns = 0
Hooves = 0
Tail = 1
Is there a way using XPath to first identify the row that contains the text COW and then select the row directly after it, returning the text "Horns = 2"?
Cheers

It seems that you want something like this:
table/tbody/tr[contains(td[1],"COW")]/following-sibling::tr[1]/td[1]
This selects the first td of the row immediately following each row whose first td contains COW. The [1] after following-sibling::tr restricts the axis to the nearest following sibling row.
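For readers driving this from Python, here is a minimal sketch of the same idea using only the standard library. It is an illustration, not the answer's method: xml.etree.ElementTree implements only a subset of XPath and has no following-sibling axis, so the "next row" step is done in Python instead. The sample markup and the helper name row_after are assumptions made for the example, and the helper returns only the first match.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample markup mirroring the one-column table in the question.
SAMPLE = """
<table><tbody>
  <tr><td>COW</td></tr>
  <tr><td>Horns = 2</td></tr>
  <tr><td>Hooves = 4</td></tr>
  <tr><td>Tail = 1</td></tr>
  <tr><td>CHICKEN</td></tr>
  <tr><td>Horns = 0</td></tr>
</tbody></table>
""".strip()


def row_after(table, needle):
    """Return the first-cell text of the row right after the first row
    whose first cell contains `needle`, or None if there is no such row."""
    rows = table.findall("./tbody/tr")
    # Pair every row with the row that directly follows it.
    for current, following in zip(rows, rows[1:]):
        first_cell = current.find("td")
        if first_cell is None:
            continue
        if needle in "".join(first_cell.itertext()):
            next_cell = following.find("td")
            return None if next_cell is None else "".join(next_cell.itertext())
    return None


table = ET.fromstring(SAMPLE)
print(row_after(table, "COW"))
```

With a full XPath 1.0 engine the expression from the answer can be used directly, so this workaround is only needed where the following-sibling axis is unavailable.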

Related

Remove all columns to the right of a specific column

I have an Excel template file with a dynamic number of columns that represent work week dates. Some users have decided to add their own subtotal columns to the right of those columns. I need a way to identify the first blank column, and then truncate that column and all columns following it.
I had previously been using the following script to remove all columns that begin with the word "Column":
// Create a list of columns that start with "Column" and remove them.
Removed_ColumnNum_Columns = Table.RemoveColumns(PreviousStepName, List.Select(Table.ColumnNames(PreviousStepName), each Text.StartsWith(_, "Column") )),
Once I can find the first ColumnXX column, I want to remove it and all columns after it.
You can use List.PositionOf to get your ColumnIndex instead of parsing text.
I'd put it together like this:
// [...]
ColumnList = Table.ColumnNames(#"Promoted Headers"),
ColumnXX = List.Select(ColumnList, each Text.StartsWith(_, "Column")){0},
ColumnIndex = List.PositionOf(ColumnList, ColumnXX),
ColumnsToKeep = List.FirstN(ColumnList, ColumnIndex),
FinalTable = Table.SelectColumns(#"Promoted Headers", ColumnsToKeep)
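The logic of the steps above (find the position of the first name that starts with "Column", then keep only the names before it) can be sketched outside Power Query as well. The following Python version is only an analogy operating on a plain list of column names; the function name columns_to_keep and the sample headers are assumptions. Unlike the M code, which raises an error at {0} when no column matches, this sketch keeps every column in that case.

```python
def columns_to_keep(column_names, prefix="Column"):
    """Return the names that come before the first name starting with `prefix`."""
    for index, name in enumerate(column_names):
        if name.startswith(prefix):
            # Everything from this position onward is dropped.
            return list(column_names[:index])
    # No auto-named column found: keep everything (an assumption of this sketch).
    return list(column_names)


# Hypothetical header row: work-week columns, a blank column that was
# auto-named "Column5", then user-added subtotal columns.
headers = ["Name", "WW01", "WW02", "WW03", "Column5", "Subtotal", "Column7"]
print(columns_to_keep(headers))
```

Because the position comes from the list itself, the result does not depend on the digits in the column name still matching the column's current position.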
Remove Columns after ColumnXX
Find the first column whose name begins with "Column" and delete that column and all columns following it. This parses the XX as the column index, so make sure you haven't deleted any columns before this step; for example, "Column35" must still be the 35th column at this point in the code.
// Find the first ColumnXX column and remove it and all columns to the right.
ColumnXX = List.Select(Table.ColumnNames(#"Promoted Headers"), each Text.StartsWith(_, "Column")){0},
ColumnIndex = Number.FromText(Text.Middle(ColumnXX, 6,4)),
ColumnListToRemove = List.Range(Table.ColumnNames(#"Promoted Headers"),ColumnIndex-1),
RemovedTrailingColumns = Table.RemoveColumns(#"Promoted Headers", ColumnListToRemove),
To make this more robust, I would prefer a way to identify the column index of ColumnXX without parsing the digits from its name.

SQL Statement to delete only one row out of duplicates

I am working in Ruby, and say I have 6 rows in a two-column table that are exactly identical. In my case, the table "campaign_items" has two columns, "campaign_name" and "item". I would like to delete only one of the 6 duplicate rows using a single query. I started with this:
db.exec("DELETE FROM products WHERE campaign_name = '#{camp_name}' AND product_type = 'fleecejacket' AND size = '#{size_array[index]}'")
This, of course, deleted every row matching that condition. So I found in another question an answer along these lines:
db.exec("DELETE FROM products a WHERE a.ctid <> (SELECT min(b.ctid) FROM products b WHERE a.key = b.key)")
However, this would delete all duplicates except for one. I have not found a way that only deletes a SINGLE row that has duplicates. Is there a delete top query that I am looking for? Thanks in advance.
Edit: I also have a column "id" which is a primary key.
So I definitely overthought this, but all that is needed is this:
x = db.exec("SELECT * FROM campaign_items WHERE campaign_name = '#{camp_name}' AND item = 'fleecejacket'")
id = x[0]['id']
db.exec("DELETE FROM campaign_items WHERE campaign_name = '#{camp_name}' AND item = 'fleecejacket' AND id = '#{id}'")
Get the unique id from the first duplicate (since it doesn't matter which one is deleted) and delete the row with that id.
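The same "pick one id, delete that id" idea can also be folded into a single statement with a subquery. The sketch below is an assumption-laden illustration rather than the poster's code: it uses Python's built-in sqlite3 with an in-memory database so it is self-contained (the question appears to target PostgreSQL from Ruby), the helper name delete_one_duplicate is hypothetical, and it passes values as bound parameters instead of interpolating them into the SQL string.

```python
import sqlite3


def delete_one_duplicate(conn, campaign_name, item):
    """Delete exactly one of the rows matching (campaign_name, item).

    Returns the number of rows deleted (0 if nothing matched)."""
    cursor = conn.execute(
        "DELETE FROM campaign_items WHERE id = ("
        "  SELECT MIN(id) FROM campaign_items"
        "  WHERE campaign_name = ? AND item = ?)",
        (campaign_name, item),
    )
    conn.commit()
    return cursor.rowcount


# In-memory stand-in for the table described in the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE campaign_items ("
    "id INTEGER PRIMARY KEY, campaign_name TEXT, item TEXT)"
)
conn.executemany(
    "INSERT INTO campaign_items (campaign_name, item) VALUES (?, ?)",
    [("spring", "fleecejacket")] * 6,  # six identical rows
)

deleted = delete_one_duplicate(conn, "spring", "fleecejacket")
remaining = conn.execute("SELECT COUNT(*) FROM campaign_items").fetchone()[0]
print(deleted, remaining)
```

Since it does not matter which duplicate goes, MIN(id) is just a convenient way to name a single row; any expression that yields one id would do.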

Using XPath to find rows where a specific column has value

I'm having trouble using XPath to find a row in a table where a specific column contains a value. The table has 10 columns where 2 of them will show Yes|No but I'm only interested in finding the value in one of the columns (the 4th one). My initial attempt was this:
//table[@id='myTable']/tbody/tr/td[text() = 'Yes']
but it finds rows matching in either column. I thought I could try something like this, but it's not a valid expression:
//table[@id='myTable']/tbody/tr/td[4]/text()='Yes'
Any suggestions? Thanks.
You can try this:
//table[@id='myTable']/tbody/tr[td[4][. = 'Yes']]
This XPath returns each row (tr) whose fourth td child has the value "Yes".
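As a rough cross-check of what that expression selects, here is a small Python sketch using only the standard library. It is not the answer's method: the rows are located with the limited XPath subset that xml.etree.ElementTree supports, and the "fourth cell equals Yes" test is done in Python rather than with a nested predicate. The sample markup (five columns instead of ten) and the helper name rows_with_value are assumptions.

```python
import xml.etree.ElementTree as ET

# Hypothetical page fragment; columns 2 and 4 hold Yes/No values.
SAMPLE = """
<html><body>
<table id="myTable"><tbody>
  <tr><td>a</td><td>Yes</td><td>x</td><td>No</td><td>y</td></tr>
  <tr><td>b</td><td>No</td><td>x</td><td>Yes</td><td>y</td></tr>
  <tr><td>c</td><td>Yes</td><td>x</td><td>Yes</td><td>y</td></tr>
</tbody></table>
</body></html>
""".strip()


def rows_with_value(root, table_id, column, value):
    """Return the tr elements whose `column`-th td (1-based) equals `value`."""
    matches = []
    for row in root.findall(f".//table[@id='{table_id}']/tbody/tr"):
        cells = row.findall("td")
        if len(cells) >= column:
            text = "".join(cells[column - 1].itertext()).strip()
            if text == value:
                matches.append(row)
    return matches


root = ET.fromstring(SAMPLE)
names = [row.find("td").text for row in rows_with_value(root, "myTable", 4, "Yes")]
print(names)
```

The first row is skipped even though its second column says Yes, which is the behaviour the question asks for.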

How can I more efficiently find the height of a table using Python

I am using openpyxl to copy data from an Excel spreadsheet. The data is a table for an inventory database, where each row is an entry in the database. I read the table one row at a time using a for loop. In order to determine the range of the for loop, I wrote a function that examines each cell in the table to find the height of the table.
Code:
def find_max(self, sheet, row, column):
    max_row = 0
    cell_top = sheet.cell(row = row - 1, column = column)
    while cell_top.value != None:
        cell = sheet.cell(row = row, column = column)
        max = 0
        while cell.value != None or sheet.cell(row = row + 1, column = column).value != None:
            row += 1
            max = max + 1
            cell = sheet.cell(row = row, column = column)
        if max > max_row:
            max_row = max
        cell_top = sheet.cell(row = row, column = column + 1)
    return max_row
To summarize the function, I move to the next column in the worksheet and then iterate through every cell in that column, keeping track of its height, until there are no more columns. The catch is that the function has to find two consecutive empty cells in order to fail the loop condition. In a previous version I used a similar approach, but only looked at one column and stopped as soon as I found a blank cell. I had to change it so the program would still run if the user forgot to fill in a cell. This function works okay for a small table, but on a table with several hundred entries it makes the program run much slower.
My question is this: what can I do to make this more efficient? I know nesting a while loop like that makes a program take longer, but I do not see how to get around it. I have to make the program as foolproof as possible, so I need to check more than one column to keep user errors from breaking the program.
This is untested, but every time I've used openpyxl, I iterate over all rows like so:
for row in active_worksheet:
    do_something_to(row)
so you could count like:
count = 0
for row in active_worksheet:
    count += 1
EDIT: A better solution is described in the question "Is it possible to get an Excel document's row count without loading the entire document into memory?"
Read-only mode works row by row on the source, so you probably want to hook into it. Alternatively, you could load the worksheet's cells into something like a pandas DataFrame, which keeps index positions for empty cells.
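To make the row-by-row idea concrete, here is a minimal sketch of a single-pass height calculation that still tolerates a forgotten row. It works on any iterable of row tuples, so it runs without a workbook; with openpyxl the rows could come from ws.iter_rows(values_only=True), but that wiring, the function name table_height, and the max_blank_run parameter are assumptions of this example, not something from the posts above.

```python
def table_height(rows, max_blank_run=1):
    """Return the 1-based index of the last non-empty row.

    A row counts as empty when every cell is None. Scanning stops once more
    than `max_blank_run` consecutive empty rows have been seen, so a single
    forgotten row does not cut the table short."""
    height = 0
    blank_run = 0
    for index, row in enumerate(rows, start=1):
        if any(value is not None for value in row):
            height = index
            blank_run = 0
        else:
            blank_run += 1
            if blank_run > max_blank_run:
                break
    return height


# Hypothetical sheet contents: one blank row inside the table, then two
# blank rows marking the real end; the last row is stray data further down.
sample_rows = [
    ("widget", 1),
    ("gadget", None),   # partly filled rows still count
    (None, None),       # a single blank row is tolerated
    ("sprocket", 3),
    (None, None),
    (None, None),
    ("stray", 9),
]
print(table_height(sample_rows))
```

Each row is visited once and all columns are checked together, which avoids the nested per-column loops in the original function.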

Tables got over-written

I want to loop through a DBF and create a Word table for each record that meets a condition, but I get a one-page report with only the last record in a single table. It looks like all records are written to the same table. I tried using n = n + 1 so that the variable serves as the index into the tables collection:
oTable = oDoc.tables[n]
But it seems this only supports a numeric literal rather than a variable?
You have to add each table as you go, making sure to leave space in between them (because Word likes to combine tables).
You'll need something like this inside your loop:
* Assumes you start with oDoc pointing to the document,
* oRange set to an empty range at the beginning of the area where you want to add the tables,
* and that nRows and nCols give you the size of the table.
oTable = oDoc.Tables.Add(m.oRange, m.nRows, m.nCols)
oRange = oTable.Range()
oRange.Collapse(0)
oRange.InsertParagraphAfter()
oRange.Collapse(0)
After this code, you can use oTable to add the data you want. Then, the next time through the loop, you're ready to add another table below the one you just filled.
