Handle StaleElement exception - ruby

I have a table in which data can be refreshed by selecting some filter checkboxes. One or more checkboxes can be selected and after each is selected a spinner is displayed on the page. Subsequent filters can only be selected once the previous selection has refreshed the table. The issue I am facing is that I keep getting StaleElementException intermittently.
This is what I do in capybara -
visit('/table-page') # table with default values is displayed
# select all filters one by one. Wait for spinner to disappear after each selection
filters.each {|filter| check(filter); has_no_css?('.loading-overlay', wait: 15)}
# get table data as array of arrays. Added *minimum* so it waits for table
all('tbody tr', minimum: 1).map { |row| row.all('th,td').map(&:text) }
I am struggling to understand why I am seeing StaleElementException. AFAIK Capybara uses synchronize to reload the node when calling the text method on a given node. It also happens that sometimes the table data returned is stale (i.e. the data from before the last filter update).

The use of all or first disables reloading of any elements returned (if you use find, the element is reloadable since the query used to locate it is fully known). This means that if the page changes at all while the last line of your code is running, you'll end up with StaleElement errors. This is possible in your code because has_no_css? can run before the overlay appears. One solution is to use has_css? with a short wait time to detect the overlay before checking that it disappears. The has_xxx? methods just return true/false and don't raise errors, so in the worst case has_css? misses the appearance/disappearance of the overlay completely and basically devolves into a sleep for the specified wait time.
visit('/table-page') # table with default values is displayed
# select all filters one by one. Wait for spinner to disappear after each selection
filters.each do |filter|
  check(filter)
  has_css?('.loading-overlay', wait: 1)
  assert_no_selector('.loading-overlay', wait: 15)
end
# get table data as array of arrays. Added *minimum* so it waits for table
all('tbody tr', minimum: 1).map { |row| row.all('th,td').map(&:text) }

Related

How do I use cy.each() when each time the page reloads the images?

Desired Outcome: I would like to write a test that clicks on each of these "X"s. I would like to do this until there are no images left.
Use Case:
I reload the list of images each time, to ensure I backfill up to 15.
The user has 18 images
On the first page I show 15
When the user deletes 1 image, I reload the images so there are 15 on page 1 again, and now only 2 images on page 2.
Error:
Because of the reload of the images, it is causing the .each functionality from Cypress to break with the following error message:
cy.click() failed because the page updated as a result of this
command, but you tried to continue the command chain. The subject is
no longer attached to the DOM, and Cypress cannot requery the page
after commands such as cy.click().
Cypress Code Implementation:
cy.get('[datacy="deleteImageIconX"]').each(($el) => cy.wrap($el).click());
What can I do to run a successful test that meets my use-case?
I've seen this before; this answer helped me tremendously: waiting-for-dom-to-load-cypress.
The trick is not to iterate over the elements, but rather to loop over the total count (18) and confirm each deletion within the loop.
for (let i = 0; i < 18; i++) {
  cy.get('[datacy="deleteImageIconX"]')
    .first()
    .click()
    .should('not.exist');
}
This strategy eliminates the problems:
trying to work with stale element references (detached)
iterating too fast (confirm current action completes first)
page count being less than total count (page adjusts between iterations)

Iterate over array while adding new elements to array

I'm writing a web scraping script in Ruby that opens a used car website, searches for a make/model of car, loops over the pages of results, and then scrapes the data on each page.
The problem I'm having is that I don't necessarily know the max # of pages at the beginning, and only as I iterate closer to the last few known pages does the pagination increase and reveal more pages.
I've defined cleanpages as an array and populated it with what I know are the available pages when first opening the site. Then I use cleanpages.each do to iterate over those "pages". Each time I'm on a new page I add all known pages back into cleanpages and then run cleanpages.uniq to remove duplicates. The problem seems to be that cleanpages.each do only iterates as many times as its original length.
Can I make it so that within the each do loop, I increase the number of times it will iterate?
Rather than using Array#each, try using your array as a queue. The general idea is:
queue = initial_pages
while queue.any?
  page = queue.shift
  new_pages = process(page)
  queue.concat(get_unprocessed_pages(new_pages))
end
The idea here is that you just keep taking items from the head of your queue until it's empty. You can push new items into the end of the queue during processing and they'll be processed correctly.
You'll want to be sure to remove pages from new_pages which are already in the queue or were already processed.
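For example, here is a minimal sketch of that queue approach with the de-duplication made explicit. It assumes a hypothetical scrape_page(page) helper standing in for your scraping code, returning whatever pagination links are visible from that page:

queue     = initial_pages.dup
processed = []

until queue.empty?
  page = queue.shift
  processed << page
  discovered = scrape_page(page)               # hypothetical helper: page links found on this page
  queue.concat(discovered - processed - queue) # skip pages already processed or already queued
end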
You could also do it by keeping your array data structure, but manually keeping a pointer to the current element in your list. This has the advantage of maintaining a full list of "seen" pages, so you can remove them from your new_pages list before appending anything remaining to the list:
index = 0
queue = initial_pages
while true do
  page = queue[index]
  break if page.nil?
  index += 1
  new_pages = get_new_pages(page) - queue
  queue.concat(new_pages)
end

Elasticsearch Delete by Query Version Conflict

I am using Elasticsearch version 5.6.10. I have a query that deletes records for a given agency, so they can later be updated by a nightly script.
The query is written with elasticsearch-dsl and looks like this:
def remove_employees_from_search(jurisdiction_slug, year):
    s = EmployeeDocument.search()
    s = s.filter('term', year=year)
    s = s.query('nested', path='jurisdiction', query=Q("term", **{'jurisdiction.slug': jurisdiction_slug}))
    response = s.delete()
    return response
The problem is that I am getting a ConflictError exception when trying to delete the records via that function. I have read this occurs because the documents changed between the time the delete process started and the time it executed. But I don't see how that can be, because nothing else is modifying the records during the delete process.
I am going to add s = s.params(conflicts='proceed') in order to silence the exception. But this is a band-aid as I do not understand why the delete is not processing as expected. Any ideas on how to troubleshoot this? A snapshot of the error is below:
ConflictError: TransportError(409,
u'{
  "took": 10,
  "timed_out": false,
  "total": 55,
  "deleted": 0,
  "batches": 1,
  "version_conflicts": 55,
  "noops": 0,
  "retries": {
    "bulk": 0,
    "search": 0
  },
  "throttled_millis": 0,
  "requests_per_second": -1.0,
  "throttled_until_millis": 0,
  "failures": [
    {
      "index": "employees",
      "type": "employee_document",
      "id": "24681043",
      "cause": {
        "type": "version_conflict_engine_exception",
        "reason": "[employee_document][24681043]: version conflict, current version [5] is different than the one provided [4]",
        "index_uuid": "G1QPF-wcRUOCLhubdSpqYQ",
        "shard": "0",
        "index": "employees"
      },
      "status": 409
    },
    {
      "index": "employees",
      "type": "employee_document",
      "id": "24681063",
      "cause": {
        "type": "version_conflict_engine_exception",
        "reason": "[employee_document][24681063]: version conflict, current version [5] is different than the one provided [4]",
        "index_uuid": "G1QPF-wcRUOCLhubdSpqYQ",
        "shard": "0",
        "index": "employees"
      },
      "status": 409
    }
  ]
}')
You could try refreshing the index first:
client.indices.refresh(index='your-index')
source https://www.elastic.co/guide/en/elasticsearch/client/javascript-api/current/api-reference.html#_indices_refresh
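The same idea in the question's elasticsearch-dsl (Python) code would look roughly like the sketch below. It reuses EmployeeDocument and Q from the question, takes the 'employees' index name from the error output, and assumes a default connection registered with elasticsearch_dsl; the conflicts='proceed' parameter is the one already mentioned in the question:

from elasticsearch_dsl.connections import connections

def remove_employees_from_search(jurisdiction_slug, year):
    # make recent writes visible to search before delete-by-query takes its snapshot
    connections.get_connection().indices.refresh(index='employees')
    s = EmployeeDocument.search()
    s = s.filter('term', year=year)
    s = s.query('nested', path='jurisdiction', query=Q("term", **{'jurisdiction.slug': jurisdiction_slug}))
    # skip documents that still hit version conflicts instead of aborting the whole request
    s = s.params(conflicts='proceed')
    return s.delete()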
First, this is a question that was asked 2 years ago, so take my response with a grain of salt due to the time gap.
I am using the JavaScript API, but I would bet the flags are similar. When you index or delete, there is a refresh flag which lets you force the result of the operation to become visible to search.
I am not an Elasticsearch guru, but the engine must perform some systematic maintenance on the indices and shards to move them into a stable state. This probably happens over time, so you won't necessarily see an immediate state update. Furthermore, from personal experience, I have seen cases where a delete does not seemingly remove the item from the index. It might mark it as "deleted" and give the document a new version number, but the document seems to "stick around" (probably until general maintenance sweeps run).
Here I am showing the js API for delete, but it is the same for index and some of the other calls.
client.delete({
  id: string,
  index: string,
  type: string,
  wait_for_active_shards: string,
  refresh: 'true' | 'false' | 'wait_for',
  routing: string,
  timeout: string,
  if_seq_no: number,
  if_primary_term: number,
  version: number,
  version_type: 'internal' | 'external' | 'external_gte' | 'force'
})
https://www.elastic.co/guide/en/elasticsearch/client/javascript-api/current/api-reference.html#_delete
refresh
'true' | 'false' | 'wait_for' - If true then refresh the affected shards to make this operation visible to search, if wait_for then wait for a refresh to make this operation visible to search, if false (the default) then do nothing with refreshes.
For additional reference, here is the page on Elasticsearch refresh info and what might be a fairly relevant blurb for you.
https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-refresh.html
Use the refresh API to explicitly refresh one or more indices. If the request targets a data stream, it refreshes the stream’s backing indices. A refresh makes all operations performed on an index since the last refresh available for search.
By default, Elasticsearch periodically refreshes indices every second, but only on indices that have received one search request or more in the last 30 seconds. You can change this default interval using the index.refresh_interval setting.

Get current no from prooph event store

I'm trying to update a projection from the event store. The following line will load all events:
$events = $this->eventStore->load(new StreamName('mystream'));
Currently I try to load only unhandled events by passing the fromNumber parameter:
$events = $this->eventStore->load(new StreamName('mystream'), 10);
This will load events, e.g. from 15 to 40, but I found no way to figure out the current/highest "no" among the results. I need that number so that next time I can load only from that entry onward.
If the database is truncated (with restarted sequences) this is not a real problem, because I know the events will start at 1. But if the primary key starts at a number higher than 1, I cannot figure out which event has which number in the event store.
When you are using pdo-event-store, each event has a _position key in its metadata after loading, so your read model can track which position you last worked on. Other than that, if you are working with prooph's event-store projections, you don't need to take care of this at all: the projector tracks the current event position for all needed streams internally, and you just provide callbacks for each event you need to handle.
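A rough sketch of the manual tracking, assuming prooph's Message::metadata() accessor and the _position key that pdo-event-store adds on load (how you persist the position is up to your read model; the load/save helpers here are hypothetical):

$lastPosition = $this->loadLastPosition(); // hypothetical: wherever your read model stores it
$events = $this->eventStore->load(new StreamName('mystream'), $lastPosition + 1);

foreach ($events as $event) {
    // apply the event to the read model ...

    // remember how far we got, to use as fromNumber on the next run
    $lastPosition = $event->metadata()['_position'];
}

$this->saveLastPosition($lastPosition); // hypothetical persistence call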

Can't insert new data in HBase when using Delete and Put at same time

I am using HBase MapReduce to calculate a report.
In the reducer, I try to clear the 'result' column family and then add a new 'total' column. But I find that the column family is deleted while the new data is not inserted. It seems the Put doesn't take effect. Do you know why?
Sample code in the reducer class:
Delete del = new Delete(rowkey.getBytes());
del.addFamily(RESULT);
context.write(new ImmutableBytesWritable(Bytes.toBytes(key.toString())), del);
Put put = new Put(rowkey.getBytes());
put.addColumn(RESULT, TOTAL, totalNum);
context.write(new ImmutableBytesWritable(Bytes.toBytes(key.toString())), put);
It is an HBase limitation, documented in the HBase reference guide under "27.3.1. Deletes mask Puts":
Deletes mask puts, even puts that happened after the delete was entered. See HBASE-2256. Remember that a delete writes a tombstone, which only disappears after the next major compaction has run. Suppose you do a delete of everything <= T. After this you do a new put with a timestamp <= T. This put, even if it happened after the delete, will be masked by the delete tombstone. Performing the put will not fail, but when you do a get you will notice the put had no effect. It will start working again after the major compaction has run. These issues should not be a problem if you use always-increasing versions for new puts to a row. But they can occur even if you do not care about time: just do delete and put immediately after each other, and there is some chance they happen within the same millisecond.
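One common workaround, following the "always-increasing versions" advice above, is to give the Delete and the Put explicit, strictly increasing timestamps so the tombstone cannot mask the new cell. This is a sketch reusing the variables from the question's reducer, not code from the original answer:

long now = System.currentTimeMillis();

// tombstone everything in the 'result' family up to and including 'now'
Delete del = new Delete(rowkey.getBytes());
del.addFamily(RESULT, now);
context.write(new ImmutableBytesWritable(Bytes.toBytes(key.toString())), del);

// write the new total at a strictly later timestamp so the tombstone does not mask it
Put put = new Put(rowkey.getBytes());
put.addColumn(RESULT, TOTAL, now + 1, totalNum);
context.write(new ImmutableBytesWritable(Bytes.toBytes(key.toString())), put);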
