Can't insert new data in HBase when using Delete and Put at same time - hadoop

I am using an HBase MapReduce job to calculate a report.
In the reducer, I try to clear the 'result' column family and then add a new 'total' column. I find that the column family is deleted, but the new data is not inserted. It seems the Put doesn't take effect. Do you know why?
Sample code in the reducer class:
// emit a Delete that drops the whole 'result' column family for this row
Delete del = new Delete(rowkey.getBytes());
del.addFamily(RESULT);
context.write(new ImmutableBytesWritable(Bytes.toBytes(key.toString())), del);

// emit a Put that writes the new 'total' column into the same family
Put put = new Put(rowkey.getBytes());
put.addColumn(RESULT, TOTAL, totalNum);
context.write(new ImmutableBytesWritable(Bytes.toBytes(key.toString())), put);

It is an HBase limitation: deletes mask puts. From the HBase reference guide, section 27.3.1, "Deletes mask Puts":
Deletes mask puts, even puts that happened after the delete was entered. See HBASE-2256. Remember that a delete writes a tombstone, which only disappears after the next major compaction has run. Suppose you do a delete of everything <= T. After this you do a new put with a timestamp <= T. This put, even if it happened after the delete, will be masked by the delete tombstone. Performing the put will not fail, but when you do a get you will notice the put had no effect. It will start working again after the major compaction has run. These issues should not be a problem if you use always-increasing versions for new puts to a row. But they can occur even if you do not care about time: just do delete and put immediately after each other, and there is some chance they happen within the same millisecond.
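One way to work around it, as a rough sketch rather than a drop-in fix (the explicit timestamp handling below is my assumption, not something from the question), is to give the Put a timestamp strictly greater than the one the Delete uses, so the tombstone can no longer mask the new cell:
// Sketch: bound the family delete at 'now' and write the new cell at 'now + 1',
// so the Put's version is always newer than the Delete's tombstone.
long now = System.currentTimeMillis();

Delete del = new Delete(rowkey.getBytes());
del.addFamily(RESULT, now);  // deletes cells in 'result' with timestamp <= now
context.write(new ImmutableBytesWritable(Bytes.toBytes(key.toString())), del);

Put put = new Put(rowkey.getBytes());
put.addColumn(RESULT, TOTAL, now + 1, totalNum);  // explicit, strictly newer timestamp
context.write(new ImmutableBytesWritable(Bytes.toBytes(key.toString())), put);
Another option, if you know which columns need clearing, is to delete only those columns rather than the whole family, so the tombstone never covers the 'total' cell you are about to write.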

Related

Getting duplicates with NiFi HBase_1_1_2_ClientMapCacheService

I need to remove duplicates from a flow I've developed; it can receive the same ${filename} multiple times. I tried using HBase_1_1_2_ClientMapCacheService with DetectDuplicate (I am using NiFi v1.4), but found that it lets a few duplicates through. If I use DistributedMapCache (ClientService and Server), I do not get any duplicates. Why would I receive some duplicates with the HBase Cache?
As a test, I listed a directory (ListSFTP) with 20,000 files on all cluster nodes (4 nodes) and passed to DetectDuplicate (using the HBase Cache service). It routed 20,020 to "non-duplicate", and interestingly the table actually has 20,000 rows.
Unfortunately I think this is due to a limitation in the operations that are offered by HBase.
The DetectDuplicate processor relies on an operation "getAndPutIfAbsent" which is expected to return the original value, and then set the new value if it wasn't there. For example, first time through it would return null and set the new value, indicating it wasn't a duplicate.
HBase doesn't natively support this operation, so the implementation of this method in the HBase map cache client does this:
// Read the current value, then try to put the new one -- two separate
// HBase calls, not one atomic operation.
V got = get(key, keySerializer, valueDeserializer);
boolean wasAbsent = putIfAbsent(key, value, keySerializer, valueSerializer);
if (!wasAbsent) return got;   // a value was already there: return it
else return null;             // we inserted the value: caller sees "was absent"
So, because it is two separate calls, there is a possible race condition...
Imagine node 1 calls the first line and gets null, but then node 2 performs both the get and the putIfAbsent. When node 1 now calls putIfAbsent it gets false, because node 2 just populated the cache, so node 1 returns the null value from its original get. Both nodes therefore look like they saw a non-duplicate to DetectDuplicate.
In the DistributedMapCacheServer, it locks the entire cache per operation so it can provide an atomic getAndPutIfAbsent.
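For comparison, HBase itself does offer an atomic put-if-absent via checkAndPut, but even that does not hand back the previous value in the same call, which is what getAndPutIfAbsent needs. A minimal sketch, assuming a Table named table and hypothetical ROW / CF / QUAL / VALUE byte arrays:
// checkAndPut is atomic on the region server: the Put is applied only if the
// named cell currently has the expected value (null means "must be absent").
Put put = new Put(ROW);
put.addColumn(CF, QUAL, VALUE);
boolean inserted = table.checkAndPut(ROW, CF, QUAL, null, put);
// inserted == true  -> the cell was missing and has now been written
// inserted == false -> another writer got there first; a separate Get is still
//                      needed to see what they wrote, which reopens the race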

Handle StaleElement exception

I have a table in which data can be refreshed by selecting some filter checkboxes. One or more checkboxes can be selected and after each is selected a spinner is displayed on the page. Subsequent filters can only be selected once the previous selection has refreshed the table. The issue I am facing is that I keep getting StaleElementException intermittently.
This is what I do in capybara -
visit('/table-page') # table with default values is displayed
# select all filters one by one. Wait for spinner to disappear after each selection
filters.each {|filter| check(filter); has_no_css?('.loading-overlay', wait: 15)}
# get table data as array of arrays. Added *minimum* so it waits for table
all('tbody tr', minimum: 1).map { |row| row.all('th,td').map(&:text) }
I am struggling to understand why I am seeing StaleElementException. AFAIK Capybara uses synchronize to reload the node when calling the text method on a given node. It also happens that sometimes the table returns stale data (i.e. the data from before the last filter update).
The use of all or first disables reloading of any elements returned (if you use find, the element is reloadable since the query used to locate it is fully known). This means that if the page changes at all while the last line of your code is running, you'll end up with StaleElement errors. This is possible in your code because has_no_css? can run before the overlay appears. One solution is to use has_css? with a short wait time to detect the overlay before checking that it disappears. The has_xxx? methods just return true/false and don't raise errors, so in the worst case has_css? misses the appearance/disappearance of the overlay completely and basically devolves into a sleep for the specified wait time.
visit('/table-page') # table with default values is displayed
# select all filters one by one. Wait for spinner to disappear after each selection
filters.each do |filter|
  check(filter)
  # give the overlay a brief chance to appear before asserting it has gone
  has_css?('.loading-overlay', wait: 1)
  assert_no_selector('.loading-overlay', wait: 15)
end
# get table data as array of arrays. Added *minimum* so it waits for table
all('tbody tr', minimum: 1).map { |row| row.all('th,td').map(&:text) }

how do i do an atomic update with etcd

I am trying to understand what an 'atomic' update is in terms of etcd.
When I think 'atomic', I think there is a 'before' and an 'after' (there isn't a during, and if the update fails, it is still 'before').
Here is an example:
curl -s -XPUT http://localhost:2379/v2/keys/message -d value='Hidee Ho'
So, at this point, anyone can access that message and get the current value:
curl -s http://localhost:2379/v2/keys/message
{"action":"get","node":{"key":"/message","value":"Hidee Ho","modifiedIndex":4748,"createdIndex":4748}}
Later on, I can modify this value, like this:
curl -s -XPUT http://localhost:2379/v2/keys/message -d value='Mr Hanky'
And the result can be fetched, just like before. Before my change the value 'Hidee Ho' comes back; after the change the value 'Mr Hanky' comes back. So, my question is: am I guaranteed one or the other of the results? That is, I want to confirm that one or the other will be returned (and not a nil value in between).
I don't particularly care about the timing. If I do the Mr Hanky update and subsequent fetchers of the value continue to get Hidee Ho for a (short) period of time, that's OK.
I am confused because there is an Atomic CompareAndSwap function in the protocol. As far as I can tell, it isn't so much Atomic as it is 'only do the update if the value is what I say it is'. In my case I don't much care what the value used to be. I just want to know that it is changed and that no readers will see anything other than the 'before' or 'after' values.
You are correct: a plain PUT is atomic, in that a client will only ever see the previous value or the new value.
The CompareAndSwap feature allows you to do optimistic locking, so that you can write new values which depend on the prior value, e.g. a counter. If you were to implement a counter without CompareAndSwap, you'd have something like write("count", 1 + read("count")). Here the read and write are separate, so if two callers did this at the same time, it's possible they'd both see the same starting value, and you'd lose one of the increments. Using CAS, the caller can say "set it to 12 only if the previous value is 11". If this happens concurrently, one of the writes will fail, and that caller can re-read and reapply its delta, so that you don't lose any increments.
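With the v2 HTTP API used above, that conditional write is expressed with the prevValue (or prevIndex) parameter; the key name and numbers here are just illustrative:
curl -s -XPUT "http://localhost:2379/v2/keys/count?prevValue=11" -d value=12
# Succeeds only if /count is currently 11; otherwise etcd rejects the write
# with a "Compare failed" error and the caller can re-read and retry.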

Troubleshooting HBase batch puts

Is it possible to troubleshoot HBase batch puts? I'm using HBase batch puts of 5000 records at a time, and I would like to, on put failure, find out which row or rows are causing the problem and log them.
The method HTable.batch(List actions) receives a list of Puts and returns an array of the same size as the actions list (the list of puts you passed to the function). If actions[i] failed, then results[i] will be null.
Please note that when the failure inside batch() is due to reaching the maximum number of write attempts, you need to catch RetriesExhaustedWithDetailsException and call getExceptions() to get the array which maps each error to the put that caused it.
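A sketch of what that can look like, using the variant of batch that fills in a results array; the table, puts list, and LOG names are assumptions rather than anything from the question:
// Submit the batch and log which rows failed.
Object[] results = new Object[puts.size()];
try {
    table.batch(puts, results);
} catch (RetriesExhaustedWithDetailsException e) {
    for (int i = 0; i < e.getNumExceptions(); i++) {
        LOG.error("Put failed for row "
                + Bytes.toStringBinary(e.getRow(i).getRow())
                + ": " + e.getCause(i));
    }
} catch (IOException | InterruptedException e) {
    throw new RuntimeException(e);  // unrelated failure (connection, interruption, ...)
}
// A null slot in 'results' also marks the corresponding action as failed.
for (int i = 0; i < results.length; i++) {
    if (results[i] == null) {
        LOG.error("No result for put #" + i + ", row "
                + Bytes.toStringBinary(puts.get(i).getRow()));
    }
}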

Delay / Lag between Commit and select with Distributed Transactions when two connections are enlisted to the transaction in Oracle with ODAC

We have our application calling two Oracle databases using two connections (which are kept open throughout the application). For certain functionality we use distributed transactions. We have Enlist=false in the connection string and manually enlist the connections to the transaction.
The problem comes with a scenario where we update the same record very frequently within a distributed transaction, and we see a delay before the data committed in the previous run becomes visible.
Example:
using (OracleConnection connection1 = new OracleConnection())
{
    using (OracleConnection connection2 = new OracleConnection())
    {
        connection1.ConnectionString = connection1String;
        connection1.Open();
        connection2.ConnectionString = connection2String;
        connection2.Open();

        // repeated ~100 times
        for (int i = 0; i < 100; i++)
        {
            // .. check the previously updated value
            connection1.EnlistTransaction(currentTransaction);
            connection2.EnlistTransaction(currentTransaction);
            // .. do an update using connection1
            // .. do some updates with connection2
        }
    }
}
As in the above code fragment, we do an update and then check the previously updated value in the next iteration. The issue comes up when we run this frequently for a single record: in the next iteration we don't see the update committed in the previous iteration, even though it was committed. When this happens, the update does become visible to other applications after a very small delay, and even within our code it is visible if we debug and re-run the line.
It's almost as if there is a delay in the commit, even though the commit call has already returned.
Anyone have any ideas?
It turned out that there's no way to control this behavior through ODAC. So the only viable solution was to implement retry behavior in our code: since this occurs very rarely, when it happens we delay 10 seconds and retry the same operation.
Additional details on what I found about this can be found here.
