Row Level Locks in HBase - hadoop

I started using HBase recently and wanted to check whether anyone has come across the scenario I am facing right now.
I have a web service deployed on a couple of servers that accesses HBase to update a field. The update is conditional: I have to read the field from HBase and, if its value is "A", update it to "B". If a concurrent update has already set it to "C", I must not update. But because concurrent requests arrive from different servers, it is possible that both read the existing value as "A", and then one updates it to "B" and the other to "C".
Since requests come in concurrently from different servers, thread-level locking does not help. There can also be multiple concurrent requests from the same server.
Is there a way to lock at the HBase level, so that I can acquire the lock in the service layer, lock the row, and then update it?
There is RowLock in the HBase API, but we are using a newer version of HBase (1.1.2.3) in which that class has been removed.
I would appreciate it if someone could point me in the right direction!
Thanks in advance.
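One way to look at the conditional update described above is as an atomic compare-and-set rather than an explicit row lock. As far as I know, the HBase 1.x client offers this through Table.checkAndPut(row, family, qualifier, expectedValue, put), which applies the put only if the cell still holds the expected value. The sketch below does not call HBase at all; it only models the same semantics in plain Java with ConcurrentHashMap.replace, and the names ConditionalUpdateSketch and updateIfCurrent are hypothetical.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical in-memory stand-in for a single HBase column; not the HBase API.
class ConditionalUpdateSketch {
    private final ConcurrentMap<String, String> cells = new ConcurrentHashMap<>();

    void put(String rowKey, String value) {
        cells.put(rowKey, value);
    }

    String get(String rowKey) {
        return cells.get(rowKey);
    }

    // Atomically sets the value to newValue only if it still equals expected.
    // Returns true if this caller's update was applied, false otherwise.
    boolean updateIfCurrent(String rowKey, String expected, String newValue) {
        return cells.replace(rowKey, expected, newValue);
    }

    public static void main(String[] args) throws InterruptedException {
        ConditionalUpdateSketch store = new ConditionalUpdateSketch();
        store.put("row1", "A");

        // Two "servers" race to move the field away from "A".
        boolean[] applied = new boolean[2];
        Thread t1 = new Thread(() -> applied[0] = store.updateIfCurrent("row1", "A", "B"));
        Thread t2 = new Thread(() -> applied[1] = store.updateIfCurrent("row1", "A", "C"));
        t1.start();
        t2.start();
        t1.join();
        t2.join();

        // Only one of the two updates is applied; the other caller gets false.
        System.out.println("B applied: " + applied[0] + ", C applied: " + applied[1]
                + ", final value: " + store.get("row1"));
    }
}
```

The point of the design is that the read and the write happen as one atomic step, so the caller that loses the race gets false back and can decide what to do, instead of silently overwriting the other update.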

Related

How to run laravel queue jobs in multiple databases?

My project has multiple databases: each company gets its own new database. I am developing automation workflows in this project, and I plan to use queued jobs to implement them.
We also maintain one more database that contains the list of all databases (the companies that are using them). I am a little confused about how to approach this kind of scenario: should I maintain the jobs table in my commonDatabase, or should I create a jobs table inside each database separately?
Note: every time a user tries to log in, he has to give the company name, which indicates the database name (we send it in the headers of all requests).
My doubts are:
I created a jobs table in each database, but records are not being inserted into the particular database; they are inserted into the commonDatabase jobs table instead. Why?
What is the best way to achieve this kind of scenario?
If a user logs out, will the queue still run in the background or not?
What I understand from your question is that you want to convert your project to a multi-tenant, multi-database setup, where every company generates a request to build a tenant for it. The answers to your questions are as follows:
I created jobs table in each database but it's not inserting records in particular database, instead it's inserting in commonDatabase jobs table?
I recommend watching this YouTube playlist.
If the job is related to a company process, e.g. you want to process a company's invoice email, then dispatch the job in the company database. If the job is related to commonDatabase, e.g. you want to configure a company database and then run migrations and seeders on it, then it should be dispatched in commonDatabase.
If any user logged out, will the queue run in the background or not?
Yes, the queue will still run in the background, because the queue worker runs on the server and does not depend on the login session or any other authentication mechanism. You should read the following articles/threads:
Official Laravel Doc on queue
How to setup laravel queue worker

How do we reset the state associated with a Kafka Connect source connector?

We are working with Kafka Connect 2.5.
We are using the Confluent JDBC source connector (although I think this question is mostly agnostic to the connector type) and are consuming some data from an IBM DB2 database onto a topic, using 'incrementing mode' (primary keys) as unique IDs for each record.
That works fine in the normal course of events; the first time the connector starts all records are consumed and placed on a topic, then, when new records are added, they are added to our topic. In our development environment, when we change connector parameters etc., we want to effectively reset the connector on-demand; i.e. have it consume data from the “beginning” of the table again.
We thought that deleting the connector (using the Kafka Connect REST API) would do this - and would have the side-effect of deleting all information regarding that connector configuration from the Kafka Connect connect-* metadata topics too.
However, this doesn’t appear to be what happens. The metadata remains in those topics, and when we recreate/re-add the connector configuration (again using the REST API), it 'remembers' the offset it was consuming from in the table. This seems confusing and unhelpful - deleting the connector doesn’t delete its state. Is there a way to more permanently wipe the connector and/or reset its consumption position, short of pulling down the whole Kafka Connect environment, which seems drastic? Ideally we’d like not to have to meddle with the internal topics directly.
Partial answer to this question: it seems the behaviour we are seeing is to be expected:
If you’re using incremental ingest, what offset does Kafka Connect
have stored? If you delete and recreate a connector with the same
name, the offset from the previous instance will be preserved.
Consider the scenario in which you create a connector. It successfully
ingests all data up to a given ID or timestamp value in the source
table, and then you delete and recreate it. The new version of the
connector will get the offset from the previous version and thus only
ingest newer data than that which was previously processed. You can
verify this by looking at the offset.storage.topic and the values
stored in it for the table in question.
At least for the Confluent JDBC connector, there is a workaround to reset the pointer.
Personally, I'm still confused about why Kafka Connect retains state for the connector at all once it's deleted, but it seems that is the designed behaviour. I would still be interested to know if there is a better (and supported) way to remove that state.
Another related blog article: https://rmoff.net/2019/08/15/reset-kafka-connect-source-connector-offsets/

Datasource changes to secondary on run time if primary is offline

I have to deal with the following scenario for a Spring application with an Oracle database:
The Spring application uses the primary database. In the meantime, the secondary database stores data (from the primary) for disaster recovery.
This first part is already in place. What I now have to implement is:
When the primary database goes offline, the application should switch its connection to the secondary database.
The implementation should be programmatic. How can I achieve that without changing the code that currently exists? Is there any working solution (library)?
I am thinking about AbstractRoutingDataSource and pinging the databases (e.g. every 5 seconds), but I'm not sure about this solution.
So, let's summarize the issue. I was unable to use Oracle RAC (Real Application Clusters). If the implementation has to be programmatic, you can try the AbstractRoutingDataSource approach.
I implemented a timer that pings the current database every 1 second (you can use a validation query and check whether you can read from the database; if not, we assume there is no connection and we can switch the datasource).
Thanks to that, I was able to change the datasource at runtime when the current datasource is offline. More importantly, the switch was automatic.
On the other hand, there are disadvantages:
For a short time, users may see errors if the database has not been switched yet.
Some parts of the application may stop working if they are not properly secured against the lack of a connection to the database.
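The switching logic described in this answer can be sketched roughly as follows. In Spring, AbstractRoutingDataSource asks its subclass for a lookup key through determineCurrentLookupKey(); the sketch below only models how that key could be chosen, in plain Java without Spring or JDBC. The class FailoverKeySelector, the "primary"/"secondary" keys, and the BooleanSupplier probes (standing in for a validation query) are all assumptions made for illustration, as is the extra check that the other database is reachable before switching.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.BooleanSupplier;

// Hypothetical helper: holds the key that a routing data source would return
// from determineCurrentLookupKey(). No Spring or JDBC types are used here.
class FailoverKeySelector {
    static final String PRIMARY = "primary";
    static final String SECONDARY = "secondary";

    private final AtomicReference<String> currentKey = new AtomicReference<>(PRIMARY);
    private final BooleanSupplier primaryProbe;
    private final BooleanSupplier secondaryProbe;

    // Each probe is assumed to run a validation query and report success.
    FailoverKeySelector(BooleanSupplier primaryProbe, BooleanSupplier secondaryProbe) {
        this.primaryProbe = primaryProbe;
        this.secondaryProbe = secondaryProbe;
    }

    // Meant to be called from a timer (the answer uses a 1-second interval).
    // Switches only if the current database fails its probe and the other passes.
    void checkAndSwitch() {
        boolean onPrimary = PRIMARY.equals(currentKey.get());
        BooleanSupplier current = onPrimary ? primaryProbe : secondaryProbe;
        BooleanSupplier other = onPrimary ? secondaryProbe : primaryProbe;
        if (current.getAsBoolean()) {
            return; // current database is reachable, nothing to do
        }
        if (other.getAsBoolean()) {
            currentKey.set(onPrimary ? SECONDARY : PRIMARY);
        }
    }

    // The value a determineCurrentLookupKey() override would return.
    String currentKey() {
        return currentKey.get();
    }

    public static void main(String[] args) {
        boolean[] primaryUp = { true };
        FailoverKeySelector selector =
                new FailoverKeySelector(() -> primaryUp[0], () -> true);
        selector.checkAndSwitch();
        System.out.println("key while primary is up: " + selector.currentKey());
        primaryUp[0] = false; // simulate the primary going offline
        selector.checkAndSwitch();
        System.out.println("key after primary fails: " + selector.currentKey());
    }
}
```

Note that this sketch stays on the secondary once it has switched, even if the primary comes back; whether to fail back automatically is a separate decision that the answer does not cover.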

Manually logging database event in datastage job

I have a parallel job that writes to an Oracle table. I want to manually write warnings to DataStage's log if some event occurs. For example, if a certain value is inserted into a certain column, I want to track this information in the log. Could this be achieved somehow?
To write custom messages into the logs for a particular job's data stream, you can use a combination of a Copy stage, a Transformer, and a Peek stage. The Peek stage is the one that writes to the logs. I like to set the Peek stage to run in sequential mode, so that your messages are kept together in single entries in the log instead of being spread across nodes.
Also, you can peek at the rejects of the Oracle stage, and maybe combine this with the above option (using a Funnel stage and a standard column schema).
Lastly, if you'd actually like to query the logs themselves and write them out somewhere else or use them in a job (amongst all the other data kept about jobs in the repository), you can directly query the DSODB schema in the XMETA database, i.e. the DataStage repository (by default DB2).
You would need to have the DataStage Operations Console up and running for that (not sure what version of DataStage you're running). If DataStage is running on a single tier and using the default DB2 database, you can simply catalog the DSODB database so that it's available as a connection in the DB2 connector. Otherwise you'd need to install a DB2 client on the DataStage engine tier and catalog the database there.
All the best!
Twitter: #InforgeAcademy
DataStage tips and Tricks: https://www.inforgeacademy.com/blog/

What should be approach?

I will try to be as clear as I can. I am out of ideas on this problem, even though it sounds like a classic one.
My application runs on a WebLogic 10.3.3 application server, and for the database I am using Oracle Database 11g. My problem is that there is a table in the db, let's say "user.", and a column in this table, let's say "columnA". This table is updated by some module of the application.
What I want is that when the value of the column is "abc.", an alert is shown on a console (IP). {The IP can be retrieved from the DB, as it is configured there. This IP will belong to another Linux system, not the Linux machine where the Oracle database is installed.} The table is updated continuously by the application module. Please tell me where I should start and what I should read. I am not able to work out what the approach should be. Any help is much appreciated.
A trigger on the table can call UTL_HTTP to communicate with another machine (eg call a RESTful API).
The architectural questions are:
This will happen PRIOR to the commit, so you may get false alerts if a change is rolled back.
If you wait for a response, it will slow the system down.
What do you do if you get a non-standard response (e.g. the other server isn't available)?
