Good morning,
I am new to ZooKeeper and its protocols, and I am interested in its broadcast protocol, Zab.
Could you provide me with simple Java code that uses ZooKeeper's Zab protocol? I have been searching for this but could not find any code that shows how to use Zab.
In fact, what I need is simple: I have a MapReduce job and I want all the mappers to update a variable (say X) whenever they find a better value of X (i.e. a bigger value). In this case, the leader has to compare the old value and the new value and then broadcast the current best value to all mappers. How can I do such a thing in Java?
Thanks in advance,
Regards
You don't need to use the Zab protocol. Instead, you can follow the steps below:
You have a znode, say /bigvalue, on ZooKeeper. All the mappers read the value stored in it when they start, and they also put a watch for data changes on the znode. Whenever a mapper finds a better value, it updates the znode with that value. All the mappers then get a notification for the data-change event, read the new best value, and re-establish the watch for data changes. That way they stay in sync with the latest best value and can update it whenever they find a better one.
zkclient is a very good library for working with ZooKeeper and it hides a lot of the complexity ( https://github.com/sgroschupf/zkclient ). Below is an example that demonstrates how you can watch a znode "/bigvalue" for any data change.
package geet.org;

import java.io.UnsupportedEncodingException;

import org.I0Itec.zkclient.IZkDataListener;
import org.I0Itec.zkclient.ZkClient;
import org.I0Itec.zkclient.exception.ZkMarshallingError;
import org.I0Itec.zkclient.exception.ZkNodeExistsException;
import org.I0Itec.zkclient.serialize.ZkSerializer;
import org.apache.zookeeper.data.Stat;

public class ZkExample implements IZkDataListener, ZkSerializer {

    public static void main(String[] args) {
        String znode = "/bigvalue";
        ZkExample ins = new ZkExample();
        ZkClient cl = new ZkClient("127.0.0.1", 30000, 30000, ins);
        try {
            cl.createPersistent(znode);
        } catch (ZkNodeExistsException e) {
            System.out.println(e.getMessage());
        }
        // Change the data for fun
        Stat stat = new Stat();
        String data = cl.readData(znode, stat);
        System.out.println("Current data " + data + ", version = " + stat.getVersion());
        cl.writeData(znode, "My new data ", stat.getVersion());
        // Watch the znode for data changes
        cl.subscribeDataChanges(znode, ins);
        try {
            Thread.sleep(36000);
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
    }

    @Override
    public void handleDataChange(String dataPath, Object data) throws Exception {
        System.out.println("Detected data change");
        System.out.println("New data for " + dataPath + " " + (String) data);
    }

    @Override
    public void handleDataDeleted(String dataPath) throws Exception {
        System.out.println("Data deleted " + dataPath);
    }

    @Override
    public byte[] serialize(Object data) throws ZkMarshallingError {
        if (data instanceof String) {
            try {
                return ((String) data).getBytes("UTF-8");
            } catch (UnsupportedEncodingException e) {
                e.printStackTrace();
            }
        }
        return null;
    }

    @Override
    public Object deserialize(byte[] bytes) throws ZkMarshallingError {
        try {
            return new String(bytes, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            e.printStackTrace();
        }
        return null;
    }
}
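For the "update only if it is bigger" part of your question, each mapper can read the znode together with its version and write back conditionally, retrying if another mapper updated it in between (optimistic concurrency). Below is a minimal sketch assuming the same ZkClient and string serializer as above; offerBetterValue is just an illustrative helper name, it assumes the znode stores the number as a string, and you should check the exact bad-version exception class for your zkclient version:

void offerBetterValue(ZkClient cl, long candidate) {
    while (true) {
        Stat stat = new Stat();
        String current = cl.readData("/bigvalue", stat);
        long best = (current == null || current.isEmpty()) ? Long.MIN_VALUE : Long.parseLong(current);
        if (candidate <= best) {
            return; // someone else already published an equal or better value
        }
        try {
            // Succeeds only if nobody changed the znode since we read it.
            cl.writeData("/bigvalue", Long.toString(candidate), stat.getVersion());
            return;
        } catch (org.I0Itec.zkclient.exception.ZkBadVersionException e) {
            // Lost the race; re-read and retry.
        }
    }
}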
I have a Spring Boot Kafka consumer and producer. The consumer is expected to read records from topic 1 one by one, process them (time consuming), write the result to another topic, and then manually commit the offset.
In order to avoid rebalancing, I have tried to call pause() and resume() on the KafkaContainer, but the consumer is always running and never responds to the pause() call; I tried it even with a while loop and had no success (unable to pause the consumer). KafkaListenerEndpointRegistry is autowired.
Spring Boot version = 2.6.9, spring-kafka version = 2.8.7
@KafkaListener(id = "c1", topics = "${app.topics.topic1}", containerFactory = "listenerContainerFactory1")
public void poll(ConsumerRecord<String, String> record, Acknowledgment ack) {
    log.info("Received Message by consumer of topic1: " + record.value());
    String result = process(record.value());
    producer.sendMessage(result + " topic2");
    log.info("Message sent from " + topicIn + " to " + topicOut);
    ack.acknowledge();
    log.info("Offset committed by consumer 1");
}

private String process(String value) {
    try {
        pauseConsumer();
        // Perform time intensive network IO operations
        resumeConsumer();
    } catch (InterruptedException e) {
        log.error(e.getMessage());
    }
    return value;
}

private void pauseConsumer() throws InterruptedException {
    if (registry.getListenerContainer("c1").isRunning()) {
        log.info("Attempting to pause consumer");
        Objects.requireNonNull(registry.getListenerContainer("c1")).pause();
        Thread.sleep(5000);
        log.info("kafkalistener container state - " + registry.getListenerContainer("c1").isRunning());
    }
}

private void resumeConsumer() throws InterruptedException {
    if (registry.getListenerContainer("c1").isContainerPaused() || registry.getListenerContainer("c1").isPauseRequested()) {
        log.info("Attempting to resume consumer");
        Objects.requireNonNull(registry.getListenerContainer("c1")).resume();
        Thread.sleep(5000);
        log.info("kafkalistener container state - " + registry.getListenerContainer("c1").isRunning());
    }
}
Am I missing something? Could someone please guide me on the right way to achieve the required behaviour?
You are running the process() method on the listener thread so pause/resume will not have any effect; the pause only takes place when the listener thread exits the listener method (and after it has processed all the records received by the previous poll).
The next version (2.9), due later this month, has a new property pauseImmediate, which causes the pause to take effect after the current record is processed.
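For reference, once you are on 2.9 that behaviour is enabled through a container property. Below is a minimal sketch, assuming the property is exposed on ContainerProperties via setPauseImmediate and that the factory bean is the listenerContainerFactory1 referenced by your listener; treat the surrounding configuration as a placeholder for your own:

import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.kafka.config.ConcurrentKafkaListenerContainerFactory;
import org.springframework.kafka.core.ConsumerFactory;
import org.springframework.kafka.listener.ContainerProperties;

@Configuration
public class KafkaConsumerConfig {

    @Bean
    public ConcurrentKafkaListenerContainerFactory<String, String> listenerContainerFactory1(
            ConsumerFactory<String, String> consumerFactory) {
        ConcurrentKafkaListenerContainerFactory<String, String> factory =
                new ConcurrentKafkaListenerContainerFactory<>();
        factory.setConsumerFactory(consumerFactory);
        // Manual ack mode to match the Acknowledgment parameter in the listener.
        factory.getContainerProperties().setAckMode(ContainerProperties.AckMode.MANUAL);
        // New in spring-kafka 2.9: pause takes effect after the current record,
        // not only after all records from the previous poll have been processed.
        factory.getContainerProperties().setPauseImmediate(true);
        return factory;
    }
}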
You can try something like this; it works for me:
import java.time.Duration;
import java.util.Collections;
import java.util.List;
import java.util.Properties;

import org.apache.kafka.clients.consumer.CommitFailedException;
import org.apache.kafka.clients.consumer.Consumer;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

public class kafkaConsumer {

    private final Properties config = new Properties(); // consumer properties go here
    private String kafkaEvent;

    public void run(String topicName) {
        try {
            Consumer<String, String> consumer = new KafkaConsumer<>(config);
            consumer.subscribe(Collections.singleton(topicName));
            while (true) {
                try {
                    ConsumerRecords<String, String> consumerRecords = consumer.poll(Duration.ofMillis(80000));
                    for (TopicPartition partition : consumerRecords.partitions()) {
                        List<ConsumerRecord<String, String>> partitionRecords = consumerRecords.records(partition);
                        for (ConsumerRecord<String, String> record : partitionRecords) {
                            kafkaEvent = record.value();
                            // Pause all assigned partitions while this record is processed.
                            consumer.pause(consumer.assignment());
                            /** Implement your business logic here **/
                            // Once your processing is done, resume consumption.
                            consumer.resume(consumer.assignment());
                            try {
                                consumer.commitSync();
                            } catch (CommitFailedException e) {
                                // commit failed, e.g. because of a rebalance; handle as needed
                            }
                        }
                    }
                } catch (Exception e) {
                    continue;
                }
            }
        } catch (Exception e) {
            // creation/subscription failed; handle as needed
        }
    }
}
I can't seem to get an unbounded source to work with bufferTimeout without falling into unlimited demand. My source has a lot of data, but I can pull from it selectively, so there is no need to buffer a lot of it in memory if it isn't requested. However, I cannot figure out how to get Reactor to (1) not request unlimited demand and (2) not overflow when the source is a bit slow to respond.
Here is a JUnit test case:
@Test
void bufferAllowsRequested() throws InterruptedException {
    ExecutorService workers = Executors.newFixedThreadPool(4);
    AtomicBoolean down = new AtomicBoolean();
    Flux.generate((SynchronousSink<List<Object>> sink) -> produceRequestedTo(down, sink))
            .concatMap(Flux::fromIterable)
            .bufferTimeout(400, Duration.ofMillis(200))
            .doOnError(t -> {
                t.printStackTrace();
                down.set(true);
            })
            .publishOn(Schedulers.fromExecutor(workers), 4)
            .subscribe(this::processBuffer);
    Thread.sleep(3500);
    workers.shutdownNow();
    assertFalse(down.get());
}

private void processBuffer(List<Object> buf) {
    System.out.println("Received " + buf.size());
    try {
        Thread.sleep(400);
    } catch (InterruptedException e) {
        e.printStackTrace();
    }
}

private void produceRequestedTo(AtomicBoolean down, SynchronousSink<List<Object>> sink) {
    try {
        // Simulate a source that is sometimes slow to respond.
        Thread.sleep(new Random().nextInt(1000));
        sink.next(IntStream.range(0, 500).boxed().collect(Collectors.toList()));
    } catch (Exception e) {
        e.printStackTrace();
        down.set(true);
    }
}
I've tried both Flux.create and Flux.generate, but both seem to suffer from this problem. I don't understand how this isn't a common use case.
I filed an issue here: https://github.com/reactor/reactor-core/issues/1557
I want to read and write HBase without using any reducer.
I followed the example in the Apache HBase™ Reference Guide, but I am getting exceptions.
Here is my code:
public class CreateHbaseIndex {
static final String SRCTABLENAME="sourceTable";
static final String SRCCOLFAMILY="info";
static final String SRCCOL1="name";
static final String SRCCOL2="email";
static final String SRCCOL3="power";
static final String DSTTABLENAME="dstTable";
static final String DSTCOLNAME="index";
static final String DSTCOL1="key";
public static void main(String[] args) {
System.out.println("CreateHbaseIndex Program starts!...");
try {
Configuration config = HBaseConfiguration.create();
Scan scan = new Scan();
scan.setCaching(500);
scan.setCacheBlocks(false);
scan.addColumn(Bytes.toBytes(SRCCOLFAMILY), Bytes.toBytes(SRCCOL1));//info:name
HBaseAdmin admin = new HBaseAdmin(config);
if (admin.tableExists(DSTTABLENAME)) {
System.out.println("table Exists.");
}
else{
HTableDescriptor tableDesc = new HTableDescriptor(DSTTABLENAME);
tableDesc.addFamily(new HColumnDescriptor(DSTCOLNAME));
admin.createTable(tableDesc);
System.out.println("create table ok.");
}
Job job = new Job(config, "CreateHbaseIndex");
job.setJarByClass(CreateHbaseIndex.class);
TableMapReduceUtil.initTableMapperJob(
SRCTABLENAME, // input HBase table name
scan, // Scan instance to control CF and attribute selection
HbaseMapper.class, // mapper
ImmutableBytesWritable.class, // mapper output key
Put.class, // mapper output value
job);
job.waitForCompletion(true);
} catch (IOException e) {
e.printStackTrace();
} catch (InterruptedException e) {
e.printStackTrace();
} catch (ClassNotFoundException e) {
e.printStackTrace();
}
System.out.println("Program ends!...");
}
public static class HbaseMapper extends TableMapper<ImmutableBytesWritable, Put> {
private HTable dstHt;
private Configuration dstConfig;
@Override
public void setup(Context context) throws IOException{
dstConfig=HBaseConfiguration.create();
dstHt = new HTable(dstConfig,SRCTABLENAME);
}
@Override
public void map(ImmutableBytesWritable row, Result value, Context context) throws IOException, InterruptedException {
// this is just copying the data from the source table...
context.write(row, resultToPut(row,value));
}
private static Put resultToPut(ImmutableBytesWritable key, Result result) throws IOException {
Put put = new Put(key.get());
for (KeyValue kv : result.raw()) {
put.add(kv);
}
return put;
}
@Override
protected void cleanup(Context context) throws IOException, InterruptedException {
dstHt.close();
super.cleanup(context);
}
}
}
By the way, "souceTable" is like this:
key name email
1 peter a#a.com
2 sam b#b.com
"dstTable" will be like this:
key value
peter 1
sam 2
I am a newbie in this field and need your help. Thanks!
You are correct that you don't need a reducer to write to HBase, but there are some instances where a reducer might help. If you are creating an index, you might run into situations where two mappers are trying to write the same row. Unless you are careful to ensure that they are writing into different column qualifiers, you could overwrite one update with another due to race conditions. While HBase does do row level locking, it won't help if your application logic is faulty.
Without seeing your exceptions, I would guess that you are failing because you are trying to write key-value pairs from your source table into your index table, where the column family doesn't exist.
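To make the race concrete: if two mappers can end up writing the same index row, an atomic check-and-put lets the second writer detect it instead of silently overwriting. A minimal sketch against the classic HTable API used in the question; writeIndexIfAbsent, rowKey, and newValue are illustrative names, and the constants are the ones from the question's code:

private static boolean writeIndexIfAbsent(HTable indexTable, byte[] rowKey, byte[] newValue)
        throws IOException {
    Put put = new Put(rowKey);
    put.add(Bytes.toBytes(DSTCOLNAME), Bytes.toBytes(DSTCOL1), newValue);
    // An expected value of null means "only put if the cell does not exist yet".
    // Returns false if another mapper got there first (then merge, retry, or skip).
    return indexTable.checkAndPut(
            rowKey, Bytes.toBytes(DSTCOLNAME), Bytes.toBytes(DSTCOL1), null, put);
}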
In this code you are not specifying the output format. You need to add the following:
job.setOutputFormatClass(TableOutputFormat.class);
job.getConfiguration().set(TableOutputFormat.OUTPUT_TABLE, DSTTABLENAME);
Also, you should not create a new Configuration in setup(); use the configuration from the context instead.
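Putting both suggestions together, the mapper can let TableOutputFormat do the writing and emit Puts that target the destination table's existing column family. A minimal sketch reusing the constants from the question's code; HbaseIndexMapper is an illustrative name, and the row layout (name as the index row key, source row key as the value) follows the example tables above:

public static class HbaseIndexMapper extends TableMapper<ImmutableBytesWritable, Put> {
    @Override
    public void map(ImmutableBytesWritable row, Result value, Context context)
            throws IOException, InterruptedException {
        // index row key = name, column index:key = source row key
        byte[] name = value.getValue(Bytes.toBytes(SRCCOLFAMILY), Bytes.toBytes(SRCCOL1));
        if (name != null) {
            Put put = new Put(name);
            put.add(Bytes.toBytes(DSTCOLNAME), Bytes.toBytes(DSTCOL1), row.get());
            context.write(new ImmutableBytesWritable(name), put);
        }
    }
}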
I want to attach different files to different reducers. Is it possible using the distributed cache technology in Hadoop?
I am able to attach the same file (or files) to all the reducers. But due to memory constraints, I want to know if I can attach different files to different reducers.
Forgive me if it's an ignorant question.
Please help!
Thanks in advance!
It may also be worth trying an in-memory compute/data grid technology like GridGain, Infinispan, etc. This way you can load your data into memory and, thanks to data affinity, you would have no limits on how you map your computational jobs (map/reduce) to the data.
It is a strange requirement, since a reducer is not bound to a particular node; during execution a reducer can run on any node, or even on several nodes (in case of failure or speculative execution). Therefore all reducers should be homogeneous; the only thing that differs between them is the data they process.
So I suppose that when you say you want to put different files on different reducers, you actually want to put different files on the reducers such that those files correspond to the data (keys) those reducers will be processing.
The only way I know to do this is to put your data on HDFS and read it from the reducer when it starts processing data.
package com.a;

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URI;
import java.net.URISyntaxException;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.StringTokenizer;

import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

public class PrefixNew4Reduce4 extends MapReduceBase implements Reducer<Text, Text, Text, Text> {

    // Lines of the side file loaded from HDFS in configure().
    ArrayList<String> al = new ArrayList<String>();

    public void configure(JobConf conf4) {
        // src (HDFS file), something like hdfs://127.0.0.1:8020/home/rystsov/hi
        String from = "home/users/mlakshm/haship";
        try {
            FileSystem fs = FileSystem.get(new URI(from), conf4);
            FSDataInputStream src = fs.open(new Path(from));
            BufferedReader reader = new BufferedReader(new InputStreamReader(src));
            String val;
            while ((val = reader.readLine()) != null) {
                al.add(val);
                System.out.println("val:----------------->" + val);
            }
            reader.close();
        } catch (IOException e) {
            e.printStackTrace();
        } catch (URISyntaxException e) {
            e.printStackTrace();
        }
    }

    public void reduce(Text key, Iterator<Text> values, OutputCollector<Text, Text> output,
            Reporter reporter) throws IOException {
        StringTokenizer stk = new StringTokenizer(key.toString());
        String t = stk.nextToken();
        String i = stk.nextToken();
        String j = stk.nextToken();

        // Emit the side-file entries that match the second or third token of the key.
        for (int k = 0; k < al.size(); k++) {
            if (al.get(k).equals(i) || al.get(k).equals(j)) {
                output.collect(key, new Text(al.get(k)));
            }
        }

        // Emit all the values for this key.
        ArrayList<String> al1 = new ArrayList<String>();
        while (values.hasNext()) {
            al1.add(values.next().toString());
        }
        for (int k = 0; k < al1.size(); k++) {
            output.collect(key, new Text(al1.get(k)));
        }
    }
}
My problem is that this design works fine for one ball, but I am unable to get it to work for multiple balls; basically, I have a problem replacing the "this" keyword in updateClients().
I thought I needed to do something like the following, but it failed:
System.out.println("in ballimpl" + j.size());
for (ICallback aClient : j) {
aClient.updateClients(BallImpl[i]);
}
The current state of the code is as follows.
The model remote object, which iterates over the client list and calls their update method:
public class BallImpl extends UnicastRemoteObject implements Ball,Runnable {
private List<ICallback> clients = new ArrayList<ICallback>();
protected static ServerServices chatServer;
static ServerServices si;
BallImpl() throws RemoteException {
super();
}
....
public synchronized void move() throws RemoteException {
loc.translate((int) changeInX, (int) changeInY);
}
public void start() throws RemoteException {
if (gameThread.isAlive()==false )
if (run==false){
gameThread.start();
}
}
/** Start the ball bouncing. */
// Run the game logic in its own thread.
public void run() {
while (true) {
run=true;
// Execute one game step
try {
updateClients();
} catch (RemoteException e) {
e.printStackTrace();
}
try {
Thread.sleep(50);
} catch (InterruptedException ex) {
}
}
}
public void updateClients() throws RemoteException {
si = new ServerServicesImpl();
List<ICallback> j = si.getClientNames();
System.out.println("in messimpl " + j.size());
if (j != null) {
System.out.println("in ballimpl" + j.size());
for (ICallback aClient : j) {
aClient.updateClients(this);
}
} else
System.err.println("Clientlist is empty");
}
}
The client, which implements the callback interface and provides the update method implementation:
public final class thenewBallWhatIwant implements Runnable, ICallback {
.....
@Override
public void updateClients(final Ball ball) throws RemoteException {
try {
ball.move();
try {
Thread.sleep(50);
} catch (Exception e) {
System.exit(0);
}
} catch (Exception e) {
System.out.println("Exception: " + e);
}
}
.....
}
Thanks for any feedback.
jibbylala
Separate your RMI logic from your Ball logic.
You should be able to run your ball simulation without any RMI modules at all, just locally, to test it. Then find a way to wrap that process in RMI so that you can still run and test it locally without any RMI interface. This block of code is the engine, and it is very important to be able to test it in as atomic a form as possible. Having extra parts integrated with it only increases the complexity of what will undoubtedly be some of your most complex code.
Don't let any extra interfaces into your engine. The packages required to use your engine should be few and specific. Whenever your software needs new functionality, implement it in the engine in a generic way, then wrap it to provide the specific functionality outside the engine's core. This protects the engine design against changes in the environment and allows for more complete testing of the engine.
We make exceptions sometimes in cases where something will only ever be used in one way. But in this case, testing without RMI would seem to be critical to getting your engine working correctly. If your engine runs faster than the network can keep up due to large numbers of clients connecting, do you want the whole game to slow down, or do you want the clients to lag behind? I say, you want to be able to make that choice.
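As an illustration of that separation, below is a minimal sketch of a plain engine with no RMI at all; the names (BallObserver, BallModel, GameEngine) are illustrative rather than taken from the original code. An RMI callback layer can later implement BallObserver and forward ballMoved to remote clients, which also answers the original question: the engine tells the observers which ball moved instead of passing this.

import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

interface BallObserver {
    void ballMoved(int ballId, int x, int y);
}

class BallModel {
    final int id;
    int x, y, dx, dy;

    BallModel(int id, int x, int y, int dx, int dy) {
        this.id = id; this.x = x; this.y = y; this.dx = dx; this.dy = dy;
    }

    void step() { x += dx; y += dy; }
}

class GameEngine implements Runnable {
    private final List<BallModel> balls = new CopyOnWriteArrayList<>();
    private final List<BallObserver> observers = new CopyOnWriteArrayList<>();

    void addBall(BallModel b) { balls.add(b); }
    void addObserver(BallObserver o) { observers.add(o); }

    @Override
    public void run() {
        while (!Thread.currentThread().isInterrupted()) {
            for (BallModel b : balls) {
                b.step();
                for (BallObserver o : observers) {
                    o.ballMoved(b.id, b.x, b.y); // each observer learns which ball moved
                }
            }
            try {
                Thread.sleep(50); // one game step every 50 ms, as in the original run()
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }
}

This can be exercised with a plain JUnit test or a main method (new Thread(new GameEngine()).start()) before any RMI code exists, which is exactly the kind of isolated testing described above.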