Testing connection to HDFS - hadoop

In order to test the connection to HDFS from a Java program, is it sufficient to rely on FileSystem.get(configuration), or should additional sanity checks be done (for example, some file-based operations like list, copy, delete)?

FileSystem.get(Configuration) creates a DistributedFileSystem object, which in turn relies on a DFSClient to talk to the NameNode. Buried deep down in the source (1.0.2 is the version I'm looking through) is a call to create an RPC for the NameNode, which in turn creates a Proxy for the ClientProtocol interface.
When this proxy is created (org.apache.hadoop.ipc.RPC.getProxy(Class<? extends VersionedProtocol>, long, InetSocketAddress, UserGroupInformation, Configuration, SocketFactory, int)), a call is made to ensure the server and client both talk the same 'version', so this check confirms that a NameNode is running at the configured address:
VersionedProtocol proxy =
    (VersionedProtocol) Proxy.newProxyInstance(
        protocol.getClassLoader(), new Class[] { protocol },
        new Invoker(protocol, addr, ticket, conf, factory, rpcTimeout));
long serverVersion = proxy.getProtocolVersion(protocol.getName(),
                                              clientVersion);
if (serverVersion == clientVersion) {
  return proxy;
} else {
  throw new VersionMismatch(protocol.getName(), clientVersion,
                            serverVersion);
}
Of course, whether the NameNode has sufficient DataNodes running to perform some actions (such as creating or opening files) is not reported by this version-match check.
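Whether a follow-up sanity check is worthwhile depends on what "connected" must mean for you. If it should mean "the NameNode answers real metadata requests", a cheap operation such as listing the root directory does that. The following is a minimal sketch of such a check, assuming a hypothetical namenode-host:8020 URI; it uses FileSystem.newInstance rather than FileSystem.get only to avoid closing the JVM-wide cached instance:

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsSanityCheck {
    public static boolean canTalkToNameNode(String hdfsUri) {
        Configuration conf = new Configuration();
        try (FileSystem fs = FileSystem.newInstance(URI.create(hdfsUri), conf)) {
            // Forces an actual RPC to the NameNode, not just proxy creation.
            FileStatus[] statuses = fs.listStatus(new Path("/"));
            return statuses != null;
        } catch (Exception e) {
            // Connection refused, version mismatch, SafeModeException, etc.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(canTalkToNameNode("hdfs://namenode-host:8020"));
    }
}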

Related

Correct usage of LoadbalanceRSocketClient with Spring's RSocketRequester

I'm trying to understand the correct configuration and usage pattern of LoadbalanceRSocketClient in the context of a Spring Boot application (RSocketRequester).
I have two RSocket server backends (Spring Boot, RSocket messaging) running, and I configure the RSocketRequester on the client side like this:
List<LoadbalanceTarget> servers = new ArrayList<>();
for (String url : backendUrls) {
    HttpClient httpClient = HttpClient.create()
        .baseUrl(url)
        .secure(ssl ->
            ssl.sslContext(SslContextBuilder.forClient().trustManager(InsecureTrustManagerFactory.INSTANCE)));
    servers.add(LoadbalanceTarget.from(url, WebsocketClientTransport.create(httpClient, url)));
}

// RSocketRequester.Builder is autowired by Spring Boot
RSocketRequester requester = builder
    .setupRoute("/connect")
    .setupData("test")
    //.rsocketConnector(connector -> connector.reconnect(Retry.fixedDelay(60, Duration.ofSeconds(1))))
    .transports(Flux.just(servers), new RoundRobinLoadbalanceStrategy());
Once configured, the requester is used repeatedly from a timer loop, as follows:
@Scheduled(fixedDelay = 10000, initialDelay = 1000)
public void timer() {
    requester.route("/foo").data(Data).send().block();
}
It works: the client starts, connects to one of the servers and pushes messages to it. If I kill the server that the client connected to, the client reconnects to another server on the next timer event. If I start the first server again and kill the second one, though, the client doesn't connect anymore and the following exception is observed on the client side:
java.util.concurrent.CancellationException: Pool is exhausted
at io.rsocket.loadbalance.RSocketPool.select(RSocketPool.java:202) ~[rsocket-core-1.1.0.jar:na]
at io.rsocket.loadbalance.LoadbalanceRSocketClient.lambda$fireAndForget$0(LoadbalanceRSocketClient.java:49) ~[rsocket-core-1.1.0.jar:na]
at reactor.core.publisher.MonoFlatMap$FlatMapMain.onNext(MonoFlatMap.java:125) ~[reactor-core-3.4.0.jar:3.4.0]
at reactor.core.publisher.FluxContextWrite$ContextWriteSubscriber.onNext(FluxContextWrite.java:107) ~[reactor-core-3.4.0.jar:3.4.0]
at reactor.core.publisher.FluxContextWrite$ContextWriteSubscriber.onNext(FluxContextWrite.java:107) ~[reactor-core-3.4.0.jar:3.4.0]
at reactor.core.publisher.FluxMap$MapConditionalSubscriber.onNext(FluxMap.java:220) ~[reactor-core-3.4.0.jar:3.4.0]
at reactor.core.publisher.Operators$MonoSubscriber.complete(Operators.java:1784) ~[reactor-core-3.4.0.jar:3.4.0]
at reactor.core.publisher.MonoZip$ZipCoordinator.signal(MonoZip.java:251) ~[reactor-core-3.4.0.jar:3.4.0]
at reactor.core.publisher.MonoZip$ZipInner.onNext(MonoZip.java:336) ~[reactor-core-3.4.0.jar:3.4.0]
at reactor.core.publisher.Operators$MonoSubscriber.complete(Operators.java:1784) ~[reactor-core-3.4.0.jar:3.4.0]
at reactor.core.publisher.MonoCallable.subscribe(MonoCallable.java:61) ~[reactor-core-3.4.0.jar:3.4.0]
at reactor.core.publisher.Mono.subscribe(Mono.java:3987) ~[reactor-core-3.4.0.jar:3.4.0]
at reactor.core.publisher.MonoZip.subscribe(MonoZip.java:128) ~[reactor-core-3.4.0.jar:3.4.0]
at reactor.core.publisher.Mono.subscribe(Mono.java:3987) ~[reactor-core-3.4.0.jar:3.4.0]
at reactor.core.publisher.Mono.block(Mono.java:1678) ~[reactor-core-3.4.0.jar:3.4.0]
I suspect that I'm either not configuring the requester correctly or not using it properly. I would appreciate any hints, as documentation and tests seem to be pretty thin in this area.
Ideally I would want the client to transparently switch to the next available server upon a server/connectivity failure. Right now the reconnection attempt seems to happen only on the next call to the timer() method, which is not ideal, as the client needs to handle incoming messages from the server. Another thing I observed is that even though "/foo" is a fire-and-forget (FnF) route, the server never receives the call unless I do block() after send().
Update Endpoints List Continuously
LoadbalanceClient is designed to be integrated with a Discovery service, which is responsible for keeping a list of alive instances. If one of the services disappears from the cluster, the Discovery service updates its list of available instances.
On the other hand, to implement client-side load balancing, we have to know the list of available services in the cluster. To set up load balancing, we can retrieve that list of services and supply it to the Loadbalancer API.
ReactiveDiscoveryClient discoveryClient = ...

Mono<List<LoadbalanceTarget>> serversMono = discoveryClient
    .getInstances(serviceGroupName)
    .map(si -> {
        HttpClient httpClient = HttpClient.create()
            .baseUrl(si.getUri())
            .secure(ssl -> ssl.sslContext(
                SslContextBuilder.forClient()
                    .trustManager(InsecureTrustManagerFactory.INSTANCE)
            ));
        return LoadbalanceTarget.from(si.getUri(), WebsocketClientTransport.create(httpClient, "/rsocket"));
    })
    .collectList();

// RSocketRequester.Builder is autowired by Spring Boot
RSocketRequester requester = builder
    .setupRoute("/connect")
    .setupData("test")
    .transports(serversMono.flux(), new RoundRobinLoadbalanceStrategy());
However, imagine that we are in a fully distributed environment, where every service that disappears and reappears runs on an entirely new host and port (e.g. a Kubernetes cluster which does not stick to a particular IP address). Load balancing has to take such a scenario into account, and to avoid dead nodes in the pool it removes unhealthy nodes from the pool completely.
Now, if all the nodes disappeared and reappeared after some time, they are no longer included in the pool (and if the Flux which provides the updates has completed, the pool is effectively exhausted, because no new update will come in from the Flux<List<LoadbalanceTarget>>).
However, the nodes register themselves in the Discovery service and become available for observation again. Therefore we have to periodically pull info from the Discovery service to stay up to date and update the pool state continuously:
ReactiveDiscoveryClient discoveryClient = ...

Flux<List<LoadbalanceTarget>> serversFlux = discoveryClient
    .getInstances(serviceGroupName)
    .map(si -> {
        HttpClient httpClient = HttpClient.create()
            .baseUrl(si.getUri())
            .secure(ssl -> ssl.sslContext(
                SslContextBuilder.forClient()
                    .trustManager(InsecureTrustManagerFactory.INSTANCE)
            ));
        return LoadbalanceTarget.from(si.getUri(), WebsocketClientTransport.create(httpClient, "/rsocket"));
    })
    .collectList()
    .repeatWhen(f -> f.delayElements(Duration.ofSeconds(1))); // <- continuously retrieve a new List of ServiceInstances

// RSocketRequester.Builder is autowired by Spring Boot
RSocketRequester requester = builder
    .setupRoute("/connect")
    .setupData("test")
    .transports(serversFlux, new RoundRobinLoadbalanceStrategy());
With such a setup, the RSocketPool will not be exhausted if all the nodes disappear from the cluster, because the Flux<List<LoadbalanceTarget>> has not completed yet and may provide new updates eventually.
Note that the implementation is smart enough to keep active nodes across updates from the discovery service: if a service instance is already in the pool, you will not get two connections to it at the same time.
Side note on the reconnect feature
You may notice that RSocketConnector provides a great feature called .reconnect. At first glance, it may seem that using reconnect will keep your connection up and running indefinitely. Unfortunately, that is not true. The .reconnect feature is designed to keep your Mono<RSocket> reusable with cache semantics, which means that you may create a @Bean Mono<RSocket>, autowire it in various places and subscribe multiple times without worrying that the resulting RSocket instance will be different on every Mono<RSocket>.subscribe. On the other hand, if the given RSocket becomes disconnected (e.g. a lost-connection case), the next subscription to such a Mono<RSocket> will resolve a new RSocket, once for all concurrent .subscribe calls.
Though it sounds like a useful feature, in RSocketPool we do not rely on it much and use the Mono<RSocket> only once, to resolve and cache an instance of RSocket inside the RSocketPool. If such an RSocket becomes disconnected, we will not try to subscribe to the given Mono<RSocket> again (we assume that the host and port it was set up with will have changed).
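For illustration, here is a minimal sketch of the cache semantics described above, using plain rsocket-java against a hypothetical TCP server on localhost:7000 (the transport, retry policy and port are assumptions, not taken from the question):

import java.time.Duration;
import io.rsocket.RSocket;
import io.rsocket.core.RSocketConnector;
import io.rsocket.transport.netty.client.TcpClientTransport;
import reactor.core.publisher.Mono;
import reactor.util.retry.Retry;

public class ReconnectDemo {
    public static void main(String[] args) {
        // A reusable, cached Mono<RSocket>: every subscriber sees the same live RSocket;
        // after a disconnect, the next subscription resolves a new one, shared by all
        // concurrent subscribers.
        Mono<RSocket> rsocketMono = RSocketConnector.create()
            .reconnect(Retry.fixedDelay(10, Duration.ofSeconds(1)))
            .connect(TcpClientTransport.create("localhost", 7000));

        rsocketMono.subscribe(rsocket -> System.out.println("first subscriber: " + rsocket));
        rsocketMono.subscribe(rsocket -> System.out.println("same instance:    " + rsocket));
    }
}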
For the question around FnF: this is part of the Rx model. Without a subscribe, the event doesn't happen. You are free to call an API returning a Mono without side effects before the subscribe; any other behaviour is a bug.
/**
 * Perform a Fire-and-Forget interaction via {@link RSocket#fireAndForget(Payload)}. Allows
 * multiple subscriptions and performs a request per subscriber.
 */
Mono<Void> fireAndForget(Mono<Payload> payloadMono);
If you call this method once and then subscribe 3 times to the result, it will execute 3 times.
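Applied to the RSocketRequester from the question, that behaviour looks roughly like this (a sketch reusing the route and data from the timer() method above):

Mono<Void> fnf = requester.route("/foo").data(Data).send();
// Nothing has been sent yet: send() only assembles the Mono.
fnf.subscribe();   // the request is actually performed here
fnf.subscribe();   // subscribing again performs a second request
// block() also subscribes (and waits), which is why the server only saw the call when block() was used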
Oleh, I tried what you suggested and it works to some extent, although I still can't quite get the behavior I need.
What I want to do is:
Client connects to a single (random) backend at a time
If the backend or connectivity to it fails, the client should try to connect to the next available backend.
I guess I can't use RoundRobinLoadbalanceStrategy, as it connects the client to all available backends. Should I use WeightedLoadbalanceStrategy instead? Or should the discoveryClient abstraction return only a single server every time? But then that would no longer be a 'pool' client, right?
Perhaps I should rethink my approach in general. I have a few tens of thousands of clients, so I want to balance the load on the backend: spread it across multiple backend instances, so that each client randomly connects to one instance but is capable of reconnecting to another instance if the instance it is connected to fails. I assume it is not a good idea to connect all clients to every backend instance at the same time, but maybe I'm wrong?

Sharing connection pools between client instances in finagle

Given two or more finagle clients that have different destination names, if those names happen to resolve to the same inet address, how do I get finagle to maintain only a single pool of connections to that endpoint?
Overly Simple Sample Code
The code below registers a very simple (and mostly useless) Resolver that always resolves to the same address. In practice this would be more like a many-to-one relationship (instead of all-to-one). There's another simple application that starts a server and two clients that both use the resolver to find the address of the server to talk to.
// Registered with finagle Resolver via META-INF/services
class SmartResolver extends AbstractResolver {
    @Override public String scheme() { return "smart"; }

    @Override public Var<Addr> bind(String arg) {
        return Vars.newConstVar(Addrs.newBoundAddr(Addresses.newInetAddress(
            // assume this is more complicated and maps many names to
            // one address
            new InetSocketAddress("127.0.0.1", 9000))));
    }
}

class Main {
    public static void main(String[] args) {
        // Create a server; we record stats so we can see how many connections
        // there are
        InMemoryStatsReceiver stats = new InMemoryStatsReceiver();
        ListeningServer server = Thrift.server()
            .withStatsReceiver(stats)
            .serveIface(":9000", (EchoService.ServiceIface) Future::value);

        // create a few clients that all connect to a resolved server; the
        // resolver ensures that they all are communicating with our server
        EchoService.ServiceIface c1 = Thrift.client()
            .newIface("smart!c1", EchoService.ServiceIface.class);
        EchoService.ServiceIface c2 = Thrift.client()
            .newIface("smart!c2", EchoService.ServiceIface.class);

        // make sure any lazy connections have been opened
        Await.result(c1.echo("c1"));
        Await.result(c2.echo("c2"));

        // I'm not sure how to see how many physical connections there are
        // incoming to the server, or if it's possible to do this.
        assertEquals(1, stats.counter(JavaConversions.asScalaBuffer(Arrays.asList("connects"))));

        Await.result(server.close());
    }
}

// echo.thrift
service EchoService {
    string echo(1: string quack);
}
All the details
Where I work we have a microservice architecture using finagle and thrift, and we use Consul for service discovery. Some of the external systems we interact with are very restrictive about the number and frequency of TCP connections they accept; for this reason, some service instances are 'assigned responsibility' for these connections. To make sure that requests that need to use specific connections are sent to the correct service, that service registers a new service name in Consul representing the connection it is responsible for. Clients then look up the service by the connection name instead of the service name.
To make that clearer: Say you have a device-service that opens TCP connections to a configured list of devices, the devices only support a single connection at a time. You might have a few instances of this device-service with some scheme in place to divide the devices between the device-service instances. So instance A connects to devices foo and bar, instance B connects to baz.
You now have another service, say poke-service that needs to talk to specific devices (via the device-service). The way I've solved this issue is to have the device-service instance A register foo and bar, and instance B register baz, all against their own local address. So looking up foo resolves to the address for device-service instance A instead of the more generic cluster of all device-service instances.
The problem I'd like to solve is an optimisation: I'd like finagle to recognise that both foo and bar actually resolve to the same address and to reuse all the resources that it allocates and maintains as part of the connection. Additionally, if foo and bar get reassigned to different device-service instances, I'd like everything to just work based on the information in Consul, which would reflect this change.

How does the HDFS client know the block size while writing?

The HDFS client is outside the HDFS cluster. When the HDFS client writes a file to Hadoop, it splits the file into blocks and then writes each block to a DataNode.
The question here is: how does the HDFS client know the block size? The block size is configured on the NameNode, and the HDFS client has no idea about it, so how can it split the file into blocks?
HDFS is designed in a way where the block size for a particular file is part of the metadata.
Let's check what this means.
The client can tell the NameNode that it will put data into HDFS with a particular block size.
The client has its own hdfs-site.xml that can contain this value, and it can specify it on a per-request basis as well using the -Ddfs.blocksize parameter.
If the client configuration does not define this parameter, then it defaults to the org.apache.hadoop.hdfs.DFSConfigKeys.DFS_BLOCK_SIZE_DEFAULT value, which is 128 MB.
The NameNode can throw an error for the client if it specifies a block size that is smaller than dfs.namenode.fs-limits.min-block-size (1 MB by default).
There is nothing magical in this: the NameNode knows nothing about the data and lets the client decide the optimal splitting, as well as the replication factor for the blocks of a file.
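To make the client-side options concrete, here is a rough sketch of two ways a client can pick its own block size; the 64 MB value, the URI and the paths are just examples, not anything mandated by HDFS:

import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class WriteWithCustomBlockSize {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Option 1: override the client-side default for everything this client writes.
        conf.set("dfs.blocksize", "67108864"); // 64 MB

        FileSystem fs = FileSystem.get(URI.create("hdfs://namenode-host:8020"), conf);

        // Option 2: pass an explicit block size for just this file via the create() overload
        // that takes bufferSize, replication and blockSize per call.
        InputStream in = new BufferedInputStream(new FileInputStream("localfile.dat"));
        OutputStream out = fs.create(new Path("/user/me/localfile.dat"),
                true,                // overwrite
                4096,                // buffer size
                (short) 3,           // replication
                64L * 1024 * 1024);  // block size in bytes
        IOUtils.copyBytes(in, out, 4096, true);
    }
}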
In simple words, when you deploy the client, the cluster configuration (the server URI) is placed into the client, or you download it and manually place it in the client. So whenever the client requests something, it goes to the NameNode to fetch the required metadata, or places new data on the DataNodes.
P.S: Client = EdgeNode
Some more details below (from Hadoop: The Definitive Guide, 4th edition):
"The client creates the file by calling create() on DistributedFileSystem (step 1 in
Figure 3-4). DistributedFileSystem makes an RPC call to the namenode to create a new
file in the filesystem’s namespace, with no blocks associated with it (step 2). The
namenode performs various checks to make sure the file doesn’t already exist and that the
client has the right permissions to create the file. If these checks pass, the namenode
makes a record of the new file; otherwise, file creation fails and the client is thrown an
IOException. The DistributedFileSystem returns an FSDataOutputStream for the client
to start writing data to. Just as in the read case, FSDataOutputStream wraps a
DFSOutputStream, which handles communication with the datanodes and namenode.
As the client writes data (step 3), the DFSOutputStream splits it into packets, which it writes to an internal queue called the data queue."
Adding more info in response to comment on this post:
Here is a sample client program to copy a file to HDFS (source: Hadoop: The Definitive Guide):
import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.util.Progressable;

public class FileCopyWithProgress {
    public static void main(String[] args) throws Exception {
        String localSrc = args[0];
        String dst = args[1];

        InputStream in = new BufferedInputStream(new FileInputStream(localSrc));
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(URI.create(dst), conf);
        OutputStream out = fs.create(new Path(dst), new Progressable() {
            public void progress() {
                System.out.print(".");
            }
        });
        IOUtils.copyBytes(in, out, 4096, true);
    }
}
If you look at the create() method implementation in the FileSystem class, it takes getDefaultBlockSize() as one of its arguments, which in turn fetches the value from the configuration, which in turn is provided by the namenode.
This is how the client gets to know the block size configured for the Hadoop cluster.
Hope this helps
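If you want to see which block size the client will actually use, you can ask the FileSystem instance directly. A small sketch, with a placeholder URI:

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ShowBlockSize {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(URI.create("hdfs://namenode-host:8020"), conf);

        // Value taken from the client-side configuration (dfs.blocksize),
        // falling back to the compiled-in default if it is not set anywhere.
        System.out.println("default block size: " + fs.getDefaultBlockSize(new Path("/")));
        System.out.println("dfs.blocksize     : " + conf.get("dfs.blocksize"));
    }
}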

What is the difference between a managed and an unmanaged HConnection in HBase?

When I tried to create an HTable instance this way:
Configuration conf = HBaseConfiguration.create();
HConnection conn = HConnectionManager.getConnection(conf);
conn.getTable("TABLE_NAME");
Then I got an exception, thrown from this method:
@Override
public HTableInterface getTable(TableName tableName, ExecutorService pool) throws IOException {
    if (managed) {
        throw new IOException("The connection has to be unmanaged.");
    }
    return new HTable(tableName, this, pool);
}
So, I want to know what 'managed' and 'unmanaged' HConnection concretely mean.
Before calling HConnectionManager.getConnection you have to create the connection using HConnectionManager.createConnection, passing it the HBaseConfiguration instance created earlier. HConnectionManager.getConnection returns a connection that already exists. A bit of the HConnectionManager javadoc about how it handles the connection pool:
This class has a static Map of HConnection instances keyed by Configuration; all invocations of getConnection(Configuration) that pass the same Configuration instance will be returned the same HConnection instance.
In your case, you can simply create a connection using HConnectionManager.createConnection and use the returned connection to open an HTable.
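A minimal sketch of that, assuming the pre-1.0 HConnection API used in the question (the table name is a placeholder):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HConnection;
import org.apache.hadoop.hbase.client.HConnectionManager;
import org.apache.hadoop.hbase.client.HTableInterface;

public class UnmanagedConnectionExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        // createConnection returns an *unmanaged* connection, so getTable works on it.
        HConnection connection = HConnectionManager.createConnection(conf);
        HTableInterface table = connection.getTable("TABLE_NAME");
        try {
            // ... use the table (Get / Put / Scan) ...
        } finally {
            table.close();
            connection.close(); // the caller owns the lifecycle of an unmanaged connection
        }
    }
}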
Edit:
@ifiddddddbest, I found the javadoc for HConnectionImplementation, which has a description of the managed flag (maybe it will help you understand):
@param managed If true, does not do full shutdown on close; i.e. cleanup of connection to zk and shutdown of all services; we just close down the resources this connection was responsible for and decrement usage counters. It is up to the caller to do the full cleanup. It is set when we want to have connection sharing going on -- reuse of zk connection, and cached region locations, established regionserver connections, etc. When connections are shared, we have reference counting going on and will only do full cleanup when no more users of an HConnectionImplementation instance.
In newer versions of HBase (>1.0), the managed flag disappeared and all connection management is now on the client side: the client is responsible for closing the connection, and when it does, it closes all internal connections to ZK, to the HBase master, etc., rather than just decrementing a reference counter.
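For comparison, a sketch of the same thing with the post-1.0 API (ConnectionFactory / Connection / Table), where the lifecycle is entirely in the caller's hands; the table name is again a placeholder:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Table;

public class ModernConnectionExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        // No managed/unmanaged distinction any more: the caller owns the connection.
        try (Connection connection = ConnectionFactory.createConnection(conf);
             Table table = connection.getTable(TableName.valueOf("TABLE_NAME"))) {
            // ... use the table (Get / Put / Scan) ...
        } // closing the connection tears down the ZK and RegionServer connections
    }
}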

Using zmq_connect on a port before zmq_bind returns success

I'm using the ZeroMQ 3.2.0 C++ library. I use zmq_connect to connect to a port before zmq_bind. But this function returns success. How can I know that the connect failed? My code is:
void *ctx = zmq_ctx_new();
void *skt = zmq_socket(ctx, ZMQ_SUB);
int ret = zmq_connect(skt, "tcp://192.168.9.97:5561"); // 192.168.9.97:5561 is not bound
// zmq_connect returns zero
This is actually a feature of ZeroMQ: connection status and so on is abstracted away from you. There is no exposed information you can check to see whether you're connected or not, AFAIK. This means that you can connect even if the server is temporarily down, and ZeroMQ will handle everything when the server becomes available later. This can be both a blessing and a curse.
What most people end up doing if they need to know connection status is to implement some sort of heartbeat. REQ/REP ping/pong for example.
Have a look at the lazy pirate pattern for an example of how to ensure reliability from a client perspective.
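As an illustration of the heartbeat idea (using the JeroMQ Java binding rather than the 3.2.0 C API from the question, so treat it as a sketch), a REQ/REP ping with a receive timeout could look like this; the endpoint and timeout are placeholders:

import org.zeromq.SocketType;
import org.zeromq.ZContext;
import org.zeromq.ZMQ;

public class HeartbeatCheck {
    public static void main(String[] args) {
        try (ZContext ctx = new ZContext()) {
            ZMQ.Socket req = ctx.createSocket(SocketType.REQ);
            req.setReceiveTimeOut(1000);              // wait at most 1 s for the pong
            req.connect("tcp://192.168.9.97:5562");   // hypothetical heartbeat endpoint
            req.send("ping");
            String reply = req.recvStr();             // null on timeout
            if (reply == null) {
                System.out.println("no pong -- peer unreachable (connect alone never told us)");
            } else {
                System.out.println("peer is alive: " + reply);
            }
        }
    }
}

Note that after a timed-out recv a strict REQ socket is left in an unusable state; the lazy pirate pattern deals with that by closing and recreating the socket before retrying.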
