Configure GremlinServer to JanusGraph with HBase and Elasticsearch - janusgraph

I can't create an instance of GremlinServer with HBase and Elasticsearch.
When I run the shell script bin/gremlin-server.sh config/gremlin.yaml, I get this exception:
Exception in thread "main" java.lang.IllegalStateException: java.lang.NoSuchMethodException: org.janusgraph.graphdb.tinkerpop.plugin.JanusGraphGremlinPlugin.build()
Gremlin Server logs:
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/user/janusgraph/lib/slf4j-log4j12-1.7.12.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/user/janusgraph/lib/logback-classic-1.1.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
0 [main] INFO org.apache.tinkerpop.gremlin.server.GremlinServer -
\,,,/
(o o)
-----oOOo-(3)-oOOo-----
135 [main] INFO org.apache.tinkerpop.gremlin.server.GremlinServer - Configuring Gremlin Server from config/gremlin.yaml
211 [main] INFO org.apache.tinkerpop.gremlin.server.util.MetricManager - Configured Metrics Slf4jReporter configured with interval=180000ms and loggerName=org.apache.tinkerpop.gremlin.server.Settings$Slf4jReporterMetrics
557 [main] INFO org.janusgraph.diskstorage.hbase.HBaseCompatLoader - Instantiated HBase compatibility layer supporting runtime HBase version 1.2.6: org.janusgraph.diskstorage.hbase.HBaseCompat1_0
835 [main] INFO org.janusgraph.diskstorage.hbase.HBaseStoreManager - HBase configuration: setting zookeeper.znode.parent=/hbase-unsecure
836 [main] INFO org.janusgraph.diskstorage.hbase.HBaseStoreManager - Copied host list from root.storage.hostname to hbase.zookeeper.quorum: main.local,data1.local,data2.local
836 [main] INFO org.janusgraph.diskstorage.hbase.HBaseStoreManager - Copied Zookeeper Port from root.storage.port to hbase.zookeeper.property.clientPort: 2181
866 [main] WARN org.apache.hadoop.util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
1214 [main] INFO org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper - Process identifier=hconnection-0x1e44b638 connecting to ZooKeeper ensemble=main.local:2181,data1.local:2181,data2.local:2181
1220 [main] INFO org.apache.hadoop.hbase.shaded.org.apache.zookeeper.ZooKeeper - Client environment:zookeeper.version=3.4.6-1569965, built on 02/20/2014 09:09 GMT
1220 [main] INFO org.apache.hadoop.hbase.shaded.org.apache.zookeeper.ZooKeeper - Client environment:host.name=main.local
1220 [main] INFO org.apache.hadoop.hbase.shaded.org.apache.zookeeper.ZooKeeper - Client environment:java.version=1.8.0_212
1220 [main] INFO org.apache.hadoop.hbase.shaded.org.apache.zookeeper.ZooKeeper - Client environment:java.vendor=Oracle Corporation
1220 [main] INFO org.apache.hadoop.hbase.shaded.org.apache.zookeeper.ZooKeeper - Client environment:java.home=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.212.b04-0.el7_6.x86_64/jre
1221 [main] INFO org.apache.hadoop.hbase.shaded.org.apache.zookeeper.ZooKeeper - Client environment:java.class.path=/home/user/janusgraph/conf/gremlin-server:/home/user/janusgraph/lib/slf4j-log4j12-
// (the rest of the classpath, listing the many JanusGraph dependency jars, is omitted here)
1256 [main] INFO org.apache.hadoop.hbase.shaded.org.apache.zookeeper.ZooKeeper - Client environment:java.library.path=/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib
1256 [main] INFO org.apache.hadoop.hbase.shaded.org.apache.zookeeper.ZooKeeper - Client environment:java.io.tmpdir=/tmp
1256 [main] INFO org.apache.hadoop.hbase.shaded.org.apache.zookeeper.ZooKeeper - Client environment:java.compiler=<NA>
1256 [main] INFO org.apache.hadoop.hbase.shaded.org.apache.zookeeper.ZooKeeper - Client environment:os.name=Linux
1256 [main] INFO org.apache.hadoop.hbase.shaded.org.apache.zookeeper.ZooKeeper - Client environment:os.arch=amd64
1256 [main] INFO org.apache.hadoop.hbase.shaded.org.apache.zookeeper.ZooKeeper - Client environment:os.version=3.10.0-862.el7.x86_64
1256 [main] INFO org.apache.hadoop.hbase.shaded.org.apache.zookeeper.ZooKeeper - Client environment:user.name=user
1257 [main] INFO org.apache.hadoop.hbase.shaded.org.apache.zookeeper.ZooKeeper - Client environment:user.home=/home/user
1257 [main] INFO org.apache.hadoop.hbase.shaded.org.apache.zookeeper.ZooKeeper - Client environment:user.dir=/home/user/janusgraph
1257 [main] INFO org.apache.hadoop.hbase.shaded.org.apache.zookeeper.ZooKeeper - Initiating client connection, connectString=main.local:2181,data1.local:2181,data2.local:2181 sessionTimeout=90000 watcher=hconnection-0x1e44b6380x0, quorum=main.local:2181,data1.local:2181,data2.local:2181, baseZNode=/hbase-unsecure
1274 [main-SendThread(data2.local:2181)] INFO org.apache.hadoop.hbase.shaded.org.apache.zookeeper.ClientCnxn - Opening socket connection to server data2.local/xxx.xxx.xxx.xxx:2181. Will not attempt to authenticate using SASL (unknown error)
1394 [main-SendThread(data2.local:2181)] INFO org.apache.hadoop.hbase.shaded.org.apache.zookeeper.ClientCnxn - Socket connection established to data2.local/xxx.xxx.xxx.xxx, initiating session
1537 [main-SendThread(data2.local:2181)] INFO org.apache.hadoop.hbase.shaded.org.apache.zookeeper.ClientCnxn - Session establishment complete on server data2.local/xxx.xxx.xxx.xxx:2181, sessionid = 0x26b266353e50014, negotiated timeout = 60000
3996 [main] INFO org.janusgraph.core.util.ReflectiveConfigOptionLoader - Loaded and initialized config classes: 13 OK out of 13 attempts in PT0.631S
4103 [main] INFO org.reflections.Reflections - Reflections took 60 ms to scan 2 urls, producing 0 keys and 0 values
4400 [main] WARN org.janusgraph.graphdb.configuration.GraphDatabaseConfiguration - Local setting cache.db-cache-time=180000 (Type: GLOBAL_OFFLINE) is overridden by globally managed value (10000). Use the ManagementSystem interface instead of the local configuration to control this setting.
4453 [main] WARN org.janusgraph.graphdb.configuration.GraphDatabaseConfiguration - Local setting cache.db-cache-clean-wait=20 (Type: GLOBAL_OFFLINE) is overridden by globally managed value (50). Use the ManagementSystem interface instead of the local configuration to control this setting.
4473 [main] INFO org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation - Closing master protocol: MasterService
4474 [main] INFO org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation - Closing zookeeper sessionid=0x26b266353e50014
4485 [main] INFO org.apache.hadoop.hbase.shaded.org.apache.zookeeper.ZooKeeper - Session: 0x26b266353e50014 closed
4485 [main-EventThread] INFO org.apache.hadoop.hbase.shaded.org.apache.zookeeper.ClientCnxn - EventThread shut down
4500 [main] INFO org.janusgraph.graphdb.configuration.GraphDatabaseConfiguration - Generated unique-instance-id=c0a8873843641-main-local1
4530 [main] INFO org.janusgraph.diskstorage.hbase.HBaseStoreManager - HBase configuration: setting zookeeper.znode.parent=/hbase-unsecure
4530 [main] INFO org.janusgraph.diskstorage.hbase.HBaseStoreManager - Copied host list from root.storage.hostname to hbase.zookeeper.quorum: main.local,data1.local,data2.local
4531 [main] INFO org.janusgraph.diskstorage.hbase.HBaseStoreManager - Copied Zookeeper Port from root.storage.port to hbase.zookeeper.property.clientPort: 2181
4532 [main] INFO org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper - Process identifier=hconnection-0x5bb3d42d connecting to ZooKeeper ensemble=main.local:2181,data1.local:2181,data2.local:2181
4532 [main] INFO org.apache.hadoop.hbase.shaded.org.apache.zookeeper.ZooKeeper - Initiating client connection, connectString=main.local:2181,data1.local:2181,data2.local:2181 sessionTimeout=90000 watcher=hconnection-0x5bb3d42d0x0, quorum=main.local:2181,data1.local:2181,data2.local:2181, baseZNode=/hbase-unsecure
4534 [main-SendThread(main.local:2181)] INFO org.apache.hadoop.hbase.shaded.org.apache.zookeeper.ClientCnxn - Opening socket connection to server main.local/xxx.xxx.xxx.xxx:2181. Will not attempt to authenticate using SASL (unknown error)
4534 [main-SendThread(main.local:2181)] INFO org.apache.hadoop.hbase.shaded.org.apache.zookeeper.ClientCnxn - Socket connection established to main.local/xxx.xxx.xxx.xxx:2181, initiating session
4611 [main-SendThread(main.local:2181)] INFO org.apache.hadoop.hbase.shaded.org.apache.zookeeper.ClientCnxn - Session establishment complete on server main.local/xxx.xxx.xxx.xxx:2181, sessionid = 0x36b266353fd0021, negotiated timeout = 60000
4616 [main] INFO org.janusgraph.diskstorage.Backend - Configuring index [search]
5781 [main] INFO org.janusgraph.diskstorage.Backend - Initiated backend operations thread pool of size 16
6322 [main] INFO org.janusgraph.diskstorage.Backend - Configuring total store cache size: 186687592
7555 [main] INFO org.janusgraph.graphdb.database.IndexSerializer - Hashing index keys
7925 [main] INFO org.janusgraph.diskstorage.log.kcvs.KCVSLog - Loaded unidentified ReadMarker start time 2019-06-13T09:54:08.929Z into org.janusgraph.diskstorage.log.kcvs.KCVSLog$MessagePuller#656d10a4
7927 [main] INFO org.apache.tinkerpop.gremlin.server.GremlinServer - Graph [graph] was successfully configured via [config/db.properties].
7927 [main] INFO org.apache.tinkerpop.gremlin.server.util.ServerGremlinExecutor - Initialized Gremlin thread pool. Threads in pool named with pattern gremlin-*
Exception in thread "main" java.lang.IllegalStateException: java.lang.NoSuchMethodException: org.janusgraph.graphdb.tinkerpop.plugin.JanusGraphGremlinPlugin.build()
at org.apache.tinkerpop.gremlin.groovy.engine.GremlinExecutor.initializeGremlinScriptEngineManager(GremlinExecutor.java:522)
at org.apache.tinkerpop.gremlin.groovy.engine.GremlinExecutor.<init>(GremlinExecutor.java:126)
at org.apache.tinkerpop.gremlin.groovy.engine.GremlinExecutor.<init>(GremlinExecutor.java:83)
at org.apache.tinkerpop.gremlin.groovy.engine.GremlinExecutor$Builder.create(GremlinExecutor.java:813)
at org.apache.tinkerpop.gremlin.server.util.ServerGremlinExecutor.<init>(ServerGremlinExecutor.java:169)
at org.apache.tinkerpop.gremlin.server.util.ServerGremlinExecutor.<init>(ServerGremlinExecutor.java:89)
at org.apache.tinkerpop.gremlin.server.GremlinServer.<init>(GremlinServer.java:110)
at org.apache.tinkerpop.gremlin.server.GremlinServer.main(GremlinServer.java:363)
Caused by: java.lang.NoSuchMethodException: org.janusgraph.graphdb.tinkerpop.plugin.JanusGraphGremlinPlugin.build()
at java.lang.Class.getMethod(Class.java:1786)
at org.apache.tinkerpop.gremlin.groovy.engine.GremlinExecutor.initializeGremlinScriptEngineManager(GremlinExecutor.java:492)
... 7 more
Graph configuration:
storage.backend=hbase
storage.hostname=main.local,data1.local,data2.local
storage.port=2181
storage.hbase.ext.zookeeper.znode.parent=/hbase-unsecure
cache.db-cache=true
cache.db-cache-clean-wait=20
cache.db-cache-time=180000
cache.db-cache-size=0.5
index.search.backend=elasticsearch
index.search.hostname=xxx.xxx.xxx.xxx
index.search.port=9200
index.search.elasticsearch.client-only=false
gremlin.graph=org.janusgraph.core.JanusGraphFactory
host=0.0.0.0
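For reference, the same backend settings can be exercised on their own by opening the graph directly with JanusGraphFactory; a minimal sketch (the path assumes the config/db.properties referenced by the server configuration below):

import org.janusgraph.core.JanusGraph;
import org.janusgraph.core.JanusGraphFactory;

public class OpenGraphCheck {
    public static void main(String[] args) throws Exception {
        // Open the graph straight from the properties file Gremlin Server points at.
        // If this succeeds, the HBase/Elasticsearch backend settings are fine and the
        // failure is limited to the Gremlin Server plugin wiring.
        JanusGraph graph = JanusGraphFactory.open("config/db.properties");
        System.out.println("Opened " + graph);
        graph.close();
    }
}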
Gremlin Server configuration:
host: localhost
port: 8182
channelizer: org.apache.tinkerpop.gremlin.server.channel.HttpChannelizer
graphs: { graph: config/db.properties }
scriptEngines: {
  gremlin-groovy: {
    plugins: {
      org.janusgraph.graphdb.tinkerpop.plugin.JanusGraphGremlinPlugin: {},
      org.apache.tinkerpop.gremlin.server.jsr223.GremlinServerGremlinPlugin: {},
      org.apache.tinkerpop.gremlin.tinkergraph.jsr223.TinkerGraphGremlinPlugin: {},
      org.apache.tinkerpop.gremlin.jsr223.ImportGremlinPlugin: { classImports: [java.lang.Math], methodImports: [java.lang.Math#*] },
      org.apache.tinkerpop.gremlin.jsr223.ScriptFileGremlinPlugin: { files: [scripts/janusgraph.groovy] }
    }
  }
}
serializers:
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GryoMessageSerializerV3d0, config: { ioRegistries: [org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistry] } }
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GryoMessageSerializerV3d0, config: { serializeResultToString: true } }
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GraphSONMessageSerializerV3d0, config: { ioRegistries: [org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistry] } }
metrics: {
  slf4jReporter: {enabled: true, interval: 180000}
}
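Once the server does come up, the HttpChannelizer configured above can be smoke-tested with a plain HTTP request; a minimal sketch (host and port are the ones from this configuration, and the script is just an arbitrary expression):

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class GremlinHttpCheck {
    public static void main(String[] args) throws Exception {
        // HttpChannelizer accepts scripts via the "gremlin" query parameter.
        String script = URLEncoder.encode("1+1", StandardCharsets.UTF_8.name());
        HttpURLConnection conn = (HttpURLConnection)
                new URL("http://localhost:8182/?gremlin=" + script).openConnection();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                System.out.println(line); // JSON envelope with the result
            }
        }
    }
}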
What do I need to do to start the server without this error?

Related

Is it to be expected that the client will discover each node twice for the Hazelcast sidecar caching pattern?

I'm pretty new to using Hazelcast and its interesting feature of auto-syncing with other cache instances. My questions are at the bottom of the description.
Here was my initial goal:
Design an environment following the Hazelcast sidecar caching pattern.
There will be no cache on the application container side; basically, I don't want to use a near-cache, to keep my JVM light and reduce GC time.
The application container in each node will communicate with its own sidecar cache container via the localhost IP.
The Hazelcast Management Center will be a separate node that communicates with all the nodes containing a Hazelcast sidecar cache container.
Here is the target design:
I prepared the Hazelcast configuration [hazelcast.yaml] for the Hazelcast container:
hazelcast:
  cluster-name: dev
  network:
    port:
      auto-increment: false
      port-count: 3
      port: 5701
I also prepared another hazelcast.yaml for my application container:
hazelcast:
  map:
    default:
      backup-count: 0
      async-backup-count: 1
      read-backup-data: true
  network:
    reuse-address: true
    port:
      auto-increment: true
      port: 5701
    join:
      multicast:
        enabled: true
      kubernetes:
        enabled: false
      tcp-ip:
        enabled: false
        interface: 127.0.0.1
        member-list:
          - 127.0.0.1:5701
Here is the client part; I used Spring Boot for it:
@Component
public class CacheClient {

    private static final String ITEMS = "items";

    private HazelcastInstance client;

    CacheClient() throws IOException {
        ClientConfig config = new YamlClientConfigBuilder("hazelcast.yaml").build();
        config.setInstanceName(UUID.randomUUID().toString());
        client = HazelcastClient.getOrCreateHazelcastClient(config);
    }

    public Item put(String number, Item item) {
        IMap<String, Item> map = client.getMap(ITEMS);
        return map.putIfAbsent(number, item);
    }

    public Item get(String key) {
        IMap<String, Item> map = client.getMap(ITEMS);
        return map.get(key);
    }
}
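For completeness, a minimal usage sketch of this client; Item stands for whatever serializable type is actually stored, and its no-arg constructor here is a placeholder:

public class CacheClientDemo {
    public static void main(String[] args) throws java.io.IOException {
        // Assumes this class sits in the same package as CacheClient (its constructor is package-private).
        CacheClient cache = new CacheClient();
        Item previous = cache.put("1001", new Item()); // putIfAbsent returns null when the key was absent
        Item cached = cache.get("1001");               // served by the Hazelcast member
        System.out.println("previous=" + previous + ", cached=" + cached);
    }
}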
Here is the Dockerfile I used to build my application container image:
FROM adoptopenjdk/openjdk11:jdk-11.0.5_10-alpine-slim
# Expose port 8081 to Docker host
EXPOSE 8081
WORKDIR /opt
COPY /build/libs/hazelcast-client-0.0.1-SNAPSHOT.jar /opt/app.jar
COPY /src/main/resources/hazelcast.yaml /opt/hazelcast.yaml
COPY /src/main/resources/application.properties /opt/application.properties
ENTRYPOINT ["java","-Dhazelcast.socket.server.bind.any=false","-Dhazelcast.initial.min.cluster.size=1","-Dhazelcast.socket.bind.any=false","-Dhazelcast.socket.server.bind.any=false","-Dhazelcast.socket.client.bind=false","-Dhazelcast.socket.client.bind.any=false","-Dhazelcast.logging.type=slf4j","-jar","app.jar"]
Here is the deployment script I used:
apiVersion: v1 # Kubernetes API version
kind: Service # Kubernetes resource kind we are creating
metadata: # Metadata of the resource kind we are creating
  name: spring-hazelcast-service
spec:
  selector:
    app: spring-hazelcast-app
  ports:
    - protocol: "TCP"
      name: http-app
      port: 8081 # The port that the service is running on in the cluster
      targetPort: 8081 # The port exposed by the service
  type: LoadBalancer # type of the service. LoadBalancer indicates that our service will be external.
---
apiVersion: apps/v1
kind: Deployment # Kubernetes resource kind we are creating
metadata:
  name: spring-hazelcast-app
spec:
  selector:
    matchLabels:
      app: spring-hazelcast-app
  replicas: 1 # Number of replicas that will be created for this deployment
  template:
    metadata:
      labels:
        app: spring-hazelcast-app
    spec:
      containers:
        - name: hazelcast
          image: hazelcast/hazelcast:4.0.2
          workingDir: /opt
          ports:
            - name: hazelcast
              containerPort: 5701
          env:
            - name: HZ_CLUSTERNAME
              value: dev
            - name: JAVA_OPTS
              value: -Dhazelcast.config=/opt/config/hazelcast.yml
          volumeMounts:
            - mountPath: "/opt/config/"
              name: allconf
        - name: spring-hazelcast-app
          image: spring-hazelcast:1.0.3
          imagePullPolicy: Never #IfNotPresent
          ports:
            - containerPort: 8081 # The port that the container is running on in the cluster
      volumes:
        - name: allconf
          hostPath:
            path: /opt/config/ # directory location on host
            type: Directory # this field is optional
---
apiVersion: v1 # Kubernetes API version
kind: Service # Kubernetes resource kind we are creating
metadata: # Metadata of the resource kind we are creating
  name: hazelcast-mc-service
spec:
  selector:
    app: hazelcast-mc
  ports:
    - protocol: "TCP"
      name: mc-app
      port: 8080 # The port that the service is running on in the cluster
      targetPort: 8080 # The port exposed by the service
  type: LoadBalancer # type of the service
  loadBalancerIP: "127.0.0.1"
---
apiVersion: apps/v1
kind: Deployment # Kubernetes resource kind we are creating
metadata:
  name: hazelcast-mc
spec:
  selector:
    matchLabels:
      app: hazelcast-mc
  replicas: 1 # Number of replicas that will be created for this deployment
  template:
    metadata:
      labels:
        app: hazelcast-mc
    spec:
      containers:
        - name: hazelcast-mc
          image: hazelcast/management-center
          ports:
            - containerPort: 8080 # The port that the container is running on in the cluster
Here are my application logs:
. ____ _ __ _ _
/\\ / ___'_ __ _ _(_)_ __ __ _ \ \ \ \
( ( )\___ | '_ | '_| | '_ \/ _` | \ \ \ \
\\/ ___)| |_)| | | | | || (_| | ) ) ) )
' |____| .__|_| |_|_| |_\__, | / / / /
=========|_|==============|___/=/_/_/_/
:: Spring Boot :: (v2.5.4)
2021-09-27 06:42:51.274 INFO 1 --- [ main] com.caching.Application : Starting Application using Java 11.0.5 on spring-hazelcast-app-7bdc8b7f7-bqdlt with PID 1 (/opt/app.jar started by root in /opt)
2021-09-27 06:42:51.278 INFO 1 --- [ main] com.caching.Application : No active profile set, falling back to default profiles: default
2021-09-27 06:42:55.986 INFO 1 --- [ main] c.h.c.impl.spi.ClientInvocationService : b1bdd9bb-2879-4161-95fd-2b6e321ad30a [dev] [4.0.2] Running with 2 response threads, dynamic=true
2021-09-27 06:42:56.199 INFO 1 --- [ main] com.hazelcast.core.LifecycleService : b1bdd9bb-2879-4161-95fd-2b6e321ad30a [dev] [4.0.2] HazelcastClient 4.0.2 (20200702 - 2de3027) is STARTING
2021-09-27 06:42:56.202 INFO 1 --- [ main] com.hazelcast.core.LifecycleService : b1bdd9bb-2879-4161-95fd-2b6e321ad30a [dev] [4.0.2] HazelcastClient 4.0.2 (20200702 - 2de3027) is STARTED
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by com.hazelcast.internal.networking.nio.SelectorOptimizer (jar:file:/opt/app.jar!/BOOT-INF/lib/hazelcast-all-4.0.2.jar!/) to field sun.nio.ch.SelectorImpl.selectedKeys
WARNING: Please consider reporting this to the maintainers of com.hazelcast.internal.networking.nio.SelectorOptimizer
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
2021-09-27 06:42:56.277 INFO 1 --- [ main] c.h.c.i.c.ClientConnectionManager : b1bdd9bb-2879-4161-95fd-2b6e321ad30a [dev] [4.0.2] Trying to connect to cluster: dev
2021-09-27 06:42:56.302 INFO 1 --- [ main] c.h.c.i.c.ClientConnectionManager : b1bdd9bb-2879-4161-95fd-2b6e321ad30a [dev] [4.0.2] Trying to connect to [127.0.0.1]:5701
2021-09-27 06:42:56.429 INFO 1 --- [ main] com.hazelcast.core.LifecycleService : b1bdd9bb-2879-4161-95fd-2b6e321ad30a [dev] [4.0.2] HazelcastClient 4.0.2 (20200702 - 2de3027) is CLIENT_CONNECTED
2021-09-27 06:42:56.429 INFO 1 --- [ main] c.h.c.i.c.ClientConnectionManager : b1bdd9bb-2879-4161-95fd-2b6e321ad30a [dev] [4.0.2] Authenticated with server [172.17.0.3]:5701:c967f642-a7aa-4deb-a530-b56fb8f68c78, server version: 4.0.2, local address: /127.0.0.1:54373
2021-09-27 06:42:56.436 INFO 1 --- [ main] c.h.internal.diagnostics.Diagnostics : b1bdd9bb-2879-4161-95fd-2b6e321ad30a [dev] [4.0.2] Diagnostics disabled. To enable add -Dhazelcast.diagnostics.enabled=true to the JVM arguments.
2021-09-27 06:42:56.461 INFO 1 --- [21ad30a.event-4] c.h.c.impl.spi.ClientClusterService : b1bdd9bb-2879-4161-95fd-2b6e321ad30a [dev] [4.0.2]
Members [1] {
Member [172.17.0.3]:5701 - c967f642-a7aa-4deb-a530-b56fb8f68c78
}
2021-09-27 06:42:56.803 INFO 1 --- [ main] c.h.c.i.s.ClientStatisticsService : Client statistics is enabled with period 5 seconds.
2021-09-27 06:42:57.878 INFO 1 --- [ main] c.h.i.config.AbstractConfigLocator : Loading 'hazelcast.yaml' from the working directory.
2021-09-27 06:42:57.934 WARN 1 --- [ main] c.h.i.impl.HazelcastInstanceFactory : Hazelcast is starting in a Java modular environment (Java 9 and newer) but without proper access to required Java packages. Use additional Java arguments to provide Hazelcast access to Java internal API. The internal API access is used to get the best performance results. Arguments to be used:
--add-modules java.se --add-exports java.base/jdk.internal.ref=ALL-UNNAMED --add-opens java.base/java.lang=ALL-UNNAMED --add-opens java.base/java.nio=ALL-UNNAMED --add-opens java.base/sun.nio.ch=ALL-UNNAMED --add-opens java.management/sun.management=ALL-UNNAMED --add-opens jdk.management/com.sun.management.internal=ALL-UNNAMED
2021-09-27 06:42:57.976 INFO 1 --- [ main] com.hazelcast.instance.AddressPicker : [LOCAL] [dev] [4.0.2] Prefer IPv4 stack is true, prefer IPv6 addresses is false
2021-09-27 06:42:57.987 INFO 1 --- [ main] com.hazelcast.instance.AddressPicker : [LOCAL] [dev] [4.0.2] Picked [172.17.0.3]:5702, using socket ServerSocket[addr=/172.17.0.3,localport=5702], bind any local is false
2021-09-27 06:42:58.004 INFO 1 --- [ main] com.hazelcast.system : [172.17.0.3]:5702 [dev] [4.0.2] Hazelcast 4.0.2 (20200702 - 2de3027) starting at [172.17.0.3]:5702
2021-09-27 06:42:58.005 INFO 1 --- [ main] com.hazelcast.system : [172.17.0.3]:5702 [dev] [4.0.2] Copyright (c) 2008-2020, Hazelcast, Inc. All Rights Reserved.
2021-09-27 06:42:58.047 INFO 1 --- [ main] c.h.s.i.o.impl.BackpressureRegulator : [172.17.0.3]:5702 [dev] [4.0.2] Backpressure is disabled
2021-09-27 06:42:58.373 INFO 1 --- [ main] com.hazelcast.instance.impl.Node : [172.17.0.3]:5702 [dev] [4.0.2] Creating MulticastJoiner
2021-09-27 06:42:58.380 WARN 1 --- [ main] com.hazelcast.cp.CPSubsystem : [172.17.0.3]:5702 [dev] [4.0.2] CP Subsystem is not enabled. CP data structures will operate in UNSAFE mode! Please note that UNSAFE mode will not provide strong consistency guarantees.
2021-09-27 06:42:58.676 INFO 1 --- [ main] c.h.s.i.o.impl.OperationExecutorImpl : [172.17.0.3]:5702 [dev] [4.0.2] Starting 2 partition threads and 3 generic threads (1 dedicated for priority tasks)
2021-09-27 06:42:58.682 INFO 1 --- [ main] c.h.internal.diagnostics.Diagnostics : [172.17.0.3]:5702 [dev] [4.0.2] Diagnostics disabled. To enable add -Dhazelcast.diagnostics.enabled=true to the JVM arguments.
2021-09-27 06:42:58.687 INFO 1 --- [ main] com.hazelcast.core.LifecycleService : [172.17.0.3]:5702 [dev] [4.0.2] [172.17.0.3]:5702 is STARTING
2021-09-27 06:42:58.923 INFO 1 --- [ main] c.h.i.cluster.impl.MulticastJoiner : [172.17.0.3]:5702 [dev] [4.0.2] Trying to join to discovered node: [172.17.0.3]:5701
2021-09-27 06:42:58.932 INFO 1 --- [cached.thread-3] c.h.internal.nio.tcp.TcpIpConnector : [172.17.0.3]:5702 [dev] [4.0.2] Connecting to /172.17.0.3:5701, timeout: 10000, bind-any: false
2021-09-27 06:42:58.955 INFO 1 --- [.IO.thread-in-0] c.h.internal.nio.tcp.TcpIpConnection : [172.17.0.3]:5702 [dev] [4.0.2] Initialized new cluster connection between /172.17.0.3:40242 and /172.17.0.3:5701
2021-09-27 06:43:04.948 INFO 1 --- [21ad30a.event-3] c.h.c.impl.spi.ClientClusterService : b1bdd9bb-2879-4161-95fd-2b6e321ad30a [dev] [4.0.2]
Members [2] {
Member [172.17.0.3]:5701 - c967f642-a7aa-4deb-a530-b56fb8f68c78
Member [172.17.0.3]:5702 - 08dfe633-46b2-4581-94c7-81b6d0bc3ce3
}
2021-09-27 06:43:04.959 WARN 1 --- [ration.thread-0] c.h.c.i.operation.OnJoinCacheOperation : [172.17.0.3]:5702 [dev] [4.0.2] This member is joining a cluster whose members support JCache, however the cache-api artifact is missing from this member's classpath. In case JCache API will be used, add cache-api artifact in this member's classpath and restart the member.
2021-09-27 06:43:04.963 INFO 1 --- [ration.thread-0] c.h.internal.cluster.ClusterService : [172.17.0.3]:5702 [dev] [4.0.2]
Members {size:2, ver:2} [
Member [172.17.0.3]:5701 - c967f642-a7aa-4deb-a530-b56fb8f68c78
Member [172.17.0.3]:5702 - 08dfe633-46b2-4581-94c7-81b6d0bc3ce3 this
]
2021-09-27 06:43:05.466 INFO 1 --- [ration.thread-1] c.h.c.i.p.t.AuthenticationMessageTask : [172.17.0.3]:5702 [dev] [4.0.2] Received auth from Connection[id=2, /172.17.0.3:5702->/172.17.0.3:40773, qualifier=null, endpoint=[172.17.0.3]:40773, alive=true, connectionType=JVM], successfully authenticated, clientUuid: 8843f057-c856-4739-80ae-4bc930559bd5, client version: 4.0.2
2021-09-27 06:43:05.468 INFO 1 --- [d30a.internal-3] c.h.c.i.c.ClientConnectionManager : b1bdd9bb-2879-4161-95fd-2b6e321ad30a [dev] [4.0.2] Authenticated with server [172.17.0.3]:5702:08dfe633-46b2-4581-94c7-81b6d0bc3ce3, server version: 4.0.2, local address: /172.17.0.3:40773
2021-09-27 06:43:05.968 INFO 1 --- [ main] com.hazelcast.core.LifecycleService : [172.17.0.3]:5702 [dev] [4.0.2] [172.17.0.3]:5702 is STARTED
2021-09-27 06:43:06.237 INFO 1 --- [ main] o.s.b.web.embedded.netty.NettyWebServer : Netty started on port 8081
2021-09-27 06:43:06.251 INFO 1 --- [ main] com.caching.Application : Started Application in 17.32 seconds (JVM running for 21.02)
Here is the Hazelcast Management Center member list.
Finally, my questions are:
Why am I seeing 2 members when there is only one sidecar cache container deployed?
What modifications will be required to reach my initial goal?
According to the Spring Boot documentation for the Hazelcast feature:
If a client can't be created, Spring Boot attempts to configure an embedded server.
Because your application container ships a hazelcast.yaml (member configuration), Spring Boot starts an embedded member from it, and that member joins the Hazelcast container's cluster via multicast. That is why you see two members.
You should replace the hazelcast.yaml in the Spring Boot app container with a hazelcast-client.yaml with the following content:
hazelcast-client:
  cluster-name: "dev"
  network:
    cluster-members:
      - "127.0.0.1:5701"
After doing that, Spring Boot will auto-configure a client HazelcastInstance bean, and you will be able to change your cache client like this:
@Component
public class CacheClient {

    private static final String ITEMS = "items";

    private final HazelcastInstance client;

    public CacheClient(HazelcastInstance client) {
        this.client = client;
    }

    public Item put(String number, Item item) {
        IMap<String, Item> map = client.getMap(ITEMS);
        return map.putIfAbsent(number, item);
    }

    public Item get(String key) {
        IMap<String, Item> map = client.getMap(ITEMS);
        return map.get(key);
    }
}
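If you prefer to keep the client configuration in code rather than in hazelcast-client.yaml, a roughly equivalent bean definition is sketched below (Hazelcast 4.x client API; once you define a HazelcastInstance bean yourself, Spring Boot's auto-configuration should back off):

import com.hazelcast.client.HazelcastClient;
import com.hazelcast.client.config.ClientConfig;
import com.hazelcast.core.HazelcastInstance;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class HazelcastClientConfiguration {

    @Bean
    public HazelcastInstance hazelcastInstance() {
        ClientConfig config = new ClientConfig();
        config.setClusterName("dev");                           // must match the member's cluster-name
        config.getNetworkConfig().addAddress("127.0.0.1:5701"); // the sidecar member in the same pod
        return HazelcastClient.newHazelcastClient(config);
    }
}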

Spark-submit job fails on yarn nodemanager with error Client cannot authenticate via:[TOKEN, KERBEROS]

I am running spark-submit in YARN client mode. YARN has been set up on the HDP sandbox with Kerberos enabled, and the HDP sandbox is running in a Docker container on a Mac host.
When spark-submit is run from within the sandbox's Docker container it runs successfully, but when it is run from the host machine it fails immediately after the ACCEPTED state with this error:
19/07/28 00:41:21 INFO yarn.Client: Application report for application_1564298049378_0008 (state: ACCEPTED)
19/07/28 00:41:22 INFO yarn.Client: Application report for application_1564298049378_0008 (state: ACCEPTED)
19/07/28 00:41:23 INFO yarn.Client: Application report for application_1564298049378_0008 (state: FAILED)
19/07/28 00:41:23 INFO yarn.Client:
client token: N/A
diagnostics: Application application_1564298049378_0008 failed 2 times due to AM Container for appattempt_1564298049378_0008_000002 exited with exitCode: -1000
Failing this attempt.Diagnostics: (Client.java:1558)
... 37 more
Caused by: org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS]
I could not find any more information about the failure. Any help will be greatly appreciated.
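One thing that may be worth verifying from the Mac host is whether the Hadoop client there can complete a Kerberos login at all, before delegation tokens come into play; a minimal check (the principal and keytab path are placeholders for your environment):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.security.UserGroupInformation;

public class KerberosLoginCheck {
    public static void main(String[] args) throws Exception {
        // Use the same core-site.xml/hdfs-site.xml that spark-submit picks up on the host.
        Configuration conf = new Configuration();
        conf.set("hadoop.security.authentication", "kerberos");
        UserGroupInformation.setConfiguration(conf);
        // Placeholder principal and keytab -- substitute the ones used on the host.
        UserGroupInformation.loginUserFromKeytab("user@EXAMPLE.COM", "/path/to/user.keytab");
        System.out.println("Logged in as: " + UserGroupInformation.getCurrentUser());
    }
}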
Here is the resourcemanager log:
2019-07-28 22:39:04,654 INFO resourcemanager.ClientRMService (ClientRMService.java:getNewApplicationId(341)) - Allocated new applicationId: 20
2019-07-28 22:39:10,982 INFO capacity.CapacityScheduler (CapacityScheduler.java:checkAndGetApplicationPriority(2526)) - Application 'application_1564332457320_0020' is submitted without priority hence considering default queue/cluster priority: 0
2019-07-28 22:39:10,982 INFO capacity.CapacityScheduler (CapacityScheduler.java:checkAndGetApplicationPriority(2547)) - Priority '0' is acceptable in queue : santosh for application: application_1564332457320_0020
2019-07-28 22:39:10,983 WARN rmapp.RMAppImpl (RMAppImpl.java:(473)) - The specific max attempts: 0 for application: 20 is invalid, because it is out of the range [1, 2]. Use the global max attempts instead.
2019-07-28 22:39:10,983 INFO collector.TimelineCollectorManager (TimelineCollectorManager.java:putIfAbsent(142)) - the collector for application_1564332457320_0020 was added
2019-07-28 22:39:10,984 INFO resourcemanager.ClientRMService (ClientRMService.java:submitApplication(648)) - Application with id 20 submitted by user santosh
2019-07-28 22:39:10,984 INFO security.DelegationTokenRenewer (DelegationTokenRenewer.java:handleAppSubmitEvent(458)) - application_1564332457320_0020 found existing hdfs token Kind: HDFS_DELEGATION_TOKEN, Service: 192.168.50.1:8020, Ident: (token for santosh: HDFS_DELEGATION_TOKEN owner=santosh#XXX.XX, renewer=yarn, realUser=, issueDate=1564353550169, maxDate=1564958350169, sequenceNumber=125, masterKeyId=20)
2019-07-28 22:39:11,011 INFO security.DelegationTokenRenewer (DelegationTokenRenewer.java:renewToken(635)) - Renewed delegation-token= [Kind: HDFS_DELEGATION_TOKEN, Service: 192.168.50.1:8020, Ident: (token for santosh: HDFS_DELEGATION_TOKEN owner=santosh#XXX.XX, renewer=yarn, realUser=, issueDate=1564353550169, maxDate=1564958350169, sequenceNumber=125, masterKeyId=20);exp=1564439951007; apps=[application_1564332457320_0020]]
2019-07-28 22:39:11,011 INFO security.DelegationTokenRenewer (DelegationTokenRenewer.java:setTimerForTokenRenewal(613)) - Renew Kind: HDFS_DELEGATION_TOKEN, Service: 192.168.50.1:8020, Ident: (token for santosh: HDFS_DELEGATION_TOKEN owner=santosh#XXX.XX, renewer=yarn, realUser=, issueDate=1564353550169, maxDate=1564958350169, sequenceNumber=125, masterKeyId=20);exp=1564439951007; apps=[application_1564332457320_0020] in 86399996 ms, appId = [application_1564332457320_0020]
2019-07-28 22:39:11,011 INFO rmapp.RMAppImpl (RMAppImpl.java:transition(1259)) - Storing application with id application_1564332457320_0020
2019-07-28 22:39:11,012 INFO rmapp.RMAppImpl (RMAppImpl.java:handle(912)) - application_1564332457320_0020 State change from NEW to NEW_SAVING on event = START
2019-07-28 22:39:11,012 INFO recovery.RMStateStore (RMStateStore.java:transition(222)) - Storing info for app: application_1564332457320_0020
2019-07-28 22:39:11,022 INFO rmapp.RMAppImpl (RMAppImpl.java:handle(912)) - application_1564332457320_0020 State change from NEW_SAVING to SUBMITTED on event = APP_NEW_SAVED
2019-07-28 22:39:11,022 INFO capacity.ParentQueue (ParentQueue.java:addApplication(494)) - Application added - appId: application_1564332457320_0020 user: santosh leaf-queue of parent: root #applications: 1
2019-07-28 22:39:11,023 INFO capacity.CapacityScheduler (CapacityScheduler.java:addApplication(990)) - Accepted application application_1564332457320_0020 from user: santosh, in queue: santosh
2019-07-28 22:39:11,023 INFO rmapp.RMAppImpl (RMAppImpl.java:handle(912)) - application_1564332457320_0020 State change from SUBMITTED to ACCEPTED on event = APP_ACCEPTED
2019-07-28 22:39:11,023 INFO resourcemanager.ApplicationMasterService (ApplicationMasterService.java:registerAppAttempt(479)) - Registering app attempt : appattempt_1564332457320_0020_000001
2019-07-28 22:39:11,024 INFO attempt.RMAppAttemptImpl (RMAppAttemptImpl.java:handle(925)) - appattempt_1564332457320_0020_000001 State change from NEW to SUBMITTED on event = START
2019-07-28 22:39:11,024 INFO capacity.LeafQueue (LeafQueue.java:activateApplications(911)) - Application application_1564332457320_0020 from user: santosh activated in queue: santosh
2019-07-28 22:39:11,025 INFO capacity.LeafQueue (LeafQueue.java:addApplicationAttempt(941)) - Application added - appId: application_1564332457320_0020 user: santosh, leaf-queue: santosh #user-pending-applications: 0 #user-active-applications: 1 #queue-pending-applications: 0 #queue-active-applications: 1
2019-07-28 22:39:11,025 INFO capacity.CapacityScheduler (CapacityScheduler.java:addApplicationAttempt(1036)) - Added Application Attempt appattempt_1564332457320_0020_000001 to scheduler from user santosh in queue santosh
2019-07-28 22:39:11,028 INFO attempt.RMAppAttemptImpl (RMAppAttemptImpl.java:handle(925)) - appattempt_1564332457320_0020_000001 State change from SUBMITTED to SCHEDULED on event = ATTEMPT_ADDED
2019-07-28 22:39:11,033 INFO allocator.AbstractContainerAllocator (AbstractContainerAllocator.java:getCSAssignmentFromAllocateResult(129)) - assignedContainer application attempt=appattempt_1564332457320_0020_000001 container=null queue=santosh clusterResource= type=OFF_SWITCH requestedPartition=
2019-07-28 22:39:11,034 INFO rmcontainer.RMContainerImpl (RMContainerImpl.java:handle(490)) - container_e20_1564332457320_0020_01_000001 Container Transitioned from NEW to ALLOCATED
2019-07-28 22:39:11,035 INFO fica.FiCaSchedulerNode (FiCaSchedulerNode.java:allocateContainer(169)) - Assigned container container_e20_1564332457320_0020_01_000001 of capacity on host sandbox-hdp.hortonworks.com:45454, which has 1 containers, used and available after allocation
2019-07-28 22:39:11,038 INFO security.NMTokenSecretManagerInRM (NMTokenSecretManagerInRM.java:createAndGetNMToken(200)) - Sending NMToken for nodeId : sandbox-hdp.hortonworks.com:45454 for container : container_e20_1564332457320_0020_01_000001
2019-07-28 22:39:11,043 INFO rmcontainer.RMContainerImpl (RMContainerImpl.java:handle(490)) - container_e20_1564332457320_0020_01_000001 Container Transitioned from ALLOCATED to ACQUIRED
2019-07-28 22:39:11,043 INFO security.NMTokenSecretManagerInRM (NMTokenSecretManagerInRM.java:clearNodeSetForAttempt(146)) - Clear node set for appattempt_1564332457320_0020_000001
2019-07-28 22:39:11,044 INFO capacity.ParentQueue (ParentQueue.java:apply(1332)) - assignedContainer queue=root usedCapacity=0.25 absoluteUsedCapacity=0.25 used= cluster=
2019-07-28 22:39:11,044 INFO capacity.CapacityScheduler (CapacityScheduler.java:tryCommit(2890)) - Allocation proposal accepted
2019-07-28 22:39:11,044 INFO attempt.RMAppAttemptImpl (RMAppAttemptImpl.java:storeAttempt(2213)) - Storing attempt: AppId: application_1564332457320_0020 AttemptId: appattempt_1564332457320_0020_000001 MasterContainer: Container: [ContainerId: container_e20_1564332457320_0020_01_000001, AllocationRequestId: -1, Version: 0, NodeId: sandbox-hdp.hortonworks.com:45454, NodeHttpAddress: sandbox-hdp.hortonworks.com:8042, Resource: , Priority: 0, Token: Token { kind: ContainerToken, service: 172.18.0.3:45454 }, ExecutionType: GUARANTEED, ]
2019-07-28 22:39:11,051 INFO attempt.RMAppAttemptImpl (RMAppAttemptImpl.java:handle(925)) - appattempt_1564332457320_0020_000001 State change from SCHEDULED to ALLOCATED_SAVING on event = CONTAINER_ALLOCATED
2019-07-28 22:39:11,057 INFO attempt.RMAppAttemptImpl (RMAppAttemptImpl.java:handle(925)) - appattempt_1564332457320_0020_000001 State change from ALLOCATED_SAVING to ALLOCATED on event = ATTEMPT_NEW_SAVED
2019-07-28 22:39:11,060 INFO amlauncher.AMLauncher (AMLauncher.java:run(307)) - Launching masterappattempt_1564332457320_0020_000001
2019-07-28 22:39:11,068 INFO amlauncher.AMLauncher (AMLauncher.java:launch(109)) - Setting up container Container: [ContainerId: container_e20_1564332457320_0020_01_000001, AllocationRequestId: -1, Version: 0, NodeId: sandbox-hdp.hortonworks.com:45454, NodeHttpAddress: sandbox-hdp.hortonworks.com:8042, Resource: , Priority: 0, Token: Token { kind: ContainerToken, service: 172.18.0.3:45454 }, ExecutionType: GUARANTEED, ] for AM appattempt_1564332457320_0020_000001
2019-07-28 22:39:11,069 INFO security.AMRMTokenSecretManager (AMRMTokenSecretManager.java:createAndGetAMRMToken(195)) - Create AMRMToken for ApplicationAttempt: appattempt_1564332457320_0020_000001
2019-07-28 22:39:11,069 INFO security.AMRMTokenSecretManager (AMRMTokenSecretManager.java:createPassword(307)) - Creating password for appattempt_1564332457320_0020_000001
2019-07-28 22:39:11,265 INFO amlauncher.AMLauncher (AMLauncher.java:launch(130)) - Done launching container Container: [ContainerId: container_e20_1564332457320_0020_01_000001, AllocationRequestId: -1, Version: 0, NodeId: sandbox-hdp.hortonworks.com:45454, NodeHttpAddress: sandbox-hdp.hortonworks.com:8042, Resource: , Priority: 0, Token: Token { kind: ContainerToken, service: 172.18.0.3:45454 }, ExecutionType: GUARANTEED, ] for AM appattempt_1564332457320_0020_000001
2019-07-28 22:39:11,265 INFO attempt.RMAppAttemptImpl (RMAppAttemptImpl.java:handle(925)) - appattempt_1564332457320_0020_000001 State change from ALLOCATED to LAUNCHED on event = LAUNCHED
2019-07-28 22:39:11,852 INFO resourcemanager.ResourceTrackerService (ResourceTrackerService.java:updateAppCollectorsMap(713)) - Update collector information for application application_1564332457320_0020 with new address: sandbox-hdp.hortonworks.com:35197 timestamp: 1564332457320, 36
2019-07-28 22:39:11,854 INFO rmcontainer.RMContainerImpl (RMContainerImpl.java:handle(490)) - container_e20_1564332457320_0020_01_000001 Container Transitioned from ACQUIRED to RUNNING
2019-07-28 22:39:12,833 INFO provider.BaseAuditHandler (BaseAuditHandler.java:logStatus(312)) - Audit Status Log: name=yarn.async.batch.hdfs, interval=01:11.979 minutes, events=162, succcessCount=162, totalEvents=17347, totalSuccessCount=17347
2019-07-28 22:39:12,834 INFO destination.HDFSAuditDestination (HDFSAuditDestination.java:logJSON(179)) - Flushing HDFS audit. Event Size:1
2019-07-28 22:39:12,857 INFO resourcemanager.ResourceTrackerService (ResourceTrackerService.java:updateAppCollectorsMap(713)) - Update collector information for application application_1564332457320_0020 with new address: sandbox-hdp.hortonworks.com:35197 timestamp: 1564332457320, 37
2019-07-28 22:39:14,054 INFO rmcontainer.RMContainerImpl (RMContainerImpl.java:handle(490)) - container_e20_1564332457320_0020_01_000001 Container Transitioned from RUNNING to COMPLETED
2019-07-28 22:39:14,055 INFO attempt.RMAppAttemptImpl (RMAppAttemptImpl.java:rememberTargetTransitionsAndStoreState(1412)) - Updating application attempt appattempt_1564332457320_0020_000001 with final state: FAILED, and exit status: -1000
2019-07-28 22:39:14,055 INFO attempt.RMAppAttemptImpl (RMAppAttemptImpl.java:handle(925)) - appattempt_1564332457320_0020_000001 State change from LAUNCHED to FINAL_SAVING on event = CONTAINER_FINISHED
2019-07-28 22:39:14,066 INFO resourcemanager.ApplicationMasterService (ApplicationMasterService.java:unregisterAttempt(496)) - Unregistering app attempt : appattempt_1564332457320_0020_000001
2019-07-28 22:39:14,066 INFO security.AMRMTokenSecretManager (AMRMTokenSecretManager.java:applicationMasterFinished(124)) - Application finished, removing password for appattempt_1564332457320_0020_000001
2019-07-28 22:39:14,066 INFO attempt.RMAppAttemptImpl (RMAppAttemptImpl.java:handle(925)) - appattempt_1564332457320_0020_000001 State change from FINAL_SAVING to FAILED on event = ATTEMPT_UPDATE_SAVED
2019-07-28 22:39:14,067 INFO rmapp.RMAppImpl (RMAppImpl.java:transition(1538)) - The number of failed attempts is 1. The max attempts is 2
2019-07-28 22:39:14,067 INFO resourcemanager.ApplicationMasterService (ApplicationMasterService.java:registerAppAttempt(479)) - Registering app attempt : appattempt_1564332457320_0020_000002
2019-07-28 22:39:14,067 INFO attempt.RMAppAttemptImpl (RMAppAttemptImpl.java:handle(925)) - appattempt_1564332457320_0020_000002 State change from NEW to SUBMITTED on event = START
2019-07-28 22:39:14,067 INFO capacity.CapacityScheduler (CapacityScheduler.java:doneApplicationAttempt(1085)) - Application Attempt appattempt_1564332457320_0020_000001 is done. finalState=FAILED
2019-07-28 22:39:14,067 INFO scheduler.AppSchedulingInfo (AppSchedulingInfo.java:clearRequests(159)) - Application application_1564332457320_0020 requests cleared
2019-07-28 22:39:14,067 INFO capacity.LeafQueue (LeafQueue.java:removeApplicationAttempt(1003)) - Application removed - appId: application_1564332457320_0020 user: santosh queue: santosh #user-pending-applications: 0 #user-active-applications: 0 #queue-pending-applications: 0 #queue-active-applications: 0
2019-07-28 22:39:14,068 INFO capacity.LeafQueue (LeafQueue.java:activateApplications(911)) - Application application_1564332457320_0020 from user: santosh activated in queue: santosh
2019-07-28 22:39:14,068 INFO capacity.LeafQueue (LeafQueue.java:addApplicationAttempt(941)) - Application added - appId: application_1564332457320_0020 user: santosh, leaf-queue: santosh #user-pending-applications: 0 #user-active-applications: 1 #queue-pending-applications: 0 #queue-active-applications: 1
2019-07-28 22:39:14,068 INFO capacity.CapacityScheduler (CapacityScheduler.java:addApplicationAttempt(1036)) - Added Application Attempt appattempt_1564332457320_0020_000002 to scheduler from user santosh in queue santosh
2019-07-28 22:39:14,068 INFO attempt.RMAppAttemptImpl (RMAppAttemptImpl.java:handle(925)) - appattempt_1564332457320_0020_000002 State change from SUBMITTED to SCHEDULED on event = ATTEMPT_ADDED
2019-07-28 22:39:14,074 INFO allocator.AbstractContainerAllocator (AbstractContainerAllocator.java:getCSAssignmentFromAllocateResult(129)) - assignedContainer application attempt=appattempt_1564332457320_0020_000002 container=null queue=santosh clusterResource= type=OFF_SWITCH requestedPartition=
2019-07-28 22:39:14,074 INFO rmcontainer.RMContainerImpl (RMContainerImpl.java:handle(490)) - container_e20_1564332457320_0020_02_000001 Container Transitioned from NEW to ALLOCATED
2019-07-28 22:39:14,075 INFO fica.FiCaSchedulerNode (FiCaSchedulerNode.java:allocateContainer(169)) - Assigned container container_e20_1564332457320_0020_02_000001 of capacity on host sandbox-hdp.hortonworks.com:45454, which has 1 containers, used and available after allocation
2019-07-28 22:39:14,075 INFO security.NMTokenSecretManagerInRM (NMTokenSecretManagerInRM.java:createAndGetNMToken(200)) - Sending NMToken for nodeId : sandbox-hdp.hortonworks.com:45454 for container : container_e20_1564332457320_0020_02_000001
2019-07-28 22:39:14,076 INFO rmcontainer.RMContainerImpl (RMContainerImpl.java:handle(490)) - container_e20_1564332457320_0020_02_000001 Container Transitioned from ALLOCATED to ACQUIRED
2019-07-28 22:39:14,076 INFO security.NMTokenSecretManagerInRM (NMTokenSecretManagerInRM.java:clearNodeSetForAttempt(146)) - Clear node set for appattempt_1564332457320_0020_000002
2019-07-28 22:39:14,076 INFO capacity.ParentQueue (ParentQueue.java:apply(1332)) - assignedContainer queue=root usedCapacity=0.25 absoluteUsedCapacity=0.25 used= cluster=
2019-07-28 22:39:14,076 INFO capacity.CapacityScheduler (CapacityScheduler.java:tryCommit(2890)) - Allocation proposal accepted
2019-07-28 22:39:14,076 INFO attempt.RMAppAttemptImpl (RMAppAttemptImpl.java:storeAttempt(2213)) - Storing attempt: AppId: application_1564332457320_0020 AttemptId: appattempt_1564332457320_0020_000002 MasterContainer: Container: [ContainerId: container_e20_1564332457320_0020_02_000001, AllocationRequestId: -1, Version: 0, NodeId: sandbox-hdp.hortonworks.com:45454, NodeHttpAddress: sandbox-hdp.hortonworks.com:8042, Resource: , Priority: 0, Token: Token { kind: ContainerToken, service: 172.18.0.3:45454 }, ExecutionType: GUARANTEED, ]
2019-07-28 22:39:14,077 INFO attempt.RMAppAttemptImpl (RMAppAttemptImpl.java:handle(925)) - appattempt_1564332457320_0020_000002 State change from SCHEDULED to ALLOCATED_SAVING on event = CONTAINER_ALLOCATED
2019-07-28 22:39:14,088 INFO attempt.RMAppAttemptImpl (RMAppAttemptImpl.java:handle(925)) - appattempt_1564332457320_0020_000002 State change from ALLOCATED_SAVING to ALLOCATED on event = ATTEMPT_NEW_SAVED
2019-07-28 22:39:14,089 INFO amlauncher.AMLauncher (AMLauncher.java:run(307)) - Launching masterappattempt_1564332457320_0020_000002
2019-07-28 22:39:14,091 INFO amlauncher.AMLauncher (AMLauncher.java:launch(109)) - Setting up container Container: [ContainerId: container_e20_1564332457320_0020_02_000001, AllocationRequestId: -1, Version: 0, NodeId: sandbox-hdp.hortonworks.com:45454, NodeHttpAddress: sandbox-hdp.hortonworks.com:8042, Resource: , Priority: 0, Token: Token { kind: ContainerToken, service: 172.18.0.3:45454 }, ExecutionType: GUARANTEED, ] for AM appattempt_1564332457320_0020_000002
2019-07-28 22:39:14,092 INFO security.AMRMTokenSecretManager (AMRMTokenSecretManager.java:createAndGetAMRMToken(195)) - Create AMRMToken for ApplicationAttempt: appattempt_1564332457320_0020_000002
2019-07-28 22:39:14,092 INFO security.AMRMTokenSecretManager (AMRMTokenSecretManager.java:createPassword(307)) - Creating password for appattempt_1564332457320_0020_000002
2019-07-28 22:39:14,110 INFO amlauncher.AMLauncher (AMLauncher.java:launch(130)) - Done launching container Container: [ContainerId: container_e20_1564332457320_0020_02_000001, AllocationRequestId: -1, Version: 0, NodeId: sandbox-hdp.hortonworks.com:45454, NodeHttpAddress: sandbox-hdp.hortonworks.com:8042, Resource: , Priority: 0, Token: Token { kind: ContainerToken, service: 172.18.0.3:45454 }, ExecutionType: GUARANTEED, ] for AM appattempt_1564332457320_0020_000002
2019-07-28 22:39:14,110 INFO attempt.RMAppAttemptImpl (RMAppAttemptImpl.java:handle(925)) - appattempt_1564332457320_0020_000002 State change from ALLOCATED to LAUNCHED on event = LAUNCHED
2019-07-28 22:39:15,056 INFO rmcontainer.RMContainerImpl (RMContainerImpl.java:handle(490)) - container_e20_1564332457320_0020_02_000001 Container Transitioned from ACQUIRED to RUNNING
2019-07-28 22:39:16,752 INFO rmcontainer.RMContainerImpl (RMContainerImpl.java:handle(490)) - container_e20_1564332457320_0020_02_000001 Container Transitioned from RUNNING to COMPLETED
2019-07-28 22:39:16,755 INFO attempt.RMAppAttemptImpl (RMAppAttemptImpl.java:rememberTargetTransitionsAndStoreState(1412)) - Updating application attempt appattempt_1564332457320_0020_000002 with final state: FAILED, and exit status: -1000
2019-07-28 22:39:16,755 INFO attempt.RMAppAttemptImpl (RMAppAttemptImpl.java:handle(925)) - appattempt_1564332457320_0020_000002 State change from LAUNCHED to FINAL_SAVING on event = CONTAINER_FINISHED
2019-07-28 22:39:16,899 INFO resourcemanager.ApplicationMasterService (ApplicationMasterService.java:unregisterAttempt(496)) - Unregistering app attempt : appattempt_1564332457320_0020_000002
2019-07-28 22:39:16,900 INFO security.AMRMTokenSecretManager (AMRMTokenSecretManager.java:applicationMasterFinished(124)) - Application finished, removing password for appattempt_1564332457320_0020_000002
2019-07-28 22:39:16,900 INFO attempt.RMAppAttemptImpl (RMAppAttemptImpl.java:handle(925)) - appattempt_1564332457320_0020_000002 State change from FINAL_SAVING to FAILED on event = ATTEMPT_UPDATE_SAVED
2019-07-28 22:39:16,900 INFO rmapp.RMAppImpl (RMAppImpl.java:transition(1538)) - The number of failed attempts is 2. The max attempts is 2
2019-07-28 22:39:16,900 INFO rmapp.RMAppImpl (RMAppImpl.java:rememberTargetTransitionsAndStoreState(1278)) - Updating application application_1564332457320_0020 with final state: FAILED
2019-07-28 22:39:16,900 INFO rmapp.RMAppImpl (RMAppImpl.java:handle(912)) - application_1564332457320_0020 State change from ACCEPTED to FINAL_SAVING on event = ATTEMPT_FAILED
2019-07-28 22:39:16,900 INFO recovery.RMStateStore (RMStateStore.java:transition(260)) - Updating info for app: application_1564332457320_0020
2019-07-28 22:39:16,900 INFO capacity.CapacityScheduler (CapacityScheduler.java:doneApplicationAttempt(1085)) - Application Attempt appattempt_1564332457320_0020_000002 is done. finalState=FAILED
2019-07-28 22:39:16,901 INFO scheduler.AppSchedulingInfo (AppSchedulingInfo.java:clearRequests(159)) - Application application_1564332457320_0020 requests cleared
2019-07-28 22:39:16,901 INFO capacity.LeafQueue (LeafQueue.java:removeApplicationAttempt(1003)) - Application removed - appId: application_1564332457320_0020 user: santosh queue: santosh #user-pending-applications: 0 #user-active-applications: 0 #queue-pending-applications: 0 #queue-active-applications: 0
2019-07-28 22:39:16,916 INFO rmapp.RMAppImpl (RMAppImpl.java:transition(1197)) - Application application_1564332457320_0020 failed 2 times due to AM Container for appattempt_1564332457320_0020_000002 exited with exitCode: -1000
Failing this attempt.Diagnostics: (Client.java:1558)
at org.apache.hadoop.ipc.Client.call(Client.java:1389)
... 37 more
Caused by: org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS]
at org.apache.hadoop.security.SaslRpcClient.selectSaslClient(SaslRpcClient.java:173)
at org.apache.hadoop.security.SaslRpcClient.saslConnect(SaslRpcClient.java:390)
at org.apache.hadoop.ipc.Client$Connection.setupSaslConnection(Client.java:614)
at org.apache.hadoop.ipc.Client$Connection.access$2300(Client.java:410)
at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:800)
at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:796)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:796)
... 40 more
Caused by: org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS]

Why does a UnicastProcessor plus ConnectableFlux send previously emitted items downstream on autoConnect but not on connect()

I have this upstream Publisher that emits a number every second:
private fun counter(emissionIntervalMillis: Long) =
    Flux.interval(Duration.ofMillis(emissionIntervalMillis))
        .map { it }.log()
Consider this implementation in which a UnicastProcessor subscribes to the previous Flux. In addition there is a ConnectableFlux generated with processor.publish().autoConnect(). Finally I subscribe to this ConnectableFlux:
val latch = CountDownLatch(15)
val numberGenerator: Flux<Long> = counter(1000)
val processor = UnicastProcessor.create<Long>()
numberGenerator.subscribeWith(processor)
val connectableFlux = processor.doOnSubscribe { println("subscribed!") }.publish().autoConnect()
Thread.sleep(5000)
connectableFlux.subscribe {
    logger.info("Element [{}]", it)
    latch.countDown()
}
latch.await()
Logs:
15:58:26.941 [main] DEBUG reactor.util.Loggers$LoggerFactory - Using Slf4j logging framework
15:58:26.967 [main] INFO reactor.Flux.Map.1 - onSubscribe(FluxMap.MapSubscriber)
15:58:26.969 [main] INFO reactor.Flux.Map.1 - request(unbounded)
15:58:27.973 [parallel-1] INFO reactor.Flux.Map.1 - onNext(0)
15:58:28.973 [parallel-1] INFO reactor.Flux.Map.1 - onNext(1)
15:58:29.975 [parallel-1] INFO reactor.Flux.Map.1 - onNext(2)
15:58:30.974 [parallel-1] INFO reactor.Flux.Map.1 - onNext(3)
15:58:31.974 [parallel-1] INFO reactor.Flux.Map.1 - onNext(4)
subscribed!
15:58:31.979 [main] INFO com.codependent.processors.Tests - Element [0]
15:58:31.980 [main] INFO com.codependent.processors.Tests - Element [1]
15:58:31.980 [main] INFO com.codependent.processors.Tests - Element [2]
15:58:31.980 [main] INFO com.codependent.processors.Tests - Element [3]
15:58:31.980 [main] INFO com.codependent.processors.Tests - Element [4]
15:58:32.972 [parallel-1] INFO reactor.Flux.Map.1 - onNext(5)
15:58:32.972 [parallel-1] INFO com.codependent.processors.Tests - Element [5]
As you see, when there is a subscriber to the connectableFlux, it gets the previously generated items which were cached by the UnicastProcessor. I guess this is the expected behaviour:
if you push any amount of data through it while its Subscriber has not
yet requested data, it will buffer all of the data.
Now, instead of using autoConnect I use connect():
val latch = CountDownLatch(15)
val numberGenerator: Flux<Long> = counter(1000)
val processor = UnicastProcessor.create<Long>()
numberGenerator.subscribeWith(processor)
val connectableFlux = processor.doOnSubscribe { println("subscribed!") }.publish()
connectableFlux.connect()
Thread.sleep(5000)
connectableFlux.subscribe {
    logger.info("Element [{}]", it)
    latch.countDown()
}
The result is now quite different: the subscriber doesn't get the items that should have been cached by the UnicastProcessor. Can someone explain the difference?
16:08:44.299 [main] DEBUG reactor.util.Loggers$LoggerFactory - Using Slf4j logging framework
16:08:44.324 [main] INFO reactor.Flux.Map.1 - onSubscribe(FluxMap.MapSubscriber)
16:08:44.326 [main] INFO reactor.Flux.Map.1 - request(unbounded)
subscribed!
16:08:45.330 [parallel-1] INFO reactor.Flux.Map.1 - onNext(0)
16:08:46.329 [parallel-1] INFO reactor.Flux.Map.1 - onNext(1)
16:08:47.329 [parallel-1] INFO reactor.Flux.Map.1 - onNext(2)
16:08:48.331 [parallel-1] INFO reactor.Flux.Map.1 - onNext(3)
16:08:49.330 [parallel-1] INFO reactor.Flux.Map.1 - onNext(4)
16:08:50.328 [parallel-1] INFO reactor.Flux.Map.1 - onNext(5)
16:08:50.328 [parallel-1] INFO com.codependent.processors.Tests - Element [5]
16:08:51.332 [parallel-1] INFO reactor.Flux.Map.1 - onNext(6)
16:08:51.332 [parallel-1] INFO com.codependent.processors.Tests - Element [6]
After rereading the docs I found that autoConnect() can take the minimum number of subscribers required before it connects to the upstream. Changing it to autoConnect(0) has the same effect as connect(): the previous items are not passed to the subscriber:
val latch = CountDownLatch(15)
val numberGenerator: Flux<Long> = counter(1000)
val processor = UnicastProcessor.create<Long>()
numberGenerator.subscribeWith(processor)
val connectableFlux = processor.doOnSubscribe { println("subscribed!") }.log().publish().autoConnect(0)
Thread.sleep(5000)
connectableFlux.subscribe {
    logger.info("Element [{}]", it)
    latch.countDown()
}
latch.await()
It seems that since the connectableFlux is ready (connected), the processor gets the onSubscribe signal, and as there aren't any actual subscribers to the connectableFlux yet, it discards the items.
Changing publish() to replay() makes the subscriber get the items from the beginning, as stated in the docs:
val connectableFlux = processor.doOnSubscribe { println("subscribed!") }.log().replay().autoConnect(0)

Flume :Exec source cat command is not writing on HDFS

I'm trying to write data into HDFS using Flume NG with an exec source, but it always ends with exit code 127. It also shows a warning like:
Unable to get maxDirectMemory from VM: NoSuchMethodException: sun.misc.VM.maxDirectMemory(null).
This is the exec.conf file:
execAgent.sources=e
execAgent.channels=memchannel
execAgent.sinks=HDFS
execAgent.sources.e.type=org.apache.flume.source.ExecSource
execAgent.sources.e.channels=memchannel
execAgent.sources.e.shell=/bin/bash
execAgent.sources.e.command=tail -f /home/sample.txt
execAgent.sinks.HDFS.type=hdfs
execAgent.sinks.HDFS.channel=memchannel
execAgent.sinks.HDFS.hdfs.path=hdfs://ip:address:port/user/flume/
execAgent.sinks.HDFS.hdfs.fileType=DataStream
execAgent.sinks.HDFS.hdfs.writeFormat=Text
execAgent.channels.memchannel.type=file
execAgent.channels.memchannel.capacity=1000
execAgent.channels.memchannel.transactionCapacity=100
execAgent.sources.e.channels=memchannel
execAgent.sinks.HDFS.channel=memchannel
This is the output I'm getting on the console:
15/04/17 06:24:54 INFO node.PollingPropertiesFileConfigurationProvider: Configuration provider starting
15/04/17 06:24:54 INFO node.PollingPropertiesFileConfigurationProvider: Reloading configuration file:exec.conf
15/04/17 06:24:54 INFO conf.FlumeConfiguration: Processing:HDFS
15/04/17 06:24:54 INFO conf.FlumeConfiguration: Processing:HDFS
15/04/17 06:24:54 INFO conf.FlumeConfiguration: Processing:HDFS
15/04/17 06:24:54 INFO conf.FlumeConfiguration: Added sinks: HDFS Agent: execAgent
15/04/17 06:24:54 INFO conf.FlumeConfiguration: Processing:HDFS
15/04/17 06:24:54 INFO conf.FlumeConfiguration: Processing:HDFS
15/04/17 06:24:55 INFO conf.FlumeConfiguration: Post-validation flume configuration contains configuration for agents: [execAgent]
15/04/17 06:24:55 INFO node.AbstractConfigurationProvider: Creating channels
15/04/17 06:24:55 INFO channel.DefaultChannelFactory: Creating instance of channel memchannel type file
15/04/17 06:24:55 INFO node.AbstractConfigurationProvider: Created channel memchannel
15/04/17 06:24:55 INFO source.DefaultSourceFactory: Creating instance of source e, type org.apache.flume.source.ExecSource
15/04/17 06:24:55 INFO sink.DefaultSinkFactory: Creating instance of sink: HDFS, type: hdfs
15/04/17 06:24:56 INFO hdfs.HDFSEventSink: Hadoop Security enabled: false
15/04/17 06:24:56 INFO node.AbstractConfigurationProvider: Channel memchannel connected to [e, HDFS]
15/04/17 06:24:56 INFO node.Application: Starting new configuration:{ sourceRunners:{e=EventDrivenSourceRunner: { source:org.apache.flume.source.ExecSource{name:e,state:IDLE} }} sinkRunners:{HDFS=SinkRunner: { policy:org.apache.flume.sink.DefaultSinkProcessor#2577d2c2 counterGroup:{ name:null counters:{} } }} channels:{memchannel=FileChannel memchannel { dataDirs: [/root/.flume/file-channel/data] }} }
15/04/17 06:24:56 INFO node.Application: Starting Channel memchannel
15/04/17 06:24:56 INFO file.FileChannel: Starting FileChannel memchannel { dataDirs: [/root/.flume/file-channel/data] }...
15/04/17 06:24:56 INFO file.Log: Encryption is not enabled
15/04/17 06:24:56 INFO file.Log: Replay started
15/04/17 06:24:56 INFO file.Log: Found NextFileID 0, from []
15/04/17 06:24:56 INFO file.EventQueueBackingStoreFile: Preallocated /root/.flume/file-channel/checkpoint/checkpoint_1429251896225 to 16232 for capacity 1000
15/04/17 06:24:56 INFO file.EventQueueBackingStoreFileV3: Starting up with /root/.flume/file-channel/checkpoint/checkpoint_1429251896225 and /root/.flume/file-channel/checkpoint/checkpoint_1429251896225.meta
15/04/17 06:24:57 INFO file.Log: Last Checkpoint Fri Apr 17 06:24:56 UTC 2015, queue depth = 0
15/04/17 06:24:57 INFO file.Log: Replaying logs with v2 replay logic
15/04/17 06:24:57 INFO file.ReplayHandler: Starting replay of []
15/04/17 06:24:57 INFO file.ReplayHandler: read: 0, put: 0, take: 0, rollback: 0, commit: 0, skip: 0, eventCount:0
15/04/17 06:24:57 INFO file.Log: Rolling /root/.flume/file-channel/data
15/04/17 06:24:57 INFO file.Log: Roll start /root/.flume/file-channel/data
15/04/17 06:24:57 INFO tools.DirectMemoryUtils: Unable to get maxDirectMemory from VM: NoSuchMethodException: sun.misc.VM.maxDirectMemory(null)
15/04/17 06:24:57 INFO tools.DirectMemoryUtils: Direct Memory Allocation: Allocation = 1048576, Allocated = 0, MaxDirectMemorySize = 18874368, Remaining = 18874368
15/04/17 06:24:57 INFO file.LogFile: Opened /root/.flume/file-channel/data/log-1
15/04/17 06:24:57 INFO file.Log: Roll end
15/04/17 06:24:57 INFO file.EventQueueBackingStoreFile: Start checkpoint for /root/.flume/file-channel/checkpoint/checkpoint_1429251896225, elements to sync = 0
15/04/17 06:24:57 INFO file.EventQueueBackingStoreFile: Updating checkpoint metadata: logWriteOrderID: 1429251897136, queueSize: 0, queueHead: 0
15/04/17 06:24:57 INFO file.Log: Updated checkpoint for file: /root/.flume/file-channel/data/log-1 position: 0 logWriteOrderID: 1429251897136
15/04/17 06:24:57 INFO file.FileChannel: Queue Size after replay: 0 [channel=memchannel]
15/04/17 06:24:57 INFO instrumentation.MonitoredCounterGroup: Monitoried counter group for type: CHANNEL, name: memchannel, registered successfully.
15/04/17 06:24:57 INFO instrumentation.MonitoredCounterGroup: Component type: CHANNEL, name: memchannel started
15/04/17 06:24:57 INFO node.Application: Starting Sink HDFS
15/04/17 06:24:57 INFO node.Application: Starting Source e
15/04/17 06:24:57 INFO source.ExecSource: Exec source starting with command:tail -f /home/sample.txt
15/04/17 06:24:57 INFO instrumentation.MonitoredCounterGroup: Monitoried counter group for type: SINK, name: HDFS, registered successfully.
15/04/17 06:24:57 INFO instrumentation.MonitoredCounterGroup: Component type: SINK, name: HDFS started
15/04/17 06:24:57 INFO instrumentation.MonitoredCounterGroup: Monitoried counter group for type: SOURCE, name: e, registered successfully.
15/04/17 06:24:57 INFO instrumentation.MonitoredCounterGroup: Component type: SOURCE, name: e started
15/04/17 06:24:57 INFO source.ExecSource: Command [tail -f /home/brillio/sample.txt] exited with 127
From the source documentation:
1) Change the parameter execAgent.sources.e.type to exec
2) Remove the execAgent.sources.e.shell parameter from your configuration
Also check permissions to make sure the user running Flume can execute tail -f /home/brillio/sample.txt on the target file; a corrected configuration is sketched below.
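A sketch of the configuration with those two changes applied (property names follow the standard Flume exec source and HDFS sink documentation; the HDFS path placeholder is kept exactly as in the question):

execAgent.sources = e
execAgent.channels = memchannel
execAgent.sinks = HDFS

# "exec" is the built-in alias for org.apache.flume.source.ExecSource
execAgent.sources.e.type = exec
execAgent.sources.e.command = tail -f /home/sample.txt
execAgent.sources.e.channels = memchannel

execAgent.sinks.HDFS.type = hdfs
execAgent.sinks.HDFS.channel = memchannel
execAgent.sinks.HDFS.hdfs.path = hdfs://ip:address:port/user/flume/
execAgent.sinks.HDFS.hdfs.fileType = DataStream
execAgent.sinks.HDFS.hdfs.writeFormat = Text

execAgent.channels.memchannel.type = file
execAgent.channels.memchannel.capacity = 1000
execAgent.channels.memchannel.transactionCapacity = 100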

TestDFSIO fails with exit code -1000

I set up a two-node Hadoop cluster. After starting the cluster it looks like this:
On the machine namenode:
hadoop#namenode:~$ jps
5691 Jps
3531 DataNode
3424 NameNode
3669 SecondaryNameNode
3822 ResourceManager
3908 NodeManager
On the second machine, datanode:
hadoop#datanode:~$ jps
3716 Jps
2137 DataNode
2231 NodeManager
So, after starting the cluster I tried to run a standard benchmark:
hadoop jar /opt/hadoop-2.2.0/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.2.0-tests.jar TestDFSIO -write -nrFiles 20 -fileSize 10
However, the job fails and the log files contain the following messages:
On the datanode:
hadoop#datanode:~$ cat /opt/hadoop-2.2.0/logs/yarn-hadoop-nodemanager-datanode.log
...
2014-02-18 16:37:41,567 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Memory usage of ProcessTree 3547 for container-id container_1392741263071_0001_02_000001: 26.2 MB of 2 GB physical memory used; 1.2 GB of 4.2 GB virtual memory used
2014-02-18 16:37:42,158 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Sending out status for container: container_id { app_attempt_id { application_id { id: 1 cluster_timestamp: 1392741263071 } attemptId: 2 } id: 1 } state: C_RUNNING diagnostics: "" exit_status: -1000
2014-02-18 16:37:43,166 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Sending out status for container: container_id { app_attempt_id { application_id { id: 1 cluster_timestamp: 1392741263071 } attemptId: 2 } id: 1 } state: C_RUNNING diagnostics: "" exit_status: -1000
2014-02-18 16:37:44,171 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Sending out status for container: container_id { app_attempt_id { application_id { id: 1 cluster_timestamp: 1392741263071 } attemptId: 2 } id: 1 } state: C_RUNNING diagnostics: "" exit_status: -1000
2014-02-18 16:37:44,579 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Memory usage of ProcessTree 3547 for container-id container_1392741263071_0001_02_000001: 95.3 MB of 2 GB physical memory used; 1.3 GB of 4.2 GB virtual memory used
2014-02-18 16:37:45,180 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Sending out status for container: container_id { app_attempt_id { application_id { id: 1 cluster_timestamp: 1392741263071 } attemptId: 2 } id: 1 } state: C_RUNNING diagnostics: "" exit_status: -1000
2014-02-18 16:37:46,183 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Sending out status for container: container_id { app_attempt_id { application_id { id: 1 cluster_timestamp: 1392741263071 } attemptId: 2 } id: 1 } state: C_RUNNING diagnostics: "" exit_status: -1000
2014-02-18 16:37:47,189 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Sending out status for container: container_id { app_attempt_id { application_id { id: 1 cluster_timestamp: 1392741263071 } attemptId: 2 } id: 1 } state: C_RUNNING diagnostics: "" exit_status: -1000
2014-02-18 16:37:47,584 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Memory usage of ProcessTree 3547 for container-id container_1392741263071_0001_02_000001: 108.1 MB of 2 GB physical memory used; 1.3 GB of 4.2 GB virtual memory used
2014-02-18 16:37:48,196 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Sending out status for container: container_id { app_attempt_id { application_id { id: 1 cluster_timestamp: 1392741263071 } attemptId: 2 } id: 1 } state: C_RUNNING diagnostics: "" exit_status: -1000
2014-02-18 16:37:49,157 WARN org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Exit code from container container_1392741263071_0001_02_000001 is : 1
2014-02-18 16:37:49,157 WARN org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Exception from container-launch with container ID: container_1392741263071_0001_02_000001 and exit code: 1
org.apache.hadoop.util.Shell$ExitCodeException:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:464)
at org.apache.hadoop.util.Shell.run(Shell.java:379)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:589)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:283)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:79)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
2014-02-18 16:37:49,159 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor:
2014-02-18 16:37:49,159 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch: Container exited with a non-zero exit code 1
2014-02-18 16:37:49,160 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1392741263071_0001_02_000001 transitioned from RUNNING to EXITED_WITH_FAILURE
2014-02-18 16:37:49,160 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch: Cleaning up container container_1392741263071_0001_02_000001
2014-02-18 16:37:49,172 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Deleting absolute path : /home/hadoop/hadoop/yarn-data/usercache/hadoop/appcache/application_1392741263071_0001/container_1392741263071_0001_02_000001
2014-02-18 16:37:49,173 WARN org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=hadoop OPERATION=Container Finished - Failed TARGET=ContainerImpl RESULT=FAILURE DESCRIPTION=Container failed with state: EXITED_WITH_FAILURE APPID=application_1392741263071_0001 CONTAINERID=container_1392741263071_0001_02_000001
...
On the namenode:
hadoop#namenode:/opt/hadoop-2.2.0/logs$ cat yarn-hadoop-*.log
2014-02-18 16:34:25,054 INFO org.apache.hadoop.yarn.server.nodemanager.NodeManager: STARTUP_MSG:
...
2014-02-18 16:37:37,441 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Memory usage of ProcessTree 4493 for container-id container_1392741263071_0001_01_000001: 131.1 MB of 2 GB physical memory used; 1.4 GB of 4.2 GB virtual memory used
2014-02-18 16:37:38,367 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Sending out status for container: container_id { app_attempt_id { application_id { id: 1 cluster_timestamp: 1392741263071 } attemptId: 1 } id: 1 } state: C_RUNNING diagnostics: "" exit_status: -1000
2014-02-18 16:37:39,369 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Sending out status for container: container_id { app_attempt_id { application_id { id: 1 cluster_timestamp: 1392741263071 } attemptId: 1 } id: 1 } state: C_RUNNING diagnostics: "" exit_status: -1000
...
2014-02-18 16:34:23,131 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: STARTUP_MSG:
...
2014-02-18 16:37:49,186 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerNode: Released container container_1392741263071_0001_02_000001 of capacity <memory:2048, vCores:1> on host datanode.c.forward-camera-473.internal:43994, which currently has 0 containers, <memory:0, vCores:0> used and <memory:8192, vCores:8> available, release resources=true
2014-02-18 16:37:49,186 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: default used=<memory:0, vCores:0> numContainers=0 user=hadoop user-resources=<memory:0, vCores:0>
2014-02-18 16:37:49,186 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: completedContainer container=Container: [ContainerId: container_1392741263071_0001_02_000001, NodeId: datanode.c.forward-camera-473.internal:43994, NodeHttpAddress: datanode.c.forward-camera-473.internal:8042, Resource: <memory:2048, vCores:1>, Priority: 0, Token: Token { kind: ContainerToken, service: 10.240.110.76:43994 }, ] resource=<memory:2048, vCores:1> queue=default: capacity=1.0, absoluteCapacity=1.0, usedResources=<memory:0, vCores:0>usedCapacity=0.0, absoluteUsedCapacity=0.0, numApps=1, numContainers=0 usedCapacity=0.0 absoluteUsedCapacity=0.0 used=<memory:0, vCores:0> cluster=<memory:16384, vCores:16>
2014-02-18 16:37:49,186 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: completedContainer queue=root usedCapacity=0.0 absoluteUsedCapacity=0.0 used=<memory:0, vCores:0> cluster=<memory:16384, vCores:16>
2014-02-18 16:37:49,186 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: Re-sorting completed queue: root.default stats: default: capacity=1.0, absoluteCapacity=1.0, usedResources=<memory:0, vCores:0>usedCapacity=0.0, absoluteUsedCapacity=0.0, numApps=1, numContainers=0
2014-02-18 16:37:49,186 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: Application appattempt_1392741263071_0001_000002 released container container_1392741263071_0001_02_000001 on node: host: datanode.c.forward-camera-473.internal:43994 #containers=0 available=8192 used=0 with event: FINISHED
2014-02-18 16:37:49,187 INFO org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService: Unregistering app attempt : appattempt_1392741263071_0001_000002
2014-02-18 16:37:49,187 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1392741263071_0001_000002 State change from RUNNING to FAILED
2014-02-18 16:37:49,187 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: Application application_1392741263071_0001 failed 2 times due to AM Container for appattempt_1392741263071_0001_000002 exited with exitCode: 1 due to: Exception from container-launch:
org.apache.hadoop.util.Shell$ExitCodeException:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:464)
at org.apache.hadoop.util.Shell.run(Shell.java:379)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:589)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:283)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:79)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
.Failing this attempt.. Failing the application.
2014-02-18 16:37:49,189 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Removing info for app: application_1392741263071_0001
2014-02-18 16:37:49,194 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: application_1392741263071_0001 State change from RUNNING to FAILED
2014-02-18 16:37:49,194 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: Application appattempt_1392741263071_0001_000002 is done. finalState=FAILED
2014-02-18 16:37:49,194 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo: Application application_1392741263071_0001 requests cleared
2014-02-18 16:37:49,194 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: Application removed - appId: application_1392741263071_0001 user: hadoop queue: default #user-pending-applications: 0 #user-active-applications: 0 #queue-pending-applications: 0 #queue-active-applications: 0
2014-02-18 16:37:49,194 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: Application removed - appId: application_1392741263071_0001 user: hadoop leaf-queue of parent: root #applications: 0
2014-02-18 16:37:49,204 WARN org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=hadoop OPERATION=Application Finished - Failed TARGET=RMAppManager RESULT=FAILURE DESCRIPTION=App failed with state: FAILED PERMISSIONS=Application application_1392741263071_0001 failed 2 times due to AM Container for appattempt_1392741263071_0001_000002 exited with exitCode: 1 due to: Exception from container-launch:
org.apache.hadoop.util.Shell$ExitCodeException:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:464)
at org.apache.hadoop.util.Shell.run(Shell.java:379)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:589)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:283)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:79)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
.Failing this attempt.. Failing the application. APPID=application_1392741263071_0001
2014-02-18 16:37:49,205 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAppManager$ApplicationSummary: appId=application_1392741263071_0001,name=hadoop-mapreduce-client-jobclient-2.2.0-tests.jar,user=hadoop,queue=default,state=FAILED,trackingUrl=namenode:8088/cluster/app/application_1392741263071_0001,appMasterHost=,startTime=1392741381131,finishTime=1392741469188,finalStatus=FAILED
2014-02-18 16:37:49,205 INFO org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Cleaning master appattempt_1392741263071_0001_000002
What is happening?
It looks like it can't spawn a new Java process. Your .profile or .bashrc probably doesn't set JAVA_HOME or PATH correctly, so the java executable is not accessible to the containers; see the sketch below.
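A quick way to check and fix this on each node, assuming a bash login shell and Hadoop installed under /opt/hadoop-2.2.0 as shown in your logs (the JDK path below is only an example; substitute the path of your own installation):

# Check whether the java binary is visible to the hadoop user
which java
java -version

# If not, add JAVA_HOME and PATH to ~/.bashrc (example JDK path, adjust to your install)
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
export PATH=$JAVA_HOME/bin:$PATH

# The YARN daemons also read JAVA_HOME from hadoop-env.sh, so set it there as well
echo 'export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64' >> /opt/hadoop-2.2.0/etc/hadoop/hadoop-env.sh

# Restart YARN so the NodeManagers pick up the change
/opt/hadoop-2.2.0/sbin/stop-yarn.sh && /opt/hadoop-2.2.0/sbin/start-yarn.sh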

Resources