Kerberos problem: GSSException: No valid credentials provided - spring-boot

My application is sending data to Kafka, Kerberos is used for authentication. Everything works fine for around 20 days, then I get the following exception:
2020-01-07 22:22:08.481 DEBUG 24987 --- [fka-producer-network-thread | producer-1] org.apache.kafka.clients.NetworkClient : Initiating connection to node mkav2.dc.ex.com:9092 (id: 101 rack: null)
2020-01-07 22:22:08.481 DEBUG 24987 --- [fka-producer-network-thread | producer-1] org.apache.kafka.common.security.authenticator.SaslClientAuthenticator : Set SASL client state to SEND_HANDSHAKE_REQUEST
2020-01-07 22:22:08.481 DEBUG 24987 --- [fka-producer-network-thread | producer-1] org.apache.kafka.common.security.authenticator.SaslClientAuthenticator : Creating SaslClient: client=lpa/appX.dc.ex.com@DC.EX.COM;service=kafka;serviceHostname=mkav2.dc.ex.com;mechs=[GSSAPI]
2020-01-07 22:22:08.482 DEBUG 24987 --- [fka-producer-network-thread | producer-1] org.apache.kafka.common.network.Selector : Created socket with SO_RCVBUF = 32768, SO_SNDBUF = 131072, SO_TIMEOUT = 0 to node 101
2020-01-07 22:22:08.482 DEBUG 24987 --- [fka-producer-network-thread | producer-1] org.apache.kafka.common.security.authenticator.SaslClientAuthenticator : Set SASL client state to RECEIVE_HANDSHAKE_RESPONSE
2020-01-07 22:22:08.482 DEBUG 24987 --- [fka-producer-network-thread | producer-1] org.apache.kafka.clients.NetworkClient : Completed connection to node 101. Fetching API versions.
2020-01-07 22:22:08.484 DEBUG 24987 --- [fka-producer-network-thread | producer-1] org.apache.kafka.common.security.authenticator.SaslClientAuthenticator : Set SASL client state to INITIAL
2020-01-07 22:22:08.484 DEBUG 24987 --- [fka-producer-network-thread | producer-1] org.apache.kafka.common.network.Selector : Connection with mkav2.dc.ex.com/172.10.15.44 disconnected
javax.security.sasl.SaslException: An error: (java.security.PrivilegedActionException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]) occurred when evaluating SASL token received from the Kafka Broker. Kafka Client will go to AUTH_FAILED state.
at org.apache.kafka.common.security.authenticator.SaslClientAuthenticator.createSaslToken(SaslClientAuthenticator.java:298)
at org.apache.kafka.common.security.authenticator.SaslClientAuthenticator.sendSaslToken(SaslClientAuthenticator.java:215)
at org.apache.kafka.common.security.authenticator.SaslClientAuthenticator.authenticate(SaslClientAuthenticator.java:183)
at org.apache.kafka.common.network.KafkaChannel.prepare(KafkaChannel.java:76)
at org.apache.kafka.common.network.Selector.pollSelectionKeys(Selector.java:376)
at org.apache.kafka.common.network.Selector.poll(Selector.java:326)
at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:433)
at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:224)
at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:162)
at java.lang.Thread.run(Thread.java:748)
Caused by: javax.security.sasl.SaslException: GSS initiate failed
at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:211)
at org.apache.kafka.common.security.authenticator.SaslClientAuthenticator$2.run(SaslClientAuthenticator.java:280)
at org.apache.kafka.common.security.authenticator.SaslClientAuthenticator$2.run(SaslClientAuthenticator.java:278)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.kafka.common.security.authenticator.SaslClientAuthenticator.createSaslToken(SaslClientAuthenticator.java:278)
... 9 common frames omitted
Caused by: org.ietf.jgss.GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)
at sun.security.jgss.krb5.Krb5InitCredential.getInstance(Krb5InitCredential.java:147)
at sun.security.jgss.krb5.Krb5MechFactory.getCredentialElement(Krb5MechFactory.java:122)
at sun.security.jgss.krb5.Krb5MechFactory.getMechanismContext(Krb5MechFactory.java:187)
at sun.security.jgss.GSSManagerImpl.getMechanismContext(GSSManagerImpl.java:224)
at sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:212)
at sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:179)
at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:192)
... 14 common frames omitted
2020-01-07 22:22:08.484 DEBUG 24987 --- [fka-producer-network-thread | producer-1] org.apache.kafka.clients.NetworkClient : Node 101 disconnected.
2020-01-07 22:22:08.484 WARN 24987 --- [fka-producer-network-thread | producer-1] org.apache.kafka.clients.NetworkClient : Connection to node 101 terminated during authentication. This may indicate that authentication failed due to invalid credentials.
After restarting the application, everything works fine for another 20 days or so, and then I get the same exception again. These are the ticket properties in the krb5.conf file:
ticket_lifetime = 86400    # 24 hours
renew_lifetime = 604800    # 7 days
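For context, the producer's GSSAPI login is driven by a JAAS configuration along these lines (a sketch only; the keytab path is an assumption, the principal is taken from the log above):
KafkaClient {
    com.sun.security.auth.module.Krb5LoginModule required
    useKeyTab=true
    storeKey=true
    keyTab="/etc/security/keytabs/lpa.keytab"   // hypothetical path
    principal="lpa/appX.dc.ex.com@DC.EX.COM";
};
With keytab-based login the client can re-acquire a TGT on its own; with a ticket-cache login (useTicketCache=true) the cached TGT can only be renewed up to renew_lifetime.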
Any ideas on why this could be happening?

Related

ZAP API Scan failing with error Read timed out

I am able to do an API scan as well as generate a report when I run the below command from Windows:
docker run -v "$(pwd):/zap/wrk/:rw" -t owasp/zap2docker-weekly zap-api-scan.py -t http://10.170.170.170:1700 /account?field4=448808888888"&"field7=GENERIC01"&"field10=ABC076 -f openapi -r ZAP_Report.htm
Once I switch to running the same command:
docker run -v $(pwd):/zap/wrk/:rw -t owasp/zap2docker-weekly zap-api-scan.py -t http://10.170.170.170:1700/account?field4=448808888888"&"field7=GENERIC01"&"field10=DCF43 -f openapi -r ~/serverkeys/ZAP_REPORT.htm
from Debian, I get an error; I'm not quite sure what I'm missing:
.....
[ZAP-ActiveScanner-1] WARN org.zaproxy.zap.extension.ascanrules.CommandInjectionScanRule - Command Injection vulnerability check failed for parameter [field10] and payload [';cat /etc/passwd;'] due to an I/O error
java.net.SocketTimeoutException: Read timed out
at java.net.SocketInputStream.socketRead0(Native Method) ~[?:?]
at java.net.SocketInputStream.socketRead(SocketInputStream.java:115) ~[?:?]
at java.net.SocketInputStream.read(SocketInputStream.java:168) ~[?:?]
at java.net.SocketInputStream.read(SocketInputStream.java:140) ~[?:?]
at java.io.BufferedInputStream.fill(BufferedInputStream.java:252) ~[?:?]
at java.io.BufferedInputStream.read(BufferedInputStream.java:271) ~[?:?]
at org.apache.commons.httpclient.HttpParser.readRawLine(HttpParser.java:78) ~[commons-httpclient-3.1.jar:D-2021-10-25]
at org.apache.commons.httpclient.HttpParser.readLine(HttpParser.java:106) ~[commons-httpclient-3.1.jar:D-2021-10-25]
at org.apache.commons.httpclient.HttpConnection.readLine(HttpConnection.java:1153) ~[zap-D-2021-10-25.jar:D-2021-10-25]
at org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$HttpConnectionAdapter.readLine(MultiThreadedHttpConnectionManager.java:1413) ~[commons-httpclient-3.1.jar:D-2021-10-25]
at org.apache.commons.httpclient.HttpMethodBase.readStatusLine(HttpMethodBase.java:2138) ~[zap-D-2021-10-25.jar:D-2021-10-25]
at org.zaproxy.zap.ZapGetMethod.readResponse(ZapGetMethod.java:112) ~[zap-D-2021-10-25.jar:D-2021-10-25]
at org.apache.commons.httpclient.HttpMethodBase.execute(HttpMethodBase.java:1162) ~[zap-D-2021-10-25.jar:D-2021-10-25]
at org.apache.commons.httpclient.HttpMethodDirector.executeWithRetry(HttpMethodDirector.java:470) ~[zap-D-2021-10-25.jar:D-2021-10-25]
at org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:207) ~[zap-D-2021-10-25.jar:D-2021-10-25]
at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:397) ~[commons-httpclient-3.1.jar:D-2021-10-25]
at org.parosproxy.paros.network.HttpSender.executeMethod(HttpSender.java:430) ~[zap-D-2021-10-25.jar:D-2021-10-25]
at org.parosproxy.paros.network.HttpSender.runMethod(HttpSender.java:672) ~[zap-D-2021-10-25.jar:D-2021-10-25]
at org.parosproxy.paros.network.HttpSender.send(HttpSender.java:627) ~[zap-D-2021-10-25.jar:D-2021-10-25]
at org.parosproxy.paros.network.HttpSender.sendAuthenticated(HttpSender.java:602) ~[zap-D-2021-10-25.jar:D-2021-10-25]
at org.parosproxy.paros.network.HttpSender.sendAuthenticated(HttpSender.java:585) ~[zap-D-2021-10-25.jar:D-2021-10-25]
at org.parosproxy.paros.network.HttpSender.sendAndReceive(HttpSender.java:490) ~[zap-D-2021-10-25.jar:D-2021-10-25]
at org.parosproxy.paros.core.scanner.AbstractPlugin.sendAndReceive(AbstractPlugin.java:315) ~[zap-D-2021-10-25.jar:D-2021-10-25]
at org.parosproxy.paros.core.scanner.AbstractPlugin.sendAndReceive(AbstractPlugin.java:246) ~[zap-D-2021-10-25.jar:D-2021-10-25]
at org.zaproxy.zap.extension.ascanrules.CommandInjectionScanRule.testCommandInjection(CommandInjectionScanRule.java:524) [ascanrules-release-42.zap:?]
at org.zaproxy.zap.extension.ascanrules.CommandInjectionScanRule.scan(CommandInjectionScanRule.java:431) [ascanrules-release-42.zap:?]
at org.parosproxy.paros.core.scanner.AbstractAppParamPlugin.scan(AbstractAppParamPlugin.java:201) [zap-D-2021-10-25.jar:D-2021-10-25]
at org.parosproxy.paros.core.scanner.AbstractAppParamPlugin.scan(AbstractAppParamPlugin.java:126) [zap-D-2021-10-25.jar:D-2021-10-25]
at org.parosproxy.paros.core.scanner.AbstractAppParamPlugin.scan(AbstractAppParamPlugin.java:87) [zap-D-2021-10-25.jar:D-2021-10-25]
at org.parosproxy.paros.core.scanner.AbstractPlugin.run(AbstractPlugin.java:333) [zap-D-2021-10-25.jar:D-2021-10-25]
at java.lang.Thread.run(Thread.java:829) [?:?]
493852 [Thread-6] INFO org.parosproxy.paros.core.scanner.HostProcess - completed host/plugin http://10.170.4.117:8002 | CommandInjectionScanRule in 421.201s with 84 message(s) sent and 0 alert(s) raised.
493853 [Thread-6] INFO org.parosproxy.paros.core.scanner.HostProcess - start host http://10.170.170.170:1700 | DirectoryBrowsingScanRule strength MEDIUM threshold MEDIUM
493988 [Thread-6] INFO org.parosproxy.paros.core.scanner.HostProcess - completed host/plugin http://10.170.170.170:1700 | DirectoryBrowsingScanRule in 0.136s with 2 message(s) sent and 0 alert(s) raised.
493988 [Thread-6] INFO org.parosproxy.paros.core.scanner.HostProcess - start host http://10.170.170.170:1700 | BufferOverflowScanRule strength MEDIUM threshold MEDIUM
494126 [Thread-6] INFO org.parosproxy.paros.core.scanner.HostProcess - completed host/plugin http://10.170.170.170:1700 | BufferOverflowScanRule in 0.137s with 3 message(s) sent and 0 alert(s) raised.
494126 [Thread-6] INFO org.parosproxy.paros.core.scanner.HostProcess - start host http://10.170.170.170:1700 | FormatStringScanRule strength MEDIUM threshold MEDIUM
494287 [Thread-6] INFO org.parosproxy.paros.core.scanner.HostProcess - completed host/plugin http://10.170.170.170:1700 | FormatStringScanRule in 0.161s with 9 message(s) sent and 0 alert(s) raised.
494287 [Thread-6] INFO org.parosproxy.paros.core.scanner.HostProcess - start host http://10.170.170.170:1700 | CrlfInjectionScanRule strength MEDIUM threshold MEDIUM
494560 [Thread-6] INFO org.parosproxy.paros.core.scanner.HostProcess - completed host/plugin http://10.170.170.170:1700 | CrlfInjectionScanRule in 0.273s with 21 message(s) sent and 0 alert(s) raised.
........
........
Is there any additional tracing I can do on the scan to find out why it's timing out?
It appears the scan is terminating before it completes, and it's also pointing to /etc/passwd?
You are not necessarily missing anything.
ZAP typically makes loads of requests to the target. Some of those may time out; that's all this warning is telling you. If you keep getting these, it might be an indication that your site has become unresponsive.
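If the timeouts keep recurring, you can also raise ZAP's read timeout by passing extra configuration through the scan script's -z passthrough. A sketch, assuming the connection.timeoutInSecs key applies to your ZAP version:
docker run -v "$(pwd):/zap/wrk/:rw" -t owasp/zap2docker-weekly zap-api-scan.py \
  -t "http://10.170.170.170:1700/account?field4=448808888888&field7=GENERIC01&field10=DCF43" \
  -f openapi -r ZAP_REPORT.htm \
  -z "-config connection.timeoutInSecs=120"
Quoting the whole target URL also saves you from escaping each & for the shell.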

Why is the Spring Cloud Data Flow tutorial not showing the expected result?

I'm trying to complete the first Spring Cloud Data Flow tutorial, and I'm not getting the result shown in the tutorial.
https://dataflow.spring.io/docs/stream-developer-guides/streams/
The tutorial has me use curl to post to an http source and then see the result in the log sink by tailing its stdout file.
I do not see the result. I only see the startup messages in the log.
I tail the log:
docker exec -it skipper tail -f /path/from/stdout/textbox/in/dashboard
I enter
curl http://localhost:20100 -H "Content-type: text/plain" -d "Happy streaming"
All I see is:
2020-10-05 16:30:03.315 INFO 110 --- [ main] o.a.kafka.common.utils.AppInfoParser : Kafka version : 2.0.1
2020-10-05 16:30:03.316 INFO 110 --- [ main] o.a.kafka.common.utils.AppInfoParser : Kafka commitId : fa14705e51bd2ce5
2020-10-05 16:30:03.322 INFO 110 --- [ main] o.s.s.c.ThreadPoolTaskScheduler : Initializing ExecutorService
2020-10-05 16:30:03.338 INFO 110 --- [ main] s.i.k.i.KafkaMessageDrivenChannelAdapter : started org.springframework.integration.kafka.inbound.KafkaMessageDrivenChannelAdapter@106faf11
2020-10-05 16:30:03.364 INFO 110 --- [container-0-C-1] org.apache.kafka.clients.Metadata : Cluster ID: 2J0QTxzQQmm2bLxFKgRwmA
2020-10-05 16:30:03.574 INFO 110 --- [ main] o.s.b.w.embedded.tomcat.TomcatWebServer : Tomcat started on port(s): 20041 (http) with context path ''
2020-10-05 16:30:03.584 INFO 110 --- [ main] o.s.c.s.a.l.s.k.LogSinkKafkaApplication : Started LogSinkKafkaApplication in 38.086 seconds (JVM running for 40.251)
2020-10-05 16:30:05.852 INFO 110 --- [container-0-C-1] o.a.k.c.c.internals.AbstractCoordinator : [Consumer clientId=consumer-3, groupId=http-ingest] Discovered group coordinator kafka-broker:9092 (id: 2147482646 rack: null)
2020-10-05 16:30:05.857 INFO 110 --- [container-0-C-1] o.a.k.c.c.internals.ConsumerCoordinator : [Consumer clientId=consumer-3, groupId=http-ingest] Revoking previously assigned partitions []
2020-10-05 16:30:05.858 INFO 110 --- [container-0-C-1] o.s.c.s.b.k.KafkaMessageChannelBinder$1 : partitions revoked: []
2020-10-05 16:30:05.858 INFO 110 --- [container-0-C-1] o.a.k.c.c.internals.AbstractCoordinator : [Consumer clientId=consumer-3, groupId=http-ingest] (Re-)joining group
2020-10-05 16:30:08.943 INFO 110 --- [container-0-C-1] o.a.k.c.c.internals.AbstractCoordinator : [Consumer clientId=consumer-3, groupId=http-ingest] Successfully joined group with generation 1
2020-10-05 16:30:08.945 INFO 110 --- [container-0-C-1] o.a.k.c.c.internals.ConsumerCoordinator : [Consumer clientId=consumer-3, groupId=http-ingest] Setting newly assigned partitions [http-ingest.http-0]
2020-10-05 16:30:08.964 INFO 110 --- [container-0-C-1] o.a.k.c.consumer.internals.Fetcher : [Consumer clientId=consumer-3, groupId=http-ingest] Resetting offset for partition http-ingest.http-0 to offset 0.
2020-10-05 16:30:08.981 INFO 110 --- [container-0-C-1] o.s.c.s.b.k.KafkaMessageChannelBinder$1 : partitions assigned: [http-ingest.http-0]
No "Happy streaming".
Any suggestions?
Thank you for trying out the developer guides!
From what I can tell, it appears the http | log stream definition in SCDF was submitted without an explicit port. When that is the case, Spring Boot randomly assigns a port when the http-source and log-sink applications start.
If you navigate to your http-source application logs, you will see the application port listed, and that is the port you'd use in the curl command.
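For example, reusing the log path from your tail command, you can grep for the Tomcat startup line to find the assigned port:
docker exec -it skipper grep "Tomcat started on port" /path/from/stdout/textbox/in/dashboard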
There's a note about this in the guide, for your reference:
If you use the local Data Flow Server, add the following deployment property to set the port to avoid a port collision.
Alternatively, you can deploy the stream with an explicit port in the definition, for example: http --server.port=9004 | log. With that, your curl command would be:
curl http://localhost:9004 -H "Content-type: text/plain" -d "Happy streaming"
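For completeness, a sketch of creating and deploying such a definition from the Data Flow shell (the stream name http-ingest is assumed from your consumer group id):
stream create --name http-ingest --definition "http --server.port=9004 | log" --deploy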

Deploy Spring Eureka Service Registry on Azure with a Docker image

I created a Docker image and pushed it to Docker Hub. Then I tried to run it as a Docker container and exposed port 8080. The container is listed, and the port column shows the following: 0.0.0.0:8080->8080/tcp.
So in my opinion the service is running, but I can't access it at ip-address/eureka.
How is it possible to open the Eureka dashboard?
Edit:
I changed the port, and now the port column of the container shows: 0.0.0.0:8761->8761/tcp
This is the log:
. ____ _ __ _ _
/\\ / ___'_ __ _ _(_)_ __ __ _ \ \ \ \
( ( )\___ | '_ | '_| | '_ \/ _` | \ \ \ \
\\/ ___)| |_)| | | | | || (_| | ) ) ) )
' |____| .__|_| |_|_| |_\__, | / / / /
=========|_|==============|___/=/_/_/_/
:: Spring Boot :: (v2.1.5.RELEASE)
2019-07-03 12:04:04.714 INFO 1 --- [ main] d.h.d.DiscoveryServiceApplication : No active profile set, falling back to default profiles: default
2019-07-03 12:04:06.481 WARN 1 --- [ main] o.s.boot.actuate.endpoint.EndpointId : Endpoint ID 'service-registry' contains invalid characters, please migrate to a valid format.
2019-07-03 12:04:07.232 INFO 1 --- [ main] o.s.cloud.context.scope.GenericScope : BeanFactory id=ce95a042-2fd4-339b-a733-0cc54c83f3f1
2019-07-03 12:04:07.484 INFO 1 --- [ main] trationDelegate$BeanPostProcessorChecker : Bean 'org.springframework.cloud.autoconfigure.ConfigurationPropertiesRebinderAutoConfiguration' of type [org.springframework.cloud.autoconfigure.ConfigurationPropertiesRebinderAutoConfiguration$$EnhancerBySpringCGLIB$$f3fe9d60] is not eligible for getting processed by all BeanPostProcessors (for example: not eligible for auto-proxying)
2019-07-03 12:04:08.092 INFO 1 --- [ main] o.s.b.w.embedded.tomcat.TomcatWebServer : Tomcat initialized with port(s): 8761 (http)
2019-07-03 12:04:08.166 INFO 1 --- [ main] o.apache.catalina.core.StandardService : Starting service [Tomcat]
2019-07-03 12:04:08.170 INFO 1 --- [ main] org.apache.catalina.core.StandardEngine : Starting Servlet engine: [Apache Tomcat/9.0.19]
2019-07-03 12:04:08.352 INFO 1 --- [ main] o.a.c.c.C.[Tomcat].[localhost].[/] : Initializing Spring embedded WebApplicationContext
2019-07-03 12:04:08.352 INFO 1 --- [ main] o.s.web.context.ContextLoader : Root WebApplicationContext: initialization completed in 3607 ms
2019-07-03 12:04:08.606 WARN 1 --- [ main] c.n.c.sources.URLConfigurationSource : No URLs will be polled as dynamic configuration sources.
2019-07-03 12:04:08.620 INFO 1 --- [ main] c.n.c.sources.URLConfigurationSource : To enable URLs as dynamic configuration sources, define System property archaius.configurationSource.additionalUrls or make config.properties available on classpath.
2019-07-03 12:04:08.669 INFO 1 --- [ main] c.netflix.config.DynamicPropertyFactory : DynamicPropertyFactory is initialized with configuration sources: com.netflix.config.ConcurrentCompositeConfiguration@1863d2fe
2019-07-03 12:04:10.133 INFO 1 --- [ main] c.s.j.s.i.a.WebApplicationImpl : Initiating Jersey application, version 'Jersey: 1.19.1 03/11/2016 02:08 PM'
2019-07-03 12:04:10.331 INFO 1 --- [ main] c.n.d.provider.DiscoveryJerseyProvider : Using JSON encoding codec LegacyJacksonJson
2019-07-03 12:04:10.338 INFO 1 --- [ main] c.n.d.provider.DiscoveryJerseyProvider : Using JSON decoding codec LegacyJacksonJson
2019-07-03 12:04:10.685 INFO 1 --- [ main] c.n.d.provider.DiscoveryJerseyProvider : Using XML encoding codec XStreamXml
2019-07-03 12:04:10.694 INFO 1 --- [ main] c.n.d.provider.DiscoveryJerseyProvider : Using XML decoding codec XStreamXml
2019-07-03 12:04:11.443 WARN 1 --- [ main] o.s.c.n.a.ArchaiusAutoConfiguration : No spring.application.name found, defaulting to 'application'
2019-07-03 12:04:11.445 WARN 1 --- [ main] c.n.c.sources.URLConfigurationSource : No URLs will be polled as dynamic configuration sources.
2019-07-03 12:04:11.445 INFO 1 --- [ main] c.n.c.sources.URLConfigurationSource : To enable URLs as dynamic configuration sources, define System property archaius.configurationSource.additionalUrls or make config.properties available on classpath.
2019-07-03 12:04:11.843 INFO 1 --- [ main] o.s.s.concurrent.ThreadPoolTaskExecutor : Initializing ExecutorService 'applicationTaskExecutor'
2019-07-03 12:04:12.932 INFO 1 --- [ main] o.s.c.n.eureka.InstanceInfoFactory : Setting initial instance status as: STARTING
2019-07-03 12:04:13.018 INFO 1 --- [ main] com.netflix.discovery.DiscoveryClient : Initializing Eureka in region us-east-1
2019-07-03 12:04:13.018 INFO 1 --- [ main] com.netflix.discovery.DiscoveryClient : Client configured to neither register nor query for data.
2019-07-03 12:04:13.041 INFO 1 --- [ main] com.netflix.discovery.DiscoveryClient : Discovery Client initialized at timestamp 1562155453040 with initial instances count: 0
2019-07-03 12:04:13.162 INFO 1 --- [ main] c.n.eureka.DefaultEurekaServerContext : Initializing ...
2019-07-03 12:04:13.169 INFO 1 --- [ main] c.n.eureka.cluster.PeerEurekaNodes : Adding new peer nodes [http://localhost:8761/eureka/]
2019-07-03 12:04:13.569 INFO 1 --- [ main] c.n.d.provider.DiscoveryJerseyProvider : Using JSON encoding codec LegacyJacksonJson
2019-07-03 12:04:13.570 INFO 1 --- [ main] c.n.d.provider.DiscoveryJerseyProvider : Using JSON decoding codec LegacyJacksonJson
2019-07-03 12:04:13.571 INFO 1 --- [ main] c.n.d.provider.DiscoveryJerseyProvider : Using XML encoding codec XStreamXml
2019-07-03 12:04:13.571 INFO 1 --- [ main] c.n.d.provider.DiscoveryJerseyProvider : Using XML decoding codec XStreamXml
2019-07-03 12:04:13.841 INFO 1 --- [ main] c.n.eureka.cluster.PeerEurekaNodes : Replica node URL: http://localhost:8761/eureka/
2019-07-03 12:04:13.859 INFO 1 --- [ main] c.n.e.registry.AbstractInstanceRegistry : Finished initializing remote region registries. All known remote regions: []
2019-07-03 12:04:13.860 INFO 1 --- [ main] c.n.eureka.DefaultEurekaServerContext : Initialized
2019-07-03 12:04:13.890 INFO 1 --- [ main] o.s.b.a.e.web.EndpointLinksResolver : Exposing 2 endpoint(s) beneath base path '/actuator'
2019-07-03 12:04:14.103 INFO 1 --- [ main] o.s.c.n.e.s.EurekaServiceRegistry : Registering application unknown with eureka with status UP
2019-07-03 12:04:14.123 INFO 1 --- [ Thread-11] o.s.c.n.e.server.EurekaServerBootstrap : Setting the eureka configuration..
2019-07-03 12:04:14.138 INFO 1 --- [ Thread-11] o.s.c.n.e.server.EurekaServerBootstrap : Eureka data center value eureka.datacenter is not set, defaulting to default
2019-07-03 12:04:14.139 INFO 1 --- [ Thread-11] o.s.c.n.e.server.EurekaServerBootstrap : Eureka environment value eureka.environment is not set, defaulting to test
2019-07-03 12:04:14.181 INFO 1 --- [ Thread-11] o.s.c.n.e.server.EurekaServerBootstrap : isAws returned false
2019-07-03 12:04:14.182 INFO 1 --- [ Thread-11] o.s.c.n.e.server.EurekaServerBootstrap : Initialized server context
2019-07-03 12:04:14.182 INFO 1 --- [ Thread-11] c.n.e.r.PeerAwareInstanceRegistryImpl : Got 1 instances from neighboring DS node
2019-07-03 12:04:14.182 INFO 1 --- [ Thread-11] c.n.e.r.PeerAwareInstanceRegistryImpl : Renew threshold is: 1
2019-07-03 12:04:14.183 INFO 1 --- [ Thread-11] c.n.e.r.PeerAwareInstanceRegistryImpl : Changing status to UP
2019-07-03 12:04:14.225 INFO 1 --- [ Thread-11] e.s.EurekaServerInitializerConfiguration : Started Eureka Server
2019-07-03 12:04:14.253 INFO 1 --- [ main] o.s.b.w.embedded.tomcat.TomcatWebServer : Tomcat started on port(s): 8761 (http) with context path ''
2019-07-03 12:04:14.259 INFO 1 --- [ main] d.h.d.DiscoveryServiceApplication : Started DiscoveryServiceApplication in 12.423 seconds (JVM running for 13.3)
2019-07-03 12:05:14.187 INFO 1 --- [a-EvictionTimer] c.n.e.registry.AbstractInstanceRegistry : Running the evict task with compensationTime 0ms
2019-07-03 12:06:14.187 INFO 1 --- [a-EvictionTimer] c.n.e.registry.AbstractInstanceRegistry : Running the evict task with compensationTime 0ms
Container information
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
8994d80a4ee2 marcelv93/service-discovery:latest "java -Djava.securit…" 6 minutes ago Up 6 minutes 0.0.0.0:8761->8761/tcp api
For your issue, where you cannot access the container, these are the possible reasons as far as I know:
the container is not running well;
the container is running well, but the application is not running well inside the container;
both the container and the application are running well, but you exposed the wrong port: the port is not the one that the application listens on inside the container.
So you need to check each of the points above. Also, the default Eureka port is 8761, and you need to set it in the configuration. Take a look at Containerize-Spring-Cloud-Eureka-Server.
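For reference, a minimal standalone Eureka server usually pins the port and disables self-registration in application.yml; a sketch, not necessarily your exact configuration:
server:
  port: 8761                          # match the port you publish with -p 8761:8761

eureka:
  client:
    register-with-eureka: false       # standalone server: don't register with itself
    fetch-registry: false
Also note that the dashboard is served at the root path (http://<host>:8761/), not at /eureka; the /eureka/ path is the REST API base, which is why ip-address/eureka does not show the dashboard.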

Apache Atlas quickstart - Kafka error

Env: no Kerberos, no Ranger, no HDFS. EC2 with SSL.
I'm getting this error after running $ATLAS_HOME/bin/quick_start.py https://$componentPrivateDNSRecord:21443 with the correct user/pass:
Creating sample types:
Created type [DB]
Created type [Table]
Created type [StorageDesc]
Created type [Column]
Created type [LoadProcess]
Created type [View]
Created type [JdbcAccess]
Created type [ETL]
Created type [Metric]
Created type [PII]
Created type [Fact]
Created type [Dimension]
Created type [Log Data]
Creating sample entities:
Exception in thread "main" com.sun.jersey.api.client.ClientHandlerException: java.net.SocketTimeoutException: Read timed out
at com.sun.jersey.client.urlconnection.URLConnectionClientHandler.handle(URLConnectionClientHandler.java:155)
at com.sun.jersey.api.client.filter.HTTPBasicAuthFilter.handle(HTTPBasicAuthFilter.java:105)
at com.sun.jersey.api.client.Client.handle(Client.java:652)
at com.sun.jersey.api.client.WebResource.handle(WebResource.java:682)
at com.sun.jersey.api.client.WebResource.access$200(WebResource.java:74)
at com.sun.jersey.api.client.WebResource$Builder.method(WebResource.java:634)
at org.apache.atlas.AtlasBaseClient.callAPIWithResource(AtlasBaseClient.java:334)
at org.apache.atlas.AtlasBaseClient.callAPIWithResource(AtlasBaseClient.java:311)
at org.apache.atlas.AtlasBaseClient.callAPI(AtlasBaseClient.java:199)
at org.apache.atlas.AtlasClientV2.createEntity(AtlasClientV2.java:277)
at org.apache.atlas.examples.QuickStartV2.createInstance(QuickStartV2.java:339)
at org.apache.atlas.examples.QuickStartV2.createDatabase(QuickStartV2.java:362)
at org.apache.atlas.examples.QuickStartV2.createEntities(QuickStartV2.java:268)
at org.apache.atlas.examples.QuickStartV2.runQuickstart(QuickStartV2.java:150)
at org.apache.atlas.examples.QuickStartV2.main(QuickStartV2.java:132)
Caused by: java.net.SocketTimeoutException: Read timed out
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
at java.net.SocketInputStream.read(SocketInputStream.java:171)
at java.net.SocketInputStream.read(SocketInputStream.java:141)
at sun.security.ssl.InputRecord.readFully(InputRecord.java:465)
at sun.security.ssl.InputRecord.read(InputRecord.java:503)
at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:983)
at sun.security.ssl.SSLSocketImpl.readDataRecord(SSLSocketImpl.java:940)
at sun.security.ssl.AppInputStream.read(AppInputStream.java:105)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
at java.io.BufferedInputStream.read1(BufferedInputStream.java:286)
at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:735)
at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:678)
at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1587)
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1492)
at java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:480)
at sun.net.www.protocol.https.HttpsURLConnectionImpl.getResponseCode(HttpsURLConnectionImpl.java:347)
at com.sun.jersey.client.urlconnection.URLConnectionClientHandler._invoke(URLConnectionClientHandler.java:253)
at com.sun.jersey.client.urlconnection.URLConnectionClientHandler.handle(URLConnectionClientHandler.java:153)
... 14 more
No sample data added to Apache Atlas Server.
Relevant code:
https://github.com/apache/incubator-atlas/blob/master/webapp/src/main/java/org/apache/atlas/examples/QuickStartV2.java
// This works
quickStartV2.createTypes();
// This errors
quickStartV2.createEntities();
First I thought Atlas->Kafka connectivity was the issue, but then I saw this:
[ec2-user@ip-10-160-187-181 logs]$ cat atlas_kafka_setup.log
2018-07-25 00:06:14,923 INFO - [main:] ~ Looking for atlas-application.properties in classpath (ApplicationProperties:78)
2018-07-25 00:06:14,926 INFO - [main:] ~ Loading atlas-application.properties from file:/home/ec2-user/atlas/distro/target/apache-atlas-1.0.0-SNAPSHOT-bin/apache-atlas-1.0.0-SNAPSHOT/conf/atlas-application.properties (ApplicationProperties:91)
2018-07-25 00:06:16,512 WARN - [main:] ~ Attempting to create topic ATLAS_HOOK (AtlasTopicCreator:72)
2018-07-25 00:06:17,004 WARN - [main:] ~ Created topic ATLAS_HOOK with partitions 1 and replicas 1 (AtlasTopicCreator:119)
2018-07-25 00:06:17,004 WARN - [main:] ~ Attempting to create topic ATLAS_ENTITIES (AtlasTopicCreator:72)
2018-07-25 00:06:17,024 WARN - [main:] ~ Created topic ATLAS_ENTITIES with partitions 1 and replicas 1 (AtlasTopicCreator:119)
2018-07-25 01:49:45,147 DEBUG - [main:] ~ Calling API [ GET : api/atlas/v2/types/typedefs ] (AtlasBaseClient:319)
2018-07-25 01:49:45,147 DEBUG - [main:] ~ Attempting to configure HTTPS connection using client configuration (SecureClientUtils$4:221)
2018-07-25 01:49:45,166 INFO - [main:] ~ Unable to configure HTTPS connection from configuration. Leveraging JDK properties. (SecureClientUtils$4:240)
2018-07-25 01:49:45,269 DEBUG - [main:] ~ API https://mydns:21443/api/atlas/v2/types/typedefs?name=Dimension returned status 200 (AtlasBaseClient:337)
2018-07-25 01:49:45,270 DEBUG - [main:] ~ Calling API [ GET : api/atlas/v2/types/typedefs ] (AtlasBaseClient:319)
2018-07-25 01:49:45,271 DEBUG - [main:] ~ Attempting to configure HTTPS connection using client configuration (SecureClientUtils$4:221)
2018-07-25 01:49:45,291 INFO - [main:] ~ Unable to configure HTTPS connection from configuration. Leveraging JDK properties. (SecureClientUtils$4:240)
2018-07-25 01:49:45,450 DEBUG - [main:] ~ API https://mydns:21443/api/atlas/v2/types/typedefs?name=Log+Data returned status 200 (AtlasBaseClient:337)
2018-07-25 01:49:45,455 DEBUG - [main:] ~ Calling API [ POST : api/atlas/v2/entity ] <== AtlasEntityWithExtInfo{entity=AtlasEntity{AtlasStruct{typeName='DB', attributes=[owner:John ETL, createTime:1532483385453, name:Sales, description:sales database, locationuri:hdfs://host:8000/apps/warehouse/sales]}guid='-6466195619848', status=null, createdBy='null', updatedBy='null', createTime=null, updateTime=null, version=0, relationshipAttributes=[], classifications=[], },AtlasEntityExtInfo{referredEntities={}}} (AtlasBaseClient:319)
2018-07-25 01:49:45,455 DEBUG - [main:] ~ Attempting to configure HTTPS connection using client configuration (SecureClientUtils$4:221)
2018-07-25 01:49:45,474 INFO - [main:] ~ Unable to configure HTTPS connection from configuration. Leveraging JDK properties. (SecureClientUtils$4:240)
2018-07-25 01:49:33,256 Audit: myuser/10.160.189.35-10.160.189.35 performed request POST https://mydns:21443/api/atlas/v2/types/typedefs (10.160.187.181) at time 2018-07-25T01:49Z
2018-07-25 01:49:45,445 Audit: myuser/10.160.189.35-10.160.189.35 performed request GET https://mydns:21443/api/atlas/v2/types/typedefs?name=Log+Data (10.160.187.181) at time 2018-07-25T01:49Z
2018-07-25 01:49:45,678 Audit: myuser/10.160.189.35-10.160.189.35 performed request POST https://mydns:21443/api/atlas/v2/entity (10.160.187.181) at time 2018-07-25T01:49Z
The 2 topics are returned by this:
$KAFKA_HOME/bin/kafka-topics.sh --list --zookeeper localhost:2181
Atlas's application.log does have the following, though I'm not sure why:
2018-07-25 02:18:14,991 DEBUG - [NotificationHookConsumer thread-0:] ~ Give up sending metadata request since no node is available (NetworkClient$DefaultMetadataUpdater:625)
2018-07-25 02:18:15,018 DEBUG - [kafka-producer-network-thread | producer-1:] ~ Initialize connection to node -1 for sending metadata request (NetworkClient$DefaultMetadataUpdater:644)
2018-07-25 02:18:15,018 DEBUG - [kafka-producer-network-thread | producer-1:] ~ Initiating connection to node -1 at localhost:9027. (NetworkClient:496)
2018-07-25 02:18:15,018 DEBUG - [kafka-producer-network-thread | producer-1:] ~ Connection with localhost/127.0.0.1 disconnected (Selector:345)
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at org.apache.kafka.common.network.PlaintextTransportLayer.finishConnect(PlaintextTransportLayer.java:51)
at org.apache.kafka.common.network.KafkaChannel.finishConnect(KafkaChannel.java:73)
at org.apache.kafka.common.network.Selector.pollSelectionKeys(Selector.java:309)
at org.apache.kafka.common.network.Selector.poll(Selector.java:283)
at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:260)
at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:229)
at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:134)
at java.lang.Thread.run(Thread.java:748)
2018-07-25 02:18:15,018 DEBUG - [kafka-producer-network-thread | producer-1:] ~ Node -1 disconnected. (NetworkClient:463)
2018-07-25 02:18:15,018 DEBUG - [kafka-producer-network-thread | producer-1:] ~ Give up sending metadata request since no node is available (NetworkClient$DefaultMetadataUpdater:625)
2018-07-25 02:18:15,092 DEBUG - [NotificationHookConsumer thread-0:] ~ Initialize connection to node -1 for sending metadata request (NetworkClient$DefaultMetadataUpdater:644)
2018-07-25 02:18:15,092 DEBUG - [NotificationHookConsumer thread-0:] ~ Initiating connection to node -1 at localhost:9027. (NetworkClient:496)
2018-07-25 02:18:15,092 DEBUG - [NotificationHookConsumer thread-0:] ~ Connection with localhost/127.0.0.1 disconnected (Selector:345)
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at org.apache.kafka.common.network.PlaintextTransportLayer.finishConnect(PlaintextTransportLayer.java:51)
at org.apache.kafka.common.network.KafkaChannel.finishConnect(KafkaChannel.java:73)
at org.apache.kafka.common.network.Selector.pollSelectionKeys(Selector.java:309)
at org.apache.kafka.common.network.Selector.poll(Selector.java:283)
at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:260)
at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.clientPoll(ConsumerNetworkClient.java:360)
at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:224)
at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:192)
at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.awaitMetadataUpdate(ConsumerNetworkClient.java:134)
at org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureCoordinatorReady(AbstractCoordinator.java:183)
at org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:973)
at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:937)
at org.apache.atlas.kafka.AtlasKafkaConsumer.receive(AtlasKafkaConsumer.java:63)
at org.apache.atlas.kafka.AtlasKafkaConsumer.receive(AtlasKafkaConsumer.java:55)
at org.apache.atlas.notification.NotificationHookConsumer$HookConsumer.doWork(NotificationHookConsumer.java:305)
at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:63)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
2018-07-25 02:18:15,092 DEBUG - [NotificationHookConsumer thread-0:] ~ Node -1 disconnected. (NetworkClient:463)
2018-07-25 02:18:15,092 DEBUG - [NotificationHookConsumer thread-0:] ~ Give up sending metadata request since no node is available (NetworkClient$DefaultMetadataUpdater:625)
2018-07-25 02:18:15,119 DEBUG - [kafka-producer-network-thread | producer-1:] ~ Initialize connection to node -1 for sending metadata request (NetworkClient$DefaultMetadataUpdater:644)
2018-07-25 02:18:15,119 DEBUG - [kafka-producer-network-thread | producer-1:] ~ Initiating connection to node -1 at localhost:9027. (NetworkClient:496)
2018-07-25 02:18:15,119 DEBUG - [kafka-producer-network-thread | producer-1:] ~ Connection with localhost/127.0.0.1 disconnected (Selector:345)
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at org.apache.kafka.common.network.PlaintextTransportLayer.finishConnect(PlaintextTransportLayer.java:51)
at org.apache.kafka.common.network.KafkaChannel.finishConnect(KafkaChannel.java:73)
at org.apache.kafka.common.network.Selector.pollSelectionKeys(Selector.java:309)
at org.apache.kafka.common.network.Selector.poll(Selector.java:283)
at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:260)
at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:229)
at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:134)
at java.lang.Thread.run(Thread.java:748)
2018-07-25 02:18:15,119 DEBUG - [kafka-producer-network-thread | producer-1:] ~ Node -1 disconnected. (NetworkClient:463)
2018-07-25 02:18:15,119 DEBUG - [kafka-producer-network-thread | producer-1:] ~ Give up sending metadata request since no node is available (NetworkClient$DefaultMetadataUpdater:625)
This fixed it!
sed -i 's/atlas.kafka.bootstrap.servers=localhost:9027/atlas.kafka.bootstrap.servers=localhost:9092/' $ATLAS_HOME/conf/atlas-application.properties
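To double-check after the edit (a quick sketch; assumes nc is available on the box):
# verify the property now points at the broker's real port
grep '^atlas.kafka.bootstrap.servers' $ATLAS_HOME/conf/atlas-application.properties
# confirm a broker is actually listening there before restarting Atlas
nc -vz localhost 9092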

Spark on Yarn job failed with ExitCode:1 and stderr says "Can't find main class"

We tried to submit the simple SparkPi example to Spark on YARN. The .bat file is written as below:
./bin/spark-submit --class org.apache.spark.examples.SparkPi --master yarn-cluster --num-executors 3 --driver-memory 4g --executor-memory 1g --executor-cores 1 .\examples\target\spark-examples_2.10-1.4.0.jar 10
pause
Our HDFS and YARN work well. We are using Hadoop 2.7.0 and Spark 1.4.1. We have only one node, which acts as both NameNode and DataNode.
When we execute it, it fails, and the log says the following:
2015-08-21 11:07:22,044 DEBUG [main] | ===============================================================================
2015-08-21 11:07:22,044 DEBUG [main] | Yarn AM launch context:
2015-08-21 11:07:22,044 DEBUG [main] | user class: org.apache.spark.examples.SparkPi
2015-08-21 11:07:22,044 DEBUG [main] | env:
2015-08-21 11:07:22,044 DEBUG [main] | CLASSPATH -> {{PWD}}<CPS>{{PWD}}/__hadoop_conf__<CPS>{{PWD}}/__spark__.jar<CPS>%HADOOP_HOME%\etc\hadoop<CPS>%HADOOP_HOME%\share\hadoop\common\*<CPS>%HADOOP_HOME%\share\hadoop\common\lib\*<CPS>%HADOOP_HOME%\share\hadoop\mapreduce\*<CPS>%HADOOP_HOME%\share\hadoop\mapreduce\lib\*<CPS>%HADOOP_HOME%\share\hadoop\hdfs\*<CPS>%HADOOP_HOME%\share\hadoop\hdfs\lib\*<CPS>%HADOOP_HOME%\share\hadoop\yarn\*<CPS>%HADOOP_HOME%\share\hadoop\yarn\lib\*<CPS>%HADOOP_MAPRED_HOME%\share\hadoop\mapreduce\*<CPS>%HADOOP_MAPRED_HOME%\share\hadoop\mapreduce\lib\*
2015-08-21 11:07:22,060 DEBUG [main] | SPARK_YARN_CACHE_FILES_FILE_SIZES -> 165181064,1420218
2015-08-21 11:07:22,060 DEBUG [main] | SPARK_YARN_STAGING_DIR -> .sparkStaging/application_1440062075415_0026
2015-08-21 11:07:22,060 DEBUG [main] | SPARK_YARN_CACHE_FILES_VISIBILITIES -> PRIVATE,PRIVATE
2015-08-21 11:07:22,060 DEBUG [main] | SPARK_USER -> msrabi
2015-08-21 11:07:22,060 DEBUG [main] | SPARK_YARN_MODE -> true
2015-08-21 11:07:22,060 DEBUG [main] | SPARK_YARN_CACHE_FILES_TIME_STAMPS -> 1440126441200,1440126441575
2015-08-21 11:07:22,060 DEBUG [main] | SPARK_YARN_CACHE_FILES -> hdfs://msra-sa-44:9000/user/msrabi/.sparkStaging/application_1440062075415_0026/spark-assembly-1.4.0-hadoop2.7.0.jar#__spark__.jar,hdfs://msra-sa-44:9000/user/msrabi/.sparkStaging/application_1440062075415_0026/spark-examples_2.10-1.4.0.jar#__app__.jar
2015-08-21 11:07:22,060 DEBUG [main] | resources:
2015-08-21 11:07:22,060 DEBUG [main] | __app__.jar -> resource { scheme: "hdfs" host: "msra-sa-44" port: 9000 file: "/user/msrabi/.sparkStaging/application_1440062075415_0026/spark-examples_2.10-1.4.0.jar" } size: 1420218 timestamp: 1440126441575 type: FILE visibility: PRIVATE
2015-08-21 11:07:22,060 DEBUG [main] | __spark__.jar -> resource { scheme: "hdfs" host: "msra-sa-44" port: 9000 file: "/user/msrabi/.sparkStaging/application_1440062075415_0026/spark-assembly-1.4.0-hadoop2.7.0.jar" } size: 165181064 timestamp: 1440126441200 type: FILE visibility: PRIVATE
2015-08-21 11:07:22,060 DEBUG [main] | __hadoop_conf__ -> resource { scheme: "hdfs" host: "msra-sa-44" port: 9000 file: "/user/msrabi/.sparkStaging/application_1440062075415_0026/__hadoop_conf__7908628615251032149.zip" } size: 82888 timestamp: 1440126441794 type: ARCHIVE visibility: PRIVATE
2015-08-21 11:07:22,060 DEBUG [main] | command:
2015-08-21 11:07:22,075 DEBUG [main] | {{JAVA_HOME}}/bin/java -server -Xmx4096m -Djava.io.tmpdir={{PWD}}/tmp '-Dspark.app.name=org.apache.spark.examples.SparkPi' '-Dspark.executor.memory=1g' '-Dspark.driver.memory=4g' '-Dspark.master=yarn-cluster' -Dspark.yarn.app.container.log.dir=<LOG_DIR> org.apache.spark.deploy.yarn.ApplicationMaster --class 'org.apache.spark.examples.SparkPi' --jar file:/D:/sp/./examples/target/spark-examples_2.10-1.4.0.jar --arg '10' --executor-memory 1024m --executor-cores 1 --num-executors 3 1> <LOG_DIR>/stdout 2> <LOG_DIR>/stderr
2015-08-21 11:07:22,075 DEBUG [main] | ===============================================================================
...........(omitting some lines)......
2015-08-21 11:07:23,231 INFO [main] | Application report for application_1440062075415_0026 (state: ACCEPTED)
2015-08-21 11:07:23,247 DEBUG [main] |
client token: N/A
diagnostics: N/A
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: default
start time: 1440126442169
final status: UNDEFINED
tracking URL: http://msra-sa-44:8088/proxy/application_1440062075415_0026/
user: msrabi
2015-08-21 11:07:24,263 TRACE [main] | 1: Call -> MSRA-SA-44/10.190.173.181:8032: getApplicationReport {application_id { id: 26 cluster_timestamp: 1440062075415 }}
2015-08-21 11:07:24,263 DEBUG [IPC Parameter Sending Thread #0] | IPC Client (443384617) connection to MSRA-SA-44/10.190.173.181:8032 from msrabi sending #37
2015-08-21 11:07:24,263 DEBUG [IPC Client (443384617) connection to MSRA-SA-44/10.190.173.181:8032 from msrabi] | IPC Client (443384617) connection to MSRA-SA-44/10.190.173.181:8032 from msrabi got value #37
2015-08-21 11:07:24,263 DEBUG [main] | Call: getApplicationReport took 0ms
2015-08-21 11:07:24,263 TRACE [main] | 1: Response <- MSRA-SA-44/10.190.173.181:8032: getApplicationReport {application_report { applicationId { id: 26 cluster_timestamp: 1440062075415 } user: "msrabi" queue: "default" name: "org.apache.spark.examples.SparkPi" host: "N/A" rpc_port: -1 yarn_application_state: ACCEPTED trackingUrl: "http://msra-sa-44:8088/proxy/application_1440062075415_0026/" diagnostics: "" startTime: 1440126442169 finishTime: 0 final_application_status: APP_UNDEFINED app_resource_Usage { num_used_containers: 1 num_reserved_containers: 0 used_resources { memory: 4608 virtual_cores: 1 } reserved_resources { memory: 0 virtual_cores: 0 } needed_resources { memory: 4608 virtual_cores: 1 } memory_seconds: 0 vcore_seconds: 0 } originalTrackingUrl: "N/A" currentApplicationAttemptId { application_id { id: 26 cluster_timestamp: 1440062075415 } attemptId: 1 } progress: 0.0 applicationType: "SPARK" }}
2015-08-21 11:07:24,263 INFO [main] | Application report for application_1440062075415_0026 (state: ACCEPTED)
.......(omitting some lines where the state are all ACCEPTED and final status are all UNDEFINED).....
2015-08-21 11:07:30,359 INFO [main] | Application report for application_1440062075415_0026 (state: FAILED)
2015-08-21 11:07:30,359 DEBUG [main] |
client token: N/A
diagnostics: Application application_1440062075415_0026 failed 2 times due to AM Container for appattempt_1440062075415_0026_000002 exited with exitCode: 1
For more detailed output, check application tracking page:http://msra-sa-44:8088/cluster/app/application_1440062075415_0026Then, click on links to logs of each attempt.
Diagnostics: Exception from container-launch.
Container id: container_1440062075415_0026_02_000001
Exit code: 1
Stack trace: ExitCodeException exitCode=1:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:545)
at org.apache.hadoop.util.Shell.run(Shell.java:456)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:722)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:211)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
at java.util.concurrent.FutureTask.run(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
Shell output: 1 file(s) moved.
Then we opened stderr, and it says:
Error: Could not find or load main class 'Dspark.app.name=org.apache.spark.examples.SparkPi'
It's strange: this should be a parameter passed to java, yet java recognized it as the main class. There should be a main-class argument in the command section of the log, but there is not.
How can that happen? What should we do to find out what's wrong?
Thank you!
We solved this problem.
The root cause is that, when generating the java command line, our Spark build wraps the parameters in single quotes ('-Dxxxx'). Single quotes work only on Linux; on Windows, the parameters must either be left unwrapped or wrapped in double quotes ("-Dxxxx"). The only way to solve this is to edit the Spark source code and re-compile it.
It seems that this is currently an open issue in Spark. (https://issues.apache.org/jira/browse/SPARK-5754)
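For illustration, here is the difference in the AM launch command from the log above; double quotes are what cmd.exe actually understands (a sketch of the corrected fragment, not Spark's generated output):
:: as emitted (single quotes are literal characters to cmd.exe):
{{JAVA_HOME}}/bin/java -server -Xmx4096m '-Dspark.app.name=org.apache.spark.examples.SparkPi' ...
:: as it needs to be on Windows (the JVM then sees an ordinary -D system property):
{{JAVA_HOME}}/bin/java -server -Xmx4096m "-Dspark.app.name=org.apache.spark.examples.SparkPi" ...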
