We are using MarkLogic 9.0-7.2
Recently, when we looked at the MarkLogic log files, we found a lot of log entries for Restarting XDQPServer, Stopping XDQPServerConnection, and Starting domestic XDQPServerConnection.
This is happening on one node of a 3-node cluster, and at the same time there was a spike in CPU usage on that node:
2019-07-19 02:22:05.730 Info: Merging 12 MB from /data/Forests/test-002-1/00004965 and /data/Forests/test-002-1/00004964 to /data/Forests/test-002-1/00004966, timestamp=15635023204760890
2019-07-19 02:22:05.907 Info: Merged 4 MB at 23 MB/sec to /data/Forests/test-002-1/00004966
2019-07-19 02:22:10.092 Info: Deleted 4 MB at 1559 MB/sec /data/Forests/test-002-1/00004964
2019-07-19 02:22:10.094 Info: Deleted 8 MB at 3340 MB/sec /data/Forests/test-002-1/00004965
2019-07-19 02:33:40.108 Notice: Restarting XDQPServer, client=ip-10-x-xx-xxx.eu-west-1, conn=10.x.xx.xxx:7999-10.x.xx.xxx:51376, recvTicks=20, sendTicks=17, sslChange=0
2019-07-19 02:33:40.108 Info: Stopping XDQPServerConnection, client=ip-10-x-xx-xxx.eu-west-1, conn=10.x.xx.xxx:7999-10.x.xx.xxx:51366, requests=0, recvTicks=0, sendTicks=14, recvs=45474194, sends=37524064, recvBytes=22595104264, sendBytes=36091745084
2019-07-19 02:33:40.108 Info: Stopping XDQPServerConnection, client=ip-10-x-xx-xxx.eu-west-1, conn=10.x.xx.xxx:7999-10.x.xx.xxx:51370, requests=0, recvTicks=0, sendTicks=10, recvs=45782975, sends=37524025, recvBytes=119571059752, sendBytes=36127613500
2019-07-19 02:33:40.108 Info: Stopping XDQPServerConnection, client=ip-10-x-xx-xxx.eu-west-1, conn=10.x.xx.xxx:7999-10.x.xx.xxx:51376, requests=0, recvTicks=0, sendTicks=18, recvs=45441834, sends=37524005, recvBytes=22565188068, sendBytes=36066592356
2019-07-19 02:33:42.900 Info: Starting domestic XDQPServerConnection, client=ip-10-x-xx-xxx.eu-west-1, conn=10.x.xx.xxx:7999-10.x.xx.xxx:54600
2019-07-19 02:33:43.125 Info: Stopping XDQPServerConnection, client=ip-10-x-xx-xxx.eu-west-1, conn=10.x.xx.xxx:7999-10.x.xx.xxx:46748, requests=0, recvTicks=0, sendTicks=0, recvs=4419607, sends=3677988, recvBytes=2305124592, sendBytes=3496374804
2019-07-19 02:33:43.125 Info: Stopping XDQPServerConnection, client=ip-10-x-xx-xxx.eu-west-1, conn=10.x.xx.xxx:7999-10.x.xx.xxx:46754, requests=0, recvTicks=0, sendTicks=0, recvs=4398854, sends=3677936, recvBytes=2407770572, sendBytes=3503131292
2019-07-19 02:33:43.125 Info: Stopping XDQPServerConnection, client=ip-10-x-xx-xxx.eu-west-1, conn=10.x.xx.xxx:7999-10.x.xx.xxx:46752, requests=0, recvTicks=0, sendTicks=0, recvs=4398978, sends=3677956, recvBytes=2270303720, sendBytes=3497036524
2019-07-19 02:33:43.176 Info: Starting domestic XDQPServerConnection, client=ip-10-x-xx-xxx.eu-west-1, conn=10.x.xx.xxx:7999-10.x.xx.xxx:46886
2019-07-19 02:33:44.807 Info: Starting domestic XDQPServerConnection, client=ip-10-x-xx-xxx.eu-west-1, conn=10.x.xx.xxx:7999-10.x.xx.xxx:46888
2019-07-19 02:33:45.085 Info: Starting domestic XDQPServerConnection, client=ip-10-x-xx-xxx.eu-west-1, conn=10.x.xx.xxx:7999-10.x.xx.xxx:46890
2019-07-19 02:33:48.372 Info: Starting domestic XDQPServerConnection, client=ip-10-x-xx-xxx.eu-west-1, conn=10.x.xx.xxx:7999-10.x.xx.xxx:54602
2019-07-19 02:33:53.633 Info: Saving /data/failover/Forests/test-001-1-2/00004935
2019-07-19 02:33:53.894 Info: Saved 8 MB at 31 MB/sec to /data/failover/Forests/test-001-1-2/00004935
2019-07-19 02:33:53.947 Info: Merging 30 MB from /data/failover/Forests/test-001-1-2/00004934 and /data/failover/Forests/test-001-1-2/00004935 to /data/failover/Forests/test-001-1-2/00004936, timestamp=15635029812437490
2019-07-19 02:34:01.953 Info: Merged 22 MB in 8 sec at 3 MB/sec to /data/failover/Forests/test-001-1-2/00004936
2019-07-19 02:34:04.214 Info: Deleted 22 MB at 6685 MB/sec /data/failover/Forests/test-001-1-2/00004934
2019-07-19 02:34:04.218 Info: Deleted 8 MB at 2427 MB/sec /data/failover/Forests/test-001-1-2/00004935
2019-07-19 02:34:05.041 Info: Stopping XDQPServerConnection, client=ip-10-x-xx-xxx.eu-west-1, conn=10.x.xx.xxx:7999-10.x.xx.xxx:46886, requests=0, recvTicks=0, sendTicks=9, recvs=60, sends=66, recvBytes=140384, sendBytes=40020
2019-07-19 02:34:05.041 Info: Stopping XDQPServerConnection, client=ip-10-x-xx-xxx.eu-west-1, conn=10.x.xx.xxx:7999-10.x.xx.xxx:46890, requests=0, recvTicks=0, sendTicks=4, recvs=57, sends=20, recvBytes=167120, sendBytes=63332
2019-07-19 02:34:05.041 Info: Stopping XDQPServerConnection, client=ip-10-x-xx-xxx.eu-west-1, conn=10.x.xx.xxx:7999-10.x.xx.xxx:46888, requests=0, recvTicks=0, sendTicks=5, recvs=20, sends=36, recvBytes=4456, sendBytes=11980
2019-07-19 02:34:11.127 Info: Starting domestic XDQPServerConnection, client=ip-10-x-xx-xxx.eu-west-1, conn=10.x.xx.xxx:7999-10.x.xx.xxx:46894
2019-07-19 02:34:11.156 Info: Starting domestic XDQPServerConnection, client=ip-10-x-xx-xxx.eu-west-1, conn=10.x.xx.xxx:7999-10.x.xx.xxx:54610
2019-07-19 02:34:15.017 Info: Starting domestic XDQPServerConnection, client=ip-10-x-xx-xxx.eu-west-1, conn=10.x.xx.xxx:7999-10.x.xx.xxx:46896
2019-07-19 02:34:16.447 Info: Starting domestic XDQPServerConnection, client=ip-10-x-xx-xxx.eu-west-1, conn=10.x.xx.xxx:7999-10.x.xx.xxx:46898
Now we want to understand what to investigate and how, and what the solution for this is.
When reading log files it's good to keep the log levels in mind. I generally would not recommend being concerned about any log messages below the "Warning" level.
XDQP is a proprietary MarkLogic protocol for intra-cluster communication. This knowledge base article should cover everything you need to know about it.
I would bet your CPU spike has more to do with the merge activity in your log. I would not be surprised if you also found someone writing data to those merging forests during this period.
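To check that correlation, one option (a sketch, assuming the default Linux log location /var/opt/MarkLogic/Logs/ErrorLog.txt and the time window from the excerpt above; adjust both for your install) is to count merge and XDQP messages per minute and compare the counts against your CPU graph:

# Count merge and XDQP connection messages per minute around the spike
grep -E 'Merging|Merged|XDQPServerConnection' /var/opt/MarkLogic/Logs/ErrorLog.txt \
  | grep '^2019-07-19 02:' \
  | cut -c1-16 | sort | uniq -c

Minutes where heavy merge throughput and connection churn line up with the CPU spike point to merges (and whatever ingest is driving them) rather than to XDQP itself.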
Related
I cannot change the local ES index location - I cannot modify path.data.
This is probably some elementary mistake, but I am stuck and would greatly appreciate any assistance.
So:
Fresh local installation of ES 7.8.1 under CentOS 7; everything runs correctly if no changes are made in elasticsearch.yml.
But if I try to change elasticsearch.yml:
# path.data: /var/lib/elasticsearch
path.data: /run/media/admin/bvv2/elasticsearch/
(i.e. trying to point to an external disk), then after systemctl start elasticsearch I get:
Job for elasticsearch.service failed because the control process exited with error code. See "systemctl status elasticsearch.service" and "journalctl -xe" for details.
where "systemctl status elasticsearch.service" shows:
● elasticsearch.service - Elasticsearch
Loaded: loaded (/etc/systemd/system/elasticsearch.service; enabled; vendor preset: disabled)
Active: failed (Result: exit-code) since Mon 2020-08-17 16:23:16 MSK; 5min ago
Docs: https://www.elastic.co
Process: 12951 ExecStart=/usr/share/elasticsearch/bin/systemd-entrypoint -p ${PID_DIR}/elasticsearch.pid --quiet (code=exited, status=1/FAILURE)
Main PID: 12951 (code=exited, status=1/FAILURE)
Aug 17 16:23:16 bvvcomp systemd-entrypoint[12951]: at org.elasticsearch.cli.EnvironmentAwareCommand.execute(EnvironmentAwareCommand.java:86)
Aug 17 16:23:16 bvvcomp systemd-entrypoint[12951]: at org.elasticsearch.cli.Command.mainWithoutErrorHandling(Command.java:127)
Aug 17 16:23:16 bvvcomp systemd-entrypoint[12951]: at org.elasticsearch.cli.Command.main(Command.java:90)
Aug 17 16:23:16 bvvcomp systemd-entrypoint[12951]: at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:126)
Aug 17 16:23:16 bvvcomp systemd-entrypoint[12951]: at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:92)
Aug 17 16:23:16 bvvcomp systemd-entrypoint[12951]: For complete error details, refer to the log at /var/log/elasticsearch/elasticsearch.log
Aug 17 16:23:16 bvvcomp systemd[1]: elasticsearch.service: main process exited, code=exited, status=1/FAILURE
Aug 17 16:23:16 bvvcomp systemd[1]: Failed to start Elasticsearch.
Aug 17 16:23:16 bvvcomp systemd[1]: Unit elasticsearch.service entered failed state.
Aug 17 16:23:16 bvvcomp systemd[1]: elasticsearch.service failed.
And in journalctl -xe:
Aug 17 16:29:20 bvvcomp NetworkManager[1112]: <info> [1597670960.1568] dhcp4 (wlp2s0): gateway 192.168.1.1
Aug 17 16:29:20 bvvcomp NetworkManager[1112]: <info> [1597670960.1569] dhcp4 (wlp2s0): lease time 25200
Aug 17 16:29:20 bvvcomp NetworkManager[1112]: <info> [1597670960.1569] dhcp4 (wlp2s0): nameserver '192.168.1.1'
Aug 17 16:29:20 bvvcomp NetworkManager[1112]: <info> [1597670960.1569] dhcp4 (wlp2s0): state changed bound -> bound
Aug 17 16:29:20 bvvcomp dbus[904]: [system] Activating via systemd: service name='org.freedesktop.nm_dispatcher' unit='dbus-org.freedesktop.nm-dispatcher.service'
Aug 17 16:29:20 bvvcomp dhclient[1325]: bound to 192.168.1.141 -- renewal in 12352 seconds.
Aug 17 16:29:20 bvvcomp systemd[1]: Starting Network Manager Script Dispatcher Service...
-- Subject: Unit NetworkManager-dispatcher.service has begun start-up
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit NetworkManager-dispatcher.service has begun starting up.
Aug 17 16:29:20 bvvcomp dbus[904]: [system] Successfully activated service 'org.freedesktop.nm_dispatcher'
Aug 17 16:29:20 bvvcomp systemd[1]: Started Network Manager Script Dispatcher Service.
-- Subject: Unit NetworkManager-dispatcher.service has finished start-up
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit NetworkManager-dispatcher.service has finished starting up.
--
-- The start-up result is done.
Aug 17 16:29:20 bvvcomp nm-dispatcher[13569]: req:1 'dhcp4-change' [wlp2s0]: new request (4 scripts)
Aug 17 16:29:20 bvvcomp nm-dispatcher[13569]: req:1 'dhcp4-change' [wlp2s0]: start running ordered scripts...
Unfortunately, this advice did not help:
How to move elasticsearch data directory? ;
elasticsearch changing path.logs and/or path.data - fails to start ;
Elasticsearch after change path.data, unable to access 'default.path.data' ;
Is that perhaps a new issue, tied to version 7.x?
Thank you
Update 1 - error log (/var/log/elasticsearch/elasticsearch.log):
[2020-08-18T01:30:00,000][INFO ][o.e.x.m.MlDailyMaintenanceService] [bvvcomp] triggering scheduled [ML] maintenance tasks
[2020-08-18T01:30:00,014][INFO ][o.e.x.m.a.TransportDeleteExpiredDataAction] [bvvcomp] Deleting expired data
[2020-08-18T01:30:00,052][INFO ][o.e.x.m.a.TransportDeleteExpiredDataAction] [bvvcomp] Completed deletion of expired ML data
[2020-08-18T01:30:00,053][INFO ][o.e.x.m.MlDailyMaintenanceService] [bvvcomp] Successfully completed [ML] maintenance tasks
[2020-08-18T04:30:00,017][INFO ][o.e.x.s.SnapshotRetentionTask] [bvvcomp] starting SLM retention snapshot cleanup task
[2020-08-18T04:30:00,025][INFO ][o.e.x.s.SnapshotRetentionTask] [bvvcomp] there are no repositories to fetch, SLM retention snapshot cleanup task complete
[2020-08-18T05:27:08,457][INFO ][o.e.n.Node ] [bvvcomp] stopping ...
[2020-08-18T05:27:08,482][INFO ][o.e.x.w.WatcherService ] [bvvcomp] stopping watch service, reason [shutdown initiated]
[2020-08-18T05:27:08,483][INFO ][o.e.x.w.WatcherLifeCycleService] [bvvcomp] watcher has stopped and shutdown
[2020-08-18T05:27:08,495][INFO ][o.e.x.m.p.l.CppLogMessageHandler] [bvvcomp] [controller/21903] [Main.cc#155] ML controller exiting
[2020-08-18T05:27:08,497][INFO ][o.e.x.m.p.NativeController] [bvvcomp] Native controller process has stopped - no new native processes can be started
[2020-08-18T05:27:08,540][INFO ][o.e.n.Node ] [bvvcomp] stopped
[2020-08-18T05:27:08,541][INFO ][o.e.n.Node ] [bvvcomp] closing ...
[2020-08-18T05:27:08,585][INFO ][o.e.n.Node ] [bvvcomp] closed
[2020-08-18T05:27:19,077][ERROR][o.e.b.Bootstrap ] [bvvcomp] Exception
java.lang.IllegalStateException: Unable to access 'path.data' (/run/media/admin/bvv2/elasticsearch)
at org.elasticsearch.bootstrap.FilePermissionUtils.addDirectoryPath(FilePermissionUtils.java:70) ~[elasticsearch-7.8.1.jar:7.8.1]
at org.elasticsearch.bootstrap.Security.addFilePermissions(Security.java:297) ~[elasticsearch-7.8.1.jar:7.8.1]
at org.elasticsearch.bootstrap.Security.createPermissions(Security.java:252) ~[elasticsearch-7.8.1.jar:7.8.1]
at org.elasticsearch.bootstrap.Security.configure(Security.java:121) ~[elasticsearch-7.8.1.jar:7.8.1]
at org.elasticsearch.bootstrap.Bootstrap.setup(Bootstrap.java:222) ~[elasticsearch-7.8.1.jar:7.8.1]
at org.elasticsearch.bootstrap.Bootstrap.init(Bootstrap.java:393) [elasticsearch-7.8.1.jar:7.8.1]
at org.elasticsearch.bootstrap.Elasticsearch.init(Elasticsearch.java:170) [elasticsearch-7.8.1.jar:7.8.1]
at org.elasticsearch.bootstrap.Elasticsearch.execute(Elasticsearch.java:161) [elasticsearch-7.8.1.jar:7.8.1]
at org.elasticsearch.cli.EnvironmentAwareCommand.execute(EnvironmentAwareCommand.java:86) [elasticsearch-7.8.1.jar:7.8.1]
at org.elasticsearch.cli.Command.mainWithoutErrorHandling(Command.java:127) [elasticsearch-cli-7.8.1.jar:7.8.1]
at org.elasticsearch.cli.Command.main(Command.java:90) [elasticsearch-cli-7.8.1.jar:7.8.1]
at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:126) [elasticsearch-7.8.1.jar:7.8.1]
at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:92) [elasticsearch-7.8.1.jar:7.8.1]
Caused by: java.nio.file.AccessDeniedException: /run/media/admin/bvv2
at sun.nio.fs.UnixException.translateToIOException(UnixException.java:90) ~[?:?]
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:111) ~[?:?]
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:116) ~[?:?]
at sun.nio.fs.UnixFileSystemProvider.checkAccess(UnixFileSystemProvider.java:313) ~[?:?]
at java.nio.file.Files.createDirectories(Files.java:766) ~[?:?]
at org.elasticsearch.bootstrap.Security.ensureDirectoryExists(Security.java:389) ~[elasticsearch-7.8.1.jar:7.8.1]
at org.elasticsearch.bootstrap.FilePermissionUtils.addDirectoryPath(FilePermissionUtils.java:68) ~[elasticsearch-7.8.1.jar:7.8.1]
... 12 more
Permissions:
ls -l /run/media/admin/bvv2
drwxrwsrwx 3 elasticsearch elasticsearch 4096 Aug 17 17:26 elasticsearch
ls -l /run/media/admin
total 4
drwxr-xr-x 11 admin admin 4096 Aug 17 13:22 bvv2
I encountered a similar error, and it was caused by incorrect parent directory permissions.
One of the parent directories didn't allow other Unix users to access it; more specifically, the directory's permissions were drwx--x---+. Elasticsearch started after changing the permissions to drwx--x--x+ (chmod 711). You can try the same.
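To verify that on the paths from the question (a sketch; namei ships with util-linux, and whether a plain chmod is acceptable depends on your security requirements), list the permissions of every path component and add the search bit where a parent blocks traversal:

# Show owner and permissions for every directory leading to the data path
namei -l /run/media/admin/bvv2/elasticsearch

# If a parent (here the udisks mount point /run/media/admin) has no execute/search
# bit for the elasticsearch user, allow traversal to the directories below it
sudo chmod o+x /run/media/admin

After fixing the permissions, systemctl start elasticsearch should get past the AccessDeniedException.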
I installed Tomcat 9.0.14 on my systems (Windows 10, Windows Server 2016 R2).
I have no issue starting the Tomcat service (it starts in 2-3 seconds).
However, it takes 1 minute to stop.
I thought one of my projects residing under webapps was taking the time, so I removed all my projects, but the result is the same.
After that I emptied the webapps folder completely to check further; Tomcat still took 1 minute to stop.
I checked the log files and there are no errors. Tomcat is idle for 1 minute while stopping.
commons-daemon.log-------
[2019-01-08 16:30:02] [info] [13948] Stopping service...
[2019-01-08 16:30:03] [info] [13948] Service stop thread completed.
[2019-01-08 16:31:03] [info] [ 1940] Run service finished.
[2019-01-08 16:31:03] [info] [ 1940] Commons Daemon procrun finished
catalina.log--------
08-Jan-2019 16:30:02.399 INFO [Thread-6] org.apache.coyote.AbstractProtocol.pause Pausing ProtocolHandler ["http-nio-8080"]
08-Jan-2019 16:30:02.431 INFO [Thread-6] org.apache.coyote.AbstractProtocol.pause Pausing ProtocolHandler ["ajp-nio-8009"]
08-Jan-2019 16:30:02.453 INFO [Thread-6] org.apache.catalina.core.StandardService.stopInternal Stopping service [Catalina]
08-Jan-2019 16:30:02.453 INFO [Thread-6] org.apache.coyote.AbstractProtocol.stop Stopping ProtocolHandler ["http-nio-8080"]
08-Jan-2019 16:30:02.453 INFO [Thread-6] org.apache.coyote.AbstractProtocol.stop Stopping ProtocolHandler ["ajp-nio-8009"]
Is there any way I can reduce the stopping time of Tomcat 9?
In Tomcat 8 the stopping time was 3-5 seconds.
Any help is appreciated.
I was able to reproduce this by:
Downloading and extracting the apache-tomcat-9.0.14-windows-x64.zip
cd to apache-tomcat/bin
service.bat install
Starting the service is quick; stopping it is delayed by exactly 60 seconds.
This seems to be a Tomcat issue, but the current development snapshot (trunk) changelog suggests it has already been fixed for the not-yet-released Tomcat 9.0.15+, without an explicit bug report assigned:
Tomcat 9.0.15 (markt) in development / Catalina:
Correct a bug exposed in 9.0.14 and ensure that the Tomcat terminates in a timely manner when running as a service. (markt)
We had the same problem with Tomcat 9.0.26: Tomcat took exactly 60 seconds to finish once you terminated the server. We tried hard to close and shut down everything we had in our application, and in the end we realized we had an ExecutorService created with Executors.newCachedThreadPool(), and that cached pool has a keepAliveTime of 60 seconds.
So after terminating Tomcat, the thread pool waited 60 seconds to check whether its threads were still needed for reuse; only after that time did it really shut down. The solution was to shut down the cached thread pool when the application shuts down.
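A minimal sketch of that shutdown, assuming the pool is owned by the web application (the class and field names here are hypothetical, not taken from our code):

// Shuts the cached pool down when the webapp stops, so idle threads don't keep
// Tomcat waiting for their 60-second keepAliveTime.
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import javax.servlet.ServletContextEvent;
import javax.servlet.ServletContextListener;
import javax.servlet.annotation.WebListener;

@WebListener
public class ExecutorShutdownListener implements ServletContextListener {

    // Hypothetical application-wide pool; in a real app this lives wherever the pool is created.
    public static final ExecutorService POOL = Executors.newCachedThreadPool();

    @Override
    public void contextInitialized(ServletContextEvent sce) {
        // nothing to do on startup
    }

    @Override
    public void contextDestroyed(ServletContextEvent sce) {
        POOL.shutdown();                        // stop accepting new tasks
        try {
            if (!POOL.awaitTermination(5, TimeUnit.SECONDS)) {
                POOL.shutdownNow();             // interrupt anything still running
            }
        } catch (InterruptedException e) {
            POOL.shutdownNow();
            Thread.currentThread().interrupt();
        }
    }
}

With such a listener in place, contextDestroyed runs as soon as Tomcat stops the application, so the idle cached threads no longer hold the service open for their keep-alive period.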
I have successfully upgraded SonarQube to version 6.5, including the database upgrade, and I am currently trying to upgrade SonarQube to version 6.7.1 LTS. The new SonarQube version is being installed on a 64-bit Linux system and is connected to a Microsoft SQL Server 2014 database. Every time I try to launch the 6.7.1 version of SonarQube, it fails with the error "Background initialization failed". If I run the new SonarQube against an empty Microsoft SQL database, it starts up fine with no issues. The "Background initialization failed" issue only occurs when I connect the new SonarQube to the upgraded database. I have tried adding memory to the Elasticsearch heap and reducing the number of issues being processed. Any help to resolve this issue would be greatly appreciated.
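(For reference, a minimal sketch of where the Elasticsearch heap is typically raised in SonarQube 6.7.x; the values below are illustrative assumptions, not the ones used in this installation.)

# conf/sonar.properties
# JVM options for the embedded Elasticsearch ("search") process
sonar.search.javaOpts=-Xms2G -Xmx2G -XX:+HeapDumpOnOutOfMemoryError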
Web log:
web[][o.s.p.ProcessEntryPoint] Starting web
web[][o.a.t.u.n.NioSelectorPool] Using a shared selector for servlet write/read
web[][o.e.p.PluginsService] no modules loaded
web[][o.e.p.PluginsService] loaded plugin [org.elasticsearch.index.reindex.ReindexPlugin]
web[][o.e.p.PluginsService] loaded plugin [org.elasticsearch.join.ParentJoinPlugin]
web[][o.e.p.PluginsService] loaded plugin [org.elasticsearch.percolator.PercolatorPlugin]
web[][o.e.p.PluginsService] loaded plugin [org.elasticsearch.transport.Netty4Plugin]
web[][i.n.c.MultithreadEventLoopGroup] -Dio.netty.eventLoopThreads: 64
web[][i.n.u.i.PlatformDependent0] -Dio.netty.noUnsafe: false
web[][i.n.u.i.PlatformDependent0] Java version: 8
web[][i.n.u.i.PlatformDependent0] sun.misc.Unsafe.theUnsafe: available
web[][i.n.u.i.PlatformDependent0] sun.misc.Unsafe.copyMemory: available
web[][i.n.u.i.PlatformDependent0] java.nio.Buffer.address: available
web[][i.n.u.i.PlatformDependent0] direct buffer constructor: available
web[][i.n.u.i.PlatformDependent0] java.nio.Bits.unaligned: available, true
web[][i.n.u.i.PlatformDependent0] jdk.internal.misc.Unsafe.allocateUninitializedArray(int): unavailable prior to Java9
web[][i.n.u.i.PlatformDependent0] java.nio.DirectByteBuffer.<init>(long, int): available
web[][i.n.u.i.PlatformDependent] sun.misc.Unsafe: available
web[][i.n.u.i.PlatformDependent] -Dio.netty.tmpdir: /../../sonarqube-6.7.1/temp (java.io.tmpdir)
web[][i.n.u.i.PlatformDependent] -Dio.netty.bitMode: 64 (sun.arch.data.model)
web[][i.n.u.i.PlatformDependent] -Dio.netty.noPreferDirect: false
web[][i.n.u.i.PlatformDependent] -Dio.netty.maxDirectMemory: 4772593664 bytes
web[][i.n.u.i.PlatformDependent] -Dio.netty.uninitializedArrayAllocationThreshold: -1
web[][i.n.u.i.CleanerJava6] java.nio.ByteBuffer.cleaner(): available
web[][i.n.c.n.NioEventLoop] -Dio.netty.noKeySetOptimization: false
web[][i.n.c.n.NioEventLoop] -Dio.netty.selectorAutoRebuildThreshold: 512
web[][i.n.u.i.PlatformDependent] org.jctools-core.MpscChunkedArrayQueue: available
web[][i.n.c.DefaultChannelId] -Dio.netty.processId: ***** (auto-detected)
web[][i.netty.util.NetUtil] -Djava.net.preferIPv4Stack: true
web[][i.netty.util.NetUtil] -Djava.net.preferIPv6Addresses: false
web[][i.netty.util.NetUtil] Loopback interface: lo (lo, 127.0.0.1)
web[][i.netty.util.NetUtil] /proc/sys/net/core/somaxconn: 128
web[][i.n.c.DefaultChannelId] -Dio.netty.machineId: ***** (auto-detected)
web[][i.n.u.ResourceLeakDetector] -Dio.netty.leakDetection.level: simple
web[][i.n.u.ResourceLeakDetector] -Dio.netty.leakDetection.maxRecords: 4
web[][i.n.b.PooledByteBufAllocator] -Dio.netty.allocator.numHeapArenas: 47
web[][i.n.b.PooledByteBufAllocator] -Dio.netty.allocator.numDirectArenas: 47
web[][i.n.b.PooledByteBufAllocator] -Dio.netty.allocator.pageSize: 8192
web[][i.n.b.PooledByteBufAllocator] -Dio.netty.allocator.maxOrder: 11
web[][i.n.b.PooledByteBufAllocator] -Dio.netty.allocator.chunkSize: 16777216
web[][i.n.b.PooledByteBufAllocator] -Dio.netty.allocator.tinyCacheSize: 512
web[][i.n.b.PooledByteBufAllocator] -Dio.netty.allocator.smallCacheSize: 256
web[][i.n.b.PooledByteBufAllocator] -Dio.netty.allocator.normalCacheSize: 64
web[][i.n.b.PooledByteBufAllocator] -Dio.netty.allocator.maxCachedBufferCapacity: 32768
web[][i.n.b.PooledByteBufAllocator] -Dio.netty.allocator.cacheTrimInterval: 8192
web[][i.n.b.PooledByteBufAllocator] -Dio.netty.allocator.useCacheForAllThreads: true
web[][i.n.b.ByteBufUtil] -Dio.netty.allocator.type: pooled
web[][i.n.b.ByteBufUtil] -Dio.netty.threadLocalDirectBufferSize: 65536
web[][i.n.b.ByteBufUtil] -Dio.netty.maxThreadLocalCharBufferSize: 16384
web[][i.n.b.AbstractByteBuf] -Dio.netty.buffer.bytebuf.checkAccessible: true
web[][i.n.u.ResourceLeakDetectorFactory] Loaded default ResourceLeakDetector: io.netty.util.ResourceLeakDetector#6c6be5c2
web[][i.n.util.Recycler] -Dio.netty.recycler.maxCapacityPerThread: 32768
web[][i.n.util.Recycler] -Dio.netty.recycler.maxSharedCapacityFactor: 2
web[][i.n.util.Recycler] -Dio.netty.recycler.linkCapacity: 16
web[][i.n.util.Recycler] -Dio.netty.recycler.ratio: 8
web[][o.s.s.e.EsClientProvider] Connected to local Elasticsearch: [127.0.0.1:*****]
web[][o.s.s.p.LogServerVersion] SonarQube Server / 6.7.1.35068 / 426519346f51f7b980a76f9050f983110550509d
web[][o.sonar.db.Database] Create JDBC data source for jdbc:sqlserver:*****
web[][o.s.s.p.ServerFileSystemImpl] SonarQube home: /../../sonarqube-6.7.1
web[][o.s.s.u.SystemPasscodeImpl] System authentication by passcode is disabled
web[][o.s.c.i.DefaultI18n] Loaded 2094 properties from l10n bundles
web[][o.s.s.p.d.m.c.MssqlCharsetHandler] Verify that database collation is case-sensitive and accent-sensitive
web[][o.s.s.p.w.MasterServletFilter] Initializing servlet filter org.sonar.server.ws.WebServiceFilter#7e977d45 [pattern=UrlPattern{inclusions=[/api/system/migrate_db/*, ...], exclusions=[/api/properties*, ...]}]
web[][o.s.s.a.TomcatAccessLog] Tomcat is started
web[][o.s.s.a.EmbeddedTomcat] HTTP connector enabled on port ****
web[][o.s.s.p.UpdateCenterClient] Update center:https://update.sonarsource.org/update-center.properties (no proxy)
web[][o.s.a.r.Languages] No language available
web[][o.s.s.e.RecoveryIndexer] Elasticsearch recovery - sonar.search.recovery.minAgeInMs=300000
web[][o.s.s.e.RecoveryIndexer] Elasticsearch recovery - sonar.search.recovery.loopLimit=10000
web[][o.s.s.s.LogServerId] Server ID: *****
web[][o.s.s.e.RecoveryIndexer] Elasticsearch recovery - sonar.search.recovery.delayInMs=300000
web[][o.s.s.e.RecoveryIndexer] Elasticsearch recovery - sonar.search.recovery.initialDelayInMs=26327
web[][o.s.s.t.TelemetryDaemon] Sharing of SonarQube statistics is enabled.
web[][o.s.s.n.NotificationDaemon] Notification service started (delay 60 sec.)
web[][o.s.s.s.GeneratePluginIndex] Generate scanner plugin index
web[][o.s.s.s.GeneratePluginIndex] Generate scanner plugin index (done) | time=1ms
web[][o.s.s.s.RegisterPlugins] Register plugins
web[][o.s.s.s.RegisterPlugins] Register plugins (done) | time=167ms
web[][o.s.s.s.RegisterMetrics] Register metrics
web[][o.s.s.s.RegisterMetrics] Register metrics (done) | time=2734ms
web[][o.s.s.r.RegisterRules] Register rules
web[][o.s.s.r.RegisterRules] Register rules (done) | time=685ms
web[][o.s.s.q.BuiltInQProfileRepositoryImpl] Load quality profiles
web[][o.s.s.q.BuiltInQProfileRepositoryImpl] Load quality profiles (done) | time=2ms
web[][o.s.s.s.RegisterPermissionTemplates] Register permission templates
web[][o.s.s.s.RegisterPermissionTemplates] Register permission templates (done) | time=153ms
web[][o.s.s.s.RenameDeprecatedPropertyKeys] Rename deprecated property keys
web[][o.s.s.p.w.MasterServletFilter] Initializing servlet filter org.sonar.server.ws.WebServiceFilter#3a6e54b [pattern=UrlPattern{inclusions=[/api/measures/component/*, ...], exclusions=[/api/properties*, ...]}]
web[][o.s.s.p.w.MasterServletFilter] Initializing servlet filter org.sonar.server.ws.DeprecatedPropertiesWsFilter#3b2c45f3 [pattern=UrlPattern{inclusions=[/api/properties/*], exclusions=[]}]
web[][o.s.s.p.w.MasterServletFilter] Initializing servlet filter org.sonar.server.ws.WebServiceReroutingFilter#42ffe60e [pattern=UrlPattern{inclusions=[/api/components/bulk_update_key, ...], exclusions=[]}]
web[][o.s.s.p.w.MasterServletFilter] Initializing servlet filter org.sonar.server.authentication.InitFilter#3bc1cd0f [pattern=UrlPattern{inclusions=[/sessions/init/*], exclusions=[]}]
web[][o.s.s.p.w.MasterServletFilter] Initializing servlet filter org.sonar.server.authentication.OAuth2CallbackFilter#533fe992 [pattern=UrlPattern{inclusions=[/oauth2/callback/*], exclusions=[]}]
web[][o.s.s.p.w.MasterServletFilter] Initializing servlet filter org.sonar.server.authentication.ws.LoginAction#54370dcd [pattern=UrlPattern{inclusions=[/api/authentication/login], exclusions=[]}]
web[][o.s.s.p.w.MasterServletFilter] Initializing servlet filter org.sonar.server.authentication.ws.LogoutAction#7bc801b4 [pattern=UrlPattern{inclusions=[/api/authentication/logout], exclusions=[]}]
web[][o.s.s.p.w.MasterServletFilter] Initializing servlet filter org.sonar.server.authentication.ws.ValidateAction#2e0576fc [pattern=UrlPattern{inclusions=[/api/authentication/validate], exclusions=[]}]
web[][o.s.s.e.IndexerStartupTask] Indexing of type [issues/issue] ...
web[][o.s.s.es.BulkIndexer] 1387134 requests processed (23118 items/sec)
web[][o.s.s.es.BulkIndexer] 2715226 requests processed (22134 items/sec)
web[][o.s.s.es.BulkIndexer] 3944404 requests processed (20486 items/sec)
web[][o.s.s.es.BulkIndexer] 5319447 requests processed (22917 items/sec)
web[][o.s.s.es.BulkIndexer] 6871423 requests processed (25866 items/sec)
web[][o.s.s.es.BulkIndexer] 7814247 requests processed (15713 items/sec)
web[][o.s.s.es.BulkIndexer] 7814247 requests processed (0 items/sec)
web[][o.s.s.es.BulkIndexer] 7814247 requests processed (0 items/sec)
web[][o.s.s.p.Platform] Background initialization failed. Stopping SonarQube
java.lang.IllegalStateException: Unrecoverable indexation failures
at org.sonar.server.es.IndexingListener$1.onFinish(IndexingListener.java:39)
at org.sonar.server.es.BulkIndexer.stop(BulkIndexer.java:117)
at org.sonar.server.issue.index.IssueIndexer.doIndex(IssueIndexer.java:247)
at org.sonar.server.issue.index.IssueIndexer.indexOnStartup(IssueIndexer.java:95)
at org.sonar.server.es.IndexerStartupTask.indexUninitializedTypes(IndexerStartupTask.java:68)
at java.util.Spliterators$ArraySpliterator.forEachRemaining(Spliterators.java:948)
at java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:580)
at org.sonar.server.es.IndexerStartupTask.execute(IndexerStartupTask.java:55)
at java.util.Optional.ifPresent(Optional.java:159)
at org.sonar.server.platform.platformlevel.PlatformLevelStartup$1.doPrivileged(PlatformLevelStartup.java:84)
at org.sonar.server.user.DoPrivileged.execute(DoPrivileged.java:45)
at org.sonar.server.platform.platformlevel.PlatformLevelStartup.start(PlatformLevelStartup.java:80)
at org.sonar.server.platform.Platform.executeStartupTasks(Platform.java:196)
at org.sonar.server.platform.Platform.access$400(Platform.java:46)
at org.sonar.server.platform.Platform$1.lambda$doRun$1(Platform.java:121)
at org.sonar.server.platform.Platform$AutoStarterRunnable.runIfNotAborted(Platform.java:371)
at org.sonar.server.platform.Platform$1.doRun(Platform.java:121)
at org.sonar.server.platform.Platform$AutoStarterRunnable.run(Platform.java:355)
at java.lang.Thread.run(Thread.java:748)
web[][o.s.s.p.Platform] Background initialization of SonarQube done
web[][o.s.p.StopWatcher] Stopping process
===========================================================================
Edit: I had already looked at the link provided before my initial post. That post referenced "free space", which I assumed to mean disk space; here are the disk space values where SonarQube 6.7.1 is installed:
1K-blocks       Used  Available  Use%  Mounted on
251531268   16204576  235326692    7%  /prod/appl
Also, here is a portion of my Elasticsearch log from around the time the error occurs in web.log. SonarQube 6.7.1 uses Elasticsearch 5.
Elasticsearch log:
es[][o.e.i.IndexingMemoryController] write indexing buffer to disk for shard [[issues][0]] to free up its [29.8mb] indexing buffer
es[][o.e.i.s.IndexShard] add [29.8mb] writing bytes for shard [[issues][0]]
es[][o.e.i.e.Engine] use refresh to write indexing buffer (heap size=[23.5mb]), to also clear version map (heap size=[6.3mb])
es[][o.e.i.f.p.SortedSetDVOrdinalsIndexFieldData] global-ordinals [_parent#authorization][49] took [462.3micros]
es[][o.e.i.s.IndexShard] remove [29.8mb] writing bytes for shard [[issues][0]]
es[][o.e.i.IndexingMemoryController] now write some indexing buffers: total indexing heap bytes used [104.3mb] vs indices.memory.index_buffer_size [98.9mb], currently writing bytes [0b], [5] shards with non-zero indexing buffer
es[][o.e.i.IndexingMemoryController] write indexing buffer to disk for shard [[issues][1]] to free up its [54.8mb] indexing buffer
es[][o.e.i.s.IndexShard] add [54.8mb] writing bytes for shard [[issues][1]]
es[][o.e.i.e.Engine] use IndexWriter.flush to write indexing buffer (heap size=[51.1mb]) since version map is small (heap size=[3.6mb])
es[][o.e.i.s.IndexShard] remove [54.8mb] writing bytes for shard [[issues][1]]
es[][o.e.i.IndexingMemoryController] now write some indexing buffers: total indexing heap bytes used [104.2mb] vs indices.memory.index_buffer_size [98.9mb], currently writing bytes [0b], [5] shards with non-zero indexing buffer
es[][o.e.i.IndexingMemoryController] write indexing buffer to disk for shard [[issues][1]] to free up its [50.7mb] indexing buffer
es[][o.e.i.s.IndexShard] add [50.7mb] writing bytes for shard [[issues][1]]
es[][o.e.i.e.Engine] use IndexWriter.flush to write indexing buffer (heap size=[43.9mb]) since version map is small (heap size=[6.7mb])
es[][o.e.i.s.IndexShard] remove [50.7mb] writing bytes for shard [[issues][1]]
es[][o.e.i.IndexingMemoryController] now write some indexing buffers: total indexing heap bytes used [100.1mb] vs indices.memory.index_buffer_size [98.9mb], currently writing bytes [0b], [5] shards with non-zero indexing buffer
es[][o.e.i.IndexingMemoryController] write indexing buffer to disk for shard [[issues][1]] to free up its [31.5mb] indexing buffer
es[][o.e.i.s.IndexShard] add [31.5mb] writing bytes for shard [[issues][1]]
es[][o.e.i.e.Engine] use refresh to write indexing buffer (heap size=[23.3mb]), to also clear version map (heap size=[8.2mb])
es[][o.e.i.f.p.SortedSetDVOrdinalsIndexFieldData] global-ordinals [_parent#authorization][46] took [988.8micros]
es[][o.e.i.s.IndexShard] remove [31.5mb] writing bytes for shard [[issues][1]]
es[][o.e.i.f.p.SortedSetDVOrdinalsIndexFieldData] global-ordinals [_parent#authorization][46] took [880.6micros]
es[][o.e.i.f.p.SortedSetDVOrdinalsIndexFieldData] global-ordinals [_parent#authorization][57] took [510.7micros]
es[][o.e.i.f.p.SortedSetDVOrdinalsIndexFieldData] global-ordinals [_parent#authorization][49] took [829.3micros]
es[][o.e.i.f.p.SortedSetDVOrdinalsIndexFieldData] global-ordinals [_parent#authorization][47] took [412.9micros]
es[][o.e.i.f.p.SortedSetDVOrdinalsIndexFieldData] global-ordinals [_parent#authorization][43] took [277.4micros]
es[][o.e.i.e.InternalEngine$EngineMergeScheduler] merge segment [_kh] done: took [30.9s], [343.7 MB], [3,159,200 docs], [0s stopped], [1.5s throttled], [169.4 MB written], [Infinity MB/sec throttle]
es[][o.e.i.e.InternalEngine$EngineMergeScheduler] merge segment [_oc] done: took [28.9s], [290.9 MB], [2,593,116 docs], [0s stopped], [0s throttled], [232.1 MB written], [Infinity MB/sec throttle]
es[][o.e.i.e.InternalEngine$EngineMergeScheduler] merge segment [_pz] done: took [30.6s], [341.3 MB], [2,573,716 docs], [0s stopped], [0s throttled], [266.1 MB written], [Infinity MB/sec throttle]
es[][o.e.i.e.InternalEngine$EngineMergeScheduler] merge segment [_th] done: took [35.2s], [346.3 MB], [3,102,397 docs], [0s stopped], [0s throttled], [262.0 MB written], [Infinity MB/sec throttle]
es[][o.e.c.s.ClusterService] processing [update-settings]: execute
es[][o.e.i.IndicesQueryCache] using [node] query cache with size [98.9mb] max filter count [10000]
es[][o.e.i.IndicesService] creating Index [[issues/WmTjz_-ITtyPeqpDlqPeFg]], shards [5]/[0] - reason [metadata verification]
es[][o.e.i.s.IndexStore] using index.store.throttle.type [NONE], with index.store.throttle.max_bytes_per_sec [null]
es[][o.e.i.m.MapperService] using dynamic[false]
es[][o.e.i.c.b.BitsetFilterCache] clearing all bitsets because [close]
es[][o.e.i.c.q.IndexQueryCache] full cache clear, reason [close]
es[][o.e.i.c.b.BitsetFilterCache] clearing all bitsets because [close]
es[][o.e.c.s.ClusterService] cluster state updated, version [17], source [update-settings]
es[][o.e.c.s.ClusterService] publishing cluster state version [17]
es[][o.e.c.s.ClusterService] applying cluster state version 17
es[][o.e.c.s.ClusterService] set local cluster state to version 17
es[][o.e.c.s.ClusterService] processing [update-settings]: took [19ms] done applying updated cluster_state (version: 17, uuid: dkhQacKBQGS5YsyMqp1kmQ)
es[][o.e.n.Node] stopping ...
I installed Spark on three nodes successfully. I can visit the Spark web UI and see that every worker node and the master node are active.
I can run the SparkPi example successfully.
My cluster info:
10.45.10.33(master&worker,hadoop-master,hadoop-slave)
10.45.10.34(worker,hadoop-slave)
10.45.10.35(worker,hadoop-slave)
But when I try to run "spark-shell --master yarn", it gives this exception:
16/09/12 19:50:29 ERROR SparkContext: Error initializing SparkContext.
org.apache.spark.SparkException: Yarn application has already ended! It might have been killed or unable to launch application master.
at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.waitForApplication(YarnClientSchedulerBackend.scala:85)
at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:62)
at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:149)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:500)
at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2256)
at org.apache.spark.sql.SparkSession$Builder$$anonfun$8.apply(SparkSession.scala:831)
at org.apache.spark.sql.SparkSession$Builder$$anonfun$8.apply(SparkSession.scala:823)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:823)
at org.apache.spark.repl.Main$.createSparkSession(Main.scala:101)
at $line3.$read$$iw$$iw.<init>(<console>:15)
at $line3.$read$$iw.<init>(<console>:31)
at $line3.$read.<init>(<console>:33)
at $line3.$read$.<init>(<console>:37)
at $line3.$read$.<clinit>(<console>)
at $line3.$eval$.$print$lzycompute(<console>:7)
at $line3.$eval$.$print(<console>:6)
at $line3.$eval.$print(<console>)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at scala.tools.nsc.interpreter.IMain$ReadEvalPrint.call(IMain.scala:786)
at scala.tools.nsc.interpreter.IMain$Request.loadAndRun(IMain.scala:1047)
at scala.tools.nsc.interpreter.IMain$WrappedRequest$$anonfun$loadAndRunReq$1.apply(IMain.scala:638)
at scala.tools.nsc.interpreter.IMain$WrappedRequest$$anonfun$loadAndRunReq$1.apply(IMain.scala:637)
at scala.reflect.internal.util.ScalaClassLoader$class.asContext(ScalaClassLoader.scala:31)
at scala.reflect.internal.util.AbstractFileClassLoader.asContext(AbstractFileClassLoader.scala:19)
at scala.tools.nsc.interpreter.IMain$WrappedRequest.loadAndRunReq(IMain.scala:637)
at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:569)
at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:565)
at scala.tools.nsc.interpreter.ILoop.interpretStartingWith(ILoop.scala:807)
at scala.tools.nsc.interpreter.ILoop.command(ILoop.scala:681)
at scala.tools.nsc.interpreter.ILoop.processLine(ILoop.scala:395)
at org.apache.spark.repl.SparkILoop$$anonfun$initializeSpark$1.apply$mcV$sp(SparkILoop.scala:38)
at org.apache.spark.repl.SparkILoop$$anonfun$initializeSpark$1.apply(SparkILoop.scala:37)
at org.apache.spark.repl.SparkILoop$$anonfun$initializeSpark$1.apply(SparkILoop.scala:37)
at scala.tools.nsc.interpreter.IMain.beQuietDuring(IMain.scala:214)
at org.apache.spark.repl.SparkILoop.initializeSpark(SparkILoop.scala:37)
at org.apache.spark.repl.SparkILoop.loadFiles(SparkILoop.scala:94)
at scala.tools.nsc.interpreter.ILoop$$anonfun$process$1.apply$mcZ$sp(ILoop.scala:920)
at scala.tools.nsc.interpreter.ILoop$$anonfun$process$1.apply(ILoop.scala:909)
at scala.tools.nsc.interpreter.ILoop$$anonfun$process$1.apply(ILoop.scala:909)
at scala.reflect.internal.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:97)
at scala.tools.nsc.interpreter.ILoop.process(ILoop.scala:909)
at org.apache.spark.repl.Main$.doMain(Main.scala:68)
at org.apache.spark.repl.Main$.main(Main.scala:51)
at org.apache.spark.repl.Main.main(Main.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:729)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:185)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:210)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:124)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
16/09/12 19:50:29 WARN YarnSchedulerBackend$YarnSchedulerEndpoint: Attempted to request executors before the AM has registered!
16/09/12 19:50:29 WARN MetricsSystem: Stopping a MetricsSystem that is not running
org.apache.spark.SparkException: Yarn application has already ended! It might have been killed or unable to launch application master.
at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.waitForApplication(YarnClientSchedulerBackend.scala:85)
at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:62)
at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:149)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:500)
at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2256)
at org.apache.spark.sql.SparkSession$Builder$$anonfun$8.apply(SparkSession.scala:831)
at org.apache.spark.sql.SparkSession$Builder$$anonfun$8.apply(SparkSession.scala:823)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:823)
at org.apache.spark.repl.Main$.createSparkSession(Main.scala:101)
... 47 elided
<console>:14: error: not found: value spark
import spark.implicits._
^
<console>:14: error: not found: value spark
import spark.sql
^
Welcome to
____ __
/ __/__ ___ _____/ /__
_\ \/ _ \/ _ `/ __/ '_/
/___/ .__/\_,_/_/ /_/\_\ version 2.0.0
/_/
Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_77)
Type in expressions to have them evaluated.
Type :help for more information.
scala>
Here is my configuration:
1. spark-env.sh
export JAVA_HOME=/root/Downloads/jdk1.8.0_77
export SPARK_HOME=/root/Downloads/spark-2.0.0-bin-without-hadoop
export HADOOP_HOME=/root/Downloads/hadoop-2.7.2
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export SPARK_DIST_CLASSPATH=$(/root/Downloads/hadoop-2.7.2/bin/hadoop classpath)
export YARN_CONF_DIR=$HADOOP_HOME/etc/hadoop
export SPARK_LIBARY_PATH=.:$JAVA_HOME/lib:$JAVA_HOME/jre/lib:$HADOOP_HOME/lib/native
SPARK_MASTER_HOST=10.45.10.33
SPARK_MASTER_WEBUI_PORT=28686
SPARK_LOCAL_DIRS=/root/Downloads/spark-2.0.0-bin-without-hadoop/sparkdata/local
SPARK_WORKER_DIR=/root/Downloads/spark-2.0.0-bin-without-hadoop/sparkdata/work
SPARK_LOG_DIR=/root/Downloads/spark-2.0.0-bin-without-hadoop/logs
2. spark-defaults.conf
spark.eventLog.enabled true
spark.eventLog.dir hdfs://10.45.10.33/spark-event-log
3. slaves
10.45.10.33
10.45.10.34
10.45.10.35
Here is some log info:
YARN job logs:
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/root/Downloads/hadoop-2.7.2/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/root/Downloads/hadoop-2.7.2/share/hadoop/common/lib/alluxio-core-client-1.2.0-jar-with-dependencies.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/root/Downloads/alluxio-master/core/client/target/alluxio-core-client-1.2.0-jar-with-dependencies.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
16/09/14 11:21:08 INFO SignalUtils: Registered signal handler for TERM
16/09/14 11:21:08 INFO SignalUtils: Registered signal handler for HUP
16/09/14 11:21:08 INFO SignalUtils: Registered signal handler for INT
16/09/14 11:21:14 INFO ApplicationMaster: Preparing Local resources
16/09/14 11:21:15 ERROR ApplicationMaster: RECEIVED SIGNAL TERM
YARN logs on the running node:
2016-09-14 01:26:41,321 WARN alluxio.logger.type: Worker Client last execution took 2271 ms. Longer than the interval 1000
2016-09-14 06:13:10,905 WARN alluxio.logger.type: Worker Client last execution took 1891 ms. Longer than the interval 1000
2016-09-14 08:41:36,122 WARN alluxio.logger.type: Worker Client last execution took 1625 ms. Longer than the interval 1000
2016-09-14 10:41:49,426 WARN alluxio.logger.type: Worker Client last execution took 2441 ms. Longer than the interval 1000
2016-09-14 11:18:44,355 INFO SecurityLogger.org.apache.hadoop.ipc.Server: Auth successful for appattempt_1473752235721_0009_000002 (auth:SIMPLE)
2016-09-14 11:18:45,319 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: Start request for container_1473752235721_0009_02_000001 by user root
2016-09-14 11:18:45,447 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: Creating a new application reference for app application_1473752235721_0009
2016-09-14 11:18:45,601 INFO org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=root IP=10.45.10.33 OPERATION=Start Container Request TARGET=ContainerManageImpl RESULT=SUCCESS APPID=application_1473752235721_0009 CONTAINERID=container_1473752235721_0009_02_000001
2016-09-14 11:18:45,811 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl: Application application_1473752235721_0009 transitioned from NEW to INITING
2016-09-14 11:18:45,815 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl: Adding container_1473752235721_0009_02_000001 to application application_1473752235721_0009
2016-09-14 11:18:45,865 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl: Application application_1473752235721_0009 transitioned from INITING to RUNNING
2016-09-14 11:18:46,060 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl: Container container_1473752235721_0009_02_000001 transitioned from NEW to LOCALIZING
2016-09-14 11:18:46,060 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices: Got event CONTAINER_INIT for appId application_1473752235721_0009
2016-09-14 11:18:46,211 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource: Resource hdfs://10.45.10.33:8020/user/root/.sparkStaging/application_1473752235721_0009/__spark_libs__8339309767420855025.zip transitioned from INIT to DOWNLOADING
2016-09-14 11:18:46,211 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource: Resource hdfs://10.45.10.33:8020/user/root/.sparkStaging/application_1473752235721_0009/__spark_conf__.zip transitioned from INIT to DOWNLOADING
2016-09-14 11:18:46,223 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Created localizer for container_1473752235721_0009_02_000001
2016-09-14 11:18:47,083 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Writing credentials to the nmPrivate file /tmp/hadoop-root/nm-local-dir/nmPrivate/container_1473752235721_0009_02_000001.tokens. Credentials list:
2016-09-14 11:18:47,658 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Initializing user root
2016-09-14 11:18:47,761 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Copying from /tmp/hadoop-root/nm-local-dir/nmPrivate/container_1473752235721_0009_02_000001.tokens to /tmp/hadoop-root/nm-local-dir/usercache/root/appcache/application_1473752235721_0009/container_1473752235721_0009_02_000001.tokens
2016-09-14 11:18:47,765 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Localizer CWD set to /tmp/hadoop-root/nm-local-dir/usercache/root/appcache/application_1473752235721_0009 = file:/tmp/hadoop-root/nm-local-dir/usercache/root/appcache/application_1473752235721_0009
2016-09-14 11:20:54,352 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource: Resource hdfs://10.45.10.33:8020/user/root/.sparkStaging/application_1473752235721_0009/__spark_libs__8339309767420855025.zip(->/tmp/hadoop-root/nm-local-dir/usercache/root/filecache/10/__spark_libs__8339309767420855025.zip) transitioned from DOWNLOADING to LOCALIZED
2016-09-14 11:20:55,049 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource: Resource hdfs://10.45.10.33:8020/user/root/.sparkStaging/application_1473752235721_0009/__spark_conf__.zip(->/tmp/hadoop-root/nm-local-dir/usercache/root/filecache/11/__spark_conf__.zip) transitioned from DOWNLOADING to LOCALIZED
2016-09-14 11:20:55,052 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl: Container container_1473752235721_0009_02_000001 transitioned from LOCALIZING to LOCALIZED
2016-09-14 11:20:57,298 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl: Container container_1473752235721_0009_02_000001 transitioned from LOCALIZED to RUNNING
2016-09-14 11:20:57,509 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: launchContainer: [bash, /tmp/hadoop-root/nm-local-dir/usercache/root/appcache/application_1473752235721_0009/container_1473752235721_0009_02_000001/default_container_executor.sh]
2016-09-14 11:20:58,338 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Starting resource-monitoring for container_1473752235721_0009_02_000001
2016-09-14 11:21:07,134 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Memory usage of ProcessTree 26593 for container-id container_1473752235721_0009_02_000001: 50.3 MB of 1 GB physical memory used; 2.2 GB of 2.1 GB virtual memory used
2016-09-14 11:21:15,218 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Memory usage of ProcessTree 26593 for container-id container_1473752235721_0009_02_000001: 90.9 MB of 1 GB physical memory used; 2.3 GB of 2.1 GB virtual memory used
2016-09-14 11:21:15,224 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Process tree for container: container_1473752235721_0009_02_000001 has processes older than 1 iteration running over the configured limit. Limit=2254857728, current usage = 2424918016
2016-09-14 11:21:15,412 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Container [pid=26593,containerID=container_1473752235721_0009_02_000001] is running beyond virtual memory limits. Current usage: 90.9 MB of 1 GB physical memory used; 2.3 GB of 2.1 GB virtual memory used. Killing container.
Dump of the process-tree for container_1473752235721_0009_02_000001 :
|- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
|- 26593 26591 26593 26593 (bash) 1 0 115838976 119 /bin/bash -c /usr/java/jdk1.8.0_91/bin/java -server -Xmx512m -Djava.io.tmpdir=/tmp/hadoop-root/nm-local-dir/usercache/root/appcache/application_1473752235721_0009/container_1473752235721_0009_02_000001/tmp -Dspark.yarn.app.container.log.dir=/root/Downloads/hadoop-2.7.2/logs/userlogs/application_1473752235721_0009/container_1473752235721_0009_02_000001 org.apache.spark.deploy.yarn.ExecutorLauncher --arg '10.45.10.33:54976' --properties-file /tmp/hadoop-root/nm-local-dir/usercache/root/appcache/application_1473752235721_0009/container_1473752235721_0009_02_000001/__spark_conf__/__spark_conf__.properties 1> /root/Downloads/hadoop-2.7.2/logs/userlogs/application_1473752235721_0009/container_1473752235721_0009_02_000001/stdout 2> /root/Downloads/hadoop-2.7.2/logs/userlogs/application_1473752235721_0009/container_1473752235721_0009_02_000001/stderr
|- 26597 26593 26593 26593 (java) 811 62 2309079040 23149 /usr/java/jdk1.8.0_91/bin/java -server -Xmx512m -Djava.io.tmpdir=/tmp/hadoop-root/nm-local-dir/usercache/root/appcache/application_1473752235721_0009/container_1473752235721_0009_02_000001/tmp -Dspark.yarn.app.container.log.dir=/root/Downloads/hadoop-2.7.2/logs/userlogs/application_1473752235721_0009/container_1473752235721_0009_02_000001 org.apache.spark.deploy.yarn.ExecutorLauncher --arg 10.45.10.33:54976 --properties-file /tmp/hadoop-root/nm-local-dir/usercache/root/appcache/application_1473752235721_0009/container_1473752235721_0009_02_000001/__spark_conf__/__spark_conf__.properties
2016-09-14 11:21:15,451 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Removed ProcessTree with root 26593
2016-09-14 11:21:15,469 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl: Container container_1473752235721_0009_02_000001 transitioned from RUNNING to KILLING
2016-09-14 11:21:15,471 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch: Cleaning up container container_1473752235721_0009_02_000001
2016-09-14 11:21:15,891 WARN org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Exit code from container container_1473752235721_0009_02_000001 is : 143
2016-09-14 11:21:19,717 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl: Container container_1473752235721_0009_02_000001 transitioned from KILLING to CONTAINER_CLEANEDUP_AFTER_KILL
2016-09-14 11:21:19,797 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Deleting absolute path : /tmp/hadoop-root/nm-local-dir/usercache/root/appcache/application_1473752235721_0009/container_1473752235721_0009_02_000001
2016-09-14 11:21:19,811 INFO org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=root OPERATION=Container Finished - Killed TARGET=ContainerImpl RESULT=SUCCESS APPID=application_1473752235721_0009 CONTAINERID=container_1473752235721_0009_02_000001
2016-09-14 11:21:19,813 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl: Container container_1473752235721_0009_02_000001 transitioned from CONTAINER_CLEANEDUP_AFTER_KILL to DONE
2016-09-14 11:21:19,813 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl: Removing container_1473752235721_0009_02_000001 from application application_1473752235721_0009
2016-09-14 11:21:19,813 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices: Got event CONTAINER_STOP for appId application_1473752235721_0009
2016-09-14 11:21:21,458 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Stopping resource-monitoring for container_1473752235721_0009_02_000001
2016-09-14 11:21:21,531 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Removed completed containers from NM context: [container_1473752235721_0009_02_000001]
2016-09-14 11:21:21,536 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl: Application application_1473752235721_0009 transitioned from RUNNING to APPLICATION_RESOURCES_CLEANINGUP
2016-09-14 11:21:21,572 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices: Got event APPLICATION_STOP for appId application_1473752235721_0009
2016-09-14 11:21:21,585 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl: Application application_1473752235721_0009 transitioned from APPLICATION_RESOURCES_CLEANINGUP to FINISHED
2016-09-14 11:21:21,589 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.loghandler.NonAggregatingLogHandler: Scheduling Log Deletion for application: application_1473752235721_0009, with delay of 10800 seconds
2016-09-14 11:21:21,592 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Deleting absolute path : /tmp/hadoop-root/nm-local-dir/usercache/root/appcache/application_1473752235721_0009
How do I solve this problem? Can anyone give some advice?
I was receiving this error: 'Attempted to request executors before the AM has registered!'
and landed on this page without an answer. If anyone has the same error: for me the solution was to open the Spark ports.
On Spark 3.1.2, running on Ubuntu 20.04, you have to specify some things in the cluster so the ports are not assigned randomly:
in spark-defaults.conf:
spark.driver.bindAddress 10.0.0.1
spark.driver.host 10.0.0.1
spark.shuffle.service.port 7337
spark.ui.port 4040
spark.blockManager.port 31111
spark.driver.blockManager.port 32222
spark.driver.port 33333
in spark-env.sh:
SPARK_LOCAL_IP=10.0.0.1
export HADOOP_CONF_DIR=/opt/hadoop/etc/hadoop
export YARN_CONF_DIR=/opt/hadoop/etc/hadoop
and in workers you put the addresses of the datanodes.
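If the nodes run a host firewall, the fixed ports above also have to be reachable between the machines. A minimal sketch with ufw on Ubuntu 20.04 (the 10.0.0.0/24 subnet is an assumption; substitute your cluster's network):

# Allow the Spark ports pinned in spark-defaults.conf from the cluster subnet
for port in 7337 4040 31111 32222 33333; do
  sudo ufw allow from 10.0.0.0/24 to any port "$port" proto tcp
done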
I'm currently evaluating CloudControl as a platform provider for my Java-based applications.
I created a very simple Spring Boot app with Gradle (https://github.com/mhmpl/gradle-example-app), but I'm unable to deploy it.
There are no errors in the error log that could give me more information. However, this is the output of the deploy log:
8/3/14 12:53 PM lxc-1272 INFO Container did not come up within 120 seconds.
8/3/14 12:53 PM lxc-1250 INFO Waiting for the container to be reachable...
8/3/14 12:53 PM lxc-1272 INFO Waiting for the container to be reachable...
8/3/14 12:52 PM lxc-1250 INFO Waiting for the container to be reachable...
8/3/14 12:52 PM lxc-1272 INFO Waiting for the container to be reachable...
8/3/14 12:52 PM lxc-1250 INFO Waiting for the container to be reachable...
8/3/14 12:52 PM lxc-1272 INFO Waiting for the container to be reachable...
8/3/14 12:51 PM lxc-1250 INFO Deploying ...
In the end the app is not deployed, and I cannot see what mistake I might have made. I already tried setting the memory to 1024 MB and adding a second container, but that did not change anything at all.
You need to bind the webserver to the correct port, which is defined in the PORT environment variable.
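For a Spring Boot app like the one linked, one way to do that (a minimal sketch; the property file and the 8080 fallback are assumptions, not taken from the repository) is to let server.port resolve the PORT environment variable:

# src/main/resources/application.properties
# Bind the embedded web server to the platform-assigned port, falling back to 8080 for local runs
server.port=${PORT:8080}

With this in place the platform can reach the app on the port it expects, and the container should come up within the 120-second limit shown in the deploy log.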