Running Zeppelin on Spark cluster mode - Hadoop

I am using this tutorial, spark cluster on yarn mode in docker container, to launch Zeppelin against a Spark cluster in YARN mode. However, I am stuck at step 4: I can't find conf/zeppelin-env.sh in my Docker container to add further configuration. I tried putting the settings in Zeppelin's conf folder, but with no success so far. On top of that, the Zeppelin notebook is also not reachable on localhost:9001.
I am very new to distributed systems; it would be great if someone could help me start Zeppelin on a Spark cluster in YARN mode.
Here is my docker-compose file for letting Zeppelin talk to the Spark cluster:
version: '2'
services:
  sparkmaster:
    build: .
    container_name: sparkmaster
    ports:
      - "8080:8080"
      - "7077:7077"
      - "8888:8888"
      - "8081:8081"
      - "8082:8082"
      - "5050:5050"
      - "5051:5051"
      - "4040:4040"
  zeppelin:
    image: dylanmei/zeppelin
    container_name: zeppelin-notebook
    env_file:
      - ./hadoop.env
    environment:
      ZEPPELIN_PORT: 9001
      CORE_CONF_fs_defaultFS: "hdfs://namenode:8020"
      HADOOP_CONF_DIR_fs_defaultFS: "hdfs://namenode:8020"
      SPARK_MASTER: "spark://spark-master:7077"
      MASTER: "yarn-client"
      SPARK_HOME: spark-master
      ZEPPELIN_JAVA_OPTS: >-
        -Dspark.driver.memory=1g
        -Dspark.executor.memory=2g
    ports:
      - 9001:9001
    volumes:
      - ./data:/usr/zeppelin/data
      - ./notebooks:/usr/zeppelin/notebook

This is the Dockerfile you used to launch the standalone Spark cluster:
https://github.com/apache/zeppelin/blob/master/scripts/docker/spark-cluster-managers/spark_standalone/Dockerfile
But there is no Zeppelin instance inside that container, so you have to use Zeppelin on your local machine.
Please download and use it.
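For reference, once you have a local Zeppelin install, step 4 amounts to copying conf/zeppelin-env.sh.template to conf/zeppelin-env.sh and pointing it at your cluster. A minimal sketch — the install paths here are assumptions to adapt, and note that SPARK_HOME must be a filesystem path to a Spark installation, not a hostname as in the compose file above:

# conf/zeppelin-env.sh -- minimal sketch; paths are assumptions
export ZEPPELIN_PORT=9001                            # UI port, matching the compose mapping
export MASTER=yarn-client                            # run the Spark interpreter against YARN
export SPARK_HOME=/usr/local/spark                   # path to a Spark install, not a hostname
export HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop  # where core-site.xml / yarn-site.xml live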

Related

Facing error response from daemon - Windows

I am trying to run Apache Kafka on Windows using Docker, and my docker-compose.yml is as follows:
version: "3"
services:
spark:
image: jupyter/pyspark-notebook
ports:
- "9092:9092"
- "4010-4109:4010-4109"
volumes:
- ./notebooks:/home/jovyan/work/notebooks/
zookeeper:
image: 'bitnami/zookeeper:latest'
container_name: zookeeper
ports:
- '2181:2181'
environment:
- ALLOW_ANONYMOUS_LOGIN=yes
kafka:
image: 'bitnami/kafka:latest'
container_name: kakfa
ports:
- '9092:9092'
environment:
- KAFKA_BROKER_ID=1
- KAFKA_LISTENERS=PLAINTEXT://:9092
- KAFKA_ADVERTISED_LISTENERS=PLAINTEXT://127.0.0.1:9092
- KAFKA_ZOOKEEPER_CONNECT=zookeeper:2181
- ALLOW_PLAINTEXT_LISTENER=yes
depends_on:
- zookeeper
When I execute the command
docker-compose -f docker-compose.yml up
I get an error: Error response from daemon: driver failed programming external connectivity on endpoint kafka-spark-1 (452eae1760b7860e3924c0e630943f825a809272760c8aa8bbb2f58ab2865377): Bind for 0.0.0.0:9092 failed: port is already allocated
I have tried net stop winnat and net start winnat, unfortunately this solution didn't work.
Would appreciate any kind of help!
The Spark container isn't running Kafka, so remove the 9092 port mapping there:

  spark:
    image: jupyter/pyspark-notebook
    ports:
      - "9092:9092"

Also, change the advertised-listeners variable for Kafka to use the proper hostname, otherwise Spark will not be able to reach it:

KAFKA_ADVERTISED_LISTENERS=PLAINTEXT://kafka:9092

Then you can also remove the ports from the Kafka container, since you wouldn't have access from the host anyway, unless you add external listeners.
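If you do want access from the host as well, the usual pattern is two listeners: one advertised under the Docker-network hostname for containers such as Spark, and one advertised as localhost for host clients. A sketch, assuming the bitnami image passes these listener variables through to the broker unchanged:

  kafka:
    image: 'bitnami/kafka:latest'
    ports:
      - '9093:9093'   # publish only the external listener to the host
    environment:
      - KAFKA_LISTENERS=INTERNAL://:9092,EXTERNAL://:9093
      - KAFKA_ADVERTISED_LISTENERS=INTERNAL://kafka:9092,EXTERNAL://localhost:9093
      - KAFKA_LISTENER_SECURITY_PROTOCOL_MAP=INTERNAL:PLAINTEXT,EXTERNAL:PLAINTEXT
      - KAFKA_INTER_BROKER_LISTENER_NAME=INTERNAL
      - KAFKA_ZOOKEEPER_CONNECT=zookeeper:2181
      - ALLOW_PLAINTEXT_LISTENER=yes
    depends_on:
      - zookeeper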
You may also be interested in an example notebook I use to test PySpark with Kafka.

Docker containers become unresponsive/hang on error

I'm running Docker Desktop on Windows and am having a problem with containers becoming unresponsive on startup errors. This doesn't happen 'every' time, but by far most of the time. Consequently, I have to be very careful to start my containers one at a time, and if I see one error, I have to "Restart Docker Desktop" and begin starting them all over again.
I'm using docker-compose, and as a specific example, this morning I started elasticsearch, zookeeper, then kafka. Kafka threw an exception regarding the zookeeper state and shut down - but now the kafka container is unresponsive in Docker. I can't stop it (it's already stopped?) but it shows as running. I can't CLI into it, and I can't restart it. The only way forward is to restart Docker using the debug menu. (If I have the restart: always flag on, the containers will actually restart automatically, but given they're throwing errors, they will just spin around in circles, starting then dying, without my being able to stop/kill/remove the offending container.)
Once I've restarted docker, I'll be able to view the log of the container and see the error that was thrown...
This happens with pretty much all of my containers, however it does appear that if I start the container whilst viewing the log window within Docker Desktop, it is perhaps 'more likely' that I'll be able to start the container again if it has an error.
I've tried several different containers and this seems to be a pretty common issue for us, it doesn't appear to relate to any specific settings that I'm passing into the containers, however an extract from our docker-compose file is below:
volumes:
  zData:
  kData:
  eData:
services:
  zookeeper:
    container_name: zookeeper
    image: bitnami/zookeeper:latest
    environment:
      ALLOW_ANONYMOUS_LOGIN: "yes" # Dev only
      ZOOKEEPER_ROOT_LOGGER: WARN, CONSOLE
      ZOOKEEPER_CONSOLE_THRESHOLD: WARN
    ports:
      - "2181:2181"
    volumes:
      - zData:/bitnami/zookeeper:rw
    logging:
      driver: "fluentd"
      options:
        fluentd-address: localhost:24224
        tag: zookeeper
        fluentd-async-connect: "true"
  kafka:
    container_name: kafka
    image: bitnami/kafka:latest
    depends_on:
      - zookeeper
    environment:
      ALLOW_PLAINTEXT_LISTENER: "yes" # Debug only
      KAFKA_ADVERTISED_PORT: 9092
      KAFKA_ADVERTISED_HOST_NAME: kafka
      KAFKA_CREATE_TOPICS: xx1_event:1:1,xx2_event:1:1,xx3_event:1:1,xx4_event:1:1
      KAFKA_JMX_OPTS: -Dcom.sun.management.jmxremote=true -Dcom.sun.management.jmxremote.local.only=false -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Djava.rmi.server.hostname=${DOCKER_HOSTNAME} -Dcom.sun.management.jmxremote.rmi.port=9096 -Djava.net.preferIPv4Stack=true
      JMX_PORT: 9096
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
    hostname: kakfa
    ports:
      - 9092:9092
      - 9096:9096
    volumes:
      - kData:/bitnami/kafka:rw
    logging:
      driver: "fluentd"
      options:
        fluentd-address: localhost:24224
        tag: zookeeper
        fluentd-async-connect: "true"
  elasticsearch:
    image: bitnami/elasticsearch:latest
    container_name: elasticsearch
    cpu_shares: 2048
    environment:
      ELASTICSEARCH_HEAP_SIZE: "2048m"
      xpack.monitoring.enabled: "false"
    ports:
      - 9200:9200
      - 9300:9300
    volumes:
      - C:/config/elasticsearch.yml:/opt/bitnami/elasticsearch/config/my_elasticsearch.yml:rw
      - eData:/bitnami/elasticsearch/data:rw
I've wondered about the potential for this to be a resourcing issue, but I'm running this on a reasonably spec'd laptop (i7, SSD, 16GB RAM) using WSL2 (it also happens when using Hyper-V), and RAM limits don't look like they're being approached. When there are no errors on startup, the system runs fine and uses far more resources.
Any ideas on what I could try? I'm surprised there aren't many more people struggling with this.
There is currently an issue (https://github.com/moby/moby/issues/40063) where containers will hang/freeze/become unresponsive when logging is set to fluentd in asynchronous mode AND the fluentd container is not operational.
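Until that issue is resolved, one workaround is to make Docker's log delivery non-blocking so a dead fluentd can't wedge the container; log lines may be dropped while the buffer is full. A sketch using Docker's standard mode and max-buffer-size log options:

    logging:
      driver: "fluentd"
      options:
        fluentd-address: localhost:24224
        tag: zookeeper
        fluentd-async-connect: "true"
        mode: "non-blocking"      # don't block the container's stdout/stderr when fluentd is down
        max-buffer-size: "4m"     # ring buffer; oldest messages are dropped when full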

Running Sonarqube with docker-compose using bind mount volumes

I’m trying to run Sonarqube in a Docker container on a Centos 7 server using docker-compose. Everything works as expected using named volumes as configured in this docker-compose.yml file:
version: "3"
services:
sonarqube:
image: sonarqube
ports:
- "9000:9000"
networks:
- sonarnet
environment:
- sonar.jdbc.url=jdbc:postgresql://db:5432/sonar
volumes:
- sonarqube_conf:/opt/sonarqube/conf
- sonarqube_data:/opt/sonarqube/data
- sonarqube_extensions:/opt/sonarqube/extensions
- sonarqube_bundled_plugins:/opt/sonarqube/lib/bundled-plugins
db:
image: postgres
networks:
- sonarnet
environment:
- POSTGRES_USER=sonar
- POSTGRES_PASSWORD=sonar
volumes:
- postgresql:/var/lib/postgresql
- postgresql_data:/var/lib/postgresql/data
networks:
sonarnet:
driver: bridge
volumes:
sonarqube_conf:
sonarqube_data:
sonarqube_extensions:
sonarqube_bundled_plugins:
postgresql:
postgresql_data:
However, my /var/lib/docker/volumes directory is not large enough to house the named volumes. So, I changed the docker-compose.yml file to use bind mount volumes as shown below.
version: "3"
services:
sonarqube:
image: sonarqube
ports:
- "9000:9000"
networks:
- sonarnet
environment:
- sonar.jdbc.url=jdbc:postgresql://db:5432/sonar
volumes:
- /data/sonarqube/conf:/opt/sonarqube/conf
- /data/sonarqube/data:/opt/sonarqube/data
- /data/sonarqube/extensions:/opt/sonarqube/extensions
- /data/sonarqube/bundled_plugins:/opt/sonarqube/lib/bundled-plugins
db:
image: postgres
networks:
- sonarnet
environment:
- POSTGRES_USER=sonar
- POSTGRES_PASSWORD=sonar
volumes:
- /data/postgresql:/var/lib/postgresql
- /data/postgresql_data:/var/lib/postgresql/data
networks:
sonarnet:
driver: bridge
However, after running docker-compose up -d, the app starts up, but none of the bind mount volumes are written to. As a result, the Sonarqube plugins are not loaded and the sonar PostgreSQL database is not initialized. I thought it might be an SELinux issue, but I temporarily disabled it with no success. I'm unsure what to look at next.
I think my answer from "How to persist configuration & analytics across container invocations in Sonarqube docker image" would help you as well.
For good measure I have also pasted it in here:
.....
Notice this line SONARQUBE_HOME in the Dockerfile for the docker-sonarqube image. We can control this environment variable.
When using docker run, simply do:

docker run -d \
  ...
  ...
  -e SONARQUBE_HOME=/sonarqube-data \
  -v /PERSISTENT_DISK/sonarqubeVolume:/sonarqube-data

This will make Sonarqube create the conf, data, and so forth folders and store data therein, as needed.
Or with Kubernetes, in your deployment YAML file, do:

...
...
env:
  - name: SONARQUBE_HOME
    value: /sonarqube-data
...
...
volumeMounts:
  - name: app-volume
    mountPath: /sonarqube-data
And the name in the volumeMounts property points to a volume in the volumes section of the Kubernetes deployment YAML file.
This again will make Sonarqube use the /sonarqube-data mountPath for creating the extensions, conf, and so forth folders, then save data therein.
And voilà, your Sonarqube data is thereby persisted.
I hope this will help others.
N.B. Notice that the YAML and Docker run examples are not exhaustive. They focus on the issue of persisting Sonarqube data.
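Since the question uses docker-compose, the same idea translated to the OP's compose file would look roughly like this sketch, assuming SONARQUBE_HOME is honoured the same way when set via the environment section:

  sonarqube:
    image: sonarqube
    ports:
      - "9000:9000"
    environment:
      - sonar.jdbc.url=jdbc:postgresql://db:5432/sonar
      - SONARQUBE_HOME=/sonarqube-data       # Sonarqube creates conf/, data/, extensions/ here
    volumes:
      - /data/sonarqube:/sonarqube-data      # one bind mount instead of four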
Try it out BobC and let me know.
Have a great day.
The compose file below should get you there with a single command, I hope. Create a new compose file named docker-compose.yaml:
version: "3"
services:
sonarqube:
image: sonarqube:8.2-community
depends_on:
- db
ports:
- "9000:9000"
networks:
- sonarqubenet
environment:
SONAR_JDBC_URL: jdbc:postgresql://db:5432/sonarqube
SONAR_JDBC_USERNAME: sonar
SONAR_JDBC_PASSWORD: sonar
volumes:
- sonarqube_data:/opt/sonarqube/data
- sonarqube_extensions:/opt/sonarqube/extensions
- sonarqube_logs:/opt/sonarqube/logs
- sonarqube_temp:/opt/sonarqube/temp
restart: on-failure
container_name: sonarqube
db:
image: postgres
networks:
- sonarqubenet
environment:
POSTGRES_USER: sonar
POSTGRES_PASSWORD: sonar
volumes:
- postgresql:/var/lib/postgresql
- postgresql_data:/var/lib/postgresql/data
restart: on-failure
container_name: postgresql
networks:
sonarqubenet:
driver: bridge
volumes:
sonarqube_data:
sonarqube_extensions:
sonarqube_logs:
sonarqube_temp:
postgresql:
postgresql_data:
Then execute the commands:
$ docker-compose up -d
$ docker container ps
Sounds like the container is running and, as you mentioned, Sonarqube starts up. When it starts, does it show that it's using the embedded H2 in-memory DB? After running docker-compose up -d, use docker logs -f <container_name> to see what's happening on Sonarqube startup.
To simplify viewing your logs with a known name, I suggest you also add a container name to your Sonarqube service. For example, container_name: sonarqube.
Also, while I know the plan is to deprecate the use of environment variables for the username, password and JDBC connection, I've had better luck in docker-compose using environment variables rather than the corresponding property values. For the connection string, try SONARQUBE_JDBC_URL: jdbc:postgresql://db/sonar, without specifying the default Postgres port.
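Put together, the environment block I've had luck with looks like this sketch (the SONARQUBE_JDBC_* spelling applies to older images; newer ones use the SONAR_JDBC_* names shown in the other answer):

  sonarqube:
    image: sonarqube
    container_name: sonarqube                        # known name, so docker logs -f sonarqube works
    environment:
      SONARQUBE_JDBC_USERNAME: sonar
      SONARQUBE_JDBC_PASSWORD: sonar
      SONARQUBE_JDBC_URL: jdbc:postgresql://db/sonar # default Postgres port 5432 is implied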

Apache Storm could not find leader nimbus from seed hosts

I have installed Apache Storm using docker-compose.
docker-compose.yml:
kafka:
  image: spotify/kafka
  ports:
    - "9092:9092"
    - "2181:2181"
  environment:
    ADVERTISED_HOST: 172.16.8.37
    ADVERTISED_PORT: 9092
nimbus:
  command: --daemon nimbus drpc
  image: fhuz/docker-storm
  ports:
    - 3773:3773
    - 3772:3772
    - 6627:6627
  links:
    - kafka:zk
supervisor:
  command: --daemon supervisor logviewer
  image: fhuz/docker-storm
  ports:
    - 8000:8000
    - 6700:6700
    - 6701:6701
    - 6702:6702
    - 6703:6703
  links:
    - kafka:zk
ui:
  command: --daemon ui
  image: fhuz/docker-storm
  ports:
    - 8080:8080
  links:
    - kafka:zk
elasticsearch:
  image: elasticsearch:2.4.1
  ports:
    - 9300:9300
    - 9200:9200
It runs with no problems, but when I access the Storm UI at XXXX.XX.XX.XX:8080, the UI loads and then throws the following error.
I have tried several answers and solutions from other Stack Overflow users, but it is still failing.
Internal Server Error:
org.apache.storm.utils.NimbusLeaderNotFoundException: Could not find leader nimbus from seed hosts ["127.0.0.1"]. Did you specify a valid list of nimbus hosts for config nimbus.seeds?
at org.apache.storm.utils.NimbusClient.getConfiguredClientAs(NimbusClient.java:90)
at org.apache.storm.ui.core$nimbus_summary.invoke(core.clj:388)
at org.apache.storm.ui.core$fn__12489.invoke(core.clj:937)
at org.apache.storm.shade.compojure.core$make_route$fn__4604.invoke(core.clj:93)
at org.apache.storm.shade.compojure.core$if_route$fn__4592.invoke(core.clj:39)
at org.apache.storm.shade.compojure.core$if_method$fn__4585.invoke(core.clj:24)
at org.apache.storm.shade.compojure.core$routing$fn__4610.invoke(core.clj:106)
at clojure.core$some.invoke(core.clj:2570)
at org.apache.storm.shade.compojure.core$routing.doInvoke(core.clj:106)
at clojure.lang.RestFn.applyTo(RestFn.java:139)
at clojure.core$apply.invoke(core.clj:632)
at org.apache.storm.shade.compojure.core$routes$fn__4614.invoke(core.clj:111)
at org.apache.storm.shade.ring.middleware.json$wrap_json_params$fn__11958.invoke(json.clj:56)
at org.apache.storm.shade.ring.middleware.multipart_params$wrap_multipart_params$fn__5680.invoke(multipart_params.clj:103)
at org.apache.storm.shade.ring.middleware.reload$wrap_reload$fn__11140.invoke(reload.clj:22)
at org.apache.storm.ui.helpers$requests_middleware$fn__5907.invoke(helpers.clj:46)
at org.apache.storm.ui.core$catch_errors$fn__12679.invoke(core.clj:1224)
at org.apache.storm.shade.ring.middleware.keyword_params$wrap_keyword_params$fn__5611.invoke(keyword_params.clj:27)
at org.apache.storm.shade.ring.middleware.nested_params$wrap_nested_params$fn__5651.invoke(nested_params.clj:65)
at org.apache.storm.shade.ring.middleware.params$wrap_params$fn__5582.invoke(params.clj:55)
at org.apache.storm.shade.ring.middleware.multipart_params$wrap_multipart_params$fn__5680.invoke(multipart_params.clj:103)
at org.apache.storm.shade.ring.middleware.flash$wrap_flash$fn__5866.invoke(flash.clj:14)
at org.apache.storm.shade.ring.middleware.session$wrap_session$fn__5854.invoke(session.clj:43)
at org.apache.storm.shade.ring.middleware.cookies$wrap_cookies$fn__5782.invoke(cookies.clj:160)
at org.apache.storm.shade.ring.util.servlet$make_service_method$fn__5488.invoke(servlet.clj:127)
at org.apache.storm.shade.ring.util.servlet$servlet$fn__5492.invoke(servlet.clj:136)
at org.apache.storm.shade.ring.util.servlet.proxy$javax.servlet.http.HttpServlet$ff19274a.service(Unknown Source)
at org.apache.storm.shade.org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:654)
at org.apache.storm.shade.org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1320)
at org.apache.storm.logging.filters.AccessLoggingFilter.handle(AccessLoggingFilter.java:47)
at org.apache.storm.logging.filters.AccessLoggingFilter.doFilter(AccessLoggingFilter.java:39)
at org.apache.storm.shade.org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1291)
at org.apache.storm.shade.org.eclipse.jetty.servlets.CrossOriginFilter.handle(CrossOriginFilter.java:247)
at org.apache.storm.shade.org.eclipse.jetty.servlets.CrossOriginFilter.doFilter(CrossOriginFilter.java:210)
at org.apache.storm.shade.org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1291)
at org.apache.storm.shade.org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:443)
at org.apache.storm.shade.org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1044)
at org.apache.storm.shade.org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:372)
at org.apache.storm.shade.org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:978)
at org.apache.storm.shade.org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
at org.apache.storm.shade.org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
at org.apache.storm.shade.org.eclipse.jetty.server.Server.handle(Server.java:369)
at org.apache.storm.shade.org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:486)
at org.apache.storm.shade.org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:933)
at org.apache.storm.shade.org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:995)
at org.apache.storm.shade.org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:644)
at org.apache.storm.shade.org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
at org.apache.storm.shade.org.eclipse.jetty.server.AsyncHttpConnection.handle(AsyncHttpConnection.java:82)
at org.apache.storm.shade.org.eclipse.jetty.io.nio.SelectChannelEndPoint.handle(SelectChannelEndPoint.java:668)
at org.apache.storm.shade.org.eclipse.jetty.io.nio.SelectChannelEndPoint$1.run(SelectChannelEndPoint.java:52)
at org.apache.storm.shade.org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
at org.apache.storm.shade.org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
at java.lang.Thread.run(Thread.java:745)
Your Supervisor and Nimbus are running on different machines (here, different containers). By default, Storm looks for Nimbus on localhost using this parameter:
nimbus.seeds: ["localhost"]
That's the error you get: it cannot find Nimbus on the local machine. You need to set that field on the Supervisor side to the IP (or hostname) of the machine running the Nimbus process.
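In this compose file, that means the supervisor and ui containers need a storm.yaml pointing at the nimbus service. A sketch of the relevant entries, assuming the containers resolve each other by service name and that ZooKeeper is reachable under the zk link alias:

# storm.yaml on the supervisor and ui containers -- sketch, hostnames are assumptions
nimbus.seeds: ["nimbus"]
storm.zookeeper.servers:
  - "zk"        # the spotify/kafka container, linked in as "zk", also runs ZooKeeper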
Make sure you have started your Zookeeper server and client before running Storm:
$ bin/zkServer.sh start
$ bin/zkCli.sh
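Once ZooKeeper is up, you can verify that Nimbus actually registered by listing Storm's znodes from the ZooKeeper client — a sketch, assuming the default storm.zookeeper.root of /storm:

$ bin/zkCli.sh -server localhost:2181
[zk: localhost:2181(CONNECTED) 0] ls /storm

If /storm is missing or empty, Nimbus never connected to ZooKeeper, which points back at the seeds/host configuration.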

How to have "RUN" command in docker-compose similar to dockerfile?

Dockerfile:
FROM elasticsearch:2
RUN /usr/share/elasticsearch/bin/plugin install --batch cloud-aws
from https://www.elastic.co/blog/elasticsearch-docker-plugin-management
Can someone please help me add the ES plugin in the docker-compose file?
version: '2'
services:
  nitrogen:
    build: .
    ports:
      - "8000:8000"
    volumes:
      - ~/mycode:/mycode
    depends_on:
      - couchdb
      - elasticsearch
  elasticsearch:
    image: elasticsearch:1.7.5
    volumes:
      - ./elasticsearch/config/elasticsearch.yml:/usr/share/elasticsearch/config/elasticsearch.yml
    ports:
      - "9200:9200"
      - "9300:9300"
What's missing in the docker-compose file above is the installation of the plugin.
I tried this, but it runs on the local machine instead of in the Docker container:
command: /usr/share/elasticsearch/bin/plugin install elasticsearch/elasticsearch-river-couchdb/2.6.0
You have to create your own Docker image (e.g. my-elasticsearch) with the Dockerfile you mentioned, then refer to that image in docker-compose.yml.
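Concretely, you can let docker-compose do the build with a build: entry. A sketch, reusing the plugin command from the question and assuming the Dockerfile sits in an ./elasticsearch subdirectory next to the compose file:

# elasticsearch/Dockerfile
FROM elasticsearch:1.7.5
RUN /usr/share/elasticsearch/bin/plugin install elasticsearch/elasticsearch-river-couchdb/2.6.0

Then in docker-compose.yml, replace image: with build:, so the plugin is baked into the image at build time:

  elasticsearch:
    build: ./elasticsearch
    volumes:
      - ./elasticsearch/config/elasticsearch.yml:/usr/share/elasticsearch/config/elasticsearch.yml
    ports:
      - "9200:9200"
      - "9300:9300"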
