Has anyone successfully run Flink jobs with this kind of setup (GitHub CI/CD and Kubernetes)?
Since Flink jobs can't be dockerized and deployed in a natural way as part of a container, I am not sure what the best way of doing this is.
Thanks
Yes, this can be done. For the dockerizing portion, see the docs about running Flink on Docker and running Flink on Kubernetes, as well as Patrick Lukas' Flink Forward talk "Flink in Containerland". You'll find links to Docker Hub, GitHub, SlideShare, and YouTube behind these links.
dA Platform 2 is a commercial offering from data Artisans that supports CI/CD integrations for Flink on Kubernetes. The demo video from the product announcement at Flink Forward Berlin 2017 illustrates this.
I solved this. Flink jobs can in fact be dockerized and deployed in a natural way as part of a container:
I extended the Flink Docker image and added a "flink run some.jar" step to it.
It works perfectly.
https://github.com/Aleksandr-Filichkin/flink-k8s
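For reference, a minimal Dockerfile sketch of that approach (the image tag, jar name, and paths are placeholders, not the exact contents of the repository):
FROM flink:latest
# bundle the job jar built by CI into the image (name and path are placeholders)
COPY target/some.jar /opt/some.jar
# submit the job when the container starts; point "flink run -m <jobmanager>" at your cluster if needed
CMD ["flink", "run", "/opt/some.jar"]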
Related
I have installed single-node Apache YARN with Kafka/ZooKeeper and Spring Cloud Dataflow 1.0.3.
Everything works fine, but when I try some example deployments, like:
stream create --name "ticktock" --definition "time | hdfs --rollover=100" --deploy
http --port=8000 | log --name=logtest --level=INFO
The stream's status never stays on "deployed". It keeps cycling through "undeployed" -> "partial" -> "deployed" in a constant loop.
The YARN application itself deploys successfully, but it seems as though the communication between the Spring Cloud Dataflow server instance and Apache Hadoop is constantly failing.
What could be causing this?
Thanks in advance!
Well, there's not a lot of information to go on here, but you may want to check and make sure you have the necessary base directory created in HDFS and that your YARN user has read/write permission to it.
spring.cloud.deployer.yarn.app.baseDir=/dataflow
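For example, the directory can be created and opened up roughly like this (run as the HDFS superuser; "/dataflow" matches the property above, and "yarn:hadoop" is a placeholder for whichever account actually runs your YARN apps):
hdfs dfs -mkdir -p /dataflow
hdfs dfs -chown -R yarn:hadoop /dataflow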
Thanks to all for your answers! I've been busy with other high-priority projects, but I was able to build everything using Ambari (including the plugin mentioned by Sabby Adannan). Now all is working great!
Hi,
I want to implement Hadoop in CloudSim using NetBeans. Could you please guide me?
Meanwhile, I already have the source of the Hadoop classes.
CloudSim is a simulator; it does not perform any actual work. Hadoop is a framework used to solve large problems with a master-slave approach. If you want to deploy the Hadoop framework on a cloud, try building a private cloud using OpenStack, create instances in it, and then deploy Hadoop on those instances. It is not possible to deploy the Hadoop framework in CloudSim, as it is just a simulator.
I want to use big data analytics for my work. I have already implemented all the Docker stuff, creating containers within containers. I am new to big data, however, and I have come to understand that using Hadoop for HDFS and running Spark instead of MapReduce on top of Hadoop is the best approach for websites and applications when speed matters (is it?). Will this work in my Docker containers? It would be very helpful if someone could point me somewhere to learn more.
You can try playing with the Cloudera QuickStart Docker image to get started; take a look at https://hub.docker.com/r/cloudera/quickstart/. The image provides a single-node deployment of Cloudera's Hadoop platform along with Cloudera Manager, and it includes Spark as well.
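Getting it running looks roughly like this (the exact invocation and exposed ports are documented on the Docker Hub page; the Hue port mapping here is optional):
docker pull cloudera/quickstart:latest
docker run --hostname=quickstart.cloudera --privileged=true -t -i -p 8888:8888 cloudera/quickstart /usr/bin/docker-quickstart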
The Hadoop documentation states that the DockerContainerExecutor (DCE) does not support clusters running in secure mode (Kerberos): https://hadoop.apache.org/docs/r2.7.2/hadoop-yarn/hadoop-yarn-site/DockerContainerExecutor.html
Are people working on this? Is there a way around this limitation?
OK. There is no current work on DCE (YARN-2466); efforts have shifted towards supporting Docker containers in the LinuxContainerExecutor (YARN-3611), which will support Kerberos. There is no documentation yet (YARN-5258), and many of these features are expected to be part of the Apache 2.8 release.
Source and more info:
https://community.hortonworks.com/questions/39064/can-i-run-dce-docker-container-executor-on-yarn-wi.html
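For context, once that work lands, opting in on the NodeManagers is expected to look roughly like the sketch below (based on the later Apache docs for the LinuxContainerExecutor Docker runtime; property names may differ between releases, and container-executor.cfg must also have Docker enabled):
<!-- yarn-site.xml sketch: use the LinuxContainerExecutor and allow the docker runtime -->
<property>
  <name>yarn.nodemanager.container-executor.class</name>
  <value>org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor</value>
</property>
<property>
  <name>yarn.nodemanager.runtime.linux.allowed-runtimes</name>
  <value>default,docker</value>
</property>
<!-- individual jobs then opt in per container, e.g. via the env var YARN_CONTAINER_RUNTIME_TYPE=docker -->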
I have been using a Hadoop cluster, created using Google's script, for a few months.
Every time I boot the machines I have to manually start Hadoop using:
sudo su hadoop
cd /home/hadoop/hadoop-install/sbin
./start-all.sh
Besides scripting, how can I resolve this?
Or is this just the way it is by default?
(The first boot after cluster creation always starts Hadoop automatically, so why not every boot?)
You have to configure this using init.d.
The document provides more details and a sample script for Datameer; you need to follow similar steps. Your script should be smart enough to check that all the nodes in the cluster are up before invoking the start script over SSH.
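A minimal init.d-style sketch, reusing the paths from the question (for real use, add proper LSB headers / chkconfig or systemd wiring, plus the node-readiness check mentioned above):
#!/bin/bash
# /etc/init.d/hadoop -- start/stop Hadoop as the hadoop user on boot
case "$1" in
  start)
    su - hadoop -c '/home/hadoop/hadoop-install/sbin/start-all.sh'
    ;;
  stop)
    su - hadoop -c '/home/hadoop/hadoop-install/sbin/stop-all.sh'
    ;;
esac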
While different third-party scripts and "getting started" solutions like Cloud Launcher have varying degrees of support for automatic restart of Hadoop on boot, the officially supported tools are bdutil, a do-it-yourself deployment tool, and Google Cloud Dataproc, a managed service. Both are already configured with init.d and/or systemd to automatically start Hadoop on boot.
More detailed instructions on using bdutil here.
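For the managed route, a Dataproc cluster that starts Hadoop automatically on every boot can be created with a single command (the cluster name, region, and worker count below are placeholders):
gcloud dataproc clusters create my-cluster --region us-central1 --num-workers 2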