How to use two versions of spark-shell? - hadoop

I have Spark 1.6.2 and Spark 2.0 installed on my Hortonworks cluster.
Both versions are installed on a node of the 5-node Hadoop cluster.
Each time I start the spark-shell I get:
$ spark-shell
Multiple versions of Spark are installed but SPARK_MAJOR_VERSION is not set
Spark1 will be picked by default
When I check the version I get:
scala> sc.version
res0: String = 1.6.2
How can I start the other version (the spark-shell of Spark 2.0)?

export SPARK_MAJOR_VERSION=2
You just need to set the major version, 2 or 1.
$ export SPARK_MAJOR_VERSION=2
$ spark-submit --version
SPARK_MAJOR_VERSION is set to 2, using Spark2
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/ '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.0.0.2.5.0.0-1245
      /_/
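To confirm from inside the shell, check sc.version again after exporting the variable (a minimal sketch; on HDP the string will carry a build suffix like the one above):
$ export SPARK_MAJOR_VERSION=2
$ spark-shell
scala> sc.version
res0: String = 2.0.0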

Another approach that works: typing
spark-shell
loads Spark 1.6, whilst typing
spark2-shell
loads Spark 2.0.

$ SPARK_MAJOR_VERSION=2 spark-shell

Use spark2-submit, pyspark2 or spark2-shell.
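For example, submitting a job against Spark 2 on HDP could look like this (a sketch; the script name and deploy settings are placeholders):
$ spark2-submit --master yarn --deploy-mode client my_app.py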

If you are using Windows 8 or 10, change the SPARK_HOME environment variable to point at the Spark 2 or Spark 3 installation you want to use, and change the PATH variable too. Then close the terminal and restart it,
and launch spark-shell; you will see your chosen version as the default.

Related

minikube's got loads of space, but the pod says no space left on device

I'm trying to run Elasticsearch on minikube on my mac. I'm following the instructions from the Elasticsearch helm repo here.
I'm starting minikube like this:
minikube start --memory 8192 --cpus 4 --disk-size 50000mb
which starts fine and indicates the addons listed in the Elasticsearch helm minikube example README are included.
🌟 Enabled addons: storage-provisioner, default-storageclass
Versions:
➜ minikube version
minikube version: v1.23.1
commit: 84d52cd81015effbdd40c632d9de13db91d48d43
➜ kubectl version
Client Version: version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.1", GitCommit:"632ed300f2c34f6d6d15ca4cef3d3c7073412212", GitTreeState:"clean", BuildDate:"2021-08-19T15:38:26Z", GoVersion:"go1.16.6", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.1", GitCommit:"632ed300f2c34f6d6d15ca4cef3d3c7073412212", GitTreeState:"clean", BuildDate:"2021-08-19T15:39:34Z", GoVersion:"go1.16.7", Compiler:"gc", Platform:"linux/amd64"}
I initialise Helm with helm init and then install Elasticsearch with
helm install --name elasticsearch elastic/elasticsearch -f values.yaml
The values are from https://github.com/elastic/helm-charts/blob/master/elasticsearch/examples/minikube/values.yaml; they allow all the Elasticsearch pods to run on one node and request fewer resources.
I check the Elasticsearch pods (there are three) and they all complain about the same thing:
MountVolume.SetUp failed for volume "kube-api-access-lxmxc" : write /var/lib/kubelet/pods/5ce1f0e8-6d43-48a0-bac6-55eab6eafc97/volumes/kubernetes.io~projected/kube-api-access-lxmxc/..2021_09_19_06_47_54.897372328/namespace: no space left on device
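(That message comes from the pod's events; a hedged way to pull it up again is kubectl describe, where it shows under Events. The pod name below is just an example.)
$ kubectl get pods
$ kubectl describe pod elasticsearch-master-0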
So I log into the minikube VM and navigate into that folder and check the space:
➜ minikube ssh
(minikube ASCII-art banner)
$ cd /var/lib/kubelet
$ df -h .
Filesystem Size Used Avail Use% Mounted on
/dev/vda1 43G 1.5G 39G 4% /var/lib/kubelet
$ df -ih .
Filesystem Inodes IUsed IFree IUse% Mounted on
/dev/vda1 25M 13K 25M 1% /var/lib/kubelet
which seems fine (though I'm not gonna pretend I ever understood what inodes are).
My problem is that I'm not sure where to look next. I've got pods that say "there's not enough space left on the device" and a device that says "I've got quite a lot of space actually!", so clearly I'm missing the real problem. But I'm clueless as to what it might actually be!
Turns out the latest Helm chart has some bugs, so I reset to an earlier tag, tried again and all was good! I didn't learn anything though, other than that Helm charts can have bugs.
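A hedged sketch of pinning an older chart with Helm 2 (the chart version number here is purely illustrative):
$ helm repo add elastic https://helm.elastic.co
$ helm install --name elasticsearch elastic/elasticsearch --version 7.9.3 -f values.yaml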
Let me try to explain it to you. Generally, you shouldn't use the latest tag at all. Helm needs an image and a specific version of it. Look at this example for the alpine/helm image:
The latest docker tag is the latest release version (https://github.com/helm/helm/releases/latest)
Please avoid using the latest tag for any production deployment. Tagging with the right version is the proper way, such as alpine/helm:3.1.1
You can also learn a lot from the article What's Wrong With The Docker :latest Tag?. It is about Docker, but Helm uses those same images, so note what can happen if you rely on the latest tag.
You can also read this good article. It explains how to properly create Helm charts:
We will use AppVersion as the Docker image tag for our application. That allows us to upgrade the Helm chart to a new version of the application just by changing a value in Chart.yaml
image: "{{ .Values.image.repository }}:{{ default .Chart.AppVersion .Values.image.tag }}"
Therefore, whenever you use a Helm chart, you should check which images are used and in which versions. You will sometimes come across images that use the latest tag; you can always change it ;)
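With the elastic/elasticsearch chart specifically, the image tag can also be pinned at install time (assuming imageTag is still the value name used by that chart; the tag below is only illustrative):
$ helm install --name elasticsearch elastic/elasticsearch -f values.yaml --set imageTag=7.9.3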

Starting newer Karaf (4.2.8) on linux, feature:repo-add camel does not work

Maybe somebody can help me.
I was testing Karaf on my local Windows laptop and everything worked fine. But now I'm trying to install it on a CentOS Linux server, and I'm not getting the same results.
My steps:
1- Download Karaf from http://www.apache.org/dyn/closer.lua/karaf/4.2.8/apache-karaf-4.2.8.tar.gz
2- Unpack it to the CentOS path: /usr/local/apache-karaf-4.2.8
3- Check the Java version installed on the server:
JAVA_HOME=/usr/local/jdk1.8.0_251
PATH=.../usr/local/jdk1.8.0_251/jre/bin..
4- Run sudo ./bin/karaf, and then:
karaf: JAVA_HOME not set; results may vary
(Apache Karaf ASCII-art banner)
Apache Karaf (4.2.8)
Anyway, the shell becomes active, but when I tried to add features I received this error (it doesn't seem to be related to the previous message):
karaf@root()> feature:repo-add camel
Adding feature url mvn:org.apache.camel.karaf/apache-camel/RELEASE/xml/features
Error executing command: Error resolving artifact org.apache.camel.karaf:apache-camel:xml:features:RELEASE: [Failed to resolve version for org.apache.camel.karaf:apache-camel:xml:features:RELEASE: Could not find metadata org.apache.camel.karaf:apache-camel/maven-metadata.xml in local (/root/.m2/repository)] : mvn:org.apache.camel.karaf/apache-camel/RELEASE/xml/features
I did the same on my PC and everything worked perfectly. Am I missing something specific to Linux? Any idea?
Thanks in advance.
Karaf uses etc/org.ops4j.pax.url.mvn.cfg to list all the Maven repositories from which it can download your bundles/features.
By default, the value for this distribution is:
org.ops4j.pax.url.mvn.repositories= \
https://repo1.maven.org/maven2#id=central, \
https://repository.apache.org/content/groups/snapshots-group#id=apache#snapshots#noreleases, \
https://oss.sonatype.org/content/repositories/ops4j-snapshots#id=ops4j.sonatype.snapshots.deploy#snapshots#noreleases
However, in your case it seems to be resolving only from /root/.m2/repository. Could you check your etc/org.ops4j.pax.url.mvn.cfg and see whether it has been modified?
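Independently of the repository configuration, a common workaround is to pin an explicit Camel version instead of relying on the RELEASE meta-version (the version below is only illustrative; pick one that matches your Karaf):
karaf@root()> feature:repo-add camel 2.24.2
karaf@root()> feature:install camel-core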

Decanter framework doesn't work as expected in OpenDaylight Karaf shell

I was looking for logging frameworks available in OpenDaylight Controller.
Something similar to ELK stack.
I found apache decanter as a possible way to do this.
https://karaf.apache.org/manual/decanter/latest-1/
The problem is that it works fine with the normal Karaf shell but doesn't work with the ODL Karaf shell of the Oxygen SR4 release.
As per the documentation,
https://karaf.apache.org/download.html#decanter-installation
feature:repo-add decanter
feature:install decanter-appender-elasticsearch
feature:install decanter-collector-log
feature:install decanter-collector-jmx
I tried the same in ODL karaf shell.
I downloaded the Oxygen-SR4 binary and started the karaf shell.
./karaf clean
Apache Karaf starting up. Press Enter to open the shell now...
100% [========================================================================]
Karaf started in 0s. Bundle stats: 13 active, 13 total
(OpenDaylight ASCII-art banner)
Hit '<tab>' for a list of available commands and '[cmd] --help' for help on a specific command. Hit '<ctrl-d>' or type 'system:shutdown' or 'logout' to shutdown OpenDaylight.
opendaylight-user@root>system:version
4.1.6
opendaylight-user@root>feature:repo-add decanter
Adding feature url
opendaylight-user@root>feature:install decanter-appender-elasticsearch
org.apache.karaf.features.core[org.apache.karaf.features.internal.service.FeaturesServiceImpl] : null
But the same thing works with plain apache karaf shell.
./karaf
(Apache Karaf ASCII-art banner)
Apache Karaf (4.2.5)
Hit '<tab>' for a list of available commands and '[cmd] --help' for help on a specific command. Hit '<ctrl-d>' or type 'system:shutdown' or 'logout' to shutdown Karaf.
karaf@root()> feature:repo-add decanter
Adding feature url mvn:org.apache.karaf.decanter/apache-karaf-decanter/RELEASE/xml/features
karaf@root()> feature:install decanter-appender-elasticsearch
karaf@root()>
Can anyone point out what is missing here, because I feel the shell versions are similar?
Can you also suggest other logging frameworks to process Karaf logs and data in the OpenDaylight Controller (Oxygen SR4), something similar to the ELK stack?
We use Decanter in upstream OpenDaylight system testing. The features we install (using the featuresBoot variable in etc/org.apache.karaf.features.cfg) are:
odl-jolokia,decanter-collector-jmx,decanter-appender-elasticsearch
But we also configure featuresRepositories to include:
mvn:org.apache.karaf.decanter/apache-karaf-decanter/1.0.0/xml/features
Here is a wiki page with some extra info.
Here is an example of us grabbing data to find memory usage, and we also install Elasticsearch, which lets us see it as a graph over time.
Hope it helps.
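Put together, the relevant part of etc/org.apache.karaf.features.cfg would look roughly like this (a sketch only; keep your distribution's existing entries where the dots are and append the Decanter pieces):
featuresRepositories = ..., \
    mvn:org.apache.karaf.decanter/apache-karaf-decanter/1.0.0/xml/features
featuresBoot = ..., \
    odl-jolokia, \
    decanter-collector-jmx, \
    decanter-appender-elasticsearch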

On my Mac, hadoop 3.1.0 finds its native libraries, but spark 2.3.1 does not

Intro
I know that the answer in 99% of cases to this error message:
WARN NativeCodeLoader:60 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Is simply "It's just a warning, don't worry about it", and then sometimes followed by "Just download the libraries, compile them and point HADOOP_HOME to this folder and add $HADOOP_HOME/bin/native to your LD_LIBRARY_PATH"
That's what I did, but I'm still getting the error, and after two days of googling I'm starting to feel there's something really interesting to learn if I manage to fix this. There is currently a strange behaviour that I do not understand; hopefully we can work through this together.
Ok, so here's what's up:
Hadoop finds the native libraries
Running hadoop checknative -a gives me this:
dds-MacBook-Pro-2:~ Rkey$ hadoop checknative -a
2018-07-15 16:18:25,956 WARN bzip2.Bzip2Factory: Failed to load/initialize native-bzip2 library system-native, will use pure-Java version
2018-07-15 16:18:25,959 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
2018-07-15 16:18:25,963 WARN erasurecode.ErasureCodeNative: ISA-L support is not available in your platform... using builtin-java codec where applicable
Native library checking:
hadoop: true /usr/local/Cellar/hadoop/3.1.0/lib/native/libhadoop.dylib
zlib: true /usr/lib/libz.1.dylib
zstd : false
snappy: true /usr/local/lib/libsnappy.1.dylib
lz4: true revision:10301
bzip2: false
openssl: false build does not support openssl.
ISA-L: false libhadoop was built without ISA-L support
2018-07-15 16:18:25,986 INFO util.ExitUtil: Exiting with status 1: ExitException
There are some errors here, which might be the cause, but most importantly for now this line is present:
hadoop: true /usr/local/Cellar/hadoop/3.1.0/lib/native/libhadoop.dylib
When I start my hadoop cluster, this is how it looks:
dds-MacBook-Pro-2:~ Rkey$ hstart
Starting namenodes on [localhost]
Starting datanodes
Starting secondary namenodes [dds-MacBook-Pro-2.local]
Starting resourcemanager
Starting nodemanagers
No warnings. I downloaded the Hadoop source and built it myself. Before I did that, there were "Cannot find native library" warnings on starting Hadoop.
However, spark does not find the native libraries
This is how it looks when I run pyspark:
dds-MacBook-Pro-2:~ Rkey$ pyspark
Python 3.7.0 (default, Jun 29 2018, 20:13:53)
[Clang 8.0.0 (clang-800.0.42.1)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
2018-07-15 16:22:22 WARN NativeCodeLoader:60 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/ '_/
   /__ / .__/\_,_/_/ /_/\_\   version 2.3.1
      /_/
This is where our old friend makes a re-appearance:
WARN NativeCodeLoader:60 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
I find this very strange, because I know for a fact it uses the same hadoop that I can start on my own without any warnings. There are no other hadoop installations on my computer.
Clarifications
I downloaded the non-hadoop version of apache-spark from their website called "Pre-build with user-provided Apache Hadoop". This was then put in my Cellar-folder just because I did not want to re-link everything.
As for variables, this is my ~/.profile
export JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk1.8.0_172.jdk/Contents/Home
export OPENSSL_ROOT_DIR=/usr/local/Cellar/openssl/1.0.2o_2
export PYSPARK_PYTHON=python3
export PYSPARK_DRIVER_PYTHON=python3
export HADOOP_HOME=/usr/local/Cellar/hadoop/3.1.0
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export SPARK_HOME=/usr/local/spark
export PATH=$SPARK_HOME/bin:$PATH
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export LD_LIBRARY_PATH=$HADOOP_HOME/lib/native
alias hstart="$HADOOP_HOME/sbin/start-dfs.sh;$HADOOP_HOME/sbin/start-yarn.sh"
alias hstop="$HADOOP_HOME/sbin/stop-dfs.sh;$HADOOP_HOME/sbin/stop-yarn.sh"
And here are my additions to spark-env.sh:
export SPARK_DIST_CLASSPATH=$(hadoop classpath)
export LD_LIBRARY_PATH=/usr/local/Cellar/hadoop/3.1.0/lib/native/:$LD_LIBRARY_PATH
This is how the folder /usr/local/Cellar/hadoop/3.1.0/lib/native looks:
The question
How is it that Hadoop can start locally without warning that its libraries are missing, and the checknative -a command shows that it finds the native libraries, but when the same Hadoop is used through pyspark I'm suddenly given this warning again?
Update 16/7
I recently made a discovery. The standard version of this classic error message looks like this:
WARN NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
This is actually different, as my error message instead says NativeCodeLoader:60, with 60 instead of 62. This points towards my theory that it is not really the hadoop libraries missing, but some native libraries that hadoop is using that are missing. This is why hadoop can launch without warnings, but pyspark that likely tries to use more native libraries from hadoop, launches with warnings.
This is still just a theory, and until I remove all warnings from the checknative -a call I won't know.
Update 15/7
Currently trying to remove the "WARN bzip2.Bzip2Factory" warning from hadoop checknative -a; perhaps this might remove the warning when launching pyspark.
I had the same question as yours. In my case it is because, since OS X El Capitan, the SIP (System Integrity Protection) mechanism in macOS makes the operating system ignore LD_LIBRARY_PATH/DYLD_LIBRARY_PATH, even if you have already added the Hadoop native library path to either of these variables (I got this information from https://help.mulesoft.com/s/article/Variables-LD-LIBRARY-PATH-DYLD-LIBRARY-PATH-are-ignored-on-MAC-OS-if-System-Integrity-Protect-SIP-is-enable).
Actually, the NativeCodeLoader warning from Spark can be ignored. However, if you really want to make the warning go away, you can disable SIP on macOS and then make sure $HADOOP_HOME/lib/native is added to LD_LIBRARY_PATH. Then Spark can find the Hadoop native library correctly.
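If you would rather not disable SIP, a hedged alternative (not part of the original answer) is to point the JVM's java.library.path at the native directory through Spark's own options, which does not depend on the environment variables that SIP strips; the path assumes the Homebrew layout used above:
# conf/spark-defaults.conf
spark.driver.extraJavaOptions  -Djava.library.path=/usr/local/Cellar/hadoop/3.1.0/lib/native
or, per invocation:
$ pyspark --driver-java-options "-Djava.library.path=/usr/local/Cellar/hadoop/3.1.0/lib/native"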

Why Spring Roo is showing only a single addon?

Running addon search --refresh returns only a single result:
roo> addon search --refresh
Successfully downloaded Roo add-on Data
1 found, sorted by rank; T = trusted developer; R = Roo 1.2 compatible
ID T R DESCRIPTION -------------------------------------------------------------
01 - Y 0.1.0.BUILD Hindi language support for Spring Roo Web MVC JSP
Scaffolding; #mvc,#localization,locale:hi
--------------------------------------------------------------------------------
I am using spring-roo version 1.2.2
roo> version
(Spring Roo ASCII-art banner) 1.2.2.RELEASE [rev 7d75659]
There have been a handful of problems with the addon repository in recent months - see here, for example. The (somewhat annoying) workaround is that if you can locate the URL for an addon you're interested in, you can still load it via the 'osgi obr url add' and 'osgi obr deploy' commands. You can see an example here.
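For illustration only (the repository URL and bundle symbolic name below are placeholders, not a real add-on), the workaround looks roughly like this:
roo> osgi obr url add --url http://example.com/roo-addon/repository.xml
roo> osgi obr deploy --bundleSymbolicName com.example.roo.addon.sample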
