Spark - How to colourise terminal output from spark-submit


When I run spark-submit it works successfully, but the output is not colourised.
(/Users/me/bai/conda-envs/spark-mllib-kmeans) me#my-mbp spark-mllib-kmeans % spark-submit spark-helloWorld.py
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.apache.spark.unsafe.Platform (file:/usr/local/Cellar/apache-spark/3.0.1/libexec/jars/spark-unsafe_2.12-3.0.1.jar) to constructor java.nio.DirectByteBuffer(long,int)
WARNING: Please consider reporting this to the maintainers of org.apache.spark.unsafe.Platform
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
20/12/22 12:18:33 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
20/12/22 12:18:34 INFO SparkContext: Running Spark version 3.0.1
20/12/22 12:18:34 INFO ResourceUtils: ==============================================================
20/12/22 12:18:34 INFO ResourceUtils: Resources for spark.driver:
20/12/22 12:18:34 INFO ResourceUtils: ==============================================================
20/12/22 12:18:34 INFO SparkContext: Submitted application: Simple App
...
I am using Spark version 3.0.1:
(base) me#my-mbp spark-mllib-kmeans % spark-shell --version
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.apache.spark.unsafe.Platform (file:/usr/local/Cellar/apache-spark/3.0.1/libexec/jars/spark-unsafe_2.12-3.0.1.jar) to constructor java.nio.DirectByteBuffer(long,int)
WARNING: Please consider reporting this to the maintainers of org.apache.spark.unsafe.Platform
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
Welcome to
____ __
/ __/__ ___ _____/ /__
_\ \/ _ \/ _ `/ __/ '_/
/___/ .__/\_,_/_/ /_/\_\ version 3.0.1
/_/
Using Scala version 2.12.10, OpenJDK 64-Bit Server VM, 14.0.1
Branch HEAD
Compiled by user ubuntu on 2020-08-28T08:58:35Z
Revision 2b147c4cd50da32fe2b4167f97c8142102a0510d
Url https://gitbox.apache.org/repos/asf/spark.git
Type --help for more information.
I am using the default Mac terminal program on latest Mac OS:
% uname -a
Darwin my-mbp.lan 20.2.0 Darwin Kernel Version 20.2.0: Wed Dec 2 20:39:59 PST 2020; root:xnu-7195.60.75~1/RELEASE_X86_64 x86_64
I would like to see the different log levels (WARN/INFO/ERROR) in different colours, and perhaps also use colour to distinguish output from the Spark framework from output of my own application.
There is a lot of framework-level output, plus noisy WARNINGs caused by framework issues, so I was hoping better use of colour would help me scan my output more quickly.
Is there a simple solution for this?
I see this behaviour in both native Mac Terminal and MS VSC integrated terminal.
I saw the output line:
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
It looks like I can create and then edit a log4j config file as follows, so perhaps I just need the right log4j settings to colourise all output.
% cd $SPARK_HOME/conf
% cp log4j.properties.template log4j.properties

I got the same warning but it works properly without errors.
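One possible approach that does not touch the log4j config at all is to wrap spark-submit in a small script that adds ANSI colour codes to each log line according to its level. The following is only a minimal sketch: the colour choices and the function names are my own, and the regular expression assumes the default "yy/MM/dd HH:mm:ss LEVEL ..." layout seen in the output above. Lines that do not match (for example the JVM's "WARNING: ..." lines) are passed through unchanged.

```python
import re
import subprocess
import sys

# ANSI escape sequences; the colour choices are arbitrary (an assumption).
RESET = "\033[0m"
COLOURS = {
    "ERROR": "\033[31m",  # red
    "WARN": "\033[33m",   # yellow
    "INFO": "\033[32m",   # green
    "DEBUG": "\033[36m",  # cyan
}

# Matches lines such as "20/12/22 12:18:34 INFO SparkContext: ..."
LOG_LINE = re.compile(
    r"^\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2} (ERROR|WARN|INFO|DEBUG) "
)


def colourise(line):
    """Wrap a log line in the colour for its level; leave other lines alone."""
    match = LOG_LINE.match(line)
    if match is None:
        return line
    text = line.rstrip("\n")
    newline = line[len(text):]  # keep the original line ending, if any
    return COLOURS[match.group(1)] + text + RESET + newline


def run(command):
    """Run command and print its combined stdout/stderr, colourised."""
    proc = subprocess.Popen(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr so log lines are captured too
        text=True,
    )
    for raw in proc.stdout:
        sys.stdout.write(colourise(raw))
    return proc.wait()


# Only wrap a command when one is given on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(run(sys.argv[1:]))
```

Saved as, say, colour_spark.py (a hypothetical name), it would be used as: python colour_spark.py spark-submit spark-helloWorld.py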

Related

How do I resolve "AttributeError: 'NoneType' object has no attribute 'origin'" when attempting to run pyspark on macOS

I have installed pyspark on macOS using brew but I'm getting the error when I type pyspark in zsh:
Traceback (most recent call last):
File "/opt/homebrew/bin/find_spark_home.py", line 86, in <module>
print(_find_spark_home())
File "/opt/homebrew/bin/find_spark_home.py", line 52, in _find_spark_home
module_home = os.path.dirname(find_spec("pyspark").origin)
AttributeError: 'NoneType' object has no attribute 'origin'
I've tried setting the path inside the pyspark script but then got
/opt//homebrew/Cellar/apache-spark/3.3.1/bin/load-spark-env.sh: line 2: /opt/homebrew/Cellar/apache-spark/3.3.1/libexec/bin/load-spark-env.sh: Permission denied
/opt//homebrew/Cellar/apache-spark/3.3.1/bin/load-spark-env.sh: line 2: exec: /opt/homebrew/Cellar/apache-spark/3.3.1/libexec/bin/load-spark-env.sh: cannot execute: Undefined error: 0
How do I resolve this error?
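For context, the traceback shows that find_spec("pyspark") returned None, which means the Python interpreter running find_spark_home.py could not find a pyspark package on its import path. A quick way to check this from any interpreter is sketched below; describe_module is a hypothetical helper name, and json is only used as an example of a module that is always present.

```python
from importlib.util import find_spec


def describe_module(name):
    """Report whether a top-level module can be located by this interpreter."""
    spec = find_spec(name)
    if spec is None:
        # This is the situation that leads to the AttributeError above.
        return f"{name}: not importable by this interpreter"
    return f"{name}: found at {spec.origin}"


print(describe_module("json"))     # standard library, always found
print(describe_module("pyspark"))  # depends on the local installation
```

If the second line reports that pyspark is not importable, the script is probably being run by a Python that has no pyspark package installed.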

I first had to locate the Apache Spark directory and copy it to /usr/local:
sudo cp -r /opt/homebrew/Cellar/apache-spark /usr/local/Cellar/
I found the Spark directory with sudo find /opt/ -name find_spark_home.py
Then I set the environment variables:
export SPARK_HOME=/usr/local/Cellar/apache-spark/3.3.1/libexec
export PATH=/usr/local/Cellar/apache-spark/3.3.1/bin:$PATH
After that, typing pyspark gives:
Python 3.9.6 (default, Oct 18 2022, 12:41:40)
[Clang 14.0.0 (clang-1400.0.29.202)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
23/01/23 13:31:11 WARN Utils: Your hostname, Reggies-MacBook-Pro.local resolves to a loopback address: 127.0.0.1; using 192.168.0.20 instead (on interface en0)
23/01/23 13:31:11 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
23/01/23 13:31:12 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Welcome to
____ __
/ __/__ ___ _____/ /__
_\ \/ _ \/ _ `/ __/ '_/
/__ / .__/\_,_/_/ /_/\_\ version 3.3.1
/_/
Using Python version 3.9.6 (default, Oct 18 2022 12:41:40)
Spark context Web UI available at http://192.168.0.20:4040
Spark context available as 'sc' (master = local[*], app id = local-1674498672860).
SparkSession available as 'spark'.

Warning issue when installing Hadoop 2.8.2 on Mac OS

I am attempting to install Hadoop 2.8.2 with Java 8 on Mac OS, but I ran into this warning after typing the hstart command in the terminal. (I initially had Java 9 installed, but switched to Java 8 because I thought Java 9 was causing the issue; the warning still remains.)
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.apache.hadoop.security.authentication.util.KerberosUtil (file:/usr/local/Cellar/hadoop/2.8.2/libexec/share/hadoop/common/lib/hadoop-auth-2.8.2.jar) to method sun.security.krb5.Config.getInstance()
WARNING: Please consider reporting this to the maintainers of org.apache.hadoop.security.authentication.util.KerberosUtil
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
18/02/04 23:38:54 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Starting namenodes on [localhost]
localhost: namenode running as process 34039. Stop it first.
localhost: datanode running as process 34125. Stop it first.
Starting secondary namenodes [0.0.0.0]
The authenticity of host '0.0.0.0 (0.0.0.0)' can't be established.
This is in my "/usr/local/Cellar/hadoop/2.8.2/libexec/etc/hadoop/hadoop-env.sh" file:
export HADOOP_OPTS="$HADOOP_OPTS -Djava.net.preferIPv4Stack=true -Djava.security.krb5.realm= -Djava.security.krb5.kdc="

Hadoop command `hadoop fs -ls` gives ConnectionRefused error

When I run a hadoop command such as hadoop fs -ls, I get the following error and warning:
16/08/04 11:24:12 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
ls: Call From master/172.17.100.54 to master:9000 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
Am I doing anything wrong with the hadoop path?

The Hadoop Native Libraries Guide says the NativeCodeLoader warning has to do with the installation. Please check the documentation to resolve it.
Native Hadoop Library
Hadoop has native implementations of certain components for performance reasons and for non-availability of Java implementations. These components are available in a single, dynamically-linked native library called the native hadoop library. On the *nix platforms the library is named libhadoop.so.
Please note the following:
It is mandatory to install both the zlib and gzip development packages on the target platform in order to build the native hadoop library; however, for deployment it is sufficient to install just one package if you wish to use only one codec.
It is necessary to have the correct 32/64 libraries for zlib, depending on the 32/64 bit jvm for the target platform, in order to build and deploy the native hadoop library.
Runtime
The bin/hadoop script ensures that the native hadoop library is on the library path via the system property: -Djava.library.path=<path>
During runtime, check the hadoop log files for your MapReduce tasks.
If everything is all right, then: DEBUG util.NativeCodeLoader - Trying to load the custom-built native-hadoop library... INFO util.NativeCodeLoader - Loaded the native-hadoop library
If something goes wrong, then: INFO util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Check
NativeLibraryChecker is a tool to check whether native libraries are loaded correctly. You can launch NativeLibraryChecker as follows
$ hadoop checknative -a
14/12/06 01:30:45 WARN bzip2.Bzip2Factory: Failed to load/initialize native-bzip2 library system-native, will use pure-Java version
14/12/06 01:30:45 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
Native library checking:
hadoop: true /home/ozawa/hadoop/lib/native/libhadoop.so.1.0.0
zlib: true /lib/x86_64-linux-gnu/libz.so.1
snappy: true /usr/lib/libsnappy.so.1
lz4: true revision:99
bzip2: false
Secondly, the Connection refused error is related to your setup, so please double-check it.
Also see the following as pointers:
Hadoop cluster setup - java.net.ConnectException: Connection refused
Hadoop - java.net.ConnectException: Connection refused
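As a first step in checking the setup, it can help to confirm whether anything is listening on the NameNode address at all. The sketch below is a generic TCP check, not a Hadoop API; can_connect is a hypothetical helper name, and the host and port to try are the ones from the error message in the question (master and 9000).

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds, else False."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False
```

For the case above, can_connect("master", 9000) returning False would suggest that the NameNode is not running or is bound to a different address or port than the client is using.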

Hadoop command error

I have installed Hadoop-2.4.0 on a single-node cluster. After starting dfs and yarn and running jps, I see the following services running:
6584 ResourceManager
5976 NameNode
6706 NodeManager
6407 SecondaryNameNode
6148 DataNode
7471 Jps
When I try to execute the following command I get the error
hduser#dhruv-VirtualBox:/usr/local/hadoop$ bin/hdfs dfs -mkdir /hello
OpenJDK 64-Bit Server VM warning: You have loaded library /usr/local/hadoop-2.4.0/lib/native/libhadoop.so.1.0.0 which might have disabled stack guard. The VM will try to fix the stack guard now.
It's highly recommended that you fix the library with 'execstack -c <libfile>', or link it with '-z noexecstack'.
14/10/22 12:21:36 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Can somebody please tell me what is wrong and how to fix it?
Thanks
Dhruv

You can ignore that if you want. It means that you are running 32-bit native libraries on a 64-bit runtime.
If the log still annoys you, then you need to build those native libraries on a 64-bit environment.
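To check whether the library really is 32-bit, you can look at its ELF header: the fifth byte is 1 for a 32-bit file and 2 for a 64-bit file. This is a small sketch that only reads that byte; elf_word_size is a hypothetical helper name, and the path to pass in is the libhadoop.so.1.0.0 path from the warning.

```python
def elf_word_size(path):
    """Return 32 or 64 for an ELF file, or None if the file is not ELF."""
    with open(path, "rb") as fh:
        header = fh.read(5)
    # ELF files start with the magic bytes 0x7f 'E' 'L' 'F'.
    if len(header) < 5 or header[:4] != b"\x7fELF":
        return None
    # Byte 4 is EI_CLASS: 1 means 32-bit, 2 means 64-bit.
    return {1: 32, 2: 64}.get(header[4])
```

For example, elf_word_size("/usr/local/hadoop-2.4.0/lib/native/libhadoop.so.1.0.0") returning 32 on a 64-bit JVM would be consistent with the explanation above.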

Command "hadoop fs -ls ." does not work

I think I have installed hadoop correctly. If I do jps I can see the namenode and datanode, no problem.
When I type hadoop fs -ls . I get the error:
Java HotSpot(TM) 64-Bit Server VM warning: You have loaded library /opt/db/hadoop-2.4.1/lib/native/libhadoop.so.1.0.0 which might have disabled stack guard. The VM will try to fix the stack guard now.
It's highly recommended that you fix the library with 'execstack -c <libfile>', or link it with '-z noexecstack'.
14/08/08 12:42:44 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
ls: '.': No such file or directory
When I type hadoop dfs -ls . I get the error:
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.
Java HotSpot(TM) 64-Bit Server VM warning: You have loaded library /opt/db/hadoop-2.4.1/lib/native/libhadoop.so.1.0.0 which might have disabled stack guard. The VM will try to fix the stack guard now.
It's highly recommended that you fix the library with 'execstack -c <libfile>', or link it with '-z noexecstack'.
14/08/08 12:43:27 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
ls: '.': No such file or directory
And when I type hadoop hdfs -ls . I get the error:
Error: Could not find or load main class hdfs
This is regardless of whether I put '.' or '/' or whatever directory I'm in.
What does this all mean? How can I get normal, expected output? What am I missing?

Use
hdfs dfs -ls ...
I don't think there is such a thing as hadoop hdfs.

Use the command as follows:
bin/hadoop fs -ls /
