macOS: pyarrow.lib.ArrowIOError: Unable to load libhdfs - hadoop

import pyarrow as pa
client = pa.hdfs.connect('localhost', 9000)
ERROR
Traceback (most recent call last):
File "/Users/wyx/project/py3.7aio/hdfs/list_dir.py", line 13, in <module>
client = pa.hdfs.connect('localhost', 9000)
File "/Users/wyx/project/py3.7aio/.env/lib/python3.6/site-packages/pyarrow/hdfs.py", line 207, in connect
extra_conf=extra_conf)
File "/Users/wyx/project/py3.7aio/.env/lib/python3.6/site-packages/pyarrow/hdfs.py", line 38, in __init__
self._connect(host, port, user, kerb_ticket, driver, extra_conf)
File "pyarrow/io-hdfs.pxi", line 89, in pyarrow.lib.HadoopFileSystem._connect
File "pyarrow/error.pxi", line 83, in pyarrow.lib.check_status
pyarrow.lib.ArrowIOError: Unable to load libhdfs
I installed Hadoop via brew, but it didn't come with any native libs, so I built Hadoop 3.1.1 following the Native Libraries Guide. Even so, I can't get the libhdfs.so that pyarrow needs; I only get libhdfs.dylib (a sketch of pointing pyarrow at that .dylib follows the checknative output below).
➜ native git:(branch-3.1.1) ✗ hadoop checknative -a
2019-02-24 22:05:31,686 WARN bzip2.Bzip2Factory: Failed to load/initialize native-bzip2 library system-native, will use pure-Java version
2019-02-24 22:05:31,689 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
2019-02-24 22:05:31,695 WARN erasurecode.ErasureCodeNative: ISA-L support is not available in your platform... using builtin-java codec where applicable
Native library checking:
hadoop: true /usr/local/Cellar/hadoop/3.1.1/libexec/lib/native/libhadoop.dylib
zlib: true /usr/lib/libz.1.dylib
zstd : false
snappy: true /usr/local/lib/libsnappy.1.dylib
lz4: true revision:10301
bzip2: false
openssl: false build does not support openssl.
ISA-L: false libhadoop was built without ISA-L support
2019-02-24 22:05:31,723 INFO util.ExitUtil: Exiting with status 1: ExitException
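A minimal sketch of pointing pyarrow at the .dylib, assuming pyarrow honors the ARROW_LIBHDFS_DIR environment variable on macOS and that libhdfs needs the Hadoop jars on CLASSPATH to start the JVM (the brew paths below are guesses; adjust to your layout):
import os
import subprocess
import pyarrow as pa

# Guessed paths for a brew-installed Hadoop 3.1.1; adjust to your layout.
os.environ["HADOOP_HOME"] = "/usr/local/Cellar/hadoop/3.1.1/libexec"
# Directory containing libhdfs.dylib (built via the Native Libraries Guide).
os.environ["ARROW_LIBHDFS_DIR"] = "/usr/local/Cellar/hadoop/3.1.1/libexec/lib/native"
# libhdfs loads the JVM, so the Hadoop jars must be on the classpath.
os.environ["CLASSPATH"] = subprocess.check_output(
    ["hadoop", "classpath", "--glob"]).decode().strip()

client = pa.hdfs.connect("localhost", 9000)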

Related

How do I resolve "AttributeError: 'NoneType' object has no attribute 'origin'" when attempting to run pyspark on macOS

I have installed pyspark on macOS using brew, but I get the following error when I type pyspark in zsh:
Traceback (most recent call last):
File "/opt/homebrew/bin/find_spark_home.py", line 86, in <module>
print(_find_spark_home())
File "/opt/homebrew/bin/find_spark_home.py", line 52, in _find_spark_home
module_home = os.path.dirname(find_spec("pyspark").origin)
AttributeError: 'NoneType' object has no attribute 'origin'
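Reading the traceback, find_spec("pyspark") is returning None, i.e. the pyspark package is not importable from the interpreter running find_spark_home.py, so None.origin raises. A minimal check that isolates just that failure mode:
from importlib.util import find_spec

# find_spec returns None when the module is not importable from this interpreter.
spec = find_spec("pyspark")
if spec is None:
    print("pyspark is not importable from this Python")
else:
    print("pyspark found at", spec.origin)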
I've tried setting the path inside the pyspark script, but then got:
/opt//homebrew/Cellar/apache-spark/3.3.1/bin/load-spark-env.sh: line 2: /opt/homebrew/Cellar/apache-spark/3.3.1/libexec/bin/load-spark-env.sh: Permission denied
/opt//homebrew/Cellar/apache-spark/3.3.1/bin/load-spark-env.sh: line 2: exec: /opt/homebrew/Cellar/apache-spark/3.3.1/libexec/bin/load-spark-env.sh: cannot execute: Undefined error: 0
How do I resolve this error?
I first had to locate and copy the Apache Spark directory to /usr/local:
sudo cp -r /opt/homebrew/Cellar/apache-spark /usr/local/Cellar/
I found the spark directory with sudo find /opt/ -name find_spark_home.py
Then I set the environment variables:
export SPARK_HOME=/usr/local/Cellar/apache-spark/3.3.1/libexec
export PATH=/usr/local/Cellar/apache-spark/3.3.1/bin:$PATH
After that, typing pyspark gives:
Python 3.9.6 (default, Oct 18 2022, 12:41:40)
[Clang 14.0.0 (clang-1400.0.29.202)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
23/01/23 13:31:11 WARN Utils: Your hostname, Reggies-MacBook-Pro.local resolves to a loopback address: 127.0.0.1; using 192.168.0.20 instead (on interface en0)
23/01/23 13:31:11 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
23/01/23 13:31:12 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/ '_/
   /__ / .__/\_,_/_/ /_/\_\   version 3.3.1
      /_/
Using Python version 3.9.6 (default, Oct 18 2022 12:41:40)
Spark context Web UI available at http://192.168.0.20:4040
Spark context available as 'sc' (master = local[*], app id = local-1674498672860).
SparkSession available as 'spark'.

Could not load snappy native libraries for HBase

I have been trying different approaches and reading various blogs, but I still cannot get the snappy library check to report true.
OS in use: CentOS 6.9
Java Version & Path
java -version
java version "1.8.0_121"
Java(TM) SE Runtime Environment (build 1.8.0_121-b13)
Java HotSpot(TM) 64-Bit Server VM (build 25.121-b13, mixed mode)
[root@hadoop1 bin]# $JAVA_HOME
-bash: /usr/local/jdk1.8.0_121: is a directory
Output of hadoop checknative -a:
17/10/26 11:16:13 WARN bzip2.Bzip2Factory: Failed to load/initialize native-bzip2 library system-native, will use pure-Java version
17/10/26 11:16:13 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
Native library checking:
hadoop: true /usr/local/hadoop-2.7.1/lib/native/libhadoop.so
zlib: true /lib64/libz.so.1
snappy: false
lz4: true revision:99
bzip2: false
openssl: false Cannot load libcrypto.so (libcrypto.so: cannot open shared object file: No such file or directory)!
17/10/26 11:16:13 INFO util.ExitUtil: Exiting with status 1
hbase org.apache.hadoop.util.NativeLibraryChecker
2017-10-26 10:46:07,878 WARN [main] bzip2.Bzip2Factory: Failed to load/initialize native-bzip2 library system-native, will use pure-Java version
2017-10-26 10:46:07,881 INFO [main] zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
Native library checking:
hadoop: true /usr/local/hadoop-2.7.1/lib/native/libhadoop.so
zlib: true /lib64/libz.so.1
snappy: false
lz4: true revision:99
bzip2: false
A few lines from hbase-env.sh:
export JAVA_HOME="/usr/local/jdk1.8.0_121"
export HBASE_LIBRARY_PATH=/usr/local/hadoop-2.7.1/lib/native/Linux-amd64-64:/usr/local/hadoop-2.7.1/lib/native
(the following line is commented out for now; I tried uncommenting it too)
export LD_LIBRARY_PATH=/usr/local/hbase-1.2.6/lib/native/Linux-amd64-64
export JAVA_LIBRARY_PATH=$JAVA_LIBRARY_PATH:/usr/local/hadoop-2.7.1/lib/native
I have all the required *.so files in those paths.
I also checked the output of ps -ef | grep hbase to see which library paths HBase actually searches.
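To rule out loader-path issues, one can also try to dlopen the same libraries directly (a sketch; the directories come from hbase-env.sh above and may need adjusting):
import ctypes
import os

# Directories from hbase-env.sh above; adjust to your installation.
native_dirs = [
    "/usr/local/hadoop-2.7.1/lib/native",
    "/usr/local/hadoop-2.7.1/lib/native/Linux-amd64-64",
]

for d in native_dirs:
    for lib in ("libhadoop.so", "libsnappy.so.1"):
        path = os.path.join(d, lib)
        try:
            ctypes.CDLL(path)  # same probe NativeLibraryChecker effectively does
            print("OK     ", path)
        except OSError as e:
            print("FAILED ", path, "-", e)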

Is NativeLoader supported on Windows?

I have built Hadoop 2.7.3 from source; everything succeeded. I am using a prebuilt Spark 2.0 binary with Hadoop 2.7 support. When I start spark-shell, I get this warning:
16/09/23 14:53:24 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
hadoop checknative -a gives me
16/09/23 14:59:47 WARN bzip2.Bzip2Factory: Failed to load/initialize native-bzip2 library system-native, will use pure-Java version
16/09/23 14:59:47 WARN zlib.ZlibFactory: Failed to load/initialize native-zlib library
Native library checking:
hadoop: true D:\hadoop-2.7.3\bin\hadoop.dll
zlib: false
snappy: false
lz4: true revision:99
bzip2: false
openssl: false build does not support openssl.
winutils: true D:\hadoop-2.7.3\bin\winutils.exe
16/09/23 14:59:47 INFO util.ExitUtil: Exiting with status 1
Do I have to build native versions of all the libraries? I checked the Hadoop build instructions and could not find any information about building the others.
Or maybe something is misconfigured in my Spark, but I could not figure out what. I have these environment variables set for Spark:
set HADOOP_HOME=D:/hadoop-2.7.3
set HADOOP_CONF_DIR=%HADOOP_HOME%/etc/hadoop
set SPARK_HOME=D:/spark-2.0.0-bin-hadoop2.7
set HADOOP_COMMON_LIB_NATIVE_DIR=%HADOOP_HOME%/bin
set SPARK_LOCAL_IP=
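One way to see which directories the JVM actually searches (and hence where NativeCodeLoader would look for hadoop.dll) is to print java.library.path from a running session. A diagnostic sketch using PySpark's internal py4j gateway (_jvm is not public API):
from pyspark.sql import SparkSession

# _jvm is py4j's gateway into the driver JVM (internal, diagnostic use only).
spark = SparkSession.builder.getOrCreate()
jvm = spark.sparkContext._jvm
print(jvm.java.lang.System.getProperty("java.library.path"))
print(jvm.java.lang.System.getProperty("os.arch"))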

Hadoop command `hadoop fs -ls` gives ConnectionRefused error

When I run a Hadoop command like hadoop fs -ls, I get the following error/warnings:
16/08/04 11:24:12 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
ls: Call From master/172.17.100.54 to master:9000 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
Am I doing something wrong with the Hadoop path?
The Hadoop Native Libraries Guide says it is something to do with the installation; please check the documentation below to resolve this.
Native Hadoop Library
Hadoop has native implementations of certain components for performance reasons and for non-availability of Java implementations. These components are available in a single, dynamically-linked native library called the native hadoop library. On the *nix platforms the library is named libhadoop.so.
Please note the following:
It is mandatory to install both the zlib and gzip development packages on the target platform in order to build the native hadoop library; however, for deployment it is sufficient to install just one package if you wish to use only one codec.
It is necessary to have the correct 32/64 libraries for zlib, depending on the 32/64 bit jvm for the target platform, in order to build and deploy the native hadoop library.
Runtime
The bin/hadoop script ensures that the native hadoop library is on the library path via the system property: -Djava.library.path=<path>
During runtime, check the hadoop log files for your MapReduce tasks.
If everything is all right, then:
DEBUG util.NativeCodeLoader - Trying to load the custom-built native-hadoop library...
INFO util.NativeCodeLoader - Loaded the native-hadoop library
If something goes wrong, then:
INFO util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Check
NativeLibraryChecker is a tool to check whether native libraries are loaded correctly. You can launch NativeLibraryChecker as follows
$ hadoop checknative -a
14/12/06 01:30:45 WARN bzip2.Bzip2Factory: Failed to load/initialize native-bzip2 library system-native, will use pure-Java version
14/12/06 01:30:45 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
Native library checking:
hadoop: true /home/ozawa/hadoop/lib/native/libhadoop.so.1.0.0
zlib: true /lib/x86_64-linux-gnu/libz.so.1
snappy: true /usr/lib/libsnappy.so.1
lz4: true revision:99
bzip2: false
Second, the Connection refused error is related to your setup; please double-check it.
Also see the links below as pointers:
Hadoop cluster setup - java.net.ConnectException: Connection refused
Hadoop - java.net.ConnectException: Connection refused
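For the connection-refused half, a quick way to confirm whether anything is listening on the NameNode address (host and port taken from the error message above):
import socket

# Host/port from the error message above; adjust if your NameNode differs.
host, port = "master", 9000
try:
    with socket.create_connection((host, port), timeout=5):
        print(host, port, "is accepting connections")
except OSError as e:
    print(host, port, "refused or unreachable:", e)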

WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

alpesh#alpesh-Inspiron-3647:~/hadoop-2.7.2/sbin$ hadoop fs -ls
16/07/05 13:59:17 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
It also shows me the following output:
hadoop checknative -a
16/07/05 14:00:42 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Native library checking:
hadoop: false
zlib: false
snappy: false
lz4: false
bzip2: false
openssl: false
16/07/05 14:00:42 INFO util.ExitUtil: Exiting with status 1
Please help me solve this.
The library you are using was compiled for 32-bit, but you are running a 64-bit JVM. Open your .bashrc file, where the Hadoop configuration lives, and go to this line:
export HADOOP_OPTS="-Djava.library.path=$HADOOP_INSTALL/lib"
and replace it with
export HADOOP_OPTS="-Djava.library.path=$HADOOP_INSTALL/lib/native"
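To verify the 32/64-bit claim for your own libhadoop.so, you can read the ELF class byte directly (a sketch; the path is hypothetical, substitute your $HADOOP_INSTALL/lib/native):
# EI_CLASS, byte 4 of the ELF header: 1 = 32-bit, 2 = 64-bit.
path = "/usr/local/hadoop/lib/native/libhadoop.so"  # hypothetical; adjust
with open(path, "rb") as f:
    header = f.read(5)
if header[:4] != b"\x7fELF":
    print(path, "is not an ELF file")
else:
    print(path, "is", {1: "32-bit", 2: "64-bit"}.get(header[4], "unknown"))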
To get rid of this error:
Suppose the jar file is at /home/cloudera/test.jar and the class file is at /home/cloudera/workspace/MapReduce/bin/mapreduce/WordCount, where mapreduce is the package name.
The input file mytext.txt is at /user/process/mytext.txt and the output location is /user/out.
We should run this MapReduce program in the following way:
$ hadoop jar /home/cloudera/test.jar mapreduce.WordCount /user/process /user/out
Add these properties to the Hadoop user's bash profile and the issue will be solved:
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_COMMON_LIB_NATIVE_DIR"
It's just a warning: Hadoop cannot find the correct native library, either because it was not compiled for your platform or because it does not exist.
If I were you, I would simply silence it.
To do that, add the following line to the corresponding log4j configuration file:
log4j.logger.org.apache.hadoop.util.NativeCodeLoader=ERROR
