Apache Arrow connectivity issue with HDFS (remote file system) - Parquet

I want to use pyarrow to read and write Parquet files in HDFS, but I am facing a connectivity issue.
I have installed pyarrow and pandas, and I am now trying to connect to HDFS on a remote machine.
Reference link - https://towardsdatascience.com/a-gentle-introduction-to-apache-arrow-with-apache-spark-and-pandas-bb19ffe0ddae
import pyarrow as pa

host = '172.17.0.2'  # HDFS NameNode host
port = 8020          # NameNode RPC port
fs = pa.hdfs.connect(host, port)
Error message:
>>> fs = pa.hdfs.connect(host, port)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib64/python2.7/site-packages/pyarrow/hdfs.py", line 211, in connect
extra_conf=extra_conf)
File "/usr/lib64/python2.7/site-packages/pyarrow/hdfs.py", line 36, in __init__
_maybe_set_hadoop_classpath()
File "/usr/lib64/python2.7/site-packages/pyarrow/hdfs.py", line 136, in _maybe_set_hadoop_classpath
classpath = _hadoop_classpath_glob('hadoop')
File "/usr/lib64/python2.7/site-packages/pyarrow/hdfs.py", line 161, in _hadoop_classpath_glob
return subprocess.check_output(hadoop_classpath_args)
File "/usr/lib64/python2.7/subprocess.py", line 568, in check_output
process = Popen(stdout=PIPE, *popenargs, **kwargs)
File "/usr/lib64/python2.7/subprocess.py", line 711, in __init__
errread, errwrite)
File "/usr/lib64/python2.7/subprocess.py", line 1327, in _execute_child
raise child_exception
OSError: [Errno 2] No such file or directory
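The OSError: [Errno 2] above is raised before any connection attempt: pyarrow shells out to the hadoop executable (to run hadoop classpath --glob, as the traceback shows) and the binary is not found on the client machine. A minimal sketch of a setup that usually lets pa.hdfs.connect work, assuming a local Hadoop client installed under /usr/local/hadoop (the paths below are placeholders, adjust them to your environment):

import os
import pyarrow as pa

# Assumed locations of the local Hadoop client and JVM; adjust to your install.
os.environ['HADOOP_HOME'] = '/usr/local/hadoop'
os.environ['JAVA_HOME'] = '/usr/lib/jvm/java-8-openjdk'
# Directory containing libhdfs.so, which pyarrow loads to talk to HDFS.
os.environ['ARROW_LIBHDFS_DIR'] = os.path.join(os.environ['HADOOP_HOME'], 'lib', 'native')
# Put the hadoop binary on PATH so pyarrow can run `hadoop classpath --glob`.
os.environ['PATH'] = os.path.join(os.environ['HADOOP_HOME'], 'bin') + os.pathsep + os.environ['PATH']

fs = pa.hdfs.connect('172.17.0.2', 8020)
print(fs.ls('/'))  # quick sanity check that the NameNode is reachable

If the hadoop command already works in your shell, exporting CLASSPATH=`hadoop classpath --glob` before starting Python should have the same effect, since pyarrow skips the classpath lookup when CLASSPATH is already set.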

Related

Spyder latest version

Did anyone face this issue after downloading the latest Anaconda3 and opening Spyder?
An error occurred while starting the kernel. The error is:
Traceback (most recent call last):
File "C:\ProgramData\Anaconda3\lib\site‑packages\spyder\plugins\ipythonconsole\plugin.py", line 1173, in create_kernel_manager_and_kernel_client
kernel_manager.start_kernel(stderr=stderr_handle, **kwargs)
File "C:\Users\user\AppData\Roaming\Python\Python37\site‑packages\jupyter_client\manager.py", line 301, in start_kernel
kernel_cmd, kw = self.pre_start_kernel(**kw)
File "C:\Users\user\AppData\Roaming\Python\Python37\site‑packages\jupyter_client\manager.py", line 248, in pre_start_kernel
self.write_connection_file()
File "C:\Users\user\AppData\Roaming\Python\Python37\site‑packages\jupyter_client\connect.py", line 474, in write_connection_file
kernel_name=self.kernel_name
File "C:\Users\user\AppData\Roaming\Python\Python37\site‑packages\jupyter_client\connect.py", line 138, in write_connection_file
with secure_write(fname) as f:
File "C:\ProgramData\Anaconda3\lib\contextlib.py", line 112, in __enter__
return next(self.gen)
File "C:\Users\user\AppData\Roaming\Python\Python37\site‑packages\jupyter_core\paths.py", line 435, in secure_write
win32_restrict_file_to_user(fname)
File "C:\Users\user\AppData\Roaming\Python\Python37\site‑packages\jupyter_core\paths.py", line 361, in win32_restrict_file_to_user
import win32api
ImportError: DLL load failed: %1 is not a valid Win32 application.
Any help will be appreciated!
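For reference, this ImportError usually points at the pywin32 package (which provides win32api) rather than Spyder itself, most often a 32-bit/64-bit mismatch or a broken copy in the per-user site-packages. A small check you can run in the same interpreter, as a sketch assuming pywin32 is the culprit:

import struct
import platform

# A 32-bit pywin32 DLL loaded into a 64-bit Python (or vice versa) fails with
# "DLL load failed: %1 is not a valid Win32 application".
print('Python', platform.python_version(), struct.calcsize('P') * 8, 'bit')

try:
    import win32api  # provided by pywin32
    print('pywin32 imports fine')
except ImportError as exc:
    print('pywin32 problem:', exc)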

Hortonworks HDP 3: Error starting ResourceManager

I have installed a new HDP 3 cluster using Ambari 2.7.
The problem is that the ResourceManager service is not starting.
I get the following error:
Traceback (most recent call last):
File "/var/lib/ambari-agent/cache/stacks/HDP/3.0/services/YARN/package/scripts/resourcemanager.py", line 275, in <module>
Resourcemanager().execute()
File "/usr/lib/ambari-agent/lib/resource_management/libraries/script/script.py", line 353, in execute
method(env)
File "/var/lib/ambari-agent/cache/stacks/HDP/3.0/services/YARN/package/scripts/resourcemanager.py", line 158, in start
service('resourcemanager', action='start')
File "/usr/lib/ambari-agent/lib/ambari_commons/os_family_impl.py", line 89, in thunk
return fn(*args, **kwargs)
File "/var/lib/ambari-agent/cache/stacks/HDP/3.0/services/YARN/package/scripts/service.py", line 92, in service
Execute(daemon_cmd, user = usr, not_if = check_process)
File "/usr/lib/ambari-agent/lib/resource_management/core/base.py", line 166, in __init__
self.env.run()
File "/usr/lib/ambari-agent/lib/resource_management/core/environment.py", line 160, in run
self.run_action(resource, action)
File "/usr/lib/ambari-agent/lib/resource_management/core/environment.py", line 124, in run_action
provider_action()
File "/usr/lib/ambari-agent/lib/resource_management/core/providers/system.py", line 263, in action_run
returns=self.resource.returns)
File "/usr/lib/ambari-agent/lib/resource_management/core/shell.py", line 72, in inner
result = function(command, **kwargs)
File "/usr/lib/ambari-agent/lib/resource_management/core/shell.py", line 102, in checked_call
tries=tries, try_sleep=try_sleep, timeout_kill_strategy=timeout_kill_strategy, returns=returns)
File "/usr/lib/ambari-agent/lib/resource_management/core/shell.py", line 150, in _call_wrapper
result = _call(command, **kwargs_copy)
File "/usr/lib/ambari-agent/lib/resource_management/core/shell.py", line 314, in _call
raise ExecutionFailed(err_msg, code, out, err)
resource_management.core.exceptions.ExecutionFailed: Execution of 'ulimit -c unlimited; export HADOOP_LIBEXEC_DIR=/usr/hdp/3.0.0.0-1634/hadoop/libexec && /usr/hdp/3.0.0.0-1634/hadoop-yarn/bin/yarn --config /usr/hdp/3.0.0.0-1634/hadoop/conf --daemon start resourcemanager' returned 1.

Permission denied error while installing PixieDust for Jupyter

Team, I'm getting a permission denied error ([Errno 13] Permission denied) while extracting the Spark package (2.1.0) during a local installation. I have admin access to the folders and verified this in Security as well. Any pointers will be helpful.
Environment details
OS - **Windows 7**
(C:\conda) C:\conda>conda --version
**conda 4.3.30**
(C:\conda) C:\conda>python --version
**Python 3.6.3 :: Anaconda, Inc.**
(C:\conda) C:\conda>jupyter --version
**4.3.0**
(C:\conda) C:\conda>pip --version
**pip 9.0.1** from C:\conda\lib\site-packages (python 3.6)
From the Anaconda terminal: jupyter pixiedust install
Step 1: **PIXIEDUST_HOME**: C:\conda\pixiedust-master
Keep y/n [y]? **y**
Step 2: **SPARK_HOME**: C:\conda\pixiedust-master\bin\spark
Keep y/n [y]? y
Directory C:\conda\pixiedust-master\bin\spark does not contain a valid SPARK install
**Download Spark** y/n [y]? y
What version would you like to download? 1.6.3, 2.0.2, 2.1.0, 2.2.0 [2.2.0]: **2.1.0**
82%
100%
Error details below:
Extracting Spark 2.1.0 to C:\conda\pixiedust-master\bin\spark
Traceback (most recent call last):
File "c:\conda\lib\runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "c:\conda\lib\runpy.py", line 85, in _run_code
exec(code, run_globals)
File "C:\conda\Scripts\jupyter-pixiedust.EXE__main__.py", line 9, in <module>
File "c:\conda\lib\site-packages\install\pixiedustapp.py", line 41, in main
PixiedustJupyterApp.launch_instance()
File "c:\conda\lib\site-packages\traitlets\config\application.py", line 657, in launch_instance
app.initialize(argv)
File "", line 2, in initialize
File "c:\conda\lib\site-packages\traitlets\config\application.py", line 87, in catch_config_error
return method(app, *args, **kwargs)
File "c:\conda\lib\site-packages\traitlets\config\application.py", line 296, in initialize
self.parse_command_line(argv)
File "", line 2, in parse_command_line
File "c:\conda\lib\site-packages\traitlets\config\application.py", line 87, in catch_config_error
return method(app, *args, **kwargs)
File "c:\conda\lib\site-packages\traitlets\config\application.py", line 514, in parse_command_line
return self.initialize_subcommand(subc, subargv)
File "", line 2, in initialize_subcommand
File "c:\conda\lib\site-packages\traitlets\config\application.py", line 87, in catch_config_error
return method(app, *args, **kwargs)
File "c:\conda\lib\site-packages\traitlets\config\application.py", line 452, in initialize_subcommand
self.subapp.initialize(argv)
File "", line 2, in initialize
File "c:\conda\lib\site-packages\traitlets\config\application.py", line 87, in catch_config_error
return method(app, *args, **kwargs)
File "c:\conda\lib\site-packages\jupyter_core\application.py", line 239, in initialize
self.parse_command_line(argv)
File "c:\conda\lib\site-packages\install\createKernel.py", line 150, in parse_command_line
self.download_spark(silent, silent_spark_version)
File "c:\conda\lib\site-packages\install\createKernel.py", line 409, in download_spark
self.extract_temp_file(temp_file, self.spark_home)
File "c:\conda\lib\site-packages\install\createKernel.py", line 478, in extract_temp_file
tar = tarfile.open(temp_file.name, "r:gz")
File "c:\conda\lib\tarfile.py", line 1586, in open
return func(name, filemode, fileobj, **kwargs)
File "c:\conda\lib\tarfile.py", line 1633, in gzopen
fileobj = gzip.GzipFile(name, mode + "b", compresslevel, fileobj)
File "c:\conda\lib\gzip.py", line 163, in __init__
fileobj = self.myfileobj = builtins.open(filename, mode or 'rb')
PermissionError: [Errno 13] Permission denied: 'C:\Temp\tmpnt0i718r.tgz'
Unfortunately, Windows is not a platform supported by the PixieDust install script, as mentioned in the PixieDust documentation: https://ibm-watson-data-lab.github.io/pixiedust/install.html.
As a workaround, I suggest using a Docker container with Linux or macOS.
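For context on why the extraction fails specifically on Windows: the traceback shows the installer reopening temp_file.name while the temporary file is presumably still held open, and on Windows a named temporary file cannot be opened a second time while the original handle is open, which surfaces as exactly this [Errno 13]. A minimal sketch of the restriction (the installer internals are assumed, not shown here):

import tempfile

# On Windows a NamedTemporaryFile cannot be reopened by name while the original
# handle is still open; code must close it first (hence delete=False).
with tempfile.NamedTemporaryFile(suffix='.tgz', delete=False) as tmp:
    archive_path = tmp.name  # the Spark download would be written here
print('safe to reopen only after the handle above is closed:', archive_path)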

Jupyter does not run with SageMath installed

I am running on a Mac with both SageMath and Anaconda installed.
Sage works fine, but Jupyter Notebook doesn't run.
I get the following error:
Rois-MBP:~ roi$ /anaconda/bin/jupyter_mac.command ; exit;
[W 22:32:09.192 NotebookApp] Unrecognized JSON config file version, assuming version 1
Traceback (most recent call last):
File "/anaconda/bin/jupyter-notebook", line 6, in <module>
sys.exit(notebook.notebookapp.main())
File "//anaconda/lib/python3.5/site-packages/jupyter_core/application.py", line 267, in launch_instance
return super(JupyterApp, cls).launch_instance(argv=argv, **kwargs)
File "//anaconda/lib/python3.5/site-packages/traitlets/config/application.py", line 595, in launch_instance
app.initialize(argv)
File "<decorator-gen-7>", line 2, in initialize
File "//anaconda/lib/python3.5/site-packages/traitlets/config/application.py", line 74, in catch_config_error
return method(app, *args, **kwargs)
File "//anaconda/lib/python3.5/site-packages/notebook/notebookapp.py", line 1069, in initialize
self.init_configurables()
File "//anaconda/lib/python3.5/site-packages/notebook/notebookapp.py", line 837, in init_configurables
parent=self,
File "//anaconda/lib/python3.5/site-packages/nb_conda_kernels/manager.py", line 19, in __init__
specs = self.find_kernel_specs() or {}
File "//anaconda/lib/python3.5/site-packages/nb_conda_kernels/manager.py", line 129, in find_kernel_specs
self.conda_info = self._conda_info()
File "//anaconda/lib/python3.5/site-packages/nb_conda_kernels/manager.py", line 29, in _conda_info
p = subprocess.check_output(["conda", "info", "--json"]
File "//anaconda/lib/python3.5/subprocess.py", line 626, in check_output
**kwargs).stdout
File "//anaconda/lib/python3.5/subprocess.py", line 693, in run
with Popen(*popenargs, **kwargs) as process:
File "//anaconda/lib/python3.5/subprocess.py", line 947, in __init__
restore_signals, start_new_session)
File "//anaconda/lib/python3.5/subprocess.py", line 1551, in _execute_child
raise child_exception_type(errno_num, err_msg)
FileNotFoundError: [Errno 2] No such file or directory: 'conda'
logout
Saving session...
...copying shared history...
...saving history...truncating history files...
...completed.
Other apps like Spyder do run successfully.
Can I solve it somehow?
I found a way to run Jupyter Notebook through Sage:
just run ./sage -n jupyter from the terminal.
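The underlying failure is that nb_conda_kernels shells out to conda info --json and the conda executable is not on PATH in the environment the launcher runs in. A quick check from Python that mirrors the call in the traceback (a sketch, assuming Python 3):

import shutil
import subprocess

# nb_conda_kernels runs `conda info --json`; if `conda` is not on PATH the
# subprocess call raises FileNotFoundError, exactly as in the traceback above.
print('conda found at:', shutil.which('conda'))

if shutil.which('conda'):
    info = subprocess.check_output(['conda', 'info', '--json'])
    print(info[:60], b'...')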

Can't run a topology in Apache Storm

I'm using Petrel 0.9.3 and Apache Storm of the same version. When I try to run a topology, I get the following error:
(petrel)[root@localhost example1]# petrel submit --config topology.yaml --logdir `pwd`
[Errno 2] No such file or directory
Traceback (most recent call last):
File "/usr/local/petrel/lib/python2.7/site-packages/petrel-0.9.3.0.3-py2.7.egg/petrel/cmdline.py", line 111, in main
func(**args.__dict__)
File "/usr/local/petrel/lib/python2.7/site-packages/petrel-0.9.3.0.3-py2.7.egg/petrel/cmdline.py", line 32, in submit
sourcejar = get_sourcejar()
File "/usr/local/petrel/lib/python2.7/site-packages/petrel-0.9.3.0.3-py2.7.egg/petrel/cmdline.py", line 23, in get_sourcejar
storm_version = get_storm_version()
File "/usr/local/petrel/lib/python2.7/site-packages/petrel-0.9.3.0.3-py2.7.egg/petrel/cmdline.py", line 17, in get_storm_version
version = subprocess.check_output(['storm', 'version']).strip()
File "/usr/lib64/python2.7/subprocess.py", line 568, in check_output
process = Popen(stdout=PIPE, *popenargs, **kwargs)
File "/usr/lib64/python2.7/subprocess.py", line 711, in __init__
errread, errwrite)
File "/usr/lib64/python2.7/subprocess.py", line 1327, in _execute_child
raise child_exception
I had to add the storm bin location to $PATH
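For anyone hitting the same thing: Petrel discovers the Storm version by running storm version through subprocess (see the traceback), so the storm binary must be resolvable from PATH. A small sanity check before submitting, as a sketch:

import subprocess

# Petrel runs `storm version` to pick the matching source jar; this only works
# if the storm binary is on PATH.
try:
    version = subprocess.check_output(['storm', 'version']).decode()
    print('storm is on PATH:', version.strip().splitlines()[0])
except OSError:
    print('storm not found on PATH; add the Storm bin directory to PATH first')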
