I want to use pyarrow to read and write Parquet files in HDFS, but I am facing a connectivity issue.
I installed pyarrow and pandas, and now I am trying to connect to HDFS on a remote machine.
Reference link - https://towardsdatascience.com/a-gentle-introduction-to-apache-arrow-with-apache-spark-and-pandas-bb19ffe0ddae
import pyarrow as pa
host = '172.17.0.2'
port = 8020
fs = pa.hdfs.connect(host, port)
Error message:
>>> fs = pa.hdfs.connect(host, port)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib64/python2.7/site-packages/pyarrow/hdfs.py", line 211, in connect
extra_conf=extra_conf)
File "/usr/lib64/python2.7/site-packages/pyarrow/hdfs.py", line 36, in __init__
_maybe_set_hadoop_classpath()
File "/usr/lib64/python2.7/site-packages/pyarrow/hdfs.py", line 136, in _maybe_set_hadoop_classpath
classpath = _hadoop_classpath_glob('hadoop')
File "/usr/lib64/python2.7/site-packages/pyarrow/hdfs.py", line 161, in _hadoop_classpath_glob
return subprocess.check_output(hadoop_classpath_args)
File "/usr/lib64/python2.7/subprocess.py", line 568, in check_output
process = Popen(stdout=PIPE, *popenargs, **kwargs)
File "/usr/lib64/python2.7/subprocess.py", line 711, in __init__
errread, errwrite)
File "/usr/lib64/python2.7/subprocess.py", line 1327, in _execute_child
raise child_exception
OSError: [Errno 2] No such file or directory
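The traceback shows that pyarrow shells out to the hadoop CLI (via _hadoop_classpath_glob) to build the Java classpath, so OSError: [Errno 2] No such file or directory means the hadoop executable could not be found on PATH. A minimal sketch of setting up the environment before connecting, assuming typical install locations (the paths below are hypothetical; point them at your actual Hadoop and JVM installs):

import os
import pyarrow as pa

# Hypothetical locations -- adjust to your cluster
os.environ["JAVA_HOME"] = "/usr/lib/jvm/java"               # JVM used by libhdfs
os.environ["HADOOP_HOME"] = "/opt/hadoop"                   # must contain bin/hadoop
os.environ["ARROW_LIBHDFS_DIR"] = "/opt/hadoop/lib/native"  # directory holding libhdfs.so

fs = pa.hdfs.connect("172.17.0.2", 8020)
print(fs.ls("/"))  # quick smoke test: list the HDFS root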
Related
Did anyone face this issue after downloading the latest Anaconda3 and opening Spyder?
An error occurred while starting the kernel. The error is:
Traceback (most recent call last):
File "C:\ProgramData\Anaconda3\lib\site‑packages\spyder\plugins\ipythonconsole\plugin.py", line 1173, in create_kernel_manager_and_kernel_client
kernel_manager.start_kernel(stderr=stderr_handle, **kwargs)
File "C:\Users\user\AppData\Roaming\Python\Python37\site‑packages\jupyter_client\manager.py", line 301, in start_kernel
kernel_cmd, kw = self.pre_start_kernel(**kw)
File "C:\Users\user\AppData\Roaming\Python\Python37\site‑packages\jupyter_client\manager.py", line 248, in pre_start_kernel
self.write_connection_file()
File "C:\Users\user\AppData\Roaming\Python\Python37\site‑packages\jupyter_client\connect.py", line 474, in write_connection_file
kernel_name=self.kernel_name
File "C:\Users\user\AppData\Roaming\Python\Python37\site‑packages\jupyter_client\connect.py", line 138, in write_connection_file
with secure_write(fname) as f:
File "C:\ProgramData\Anaconda3\lib\contextlib.py", line 112, in __enter__
return next(self.gen)
File "C:\Users\user\AppData\Roaming\Python\Python37\site‑packages\jupyter_core\paths.py", line 435, in secure_write
win32_restrict_file_to_user(fname)
File "C:\Users\user\AppData\Roaming\Python\Python37\site‑packages\jupyter_core\paths.py", line 361, in win32_restrict_file_to_user
import win32api
ImportError: DLL load failed: %1 is not a valid Win32 application.
Any help will be appreciated!
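The failing import is win32api from pywin32, and "DLL load failed: %1 is not a valid Win32 application" usually indicates a 32-bit/64-bit mismatch between the interpreter and the loaded DLLs (note that the traceback mixes the Anaconda install under C:\ProgramData with user packages under AppData\Roaming). A quick check, as a sketch:

import struct
import sys

# The pywin32 DLLs must match this interpreter's bitness
print(sys.version)
print(struct.calcsize("P") * 8, "-bit interpreter", sep="")

If the bitness differs, reinstalling pywin32 for the matching interpreter (for example, pip install --force-reinstall pywin32) is a common fix, though I can't confirm that is the issue here.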
I have installed a new HDP 3 cluster using Ambari 2.7.
The problem is that the ResourceManager service is not starting.
I get the following error:
Traceback (most recent call last):
File "/var/lib/ambari-agent/cache/stacks/HDP/3.0/services/YARN/package/scripts/resourcemanager.py", line 275, in <module>
Resourcemanager().execute()
File "/usr/lib/ambari-agent/lib/resource_management/libraries/script/script.py", line 353, in execute
method(env)
File "/var/lib/ambari-agent/cache/stacks/HDP/3.0/services/YARN/package/scripts/resourcemanager.py", line 158, in start
service('resourcemanager', action='start')
File "/usr/lib/ambari-agent/lib/ambari_commons/os_family_impl.py", line 89, in thunk
return fn(*args, **kwargs)
File "/var/lib/ambari-agent/cache/stacks/HDP/3.0/services/YARN/package/scripts/service.py", line 92, in service
Execute(daemon_cmd, user = usr, not_if = check_process)
File "/usr/lib/ambari-agent/lib/resource_management/core/base.py", line 166, in __init__
self.env.run()
File "/usr/lib/ambari-agent/lib/resource_management/core/environment.py", line 160, in run
self.run_action(resource, action)
File "/usr/lib/ambari-agent/lib/resource_management/core/environment.py", line 124, in run_action
provider_action()
File "/usr/lib/ambari-agent/lib/resource_management/core/providers/system.py", line 263, in action_run
returns=self.resource.returns)
File "/usr/lib/ambari-agent/lib/resource_management/core/shell.py", line 72, in inner
result = function(command, **kwargs)
File "/usr/lib/ambari-agent/lib/resource_management/core/shell.py", line 102, in checked_call
tries=tries, try_sleep=try_sleep, timeout_kill_strategy=timeout_kill_strategy, returns=returns)
File "/usr/lib/ambari-agent/lib/resource_management/core/shell.py", line 150, in _call_wrapper
result = _call(command, **kwargs_copy)
File "/usr/lib/ambari-agent/lib/resource_management/core/shell.py", line 314, in _call
raise ExecutionFailed(err_msg, code, out, err)
resource_management.core.exceptions.ExecutionFailed: Execution of 'ulimit -c unlimited; export HADOOP_LIBEXEC_DIR=/usr/hdp/3.0.0.0-1634/hadoop/libexec && /usr/hdp/3.0.0.0-1634/hadoop-yarn/bin/yarn --config /usr/hdp/3.0.0.0-1634/hadoop/conf --daemon start resourcemanager' returned 1.
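The Ambari traceback only reports that the yarn --daemon start resourcemanager command returned 1; the real error is usually in the ResourceManager log (typically under /var/log/hadoop-yarn/). One way to surface it is to re-run the exact command from the traceback by hand, for example with this small sketch (the command is copied verbatim from the error above; Ambari runs it as the yarn user, so do the same):

import subprocess

cmd = ("ulimit -c unlimited; "
       "export HADOOP_LIBEXEC_DIR=/usr/hdp/3.0.0.0-1634/hadoop/libexec && "
       "/usr/hdp/3.0.0.0-1634/hadoop-yarn/bin/yarn "
       "--config /usr/hdp/3.0.0.0-1634/hadoop/conf --daemon start resourcemanager")
proc = subprocess.run(cmd, shell=True, stdout=subprocess.PIPE,
                      stderr=subprocess.PIPE, universal_newlines=True)
# A non-zero return code plus stderr should point at the underlying failure
print(proc.returncode)
print(proc.stderr)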
Team, I'm getting a permission denied error ([Errno 13] Permission denied) while extracting the Spark package (2.1.0) during a local installation. I have admin access to the folders and verified it in the Security settings as well. Any pointers will be helpful.
Environment details
OS - **Windows 7**
(C:\conda) C:\conda>conda --version
**conda 4.3.30**
(C:\conda) C:\conda>python --version
**Python 3.6.3 :: Anaconda, Inc.**
(C:\conda) C:\conda>jupyter --version
**4.3.0**
(C:\conda) C:\conda>pip --version
**pip 9.0.1** from C:\conda\lib\site-packages (python 3.6)
From the Anaconda terminal: jupyter pixiedust install
Step 1: **PIXIEDUST_HOME**: C:\conda\pixiedust-master
Keep y/n [y]? **y**
Step 2: **SPARK_HOME**: C:\conda\pixiedust-master\bin\spark
Keep y/n [y]? y
Directory C:\conda\pixiedust-master\bin\spark does not contain a valid SPARK install
**Download Spark** y/n [y]? y
What version would you like to download? 1.6.3, 2.0.2, 2.1.0, 2.2.0 [2.2.0]: **2.1.0**
82%
100%
Error details below:
Extracting Spark 2.1.0 to C:\conda\pixiedust-master\bin\spark
Traceback (most recent call last):
File "c:\conda\lib\runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "c:\conda\lib\runpy.py", line 85, in _run_code
exec(code, run_globals)
File "C:\conda\Scripts\jupyter-pixiedust.EXE\__main__.py", line 9, in <module>
File "c:\conda\lib\site-packages\install\pixiedustapp.py", line 41, in main
PixiedustJupyterApp.launch_instance()
File "c:\conda\lib\site-packages\traitlets\config\application.py", line 657, in launch_instance
app.initialize(argv)
File "<decorator-gen-...>", line 2, in initialize
File "c:\conda\lib\site-packages\traitlets\config\application.py", line 87, in catch_config_error
return method(app, *args, **kwargs)
File "c:\conda\lib\site-packages\traitlets\config\application.py", line 296, in initialize
self.parse_command_line(argv)
File "<decorator-gen-...>", line 2, in parse_command_line
File "c:\conda\lib\site-packages\traitlets\config\application.py", line 87, in catch_config_error
return method(app, *args, **kwargs)
File "c:\conda\lib\site-packages\traitlets\config\application.py", line 514, in parse_command_line
return self.initialize_subcommand(subc, subargv)
File "<decorator-gen-...>", line 2, in initialize_subcommand
File "c:\conda\lib\site-packages\traitlets\config\application.py", line 87, in catch_config_error
return method(app, *args, **kwargs)
File "c:\conda\lib\site-packages\traitlets\config\application.py", line 452, in initialize_subcommand
self.subapp.initialize(argv)
File "<decorator-gen-...>", line 2, in initialize
File "c:\conda\lib\site-packages\traitlets\config\application.py", line 87, in catch_config_error
return method(app, *args, **kwargs)
File "c:\conda\lib\site-packages\jupyter_core\application.py", line 239, in initialize
self.parse_command_line(argv)
File "c:\conda\lib\site-packages\install\createKernel.py", line 150, in parse_command_line
self.download_spark(silent, silent_spark_version)
File "c:\conda\lib\site-packages\install\createKernel.py", line 409, in download_spark
self.extract_temp_file(temp_file, self.spark_home)
File "c:\conda\lib\site-packages\install\createKernel.py", line 478, in extract_temp_file
tar = tarfile.open(temp_file.name, "r:gz")
File "c:\conda\lib\tarfile.py", line 1586, in open
return func(name, filemode, fileobj, **kwargs)
File "c:\conda\lib\tarfile.py", line 1633, in gzopen
fileobj = gzip.GzipFile(name, mode + "b", compresslevel, fileobj)
File "c:\conda\lib\gzip.py", line 163, in __init__
fileobj = self.myfileobj = builtins.open(filename, mode or 'rb')
PermissionError: [Errno 13] Permission denied: 'C:\Temp\tmpnt0i718r.tgz'
Unfortunately, Windows is not a platform supported by the PixieDust install script, as mentioned in the PixieDust documentation: https://ibm-watson-data-lab.github.io/pixiedust/install.html.
As a workaround, I suggest using a Docker container with Linux or macOS.
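For what it's worth, the traceback itself points at a Windows-specific pitfall: tempfile.NamedTemporaryFile keeps the file handle open, and Windows refuses a second open() on the file by name, which surfaces as [Errno 13]. A portable extraction pattern, as a sketch (extract_archive is a hypothetical helper, not PixieDust code):

import os
import tarfile
import tempfile

def extract_archive(archive_bytes, dest_dir):
    # delete=False lets us close the handle and reopen the file by name,
    # which a plain NamedTemporaryFile disallows on Windows
    tmp = tempfile.NamedTemporaryFile(suffix=".tgz", delete=False)
    try:
        tmp.write(archive_bytes)
        tmp.close()  # release the handle before tarfile reopens the path
        with tarfile.open(tmp.name, "r:gz") as tar:
            tar.extractall(dest_dir)
    finally:
        os.unlink(tmp.name)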
I am running on a Mac with both SageMath and Anaconda installed.
Sage is working fine, but Jupyter Notebook doesn't run.
I get the following error:
Rois-MBP:~ roi$ /anaconda/bin/jupyter_mac.command ; exit;
[W 22:32:09.192 NotebookApp] Unrecognized JSON config file version, assuming version 1
Traceback (most recent call last):
File "/anaconda/bin/jupyter-notebook", line 6, in <module>
sys.exit(notebook.notebookapp.main())
File "//anaconda/lib/python3.5/site-packages/jupyter_core/application.py", line 267, in launch_instance
return super(JupyterApp, cls).launch_instance(argv=argv, **kwargs)
File "//anaconda/lib/python3.5/site-packages/traitlets/config/application.py", line 595, in launch_instance
app.initialize(argv)
File "<decorator-gen-7>", line 2, in initialize
File "//anaconda/lib/python3.5/site-packages/traitlets/config/application.py", line 74, in catch_config_error
return method(app, *args, **kwargs)
File "//anaconda/lib/python3.5/site-packages/notebook/notebookapp.py", line 1069, in initialize
self.init_configurables()
File "//anaconda/lib/python3.5/site-packages/notebook/notebookapp.py", line 837, in init_configurables
parent=self,
File "//anaconda/lib/python3.5/site-packages/nb_conda_kernels/manager.py", line 19, in __init__
specs = self.find_kernel_specs() or {}
File "//anaconda/lib/python3.5/site-packages/nb_conda_kernels/manager.py", line 129, in find_kernel_specs
self.conda_info = self._conda_info()
File "//anaconda/lib/python3.5/site-packages/nb_conda_kernels/manager.py", line 29, in _conda_info
p = subprocess.check_output(["conda", "info", "--json"]
File "//anaconda/lib/python3.5/subprocess.py", line 626, in check_output
**kwargs).stdout
File "//anaconda/lib/python3.5/subprocess.py", line 693, in run
with Popen(*popenargs, **kwargs) as process:
File "//anaconda/lib/python3.5/subprocess.py", line 947, in __init__
restore_signals, start_new_session)
File "//anaconda/lib/python3.5/subprocess.py", line 1551, in _execute_child
raise child_exception_type(errno_num, err_msg)
FileNotFoundError: [Errno 2] No such file or directory: 'conda'
Other apps like Spyder run successfully.
Can I solve this somehow?
Found a way to run Jupyter Notebook through Sage:
just run ./sage -n jupyter from the terminal.
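The root cause is visible in the traceback: nb_conda_kernels runs subprocess.check_output(["conda", "info", "--json"]), and FileNotFoundError: [Errno 2] No such file or directory: 'conda' means the conda executable isn't on PATH for the process launched by jupyter_mac.command. A quick diagnostic sketch (the /anaconda/bin path is taken from the question; adjust if yours differs):

import os
import shutil

print(shutil.which("conda"))  # None means conda is not on this process's PATH

# Workaround: prepend the Anaconda bin directory before starting the notebook
os.environ["PATH"] = "/anaconda/bin" + os.pathsep + os.environ["PATH"]
print(shutil.which("conda"))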
I'm using Petrel 0.9.3 and Apache Storm of the same version. When I try to run a topology, I get the following error:
(petrel)[root@localhost example1]# petrel submit --config topology.yaml --logdir `pwd`
[Errno 2] No such file or directory
Traceback (most recent call last):
File "/usr/local/petrel/lib/python2.7/site-packages/petrel-0.9.3.0.3-py2.7.egg/petrel/cmdline.py", line 111, in main
func(**args.__dict__)
File "/usr/local/petrel/lib/python2.7/site-packages/petrel-0.9.3.0.3-py2.7.egg/petrel/cmdline.py", line 32, in submit
sourcejar = get_sourcejar()
File "/usr/local/petrel/lib/python2.7/site-packages/petrel-0.9.3.0.3-py2.7.egg/petrel/cmdline.py", line 23, in get_sourcejar
storm_version = get_storm_version()
File "/usr/local/petrel/lib/python2.7/site-packages/petrel-0.9.3.0.3-py2.7.egg/petrel/cmdline.py", line 17, in get_storm_version
version = subprocess.check_output(['storm', 'version']).strip()
File "/usr/lib64/python2.7/subprocess.py", line 568, in check_output
process = Popen(stdout=PIPE, *popenargs, **kwargs)
File "/usr/lib64/python2.7/subprocess.py", line 711, in __init__
errread, errwrite)
File "/usr/lib64/python2.7/subprocess.py", line 1327, in _execute_child
raise child_exception
I had to add the Storm bin location to $PATH.
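Concretely, Petrel discovers the Storm version by running subprocess.check_output(['storm', 'version']) (see the traceback), so the storm binary must be resolvable from PATH. A sketch of verifying and fixing this from Python, assuming a hypothetical install location:

import os
import subprocess

# Hypothetical Storm location -- substitute wherever your storm binary lives
os.environ["PATH"] += os.pathsep + "/usr/local/storm/bin"
# Mirrors Petrel's own check; succeeds once storm is resolvable
print(subprocess.check_output(["storm", "version"]).strip())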