How to install PySpark without Hadoop?

I want to install PySpark, but I don't want to use Hadoop because I just want to test out some functions. I followed instructions from a number of websites: I used pip to install pyspark, installed JDK 8, and set the JAVA_PATH, SPARK_HOME, and PATH variables, but it's not working.
My program is:
from pyspark import *
from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()
I am getting this exception:
\Java\jdk1.8.0_291\bin\java was unexpected at this time.
Traceback (most recent call last):
File "c:\Users\ankit\Untitled-1.py", line 4, in <module>
spark = SparkSession.builder.getOrCreate()
File "C:\Users\ankit\AppData\Local\Programs\Python\Python39\lib\site-packages\pyspark\sql\session.py", line 228, in getOrCreate
sc = SparkContext.getOrCreate(sparkConf)
File "C:\Users\ankit\AppData\Local\Programs\Python\Python39\lib\site-packages\pyspark\context.py", line 384, in getOrCreate
SparkContext(conf=conf or SparkConf())
File "C:\Users\ankit\AppData\Local\Programs\Python\Python39\lib\site-packages\pyspark\context.py", line 144, in __init__
SparkContext._ensure_initialized(self, gateway=gateway, conf=conf)
File "C:\Users\ankit\AppData\Local\Programs\Python\Python39\lib\site-packages\pyspark\context.py", line 331, in _ensure_initialized
SparkContext._gateway = gateway or launch_gateway(conf)
File "C:\Users\ankit\AppData\Local\Programs\Python\Python39\lib\site-packages\pyspark\java_gateway.py", line 108, in launch_gateway
raise Exception("Java gateway process exited before sending its port number")
Exception: Java gateway process exited before sending its port number
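The \Java\jdk1.8.0_291\bin\java was unexpected at this time. line comes from cmd.exe rather than from Spark itself, and it usually points at a quoting problem: quotes or stray characters in the JAVA_HOME value break Spark's launcher batch script before the JVM ever starts, which is why the gateway never reports a port. Below is a minimal sketch of a Hadoop-free local session; the JDK path is hypothetical, so point it at your own install and make sure the value contains no quotes:
import os

# Hypothetical JDK location; replace with your own path.
# The value must not contain quotes, cmd.exe chokes on them.
os.environ["JAVA_HOME"] = r"C:\Java\jdk1.8.0_291"
os.environ["PATH"] = os.environ["JAVA_HOME"] + r"\bin;" + os.environ["PATH"]

from pyspark.sql import SparkSession

# local[*] runs Spark entirely in-process: no Hadoop installation is
# needed, and a pip-installed pyspark does not require SPARK_HOME at all.
spark = SparkSession.builder.master("local[*]").appName("smoke-test").getOrCreate()
print(spark.range(5).count())  # should print 5 once the gateway launches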

Related

VirtualBox anaconda installation failing

Having trouble installing Anaconda on my Ubuntu VirtualBox VM. I have tried rebooting and assigning a bigger chunk of base memory, but it is still failing at the final few hurdles.
Unpacking payload ...
concurrent.futures.process._RemoteTraceback:
'''
Traceback (most recent call last):
File "concurrent/futures/process.py", line 368, in _queue_management_worker
File "multiprocessing/connection.py", line 251, in recv
TypeError: __init__() missing 1 required positional argument: 'msg'
'''
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "entry_point.py", line 69, in <module>
File "concurrent/futures/process.py", line 484, in _chain_from_iterable_of_lists
File "concurrent/futures/_base.py", line 611, in result_iterator
File "concurrent/futures/_base.py", line 439, in result
File "concurrent/futures/_base.py", line 388, in __get_result
concurrent.futures.process.BrokenProcessPool: A process in the process pool was terminated abruptly while the future was running or pending.
[1981] Failed to execute script entry_point

"No module named error" when try to run Elasticalert

When I try to run elastalert
python -m elastalert.elastalert --verbose --start 2019-09-04 --rule rules/rule.yaml --config config.yaml
I get the following error.
Traceback (most recent call last):
File "/usr/lib/python2.7/runpy.py", line 174, in _run_module_as_main
"__main__", fname, loader, pkg_name)
File "/usr/lib/python2.7/runpy.py", line 72, in _run_code
exec code in run_globals
File "/usr/local/lib/python2.7/dist-packages/elastalert-0.2.1-py2.7.egg/elastalert/elastalert.py", line 29, in <module>
from . import kibana
File "/usr/local/lib/python2.7/dist-packages/elastalert-0.2.1-py2.7.egg/elastalert/kibana.py", line 4, in <module>
import urllib.error
ImportError: No module named error
My environment is:
Ubuntu 18
Elasticsearch 6.3.0
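The traceback shows the module running under /usr/lib/python2.7, but urllib.error only exists in Python 3; in Python 2 the equivalent classes live in urllib2, which is why the import fails with "No module named error". ElastAlert 0.2.x targets Python 3, so running the same command with a Python 3 interpreter should avoid this. A minimal sketch of the check (not part of ElastAlert itself):
import sys

# urllib.error was introduced with Python 3's reorganized urllib package;
# Python 2 has no such submodule, hence "No module named error".
if sys.version_info < (3,):
    sys.exit("elastalert 0.2.x needs Python 3; try: python3 -m elastalert.elastalert ...")

import urllib.error  # imports fine on Python 3
print("ok:", urllib.error.HTTPError.__name__)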

Airflow initdb fails due to SyntaxError on importing AsyncRetrying

I am new to Airflow. I installed it with pip install apache-airflow. When I run the command airflow initdb in the terminal, I get the error below. Where did I go wrong during the install, and how can I fix this issue?
aamir@aamir-Inspiron-3542:~$ airflow initdb
[2019-03-30 18:32:27,309] {__init__.py:51} INFO - Using executor SequentialExecutor
DB: sqlite:////home/aamir/airflow/airflow.db
[2019-03-30 18:32:31,790] {db.py:338} INFO - Creating tables
INFO [alembic.runtime.migration] Context impl SQLiteImpl.
INFO [alembic.runtime.migration] Will assume non-transactional DDL.
ERROR [airflow.models.DagBag] Failed to import: /home/aamir/anaconda3/lib/python3.7/site-packages/airflow/example_dags/example_http_operator.py
Traceback (most recent call last):
File "/home/aamir/anaconda3/lib/python3.7/site-packages/airflow/models.py", line 374, in process_file
m = imp.load_source(mod_name, filepath)
File "/home/aamir/anaconda3/lib/python3.7/imp.py", line 171, in load_source
module = _load(spec)
File "<frozen importlib._bootstrap>", line 696, in _load
File "<frozen importlib._bootstrap>", line 677, in _load_unlocked
File "<frozen importlib._bootstrap_external>", line 728, in exec_module
File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
File "/home/aamir/anaconda3/lib/python3.7/site-packages/airflow/example_dags/example_http_operator.py", line 27, in <module>
from airflow.operators.http_operator import SimpleHttpOperator
File "/home/aamir/anaconda3/lib/python3.7/site-packages/airflow/operators/http_operator.py", line 21, in <module>
from airflow.hooks.http_hook import HttpHook
File "/home/aamir/anaconda3/lib/python3.7/site-packages/airflow/hooks/http_hook.py", line 23, in <module>
import tenacity
File "/home/aamir/anaconda3/lib/python3.7/site-packages/tenacity/__init__.py", line 352
from tenacity.async import AsyncRetrying
^
SyntaxError: invalid syntax
Done.
In Python 3.7, async is a reserved keyword, which means it can no longer be used as a module or variable name. This was valid in earlier Python versions, but starting from 3.7 a SyntaxError is raised.
In your case, Airflow comes pre-installed with example DAGs, which were parsed when you ran airflow initdb. Some of those DAGs use the SimpleHttpOperator, which depends on http_hook.py. That hook in turn depends on the tenacity library, which tries to import an async module during initialization:
from tenacity.async import AsyncRetrying
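You can verify the keyword change directly, without touching Airflow; a small sketch (not from the original answer):
import keyword, sys

# 'async' became a hard keyword in Python 3.7, making any module named
# 'async' unimportable; in 3.6 it was still an ordinary identifier.
print(sys.version_info[:2], keyword.iskeyword("async"))
# (3, 7) True  -> 'from tenacity.async import ...' is a SyntaxError
# (3, 6) False -> the same line parses fine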
To fix this, wait for/install Airflow v1.10.3, which updates Tenacity (see AIRFLOW-2876). Alternatively, you can downgrade your Python version. You can see that the import fails under Python 3.7.3:
$ docker run --rm -it python:3.7
Python 3.7.3 (default, Mar 27 2019, 23:40:30)
>>> from tenacity.async import AsyncRetrying
File "<stdin>", line 1
from tenacity.async import AsyncRetrying
^
SyntaxError: invalid syntax
But the same line parses fine under 3.6.8; the ModuleNotFoundError below just means tenacity is not installed in that container:
$ docker run --rm -it python:3.6
Python 3.6.8 (default, Feb 6 2019, 12:07:20)
>>> from tenacity.async import AsyncRetrying
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ModuleNotFoundError: No module named 'tenacity'
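If you would rather not pull Docker images, compile() can run the same check in place, since it parses the line without needing tenacity installed; a sketch:
import sys

# Parse (but don't execute) the offending import; a SyntaxError here means
# this interpreter treats 'async' as a keyword and old tenacity cannot load.
src = "from tenacity.async import AsyncRetrying"
try:
    compile(src, "<check>", "exec")
    print("parses on Python", sys.version.split()[0], "(pre-3.7 behavior)")
except SyntaxError:
    print("SyntaxError on Python", sys.version.split()[0],
          "so upgrade Airflow/tenacity or use Python 3.6")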

Ambari error during installation yarn

I'm trying to install IBM Open Platform 4.1 on Red Hat Linux 6; I created a local repository for IOP and IOP-UTILS because a proxy blocks the connection to IBM.
I installed Ambari correctly, but during the installation of the packages I get this error:
File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 70, in inner
result = function(command, **kwargs)
File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 92, in checked_call
tries=tries, try_sleep=try_sleep)
File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 140, in _call_wrapper
result = _call(command, **kwargs_copy)
File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 291, in _call
raise Fail(err_msg)
resource_management.core.exceptions.Fail: Execution of '/usr/bin/yum -d 0 -e 0 -y install 'hadoop_4_1_*-yarn'' returned 1.
Error: Nothing to do

Ubuntu 10.04 - Python multiprocessing - 'module' object has no attribute 'local' error

The following code is from the Python 2.6 manual.
from multiprocessing import Process
import os

def info(title):
    print(title)
    print('module name:', 'me')
    print('parent process:', os.getppid())
    print('process id:', os.getpid())

def f(name):
    info('function f')
    print('hello', name)

if __name__ == '__main__':
    info('main line')
    p = Process(target=f, args=('bob',))
    p.start()
    p.join()
This creates the following stack traces:
Traceback (most recent call last):
File "threading.py", line 1, in <module>
from multiprocessing import Process
File "/usr/lib/python2.6/multiprocessing/__init__.py", line 64, in <module>
from multiprocessing.util import SUBDEBUG, SUBWARNING
File "/usr/lib/python2.6/multiprocessing/util.py", line 287, in <module>
class ForkAwareLocal(threading.local):
AttributeError: 'module' object has no attribute 'local'
Exception AttributeError: '_shutdown' in <module 'threading' from '/home/v0idnull/tmp/pythreads/threading.pyc'> ignored
Error in atexit._run_exitfuncs:
Traceback (most recent call last):
File "/usr/lib/python2.6/atexit.py", line 24, in _run_exitfuncs
func(*targs, **kargs)
File "/usr/lib/python2.6/multiprocessing/util.py", line 258, in _exit_function
info('process shutting down')
TypeError: 'NoneType' object is not callable
Error in sys.exitfunc:
Traceback (most recent call last):
File "/usr/lib/python2.6/atexit.py", line 24, in _run_exitfuncs
func(*targs, **kargs)
File "/usr/lib/python2.6/multiprocessing/util.py", line 258, in _exit_function
info('process shutting down')
TypeError: 'NoneType' object is not callable
I'm completely clueless as to WHY this is happening, and Google has given me very little to work with.
That code runs fine on my machine: Ubuntu 10.10, Python 2.6.6 64-bit.
Your error is actually happening because you have a file named threading.py in the directory you are running this code from (see the stack-trace details). That local file shadows the standard library's threading module, which multiprocessing needs, so try renaming your file to something other than threading.py and running it again.
Also, the example you posted is not from the Python 2.6 docs; it is from the Python 3.x docs. Make sure you are reading the docs for the version that matches what you are running.
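A quick way to confirm this kind of shadowing is to print where the module actually came from; a diagnostic sketch, not part of the original answer:
# If the printed path points at your own working directory instead of the
# standard library, a local file is shadowing the real module.
import threading
print(threading.__file__)  # expect something like /usr/lib/python2.6/threading.py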
