I am trying to import modules into executescript processor in nifi.
As suggested , I. am giving full path into the modules directory.
example:
Module Directory: /var/lib/nifi/Levenshtein --> which contains necessary files for the script.
Furthermore, In the script also I have set the system path pointing to use that module directory
My code Looks something like this
import re
import datetime
import sys
sys.path.append('/var/lib/nifi/Levenshtein')
import Levenshtein
When I am running the processor with above code it fails.
ERROR: No Module named Levenshtein in at line number 3.
If this particular library is a "native module" (compiled C code), Jython (the Python execution engine used by ExecuteScript) will not be able to load it. ExecuteScript in NiFi using Python can only use pure Python code.
The work-around is to use ExecuteProcess or ExecuteStreamCommand and invoke python <my_script.py> on the command-line, which can handle various Python versions, native modules, etc. This execution will occur outside the JVM and use real Python, not Jython.
To sum up what Andy said, this Levensthein module is written in C and cannot be executed by a Java virtual machine, assuming you are running a Jython implementation.
Related
I'm running a script which imports a module from a file in the same directory. The first time I run the script after starting the cluster the script runs as expected. Any subsequent times I run the script I get the following error: ModuleNotFoundError: No module named 'ex_cls'
How do I get Ray to recognize modules I'm importing after the first run?
I am using Ray 1.11.0 on a redhat Linux cluster.
Here are my scripts. Both are located in the /home/ray_experiment directory:
--ex_main.py
import sys
sys.path.insert(0, '/home/ray_experiment')
from ex_cls import monitor_wrapper
import ray
ray.init(address='auto')
from ray.util.multiprocessing import Pool
def main():
pdu_infos = range(10)
with Pool() as pool:
results = pool.map(monitor_wrapper, [pdu for pdu in pdu_infos])
for pdu_info, result in zip(pdu_infos, results):
print(pdu_info, result)
if __name__ == "__main__":
main()
--ex_cls.py
import sys
from time import time, sleep
from random import randint
import collections
sys.path.insert(0, '/home/ray_experiment')
MonitorResult = collections.namedtuple('MonitorResult', 'key task_time')
def monitor_wrapper(args):
start = time()
rando = randint(0, 200)
lst = []
for i in range(10000 * rando):
lst.append(i)
pause = 1
sleep(pause)
return MonitorResult(args, time() - start)
-- Edit
I've found that by adding these two environment variables I no longer see the ModuleNotFoundError.
export PYTHONPATH="${PYTHONPATH}:/home/ray_experiment/"
export RAY_RUNTIME_ENV_WORKING_DIR_CACHE_SIZE_GB=0
Is there another solution that doesn't require disabling the working environment caching?
The issue here is that Ray's worker processes may be run from different working directories than your driver python script. In fact, on a cluster, they may even be run from different machines. This is coupled by the fact that python looks for the module based on a relative path (to be precise, cloudpickle serializes definitions in other modules by reference).
The "intended" solution to this problem is to use runtime environments.
In particular, you should do ray.init(address='auto', runtime_env={"working_dir": "./"}) when starting Ray to ensure that the module is passed to other processes.
After searching for similar error I found this in Ray docs:
https://docs.ray.io/en/latest/ray-core/handling-dependencies.html#library-development
Adding __init__.py to root of these directories and handing them over with this kwarg solved my issue with ModuleNotFoundError. Another part of solution is to install all requirements to the container the ray is running in and probably starting it in same workdir as app (or passing workdir in runtime kwargs)
To answer OPs problem I would try something like this:
import ex_cls
# it cannot pass classes, just whole module
ray.init(address='auto', runtime_env={"py_modules": [ex_cls]})
I have an ansible custom module, that have a configuration file in YAML format.
Now the question is how should I load that YAML file inside the module?
NOTE as I understand I can't simply use something like PyYAML since ansible will run my module on the node that it is configuring and maybe that system does not have PyYAML installed.
NOTE Also ansible itself have ansible.parsing.utils.yaml.from_yaml it is not usable by the modules.
So funny as it may sound, I don't know how to load a YAML file in custom ansible module. Please help
It's a great question. It does sound funny and you'd expect a simple answer but as far as i can see these are the facts.
The latest development branch of ansible has /lib/ansible/module_utils/common/yaml.py which can be used by modules because it is under module_utils. see here
If you look at the source code all it's doing is import yaml as _yaml, which you could do yourself inside your custom module. My understanding is this is using PyYAML, which is documented here. (someone correct me if I"m wrong! I don't fully understand the comment in that file stating "preferring the YAML compiled C extensions...")
Anyway, if your target machine does not have PyYAML you can always add a task to ensure its there. e.g.
- name: Install PyYAML python package
pip:
name: pyyaml
and then use it in your own module with:
from yaml import load, dump
try:
from yaml import CLoader as Loader, CDumper as Dumper
except ImportError:
from yaml import Loader, Dumper
# ...
data = load(stream, Loader=Loader)
# ...
output = dump(data, Dumper=Dumper)
I want to be able to graph using Matplotlib in Jython so that I can use ABAGAIL inside of Python.
ABAGAIL:
https://github.com/pushkar/ABAGAIL
Jython does not seem to support Matplotlib. But I found the following idea on how to call Python inside of Jython:
Invoking Jython from Python (or Vice Versa)
Unfortunately, I can't get the code they suggest to work:
import execnet
gw = execnet.makegateway("popen//python=python")
channel = gw.remote_exec("""
from numpy import *
a = array([2,3,4])
channel.send(a.size)
""")
for item in channel:
print item
The main problem is that python=python doesn't work. What I'm not understanding is how to actually specify the version of python (anaconda, actually) on my Windows 10 system. What do I need to setup?
Also, is there an alternative package besides matplotlib I can use with Jython to create graphs?
I had an issue with an eyaml file used to store password for DB connection and it seems that I missed a "[".
I want to know if there is a command or script to check eyaml syntax
One thing you can do, if you have python installed somewhere, is install ruamel.yaml (disclaimer: I am the author of that package) and run the following:
python check.py your_eyaml_file
with check.py being:
import sys
from ruamel.yaml import YAML
yaml = YAML()
yaml.load(sys.argv[1])
This will do a safe load of your YAML file and will throw an error if your file doesn't conform to the YAML specification.
There are also online parsers when you can run such checks, but I would not want to use them with sensitive information (encrypted or not).
At long last(!) I've compiled Boost::Python and have gotten my XCode project to import a local module. This module starts with the line from xml.dom import minidom, but when it executes, I'm given this error:
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "FeedStore.py", line 1, in <module>
from xml.dom import minidom
ImportError: No module named xml.dom
However, I know that I've installed the xml Python module -- when I open Python from my command prompt and type from xml.dom import minidom, everything goes smoothly. Moreover, when I import the module, it behaves as I would expect.
I suspected that there was something wrong with sys.path, so I compared the one I get from the prompt to the one that's being used in my embedded module. The only difference is that the embedded sys.path does not include ''. I've tried appending it, but that didn't change the behavior.
I also suspected that the embedded version was accessing a different version of Python than I was using from the prompt, but sys.prefix matched between both executions.
Here's the code that imports my module and runs it. It's pretty bare-bones at the moment (not even reference counting yet) because at this point I'd just like to make sure I'll be able to embed my module (I'm a total newbie C++ programmer).
Py_Initialize();
//PyRun_SimpleString("import sys");
//PyRun_SimpleString("sys.path.append('')"); //tried this to no avail!
PySys_SetPath("/Users/timoooo/Documents/Code/TestEmbed/"); //this allows me to import my local module
PyRun_SimpleString("import FeedStore as fs"); //here's where it whines about the lack of xml.dom
PyRun_SimpleString("store = fs.feedStore()");
PyRun_SimpleString("print store.next()");
Py_Finalize();
I'm probably misunderstanding something essential about boost::python. Can anyone help me out?
Despite having identical sys.path values, calling
PyRun_SimpleString("sys.path.append(\"<<path>>\")");
with the places I needed fixed the problem.