How to load nltk stopwords or words from local disk - download

I need to load the nltk 'words' data from local disk. In the notebook my code looks like the following:
import nltk
nltk.data.path.append("/data") # Setting path here
nltk.corpus.words.words()
But I am getting an error as follows:
LookupError Traceback (most recent call last)
/anaconda3/lib/python3.8/site-packages/nltk/corpus/util.py in __load(self)
83 try:
---> 84 root = nltk.data.find(f"{self.subdir}/{zip_name}")
85 except LookupError:
/anaconda3/lib/python3.8/site-packages/nltk/data.py in find(resource_name, paths)
582 resource_not_found = f"\n{sep}\n{msg}\n{sep}\n"
--> 583 raise LookupError(resource_not_found)
584
LookupError:
**********************************************************************
Resource words not found.
Please use the NLTK Downloader to obtain the resource:
>>> import nltk
>>> nltk.download('words')
For more information see: https://www.nltk.org/data.html
Attempted to load corpora/words.zip/words/
Searched in:
- '/home/my_user_name/nltk_data'
- '/anaconda3/nltk_data'
- '/anaconda3/share/nltk_data'
- '/anaconda3/lib/nltk_data'
- '/usr/share/nltk_data'
- '/usr/local/share/nltk_data'
- '/usr/lib/nltk_data'
- '/usr/local/lib/nltk_data'
- '/data'
I followed the manual installation part here: https://www.nltk.org/data.html
But instead of setting the NLTK_DATA environment variable, I want to set the path from the notebook.
Any help? Thanks in advance.

It is completely frustrating that such a popular package as nltk gives misleading messages.
The following message is just wrong:
Attempted to load corpora/words.zip/words/
It is not trying to load the corpus from words.zip with a words folder inside it. It was actually attempting to load from nltk_data/corpora/words, i.e.
Attempted to load nltk_data/corpora/words/
So the solution is to add the correct path manually, as follows:
nltk.data.path.append("./data/nltk_data")
and then put the unzipped corpus inside it, under a corpora subdirectory (for example the 'words' folder, giving ./data/nltk_data/corpora/words). You can then access the words with:
nltk.corpus.words.words()
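For reference, here is a minimal end-to-end sketch (the ./data/nltk_data path is only an example; substitute your own location):

import nltk

# Point NLTK at a local data directory instead of relying on NLTK_DATA.
nltk.data.path.append("./data/nltk_data")

# Expected layout on disk after unzipping words.zip:
#   ./data/nltk_data/corpora/words/...
# i.e. the resource "corpora/words" must resolve under one of the search paths.

english_words = set(nltk.corpus.words.words())
print(len(english_words))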

Related

Stable Diffusion Videos - DLL load failed while importing _ufuncs: %1 is not a valid Win32 application

I'm trying to run the Stable Diffusion Videos package and have installed the package and logged into Hugging Face. When I try to run the provided code from the package's GitHub page, I run into ImportError: DLL load failed while importing _ufuncs: %1 is not a valid Win32 application. I have tried various solutions to this error, but so far none of them have worked.
I'm running Windows 11 (64-bit) and Python 3.10. I've read about missing DLL files but am unsure how to find or install them to fix this problem.
from stable_diffusion_videos import StableDiffusionWalkPipeline
import torch

pipeline = StableDiffusionWalkPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",
    torch_dtype=torch.float16,
    revision="fp16",
).to("cuda")

video_path = pipeline.walk(
    prompts=['a cat', 'a dog'],
    seeds=[42, 1337],
    num_interpolation_steps=3,
    height=512,              # use multiples of 64 if > 512. Multiples of 8 if < 512.
    width=512,               # use multiples of 64 if > 512. Multiples of 8 if < 512.
    output_dir='dreams',     # Where images/videos will be saved
    name='animals_test',     # Subdirectory of output_dir where images/videos will be saved
    guidance_scale=8.5,      # Higher adheres to prompt more, lower lets model take the wheel
    num_inference_steps=50,  # Number of diffusion steps per image generated. 50 is good default
)
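A small diagnostic sketch that may help narrow this down (an assumption on my part: the failing _ufuncs module is SciPy's scipy.special._ufuncs, which is where that module name usually lives). If importing SciPy directly raises the same error, the problem is the SciPy build (for example a 32/64-bit mismatch) rather than stable_diffusion_videos itself:

import platform
print(platform.architecture())  # should report ('64bit', ...) for a 64-bit Python

import scipy
import scipy.special  # if this raises the same DLL error, reinstalling SciPy
                      # with a build matching your Python's bitness is a
                      # reasonable first thing to try
print(scipy.__version__)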

how to resolve AttributeError: module 'graphviz.backend' has no attribute 'ENCODING'

I am not sure why I get AttributeError: module 'graphviz.backend' has no attribute 'ENCODING' when I try to export a regression tree to graphviz. I tried re-installing graphviz and sklearn, but that doesn't solve the problem. I'd appreciate any advice on this.
AttributeError Traceback (most recent call last)
<ipython-input-4-9d9e0becf9b6> in <module>
3 # graphviz is the drawing tool
4 from sklearn.tree import export_graphviz
----> 5 import graphviz
6 dot_data = export_graphviz(
7 model,
C:\ProgramData\Anaconda3\lib\site-packages\graphviz\__init__.py in <module>
25 """
26
---> 27 from .dot import Graph, Digraph
28 from .files import Source
29 from .lang import escape, nohtml
C:\ProgramData\Anaconda3\lib\site-packages\graphviz\dot.py in <module>
30
31 from . import backend
---> 32 from . import files
33 from . import lang
34
C:\ProgramData\Anaconda3\lib\site-packages\graphviz\files.py in <module>
20
21
---> 22 class Base(object):
23
24 _engine = 'dot'
C:\ProgramData\Anaconda3\lib\site-packages\graphviz\files.py in Base()
26 _format = 'pdf'
27
---> 28 _encoding = backend.ENCODING
29
30 @property
AttributeError: module 'graphviz.backend' has no attribute 'ENCODING'
I had a similar issue when using pipdeptree. It would seem that there was a very recent change to graphviz, intended to make its internals private. Quoting the module author's reply in issue #149 (a similar issue with backend.FORMATS):
Submodules of graphviz are not part of the public API (cf. https://graphviz.readthedocs.io/en/stable/api.html). Please stick to the documented interface and use graphviz.FORMATS, see https://graphviz.readthedocs.io/en/stable/api.html#graphviz.FORMATS.
In the short term, you could downgrade your graphviz module… it looks like 0.18 was the last tag before the submodules were made opaque.
Moving forward, you may wish to create an issue and/or pull request against the sklearn-pandas repository, to replace graphviz.backend.FORMATS with graphviz.FORMATS, or even just cap its graphviz dependency at 0.18.
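As a quick illustration of the documented interface (a small sketch; graphviz.FORMATS is the public constant referenced in the quote above):

import graphviz

print(graphviz.__version__)  # the submodule layout changed after 0.18
print(graphviz.FORMATS)      # documented replacement for graphviz.backend.FORMATS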
I had the same issue and I am very new to Python/conda world, so this might help newbies like me...
I downloaded graphviz 0.19.1 from:
https://pypi.org/project/graphviz/#files
Source Distribution: graphviz-0.19.1.zip (247.8 kB)
and replaced the graphviz folder in "C:\Users\Nino\anaconda3\Lib\site-packages" (the path will be different for you) with the one from this version, renaming it so that it is again called graphviz:
"C:\Users\Nino\anaconda3\Lib\site-packages\graphviz"
I met the same problem, and the most-voted answer worked for me. Here is the command to forcibly downgrade graphviz:
pip install --force-reinstall graphviz==0.18
I had the same error with python-graphviz==0.16. OP did not include a version number, but it looks like the line numbers in the traceback match with v0.16.
Note that the traceback shows the error to be inside the python-graphviz package, so it's more likely that it's an issue with a dependency.
With python-graphviz==0.19 I don't get the import error.
On a side note: versions shown by conda list or pip list can be misleading. When in doubt, check the contents of the __init__.py.
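For example, a quick way to see which graphviz actually gets imported, and from where:

import graphviz

# Shows which __init__.py was actually loaded, independent of what
# `conda list` or `pip list` report.
print(graphviz.__file__)
print(graphviz.__version__)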
I solved this issue in a different way:
1. Open the graphviz backend folder on my PC through the following path (your path may differ): "C:\Users\Anoop\anaconda3\Lib\site-packages\graphviz\backend"
2. Copy the encoding.py file from there.
3. Paste the file into the backend folder: "C:\Users\Anoop\anaconda3\Lib\site-packages\graphviz\backend"
Problem solved.
In my case, it seems that the class Base in "C:\ProgramData\Anaconda3\lib\site-packages\graphviz\files.py" picks up the 'backend' folder instead of 'backend.py' on import.
Quick fix:
Go to "C:\ProgramData\Anaconda3\lib\site-packages\graphviz" and rename the 'backend' folder to something else.
PS: since I didn't check the whole code, this may then cause another dependency problem.
Thanks! Version 0.20 was troublesome and this error disappeared after downgrading to version 0.19.

umap collectstatic gives "No such file or directory" error

As I have very little knowledge of Linux, pretty much all I can do is copy and paste things from a good tutorial and in most cases simply hope nothing goes wrong. I really tried finding a solution on my own by searching the internet, but to no avail (I found a number of quite similar issues, but no solution I understood well enough to adapt to my problem).
I've installed an osm tile server using this amazing tutorial and it works like a charm. Now I want to install umap, using this tutorial.
Everything works fine until I get to the line "umap collectstatic". The error I get is this:
(venv) $ sudo umap collectstatic
[sudo] password for umap2:
You have requested to collect static files at the destination
location as specified in your settings:
/home/ybon/.virtualenvs/umap/var/static
This will overwrite existing files!
Are you sure you want to do this?
Type 'yes' to continue, or 'no' to cancel: yes
Traceback (most recent call last):
File "/usr/local/bin/umap", line 11, in <module>
sys.exit(main())
File "/usr/local/lib/python2.7/dist-packages/umap/bin/__init__.py", line 12, in main
management.execute_from_command_line()
File "/usr/local/lib/python2.7/dist-packages/django/core/management/__init__.py", line 367, in execute_from_command_line
utility.execute()
File "/usr/local/lib/python2.7/dist-packages/django/core/management/__init__.py", line 359, in execute
self.fetch_command(subcommand).run_from_argv(self.argv)
File "/usr/local/lib/python2.7/dist-packages/django/core/management/base.py", line 294, in run_from_argv
self.execute(*args, **cmd_options)
File "/usr/local/lib/python2.7/dist-packages/django/core/management/base.py", line 345, in execute
output = self.handle(*args, **options)
File "/usr/local/lib/python2.7/dist-packages/django/contrib/staticfiles/management/commands/collectstatic.py", line 193, in handle
collected = self.collect()
File "/usr/local/lib/python2.7/dist-packages/django/contrib/staticfiles/management/commands/collectstatic.py", line 115, in collect
for path, storage in finder.list(self.ignore_patterns):
File "/usr/local/lib/python2.7/dist-packages/django/contrib/staticfiles/finders.py", line 112, in list
for path in utils.get_files(storage, ignore_patterns):
File "/usr/local/lib/python2.7/dist-packages/django/contrib/staticfiles/utils.py", line 28, in get_files
directories, files = storage.listdir(location)
File "/usr/local/lib/python2.7/dist-packages/django/core/files/storage.py", line 399, in listdir
for entry in os.listdir(path):
OSError: [Errno 2] No such file or directory: '/home/ybon/Code/js/Leaflet.Storage'
Now, I get that something might be wrong with a setting in a config file somewhere, but changing the directory in local.py
doesn't seem to do anything (I have set it to STATIC_ROOT = '/home/xxx_myusername_xxx/umap/var/static'). I have no idea where this "/home/ybon/Code/..." path even comes from! What settings?
I certainly didn't specify THIS path anywhere, and the folder is nowhere to be found on my machine. Maybe using virtualenv is somehow generating it, and I can't find it because it IS virtual (as in "not really there physically"), but this is just a very wild guess and I don't really know what I'm talking about.
(I tried running the command with and without sudo and it doesn't change anything).
I have always wanted to install a tile server, and today I tried the tutorials you linked. So I'm a learner like you!
Installing the tile server with the tutorial https://www.linuxbabe.com/linux-server/openstreetmap-tile-server-ubuntu-16-04 was really straightforward. I only used the part for Rhineland-Palatinate.
With Umap (https://umap-project.readthedocs.io/en/latest/ubuntu/#tutorial) I had some problems.
1. A port was used twice. I changed the port for Apache.
2. After creating the local configuration (wget https://raw.githubusercontent.com/umap-project/umap/master/umap/settings/local.py.sample -O /etc/umap/umap.conf), this file was not immediately recognized. I worked around it by editing the file before running "umap migrate".
I have made the following changes:
# For static deployment
STATIC_ROOT = '/etc/umap/var/static'
# For users' statics (geojson mainly)
MEDIA_ROOT = '/etc/umap/umap/var/data'
# Umap Settings
UMAP_SETTINGS='/etc/umap/umap.conf'
I changed STATIC_ROOT and MEDIA_ROOT because that way the user umap has all the necessary permissions. Then I set the environment variable UMAP_SETTINGS, because otherwise the settings file /etc/umap/umap.conf is not found.
(I also have no idea where this "/home/ybon/Code/..." path comes from. Once the configuration file is properly loaded, the path is taken from it, so it no longer matters.)
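As a quick sanity check (a sketch, not part of the tutorial; it assumes umap's Django settings module is umap.settings, as the runserver output below shows), you can confirm from Python which settings were loaded and where static files will be collected:

import os

# Same environment variable as above; adjust the path if yours differs.
os.environ.setdefault("UMAP_SETTINGS", "/etc/umap/umap.conf")
os.environ.setdefault("DJANGO_SETTINGS_MODULE", "umap.settings")

import django
from django.conf import settings

django.setup()
print(settings.STATIC_ROOT)  # should be /etc/umap/var/static if the config was picked up
print(settings.MEDIA_ROOT)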
Now I could use the following commands without errors:
(venv) $ umap collectstatic
Loaded local config from /etc/umap/umap.conf
You have requested to collect static files at the destination
location as specified in your settings:
/etc/umap/var/static
This will overwrite existing files!
Are you sure you want to do this?
Type 'yes' to continue, or 'no' to cancel: yes
Copying '/srv/umap/venv/lib/python3.5/site-packages/umap/static/favicon.ico'
...
290 static files copied to '/etc/umap/var/static'.
(venv) $ umap storagei18n
Loaded local config from /etc/umap/umap.conf
Processing English
Found file /etc/umap/var/static/storage/src/locale/en.json
Exporting to /etc/umap/var/static/storage/src/locale/en.js
..
Processing Deutsch
Found file /etc/umap/var/static/storage/src/locale/de.json
..
Found file /etc/umap/var/static/storage/src/locale/sk_SK.json
Exporting to /etc/umap/var/static/storage/src/locale/sk_SK.js
(venv) $ umap createsuperuser
Loaded local config from /etc/umap/umap.conf
Username (leave blank to use 'umap'):
Email address:
Password:
Password (again):
Superuser created successfully.
(venv) $ umap runserver 0.0.0.0:8000
Loaded local config from /etc/umap/umap.conf
Loaded local config from /etc/umap/umap.conf
Performing system checks...
System check identified no issues (0 silenced).
April 09, 2018 - 14:02:15
Django version 1.10.5, using settings 'umap.settings'
Starting development server at http://0.0.0.0:8000/
And finally I was able to use umap.

Python web scraping stopped working - Invalid File Error

I have been working on a web scraping program; it recently stopped working and is giving me the following error.
Traceback (most recent call last):
File "C:/Users/Bob/Desktop/test 3.5.py", line 7, in <module>
with open(saveDB,'rb') as f:
TypeError: invalid file: WindowsPath('Z:/project1/MasterWellDB.txt')
Z: is a network drive, but I have also moved the file locally and the error persists.
I have tried multiple Python versions and uninstalled and reinstalled Visual Studio multiple times, and I am still clueless.
Here is my code:
from pathlib import Path
import pickle

# Opening well database
saveDB = Path(r"Z:\project1\MasterWellDB.txt")

# Open pickled DB if available, else remake the database
if saveDB.exists():
    with open(saveDB, 'rb') as f:
        wells = pickle.load(f)
    print('success!')
Any help would be greatly appreciated. Thanks!
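One hedged observation (not a confirmed fix): the traceback points at test 3.5.py, and open() only accepts pathlib.Path objects from Python 3.6 onward (PEP 519), so on an older interpreter converting the path to a string avoids exactly this TypeError. A minimal sketch under that assumption:

from pathlib import Path
import pickle

saveDB = Path(r"Z:\project1\MasterWellDB.txt")

if saveDB.exists():
    # str() works on Python < 3.6, where open() cannot take a Path object directly.
    with open(str(saveDB), 'rb') as f:
        wells = pickle.load(f)
    print('success!')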

Loading com.databricks.spark.csv via RStudio

I have installed Spark-1.4.0. I have also installed its R package SparkR and I am able to use it via the Spark shell and via RStudio; however, there is one difference I cannot resolve.
When launching the SparkR-shell
./bin/sparkR --master local[7] --packages com.databricks:spark-csv_2.10:1.0.3
I can read a .csv-file as follows
flights <- read.df(sqlContext, "data/nycflights13.csv", "com.databricks.spark.csv", header="true")
Unfortunately, when I start SparkR via RStudio (correctly setting my SPARK_HOME) I get the following error message:
15/06/16 16:18:58 ERROR RBackendHandler: load on 1 failed
Caused by: java.lang.RuntimeException: Failed to load class for data source: com.databricks.spark.csv
I know I should load com.databricks:spark-csv_2.10:1.0.3 somehow, but I have no idea how to do this. Could someone help me?
This is the right syntax (after hours of trying):
(Note: focus on the first line, and notice the double quotes.)
Sys.setenv('SPARKR_SUBMIT_ARGS'='"--packages" "com.databricks:spark-csv_2.10:1.0.3" "sparkr-shell"')
library(SparkR)
library(magrittr)
# Initialize SparkContext and SQLContext
sc <- sparkR.init(appName="SparkR-Flights-example")
sqlContext <- sparkRSQL.init(sc)
# The SparkSQL context should already be created for you as sqlContext
sqlContext
# Java ref type org.apache.spark.sql.SQLContext id 1
# Load the flights CSV file using `read.df`. Note that we use the CSV reader Spark package here.
flights <- read.df(sqlContext, "nycflights13.csv", "com.databricks.spark.csv", header="true")
My colleagues and I found the solution. We initialized the sparkContext like this:
sc <- sparkR.init(appName="SparkR-Example",sparkEnvir=list(spark.executor.memory="1g"),sparkJars="spark-csv-assembly-1.1.0.jar")
We did not find out how to load a remote jar, so we downloaded spark-csv_2.11-1.0.3.jar. Including just this jar in sparkJars does not work, however, since its dependencies are not found locally. You can pass a list of jars as well, but we built an assembly jar containing all dependencies. When loading this jar, it is possible to load the .csv file as desired:
flights <- read.df(sqlContext, "data/nycflights13.csv","com.databricks.spark.csv",header="true")
I downloaded Spark-1.4.0, went via the command line to the directory Spark-1.4.0/R, and built the SparkR package located in the subdirectory pkg as follows:
R CMD build --resave-data pkg
This gives you a .tar file which you can install in RStudio (with devtools, you should be able to install the package in pkg as well).
In RStudio, you should set your path to Spark as follows:
Sys.setenv(SPARK_HOME="path_to_spark/spark-1.4.0")
.libPaths(c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib"), .libPaths()))
library(SparkR)
And you should be ready to go. I can only speak from Mac experience; I hope it helps.
If you have tried Pragith's solution above and are still having the issue, it is very possible the csv file you want to load is not in the current RStudio working directory. Use getwd() to check the RStudio working directory and make sure the csv file is there.
