The sample Docker configuration section of stack.yaml gives:
# Location of database used to track image usage, which `stack docker cleanup`
# uses to determine which images should be kept. On shared systems, it may
# be useful to override this in the global configuration file so that
# all users share a single database.
database-path: "~/.stack/docker.db"
However, when I put this in the stack.yaml for a new project and run stack setup, I get:
Aeson exception:
Error in $.docker['database-path']: failed to parse field 'docker': failed to parse field 'database-path': InvalidAbsFile "~/.stack/docker.db"
See http://docs.haskellstack.org/en/stable/yaml_configuration/
This is the only reference I could find to database-path without digging into the code.
Is database-path required?
If so, how do I initialize a .db file (to get past InvalidAbsFile "~/.stack/docker.db")?
It is not a matter of initializing the database. The problem is that stack does not expand the ~, so you need to use an absolute path such as /home/dukedave/.stack/docker.db
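For example, the docker section of the global config (or the project's stack.yaml) would look like this, substituting your own home directory for /home/dukedave:
docker:
  database-path: "/home/dukedave/.stack/docker.db"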
I am able to run my pipelines using the kedro run command without issue. For some reason, though, I can't access my context and catalog from Jupyter Notebook anymore. When I run kedro jupyter notebook and start a new (or existing) notebook, selecting my project name under "New", I get the following errors:
context
NameError: name 'context' is not defined
catalog.list()
NameError: name 'catalog' is not defined
EDIT:
After running the magic command %kedro_reload I can see that my ProjectContext init_spark_session is looking for files in project_name/notebooks instead of project_name/src. I tried changing the working directory in my Jupyter Notebook session with %cd ../src and os.chdir('../src'), but Kedro still looks in the notebooks folder:
%kedro_reload
java.io.FileNotFoundException: File file:/Users/user_name/Documents/app_name/kedro/notebooks/dist/project_name-0.1-py3.8.egg does not exist
_spark_session.sparkContext.addPyFile() is looking in the wrong place. When I comment out this line from my ProjectContext, this error goes away, but I receive another one about not being able to find my Oracle driver when trying to load a dataset from the catalog:
df = catalog.load('dataset')
java.lang.ClassNotFoundException: oracle.jdbc.driver.OracleDriver
EDIT 2:
For reference:
kedro/src/project_name/context.py
def init_spark_session(self) -> None:
    """Initialises a SparkSession using the config defined in project's conf folder."""
    # Load the spark configuration in spark.yaml using the config loader
    parameters = self.config_loader.get("spark*", "spark*/**")
    spark_conf = SparkConf().setAll(parameters.items())
    # Initialise the spark session
    spark_session_conf = (
        SparkSession.builder.appName(self.package_name)
        .enableHiveSupport()
        .config(conf=spark_conf)
    )
    _spark_session = spark_session_conf.getOrCreate()
    _spark_session.sparkContext.setLogLevel("WARN")
    _spark_session.sparkContext.addPyFile(f'src/dist/project_name-{__version__}-py3.8.egg')
kedro/conf/base/spark.yml:
# You can define spark specific configuration here.
spark.driver.maxResultSize: 8g
spark.hadoop.fs.s3a.impl: org.apache.hadoop.fs.s3a.S3AFileSystem
spark.sql.execution.arrow.pyspark.enabled: true
# https://kedro.readthedocs.io/en/stable/11_tools_integration/01_pyspark.html#tips-for-maximising-concurrency-using-threadrunner
spark.scheduler.mode: FAIR
# JDBC driver
spark.jars: drivers/ojdbc8-21.1.0.0.jar
I think a combination of the following might help you:
Generally, try to avoid manually interfering with the current working directory, so remove the os.chdir call in your notebook and construct absolute paths where possible.
In your init_spark_session, when calling addPyFile, use an absolute path instead. self.project_path points to the root directory of your Kedro project, so you can use it to construct the path to your PyFile accordingly, e.g. _spark_session.sparkContext.addPyFile(f'{self.project_path}/src/dist/project_name-{__version__}-py3.8.egg'), as in the sketch below.
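A minimal sketch of the whole method with those absolute paths. The spark.jars tweak is only my guess: drivers/ojdbc8-21.1.0.0.jar in your spark.yml is also a relative path, which would explain the Oracle driver ClassNotFoundException once the notebook's working directory differs from the project root.
def init_spark_session(self) -> None:
    """Initialises a SparkSession using the config defined in project's conf folder."""
    # Load the spark configuration in spark.yaml using the config loader
    parameters = self.config_loader.get("spark*", "spark*/**")
    # Assumption: anchor a relative spark.jars entry to the project root as well,
    # so the JDBC driver is found regardless of the current working directory
    if "spark.jars" in parameters:
        parameters["spark.jars"] = str(self.project_path / parameters["spark.jars"])
    spark_conf = SparkConf().setAll(parameters.items())
    # Initialise the spark session
    spark_session_conf = (
        SparkSession.builder.appName(self.package_name)
        .enableHiveSupport()
        .config(conf=spark_conf)
    )
    _spark_session = spark_session_conf.getOrCreate()
    _spark_session.sparkContext.setLogLevel("WARN")
    # Build the egg path from the project root instead of relying on the notebook's CWD
    _spark_session.sparkContext.addPyFile(
        f'{self.project_path}/src/dist/project_name-{__version__}-py3.8.egg'
    )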
Not sure why you would need to add the PyFile though, but maybe you have a specific reason.
I have added "versioned: true" in the "catalog.yml" file of the "hello_world" tutorial.
example_iris_data:
  type: pandas.CSVDataSet
  filepath: data/01_raw/iris.csv
  versioned: true
Then when I used "kedro run" to run the tutorial, it failed with the error below:
"VersionNotFoundError: Did not find any versions for CSVDataSet"
May I know what is the right way for me to do versioning for the "iris.csv" file? Thanks!
Try versioning one of the downstream outputs. For example, add this entry in your catalog.yml and run kedro run:
example_train_x:
  type: pandas.CSVDataSet
  filepath: data/02_intermediate/example_iris_data.csv
  versioned: true
And you will see an example_iris_data.csv directory (not a file) under data/02_intermediate. The reason example_iris_data gives you an error is that it's the starting data and there's already an iris.csv in data/01_raw, so Kedro cannot create a data/01_raw/iris.csv/ directory because of the name conflict with the existing iris.csv file.
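To illustrate, after a couple of runs the versioned dataset is laid out roughly like this on disk (the timestamps here are made up; yours will differ):
data/02_intermediate/example_iris_data.csv/
    2021-05-10T12.00.00.000Z/
        example_iris_data.csv
    2021-05-11T09.30.00.000Z/
        example_iris_data.csv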
Hope this helps :)
The reason for the error is that when Kedro tries to load the dataset, it looks for a file in data/01_raw/iris.csv/<load_version>/iris.csv and, of course, cannot find such a path. So if you really want to enable versioning for your input data, you can move iris.csv like this:
mv data/01_raw/iris.csv data/01_raw/iris.csv_tmp
mkdir data/01_raw/iris.csv
mv data/01_raw/iris.csv_tmp data/01_raw/iris.csv/<put_some_timestamp_here>/iris.csv
You wouldn't need to do that for any intermediate data, as these path manipulations are done by Kedro automatically when it saves a dataset (but not on load).
I need to run Stanza NER on a platform without any access to an external network. The code stanza.download('en') fails. Running without the download call gives me an exception:
Exception: Resources file not found at: \home\stanza_resources\resources.json. Try to download the model again
Is there a way to download and cache all the required modules in a resource directory and point the Stanza pipeline at this directory?
Thanks
It looks like both download and the Pipeline class take a dir argument for the resources directory.
So the code below works:
stanza.download('en', dir='resources/', processors={ner_processor: package})
nlp_pipeline = stanza.Pipeline('en', dir='resources/', processors={ner_processor: package})
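A minimal end-to-end sketch, using the plain string form of processors rather than the {ner_processor: package} mapping above (the sample sentence is just illustrative):
import stanza

# Run this once on a machine that does have internet access
stanza.download('en', dir='resources/', processors='tokenize,ner')

# Copy the resources/ directory to the offline machine, then build the pipeline
# against that local directory instead of the default ~/stanza_resources
nlp_pipeline = stanza.Pipeline('en', dir='resources/', processors='tokenize,ner')

doc = nlp_pipeline("Barack Obama was born in Hawaii.")
for ent in doc.entities:
    print(ent.text, ent.type)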
I generated a model with hanami generate model stimulus. Then I changed "stimuluses" to "stimuli" in the migration file name and, inside it, in the table name.
Every time I load a page, I get this error in the server console window:
[ROM::Relation[Stimuluses]] failed to infer schema. Make sure tables exist before ROM container is set up. This may also happen when your migration tasks load ROM container, which is not needed for migrations as only the connection is required (schema parsing returned no columns, table "stimuluses" probably doesn't exist)
I looked into the libraries and found that this functionality is provided by the Inflecto library. So I tried both adding this to my hanami project:
# /config/initializers/inflecto.rb
require 'inflecto'
Inflecto.inflections do |inflect|
  inflect.irregular('stimulus', 'stimuli')
end
And editing the default library file:
# gems/inflecto-0.0.2/lib/inflecto/defaults.rb
Inflecto.inflections do |inflect|
  ...
  inflect.irregular('stimulus', 'stimuli')
  ...
end
But the message is still there after restarting the server.
Is this something I should solve, and if so, how do I do it?
EDIT:
Also tried:
# /config/initializers/inflector.rb
require 'hanami/utils/inflector'
Hanami::Utils::Inflector.inflections do
  exception 'stimulus', 'stimuli'
end
I'm assuming we are talking about Hanami v1.0.0, right?
You nearly succeeded. What bit you is that initializers do not seem to be loaded when hanami commands are executed, plus perhaps a bug in code reloading. So instead of an initializer, put the rules into a file that gets loaded when hanami commands are executed, or require the initializer file from such a place. E.g.,
# config/initializers/inflections.rb
require 'hanami/utils/inflector'
Hanami::Utils::Inflector.inflections do
  exception 'stimulus', 'stimuli'
end
and then in your environment file
# config/environment.rb
# ...
require_relative 'initializers/inflections.rb'
# ...
I'm not sure if that is a good place to put custom inflection rules, but at least it works.
I am following the tutorial at
https://cloud.google.com/appengine/docs/java/endpoints/calling-from-ios
and when I get to step 5, "Open a new Terminal window to invoke ServiceGenerator", I get this error message in my terminal:
Barrys-MacBook-Pro:~ barrymadej$ /Users/barrymadej/Library/Developer/Xcode/DerivedData/ServiceGenerator-avaeguyitgyhxpcnaejpgzvxezei/Build/Products/Debug/ServiceGenerator \
/Users/barrymadej/Documents/AndroidStudioProjects/StudentProgressTrackerDatabaseAndCloud/backend/build/discovery-docs/myApi-v2-rpc.discovery /
ERROR: An output directory is required.
Usage: ServiceGenerator [FLAGS] [ARGS]
Required Flags:
  --outputDir PATH
      The destination directory for writing the generated files.
Optional Flags:
  --discoveryService URL
      Instead of discovery's default URL, use the specified URL as the
      location to send the JSON-RPC requests. This is useful for running
      against a custom or prerelease server.
  --gtlFrameworkName NAME
      Will generate sources that include GTL's headers as if they are in a
      framework with the given name. If you are using GTL via CocoaPods,
      you'll likely want to pass "GoogleAPIClient" as the value for this.
  --apiLogDir DIR
      Write out a file into DIR for each JSON API description processed. These
      can be useful for reporting bugs if generation fails with an error.
  --httpLogDir PATH
      Turn on the HTTP fetcher logging and set it to write to PATH. This can
      be useful for diagnosing errors on discovery fetches.
  --generatePreferred
      Causes the list of services to be collected, and all preferred services
      to be generated.
  --httpHeader NAME:VALUE
      Causes the given NAME/VALUE pair to be added as an HTTP header on *all*
      HTTP requests made by the generator. Can be used repeatedly to provide
      additional header pairs.
  --formattedName SERVICE:VERSION=NAME
      Causes the given SERVICE:VERSION pair to override its service name in
      files, classes, etc. with NAME. If :VERSION is omitted the override is
      for any version of the service. Can be used repeatedly to provide
      several maps when generating a few things in a single run.
  --addServiceNameDir yes|no Default: no
      Causes the generator to add a directory with the service name in the
      outputDir for the files. This is useful for generating multiple
      services.
  --generatedDir yes|no Default: no
      Causes a directory in outputDir called "Generated" to be created and
      used to contain the generated files.
  --removeUnknownFiles yes|no Default: no
      By default, the generator will report unknown files in the output
      directory, as commonly happens when classes go away in a new API
      version. This option causes the generator to also remove the unknown
      files.
  --rootURLOverrides yes|no Default: yes
      Causes any API root URL for a Google sandbox server to be replaced with
      the googleapis.com root instead.
  --verbose
      Generate more verbose output. Can be used more than once.
Arguments:
  Multiple arguments can be given on the command line.
  service:version
      The description of the given [service]/[version] pair is fetched and the
      files for it are generated. When using --generatePreferred version can
      be '-' to skip generating the name service.
  http[s]://url/to/rpc_description_json
      A URL to download containing the description of a service to generate.
  path/to/rpc_description.json
      The path to a text file containing the description of a service to
      generate.
ServiceGenerator path:
/Users/barrymadej/Library/Developer/Xcode/DerivedData/ServiceGenerator-avaeguyitgyhxpcnaejpgzvxezei/Build/Products/Debug/ServiceGenerator
ERROR: There was one or more errors; check the full output for details.
Barrys-MacBook-Pro:~ barrymadej$ --outputDir
-bash: --outputDir: command not found
Barrys-MacBook-Pro:~ barrymadej$ /Users/barrymadej/Documents/AndroidStudioProjects/StudentProgressTrackerDatabaseAndCloud/API
You should generate a REST discovery document and use the new Objective-C client instead; the client library you're trying to use is deprecated anyway. It looks like this run failed because you specified the --outputDir flag without the rest of the command, though: the flag has to be part of the same ServiceGenerator invocation.
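For reference, a corrected invocation would look something like this (paths taken from your transcript; adjust them and the output directory to whatever you actually want):
/Users/barrymadej/Library/Developer/Xcode/DerivedData/ServiceGenerator-avaeguyitgyhxpcnaejpgzvxezei/Build/Products/Debug/ServiceGenerator \
  /Users/barrymadej/Documents/AndroidStudioProjects/StudentProgressTrackerDatabaseAndCloud/backend/build/discovery-docs/myApi-v2-rpc.discovery \
  --outputDir /Users/barrymadej/Documents/AndroidStudioProjects/StudentProgressTrackerDatabaseAndCloud/API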