How to give an Azure Machine Learning dataset path in an inference script? - azure-databricks

I am using the azureml SDK in Azure Databricks.
When I write the inference script (%%writefile script.py) in a Databricks cell,
I try to load a .bin file that I uploaded to Azure Machine Learning Datasets.
I would like to do this in script.py:
fasttext.load_model(azuremldatasetpath)
How can I put the correct path of my .bin file in the azuremldatasetpath variable (without calling the workspace in the script)?
Something like:
dataset_path = os.path.join(os.getenv('AZUREML_MODEL_DIR'), 'file.bin')

You can use your model name with the Model.get_model_path() method to retrieve the path of the model file or files on the local file system. If you register a folder or a collection of files, this API returns the path of the directory that contains those files.
For more information, see: https://learn.microsoft.com/en-us/azure/machine-learning/how-to-deploy-advanced-entry-script#azureml_model_dir
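The environment-variable pattern from the question can be sketched as follows. This is a minimal sketch, assuming the .bin file was registered as a model (or as part of one) so that AZUREML_MODEL_DIR is set inside the deployment container. The helper name resolve_model_file and the file name file.bin are illustrative, not part of the SDK, and the fasttext call is left as a comment so the snippet runs without fasttext installed.

```python
import os


def resolve_model_file(filename, env_var="AZUREML_MODEL_DIR"):
    """Return the path of `filename` inside the directory named by `env_var`.

    `resolve_model_file` is a hypothetical helper, not an azureml API.
    """
    model_dir = os.getenv(env_var)
    if model_dir is None:
        # The variable is only expected to exist inside a deployed service.
        raise RuntimeError(f"{env_var} is not set")
    return os.path.join(model_dir, filename)


# Inside the entry script's init(), this would typically be used as:
#   model_path = resolve_model_file("file.bin")   # hypothetical file name
#   model = fasttext.load_model(model_path)
```

If the model was registered as a folder rather than a single file, the file may sit in a subdirectory of AZUREML_MODEL_DIR, in which case the folder name would have to be included in the joined path.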

Related

Azure Data Factory Unzipping many files into partitions based on filename

I have a large zip file that has 900k JSON files in it. I need to process these with a data flow. I'd like to organize the files into folders using the last two digits in the file name so I can process them in chunks of 10k. My question is: how do I set up a pipeline to use part of the file name of the files in the zip file (the source) as part of the path in the sink?
current setup: zipfile.zip -> /json/XXXXXX.json
desired setup: zipfile.zip -> /json/XXX/XXXXXX.json
Please check whether the references below help:
In source transformation, you can read from a container, folder, or
individual file in Azure Blob storage. Use the Source options tab to
manage how the files are read. Using a wildcard pattern will instruct
the service to loop through each matching folder and file in a single
source transformation. This is an effective way to process multiple
files within a single flow.
[ ] Matches one or more characters in the brackets.
/data/sales/**/*.csv Gets all .csv files under /data/sales
Please also go through "Copy and transform data in Azure Blob storage - Azure Data Factory & Azure Synapse | Microsoft Docs" for other patterns and to check all the filtering possibilities in Azure Blob storage.
See also the video "How to UnZip Multiple Files which are stored on Azure Blob Storage By using Azure Data Factory".
In the sink transformation, you can write to either a container or a folder in Azure Blob storage.
File name option: Determines how the destination files are named in the destination folder.
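The mapping the question asks for (file name to partitioned sink path) can be illustrated outside Data Factory. The sketch below is plain Python, not a pipeline definition; inside a data flow the same logic would have to be written in the service's own expression language. The function name partitioned_sink_path, the default root /json and the digits parameter are assumptions made for illustration (the question mentions two digits, while its example path shows three placeholder characters).

```python
import posixpath


def partitioned_sink_path(source_path, digits=2, root="/json"):
    """Build a sink path whose folder is the last `digits` characters
    of the file name (without extension).

    Hypothetical helper for illustration only.
    """
    filename = posixpath.basename(source_path)
    stem, _extension = posixpath.splitext(filename)
    if len(stem) < digits:
        raise ValueError(f"file name {filename!r} is shorter than {digits} characters")
    partition = stem[-digits:]
    return posixpath.join(root, partition, filename)


# Example with a made-up file name:
print(partitioned_sink_path("/json/123456.json"))
```

With two digits, 900k files would be spread over at most 100 folders, which is roughly the chunk size mentioned in the question.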

How can I use named pipes to stream a GCP Cloud Storage object to an executable that wants input files?

I have a third-party executable that takes a directory path as an argument and in turn looks there for a collection of .db files. I have said collection of files stored in a Google Cloud Storage bucket and would like to stream the content of those files into some local named pipes that can be used as input to the executable.
I'm writing an application to perform the above in Go and am using the "cloud.google.com/go/storage" package to work with cloud storage objects.
As a note, I need all pipes/files to be available for reading at the time I run the executable.
What is the best way to go about this? I'm essentially looking to use the named pipes as a proxy of sorts to make remote files look local to this executable. Is this possible?

Google Cloud Logs Export Names

Is there a way to configure the names of the files exported from Logging?
Currently the exported file names include colons. These are invalid characters in a path element in Hadoop, so PySpark, for instance, cannot read these files. Obviously the easy solution is to rename the files, but this interferes with syncing.
Is there a way to configure the names or change them to not include colons? Any other solutions are appreciated. Thanks!
https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/site/markdown/filesystem/introduction.md
At this time, there is no way to change the naming convention when exporting log files as this process is automated on the backend.
If you would like to request this feature in GCP, I would suggest creating an entry in the Public Issue Tracker (PIT). It allows you to report bugs and request new features to be implemented within GCP.
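Since the export names cannot be configured, the renaming workaround mentioned in the question is what remains. The sketch below only computes a rename plan; it assumes the renamed files would go to a separate copy so the synced originals stay untouched, which is an assumption and not something the answer above states. The function name plan_renames and the sample file name are hypothetical.

```python
def plan_renames(names, replacement="_"):
    """Map each original name to a colon-free name.

    Raises ValueError if two different originals would collapse
    to the same safe name.
    """
    plan = {}
    used = set()
    for name in names:
        safe = name.replace(":", replacement)
        if safe in used:
            raise ValueError(f"name collision on {safe!r}")
        used.add(safe)
        plan[name] = safe
    return plan


# Example with a made-up export name:
print(plan_renames(["08:00:00_08:59:59_S0.json"]))
```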

Get path to file in storage

I saved a file to storage using:
$request->file('avatar')->store('avatars');
Which saved it to:
storage/app/avatars/avatar.png
How can I get the path to this file/folder (not the URL)? What is the correct way to do this using Laravel's Filesystem?
There is no correct way to do this, because it should not be done. Storage is an opaque abstraction for talking to different storage systems; as such, there is no API to get the backing file path. As an example, that wouldn't work with Amazon S3. The only path your application knows about is the string you send to the Storage facade to work with the file; there are no guarantees that this string is used to generate the filename when the storage system stores the file.
There are some hacks that work for the local disk, but those are not available for the Storage system in general. Using these solutions means that you limit yourself to the local disk; this will cause trouble when you need to scale out and add another server. You'll then have two servers with two separate local disks, with separate content.
The correct way to work with the files, which will work for all configurations, is to get the file content (Storage::get), make the modifications (including storing them in a temporary file if needed) and then write back the new file content (Storage::put).
If you're really sure that you will only ever use the local filesystem, use the File facade instead of the Storage facade. I'm unable to find any documentation for this, only the interface it exposes.
Reference: https://github.com/laravel/framework/issues/13610
Try this:
storage_path('app/avatars/avatar.png');
The storage_path() helper returns the absolute path to the storage folder. You can pass a nested folder and file name to it, and it will prepend the base path:
storage_path('folder1/folder2/.../file.png');

Program solution reading through share folders

I have a quick project I am working on for one of our VPs.
We have a few thousand CAD jobs stored on a network file share. The file structure is such that there is a parent folder for the CAD job. Part of the folder name contains the job number. Inside the folder, there are 1 to many .ini text files that contain the connection information I need.
What I need is a programmatic way to search through all the folders and extract the job number from the folder name, and all the connection values from the ini files.
For example, for a folder named CM8252390-3, the job number is 8252390-3. Inside this folder are 3 ini files. The ini files contain entries that look like this:
[Connection]
Name=IMP_Acme_3.5
[Origin]
X=-15.044784
Y=19.620095
Z=44.621395
So my program needs to give me the following result:
Job Connection
8252390-3 IMP_Acme1_3.5
8252390-3 IMP_Acme2_3.5
8252390-3 IMP_Acme3_3.5
8254260-1 IMP_Acme3_2.4
8254260-1 IMP_Acme3_4.1
...continued for all folders in the network share
Any suggestions on the best way to do this? I am primarily an Oracle PL/SQL developer, but have some basic Windows batch and Unix shell experience. If I can get the data loaded into Oracle tables, I can search using PL/SQL tools, but is there a better way using shell, batch, or other tools?
Thank you.
I think this is a job for PowerShell or VBScript. It would be easy to use these tools to write the information you need to one file.
This file should be written to an Oracle directory.
Grant read permission on this directory to a database user.
Use utl_file to read the file, or treat the file as an external table and expose it as a view.
Schedule a regular OS job to refresh or rebuild the list.
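The extraction step could look roughly like the sketch below. It uses Python rather than the PowerShell or VBScript suggested above, purely as an illustration of the same idea. It assumes every job folder sits directly under the share root, that folder names are a letter prefix followed by the job number (as in CM8252390-3), and that each ini file has a [Connection] section with a Name key. The names job_number and collect_connections are hypothetical.

```python
import configparser
import re
from pathlib import Path

# Assumption: folder name = letters + job number, e.g. "CM8252390-3".
JOB_PATTERN = re.compile(r"^[A-Za-z]+(\d+-\d+)$")


def job_number(folder_name):
    """Return the job number embedded in a folder name, or None."""
    match = JOB_PATTERN.match(folder_name)
    return match.group(1) if match else None


def collect_connections(share_root):
    """Return (job, connection name) pairs for every ini file found."""
    rows = []
    for folder in sorted(Path(share_root).iterdir()):
        if not folder.is_dir():
            continue
        job = job_number(folder.name)
        if job is None:
            continue  # folder does not follow the assumed naming scheme
        for ini_file in sorted(folder.glob("*.ini")):
            # Interpolation is disabled so '%' in values is read literally.
            parser = configparser.ConfigParser(interpolation=None, strict=False)
            parser.read(ini_file)
            name = parser.get("Connection", "Name", fallback=None)
            if name is not None:
                rows.append((job, name))
    return rows
```

The resulting pairs could then be written to a delimited file in the Oracle directory, for example with Python's csv module, and read as an external table as described above.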
