How to save fasttext model in binary and text formats?

The documentation is unclear about how to save the fasttext model to disk. How do you specify a path in the argument? I tried doing so and it failed with an error.
Example in the documentation:
>>> from gensim.test.utils import get_tmpfile
>>>
>>> fname = get_tmpfile("fasttext.model")
>>>
>>> model.save(fname)
>>> model = FastText.load(fname)
Furthermore, how can I save the model in text format, as can be done with word2vec models?
word2vecmodel.wv.save_word2vec_format("D:\w2vmodel.txt")
EDIT
After trying the suggestion to create the file first, I keep getting the same error as before when I run this code:
savepath = os.path.abspath('D:\fasttextmodel.v3.bin');
from gensim.test.utils import get_tmpfile
fname = get_tmpfile(savepath)
fasttext_model.save(fname)
TypeError: file must have a 'write' attribute

The save()/load() example in the FastText documentation is misleading: it suggests using get_tmpfile. I am able to save the model if I pass the file name as a string and do not wrap it in get_tmpfile:
model.save("fasttext.model")
Then you can load the same way, passing the string directly:
model = FastText.load("fasttext.model")
Note that this will save multiple files for models that are large. However, when you load the model, you only need to specify the main fasttext.model file, and the function will automatically load additional files, if there are any.
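One more thing that may be worth checking in the question's EDIT snippet. This is an observation about Python string literals, not a confirmed cause of the TypeError: in a normal (non-raw) string, the "\f" in 'D:\fasttextmodel.v3.bin' is read as a form feed character, so the path is not what it looks like. A minimal sketch using only the standard library (the variable names are made up for illustration):

```python
# Illustration only: how Python reads the path literal from the question.
bad_path = 'D:\fasttextmodel.v3.bin'   # "\f" is parsed as a form feed (0x0C)
raw_path = r'D:\fasttextmodel.v3.bin'  # a raw string keeps the backslash and the "f"
fwd_path = 'D:/fasttextmodel.v3.bin'   # forward slashes avoid escapes altogether

print(repr(bad_path))                  # shows the hidden control character
print('\x0c' in bad_path)              # True: the form feed is part of the string
print(len(bad_path), len(raw_path))    # the raw version is one character longer
```

Passing raw_path or fwd_path directly to model.save(...) avoids the hidden control character.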

Did you try creating a file called "fasttext.model" in your local directory before trying to save to it?
Also, I'm assuming you trained the model before this, correct?

Related

Adding props (tags/ keywords) to a folder using only ctypes in Python 3

I'm trying to add a tag to a folder using the Windows API, and I've stumbled upon this article.
According to this article,
to create an ANSI simple property set, you would call IPropertySetStorage::Create to create the property set, specifying PROPSETFLAG_ANSI (simple is the default type of property set), then write to it with a call to IPropertyStorage::WriteMultiple. To read the property set, you would call IPropertyStorage::ReadMultiple.
I tried doing it with the code below but got stuck, because I can't find where IPropertySetStorage and IPropertyStorage are.
import ctypes
from ctypes import sizeof
STGM_READWRITE = 0x00000002
def addTag(folder):
    shell32 = ctypes.windll.shell32
    _clsid = None  # should be the CLSID from the SHFOLDERCUSTOMSETTINGS
    # ↓↓↓↓↓↓ I can't find where I should get IPropertySetStorage
    shell32.IPropertySetStorage.Create("{F29F85E0-4FF9-1068-AB91-08002B27B3D9}", _clsid, 0, STGM_READWRITE)
    rglpwstrName = ["TestTag"]
    rgpropid = ["prop5"]
    cpropid = sizeof(rgpropid)
    # ↓↓↓↓↓↓ I can't find where I should get IPropertyStorage
    shell32.IPropertyStorage.WritePropertyNames(cpropid, rgpropid, rglpwstrName)
addTag(r"C:\Users\Agustin\Desktop\New folder")
I tried searching for answers here and in the Microsoft documentation but (Reading and writing Windows "tags" with Python 3) is the closest example I've seen. The example there is using pywin32 but I want to find out if this can be pulled off using only ctypes.
My question: where can I find IPropertySetStorage and IPropertyStorage, and how can I add a tag to a folder using only the Windows API with ctypes?
Any help will be appreciated.

Azure Forms Recognizer - Saving output results SDK Python

When I used the Form Recognizer API, it returned a JSON file. Now I am using Form Recognizer with the Python SDK, and it returns a data type that seems to be specific to the azure.ai.formrecognizer library.
Does anyone know how to save the data acquired from the Form Recognizer Python SDK in a JSON file like the one received from the Form Recognizer API?
import os
from azure.ai.formrecognizer import FormRecognizerClient
from azure.identity import ClientSecretCredential
client_secret_credential = ClientSecretCredential(tenant_id, client_id, client_secret)
form_recognizer_client = FormRecognizerClient(endpoint, client_secret_credential)
with open(os.path.join(path, file_name), "rb") as fd:
    form = fd.read()
poller = form_recognizer_client.begin_recognize_content(form)
form_pages = poller.result()

Thanks for your question! The Azure Form Recognizer SDK for Python provides helper methods like to_dict and from_dict on the models to facilitate converting the data type in the library to and from a dictionary. You can use the dictionary you get from the to_dict method directly or convert it to JSON.
For your example above, in order to get a JSON output you could do something like:
import json
poller = form_recognizer_client.begin_recognize_content(form)
form_pages = poller.result()
# to_dict() turns each page model into plain dictionaries and lists
d = [page.to_dict() for page in form_pages]
json_string = json.dumps(d)
I hope that answers your question, please let me know if you need more information related to the library.
Also, there's more information about our models and their methods on our documentation page here. You can use the dropdown to select the version of the library that you're using.
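To write the result to a JSON file rather than a string, something like the following sketch could work. It runs without the Azure SDK: FakePage is a hypothetical stand-in for any SDK model that exposes to_dict(), and pages_to_json_file and the output file name are made-up names for illustration. The default=str fallback is an assumption, added in case a dictionary contains values that json cannot encode natively.

```python
import json
import os
import tempfile


class FakePage:
    """Hypothetical stand-in for an SDK model that exposes to_dict()."""

    def __init__(self, page_number, lines):
        self.page_number = page_number
        self.lines = lines

    def to_dict(self):
        return {"page_number": self.page_number, "lines": self.lines}


def pages_to_json_file(pages, out_path):
    """Convert each page with to_dict() and write the list as JSON."""
    data = [page.to_dict() for page in pages]
    with open(out_path, "w", encoding="utf-8") as fh:
        # default=str is a fallback for values json cannot encode natively
        json.dump(data, fh, ensure_ascii=False, indent=2, default=str)
    return data


out_path = os.path.join(tempfile.mkdtemp(), "form_pages.json")
saved = pages_to_json_file([FakePage(1, ["Invoice", "Total: 10"])], out_path)
print(out_path)
```

With the real SDK, the list passed in would be the form_pages returned by poller.result().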

Yolov4 error when trying to display image on custom model

I have trained my own model, using my own custom dataset, using Yolov4, and I have downloaded the .cfg, .weights and .data files.
When I try to run my model using:
darknet.exe detector test cfg/obj.data cfg/yolov4-og.cfg custom-yolov4-detector_best.weights
I get the error:
Error: l.outputs == params.inputs filters= in the [convolutional]-layer doesn't correspond to classes= or mask= in [yolo]-layer
I don't know if this is an error on my part, with the command I am running, or an error from the model I trained.
Any help would be appreciated.

I am assuming you are using the main darknet repo (AlexeyAB). Please make sure you follow these instructions:
Make sure you assign the correct number of classes in the config file.
Change filters=255 to filters=(classes + 5)x3 in the 3 [convolutional] layers before each [yolo] layer. Keep in mind that it only has to be the last [convolutional] before each of the [yolo] layers:
https://github.com/AlexeyAB/darknet/blob/0039fd26786ab5f71d5af725fc18b3f521e7acfd/cfg/yolov3.cfg#L603
https://github.com/AlexeyAB/darknet/blob/0039fd26786ab5f71d5af725fc18b3f521e7acfd/cfg/yolov3.cfg#L689
https://github.com/AlexeyAB/darknet/blob/0039fd26786ab5f71d5af725fc18b3f521e7acfd/cfg/yolov3.cfg#L776
So if classes=1, then it should be filters=18. If classes=2, then write filters=21.
(Generally, filters depends on classes, coords and the number of masks, i.e. filters=(classes + coords + 1)*<number of mask>, where mask is the indices of anchors. If mask is absent, then filters=(classes + coords + 1)*num)
Reference: https://github.com/AlexeyAB/darknet#how-to-train-to-detect-your-custom-objects
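The arithmetic above can be checked with a few lines of Python. This is only a sketch of the formula quoted in this answer; yolo_filters is a made-up helper name, and the defaults of 4 coords and 3 masks per [yolo] layer are assumptions that match the (classes + 5)x3 rule.

```python
def yolo_filters(classes, coords=4, masks=3):
    """filters for the last [convolutional] layer before a [yolo] layer."""
    return (classes + coords + 1) * masks


for n in (1, 2, 80):
    print(f"classes={n} -> filters={yolo_filters(n)}")
```

If the value printed for your class count differs from the filters= line in your .cfg, that mismatch is the likely source of the error message.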

SSIS dynamic flat file connection to load daily file with date-time,minute,second timestamp

I have to load a daily CSV file from a network location. The file name carries a date-time stamp, down to the minute and second, which is set when the file is exported from the API and saved to the network location.
I am trying to make my package dynamic so that it does not have to change when the file name changes every other day. I have tried using an expression in the flat file connection manager properties, but that is not working either.
My file name looks like following:
DS_All_users_with_additional_fields_2018_12_11_10_00.csv
I have tried to solve this by using the following expression, but things get complicated if there is a delay in the CSV export and the minute and second in the file name change:
@[User::DataLoadDir]+"DS_All_users_with_additional_fields_"+(DT_STR,4,1252)YEAR( DATEADD( "dd", -1, getdate() ))+"_"+(DT_STR,4,1252)MONTH( DATEADD( "dd", -1, getdate() ))+"_"+(DT_STR,4,1252)DAY( DATEADD( "dd", 0, getdate() ))+"_10_00.csv"
Any suggestions how to solve this problem?

You can use a foreach loop file enumerator and apply a filespec expression of:
DS_All_users_with_additional_fields*.csv
The * serves as a wildcard and will pick up all files matching that string. You can adapt this to make it as flexible as you need. In this case, the job will scan a specific folder for all available files that match the above string. Each match can then be assigned to a variable, which you can use to dynamically set the connection string.
I don't think you can add the * into the connection string itself.
Update
To set a connection manager's connection string property, see the photo below. It is important to note that this solution will change the workflow. Your initial workflow told the connection manager which specific file to look for. With a foreach loop, the job instead searches for any and all files that are available in a specific folder path. Note: you will need to make sure that you include the fully qualified domain name (FQDN) in the connection string variable (i.e., \\networkpath\filename.csv)

Are the files that you need to import the only files in that directory with a name that starts with DS_All_users_with_additional_fields_? If so, use a Script Task to find the most recent one and set the variable used in the connection string to this name. The following example uses LINQ to look for files that begin with the name you listed, sorts them by creation date in descending order, and returns the first one. The Name property below will include the extension. You can also get the complete file path by changing this to the FullName property, in which case you could just use this value for the variable used by the flat file connection string, as opposed to concatenating it with the @[User::DataLoadDir] variable. This example references System.IO and System.Linq, as specified below.
using System.IO;
using System.Linq;
string filePath = Dts.Variables["User::DataLoadDir"].Value.ToString();
DirectoryInfo di = new DirectoryInfo(filePath);
FileInfo mostRecentFile = (from f in di.GetFiles().Where(x =>
x.Name.StartsWith("DS_All_users_with_additional_fields_"))
orderby f.CreationTime descending
select f).First();
//The Name property below can be changed to FullName to get the complete file path
Dts.Variables["User::VariableHoldingFileName"].Value = mostRecentFile.Name;
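As an alternative to sorting by creation time, the newest export could also be picked from the timestamp in the file name itself. The following is only an illustration of that idea in Python, outside SSIS, not part of the Script Task above. It assumes the name ends in year_month_day_hour_minute, as in the sample name from the question; parse_export_time and latest_export are made-up helper names.

```python
from datetime import datetime

PREFIX = "DS_All_users_with_additional_fields_"
SUFFIX = ".csv"


def parse_export_time(name):
    """Return the datetime encoded in the file name, or None if it does not match."""
    if not (name.startswith(PREFIX) and name.endswith(SUFFIX)):
        return None
    stamp = name[len(PREFIX):-len(SUFFIX)]
    try:
        # Assumed layout: YYYY_MM_DD_HH_MM, e.g. 2018_12_11_10_00
        return datetime.strptime(stamp, "%Y_%m_%d_%H_%M")
    except ValueError:
        return None


def latest_export(names):
    """Pick the name with the most recent embedded timestamp, or None."""
    dated = [(parse_export_time(n), n) for n in names]
    dated = [(t, n) for t, n in dated if t is not None]
    return max(dated)[1] if dated else None


names = [
    "DS_All_users_with_additional_fields_2018_12_10_10_00.csv",
    "DS_All_users_with_additional_fields_2018_12_11_10_03.csv",
    "other_report_2018_12_12_10_00.csv",
]
print(latest_export(names))
```

Names that do not match the prefix or the assumed timestamp layout are ignored, so a delayed export with different minutes is still found.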

Manually populate an ImageField

I have a models.ImageField which I sometimes populate with the corresponding forms.ImageField. Sometimes, instead of using a form, I want to update the image field with an ajax POST. I am passing both the image filename and the image content (base64 encoded), so that in my API view I have everything I need. But I do not really know how to do this manually, since I have always relied on form processing, which automatically populates the models.ImageField.
How can I manually populate the models.ImageField having the filename and the file contents?
EDIT
I have reached the following status:
instance.image.save(file_name, File(StringIO(data)))
instance.save()
And this is updating the file reference, using the right value configured in upload_to in the ImageField.
But it is not saving the image. I would have imagined that the first .save call would:
Generate a file name in the configured storage
Save the file contents to the selected file, including handling of any kind of storage configured for this ImageField (local FS, Amazon S3, or whatever)
Update the reference to the file in the ImageField
And the second .save would actually save the updated instance to the database.
What am I doing wrong? How can I make sure that the new image content is actually written to disk, in the automatically generated file name?
EDIT2
I have a very unsatisfactory workaround, which is working but is very limited. This illustrates the problems that using the ImageField directly would solve:
# TODO: workaround because I do not yet know how to correctly populate the ImageField
# This is very limited because:
# - only uses local filesystem (no AWS S3, ...)
# - does not provide the advance splitting provided by upload_to
local_file = os.path.join(settings.MEDIA_ROOT, file_name)
with open(local_file, 'wb') as f:
    f.write(data)
instance.image = file_name
instance.save()
EDIT3
So, after some more playing around, I have discovered that my first implementation does the right thing but fails silently if the passed data has the wrong format (I was mistakenly passing the base64 text instead of the decoded data). I'll post this as a solution.

Just save the file and the instance:
instance.image.save(file_name, File(StringIO(data)))
instance.save()
No idea where the docs for this usecase are.
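As EDIT3 in the question points out, the data has to be base64-decoded before it is handed to the field. A minimal standard-library sketch of that decoding step follows; decode_upload is a made-up helper name, and the handling of a "data:" URL prefix is an assumption about what an ajax client might send. Django itself is not exercised here.

```python
import base64
import binascii
import io


def decode_upload(b64_text):
    """Decode a base64 payload (optionally a data: URL) into raw bytes."""
    if b64_text.startswith("data:") and "," in b64_text:
        # Drop a prefix such as "data:image/png;base64,"
        b64_text = b64_text.split(",", 1)[1]
    try:
        return base64.b64decode(b64_text, validate=True)
    except binascii.Error as exc:
        raise ValueError("payload is not valid base64") from exc


PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
payload = base64.b64encode(PNG_SIGNATURE + b"rest-of-file").decode("ascii")

raw = decode_upload(payload)
stream = io.BytesIO(raw)  # file-like object holding the decoded bytes
print(raw.startswith(PNG_SIGNATURE), len(raw))
```

In a Django view, the decoded bytes could then be wrapped, for example with django.core.files.base.ContentFile(raw), and passed to instance.image.save(file_name, ...). Raising on invalid input avoids the silent failure described in the question.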

You can use InMemoryUploadedFile directly to save data:
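import base64
import io
from django.core.files.uploadedfile import InMemoryUploadedFile
data = base64.b64decode(request.POST['file'])
file = io.BytesIO(data)
image = InMemoryUploadedFile(file,
                             field_name='file',
                             name=request.POST['name'],
                             content_type="image/jpeg",
                             size=len(data),  # byte length of the content, not sys.getsizeof of the object
                             charset=None)
instance.image = image
instance.save()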
