Annotation specs - AutoML (Vertex AI) - google-cloud-vertex-ai

We're trying to build an image-based product search for our webshop using the Vertex AI image classification model (single-label).
Currently we have around 20k products with xx images per product.
So our dataset contains 20k labels (one for each product, i.e. the product number), but on import we receive the following error message:
There are too many AnnotationSpecs in the dataset. Up to 5000 AnnotationSpecs are allowed in one Dataset. Check your csv/jsonl format with our public documentation.
It looks like no more than 5000 labels are allowed per dataset. This quota is not very visible in the documentation, or we didn't find it.
Anyway, any ideas how we can make it work? Do we have to build 5 datasets with 5 different endpoints and then query every endpoint for a match?
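A minimal sketch of the sharding idea from the question, assuming each shard model is wrapped as a Python callable that returns its top label and confidence. The `Predictor` type and the callables are hypothetical stand-ins for real endpoint calls, not the Vertex AI client API. At the quoted limit of 5000 labels, 20k products fit in four shards.

```python
from typing import Callable, List, Sequence, Tuple

# Per-dataset limit quoted in the import error above.
MAX_LABELS_PER_DATASET = 5000

# Hypothetical stand-in for one deployed shard endpoint:
# takes image bytes, returns (top label, confidence).
Predictor = Callable[[bytes], Tuple[str, float]]


def shard_labels(labels: Sequence[str],
                 max_per_shard: int = MAX_LABELS_PER_DATASET) -> List[List[str]]:
    """Split the distinct labels into consecutive chunks that each fit in one dataset."""
    unique = sorted(set(labels))
    return [unique[i:i + max_per_shard] for i in range(0, len(unique), max_per_shard)]


def predict_across_shards(image: bytes,
                          predictors: Sequence[Predictor]) -> Tuple[str, float]:
    """Query every shard model and keep the single most confident answer."""
    return max((predict(image) for predict in predictors), key=lambda pair: pair[1])


if __name__ == "__main__":
    product_numbers = [f"P{n:05d}" for n in range(20000)]
    shards = shard_labels(product_numbers)
    print(len(shards), [len(s) for s in shards])  # 4 [5000, 5000, 5000, 5000]
```

One caveat with this design: each shard model only knows its own products, so it always returns some label even when the product belongs to another shard, and confidences from separately trained models are not guaranteed to be comparable. Taking the maximum is therefore a heuristic.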

You can find those limits in the AutoML quotas and limits documentation.
It is possible to have multiple models for groups of products. Maybe even something like: one initial model to classify the product category (jewelry, watches, shoes, toys, etc.) and a second, category-specific model to identify the specific product (among toys, among shoes, etc.). To be honest, it seems a bit hard to maintain, but it is certainly worth trying.
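A minimal sketch of the two-stage routing described above, assuming each trained model is wrapped in a hypothetical callable returning (label, confidence). Only the predicted category's product model is queried, so each product model only has to stay under the 5000-label limit on its own.

```python
from typing import Callable, Dict, Tuple

# Hypothetical wrapper around one deployed model: image bytes -> (label, confidence).
Classifier = Callable[[bytes], Tuple[str, float]]


def two_stage_predict(image: bytes,
                      category_model: Classifier,
                      product_models: Dict[str, Classifier]) -> Tuple[str, str, float]:
    """Stage 1 picks a category; stage 2 runs only that category's product model."""
    category, category_conf = category_model(image)
    product_model = product_models.get(category)
    if product_model is None:
        raise KeyError(f"no product model trained for category {category!r}")
    product, product_conf = product_model(image)
    # Heuristic combined score: product of the two stage confidences
    # (not a calibrated probability).
    return category, product, category_conf * product_conf


if __name__ == "__main__":
    category_model = lambda img: ("shoes", 0.9)
    product_models = {
        "shoes": lambda img: ("SKU-0001", 0.5),
        "toys": lambda img: ("SKU-9001", 0.7),
    }
    print(two_stage_predict(b"...", category_model, product_models))
```

Note that a stage-1 mistake cannot be recovered in stage 2: an image routed to the wrong category is only ever compared against that category's products.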
A second option would be training a custom model, where you fine-tune a larger pretrained model (e.g. Inception, ResNet) to learn all of your 20k+ classes (products). It may take a bit more work at first, but once established, it becomes a single model for inference, and retraining is simpler with MLOps tooling (e.g. Vertex Pipelines).

Related

How to define two different entity roles for the same prebuilt Entity (number) in LUIS

I am looking to build a bot that typically requires two numbers with different meanings (roles) in the same utterance. Take the example of a stock market order assistant (fictional, just as an example).
Some example utterances:
Buy 100 MSFT stock at limit of 340
Get me 200 Apple at maximum 239.4
Buy 40 AMZN at market price
In the LUIS portal, I have defined two entities:
StockSymbol: a List entity (for all stocks, linking their symbols and names as synonyms).
number: the prebuilt entity, with two roles: Amount and Limit.
When I enter the example utterances, the entities are recognized. But I cannot find a way to specify the roles for the different number entities in my sample utterances. (In the examples, the first instance of number is the Amount, and if there is a second one, that is typically the Limit role.)
Does anyone have an idea how to define and set this up?
Best regards
There are 2 different ways to do this. The first is to use roles on the prebuilt entity: go into the number prebuilt entity, click Roles, and add 2 different roles, one for Amount and another for Limit. Then go to the utterances and label the roles: open the utterance, click the # symbol on the right, select the number prebuilt entity, select the role, then highlight the numbers that have that role.
The second approach is to use ML entities: create 2 ML entities, one for Amount and one for Limit. Add number as a feature and make it a required feature, then label the amounts with the Amount entity and the limits with the Limit entity directly.

Correct way to label lists in GCP AutoML entity text model

I want to create a model to extract information from PDFs containing purchase orders. I thought I could create an AutoML text entity extraction model for that task. My main doubt is the best way to handle the article lists: how should I label each cell so that I get a list of rows in the result?
Thanks
Labeling is very important; fewer than 10 labels to start with would make it easier, as you will need at least 100 labeled entities per label to train. Remember that you have three sets to label: training, test, and validation. 100 for training, 30 for test, and 30 for validation should suffice.
Check the label tab often; it shows the breakdown of what has been labeled so far.
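A minimal sketch of assigning labeled documents to the three sets in the proportions suggested above (100/30/30), applied here per document rather than per entity as a simplification. It assumes the import CSV takes an optional first column of TRAIN, VALIDATION or TEST followed by the file's Cloud Storage URI; confirm the exact format on Google's preparation page. The gs:// paths are hypothetical.

```python
import csv
import io
import random
from typing import List, Sequence


def split_rows(uris: Sequence[str], n_test: int = 30, n_validation: int = 30,
               seed: int = 0) -> List[List[str]]:
    """Shuffle documents reproducibly and tag each with TRAIN, TEST or VALIDATION.

    The first n_test shuffled documents go to TEST, the next n_validation to
    VALIDATION, and everything else (100+ in the suggestion above) to TRAIN.
    """
    if len(uris) < n_test + n_validation + 1:
        raise ValueError("need at least one training document beyond the test/validation sets")
    shuffled = list(uris)
    random.Random(seed).shuffle(shuffled)
    rows = []
    for i, uri in enumerate(shuffled):
        if i < n_test:
            split = "TEST"
        elif i < n_test + n_validation:
            split = "VALIDATION"
        else:
            split = "TRAIN"
        rows.append([split, uri])
    return rows


def to_csv(rows: List[List[str]]) -> str:
    """Render rows as CSV text, one 'SPLIT,uri' line per document."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    docs = [f"gs://my-bucket/purchase-orders/doc{n:03d}.jsonl" for n in range(160)]
    print(to_csv(split_rows(docs)[:3]), end="")
```

With 160 labeled documents this yields exactly 100 TRAIN, 30 TEST and 30 VALIDATION rows; the fixed seed keeps the split stable between imports.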
Google's documentation is a good start: https://cloud.google.com/natural-language/automl/docs/prepare
I ended up building a Java client that calls predict on the model, sending it a list of files to process. The returned JSON contains the entities, grouped by label, for each file.

Online updating for ALS recommendation system

Is there a way to export the Spark ALS model after it has been trained, into a PMML model or another format that can be called outside of the Spark environment?
E.g., in Java, given a customer ID C and a product ID P, load the model file created by a Scala program and call it so that it outputs a score for (C, P).
The main reason for this question is that when the number of active users is huge, say 0.1 billion users over 100 products, the prediction set will be 10 billion entries. And item-based recommendation is not an option in our case.
I'm not sure how people in the industry handle this, especially when the model needs to be updated daily, trained on the previous month's or week's data.
There is a way to save your model within the Spark environment, like this: ALSmodel.save("myModelPath"). With this model you are able to score all known customer/item pairs.
I guess if you want to score outside of Spark, you have to export the item and user factors into another system and compute the matrix-factorization scores (the dot product of a user's and an item's factor vectors) yourself. There you can also update user interactions for your recommendations.
With ALSmodel.userFactors and ALSmodel.itemFactors you are able to extract the factors of your model.
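A minimal sketch of scoring outside Spark, assuming the factors have already been exported (for example, by collecting the `id` and `features` columns of `userFactors` and `itemFactors`) into plain dictionaries keyed by ID. Spark ALS predicts a rating as the dot product of the user and item factor vectors, so the score for (C, P) needs only that arithmetic; the IDs and vectors below are made up.

```python
import heapq
from typing import Dict, List, Sequence, Tuple

# Exported factors: ID -> latent vector (all vectors share the same rank).
Factors = Dict[int, Sequence[float]]


def score(user_factors: Factors, item_factors: Factors, user_id: int, item_id: int) -> float:
    """Predicted preference of user_id for item_id: dot product of their factors."""
    return sum(u * v for u, v in zip(user_factors[user_id], item_factors[item_id]))


def top_n(user_factors: Factors, item_factors: Factors,
          user_id: int, n: int) -> List[Tuple[int, float]]:
    """The n highest-scoring items for one user, best first."""
    scored = ((item_id, score(user_factors, item_factors, user_id, item_id))
              for item_id in item_factors)
    return heapq.nlargest(n, scored, key=lambda pair: pair[1])


if __name__ == "__main__":
    users = {1: [1.0, 2.0]}
    items = {10: [0.5, 0.5], 20: [1.0, 0.0], 30: [0.0, 1.0]}
    print(score(users, items, 1, 10))  # 1.5
    print(top_n(users, items, 1, 2))   # [(30, 2.0), (10, 1.5)]
```

Computing a score on request costs one rank-length dot product, which avoids materializing all 10 billion (user, product) predictions up front.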
Why would you want to score outside of Spark? You can simply precalculate your predictions and serve them online. If you want to update the recommendations at a very high frequency, you have to go the suggested way. If you only want to update your model daily, I would suggest that you simply retrain the model every day.
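For the high-frequency case, one common technique (often called fold-in; it is not spelled out in the answer) is to keep the exported item factors fixed and re-solve only the affected user's vector from their latest interactions with a ridge least-squares step, (V^T V + reg * I) u = V^T r, where V holds the factors of the items the user rated and r the ratings. A minimal pure-Python sketch, assuming explicit ratings. Spark's documentation describes scaling `regParam` by the user's number of ratings, so pass `reg` accordingly if you want to mimic that; implicit-feedback ALS uses a different, confidence-weighted objective.

```python
from typing import List, Sequence


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fold_in_user(item_vectors: Sequence[Sequence[float]],
                 ratings: Sequence[float], reg: float) -> List[float]:
    """New user vector for fixed item factors: solve (V^T V + reg*I) u = V^T r."""
    if not item_vectors or len(item_vectors) != len(ratings):
        raise ValueError("need one rating per rated item vector")
    k = len(item_vectors[0])
    a = [[sum(v[i] * v[j] for v in item_vectors) + (reg if i == j else 0.0)
          for j in range(k)] for i in range(k)]
    b = [sum(v[i] * r for v, r in zip(item_vectors, ratings)) for i in range(k)]
    return solve(a, b)


if __name__ == "__main__":
    # User rated two items 3.0 and 1.0; no regularization for this toy case.
    print(fold_in_user([[1.0, 1.0], [1.0, 0.0]], [3.0, 1.0], reg=0.0))
```

The refreshed user vector can then be scored against the unchanged item factors with the usual dot product, so only the full retrain has to run in Spark.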

Magento attributes / multiple attributes with one selection

Really could do with your expert advice, knowledge and help...
We have a client who we built a Magento site for. He sells parts for motorbikes, jet skis, motocross bikes, etc.
We set up three attributes, "Manufacturer", then "Model", then "Year", and this was the selection process inside each product to drill down to a price (as the price changes with the model's year). To achieve the pricing structure he wanted, we used a simple configurable products plugin that worked a charm (which I found searching through these forums).
Problem: the "Model" attribute is getting way too big (crashing the browser, timing out, and approaching what I have been told is the limit for attributes), so we have to rethink the logic. The last resort is to change the whole site, add them all as simple products and use filters instead, which the client does not want.
After days of stress and research we are still none the wiser...
One idea would be to split the model attribute by manufacturer, so "Honda_Model", "Aprilia_Model" and so on. But then we can't keep the structure of one product with all the options inside. It would be great if we could have, inside the product (front end):
Select Honda model
Select Aprilia model
Select Can-Am model
Year
where the user can choose one model from any of the three "Model" drop-downs; that then blanks out the other 2 model attributes and lets the user select the year to get a price.
But in the back end, when I try this, all three "Model" attributes have a red asterisk and require an input.
HELP!
Sorry if some of this is basic; I am a designer who has been learning Magento with the help of my developer for the past 6 months, so I'm still new to this but already way out of my depth.
Any help would be so appreciated.
Given the level of complexity, the relative newness of your company with Magento, and the particulars of the automotive fitment domain, it might be wise to buy an existing fitment module (e.g. Year Make Model Extension; not an endorsement, I have no firsthand knowledge of this extension) to bootstrap your development or to learn from. You should check with the module vendor first to make sure that the code is suitable for this purpose (not obfuscated or encrypted, and written using Magento conventions).
There are several approaches which can be taken depending on how frontend presentation and backend reporting should work, but these are too broad to be discussed here.
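As an illustration of the data shape behind a year/make/model fitment (not Magento code, and not taken from any particular extension), here is a sketch with hypothetical manufacturers, models and prices. Storing fitment as rows keyed by (manufacturer, model, year) lets each drop-down be filtered by the previous choice, so the model list only ever shows one manufacturer's models instead of one giant attribute.

```python
from typing import Dict, List, Tuple

# Hypothetical fitment rows for one part: (manufacturer, model, year) -> price.
Fitment = Dict[Tuple[str, str, int], float]


def manufacturers(fitment: Fitment) -> List[str]:
    """Options for the first drop-down."""
    return sorted({make for (make, _model, _year) in fitment})


def models_for(fitment: Fitment, manufacturer: str) -> List[str]:
    """Options for the second drop-down, filtered by the chosen manufacturer."""
    return sorted({model for (make, model, _year) in fitment if make == manufacturer})


def years_for(fitment: Fitment, manufacturer: str, model: str) -> List[int]:
    """Options for the third drop-down, filtered by manufacturer and model."""
    return sorted(year for (make, mdl, year) in fitment
                  if (make, mdl) == (manufacturer, model))


def price_for(fitment: Fitment, manufacturer: str, model: str, year: int) -> float:
    """Price once all three choices are made (KeyError if that combination does not fit)."""
    return fitment[(manufacturer, model, year)]


if __name__ == "__main__":
    part = {
        ("Honda", "Model A", 2018): 110.0,
        ("Honda", "Model A", 2019): 115.0,
        ("Honda", "Model B", 2019): 130.0,
        ("Aprilia", "Model C", 2020): 140.0,
    }
    print(manufacturers(part))                        # ['Aprilia', 'Honda']
    print(models_for(part, "Honda"))                  # ['Model A', 'Model B']
    print(years_for(part, "Honda", "Model A"))        # [2018, 2019]
    print(price_for(part, "Honda", "Model A", 2019))  # 115.0
```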

Categorizing product data model

I'm about to rebuild an e-commerce website with a mid-sized database of about 40,000 products.
One of the main reasons we are retiring the old system is its categorization mechanism.
We are looking for something that will allow us to place an item under multiple categories, or even better, to have no fixed category at all: we would classify products by putting descriptive labels on them, and front-end users could search for products using these labels. (I'm not even sure this can be done from a programming point of view.)
It would be helpful to know what this model is called and whether it has been implemented before, and even better if you can point me to ready-made solutions.
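A minimal in-memory sketch of the label-based approach described in the question: each product can carry any number of labels (a many-to-many relation), and an inverted index from label to product IDs answers "products carrying all of these labels" by set intersection. Product IDs and labels are hypothetical; in a real store the same relation could live in a product-label join table or a search engine.

```python
from collections import defaultdict
from typing import DefaultDict, Iterable, Set


class LabelIndex:
    """Many-to-many product <-> label relation with an inverted index for lookup."""

    def __init__(self) -> None:
        self._products_by_label: DefaultDict[str, Set[int]] = defaultdict(set)
        self._labels_by_product: DefaultDict[int, Set[str]] = defaultdict(set)

    def tag(self, product_id: int, *labels: str) -> None:
        """Attach one or more labels to a product (no fixed category required)."""
        for label in labels:
            self._products_by_label[label].add(product_id)
            self._labels_by_product[product_id].add(label)

    def labels_of(self, product_id: int) -> Set[str]:
        """All labels on one product (a copy, so callers cannot mutate the index)."""
        return set(self._labels_by_product.get(product_id, set()))

    def search(self, labels: Iterable[str]) -> Set[int]:
        """Products that carry every requested label (empty query -> empty result)."""
        matches = [self._products_by_label.get(label, set()) for label in labels]
        if not matches:
            return set()
        return set.intersection(*matches)


if __name__ == "__main__":
    index = LabelIndex()
    index.tag(1, "shoes", "leather", "black")
    index.tag(2, "shoes", "canvas")
    index.tag(3, "bags", "leather", "black")
    print(sorted(index.search(["leather", "black"])))  # [1, 3]
    print(sorted(index.search(["shoes"])))             # [1, 2]
```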
