update
Can I split the small test set into a validation set realB-v and a test set realB-t, fine-tune the model using realB-v for validation, and test it on realB-t? Then I would swap the validation set and the test set and train a new model. Can I report the average of the results from the two training runs?
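The swap described above amounts to a two-fold evaluation. A minimal sketch of the bookkeeping, assuming a hypothetical fit_and_score(val_set, test_set) callable that fine-tunes on syntheticA, picks hyperparameters on val_set and returns the accuracy on test_set (the dummy version below only stands in for that step):

```python
from statistics import mean
from typing import Callable, List, Sequence, Tuple


def two_fold_swap(
    items: Sequence,
    fit_and_score: Callable[[Sequence, Sequence], float],
) -> Tuple[List[float], float]:
    """Use each half once for validation and once for testing, then average.

    `items` should already be shuffled; this function only splits and swaps.
    """
    half = len(items) // 2
    fold_a, fold_b = items[:half], items[half:]
    scores = [
        fit_and_score(fold_a, fold_b),  # validate on A, test on B
        fit_and_score(fold_b, fold_a),  # swap the roles
    ]
    return scores, mean(scores)


def dummy_fit_and_score(val_set: Sequence, test_set: Sequence) -> float:
    # Hypothetical placeholder: a real version would fine-tune the model,
    # tune hyperparameters on val_set and report accuracy on test_set.
    return sum(test_set) / len(test_set)


# Toy stand-in for realB: 1 means "image classified correctly".
real_b = [1, 0, 1, 1, 0, 1, 1, 1]
scores, avg = two_fold_swap(real_b, dummy_fit_and_score)
```

Reporting both per-fold scores alongside the average makes it visible when the two halves of such a small set behave very differently.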
original post
I have a pre-trained model M trained on the real dataset realA, I test it on another real dataset realB and get very poor results because realA and realB have domain gaps. Since real images in realB are difficult to acquire, I decide to generate synthetic images like realB and use these images syntheticA to fine-tune the model M.
Do I still need a validation set? If so, should it be split from syntheticA or from realB? realB is already a very small set (300 images).
In my view, I don't think a validation set in this case is necessary. If I directly fine-tune the model and get hyperparameters according to the accuracy rate on realB, it won't cause generalization problems because the images I use for fine-tuning are all synthetic.
I'd like to hear your views. Thank you.
Related
I need to use interaction variables for multiclass classification with H2OGradientBoostingEstimator in H2O in Python. I am not sure which parameter to use or how to use it. Can anyone please help me out with this?
Currently, I am using the code below:
from h2o.estimators import H2OGradientBoostingEstimator

pros_gbm = H2OGradientBoostingEstimator(nfolds=0, seed=1234, keep_cross_validation_predictions=False, ntrees=10, max_depth=3, learn_rate=0.01, distribution='multinomial')
hist_gbm = pros_gbm.train(x=predictors, y=target, training_frame=hf_train, validation_frame=hf_test, verbose=True)
GBM inherently creates interactions. You can extract information about feature interactions using the .feature_interaction() extractor method (for an H2O Model). More information is provided in the user guide and the Python docs.
If you want to explicitly add a new column that is the interaction between two numerics, you could create that manually by multiplying the two (or more) columns together to get a new interaction column.
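As a library-free illustration of the manual approach, here is a sketch that adds a product column to rows held as plain dictionaries. The column names "age" and "psa" and the helper add_interaction are made up for the example; on an H2OFrame the same idea would be an elementwise product of the two columns assigned to a new column name.

```python
from typing import Dict, List


def add_interaction(rows: List[Dict[str, float]], col_a: str, col_b: str) -> str:
    """Add a column holding the product of two numeric columns, in place.

    Returns the name of the new column so it can be appended to the
    list of predictors.
    """
    name = f"{col_a}_x_{col_b}"
    for row in rows:
        row[name] = row[col_a] * row[col_b]
    return name


# Hypothetical data: two numeric predictors per row.
rows = [{"age": 2.0, "psa": 3.0}, {"age": 4.0, "psa": 0.5}]
new_col = add_interaction(rows, "age", "psa")
```

The new column then needs to be included in the predictors passed to the model, otherwise it is ignored.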
For categorical interactions, there's also the h2o.interaction() method in Python, which creates interaction columns in the data (prior to sending it to the GBM or any other algorithm).
I'm using AlexNet model in TFLearn and there is a method to define the regression layer, which is:
tflearn.layers.estimator.regression (incoming, placeholder='default', optimizer='adam', loss='categorical_crossentropy', metric='default', learning_rate=0.001, dtype=tf.float32, batch_size=64, shuffle_batches=True, to_one_hot=False, n_classes=None, trainable_vars=None, restore=True, op_name=None, validation_monitors=None, validation_batch_size=None, name=None)
and it states that "A metric can also be provided, to evaluate the model performance." So I'm wondering whether this metric is also used for validation or only for evaluation. If it's not used for validation, then which metric does validation rely on?
EDIT 1: I found out that the metric declared in the regression() method is actually used for validation as well. The default metric is Accuracy. However, one thing I don't understand is that when I don't use validation_set (or set it to None), the training summary still outputs the acc value. So how is this accuracy value computed?
EDIT 2: Found the answer here: https://github.com/tflearn/tflearn/issues/357
The training accuracy acc is computed on your training data, while the validation accuracy val_acc is computed on the validation data. So omitting the validation data only removes val_acc; acc is still reported.
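To make the distinction concrete, here is a small sketch in plain Python. It is not TFLearn's implementation, and the predictions and labels are invented; it only shows that the same metric is applied to two different datasets and that the training figure does not depend on a validation set being present.

```python
from typing import Optional, Sequence


def accuracy(predicted: Sequence[int], actual: Sequence[int]) -> float:
    """Fraction of predictions that match the true labels."""
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


# Invented predictions and labels for illustration only.
train_pred, train_true = [0, 1, 1, 2], [0, 1, 2, 2]
val_pred, val_true = [1, 1], [1, 0]

# Training accuracy can always be computed from the training batch.
acc = accuracy(train_pred, train_true)

# Validation accuracy exists only when validation data was supplied.
val_acc: Optional[float] = accuracy(val_pred, val_true) if val_true else None
```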
I am using pycrfsuite.
I know how to train a CRF model and save it:
crf_trainer = pycrfsuite.Trainer()
crf_trainer.train('crf.crfsuite')
When I want to tag, I use the following code:
crf_tagger = pycrfsuite.Tagger()
crf_tagger.open('crf.crfsuite')
But I don't know how to load the saved model again in order to continue training it.
I would like to add a calculated attribute (property) to Products. Its value is to be calculated using a PHP function, e.g.:
function CalculateCustomAttribute() {
    ...
    // Do some calculations based on other Product attributes, date, etc.
    ...
    return $calculatedValue; // type float
}
This calculated attribute needs to be:
displayed in the Product page,
filterable through the "Layered Navigation", and
sortable in the "Product Listing".
Could this be done? And how?
What you want to do might be possible, but I am not sure that the approach you have described is doable. I think it is too simplistic to work with the very complex Magento platform.
I had a similar project where the actual price of the products was constantly changing based on a few inputs. I was able to solve the problem fairly well, but it was definitely more complicated than what you seem to be hoping for. I am not sure whether this scenario is helpful to you, but here it goes...
The basic idea was that I created new product attributes (eav attributes). These served as the inputs to determine what the price really should be. Note that in my case, these attributes were being updated fairly regularly by an outside process.
Then I created an observer on the "catalog_product_save_before" event that would simply do something like this:
//some calculations to get the $newPrice
$product->setPrice($newPrice);
That keeps the price field current whenever you save a product in the administrative screens.
Since several of the attributes used as inputs were constantly changing (updated by an outside process), we also had to add a Magento cron job that runs every so often and recalculates the price for all the affected products with something like this...
//some calculations to get the $newPrice
$product->addAttributeUpdate("price", $newPrice, Mage::app()->getStore()->getStoreId());
So it all boils down to the fact that you should have the attribute saved in the database. And of course you need to find the specific places where that derived attribute has to be updated. Maybe your requirements will vary slightly from what I have described, but it might at least get you on the right path.
Currently I'm developing a debate module (much like a scrum/kanban board) for a GPL application (e-cidadania) and I don't have any experience with complex backends. I have developed a basic frontend for it, but now I don't know what approach I should use for the ajax and django backends to save and manipulate the table and notes.
The table can be N rows and N columns, every row and column has a name and position inside the table. Every note has also a position, text and comments (managed with the django comments framework).
I was thinking of storing the parent element of every note (so I can place it later) and storing the names of the rows and columns as CSV strings. Is that a good approach?
A screenshot of the current frontend: http://ur1.ca/4zn4h
Update: I almost forgot, the frontend has been done with jQuery Sortables (so the user can move the note around as he likes) and CSS3.
You just need to model your domain (that is, debates that look like scrum boards) within Django. Think about it in plain English first, like this:
The application has debates. These consist of criteria, organised in rows and columns in a specific order. This creates cells, which can have notes inside them.
Then you can set to work translating this into model classes. Don't worry too much about the fields they contain, the most important bit is the relationships (so the ForeignKey bits):
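from django.db import models

class Debate(models.Model):
    title = ...

class Column(models.Model):
    title = ...
    order = ...
    # Each column belongs to one debate; on_delete is required from Django 2.0 onwards.
    board = models.ForeignKey(Debate, related_name='columns', on_delete=models.CASCADE)

class Row(models.Model):
    title = ...
    order = ...
    board = models.ForeignKey(Debate, related_name='rows', on_delete=models.CASCADE)

class Cell(models.Model):
    # A cell is the intersection of one column and one row.
    column = models.ForeignKey(Column, on_delete=models.CASCADE)
    row = models.ForeignKey(Row, on_delete=models.CASCADE)

class Note(models.Model):
    text = ...
    cell = models.ForeignKey(Cell, on_delete=models.CASCADE)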
That might be overly complex for what you need, though; I'm not an expert in the problem you're trying to solve. My suggestion: Django is quick, so start hacking and give it a go. If it's all wrong, you can go back a few steps, clean out your database and try again.
You might find it useful to play with South, which does database migrations for when you do things like add/remove/edit fields in your models.
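To show how ordered rows and columns plus note relationships can replace CSV strings, here is a framework-free sketch using dataclasses in place of the Django models. It is simplified so that a note points at its row and column directly rather than through a Cell, and board_as_grid and the sample titles are hypothetical. The resulting dictionary is the kind of structure a view could serialise to JSON for the jQuery frontend.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Column:
    title: str
    order: int


@dataclass
class Row:
    title: str
    order: int


@dataclass
class Note:
    text: str
    row: Row
    column: Column


def board_as_grid(rows: List[Row], columns: List[Column], notes: List[Note]) -> dict:
    """Lay the notes out as a row-by-column grid, using the order fields."""
    rows = sorted(rows, key=lambda r: r.order)
    columns = sorted(columns, key=lambda c: c.order)
    # Group note texts by the (row, column) position they belong to.
    cells: Dict[Tuple[int, int], List[str]] = {}
    for note in notes:
        cells.setdefault((note.row.order, note.column.order), []).append(note.text)
    return {
        "columns": [c.title for c in columns],
        "rows": [
            {"title": r.title,
             "cells": [cells.get((r.order, c.order), []) for c in columns]}
            for r in rows
        ],
    }


# Hypothetical sample board: one row, two columns, one note.
todo, done = Column("To do", 0), Column("Done", 1)
econ = Row("Economy", 0)
grid = board_as_grid([econ], [done, todo], [Note("Draft budget", econ, todo)])
```

Moving a note in the frontend then only requires updating which row and column (or cell) it points to, and reordering a column only changes its order value.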