Suppress outputs from optuna.integration.lightgbm - lightgbm

I'm using the optuna.integration.lightgbm for training a LightGBM model.
The issue is that there is a TON of output, and frankly I just want to disable it (or at least find a way to reduce it).
I have tried the following, along with lots of other things, e.g.:
import optuna
import optuna.integration.lightgbm as lgb
from lightgbm import early_stopping, log_evaluation
optuna.logging.set_verbosity(optuna.logging.ERROR)  # Ignore outputs from Optuna when training
params = {
    "objective": "softmax",
    "metric": "multi_logloss",
    "boosting_type": "gbdt",
    "is_unbalance": True,
    "num_classes": 4,
    "num_boost_round": 10,
    "verbosity": -1,
}
model = lgb.train(
    params,
    dtrain,
    valid_sets=[dtrain, dval],
    callbacks=[early_stopping(100, verbose=False), log_evaluation(0)],
)
But I still get "early_stopping" output, validation output from each round, and so on.
There's even a suggestion to use log_evaluation(), which I have also passed.
I can't think of any more ways to suppress the output.
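As a blunt fallback that does not depend on Optuna or LightGBM settings at all, Python-level output can be redirected for the duration of a call. The sketch below uses only the standard library; `silenced` and `noisy_training_step` are hypothetical names, and the latter merely stands in for a chatty call such as lgb.train(...). It is an assumption that the messages are written through Python's sys.stdout/sys.stderr; text written directly by native code would not be caught this way.

```python
import contextlib
import io


@contextlib.contextmanager
def silenced():
    """Temporarily redirect Python-level stdout and stderr into a buffer."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer), contextlib.redirect_stderr(buffer):
        yield buffer


def noisy_training_step():
    # Hypothetical stand-in for a chatty call such as lgb.train(...).
    print("[1] valid_0's multi_logloss: 1.23")
    return "model"


with silenced() as captured:
    model = noisy_training_step()

# Nothing reached the console; the text is still available for inspection.
log_text = captured.getvalue()
```

Keeping the captured text around (rather than discarding it) makes it possible to print it only when training fails. LightGBM also provides lightgbm.register_logger in recent releases, which may be a cleaner way to route its messages into a logger you control; whether that covers everything the Optuna wrapper emits is not verified here.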

Related

Is there a way to get a list of all my categorical and numerical features in Pycaret?

I am using PyCaret for a classification problem and I want to get a list of all the categorical and numerical variables inferred by setup() for EDA. Is there a way to do this?
I looked through the documentation but couldn't find anything.
Currently, the only way I have found to do this in PyCaret 3.x is to access a private attribute of the Experiment object.
from pycaret.datasets import get_data
from pycaret.classification import *
data = get_data('bank', verbose=False)
exp = setup(data = data, target = 'deposit', session_id=123, verbose=False);
print(f'Ordinal features: {exp._fxs["Ordinal"]}')
print(f'Numeric features: {exp._fxs["Numeric"]}')
print(f'Date features: {exp._fxs["Date"]}')
print(f'Text features: {exp._fxs["Text"]}')
print(f'Categorical features: {exp._fxs["Categorical"]}')

PyQGIS, custom processing algorithm: How to use selected features only?

I want to create a custom processing algorithm with PyQGIS that takes a vector layer as input (in this case of type point) and then does something with its features. It works well as long as I choose the whole layer, but it doesn't work if I try to work on selected features only.
I'm using QgsProcessingParameterFeatureSource to be able to work on selected features only. The option is shown and I can enable the checkbox. But when I execute the algorithm, parameterAsVectorLayer returns None.
Below you'll find a minimal working example to reproduce the problem:
from qgis.PyQt.QtCore import QCoreApplication
from qgis.core import (
QgsProcessing,
QgsProcessingAlgorithm,
QgsProcessingParameterFeatureSource
)
name = "selectedonly"
display_name = "Selected features only example"
group = "Test"
group_id = "test"
short_help_string = "Minimal working example code for showing my problem."
class ExampleProcessingAlgorithm(QgsProcessingAlgorithm):
    def tr(self, string):
        return QCoreApplication.translate('Processing', string)

    def createInstance(self):
        return ExampleProcessingAlgorithm()

    def name(self):
        return name

    def displayName(self):
        return self.tr(display_name)

    def group(self):
        return self.tr(group)

    def groupId(self):
        return group_id

    def shortHelpString(self):
        return self.tr(short_help_string)

    def initAlgorithm(self, config=None):
        self.addParameter(
            QgsProcessingParameterFeatureSource(
                'INPUT',
                self.tr('Some point vector layer.'),
                types=[QgsProcessing.TypeVectorPoint]
            )
        )

    def processAlgorithm(self, parameters, context, feedback):
        layer = self.parameterAsVectorLayer(
            parameters,
            'INPUT',
            context
        )
        return {"OUTPUT": layer}
If I'm working on the whole layer, the output is {'OUTPUT': <QgsVectorLayer: 'Neuer Temporärlayer' (memory)>}, which is what I would expect.
If I'm working on selected features only, my output is {'OUTPUT': None}, which doesn't make sense to me. I did select some of the features before executing, of course.
I'm using QGIS 3.22 LTR, in case it's relevant.
Can anybody tell me what I'm doing wrong?
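I would suggest using the parameterAsSource method in processAlgorithm instead. It returns a QgsProcessingFeatureSource rather than a layer, and that source respects the "selected features only" option.
source = self.parameterAsSource(
    parameters,
    'INPUT',
    context
)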

How does TextCategorizer.predict work with spaCy?

I've been following the spaCy quick-start guide for text classification.
Let's say I have a very simple dataset.
TRAIN_DATA = [
("beef", {"cats": {"POSITIVE": 1.0, "NEGATIVE": 0.0}}),
("apple", {"cats": {"POSITIVE": 0, "NEGATIVE": 1}})
]
I'm training a pipe to classify text. It trains and has a low loss rate.
textcat = nlp.create_pipe("pytt_textcat", config={"exclusive_classes": True})
for label in ("POSITIVE", "NEGATIVE"):
    textcat.add_label(label)
nlp.add_pipe(textcat)
optimizer = nlp.resume_training()
for i in range(10):
    random.shuffle(TRAIN_DATA)
    losses = {}
    for batch in minibatch(TRAIN_DATA, size=8):
        texts, cats = zip(*batch)
        nlp.update(texts, cats, sgd=optimizer, losses=losses)
    print(i, losses)
Now, how do I predict whether a new string of text is "POSITIVE" or "NEGATIVE"?
This will work:
doc = nlp(u'Pork')
print(doc.cats)
It gives a score for each category we've trained to predict on.
But that seems at odds with the docs. They say I should use a predict method on the original subclass pipeline component.
That doesn't work though.
Trying textcat.predict('text') or textcat.predict(['text']), etc., throws:
AttributeError Traceback (most recent call last)
<ipython-input-29-39e0c6e34fd8> in <module>
----> 1 textcat.predict(['text'])
pipes.pyx in spacy.pipeline.pipes.TextCategorizer.predict()
AttributeError: 'str' object has no attribute 'tensor'
The predict methods of pipeline components expect Doc objects as input, not strings, so you'll need to do something like textcat.predict([nlp(text)]). The nlp used there does not necessarily need to have a textcat component. The result of that call then needs to be passed to a call to set_annotations().
However, your first approach is just fine:
...
nlp.add_pipe(textcat)
...
doc = nlp(u'Pork')
print(doc.cats)
...
Internally, when you call nlp(text), the Doc for the text is generated first, and then each pipeline component, one by one, runs its predict method on that Doc and adds information to it with set_annotations. Eventually the textcat component sets the cats attribute of the Doc.
The API docs you're citing for the other approach give you a look "under the hood", so the two approaches don't really conflict ;-)
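To make the "under the hood" description concrete, here is a toy sketch of the predict / set_annotations split in plain Python. It is not spaCy's implementation: ToyDoc, ToyTextCat, and toy_nlp are hypothetical names, and the scoring rule is a hard-coded keyword check rather than a trained model.

```python
class ToyDoc:
    """Hypothetical stand-in for a Doc: the text plus a cats dict."""

    def __init__(self, text):
        self.text = text
        self.cats = {}


class ToyTextCat:
    """Hypothetical component that separates scoring from annotation."""

    def predict(self, docs):
        # Takes Doc-like objects (not raw strings) and returns raw scores.
        scores = []
        for doc in docs:
            positive = 1.0 if "beef" in doc.text.lower() else 0.0
            scores.append({"POSITIVE": positive, "NEGATIVE": 1.0 - positive})
        return scores

    def set_annotations(self, docs, scores):
        # Writes the scores onto each document.
        for doc, score in zip(docs, scores):
            doc.cats.update(score)

    def __call__(self, doc):
        scores = self.predict([doc])
        self.set_annotations([doc], scores)
        return doc


def toy_nlp(text, pipeline):
    """Build the document first, then run each component on it in order."""
    doc = ToyDoc(text)
    for component in pipeline:
        doc = component(doc)
    return doc


doc = toy_nlp("Beef stew", [ToyTextCat()])
print(doc.cats)
```

Calling predict directly only returns scores; it is the set_annotations step that makes them show up on the document, which is why the one-step nlp(text) call is usually the more convenient route.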

Issue with topic word distributions after malletmodel2ldamodel in gensim

After training an LDA model with gensim's Mallet wrapper, I converted it to a gensim LDA model via the malletmodel2ldamodel function provided with the wrapper. The topic-word distributions before and after the conversion are quite different: after conversion, the model returns very odd topic-word distributions.
ldamallet = gensim.models.wrappers.LdaMallet(mallet_path, corpus=corpus, num_topics=13, id2word=dictionary)
model = gensim.models.wrappers.ldamallet.malletmodel2ldamodel(ldamallet)
model.save('ldamallet.gensim')
dictionary = gensim.corpora.Dictionary.load('dictionary.gensim')
corpus = pickle.load(open('corpus.pkl', 'rb'))
lda_mallet = gensim.models.wrappers.LdaMallet.load('ldamallet.gensim')
import pyLDAvis.gensim
lda_display = pyLDAvis.gensim.prepare(lda_mallet, corpus, dictionary, sort_topics=False)
pyLDAvis.display(lda_display)
Here is the output from gensim original implementation:
I can see there was a bug around this issue that was reportedly fixed in earlier versions of gensim. I am using gensim 3.7.1.
Here is an alternative function to use instead of malletmodel2ldamodel (which is reported to have bugs):
from gensim.models.ldamodel import LdaModel
import numpy
def ldaMalletConvertToldaGen(mallet_model):
    model_gensim = LdaModel(
        id2word=mallet_model.id2word,
        num_topics=mallet_model.num_topics,
        alpha=mallet_model.alpha,
        eta=0,
        iterations=1000,
        gamma_threshold=0.001,
        dtype=numpy.float32,
    )
    model_gensim.state.sstats[...] = mallet_model.wordtopics
    model_gensim.sync_state()
    return model_gensim

converted_model = ldaMalletConvertToldaGen(mallet_model)
I used it and it worked perfectly.

access DataFrame imported with FileChooser

I would like to import a CSV file into Python with FileChooser and then, using rpy2, perform statistical analyses with R, which I know much better than Python. Below is a piece of my code:
import pygtk
pygtk.require("2.0")
import gtk
from rpy2.robjects.vectors import DataFrame
def get_open_filename(self):
    filename = None
    chooser = gtk.FileChooserDialog("Open File...", self.window,
                                    gtk.FILE_CHOOSER_ACTION_OPEN,
                                    (gtk.STOCK_CANCEL, gtk.RESPONSE_CANCEL,
                                     gtk.STOCK_OPEN, gtk.RESPONSE_OK))
    response = chooser.run()
    if response == gtk.RESPONSE_OK:
        # Remember the chosen path so the function returns it instead of None
        filename = chooser.get_filename()
        don = DataFrame.from_csvfile(filename)
        print(don)
    chooser.destroy()
    return filename
When running the code, don is printed. But my question is: don has two columns, X and Y, that I can't access to perform analyses. Thanks for your kind help.
Did you check the documentation about extracting elements from a DataFrame? For example, don.rx2('X') should return the column named X.
