GridSearch with XGBoost producing Depreciation error on infinite loop - performance

I am trying to do a hyperparameter tuning using GridSearchCV on XGBoost.But, I'm getting the following error.
/usr/local/lib/python3.6/dist-packages/sklearn/preprocessing/label.py:151: DeprecationWarning: The truth value of an empty array is ambiguous. Returning False, but in future this will result in an error. Use `array.size > 0` to check that an array is not empty.if diff:
This keeps on running forever. Given below is the code.
classifier = xgb.XGBClassifier()
from sklearn.grid_search import GridSearchCV
n_estimators=[10,50,100,150,200,250,300]
max_depth=[2,3,4,5,6,7,8,9,10]
learning_rate=[0.1,0.01,0.09,0.08,0.07,0.001]
colsample_bytree=[0.5,0.6,0.7,0.8,0.9]
min_child_weight=[1,2,3,4,5,6,7,8,9,10]
gamma=[0.001,0.01,0.1,0.2,0.3,0.4,0.5,1]
subsample=[0.5,0.6,0.7,0.8,0.9]
param_grid=dict(n_estimators=n_estimators,max_depth=max_depth,learning_rate=learning_rate,colsample_bytree=colsample_bytree,min_child_weight=min_child_weight,gamma=gamma,subsample=subsample)
grid = GridSearchCV(classifier, param_grid, cv=10, scoring='accuracy')
grid.fit(X, Y)
grid.grid_scores_
print(grid.best_score_)
print(grid.best_params_)
print(grid.best_estimator_)
# Predicting the Test set results
Y_pred = classifier.predict(X_test)
# Making the Confusion Matrix
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(Y_test, Y_pred)
I am using python3.5, XGBOOT and gridsearch library has already been preloaded. I am running this on google collaboratory.
Please suggest what is going wrong ?

Related

ljspeech Hugging Face examples not working

When trying to run the ljspeech example, I get the following error, even when the model is moved to the only GPU in the system. I am using Cuda 11.7, Pytorch 1.13.1, and Fairseq 0.12.2.
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument index in method wrapper__index_select)
The code used:
from fairseq.checkpoint_utils import load_model_ensemble_and_task_from_hf_hub
from fairseq.models.text_to_speech.hub_interface import TTSHubInterface
import IPython.display as ipd
import torch
models, cfg, task = load_model_ensemble_and_task_from_hf_hub(
"facebook/fastspeech2-en-ljspeech",
arg_overrides={"vocoder": "hifigan", "fp16": False}
)
model = models[0].to(torch.device('cuda'))
models[0] = model
TTSHubInterface.update_cfg_with_data_cfg(cfg, task.data_cfg)
generator = task.build_generator(models, cfg)
text = "Hello, this is a test run."
sample = TTSHubInterface.get_model_input(task, text)
wav, rate = TTSHubInterface.get_prediction(task, model, generator, sample)
ipd.Audio(wav, rate=rate)

PyTorch Lightning - How can I output a summary of training to the console

PyTorch Lighting can log to TensorBoard. How can I make it log to the console a table summarizing the training runs (similar to Huggingface's Transformers, shown below):
Epoch Training Loss Validation Loss Runtime Samples Per Second
1 1.220600 1.160322 39.574900 272.496000
2 0.945200 1.121690 39.706000 271.596000
3 0.773000 1.157358 39.734000 271.405000
You can write your own callback function and add it into the trainer.
from pytorch_lightning.utilities import rank_zero_info
class LoggingCallback(pl.Callback):
def on_validation_end(self, trainer: pl.Trainer, pl_module: pl.LightningModule):
rank_zero_info("***** Test results *****")
metrics = trainer.callback_metrics
# Log results
for key in sorted(metrics):
if key not in ["log", "progress_bar"]:
rank_zero_info("{} = {}\n".format(key, str(metrics[key])))

Loading multiple CSV files (silos) to compose Tensorflow Federated dataset

I am working on pre-processed data that were already siloed into separated csv files to represent separated local data for federated learning.
To correct implement the federated learning with these multiple CSVs on TensorFlow Federated, I am just trying to reproduce the same approach with a toy example in the iris dataset. However, when trying to use the method tff.simulation.datasets.TestClientData, I am getting the error:
TypeError: can't pickle _thread.RLock objects
The current code is as follows, first, load the three iris dataset CSV files (50 samples on each) into a dictionary from the filenames iris1.csv, iris2.csv, and iris3.csv:
silos = {}
for silo in silos_files:
silo_name = silo.replace(".csv", "")
silos[silo_name] = pd.read_csv(silos_path + silo)
silos[silo_name]["variety"].replace({"Setosa" : 0, "Versicolor" : 1, "Virginica" : 2}, inplace=True)
Creating a new dict with tensors:
silos_tf = collections.OrderedDict()
for key, silo in silos.items():
silos_tf[key] = tf.data.Dataset.from_tensor_slices((silo.drop(columns=["variety"]).values, silo["variety"].values))
Finally, trying to converting the Tensorflow Dataset into a Tensorflow Federated Dataset:
tff_dataset = tff.simulation.datasets.TestClientData(
silos_tf
)
That raises the error:
TypeError Traceback (most recent call last)
<ipython-input-58-a4b5686509ce> in <module>()
1 tff_dataset = tff.simulation.datasets.TestClientData(
----> 2 silos_tf
3 )
/usr/local/lib/python3.7/dist-packages/tensorflow_federated/python/simulation/datasets/from_tensor_slices_client_data.py in __init__(self, tensor_slices_dict)
59 """
60 py_typecheck.check_type(tensor_slices_dict, dict)
---> 61 tensor_slices_dict = copy.deepcopy(tensor_slices_dict)
62 structures = list(tensor_slices_dict.values())
63 example_structure = structures[0]
...
/usr/lib/python3.7/copy.py in deepcopy(x, memo, _nil)
167 reductor = getattr(x, "__reduce_ex__", None)
168 if reductor:
--> 169 rv = reductor(4)
170 else:
171 reductor = getattr(x, "__reduce__", None)
TypeError: can't pickle _thread.RLock objects
I also tried to use Python dictionary instead of OrderedDict but the error is the same. For this experiment, I am using Google Colab with this notebook as reference running with TensorFlow 2.8.0 and TensorFlow Federated version 0.20.0. I also used these previous questions as references:
Is there a reasonable way to create tff clients datat sets?
'tensorflow_federated.python.simulation' has no attribute 'FromTensorSlicesClientData' when using tff-nightly
I am not sure if this is a good way that derives for a case beyond the toy example, please, if any suggestion on how to bring already siloed data for TFF tests, I am thankful.
I did some search of public code in github using class tff.simulation.datasets.TestClientData, then I found the following implementation (source here):
def to_ClientData(clientsData: np.ndarray, clientsDataLabels: np.ndarray,
ds_info, is_train=True) -> tff.simulation.datasets.TestClientData:
"""Transform dataset to be fed to fedjax
:param clientsData: dataset for each client
:param clientsDataLabels:
:param ds_info: dataset information
:param train: True if processing train split
:return: dataset for each client cast into TestClientData
"""
num_clients = ds_info['num_clients']
client_data = collections.OrderedDict()
for i in range(num_clients if is_train else 1):
client_data[str(i)] = collections.OrderedDict(
x=clientsData[i],
y=clientsDataLabels[i])
return tff.simulation.datasets.TestClientData(client_data)
I understood from this snippet that the tff.simulation.datasets.TestClientData class requires as argument an OrderedDict composed by numpy arrays instead of a dict of tensors (as my previous implementation), now I changed the code for the following:
silos_tf = collections.OrderedDict()
for key, silo in silos.items():
silos_tf[key] = collections.OrderedDict(
x=silo.drop(columns=["variety"]).values,
y=silo["variety"].values)
Followed by:
tff_dataset = tff.simulation.datasets.TestClientData(
silos_tf
)
That correctly runs as the following output:
>>> tff_dataset.client_ids
['iris3', 'iris1', 'iris2']

Execute is not defined in IBM quantum computing lab

I am using IBM's quantum computing lab, and was following a tutorial made by IBM for getting started, and my code is throwing errors. I followed the tutorial exactly. Here is my code:
#-----------Cell 1:
import numpy as np
# Importing standard Qiskit libraries
from qiskit import QuantumCircuit, transpile, Aer, IBMQ
from qiskit.tools.jupyter import *
from qiskit.visualization import *
from ibm_quantum_widgets import *
from qiskit.providers.aer import QasmSimulator
# Loading your IBM Quantum account(s)
provider = IBMQ.load_account()
#-----------Cell 2:
# Build
#------
# Create a Quantum Circuit acting on the q register
circuit = QuantumCircuit(2, 2)
# Add a H gate on qubit 0
circuit.h(0)
# Add a CX (CNOT) gate on control qubit 0 and target qubit 1
circuit.cx(0, 1)
# Map the quantum measurement to the classical bits
circuit.measure([0,1], [0,1])
# END
# Execute
#--------
# Use Aer's qasm_simulator
simulator = Aer.get_backend('qasm_simulator')
# Execute the circuit on the qasm simulator
job = execute(circuit, simulator, shots=1000)
# Grab results from the job
result = job.result()
# Return counts
counts = result.get_counts(circuit)
print("\nTotal count for 00 and 11 are:",counts)
# END
# Visualize
#----------
# Import draw_circuit, then use it to draw the circuit
from ibm_quantum_widgets import draw_circuit
draw_circuit(circuit)
# Analyze
#--------
# Plot a histogram
plot_histogram(counts)
# END
This code throws this error:
Traceback (most recent call last):
File "/tmp/ipykernel_59/1801586149.py", line 26, in <module>
job = execute(circuit, simulator, shots=1000)
NameError: name 'execute' is not defined
Use %tb to get the full traceback.
I am new to IBM and quantum computing, how do I fix this error?
Here is the tutorial I was following if you need it: https://quantum-computing.ibm.com/lab/docs/iql/first-circuit
You did not import execute from qiskit.
Change
from qiskit import QuantumCircuit, transpile, Aer, IBMQ
to
from qiskit import QuantumCircuit, transpile, Aer, IBMQ, execute

Huggingface reformer for long document summarization

I understand reformer is able to handle a large number of tokens. However it does not appear to support the summarization task:
>>> from transformers import ReformerTokenizer, ReformerModel
>>> from transformers import pipeline
>>> summarizer = pipeline("summarization", model="reformer")
404 Client Error: Not Found for url: https://huggingface.co/reformer/resolve/main/config.json
...
How would you construct the pipeline "manually" to use reformer for summarization?
Try this:
summarizer = pipeline("summarization", model="google/reformer-enwik8")
via here.
However, this produces...
/lib/python3.7/site-packages/sentencepiece.py", line 177, in LoadFromFile
return _sentencepiece.SentencePieceProcessor_LoadFromFile(self, arg)
TypeError: not a string

Resources