YellowbrickTypeError: could not find feature importances param on Pipeline - yellowbrick

I'm getting this error when calling
viz = feature_importances(model_lr, X_train, y_train, labels=final_features, topn=20, is_fitted=True, ax=axes[1,0], show=False)
Here is how the pipeline is set up:
clf_lr = Pipeline(
    steps=[
        ("preprocessor", preprocessor),
        ("feature_selector", RFE(estimator=RandomForestClassifier(random_state=42), verbose=3, n_features_to_select=15)),
        ("classifier", LogisticRegression(max_iter=3000, random_state=42)),
    ]
)
gs_lr = GridSearchCV(
    clf_lr,
    lr_param_grid,  # some param grid for lr, omitted here
    cv=cv,
    scoring={'bsl': bsl_scorer},
    verbose=3,
    error_score='raise',
    n_jobs=-1,
    refit='bsl',
)
model_lr = gs_lr.fit(X_train, y_train)
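A possible explanation, offered as a hedged sketch rather than a confirmed diagnosis: Yellowbrick's feature_importances looks for a coef_ or feature_importances_ attribute on the estimator it is given, and neither a fitted GridSearchCV nor the Pipeline inside it exposes one. One workaround is to pull the final classifier out of best_estimator_ and map its coefficients back to the features that RFE kept. The example below uses only scikit-learn; the synthetic data, the StandardScaler standing in for preprocessor, the small parameter grid, and the f0..f7 feature names are all assumptions for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for X_train / y_train (assumption).
X, y = make_classification(n_samples=200, n_features=8, n_informative=4, random_state=42)
feature_names = np.array([f"f{i}" for i in range(X.shape[1])])  # hypothetical names

pipe = Pipeline(
    steps=[
        ("preprocessor", StandardScaler()),  # stand-in for the real preprocessor
        ("feature_selector", RFE(
            estimator=RandomForestClassifier(n_estimators=10, random_state=42),
            n_features_to_select=3,
        )),
        ("classifier", LogisticRegression(max_iter=3000, random_state=42)),
    ]
)
search = GridSearchCV(pipe, {"classifier__C": [0.1, 1.0]}, cv=3)
search.fit(X, y)

# Unwrap the grid search, then the pipeline, to reach the fitted steps.
best_pipe = search.best_estimator_
selector = best_pipe.named_steps["feature_selector"]
classifier = best_pipe.named_steps["classifier"]

# get_support() is a boolean mask over the selector's input columns.
selected_names = feature_names[selector.get_support()]
coefs = classifier.coef_.ravel()

for name, coef in zip(selected_names, coefs):
    print(name, coef)
```

With the real pipeline, the labels would also have to reflect whatever columns preprocessor outputs, which may differ from the input columns (for example after one-hot encoding).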

Related

ValueError with Sklearn LinearRegression model.predict()

I am trying to do a simple linear regression model to estimate the sales price of an item for borrowers we don't have contract information on. I'm using data from borrowers we do have price and payment info on and using sklearn's LinearRegression model but getting an error when I call the predict() method on the model. The exact error:
ValueError: X has 844 features, but LinearRegression is expecting 2529 features as input.
Here is my code; I feel like it's fairly straightforward. build_customer_df is a function that returns the dataframe with some column formatting, nothing fancy:
fp = Path('master_borrower.xlsx')
df = build_customer_df(fp)
df = df[['payment', 'trailer_sales_price']]
df = df[df['payment']!= 0]
X = df['payment'].values
y = df['trailer_sales_price'].values
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=1)
X_test = X_test.reshape(1,-1)
X_train = X_train.reshape(1,-1)
y_train = y_train.reshape(1,-1)
y_test = y_test.reshape(1,-1)
model = linear_model.LinearRegression()
model.fit(X_train, y_train)
predictions = model.predict(X_test)
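The numbers in the error are a hint: 2529 and 844 look like the train and test row counts (844 is about 25% of the total), so reshape(1, -1) has turned every sample into a feature of a single row. A minimal sketch of the likely fix, with synthetic payment and price values standing in for the spreadsheet (those values and the 40-row size are assumptions): keep one column per feature with reshape(-1, 1) and leave y one-dimensional.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the 'payment' and 'trailer_sales_price' columns (assumption).
rng = np.random.default_rng(1)
payment = rng.uniform(100, 900, size=40)
price = 50 * payment + rng.normal(0, 100, size=40)

# One row per sample, one column per feature.
X = payment.reshape(-1, 1)
y = price  # a 1-D target is fine for a single output

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=1)

model = LinearRegression()
model.fit(X_train, y_train)
predictions = model.predict(X_test)

print(X_train.shape, X_test.shape, predictions.shape)
```

Reshaping once before the split also avoids having to reshape each of the four arrays afterwards.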

AttributeError: 'Series' object has no attribute 'lower'?

This is my code. The folds are created, but the problem is with the fit function.
data = pd.read_csv('Augsynonym.csv')
print(data)
txt = data['Text']
sent = data['Sentiment']
kf = KFold(n_splits=5)
model = LogisticRegression(solver='liblinear')
vectorizer = CountVectorizer()
acc_score = []
Xtrain = []
xtest = []
for train_set, test_set in kf.split(txt):
    print(train_set, len(train_set))
    print(test_set, len(test_set))
    X_train, X_test = txt.iloc[train_set], txt.iloc[test_set]
    y_train, y_test = sent[train_set], sent[test_set]
    Xtrain.append(X_train)
    xtest.append(X_test)
    xtrain = vectorizer.fit_transform(Xtrain)
    testx = vectorizer.fit_transform(xtest)
    model.fit(xtrain, y_train)
    pred_values = model.predict(testx)
    acc = accuracy_score(pred_values, y_test)
    acc_score.append(acc)
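The error most likely comes from passing a list of Series to CountVectorizer: it iterates over the list, treats each Series as one document, and calls .lower() on it. A sketch of one way to restructure the loop, assuming the Text column holds plain strings (the ten-row DataFrame below is made up): vectorize each fold's Series directly, fit the vocabulary on the training fold only, and reuse it with transform on the test fold.

```python
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import KFold

# Made-up stand-in for Augsynonym.csv (assumption).
data = pd.DataFrame({
    "Text": ["good movie", "bad movie", "great film", "awful film", "loved it",
             "hated it", "good acting", "bad acting", "great plot", "awful plot"],
    "Sentiment": ["pos", "neg"] * 5,
})
txt = data["Text"]
sent = data["Sentiment"]

kf = KFold(n_splits=5)
acc_score = []
for train_idx, test_idx in kf.split(txt):
    # Index both Series by position so text and labels stay aligned.
    X_train, X_test = txt.iloc[train_idx], txt.iloc[test_idx]
    y_train, y_test = sent.iloc[train_idx], sent.iloc[test_idx]

    # Learn the vocabulary on the training fold, then reuse it for the test fold.
    vectorizer = CountVectorizer()
    xtrain = vectorizer.fit_transform(X_train)
    xtest = vectorizer.transform(X_test)

    model = LogisticRegression(solver="liblinear")
    model.fit(xtrain, y_train)
    acc_score.append(accuracy_score(y_test, model.predict(xtest)))

print(acc_score)
```

Calling fit_transform on the test fold, as in the question, would build a different vocabulary and so a feature matrix whose columns do not match what the model was trained on.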

Tensorflow 2.0 ImageAugmentation using tf.keras.preprocessing.image.ImageDataGenerator and tf.datasets: model.fit() is running infinitely

I am facing an issue while running fit() in TensorFlow with augmented images (created with ImageDataGenerator) passed as a dataset. fit() runs indefinitely without stopping.
I tried it with the default code shared in the TensorFlow documentation.
Please find the code snippet below:
train_data_generator = ImageDataGenerator(
    rotation_range=20,
    shear_range=0.5,
    zoom_range=0.4,
    rescale=1./255,
    vertical_flip=True,
    validation_split=0.2,
    width_shift_range=0.2,
    height_shift_range=0.2,
    horizontal_flip=True)
test_data_generator = ImageDataGenerator(rescale=1./255)
ftrain_generator = train_data_generator.flow(
    X_train,
    y_train,
    batch_size=batch_size,
    shuffle=True)
ftrain_generator_ds = tf.data.Dataset.from_generator(
    lambda: ftrain_generator,
    output_types=(tf.float32, tf.float32),
    output_shapes=([batch_size, img_rows, img_cols, num_channel], [batch_size, num_classes]))
ftest_generator = test_data_generator.flow(
    X_test,
    y_test,
    batch_size=batch_size,
    shuffle=False)
ftest_generator_ds = tf.data.Dataset.from_generator(
    lambda: ftest_generator,
    output_types=(tf.float32, tf.float32),
    output_shapes=([batch_size, img_rows, img_cols, num_channel], [batch_size, num_classes]))
ftrain_generator_ds = ftrain_generator_ds.prefetch(buffer_size=tf.data.experimental.AUTOTUNE)
ftest_generator_ds = ftest_generator_ds.prefetch(buffer_size=tf.data.experimental.AUTOTUNE)
model2.fit(ftrain_generator, epochs=num_epoch, validation_data=ftest_generator)
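One frequent cause of a fit() that never finishes an epoch, offered here as a guess since the post does not show which object was finally passed: generators returned by ImageDataGenerator.flow loop over the data indefinitely, so a tf.data.Dataset built from one with from_generator has no end, and Keras needs steps_per_epoch (and validation_steps) to know when an epoch is over. The plain-Python sketch below only illustrates that bookkeeping; endless_batches is a hypothetical stand-in for the augmenting generator, and the fit call in the final comment is a suggestion that reuses the names from the question.

```python
import itertools
import math


def endless_batches(samples, batch_size):
    """Yield batches forever, wrapping around the data (hypothetical stand-in)."""
    while True:
        for start in range(0, len(samples), batch_size):
            yield samples[start:start + batch_size]


samples = list(range(10))  # pretend these are 10 training images
batch_size = 4

# Number of batches needed to see every sample once.
steps_per_epoch = math.ceil(len(samples) / batch_size)

# Without a bound, iterating the generator never stops; with it, one epoch is finite.
one_epoch = list(itertools.islice(endless_batches(samples, batch_size), steps_per_epoch))
print(steps_per_epoch, [len(batch) for batch in one_epoch])

# The equivalent bound in Keras would be passed along the lines of:
# model2.fit(ftrain_generator_ds, epochs=num_epoch,
#            steps_per_epoch=math.ceil(len(X_train) / batch_size),
#            validation_data=ftest_generator_ds,
#            validation_steps=math.ceil(len(X_test) / batch_size))
```

Note that the last batch in the sketch is smaller than batch_size; with a fixed batch dimension in output_shapes, a short final batch may need separate handling.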

Can't get train and test sets

I applied k-fold cross validation to split data into train and test sets.
But when I want to get train and test sets I have these errors:
AttributeError: 'numpy.ndarray' object has no attribute 'iloc'
Thanks for your help.
y = df_dummies['Churn'].values
X = df_dummies.drop(columns=['Churn'])
from sklearn.preprocessing import MinMaxScaler
features = X.columns.values
scaler = MinMaxScaler(feature_range=(0, 1))
scaler.fit(X)
X = pd.DataFrame(scaler.transform(X))
X.columns = features
from sklearn.model_selection import KFold
kf = KFold(n_splits=5, shuffle=True)
for train, test in kf.split(X):
    print("%s %s" % (train, test))
for train_index, test_index in kf.split(X):
    print("TRAIN:", train_index, "TEST:", test_index)
    X_train, X_test = X.iloc[train_index], X.iloc[test_index]
    y_train, y_test = y.iloc[train_index], y.iloc[test_index]
from sklearn.linear_model import LogisticRegression
CLF = LogisticRegression().fit(X_train, y_train)
print('Accuracy of Logistic regression classifier on training set: {:.2f}'
      .format(CLF.score(X_train, y_train)))
print('Accuracy of Logistic regression classifier on test set: {:.2f}'
      .format(CLF.score(X_test, y_test)))
NameError: name 'y_train' is not defined
The issue is that df_dummies['Churn'].values returns a NumPy array, not a pandas object, so it has no iloc attribute; iloc is an indexer on pandas DataFrame and Series. Because that line raises before y_train is assigned, the later NameError follows from the same problem.
Use y = df_dummies['Churn'] instead.
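A minimal sketch of that fix, using a small made-up DataFrame in place of df_dummies (its feature columns and values are assumptions). y stays a pandas Series so .iloc works, and the model is fitted inside the loop so every fold is actually used.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold

# Made-up stand-in for df_dummies (assumption).
df_dummies = pd.DataFrame({
    "tenure": [1, 5, 12, 24, 36, 48, 60, 2, 8, 30],
    "monthly": [70, 65, 80, 50, 45, 40, 30, 90, 85, 55],
    "Churn": [1, 1, 0, 0, 0, 0, 0, 1, 1, 0],
})

y = df_dummies["Churn"]                  # Series, so y.iloc[...] is available
X = df_dummies.drop(columns=["Churn"])

kf = KFold(n_splits=5, shuffle=True, random_state=0)
test_scores = []
for train_index, test_index in kf.split(X):
    X_train, X_test = X.iloc[train_index], X.iloc[test_index]
    y_train, y_test = y.iloc[train_index], y.iloc[test_index]

    # Fit and score per fold instead of only on the last split.
    clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    test_scores.append(clf.score(X_test, y_test))

print(test_scores)
```

Keeping .values would also work if the array is indexed directly, as in y[train_index], since NumPy arrays accept integer index arrays.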
Reference: https://pandas.pydata.org/pandas-docs/version/0.23.4/generated/pandas.DataFrame.iloc.html#pandas.DataFrame.iloc
PS: I don't know how this type of question gets migrated to a sister site. Perhaps someone who knows could migrate this to Cross Validated.

Data reshaping in sklearn (Linear regression)

input code:
data = pd.read_csv('test.csv')
data.head()
data['Density'] = data['Flow [Veh/h]'] / data['Speed [km/h]']
data = data.replace(np.nan, 1)
X = data['Density']
y = data['Speed [km/h]']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state=101)
from sklearn.linear_model import LinearRegression
lm = LinearRegression()
lm.fit(X_train,y_train) #HERE I GOT AN ERROR
Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.
You can try changing your variable X as follows:
X = data['Density'].values.reshape((-1, 1))
I faced the same error when my feature set had only one variable. The above change solved the issue for me.
Try using double brackets [[ ]] when selecting the feature column, which returns a 2-D DataFrame instead of a 1-D Series:
X = data[['Density']]
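Both suggestions do the same thing: they give the estimator a 2-D input of shape (n_samples, 1) instead of a 1-D Series. A quick sketch with made-up flow and speed values (the numbers are assumptions; the column names come from the question) shows the shapes involved.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Made-up stand-in for test.csv (assumption).
data = pd.DataFrame({
    "Flow [Veh/h]": [1200.0, 900.0, 1500.0, 600.0, 1800.0],
    "Speed [km/h]": [80.0, 90.0, 60.0, 100.0, 45.0],
})
data["Density"] = data["Flow [Veh/h]"] / data["Speed [km/h]"]

X_1d = data["Density"]                             # 1-D Series: rejected by fit()
X_frame = data[["Density"]]                        # 2-D DataFrame with one column
X_array = data["Density"].values.reshape(-1, 1)    # 2-D NumPy array with one column
y = data["Speed [km/h]"]

lm = LinearRegression().fit(X_frame, y)
print(X_1d.shape, X_frame.shape, X_array.shape)
```

The DataFrame form keeps the column name, which can make later predictions on new DataFrames easier to keep consistent.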
