How to adjust transparency (alpha) in seaborn swarmplot? - seaborn

I have a swarmplot:
sns.swarmplot(y = "age gap corr", x = "cluster",
data = scatter_data, hue = 'group', dodge=True)
and I would like to adjust the transparency of the dots:
sns.swarmplot(y = "age gap corr", x = "cluster",
data = scatter_data, hue = 'group', dodge=True,
scatter_kws = {'alpha': 0.1})
sns.swarmplot(y = "age gap corr", x = "cluster",
data = scatter_data, hue = 'group', dodge=True,
plot_kws={'scatter_kws': {'alpha': 0.1}})
but neither of the above methods works.
any help is appreciated.

You can simply input the alpha argument directly in the swarmplot function:
import seaborn as sns
df = sns.load_dataset('diamonds').sample(1000)
sns.swarmplot(data=df, x='cut', y='carat', hue='color', alpha=0.5)
The documentation for swarmplot states
kwargs : key, value mappings
Other keyword arguments are passed through to matplotlib.axes.Axes.scatter().
Thus, you don't need to use scatter_kws={...}.
Compare this to, e.g., sns.lmplot, which states
{scatter,line}_kws : dictionaries
Additional keyword arguments to pass to plt.scatter and plt.plot.

Related

Assigning the result of corrplot to a variable

I am using the function corrplot from corrplot package to generate a plot of a correlation matrix created with cor.test (psych package).
When I try to save the result into a variable, the variable is NULL.
Anyone could advice, please?
library(corrplot)
library(psych)
library(ggpubr)
data(iris)
res_pearson.c_setosa<-iris%>%
filter(Species=="setosa")%>%
select(Sepal.Length:Petal.Width)%>%
corr.test(., y = NULL, use = "complete",method="pearson",adjust="bonferroni", alpha=.05,ci=TRUE,minlength=5)
corr.a<-corrplot(res_pearson.c_setosa$r[,1:3],
type="lower",
order="original",
p.mat = res_pearson.c_setosa$p[,1:3],
sig.level = 0.05,
insig = "blank",
col=col4(10),
tl.pos = "ld",
tl.cex = .8,
tl.srt=45,
tl.col = "black",
cl.cex = .8)+
my.theme #this is a theme() piece, but if I take this away, the result is a list rather than a plot
You can create your own function where you put recordPlot at the end to save the plot. After that you can save the output of the function in a variable. Here is a reproducible example:
library(corrplot)
library(psych)
library(ggpubr)
library(dplyr)
data(iris)
res_pearson.c_setosa<-iris%>%
filter(Species=="setosa")%>%
select(Sepal.Length:Petal.Width)%>%
corr.test(., y = NULL, use = "complete",method="pearson",adjust="bonferroni", alpha=.05,ci=TRUE,minlength=5)
your_function <- function(ff){
corr.a<-corrplot(ff$r[,1:3],
type="lower",
order="original",
p.mat = ff$p[,1:3],
sig.level = 0.05,
insig = "blank",
#col=col4(10),
tl.pos = "ld",
tl.cex = .8,
tl.srt=45,
tl.col = "black",
cl.cex = .8)
#my.theme #this is a theme() piece, but if I take this away, the result is a list rather than a plot
recordPlot() # save the latest plot
}
your_function(res_pearson.c_setosa)
p <- your_function(res_pearson.c_setosa)
p
Created on 2022-07-13 by the reprex package (v2.0.1)
As you can see, the variable p outputs the plot.

Fine-tune a pre-trained model

I am new to transformer based models. I am trying to fine-tune the following model (https://huggingface.co/Chramer/remote-sensing-distilbert-cased) on my dataset. The code:
enter image description here
and I got the following error:
enter image description here
I will be thankful if anyone could help.
The preprocessing steps I followed:
input_ids_t = []
attention_masks_t = []
for sent in df_train['text_a']:
encoded_dict = tokenizer.encode_plus(
sent,
add_special_tokens = True,
max_length = 128,
pad_to_max_length = True,
return_attention_mask = True,
return_tensors = 'tf',
)
input_ids_t.append(encoded_dict['input_ids'])
attention_masks_t.append(encoded_dict['attention_mask'])
# Convert the lists into tensors.
input_ids_t = tf.concat(input_ids_t, axis=0)
attention_masks_t = tf.concat(attention_masks_t, axis=0)
labels_t = np.asarray(df_train['label'])
and i did the same for testing data. Then:
train_data = tf.data.Dataset.from_tensor_slices((input_ids_t,attention_masks_t,labels_t))
and the same for testing data
It sounds like you are feeding the transformer_model 1 input instead of 3. Try removing the square brackets around transformer_model([input_ids, input_mask, segment_ids])[0] so that it reads transformer_model(input_ids, input_mask, segment_ids)[0]. That way, the function will have 3 arguments and not just 1.

Properly evaluate a test dataset

I trained a machine translation model using huggingface library:
def compute_metrics(eval_preds):
preds, labels = eval_preds
if isinstance(preds, tuple):
preds = preds[0]
decoded_preds = tokenizer.batch_decode(preds, skip_special_tokens=True)
# Replace -100 in the labels as we can't decode them.
labels = np.where(labels != -100, labels, tokenizer.pad_token_id)
decoded_labels = tokenizer.batch_decode(labels, skip_special_tokens=True)
# Some simple post-processing
decoded_preds, decoded_labels = postprocess_text(decoded_preds, decoded_labels)
result = metric.compute(predictions=decoded_preds, references=decoded_labels)
result = {"bleu": result["score"]}
prediction_lens = [np.count_nonzero(pred != tokenizer.pad_token_id) for pred in preds]
result["gen_len"] = np.mean(prediction_lens)
result = {k: round(v, 4) for k, v in result.items()}
return result
trainer = Seq2SeqTrainer(
model,
args,
train_dataset=tokenized_datasets['train'],
eval_dataset=tokenized_datasets['test'],
data_collator=data_collator,
tokenizer=tokenizer,
compute_metrics=compute_metrics
)
trainer.train()
model_dir = './models/'
trainer.save_model(model_dir)
The code above is taken from this Google Colab notebook. After the training, I can see the trained model is saved to the folder models and the metric is calculated. Now I want to load the trained model and do the prediction on a new dataset, here is what I tried:
dataset = load_dataset('csv', data_files='data/training_data.csv')
tokenizer = AutoTokenizer.from_pretrained(model_checkpoint)
# Tokenize the test dataset
tokenized_datasets = train_test.map(preprocess_function_v2, batched=True)
test_dataset = tokenized_datasets['test']
model = AutoModelForSeq2SeqLM.from_pretrained('models')
model(test_dataset)
It threw the following error:
*** AttributeError: 'Dataset' object has no attribute 'size'
I tried the evaluate() function as well, but it said:
*** torch.nn.modules.module.ModuleAttributeError: 'MarianMTModel' object has no attribute 'evaluate'
And the function eval only prints the configuration of the model.
What is the proper way to evaluate the performance of the trained model on a new dataset?
Turned out that the prediction can be produced using the following code:
inputs = tokenizer(
questions,
max_length=max_input_length,
truncation=True,
return_tensors='pt',
padding=True).to('cuda')
translation = model.generate(**inputs)

How to update bokeh active interaction with GeoJSON as data source?

I have made an interactive choropleth map with bokeh, and I'm trying to add active interactions using the dropdown widget (Select). However, most tutorials and SO questions about active interactions use ColumnDataSource, and not GeoJSONDataSource.
The issue is that GeoJSONDataSource doesn't have a .data method like ColumnDataSource does, so idk exactly how the syntax works when updating it.
My dataset is a dictionary in the form of city_dict = {'Amsterdam': <some data frame>, 'Antwerp': <some data frame>, ...}, where the dataframe is in geojson format. I have already confirmed that this format works when making glyphs.
def update(attr, old, new):
s_value = dropdown.value
p.title.text = '%s', s_value
new_src1 = make_dataset(s_value)
val1 = GeoJSONDataSource(new_src1)
r1.data_source = val1
where make_dataset is a function that transforms my original dataset into a dataset that can feed into the GeoJSONDataSource function. make_dataset requires a string (name of the city) to work eg. 'Amsterdam'. It works on passive interactions.
The main plot code (removed unnecessary stuff) is:
dropdown = Select(value='Amsterdam', options = cities)
controls = WidgetBox(dropdown)
initial_city = 'Amsterdam'
a = make_dataset(initial_city)
src1 = GeoJSONDataSource(a)
p = figure(title = 'Amsterdam', plot_height = 750 , plot_width = 900, toolbar_location = 'right')
r1 = p.patches('xs','ys', source = src1, fill_color = {'field' :'norm', 'transform' : color_mapper})
dropdown.on_change('value', update)
layout = row(controls, p)
curdoc().add_root(layout)
I've added the error I get. error handling message Message 'PATCH-DOC' (revision 1) content: {'events': [{'kind': 'ModelChanged', 'model': {'type': 'Select', 'id': '1147'}, 'attr': 'value', 'new': 'Antwerp'}], 'references': []}: ValueError("expected a value of type str, got ('%s', 'Antwerp') of type tuple",)

Why spatial transformer network (STN) is not working on image

I am trying to subject a one image array to STN using code from https://github.com/kevinzakka/spatial-transformer-network :
def STNfn(x):
import tensorflow as tf
print(x.shape)
B,W,H,C = x.shape
# identity transform
initial = np.array([[1., 0, 0], [0, 1., 0]])
initial = initial.astype('float32').flatten()
# localization network
n_fc = 6
W_fc1 = tf.Variable(tf.zeros([H*W*C, n_fc]), name='W_fc1')
b_fc1 = tf.Variable(initial_value=initial, name='b_fc1')
h_fc1 = tf.matmul(tf.zeros([B, H*W*C]), W_fc1) + b_fc1
# spatial transformer layer
from stn import spatial_transformer_network as transformer
h_trans = transformer(x, h_fc1)
return h_trans
fname = 'testimage.jpg'
img = plt.imread(fname)
img = STNfn(np.array([img]))
However, I am getting following error:
TypeError: Input 'y' of 'Mul' Op has type uint8
that does not match type float32 of argument 'x'.
I have tried to replace float32 with np.uint8, but it does not help.
Where is the problem and how can it be solved?
n_fc = 6 has to be a float32 maybe? Not familar with Python, in Java it is like 6.0f for float and just 6 is integer.

Resources