How to use kde_kws parameters for seaborn.histplot()? - seaborn

I am trying to use sns.histplot() instead of sns.distplot() since I got the following message in colab:
FutureWarning: distplot is a deprecated function and will be removed
in a future version. Please adapt your code to use either displot (a
figure-level function with similar flexibility) or histplot (an axes-level function for histograms).
Code:
import pandas as pd
import seaborn as sns
df = sns.load_dataset('tips')
sns.histplot(df['tip'], kde=True, kde_kws={'fill' : True});
I got an error when passing kde_kws parameters inside sns.histplot():
TypeError: __init__() got an unexpected keyword argument 'fill'

According to the documentation, kde_kws= is intended to pass arguments "that control the KDE computation, as in kdeplot()." It is not entirely explicit which arguments those are, but they seem to be the ones like bw_method= and bw_adjust= that change the way the KDE is computed, rather than how it is displayed. If you want to change the appearance of the KDE plot, then you can use line_kws=, but, as the name implies, the KDE is represented only by a line and therefore cannot be filled.
If you want both a histogram and a filled KDE, you need to combine histplot() and kdeplot() on the same axes:
sns.histplot(df['tip'], stat='density')
sns.kdeplot(df['tip'], fill=True)


Gensim's FastText KeyedVector out of vocab

I want to use the read-only version of Gensim's FastText Embedding to save some RAM compared to the full model.
After loading the KeyVectors version, I get the following Error when fetching a vector:
IndexError: index 878080 is out of bounds for axis 0 with size 761210
The error occurs when using words that should be out-of-vocabulary e.g. "lawyerxy" instead of "lawyer". The full model returns a vector for both.
from gensim.models import KeyedVectors
model = KeyedVectors.load("model.kv")
model.wv.__getitem__("lawyerxy")
So, my assumption is that the KeyedVectors do not offer FastText's out-of-vocabulary function, a key feature for my use case. This limitation is not mentioned in the documentation:
https://radimrehurek.com/gensim/models/word2vec.html
Can anyone confirm that assumption and/or name a fix to allow vectors for "lawyerxy" etc.?
The KeyedVectors name is (as of gensim-3.8.0) just an alias for class Word2VecKeyedVectors, which only maintains a simple word (as key) to vector (as value) mapping.
You shouldn't expect FastText's advanced ability to synthesize vectors for out-of-vocabulary words to appear in any model/representation that doesn't explicitly claim to offer that ability.
(I would expect a lookup of an out-of-vocabulary word to give a clearer KeyError rather than the IndexError you've reported. But, you'd need to show exactly what code created the file you're loading, and triggered the error, and the full error stack, to further guess what's going wrong in your case.)
Depending on how your model.kv file was saved, you might be able to load it, with retained OOV-vector functionality, by using the class FastTextKeyedVectors instead of plain KeyedVectors.

Module variable documentation error

I get the following error while documenting a module variable json_class_index (See source), which does not have a docstring.
The generated documentation seems to be fine. What is a good fix?
reading sources... [100%] sanskrit_data_schema_common
/home/vvasuki/sanskrit_data/sanskrit_data/schema/common.py:docstring of sanskrit_data.schema.common.json_class_index:3: WARNING: Unexpected indentation.
/home/vvasuki/sanskrit_data/sanskrit_data/schema/common.py:docstring of sanskrit_data.schema.common.json_class_index:4: WARNING: Block quote ends without a blank line; unexpected unindent.
/home/vvasuki/sanskrit_data/sanskrit_data/schema/common.py:docstring of sanskrit_data.schema.common.json_class_index:7: WARNING: Unexpected indentation.
/home/vvasuki/sanskrit_data/sanskrit_data/schema/common.py:docstring of sanskrit_data.schema.common.json_class_index:8: WARNING: Inline strong start-string without end-string.
Edit:
PS: Note that removing the below docstring makes the error disappear, so it seems to be the thing to fix.
.. autodata:: json_class_index
:annotation: Maps jsonClass values to Python object names. Useful for (de)serialization. Updated using update_json_class_index() calls at the end of each module file (such as this one) whose classes may be serialized.
The warning messages indicate that the reStructuredText syntax of your docstrings is not valid and needs to be corrected.
Additionally your source code does not comply with PEP 8. Indentation should be 4 spaces, but your code uses 2, which might cause problems with Sphinx.
First make your code compliant with PEP 8 indentation.
Second, you must leave a blank line between the info field list and whatever precedes it.
Third, if the warnings persist, then look at the line numbers in the warnings—3, 4, 7, and 8—and the warnings themselves. It appears that the warnings correspond to this block of code:
@classmethod
def make_from_dict(cls, input_dict):
  """Defines *our* canonical way of constructing a JSON object from a dict.
  All other deserialization methods should use this.
  Note that this assumes that json_class_index is populated properly!
  - ``from sanskrit_data.schema import *`` before using this should take care of it.
  :param input_dict:
  :return: A subclass of JsonObject
  """
Try this instead, post-PEP-8-ification, which should correct most of the warnings caused by faulty white space in your docstring:
@classmethod
def make_from_dict(cls, input_dict):
    """
    Defines *our* canonical way of constructing a JSON object from a dict.

    All other deserialization methods should use this.

    Note that this assumes that json_class_index is populated properly!

    - ``from sanskrit_data.schema import *`` before using this should take care of it.

    :param input_dict:
    :return: A subclass of JsonObject
    """
This style is acceptable according to PEP 257. The indentation is visually and vertically consistent, where the triple quotes vertically align with the left indentation. I think it's easier to read.
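To see why relative indentation inside a docstring matters, the sketch below uses the standard library's inspect.cleandoc, which strips the common leading whitespace. This assumes Sphinx normalizes docstrings in a broadly similar way before handing them to the reStructuredText parser. The two functions are hypothetical examples, not code from the project in question.

```python
import inspect


def consistent():
    """Summary line.

    Body paragraph.

    - a bullet item

    :return: nothing
    """


def inconsistent():
    """Summary line.
    Body paragraph.
      - a bullet indented deeper than the paragraph before it
    :return: nothing
    """


# cleandoc removes the indentation shared by all lines after the first
clean_ok = inspect.cleandoc(consistent.__doc__).splitlines()
clean_bad = inspect.cleandoc(inconsistent.__doc__).splitlines()

# Every line of the consistent docstring now starts at column 0
all_flush = all(not line.startswith(" ") for line in clean_ok)

# The deeper-indented bullet keeps its two extra spaces; an indented line
# directly after a paragraph is what docutils reports as
# "Unexpected indentation"
leftover = clean_bad[2]
```

Only indentation that is uniform across the docstring is stripped, so any line indented more than its neighbours reaches the parser still indented.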
The fix was to add a docstring for the variable as follows:
#: Maps jsonClass values to Python object names. Useful for (de)serialization. Updated using update_json_class_index() calls at the end of each module file (such as this one) whose classes may be serialized.
json_class_index = {}

MATLAB ConnectedComponentLabeler does not work in for loop

I am trying to get a set of binary images' eccentricity and solidity values using the regionprops function. I obtain the label matrix using the vision.ConnectedComponentLabeler function.
This is the code I have so far:
files = getFiles('images');
ecc = zeros(length(files)); %eccentricity values
sol = zeros(length(files)); %solidity values
ccl = vision.ConnectedComponentLabeler;
for i=1:length(files)
    I = imread(files{i});
    [L NUM] = step(ccl, I);
    for j=1:NUM
        L = changem(L==j, 1, j); %*
    end
    stats = regionprops(L, 'all');
    ecc(i) = stats.Eccentricity;
    sol(i) = stats.Solidity;
end
However, when I run this, I get an error pointing to the line marked with *:
Error using ConnectedComponentLabeler/step
Variable-size input signals are not supported when the OutputDataType property is set to 'Automatic'.
I do not understand what MATLAB is talking about and I do not have any idea about how to get rid of it.
Edit
I have gone back to the bwlabel function and have no problems now.
The error is a bit hard to understand, but I can explain what it means. When you use the CVST Connected Components Labeller, it assumes that all of the images you pass to it are the same size. The error happens because it looks like your images aren't, hence the reference to "Variable-size input signals".
The "Automatic" setting means that the output data type is chosen for you, so you don't have to worry about whether the output is uint8, uint16, etc. To remove this error, you need to set the labeller's OutputDataType property manually to a fixed type. Hopefully, the images in the directory you're reading are all the same data type, so override this property with a data type that the function accepts. The available types are uint8, uint16 and uint32. Therefore, assuming your images are uint8, for example, do this before you run your loop:
ccl = vision.ConnectedComponentLabeler;
ccl.OutputDataType = 'uint8';
Now run your code, and it should work. Bear in mind that the input needs to be logical for this to have any meaningful output.
Minor comment
Why are you using the CVST Connected Component Labeller when the Image Processing Toolbox bwlabel function works exactly the same way? As you are using regionprops, you have access to the Image Processing Toolbox, so this should be available to you. It's much simpler to use and requires no setup: http://www.mathworks.com/help/images/ref/bwlabel.html

<bound method PolyCollection.get_paths of <matplotlib.collections.PolyCollection object

Is there a way to get at all the paths with matplotlib 1.3.0?
I am using hexbin and create the following output: "hex31mm", which is a:
In [42]: type(hex31mm)
Out[42]: matplotlib.collections.PolyCollection
My aim is to use the method "get_paths" as it behaves in matplotlib 1.1.0, for the function linked below, but with the newer matplotlib 1.3.0.
Interestingly, "get_paths" under matplotlib 1.1.0 yields 802 distinct paths, as below:
In [41]: len(hex31mm.get_paths())
Out[41]: 802
Yet "get_paths" under matplotlib 1.3.0, for this same object "hex31mm" yields only one path as below:
In [1]: len(hex31mm.get_paths())
Out[1]: 1
Please check link below for more details, any help much appreciated!
NOTE:
I am sure the information for all paths is part of the object in both cases, because the hexbin figure that is plotted on screen is the same under both matplotlib versions. However, I require the hexbin centres, hence my insistence on using the "get_paths" method for the linked function.
Sorry to sound repetitive, but the function works fine in matplotlib 1.1.0 and not under matplotlib 1.3.0. It is supposed to return an array of shape (n, 2), where each row is the centre (x, y) of one of the n hexbins.
any hints, would be much appreciated...
I think in the newer versions of matplotlib the method get_offsets() does the trick: hex31mm.get_offsets() returns the centres, which is the output of the function.
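A minimal sketch of that approach, assuming a matplotlib version (1.3 or later) in which hexbin stores the hexagon centres as offsets of a single shared path. The random data and the gridsize value are placeholders for whatever produced hex31mm:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Placeholder data (assumption: stands in for the original x/y samples)
rng = np.random.default_rng(0)
x = rng.normal(size=1000)
y = rng.normal(size=1000)

fig, ax = plt.subplots()
hexes = ax.hexbin(x, y, gridsize=20)

# One (x, y) centre per hexagon, as an array of shape (n, 2)
centres = hexes.get_offsets()

# The per-hexagon values (here, point counts) line up with the centres
counts = hexes.get_array()
```

Row i of `centres` corresponds to entry i of `counts`, so the two arrays can be zipped or stacked to pair each centre with its bin value.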

What is the correct syntax for using the max function

Still using bloody OpenOffice Writer to customize my sale_order.rml report.
In my sale order I have 6 order lines with 6 different lead time to delivery. I need to show the maximum out of the six values.
After many attempts I have abandoned using the reduce function, as it works erratically or not at all most of the time. I have never seen anything like this.
So I thought I'd give a try using max encapsulating a loop such as:
[[ max(repeatIn(so.order_line.delay,'d')) ]]
My maximum lead time being 20, I would expect to see 20 (yes well that would be too easy, wouldn't it!).
It returns
{'d': 20.0}
At least it contains the value I am after.
But if I try to manipulate this result, it disappears altogether.
I have tried:
int(re.findall(r'[0-9]+', max(repeatIn(so.order_line.delay,'d')))[0])
which works great from the python window, but returns absolutely nothing in OpenERP.
I import the re from my sale_order.py file, which I have recompiled into sale_order.pyo:
import time
import re
from datetime import datetime, timedelta
from report import report_sxw
class order(report_sxw.rml_parse):
    def __init__(self, cr, uid, name, context=None):
        super(order, self).__init__(cr, uid, name, context=context)
        self.localcontext.update({
            'time': time,
            'datetime': datetime,
            'timedelta': timedelta,
            're': re,
        })
I have of course restarted the server many times. My test install sits on windows.
So can anyone tell me what I am doing wrong, because I can make it work from Python but not from OpenOffice Writer!
Thanks for your help!
EDIT 1:
The format
{'d': 20.0}
is, according to python, a dictionary. Still in Python, to extract the integer from a dictionary it is possible to do it like so:
>>> dict={'d': 20.0}
>>> print(dict['d'])
20.0
But how can I transpose this to OpenERP writer???
I have managed to get the result I wanted by importing functools and declaring the reduce function within the parameters of the sale_order.py file.
I then simply used a combination of the reduce and max functions, and it works exactly as expected.
The correct syntax is as follows:
repeatIn(objects,'o')
reduce(lambda x, y: max(x, y.delay), o.order_line, 0)
Nothing else is required.
Enjoy!
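For anyone who wants to check the expression outside the report engine, here is a minimal sketch in plain Python. The OrderLine tuple and the six delay values are hypothetical stand-ins for the real sale order lines; only the reduce expression itself comes from the report above.

```python
from collections import namedtuple
from functools import reduce

# Hypothetical stand-in for an order line record; only `delay` matters here
OrderLine = namedtuple("OrderLine", ["delay"])
order_lines = [OrderLine(d) for d in (7.0, 20.0, 3.0, 14.0, 10.0, 5.0)]

# Same expression as in the report: fold over the lines, keeping the
# largest delay seen so far, starting from 0
max_delay = reduce(lambda x, y: max(x, y.delay), order_lines, 0)

# Equivalent plain-Python form using max() with a generator
max_delay_alt = max(line.delay for line in order_lines)

# The earlier attempt returned a dictionary; a value is read by key,
# not with a regular expression
result = {"d": 20.0}
value_from_dict = result["d"]
```

The initial value 0 also means an order with no lines yields 0 rather than raising an error, which max() on an empty sequence would do.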
