seaborn FacetGrid empty

A simple dataset, with one column named "measurement" (about 20 distinct values) and another named "value":
g = sns.FacetGrid(data, col='measurement',col_wrap=4)
g.map(sns.displot,'value')
I get a warning about the number of figures matplotlib.pyplot creates:
RuntimeWarning: More than 20 figures have been opened. Figures created through the pyplot interface (matplotlib.pyplot.figure) are retained until explicitly closed and may consume too much memory. (To control this warning, see the rcParam figure.max_open_warning).
fig, axes = plt.subplots(nrow, ncol, **kwargs)
These two lines of code give me a long column with all the individual graphs and, at the end, an empty FacetGrid (snippet below). I have no idea why this is happening; does anybody have a thought?
Thanks

When I tried,
g = sns.FacetGrid(data, col='measurement',col_wrap=4)
g.map(sns.displot,'value')
I also got the same results, but changing to distplot or histplot gave me the desired result, if that's what you wanted.
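For what it's worth, displot is a figure-level function, so each call inside FacetGrid.map opens a brand-new pyplot figure instead of drawing on the grid's axes; that is what produces both the warning and the empty grid. A minimal sketch of the histplot fix (the penguins columns here are hypothetical stand-ins for the question's measurement/value columns):
import seaborn as sns
import matplotlib.pyplot as plt

# stand-in data shaped like the question's: one categorical column, one numeric
data = sns.load_dataset("penguins").rename(
    columns={"species": "measurement", "bill_length_mm": "value"}
)

g = sns.FacetGrid(data, col="measurement", col_wrap=4)
g.map(sns.histplot, "value")  # axes-level, so it draws on the grid's own axes
plt.show()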


Add lines between each cell in a bivariate seaborn histplot

I've been trying to add lines between each cell in a sns.histplot with two variables, the same as you can with a sns.heatmap, but I've failed every single time.
I've tried the linewidths argument, since that is the way to do it with a heatmap:
penguins = sns.load_dataset("penguins")
sns.histplot(penguins, x="bill_depth_mm", y="body_mass_g", linewidths=1)
but nothing changes. I know I could aggregate the data first and then use a heatmap, but I feel dumb that I can't do it in a simpler way. I'm using seaborn 0.11.2.
Thanks in advance!
penguins = sns.load_dataset("penguins")
sns.histplot(
    penguins, x="bill_depth_mm", y="body_mass_g", linewidths=1,
    edgecolor="w"  # <-- Here's what you're missing
)
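Presumably linewidths alone has no visible effect because the bivariate histogram's cell edges have no color by default, so you are widening lines you cannot see; setting edgecolor (white here, matching heatmap's usual look) makes the cell borders appear.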

gensim/models/ldaseqmodel.py:217: RuntimeWarning: divide by zero encountered in double_scalars

/Users/Barry/anaconda/lib/python2.7/site-packages/gensim/models/ldaseqmodel.py:217: RuntimeWarning: divide by zero encountered in double_scalars
convergence = np.fabs((bound - old_bound) / old_bound)
#dynamic topic model (Python 2 syntax, matching the environment in the traceback)
from collections import Counter
from gensim import corpora
from gensim.models import ldaseqmodel

def run_dtm(num_topics=18):
    # preprocessing() is the asker's own helper, returning tokenized docs and years
    docs, years, titles = preprocessing(datasetType=2)
    # re-sort documents by year
    Z = zip(years, docs)
    Z = sorted(Z, reverse=False)
    years_new, docs_new = zip(*Z)
    # generate time slices (number of documents per year)
    time_slice = Counter(years_new).values()
    for year in Counter(years_new):
        print year, ' --- ', Counter(years_new)[year]
    print '********* data set loaded ********'
    dictionary = corpora.Dictionary(docs_new)
    corpus = [dictionary.doc2bow(text) for text in docs_new]
    print '********* train lda seq model ********'
    ldaseq = ldaseqmodel.LdaSeqModel(corpus=corpus, id2word=dictionary,
                                     time_slice=time_slice, num_topics=num_topics)
    print '********* lda seq model done ********'
    ldaseq.print_topics(time=1)
Hey guys, I'm using the dynamic topic models in the gensim package for topic analysis, following this tutorial: https://github.com/RaRe-Technologies/gensim/blob/develop/docs/notebooks/ldaseqmodel.ipynb. However, I always get the same unexpected error. Can anyone give me some guidance? I'm really puzzled, even though I have tried several different datasets for generating the corpus and dictionary.
The error is like this:
/Users/Barry/anaconda/lib/python2.7/site-packages/gensim/models/ldaseqmodel.py:217: RuntimeWarning: divide by zero encountered in double_scalars
convergence = np.fabs((bound - old_bound) / old_bound)
The np.fabs error means it is encountering an error within NumPy. Which NumPy and gensim versions are you using?
NumPy no longer supports Python 2.7, and Ldaseq was only added to gensim in 2016, so you might simply not have a compatible version available. If you are recoding a Python 3+ tutorial to a 2.7 variant, you obviously understand a bit about the version differences; try running it in, say, a 3.6.8 environment (you will have to upgrade sometime anyway, as 2020 is the end of Python's own support for 2.7). That might already help; I've gone through the tutorial and did not encounter this with my own data.
That being said, I have encountered the same error before when running LdaMulticore, and it was caused by an empty corpus.
Instead of running your code fully in a function, can you go through it line by line (or look at your DEBUG-level log) and check whether your output has the expected properties: for example, that your corpus is not empty and contains no empty documents? A quick check is sketched below.
If that is what's happening, fix the preprocessing steps and try again; that at least helped me, and it also helped with the same ldamodel error on the mailing list.
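A minimal sketch of such a check, assuming corpus is the list of bag-of-words documents produced by doc2bow as in the question:
# flag documents that came out of preprocessing empty
empty = [i for i, doc in enumerate(corpus) if len(doc) == 0]
if empty:
    print('%d empty documents, first few at indices %s' % (len(empty), empty[:5]))
else:
    print('no empty documents; corpus size = %d' % len(corpus))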
PS: not commenting because I lack the reputation, feel free to edit this.
This is an issue with the source code of ldaseqmodel.py itself.
With the latest gensim package (version 3.8.3) I get the same error, at line 293:
ldaseqmodel.py:293: RuntimeWarning: divide by zero encountered in double_scalars
convergence = np.fabs((bound - old_bound) / old_bound)
Now, if you go through the code you will see this:
[screenshot of the ldaseqmodel.py source around line 293]
You can see that they divide the difference between bound and old_bound by old_bound (which is also visible in the warning itself).
If you analyze further, you will see that at line 263 old_bound is initialized to zero, and this is the real reason you are getting the divide-by-zero warning:
[screenshot of the ldaseqmodel.py source around line 263, where old_bound is initialized to zero]
For further information, I put a print statement at line 294:
print('bound = {}, old_bound = {}'.format(bound, old_bound))
The output I received is: [screenshot of the printed bound and old_bound values]
So, in a single line: you are getting this warning because of the source code of ldaseqmodel.py, not because of any empty documents. That said, if you do not remove empty documents from your corpus you will receive another warning, so I suggest removing any empty documents and simply ignoring the divide-by-zero warning above.
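To make the mechanism concrete, here is a small, self-contained paraphrase of the pattern described above (not the actual gensim source; the bound value is a made-up number):
import numpy as np

old_bound = np.float64(0.0)  # initialized to zero, as at line 263 above
bound = np.float64(-1234.5)  # hypothetical first-iteration bound
# old_bound == 0, so the division below emits
# "RuntimeWarning: divide by zero encountered in double_scalars"
convergence = np.fabs((bound - old_bound) / old_bound)
print(convergence)  # inf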

D3 graph not updating on click

My line chart is not updating with new data once I click the black button and I'm not sure what I could possibly be doing wrong.
Block here:
Let's look at your NaN errors:
<path class="line" d="M0,324.19471776281716L0,NaNL155,NaNL155,270L310,270L310,353.84774728120146L465,353.84774728120146" transform="translate(78.1818,0)"></path>
Seems that we are missing two y values; we can see this if we split the path data into its x,y pairs:
M0,324.19471776281716
L0,NaN
L155,NaN
L155,270
L310,270
L310,353.84774728120146
L465,353.84774728120146
So we need to check two things: one is the y scale, and the other is the data used in the y scale. The y scale looks OK; if it failed on one number, it should fail on all of them. Let's look at the CSV data:
education,number
Bachelor's degree,2367
Degree in medicine, dentistry, veterinary medicine or optometry,5763
Earned doctorate,3862
Master's degree,1549
Here's our problem: we have a comma-separated file with lots of extra commas in the second data row (not counting the header row). We can see that this is what breaks the second row of the alternate data: the education value "Degree in medicine" is cut off, because the portion of the name beyond the first comma is dropped. Let's entomb that value in quotation marks so that its commas won't count as delimiters:
education,number
Bachelor's degree,2367
"Degree in medicine, dentistry, veterinary medicine or optometry",5763
Earned doctorate,3862
Master's degree,1549
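As a quick illustration of why the quoting matters (shown with Python's csv module for brevity; d3.csv applies the same RFC 4180 quoting rules):
import csv
import io

raw = '''education,number
"Degree in medicine, dentistry, veterinary medicine or optometry",5763
'''
for row in csv.reader(io.StringIO(raw)):
    print(row)
# ['education', 'number']
# ['Degree in medicine, dentistry, veterinary medicine or optometry', '5763']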
Also: your code in your update function selects #body, but #body (id="body") doesn't seem to exist. Did you mean to use body instead, to select the HTML body?

Importing images to prep for keras

I am trying to import a bunch of images and get them ready for Keras. The goal is to end up with an array of dimensions (length, 160, 320, 3). As you can see, my reshape call is commented out; the print(images.shape) line returns (8037,), and I am not sure how to proceed from there to get the right array dimensions. For reference, the first column in the CSV file is a list of paths to the images, and the code below combines the image's path inside the folder with the path to the folder.
When I run the commented-out reshape I get the following error: "ValueError: cannot reshape array of size 8037 into shape (8037,160,320,3)".
import csv
import cv2
import numpy as np

f = open('/Users/username/Desktop/data/driving_log.csv')
csv_f = csv.reader(f)

m = []
for row in csv_f:
    m.append(row)

images = []
for i in range(len(m)):
    img = m[i][1]
    img = img.lstrip()
    path = '/Users/username/Desktop/data/'
    img = path + img
    image = cv2.imread(img)
    images.append(image)

item_num = len(images)
images = np.array(images)
# images = np.array(images).reshape(item_num, 160, 320, 3)
print(images.shape)  # returns (8037,)
Can you print the shape of an image before it is appended to images, to verify it is what you expect? Even better would be adding an imshow in the loop to make sure you're loading the images you expect (you only need to do this for one or two). cv2.imread does not throw an error if there isn't an image at the file path you give it, so your array might be all None, which would yield the exact behavior you've described.
If that is the problem, check the img variable and make sure it's pointing exactly where you want it to; a quick check is sketched below.
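A minimal sketch of that check, assuming images is the list built in the loop above:
# cv2.imread returns None instead of raising when a path is wrong
bad = [i for i, im in enumerate(images) if im is None]
print('%d of %d images failed to load' % (len(bad), len(images)))
if not bad:
    print(images[0].shape)  # expect (160, 320, 3)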
Turns out it was including the first line of the CSV file, which was the header. After I sorted that out it ran great and gave me the requested shape.
images = []
for i in range(1, len(m)):  # start at 1 to skip the CSV header row
    img = m[i][1]
    img = img.lstrip()
    path = '/Users/user/Desktop/data/'
    img = path + img
    image = cv2.imread(img)
    images.append(image)

What is the correct syntax for using the max function?

Still using bloody OpenOffice Writer to customize my sale_order.rml report.
In my sale order I have 6 order lines with 6 different lead times to delivery. I need to show the maximum of the six values.
After many attempts I have abandoned using the reduce function, as it works erratically or not at all most of the time. I have never seen anything like it.
So I thought I'd give it a try using max wrapped around a loop, such as:
[[ max(repeatIn(so.order_line.delay,'d')) ]]
My maximum lead time being 20, I would expect to see 20 (yes, well, that would be too easy, wouldn't it!).
It returns
{'d': 20.0}
At least it contains the value I am after.
But if I try to manipulate this result, it disappears altogether.
I have tried:
int(re.findall(r'[0-9]+', max(repeatIn(so.order_line.delay,'d')))[0])
which works great from the Python window, but returns absolutely nothing in OpenERP.
I import re in my sale_order.py file, which I have recompiled into sale_order.pyo:
import time
import re
from datetime import datetime, timedelta
from report import report_sxw

class order(report_sxw.rml_parse):
    def __init__(self, cr, uid, name, context=None):
        super(order, self).__init__(cr, uid, name, context=context)
        self.localcontext.update({
            'time': time,
            'datetime': datetime,
            'timedelta': timedelta,
            're': re,
        })
I have of course restarted the server many times. My test install sits on Windows.
So, can anyone tell me what I am doing wrong? I can make it work from Python but not from OpenOffice Writer!
Thanks for your help!
EDIT 1:
The format
{'d': 20.0}
is, according to Python, a dictionary. Still in Python, to extract the number from a dictionary you can do it like so:
>>> dict={'d': 20.0}
>>> print(dict['d'])
20.0
But how can I transpose this to OpenERP Writer?
I have managed to get the result I wanted by importing functools and declaring the reduce function within the parameters of the sale_order.py file.
I then simply used a combination of the reduce and max functions, and it works exactly as expected.
The correct syntax is as follows:
repeatIn(objects,'o')
reduce(lambda x, y: max(x, y.delay), o.order_line, 0)
Nothing else is required.
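For completeness, a sketch of what that registration might look like in sale_order.py, reusing the localcontext pattern from the question (using functools.reduce is my assumption; on Python 2, reduce also exists as a builtin):
import time
import re
import functools
from datetime import datetime, timedelta
from report import report_sxw

class order(report_sxw.rml_parse):
    def __init__(self, cr, uid, name, context=None):
        super(order, self).__init__(cr, uid, name, context=context)
        self.localcontext.update({
            'time': time,
            'datetime': datetime,
            'timedelta': timedelta,
            're': re,
            # expose reduce to the RML template, as described above
            'reduce': functools.reduce,
        })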
Enjoy!
