Seaborn Plot Cluster Characteristics Without Group Column

I've got a pandas DataFrame which has ~20 columns (features) and have run a clustering algorithm on them. I added the cluster assignment to the DataFrame as a column called group. I would like to plot the means of all the features by group using Seaborn, for example as a bar chart. I have tried the following:
import seaborn as sns
data = sns.load_dataset("penguins")
data["group"] = data.index % 10
sns.catplot(col="group", kind="bar", ci=None, data=data, col_wrap=5)
However, I do not want the group column itself included in this bar chart (it shows up as the last column). How can I get rid of it?
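One way to achieve this (a sketch, not an answer from the thread) is to reshape only the feature columns into long form with pandas.melt, so that group becomes the facet variable but is never one of the plotted bars. A minimal pandas-only sketch, using a small synthetic frame in place of the penguins dataset:

```python
import pandas as pd

# Small synthetic frame standing in for the penguins data.
data = pd.DataFrame({
    "bill_length_mm": [39.1, 39.5, 40.3, 36.7],
    "body_mass_g": [3750.0, 3800.0, 3250.0, 3450.0],
})
data["group"] = data.index % 2

# Melt every column except "group" into long form; "group" stays
# as the identifier, so it is never one of the plotted variables.
features = [c for c in data.columns if c != "group"]
long_df = data.melt(id_vars="group", value_vars=features,
                    var_name="feature", value_name="value")

# "group" no longer appears among the plotted features:
assert "group" not in long_df["feature"].unique()

# Faceting then happens on the long frame, e.g.:
# sns.catplot(data=long_df, x="feature", y="value", col="group",
#             kind="bar", errorbar=None, col_wrap=5)
```

Note that in recent seaborn versions the `ci=None` argument used in the question has been replaced by `errorbar=None`.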

Related

DAX equivalent for COUNTIF in Excel

=COUNTIF(J11:AI11,">0")
The column headings are Wk1, ..., Wk26.
I want to count the number of cells that are greater than 0.
I see lots of examples of counting within a column, but I need to count instances across rows and create a new column.
What about unpivoting those columns in Power Query? Then you can apply the COUNTX function.
Check this site: https://support.microsoft.com/en-us/office/unpivot-columns-power-query-0f7bad4b-9ea1-49c1-9d95-f588221c7098
After unpivoting, the solution would be:
COUNTX(
    FILTER(table, table[Values] > 0),
    table[Values]
)
Or you can run a pandas script in Power Query and compute the count across rows; see:
https://learn.microsoft.com/en-us/power-bi/connect-data/desktop-python-in-query-editor
Related thread on row-wise operations in pandas:
What does axis in pandas mean?
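The pandas route mentioned above can be sketched like this (the column names Wk1..Wk3 are stand-ins for the real Wk1..Wk26):

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["a", "b"],
    "Wk1": [5, 0],
    "Wk2": [0, 0],
    "Wk3": [2, 7],
})

# Select only the weekly columns, compare against 0, and sum the
# booleans along axis=1 (i.e. across each row), mirroring COUNTIF.
df["CountGT0"] = (df.filter(like="Wk") > 0).sum(axis=1)
# df["CountGT0"] is [2, 1]
```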

PowerBI groupby with filters

My company has tasked me with slicing the turnover data and creating different graphs.
My source data looks like this. The relevant columns are: Voluntary/Involuntary, Termination Reason, Country, Production, and TermDateKey.
I am trying to get counts using different filters on the data. I managed to get the basic monthly total using the formula:
Term Month Count =
GROUPBY(
    'Turnover Source',
    'Turnover Source'[TermDateKey],
    "Turnover Total Count",
    COUNTX(CURRENTGROUP(), 'Turnover Source'[TermDateKey])
)
This gave me a new table with the counts for each month: TermDateKey in column 1 and the counts in column 2.
I am trying to add onto this table by adding counts but using different filters.
For example, I am trying to add another column that gives the monthly count filtered for 'Turnover Source'[Voluntary/Involuntary] = "Voluntary", then another column for 'Turnover Source'[Voluntary/Involuntary] = "Involuntary", and so on. I have not found anything that shows how to do this, and when I add in the FILTER function it says that GROUPBY(...) can only work on CURRENTGROUP().
Can someone point me to a resource that will give me the solution I need? I am at a loss; thank you all.
It looks like you may not be aware that you don't have to calculate all possible groupings with DAX formulas.
The very nature of Power BI is that you use a column like "Termination Reason" on an X axis or in the legend of a visual. Any measure you have created on the values of another column, e.g. a count of all rows, will then automatically be grouped by the values in "Termination Reason", giving you a count for each value in that column.
You do NOT need DAX functions to calculate the grouping values for each measure for each column value combination.
Here is some simple sample data grouped by date and colour: one chart shows a count of each colour, and another shows a sum of the Value column. No DAX was written for that.
If your scenario is different, please explain.
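For readers coming from pandas, the grouping such a visual performs is analogous to a crosstab over the two columns. A sketch with made-up values (the real data lives in Power BI, not here):

```python
import pandas as pd

turnover = pd.DataFrame({
    "TermDateKey": [202301, 202301, 202302, 202302, 202302],
    "Voluntary/Involuntary": ["Voluntary", "Involuntary",
                              "Voluntary", "Voluntary", "Involuntary"],
})

# Count terminations per month, split by the Voluntary/Involuntary
# flag - the same cross-grouping the visual computes when that column
# is used as a legend against a plain row-count measure.
counts = pd.crosstab(turnover["TermDateKey"],
                     turnover["Voluntary/Involuntary"])
```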

Python/Pandas - merging one to many csv for denormalization

I have a bunch of large CSV files that were extracted from a relational database. For example, I have customers.csv, address.csv, and customer-address.csv, which maps the key values for the relationships. I found an answer on how to merge the files here:
Python/Panda - merge csv according to join table/csv
So right now my code looks like this:
import pandas as pd

df1 = pd.read_csv(file1)  # customers
df2 = pd.read_csv(file2)  # addresses
df3 = pd.read_csv(file3)  # customer-address mapping

df = (df3.merge(df1, left_on='CID', right_on='ID')
         .merge(df2, left_on='AID', right_on='ID', suffixes=('', '_'))
         .drop(['CID', 'AID', 'ID_'], axis=1))
print(df)
Now I noticed that I have files with a one-to-many relationship, and with the code above pandas is probably overriding values when there are multiple matches for one key.
Is there a method to join files with a one-to-many (or many-to-many) relationship? I'm thinking of creating a full (redundant) row for each foreign key; so basically denormalization.
The answer to my question is to perform an outer join. With the code below, pandas creates a new row for every occurrence of an id in the left or right dataframe, thus creating a denormalized table.
df1.merge(df2, left_on='CID', right_on='ID', how='outer')
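To see that a merge duplicates rather than overwrites rows on a one-to-many relationship, here is a tiny self-contained example (toy data, not the asker's files):

```python
import pandas as pd

customers = pd.DataFrame({"ID": [1, 2], "name": ["Ann", "Bob"]})
addresses = pd.DataFrame({"ID": [10, 11, 12],
                          "city": ["Rome", "Oslo", "Bern"]})
# Link table: customer 1 has two addresses (one-to-many).
link = pd.DataFrame({"CID": [1, 1, 2], "AID": [10, 11, 12]})

df = (link.merge(customers, left_on="CID", right_on="ID")
          .merge(addresses, left_on="AID", right_on="ID",
                 suffixes=("", "_"))
          .drop(["CID", "AID", "ID_"], axis=1))
# Ann now appears on two rows, one per linked address.
```

Note that the duplication happens with the default inner join as well; `how='outer'` additionally keeps customers or addresses that have no match in the link table.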

Importing shapefile data using GeoDataFrame

I am using GeoDataFrame for importing the data, but I have the following problem: the function works well for some shapefiles but fails for certain others, and I am wondering why.
data = GeoDataFrame.from_file('bayarea_general.shp')
fiona/ogrext.pyx in fiona.ogrext.Iterator.__next__ (fiona/ogrext.c:17244)()
fiona/ogrext.pyx in fiona.ogrext.FeatureBuilder.build (fiona/ogrext.c:3254)()
IndexError: list index out of range

Cross table with two datasets (one as the row and the other as the column)

I have two datasets in my BIRT report:
Lesson (date)
Student (name)
and I would like to know how to create a cross table using the date (red) as the column names and the name (blue) as the row names, as shown below:
The cells will stay empty.
I have tried to use the Cross Tab, but it seems that I can only use one dataset.
For information, I am stuck on version 2.5.2. I mention this in case someone describes a handy feature that is only available in later versions of BIRT... :-)
Where both datasets are coming from the same relational data source, the simplest way to achieve this would normally be:
Replace the existing two datasets with a single dataset, in which the two original datasets are cross-joined to each other;
create a crosstab from the new dataset, with the new dataset columns as the data cube groups.
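The cross-join idea can be sketched outside BIRT, e.g. in pandas (toy dates and names, not the report's actual data):

```python
import pandas as pd

lessons = pd.DataFrame({"date": ["2023-01-01", "2023-01-08"]})
students = pd.DataFrame({"name": ["Alice", "Bob"]})

# Cross join: every student paired with every lesson date,
# i.e. the single combined dataset the answer describes.
combined = lessons.merge(students, how="cross")

# Pivot the combined rows into the grid: names as rows, dates as
# columns. Cells hold a placeholder, since the report leaves them empty.
grid = combined.assign(cell="").pivot(index="name", columns="date",
                                      values="cell")
```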
