How can I turn this into multiple line plots? - seaborn

I want the x-axis to be the day of the year (1 to 365) and the y-axis to be my response variable. However, seaborn spreads the x-axis over every row in my dataset instead.
Dataset
sns.lineplot(x="DATE", y="TMEAN", data=TS, hue="YEAR", style="YEAR")
plt.show()
Lineplot Result
I tried to use pivot and melt, but to no avail.

An easy way to fix your problem is to use the "day of the year" instead of the column "DATE". You can extract that as a separate column from the date like in this example:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# One row per day; your_temp_data is your own temperature series of the same length
dates = pd.date_range(start='2013-01-01', end='2021-12-31')
data = {'DATE': dates,
        'YEAR': dates.year,
        'DAY': dates.dayofyear,  # 1..365 (366 in leap years)
        'TMEAN': your_temp_data}
TS = pd.DataFrame(data=data)
sns.lineplot(data=TS, x='DAY', y='TMEAN', hue="YEAR")
plt.show()
Here, "dayofyear" gives you the x-axis (the day within the year, regardless of which year the temperature belongs to), and the hue color distinguishes the years.
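If the DataFrame already exists, the same two columns can be derived from its DATE column instead of rebuilding the frame. The following is a minimal sketch, assuming DATE holds parseable dates; the small TS frame and the constant TMEAN values are placeholders made up for illustration, not the asker's data.

```python
import pandas as pd

# Hypothetical stand-in for the asker's TS frame: one row per day with a DATE column.
TS = pd.DataFrame({"DATE": pd.date_range("2019-01-01", "2020-12-31")})
TS["TMEAN"] = 25.0  # placeholder values; real temperatures would go here

# Make sure DATE is a datetime column, then derive the two plotting keys.
TS["DATE"] = pd.to_datetime(TS["DATE"])
TS["YEAR"] = TS["DATE"].dt.year
TS["DAY"] = TS["DATE"].dt.dayofyear

print(TS["DAY"].min(), TS["DAY"].max())  # 1 366 (2020 is a leap year)
```

With DAY and YEAR in place, the sns.lineplot call from the answer (x='DAY', hue='YEAR') draws one line per year over a shared 1 to 366 axis.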


How do I calculate hourly median values for one representative year from a 29-year hourly dataset?

From a long-term hourly dataset, I want to compute median values for each hour of one representative year. For example, the median value for the first hour of January 1st in the representative year is calculated from the first hour of January 1st of every year in the dataset. The dataset is available here: https://github.com/sugarello/sugarello/blob/master/dfsolarbwdlz.csv
After trying rolling() and groupby(), I ended up creating new data frames by defining criteria for the index.
So far I tried:
import numpy as np
import pandas as pd
from datetime import datetime
import matplotlib.pyplot as plt
dfsolar = pd.read_csv('dfsolarbwdlz.csv', delimiter=';')
dfsolar['MESS_DATUM'] = pd.to_datetime(dfsolar['MESS_DATUM'], format='%Y%m%d%H')
dfsolar.set_index('MESS_DATUM')
dfsolar.index = dfsolar['MESS_DATUM']
dfsolarr = dfsolar.drop(columns=["MESS_DATUM"])
By defining criteria for month, day and hour, I get part of the data I am looking for. It is not practical at all, though, because I would have to repeat it 8760 times. For example, for the 13th hour of January 1st only:
dfsolarWI00 = dfsolarr[((dfsolarr.index.month == 1) & (dfsolarr.index.day == 1) & (dfsolarr.index.hour == 13))]
The output of my last attempt looks like: here
I assume one solution lies in sort_index()/sort(). However, I wasn't able to set up an adequate search algorithm.
Am I on the right track? What is an elegant solution for my problem?
After looking deeper into the conditions of the groupby-Method, I reordered as follows:
dfsolarrtest = dfsolarr.groupby([dfsolarr.index.month, dfsolarr.index.day, dfsolarr.index.hour]).median()
dfsolarrtest.plot(figsize=(80,40))
and produced the following plot:
If I am not mistaken, I found my solution by ordering the groupby conditions after the changing parts of the date in my format. However:
The resulting dataset has 8784 rows, which does not equal the 8760 hours of a year. Also, single median values computed by:
median_example = dfsolarr[((dfsolar.index.month == 1) & (dfsolarr.index.hour == 16))]
median_example.median()
are not equal to the values for the same date in the dataset calculated with groupby.
Any help?
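A likely explanation for both observations, sketched with synthetic data rather than the linked CSV: 8784 is 366 × 24, so grouping by (month, day, hour) keeps February 29 from the leap years, and the median_example filter above tests only month and hour, so it pools every January day instead of a single date. The frame df, its value column and the random values below are assumptions made up for illustration.

```python
import numpy as np
import pandas as pd

# Synthetic hourly series covering one leap year (2020) and one common year (2021).
idx = pd.date_range("2020-01-01 00:00", "2021-12-31 23:00", freq=pd.Timedelta(hours=1))
rng = np.random.default_rng(0)
df = pd.DataFrame({"value": rng.random(len(idx))}, index=idx)

# Same grouping as in the question: one group per (month, day, hour).
hourly_median = df.groupby([df.index.month, df.index.day, df.index.hour]).median()
hourly_median.index.names = ["month", "day", "hour"]
print(len(hourly_median))  # 8784 = 366 * 24, because Feb 29 forms its own groups

# Dropping Feb 29 before grouping leaves the 8760 hours of a common year.
no_leap = df[~((df.index.month == 2) & (df.index.day == 29))]
no_leap_median = no_leap.groupby(
    [no_leap.index.month, no_leap.index.day, no_leap.index.hour]
).median()
print(len(no_leap_median))  # 8760

# A manual check matches only when month, day AND hour are all filtered.
mask = (df.index.month == 1) & (df.index.day == 1) & (df.index.hour == 16)
manual = df.loc[mask, "value"].median()
grouped = hourly_median.loc[(1, 1, 16), "value"]
```

Whether to drop February 29 or keep it as a 366th day is a modelling choice; the sketch only shows where the extra 24 rows come from.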

New Column or Measure for NAICS ID based on first two numbers

Use first two digits of Column to give a name to a new column.
I have a list of companies and their NAICS IDs. I would like to show these in a pie chart, but I don't want the 90000 different names, just the general category (e.g. Agriculture or Mining). I want to use the first two digits of the column to identify the general name. I am trying to use the DAX expression SWITCH to get this started. Is there a filter to do this within Power BI?
I haven't started yet since I am not sure if this is possible.
You could simply create a calculated column based off of the original NAICS code using the following:
FirstTwoDigitsOfNAICS :=
SWITCH (
TRUE (),
LEFT ( 'Table'[NAICSCode], 2 ) = x, "Something",
LEFT ( 'Table'[NAICSCode], 2 ) = y, "Something Else"
)
This DAX will simply pull the first two characters from the entire code.
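The same lookup logic, sketched outside Power BI in Python to show the idea: take the first two characters of the code and map them to a sector name. The function name sector_name, the "Other" fallback and the four-entry table are illustrative assumptions; a real mapping would list every NAICS sector.

```python
# Partial, illustrative mapping from two-digit NAICS prefix to sector name.
SECTOR_BY_PREFIX = {
    "11": "Agriculture, Forestry, Fishing and Hunting",
    "21": "Mining, Quarrying, and Oil and Gas Extraction",
    "22": "Utilities",
    "23": "Construction",
}


def sector_name(naics_code, default="Other"):
    """Return the sector for a NAICS code based on its first two digits."""
    prefix = str(naics_code)[:2]  # works for both int and str codes
    return SECTOR_BY_PREFIX.get(prefix, default)


print(sector_name(111110))    # Agriculture, Forestry, Fishing and Hunting
print(sector_name("212210"))  # Mining, Quarrying, and Oil and Gas Extraction
```

In DAX, each dictionary entry corresponds to one condition/result pair inside SWITCH ( TRUE (), ... ).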

Amchart error with the baseInterval set as month

I am trying to use amCharts with the following setting:
dateAxis.baseInterval = {
"timeUnit": "month",
"count": 1
}
But the data is displayed incorrectly: when a month has data for more than one day, the graph shows more than one bullet for that month.
For example, if I have the following data
2019-10-11 => 20
2019-10-12 => 30
instead of displaying
(2019-10) => 50
the graph shows the following data
(2019-10) => 20,
(2019-10) => 30
Thanks in advance.
AmCharts v4 doesn't aggregate your data for you. baseInterval merely tells the chart how to render your data with the minimum intervals between your points. Setting it to month with multiple data points in the same month will display multiple points; this is as designed.
If you intend to display your data in monthly intervals and have some data points where more than one point is in the same month, you need to manually aggregate your data beforehand - in your case, convert that point to a single data item in October with a value of 50.
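The pre-aggregation step can be sketched as follows, here in Python as it might run on the server before the data reaches the chart. The function name aggregate_by_month and the "date"/"value" keys are assumptions for illustration; use whatever field names your series is configured with.

```python
from collections import defaultdict
from datetime import date

# Raw points mirroring the example in the question.
raw = [(date(2019, 10, 11), 20), (date(2019, 10, 12), 30)]


def aggregate_by_month(points):
    """Sum values per calendar month, keyed by the first day of that month."""
    totals = defaultdict(int)
    for day, value in points:
        totals[date(day.year, day.month, 1)] += value
    return [{"date": d.isoformat(), "value": v} for d, v in sorted(totals.items())]


monthly = aggregate_by_month(raw)
print(monthly)  # [{'date': '2019-10-01', 'value': 50}]
```

Each month then contributes exactly one data item, so a monthly baseInterval renders a single bullet per month.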

Displaying Max, Min, Avg across bar chart Tableau

I have a bar chart with X axis as discrete date value and Y axis as number of records.
eg: x axis (Filtered Date)- 1st Oct, 2nd Oct, 3rd Oct etc
y axis (Number of Records)- 30, 4, 3 etc
Now I have to create a table showing the max, min and average of 'Number of Records'.
I have written a calculated field as MAX([Number of Records]) to get the maximum of Number of Records (30 in this case), but I always get a value of 1.
How do I define the calculations to get the max, min and average?
Thanks,
Number of Records is an automatically calculated field that Tableau generates when importing a data source. You can right-click on it and see the definition of the calculation: 1.
As your field is currently defined, Tableau looks for the maximum value of that column. It will always be 1 because that is the only value in that field for every record.
It sounds like you are actually trying to calculate the maximum of the sum of the number of records at your aggregation level (in your case, date). You should be able to accomplish this easily using Level of Detail (LOD) expressions or table calculations. Something like the following:
WINDOW_MAX(SUM([Number of Records]))
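To see why the order of aggregation matters, here is a small Python sketch of the two computations using the numbers from the question. The records list is made up to reproduce the 30, 4 and 3 counts; it is not Tableau code.

```python
from collections import Counter
from statistics import mean

# One entry per record, labelled with its date (30, 4 and 3 records per day).
records = ["2019-10-01"] * 30 + ["2019-10-02"] * 4 + ["2019-10-03"] * 3

# "Number of Records" is the constant 1 on every row, so a row-level MAX is 1.
row_level_max = max(1 for _ in records)

# Summing per date first, then aggregating across dates, gives the intended values.
per_day = Counter(records)  # date -> SUM([Number of Records])
counts = list(per_day.values())
window_max, window_min, window_avg = max(counts), min(counts), mean(counts)

print(row_level_max)           # 1
print(window_max, window_min)  # 30 3
```

The per-date sum followed by a max across dates is the same two-step shape as WINDOW_MAX(SUM([Number of Records])).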

How to show date in normal format in jqgrid when formatter is used

Date column is created using colmodel below.
This column shows values like 0101.0101.7070 for every date.
If formatter:date is removed, the date is correct.
How can I show normal date values with formatter:date?
{ "name":"Date",
"formatter":"date",
"datefmt":"dd.mm.yyyy",
"formatoptions":{"newformat":"dd.mm.yyyy"},
"editable":true
}
Update.
Data is passed from server using json in same dd.mm.yyyy format like
{"total":2,"page":1,"records":57,"rows":
[{"id":"9279","cell":["9279","42","","10.08.2011","","","","False"]},
{"id":"9278","cell":["9278","41","","12.08.2011","","","","False"]},
...
Using d.m.y formats in the column options as suggested shows proper dates, but with two-digit years only.
I'm looking for four-digit years. I tried the d.m.yyyy format, but this shows eight-digit years and 1 for month and day, as in 01.01.70707070
I also tried to add srcformat: 'dd.mm.yyyy' to formatoptions but this does not change the wrong result.
To display the day as a number, use j or d. To display the month as a number, use n or m. The d and m variants add a leading zero if needed. The y means a two-digit year and Y means a four-digit year.
So you probably need srcformat: 'd.m.Y' or srcformat: 'j.n.Y'.
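The token rules above can be illustrated with a tiny Python sketch. It is a hypothetical re-implementation of just these six PHP-style tokens, not jqGrid's own code; the function name php_style_format is made up. Because every character is interpreted on its own, repeating a letter repeats its output, which is consistent with the 01.01.70707070 result reported for d.m.yyyy.

```python
from datetime import date


def php_style_format(d, fmt):
    """Expand each single-character token; other characters pass through."""
    tokens = {
        "d": f"{d.day:02d}",         # day, zero-padded
        "j": str(d.day),             # day, no padding
        "m": f"{d.month:02d}",       # month, zero-padded
        "n": str(d.month),           # month, no padding
        "y": f"{d.year % 100:02d}",  # two-digit year
        "Y": f"{d.year:04d}",        # four-digit year
    }
    return "".join(tokens.get(ch, ch) for ch in fmt)


print(php_style_format(date(2011, 8, 10), "d.m.Y"))    # 10.08.2011
print(php_style_format(date(2011, 8, 10), "j.n.y"))    # 10.8.11
print(php_style_format(date(1970, 1, 1), "d.m.yyyy"))  # 01.01.70707070
```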
Use d.m.y instead of dd.mm.yy.
