Using a for loop to write values to Excel in Python

I have a dataset, and I am trying to use a for loop to scan through all columns and find the max value in each column. Then I am trying to write those max values, along with their indices, to an Excel sheet. When I print the output, the values are printed, but nothing is written to the Excel sheet. Please find my code and the dataset below.
import os
import pandas as pd
import numpy as np
import random as rd
import matplotlib.pyplot as plt
import plotly.graph_objects as go
from scipy.signal import find_peaks, peak_prominences

path = os.path.join('c:' + os.sep, 'Users', 'nsin0029', 'stitching_and_switching', 'eb_iwp_lane_1_radargram.csv')
radargram = pd.read_csv(path).T
print(radargram)

for column in radargram:
    all_traces = radargram[column]
    maxValues = [all_traces.max()]
    index_of_max = [all_traces.idxmax()]
    peakval = pd.DataFrame([maxValues], [index_of_max])
    print(index_of_max, maxValues)
    writer = pd.ExcelWriter('Book2.xlsx', engine='xlsxwriter')
    peakval.to_excel(writer, sheet_name='Sheet 1', index=True)
    writer.save()
dataset:
0 1 2 3 4 5 6 7
0 1.0 2.00 3.0 4.00 5.0 6.00 7.0
0.1 0.0 0.05 0.1 0.15 0.2 0.25 0.3
0.117 0.0 -6668751.00 1838.0 -5417580.00 -5570270.0 -5715139.00 -5827291.0
0.156 0.0 -6482365.00 1838.0 -5173158.00 -5293488.0 -5415709.00 -5440243.0
0.195 0.0 -6244235.00 1838.0 -4941067.00 -5016929.0 -5113173.00 -5051908.0
... ... ... ... ... ... ... ...
4.844 0.0 2919134.00 1838.0 297721.00 132719.0 -12214.00 -216.0
4.883 0.0 3181017.00 1838.0 419417.00 210137.0 42928.00 30737.0
4.922 0.0 3322660.00 1838.0 532466.00 288072.0 95415.00 64886.0
4.961 0.0 3342207.00 1838.0 636585.00 356015.0 141400.00 95323.0
5 0.0 3257029.00 1838.0 721372.00 419059.0 185229.00 121306.0
Any help would be much appreciated.
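One likely culprit (a sketch, not a confirmed diagnosis): as posted, the ExcelWriter appears to be recreated and saved on every loop iteration, so the file is overwritten each time with a one-row frame. Since max() and idxmax() already work column-wise, the loop can be dropped and the sheet written once. The radargram DataFrame below is a small hypothetical stand-in for the real CSV:

```python
import pandas as pd

# hypothetical stand-in for the transposed radargram DataFrame
radargram = pd.DataFrame({0: [1.0, 5.0, 3.0], 1: [4.0, 2.0, 6.0]})

# max() and idxmax() are column-wise, so no explicit loop is needed
peakval = pd.DataFrame({'index_of_max': radargram.idxmax(),
                        'maxValue': radargram.max()})
print(peakval)

# write once, after all maxima are collected
try:
    peakval.to_excel('Book2.xlsx', sheet_name='Sheet 1', index=True)
except ImportError:
    # fall back if no Excel engine (xlsxwriter/openpyxl) is installed
    peakval.to_csv('Book2.csv')
```

The resulting sheet has one row per original column, with the index of the max alongside the max value.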

Related

How can I read the ex-dividend date from the yfinance library?

This code should return the ex-dividend date:
import yfinance as yf
yf.Ticker('AGNC').info['exDividendDate']
but I get this as the output:
1661817600
Is there a way to get the date from that number?
This number is a Unix timestamp in seconds. To get the calendar date, you can use pd.to_datetime to convert the seconds:
import pandas as pd
pd.to_datetime(1661817600, unit='s')
Out[6]: Timestamp('2022-08-30 00:00:00')
Or you can use the built-in datetime module. Note that datetime.fromtimestamp converts to your local timezone, which is why the output below shows 08:00 rather than midnight:
from datetime import datetime
print(datetime.fromtimestamp(1661817600))
2022-08-30 08:00:00
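To sidestep the local-timezone offset entirely, the timestamp can be interpreted in UTC (a small addition to the answer above):

```python
from datetime import datetime, timezone

# interpret the Unix timestamp in UTC instead of local time
dt = datetime.fromtimestamp(1661817600, tz=timezone.utc)
print(dt)  # 2022-08-30 00:00:00+00:00
```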

Saving a dataframe and its corresponding chart in a single PDF file in Python matplotlib

I have a dataframe:
id1 id2 fields valid invalid missing
0 1001.0 0.0 State 158.0 0.0 0.0
1 1001.0 0.0 Zip 156.0 0.0 2.0
2 1001.0 0.0 Race 128.0 20.0 10.0
3 1001.0 0.0 LastName 158.0 0.0 0.0
4 1001.0 0.0 Email 54.0 0.0 104.0
... ... ... ... ... ... ...
28859 5276.0 36922.0 Phone 0.0 0.0 8.0
28860 5276.0 36922.0 Email 1.0 0.0 7.0
28861 5276.0 36922.0 State 8.0 0.0 0.0
28862 5276.0 36922.0 office ID 8.0 0.0 0.0
28863 5276.0 36922.0 StreetAdd 8.0 0.0 0.0
with the initial goal of grouping by individual id and creating a PDF file per group. I was able to create a PDF file from the plot, but I would also like to save the dataframe that goes with the graph in the same PDF file.
# read the csv file
cme_df = pd.read_csv('sample.csv')
# fill na with 0
cme_df = cme_df.fillna(0)
# iterate through the unique id2 values in the file
for i in cme_df['id2'].unique():
    with PdfPages('home/' + 'id2_' + str(i) + '.pdf') as pdf:
        cme_i = cme_df[cme_df['id2'] == i].sort_values('fields')
        print(cme_i)
        # I feel this is where I must have something to create or save the table into pdf with the graph created below #
        # create the barh graph
        plt.barh(cme_i['fields'], cme_i['valid'], color='g', label='valid')
        plt.barh(cme_i['fields'], cme_i['missing'], left=cme_i['valid'], color='y', label='missing')
        plt.barh(cme_i['fields'], cme_i['invalid'], left=cme_i['valid'] + cme_i['missing'], color='r', label='invalid')
        plt.legend(bbox_to_anchor=(0.5, -0.05), loc='upper center', shadow=True, ncol=3)
        plt.suptitle('valid, invalid, missing', fontweight='bold')
        plt.title('id2: ' + str(i))
        pdf.savefig()
        plt.clf()
My code above prints the table in the results window, then creates the horizontal bar chart; the last few lines save the graph to a PDF. I would like to save both the dataframe and the graph in a single file.
Some searches suggested converting to HTML and then to PDF, but I cannot seem to make it work:
cme_i.to_html('id2_'+str(i)+'.html')
# then convert to pdf
pdf.from_file(xxxxx)
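One approach that stays inside matplotlib (a sketch using hypothetical stand-in data, not the original sample.csv): render the dataframe with Axes.table() on its own figure and call pdf.savefig() once per page, so the table and the chart land in the same PdfPages file:

```python
import os
import matplotlib
matplotlib.use('Agg')  # headless backend, safe for scripted PDF output
import matplotlib.pyplot as plt
from matplotlib.backends.backend_pdf import PdfPages
import pandas as pd

# hypothetical stand-in for one id2 group of cme_df
cme_i = pd.DataFrame({'fields': ['Email', 'State', 'Zip'],
                      'valid': [54.0, 158.0, 156.0],
                      'invalid': [0.0, 0.0, 0.0],
                      'missing': [104.0, 0.0, 2.0]})

with PdfPages('id2_demo.pdf') as pdf:
    # page 1: the dataframe rendered as a table
    fig, ax = plt.subplots()
    ax.axis('off')
    ax.table(cellText=cme_i.values, colLabels=cme_i.columns, loc='center')
    pdf.savefig(fig)
    plt.close(fig)

    # page 2: the stacked horizontal bar chart
    fig, ax = plt.subplots()
    ax.barh(cme_i['fields'], cme_i['valid'], color='g', label='valid')
    ax.barh(cme_i['fields'], cme_i['missing'], left=cme_i['valid'], color='y', label='missing')
    ax.barh(cme_i['fields'], cme_i['invalid'], left=cme_i['valid'] + cme_i['missing'], color='r', label='invalid')
    ax.legend()
    pdf.savefig(fig)
    plt.close(fig)

print(os.path.getsize('id2_demo.pdf'))
```

Each pdf.savefig() call adds a page, so this slots directly into the existing per-id2 loop in place of the "I feel this is where..." comment.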

How do I combine these two line plots using seaborn?

I want to combine these two plots together into one. What is the recommended approach to do this? Is there a way to do it using a single dataframe?
sns.relplot(x="epoch", y="loss", kind="line", color='orange', ci="sd", data=losses_df);
sns.relplot(x="epoch", y="loss", kind="line", color='red', ci="sd", data=val_losses_df);
My data for the first plot is the following, with columns in this order: ['epoch'], ['loss']
losses_df
0.0 0.156077
0.0 0.013558
0.0 0.007013
1.0 0.029891
1.0 0.008320
1.0 0.003487
2.0 0.017474
2.0 0.006232
2.0 0.002457
3.0 0.013332
3.0 0.004897
3.0 0.001900
4.0 0.010947
4.0 0.003905
4.0 0.001594
5.0 0.009127
5.0 0.003195
5.0 0.001341
6.0 0.007751
6.0 0.002681
6.0 0.001157
7.0 0.006605
7.0 0.002218
7.0 0.000972
8.0 0.005630
8.0 0.001867
8.0 0.000832
9.0 0.004839
9.0 0.001671
9.0 0.000748
val_losses_df
0.0 0.048945
0.0 0.006090
0.0 0.002332
1.0 0.024670
1.0 0.006243
1.0 0.002337
2.0 0.022344
2.0 0.006609
2.0 0.002626
3.0 0.022037
3.0 0.007156
3.0 0.003080
4.0 0.022025
4.0 0.008209
4.0 0.003835
5.0 0.022751
5.0 0.009226
5.0 0.004209
6.0 0.024093
6.0 0.009950
6.0 0.004783
7.0 0.025410
7.0 0.011130
7.0 0.005279
8.0 0.028299
8.0 0.012204
8.0 0.005969
9.0 0.028623
9.0 0.013037
9.0 0.006519
And my plots so far (except I want them combined in one plot with a legend)
You could combine the two dataframes into one:
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np
# losses_df = pd.read_csv(...)
# val_losses_df = pd.read_csv(...)
losses_df['type'] = 'losses'
val_losses_df['type'] = 'val_losses'
combined_df = pd.concat([losses_df, val_losses_df])
sns.relplot(x="epoch", y="loss", kind="line", hue="type", palette=['orange', 'red'], ci="sd", data=combined_df)
plt.tight_layout()
plt.show()
relplot() is a figure-level function that creates a new figure on each call.
Use the axes-level lineplot() instead, drawing both onto the same ax:
fig, ax = plt.subplots()
sns.lineplot(x="epoch", y="loss", color='orange', ci="sd", data=losses_df, ax=ax);
sns.lineplot(x="epoch", y="loss", color='red', ci="sd", data=val_losses_df, ax=ax);

Looking for a way to speed up the write to file portion of my Python code

I have a simple script that reads in a ~2 GB data file, extracts the columns I need, and then writes that data as columns to another file for later processing. I ran the code last night and it took close to nine hours to complete. I ran the two sections separately and determined that the portion that writes the data to a new file is the problem. Can anyone point out why it is so slow as written, and suggest a better method?
Sample of the data being read in:
26980300000000 26980300000000 39 13456502685696 1543 0
26980300000001 26980300000000 38 13282082553856 1523 0.01
26980300000002 26980300000000 37 13465223692288 1544 0.03
26980300000003 26980300000000 36 13290803560448 1524 0.05
26980300000004 26980300000000 35 9514610851840 1091 0.06
26980300000005 26980300000000 34 9575657897984 1098 0.08
26980300000006 26980300000000 33 8494254129152 974 0.1
26980300000007 26980300000000 32 8520417148928 977 0.12
26980300000008 26980300000000 31 8302391459840 952 0.14
26980300000009 26980300000000 30 8232623931392 944 0.16
Code
from itertools import islice  # used below; missing from the original

F = r'C:\Users\mass_red.csv'

def filesave(TID, M, R):
    X = str(TID)
    Y = str(M)
    Z = str(R)
    w = open(r'C:\Users\Outfiles\acc1_out3.txt', 'a')
    w.write(X)
    w.write('\t')
    w.write(Y)
    w.write('\t')
    w.write(Z)
    w.write('\n')
    w.close()
    return ()

N = 47000000
f = open(F)
f.readline()
nlines = islice(f, N)
for line in nlines:
    if line != '':
        line = line.strip()
        line = line.replace(',', ' ')
        columns = line.split()
        tid = int(columns[1])
        m = float(columns[3])
        r = float(columns[5])
        filesave(tid, m, r)
You open and close the output file for every single line, which is what makes this slow. Open it once at the beginning.
In modern Python, most file handling should use with statements: the open is done once in the header, and the close is automatic. Here is a general template for line processing:
inp = r'C:\Users\mass_red.csv'
out = r'C:\Users\Outfiles\acc1_out3.txt'
with open(inp) as fi, open(out, 'a') as fo:
    for line in fi:
        ...
        if keep:
            ...
            fo.write(whatever)
Here's a simplified but complete version of your code:
#!/usr/bin/env python
from __future__ import print_function
from itertools import islice

nlines_limit = 47000000
with open(r'C:\Users\mass_red.csv') as input_file, \
     open(r'C:\Users\Outfiles\acc1_out3.txt', 'w') as output_file:
    next(input_file)  # skip the header line
    for line in islice(input_file, nlines_limit):
        columns = line.split()
        try:
            tid = int(columns[1])
            m = float(columns[3])
            r = float(columns[5])
        except (ValueError, IndexError):
            pass  # skip invalid lines
        else:
            print(tid, m, r, sep='\t', file=output_file)
I don't see commas in your input; so I've removed line.replace(',', ' ') from the code.
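As a further speed-up (my own sketch, not from the answers above): pandas can parse the whitespace-delimited columns in vectorized chunks, avoiding the per-line Python loop entirely. Here io.StringIO stands in for the real 2 GB file:

```python
import io
import pandas as pd

# hypothetical two-line stand-in for the 2 GB whitespace-delimited file
sample = io.StringIO(
    "26980300000000 26980300000000 39 13456502685696 1543 0\n"
    "26980300000001 26980300000000 38 13282082553856 1523 0.01\n"
)

# read only columns 1, 3, 5 in chunks; parsing and writing are vectorized
out = io.StringIO()
for chunk in pd.read_csv(sample, sep=r'\s+', header=None,
                         usecols=[1, 3, 5], chunksize=100_000):
    chunk.to_csv(out, sep='\t', header=False, index=False)
print(out.getvalue())
```

For the real file, replace the StringIO objects with file paths; the chunksize keeps memory use bounded.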

python pandas index by time

I have a csv file that looks like this:
"06/09/2013 14:08:34.930","7.2680542849633447","1.6151231744362988","0","0","21","1546964992","15.772567829158248","1577332736","8360","21.400382061280961","0","15","0","685","0","0","0","0","0","0","0","4637","0"
The CSV contains one month of daily values (24 hours per day).
I need to load it into pandas and then get some stats on the data (min, max), but only for records during working hours (between 8:00 and 18:00).
I am very new to the pandas library.
Load your data:
import pandas as pd
from datetime import datetime
df = pd.read_csv('data.csv', header=None, index_col=0)
Filter your data for working hours from 8:00 to 18:00 (using a list comprehension rather than map, so the result works as a boolean mask in Python 3):
work_hours = lambda d: datetime.strptime(d, '%d/%m/%Y %H:%M:%S.%f').hour in range(8, 18)
df = df[[work_hours(d) for d in df.index]]
Get the min and max of the first data column (named so as not to shadow the built-in min and max):
col_min, col_max = df[1].min(), df[1].max()
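Alternatively (assuming the first column can be parsed as datetimes, e.g. with parse_dates=[0] and dayfirst=True in read_csv), a DatetimeIndex lets pandas' between_time do the working-hours filter directly:

```python
import pandas as pd

# hypothetical sample spanning working and non-working hours
idx = pd.to_datetime(['06/09/2013 07:59:00',
                      '06/09/2013 14:08:34',
                      '06/09/2013 18:30:00'], dayfirst=True)
df = pd.DataFrame({'value': [1.0, 2.0, 3.0]}, index=idx)

# keep only rows timestamped between 08:00 and 18:00 (endpoints inclusive)
work = df.between_time('08:00', '18:00')
print(work['value'].min(), work['value'].max())
```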
