I am very new to Python and I am having trouble executing my algorithmic trading strategy on more than one security at a time. I am currently using these lines of code for the stocks:
data_p = pd.read_csv('AAPL_30m.csv', index_col=0, parse_dates=True)
data_p = data_p.drop(columns=['Adj Close'])
Does anyone know how I would go about properly adding more securities?
Since no data is provided, I can only give you a rough idea of how this can be done. First, change directory to the folder that holds all your data series as CSV files:
import pandas as pd
import os
os.chdir(r'C:\Users\username\Downloads\new')
files = os.listdir()
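If you'd rather not change the working directory, a glob pattern does the same job and also filters out anything that isn't a CSV file (a small sketch; the folder path is the one assumed above):
import glob
import os

# collect full paths to the CSV files only
files = glob.glob(r'C:\Users\username\Downloads\new\*.csv')
# ticker name from a full path, e.g. 'AAPL'
names = [os.path.splitext(os.path.basename(f))[0] for f in files]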
Assume the files in the folder are:
['AAPL.csv',
'AMZN.csv',
'GOOG.csv']
Then start with an empty dictionary d and loop through all the files in the directory, reading each one as a pandas DataFrame. Finally, combine them all into one big DataFrame (if you find that more useful):
d = {}
for f in files:
    name = f.split('.')[0]
    df = pd.read_csv(f)
    # ...
    # do your processing here
    # ...
    d[name] = df.copy()
dff = pd.concat(d)
Since I do not know your format and your index, I assume pd.concat(d) will do what you want; alternatively, you may also try pd.DataFrame(d).
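To illustrate the difference between the two (a minimal sketch; the 'Close' column name is an assumption about your CSV layout):
# pd.concat(d) stacked the per-ticker frames under an extra index level,
# so a single security can be recovered by label:
aapl = dff.loc['AAPL']

# pd.DataFrame works well with a dict of Series that share an index,
# e.g. to put all closing prices side by side as columns:
closes = pd.DataFrame({name: df['Close'] for name, df in d.items()})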
I want to convert my labels in YOLO format to COCO format.
I have tried
https://github.com/Taeyoung96/Yolo-to-COCO-format-converter
and
Pylabel.
They all have bugs.
I want to train on Detectron2, but it fails to load the dataset because of the wrong JSON file.
Thanks everybody
Could you try with this tool (disclaimer: I'm the author)? It is not (yet) a Python package, so you need to download the repo first. The script should resemble something like:
from ObjectDetectionEval import *
from pathlib import Path

def main() -> None:
    path = Path("/path/to/annotations/")  # where the .txt files are
    names_file = Path("/path/to/classes.names")
    save_file = Path("coco.json")

    annotations = AnnotationSet.from_yolo(path)

    # If you need to change the labels:
    # names = Annotation.parse_names_file(names_file)
    # annotations.map_labels(names)

    annotations.save_coco(save_file)

if __name__ == "__main__":
    main()
If you need more control (coordinate format, images location and extension, etc.) you should use the more generic AnnotationSet.from_txt(). If it does not suit your needs you can easily implement your own parser using AnnotationSet.from_folder().
I am trying to input a dataset from Kaggle into this notebook from the TensorFlow docs in order to train a CycleGAN model. My current approach is to download the folders into my notebook, loop through the paths of each image, and use cv2.imread(path) to add the uint8 image data to a list. But this doesn't work, and I know my current approach is wrong because the code provided by Google requires a PrefetchDataset.
Here's my current code (excluding the OpenCV part):
import os
# specify the img directory path
art_path = "/content/abstract-art-gallery/Abstract_gallery/Abstract_gallery/"
land_path = "/content/landscape-pictures/"
def grab_path(folder, i_count=100):
    res = []
    for file in range(i_count):
        if os.listdir(folder)[0].endswith(('.jpg', '.png', 'jpeg')):
            img_path = folder + os.listdir(folder)[0]
            res.append(img_path)
    return res
art_path, land_path = grab_path(art_path), grab_path(land_path)
print(art_path)
print(land_path)
The error in the code comes here:
train_horses = train_horses.cache().map(
preprocess_image_train, num_parallel_calls=AUTOTUNE).shuffle(
BUFFER_SIZE).batch(BATCH_SIZE)
Is there a simpler approach to this problem?
import pathlib
import tensorflow as tf
import numpy as np

image_size = (256, 256)  # target size; adjust to your model's input

@tf.autograph.experimental.do_not_convert
def read_image(path):
    image_string = tf.io.read_file(path)
    # decode and resize; swap in your own decoding helper here if you have one
    image = tf.image.decode_jpeg(image_string, channels=3)
    image = tf.image.resize(image, image_size)
    return image

AUTO = tf.data.experimental.AUTOTUNE
# IMAGE_PATHS_DIR is the root folder that holds your .jpg files
paths = np.array([str(x) for x in pathlib.Path(IMAGE_PATHS_DIR).rglob('*.jpg')])
dataset = tf.data.Dataset.from_tensor_slices(paths)
dataset = dataset.map(read_image, num_parallel_calls=AUTO)
dataset = dataset.shuffle(2048)
dataset = dataset.prefetch(AUTO)
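From there, the dataset can be fed into the preprocessing chain from the CycleGAN notebook; a minimal sketch, assuming preprocess_image_train, BUFFER_SIZE and BATCH_SIZE come from that notebook (you may need to adapt preprocess_image_train to take a single image argument, since this dataset yields images without labels):
# same pipeline as the notebook, now fed by the dataset built above
train_art = dataset.cache().map(
    preprocess_image_train, num_parallel_calls=AUTO).shuffle(
    BUFFER_SIZE).batch(BATCH_SIZE)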
Is it possible to read PDF/audio/video files (unstructured data) using Apache Spark?
For example, I have thousands of PDF invoices and I want to read the data from them and perform some analytics on it. What steps must I take to process unstructured data?
Yes, it is. Use sparkContext.binaryFiles to load the files in binary format, and then use map to transform each value into some other format, for example by parsing the binary content with Apache Tika or Apache POI.
Pseudocode:
val rawFile = sparkContext.binaryFiles("/path/to/files")
val ready = rawFile.map { case (path, stream) =>
  // parse stream.open() here with the other framework (Tika, POI, ...)
}
What is important is that the parsing must be done with another framework, as mentioned above. The value passed to map is a PortableDataStream, from which you can open an InputStream to feed the parser.
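For the PDF-invoice case in PySpark, here is a minimal sketch (pdfminer.six is an assumption, any parser that accepts a file-like object would work, and the input path is just an example; note that in PySpark, binaryFiles yields the file contents as bytes rather than a stream):
import io
from pdfminer.high_level import extract_text   # assumed parser; Tika etc. work too
from pyspark import SparkContext

sc = SparkContext(appName="pdf-text")

def pdf_to_text(path_and_bytes):
    path, content = path_and_bytes
    # wrap the raw bytes in a file-like object for the parser
    return (path, extract_text(io.BytesIO(content)))

texts = sc.binaryFiles("hdfs:///invoices/*.pdf").map(pdf_to_text)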
We had a scenario where we needed to run a custom decryption algorithm on the input files. We didn't want to rewrite that code in Scala or Python, so we call it as an external program. The Python-Spark code follows:
import socket
import subprocess
from pyspark import SparkContext, SparkConf, AccumulatorParam
from pyspark.sql import HiveContext

def decryptUncompressAndParseFile(filePathAndContents):
    '''each line of the file becomes an RDD record'''
    global acc_errCount, acc_errLog
    proc = subprocess.Popen(['custom_decrypt_program', '--decrypt'],
                            stdin=subprocess.PIPE, stdout=subprocess.PIPE,
                            stderr=subprocess.PIPE)
    (unzippedData, err) = proc.communicate(input=filePathAndContents[1])
    if len(err) > 0:  # problem reading the file
        acc_errCount.add(1)
        acc_errLog.add('Error: ' + str(err) + ' in file: ' + filePathAndContents[0] +
                       ', on host: ' + socket.gethostname() +
                       ' return code: ' + str(proc.returncode))
        return []  # this is okay with flatMap
    records = list()
    iterLines = iter(unzippedData.splitlines())
    for line in iterLines:
        # sys.stderr.write('Line: ' + str(line) + '\n')
        values = [x.strip() for x in line.split('|')]
        ...
        records.append( (... extract data as appropriate from values into this tuple ...) )
    return records

class StringAccumulator(AccumulatorParam):
    '''custom accumulator to hold strings'''
    def zero(self, initValue=""):
        return initValue
    def addInPlace(self, str1, str2):
        return str1.strip() + '\n' + str2.strip()

def main():
    ...
    global acc_errCount, acc_errLog
    acc_errCount = sc.accumulator(0)
    acc_errLog = sc.accumulator('', StringAccumulator())
    binaryFileTup = sc.binaryFiles(args.inputDir)
    # use flatMap instead of map, to handle corrupt files
    linesRdd = binaryFileTup.flatMap(decryptUncompressAndParseFile, True)
    df = sqlContext.createDataFrame(linesRdd, ourSchema())
    df.registerTempTable("dataTable")
    ...
The custom string accumulator was very useful in identifying corrupt input files.
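Once an action has executed, the accumulators can be inspected on the driver, for example (a small sketch):
# after an action has run, read the accumulators on the driver
if acc_errCount.value > 0:
    print('Files with errors: %d' % acc_errCount.value)
    print(acc_errLog.value)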
I need to import multiple images (10,000) in MATLAB (R2013b) from a subdirectory of MATLAB's default directory.
I don't know the exact names of the images.
I tried this:
file = dir('C:\Users\user\Documents\MATLAB\train');
NF = length(file);
for k = 1 : NF
    img = imread(fullfile('C:\Users\user\Documents\MATLAB\train', file(k).name));
end
But it throws this error even though I ran it with Admin privileges:
Error using imread (line 347)
Can't open file "C:\Users\user\Documents\MATLAB\train\." for reading;
you may not have read permission.
The "dir" command returns the virtual directory elements "." (self directory) and ".." parent, as your error message shows.
A simple fix is to use a more specific dir call, based on your image types, perhaps:
file = dir('C:\Users\user\Documents\MATLAB\train\*.jpg');
Check the output of dir. The first two "files" are . and .., which is similar to the behaviour of the Windows dir command.
file = dir('C:\Users\user\Documents\MATLAB\train');
NF = length(file);
for k = 3 : NF
    img = imread(fullfile('C:\Users\user\Documents\MATLAB\train', file(k).name));
end
In R2013b you would have to do
file = dir('C:\Users\user\Documents\MATLAB\train\*.jpg');
If you have R2014b with the Computer Vision System Toolbox then you can use imageSet:
images = imageSet('C:\Users\user\Documents\MATLAB\train\');
This will create an object containing paths to all image files in the train directory, regardless of format. Then you can read the i-th image like this:
im = read(images, i);
I'm just getting to grips with pandas (which is awesome), and what I need to do is read compressed genomics-type files from FTP sites into a pandas DataFrame.
This is what I tried and got a ton of errors:
from pandas.io.parsers import *
chr1 = 'ftp://ftp.ncbi.nih.gov/snp/organisms/human_9606/chr_rpts/chr_1.txt.gz'
CHR1 = read_csv(chr1, sep='\t', compression = 'gzip', skiprows = 10)
print type(CHR1)
print CHR1.head(10)
Ideally I'd like to do something like this:
from pandas.io.data import *
AAPL = DataReader('AAPL', 'yahoo', start = '01/01/2006')
The interesting part of this question is how to stream a (gz) file from FTP, which is discussed here: it's claimed that the following works in Python 3.2 (but won't in 2.x, nor will it be backported), and on my system this is the case:
import pandas as pd
import urllib.request as ur
from gzip import GzipFile
req = ur.Request(chr1) # gz file on ftp (ensure startswith 'ftp://')
z_f = ur.urlopen(req)
# this line *may* work (but I haven't been able to confirm it)
# df = pd.read_csv(z_f, sep='\t', compression='gzip', skiprows=10)
# this works (*)
f = GzipFile(fileobj=z_f, mode="r")
df = pd.read_csv(f, sep='\t', skiprows=10)
(*) Here f is "file-like", in the sense that we can call readline on it (read it line by line), rather than having to download/open the entire file.
Note: I couldn't get the ftplib library to readline; it wasn't clear whether it ought to be able to.
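For what it's worth, on recent pandas versions the original one-liner may work as-is, since read_csv can fetch a URL itself and handle gzip compression (a sketch, not verified against this particular FTP server):
import pandas as pd

chr1 = 'ftp://ftp.ncbi.nih.gov/snp/organisms/human_9606/chr_rpts/chr_1.txt.gz'
# recent pandas fetches the URL and decompresses the .gz itself
CHR1 = pd.read_csv(chr1, sep='\t', compression='gzip', skiprows=10)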