How to convert YOLO annotations to COCO format (JSON)?

I want to convert my labels from YOLO format to COCO format.
I have tried
https://github.com/Taeyoung96/Yolo-to-COCO-format-converter
and
PyLabel.
Both have bugs.
I want to train with Detectron2, but it fails to load the dataset because the JSON file is wrong.
Thanks, everybody.

Could you try with this tool (disclaimer: I'm the author)? It is not (yet) a Python package, so you need to download the repo first. The code should look something like this:
from ObjectDetectionEval import *
from pathlib import Path
def main() -> None:
    path = Path("/path/to/annotations/")  # Where the .txt files are
    names_file = Path("/path/to/classes.names")
    save_file = Path("coco.json")
    annotations = AnnotationSet.from_yolo(path)
    # If you need to change the labels
    # names = Annotation.parse_names_file(names_file)
    # annotations.map_labels(names)
    annotations.save_coco(save_file)
if __name__ == "__main__":
    main()
If you need more control (coordinate format, images location and extension, etc.) you should use the more generic AnnotationSet.from_txt(). If it does not suit your needs you can easily implement your own parser using AnnotationSet.from_folder().
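For reference, the heart of any YOLO-to-COCO conversion is the bounding-box arithmetic. Below is a minimal sketch, not the tool's actual implementation. It assumes the usual YOLO text layout (`class cx cy w h`, all normalized to the image size) and the COCO `[x_min, y_min, width, height]` pixel convention. The function names are made up for illustration.

```python
def yolo_to_coco_bbox(cx, cy, w, h, img_w, img_h):
    """Convert a normalized YOLO box (center x, center y, width, height)
    to a COCO box [x_min, y_min, width, height] in pixels."""
    box_w = w * img_w
    box_h = h * img_h
    x_min = cx * img_w - box_w / 2
    y_min = cy * img_h - box_h / 2
    return [x_min, y_min, box_w, box_h]


def yolo_line_to_coco(line, image_id, ann_id, img_w, img_h):
    """Turn one line of a YOLO .txt file into a COCO annotation dict.
    Assumption: the YOLO class index is reused directly as category_id."""
    cls, cx, cy, w, h = line.split()[:5]
    bbox = yolo_to_coco_bbox(float(cx), float(cy), float(w), float(h), img_w, img_h)
    return {
        "id": ann_id,
        "image_id": image_id,
        "category_id": int(cls),
        "bbox": bbox,
        "area": bbox[2] * bbox[3],
        "iscrowd": 0,
    }
```

A complete COCO file also needs the `images` and `categories` lists next to `annotations`, so a hand-rolled converter has to know each image's width and height.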

Related

How can I export layered drawings from drawio to create "animated" slides in beamer?

When preparing lectures, or conference presentations with beamer, I usually use layered drawings. Then for graphics included in consecutive slides ("frames" in beamer), I simply use different sets of layers.
For graphics created in IPE, I have created a dedicated expallviews.lua script.
Unfortunately, for graphics created with diagrams.net locally run as drawio-desktop, no such automated export of various layers exists. The only way is to manually select the visible layers in GUI and then export consecutive drawings to a set of PDF files.
Is there a more convenient method to solve that problem?
The described problem has been reported in issues 405 and 737 in the drawio-desktop repository.
After reviewing those issues, I found a method that changes the visibility of layers automatically (instead of manually via the GUI) and exports the resulting drawings to a set of PDF files. The proposed method is described in a comment to issue 405. It uses a simple Python script:
#!/usr/bin/python3
"""
This script modifies the visibility of layers in the XML
file with diagram generated by drawio.
It works around the problem of lack of a possibility to export
only the selected layers from the CLI version of drawio.
Written by Wojciech M. Zabolotny 6.10.2022
(wzab01<at>gmail.com or wojciech.zabolotny<at>pw.edu.pl)
The code is published under LGPL V2 license
"""
from lxml import etree as let
import xml.etree.ElementTree as et
import xml.parsers.expat as pe
from io import StringIO
import os
import sys
import shutil
import zlib
import argparse
PARSER = argparse.ArgumentParser()
PARSER.add_argument("--layers", help="Selected layers, \"all\", comma separated list of integers or integer ranges like \"0-3,6,7\"", default="all")
PARSER.add_argument("--layer_prefix", help="Layer name prefix", default="Layer_")
PARSER.add_argument("--outfile", help="Output file", default="output.drawio")
PARSER.add_argument("--infile", help="Input file", default="input.drawio")
ARGS = PARSER.parse_args()
INFILENAME = ARGS.infile
OUTFILENAME = ARGS.outfile
# Find all elements with 'value' starting with the layer prefix.
# Return tuples with the element and the rest of 'value' after the prefix.
def find_layers(el_start):
    res = []
    for el in el_start:
        val = el.get('value')
        if val is not None:
            if val.find(ARGS.layer_prefix) == 0:
                # This is a layer element. Add it, and its name
                # after the prefix to the list.
                res.append((el,val[len(ARGS.layer_prefix):]))
                continue
        # If it is not a layer element, scan its children
        res.extend(find_layers(el))
    return res
# Analyse the list of visible layers, and create the list
# of layers that should be visible. Customize this part
# if you want a more sophisticated method for selection
# of layers.
# Now only "all", comma separated list of integers
# or ranges of integers are supported.
def build_visible_list(layers):
    if layers == "all":
        return layers
    res = []
    for lay in layers.split(','):
        # Is it a range?
        s = lay.find("-")
        if s > 0:
            # This is a range
            first = int(lay[:s])
            last = int(lay[(s+1):])
            res.extend(range(first,last+1))
        else:
            res.append(int(lay))
    return res
def is_visible(layer_tuple,visible_list):
    if visible_list == "all":
        return True
    if int(layer_tuple[1]) in visible_list:
        return True
try:
    EL_ROOT = et.fromstring(open(INFILENAME,"r").read())
except et.ParseError as perr:
    # Handle the parsing error
    ROW, COL = perr.position
    print(
        "Parsing error "
        + str(perr.code)
        + "("
        + pe.ErrorString(perr.code)
        + ") in column "
        + str(COL)
        + " of the line "
        + str(ROW)
        + " of the file "
        + INFILENAME
    )
    sys.exit(1)
visible_list = build_visible_list(ARGS.layers)
layers = find_layers(EL_ROOT)
for layer_tuple in layers:
    if is_visible(layer_tuple,visible_list):
        print("set "+layer_tuple[1]+" to visible")
        layer_tuple[0].attrib['visible']="1"
    else:
        print("set "+layer_tuple[1]+" to invisible")
        layer_tuple[0].attrib['visible']="0"
# Now write the modified file
t=et.ElementTree(EL_ROOT)
with open(OUTFILENAME, 'w') as f:
    t.write(f, encoding='unicode')
The maintained version of that script, together with a demonstration of its use, is also available in my GitHub repository.
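For beamer overlays you typically want each frame to reveal one more layer than the previous one. The sketch below only builds the command lines for that. It assumes the layers are named Layer_0, Layer_1, ... (the script's default prefix) and that the script is saved under a file name of your choosing. The helper names and the frame_N.drawio output naming are made up for illustration.

```python
def cumulative_layer_specs(n_layers):
    """Return ["0", "0-1", "0-2", ...]: one --layers value per frame,
    each revealing one more layer than the previous one."""
    return ["0" if i == 0 else f"0-{i}" for i in range(n_layers)]


def build_commands(script, infile, n_layers):
    """Build one command per frame, as argument lists suitable for subprocess.run.
    The output naming scheme (frame_N.drawio) is a hypothetical choice."""
    commands = []
    for i, spec in enumerate(cumulative_layer_specs(n_layers)):
        outfile = f"frame_{i}.drawio"
        commands.append(
            ["python3", script, "--infile", infile, "--outfile", outfile, "--layers", spec]
        )
    return commands
```

Each resulting .drawio file can then be exported to PDF with the drawio command-line export and included in consecutive frames.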

How do I create a prefetch dataset from a folder of images?

I am trying to feed a dataset from Kaggle into this notebook from the TensorFlow docs in order to train a CycleGAN model. My current approach is to download the folders into my notebook, loop through the path of each image, and use cv2.imread(path) to add the uint8 image data to a list. This doesn't work, and I know the approach is wrong because the code provided by Google requires a Prefetch dataset.
Here's my current code (excluding the OpenCV part):
import os
# specify the img directory path
art_path = "/content/abstract-art-gallery/Abstract_gallery/Abstract_gallery/"
land_path = "/content/landscape-pictures/"
def grab_path(folder, i_count=100):
    # Look at the first i_count directory entries and keep the image files
    res = []
    for name in sorted(os.listdir(folder))[:i_count]:
        if name.endswith(('.jpg', '.png', 'jpeg')):
            res.append(os.path.join(folder, name))
    return res
art_path, land_path = grab_path(art_path), grab_path(land_path)
print(art_path)
print(land_path)
The error in the code comes here:
train_horses = train_horses.cache().map(
    preprocess_image_train, num_parallel_calls=AUTOTUNE).shuffle(
    BUFFER_SIZE).batch(BATCH_SIZE)
Is there a simpler approach to this problem?
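import pathlib
import tensorflow as tf
# Placeholders: point IMAGE_PATHS_DIR at your image folder and pick a target size
IMAGE_PATHS_DIR = "/path/to/images"
IMAGE_SIZE = (256, 256)
AUTOTUNE = tf.data.experimental.AUTOTUNE
def read_image(path):
    # Read one file and decode it. The original answer called a helper
    # (DataUtils.decode_image) that was not shown; tf.io.decode_jpeg plus
    # tf.image.resize stands in for it here.
    image_string = tf.io.read_file(path)
    image = tf.io.decode_jpeg(image_string, channels=3)
    return tf.image.resize(image, IMAGE_SIZE)
# Collect the file paths as strings and build the dataset from them
paths = [str(p) for p in pathlib.Path(IMAGE_PATHS_DIR).rglob('*.jpg')]
dataset = tf.data.Dataset.from_tensor_slices(paths)
dataset = dataset.map(read_image, num_parallel_calls=AUTOTUNE)
dataset = dataset.shuffle(2048)
dataset = dataset.prefetch(AUTOTUNE)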

Multiple Securities Trading Algorithm

I am very new to Python and I am having trouble executing my algorithmic trading strategy on more than one security at a time. I am currently using these lines of code for the stocks:
data_p = pd.read_csv('AAPL_30m.csv', index_col = 0, parse_dates = True)
data_p = data_p.drop(columns = ['Adj Close'])
Does anyone know how I would go about properly adding more securities?
Since no data is provided, I can only give you a rough idea on how this can be done. Change directory to the folder with all your data series in csv files:
import pandas as pd
import os
os.chdir(r'C:\Users\username\Downloads\new')
files = os.listdir()
Assume the files in the folder are
['AAPL.csv',
'AMZN.csv',
'GOOG.csv']
Then start with an empty dictionary d and loop through all the files in the directory, reading each one as a pandas DataFrame. Finally, combine all of them into one big DataFrame (if you find that more useful):
d = {}
for f in files:
    name = f.split('.')[0]
    df = pd.read_csv(f)
    # ... do your processing on df here ...
    d[name] = df.copy()
dff = pd.concat(d)
Since I do not know your format and your index, I assume you can do pd.concat(d). Alternatively, you may also try pd.DataFrame(d).
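To make the result of pd.concat on a dictionary concrete, here is a small sketch that uses in-memory CSV text instead of files. The tickers, dates, and prices are made-up placeholder data, and the Date/Close column layout is an assumption about your files.

```python
import io
import pandas as pd

# Tiny in-memory stand-ins for AAPL.csv and AMZN.csv (made-up numbers)
csv_files = {
    "AAPL": "Date,Close\n2022-01-03,10.0\n2022-01-04,11.0\n",
    "AMZN": "Date,Close\n2022-01-03,20.0\n2022-01-04,19.0\n",
}

frames = {}
for name, text in csv_files.items():
    frames[name] = pd.read_csv(io.StringIO(text), index_col=0, parse_dates=True)

# Long format: rows are indexed by (ticker, Date)
long_df = pd.concat(frames)

# Wide format: columns are (ticker, field), rows are aligned on Date
wide_df = pd.concat(frames, axis=1)
```

With the long format, long_df.loc["AAPL"] gives back the frame for a single security. The wide format is convenient when the strategy compares securities on the same timestamp.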

How to read the latest image in a folder using python?

I have to read the latest image in a folder using python. How can I do this?
Another similar way, with some pragmatic (non-foolproof) image validation added:
import os
def get_latest_image(dirpath, valid_extensions=('jpg','jpeg','png')):
    """
    Get the latest image file in the given directory
    """
    # get filepaths of all files and dirs in the given dir
    valid_files = [os.path.join(dirpath, filename) for filename in os.listdir(dirpath)]
    # filter out directories, no-extension, and wrong extension files
    valid_files = [f for f in valid_files if '.' in f and \
        f.rsplit('.',1)[-1] in valid_extensions and os.path.isfile(f)]
    if not valid_files:
        raise ValueError("No valid images in %s" % dirpath)
    return max(valid_files, key=os.path.getmtime)
Walk over the filenames, get their modification time and keep track of the latest modification time you found:
import os
import glob
ts = 0
found = None
for file_name in glob.glob('/path/to/your/interesting/directory/*'):
    fts = os.path.getmtime(file_name)
    if fts > ts:
        ts = fts
        found = file_name
print(found)
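The same idea can also be written with pathlib. This is only a sketch combining the two answers above: it matches extensions case-insensitively and returns None instead of raising when the folder has no images. The function and constant names are my own.

```python
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def latest_image(dirpath, suffixes=IMAGE_SUFFIXES):
    """Return the most recently modified image file in dirpath, or None if there is none."""
    candidates = [
        p for p in Path(dirpath).iterdir()
        if p.is_file() and p.suffix.lower() in suffixes
    ]
    # max() with default=None avoids an exception on an empty list
    return max(candidates, key=lambda p: p.stat().st_mtime, default=None)
```

Like the versions above, this goes by modification time, so a file that was copied or touched recently counts as the latest even if the picture itself is old.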

use pandas to retrieve files over FTP

I'm just getting to grips with pandas (which is awesome) and what I need to do is read in compressed genomics type files from ftp sites into a pandas dataframe.
This is what I tried and got a ton of errors:
from pandas.io.parsers import *
chr1 = 'ftp://ftp.ncbi.nih.gov/snp/organisms/human_9606/chr_rpts/chr_1.txt.gz'
CHR1 = read_csv(chr1, sep='\t', compression = 'gzip', skiprows = 10)
print type(CHR1)
print CHR1.head(10)
Ideally I'd like to do something like this:
from pandas.io.data import *
AAPL = DataReader('AAPL', 'yahoo', start = '01/01/2006')
The interesting part of this question is how to stream a (gz) file from ftp, which is discussed here, where it's claimed that the following will work in Python 3.2 (but won't in 2.x, nor will it be backported), and on my system this is the case:
import urllib.request as ur
from gzip import GzipFile
import pandas as pd
# chr1 is the FTP URL string defined in the question
req = ur.Request(chr1) # gz file on ftp (ensure startswith 'ftp://')
z_f = ur.urlopen(req)
# this line *may* work (but I haven't been able to confirm it)
# df = pd.read_csv(z_f, sep='\t', compression='gzip', skiprows=10)
# this works (*)
f = GzipFile(fileobj=z_f, mode="r")
df = pd.read_csv(f, sep='\t', skiprows=10)
(*) Here f is "file-like" in the sense that we can call readline on it (read it line by line) rather than having to download and open the entire file.
Note: I couldn't get the ftplib library to do readline; it wasn't clear whether it ought to.
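The "file-like" point can be checked without a network connection. In this sketch an in-memory BytesIO object stands in for the FTP response (an assumption made purely for illustration; the sample rows are invented), and GzipFile decompresses from it line by line:

```python
import gzip
import io

# Build a small gzip payload in memory; it plays the role of the FTP response object
raw = b"header line\nrs1\t100\nrs2\t200\n"
compressed = io.BytesIO(gzip.compress(raw))

# GzipFile wraps any object that offers read(), and decompresses on demand
f = gzip.GzipFile(fileobj=compressed, mode="rb")
first_line = f.readline()   # reads and decompresses only as much as needed
remaining = f.readlines()   # the rest of the lines, still as bytes
```

Anything that behaves like this can be handed to pd.read_csv in place of a filename.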
