YOLOv5 FPS way too low, how can I fix it? - performance

Can you help me improve my FPS with YOLOv5s?
I am using YOLOv5s for real-time detection in a game, but I get very low FPS (about 30-40) and sometimes only 0.
I am using it with my custom dataset:
model = torch.hub.load('ultralytics/yolov5', 'custom', path=r'C:\Users\stefa\Downloads\best2.pt') # local repo
And with mss to screen capture:
with mss.mss() as sct:
    # Part of the screen to capture
    monitor = {"top": 40, "left": 0, "width": 800, "height": 640}
I hope someone can help me to improve my FPS.
Or should I use YOLOv5n? If yes, how do I use/download it?
Thank you!

There are several ways to improve your FPS:
The simpler, the better: try a smaller model such as YOLOv5s or YOLOv5n (see the sketch at the end of this answer for how to load it).
What about using your GPU for inference? Pass the --device flag. Example: $ python detect.py --source 0 --device 0 # --source 0 = webcam, make sure you change it; --device 0 = first CUDA GPU
Reduce your field of vision to only a small capture region (try 480x480) close to your weapon. You may need to resize your training images to match.
Check this: https://www.youtube.com/watch?v=Yqwz4QGDNh0&ab_channel=SometimesRobots
Good luck!
P.S. For further details about YOLOv5 usage, please visit the official wiki: https://github.com/ultralytics/yolov5/wiki/Train-Custom-Data
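For reference, here is a minimal sketch that combines those points: loading the smaller YOLOv5n model through torch.hub, moving it to the GPU when one is available, and running inference on a reduced mss capture region. The capture coordinates and the 480-pixel inference size are assumptions you would tune for your game:

import mss
import numpy as np
import torch

# Pretrained YOLOv5n; pass 'custom', path=... instead to keep using your own weights
model = torch.hub.load('ultralytics/yolov5', 'yolov5n')
model.to('cuda' if torch.cuda.is_available() else 'cpu')  # use the GPU when available

with mss.mss() as sct:
    # Smaller capture region = fewer pixels to preprocess per frame (values are assumptions)
    monitor = {"top": 200, "left": 200, "width": 480, "height": 480}
    while True:
        frame = np.ascontiguousarray(np.array(sct.grab(monitor))[:, :, :3][:, :, ::-1])  # BGRA -> RGB
        results = model(frame, size=480)  # inference at a reduced resolution
        print(results.xyxy[0])            # detections: x1, y1, x2, y2, confidence, class

The print is only there to show the loop running; results.xyxy[0] already gives you boxes you can draw onto the frame.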

Related

How to identify numbers in a Chrome extension using UiPath?

I have this screenshot in a Chrome extension where numbers appear randomly.
When numbers are like 0.001, 0.006, 0.008, 0.001092 (format like 0.00XXXX), it should go to the actions that I programmed.
However, when numbers are 0.02, 0.06, 0.4, 4.2, it should continue waiting and checking until the next number meets the first condition (0.00XXXX).
I'm a newbie with UiPath. Could you give me an image of how the flowchart could be structured in the UiPath workspace? How can I program that?
Not sure if I quite understand the question, but if you want to create a workflow based on the result of the 'Double' you've obtained, then just create an If activity like below:
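UiPath would express this as an If activity with a VB/C# condition inside a loop, but the branching logic itself is just a range check. Here is a rough Python sketch of that decision; the 0.01 cutoff is an assumption inferred from the 0.00XXXX examples in the question:

def should_trigger(value_text):
    """Return True for numbers in the 0.00XXXX range (e.g. 0.001, 0.001092)."""
    try:
        value = float(value_text)
    except ValueError:
        return False
    return 0 < value < 0.01  # assumed cutoff based on the examples given

# Keep checking numbers until one meets the condition, then run the programmed actions
for sample in ["0.02", "0.4", "0.001092"]:
    if should_trigger(sample):
        print(sample, "-> run the programmed actions")
        break
    print(sample, "-> keep waiting for the next number")

In UiPath terms, the loop would be a Retry Scope or Do While, and the return expression is what would go into the If activity's condition.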

Tensorflow dequeue is very slow on Cloud ML

I am trying to run a CNN on the cloud (Google Cloud ML) because my laptop does not have a GPU card.
So I uploaded my data to Google Cloud Storage: a .csv file with 1500 entries, like so:
| label   | img_path   |
| label_1 | /img_1.jpg |
| label_2 | /img_2.jpg |
and the corresponding 1500 jpgs.
My input_fn looks like so:
import multiprocessing

import tensorflow as tf

# parse_csv() and IMG_SIZE are defined elsewhere in my code

def input_fn(filename,
             batch_size,
             num_epochs=None,
             skip_header_lines=1,
             shuffle=False):
    filename_queue = tf.train.string_input_producer(filename, num_epochs=num_epochs)
    reader = tf.TextLineReader(skip_header_lines=skip_header_lines)
    _, row = reader.read(filename_queue)
    row = parse_csv(row)
    pt = row.pop(-1)
    pth = filename.rpartition('/')[0] + pt
    img = tf.image.decode_jpeg(tf.read_file(tf.squeeze(pth)), 1)
    img = tf.to_float(img) / 255.
    img = tf.reshape(img, [IMG_SIZE, IMG_SIZE, 1])
    row = tf.concat(row, 0)
    if shuffle:
        return tf.train.shuffle_batch(
            [img, row],
            batch_size,
            capacity=2000,
            min_after_dequeue=2 * batch_size + 1,
            num_threads=multiprocessing.cpu_count(),
        )
    else:
        return tf.train.batch([img, row],
                              batch_size,
                              allow_smaller_final_batch=True,
                              num_threads=multiprocessing.cpu_count())
Here is what the full graph looks like (very simple CNN indeed):
When running the training with a batch size of 200 on my laptop (where the data is stored locally), most of the compute time is spent on the gradients node, which is what I would expect. The batch node has a compute time of ~12ms.
When I run it on the cloud (scale-tier is BASIC), the batch node takes more than 20s, and the bottleneck seems to come from the QueueDequeueUpToV2 subnode according to TensorBoard:
Does anyone have any clue why this happens? I am pretty sure I am getting something wrong here, so I'd be happy to learn.
A few remarks:
-Switching between batch/shuffle_batch with different min_after_dequeue values has no effect.
-When using BASIC_GPU, the batch node is also on the CPU, which is normal according to what I read, and it takes roughly 13s.
-Adding a time.sleep after queues are started to ensure no starvation also has no effect.
-Compute time is indeed linear in batch_size, so with a batch_size of 50, the compute time would be 4 times smaller than with a batch_size of 200.
Thanks for reading and would be happy to give more details if anyone needs.
Best,
Al
Update:
-The Cloud ML instance and the buckets were not in the same region; putting them in the same region improved results 4x.
-Creating a .tfrecords file made the batching take ~70ms, which seems acceptable. I used this blog post as a starting point to learn about it; I recommend it.
I hope this will help others to create a fast data input pipeline!
Try converting your images to TFRecord format and reading them directly from the graph. The way you are doing it, there is no possibility of caching, and if your images are small you are not taking advantage of the high sustained reads from Cloud Storage. Saving all your jpg images into a single TFRecord file, or a small number of files, will help.
Also, make sure your bucket is a single-region bucket in a region that has GPUs, and that you are submitting to Cloud ML in that region.
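To illustrate that suggestion, here is a minimal sketch (TF 1.x API, to match the code in the question) that packs each label and the raw JPEG bytes into a single .tfrecords file; the column names come from the question's CSV, while the file paths and the byte encoding of the label are assumptions:

import csv

import tensorflow as tf

def write_tfrecords(csv_path, img_dir, out_path):
    """Pack (label, raw JPEG bytes) pairs into one TFRecord file."""
    with tf.python_io.TFRecordWriter(out_path) as writer:
        with open(csv_path) as f:
            for row in csv.DictReader(f):
                with open(img_dir + row['img_path'], 'rb') as img_f:
                    img_bytes = img_f.read()
                example = tf.train.Example(features=tf.train.Features(feature={
                    'label': tf.train.Feature(
                        bytes_list=tf.train.BytesList(value=[row['label'].encode()])),
                    'image_raw': tf.train.Feature(
                        bytes_list=tf.train.BytesList(value=[img_bytes])),
                }))
                writer.write(example.SerializeToString())

# write_tfrecords('data.csv', '/local/path/to/images', 'train.tfrecords')  # hypothetical paths

On the reading side you would then swap the TextLineReader for a tf.TFRecordReader plus tf.parse_single_example, and decode the JPEG exactly as before.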
I had a similar problem before. I solved it by changing tf.train.batch() to tf.train.batch_join(). In my experiment, with a batch size of 64 and 4 GPUs, it took 22 minutes using tf.train.batch(), whilst it only took 2 minutes using tf.train.batch_join().
From the TensorFlow docs:
If you need more parallelism or shuffling of examples between files, use multiple reader instances using the tf.train.shuffle_batch_join
https://www.tensorflow.org/programmers_guide/reading_data
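For context, here is a rough sketch of what that change looks like applied to the pipeline from the question (TF 1.x API; the bucket path, IMG_SIZE value, and number of parallel readers are assumptions):

import tensorflow as tf

IMG_SIZE = 64  # placeholder; use the same value as in input_fn

def read_example(filename_queue, img_dir):
    # One reader per call; batch_join pulls from several of these in parallel
    reader = tf.TextLineReader(skip_header_lines=1)
    _, row = reader.read(filename_queue)
    label, img_path = tf.decode_csv(row, record_defaults=[[''], ['']])
    img = tf.image.decode_jpeg(tf.read_file(tf.string_join([img_dir, img_path])), channels=1)
    img = tf.reshape(tf.to_float(img) / 255., [IMG_SIZE, IMG_SIZE, 1])
    return img, label

filename_queue = tf.train.string_input_producer(['gs://my-bucket/data.csv'])  # assumed path
examples = [read_example(filename_queue, 'gs://my-bucket') for _ in range(4)]

# batch_join interleaves the outputs of the four independent readers
images, labels = tf.train.batch_join(examples,
                                     batch_size=200,
                                     capacity=2000,
                                     allow_smaller_final_batch=True)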

tensorflow code optimization strategy

Please excuse the broadness of this question. Maybe once I know more perhaps I can ask more specifically.
I have a performance-sensitive piece of TensorFlow code. From the perspective of someone who knows little about GPU programming, I would like to know what guides or strategies would be a "good place to start" for optimizing my code (single GPU).
Perhaps even a readout of how long is spent on each TensorFlow op would be nice...
I have a vague understanding that:
Some operations go faster when assigned to a CPU rather than a GPU, but it's not clear which.
There is a piece of Google software called "EEG" that I read about in a paper that may one day be open sourced.
There may also be other common factors at play that I am not aware of.
I wanted to give a more complete answer about how to use the Timeline object to get the time of execution for each node in the graph:
you use a classic sess.run(), but specify the options and run_metadata arguments
you then create a Timeline object from the run_metadata.step_stats data
Here is some example code:
import tensorflow as tf
from tensorflow.python.client import timeline

x = tf.random_normal([1000, 1000])
y = tf.random_normal([1000, 1000])
res = tf.matmul(x, y)

# Run the graph with full trace option
with tf.Session() as sess:
    run_options = tf.RunOptions(trace_level=tf.RunOptions.FULL_TRACE)
    run_metadata = tf.RunMetadata()
    sess.run(res, options=run_options, run_metadata=run_metadata)

    # Create the Timeline object, and write it to a json
    tl = timeline.Timeline(run_metadata.step_stats)
    ctf = tl.generate_chrome_trace_format()
    with open('timeline.json', 'w') as f:
        f.write(ctf)
You can then open Google Chrome, go to the page chrome://tracing and load the timeline.json file.
You should see something like:

Dust and scratch removal with open source graphic libraries

I am trying to automate the cleanup process of a large amount of scanned films. I have all the images in 48-bit RGBI TIFF files (RGB + Infrared), and I can use the infrared channel to create masks for dust removal. I wonder if there is any decent open source implementation of in-painting that I can use to achieve this (all the other software I use for batch processing are open source libraries I access through Ruby interfaces).
My first choice was ImageMagick, but I couldn't find any advanced in-painting option in it (maybe I am wrong, though). I have heard this can be done with MagickWand libraries, but I haven't been able to find a concrete example yet.
I have also had a look at OpenCV, but it seems that OpenCV's in-paint method accepts only 8-bit-per-channel images, while I must preserve the 16 bits.
Is there any other library, or even an interesting code snippet I am not aware of? Any help is appreciated.
Samples (images in the original post): the full picture, the IR channel, the dust and scratch mask, what I want to remove automatically, and what I consider too large to remove with no user intervention.
You can also download the original TIFF file here. It contains two alpha channels. One is the original IR channel, and the other one is the IR channel already prepared for dust removal.
I have had an attempt at this, and can go some way to achieving some of your objectives... I can read in your 16-bit image, detect the dust pixels using the IR channel data, and replace them, and write out the result without any alpha channel and all the while preserving your 16-bit data.
The part that is lacking is the replacement algorithm - I have just propagated the next pixel from above. You, or someone cleverer than me on Stack Overflow, may be able to implement a better algorithm but this may be a start.
It is in Perl, but I guess it could be readily converted to another language. Here is the code:
#!/usr/bin/perl
use strict;
use warnings;
use Image::Magick;

# Open the input image
my $image = Image::Magick->new;
$image->ReadImage("pa.tiff");
my $v=0;

# Get its width and height
my ($width,$height)=$image->Get('width','height');

# Create output image of matching size
my $out= $image->Clone();

# Remove alpha channel from output image
$out->Set(alpha=>'off');

# Load Red, Green, Blue and Alpha channels of input image into arrays, values normalised to 1.0
my (@R,@G,@B,@A);
for my $y (0..($height-1)){
   my $j=0;
   my @RGBA=$image->GetPixels(map=>'RGBA',height=>1,width=>$width,x=>0,y=>$y,normalize=>1);
   for my $x (0..($width-1)){
      $R[$x][$y]=$RGBA[$j++];
      $G[$x][$y]=$RGBA[$j++];
      $B[$x][$y]=$RGBA[$j++];
      $A[$x][$y]=$RGBA[$j++];
   }
}

# Now process image
my ($d,$r,$s,@colours);
for my $y (0..($height-1)){
   for my $x (0..($width-1)){
      # See if IR channel says this is dust, and if so, replace with pixel above
      if($A[$x][$y]<0.01){
         $colours[0]=$R[$x][$y-1];
         $colours[1]=$G[$x][$y-1];
         $colours[2]=$B[$x][$y-1];
         $R[$x][$y]=$R[$x][$y-1];
         $G[$x][$y]=$G[$x][$y-1];
         $B[$x][$y]=$B[$x][$y-1];
         $out->SetPixel(x=>$x,y=>$y,color=>\@colours);
      }
   }
}
$out->write(filename=>'out.tif',compression=>'lzw');
The result looks like this, but I had to make it a JPEG just to fit on SO:
I cannot comment, so I am writing an answer.
I suggest using G'MIC with its "inpaint" filter.
You should load the image, take the IR channel and convert it to black and white, then tell the inpaint filter to fill in the areas marked in the IR image.
OpenCV has a good algorithm for image inpainting, which is basically what you were searching for.
https://docs.opencv.org/3.3.1/df/d3d/tutorial_py_inpainting.html
If that does not help, then only neural-network algorithms will.
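For completeness, here is a minimal Python sketch of that OpenCV route: build a binary mask from the IR channel and call cv2.inpaint. As noted in the question, cv2.inpaint only accepts 8-bit input, so this sketch down-converts first and therefore gives up the 16-bit depth; the file names and the IR threshold are placeholders:

import cv2
import numpy as np

# Load the 16-bit TIFF unchanged; assumes the IR mask ends up as the 4th channel
img = cv2.imread('scan.tif', cv2.IMREAD_UNCHANGED)
rgb16, ir = img[:, :, :3], img[:, :, 3]

# Dust shows up as low IR values; threshold into an 8-bit mask (cutoff is an assumption)
mask = (ir < 0.01 * np.iinfo(ir.dtype).max).astype(np.uint8) * 255

# cv2.inpaint is limited to 8-bit, 1- or 3-channel input, so scale down before inpainting
rgb8 = (rgb16 / 257).astype(np.uint8)
restored = cv2.inpaint(rgb8, mask, 3, cv2.INPAINT_TELEA)  # radius 3, Telea's method

cv2.imwrite('restored_8bit.tif', restored)

To keep the full 16 bits you would have to fill the masked regions yourself (as the Perl answer above does) or use a library whose inpainting works on 16-bit data.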

Munin Graph - How To Set Max Upper Limit For mysql slowqueries & munin stats?

Wow, this is my very first post on stackoverflow! Been using results for years, but this is the first time I'm 100% stumped and decided to join!
I use Munin to monitor and graph stuff like CPU, Memory, Loads, etc. on my VPS.
Sometimes I get a huge statistical outlier data point that throws my graphs out of whack. I want to set the upper limit for these graphs to simply avoid having these outliers impact the rest of the data view.
After hours of digging and experimenting I was able to change the upper limit on Loads by doing the following:
cd /etc/munin/plugins
pico load
I changed: echo 'graph_args --base 1000 -l 0'
to: echo 'graph_args --base 1000 -l 0 -u 5 --rigid'
It worked perfectly!
Unfortunately I've tried everything to get munin stats processing time and mysql slowqueries to have an upper limit and can't figure it out!
Here is the line in mysql_slowqueries
echo 'graph_args --base 1000 -l 0'
... and for munin_stats
"graph_args --base 1000 -l 0\n",
I've tried every combo of -u and --upper-limit for both of those and nothing I do is impacting the display of the graph to show a max upper limit.
Any ideas on what I need to change those lines to so I can get a fixed upper limit max?
Thanks!
I highly encourage playing with the scripts, even though you run the risk of them being overwritten by an update. Just back them up and replace them if you think it's needed. If you have built or improved things, don't forget to share them with us on github: https://github.com/munin-monitoring/munin
When you set --upper-limit to 100 and your value is 110, your graph will run to 110. If you add --rigid, your graph scale will stay at 100, and the line will be clipped, which is what you wanted in this case.
Your mysql_slowqueries graph line should read something like (it puts a limit on 100):
echo 'graph_args --base 1000 -l 0 --upper-limit 100 --rigid'
Changing the scripts is highly discouraged, since with the next update they might be replaced by the package manager, undoing your changes.
Munin gives you different ways to define limits in the settings: on the node itself as well as on the server.
You can find (sort of) an answer in the FAQ.
For me it worked really nicely to just create a file named /etc/munin/plugin-conf.d/load.conf with the following content:
[load]
env.load_warning 5
env.load_critical 10
Restart munin-node to apply the changes, and on the next update of the graph you can see that the "warning" and "critical" levels have been set by clicking on the load graph in the overview (the table below the graphs).
