How to draw bounding box in same aspect ratio with dlib imglab - dlib

I use dlib for my project. Basically it detects car in video stream. I use fhog_object_detector of dlib. When do training, it's hard to draw all object with same aspect ratio with dlib imglab tool. I have to draw object in all images, and manually change the object size in created xml file. Am I doing the right way, or it has a better way to do that work? I know dlib have just release a good CNN object detector for multi scale object detection, but because my computer doesn't have a GPU, so I can't use that.
Hope some one have a problem like me, and have found a solution.

Draw all the boxes as accurately around each object as you can. You want to have a well annotated dataset. That is useful in and of itself. Then, when you load the data for training, adjust the aspect ratios of the loaded boxes in your code before training. That is also a good time to look for outliers like really small boxes or boxes with extreme aspect ratios as you would probably want to exclude them.

Related

If Core Graphics uses Metal under the hood, can a Metal implementation run faster than a CG one? Why?

Let's say I want to develop a Paint app and need to implement a brush engine. For a raster brush, you basically need to stamp a texture on touch locations with a given spacing.
-- Task: Composite a small image (brush tip) over a bigger one.
I decided to build a prototype first in CG using a CGContext to render the stamps and found out it performed pretty well even with coalesced touches and a decent size canvas (CGContext output size).
However, since I need to paint onto really big textures (8000x6000 would be great), I decided to give metal a chance. I know that this task might be trivial for someone with a background in Metal but I'm new in this field. So I tried to use CIFilters (Metal backed) for compositing the brush over the canvas and displaying it in a custom MetalImageView: GTKView.
I thought having the canvas and the brush as CIImages and displaying them in a Metal Layer would already be more performant than the naive CG implementation. But it's not. The CIFilter approach renders the entire canvas every single stamp(at: Point), whether in CG I just refresh a small rect around that point.
Now, I think I could accomplish that with the CIFilter if I could change the extent that is computed. I don't know if that can be done with Core Image, but I'm sure in metal would be really easy for someone with experience.
-- Question: Can a pure metal implementation be faster stamping images than the CG one, given that CG runs with Metal under the hood? If so, how faster? Is it worth learning how to do it, or should I better spend that time improving the CG implementation?
Note that I'm asking for a raster brush, not a vector brush with Bezier Paths which is way easier to code and runs faster but textured brushes can't be used.
I really appreciate any help.
There is actually a chapter in the Core Image Programming Guide about that. They describe continuous painting into the same texture using the CIImageAccumulator class. You can also download the sample app.
I think performance-wise there shouldn't be a huge difference. You should be able to optimize heavily by telling Core Image the region of interest and domain of definition (extent) of your brush stroke filter. Then it should be able to render only the necessary parts of the image instead of the whole thing in every frame.

Is there a way to create simple animations "on the fly" in modern OpenGL?

I think this requires a bit of background information:
I have been modding Minecraft for a while now, but I alway wanted to make my own game, so I started digging into the freshly released LWJGL3 to actually get things done. Yes, I know it's a bit ow level and I should use an engine and stuff...indeed, I already tried some engines and they never quite match what I want to do, so I decided I want to tackle the problem at its root.
So far, I kind of understand how to render meshes, move the "camera", etc. and I'm willing to take the learning curve.
But the thing is, at some point all the tutorials start to explain how to load models and create skeletal animations and so on...but I think I do not really want to go that way. A lot of things in working with Minecraft code was awful, but I liked how I could create models and animations from Java code. Sure, it did not look super realistic, but since I'm not great with Blender either, I doubt having "classic" models and animations would help. Anyway, in that code, I could rotate a box around to make a creature look at a player, I could use a sinus function to move legs and arms (or wings, in my case) and that was working, since Minecraft used immediate mode and Java could directly tell the graphics card where to draw each vertex.
So, actual question(s): Is there any good way to make dynamic animations in modern (3.3+) OpenGL? My models would basically be a hierarchy of shapes (boxes or whatever) and I want to be able to rotate them on the fly. But I'm not sure how to organize that. Would I store all the translation/rotation-matrices for each sub-shape? Would that put a hard limit on the amount of sub-shapes a model could have? Did anyone try something like that?
Edit: For clarification, what I did looked something like this:
Create a model: https://github.com/TheOnlySilverClaw/Birdmod/blob/master/src/main/java/silverclaw/birds/client/model/ModelOstrich.java
The model is created as a bunch of boxes in the constructor, the render and setRotationAngles methods set scale and rotations.
You should follow one opengl tutorial in order to understand the basics.
Let me suggest "Learning Modern 3D Graphics Programming", and especially this chapter, where you move one robot arm with multiple joints.
I did a port in java using jogl here, but you can easily port it over lwjgl.
What you are looking for is exactly skeletal animation, the only difference being the fact you do not want to load animations for your bones but want to compute / generate transforms on the fly.
You basically have a hierarchy of bones, and geometry attached to it. It looks like you want to manipulate this geometry "rigidly", so before sending your meshes / transforms to the GPU (the classic way), you want to start by computing the new transforms in model or world space, then send those freshly computed matrices to draw your geometries on the gpu the standard way.
As Sorin said, to compute each transform you simply have to iterate over your hierarchy and accumulate transforms given the transform of the parent bone and your local transform w.r.t the parent.
Yes and no.
You can have your hierarchy of shapes and store a relative transform for each.
For example the "player" whould have a translation to 100,100, 10 (where the player is), and then the "head" subcomponent would have an additional translation of 0,0,5 (just a bit higher on the z axis).
You can store these as matrices (they can encode translation, roation and scaling) and use glPushMatrix and glPop matrix to add and remove a matrix to a stack maintained by openGL.
The draw() function(or whatever you call it) should look something like :
glPushMatrix();
glMultMatrix(my_transform); // You can also just have glTranslate, glRotate or anything else.
// Draw my mesh
for (child : children) { child.draw(); }
glPopMatrix();
This gives you a hierarchical setup so that objects move with their parent. Alternatively you can have a stack in the main memory and do the multiplications yourself (use a library). I think the openGL stack may have a limit (implementation dependent), but if you handle it yourself the only limit is the amount of ram you can use. Once all the matrices are multiplied rendering is done in the same amount of time, that is it doesn't matter for performance how deep a mesh is in the hierarchy.
For actual animations you need to compute the intermediate transformations. For example for a crouch animation you probably want to have a few frames in between so that the camera doesn't just jump to the low position. You can do this with a time based linear interpolation between the start and end positions, but this only covers simple animations and you still have to implement it yourself.
Anything more complicated (i.e. modify the mesh based on the bone links) you would need to implement yourself.

Augment reality like zookazam

What algorithms are used for augmented reality like zookazam ?
I think it analyze image and find planes by contrast, but i don't know how.
What topics should I read before starting with app like this?
[Prologue]
This is extremly broad topic and mostly off topic in it's current state. I reedited your question but to make your question answerable within the rules/possibilities of this site
You should specify more closely what your augmented reality:
should do
adding 2D/3D objects with known mesh ...
changing light conditions
adding/removing body parts/clothes/hairs ...
a good idea is to provide some example image (sketch) of input/output of what you want to achieve.
what input it has
video,static image, 2D,stereo,3D. For pure 2D input specify what conditions/markers/illumination/LASER patterns you have to help the reconstruction.
what will be in the input image? empty room, persons, specific objects etc.
specify target platform
many algorithms are limited to memory size/bandwidth, CPU power, special HW capabilities etc so it is a good idea to add tag for your platform. The OS and language is also a good idea to add.
[How augmented reality works]
acquire input image
if you are connecting to some device like camera you need to use its driver/framework or something to obtain the image or use some common API it supports. This task is OS dependent. My favorite way on Windows is to use VFW (video for windows) API.
I would start with some static file(s) from start instead to ease up the debug and incremental building process. (you do not need to wait for camera and stuff to happen on each build). And when your App is ready for live video then switch back to camera...
reconstruct the scene into 3D mesh
if you use 3D cameras like Kinect then this step is not necessary. Otherwise you need to distinguish the object by some segmentation process usually based on the edge detections or color homogenity.
The quality of the 3D mesh depends on what you want to achieve and what is your input. For example if you want realistic shadows and lighting then you need very good mesh. If the camera is fixed in some room you can predefine the mesh manually (hard code it) and compute just the objects in view. Also the objects detection/segmentation can be done very simply by substracting the empty room image from current view image so the pixels with big difference are the objects.
you can also use planes instead of real 3D mesh as you suggested in the OP but then you can forget about more realistic quality of effects like lighting,shadows,intersections... if you assume the objects are standing straight then you can use room metrics to obtain the distance from camera. see:
selection criteria for different projections
estimate measure of photographed things
For pure 2D input you can also use the illumination to estimate the 3D mesh see:
Turn any 2D image into 3D printable sculpture with code
render
Just render the scene back to some image/video/screen... with added/removed features. If you are not changing the light conditions too much you can also use the original image and render directly to it. Shadows can be achieved by darkening the pixels ... For better results with this the illumination/shadows/spots/etc. are usually filtered out from the original image and then added directly by rendering instead. see
White balance (Color Suppression) Formula?
Enhancing dynamic range and normalizing illumination
The rendering process itself is also platform dependent (unless you are doing it by low level graphics in memory). You can use things like GDI,DX,OpenGL,... see:
Graphics rendering
You also need camera parameters for rendering like:
Transformation of 3D objects related to vanishing points and horizon line
[Basic topics to google/read]
2D
DIP digital image processing
Image Segmentation
3D
Vector math
Homogenous coordinates
3D scene reconstruction
3D graphics
normal shading
paltform dependent
image acquisition
rendering

Image analysis - how can I extract image label out of bottle

I am thinking of using OpenCV library for image analysis. Basically I want to automate in my project the extraction of image label from wine bottle.
This is the sample input image:
This is the sample output:
I am thinking what should be my general strategy to extract the image. I am not asking for direct code. Just want to know the general approach to solve the problem.
Thanks!
Sorry for vage answer but in applied computer vision is no such thing like general approach.
some will disagree of course but in reality
all CV applications are custom made for some specific purpose/task
in your case is the idea to find cylindric and probably standing object (bottle)
and then finding of irregular parts in it
I would do it like this:
1.remove noise as much as possible (smooth/sharpen filters)
2.(optionaly) reduce image data (via (i)FT or (i)DCT for example)
3.segmentate objects (usually by homogenity of color or by edge detection or by booth)
4.identify bottle object (by color,shape,or illumination (glass is transparent))
5.identify objects inside bottle (homogenity,not transparent,usually sharp edges,color is not good some labels are black on dark glass)
6.(optional) project label back from cylindric space to flat texture
[notes]
create app with many scrollbars and checkboxes
to be able to change all tresholds and enable disable filters or their order on the run
all parts will take a lot of tweaking of tresholds and weights
you have to do a lot of trial and error runs to find the best filters and their config for your task

Transform a set of 2d images representing all dimensions of an object into a 3d model

Given a set of 2d images that cover all dimensions of an object (e.g. a car and its roof/sides/front/read), how could I transform this into a 3d objdct?
Is there any libraries that could do this?
Thanks
These "2D images" are usually called "textures". You probably want a 3D library which allows you to specify a 3D model with bitmap textures. The library would depend on platform you are using, but start with looking at OpenGL!
OpenGL for PHP
OpenGL for Java
... etc.
I've heard of the program "Poser" doing this using heuristics for human forms, but otherwise I don't believe this is actually theoretically possible. You are asking to construct volumetric data from flat data (inferring the third dimension.)
I think you'd have to make a ton of assumptions about your geometry, and even then, you'd only really have a shell of the object. If you did this well, you'd have a contiguous surface representing the boundary of the object - not a volumetric object itself.
What you can do, like Tomas suggested, is slap these 2d images onto something. However, you still will need to construct a triangle mesh surface, and actually do all the modeling, for this to present a 3D surface.
I hope this helps.
What there is currently that can do anything close to what you are asking for automagically is extremely proprietary. No libraries, but there are some products.
This core issue is matching corresponding points in the images and being able to say, this spot in image A is this spot in image B, and they both match this spot in image C, etc.
There are three ways to go about this, manually matching (you have the photos and have to use your own brain to find the corresponding points), coded targets, and texture matching.
PhotoModeller, www.photomodeller.com, $1,145.00US, supports manual matching and coded targets. You print out a bunch of images, attach them to your object, shoot your photos, and the software finds the targets in each picture and creates a 3D object based on those points.
PhotoModeller Scanner, $2,595.00US, adds texture matching. Tiny bits of the the images are compared to see if they represent the same source area.
Both PhotoModeller products depend on shooting the images with a calibrated camera where you use a consistent focal length for every shot and you got through a calibration process to map the lens distortion of the camera.
If you can do manual matching, the Match Photo feature of Google SketchUp may do the job, and SketchUp is free. If you can shoot new photos, you can add your own targets like colored sticker dots to the object to help you generate contours.
If your images are drawings, like profile, plan view, etc. PhotoModeller will not help you, but SketchUp may be just the tool you need. You will have to build up each part manually because you will have to supply the intelligence to recognize which lines and points correspond from drawing to drawing.
I hope this helps.

Resources