I have a Dockerfile I made for a TensorFlow Object Detection API pipeline, which currently uses intelpython3 and conda's tensorflow-gpu. I'm using it to run models like single-shot detectors (SSD) and Faster R-CNN.
I'm curious whether the hassle of replacing the simple conda install of tensorflow-gpu with everything needed to build TensorFlow from source in the Dockerfile would result in a worthwhile training speed increase. The server I'm currently using has: Intel Xeon E5-2687W v4 3.00 GHz, 4x Nvidia GTX 1080, 128 GB RAM, SSD.
I don't need benchmarks (unless they exist), but I really have no idea what to expect performance-wise between the two, so any estimates would be greatly appreciated. Also, if someone wouldn't mind explaining which parts of training would actually see optimizations compared to the conda tensorflow-gpu, that would be really awesome.
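In case it helps frame an answer, this is the kind of crude micro-benchmark I was planning to run once in the conda image and once in a source-built image (a rough sketch only, TF 1.x-style API; the op mix is arbitrary and not a substitute for timing my real training job):

```python
# Rough micro-benchmark sketch (TF 1.x-style API). Run it under each build and
# compare; the shapes and op mix are arbitrary placeholders.
import time
import tensorflow as tf

def time_conv(device, iterations=50):
    tf.reset_default_graph()
    with tf.device(device):
        x = tf.random_normal([32, 224, 224, 3])
        w = tf.random_normal([3, 3, 3, 64])
        y = tf.nn.conv2d(x, w, strides=[1, 1, 1, 1], padding='SAME')
        total = tf.reduce_sum(y)
    with tf.Session() as sess:
        sess.run(total)  # warm-up: kernel selection, memory allocation
        start = time.time()
        for _ in range(iterations):
            sess.run(total)
        return (time.time() - start) / iterations

print('GPU conv2d: %.4f s/iter' % time_conv('/gpu:0'))
print('CPU conv2d: %.4f s/iter' % time_conv('/cpu:0'))
```

I figured timing the CPU path separately would also show whether the CPU-side work is where a source build could even make a difference for me.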
I am working on a GPU server from my college with compute capability less than 3.0, Windows 7 Professional (64-bit) and 48 GB RAM. I tried to install TensorFlow earlier, but then I learned that my GPU cannot support it.
I now want to work with Keras, but since TensorFlow is not there, will it work or not? I am also not able to import it.
I have to do video processing and work on big video datasets for Dynamic Sign Language Recognition. Can anyone suggest what I can do to get going in the field of deep learning with such a GPU server? Or, if I want to work on CPU only, will there be any problem in this field of video processing?
I also have an HP ProBook 440 G4 laptop with Windows 10 Pro, so is it better than the GPU server I have or not?
I am totally new to this field and cannot find a way to work properly in it.
Your opinions are needed right now!
The 'dxdiag' information for my laptop is shown in the attached screenshots.
Thanks in advance!
For Keras to work you need either TensorFlow or Theano as the backend. Your laptop seems to have a GeForce 930M GPU. This card has a compute capability of 5.0 according to the NVIDIA documentation (https://developer.nvidia.com/cuda-gpus). So you are better off with your laptop, if my research is right.
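A quick way to verify this on either machine (assuming a TF 1.x install) is to ask TensorFlow directly what it was built with and which devices it can see:

```python
# Sanity check, assuming a TF 1.x install: is this build CUDA-enabled,
# and does it actually see a GPU device?
import tensorflow as tf
from tensorflow.python.client import device_lib

print(tf.__version__)
print(tf.test.is_built_with_cuda())  # False for the plain CPU-only package
print([d.name for d in device_lib.list_local_devices()])  # look for '/device:GPU:0'
```

If no GPU device shows up on the laptop, installing the tensorflow-gpu package plus the matching CUDA/cuDNN versions is the first thing to check.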
I guess you will use CNNs for your video processing, and therefore I would advise you to use a GPU. You can also run your code on a CPU, but training will be much slower, since GPUs are made for parallel computing and CPUs are not (the big matrix multiplications profit a lot from parallel computing).
Maybe you could try a cloud computing provider if you think training is too slow on your laptop.
I have recently upgraded from TF 1.4 to 1.5. The process went through smoothly and everything seems to work as before. But I noticed that training performance has dropped significantly: training is typically 30% to 130% slower. Training time for my models increased from about 1 hour to more than two hours. The GPU load also dropped by about 50%.
I am wondering what caused this performance reduction? How can I fix the problem? My system configuration is: Windows 7, x64, AMD CPU, GTX-1070/8GB, Python 3.5.2.
I also noticed a significant decrease in performance with TF 1.5.0 on Windows 7 64bit, just tried it today.
I did upgrade to CUDA 9.0 and cuDNN 7.0, and I have an Intel Xeon, a Quadro K4000, and Python 3.6.4.
I will downgrade TF back to 1.4 (and CUDA/cuDNN accordingly) to make sure. If I find that it is faster again with the lower versions, I will open an issue on the TF GitHub and reference this post.
EDIT:
I ended up testing all four of tensorflow 1.4.0, tensorflow 1.5.0, tensorflow-gpu 1.4.0, and tensorflow-gpu 1.5.0 on a couple of different networks I had been working on.
When I wrote my original response, the network I was working on was just a fairly simple RNN. I think it is known that GPUs can actually perform worse than CPUs on RNN networks. The hypothetical reason, and I think it makes sense, is that RNNs have far fewer components of the computation that can be parallelized. GPUs are able to perform very fast because they contain a very high number of cores, which can compute in parallel. Indeed, when using OpenHardwareMonitor, the GPU core total load only got up to a 60% peak on 1.4.0 and a 52% peak on 1.5.0.
So on that network, the computer's Xeon CPU actually does a pretty good job.
Interestingly, there was still a small slowdown going from 1.4.0 to 1.5.0: about 25% longer on the -gpu version and about 7% longer on the regular version.
But when I tested on a different network that contained convolutional operations, the GPU did perform significantly faster, and 1.5.0 was faster than 1.4.0 in both the -gpu and regular versions, by around 10%.
So at the end of the day, I think it depends on what type of network/operations you're working with when deciding whether the -gpu version is best, and whether 1.4.0 or 1.5.0 is best.
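For reference, this is roughly the kind of A/B timing I mean, sketched here with TF 1.x and an arbitrary small LSTM rather than my actual networks: pin the same graph to the CPU and then to the GPU and time it.

```python
# Sketch of an A/B timing (TF 1.x API): the same small RNN, once pinned to the
# CPU and once to the GPU. Shapes and sizes are arbitrary placeholders.
import time
import tensorflow as tf

def time_rnn(device, iterations=20):
    tf.reset_default_graph()
    with tf.device(device):
        x = tf.random_normal([32, 100, 128])           # batch, time steps, features
        cell = tf.nn.rnn_cell.BasicLSTMCell(256)
        outputs, _ = tf.nn.dynamic_rnn(cell, x, dtype=tf.float32)
        loss = tf.reduce_mean(outputs)
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        sess.run(loss)                                  # warm-up
        start = time.time()
        for _ in range(iterations):
            sess.run(loss)
        return (time.time() - start) / iterations

print('CPU: %.4f s/iter' % time_rnn('/cpu:0'))
print('GPU: %.4f s/iter' % time_rnn('/gpu:0'))
```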
I'm looking into performing object detection (not just classification) using CNNs; I currently only have access to Windows platforms but can install Linux distributions if necessary. I would like to assess a number of existing techniques, but most available code is for Linux.
I am aware of the following:
Faster R-CNN (CNTK, Caffe w/ Matlab)
R-CNN (Matlab with loads of toolboxes)
R-FCN (Caffe w/ Matlab)
From what I can see, there are no TensorFlow implementations for Windows currently available. Am I missing anything, or do I just need to install Ubuntu if I want to try more?
EDIT: A Windows version of YOLO can be found here: https://github.com/AlexeyAB/darknet
There is a TensorFlow implementation for Windows, but honestly it's always 1-2 steps behind Linux.
One thing I would add to your list (which was not already mentioned) is Mask R-CNN.
You can also look for some implementation of it for TensorFlow, like this one: https://github.com/CharlesShang/FastMaskRCNN
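For TensorFlow specifically, the Object Detection API's pre-trained models (Faster R-CNN, SSD, etc.) are distributed as frozen graphs, so inference itself only needs the Windows TensorFlow build. A minimal sketch (TF 1.x; the .pb and image paths below are placeholders):

```python
# Minimal sketch of running a frozen detection model exported by the
# TensorFlow Object Detection API (TF 1.x). Paths are placeholders.
import numpy as np
import tensorflow as tf
from PIL import Image

graph_def = tf.GraphDef()
with tf.gfile.GFile('frozen_inference_graph.pb', 'rb') as f:
    graph_def.ParseFromString(f.read())

with tf.Graph().as_default() as graph:
    tf.import_graph_def(graph_def, name='')

# Exported detection graphs expect a uint8 batch of shape [1, H, W, 3].
image = np.expand_dims(np.array(Image.open('test.jpg')), axis=0)

with tf.Session(graph=graph) as sess:
    boxes, scores, classes = sess.run(
        ['detection_boxes:0', 'detection_scores:0', 'detection_classes:0'],
        feed_dict={'image_tensor:0': image})

print(boxes.shape, scores[0][:5], classes[0][:5])
```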
I'm looking for a CPU architecture that is supported by GCC (and is still maintained) for which it is easiest to implement a software simulator.
It should be something simple, with a flat memory model, a 16-bit+ address space, and a 16-32 bit ALU; good code density is preferred, since it will be running programs under program-memory limitations.
Just a few words about the origin of these requirements: I need a virtual CPU for running 'sandboxed' programs. It will be running on microcontrollers with ~5 KB of RAM and an ARM CPU at a ~20 MHz clock speed.
Performance is not an issue at all; what I really need is to write C/C++ programs and then run them in the sandbox without the stdlib. GCC can help with writing the programs; I just need to implement a VCPU for one of its target architectures.
I've gotten acquainted with the ARMv7-M and AVR32 references and found them pretty acceptable, but somewhat more powerful than I need. The less/simpler code I have to write for the VCPU implementation, the sooner I will have what I need, and the fewer bugs there will be.
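To make it concrete what I mean by "VCPU": the core is just a fetch/decode/execute loop over flat memory, so the simpler the target's instruction encoding, the less of this there is to write and debug. A toy sketch in Python with a made-up fixed 3-byte instruction format (purely illustrative, not any real ISA):

```python
# Toy VCPU illustration (made-up ISA): fetch/decode/execute over a flat
# memory array with a handful of opcodes and a hypothetical 4-register file.
MEM_SIZE = 4096

def run(program, max_steps=10000):
    mem = bytearray(MEM_SIZE)
    mem[:len(program)] = program
    regs = [0] * 4                      # hypothetical 4-register machine
    pc = 0
    for _ in range(max_steps):
        op, a, b = mem[pc], mem[pc + 1], mem[pc + 2]
        pc += 3
        if op == 0x00:                  # HALT
            break
        elif op == 0x01:                # LOADI: r[a] <- immediate b
            regs[a] = b
        elif op == 0x02:                # ADD: r[a] <- r[a] + r[b] (16-bit wrap)
            regs[a] = (regs[a] + regs[b]) & 0xFFFF
        elif op == 0x03:                # STORE: mem[b] <- low byte of r[a]
            mem[b] = regs[a] & 0xFF
        else:
            raise ValueError('illegal opcode 0x%02x at %d' % (op, pc - 3))
    return regs, mem

# Compute 1 + 2: LOADI r0,1; LOADI r1,2; ADD r0,r1; HALT
regs, _ = run(bytes([0x01, 0, 1, 0x01, 1, 2, 0x02, 0, 1, 0x00, 0, 0]))
print(regs[0])  # 3
```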
UPDATE:
Seems like I found what I need. It was already answered here: What is the smallest, simplest CPU that gcc can compile for?
Thank you all.
I'm doing a VS2010 installation on a VM and one on a physical PC.
VM Spec:
Xeon CPU 3.33 GHz (dual core)
Windows 7 64-bit
4 GB of RAM
Physical PC:
Dual-core CPU (speed unknown at this time)
Windows 7 64-bit
6 GB of RAM
My question is: what is the best way to run some sort of benchmark test with VS2010 to determine which setup has the better performance?
Thanks.
Real-life benchmarking is easy to do: take a project you are working on (or any project similar to it in structure), and measure the time of a full rebuild.
If the project does not take long enough for the results to be representative or interesting, then I would say: why would you care about the performance at all?
As projects by different teams differ a lot (some use more templates, some more complex and difficult-to-optimize expressions, some lots of small files, some lots of libraries ..., some C++, some C#), I doubt there could exist a "universal benchmark project" useful enough for you. Taking the real project your developers are working on is the most representative thing you can do.
If you just want a rough "order of magnitude" comparison, you can simply download some large enough open source project in the same language as yours. E.g. for C you might want to try something like the Ogg library source or the libpng source.
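If you want to script the rebuild timing so both machines are measured the same way, something like this works (a rough sketch; the solution path is a placeholder and msbuild is assumed to be on PATH, e.g. from a Visual Studio command prompt):

```python
# Rough sketch: time N full rebuilds of the same solution on each machine.
import subprocess
import time

SOLUTION = r'C:\path\to\YourSolution.sln'   # placeholder
RUNS = 3

times = []
for i in range(RUNS):
    start = time.time()
    subprocess.check_call(['msbuild', SOLUTION, '/t:Rebuild',
                           '/p:Configuration=Release', '/nologo', '/v:quiet'])
    times.append(time.time() - start)
    print('run %d: %.1f s' % (i + 1, times[-1]))

print('average: %.1f s' % (sum(times) / len(times)))
```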