I'm having a problem with ffmpeg video encoding using GPU (CUDA).
I have 2x NVIDIA GTX 1050 Ti cards.
The problem comes when I try to do multiple parallel encodings: with more than 2 processes, ffmpeg dies like this:
[h264_nvenc @ 0xcc1cc0] OpenEncodeSessionEx failed: out of memory (10)
The odd thing is that nvidia-smi shows plenty of resources available on the GPUs:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.66 Driver Version: 384.66 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX 105... Off | 00000000:41:00.0 Off | N/A |
| 40% 37C P0 42W / 75W | 177MiB / 4038MiB | 30% Default |
+-------------------------------+----------------------+----------------------+
| 1 GeForce GTX 105... Off | 00000000:42:00.0 Off | N/A |
| 40% 21C P8 35W / 75W | 10MiB / 4038MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
The second GPU doesn't seem to be used at all, and there's more than enough memory left on the first one to support a third encode.
Any ideas would be extremely helpful!
Actually, your card is 'non-qualified' (in NVIDIA's terms) and supports only 2 simultaneous NVENC sessions. You can consult https://developer.nvidia.com/video-encode-decode-gpu-support-matrix#Encoder or download the NVENC SDK, which contains a PDF with the license terms for qualified and non-qualified GPUs. There are patches for the drivers that disable the session-count check; you could try them: https://github.com/keylase/nvidia-patch
Since there's no code showing how you set up the encoding context, I can't tell why the second GPU is not used. Have you selected it via av_opt_set() or a command-line argument?
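For example, a minimal command-line sketch (file names are placeholders) that pins an encode to the second GPU via the h264_nvenc encoder's -gpu option; note that this only selects the device, it does not lift the session limit mentioned below:
ffmpeg -i input.mp4 -c:v h264_nvenc -gpu 1 -c:a copy output.mp4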
The more important problem here is that GeForce cards cannot run more than 2 encoding sessions per system. If you need more, you have to use the more expensive cards such as Quadro or Tesla.
For my first pentesting certification exam I have to prepare a virtual lab in order to locally analyze a vulnerable binary and build a BOF exploit, which I then have to use against a remote target machine. As far as I know I will not have any access to the target host except the vulnerable service, so it won't be possible to analyze the program on the target machine as in the labs and the BOF exam prep course of TryHackMe. I will have to set up my own local target machine, run the binary there, analyze it, prepare the exploit and run it against the remote target machine.
Now I am facing multiple problems while setting up my local virtual test environment.
I installed both a Windows 7 32-bit and a Windows 10 32-bit virtual machine. On both machines I installed Python 2.7.1, Immunity Debugger and mona.py. On Windows 7 there was no Defender running; on Windows 10 I disabled Defender real-time protection.
Afterwards I uploaded the binary to both machines and went through the standard process of building an OSCP-level stack-based BOF exploit:
Crashing the program with a fuzzer
Identifying the offset to the return address
Identifying bad characters
Next, I wanted to use mona.py to find a JMP ESP instruction (or something similar), as I always did in the labs. Now the problems started: mona.py returned 0 pointers when I entered the following command:
!mona jmp -r esp -cpb "\x00\x0a\x0d"
Usually (in the labs I did) I got a list of possible JMP ESP instructions with their memory addresses, but in my own environment I got the following mona output:
0BADF00D !mona jmp -r esp -cpb "\x00\x0a\x0d"
---------- Mona command started on 2022-07-16 17:59:06 (v2.0, rev 616) ----------
0BADF00D [+] Processing arguments and criteria
0BADF00D - Pointer access level : X
0BADF00D - Bad char filter will be applied to pointers : "\x00\x0a\x0d"
0BADF00D [+] Generating module info table, hang on...
0BADF00D - Processing modules
0BADF00D - Done. Let's rock 'n roll.
0BADF00D [+] Querying 1 modules
0BADF00D - Querying module 32bitftp.exe
6ED20000 Modules C:\Windows\System32\rasadhlp.dll
0BADF00D - Search complete, processing results
0BADF00D [+] Preparing output file 'jmp.txt'
0BADF00D - (Re)setting logfile jmp.txt
0BADF00D Found a total of 0 pointers
0BADF00D
0BADF00D [+] This mona.py action took 0:00:03.265000
I noticed that only one module (32bitftp.exe) was queried. In the course labs, many more (system) modules were queried. So I asked myself why and used the
!mona modules
command to check the modules. I got the following output:
0BADF00D !mona modules
---------- Mona command started on 2022-07-16 18:04:03 (v2.0, rev 616) ----------
0BADF00D [+] Processing arguments and criteria
0BADF00D - Pointer access level : X
0BADF00D [+] Generating module info table, hang on...
0BADF00D - Processing modules
0BADF00D - Done. Let's rock 'n roll.
0BADF00D  -----------------------------------------------------------------------------------------------------------------------------------------
0BADF00D  Module info :
0BADF00D  -----------------------------------------------------------------------------------------------------------------------------------------
0BADF00D  Base       | Top        | Size       | Rebase | SafeSEH | ASLR | NXCompat | OS Dll | Version, Modulename & Path
0BADF00D  -----------------------------------------------------------------------------------------------------------------------------------------
0BADF00D  0x74ef0000 | 0x75010000 | 0x00120000 | True   | True    | True | False    | True   | 10.0.19041.789 [ucrtbase.dll] (C:\Windows\System32\ucrtbase.dll)
0BADF00D  0x715a0000 | 0x715b6000 | 0x00016000 | True   | True    | True | False    | True   | 10.0.19041.1151 [NLAapi.dll] (C:\Windows\system32\NLAapi.dll)
0BADF00D  0x74e70000 | 0x74eeb000 | 0x0007b000 | True   | True    | True | False    | True   | 10.0.19041.789 [msvcp_win.dll] (C:\Windows\System32\msvcp_win.dll)
0BADF00D  0x72ee0000 | 0x72f7f000 | 0x0009f000 | True   | True    | True | False    | True   | 10.0.19041.1 [apphelp.dll] (C:\Windows\SYSTEM32\apphelp.dll)
0BADF00D  0x74480000 | 0x74511000 | 0x00091000 | True   | True    | True | False    | True   | 10.0.19041.1 [DNSAPI.dll] (C:\Windows\SYSTEM32\DNSAPI.dll)
0BADF00D  0x760f0000 | 0x761af000 | 0x000bf000 | True   | True    | True | False    | True   | 7.0.19041.546 [msvcrt.dll] (C:\Windows\System32\msvcrt.dll)
0BADF00D  0x72880000 | 0x72afe000 | 0x0027e000 | True   | True    | True | False    | True   | 10.0.19041.546 [CoreUIComponents.dll] (C:\Windows\System32\CoreUIComponents.dll)
0BADF00D  0x76ef0000 | 0x7708e000 | 0x0019e000 | True   | True    | True | False    | True   | 10.0.19041.1023 [ntdll.dll] (C:\Windows\SYSTEM32\ntdll.dll)
0BADF00D  0x68df0000 | 0x68e06000 | 0x00016000 | True   | True    | True | False    | True   | 10.0.19041.1 [pnrpnsp.dll] (C:\Windows\system32\pnrpnsp.dll)
0BADF00D  0x640b0000 | 0x640c0000 | 0x00010000 | True   | True    | True | False    | True   | 10.0.19041.546 [wshbth.dll] (C:\Windows\system32\wshbth.dll)
[...]
Every module has ASLR, Rebase and SafeSEH enabled. I have some basic knowledge of these security mechanisms, but I'm pretty sure the exam will not require me to bypass them. In the labs, there have always been modules with ASLR, Rebase and SafeSEH disabled. So I came to the conclusion that mona.py didn't show me any results because these mechanisms are enabled.
My next idea was of course to turn off ASLR and DEP on my local Windows machines. After some research, I found out that on Windows 7 DEP can be disabled with the following command
bcdedit.exe /set {current} nx AlwaysOff
and ASLR can be disabled by using regedit to create a new 32-bit DWORD value "MoveImages" under [HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Session Manager\Memory Management]. After a reboot, ASLR should be disabled.
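For reference, the same value can also be created from an elevated command prompt; a sketch, assuming the value 0 (which is what this tweak conventionally uses to disable ASLR):
reg add "HKLM\SYSTEM\CurrentControlSet\Control\Session Manager\Memory Management" /v MoveImages /t REG_DWORD /d 0 /f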
But it's not! If I use
!mona modules
after the reboot, the output stays the same: all security mechanisms (including ASLR) are still turned on. After some further research I was not able to find a way to disable it on Windows 7.
So I tried it on Windows 10. Here I did not have to create a new registry key. DEP and ASLR could be disabled under "Windows Security -> App and Browser Control -> Exploit Protection". After a reboot, the mechanisms should be disabled. But again: They are not!
If I load the program into ImmunityDebugger and use
!mona modules
to show the modules, the table is still unchanged, showing that all system modules have ASLR turned on.
Of course I was able to get a JMP ESP instruction from, for example, kernel32.dll with the following command:
!mona jmp -r esp -cpb "\x00\x0a\x0d" -m kernel32.dll
If I use it to exploit the BOF while the local Windows 7/10 system is still running, it works fine. But after a reboot the system modules' addresses change, thanks to ASLR, and the hard-coded addresses won't work anymore.
And of course, if I use the exploit against the remote target system, it will fail.
So my questions are:
What am I doing wrong? (Maybe I'm thinking about this the wrong way.)
How can I really disable ASLR and DEP on Windows 7/10 systems?
In the exam, how can I know which modules on the remote target server have ASLR turned on? Even if I manage to turn off my local ASLR, I might be unlucky and pick a module that has ASLR turned on on the remote target host...
Since my exam is not far away anymore, I would be very, very happy if someone could help me out with this. Anyway, thanks so much for taking the time to read all of this :)
20 days ago I successfully provisioned an ESP32 and that device has been working fine.
Today I successfully provisioned a second ESP32 chip on another computer:
5.40 MiB / 5.40 MiB [------------------------------------] 100.00% 14.69 MiB p/s
looking for available hardware identities on disk
no hardware identities found on disk, claiming new hardware identity
Flashing device on port /dev/ttyUSB0
+--------------------------+--------------------------------------+
| SETTING | VALUE |
+--------------------------+--------------------------------------+
| Firmware | v1.0.2 |
| Device Model | esp32-4mb |
| Hardware ID | XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX |
| Hardware Batch & Seq. No | 2020-11-10#524 |
| context | remote |
| broker.host | device.toit.io |
| broker.cn | device.toit.io |
| broker.port | 9426 |
| wifi.ssid | SureDemo |
| wifi.password | suremote |
+--------------------------+--------------------------------------+
erasing device flash
successfully erased device flash
writing device partitions
successfully written device partitions
reading hardware chip information
successfully read hardware chip information
+--------------------------+--------------------------------------+
| SETTING | VALUE |
+--------------------------+--------------------------------------+
| factory device model | esp32-4mb |
| factory firmware version | v1.0.2 |
| chip ID | |
+--------------------------+--------------------------------------+
device was successfully flashed
However, I cannot start the application on this device:
michael_k@michaelk:~/toit_apps/Hsm2/tests$ toit run test_hsm_switch_async_4.toit
No default device set. Provide the device name (with the --device flag) to the command
michael_k@michaelk:~/toit_apps/Hsm2/tests$
I realized that this new device needs to be given a different name from my default device micrcx-1. By the way, I can see my first device:
michael_k@michaelk:~/toit_apps/Hsm2/tests$ toit devices
+--------------------------------------+----------+-------------------+----------+
| DEVICE ID | NAME | LAST SEEN | FIRMWARE |
+--------------------------------------+----------+-------------------+----------+
| XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX | micrcx-1 | Apr 29 2021 04:05 | v1.0.2 |
+--------------------------------------+----------+-------------------+----------+
michael_k@michaelk:~/toit_apps/Hsm2/tests$
So the question is: how do I give a name to the new device, and how do I run an application on it?
Thanks in advance, MK
PS. Naturally, I could be wrong, but as far as I remember the name of the first device was assigned automatically by the Toit system; I had nothing to do with it. micrcx is my computer's identifier.
It might be that your device wasn't claimed yet.
In the current release (but hopefully not in future releases), provisioning a device only puts the Toit framework on the device. At this point it is not yet associated with your account and must be claimed.
You can simply run:
toit device claim <hardware-ID> or toit device claim <hardware-ID> --name=<some-name>.
If no name is provided, then the system generates one. Typically these are built out of two words, for example nervous-plastic. You can always change the names at a later point.
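For example, a sketch of claiming and then running on the new device (micrcx-2 is just a placeholder name; --device is the flag mentioned in the error message above):
toit device claim <hardware-ID> --name=micrcx-2
toit run --device micrcx-2 test_hsm_switch_async_4.toit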
Alternatively you can claim the device in the web UI. There is a "CLAIM OR REPLACE DEVICE" button on the top right of the "Devices" view.
FYI: I have edited your post to remove the hardware ID of the new device, so nobody else claims the device in the meantime.
I'm new to the OmniSci open-source community. I have followed the instructions (https://www.omnisci.com/docs/latest/4_ubuntu-apt-gpu-os-recipe.html) to install OmniSci (open-source version) on my Ubuntu 18.04 LTS machine:
~$ sudo systemctl start omnisci_server
~$ $OMNISCI_PATH/bin/omnisql
Password:
User mapd connected to database maps
omnisql>
I have also installed the CUDA 10.0 driver:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 415.27 Driver Version: 415.27 CUDA Version: 10.0 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
|   0  TITAN V             Off  | 00000000:17:00.0 Off |                  N/A |
| 33%   48C    P8    30W / 250W |    421MiB / 12036MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX 108...  Off  | 00000000:65:00.0 Off |                  N/A |
| 30%   53C    P8    20W / 250W |    172MiB / 11178MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  GeForce GTX 108...  Off  | 00000000:66:00.0  On |                  N/A |
| 63%   81C    P0    70W / 250W |    829MiB / 11175MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
However, when I run a simple query on the sample dataset, it reports an error:
omnisql> \t
omnisci_states
omnisci_counties
omnisci_countries
nyc_trees_2015_683k
omnisql> select * from omnisci_counties;
Exception: device kernel image is invalid
My friend and I discussed this issue. We believe it is because I have two different types of GPUs in my machine. I need to specify one type of GPU when starting the omnisci server, because the query engine of OmniSci gets confused when initializing parameters for two different types of cards.
Does anyone have any idea or suggestion?
I just figured it out myself. The GPUs used by the OmniSci server have to be consistent: you can use multiple cards, but they have to be the same type.
For instance, in my case I set the following parameters in omnisci.conf:
port = 6274
http-port = 6278
calcite-port = 6279
data = "/var/lib/omnisci/data"
null-div-by-zero = true
num-gpus = 2
start-gpu = 1
When you use sudo systemctl start omnisci_server to start the server, omnisci.conf is loaded automatically.
When using multiple GPUs, they need to be the same model. Per the OmniSci FAQ:
https://www.omnisci.com/docs/latest/7_faq.html#multi-gpus
Does OmniSci support a single server with different GPUs? For example, can I install OmniSci on one server with two NVIDIA GTX 760 GPUs and two NVIDIA GTX TITAN GPUs?
OmniSci does not support mixing different GPU models. Initially, you might not notice many issues with that configuration because the GPUs are the same generation. However, in this case you should consider removing the GTX 760 GPUs, or configure OmniSci to not use them.
To configure OmniSci to use specific GPUs:
Run the nvidia-smi command to see the GPU IDs of the GTX 760s. Most likely, the GPUs are grouped together by type. Edit the omnisci_server config file as follows: if the GTX 760 GPUs are 0,1, configure omnisci_server with the option start-gpu=2 to use the remaining two TITAN GPUs. If the GTX 760s are 2,3, add the option num-gpus=2 to the config file. The location of the config file depends on how you installed OmniSci.
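As a concrete sketch for the hardware in the question (assuming, as the nvidia-smi output above shows, that the TITAN V is GPU 0 and the two GeForce cards are GPUs 1 and 2, and that you want OmniSci to use only the two identical GeForce cards), first confirm the IDs with:
nvidia-smi -L
and then set in omnisci.conf:
start-gpu = 1
num-gpus = 2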
I have quite a small neural network with two fully connected sigmoid subnetworks 10->100->10, whose outputs are concatenated and then fed into another 20->100->1 network.
This architecture is quite small, with just a few weight matrices, the largest being 20x100 = 2000 weights.
Even though I am using Theano with all flags set to use GPU acceleration, the system reaches only 132 iterations (data points!) per second. I am not using minibatches because it's not your typical neural network, but a matrix factorization model.
The system I am using has the following specs:
OS
uname -a
Linux node081 2.6.32-358.18.1.el6.x86_64 #1 SMP Wed Aug 28 17:19:38 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux
software
Python 3.5.2 (compiled by myself)
theano 0.9.0dev2.dev-ee4c4e21b9e9037f2aa9626c3d779382840ea2e3
NumPy 1.11.2
cpu
Intel(R) Xeon(R) CPU E5-2620 0 @ 2.00GHz
nproc returns 12, so there should be 12 logical processors (cores or hardware threads)
gpu
this is the output of nvidia-smi (with my process running):
+------------------------------------------------------+
| NVIDIA-SMI 5.319.49 Driver Version: 319.49 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla K20m On | 0000:03:00.0 Off | Off |
| N/A 29C P0 49W / 225W | 92MB / 5119MB | 19% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Compute processes: GPU Memory |
| GPU PID Process name Usage |
|=============================================================================|
| 0 5903 python3 77MB |
+-----------------------------------------------------------------------------+
command and env vars
OMP_NUM_THREADS=8 THEANO_FLAGS=mode=FAST_RUN,device=gpu,init_gpu_device=gpu,floatX=float32,nvcc.flags=-D_FORCE_INLINES,print_active_device=True,enable_initial_driver_test=True,warn_float64=raise,force_device=True,assert_no_cpu_op=raise python3 $@
theano config settings set at run-time
theano.config.optimizer='fast_run'
theano.config.openmp=True
theano.config.openmp_elemwise_minsize=4
theano.config.floatX='float32'
theano.config.assert_no_cpu_op='raise'
I also tried deactivating OpenMP, and it runs slightly slower.
It seems that I took all the precautions needed to make sure GPU acceleration is set up correctly. What might be the reason I am getting only 132 gradient updates per second? Is there any further check I should perform?
In Theano:
Compilation is much faster with optimizer=fast_compile than with optimizer=fast_run.
Using the new backend, with the help of the new optimizer, compilation time has improved by as much as 5X on certain networks. I'd suggest you always stick with the new backend. You can use the new backend by setting the device=cuda flag.
While you're using the new backend, I'd advise you to use the bleeding-edge version if you want speed. There are a lot of optimizations landing every week that have the potential to give big speedups.
From the docs, you can set the flag allow_gc=False to get faster speed.
You can set the config.nvcc.fastmath flag to True if you want some speedup from division and multiplication operations at the cost of precision.
If you have convolutional operations in your network, you can set some config.dnn flags depending on your network and needs. Also, setting the cnmem flag will help.
Finally, whenever you report that some code is slow, please share profiling results to help development :)
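As a sketch only (the script name train.py is a placeholder, and not every flag applies to every Theano version), the suggestions above could be combined into an invocation like:
OMP_NUM_THREADS=8 THEANO_FLAGS=mode=FAST_RUN,device=cuda,floatX=float32,allow_gc=False,nvcc.fastmath=True,profile=True python3 train.py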
When debugging x86 assembly code in VS2013, I needed to check the contents of the FLAGS register. However, when I enabled "Flags" in the Registers window, I got:
OV = 0 UP = 0 EI = 1 PL = 1 ZR = 0 AC = 1 PE = 0 CY = 1
Those don't correspond to typical ODITSZAPC flags of x86; can anyone explain to me what's going on? Are those just weird names for the same flags?
I have a 64-bit Core i7; can it affect the displayed names?
| Flag      | VS2013 name |
|-----------|-------------|
| Overflow  | OV          |
| Direction | UP          |
| Interrupt | EI          |
| Sign      | PL          |
| Zero      | ZR          |
| Auxiliary | AC          |
| Parity    | PE          |
| Carry     | CY          |
MSDN reference
Yes, of course they are the same flags; what else would they be?
But those names really are misleading. When UP = 1 it actually means the direction flag is set (STD, i.e. backward direction), and when PL = 1 it actually means the sign flag is set (negative). Why the VS designers tried to break an ASM convention that hadn't been broken for ages is beyond my comprehension, though.
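For reference, the correspondence with the standard x86 flag names (matching the table in the other answer) is:
OV = OF (overflow), UP = DF (direction), EI = IF (interrupt enable), PL = SF (sign), ZR = ZF (zero), AC = AF (auxiliary carry), PE = PF (parity), CY = CF (carry)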
In the same way, GNU's gdb calls the instruction pointer (IP) $pc, a DWORD/DD a word (w), a WORD/DW a half-word (h), and a QUADWORD/DQ a giant (g)??
C++ programmers are really weird; they like to break conventions for the sake of it.