How to run a prediction on GPU? - h2o

I am using h2o4gpu with an h2o4gpu.solvers.xgboost.RandomForestClassifier model. The parameters I have set are:
XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bytree=1.0, gamma=0, learning_rate=0.1, max_delta_step=0,
max_depth=8, min_child_weight=1, missing=nan, n_estimators=100,
n_gpus=1, n_jobs=-1, nthread=None, num_parallel_tree=1, num_round=1,
objective='binary:logistic', predictor='gpu_predictor',
random_state=123, reg_alpha=0, reg_lambda=1, scale_pos_weight=1,
seed=None, silent=False, subsample=1.0, tree_method='gpu_hist')
When I train this model and then predict, everything runs fine on the GPU.
However, when I save the model with pickle, load it in another notebook, and run a prediction with predict_proba, everything runs on the CPU.
Why is my prediction not running on the GPU?

The predictions are meant to run on CPU so you don't need a GPU to actually use the model.

Related

What can be done to lower UE4Editor startup time?

Status: the problem has lessened but, compared with what other users report, it persists.
I moved to UE4.27.0 and the startup time dropped from 11 minutes (v4.26.2) to 6 minutes (RAM usage dropped too). That still doesn't compare to the "almost instant" startup other people report.
It is not compiling anything, not even shaders; this is about the 6th time I have run it for the same project.
Should I try disabling plugins? I'm new to UE and don't want to make it harder to use. Still, I have nothing VR-related to test, for example, so that could reasonably be disabled from the start.
HD READ SPEED? NO
I tested moving the whole UE4Editor engine path (100GB) to a 3xSSD (striped) volume, but the startup time stayed the same. The HD where it normally lives is fast too, though not as fast as the 3xSSD.
CPU USAGE? MAYBE. Could using all 4 cores solve it?
UE4Editor startup uses A SINGLE CORE ONLY. I can confirm this with htop and the system monitor: only one core is at 100% at any time, and the load moves between the 4 cores.
I tested the command-line parameter -USEALLAVAILABLECORES after the project URL for UE4Editor, but nothing changed. I read that this option is ignored on some machines, so maybe if I patch its usage it could work on mine?
GPU? NO?
A report about a (weak) integrated graphics card says it doesn't affect the startup time.
LOG for UE4Editor v4.27.0 with the new largest intervals ("..." marks omitted log lines, to make it easier to read; "!(interval in seconds)" marks the gap between two consecutive lines, with no lines omitted there):
[2021.09.15-23.38.20:677][ 0]LogHAL: Linux SourceCodeAccessSettings: NullSourceCodeAccessor
!22s
[2021.09.15-23.38.42:780][ 0]LogTcpMessaging: Initializing TcpMessaging bridge
[2021.09.15-23.38.42:782][ 0]LogUdpMessaging: Initializing bridge on interface 0.0.0.0:0 to multicast group 230.0.0.1:6666.
!16s
[2021.09.15-23.38.58:158][ 0]LogPython: Using Python 3.7.7
...
[2021.09.15-23.39.01:817][ 0]LogImageWrapper: Warning: PNG Warning: Duplicate iCCP chunk
!75s
[2021.09.15-23.40.16:951][ 0]SourceControl: Source control is disabled
...
[2021.09.15-23.40.26:867][ 0]LogAndroidPermission: UAndroidPermissionCallbackProxy::GetInstance
!16s
[2021.09.15-23.40.42:325][ 0]LogAudioCaptureCore: Display: No Audio Capture implementations found. Audio input will be silent.
...
[2021.09.15-23.41.08:207][ 0]LogInit: Transaction tracking system initialized
!9s
[2021.09.15-23.41.17:513][ 0]BlueprintLog: New page: Editor Load
!23s
[2021.09.15-23.41.40:396][ 0]LocalizationService: Localization service is disabled
...
[2021.09.15-23.41.45:457][ 0]MemoryProfiler: OnSessionChanged
!13s
[2021.09.15-23.41.58:497][ 0]LogCook: Display: CookSettings for Memory: MemoryMaxUsedVirtual 0MiB, MemoryMaxUsedPhysical 16384MiB, MemoryMinFreeVirtual 0MiB, MemoryMinFreePhysical 1024MiB
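The "!22s"-style intervals above were presumably worked out by hand. A minimal sketch of automating that, assuming every timestamped log line starts with the [YYYY.MM.DD-HH.MM.SS:mmm] prefix shown in the log (the function names and the 5-second threshold are made up for illustration):

```python
import re
from datetime import datetime

# Matches the leading timestamp of a log line, e.g. [2021.09.15-23.38.20:677]
TIMESTAMP_RE = re.compile(r"^\[(\d{4}\.\d{2}\.\d{2}-\d{2}\.\d{2}\.\d{2}):(\d{3})\]")


def parse_timestamp(line):
    """Return the datetime at the start of a log line, or None if there is none."""
    match = TIMESTAMP_RE.match(line)
    if match is None:
        return None
    stamp = datetime.strptime(match.group(1), "%Y.%m.%d-%H.%M.%S")
    return stamp.replace(microsecond=int(match.group(2)) * 1000)


def find_gaps(lines, min_seconds=5.0):
    """Return (gap_seconds, line_before, line_after) for gaps >= min_seconds, largest first."""
    gaps = []
    previous = None  # (timestamp, line) of the last timestamped line seen
    for line in lines:
        stamp = parse_timestamp(line)
        if stamp is None:
            continue  # skip lines that carry no timestamp
        if previous is not None:
            delta = (stamp - previous[0]).total_seconds()
            if delta >= min_seconds:
                gaps.append((delta, previous[1], line))
        previous = (stamp, line)
    return sorted(gaps, key=lambda gap: gap[0], reverse=True)


# Four consecutive lines taken from the log above
sample = [
    "[2021.09.15-23.38.20:677][ 0]LogHAL: Linux SourceCodeAccessSettings: NullSourceCodeAccessor",
    "[2021.09.15-23.38.42:780][ 0]LogTcpMessaging: Initializing TcpMessaging bridge",
    "[2021.09.15-23.38.42:782][ 0]LogUdpMessaging: Initializing bridge on interface 0.0.0.0:0 to multicast group 230.0.0.1:6666.",
    "[2021.09.15-23.38.58:158][ 0]LogPython: Using Python 3.7.7",
]
for seconds, before, after in find_gaps(sample):
    print(f"{seconds:6.1f}s of silence before: {after}")
```

Running this over the full log file (for example with lines read via open(path)) would list the slowest startup steps first, which makes it easier to compare runs after disabling plugins.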
SPECS:
I'm using Ubuntu 20.04.
My CPU has 4 cores at 3.6GHz.
GeForce GT 710 1GB.
Related question but for older UE4: https://answers.unrealengine.com/questions/987852/view.html
Unreal Engine needs a high-end PC with a lot of RAM, fast SSDs, a good CPU, and at least a mid-range graphics card. First of all, there are always some shaders that the engine needs to compile, and a lot of assets to load at startup. Since you're on Linux, you are probably using a self-compiled Unreal Engine build, which is not the best choice for a newcomer because it can cause problems with load time, startup, compiling, and more. If this is your first time using Unreal, try it on Windows, where everything is easier.

GPU usage shows zero when using CUDA with PyTorch on Windows

I have a PyTorch script:
import torch
torch.cuda.is_available()
# True
device=torch.device('cuda:0')
# I moved my tensors to device
But Windows Task Manager shows zero usage for the GPU (NVIDIA GTX 1050 Ti) while the PyTorch script is running.
The script's speed is fine, and if I change torch.device to the CPU instead of the GPU it becomes slower, so CUDA (the GPU) is working. Why doesn't Windows Task Manager show the GPU usage?
Sample of my code:
import torch
from PIL import Image
from torchvision import transforms

device = torch.device("cuda:0")
# Load the saved model directly onto the GPU
model = torch.load('mymodel.pth', map_location=device)
image = Image.open('picture.png').convert('RGB')
transform = transforms.Compose([
    transforms.Resize(224),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
])
input = transform(image)
input = torch.unsqueeze(input, 0)  # add a batch dimension
input = input.to(device)
output = model(input)
Windows Task Manager's overall utilization figure does not seem to include CUDA usage. Make sure you select the Cuda option in the graphs.
For details see: https://medium.com/@michaelceber/gpu-monitoring-on-windows-10-for-machine-learning-cuda-41088de86d65
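As a cross-check that does not depend on Task Manager, the nvidia-smi command-line tool that ships with the NVIDIA driver can report utilization directly. This is not part of the answer above, only a minimal sketch: it assumes nvidia-smi is on the PATH, and the helper names are made up for illustration.

```python
import subprocess


def parse_gpu_stats(csv_text):
    """Parse 'utilization, memory used' CSV rows (one per GPU) into a list of dicts."""
    stats = []
    for row in csv_text.strip().splitlines():
        utilization, memory_used = (field.strip() for field in row.split(","))
        stats.append({
            "utilization_pct": int(utilization),
            "memory_used_mib": int(memory_used),
        })
    return stats


def query_gpu_stats():
    """Ask nvidia-smi for per-GPU utilization; returns None if the tool is unavailable."""
    command = [
        "nvidia-smi",
        "--query-gpu=utilization.gpu,memory.used",
        "--format=csv,noheader,nounits",
    ]
    try:
        result = subprocess.run(command, capture_output=True, text=True, check=True)
    except (FileNotFoundError, subprocess.CalledProcessError):
        return None
    return parse_gpu_stats(result.stdout)


# Parsing an example row for a single GPU (made-up numbers, not a measurement)
print(parse_gpu_stats("37, 1520\n"))
```

Calling query_gpu_stats() in a second terminal while the PyTorch script runs should show non-zero utilization if the model is really executing on the GPU. The parser assumes numeric fields; some GPUs report "[N/A]" for certain values, which this sketch does not handle.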
Just calling torch.device('cuda:0') doesn't actually use the GPU. It's just an identifier for a device.
Instead, following the documentation, you should move your tensors and models to the GPU.
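import torch

# Create the tensor directly on the GPU
tensor = torch.randn((2, 3), device=torch.device('cuda:0'))
# Or create it on the CPU and move it; .to() returns a new tensor, so keep the result
tensor = torch.randn((2, 3))
cuda0 = torch.device('cuda:0')
tensor = tensor.to(cuda0)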
Please install GPU-Z and then you will be able to see the correct GPU load in Windows.

Unpredictable CUDNN_STATUS_NOT_INITIALIZED on Windows

I am running Keras neural network training and prediction on a GTX 1070 on Windows 10. Most of the time it works, but from time to time it complains:
E c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\cuda\cuda_dnn.cc:359] could not create cudnn handle: CUDNN_STATUS_NOT_INITIALIZED
E c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\cuda\cuda_dnn.cc:366] error retrieving driver version: Unimplemented: kernel reported driver version not implemented on Windows
E c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\cuda\cuda_dnn.cc:326] could not destroy cudnn handle: CUDNN_STATUS_BAD_PARAM
F c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\kernels\conv_ops.cc:659] Check failed: stream->parent()->GetConvolveAlgorithms(&algorithms)
It can be explained neither by the literal meaning of the error nor by an out-of-memory error.
How can I fix this?
Try limiting your GPU memory usage by setting the GPU option per_process_gpu_memory_fraction.
Fiddle around with it to see what works and what doesn't.
I recommend using .7 as a starting baseline.
I sometimes hit this problem on Windows 10 with Keras.
A reboot solves the problem for a short time, but it happens again.
I referred to https://github.com/fchollet/keras/issues/1538
import tensorflow as tf
from keras.backend.tensorflow_backend import set_session
config = tf.ConfigProto()
config.gpu_options.per_process_gpu_memory_fraction = 0.3
set_session(tf.Session(config=config))
These settings solved the problem of the program halting.
I found a solution for this problem.
I had the same problem on Windows 10 with an Nvidia GeForce 920M.
Look for the correct version of the cuDNN library. If the version is not compatible with the CUDA version, no error is raised during TensorFlow installation, but it will interfere with memory allocation on the GPU.
Do check your CUDA and cuDNN versions. Also follow the instructions about creating sessions mentioned above.
The issue is finally resolved for me; I spent many hours struggling with this.
I recommend following all the installation steps properly, as described in these links:
TensorFlow: https://www.tensorflow.org/install/install_windows
cuDNN: https://docs.nvidia.com/deeplearning/sdk/cudnn-install/index.html#install-windows
For me this wasn't enough. I tried updating my GeForce Game Ready Driver from the GeForce Experience window, and after a restart it started working.
The driver can also be downloaded from https://www.geforce.com/drivers
Similar to what other people are saying, enabling memory growth for your GPUs can resolve this issue.
The following works for me by adding to the beginning of the training script:
# Using TensorFlow 2.4.x
import tensorflow as tf
try:
    tf_gpus = tf.config.list_physical_devices('GPU')
    for gpu in tf_gpus:
        tf.config.experimental.set_memory_growth(gpu, True)
except RuntimeError:
    # Memory growth must be set before the GPUs have been initialized
    pass
The TensorFlow documentation helped me a lot, in the section "Allowing GPU memory growth":
The first is the allow_growth option, which attempts to allocate only as much GPU memory based on runtime allocations: it starts out allocating very little memory, and as Sessions get run and more GPU memory is needed, we extend the GPU memory region needed by the TensorFlow process. Note that we do not release memory, since that can lead to even worse memory fragmentation. To turn this option on, set the option in the ConfigProto by:
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
session = tf.Session(config=config, ...)
or
with tf.Session(graph=graph_node, config=config) as sess:
    ...
The second method is the per_process_gpu_memory_fraction option, which determines the fraction of the overall amount of memory that each visible GPU should be allocated. For example, you can tell TensorFlow to only allocate 40% of the total memory of each GPU by:
config = tf.ConfigProto()
config.gpu_options.per_process_gpu_memory_fraction = 0.4
session = tf.Session(config=config, ...)
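To make the fractions used in this thread concrete, the short sketch below converts a fraction into an approximate memory budget. The 8192 MiB total is an assumption (an 8 GB card such as the GTX 1070 from the question), and memory_budget_mib is a made-up helper, not a TensorFlow function.

```python
def memory_budget_mib(total_mib, fraction):
    """Approximate memory TensorFlow may claim on one GPU for a given fraction."""
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in the range (0, 1]")
    return total_mib * fraction


TOTAL_MIB = 8192  # assumed total memory of an 8 GB card

# The three fractions suggested in the answers above
for fraction in (0.3, 0.4, 0.7):
    budget = memory_budget_mib(TOTAL_MIB, fraction)
    print(f"per_process_gpu_memory_fraction={fraction:.1f} -> about {budget:.0f} MiB")
```

The actual usable amount is somewhat lower in practice, since the driver and other processes also hold GPU memory.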

Retrieve RISC-V processor context after execution in FPGA

I'm loading RISC-V onto a Zedboard and running a benchmark (provided in riscv-tools) without booting riscv-linux, in this case:
./fesvr-zynq median.riscv
It finishes without errors and reports the number of cycles and instret.
My problem is that I want more information: I would like to know the processor context after execution (register bank values and memory) as well as the result computed by the algorithm. Is there any way to get this from the FPGA execution? I know it can be done with the simulator, but I need to run it on the FPGA.
Thank you.
Do it the same way it gives you the cycles and instret data. Check out riscv-tests/benchmarks/common/*. The code is running bare metal so you can write whatever code you want and access any of the CSRs, registers or memory, and then you can use a basic version of printf to display the information.

How can I shrink the OS region in RAM through U-boot?

From my understanding, after a PC/embedded system has booted up, the OS occupies the entire RAM region, and the RAM looks like this:
This means that while I'm running a program I wrote, all its variables and dynamically allocated memory (stack, heap, etc.) remain inside that region. If I run Firefox, Paint, gedit, etc., they also run in this region. (Is this understanding correct?)
However, I would like to shrink the OS region. Below is an illustration of how I want to divide the RAM:
The reason I want to do this is that I want to store data received externally through the driver in the Custom Region at a fixed physical location, so that I can access it directly from user space without using copy_to_user().
I think it is possible to do this by configuring u-boot, but I have no experience with u-boot. Can anyone give me some direction on where to begin: do I need to modify the u-boot source, or will changing u-boot environment variables be sufficient?
Or is there any alternative method of doing this?
Any help is much appreciated. Thanks!
P.S.: I'm using a TI ARM processor and booting from an SD card; I'm not sure if it matters.
The platform is ARM. min_addr and max_addr will not work on this platform since they are for Intel-only implementations.
For the ARM platform, try the "mem=size@start" kernel parameter. Read Documentation/kernel-parameters.txt and arch/arm/kernel/setup.c. This option is available in most recent Linux code bases (i.e. 2.6.XX).
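To illustrate the mem=size@start form, here is a minimal sketch that builds the argument for a made-up memory layout. The RAM base address, total size, and reserved size are all hypothetical; check your board's actual memory map before putting anything like this into bootargs.

```python
def mem_bootarg(size_mib, start_addr):
    """Build a 'mem=size@start' kernel argument (size in MiB, start as a hex address)."""
    if size_mib <= 0:
        raise ValueError("size_mib must be positive")
    return f"mem={size_mib}M@{start_addr:#x}"


# Hypothetical layout: 512 MiB of RAM starting at 0x80000000,
# with the top 64 MiB kept away from the kernel as the custom region.
RAM_START = 0x80000000
TOTAL_MIB = 512
RESERVED_MIB = 64

kernel_mib = TOTAL_MIB - RESERVED_MIB
reserved_start = RAM_START + kernel_mib * 1024 * 1024

print(mem_bootarg(kernel_mib, RAM_START))
print(f"custom region would start at {reserved_start:#x}")
```

The resulting string would be appended to the kernel command line, which u-boot passes through its bootargs environment variable. The kernel then manages only the first 448 MiB in this example and leaves the rest untouched at a known physical address.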
You need to set the following parameters:
max_addr=some_max_physical
min_addr=some_min_physical
to be passed to the kernel through uboot in the 'bootargs' u-boot environment variable.
I found myself trying to do the opposite recently, in other words getting Linux to use the additional memory in my system, although I'm using Barebox rather than u-boot on an OMAP4 platform.
I found (a bit to my surprise) that once the Barebox MLO first stage boot-loader was aware of the extra RAM, the kernel then detected and used it as well without any bootargs. Since the memory size is not passed anywhere on the boot-line, I can only assume the kernel inspects the memory mappings set up by the boot-loader to determine RAM size. This suggests that modifying your u-boot to not map all of the RAM is the way to go.
On the subject of boot-args, there was a time when it was recommended to map out a chunk of RAM (used by the frame buffer?) on OMAP4 systems using the boot-line. It's unclear whether this is still necessary.
