lightgbm.basic.LightGBMError: bin size 1141 cannot run on GPU

I am using LightGBM on GPU for training, and it fails with "lightgbm.basic.LightGBMError: bin size 1141 cannot run on GPU":
[LightGBM] [Warning] Categorical features with more bins than the configured maximum bin number found.
[LightGBM] [Warning] For categorical features, max_bin and max_bin_by_feature may be ignored with a large number of categories.
[LightGBM] [Info] Number of positive: 15108, number of negative: 1492392
[LightGBM] [Info] This is the GPU trainer!!
[LightGBM] [Info] Total Bins 13145
[LightGBM] [Info] Number of data points in the train set: 1507500, number of used features: 94
[LightGBM] [Fatal] bin size 1141 cannot run on GPU
Traceback (most recent call last):
File "train.py", line 68, in <module>
main()
File "train.py", line 59, in main
train(params, tc)
File "D:\python\model_ls\ywls\model.py", line 120, in train
callbacks=callbacks
File "D:\server\Anaconda3\envs\lgb-gpu\lib\site-packages\lightgbm\engine.py", line 222, in train
booster = Booster(params=params, train_set=train_set)
File "D:\server\Anaconda3\envs\lgb-gpu\lib\site-packages\lightgbm\basic.py", line 2673, in __init__
ctypes.byref(self.handle)))
File "D:\server\Anaconda3\envs\lgb-gpu\lib\site-packages\lightgbm\basic.py", line 141, in _safe_call
raise LightGBMError(_LIB.LGBM_GetLastError().decode('utf-8'))
lightgbm.basic.LightGBMError: bin size 1141 cannot run on GPU

See the LightGBM documentation for max_bin, quoted below; setting max_bin=255 should resolve this.
LGBM max_bin
max_bin, default = 255, type = int, aliases: max_bins, constraints: max_bin > 1
The maximum number of bins that feature values will be bucketed in; a small number of bins may reduce training accuracy but may increase general power (deals with over-fitting).
LightGBM will auto-compress memory according to max_bin. For example, LightGBM will use uint8_t for feature values if max_bin=255.
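As a minimal sketch of applying that advice (the Dataset and the objective are placeholders; max_bin=255 is the setting described above, and device="gpu" matches the question's GPU training):

import lightgbm as lgb

params = {
    "objective": "binary",   # placeholder objective; use whatever the original run used
    "device": "gpu",         # GPU tree learner
    "max_bin": 255,          # keep per-feature histograms small enough for the GPU kernels
}

# train_data is assumed to already exist, e.g. lgb.Dataset(X_train, label=y_train)
booster = lgb.train(params, train_data)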

numpy.core._exceptions.MemoryError: Unable to allocate 4.69 GiB for an array with shape (102400, 64, 64, 3) and data type float32

I want to denoise images with a GAN, using the code below.
https://gitee.com/cleryer-1/image-denoise-using-wasserstein-gan
This is my modified version of the code: https://drive.google.com/drive/folders/1xdRZ5IOnAx_VyhvBB67x19EoDe6CjHUZ?usp=sharing (I only changed the image paths in config.py).
I'm trying to run the same code on my own training and test data, which is the following:
(http://data.vision.ee.ethz.ch/cvl/DIV2K/DIV2K_train_LR_bicubic_X2.zip)
(http://data.vision.ee.ethz.ch/cvl/DIV2K/DIV2K_valid_LR_bicubic_X2.zip)
I followed the first link above and reshaped and added noise to my own data (256×256) with the help of "python image_operation.py -h" in conda. I also increased the amount of virtual memory, but the problem still occurs. Compared with the original work, I only changed the paths in config.py.
It gave two different errors at different times:
'NoneType' object is not iterable.
numpy.core._exceptions.MemoryError: Unable to allocate 4.69 GiB for an array with shape (102400, 64, 64, 3) and data type float32.
I also tried it on Google Colab Pro+, but it gives the same error at the last line, "train()". Should I change something about the file paths?
runfile('C:/Users/eren.alici/Desktop/pythons/train.py', wdir='C:/Users/eren.alici/Desktop/pythons')
generating patches
done
Traceback (most recent call last):
File "C:\Users\eren.alici\Desktop\pythons\train.py", line 83, in
train()
File "C:\Users\eren.alici\Desktop\pythons\train.py", line 24, in train
truth, noise = get_patch(truth, noise)
File "C:\Users\eren.alici\Desktop\pythons\utils.py", line 198, in get_patch
return np.array(out_raw), np.array(out_noise)
MemoryError: Unable to allocate 4.69 GiB for an array with shape (102400, 64, 64, 3) and data type float32
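For what it's worth, the 4.69 GiB in that message is simply the size of the single patch array that get_patch tries to build in one go; a quick back-of-the-envelope check (plain arithmetic, not code from the repository):

# 102400 patches of 64x64x3 float32 values, 4 bytes each
n_bytes = 102400 * 64 * 64 * 3 * 4
print(n_bytes / 2**30)   # ~4.69 GiB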
Edit: I tried to train with just 20 images and it gives this error this time:
C:\Users\asus\AppData\Local\Microsoft\WindowsApps\python3.8.exe C:\Users\asus\PycharmProjects\pythonProject1\train.py
2022-12-31 16:50:37.740732: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'cudart64_101.dll'; dlerror: cudart64_101.dll not found
2022-12-31 16:50:37.743082: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
generating patches
done
2022-12-31 16:51:48.724730: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'nvcuda.dll'; dlerror: nvcuda.dll not found
2022-12-31 16:51:48.727183: W tensorflow/stream_executor/cuda/cuda_driver.cc:312] failed call to cuInit: UNKNOWN ERROR (303)
2022-12-31 16:51:48.851474: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:169] retrieving CUDA diagnostic information for host: DESKTOP-OLN8V00
2022-12-31 16:51:48.851846: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:176] hostname: DESKTOP-OLN8V00
2022-12-31 16:51:48.860257: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-12-31 16:51:48.952904: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x1e5e84fe770 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2022-12-31 16:51:48.953235: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
Traceback (most recent call last):
File "C:\Users\asus\PycharmProjects\pythonProject1\train.py", line 84, in <module>
train()
File "C:\Users\asus\PycharmProjects\pythonProject1\train.py", line 45, in train
for times in epoch_bar(range(truth.shape[0] // BATCH_SIZE)):
File "C:\Users\asus\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.8_qbz5n2kfra8p0\LocalCache\local-packages\Python38\site-packages\progressbar\progressbar.py", line 150, in __next__
value = next(self.__iterable)
TypeError: 'NoneType' object is not an iterator

YOLOv5 running on Mac

I have configured the environment to use the new Metal Performance Shaders (MPS) backend for GPU training acceleration in PyTorch, and when running YOLOv5 on my MacBook Air M2 it always raises an error.
RES_DIR = set_res_dir()
if TRAIN:
    !python /Users/krishpatel/yolov5/train.py --data /Users/krishpatel/yolov5/roboflow/data.yaml --weights yolov5s.pt \
        --img 640 --epochs {EPOCHS} --batch-size 32 --device mps --name {RES_DIR}
This is the error:
UserWarning: The operator 'aten::nonzero' is not currently supported on the MPS backend and will fall back to run on the CPU. This may have performance implications. (Triggered internally at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/mps/MPSFallback.mm:11.)
t = t[j] # filter
0%| | 0/20 [00:16<?, ?it/s]
Traceback (most recent call last):
File "/Users/krishpatel/yolov5/train.py", line 630, in <module>
main(opt)
File "/Users/krishpatel/yolov5/train.py", line 524, in main
train(opt.hyp, opt, device, callbacks)
File "/Users/krishpatel/yolov5/train.py", line 307, in train
loss, loss_items = compute_loss(pred, targets.to(device)) # loss scaled by batch_size
File "/Users/krishpatel/yolov5/utils/loss.py", line 125, in __call__
tcls, tbox, indices, anchors = self.build_targets(p, targets) # targets
File "/Users/krishpatel/yolov5/utils/loss.py", line 213, in build_targets
j, k = ((gxy % 1 < g) & (gxy > 1)).T
NotImplementedError: The operator 'aten::remainder.Tensor_out' is not currently implemented for the MPS device. If you want this op to be added in priority during the prototype phase of this feature, please comment on https://github.com/pytorch/pytorch/issues/77764. As a temporary fix, you can set the environment variable `PYTORCH_ENABLE_MPS_FALLBACK=1` to use the CPU as a fallback for this op. WARNING: this will be slower than running natively on MPS.
What should I do? When running on just the CPU it takes a very, very long time. Any suggestions would be appreciated.
I have searched everywhere but couldn't find anything for the MacBook. If nothing helps, I suppose I would have to run it on Google Colab, but then what would be the point of buying an expensive MacBook and not using its GPU?
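For reference, the traceback itself points at a workaround: setting PYTORCH_ENABLE_MPS_FALLBACK=1 so that ops missing on MPS (such as aten::remainder.Tensor_out) fall back to the CPU while everything else stays on the GPU. A hedged sketch of the same notebook cell with that variable set (as the warning says, the fallback ops run on the CPU and are slower than native MPS):

RES_DIR = set_res_dir()
if TRAIN:
    !PYTORCH_ENABLE_MPS_FALLBACK=1 python /Users/krishpatel/yolov5/train.py --data /Users/krishpatel/yolov5/roboflow/data.yaml --weights yolov5s.pt \
        --img 640 --epochs {EPOCHS} --batch-size 32 --device mps --name {RES_DIR}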

Stable Diffusion Videos - DLL load failed while importing _ufuncs: %1 is not a valid Win32 application

I'm trying to run the Stable Diffusion Videos package; I have installed the package and logged into Hugging Face. When I try to run the code provided on the package's GitHub page, I run into the error ImportError: DLL load failed while importing _ufuncs: %1 is not a valid Win32 application. I have tried various solutions to this error, but so far none of them have worked.
I'm running Windows 11, 64 bit, and Python 3.10. I've read about missing dll files but am unsure how to find / install them to possibly fix this problem.
from stable_diffusion_videos import StableDiffusionWalkPipeline
import torch

pipeline = StableDiffusionWalkPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",
    torch_dtype=torch.float16,
    revision="fp16",
).to("cuda")

video_path = pipeline.walk(
    prompts=['a cat', 'a dog'],
    seeds=[42, 1337],
    num_interpolation_steps=3,
    height=512,               # use multiples of 64 if > 512. Multiples of 8 if < 512.
    width=512,                # use multiples of 64 if > 512. Multiples of 8 if < 512.
    output_dir='dreams',      # Where images/videos will be saved
    name='animals_test',      # Subdirectory of output_dir where images/videos will be saved
    guidance_scale=8.5,       # Higher adheres to prompt more, lower lets model take the wheel
    num_inference_steps=50,   # Number of diffusion steps per image generated. 50 is good default
)
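On Windows, "%1 is not a valid Win32 application" during a DLL import usually means a compiled extension module (here _ufuncs, which in this stack typically comes from SciPy) was built for a different architecture than the interpreter loading it. A minimal, hedged check that the running Python really is 64-bit (standard-library calls only):

import platform
import struct

print(platform.architecture())    # e.g. ('64bit', 'WindowsPE')
print(struct.calcsize("P") * 8)   # pointer size in bits: 64 for a 64-bit Python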

PySpark ML error: object has no attribute 'map'

Below is my DataFrame:
df=
a b c d
1 3 10 110
2 5 12 112
3 6 17 112
4 8 110 442
Below is my code
from pyspark.sql import SparkSession, SQLContext
from pyspark.ml.linalg import DenseVector
from pyspark.mllib.regression import LabeledPoint

spark = SparkSession.builder.appName('dev_member_validate_spark').config('spark.sql.crossJoin.enabled', 'true').getOrCreate()
sqlCtx = SQLContext(spark)
temp = df.select("a", "b").map(lambda line: LabeledPoint(line[0], [line[1:]]))
When I execute the temp = ... line, I get the error below:
Error: Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/opt/cloudera/parcels/SPARK2-2.1.0.cloudera1-1.cdh5.7.0.p0.120904/lib/spark2/python/pyspark/sql/dataframe.py", line 964, in __getattr__
"'%s' object has no attribute '%s'" % (self.__class__.__name__, name))
AttributeError: 'DataFrame' object has no attribute 'map'
I am using pyspark 2.1 with Cloudera 5.10
I wrote the above script with reference to this link:
https://databricks.com/product/getting-started-guide/machine-learning
Please help me in resolving this issue.
That is because DataFrame simply does not have a 'map' attribute. Prior to Spark 2.0 it did, but not any more, and Databricks did not update the tutorial. You can still map by converting to an RDD first, i.e. df.rdd, as sketched below.
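A minimal sketch of that conversion, keeping the question's mllib LabeledPoint and column names:

from pyspark.mllib.regression import LabeledPoint

temp = (df.select("a", "b")
          .rdd                                             # DataFrame -> RDD of Row objects
          .map(lambda row: LabeledPoint(row[0], [row[1]])))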
First, please notice that there are two separate ML libraries:
The first, from which you've imported the linear algebra classes, is pyspark.ml.
The second is pyspark.mllib, from which you have imported LabeledPoint.
Trying to interop these two packages is a road filled with pain. Try to stick to one, and stay on it.
Second, as for the exception you've got:
temp = df.select("a","b").map(...)
df is a DataFrame, which doesn't have a map method.
But please take my first advice - don't mix mllib and ml modules.
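If you stay within pyspark.ml instead, the idiomatic route is to keep the data as a DataFrame and assemble a features vector column rather than building LabeledPoints. A hedged sketch, assuming column "a" is the label and "b" is the single feature:

from pyspark.ml.feature import VectorAssembler

assembler = VectorAssembler(inputCols=["b"], outputCol="features")
training = (assembler.transform(df)
                     .withColumnRenamed("a", "label")
                     .select("label", "features"))   # the layout pyspark.ml estimators expect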

How to measure performance impact of GCC linking option -Wl,-z,relro,-z,now on binary startup on ARM

I'm trying to find a way to measure the start-up performance impact of using relro and early binding linkage options on an ARM platform.
Can someone suggest how to find the time spent linking shared libraries at startup for a binary compiled with those options?
Many thanks.
Edit 1:
There is no timing information on my machine:
root@arm:/# LD_DEBUG=statistics /bin/date
1470: number of relocations: 90
1470: number of relocations from cache: 3
1470: number of relative relocations: 1207 Thu Jan 1 00:17:00 UTC 1970
1470:
1470: runtime linker statistics:
1470: final number of relocations: 108
1470: final number of relocations from cache: 3
If you are using GLIBC:
$ LD_DEBUG=statistics /bin/date
4494:
4494: runtime linker statistics:
4494: total startup time in dynamic loader: 932928 clock cycles
4494: time needed for relocation: 299052 clock cycles (32.0%)
4494: number of relocations: 106
4494: number of relocations from cache: 4
4494: number of relative relocations: 1276
4494: time needed to load objects: 420660 clock cycles (45.0%)
Fri Feb 28 16:40:48 PST 2014
Build your binary with and without -Wl,-z,relro,-z,now and compare the numbers.
