Out of memory issue in gensim topic modeling - gensim

I want to run the LdaSeqModel on a very large corpus, and I ultimately want to extract 100 topics from it.
I am getting an "out of memory" error at the LdaSeqModel step. This is because I have a huge vocabulary of tokens and I don't want to truncate it. How can I resolve this memory issue?
Windows-10-10.0.17763-SP0
Python 3.6.5 (v3.6.5:f59c0932b4, Mar 28 2018, 17:00:18) [MSC v.1900 64 bit (AMD64)]
NumPy 1.17.0
SciPy 1.3.0
gensim 3.8.0
FAST_VERSION 0
My expected result is the same as shown in the documentation: in the end I need a topic-term and a topic-document matrix.

Use the MmCorpus from gensim.corpora.mmcorpus.
It's similar to the UCI bag-of-words format and easy to build.
https://radimrehurek.com/gensim/corpora/mmcorpus.html
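For illustration, here is a minimal sketch of streaming the corpus from disk with MmCorpus instead of keeping it in RAM (the documents iterable, file path, and time_slice list are placeholders, not from the question; LdaSeqModel itself still needs memory proportional to vocabulary size and number of topics, so pruning the dictionary can also help):
# Minimal sketch: serialize the bag-of-words corpus to disk, then stream it back lazily.
from gensim.corpora import Dictionary, MmCorpus
from gensim.models.ldaseqmodel import LdaSeqModel

dictionary = Dictionary(documents)                     # documents: iterable of token lists (placeholder)
dictionary.filter_extremes(no_below=5, no_above=0.5)   # optional: prune rare/ubiquitous tokens
MmCorpus.serialize('corpus.mm', (dictionary.doc2bow(doc) for doc in documents))
corpus = MmCorpus('corpus.mm')                         # streamed from disk, one document at a time

ldaseq = LdaSeqModel(corpus=corpus, id2word=dictionary,
                     time_slice=time_slice, num_topics=100)
With the corpus serialized this way, only one document is held in memory at a time; the remaining memory pressure comes from the model itself.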

Related

Julia package load extremely slow in first run

I'm using Julia 1.5.2 under Linux 5.4.0 and waited around 15 minutes for Pkg.add("DifferentialEquations"). Then I started the kernel in a Jupyter notebook and ran the following code. It took a painful 1 minute to execute (the actual first time I did this it took 225 s).
t = time()
using Printf
using BenchmarkTools
using OrdinaryDiffEq
using Plots
tt = time() - t
#sprintf("It took %f seconds to import Printf, BenchmarkTools, OrdinaryDiffEq and Plots.", tt)
# It took 58.545894 seconds to import Printf, BenchmarkTools, OrdinaryDiffEq and Plots.
Finally, I did the same as above, but for each package separately. This is the summary:
Printf: 0.004755973815917969
BenchmarkTools: 0.06729602813720703
Plots: 19.99405598640442
OrdinaryDiffEq: 19.001102209091187
I know from here that Pkg was slow in the past, but I think that 15 minutes isn't a normal installation time at all. However, this is not my big problem.
I know that Julia needs to compile everything every time the kernel is started or some package is loaded. But this is obviously not a compilation time, it's a compilation eternity.
Can anyone figure out why this is so terribly slow? And, if it's normal, wouldn't it be better to provide precompiled packages through Pkg, the way numpy and friends are precompiled for Python? Or at least compile once and for all on the first using?
Thank you!
My complete Platform Info:
Julia Version 1.5.2
Commit 539f3ce943 (2020-09-23 23:17 UTC)
Platform Info:
OS: Linux (x86_64-pc-linux-gnu)
CPU: Intel(R) Core(TM) i3-6100U CPU @ 2.30GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-9.0.1 (ORCJIT, skylake)
This problem is generally called latency or time-to-first-plot (TTFP) when referring to julia-lang. There are several discussions you can find using these keywords.
A nice recent analysis of this problem is given in the article "Analyzing sources of compiler latency in Julia: method invalidations".
At the time of writing (end of 2020, stable release v1.5.3), no general solution is available, but strategies of ahead-of-time precompilation of packages instead of pure JIT are being discussed, with marginal success.

Is it possible to get the width and height of a .gif file in scala? [duplicate]

I am able to read a PNG file, but I am getting ArrayIndexOutOfBoundsException: 4096 while reading a GIF file.
byte[] fileData = imageFile.getFileData();
ByteArrayInputStream byteArrayInputStream = new ByteArrayInputStream(fileData);
RenderedImage image = ImageIO.read(byteArrayInputStream);
Exception thrown looks like
java.lang.ArrayIndexOutOfBoundsException: 4096
at com.sun.imageio.plugins.gif.GIFImageReader.read(Unknown Source)
at javax.imageio.ImageIO.read(Unknown Source)
at javax.imageio.ImageIO.read(Unknown Source)
What could be the issue, and what is the resolution?
Update 3: Solution
I ended up developing my own GifDecoder and released it as open source under the Apache License 2.0. You can get it from here: https://github.com/DhyanB/Open-Imaging. It does not suffer from the ArrayIndexOutOfBoundsException issue and delivers decent performance.
Any feedback is highly appreciated. In particular, I'd like to know if it works correctly for all of your images and if you are happy with its speed.
I hope this is helpful to you (:
Initial answer
Maybe this bug report is related to or describes the same problem: https://bugs.openjdk.java.net/browse/JDK-7132728.
Quote:
FULL PRODUCT VERSION :
java version "1.7.0_02"
Java(TM) SE Runtime Environment (build 1.7.0_02-b13)
Java HotSpot(TM) 64-Bit Server VM (build 22.0-b10, mixed mode)
ADDITIONAL OS VERSION INFORMATION :
Microsoft Windows [Version 6.1.7601]
A DESCRIPTION OF THE PROBLEM :
according to specification
http://www.w3.org/Graphics/GIF/spec-gif89a.txt
> There is not a requirement to send a clear code when the string table is full.
However, GIFImageReader requires the clear code when the string table is full.
GIFImageReader violates the specification, clearly.
In the real world, sometimes people finds such high compressed gif image.
so you should fix this bug.
STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
javac -cp .;PATH_TO_COMMONS_CODEC GIF_OverflowStringList_Test.java
java -cp .;PATH_TO_COMMONS_CODEC GIF_OverflowStringList_Test
EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
complete normally. no output
ACTUAL -
ArrayIndexOutOfBounds occurs.
ERROR MESSAGES/STACK TRACES THAT OCCUR :
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 4096
at com.sun.imageio.plugins.gif.GIFImageReader.read(GIFImageReader.java:1
075)
at javax.imageio.ImageIO.read(ImageIO.java:1400)
at javax.imageio.ImageIO.read(ImageIO.java:1322)
at GIF_OverflowStringList_Test.main(GIF_OverflowStringList_Test.java:8)
REPRODUCIBILITY :
This bug can be reproduced always.
The bug report also provides code to reproduce the bug.
Update 1
And here is an image that causes the bug in my own code:
Update 2
I tried to read the same image using Apache Commons Imaging, which led to the following exception:
java.io.IOException: AddStringToTable: codes: 4096 code_size: 12
at org.apache.commons.imaging.common.mylzw.MyLzwDecompressor.addStringToTable(MyLzwDecompressor.java:112)
at org.apache.commons.imaging.common.mylzw.MyLzwDecompressor.decompress(MyLzwDecompressor.java:168)
at org.apache.commons.imaging.formats.gif.GifImageParser.readImageDescriptor(GifImageParser.java:388)
at org.apache.commons.imaging.formats.gif.GifImageParser.readBlocks(GifImageParser.java:251)
at org.apache.commons.imaging.formats.gif.GifImageParser.readFile(GifImageParser.java:455)
at org.apache.commons.imaging.formats.gif.GifImageParser.readFile(GifImageParser.java:435)
at org.apache.commons.imaging.formats.gif.GifImageParser.getBufferedImage(GifImageParser.java:646)
at org.apache.commons.imaging.Imaging.getBufferedImage(Imaging.java:1378)
at org.apache.commons.imaging.Imaging.getBufferedImage(Imaging.java:1292)
That looks very similar to the problem we have with ImageIO, so I reported the bug at the Apache Commons JIRA: https://issues.apache.org/jira/browse/IMAGING-130.
I encountered the exact same problem you did, but I had to stick to an ImageIO interface, which no other library offered. Apart from Jack's great answer, I simply patched the existing GIFImageReader class with a few lines of code and got it marginally working.
Copy this link into PatchedGIFImageReader.java and use as such:
reader = new PatchedGIFImageReader(null);
reader.setInput(ImageIO.createImageInputStream(new FileInputStream(files[i])));
int ub = reader.getNumImages(true);
for (int x = 0; x < ub; x++) {
    BufferedImage img = reader.read(x);
    // Do whatever with the new img BufferedImage
}
Be sure to change the package name to whatever you're using.
Unfortunately results may vary, as the patch was a one-minute bugfix that basically just exits the loop if it goes past the buffer. Some GIFs it loads fine; others show a few visual artifacts.
Such is life. If anyone knows a better fix instead of mine, please do tell.

Unpredictable CUDNN_STATUS_NOT_INITIALIZED on Windows

I am running Keras neural network training and prediction on a GTX 1070 on Windows 10. Most of the time it works, but from time to time it complains:
E c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\cuda\cuda_dnn.cc:359] could not create cudnn handle: CUDNN_STATUS_NOT_INITIALIZED
E c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\cuda\cuda_dnn.cc:366] error retrieving driver version: Unimplemented: kernel reported driver version not implemented on Windows
E c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\cuda\cuda_dnn.cc:326] could not destroy cudnn handle: CUDNN_STATUS_BAD_PARAM
F c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\kernels\conv_ops.cc:659] Check failed: stream->parent()->GetConvolveAlgorithms(&algorithms)
It can be explained neither by the literal meaning of the error nor by an OOM error.
How can I fix this?
Try limiting your GPU memory usage by setting the gpu_options.per_process_gpu_memory_fraction option.
Fiddle around with it to see what works and what doesn't.
I recommend using .7 as a starting baseline.
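As a sketch (TF1-style API with the Keras TensorFlow backend, matching the snippets below; 0.7 is just the suggested starting baseline, not a verified value for your setup):
import tensorflow as tf
from keras.backend.tensorflow_backend import set_session

config = tf.ConfigProto()
# Start by capping TensorFlow at 70% of the GPU memory, then adjust as needed.
config.gpu_options.per_process_gpu_memory_fraction = 0.7
set_session(tf.Session(config=config))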
I ran into this problem occasionally on Windows 10 with Keras.
A reboot solved it for a short time, but it happened again.
Following https://github.com/fchollet/keras/issues/1538:
import tensorflow as tf
from keras.backend.tensorflow_backend import set_session
config = tf.ConfigProto()
config.gpu_options.per_process_gpu_memory_fraction = 0.3
set_session(tf.Session(config=config))
These settings solved the halt problem for me.
I got the solution for this problem.
I had the same problem on Windows 10 with an Nvidia GeForce 920M.
Search for the correct version of the cuDNN library. If the version is not compatible with your CUDA version, it won't throw an error during TensorFlow installation, but it will interfere with memory allocation on the GPU.
Do check your CUDA and cuDNN versions. Also follow the instructions about session creation mentioned above.
The issue is finally resolved for me; I spent many hours struggling with this.
I recommend following all the installation steps properly, as described in these links:
TensorFlow -
https://www.tensorflow.org/install/install_windows
and for cuDNN -
https://docs.nvidia.com/deeplearning/sdk/cudnn-install/index.html#install-windows
For me this wasn't enough. I tried updating my GeForce Game Ready Driver from the GeForce Experience window, and after a restart it started working for me.
The driver can also be downloaded from https://www.geforce.com/drivers
Similar to what other people are saying, enabling memory growth for your GPUs can resolve this issue.
The following works for me when added at the beginning of the training script:
# Using Tensorflow-2.4.x
import tensorflow as tf
try:
    tf_gpus = tf.config.list_physical_devices('GPU')
    for gpu in tf_gpus:
        tf.config.experimental.set_memory_growth(gpu, True)
except:
    pass
The TensorFlow docs helped me a lot: Allowing GPU memory growth.
The first is the allow_growth option, which attempts to allocate only as much GPU memory based on runtime allocations: it starts out allocating very little memory, and as Sessions get run and more GPU memory is needed, we extend the GPU memory region needed by the TensorFlow process. Note that we do not release memory, since that can lead to even worse memory fragmentation. To turn this option on, set the option in the ConfigProto by:
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
session = tf.Session(config=config, ...)
or
with tf.Session(graph=graph_node, config=config) as sess:
...
The second method is the per_process_gpu_memory_fraction option, which determines the fraction of the overall amount of memory that each visible GPU should be allocated. For example, you can tell TensorFlow to only allocate 40% of the total memory of each GPU by:
config = tf.ConfigProto()
config.gpu_options.per_process_gpu_memory_fraction = 0.4
session = tf.Session(config=config, ...)

Pandas/NumPy (v0.19.1/v1.11.2) DataFrame performance issues when accessing by index

I have a function f to which I pass two pandas.DataFrames. I iterate through the rows of the first one; one of its columns contains index values of the second one. The index is a string, more precisely an MD5 string like '1950abcbdf69bc4b6da8d950e87f538f'. I use those indexes to retrieve rows of the second DataFrame. Here's the code:
def f(df_A, df_B):
    for row in df_A.itertuples():
        hash_index = row[1]
        fields_B = df_B.ix[hash_index].values  # <== VERY SLOW
It works fine on my laptop (Ubuntu 16.04.1 LTS, VM), but due to performance issues I moved to a server VM (Debian GNU/Linux 8 (jessie); I needed more RAM). The server uses:
'3.5.2 (default, Dec 3 2016, 16:49:26) \n[GCC 4.9.2]'
numpy==1.11.2
pandas==0.19.1
My laptop has:
'3.5.2 (default, Nov 17 2016, 17:05:23) \n[GCC 5.4.0 20160609]'
numpy==1.11.1
pandas==0.18.1
That covers the most relevant data. The big problem is that the server is way slower (by a factor of 1000 or even more). In the code example I marked the line with "VERY SLOW": it takes the server 0.094 seconds to execute that line. .loc[] was even slower. Can you think of a reason for this?
Right after writing my question I noticed the different versions of numpy and pandas. I downgraded both packages and now it works like a charm. So either numpy or pandas in its newest version is buggy here...
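If pinning package versions is not an option, one possible workaround (not the poster's fix; the 'hash' column name is a placeholder for the MD5 column in df_A) is to replace the per-row .ix lookup with a single vectorized lookup:
import pandas as pd

def f_vectorized(df_A, df_B):
    # Align all rows of df_B to the MD5 values in df_A in one call,
    # instead of one .ix/.loc lookup per row.
    fields_B = df_B.reindex(df_A['hash']).values
    return fields_B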

Runtime system for Stm32F103 Arm, GNAT Ada compiler

I'd like to use Ada with an STM32F103 MCU, but here is the problem: there is no built-in runtime system for it in GNAT 2016. There is an RTS included for another Cortex-M3 MCU (by TI), zfp-lm3s, but it seems to need more than trivial updates; a simple change of memory size/origin doesn't work.
So, there are some questions:
Does somebody have an RTS for the STM32F103?
Are there any good books about the low-level details of the Cortex-M3 or other ARM MCUs?
PS: Using zfp-lm3s raises this error when I try to run the program via GPS:
Loading section .text, size 0x140 lma 0x0
Load failed
The STM32F series is from STMicroelectronics, not TI, so the stm32f4 runtime might be a better starting point.
In particular, the clock code in bsp/setup_pll.adb should need only minor tweaking; use STM’s STM32CubeMX tool (written in Java) to find the magic numbers to set up the clock properly.
You will also find that the assembler code used in bsp/start*.S needs simplifying/porting to the Cortex-M3 part.
My Cortex GNAT Run Time Systems project includes an Arduino Due version (also Cortex-M3), which has startup code written entirely in Ada. I don’t suppose the rest of the code would help a lot, being based on FreeRTOS - you’d have to be very very careful about memory usage.
I stumbled upon this question while looking for a zfp runtime specific to the stm32l0xx boards. It doesn't look like one exists from what I can see, but I did find this guide from AdaCore on creating a new runtime, which might help anyone stuck with the same issue:
https://blog.adacore.com/porting-the-ada-runtime-to-a-new-arm-board
