I have the following code that tries to tie arrays to files. However, when I run this code, it only creates 2045 files. What is the issue here?
#!/usr/bin/perl
use Tie::File;
for (my $i = 0; $i < 10000; $i++) {
    @files{$i} = ();
    tie @{$files{$i}}, 'Tie::File', "files//tiefile$i";
}
Edit: I am on Windows.
You are accumulating open file handles (see ulimit -n, setrlimit RLIMIT_NOFILE/RLIMIT_OFILE), and you ultimately hit a 2048 open file descriptor limit (2045 + stdin + stdout + stderr).
Under Windows you will have to rewrite your application so that it has at most 2048 open file handles at any one time, since the 2048 limit is a hard limit (it cannot be raised) in MSVC's stdio.
On Linux machines, go to /etc/security/limits.conf and add or modify these lines:
* soft nofile 10003
* hard nofile 10003
This will increase the number of files each process can have open to 10003 (remember that you always start with three open: stdin, stdout, and stderr).
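After logging back in, you can verify the new limit with, for example:
$ ulimit -n
10003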
Based on your comments it sounds like you are using a Win32 machine. I can't find a way to increase the number of open files per process, but you might, and I stress might, be able to handle this through fork'ing (which is really threading on Win32).
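As a rough illustration of the "at most N open handles at a time" rewrite mentioned above, here is an untested sketch that ties and unties one file at a time, so only a single handle is open at any moment:
#!/usr/bin/perl
use strict;
use warnings;
use Tie::File;

for my $i (0 .. 9999) {
    my @lines;
    tie @lines, 'Tie::File', "files/tiefile$i"
        or die "cannot tie files/tiefile$i: $!";
    # ... read or modify @lines for this file ...
    untie @lines;    # closes the underlying file handle
}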
I'm using rsync to back up some data from a remote host.
This is how I'm using the rsync command:
rsync --dry-run -avhi -e ssh --include-from=/home/rsync_list/test.txt root@10.10.4.61:/ /mnt/BACKUP/my_BACKUP/
This is the file /home/rsync_list/test.txt:
+ /usr/acs/conf/**
+ /usr/acs/bin/**
+ /raid0/opmdps/TEMP_folder/**
- *
I want to copy only the listed folders and exclude all the remaining files.
I always get
receiving file list ... done
sent 103 bytes received 48 bytes 302.00 bytes/sec
total size is 0 speedup is 0.00 (DRY RUN)
Could you tell me what I'm doing wrong? How should I write the rsync command if I would like to sync, for example, only /raid0/opmdps/TEMP_folder/ without its subfolders?
I wonder if you really only tried the command you posted?
Not only are you using the --dry-run option, even the output indicates this:
total size is 0 speedup is 0.00 (DRY RUN)
Please consult the manpage:
-n, --dry-run perform a trial run with no changes made
https://linux.die.net/man/1/rsync
May I suggest you give it a run without --dry-run?
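That is, something like the original command with --dry-run removed:
rsync -avhi -e ssh --include-from=/home/rsync_list/test.txt root@10.10.4.61:/ /mnt/BACKUP/my_BACKUP/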
I've created a simple Go script: https://gist.github.com/kbl/86ed3b2112eb80522949f0ce574a04e3
It fetches some XML from the internet and then starts X goroutines, where X depends on the file content. In my case it was 1700 goroutines.
My first execution finished with:
$ go run mathandel1.go
2018/01/27 14:19:37 Get https://www.boardgamegeek.com/xmlapi/boardgame/162152?pricehistory=1&stats=1: dial tcp 72.233.16.130:443: socket: too many open files
2018/01/27 14:19:37 Get https://www.boardgamegeek.com/xmlapi/boardgame/148517?pricehistory=1&stats=1: dial tcp 72.233.16.130:443: socket: too many open files
exit status 1
I've tried to increase ulimit to 2048.
Now I'm getting a different error, though the script is the same:
$ go build mathandel1.go
# command-line-arguments
/usr/local/go/pkg/tool/linux_amd64/link: flushing $WORK/command-line-arguments/_obj/exe/a.out: write $WORK/command-line-arguments/_obj/exe/a.out: file too large
What is causing that error? How can I fix that?
You ran ulimit 2048, which changed the maximum file size.
From man bash(1), ulimit section:
If no option is given, then -f is assumed.
This means that you have now set the maximum file size to 2048 blocks of 1024 bytes (2 MiB), which is probably not enough for... anything.
I'm guessing you meant to change the limit for number of open file descriptors. For this, you want to run:
ulimit -n 2048
As for the original error (before you changed the maximum file size): you're launching 1700 goroutines, each performing an HTTP GET. Each one creates a connection using a TCP socket, and those sockets count against the open file descriptor limit.
Instead, you should be limiting the number of concurrent downloads. This can be done with a simple worker pool pattern.
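A minimal sketch of that pattern, with a placeholder worker count and a URL slice standing in for whatever the gist builds from the XML:
package main

import (
	"fmt"
	"net/http"
	"sync"
)

func main() {
	// Placeholder: in the real script these come from the parsed XML.
	urls := []string{
		"https://www.boardgamegeek.com/xmlapi/boardgame/162152?pricehistory=1&stats=1",
		// ...
	}

	const workers = 50 // well below the file descriptor limit

	jobs := make(chan string)
	var wg sync.WaitGroup

	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for url := range jobs {
				resp, err := http.Get(url)
				if err != nil {
					fmt.Println(err)
					continue
				}
				// ... read and process resp.Body here ...
				resp.Body.Close() // close to release the socket / file descriptor
			}
		}()
	}

	for _, u := range urls {
		jobs <- u
	}
	close(jobs)
	wg.Wait()
}
This way at most 50 connections (plus stdin/stdout/stderr and the XML file) are open at any time, regardless of how many URLs there are.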
I'm trying to find the best and most efficient way to resume reading a file from a given point.
The given file is being written frequently (this is a log file).
This file is rotated on a daily basis.
In the log file I'm looking for the pattern 'slow transaction'. Such lines end with a number in parentheses, and I want the sum of those numbers.
Example log lines:
Jun 24 2015 10:00:00 slow transaction (5)
Jun 24 2015 10:00:06 slow transaction (1)
This is the easy part, which I can do with an awk command to get a total of 6 for the above example.
Now my challenge is that I want to get the values from this file on a regular basis. I have an external system that polls a custom OID using SNMP; when that OID is hit, the Linux host runs a couple of basic commands.
I want each SNMP poll to get only the number of events since the previous poll. I don't want the overall total every time, just the total of the newly added lines.
Note that only bash can be used, or basic commands such as awk, sed, tail, etc. No Perl or other advanced programming languages.
I hope my description is clear enough. Apologies if this is a duplicate; I did some research before posting but did not find anything that precisely corresponds to my need.
Thank you for any assistance.
In addition to the methods in the comment link, you can also simply use stat to read the logfile size, save it, sleep 300, and then check the logfile size again. If the file size has changed, skip over the old information with dd and read only the new information.
Note: you can add a test to handle the case where the logfile is deleted and then restarted with size 0 (e.g. if ((newsize < size)), then read the whole file).
Here is a short example with 5 minute intervals:
#!/bin/bash
lfn=${1:-/path/to/logfile}
size=$(stat -c "%s" "$lfn") ## save original log size
while :; do
newsize=$(stat -c "%s" "$lfn") ## get new log size
if ((size != newsize)); then ## if change, use new info
## use dd to skip over existing text to new text
newtext=$(dd if="$lfn" bs="$size" skip=1 2>/dev/null)
## process newtext however you need
printf "\nnewtext:\n\n%s\n" "$newtext"
size=$((newsize)); ## update size to newsize
fi
sleep 300
done
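For the 'slow transaction' totals from the question, the "process newtext" step could be something along these lines (a sketch that assumes the log format shown in the question):
total=$(printf '%s\n' "$newtext" |
    awk '/slow transaction/ { gsub(/[()]/, "", $NF); sum += $NF } END { print sum + 0 }')
printf "slow transactions since last check: %s\n" "$total"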
Suppose I have a text file, which contains data like this.
The output below was generated from du -sh /home/*:
1.5G user1
2.5G user2
And so on...
Now I want those sizes to be stored in an array and compared against 5 GB, to check whether a user is consuming more than 5 GB. What can I do?
The du command shows the usage of each folder in the home directory. I want to be notified when some user is consuming more than 5 GB; since there is a long list of users, it would be tedious to check each user's disk usage by hand. I want a shell script that identifies the usage of each directory under /home, and then I will add a mail function to notify myself about exceeded limits.
Note: I don't want to implement quotas, as I just want to monitor the usage.
Use du's -t (--threshold) option to specify you only want to know about directories with more than a certain amount of data in them:
$ du -sh -t 5G /home/*
If you're picky about precisely how big a gigabyte is, note that 5G uses multiples of 1024; you may prefer -t 5GB for multiples of 1000, or even -t 5000M to mix them.
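Concretely, -t 5G means 5 × 1024³ = 5,368,709,120 bytes, -t 5GB means 5 × 1000³ = 5,000,000,000 bytes, and -t 5000M means 5000 × 1024² = 5,242,880,000 bytes.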
For lots of users, you're probably better off writing that using -d 1 instead of -s to avoid the shell having to expand the * into a very long list:
$ du -h -d 1 -t 5G /home/
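If you also want the mail notification described in the question, a rough sketch along these lines could run from cron (it assumes GNU du and a configured mail command; the address is a placeholder):
#!/bin/bash
## warn about any /home/* directory over the 5G threshold
du -s -BG -t 5G /home/* | while read -r size dir; do
    echo "$dir is using $size (over the 5G limit)" |
        mail -s "disk usage warning: $dir" admin@example.com
done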
I have a job running on production which processes XML files.
There are around 4k XML files, 8 to 9 GB in size altogether.
After processing we get CSV files as output. I have a cat command which merges all the CSV files into a single file, and I'm getting:
Errno::ENOMEM: Cannot allocate memory
on the cat (backtick) command.
Below are a few details:
System Memory - 4 GB
Swap - 2 GB
Ruby : 1.9.3p286
Files are processed using nokogiri and saxbuilder-0.0.8.
There is a block of code which processes the 4,000 XML files, and the output is saved as CSV (one per XML file). (Sorry, I'm not supposed to share it because of company policy.)
Below is the code which merges the output files into a single file:
Dir["#{processing_directory}/*.csv"].sort_by {|file| [file.count("/"), file]}.each {|file|
`cat #{file} >> #{final_output_file}`
}
I've taken memory consumption snapshots during processing. It consumes almost all of the memory, but it doesn't fail there.
It always fails on the cat command.
I guess that the backtick tries to fork a new process, which doesn't get enough memory, so it fails.
Please let me know your opinion and an alternative to this.
So it seems that your system is running pretty low on memory, and spawning a shell plus calling cat is too much for the little memory left.
If you don't mind losing some speed, you can merge the files in Ruby, with small buffers.
This avoids spawning a shell, and you can control the buffer size.
This is untested, but you get the idea:
buffer_size = 4096
output_file = File.open(final_output_file, 'w')
Dir["#{processing_directory}/*.csv"].sort_by {|file| [file.count("/"), file]}.each do |file|
  f = File.open(file)
  # copy each CSV in small chunks instead of shelling out to cat
  while buffer = f.read(buffer_size)
    output_file.write(buffer)
  end
  f.close
end
output_file.close
You are probably out of physical memory, so double-check that and verify your swap (free -m). If you don't have swap space, create some.
Otherwise, if your memory is fine, the error is most likely caused by shell resource limits. You can check them with ulimit -a.
They can be changed with ulimit, which can modify shell resource limits (see: help ulimit), e.g.
ulimit -Sn unlimited && ulimit -Sl unlimited
To make these limits persistent, you can configure them by creating a ulimit settings file with the following shell command:
cat | sudo tee /etc/security/limits.d/01-${USER}.conf <<EOF
${USER} soft core unlimited
${USER} soft fsize unlimited
${USER} soft nofile 4096
${USER} soft nproc 30654
EOF
Or use /etc/sysctl.conf to change the limit globally (man sysctl.conf), e.g.
kern.maxprocperuid=1000
kern.maxproc=2000
kern.maxfilesperproc=20000
kern.maxfiles=50000
I had the same problem, but instead of cat it was sendmail (the mail gem).
I found the problem and a solution: installing the posix-spawn gem, e.g.
gem install posix-spawn
and here is the example:
a = (1..500_000_000).to_a    # allocate a huge array so that a plain fork would fail

require 'posix/spawn'
POSIX::Spawn::spawn('ls')    # posix_spawn starts the child without duplicating the parent's huge memory image
This time, creating the child process should succeed.
See also: Minimizing Memory Usage for Creating Application Subprocesses at Oracle.