I am executing a shell script that, at a high level, reads records from a CSV file and then performs some DB operations.
I have analyzed this by running the script manually. It runs fine for fewer than 900 records in the file, but it gives an error for more than 900 records. Below is the screenshot of the error I get after some time:
There is a part of the script that picks up the records one by one:
Could you please suggest why this is happening? I have read similar topics where users got this error, but I am unable to relate them to my scenario.
Cheers
I've hit this problem before and it's quite easy to replicate:
$ unset a; export a=$(perl -e 'print "a"x(1024*64)'); whoami
tiago
$ unset a; export a=$(perl -e 'print "a"x(1024*128)'); whoami
bash: /usr/bin/whoami: Argument list too long
$ perl -e 'print "a"x(1024*64)' | wc -c
65536
$ perl -e 'print "a"x(1024*128)' | wc -c
131072
So something between 65536 and 131072 bytes breaks (on Linux the relevant limit is MAX_ARG_STRLEN, which caps a single environment string or argument at 128 KiB). When I had this problem, instead of exporting the value I printed it and worked with the data through pipes. Another way around it is to use files.
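A minimal sketch of both workarounds; fetch_records and process_record are hypothetical stand-ins for whatever produces and consumes the data:

# instead of: export data=$(fetch_records)   # risks the env/arg limit
fetch_records | while IFS= read -r record; do
    process_record "$record"          # data never touches the environment
done

# or stage the data through a temporary file:
fetch_records > /tmp/records.$$
while IFS= read -r record; do
    process_record "$record"
done < /tmp/records.$$
rm -f /tmp/records.$$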
You can find some nice experiments in: What is the maximum size of an environment variable value?
This bash snippet works to add a UUID and a tab character (\t) at the start of each line of a file.
while read; do
    echo "$(uuidgen | tr 'A-Z' 'a-z')\t$REPLY"
done < sourcefile.tsv > temp_destination.tsv
(Note: the pipe to tr is there to convert the UUIDs to lowercase, since the macOS version of uuidgen emits uppercase.)
Although that performs well for smaller files, it doesn't seem efficient.
sed -i '' "s/^/$(uuidgen | tr 'A-Z' 'a-z')\t/" sourcefile.tsv
Again, this is on macOS, so the '' after the -i flag is required since I don't want a backup file.
I suspect sed would perform better, but it seems I'd need the UUID generation inside some sort of loop.
I'm just looking to make this faster and/or more efficient. It works, but it's pretty slow on a 20,000-line file, and my other attempts have all stumped me.
EDIT: I tested my bash script with the while loop just outputting UUIDs, without any of the other subprocesses. With my configuration, I can generate about 250-300 per second, so updating a 20,000-line file will take at least ~72 seconds purely because of the weak link of UUID generation. As described below, using Perl or Python will likely be faster.
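For reference, a rough way to measure the uuidgen rate on your own machine (1000 iterations is an arbitrary sample size):

$ time for i in $(seq 1 1000); do uuidgen; done > /dev/null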
EDIT 2: This little Python script kills the bash version. The snippet only does part of what I need, but for comparison it generated about 200,000 UUIDs per second, i.e. 1,000,000 in 5 seconds, versus the 250-300 per second from the bash subprocess. Wow, what a difference.
#!/usr/bin/env python3
# this generates 1,000,000 UUIDs in about 5 seconds
import uuid
import sys

sys.stdout = open('lots-of-uuid.txt', 'w')
i = 1
while i <= 1000000:
    print(uuid.uuid4())
    i += 1
sys.stdout.close()
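For the actual task, that fast Python generation can be dropped straight into the pipeline as a shell one-liner; a sketch, assuming python3 is on the PATH (uuid.uuid4() already prints lowercase):

$ python3 -c 'import sys, uuid
for line in sys.stdin:
    sys.stdout.write(str(uuid.uuid4()) + "\t" + line)' < sourcefile.tsv > temp_destination.tsv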
Did you try something like this:
{
    uuidgen | tr 'A-Z' 'a-z'
    printf '\t'
    cat 'sourcefile.tsv'
} > temp_destination.tsv
You may think it is not much different from your "read" version, but it is:
You don't capture the result of uuidgen
cat will probably perform faster than read + $REPLY
Try this out:
while read; do printf "%s\t%s\n" $(uuidgen) "$REPLY"; done < input.tsv > output.tsv
No monkeying around with building strings.
Using sed
$ sed -i 's/.*/printf "&#";uuidgen/e;s/\([^#]*\)#\(.*\)/\L\2\E\t\1/' sourcefile.tsv
(This relies on GNU sed's e flag, so the BSD-style -i '' does not apply here.)
This might work for you (GNU sed):
sed -i 'h;s/.*/uuidgen/e;s/.*/\L&/;G;s/\n/\t/' file
Make a copy of the current line.
Replace the current line by an evaluated uuidgen command and convert the result to lowercase.
Append the copy and replace the newline by a tab.
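For example (illustrative; <uuid> stands in for the random lowercase UUID generated for each line):

$ printf 'alpha\nbeta\n' > file
$ sed 'h;s/.*/uuidgen/e;s/.*/\L&/;G;s/\n/\t/' file
<uuid>	alpha
<uuid>	beta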
I am trying to split a 13 GB file into equal chunks using the Linux Bash Shell on Windows 10 by running:
split -n l/13 myfile.csv
and I am getting the following error:
split: 'xaa' would overwrite input; aborting
The xaa file which is created is empty.
I have also tried using:
split -l 9000000 myfile.csv
which yields the same result.
I have used the split command before with similar arguments with no problem.
Any ideas what I am missing?
Thanks in advance
EDIT: even if I provide my own prefix, I still get the same error:
split -n l/13 myfile.csv completenewprefix
split: 'completenewprefixaa' would overwrite input; aborting
EDIT2:
ls -di completenewprefixaa myfile.csv
1 completenewprefixaa 1 myfile.csv
findmnt -T .
TARGET SOURCE FSTYPE OPTIONS
/mnt/u U: drvfs rw,relatime,case=off
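The ls -di and findmnt output point at the cause: on this drvfs mount every file reports inode 1, and GNU split refuses to create an output file whose device and inode numbers match the input's, so it concludes the new file 'would overwrite input'. A hedged workaround is to write the chunks to a path with real inodes, e.g. a native Linux directory (~/chunks is an arbitrary example):

$ mkdir -p ~/chunks
$ split -n l/13 myfile.csv ~/chunks/part_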
I'm trying to temporarily disable DHCP on all connections on a computer using bash, so I need the process to be reversible. My approach is to comment out lines that contain BOOTPROTO=dhcp and then insert a line below each with BOOTPROTO=none. I'm not sure of the correct syntax to make sed understand the line number stored in the $insertLine variable.
fileList=$(ls /etc/sysconfig/network-scripts | grep ^ifcfg)
path="/etc/sysconfig/network-scripts/"
for file in $fileList
do
    echo "looking for dhcp entry in $file"
    if [ $(cat $path$file | grep ^BOOTPROTO=dhcp) ]; then
        echo "disabling dhcp in $file"
        editLine=$(grep -n ^BOOTPROTO=dhcp /$path$file | cut -d : -f 1 )
        #comment out the original dhcp value
        sed -i "s/BOOTPROTO=dhcp/#BOOTPROTO=dhcp/g" $path$file
        #insert a line below it with value of none.
        ((insertLine=$editLine+1))
        sed "$($insertLine)iBOOTPROTO=none" $path$file
    fi
done
Any help using sed or other stream editor greatly appreciated. I'm using RHEL 6.
The sed editor should be able to do the job on its own, without having to combine bash, grep, cat, etc. That is easier to test and more reliable.
The whole script can be simplified to the below. It performs all operations (the substitution and the insert) in a single pass, using multiple sed scriptlets.
#!/bin/sh
for file in $(grep -l "^BOOTPROTO=dhcp" /etc/sysconfig/network-scripts/ifcfg*) ; do
    sed -i -e "s/BOOTPROTO=dhcp/#BOOTPROTO=dhcp/g" -e "/BOOTPROTO=dhcp/a BOOTPROTO=none" "$file"
done
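For example, on a file containing a dhcp entry (the ifcfg-eth0 content here is illustrative):

$ cat ifcfg-eth0
DEVICE=eth0
BOOTPROTO=dhcp
$ sed -e 's/BOOTPROTO=dhcp/#BOOTPROTO=dhcp/g' -e '/BOOTPROTO=dhcp/a BOOTPROTO=none' ifcfg-eth0
DEVICE=eth0
#BOOTPROTO=dhcp
BOOTPROTO=none

(The a scriptlet still fires after the substitution because #BOOTPROTO=dhcp contains the pattern, and a appends the new line below it.)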
As a side note, consider NOT using path as a variable name, to avoid possible confusion with the PATH environment variable.
To write up why your attempt with the following fails:
sed "$($insertLine)iBOOTPROTO=none" $path$file
because:
$($insertLine) encloses $insertLine in a command substitution; when $insertLine is evaluated it yields a number, which is not a command, so the substitution generates an error.
your call to sed does not include the -i option to edit the file $path$file in place.
You can correct the issues with:
sed -i "${insertLine}i BOOTPROTO=none" $path$file
This is just sed -i (edit in place) and Ni, where N is the line number before which to insert, followed by the content to insert, and finally the file to edit. The ${..} braces protect the variable name insertLine from the i that follows, and the whole expression is double-quoted to allow variable expansion.
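A quick demonstration of the Ni form (GNU sed, as on RHEL 6; demo.txt is a throwaway example):

$ printf 'one\ntwo\nthree\n' > demo.txt
$ insertLine=2
$ sed -i "${insertLine}i INSERTED" demo.txt
$ cat demo.txt
one
INSERTED
two
three

Note that Ni inserts before line N, which is why your script computes insertLine=editLine+1 first.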
Let me know if you have any further questions.
(and see dash-o's answer for refactoring the whole thing to simply use sed to make the change without spawning 10 other subshells)
I have very big text files (~50,000) over which I have to do some text processing, basically running multiple grep commands.
When I run them manually it returns in an instant, but when I do the same in a bash script it takes a lot of time. What am I doing wrong in the bash script below? I pass the names of the files as command-line arguments to the script.
Example Input data :
BUSINESS^GFR^GNevil
PERSONAL^GUK^GSheila
Output that should end up in a file: BUSINESS^GFR^GNevil
Instead, after quite a while, the script starts printing the whole file to the terminal. How do I suppress that?
#!/bin/bash
cat $2 | grep BUSINESS
Do NOT use cat with a program that can read the file itself.
It slows things down and you lose functionality:
grep BUSINESS test | grep -E '^GFR|^GDE'
Or you can do it like this with awk:
awk '/BUSINESS/ && /^GFR|^GDE/' test
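As for the output appearing on the terminal: the script writes its matches to stdout, so redirect them to a file. A minimal sketch (output.txt is an assumed destination name):

#!/bin/bash
grep BUSINESS "$2" > output.txt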
I would like to print the number of characters in each line of a text file using a Unix command. I know it is simple with PowerShell:
gc abc.txt | % {$_.length}
but I need a Unix command.
Use Awk.
awk '{ print length }' abc.txt
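For example:

$ printf 'foo\nbarbaz\n' > abc.txt
$ awk '{ print length }' abc.txt
3
6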
while IFS= read -r line; do echo ${#line}; done < abc.txt
It is POSIX, so it should work everywhere.
Edit: Added -r as suggested by William.
Edit: Beware of Unicode handling. Bash and zsh, with a correctly set locale, will show the number of codepoints, but dash will show bytes, so you have to check what your shell does. And there are many other possible definitions of length in Unicode anyway, so it depends on what you actually want.
Edit: Prefix with IFS= to avoid losing leading and trailing spaces.
Here is an example using xargs (the line is passed as a positional parameter to avoid injecting it into the shell command, and printf %s avoids counting the trailing newline):
$ xargs -d '\n' -I{} sh -c 'printf %s "$1" | wc -c' sh {} < file
I've tried the other answers listed above, but they are very far from decent solutions when dealing with large files -- especially once a single line's size occupies more than ~1/4 of available RAM.
Both bash and awk slurp the entire line, even though for this problem it's not needed. Bash will error out once a line is too long, even if you have enough memory.
I've implemented an extremely simple, fairly unoptimized Python script that, when tested with large files (~4 GB per line), doesn't slurp and is by far a better solution than those given.
If this is time-critical production code, you can rewrite the ideas in C or apply better optimizations to the read call (instead of reading a single byte at a time), after verifying that this is indeed a bottleneck.
The code assumes the newline is a linefeed character, which is a good assumption for Unix, but YMMV on macOS/Windows. Be sure the file ends with a linefeed to ensure the last line's character count isn't overlooked.
from sys import stdin, exit

counter = 0
while True:
    byte = stdin.buffer.read(1)
    counter += 1
    if not byte:
        exit()
    if byte == b'\x0a':
        print(counter - 1)
        counter = 0
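Run it with the file on stdin (linelen.py is an assumed name for the script above):

$ python3 linelen.py < abc.txt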
Try this:
while read -r line
do
    printf '%s' "$line" | wc -m
done < abc.txt