What is the simplest way to write a bash script to accept arguments and input from all possible direction (similar to sort -k1 -r)? - bash

I want to write a bash script that can handle arguments and input similar to many built-in bash command. For example, like sort, it can handle
sort -k 1 -r input.txt
sort input.txt -k 1 -r
cat input.txt | sort -k 1 -r
sort -k 1 -r < input.txt
sort -k 1 -r <(cat input.txt)
I want my script to be able to handle arguments and input in the similar way
myscript.sh -i 3 -b 4 input.txt
myscript.sh input.txt -i 3 -b 4
cat input.txt | myscript.sh -i 3 -b 4
myscript.sh -i 3 -b 4 < input.txt
myscript.sh -i 3 -b 4 <(cat input.txt)
so far I only used some features from "read" and "getopts" and think that it's may be buggy if I try to do that on my own.
To make me state my question more clearly, let the content of input.text be
aaa
bbb
ccc
and I want to use value from argument i and b to do something but I'll just print it out in this example. The sample output that I want is
i : 3
b : 4
aaa
bbb
ccc
What is the best way to write a code to handle my above sample commands to give out this output?
Below is the code that got from the sandwich idea of #chepner, which is the best one so far.
#!/bin/bash -l
die () {
echo >&2 "[exception] $#"
exit 1
}
#parse param
while getopts "i:b:" OPTION; do
case "$OPTION" in
i)
i="$OPTARG"
;;
b)
b="$OPTARG"
;;
*)
die "unrecognized option"
;;
esac
done
if [ -e tmpfile ]
then
rm tmpfile
fi
shift $(($OPTIND - 1))
echo "i : "$i
echo "b : "$b
cat $1 > tmpfile
if read -t 0; then
cat >> tmpfile
fi
cat tmpfile

Executive summary: you can use read -t 0 to test if there is any input available on standard input. It will exit with status 0 data is immediately available on standard input (via a pipe or a redirected file), and 1 if not (e.g., still connected to the keyboard). You can then branch in your script based on whether or not you need to read from standard input.
if read -t 0; then
# Do one thing
else
# Do something else
fi
The tricky part for me was writing the script so that it does not block reading standard input if you don't pipe anything to it.
This seems to work; improvements welcome. First, consume everything on standard input; then process files given as arguments.
# The first call to read only succeeds when there is input
# available on standard input. It does not actually consume
# a line, though.
if read -t 0; then
# Read standard input normally
while read line; do
echo $line
done
fi
# I'll assume you populate an array called input files
# while processing your arguments
for file in "${inputfiles[#]"; do
cat $file
done
Here's a pointless wrapper around sort just do demonstrate another way of combining standard input with other input files:
if read -t 0; then
cat | sort fileA fileB
else
sort fileA file B
fi
A slightly more useful command might be sandwich, which outputs its standard input (if any) between two files given on the command line.
#!/bin/bash
cat "$1" # Output the first file
read -t 0 && cat # Pass through the standard input, if there is any
cat "$2" # Output the second file
# cat "$1" - "$2" is almost the same, but requires standard input.
Some calls to sandwich could be
$ sandwich header.txt footer.txt
$ sandwich header.txt footer.txt < body.txt
$ cat chapter1.txt chapter2.txt | sandwich <(echo "My header") <(echo "My footer")
This doesn't quite work, so there's room for improvement...
$ cat - | sandwich header.txt footer.txt

cat input.txt | command
command < input.txt
command <(cat input.txt)
These three are identical from the applications viewpoint, the shell does the redirect. The application only has to read stdin.
command input.txt
cat input.txt | command
These differ only in the file you process, in the first one it's input.txt, in the second one it's stdin, for which you can use -, most of the unix commands support it.
# echo "X" | cat -
X
So you set the default: filename=-, and if you read the filename option, you overwrite this variable.

Related

Infinite loop when redirecting output to the input file [duplicate]

Basically I want to take as input text from a file, remove a line from that file, and send the output back to the same file. Something along these lines if that makes it any clearer.
grep -v 'seg[0-9]\{1,\}\.[0-9]\{1\}' file_name > file_name
however, when I do this I end up with a blank file.
Any thoughts?
Use sponge for this kind of tasks. Its part of moreutils.
Try this command:
grep -v 'seg[0-9]\{1,\}\.[0-9]\{1\}' file_name | sponge file_name
You cannot do that because bash processes the redirections first, then executes the command. So by the time grep looks at file_name, it is already empty. You can use a temporary file though.
#!/bin/sh
tmpfile=$(mktemp)
grep -v 'seg[0-9]\{1,\}\.[0-9]\{1\}' file_name > ${tmpfile}
cat ${tmpfile} > file_name
rm -f ${tmpfile}
like that, consider using mktemp to create the tmpfile but note that it's not POSIX.
Use sed instead:
sed -i '/seg[0-9]\{1,\}\.[0-9]\{1\}/d' file_name
try this simple one
grep -v 'seg[0-9]\{1,\}\.[0-9]\{1\}' file_name | tee file_name
Your file will not be blank this time :) and your output is also printed to your terminal.
You can't use redirection operator (> or >>) to the same file, because it has a higher precedence and it will create/truncate the file before the command is even invoked. To avoid that, you should use appropriate tools such as tee, sponge, sed -i or any other tool which can write results to the file (e.g. sort file -o file).
Basically redirecting input to the same original file doesn't make sense and you should use appropriate in-place editors for that, for example Ex editor (part of Vim):
ex '+g/seg[0-9]\{1,\}\.[0-9]\{1\}/d' -scwq file_name
where:
'+cmd'/-c - run any Ex/Vim command
g/pattern/d - remove lines matching a pattern using global (help :g)
-s - silent mode (man ex)
-c wq - execute :write and :quit commands
You may use sed to achieve the same (as already shown in other answers), however in-place (-i) is non-standard FreeBSD extension (may work differently between Unix/Linux) and basically it's a stream editor, not a file editor. See: Does Ex mode have any practical use?
One liner alternative - set the content of the file as variable:
VAR=`cat file_name`; echo "$VAR"|grep -v 'seg[0-9]\{1,\}\.[0-9]\{1\}' > file_name
Since this question is the top result in search engines, here's a one-liner based on https://serverfault.com/a/547331 that uses a subshell instead of sponge (which often isn't part of a vanilla install like OS X):
echo "$(grep -v 'seg[0-9]\{1,\}\.[0-9]\{1\}' file_name)" > file_name
The general case is:
echo "$(cat file_name)" > file_name
Edit, the above solution has some caveats:
printf '%s' <string> should be used instead of echo <string> so that files containing -n don't cause undesired behavior.
Command substitution strips trailing newlines (this is a bug/feature of shells like bash) so we should append a postfix character like x to the output and remove it on the outside via parameter expansion of a temporary variable like ${v%x}.
Using a temporary variable $v stomps the value of any existing variable $v in the current shell environment, so we should nest the entire expression in parentheses to preserve the previous value.
Another bug/feature of shells like bash is that command substitution strips unprintable characters like null from the output. I verified this by calling dd if=/dev/zero bs=1 count=1 >> file_name and viewing it in hex with cat file_name | xxd -p. But echo $(cat file_name) | xxd -p is stripped. So this answer should not be used on binary files or anything using unprintable characters, as Lynch pointed out.
The general solution (albiet slightly slower, more memory intensive and still stripping unprintable characters) is:
(v=$(cat file_name; printf x); printf '%s' ${v%x} > file_name)
Test from https://askubuntu.com/a/752451:
printf "hello\nworld\n" > file_uniquely_named.txt && for ((i=0; i<1000; i++)); do (v=$(cat file_uniquely_named.txt; printf x); printf '%s' ${v%x} > file_uniquely_named.txt); done; cat file_uniquely_named.txt; rm file_uniquely_named.txt
Should print:
hello
world
Whereas calling cat file_uniquely_named.txt > file_uniquely_named.txt in the current shell:
printf "hello\nworld\n" > file_uniquely_named.txt && for ((i=0; i<1000; i++)); do cat file_uniquely_named.txt > file_uniquely_named.txt; done; cat file_uniquely_named.txt; rm file_uniquely_named.txt
Prints an empty string.
I haven't tested this on large files (probably over 2 or 4 GB).
I have borrowed this answer from Hart Simha and kos.
This is very much possible, you just have to make sure that by the time you write the output, you're writing it to a different file. This can be done by removing the file after opening a file descriptor to it, but before writing to it:
exec 3<file ; rm file; COMMAND <&3 >file ; exec 3>&-
Or line by line, to understand it better :
exec 3<file # open a file descriptor reading 'file'
rm file # remove file (but fd3 will still point to the removed file)
COMMAND <&3 >file # run command, with the removed file as input
exec 3>&- # close the file descriptor
It's still a risky thing to do, because if COMMAND fails to run properly, you'll lose the file contents. That can be mitigated by restoring the file if COMMAND returns a non-zero exit code :
exec 3<file ; rm file; COMMAND <&3 >file || cat <&3 >file ; exec 3>&-
We can also define a shell function to make it easier to use :
# Usage: replace FILE COMMAND
replace() { exec 3<$1 ; rm $1; ${#:2} <&3 >$1 || cat <&3 >$1 ; exec 3>&- }
Example :
$ echo aaa > test
$ replace test tr a b
$ cat test
bbb
Also, note that this will keep a full copy of the original file (until the third file descriptor is closed). If you're using Linux, and the file you're processing on is too big to fit twice on the disk, you can check out this script that will pipe the file to the specified command block-by-block while unallocating the already processed blocks. As always, read the warnings in the usage page.
The following will accomplish the same thing that sponge does, without requiring moreutils:
shuf --output=file --random-source=/dev/zero
The --random-source=/dev/zero part tricks shuf into doing its thing without doing any shuffling at all, so it will buffer your input without altering it.
However, it is true that using a temporary file is best, for performance reasons. So, here is a function that I have written that will do that for you in a generalized way:
# Pipes a file into a command, and pipes the output of that command
# back into the same file, ensuring that the file is not truncated.
# Parameters:
# $1: the file.
# $2: the command. (With $3... being its arguments.)
# See https://stackoverflow.com/a/55655338/773113
siphon()
{
local tmp file rc=0
[ "$#" -ge 2 ] || { echo "Usage: siphon filename [command...]" >&2; return 1; }
file="$1"; shift
tmp=$(mktemp -- "$file.XXXXXX") || return
"$#" <"$file" >"$tmp" || rc=$?
mv -- "$tmp" "$file" || rc=$(( rc | $? ))
return "$rc"
}
There's also ed (as an alternative to sed -i):
# cf. http://wiki.bash-hackers.org/howto/edit-ed
printf '%s\n' H 'g/seg[0-9]\{1,\}\.[0-9]\{1\}/d' wq | ed -s file_name
You can use slurp with POSIX Awk:
!/seg[0-9]\{1,\}\.[0-9]\{1\}/ {
q = q ? q RS $0 : $0
}
END {
print q > ARGV[1]
}
Example
This does the trick pretty nicely in most of the cases I faced:
cat <<< "$(do_stuff_with f)" > f
Note that while $(…) strips trailing newlines, <<< ensures a final newline, so generally the result is magically satisfying.
(Look for “Here Strings” in man bash if you want to learn more.)
Full example:
#! /usr/bin/env bash
get_new_content() {
sed 's/Initial/Final/g' "${1:?}"
}
echo 'Initial content.' > f
cat f
cat <<< "$(get_new_content f)" > f
cat f
This does not truncate the file and yields:
Initial content.
Final content.
Note that I used a function here for the sake of clarity and extensibility, but that’s not a requirement.
A common usecase is JSON edition:
echo '{ "a": 12 }' > f
cat f
cat <<< "$(jq '.a = 24' f)" > f
cat f
This yields:
{ "a": 12 }
{
"a": 24
}
Try this
echo -e "AAA\nBBB\nCCC" > testfile
cat testfile
AAA
BBB
CCC
echo "$(grep -v 'AAA' testfile)" > testfile
cat testfile
BBB
CCC
I usually use the tee program to do this:
grep -v 'seg[0-9]\{1,\}\.[0-9]\{1\}' file_name | tee file_name
It creates and removes a tempfile by itself.

Editing multiple files using a bash script [duplicate]

Basically I want to take as input text from a file, remove a line from that file, and send the output back to the same file. Something along these lines if that makes it any clearer.
grep -v 'seg[0-9]\{1,\}\.[0-9]\{1\}' file_name > file_name
however, when I do this I end up with a blank file.
Any thoughts?
Use sponge for this kind of tasks. Its part of moreutils.
Try this command:
grep -v 'seg[0-9]\{1,\}\.[0-9]\{1\}' file_name | sponge file_name
You cannot do that because bash processes the redirections first, then executes the command. So by the time grep looks at file_name, it is already empty. You can use a temporary file though.
#!/bin/sh
tmpfile=$(mktemp)
grep -v 'seg[0-9]\{1,\}\.[0-9]\{1\}' file_name > ${tmpfile}
cat ${tmpfile} > file_name
rm -f ${tmpfile}
like that, consider using mktemp to create the tmpfile but note that it's not POSIX.
Use sed instead:
sed -i '/seg[0-9]\{1,\}\.[0-9]\{1\}/d' file_name
try this simple one
grep -v 'seg[0-9]\{1,\}\.[0-9]\{1\}' file_name | tee file_name
Your file will not be blank this time :) and your output is also printed to your terminal.
You can't use redirection operator (> or >>) to the same file, because it has a higher precedence and it will create/truncate the file before the command is even invoked. To avoid that, you should use appropriate tools such as tee, sponge, sed -i or any other tool which can write results to the file (e.g. sort file -o file).
Basically redirecting input to the same original file doesn't make sense and you should use appropriate in-place editors for that, for example Ex editor (part of Vim):
ex '+g/seg[0-9]\{1,\}\.[0-9]\{1\}/d' -scwq file_name
where:
'+cmd'/-c - run any Ex/Vim command
g/pattern/d - remove lines matching a pattern using global (help :g)
-s - silent mode (man ex)
-c wq - execute :write and :quit commands
You may use sed to achieve the same (as already shown in other answers), however in-place (-i) is non-standard FreeBSD extension (may work differently between Unix/Linux) and basically it's a stream editor, not a file editor. See: Does Ex mode have any practical use?
One liner alternative - set the content of the file as variable:
VAR=`cat file_name`; echo "$VAR"|grep -v 'seg[0-9]\{1,\}\.[0-9]\{1\}' > file_name
Since this question is the top result in search engines, here's a one-liner based on https://serverfault.com/a/547331 that uses a subshell instead of sponge (which often isn't part of a vanilla install like OS X):
echo "$(grep -v 'seg[0-9]\{1,\}\.[0-9]\{1\}' file_name)" > file_name
The general case is:
echo "$(cat file_name)" > file_name
Edit, the above solution has some caveats:
printf '%s' <string> should be used instead of echo <string> so that files containing -n don't cause undesired behavior.
Command substitution strips trailing newlines (this is a bug/feature of shells like bash) so we should append a postfix character like x to the output and remove it on the outside via parameter expansion of a temporary variable like ${v%x}.
Using a temporary variable $v stomps the value of any existing variable $v in the current shell environment, so we should nest the entire expression in parentheses to preserve the previous value.
Another bug/feature of shells like bash is that command substitution strips unprintable characters like null from the output. I verified this by calling dd if=/dev/zero bs=1 count=1 >> file_name and viewing it in hex with cat file_name | xxd -p. But echo $(cat file_name) | xxd -p is stripped. So this answer should not be used on binary files or anything using unprintable characters, as Lynch pointed out.
The general solution (albiet slightly slower, more memory intensive and still stripping unprintable characters) is:
(v=$(cat file_name; printf x); printf '%s' ${v%x} > file_name)
Test from https://askubuntu.com/a/752451:
printf "hello\nworld\n" > file_uniquely_named.txt && for ((i=0; i<1000; i++)); do (v=$(cat file_uniquely_named.txt; printf x); printf '%s' ${v%x} > file_uniquely_named.txt); done; cat file_uniquely_named.txt; rm file_uniquely_named.txt
Should print:
hello
world
Whereas calling cat file_uniquely_named.txt > file_uniquely_named.txt in the current shell:
printf "hello\nworld\n" > file_uniquely_named.txt && for ((i=0; i<1000; i++)); do cat file_uniquely_named.txt > file_uniquely_named.txt; done; cat file_uniquely_named.txt; rm file_uniquely_named.txt
Prints an empty string.
I haven't tested this on large files (probably over 2 or 4 GB).
I have borrowed this answer from Hart Simha and kos.
This is very much possible, you just have to make sure that by the time you write the output, you're writing it to a different file. This can be done by removing the file after opening a file descriptor to it, but before writing to it:
exec 3<file ; rm file; COMMAND <&3 >file ; exec 3>&-
Or line by line, to understand it better :
exec 3<file # open a file descriptor reading 'file'
rm file # remove file (but fd3 will still point to the removed file)
COMMAND <&3 >file # run command, with the removed file as input
exec 3>&- # close the file descriptor
It's still a risky thing to do, because if COMMAND fails to run properly, you'll lose the file contents. That can be mitigated by restoring the file if COMMAND returns a non-zero exit code :
exec 3<file ; rm file; COMMAND <&3 >file || cat <&3 >file ; exec 3>&-
We can also define a shell function to make it easier to use :
# Usage: replace FILE COMMAND
replace() { exec 3<$1 ; rm $1; ${#:2} <&3 >$1 || cat <&3 >$1 ; exec 3>&- }
Example :
$ echo aaa > test
$ replace test tr a b
$ cat test
bbb
Also, note that this will keep a full copy of the original file (until the third file descriptor is closed). If you're using Linux, and the file you're processing on is too big to fit twice on the disk, you can check out this script that will pipe the file to the specified command block-by-block while unallocating the already processed blocks. As always, read the warnings in the usage page.
The following will accomplish the same thing that sponge does, without requiring moreutils:
shuf --output=file --random-source=/dev/zero
The --random-source=/dev/zero part tricks shuf into doing its thing without doing any shuffling at all, so it will buffer your input without altering it.
However, it is true that using a temporary file is best, for performance reasons. So, here is a function that I have written that will do that for you in a generalized way:
# Pipes a file into a command, and pipes the output of that command
# back into the same file, ensuring that the file is not truncated.
# Parameters:
# $1: the file.
# $2: the command. (With $3... being its arguments.)
# See https://stackoverflow.com/a/55655338/773113
siphon()
{
local tmp file rc=0
[ "$#" -ge 2 ] || { echo "Usage: siphon filename [command...]" >&2; return 1; }
file="$1"; shift
tmp=$(mktemp -- "$file.XXXXXX") || return
"$#" <"$file" >"$tmp" || rc=$?
mv -- "$tmp" "$file" || rc=$(( rc | $? ))
return "$rc"
}
There's also ed (as an alternative to sed -i):
# cf. http://wiki.bash-hackers.org/howto/edit-ed
printf '%s\n' H 'g/seg[0-9]\{1,\}\.[0-9]\{1\}/d' wq | ed -s file_name
You can use slurp with POSIX Awk:
!/seg[0-9]\{1,\}\.[0-9]\{1\}/ {
q = q ? q RS $0 : $0
}
END {
print q > ARGV[1]
}
Example
This does the trick pretty nicely in most of the cases I faced:
cat <<< "$(do_stuff_with f)" > f
Note that while $(…) strips trailing newlines, <<< ensures a final newline, so generally the result is magically satisfying.
(Look for “Here Strings” in man bash if you want to learn more.)
Full example:
#! /usr/bin/env bash
get_new_content() {
sed 's/Initial/Final/g' "${1:?}"
}
echo 'Initial content.' > f
cat f
cat <<< "$(get_new_content f)" > f
cat f
This does not truncate the file and yields:
Initial content.
Final content.
Note that I used a function here for the sake of clarity and extensibility, but that’s not a requirement.
A common usecase is JSON edition:
echo '{ "a": 12 }' > f
cat f
cat <<< "$(jq '.a = 24' f)" > f
cat f
This yields:
{ "a": 12 }
{
"a": 24
}
Try this
echo -e "AAA\nBBB\nCCC" > testfile
cat testfile
AAA
BBB
CCC
echo "$(grep -v 'AAA' testfile)" > testfile
cat testfile
BBB
CCC
I usually use the tee program to do this:
grep -v 'seg[0-9]\{1,\}\.[0-9]\{1\}' file_name | tee file_name
It creates and removes a tempfile by itself.

Passing input to sed, and sed info to a string

I have a list of files (~1000) and there is 1 file per line in my text file named: 'files.txt'
I have a macro that looks something like the following:
#!/bin/sh
b=$(sed '${1}q;d' files.txt)
cat > MyMacro_${1}.C << +EOF
myFile = new TFile("/MYPATHNAME/$b");
+EOF
and I use this input script by doing
./MakeMacro.sh 1
and later I want to do
./MakeMacro.sh 2
./MakeMacro.sh 3
...etc
So that it reads the n'th line of my files.txt and feeds that string to my created .C macro.
So that it reads the n'th line of my files.txt and feeds that string to my created .C macro.
Given this statement and your tags, I'm going to answer using shell tools and not really address the issue of the .c macro.
The first line of your script contains a sed script. There are numerous ways to get the Nth line from a text file. The simplest might be to use head and tail.
$ head -n "${i}" files.txt | tail -n 1
This takes the first $i lines of files.txt, and shows you the last 1 lines of that set.
$ sed -ne "${i}p" files.txt
This use of sed uses -n to avoid printing by default, then prints the $ith line. For better performance, try:
$ sed -ne "${i}{p;q;}" files.txt
This does the same, but quits after printing the line, so that sed doesn't bother traversing the rest of the file.
$ awk -v i="$i" 'NR==i' files.txt
This passes the shell variable $i into awk, then evaluates an expression that tests whether the number of records processed is the same as that variable. If the expression evaluates true, awk prints the line. For better performance, try:
$ awk -v i="$i" 'NR==i{print;exit}' files.txt
Like the second sed script above, this will quit after printing the line, so as to avoid traversing the rest of the file.
Plenty of ways you could do this by loading the file into an array as well, but those ways would take more memory and perform less well. I'd use one-liners if you can. :)
To take any of these one-liners and put it into your script, you already have the notation:
if expr "$i" : '[0-9][0-9]*$' >/dev/null; then
b=$(sed -ne "${i}{p;q;}" files.txt)
else
echo "ERROR: invalid line number" >&2; exit 1
fi
If I am understanding you correctly, you can do a for loop in bash to call the script multiple times with different arguments.
for i in `seq 1 n`; do ./MakeMacro.sh $i; done
Based on the OP's comment, it seems that he wants to submit the generated files to Condor. You can modify the loop above to include the condor submission.
for i in `seq 1 n`; do ./MakeMacro.sh $i; condor_submit <OutputFile> ; done
i=0
while read file
do
((i++))
cat > MyMacro_${i}.C <<-'EOF'
myFile = new TFile("$file");
EOF
done < files.txt
Beware: you need tab indents on the EOF line.
I'm puzzled about why this is the way you want to do the job. You could have your C++ code read files.txt at runtime and it would likely be more efficient in most ways.
If you want to get the Nth line of files.txt into MyMacro_N.C, then:
{
echo
sed -n -e "${1}{s/.*/myFile = new TFILE(\"&\");/p;q;}" files.txt
echo
} > MyMacro_${1}.C
Good grief. The entire script should just be (untested):
awk -v nr="$1" 'NR==nr{printf "\nmyFile = new TFile(\"/MYPATHNAME/%s\");\n\n",$0 > ("MyMacro_"nr".C")}' files.txt
You can throw in a ;exit before the } if performance is an issue but I doubt if it will be.

How to process lines which is read from standard input in UNIX shell script?

I get stuck by this problem:
I wrote a shell script and it gets a large file with many lines from stdin, that's how it is executed:
./script < filename
I want use the file as an input to another operation in the script, however I don't know how to store this file's name in a variable.
It is a script that takes a file from stdin as argument and then do awk operation in this file it self. Say if I write in script:
script:
#!/bin/sh
...
read file
...
awk '...' < "$file"
...
it only reads first line of the input file.
And I find a way to write like this:
Min=-1
while read line; do
n=$(echo $line | awk -F$delim '{print NF}')
if [ $Min -eq -1 ] || [ $n -lt $Min ];then
Min=$n
fi
done
it would take very very long time to wait for processing, it seems awk takes much time.
So how to improve this?
/dev/stdin can be quite useful here.
In fact, it's just a chain of links to your input.
So, writing cat /dev/stdin will give you all input from your file and you can deny using input filename at all.
Now answer to question :) Recursively read links, beginning at /dev/stdin, and you will get filename. Bash code:
r(){
l=`readlink $1`
if [ $? -ne 0 ]
then
echo $1
else
r $l
fi
}
filename=`r /dev/stdin`
echo $filename
UPD:
in Ubuntu I found an option -f to readlink. i.e. readlink -f /dev/stdin gives the same output. This option may absent in some systems.
UPD2:tests (test.sh is code above):
$ ./test.sh <input # that is a file
/home/sfedorov/input
$ ./test.sh <<EOF
> line
> EOF
/tmp/sh-thd-214216298213
$ echo 1 | ./test.sh
pipe:[91219]
$ readlink -f /dev/stdin < input
/home/sfedorov/input
$ readlink -f /dev/stdin << EOF
> line
> EOF
/tmp/sh-thd-3423766239895 (deleted)
$ echo 1 | readlink -f /dev/stdin
/proc/18489/fd/pipe:[92382]
You're overdoing this. The way you invoke your script:
the file contents are the script's standard input
the script receives no argument
But awk already takes input from stdin by default, so all you need to do to make this work is:
not give awk any file name argument, it's going to be the wrapping shell's stdin automatically
not consume any of that input before the wrapping script reaches the awk part. Specifically: no read
If that's all there is to your script, it reduces to the awk invocation, so you might consider doing away with it altogether and just call awk directly. Or make your script directly an awk one instead of a sh one.
Aside: the reason your while read line/multiple awk variant (the one in the question) is slow is because it spawns an awk process for each and every line of the input, and process spawning is order of magnitudes slower than awk processing a single line. The reason why the generate tmpfile/single awk variant (the one in your answer) is still a bit slow is because it's generating the tmpfile line by line, reopening to append every time.
Modify your script to that it takes the input file name as an argument, then read from the file in your script:
$ ./script filename
In script:
filename=$1
awk '...' < "$filename"
If your script just reads from standard input, there is no guarantee that there is a named file providing the input; it could just as easily be reading from a pipe or a network socket.
How about invoking the script differently pipe standard output of YourFilename into
your scriptName as follows (the standard output of the cat filename now becomes standard
input to you script, actually in this case to the awk command
For I have filename Names.data and script showNames.sh execute as follows
cat Names.data | ./showNames.sh
Contents of filename Names.data
Huckleberry Finn
Jack Spratt
Humpty Dumpty
Contents of scrip;t showNames.sh
#!/bin/bash
#whatever awk commands you need
awk "{ print }"
Well I finally find this way to solve my problem, although it will take several seconds.
grep '.*' >> /tmp/tmpfile
Min=$(awk -F$delim 'NF < min || min == "" { min = NF };END {printmin}'</tmp/tmpfile)
Just append each line into a temporary file so that after reading from stdin, the tmpfile is the same as input file.

How can I use a file in a command and redirect output to the same file without truncating it?

Basically I want to take as input text from a file, remove a line from that file, and send the output back to the same file. Something along these lines if that makes it any clearer.
grep -v 'seg[0-9]\{1,\}\.[0-9]\{1\}' file_name > file_name
however, when I do this I end up with a blank file.
Any thoughts?
Use sponge for this kind of tasks. Its part of moreutils.
Try this command:
grep -v 'seg[0-9]\{1,\}\.[0-9]\{1\}' file_name | sponge file_name
You cannot do that because bash processes the redirections first, then executes the command. So by the time grep looks at file_name, it is already empty. You can use a temporary file though.
#!/bin/sh
tmpfile=$(mktemp)
grep -v 'seg[0-9]\{1,\}\.[0-9]\{1\}' file_name > ${tmpfile}
cat ${tmpfile} > file_name
rm -f ${tmpfile}
like that, consider using mktemp to create the tmpfile but note that it's not POSIX.
Use sed instead:
sed -i '/seg[0-9]\{1,\}\.[0-9]\{1\}/d' file_name
try this simple one
grep -v 'seg[0-9]\{1,\}\.[0-9]\{1\}' file_name | tee file_name
Your file will not be blank this time :) and your output is also printed to your terminal.
You can't use redirection operator (> or >>) to the same file, because it has a higher precedence and it will create/truncate the file before the command is even invoked. To avoid that, you should use appropriate tools such as tee, sponge, sed -i or any other tool which can write results to the file (e.g. sort file -o file).
Basically redirecting input to the same original file doesn't make sense and you should use appropriate in-place editors for that, for example Ex editor (part of Vim):
ex '+g/seg[0-9]\{1,\}\.[0-9]\{1\}/d' -scwq file_name
where:
'+cmd'/-c - run any Ex/Vim command
g/pattern/d - remove lines matching a pattern using global (help :g)
-s - silent mode (man ex)
-c wq - execute :write and :quit commands
You may use sed to achieve the same (as already shown in other answers), however in-place (-i) is non-standard FreeBSD extension (may work differently between Unix/Linux) and basically it's a stream editor, not a file editor. See: Does Ex mode have any practical use?
One liner alternative - set the content of the file as variable:
VAR=`cat file_name`; echo "$VAR"|grep -v 'seg[0-9]\{1,\}\.[0-9]\{1\}' > file_name
Since this question is the top result in search engines, here's a one-liner based on https://serverfault.com/a/547331 that uses a subshell instead of sponge (which often isn't part of a vanilla install like OS X):
echo "$(grep -v 'seg[0-9]\{1,\}\.[0-9]\{1\}' file_name)" > file_name
The general case is:
echo "$(cat file_name)" > file_name
Edit, the above solution has some caveats:
printf '%s' <string> should be used instead of echo <string> so that files containing -n don't cause undesired behavior.
Command substitution strips trailing newlines (this is a bug/feature of shells like bash) so we should append a postfix character like x to the output and remove it on the outside via parameter expansion of a temporary variable like ${v%x}.
Using a temporary variable $v stomps the value of any existing variable $v in the current shell environment, so we should nest the entire expression in parentheses to preserve the previous value.
Another bug/feature of shells like bash is that command substitution strips unprintable characters like null from the output. I verified this by calling dd if=/dev/zero bs=1 count=1 >> file_name and viewing it in hex with cat file_name | xxd -p. But echo $(cat file_name) | xxd -p is stripped. So this answer should not be used on binary files or anything using unprintable characters, as Lynch pointed out.
The general solution (albiet slightly slower, more memory intensive and still stripping unprintable characters) is:
(v=$(cat file_name; printf x); printf '%s' ${v%x} > file_name)
Test from https://askubuntu.com/a/752451:
printf "hello\nworld\n" > file_uniquely_named.txt && for ((i=0; i<1000; i++)); do (v=$(cat file_uniquely_named.txt; printf x); printf '%s' ${v%x} > file_uniquely_named.txt); done; cat file_uniquely_named.txt; rm file_uniquely_named.txt
Should print:
hello
world
Whereas calling cat file_uniquely_named.txt > file_uniquely_named.txt in the current shell:
printf "hello\nworld\n" > file_uniquely_named.txt && for ((i=0; i<1000; i++)); do cat file_uniquely_named.txt > file_uniquely_named.txt; done; cat file_uniquely_named.txt; rm file_uniquely_named.txt
Prints an empty string.
I haven't tested this on large files (probably over 2 or 4 GB).
I have borrowed this answer from Hart Simha and kos.
This is very much possible, you just have to make sure that by the time you write the output, you're writing it to a different file. This can be done by removing the file after opening a file descriptor to it, but before writing to it:
exec 3<file ; rm file; COMMAND <&3 >file ; exec 3>&-
Or line by line, to understand it better :
exec 3<file # open a file descriptor reading 'file'
rm file # remove file (but fd3 will still point to the removed file)
COMMAND <&3 >file # run command, with the removed file as input
exec 3>&- # close the file descriptor
It's still a risky thing to do, because if COMMAND fails to run properly, you'll lose the file contents. That can be mitigated by restoring the file if COMMAND returns a non-zero exit code :
exec 3<file ; rm file; COMMAND <&3 >file || cat <&3 >file ; exec 3>&-
We can also define a shell function to make it easier to use :
# Usage: replace FILE COMMAND
replace() { exec 3<$1 ; rm $1; ${#:2} <&3 >$1 || cat <&3 >$1 ; exec 3>&- }
Example :
$ echo aaa > test
$ replace test tr a b
$ cat test
bbb
Also, note that this will keep a full copy of the original file (until the third file descriptor is closed). If you're using Linux, and the file you're processing on is too big to fit twice on the disk, you can check out this script that will pipe the file to the specified command block-by-block while unallocating the already processed blocks. As always, read the warnings in the usage page.
The following will accomplish the same thing that sponge does, without requiring moreutils:
shuf --output=file --random-source=/dev/zero
The --random-source=/dev/zero part tricks shuf into doing its thing without doing any shuffling at all, so it will buffer your input without altering it.
However, it is true that using a temporary file is best, for performance reasons. So, here is a function that I have written that will do that for you in a generalized way:
# Pipes a file into a command, and pipes the output of that command
# back into the same file, ensuring that the file is not truncated.
# Parameters:
# $1: the file.
# $2: the command. (With $3... being its arguments.)
# See https://stackoverflow.com/a/55655338/773113
siphon()
{
local tmp file rc=0
[ "$#" -ge 2 ] || { echo "Usage: siphon filename [command...]" >&2; return 1; }
file="$1"; shift
tmp=$(mktemp -- "$file.XXXXXX") || return
"$#" <"$file" >"$tmp" || rc=$?
mv -- "$tmp" "$file" || rc=$(( rc | $? ))
return "$rc"
}
There's also ed (as an alternative to sed -i):
# cf. http://wiki.bash-hackers.org/howto/edit-ed
printf '%s\n' H 'g/seg[0-9]\{1,\}\.[0-9]\{1\}/d' wq | ed -s file_name
You can use slurp with POSIX Awk:
!/seg[0-9]\{1,\}\.[0-9]\{1\}/ {
q = q ? q RS $0 : $0
}
END {
print q > ARGV[1]
}
Example
This does the trick pretty nicely in most of the cases I faced:
cat <<< "$(do_stuff_with f)" > f
Note that while $(…) strips trailing newlines, <<< ensures a final newline, so generally the result is magically satisfying.
(Look for “Here Strings” in man bash if you want to learn more.)
Full example:
#! /usr/bin/env bash
get_new_content() {
sed 's/Initial/Final/g' "${1:?}"
}
echo 'Initial content.' > f
cat f
cat <<< "$(get_new_content f)" > f
cat f
This does not truncate the file and yields:
Initial content.
Final content.
Note that I used a function here for the sake of clarity and extensibility, but that’s not a requirement.
A common usecase is JSON edition:
echo '{ "a": 12 }' > f
cat f
cat <<< "$(jq '.a = 24' f)" > f
cat f
This yields:
{ "a": 12 }
{
"a": 24
}
Try this
echo -e "AAA\nBBB\nCCC" > testfile
cat testfile
AAA
BBB
CCC
echo "$(grep -v 'AAA' testfile)" > testfile
cat testfile
BBB
CCC
I usually use the tee program to do this:
grep -v 'seg[0-9]\{1,\}\.[0-9]\{1\}' file_name | tee file_name
It creates and removes a tempfile by itself.

Resources