Can envsubst not do in-place substitution? - bash

I have a config file which contains some ENV_VARIABLE styled variables.
This is my file.
It might contain $EXAMPLES of text.
Now I want that variable replaced with a value which is saved in my actual environment variables. So I'm trying this:
export EXAMPLES=lots
envsubst < file.txt > file.txt
But it doesn't work when the input file and output file are identical. The result is an empty file of size 0.
There must be a good reason for this, some bash basics that I'm not aware of?
How do I achieve what I want to do, ideally without first outputting to a different file and then replacing the original file with it?
I know that I can do it easily enough with sed, but when I discovered the envsubst command I thought that it should be perfect for my use case, so I'd like to use that.

Here is the solution that I use:
originalfile="file.txt"
tmpfile=$(mktemp)
cp --attributes-only --preserve "$originalfile" "$tmpfile"
cat "$originalfile" | envsubst > "$tmpfile" && mv "$tmpfile" "$originalfile"
Be careful with other solutions that do not use a temporary file. Pipes are asynchronous, so the file will occasionally be read after it has already been truncated.

Redirects are handled by the shell, not the program being executed, and they are set up before the program is invoked.
The redirect >output.file has the effect of creating output.file if it doesn't exist and emptying it if it does. Either way, you end up with an empty file, and that is what the program's output is redirected to.
Programs like sed which are capable of "in-place" modification must take the filename as a command-line argument, not as a redirect.
In your case, I would suggest using a temporary file and then renaming it if all goes OK.
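For example, a minimal sketch of that approach for this question (file names as in the question) would be:
tmpfile=$(mktemp) && envsubst < file.txt > "$tmpfile" && mv "$tmpfile" file.txt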

envsubst < file.txt | tee file.txt

Another shortcut is to write to a temporary file and then rename it to the original file:
envsubst < in.txt > out.txt && mv out.txt in.txt

To avoid creating a temporary file, use sponge not tee:
envsubst < file.txt | sponge file.txt
From https://linux.die.net/man/1/sponge:
sponge reads standard input and writes it out to the specified file. Unlike a shell redirect, sponge soaks up all its input before opening the output file. This allows constructing pipelines that read from and write to the same file.

You can achieve in-place substitution by calling envsubst from gnu sed with the "e" command:
EXAMPLES=lots sed -i 's/.*/echo & | envsubst/e' file.txt

It's worth noting that the mv solution won't maintain file permissions. Using cp -pf would be preferable in the case that you're modifying an executable file.
tmpfile=$(mktemp)
cat file.txt | envsubst > "$tmpfile" && cp -pf "$tmpfile" file.txt
rm -f "$tmpfile"

This answer combines two of the other answers; I think it's the best solution.
originalFile=file.txt
tmpfile=$(mktemp)
cat "$originalFile" | envsubst > "$tmpfile" && cp -pf "$tmpfile" "$originalFile"
rm -f "$tmpfile"

Updated 20221011 - Using 1 sed command
sed -i -r 's/["`]|\$\(/\\&/g; s/.*/echo "&"/ e' ./input.txt
Updated 20221007 - Using 2 sed commands
sed -i -r 's/["`]|\$\(/\\&/g' input.txt
sed -i -r 's/.*/echo "&"/ e' input.txt
Do it without envsubst
envsubst_file () {
local original_file=$1
local temp_file=$(mktemp)
trap "rm -f ${temp_file}" 0 2 3 15
cp -p ${original_file} ${temp_file}
cat ${original_file} | sed -r 's/["`]|\$\(/\\&/g' | sed -r 's/.*/echo "&"/g' | sh > ${temp_file}
mv ${temp_file} ${original_file}
}
envsubst_file 'input.txt'
First, sed escapes double quotes ("), backticks (`) and $( by prefixing them with a backslash (\); then sed is used again to wrap each line as
echo "&"
Finally the result is executed as a shell script with sh and the output is redirected to ${temp_file}.

If you use bash, check this:
a=`<file.txt` && envsubst <<<"$a" >file.txt
Tested on 500mb file, works as expected.

In the end I found that using envsubst was too dangerous after all. My files might contain dollar signs in places where I don't want any substitution to happen, and envsubst will just replace them with empty strings if no corresponding environment variable is defined. Not cool.
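For what it's worth, GNU envsubst accepts an optional SHELL-FORMAT argument naming exactly which variables to substitute; any other dollar sign in the input is then left untouched, which mitigates this problem. A minimal sketch, reusing the variable from the question:
export EXAMPLES=lots
envsubst '$EXAMPLES' < file.txt > out.txt && mv out.txt file.txt
Only $EXAMPLES is replaced here; every other $... in the file passes through unchanged.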

Related

need to clean file via SED or GREP

I have these files
NotRequired.txt (contains the lines which need to be removed)
Need2CleanSED.txt (big file, needs cleaning)
Need2CleanGRP.txt (big file, needs cleaning)
content:
more NotRequired.txt
[abc-xyz_pqr-pe2_123]
[lon-abc-tkt_1202]
[wat-7600-1_414]
[indo-pak_isu-5_761]
I am reading the above file and want to remove those lines from Need2Clean???.txt; I'm trying it via sed and grep but with no success.
myFile="NotRequired.txt"
while IFS= read -r HKline
do
sed -i '/$HKline/d' Need2CleanSED.txt
done < "$myFile"
myFile="NotRequired.txt"
while IFS= read -r HKline
do
grep -vE \"$HKline\" Need2CleanGRP.txt > Need2CleanGRP.txt
done < "$myFile"
It looks as if the variable and the [] characters are causing the problem.
What you're doing is extremely inefficient and error prone. Just do this:
grep -vF -f NotRequired.txt Need2CleanGRP.txt > tmp &&
mv tmp Need2CleanGRP.txt
Thanks to grep -F the above treats each line of NotRequired.txt as a string rather than a regexp so you don't have to worry about escaping RE metachars like [ and you don't need to wrap it in a shell loop - that one command will remove all undesirable lines in one execution of grep.
Never do command file > file, by the way, as the shell performs the > file redirection first, emptying file before command gets a chance to read it! Always do command file > tmp && mv tmp file instead.
Your assumption is correct. The [...] construct looks for any characters in that set, so you have to preface ("escape") them with \. The easiest way is to do that in your original file:
sed -i -e 's:\[:\\[:' -e 's:\]:\\]:' "${myFile}"
If you don't like that, you can apply the same sed to the file you're redirecting into the loop, for example with process substitution:
done < <(sed -e 's:\[:\\[:' -e 's:\]:\\]:' replace.txt)
Finally, you can use sed on each HKline variable:
HKline=$( echo $HKline | sed -e 's:\[:\\[:' -e 's:\]:\\]:' )
try gnu sed:
sed -Ez 's/\n/\|/g;s!\[!\\[!g;s!\]!\\]!g; s!(.*).!/\1/d!' NotRequired.txt| sed -Ef - Need2CleanSED.txt
Two sed processes are chained into one by a shell pipe.
NotRequired.txt is 'slurped' all at once by sed -z, which substitutes its \n and [ ] metacharacters with | and \[ \] respectively; the 2nd process then uses that output (-f -) as the regex script applied to the input file, i.e. Need2CleanSED.txt. The 1st process outputs:
/\[abc-xyz_pqr-pe2_123\]|\[lon-abc-tkt_1202\]|\[wat-7600-1_414\]|\[indo-pak_isu-5_761\]/d
Add the -u (unbuffered) option for more direct I/O rather than batch processing.

Trying to write a script to clean <script.aa=([].slice+'hjkbghkj') from multiple htm files, recursively

I am trying to modify a bash script to remove a glob of malicious code from a large number of files.
The community will benefit from this, so here it is:
#!/bin/bash
grep -r -l 'var createDocumentFragm' /home/user/Desktop/infected_site/* > /home/user/Desktop/filelist.txt
for i in $(cat /home/user/Desktop/filelist.txt)
do
cp -f $i $i.bak
done
for i in $(cat /home/user/Desktop/filelist.txt)
do
$i | sed 's/createDocumentFragm.*//g' > $i.awk
awk '/<\/SCRIPT>/{p=1;print}/<\/script>/{p=0}!p'
This is where the script bombs out with this message:
+ for i in '$(cat /home/user/Desktop/filelist.txt)'
+ sed 's/createDocumentFragm.*//g'
+ /home/user/Desktop/infected_site/index.htm
I get 2 errors and the script stops.
/home/user/Desktop/infected_site/index.htm: line 1: syntax error near unexpected token `<'
/home/user/Desktop/infected_site/index.htm: line 1: `<html><head><script>(function (){ '
I have the first 2 parts done.
The files containing createDocumentfragm have been enumerated in a text file correctly.
The files in the text file have been duplicated in their original location with a .bak added to them, i.e. infected_site/some_directory/infected_file.htm and infected_file.htm.bak,
effectively making sure we have a backup.
All I need to do now is write an awk command that will use the list of files in filelist.txt, take the entire glob of malicious text as a pattern, and remove it from the files, using just the uppercase SCRIPT tag as the starting point, since the lowercase script tag is too generic and could delete legitimate text.
I suspect this may help me, but I don't know how to use it correctly.
http://backreference.org/2010/03/13/safely-escape-variables-in-awk/
Once I have this part figured out, and after you have verified that the files weren't mangled you can do this to clean out the bak files:
for i in $(cat /home/user/Desktop/filelist.txt)
do
rm -f $i.bak
done
Several things:
You have:
$i | sed 's/var createDocumentFragm.*//g' > $i.awk
You should probably meant this (using your use of cat which we'll talk about in a moment):
cat $i | sed 's/var createDocumentFragm.*//g' > $i.awk
You're treating each file in your file list as if it was a command and not a file.
Now, about your use of cat. If you're using cat for almost anything but concatenating multiple files together, you probably are doing something not quite right. For example, you could have done this:
sed 's/var createDocumentFragm.*//g' "$i" > $i.awk
I'm also a bit confused about the awk statement. Exactly what file are you using awk on? Your awk statement is using STDIN and STDOUT, so it's reading file names from the for loop and then printing the output on the screen. Is the sed statement suppose to feed into the awk statement?
Note that I don't have to print out my file to STDOUT, then pipe that into sed. The sed command can take the file name directly.
You also want to avoid for loops over a list of files. That is very inefficient, and can cause problems with the command line getting overloaded. Not a big issue today, but can affect you when you least suspect it. What happens is that your $(cat /home/user/Desktop/filelist.txt) must execute first before the for loop can even start.
A little rewriting of your program:
cd ~/Desktop
grep -r -l 'var createDocumentFragm' infected_site/* > filelist.txt
while read file
do
cp -f "$file" "$file.bak"
sed 's/var createDocumentFragm.*//g' "$file" > "$file.awk"
awk '/<\/SCRIPT>/{p=1;print}/<\/script>/{p=0}!p'
done < filelist.txt
We can use one loop, and we made it a while loop. I could even feed the grep into that while loop:
grep -r -l 'var createDocumentFragm' infected_site/* | while read file
do
cp -f "$file" "$file.bak"
sed 's/var createDocumentFragm.*//g' "$file" > "$file.awk"
awk '/<\/SCRIPT>/{p=1;print}/<\/script>/{p=0}!p'
done
and then I don't even have to create a temporary file.
Let me know what's going on with the awk. I suspect you wanted something like this:
grep -r -l 'var createDocumentFragm' infected_site/* | while read file
do
cp -f "$file" "$file.bak"
sed 's/var createDocumentFragm.*//g' "$file" \
| awk '/<\/SCRIPT>/{p=1;print}/<\/script>/{p=0}!p' > "$file.awk"
done
Also note that I put quotes around the file names. This helps prevent problems if a file name has a space in it.

How can I use a file in a command and redirect output to the same file without truncating it?

Basically I want to take as input text from a file, remove a line from that file, and send the output back to the same file. Something along these lines if that makes it any clearer.
grep -v 'seg[0-9]\{1,\}\.[0-9]\{1\}' file_name > file_name
however, when I do this I end up with a blank file.
Any thoughts?
Use sponge for this kind of task. It's part of moreutils.
Try this command:
grep -v 'seg[0-9]\{1,\}\.[0-9]\{1\}' file_name | sponge file_name
You cannot do that because bash processes the redirections first, then executes the command. So by the time grep looks at file_name, it is already empty. You can use a temporary file though.
#!/bin/sh
tmpfile=$(mktemp)
grep -v 'seg[0-9]\{1,\}\.[0-9]\{1\}' file_name > ${tmpfile}
cat ${tmpfile} > file_name
rm -f ${tmpfile}
Do it like that; consider using mktemp to create the tmpfile as shown, but note that it's not POSIX.
Use sed instead:
sed -i '/seg[0-9]\{1,\}\.[0-9]\{1\}/d' file_name
try this simple one
grep -v 'seg[0-9]\{1,\}\.[0-9]\{1\}' file_name | tee file_name
Your file will not be blank this time :) and your output is also printed to your terminal.
You can't use a redirection operator (> or >>) to the same file, because the shell sets up the redirection and creates/truncates the file before the command is even invoked. To avoid that, you should use appropriate tools such as tee, sponge, sed -i or any other tool which can write results to the file (e.g. sort file -o file).
Basically redirecting input to the same original file doesn't make sense and you should use appropriate in-place editors for that, for example Ex editor (part of Vim):
ex '+g/seg[0-9]\{1,\}\.[0-9]\{1\}/d' -scwq file_name
where:
'+cmd'/-c - run any Ex/Vim command
g/pattern/d - remove lines matching a pattern using global (help :g)
-s - silent mode (man ex)
-c wq - execute :write and :quit commands
You may use sed to achieve the same (as already shown in other answers), however in-place editing (-i) is a non-standard FreeBSD extension (it may work differently between Unix/Linux) and sed is basically a stream editor, not a file editor. See: Does Ex mode have any practical use?
One liner alternative - set the content of the file as variable:
VAR=`cat file_name`; echo "$VAR"|grep -v 'seg[0-9]\{1,\}\.[0-9]\{1\}' > file_name
Since this question is the top result in search engines, here's a one-liner based on https://serverfault.com/a/547331 that uses a subshell instead of sponge (which often isn't part of a vanilla install like OS X):
echo "$(grep -v 'seg[0-9]\{1,\}\.[0-9]\{1\}' file_name)" > file_name
The general case is:
echo "$(cat file_name)" > file_name
Edit, the above solution has some caveats:
printf '%s' <string> should be used instead of echo <string> so that files containing -n don't cause undesired behavior.
Command substitution strips trailing newlines (this is a bug/feature of shells like bash) so we should append a postfix character like x to the output and remove it on the outside via parameter expansion of a temporary variable like ${v%x}.
Using a temporary variable $v stomps the value of any existing variable $v in the current shell environment, so we should nest the entire expression in parentheses to preserve the previous value.
Another bug/feature of shells like bash is that command substitution strips unprintable characters like null from the output. I verified this by calling dd if=/dev/zero bs=1 count=1 >> file_name and viewing it in hex with cat file_name | xxd -p. But echo $(cat file_name) | xxd -p is stripped. So this answer should not be used on binary files or anything using unprintable characters, as Lynch pointed out.
The general solution (albeit slightly slower, more memory-intensive and still stripping unprintable characters) is:
(v=$(cat file_name; printf x); printf '%s' ${v%x} > file_name)
Test from https://askubuntu.com/a/752451:
printf "hello\nworld\n" > file_uniquely_named.txt && for ((i=0; i<1000; i++)); do (v=$(cat file_uniquely_named.txt; printf x); printf '%s' ${v%x} > file_uniquely_named.txt); done; cat file_uniquely_named.txt; rm file_uniquely_named.txt
Should print:
hello
world
Whereas calling cat file_uniquely_named.txt > file_uniquely_named.txt in the current shell:
printf "hello\nworld\n" > file_uniquely_named.txt && for ((i=0; i<1000; i++)); do cat file_uniquely_named.txt > file_uniquely_named.txt; done; cat file_uniquely_named.txt; rm file_uniquely_named.txt
Prints an empty string.
I haven't tested this on large files (probably over 2 or 4 GB).
I have borrowed this answer from Hart Simha and kos.
This is very much possible, you just have to make sure that by the time you write the output, you're writing it to a different file. This can be done by removing the file after opening a file descriptor to it, but before writing to it:
exec 3<file ; rm file; COMMAND <&3 >file ; exec 3>&-
Or line by line, to understand it better :
exec 3<file # open a file descriptor reading 'file'
rm file # remove file (but fd3 will still point to the removed file)
COMMAND <&3 >file # run command, with the removed file as input
exec 3>&- # close the file descriptor
It's still a risky thing to do, because if COMMAND fails to run properly, you'll lose the file contents. That can be mitigated by restoring the file if COMMAND returns a non-zero exit code :
exec 3<file ; rm file; COMMAND <&3 >file || cat <&3 >file ; exec 3>&-
We can also define a shell function to make it easier to use :
# Usage: replace FILE COMMAND
replace() { exec 3<"$1" ; rm "$1"; "${@:2}" <&3 >"$1" || cat <&3 >"$1" ; exec 3>&- ; }
Example :
$ echo aaa > test
$ replace test tr a b
$ cat test
bbb
Also, note that this will keep a full copy of the original file (until the third file descriptor is closed). If you're using Linux, and the file you're processing is too big to fit twice on the disk, you can check out this script that will pipe the file to the specified command block by block while deallocating the already-processed blocks. As always, read the warnings in the usage page.
The following will accomplish the same thing that sponge does, without requiring moreutils:
shuf --output=file --random-source=/dev/zero
The --random-source=/dev/zero part tricks shuf into doing its thing without doing any shuffling at all, so it will buffer your input without altering it.
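For instance, plugged into the question's command it would look something like this (GNU shuf assumed):
grep -v 'seg[0-9]\{1,\}\.[0-9]\{1\}' file_name | shuf --output=file_name --random-source=/dev/zero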
However, it is true that using a temporary file is best, for performance reasons. So, here is a function that I have written that will do that for you in a generalized way:
# Pipes a file into a command, and pipes the output of that command
# back into the same file, ensuring that the file is not truncated.
# Parameters:
# $1: the file.
# $2: the command. (With $3... being its arguments.)
# See https://stackoverflow.com/a/55655338/773113
siphon()
{
local tmp file rc=0
[ "$#" -ge 2 ] || { echo "Usage: siphon filename [command...]" >&2; return 1; }
file="$1"; shift
tmp=$(mktemp -- "$file.XXXXXX") || return
"$#" <"$file" >"$tmp" || rc=$?
mv -- "$tmp" "$file" || rc=$(( rc | $? ))
return "$rc"
}
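With the question's grep, usage would look something like this:
siphon file_name grep -v 'seg[0-9]\{1,\}\.[0-9]\{1\}'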
There's also ed (as an alternative to sed -i):
# cf. http://wiki.bash-hackers.org/howto/edit-ed
printf '%s\n' H 'g/seg[0-9]\{1,\}\.[0-9]\{1\}/d' wq | ed -s file_name
You can use slurp with POSIX Awk:
!/seg[0-9]\{1,\}\.[0-9]\{1\}/ {
q = q ? q RS $0 : $0
}
END {
print q > ARGV[1]
}
Example
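If the program above is saved to a file (say remove_lines.awk, a name chosen here just for illustration), it can be run as:
awk -f remove_lines.awk file_name
Because print q > ARGV[1] only opens the output file when the END block runs, after all input has been read, file_name is not truncated before it is read.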
This does the trick pretty nicely in most of the cases I faced:
cat <<< "$(do_stuff_with f)" > f
Note that while $(…) strips trailing newlines, <<< ensures a final newline, so generally the result is magically satisfying.
(Look for “Here Strings” in man bash if you want to learn more.)
Full example:
#! /usr/bin/env bash
get_new_content() {
sed 's/Initial/Final/g' "${1:?}"
}
echo 'Initial content.' > f
cat f
cat <<< "$(get_new_content f)" > f
cat f
This does not truncate the file and yields:
Initial content.
Final content.
Note that I used a function here for the sake of clarity and extensibility, but that’s not a requirement.
A common use case is JSON editing:
echo '{ "a": 12 }' > f
cat f
cat <<< "$(jq '.a = 24' f)" > f
cat f
This yields:
{ "a": 12 }
{
"a": 24
}
Try this
echo -e "AAA\nBBB\nCCC" > testfile
cat testfile
AAA
BBB
CCC
echo "$(grep -v 'AAA' testfile)" > testfile
cat testfile
BBB
CCC
I usually use the tee program to do this:
grep -v 'seg[0-9]\{1,\}\.[0-9]\{1\}' file_name | tee file_name
Note, though, that tee starts writing to the file right away (it does not buffer through a temporary file), so with larger files this can race with grep still reading the input.

Redirect output from sed 's/c/d/' myFile to myFile

I am using sed in a script to do a replace and I want to have the replaced file overwrite the file. Normally I think that you would use this:
% sed -i 's/cat/dog/' manipulate
sed: illegal option -- i
However as you can see my sed does not have that command.
I tried this:
% sed 's/cat/dog/' manipulate > manipulate
But this just turns manipulate into an empty file (makes sense).
This works:
% sed 's/cat/dog/' manipulate > tmp; mv tmp manipulate
But I was wondering if there was a standard way to redirect output into the same file that input was taken from.
I commonly use the 3rd way, but with an important change:
$ sed 's/cat/dog/' manipulate > tmp && mv tmp manipulate
I.e. change ; to && so the move only happens if sed is successful; otherwise you'll lose your original file as soon as you make a typo in your sed syntax.
Note! For those reading the title and missing the OP's constraint "my sed doesn't support -i": For most people, sed will support -i, so the best way to do this is:
$ sed -i 's/cat/dog/' manipulate
Yes, -i is also supported in FreeBSD/MacOSX sed, but needs the empty string as an argument to edit a file in-place.
sed -i "" 's/old/new/g' file # FreeBSD sed
If you don't want to move copies around, you could use ed:
ed file.txt <<EOF
%s/cat/dog/
wq
EOF
Kernighan and Pike discuss this issue in The Unix Programming Environment. Their solution is to write a script called overwrite, which allows one to do such things.
The usage is: overwrite file cmd file.
# overwrite: copy standard input to output after EOF
opath=$PATH
PATH=/bin:/usr/bin
case $# in
0|1) echo 'Usage: overwrite file cmd [args]' 1>&2; exit 2
esac
file=$1; shift
new=/tmp/overwr1.$$; old=/tmp/overwr2.$$
trap 'rm -f $new $old; exit 1' 1 2 15 # clean up
if PATH=$opath "$@" >$new
then
cp $file $old # save original
trap '' 1 2 15 # we're committed
cp $new $file
else
echo "overwrite: $1 failed, $file unchanged" 1>&2
exit 1
fi
rm -f $new $old
Once you have the above script in your $PATH, you can do:
overwrite manipulate sed 's/cat/dog/' manipulate
To make your life easier, you can use replace script from the same book:
# replace: replace str1 in files with str2 in place
PATH=/bin:/usr/bin
case $# in
0|1|2) echo 'Usage: replace str1 str2 files' 1>&2; exit 1
esac
left="$1"; right="$2"; shift; shift
for i
do
overwrite $i sed "s#$left#$right#g" $i
done
Having replace in your $PATH too will allow you to say:
replace cat dog manipulate
You can use sponge from moreutils.
sed "s/cat/dog/" manipulate | sponge manipulate
Perhaps -i is GNU sed only, or your sed is just an old version; anyway, you're on the right track. The first option is probably the most common one; the third option is for when you want it to work everywhere (including Solaris machines)... :) These are the 'standard' ways of doing it.
To change multiple files (and save a backup of each as *.bak):
perl -p -i -e "s/oldtext/newtext/g" *
replaces any occurrence of oldtext with newtext in all files in the current folder. However, you will have to escape all Perl special characters within oldtext and newtext using a backslash.
This is called a “Perl pie” (mnemonic: easy as pie).
The -i flag tells it to do the replacement in place, and it should be OK to use single (') as well as double (") quotes.
If using ./* instead of just *, you should be able to do it in all sub-directories
See man perlrun for more details, including how to take a backup file of the original.
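For instance, giving -i a suffix keeps a backup of each original file alongside the modified one (a .bak suffix is used here):
perl -p -i.bak -e "s/oldtext/newtext/g" *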
using sed:
sed -i 's/old/new/g' ./* (used in GNU)
sed -i '' 's/old/new/g' ./* (used in FreeBSD)
-i option is not available in standard sed.
Your alternatives are your third way or perl.
A lot of answers, but none of them is correct. Here is the correct and simplest one:
$ echo "111 222 333" > file.txt
$ sed -i -s s/222/444/ file.txt
$ cat file.txt
111 444 333
$
Workaround using open file handles:
exec 3<manipulate
Prevent open file from being truncated:
rm manipulate
sed 's/cat/dog/' <&3 > manipulate
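A complete sketch of the same idea, with the descriptor closed once sed is done:
exec 3<manipulate                    # open fd 3 reading the original file
rm manipulate                        # unlink it; fd 3 still refers to the data
sed 's/cat/dog/' <&3 > manipulate    # write a fresh file under the old name
exec 3<&-                            # close the read descriptor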
