Send diff output to 3 files - bash

What I want to do is diff 2 files and write the diff output to 3 different files.
I can tell diff to format its output like:
diff a.txt b.txt --new-line-format=... --old-line-format=... --unchanged-line-format=...
And using this:
diff a.txt b.txt --new-bla-bla="echo %l>new.txt" --old--="echo %l>old" ...
I can output to 3 different files, except the double quotes don't appear.
I want to do this as minimally as possible, so running 3 separate diffs etc. is not an option.

Here's a solution that is maybe a little longer, but more robust as it avoids the need for eval:
diff a.txt b.txt --new-line-format "3 %L" \
    --old-line-format "4 %L" \
    --unchanged-line-format "5 %L" |
while read -r fd line; do
    echo "$line" >&$fd
done 3> new.txt 4> old.txt 5> unchanged.txt
This works by prefixing each of the new, old, and unchanged lines (respectively) with the file descriptor of the file we will add them to. We then parse the output using read, and echo the line to the correct file descriptor, each of which is redirected to the correct output file.
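For illustration, the stream feeding the while loop might look like this (hypothetical lines; 5 marks unchanged lines, 4 old ones, 3 new ones):
5 a line present in both files
4 a line removed from a.txt
3 a line added in b.txt
read splits off the leading descriptor number, and the redirections after done route descriptors 3, 4, and 5 to the three output files.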

I wrote the following before I read @chepner's excellent answer:
diff diff_old diff_new --new-line-format='>%L' --old-line-format='<%L' --unchanged-line-format='=%L' |
awk '
    function printto(file) { print substr($0, 2) > file }
    /^>/ { printto("new.txt") }
    /^</ { printto("old.txt") }
    /^=/ { printto("unchanged.txt") }
'
This works similarly to his answer but requires another process instead of working in the current[*] shell.
[*] discounting the subshell created for the while commands in a pipeline.

Related

Bash: How to combine multiple files into one file

I have multiple files in one directory, and I want to combine them into a single file using Bash. The output needs to contain each file's name followed by its contents. An example would be:
$ cat "File 1"
store
$ cat "File 2"
bank
$ cat "File 3"
car
The desired output is a single file named master:
$ cat master
File 1
store
File 2
bank
File 3
car
for FILE in "File 1" "File 2" "File 3"; do
echo "$FILE"
cat "$FILE"
done > master
What you have asked for is what cat is meant for; it's short for concatenate, because it concatenates the contents of files together.
But it doesn't inject the filenames into the output. If you want the filenames there, your best bet is probably a loop:
for f in "File 1" "File 2" "File 3"; do
printf '%s\n' "$f"
cat "$f"
done > master
This will do the job:
for f in "File "{1..3}; do
    echo "$f" >> master
    cat "$f" >> master
done
With GNU sed:
sed -s '1F' *
'-s'
'--separate'
By default, 'sed' will consider the files specified on the command
line as a single continuous long stream. This GNU 'sed' extension
allows the user to consider them as separate files: range addresses
(such as '/abc/,/def/') are not allowed to span several files, line
numbers are relative to the start of each file, '$' refers to the
last line of each file, and files invoked from the 'R' commands are
rewound at the start of each file.
'F'
Print out the file name of the current input file (with a trailing
newline).
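Applied to the question's files, that could look like this (a sketch; listing the filenames explicitly avoids the glob picking up master on a re-run):
sed -s '1F' "File 1" "File 2" "File 3" > master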

Duplicate stdin to stdout

I am looking for a bash one-liner that duplicates stdin to stdout without interleaving. The only solution I have found so far is to use tee, but that produces interleaved output. Here is what I mean by that:
If e.g. a file f reads
a
b
I would like to execute
cat f | HERE_BE_COMMAND
to obtain
a
b
a
b
If I use tee - as the command, the output typically looks something like
a
a
b
b
Any suggestions for a clean solution?
Clarification
The cat f command is just an example of where the input can come from. In reality, it is a command that can (should) only be executed once. I also want to refrain from using temporary files, as the processed data is sort of sensitive and temporary files are always error-prone when the executed command gets interrupted. Furthermore, I am not interested in a solution that involves additional scripts (as stated above, it should be a one-liner) or preparatory commands that need to be executed prior to the actual duplication command.
Solution 1:
<command_which_produces_output> | { a="$(</dev/stdin)"; echo "$a"; echo "$a"; }
In this way, you save the content of standard input in the variable a (choose a better name, please), and then echo it twice.
Notice $(</dev/stdin) is a similar but more efficient way to do $(cat /dev/stdin).
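A quick check, with printf standing in for the real command (a sketch):
$ printf 'a\nb\n' | { a="$(</dev/stdin)"; echo "$a"; echo "$a"; }
a
b
a
b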
Solution 2:
Use tee in the following way:
<command_which_produces_output> | tee >(echo "$(</dev/stdin)")
Here, you're firstly writing to the standard output (that's what tee does), and also writing to a FIFO file created by process substitution:
>(echo "$(</dev/stdin)")
See for example the file it creates in my system:
$ echo >(echo "$(</dev/stdin)")
/dev/fd/63
Now, the echo "$(</dev/stdin)" part is just the way I found to read the entire input before printing it: it echoes the content read from the process substitution's standard input, but only once all of the input has been read (unlike cat, which prints line by line).
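The same check as above (a sketch; the second copy only appears once the process substitution has seen end-of-input, and depending on the shell it may print just after the prompt returns):
$ printf 'a\nb\n' | tee >(echo "$(</dev/stdin)")
a
b
a
b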
Store the second copy of the output in a temp file.
cat f | tee /tmp/showlater
cat /tmp/showlater
rm /tmp/showlater
Update:
As noted in the comments (@j.a.), the solution above will need adjusting to the OP's actual needs: it is easier to call when wrapped in a function, and you have to decide how to handle errors in the initial command and in the tee/cat/rm sequence.
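A minimal sketch of such a function (the name show_twice is made up here; adjust the error handling to taste):
# Run a command, stream its output immediately, then print it a second time.
show_twice() {
    local tmpfile
    tmpfile=$(mktemp) || return 1   # give up if no temp file can be created
    "$@" | tee "$tmpfile"           # first copy, shown as it is produced
    cat "$tmpfile"                  # second copy
    rm -f "$tmpfile"                # clean up
}
# usage: show_twice cat f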
I recommend tee /dev/stdout.
cat f | tee /dev/stdout
One possible solution I found is the following awk command:
awk '{d[NR] = $0} END {for (i=1;i<=NR;i++) print d[i]; for (i=1;i<=NR;i++) print d[i]}'
However, I feel there must be a more "canonical" way of doing this.
A simple bash script?
But this will store all of stdin; why not write the output to a file and read the file twice if you need it?
full=""
while IFS= read -r line
do
    echo "$line"
    full="$full$line\n"
done
printf '%b' "$full"
The best way would be to store the output in a file and show it later on. Using tee has the advantage of showing the output as it comes:
if tmpfile=$(mktemp); then
    commands | tee "$tmpfile"
    cat "$tmpfile"
    rm "$tmpfile"
else
    echo "Error creating temporary file" >&2
    exit 1
fi
If the amount of output is limited, you can capture it in a variable and print it twice. Note that command substitution strips the trailing newline, so print the two copies separately rather than concatenating the variable with itself:
output=$(commands); printf '%s\n' "$output" "$output"

Shell script copying lines from multiple files

I have multiple files which have the same structure but not the same data. Say their names are values_#####.txt (values_00001.txt, values_00002.txt, etc.).
I want to extract a specific line from each file and copy it in another file. For example, I want to extract the 8th line from values_00001.txt, the 16th line from values_00002.txt, the 24th line from values_00003.txt and so on (increment = 8 each time), and copy them line by line in a new file (say values.dat).
I am new to shell scripting. I tried to use sed, but I didn't figure out how to do it.
Thank you in advance for your answers!
I believe the ordering of the files is also important, to make sure you get the output in the desired sequence.
Consider this script:
n=8
while read -r f; do
    sed "${n}q;d" "$f" >> output.txt
    ((n+=8))
done < <(printf "%s\n" values_*.txt | sort -t_ -nk2,2)
This can do it:
for var in {1..NUMBER}
do
    awk -v line=$var 'NR==8*line' values_${var}.txt >> values.dat
done
(Replace NUMBER with a literal count of files; brace expansion does not expand variables.)
Explanation
The for loop is basic.
-v line=$var passes the value of $var into awk, where it is available as the variable line.
'NR==8*line' prints line number 8*line of the current file.
values_${var}.txt gets the file values_1.txt, values_2.txt, and so on.
>> values.dat redirects to values.dat file.
Test
I created 3 equal files a1, a2, a3. Each contains 30 lines, where each line holds its own line number:
$ cat a1
1
2
3
4
...
Executing the one-liner:
$ for var in {1..3}; do awk -v line=$var 'NR==8*line' a${var} >> values.dat; done
$ cat values.dat
8
16
24

Merging, then splitting files

Using a for loop, I can merge all of the files in a directory that end with *.txt:
for filename in *.txt; do
    cat "${filename}"
    echo
done > output.txt
After doing this, I will run output.txt through various scripts, in which the text will be changed considerably. After that, I want to split the files, at the same places at which they were merged, into different files (output01.txt, output02.txt, etc.).
How can I split the files at the same place they were merged?
This cannot be based on line number, because the scripts will add \t in places.
I think a solution that might work is to place "#########" at the end of each of the initial *.txt files before merging them, but I don't know how to get BASH to split the files again at that mark.
Instead of that for loop for concatenating, you can just use cat *.txt.
Anyway, why don't you just perform the scripts on each file independently within the for loop?
If you really want to combine and re-segregate, you can use:
for filename in *.txt; do
    cat "${filename}"
    echo "#####"
done > output.txt
# Pass output.txt through whatever
awk 'BEGIN { fileno = 1; file = sprintf("output%02d.txt", fileno) }
     $1 ~ /#####/ { fileno++; file = sprintf("output%02d.txt", fileno); next }
     { print > file }' output.txt
The canonical answer would be:
tar cf - *.txt > output.txt
You could split/unmerge them exactly by doing
tar xf output.txt # in the current directory
tar x -C /tmp/splitfiles/ -f output.txt
Now if you really want to do stuff like that in a loop and extract to stdout/a pipe, you could:
while read -r fname
do
    # extract the named member to stdout and pipe it on
    tar -xOf output.txt "$fname" | myprogram "$fname"
done < <(tar tf output.txt)
However, that would possibly not be very efficient. You could consider just doing
while read -r fname
do
    # handle extracted file
    myprogram "/tmp/splitfiles/$fname"
    unlink "/tmp/splitfiles/$fname" # drop the temp file
done < <(tar x -v -C /tmp/splitfiles/ -f output.txt)
This will be completely asynchronous (so if extraction or even the transmission of the archive is slow, the first files can already be processed while waiting for more data to arrive).
See also my other answer https://stackoverflow.com/a/8341221/85371 (look for the older answer part, since that question was changed to be very specific later)
As Fredrik wrote here you can use csplit to split your merged file.
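With the "#####" markers from above, a GNU csplit invocation might look like this (a sketch; pieces are numbered from 00, and each piece after the first starts with the marker line, which you may want to strip afterwards):
csplit --prefix=output --suffix-format='%02d.txt' output.txt '/^#####$/' '{*}'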

Unix: How can I prepend output to a file?

Specifically, I'm using a combination of >> and tee in a custom alias to store new Homebrew updates in a text file, as well as output on screen:
alias bu="echo `date "+%Y-%m-%d at %H:%M"` \
>> ~/Documents/Homebrew\ Updates.txt && \
brew update | tee -a ~/Documents/Homebrew\ Updates.txt"
Question: What if I wish to prepend this output in my textfile, i.e. placed at the beginning of the file as opposed to appending it to the end?
Edit1: As someone reported in the answers below, the use of temp files might be a good approach, which at least helped me partially:
targetLog="~/Documents/Homebrew\ Updates.txt"
alias bu="(brew update | cat - $targetLog \
> /tmp/out1 && mv /tmp/out1 $targetLog \
&& echo `date "+%Y-%m-%d at %H:%M":%S` | \
cat - $targetLog > /tmp/out2 \
&& mv /tmp/out2 $targetLog)"
But the problem is the output to STDOUT (previously made possible by tee), which I'm not sure can be incorporated in this tempfile approach …?
sed will happily do that for you, using -i to edit in place, e.g.:
sed -i -e "1i `date "+%Y-%m-%d at %H:%M"`" some_file
This works by creating an output file:
Let's say we have the initial contents on file.txt
echo "first line" > file.txt
echo "second line" >> file.txt
So, file.txt is our 'bottom' text file. Now prepend into a new 'output' file
echo "add new first line" | cat - file.txt > output.txt # <--- Just this command
Now, output.txt has the contents the way we want. If you need your old name:
mv output.txt file.txt
cat file.txt
The only simple and safe way to modify an input file using bash tools is to use a temp file; e.g. sed -i uses a temp file behind the scenes (though to be truly robust, sed needs more than that).
Some of the methods shown here have a subtle "can break things" trap: when you run your command not on the real data file but on a symbolic link to it. Unless this is catered for correctly, it can break the link and convert it into a regular file that receives the mods, leaving the original real file without the intended mods and without the symlink (and with no error exit code to warn you).
To avoid this with sed, you need to use the --follow-symlinks option.
For other methods, just be aware that they need to follow symlinks when you act on such a link.
Using a temp file and then rm'ing the temp file works only if "file" is not a symlink.
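For example, with GNU sed (a sketch; mylink stands for a hypothetical symlink to the real log file):
sed -i --follow-symlinks "1i `date "+%Y-%m-%d at %H:%M"`" mylink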
One safe way is to use sponge from package moreutils
Unlike a shell redirect, sponge soaks up all its input before opening the output file. This allows for constructing pipelines that read from and write to the same file.
sponge is a good general way to handle this type of situation.
Here is an example, using sponge
hbu=~/'Documents/Homebrew Updates.txt'
{ date "+%Y-%m-%d at %H:%M"; cat "$hbu"; } | sponge "$hbu"
Simplest way IMO would be to use echo and cat:
echo "Prepend" | cat - inputfile > outputfile
Or for your example basically replace the tee -a ~/Documents/Homebrew\ Updates.txt with cat - ~/Documents/Homebrew\ Updates.txt > ~/Documents/Homebrew\ Updates.txt
Edit: As stated by hasturkun this won't work, try:
echo "Prepend" | cat - file | tee file
But this isn't the most efficient way of doing it any more...
Similar to the accepted answer; however, if you are coming here because you want to prepend to the first line (rather than prepend an entirely new line), then use this command.
sed -i "1 s/^/string_replacement/" some_file
The -i flag will do a replacement within the file (rather than creating a new file).
Then the 1 will only do the replacement on line 1.
Finally, the s command is used which has the following syntax s/find/replacement/flags.
In our case we don't need any flags. The ^ is called a caret and it is used to represent the very start of a string.
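For instance (a sketch with an arbitrary replacement string):
sed -i "1 s/^/UPDATED: /" some_file
This turns a first line reading hello into UPDATED: hello.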
Try this http://www.unix.com/shell-programming-scripting/42200-add-text-beginning-file.html
There is no direct operator or command, AFAIK. You can use echo, cat, and mv to get the effect.
{ date; brew update | tee /dev/tty; cat updates.txt; } > updates.txt.new
mv updates.txt.new updates.txt
I've no idea why you want to do this. It's pretty standard that logs like this have later entries appearing, well, later in the file.
