Concatenate strings, files and program output in Bash

The use case, in my case, is CSS file concatenation before it gets minified. To concatenate two CSS files:
cat 1.css 2.css > out.css
To add some text at a single position, I can use a here-document (the - operand tells cat to read standard input at that point):
cat 1.css - 2.css <<SOMESTUFF > out.css
This will end in the middle.
SOMESTUFF
To add STDOUT from one other program:
sed 's/foo/bar/g' 3.css | cat 1.css - 2.css > out.css
So far so good. But I regularly run into situations where I need to mix several strings, files and even program output together, such as copyright headers, files preprocessed by sed(1), and so on. I'd like to concatenate them in as few steps and temporary files as possible, while keeping the freedom to choose the order.
In short, I'm looking for a way to do this in as few steps as possible in Bash:
command [string|file|output]+ > concatenated
# note the plus ;-) --------^
(Basically, a cat that could handle multiple STDINs would be sufficient, I guess, something like
<(echo "FOO") <(sed ...) <(echo "BAR") cat 1.css -echo1- -sed- 2.css -echo2-
but I fail to see how I can access those.)

This works:
cat 1.css <(echo "FOO") <(sed ...) 2.css <(echo "BAR")
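For the original use case (a copyright header, a sed-preprocessed file, and plain files, in a chosen order), here is a minimal sketch; the file names come from the question, but the header text is invented:
cat <(echo "/* Copyright ACME Corp. */") 1.css <(sed 's/foo/bar/g' 3.css) 2.css > out.css
Each <(...) runs asynchronously and is handed to cat as a file name (such as /dev/fd/63), so the order on the command line is exactly the order in the output.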

You can do:
echo "$(command 1)" "$(command 2)" ... "$(command n)" > outputFile
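Be aware of two side effects of command substitution, shown in this small sketch with a throwaway file name:
printf 'a\nb\n\n' > demo.txt      # demo.txt ends with a blank line
echo "$(cat demo.txt)" | wc -l    # prints 2: trailing newlines were stripped
Command substitution strips trailing newlines, and the separate "$(...)" arguments are joined with single spaces, which is usually harmless for CSS but can matter elsewhere.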

You can add all the commands in a subshell, which is redirected to a file:
(
cat 1.css
echo "FOO"
sed ...
echo BAR
cat 2.css
) > output
You can also append to a file with >>. For example:
cat 1.css > output
echo "FOO" >> output
sed ... >> output
echo "BAR" >> output
cat 2.css >> output
(Note that this reopens and closes the output file once per command.)
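If that bothers you, you can get the same sequential writes through a single open file by redirecting a brace group, a variation on the subshell version above:
{
cat 1.css
echo "FOO"
sed ...
echo "BAR"
cat 2.css
} > output
Unlike ( ... ), the { ...; } group runs in the current shell, so it also avoids forking a subshell.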

Related

Speed up bash for loop which contains multiple sed commands

my bash for loop looks like:
for i in read_* ; do
cut -f1 $i | sponge $i
sed -i '1 s/^/>/g' $i
sed -i '3 s/^/>ref\n/g' $i
sed -i '4d' $i
sed -i '1h;2H;1,2d;4G' $i
mv $i $i.fasta
done
Are there any methods of speeding up this process, perhaps using GNU parallel?
EDIT: Added input and expected output.
Input:
sampleid 97 stuff 2086 42 213M = 3322 1431
TATTTAGGGAAGATCTGGCCTTCCTACAAGGGAAGGCCAGGGAATTTTCTTCAGAGCAGA
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
TTTTTAGGGAAGATCTGGCCTTCCTACAAGGGAAGGCCAGGGAATTTTCTTCAGAGCAGA
Desired output:
>ref
TTTTTAGGGAAGATCTGGCCTTCCTACAAGGGAAGGCCAGGGAATTTTCTTCAGAGCAGA
>sampleid
TATTTAGGGAAGATCTGGCCTTCCTACAAGGGAAGGCCAGGGAATTTTCTTCAGAGCAGA
I used the sed -i '1h;2H;1,2d;4G' $i command to swap lines 2 and 4.
If I read it right, this should create the same result, though it would probably help a LOT if I could see what your input and expected output look like...
awk '{ $0=$1 }                                    # keep only the first field
FNR==1 { hd=">"$0; next }                         # save line 1 as the ">name" header
FNR==2 { hd=hd"\n"$0; next }                      # append line 2 under the header
FNR==3 { print ">ref\n"$0 > FILENAME".fasta" }    # line 3 becomes the >ref record
FNR==4 { next }                                   # drop line 4
FNR==5 { print hd"\n"$0 > FILENAME".fasta" }      # emit the held header, then line 5
' read_*
My input files:
$: cat read_x
foo x
bar x
baz x
last x
curiosity x
$: cat read_y
FOO y
BAR y
BAZ y
LAST y
CURIOSITY y
and the resulting output files:
$: cat read_x.fasta
>ref
baz
>foo
bar
curiosity
$: cat read_y.fasta
>ref
BAZ
>FOO
BAR
CURIOSITY
This runs in one pass with no loop aside from awk's usual internals, and leaves the originals in place so you can check it first. If all is good, all that's left is to remove the originals. For that, I would use extended globbing.
$: shopt -s extglob; rm read_!(*.fasta)
That will clean up the original inputs but not the new outputs.
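With the example files above (read_x, read_y plus the two new .fasta files), the extended glob expands to just the originals:
$: echo read_!(*.fasta)
read_x read_y
The pattern !(*.fasta) matches any remaining name part that does not end in .fasta.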
Same results, three commands, no loops.
I am, of course, making some assumptions about what you are meaning to do that might not be accurate. To get this format in a single sed call -
$: sed -e 's/[[:space:]].*//' -e '1{s/^/>/;h;d}' -e '2{H;s/.*/>ref/}' -e '4x' read_x
>ref
baz
>foo
bar
curiosity
but that's not the same commands you used, so maybe I'm misreading it.
To use this to edit multiple files in place at a time (instead of calling it in a loop on each file), use -s with -i so that the line numbers apply to each file rather than to the stream of records they collectively produce.
DON'T use -is, though you could use -i -s.
$: sed -s -i -e 's/[[:space:]].*//' -e '1{s/^/>/;h;d}' -e '2{H;s/.*/>ref/}' -e '4x' read_*
This still leaves you with the issue of renaming each, but xargs makes that pretty easy in the given example.
printf "%s\n" read_* | xargs -I# mv # #.fasta
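If the file names might contain whitespace, a null-delimited variant is safer; a sketch assuming GNU xargs:
printf "%s\0" read_* | xargs -0 -I# mv # #.fasta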
Addendum
Using the file you gave in the OP, assuming every file is the same general structure and exactly 4 lines -
$: cat file_0 # I made files 0 through 7, but with same data
sampleid 97 stuff 2086 42 213M = 3322 1431
TATTTAGGGAAGATCTGGCCTTCCTACAAGGGAAGGCCAGGGAATTTTCTTCAGAGCAGA
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
TTTTTAGGGAAGATCTGGCCTTCCTACAAGGGAAGGCCAGGGAATTTTCTTCAGAGCAGA
$: sed -Esi '1{s/^([^[:space:]]+).*/>\1/;h;s/.*/>ref/}; 3x;' file_?
$: cat file_0 # used a diff on each, worked on all at once
>ref
TATTTAGGGAAGATCTGGCCTTCCTACAAGGGAAGGCCAGGGAATTTTCTTCAGAGCAGA
>sampleid
TTTTTAGGGAAGATCTGGCCTTCCTACAAGGGAAGGCCAGGGAATTTTCTTCAGAGCAGA
Breakout:
-Esi Extended pattern matching, separate file linecounts, in-place edits
1{...}; Collectively do these commands, in order, only on line 1 of each file
s/^([^[:space:]]+).*/>\1/ add leading > but strip everything after any whitespace
h store the resulting >\1 line in the hold buffer
s/.*/>ref/ then replace the whole line with a literal >ref
3x swap line 3 with the value stored in the hold buffer from line 1
file_? I used a glob to supply the appropriate list of files all at once.
Doing same with awk:
$: awk 'FNR==1{id=">"$1; print ">ref" >FILENAME".fasta"; next} FNR==3{print id > FILENAME".fasta"; next} {print $0 > FILENAME".fasta"}' file_?
Then you can do file management as above with the xargs/mv for the sed or the shopt/rm for the awk - or we could add a little organizational work in awk if you like. Consider this:
awk 'BEGIN { system(" mkdir -p done ") }
FNR==1 { id=">"$1; print ">ref" > FILENAME".fasta"; next } # skip printing original
FNR==3 { print id > FILENAME".fasta"; next } # skip printing original
{ print $0 > FILENAME".fasta" } # every line NOT skipped
FNR==4 { close(FILENAME); close(FILENAME".fasta");
system("mv " FILENAME " done/")
}' file_?
Then if there are any problems, it's easy to delete the fasta's, move the originals back, adjust the code, and try again. If everything is ok, it's fast and easy to rm -fr done, yes?
Note that I really only added the mkdir inside a system call in the awk to show that you can, and to keep from having to manually do it separately if you have to run a few iterations or move it all into a wrapper script, etc.
The code in the question runs multiple subprocesses (cut, sponge, sed four times, and mv) for each file that is processed. Running subprocesses is relatively slow, so you can speed up the code significantly by reducing the number of them.
This Shellcheck-clean code is one way to do it:
#! /bin/bash -p
old_files=()
for f in read_* ; do
    readarray -t lines <"$f"
    printf '>ref\n%s\n>%s\n%s\n' \
        "${lines[3]}" "${lines[0]%%[[:space:]]*}" "${lines[1]}" >"$f.fasta"
    old_files+=( "$f" )
done
rm -- "${old_files[@]}"
This runs no subprocesses when processing individual files. It just reads the lines of the old file into an array using the built-in readarray command and writes to the new file using the built-in printf.
See Removing part of a string (BashFAQ/100 (How do I do string manipulation in bash?)) for an explanation of the %% in ${lines[0]%%[[:space:]]*}.
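For instance (the sample line is the first line of the input from the question):
line='sampleid 97 stuff 2086 42 213M = 3322 1431'
echo "${line%%[[:space:]]*}"    # prints: sampleid
%%pattern deletes the longest matching suffix, here everything from the first whitespace character onwards.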
To avoid running rm for each file, the code keeps a list of files to be deleted and removes all of them at the end. If you try the code, consider commenting the rm line until you are very confident that the rest of the code is doing what you want.

looping with grep over several files

I have multiple files /text-1.txt, /text-2.txt ... /text-20.txt
and what I want to do is to grep for two patterns and stitch them into one file.
For example:
I have
grep "Int_dogs" /text-1.txt > /text-1-dogs.txt
grep "Int_cats" /text-1.txt > /text-1-cats.txt
cat /text-1-dogs.txt /text-1-cats.txt > /text-1-output.txt
I want to repeat this for all 20 files above. Is there an efficient way to do this in bash, awk, etc.?
#!/bin/bash
count=1
next () {
    [[ "${count}" -lt 21 ]] && main
    [[ "${count}" -eq 21 ]] && exit 0
}
main () {
    file="text-${count}"
    grep "Int_dogs" "${file}.txt" > "${file}-dogs.txt"
    grep "Int_cats" "${file}.txt" > "${file}-cats.txt"
    cat "${file}-dogs.txt" "${file}-cats.txt" > "${file}-output.txt"
    count=$((count+1))
    next
}
next
grep has some features you seem not to be aware of:
grep can be run on a list of files, but then the output changes:
For a single file, the output contains only the matching lines, as in this example:
cat text-1.txt
I have a cat.
I have a dog.
I have a canary.
grep "cat" text-1.txt
I have a cat.
For multiple files, the filename is also shown in the output. Let's add another text file:
cat text-2.txt
I don't have a dog.
I don't have a cat.
I don't have a canary.
grep "cat" text-*.txt
text-1.txt:I have a cat.
text-2.txt:I don't have a cat.
grep can search for multiple patterns at once using the -E switch (extended regular expressions); the patterns are separated by a pipe symbol:
grep -E "cat|dog" text-1.txt
I have a cat.
I have a dog.
(Summarizing the previous two points, plus the remark that grep -E is equivalent to egrep):
egrep "cat|dog" text-*.txt
text-1.txt:I have a cat.
text-1.txt:I have a dog.
text-2.txt:I don't have a dog.
text-2.txt:I don't have a cat.
So, in order to redirect this to an output file, you can simply say:
egrep "cat|dog" text-*.txt >text-1-output.txt
Assuming you're using bash.
Try this:
for i in $(seq 1 20); do rm -f text-${i}-output.txt; grep -E "Int_dogs|Int_cats" text-${i}.txt >> text-${i}-output.txt; done
Details
This one-line script does the following:
Original files are intended to have the following name order/syntax:
text-<INTEGER_NUMBER>.txt - Example: text-1.txt, text-2.txt, ... text-100.txt.
Creates a loop from 1 to <N>, where <N> is the number of files you want to process.
Warning: the rm -f text-${i}-output.txt command runs first and removes any existing output file, ensuring that only a freshly created output file remains at the end of the process.
grep -E "Int_dogs|Int_cats" text-${i}.txt matches both strings in the original file, and >> text-${i}-output.txt redirects all matched lines into a newly created output file carrying the number of the original file. Example: for the original file text-5.txt, a text-5-output.txt file is created containing the matched lines (if any). See also the loop written out below.
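For readability, here is the same one-liner written out as a multi-line loop; with a truncating > redirection, the explicit rm becomes unnecessary:
for i in $(seq 1 20); do
    grep -E "Int_dogs|Int_cats" "text-${i}.txt" > "text-${i}-output.txt"
done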

Redirecting output of list of commands

I am using grep to pull out lines that match 0. in multiple files. For files that do not contain 0., I want it to output "None" to a file. If it finds matches I want it to output the matches to a file. I have two example files that look like this:
$ cat sea.txt
shrimp 0.352
clam 0.632
$ cat land.txt
dog 0
cat 0
In the example files I would get both lines output from the sea.txt, but from the land.txt file I would just get "None" using the following code:
$ grep "0." sea.txt || echo "None"
The double pipe (||) can be read as "do foo or else do bar", or "if not foo then bar". It works perfectly, but the problem I am having is that I cannot get it to output either the matches (as found in sea.txt) or "None" (as with land.txt) to a file. It always prints to the terminal. I have tried the following, as well as others, without any luck getting the output saved to a file:
grep "0." sea.txt || echo "None" > output.txt
grep "0." sea.txt || echo "None" 2> output.txt
Is there a way to get it to save to a file? If not, is there any way I can use the lines printed to the terminal?
You can group commands with { }:
$ { grep '0\.' sea.txt || echo "None"; } > output.txt
$ cat output.txt
shrimp 0.352
clam 0.632
Notice the ;, which is mandatory before the closing brace.
Also, I've changed your regex to 0\. because you want to match a literal ., and not any character (which the unescaped . does). Finally, I've replaced the quotes with single quotes – this has no effect here, but prevents surprises with special characters in longer regexes.
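For completeness, here is what a run against land.txt from the question would look like, where grep finds no match:
$ { grep '0\.' land.txt || echo "None"; } > output.txt
$ cat output.txt
None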
How about this?
echo "$(grep "0." sea.txt || echo "None")" > output.txt
(The command substitution is quoted here; left unquoted as in echo $(...), the shell would collapse all matching lines into a single space-separated line.)

sed to grep lines after specific line for further processing

I am working with a script which looks for the lines after a specific line in a file and processes them to extract data.
Let me illustrate with an example. If the file "sample.log" has lines like
qwerty asdf foo bar
foo
time: 1:00 PM
foo1 bar1
foo foo fooo copying file abc/def/ghi/foo.txt
bar bar1 bar2 copying file efg/qwe/bar.txt
foo
My script should search for contents after time: 1:00 PM. After finding those lines, it must look for lines matching the pattern "copying" and get the path specified in the line.
In this case, output written to another file should be
abc/def/ghi/foo.txt
efg/qwe/bar.txt
I tried the following command, but I am getting empty output. Please guide me:
sed -n '/^time: 1:00 PM/{/^(.*)copying file/s/^(.*)copying file //p}' ../../sample.log
If you're already in Tcl, you could code it in Tcl:
set fid [open "FILE" r]
set have_time false
while {[gets $fid line] != -1} {
    if {$have_time && [regexp {copying file (.*)} $line -> filename]} {
        puts $filename
    } elseif {[string first "time:" $line] > -1} {
        set have_time true
    }
}
close $fid
If your file is quite huge, exec sed may be faster, but you'll have to see for yourself.
Note, if you're going to exec sed, keep in mind that inside Tcl, single quotes have no special meaning: use braces to quote the sed program.
exec sed -e {do stuff here} FILE
sed '/1:00 PM/,$ {/copying/s:.*file \(.*\):\1:p};d' FILE
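Run against the sample.log above, this prints only the paths; here is the same substitution as a sketch using -n/p instead of the trailing d:
$ sed -n '/1:00 PM/,$ {/copying/s:.*file \(.*\):\1:p}' sample.log
abc/def/ghi/foo.txt
efg/qwe/bar.txt
The address range /1:00 PM/,$ restricts the commands to the lines from the timestamp to the end of the file.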
This might work for you (GNU sed):
sed -ne '/1:00 PM/,$!b' -e 's/.*copying.* //w copy' file
Here /1:00 PM/,$!b skips every line outside the range from the timestamp to the end of file, and the w flag of the s command writes each successfully substituted line to a file named copy.
