Save output file each run in a loop - bash

I just want to change one number inside a file, then run it and save the output file. This needs to be done iteratively.
cat file1
decay=( 1 5 2 6 )
for i in $decay
do
run file2.i
done
cat file2.i
cell 215 cell 115 cell ${i}
So for each value in decay, I need file2.i to use that value. Also, the output file always has the same name, file2.i_outp. How can I save the output file from each run instead of overwriting it on every iteration?

Copy the template file2.i to a temporary file, replacing the literal ${i} placeholder with the current array element:
decay=(1 5 2 6)
for i in "${decay[@]}"          # loop over every element, not just the first
do
    sed "s/\${i}/$i/" file2.i > /tmp/file2.i.$$   # substitute the literal ${i} placeholder
    run /tmp/file2.i.$$
    rm /tmp/file2.i.$$
done > file2.i_outp             # collect the stdout of every run in one file
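If run writes its results to a fixed file called file2.i_outp (which is how the question reads) rather than to stdout, redirecting the loop won't preserve anything; in that case you could rename the output after each run instead. A minimal sketch, assuming run always leaves its output in file2.i_outp in the current directory:
decay=(1 5 2 6)
for i in "${decay[@]}"
do
    sed "s/\${i}/$i/" file2.i > /tmp/file2.i.$$
    run /tmp/file2.i.$$                  # assumed to produce file2.i_outp
    mv file2.i_outp "file2.i_outp.$i"    # keep one output file per decay value
    rm /tmp/file2.i.$$
done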

This one worked fine:
for i in 1 5 2 6
do
    sed "s/\${i}/$i/" file2.i > /tmp/file2.i.$$
    run /tmp/file2.i.$$
    rm /tmp/file2.i.$$
done > file2.i_outp

Related

Splitting multiple input files into multiple outputs using split function in linux

I have 8 files I would like to split into 5 chunks per file. I would normally do this individually but would like to run this as a loop. I work on an HPC system.
I have created a list of the file names and labelled it "variantlist.txt". My code is:
for f in `cat variantlist.txt`; do split ${f} -n 5 -d; done
However, it only splits the final file in the variantlist.txt file outputting 5 chunks from the final entry only.
Even if I list the files individually:
for f in chr001.vcf chr002 ...chr008.vcf ; do split ${f} -n 5 -d; done
It still only splits the final file into 5 chunks.
Not sure where I am going wrong here. The desired output would be 40 chunks, 5 per chromosome. Your help would be greatly appreciated.
Many thanks
The split is creating the same set of files each time and overwriting the previous ones. Here's one way to handle that -
for f in $(<variantlist.txt)   # don't use cat
do mkdir -p "$f.split"         # make a subdir for the files
   ( cd "$f.split" &&          # change into the subdir only in a subshell
     split "../$f" -n 5 -d     # split from there
   )                           # close the subshell, parent still in base dir
done
Or you could just do this -
while read -r f            # grab each filename
do split "$f" -n 5 -d      # split it
   for x in x??            # for each split file
   do mv "$x" "$f.$x"      # rename it to include the parent file name
   done
done < variantlist.txt     # take names from this file
This is a lot slower, but doesn't use subdirs.
My favorite, though -
xargs -I {} split {} -n 5 -d {} < variantlist.txt
The last arg becomes the PREFIX for split instead of the default of x.
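For example, assuming GNU split (two-digit numeric suffixes with -d) and a file named chr001.vcf in the list, the pieces end up named after their parent file:
$ xargs -I {} split {} -n 5 -d {} < variantlist.txt
$ ls chr001.vcf*
chr001.vcf  chr001.vcf00  chr001.vcf01  chr001.vcf02  chr001.vcf03  chr001.vcf04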
EDIT -- with 2 billion lines per file, use this one:
for f in $(<variantlist.txt)
do split "$f" -d -n 5 "$f" & # run all in background at the same time
done
When using split, the -n switch determines the number of output files that the original is split into.
If what you actually want is a fixed number of lines per chunk, use -l instead:
split -l 5 ${f}
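One caveat, assuming GNU split: -n 5 divides the file into 5 equal byte-sized chunks and may cut a line in half, while -n l/5 makes 5 chunks without breaking lines, which is usually what you want for line-oriented data like VCF. A quick illustration with a made-up file:
$ seq 20 > nums              # sample input: 20 lines
$ split -n l/5 nums part_    # 5 chunks, lines kept whole: part_aa .. part_ae
$ split -l 5 nums five_      # exactly 5 lines per chunk: five_aa .. five_ad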

How to delete multiple first columns from multiple files?

I have multiple files like this:
trans_ENSG00000047849.txt.traw
trans_ENSG00000047848.txt.traw
trans_ENSG00000047847.txt.traw
...
and each has around 300 columns. Columns are separated with tabs. I would like to remove the first 7 columns from each of those files.
I know how to do it for each file:
cut -f 7- trans_ENSG00000047849.txt.traw > trans_ENSG00000047849.txt.trawN
Is there is a way to do it at once for all files?
NOTE: there is a tab at the beginning of each line. That is why I used cut -f 7- here rather than cut -f 8- to remove the first 7 columns.
Just use a for loop:
for file in *.txt.traw
do
cut -f 7- "$file" > "$file"N
done
Backup your files first, and try this (GNU sed):
sed -ri 's/^([^\t]*\t){7}//' trans_*.txt.traw
The -i option makes sed change your files in place. (You can remove the i while testing.)
Eg:
$ cat file
1 2 3 4 5 6 7 8 9 0
a b c d e f g h i j
dfad da
$ sed -ri 's/^([^\t]*\t){7}//' file
$ cat file
8 9 0
h i j
dfad da
However, the command is kept simple, so it won't remove anything from lines that have fewer than 7 columns. (I am guessing you won't have lines like that, right?)
If you do want to strip lines with fewer than 7 columns as well:
sed -r 's/^([^\t]*(\t|$)){,7}//'

how can I remove lines from file using command line?

How can I delete part of a file from the command line?
I have tried using sed the following way:
c:\sed '1,2!d' res.txt > res.txt
but the file became empty
What I expect is for
1 a
2 b
3 c
4 d
to become
1 a
2 b
in the same file res.txt
Add the -i (or --in-place) switch to sed to read and write the same file. Also, the Windows command line uses double quotes, so you should use:
sed -i "1,2!d" res.txt
Just try calling c:\sed '1,2!d' res.txt on its own. You'll see the correct result:
1 a
2 b
So you can't use the same file for both input and output. You can write to a different file and then move/copy it afterwards: c:\sed '1,2!d' res.txt > res.tmp & move /y res.tmp res.txt

Shell script copying lines from multiple files

I have multiple files which have the same structure but not the same data. Say their names are values_#####.txt (values_00001.txt, values_00002.txt, etc.).
I want to extract a specific line from each file and copy it in another file. For example, I want to extract the 8th line from values_00001.txt, the 16th line from values_00002.txt, the 24th line from values_00003.txt and so on (increment = 8 each time), and copy them line by line in a new file (say values.dat).
I am new to shell scripting. I tried to use sed, but I couldn't figure out how to do it.
Thank you in advance for your answers!
I believe the ordering of the files is also important, to make sure you get the output in the desired sequence.
Consider this script:
n=8
while read -r f; do
    sed $n'q;d' "$f" >> output.txt   # print only line $n of this file
    ((n+=8))                         # next file: 8 lines further down
done < <(printf "%s\n" values_*.txt | sort -t_ -nk2,2)   # feed the file names in numeric order
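The sed $n'q;d' part is the usual "print only line N" idiom: d deletes every line, but when line $n is reached, q prints it (via sed's automatic print) and quits before the d runs. For example, on made-up input:
$ printf '%s\n' a b c d e | sed '3q;d'
c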
This can make it:
for var in {1..NUMBER}
do
awk -v line=$var 'NR==8*line' values_${var}.txt >> values.dat
done
Explanation
The for loop is basic.
-v line=$var "gives" the $var value to awk, so it can be used with the variable line.
'NR==8*line' prints the line number 8*{value we are checking}.
values_${var}.txt gets the file values_1.txt, values_2.txt, and so on.
>> values.dat redirects to values.dat file.
Test
I created 3 identical files a1, a2, a3. Each contains 30 lines, where each line is just its line number:
$ cat a1
1
2
3
4
...
Executing the one liner:
$ for var in {1..3}; do awk -v line=$var 'NR==8*line' a${var} >> values.dat; done
$ cat values.dat
8
16
24

Send diff output to 3 files

What I want to do is, diff 2 files and write the diff output to 3 different files.
I can tell diff to format its output like:
diff a.txt b.txt --new-line-format=... --old-line-format=... --unchanged-line-format=...
And using this:
diff a.txt b.txt --new-bla-bla="echo %l>new.txt" --old--="echo %l>old" ...
I can output to 3 different files, except the double quotes don't appear.
I want to do this with as little overhead as possible, so running 3 separate diffs, etc. is not an option.
Here's a solution that is maybe a little longer, but more robust as it avoids the need for eval:
diff a.txt b.txt --new-line-format "3 %L" \
                 --old-line-format "4 %L" \
                 --unchanged-line-format "5 %L" |
while read -r fd line; do
    echo "$line" >&$fd
done 3> new.txt 4> old.txt 5> unchanged.txt
This works by prefixing each of the new, old, and unchanged lines (respectively) with the file descriptor of the file we will add them to. We then parse the output using read, and echo the line to the correct file descriptor, each of which is redirected to the correct output file.
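A quick sanity check with made-up input (the file names and contents are only for illustration). After running the pipeline above on these two files, the three outputs should hold:
$ printf '%s\n' one two three > a.txt
$ printf '%s\n' one TWO three four > b.txt
$ cat old.txt          # lines only in a.txt
two
$ cat new.txt          # lines only in b.txt
TWO
four
$ cat unchanged.txt    # lines common to both
one
three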
I wrote the following before I read #chepner's excellent answer:
diff diff_old diff_new --new-line-format='>%L' --old-line-format='<%L' --unchanged-line-format='=%L' |
awk '
function printto(file) {print substr($0,2) > file}
/^>/ {printto("new.txt")}
/^</ {printto("old.txt")}
/^=/ {printto("unchanged.txt")}
'
This works similarly to his answer but requires another process instead of working in the current[*] shell.
[*] discounting the subshell created for the while commands in a pipeline.
