linux for loop two variables each time - bash

I have several files in a directory and I want to run some Linux programs on them two at a time, e.g. ERR1045141_1 with ERR1045141_2, ERR1045144_1 with ERR1045144_2, and so on. I wrote a for loop for this, but it is not working.
files:
ERR1045141_1.fastq.gz
ERR1045141_2.fastq.gz
ERR1045144_1.fastq.gz
ERR1045144_2.fastq.gz
ERR1045145_1.fastq.gz
ERR1045145_2.fastq.gz
ERR1045146_1.fastq.gz
ERR1045146_2.fastq.gz
ERR1045148_1.fastq.gz
ERR1045148_2.fastq.gz
ERR1045149_1.fastq.gz
ERR1045149_2.fastq.gz
ERR1045151_1.fastq.gz
ERR1045151_2.fastq.gz
ERR1045152_1.fastq.gz
ERR1045152_2.fastq.gz
ERR1045154_1.fastq.gz
ERR1045154_2.fastq.gz
codes:
files=ls
for (( i=0; i<${#files[#]} ; i+=2 )) ; do
echo "${files[i]}" "${files[i+1]}"
done
It did not work, and I am not sure whether files=ls is the problem. Is there a better way to do it? Please advise.

Try the following if you are sure about the existence of the second file:
for file1 in ERR*_1*
do
file2=$(printf '%s\n' "$file1" | sed 's/_1/_2/g')
echo "$file1" "$file2"
done
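For what it's worth, the index-based loop from the question can also be made to work. files=ls only stores the literal string ls, and the array length is written ${#files[@]}, not ${#files[#]}. Below is a minimal sketch, assuming the names arrive in sorted order so that each _1 file is immediately followed by its _2 mate (print_pairs is just an illustrative name):

```bash
#!/usr/bin/env bash
# Sketch: walk a list of names two entries at a time.
# print_pairs is an illustrative name, not something from the question.
print_pairs() {
  local files=("$@")   # one array element per argument
  local i
  # Stop when fewer than two names remain, so an odd trailing name is skipped.
  for (( i = 0; i + 1 < ${#files[@]}; i += 2 )); do
    printf '%s %s\n' "${files[i]}" "${files[i+1]}"
  done
}

# Typical call (the glob expands in sorted order):
# print_pairs ERR*.fastq.gz
```

Because this relies purely on ordering, a single missing mate would shift every later pair, which is why the suffix-substitution loop above is usually the safer choice.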

No, what you really want to do is process all the _1 files, performing some action on each one and its associated _2 file.
You can do that with something as simple as the for loop in this complete test program:
#!/usr/bin/env bash
doSomethingWith() {
echo "[$1] [$2]"
}
touch 'xERR1045141_1.fastq.gz' 'xERR1045141_2.fastq.gz'
touch 'xERR1045144_1.fastq.gz' 'xERR1045144_2.fastq.gz'
touch 'xERR1045145_1.fastq.gz' 'xERR1045145_2.fastq.gz'
touch 'xERR1045146_1.fastq.gz' 'xERR1045146_2.fastq.gz'
touch 'xERR1045148_1.fastq.gz' 'xERR1045148_2.fastq.gz'
touch 'xERR1045149_1.fastq.gz' 'xERR1045149_2.fastq.gz'
touch 'xERR1045151_1.fastq.gz' 'xERR1045151_2.fastq.gz'
touch 'xERR1045152_1.fastq.gz' 'xERR1045152_2.fastq.gz'
touch 'xERR1045154_1.fastq.gz' 'xERR1045154_2.fastq.gz'
touch 'xERR 45154_1.fastq.gz' 'xERR 45154_2.fastq.gz'
for file1 in xERR*_1.fastq.gz ; do
file2="${file1/_1/_2}"
doSomethingWith "${file1}" "${file2}"
done
rm -rf xERR*.fastq.gz
This program outputs:
[xERR1045141_1.fastq.gz] [xERR1045141_2.fastq.gz]
[xERR1045144_1.fastq.gz] [xERR1045144_2.fastq.gz]
[xERR1045145_1.fastq.gz] [xERR1045145_2.fastq.gz]
[xERR1045146_1.fastq.gz] [xERR1045146_2.fastq.gz]
[xERR1045148_1.fastq.gz] [xERR1045148_2.fastq.gz]
[xERR1045149_1.fastq.gz] [xERR1045149_2.fastq.gz]
[xERR1045151_1.fastq.gz] [xERR1045151_2.fastq.gz]
[xERR1045152_1.fastq.gz] [xERR1045152_2.fastq.gz]
[xERR1045154_1.fastq.gz] [xERR1045154_2.fastq.gz]
[xERR 45154_1.fastq.gz] [xERR 45154_2.fastq.gz]
to show that the names are being handled correctly.
Note that I've named the files xERR* so as not to clash with your own files. You should adjust the loop to handle your own files once you're satisfied it will work okay.
And, just as an aside, if you don't want to do anything except for those cases where both files exist, you can simply replace the "action" line with something like:
[[ -f "${file2}" ]] && doSomethingWith "${file1}" "${file2}"
This will skip those cases where the _2 file is not a regular file.
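If you would rather be told about missing mates than skip them silently, something along these lines might do. It is only a sketch: list_complete_pairs is a made-up name, and it assumes the names end in _1.fastq.gz and _2.fastq.gz as in the question. Stripping the suffix rather than substituting the first _1 means an earlier _1 in the name cannot be replaced by accident.

```bash
#!/usr/bin/env bash
# Sketch: print "file1 file2" for every complete pair in a directory
# and warn on stderr about any _1 file whose _2 mate is missing.
# list_complete_pairs is an illustrative name.
list_complete_pairs() {
  local dir=$1 file1 file2
  for file1 in "$dir"/*_1.fastq.gz; do
    [ -f "$file1" ] || continue              # glob matched nothing
    file2=${file1%_1.fastq.gz}_2.fastq.gz    # swap only the trailing suffix
    if [ -f "$file2" ]; then
      printf '%s %s\n' "$file1" "$file2"
    else
      printf 'missing mate for %s\n' "$file1" >&2
    fi
  done
}
```

Calling list_complete_pairs . in the data directory prints one pair per line, and the printf in the first branch is where a real command would go.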

Related

How to write every Nth file to new folder

I have this code which scans folders and moves all files in each folder to a new one.
How do I make it so only every Nth file is moved?
#!/bin/bash
# Save this file in the directory containing the folders (bb in this case)
# Then to run it, type:
# ./rencp.sh
# The first output frame number
let "frame=1"
# this is where files will go. A new directory will be created if it doesn't exist
outFolder="collected"
# print info every so many files.
feedbackFreq=250
# prefix for new files
namePrefix="ben_timelapse"
#new extension (uppercase is so ugly)
ext="jpg"
# this will make sure we only get files from camera directories
srcPattern="ND850"
mkdir -p $outFolder
for f in *${srcPattern}/*
do
mv $f `printf "$outFolder/$namePrefix.%05d.$ext" $frame`
if ! ((frame % $feedbackFreq)); then
echo "moved and renamed $frame files to $outFolder"
fi
let "frame++"
done
I'm pretty sure I need to edit the line for f in *${srcPattern}/* but I'm not sure of the correct syntax.
If files in the ND850 folders are sequential when listed (i.e. padded frame numbers), and the folders themselves are in order, then the following code should work.
#!/bin/bash
# Maintain a counter, and the output frame number
let "frame=1"
let "outframe=1"
outFolder="collected"
# frequency
gap=5
namePrefix="ben_timelapse"
#new extension (uppercase is so ugly)
ext="jpg"
srcPattern="ND850"
echo "Copying and renaming 1 in every $gap files"
mkdir -p "$outFolder"
for f in *${srcPattern}/*
do
if ! ((frame % $gap)); then
outfile=$(printf "$outFolder/$namePrefix.%05d.$ext" "$outframe")
cp "$f" "$outfile"
echo "copied $f to $outfile"
let "outframe++"
fi
let "frame++"
done
Try this instead of your mv command after do:
if ! ((frame % 5)); then
a=$((frame / 5))
mv "$f" "$(printf "$outFolder/$namePrefix.%05d.$ext" "$a")"
fi
This moves frames 5, 10, and so on to $outFolder/$namePrefix.00001.$ext, $outFolder/$namePrefix.00002.$ext, and so on.
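The selection logic used by both answers (keep an item only when its counter is a multiple of N) can be isolated and tried on its own before any files are moved. A small sketch, where every_nth is an illustrative helper name:

```bash
#!/usr/bin/env bash
# Sketch: print every Nth argument (the Nth, 2Nth, 3Nth, ...).
# every_nth is an illustrative name; the first argument is N.
every_nth() {
  local n=$1 count=0 f
  shift
  for f in "$@"; do
    count=$((count + 1))
    # Same test as "if ! ((frame % gap))" in the answers above.
    if [ $((count % n)) -eq 0 ]; then
      printf '%s\n' "$f"
    fi
  done
}
```

For example, every_nth 5 *ND850/* lists exactly the files the loops above would copy or move, which makes for a cheap dry run.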

Shell script: Copy file and folder N times

I have two items:
a <transaction>.json file
a <transaction> folder with random content
where <transaction> is id plus a sequential number (id1, id2, ... idn).
I'd like to replicate this structure (.json file plus folder) n times. That is,
I'd like to have id1.json and an id1 folder, id2.json and an id2 folder, ... idn.json and an idn folder.
Is there any way (e.g. a shell script) to populate this content?
It would be something like:
for (i=0,i<n,i++) {
copy "id" file to "id+i" file
copy "id" folder to "id+i" folder
}
Any ideas?
Your shell syntax is off but after that, this should be trivial.
#!/bin/bash
for((i=0;i<$1;i++)); do
cp "id".json "id$i".json
cp -r "id" "id$i"
done
This expects the value of n as the sole argument to the script (which is visible inside the script in $1).
The C-style for((...)) loop is Bash only, and will not work with sh.
A proper production script would also check that it received the expected parameter in the expected format (a single positive number) but you will probably want to tackle such complications when you learn more.
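As a rough idea of what such a check could look like, here is a sketch. populate is a made-up function name, and it counts from 1 rather than 0 so that the copies are named id1 to idn as the question asks. It assumes id.json and the id folder sit in the current directory.

```bash
#!/usr/bin/env bash
# Sketch: validate the count, then make N copies of id.json and id/.
# populate is an illustrative name.
populate() {
  local n=$1 i=1
  case $n in
    # Reject empty input, anything containing a non-digit, and a leading 0.
    ''|*[!0-9]*|0*)
      echo "usage: populate COUNT (a positive integer)" >&2
      return 1
      ;;
  esac
  while [ "$i" -le "$n" ]; do
    cp id.json "id$i.json"
    cp -r id "id$i"
    i=$((i + 1))
  done
}
```

Inside a script this would be invoked as populate "$1", followed by an exit on failure.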
Additionally, here is a version that works with sh:
#!/bin/sh
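test -e id.json || { echo "id.json not found" >&2 ; exit 1 ; }
# Validate the count up front: an exit on the left side of a pipe only ends that subshell.
case "$1" in
''|*[!0-9]*) echo "usage: $0 transaction-count" >&2 ; exit 1 ;;
esac
seq 1 "$1" |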
while read i
do
cp "id".json "id$i".json
cp -r "id" "id$i"
done

How to parallel process a function, with loops

I have this function, and I want everything it contains to run at the same time. So far it isn't working, even though according to other sources this is how you do it. The function itself works when it is not run in parallel.
#!/bin/bash
foo () {
cd ${HOME}/sh/path/to/script/execute
for f in *.sh; do #goes to "execute" directory and executes all
#scripts the current directory "execute" basically run-parts without cron
cd ~/sh/path/to/script
while IFS= read -r l1 #Line 1 in master.txt
IFS= read -r l2 #Line 2 in master.txt
IFS= read -r l3 #Line 3 in master.txt
do
cd /dev/shm/arb
echo ${l1} > arg.txt & echo ${l2} > arg2.txt & echo ${l3} > arg3.txt
cd ${HOME}/sh/path/to/script/execute
bash -H ${f} #executes all scripts inside "execute" folder
cd ~/sh/path/to/script/here
./here.sh &
cd ~/sh/path/to/script &
done <master.txt
done
}
export -f foo
parallel ::: foo
Results in
#No result at all....., just buffers. htop doesn't acknowledge any
#processes, and when this runs its pretty taxing on the cores.
master.txt content
In case this is relevant:
apple_fruit
apple_veggie
veggie_fruit
#apple changes
pear_fruit
pear_veggie
veggie_fruit
#pear changes
cucumber_fruit
...
I'm very new to using parallel and don't know how it works in advanced (or even basic) situations, so would the loops interfere? And if they do interfere, is there a workaround?
The result is probably going to be something like:
inner() {
script="$1"
parallel -N3 "'$script' {}; here.sh {}" :::: master.txt
}
export -f inner
parallel inner ::: ${HOME}/sh/path/to/script/execute/*.sh
This will call each of the scripts in ${HOME}/sh/path/to/script/execute/ (and here.sh) with 3 arguments from master.txt like this:
${HOME}/sh/path/to/script/execute/script1.sh apple_fruit apple_veggie veggie_fruit
You need to change the scripts so that:
They get the arguments from the command line (not from arg.txt, arg2.txt, arg3.txt).
They send their output to stdout.
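To check that a modified script copes with three command-line arguments before involving parallel, the same grouping can be reproduced sequentially. This is only a sketch (with_triplets is an illustrative name, and it runs one command at a time, so it is a debugging aid rather than a replacement for parallel -N3):

```bash
#!/usr/bin/env bash
# Sketch: read stdin three lines at a time and pass each group of three
# as arguments to the given command. An incomplete final group is dropped.
# with_triplets is an illustrative name.
with_triplets() {
  local l1 l2 l3
  while IFS= read -r l1 && IFS= read -r l2 && IFS= read -r l3; do
    # </dev/null stops the command from swallowing the remaining lines.
    "$@" "$l1" "$l2" "$l3" </dev/null
  done
}
```

For instance, with_triplets echo < master.txt prints each group on one line, and replacing echo with the path to one of the scripts runs that script once per group.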

A bash script to split a data file into many sub-files as per an index file using dd

I have a large data file that contains many files joined together.
It has a separate index file that holds the file name and the start and end byte of each file within the data file.
I need help creating a bash script to split the large file into its thousands of sub-files.
Data File : fileafilebfilec etc
Index File:
filename.png<0>3049
folder\filename2.png<3049>6136.
I guess this needs to loop through each line of the index file, then use dd to extract the relevant bytes into a file. A fiddly part might be that the folder separator is Windows style (backslash) rather than Linux style.
Any help much appreciated.
while read p; do
q=${p#*<}
startbyte=${q%>*}
endbyte=${q#*>}
filename=${p%<*}
count=$(($endbyte - $startbyte))
toprint="processing $filename startbyte: $startbyte endbyte: $endbyte count: $c$
echo $toprint
done <indexfile
Worked it out :-) FYI:
while read p; do
#sort out variables
q=${p#*<}
startbyte=${q%>*}
endbyte=${q#*>}
filename=${p%<*}
count=$(($endbyte - $startbyte))
#let it know we're working
toprint="processing $filename startbyte: $startbyte endbyte: $endbyte count: $c$
echo $toprint
if [[ $filename == *"/"* ]]; then
echo "have found /"
directory=${filename%/*}
#if no directory exists, create it
if [ ! -d ~/etg/"$directory" ]; then
# Control will enter here if ~/etg/$directory doesn't exist.
echo "directory not found - creating one"
mkdir -p ~/etg/"$directory"
fi
fi
dd skip="$startbyte" count="$count" if=~/etg/largefile of=~/etg/"$filename" bs=1
done <indexfile
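One thing to watch for: plain read treats a backslash as an escape character and drops it, so Windows-style paths in the index lose their separators before the script ever sees them. A possible variant that reads with -r and converts the backslashes is sketched below. The function name split_by_index and its three arguments are illustrative, and it assumes each index line has exactly the form name<start>end with the end offset being exclusive, as in the sample.

```bash
#!/usr/bin/env bash
# Sketch: split a data file using index lines of the form name<start>end.
# split_by_index and its arguments (index file, data file, output
# directory) are illustrative names.
split_by_index() {
  local index=$1 data=$2 outdir=$3
  local line name rest start end
  mkdir -p "$outdir"
  # -r keeps the backslashes in Windows-style paths intact.
  while IFS= read -r line; do
    name=${line%%<*}                            # text before the first "<"
    rest=${line#*<}                             # "start>end"
    start=${rest%%>*}
    end=${rest#*>}
    name=$(printf '%s' "$name" | tr '\\' '/')   # "\" separators become "/"
    mkdir -p "$outdir/$(dirname "$name")"
    # bs=1 makes skip and count byte offsets (simple, but slow on big files).
    dd if="$data" of="$outdir/$name" bs=1 skip="$start" \
       count="$((end - start))" 2>/dev/null
  done < "$index"
}
```

With the paths from the answer above, the call would be split_by_index indexfile ~/etg/largefile ~/etg.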

How to parse the files by name?

I have files like this in my folder
262_V01_C07_R099_THx_BH_4096H.dat~ birrp.5.pdf diagnostic.f junho.1n1.rp junho.1r2.rp junho.2r.2c2 Makefile~ nilton.1n2.rp nilton.2n.2c2 nilton.diag weight.f
AdvProExampleScript_pb01.script birrp.f ewerton.diag junho.1n.2c2 junho.2n1.rf junho.2r2.rf math.f nilton.1r1.rf nilton.2n2.rf nilton.j wrthx
BasicModeExampleScript_pb01.script birrp.tar ewerton.j junho.1n2.rf junho.2n1.rp junho.2r2.rp mimi.diag nilton.1r1.rp nilton.2n2.rp parameters.h wrthx.f90
BasicModeExampleScript_pb01.script~ calibration2401.txt fft.f junho.1n2.rp junho.2n.2c2 junho.diag mimi.j nilton.1r.2c2 nilton.2r1.rf parameters.h~ wrthx.f90~
bbcalfunc.py Calibration Files filter.f junho.1r1.rf junho.2n2.rf junho.j nilton.1n1.rf nilton.1r2.rf nilton.2r1.rp rarfilt.f zlinpack.f
bbcalfunc.py~ coherence.f hx.sens junho.1r1.rp junho.2n2.rp karn.diag nilton.1n1.rp nilton.1r2.rp nilton.2r.2c2 response.f
bin dat inputxgarcia.txt junho.1r.2c2 junho.2r1.rf karn.j nilton.1n.2c2 nilton.2n1.rf nilton.2r2.rf rtpss.f
birrp dataft.f junho.1n1.rf junho.1r2.rf junho.2r1.rp Makefile nilton.1n2.rf nilton.2n1.rp nilton.2r2.rp utils.f
I would like to separate them, so how should I write a script that prints all the nilton files on screen? I have tried with awk but it is not working.
Here is a portable POSIX shell solution that uses no outside utilities:
#!/bin/sh
for i in *
do case "$i" in
nilton*)
printf "%s\n" "$i"
;;
esac
done
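Since the shell can do the matching itself, the same thing can also be written with the glob in the for loop. A sketch, where list_by_prefix is an illustrative name and the directory argument is optional:

```bash
#!/usr/bin/env bash
# Sketch: print the names in a directory that start with a given prefix.
# list_by_prefix is an illustrative name; the directory defaults to ".".
list_by_prefix() {
  local prefix=$1 dir=${2:-.} f
  for f in "$dir"/"$prefix"*; do
    [ -e "$f" ] || continue     # the glob matched nothing
    printf '%s\n' "${f##*/}"    # strip the leading directory part
  done
}
```

Running list_by_prefix nilton in that folder prints one nilton file per line, and list_by_prefix junho does the same for the junho files.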
