I have multiple .vcf.gz files that look like this (there are 22 of them):
ALL.chr1.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz
ALL.chr2.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz
ALL.chr3.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz
...
ALL.chr22.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz
And I have a script, filter.sh, that runs on one file and looks like this. How would I loop through all 22 files?
filter_and_convert ()
{
echo -ne "varID\t"
bcftools view $1 -S $2 --force-samples -Ou | bcftools query -l | tr '\n' '\t' | sed 's/\t$/\n/'
#The first python inline script will check if a variant is blacklisted
NOW=$(date +%Y-%m-%d/%H:%M:%S)
echo "Starting at $NOW"
bcftools view -S $2 --force-samples $1 -Ou | \
bcftools query -f '%ID[\t%GT]\n' | \
awk '
{
for (i = 1; i <= NF; i++) {
if (substr($i,0,1) == "c") {
printf("%s",$i)
} else if ( substr($i, 0, 1) == ".") {
printf("\tNA")
} else if ($i ~ "[0-9]|[0-9]") {
n = split($i, array, "|")
printf("\t%d",array[1]+array[2])
} else {
#printf("\t%s",$i)
printf("Unexpected: %s",$i)
exit 1
}
}
printf("\n")
}
'
NOW=$(date +%Y-%m-%d/%H:%M:%S)
echo "Ending at $NOW"
}
filter_and_convert ALL.chr22.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz samples.txt
Replace
filter_and_convert ALL.chr22.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz samples.txt
with a for loop that calls the function on all the files that match a wildcard.
for file in ALL.*.vcf.gz; do
filter_and_convert "$file" samples.txt
done
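If each chromosome should end up in its own output file, the loop can derive the output name from the input name. The following is a minimal sketch, assuming one .tsv file per chromosome is wanted; out_name and the stand-in body of filter_and_convert are hypothetical and only there so the snippet runs on its own.

```shell
#!/bin/sh
# Hypothetical stand-in: replace with the real filter_and_convert function.
filter_and_convert() { printf 'processed %s with %s\n' "$1" "$2"; }

# out_name FILE: ALL.chr7.<anything>.vcf.gz -> chr7.tsv (naming is an assumption)
out_name() (
    base=${1##*/}          # strip any leading directory
    base=${base#ALL.}      # strip the fixed "ALL." prefix
    printf '%s.tsv\n' "${base%%.*}"   # keep everything before the first dot
)

for file in ALL.chr*.vcf.gz; do
    [ -e "$file" ] || continue   # the glob matched nothing, skip the literal pattern
    filter_and_convert "$file" samples.txt > "$(out_name "$file")"
done
```

Note that the glob expands in lexical order (chr1, chr10, chr11, ..., chr2), which does not matter when every file gets its own output.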
v="ALL.chr"
p=".phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz"
for i in {1..22};
do
file=$v$i$p
bash filter.sh "$file" samples.txt
done
Use this file variable with your script. I am assuming the first argument to your filter.sh is the file name; add the remaining arguments as needed.
Related
I have written a script that will check the contents of $path and print the i+1 value in the file mentioned in the $path.
#! /bin/bash
echo "Enter number of records"
read num
count=1
while [ $count -le $num ]
do
echo "Enter path"
read path
var2=`echo "${path##*/}"`
var3=`awk '{for(i=1;i<NF;i++) if ($i == "'${var2}'") print $(i+1)}' ${path} | head -1`
echo "done,$var3" >> result.csv
((count++))
done
If the value of $path is /c/training/sample.sh or /c/training/textfile.txt, with the content below:
sample.sh
#!/bin/bash
#sample.sh 120
<psuedo-code>
textfile.txt
textfile.txt 0
This is random text
result.csv (how the output CSV file should look)
done,120
done,0
So instead of reading the path each time, how can I read all the paths if they are stored in a separate CSV file?
Sampleinput.csv
/c/training/sample.sh,User1
/c/training/textfile.txt,USer2
How can I implement the awk mentioned above so that it reads each value in field 1 of Sampleinput.csv and does the same thing?
A much more versatile and usable arrangement is to pass the paths as parameters to the script.
#!/bin/sh
awk 'FNR == 1 { path=FILENAME; sub(/.*\//, "", path) }
$0 ~ path && / [0-9]+$/ { print path "," 0+$NF; nextfile }' "$@"
The nextfile statement is included in POSIX but might not be supported if you have a really old Awk or are using a non-POSIX system.
Usage:
scriptname /c/training/sample.sh /c/training/textfile >>result.csv
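To take the paths from field 1 of Sampleinput.csv instead of the command line, awk can read the CSV and open each listed file itself with getline. This is a sketch rather than a drop-in replacement: values_from_csv is a made-up name, it assumes the paths contain no commas, and it prints done,<number> to match the result.csv shown in the question.

```shell
#!/bin/sh
# values_from_csv CSV: for every path in field 1 of CSV, find the first line
# that contains the file's base name and ends in " <number>", then print
# "done,<number>". Prints "done," if no such line exists or the file is unreadable.
values_from_csv() {
    awk -F, '
    {
        path = $1
        name = path
        sub(/.*\//, "", name)                 # base name of the path
        val = ""
        while ((getline line < path) > 0) {
            if (index(line, name) && match(line, / [0-9]+$/)) {
                val = substr(line, RSTART + 1) + 0   # the trailing number
                break
            }
        }
        close(path)
        print "done," val
    }' "$1"
}
# usage: values_from_csv Sampleinput.csv > result.csv
```

Using index() rather than a regex match avoids treating the dot in a name like sample.sh as a wildcard.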
I think this is what you're trying to do (untested) using GNU awk for gensub(), ARGIND, nextfile, and ENDFILE:
#!/usr/bin/env bash
IFS= read -p 'Enter number of records: ' -r num
awk -v maxFiles="$num" '
BEGIN { OFS="," }
ARGIND == 1 {
if ( ARGC - 2 < maxFiles ) {   # ARGC starts at 2: ARGV[0] is awk, ARGV[1] is the CSV
ARGV[ARGC++] = gensub(/,.*/,"",1)
}
next
}
FNR == 1 {
fname = gensub(".*/","",1,FILENAME)
}
{
for (i=1; i<=NF; i++) {
if ( gensub(/^#+/,"",1,$i) == fname ) {
val = $(i+1)
nextfile
}
}
}
ENDFILE {
if ( ARGIND > 1 ) {
print "done", val
}
val = 0
}
' Sampleinput.csv > result.csv
I have the following variable:
tags = {
environment = "development",
provider = "ServiceOne",
ansible_role = "nfs-role",
comment = "mysql"
}
In my pipeline I need to convert it to the following:
tfh pushvars -overwrite-all -dry-run false -hcl-var "tags={environment=\"development\", provider=\"ServiceOne\", ansible_role=\"nfs-role\",comment= \"mysql\"}"
I have tried with sed and awk but couldn't get any result.
This is where I stand now:
#!/bin/bash
#[[ -z "$2" ]] && echo "==> Usage: ./transform_tfe_vars.sh <<INPUT_FILE>> <<OUTPUT_FILE>>" && exit 1;
vars_file=${1}
#output_file=${2}
tmp_file=".todelete.tmp"
cmd "$vars_file" | grep -v '^#' | awk '!/^$/' > "$tmp_file"
while read -r p; do
a=$(echo "$p" | awk '{print $1}')
b=$(echo "$p" | awk '{print $3}')
echo "tfh pushvars -overwrite-all -dry-run false -shcl-var \"$a=\\""$b""\""
done <$tmp_file
A shell read loop is rarely the right approach for manipulating text; see "Why is using a shell loop to process text considered bad practice?" on Unix & Linux Stack Exchange. The people who invented shell also invented awk for shell to call to manipulate text.
It looks like this might be what you're trying to do:
#!/usr/bin/env bash
(( $# == 2 )) || { echo "==> Usage: ${0##*/} <<INPUT_FILE>> <<OUTPUT_FILE>>"; exit 1; }
vars_file="$1"
output_file="$2"
awk '
BEGIN {
ORS = ""
print "tfh pushvars -overwrite-all -dry-run false -hcl-var \""
}
NF && !/^#/ {
gsub(/[[:space:]]/,"")
gsub(/"/,"\\\\&")
print
}
END {
print "\"\n"
}
' "$vars_file" > "$output_file"
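To see the transformation in isolation, the whitespace stripping and quote escaping can be tried on standard input before wiring it into the tfh command. A small sketch: flatten_hcl is a hypothetical name, and it swaps the gsub() replacement for a split-and-rejoin on the double quote, which sidesteps the backslash rules of gsub() replacement strings. It also skips indented comment lines, which is an assumption about the input.

```shell
#!/bin/sh
# flatten_hcl: read an HCL-style variable block on stdin and print it on one
# line with spaces/tabs removed and every double quote escaped as \"
flatten_hcl() {
    awk '
        NF && !/^[ \t]*#/ {
            gsub(/[ \t]/, "")                 # drop spaces and tabs
            n = split($0, part, "\"")         # split on every double quote
            out = part[1]
            for (i = 2; i <= n; i++)
                out = out "\\\"" part[i]      # rejoin with \" between the pieces
            printf "%s", out
        }
        END { print "" }
    '
}
# usage: flatten_hcl < "$vars_file"
```

Because all whitespace is removed, values that contain spaces would be altered; that limitation is shared with the script above.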
I have this script, and it produces multiple results.
How can the multiple results be collected in one variable?
Example: the result from the first iteration is English, and the result from the second is Italian.
I need the final results in one variable: English Italian
for i in $(ls -l $1/$2/);do
if [[ $i =~ .*\.idx$ ]];then
tr -d '\r' < $1/$2/$i > $1/$2/newfile
rm -f $1/$2/$i
mv $1/$2/newfile $1/$2/$i
results=$(cat $1/$2/$i |awk '/^# alt:/ { a[$3] } END { for (l in a) { printf("%s%s", c, l); c = " " } printf("\n") }')
echo "Results for $i : $results"
fi
done
Alternatively, just accumulate your results in another variable:
RESULTS=""
for i in $(ls -l $1/$2/);do
if [[ $i =~ .*\.idx$ ]];then
tr -d '\r' < $1/$2/$i > $1/$2/newfile
rm -f $1/$2/$i
mv $1/$2/newfile $1/$2/$i
results=$(cat $1/$2/$i |awk '/^# alt:/ { a[$3] } END { for (l in a) { printf("%s%s", c, l); c = " " } printf("\n") }')
RESULTS="$RESULTS$results "
fi
done
echo ${RESULTS%" "} #Get rid of the trailing space
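Another way to get a single value is to let one awk process see all the .idx files and capture its output once. This is only a sketch under some assumptions: collect_langs is a made-up name, the language is the third field of lines starting with "# alt:" as in the question's awk, and the carriage returns are stripped on the fly instead of rewriting the files in place.

```shell
#!/bin/sh
# collect_langs DIR: print the distinct languages found in DIR/*.idx,
# separated by single spaces, in order of first appearance.
collect_langs() {
    for f in "$1"/*.idx; do
        [ -e "$f" ] || continue      # no .idx files: skip the literal pattern
        tr -d '\r' < "$f"            # strip DOS line endings while streaming
    done |
    awk '/^# alt:/ && !seen[$3]++ { printf "%s%s", sep, $3; sep = " " }
         END { print "" }'
}
# usage: results=$(collect_langs "$1/$2"); echo "$results"
```

Globbing for *.idx also avoids parsing the output of ls -l.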
You can get all results in a single variable if you use the following syntax:
variable=$(command)
You might find it useful to put your code in a function and then call it like this:
function command() {
# your code from above
}
results=$( command "$@" )
I have the following string which I get from a web service:
user@server:~# cat test.txt
account|uname|upass|mac|ip|tariff|download|upload
11122|24a43cda22b2|O3v2L2oPE9|24:A4:3C:DA:22:B2|192.168.1.2|7|12582500|2097000
The first row contains the column names, and the second row their respective values. Note that the response always returns exactly these two rows. I cannot use JSON or another structured response format; this is what I have to work with.
My goal is to split all of these values by their separator character | and set environment variables (or variables in the context of my .sh script) by using column names as variable names.
So far, I am able to split both lines on | separately, using the following commands:
user@server:~# head -n1 test.txt | sed 's/|/\n/g'
account
uname
upass
mac
ip
tariff
download
upload
user@server:~# head -n2 test.txt | tail -n1 | sed 's/|/\n/g'
11122
24a43cda22b2
O3v2L2oPE9
24:A4:3C:DA:22:B2
192.168.1.2
7
12582500
2097000
From here I'd like to define variables, i.e. $account, $uname, etc., with the values:
$account = 11122
$uname = 24a43cda22b2
I'd like them to be available either as system environment variables (until reboot or the next execution of my .sh script) or in the context of the shell script itself.
Take 3: awk as requested
$: awk -F'|' 'BEGIN{ ORS="" }
NR == 1 { for ( i=1; i <= NF; i++ ) { arr[i] = $i; }; }
NR == 2 { for ( i=1; i <= NF; i++ ) { printf "%s=%s\n", arr[i], $i; }; }
' test.txt > /tmp/vars
$: . /tmp/vars
Take 2: Rewrite for sh
Hope this helps. :)
$: cat parse
#! /bin/env sh
sed -n '
s/[|]/\n/g
1 w /tmp/names
2 w /tmp/vals
' "$1"
paste /tmp/names /tmp/vals | sed 's/\t/=/' >/tmp/vars
$: cat /tmp/vars
account=11122
uname=24a43cda22b2
upass=O3v2L2oPE9
mac=24:A4:3C:DA:22:B2
ip=192.168.1.2
tariff=7
download=12582500
upload=2097000
$: . /tmp/vars
$: echo $account
11122
Original for reference
printf -v will write to a specified variable name for you.
$: cat parse
#! /bin/env bash
{ IFS='|' read -a headers
IFS='|' read -a data
} < "$1"
declare -i ndx=0
for h in "${headers[@]}"
do printf -v "$h" "%s" "${data[ndx++]}"
done
echo "account=$account uname=$uname upass=$upass mac=$mac ip=$ip tariff=$tariff download=$download upload=$upload"
Executed as:
$: parse test.txt >vars
Output:
$: cat vars
account=11122 uname=24a43cda22b2 upass=O3v2L2oPE9 mac=24:A4:3C:DA:22:B2 ip=192.168.1.2 tariff=7 download=12582500 upload=2097000
to load those values into scope:
$: parse test.txt >vars
$: . vars
Even in sh, sourcing the output should do what you need.
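Since the data comes from a web service, it may be worth validating the column names and quoting the values before the shell evaluates them. The sketch below is one way to do that in plain sh under stated assumptions: emit_assignments is a hypothetical helper, it accepts only names made of letters, digits and underscores, and it wraps each value in single quotes so that eval cannot run anything embedded in it.

```shell
#!/bin/sh
# emit_assignments FILE: print one name='value' line per column of the
# two-line, |-separated FILE. Prints nothing and fails if a header is not
# a valid shell variable name.
emit_assignments() {
    awk -F'|' '
        NR == 1 {
            n = split($0, name, "|")
            for (i = 1; i <= n; i++)
                if (name[i] !~ /^[A-Za-z_][A-Za-z0-9_]*$/) {
                    print "invalid variable name: " name[i] > "/dev/stderr"
                    exit 1
                }
            next
        }
        NR == 2 {
            for (i = 1; i <= n; i++) {
                # \047 is a single quote; embedded quotes become: quote, backslash, quote, quote
                m = split($i, part, "\047")
                val = part[1]
                for (j = 2; j <= m; j++)
                    val = val "\047\\\047\047" part[j]
                printf "%s=\047%s\047\n", name[i], val
            }
        }
    ' "$1"
}
# usage: eval "$(emit_assignments test.txt)"; echo "$account $ip"
```

Add export in front of each printed line (or run export account uname ... afterwards) if child processes need to see the variables too.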
This does what you want :
awk -F'|' 'BEGIN{ ORS="" } { for ( i=1; i<= NF && NR == 1; i++){ arr[i]=$i } ; if (NR == 1) next ; for ( i=1; i<= NF ; i++ ) { print arr[i] "=" $i "\n" } } ' input.txt
account=11122
uname=24a43cda22b2
upass=O3v2L2oPE9
mac=24:A4:3C:DA:22:B2
ip=192.168.1.2
tariff=7
download=12582500
upload=2097000
EDIT: The dollar symbol is not needed when declaring.
EDIT: To actually set these as environment variables, you can append the output to the file /etc/environment, like this:
awk -F'|' 'BEGIN{ ORS="" } { for ( i=1; i<= NF && NR == 1; i++){ arr[i]=$i } ; if (NR == 1) next ; for ( i=1; i<= NF ; i++ ) { print arr[i] "=" $i "\n" } } ' input.txt | sudo tee -a /etc/environment
Writing to that file requires root. A plain >> redirection is performed by your own shell before sudo runs, so the output is piped to sudo tee -a instead.
Hope it helps!
If the headers are fixed a read is enough:
IFS='|' read account uname upass mac ip tariff download upload < <(tail -n1 test.txt)
echo $account $ip
Output:
11122 192.168.1.2
Using paste and GNU sed:
. <(paste -d= <(sed -n '1s/|/\n/gp' test.txt) <(sed -n '2s/|/\n/gp' test.txt))
Or if GNU datamash is available:
. <(datamash -t '|' --output-delimiter='=' transpose < test.txt)
Then:
echo $account $mac $ip
Will output:
11122 24:A4:3C:DA:22:B2 192.168.1.2
I have two files
Content of file A
paybackFile_537214-760887_000_20120801.xml
paybackFile_354472-544899_000_20120801.xml
paybackFile_62-11033_000_20120801.xml
paybackFile_831669-837544_000_20120801.xml
===========================================
Total file(s) - 4
===========================================
Content of file B
14/08/2012 12:36:01: MSG: File paybackFile_537214-760887_000_20120801.xml.gpg decrypted successfully.
13/08/2012 11:36:01: MSG: File paybackFile_62-11033_000_20120801.xml.gpg not decrypted successfully.
Here I have the names of .xml files.
For each .xml file listed in file A, we need to check whether it is present in file B and whether it has been decrypted successfully.
Could you please help me with this?
Thanks in advance.
Regards,
Smita
awk 'FNR==NR{a[$2".gpg"];next}(($5 in a) && ($6=="decrypted"))' filea fileb
Create a script named compare.awk. Paste this inside:
FILENAME=="fileB" && $5 ~ /xml/ {
if ($6 == "decrypted" && $7 == "successfully.") {
decrypted_file[$5] = 1;
} else {
decrypted_file[$5] = 2;
}
}
FILENAME=="fileA" && $2 ~ /xml/ {
if (decrypted_file[$2".gpg"] == 1) {
print $2" exist and decrypted";
} else if (decrypted_file[$2".gpg"] == 2) {
print $2" exist but not decrypted";
} else {
print $2" not exist in fileB";
}
}
Call it by:
awk -F' ' -f compare.awk fileB fileA
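The same comparison can also be written without hard-coding the names fileA and fileB, by using NR == FNR to recognise the first file on the command line. A minimal sketch, assuming the log layout shown in the question (file name in field 5, status starting in field 6) and assuming the .xml name is the last field on each line of the list; check_decrypted is a hypothetical name.

```shell
#!/bin/sh
# check_decrypted LOG LIST: report, for each .xml name in LIST, whether LOG
# says it was decrypted, not decrypted, or does not mention it at all.
check_decrypted() {
    awk '
        NR == FNR {                          # first file: the decryption log
            if ($5 ~ /\.xml\.gpg$/) {
                name = $5
                sub(/\.gpg$/, "", name)      # log names carry a .gpg suffix
                status[name] = ($6 == "decrypted") ? "decrypted" : "not decrypted"
            }
            next
        }
        $NF ~ /\.xml$/ {                     # second file: the list of names
            if ($NF in status) print $NF ": " status[$NF]
            else               print $NF ": missing from log"
        }
    ' "$1" "$2"
}
# usage: check_decrypted fileB fileA
```

The separator and "Total file(s)" lines in file A are ignored because their last field does not end in .xml.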
[EDIT] For a shell version without an awk script (it still needs grep, sed, cut and wc, though):
#!/bin/bash
TESTA=`grep ".xml" fileA | cut -d' ' -f2`
TESTB=`grep ".xml" fileB | cut -d' ' -f5,6,7 | sed 's/ /-/g'`
DECRYPT_YES=""
DECRYPT_NO=""
for B in ${TESTB}
do
DECRYPT_B=`echo ${B} | sed 's/.*gpg-decrypted-successfully\./1/'`
if [ ${DECRYPT_B} == "1" ]
then
DECRYPT_YES=${DECRYPT_YES}" "`echo ${B} | sed 's/\.gpg.*//g'`
else
DECRYPT_NO=${DECRYPT_NO}" "`echo ${B} | sed 's/\.gpg.*//g'`
fi
done
for FILE_A in ${TESTA}
do
if [ `echo ${DECRYPT_YES} | grep "${FILE_A}" | wc -l` == 1 ]
then
echo ${FILE_A}" exist and decrypted"
elif [ `echo ${DECRYPT_NO} | grep "${FILE_A}" | wc -l` == 1 ]
then
echo ${FILE_A}" exist but not decrypted"
else
echo ${FILE_A}" not exist"
fi
done
Here's a script:
#!/bin/sh
FILEA=fileA
FILEB=fileB
awk -F" " ' { print $2 } ' $FILEA > .tmpfileA
awk '$6 == "decrypted" && $7 == "successfully." { print $5 }' $FILEB | sed 's/\.gpg$//' > .tmpfileB
diff .tmpfileA .tmpfileB
rm -f .tmpfileA
rm -f .tmpfileB
All you'll need to change are the variables FILEA and FILEB.
When executing it with the inputs you provided it gives the following result:
$ testAB.ksh
2d1
< paybackFile_521000-845442_000_20120701.xml
$