Bash string substitution with %

I have a list of files named with this format:
S2_7-CHX-2-5_Chr5.bed
S2_7-CHX-2-13_Chr27.bed
S2_7-CHX-2-0_Chr1.bed
I need to loop through each file to perform a task. Previously, I had named them without the step 2 indicator ("S2"), and this format had worked perfectly:
for FASTQ in *_clean.bam; do
    SAMPLE=${FASTQ%_clean.bam}
    echo $SAMPLE
    echo $(samtools view -c ${SAMPLE}_clean.bam)
done
But now that I have the S2 preceding what I would like to set as the variable, this returns a list of empty "SAMPLE" variables. How can I rewrite the following code to specify only S2_*.bed?
for FASTQ in S2_*.bed; do
    SAMPLE=${S2_FASTQ%.bed}
    echo $SAMPLE
done
Edit: I'm trying to isolate the unique name from each file, for example "7-CHX-2-13_Chr27" so that I can refer to it later. I can't use the "S2" as part of this because I want to rename the file with "S3" for the next step, and so on.
Example of what I'm trying to use it for:
for FASTQ in S2_*.bed; do
    SAMPLE=${S2_FASTQ%.bed}
    echo $SAMPLE
    # rename each mapping position with UCSC chromosome name using sed
    while IFS=, read -r f1 f2; do
        # rename each file
        echo "sed \"s/${f1}.1/chr${f2}/g\" S2_${SAMPLE}_Chr${f2}.bed > S3_${SAMPLE}_Chr${f2}.bed" >> $SCRIPT
    done < $INPUT
done

The name of the variable is still $FASTQ; the S2_ is not part of the variable name, but part of its value.
sample=${FASTQ%.bed}
#        ~~~~~|~~~~
#          |  |   |
#          |  |   what to remove
#          |  remove from the right
#          variable name
If you want to remove the S2_ from the $sample, use left hand side removal:
sample=${sample#S2_}
The removals can't be combined; you have to proceed in two steps.
Note that I use lower case variable names. Upper case should be reserved for environment and internal shell variables.
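A minimal sketch combining both removals with the loop from the question:

for fastq in S2_*.bed; do
    sample=${fastq%.bed}    # right-hand removal -> S2_7-CHX-2-13_Chr27
    sample=${sample#S2_}    # left-hand removal  -> 7-CHX-2-13_Chr27
    echo "$sample"
done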

Related

how to assign each of multiple lines in a file as different variable?

This is probably a very simple question. I looked at other answers but couldn't come up with a solution. I have a 365-line date file, like below:
01-01-2000
02-01-2000
I need to read this file line by line and assign each day to a separate variable, like this:
d001=01-01-2000
d002=02-01-2000
I tried while read commands but couldn't get them to work. It takes a lot of time to assign them one by one. How can I do it quickly?
Trying to create hundreds of separately named variables is a waste of time and not really supported. Better to use an associative array:
#!/bin/bash
declare -A array
while read -r line; do
    printf -v key 'd%03d' $((++c))   # generate keys d001, d002, ...
    array[$key]=$line
done < file
Output
for i in "${!array[#]}"; do echo "key=$i value=${array[$i]}"; done
key=d001 value=01-01-2000
key=d002 value=02-01-2000
Assumptions:
an array is acceptable
array index should start with 1
Sample input:
$ cat sample.dat
01-01-2000
02-01-2000
03-01-2000
04-01-2000
05-01-2000
One bash/mapfile option:
unset d # make sure variable is not currently in use
mapfile -t -O1 d < sample.dat # load each line from file into separate array location
This generates:
$ typeset -p d
declare -a d=([1]="01-01-2000" [2]="02-01-2000" [3]="03-01-2000" [4]="04-01-2000" [5]="05-01-2000")
$ for i in "${!d[#]}"; do echo "d[$i] = ${d[i]}"; done
d[1] = 01-01-2000
d[2] = 02-01-2000
d[3] = 03-01-2000
d[4] = 04-01-2000
d[5] = 05-01-2000
In OP's code, references to $d001 now become ${d[1]}.
A quick one-liner would be:
eval $(awk 'BEGIN{cnt=0}{printf "d%3.3d=\"%s\"\n",cnt,$0; cnt++}' your_file)
eval makes the shell variables known inside your script or shell. Use echo $d000 to show the first one of the newly defined variables. There should be no shell special characters (like * and $) inside your_file. Remove eval $() to see the result of the awk command. The \" quoted %s is to allow spaces in the variable values. If you don't have any spaces in your_file you can remove the \" before and after %s.
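For the two-line sample file above, running just the awk command (with the surrounding eval $() removed) should print the assignments it builds:

d000="01-01-2000"
d001="02-01-2000"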

how to find pattern and insert text in middle using shell script

I would like to add a name in the middle of dirPath
#!/bin/bash
name='agent_name-2'
dirPath='/var/azp/1/s'
I want to insert agent_name-2 after /var/azp in dirPath, and store it in a separate variable result like this
result=/var/azp/agent_name-2/1/s
If /var/azp is a hard coded string (i.e. constant), try:
name='agent_name-2'
dirPath='/var/azp/1/s'
result="/var/azp/$name${dirPath#/var/azp}"
Explanation: ${dirPath#/var/azp} removes the string /var/azp from the beginning of the string $dirPath.
Try this:
#!/bin/bash
name='agent_name-2'
dirPath='/var/azp/1/s'
Split dirPath by / and store it in the array dirs.
IFS=/ read -r -a dirs <<< "$dirPath"
Calculate the middle of the array.
middle=$(( (${#dirs[@]} + 1) / 2 ))
Create two new arrays left and right with the left and right half of the dirs array.
left=("${dirs[#]:0:$middle}")
right=("${dirs[#]:$middle}")
Join the left and right half and put the name in between.
result="$(printf "%s/" "${left[#]}" "$name" "${right[#]}")"
Remove the trailing slash.
result=${result%/}
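Assembled into one runnable script (the same steps as above, consolidated):

#!/bin/bash
name='agent_name-2'
dirPath='/var/azp/1/s'

IFS=/ read -r -a dirs <<< "$dirPath"    # split on /
middle=$(( (${#dirs[@]} + 1) / 2 ))     # index of the middle element
left=("${dirs[@]:0:$middle}")           # left half
right=("${dirs[@]:$middle}")            # right half
result="$(printf "%s/" "${left[@]}" "$name" "${right[@]}")"
result=${result%/}                      # drop the trailing slash
echo "$result"                          # /var/azp/agent_name-2/1/s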
Bash search-replace
You can use Bash's search and replace syntax ${variable//search/replace}.
prefix='/var/azp'
result=${dirPath//$prefix/$prefix\/$name}
# > /var/azp/agent_name-2/1/s
sed
If $name doesn't contain any special characters, you could inject it into a sed search-replace:
$ sed "s|/var/azp|\0/$name|" <<< "$dirPath"
/var/azp/agent_name-2/1/s
Then for saving the result to a variable, see How do I set a variable to the output of a command in Bash?
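For instance (a minimal sketch; note that \0 for the whole match is a GNU sed extension, with & being the portable equivalent):

result=$(sed "s|/var/azp|\0/$name|" <<< "$dirPath")
echo "$result"    # /var/azp/agent_name-2/1/s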

Bash command to read a line based on the parameters I pass - perform column-based lookups

I have a file links.txt:
1 a.sh
3 b.sh
6 c.sh
4 d.sh
So, if I pass 1,4 as parameters to another script (master.sh), a.sh and d.sh should be stored in a variable.
sed '3!d' would print the 3rd line, but not the line that starts with 3. For that, you need sed '/^3 /!d'. The problem is you can't combine them for more lines, as this means "Delete everything that doesn't start with a 3", which means all other lines will be missed. So, use sed -n '/^3 /p' instead, i.e. don't print by default and tell sed what lines to print, not what lines to delete.
You can loop over the argument and create a sed script from them that prints the lines, then run sed using this output:
#!/bin/bash
file=$1
shift
for id in "$@"; do
    echo "/^$id /p"
done | sed -nf- "$file"
Run as script.sh filename 3 4.
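For script.sh links.txt 3 4, the loop feeds sed this generated two-line script on stdin:

/^3 /p
/^4 /p

which prints the lines 3 b.sh and 4 d.sh.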
If you want to remove the id from the output, you can either use
cut -f2 -d' '
or you can modify the generated sed script to do the work
echo "/^$id /s/.* //p"
i.e. only print if the substitution was successful.
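With that change, ./script.sh links.txt 3 4 should print just the file names:

b.sh
d.sh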
This loops through each argument and greps for it in the links file. The result is piped into cut where we specify the delimiter as a space with -d flag and the field number as 2 with -f flag. Finally this is appended to the array called files.
links="links.txt"
files=()
for arg in $#; do
files=("${files[#]}" `grep "^$arg" "$links" | cut -d" " -f2`)
done;
echo ${files[#]}
Usage:
$ ./master.sh 1 4
a.sh d.sh
Edit:
As pointed out by mklement0, the solution above reads the file once per arg. The following first builds the pattern then reads the file just once.
links="links.txt"
pattern="^$1\s"
for arg in "${@:2}"; do
    pattern+="|^$arg\s"
done
files=$(grep -E "$pattern" "$links" | cut -d" " -f2)
echo ${files[@]}
Usage:
$ ./master.sh 1 4
a.sh d.sh
Here is another example with grep and cut:
#!/bin/bash
for line in $(grep "$1\|$2" links.txt | cut -d' ' -f2)
do
    echo $line
done
Example of usage:
./master.sh 1 4
a.sh
d.sh
Why not just store the values and call them at will:
items=()
while read -r num file
do
    items[num]="$file"
done < links.txt

for arg
do
    echo "${items[arg]}"
done
Now you can use the items array any time you like :)
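For example:
$ ./master.sh 1 4
a.sh
d.sh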
The following awk solution:
preserves the argument order; that is, the results reflect the order in which the lookup values were specified (as opposed to the order in which the lookup values happen to occur in the file).
If that is not important (i.e., if outputting the results in file order is acceptable), the readarray technique below can be combined with this one-liner, which is a generalized variant of Panta's answer:
grep -f <(printf "^%s\n" "$#") links.txt | cut -d' ' -f2-
performs well, because the input file is only read once; the only requirement is that all key-value pairs fit into memory as a whole (as a single associative Awk array (dictionary)).
works with any lookup values that don't have embedded whitespace.
Similarly, the assumption is that the output column values (containing values such as a.sh in the sample input) have no embedded whitespace. awk doesn't handle quoted fields well, so more work would be needed.
#!/bin/bash

readarray -t files < <(
  awk -v idList="$*" '
    BEGIN { count=split(idList, idArr); for (i in idArr) idDict[idArr[i]]++ }
    $1 in idDict { idDict[$1] = $2 }
    END { for (i=1; i<=count; ++i) print idDict[idArr[i]] }
  ' links.txt
)

# Print results.
printf '%s\n' "${files[@]}"
readarray -t files reads stdin input (<) line by line into array variable files.
Note: readarray requires Bash v4+; on Bash 3.x, such as on macOS, replace this part with
IFS=$'\n' read -d '' -ra files
<(...) is a Bash process substitution that, loosely speaking, presents the output from the enclosed command as if it were a (self-deleting) temporary file.
This technique allows readarray to run in the current shell (as opposed to a subshell if a pipeline had been used), which is necessary for the files variable to remain defined in the remainder of the script.
The awk command breaks down as follows:
-v idList="$*" passes the space-separated list of all command-line arguments as a single string to Awk variable idList.
Note that this assumes that the arguments have no embedded spaces, which is indeed the case here and also generally the case with identifiers.
BEGIN { ... } is only executed once, before the individual lines are processed:
split(idList, idArr) splits the input ID list into an array by whitespace and stores the result in idArr.
for (i in idArr) idDict[idArr[i]]++ then converts the (conceptually regular) array into associative array idDict (dictionary), whose keys are the input IDs - this enables efficient lookup by ID later, and also allows storing the lookup result for each ID.
$1 in idDict { idDict[$1] = $2 } is processed for every input line:
Pattern $1 in idDict returns true if the line's first whitespace-separated field ($1) - e.g., 6 - is among the keys (in) of associative array idDict, and, if so, executes the associated action ({...}).
Action { idDict[$1] = $2 } then assigns the second field ($2) - e.g., c.sh - to the idDict entry for key $1.
END { ... } is executed once, after all input lines have been processed:
for (i=1; i<=count; ++i) print idDict[idArr[i]] loops over all input IDs in order and prints each ID's lookup result, which is the value of the dictionary entry with that ID.
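For example, with links.txt from the question, the output should follow the argument order rather than the file order:
$ ./master.sh 4 1
d.sh
a.sh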

Reading value from an ini style file with sed/awk

I wrote a simple bash function which reads a value from an ini file (the path is held in the variable CONF_FILE) and outputs it:
getConfValue() {
    # getConfValue section variable
    # return value of a specific variable from given section of a conf file
    section=$1
    var="$2"
    val=$(sed -nr "/\[$section\]/,/\[/{/$var/p}" $CONF_FILE)
    val=${val#$var=}
    echo "$val"
}
The problem is that it does not ignore comments, and it runs into trouble if multiple variable names within a section share common substrings.
Example ini file:
[general]
# TEST=old
; TEST=comment
TEST=new
TESTING=this will be output too
PATH=/tmp/test
Running getConfValue general PATH would output /tmp/test as expected, but running getConfValue general TEST shows all the problems this approach has.
How to fix that?
I know there are dedicated tools like crudini or tools for python, perl and php out in the wild, but I do not want to add extra dependencies for simple config file parsing. A solution incorporating awk instead of sed would be just fine too.
Sticking with sed you could anchor your var search to the start of the record using ^ and end it with an equal sign:
"/\[$section\]/,/\[/{/^$var=/p}"
If you are concerned about whitespace in front of your record you could account for that:
"/\[$section\]/,/\[/{/^(\W|)$var=/p}"
That ^(\W|)$var= says: at the beginning of the line, match a non-word character such as whitespace (^(\W)) or nothing (|), followed by your variable concatenated with an equal sign ($var=).
If you wanted to switch over to awk you could use something like:
val=$(awk -F"=" -v section=$section -v var=$var '$1=="["section"]"{secFound=1}secFound==1 && $1==var{print $2; secFound=0}' $CONF_FILE)
That awk command splits the record by equals signs (-F"="). If the first field in the record is your section ($1=="["section"]"), it sets the variable secFound to 1. Then, if secFound is 1 and the first field is exactly equal to your var variable (secFound==1 && $1==var), it prints the second field ({print $2}) and sets secFound to 0 so we don't pick up any other TEST keys from later sections.
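With the example file from the question, both the anchored sed pattern and this awk variant should now return only the real value, since commented and merely similar names no longer match:

$ getConfValue general TEST
new
$ getConfValue general PATH
/tmp/test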
I encountered this problem and came up with a solution similar to others here.
The main difference is it uses a single awk call to get a response suitable for creating an associative array of the property/value pairs for a section.
This will not ignore commented-out properties, though adding something to do that should not be too hard.
Here's a testing script demonstrating the awk and declare statements used:
#!/bin/bash
#
# Parse a INI style properties file and extract property values for a given section
#
# Author: Alan Carlyle
# License: CC0 (https://creativecommons.org/about/cclicenses/)
#
# Example Input: (example.properties)
# [SEC1]
# item1=value1
# item2="value 2"
#
# [Section 2]
# property 1="Value 1 of 'Section 2'"
# property 2='Single "quoted" value'
#
# Usage:
# $ read_props example.properties "Section 2" property\ 2
# $ Single "quoted" value
#
# Section names and properties with spaces do not need to be quoted.
# Values with spaces must be quoted. Values can use single or double quotes.
# The following characters [ = ] can not be used in names or values.
#
# If the property is not provided then the whole section is output.
#
propertiesFile=$1
section=$2
property=$3
# Extract the properties for the section, formatted for an associative array
sectionData="( "$(awk -F'=' -v s="$section" '/^\[/{ gsub(/[\[\]]/, "", $1); f = ($1 == s); next }
NF && f{ print "["$1"]="$2 }' "$propertiesFile")" )"

# Create associative array from extracted section data
declare +x -A "properties=$sectionData"
if [ -z "$property" ]
then
echo $sectionData
else
echo ${properties[$property]}
fi

UNIX - Replacing variables in sql with matching values from .profile file

I am trying to write a shell script which will take an SQL file as input. Example SQL file:
SELECT *
FROM %%DB.TBL_%%TBLEXT
WHERE CITY = '%%CITY'
Now the script should extract all variables, which in this case is everything starting with %%. So the output file will be something as below:
%%DB
%%TBLEXT
%%CITY
Now I should be able to extract the matching values from the user's .profile file for these variables and create the SQL file with the proper values.
SELECT *
FROM tempdb.TBL_abc
WHERE CITY = 'Chicago'
As of now I am trying to generate file1, which will contain all the variables. Below is a code sample:
sed "s/[(),']//g" "T:/work/shell/sqlfile1.sql" | awk '/%%/{print $NF}' | awk '/%%/{print $NF}' > sqltemp2.sql
gets me as far as:
%%DB.TBL_%%TBLEXT
%%CITY
Can someone help me in getting to file1 listing the variables?
You can use grep and sort to get a list of unique variables, as per the following transcript:
$ echo "SELECT *
FROM %%DB.TBL_%%TBLEXT
WHERE CITY = '%%CITY'" | grep -o '%%[A-Za-z0-9_]*' | sort -u
%%CITY
%%DB
%%TBLEXT
The -o flag to grep instructs it to only print the matching parts of lines rather than the entire line, and also outputs each matching part on a distinct line. Then sort -u just makes sure there are no duplicates.
In terms of the full process, here's a slight modification to a bash script I've used for similar purposes:
# Define all translations.
declare -A xlat
xlat['%%DB']='tempdb'
xlat['%%TBLEXT']='abc'
xlat['%%CITY']='Chicago'
# Check all variables in input file.
okay=1
for key in $(grep -o '%%[A-Za-z0-9_]*' input.sql | sort -u) ; do
    if [[ "${xlat[$key]}" == "" ]] ; then
        echo "Bad key ($key) in file:"
        grep -n "${key}" input.sql | sed 's/^/    /'
        okay=0
    fi
done
if [[ ${okay} -eq 0 ]] ; then
    exit 1
fi
# Process input file doing substitutions. Fairly
# primitive use of sed, must change to use sed -i
# at some point.
# Note we sort keys based on descending length so we
# correctly handle extensions like "NAME" and "NAMESPACE",
# doing the longer ones first makes it work properly.
cp input.sql output.sql
for key in $( (
        for key in ${!xlat[@]} ; do
            echo ${key}
        done
    ) | awk '{print length($0)":"$0}' | sort -rnu | cut -d':' -f2) ; do
    sed "s/${key}/${xlat[$key]}/g" output.sql > output2.sql
    mv output2.sql output.sql
done
cat output.sql
It first checks that the input file doesn't contain any keys not found in the translation array. Then it applies sed substitutions to the input file, one per translation, to ensure all keys are substituted with their respective values.
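With the sample SQL from the question as input.sql, the final cat output.sql should show:

SELECT *
FROM tempdb.TBL_abc
WHERE CITY = 'Chicago'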
This should be a good start, though there may be some edge cases such as if your keys or values contain characters sed would consider important (like / for example). If that is the case, you'll probably need to escape them such as changing:
xlat['%%UNDEFINED']='0/0'
into:
xlat['%%UNDEFINED']='0\/0'
