BASH: How to resolve paths beginning with file://, sftp://, davs://

I'm writing a bulk-rename program to use with Nemo, but Nemo passes paths as URIs beginning with file://, sftp:// or davs://. For example:
file:///home/jkoop/my-file.txt
sftp://my-nas/my-file.txt
davs://my-nas/my-file.txt
How can I resolve these paths to something like:
/home/jkoop/my-file.txt
/run/user/1000/gvfs/sftp=joek#my-nas/my-file.txt
/run/user/1000/davs=jkoop#my-nas/my-file.txt
This has been a difficult question to Google. Perhaps something like realpath could help?

You need to do conditional pattern substitution. In my experience, awk is the best tool for that, both for its flexibility in defining the patterns to match and for specifying the substitutions to make.
The following fits your example:
#!/bin/sh
echo "file:///home/jkoop/my-file.txt
sftp://my-nas/my-file.txt
davs://my-nas/my-file.txt
abcdefg://unknown/random/extraneous.txt" |
while IFS= read -r nemopath
do
if [ -z "${nemopath}" ] ; then continue ; fi
fullpath=`echo "${nemopath}" |
awk -v htPref="/" \
-v ftpPref="/run/user/1000/gvfs/sftp=joek#" \
-v davPref="/run/user/1000/davs=jkoop#" '{
n=index( $0, "file:///" ) ;
if( n == 1 ){
filepath=sprintf("%s%s", htPref, substr($0,9) ) ;
}else{
n=index($0,"sftp://")
if( n == 1 ){
filepath=sprintf("%s%s", ftpPref, substr($0,8) ) ;
}else{
n=index($0,"davs://")
if( n == 1 ){
filepath=sprintf("%s%s", davPref, substr($0,8) ) ;
}else{
filepath=sprintf("#NOMATCH|%s|", $0 ) ;
} ;
} ;
} ;
}END{
print filepath ;
}' `
echo "${fullpath}"   # or call your own action here, e.g. take_action "${nemopath}" "${fullpath}"
done
That provides the following output (which could be directed to a log file):
/home/jkoop/my-file.txt
/run/user/1000/gvfs/sftp=joek#my-nas/my-file.txt
/run/user/1000/davs=jkoop#my-nas/my-file.txt
#NOMATCH|abcdefg://unknown/random/extraneous.txt|
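The same prefix mapping can also be done without awk, using a bash case statement and parameter expansion. This is a minimal sketch under stated assumptions, not a drop-in replacement: resolve_nemo_path is a hypothetical name, the two mount-point prefixes are simply copied from the question and must be adjusted to the real mounts under /run/user/$UID/gvfs, and the %XX decoding step assumes the URIs arrive percent-encoded (e.g. %20 for a space), as URIs normally are.

```bash
#!/bin/bash
# resolve_nemo_path URI (hypothetical helper): maps one URI to a local path.
# The sftp/davs prefixes are copied from the question and are assumptions;
# check the actual mount points under /run/user/$UID/gvfs on your system.
resolve_nemo_path() {
    local uri=$1 path decoded='' rest byte
    case $uri in
        file://*) path=${uri#file://} ;;
        sftp://*) path="/run/user/1000/gvfs/sftp=joek#${uri#sftp://}" ;;
        davs://*) path="/run/user/1000/davs=jkoop#${uri#davs://}" ;;
        *)        printf '#NOMATCH|%s|\n' "$uri"; return 1 ;;
    esac
    # Decode %XX escapes one at a time (e.g. %20 -> space).
    # Decoding stops at the first malformed escape such as "%zz".
    rest=$path
    while [[ $rest =~ ^([^%]*)%([0-9A-Fa-f]{2})(.*)$ ]]; do
        printf -v byte '%b' "\\x${BASH_REMATCH[2]}"
        decoded+=${BASH_REMATCH[1]}$byte
        rest=${BASH_REMATCH[3]}
    done
    printf '%s\n' "$decoded$rest"
}

# Usage: resolve every URI passed as an argument (e.g. by a Nemo action).
for uri in "$@"; do
    resolve_nemo_path "$uri"
done
```

Unmatched schemes produce the same #NOMATCH|...| marker as the awk version and make the function return 1, so a caller can skip them.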

How to replace the values of a Param

How can I replace the values of parameters one at a time?
For example, given the URL
https://example.com/?p=first&q=second&r=third
first I want to set the p parameter to 123:
https://example.com/?p=123&q=second&r=third
Then, starting again from the same URL, the q parameter:
https://example.com/?p=first&q=123&r=third
And again from the same URL, the r parameter:
https://example.com/?p=first&q=second&r=123
What I tried:
while read line; do
first_part=`echo $line | cut -d'=' -f1` second_part=`echo $line | cut -d'=' -f2`
echo "${first_part}=123${second_part}"
echo "${first_part}${second_part}=123"
done < urls.txt
The problem described is a good application for AWK's capabilities. The demo script creates sample files for both the URLs and a mapping file used to transform every URL.
This approach allows for parameters to "free float", not dependent on matching at a specific sequential position in the URL string.
This approach also allows for parameters to be strings of any length.
#!/bin/bash
#QUESTION: https://stackoverflow.com/questions/75124190/how-to-replace-the-values-of-a-param
cat >URL.list <<"EnDoFiNpUt"
https://example.com/?p=first&q=second&r=third
https://example.com/?r=zinger
https://example.com/?r=bonkers&q=junk&p=wacko
https://example.com/?p=flyer
EnDoFiNpUt
cat >mapfile.txt <<"EnDoFiNpUt"
q=SECOND
r=THIRD
p=FIRST
EnDoFiNpUt
awk -v datFile="mapfile.txt" 'BEGIN{
## Initial loading of the mapping file into array for comparison
split( "", transforms ) ;
indexT=0 ;
## "> 0" stops on both end of file (0) and a read error (-1)
while( (getline line < datFile) > 0 ){
indexT++ ;
transforms[indexT]=line ;
} ;
}
{
### Split off beginning of URL from parameters
qPos=index( $0, "?" ) ;
beg=substr( $0, 1, qPos ) ;
### Load URL elements into array for comparison
rem=substr( $0, qPos+1 ) ;
n=split( rem, parts, "&" ) ;
### Match and Map transforms elements with URL parts
for( k=1 ; k<= indexT ; k++ ){
dPos=index( transforms[k], "=" ) ;
fieldPref=substr( transforms[k], 1, dPos ) ;
for( i=1 ; i<=n ; i++ ){
### match the parameter name at the start of the part, not anywhere in it
if( index( parts[i], fieldPref ) == 1 ){
parts[i]=transforms[k] ;
} ;
} ;
} ;
### Print transformed URL
printf("%s%s", beg, parts[1] ) ;
for( i=2 ; i<=n ; i++ ){
printf("&%s", parts[i] ) ;
} ;
print "" ;
}' URL.list
The output looks like this:
https://example.com/?p=FIRST&q=SECOND&r=THIRD
https://example.com/?r=THIRD
https://example.com/?r=THIRD&q=SECOND&p=FIRST
https://example.com/?p=FIRST
URL query parameters are generally treated as unordered, so you can simply append p's new value at the end instead of keeping its original position:
echo 'https://example.com/?p=first&q=second&r=third' |
mawk NF=NF FS='p=[^&]*[&]?' OFS= ORS='&p=123\n'
https://example.com/?q=second&r=third&p=123
The same works for q=.
If you're modifying r= instead (the last parameter here), set both FS and OFS to "=" and update it like any ordinary value through $NF.
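For the literal task in the question (one output URL per parameter, with only that parameter's value replaced each time), a pure-bash loop over the &-separated parameters also works and keeps the original order. A minimal sketch, assuming each URL has a single ? and no #fragment; replace_each_param is a hypothetical name.

```bash
#!/bin/bash
# replace_each_param URL VALUE (hypothetical helper)
# Prints one line per query parameter, with only that parameter's value
# replaced by VALUE. Assumes a single '?' and no '#fragment' in URL.
replace_each_param() {
    local url=$1 value=$2 base query i j out
    local -a params
    base=${url%%\?*}                     # everything before the first '?'
    query=${url#*\?}                     # everything after it
    IFS='&' read -r -a params <<< "$query"
    for (( i = 0; i < ${#params[@]}; i++ )); do
        out=
        for (( j = 0; j < ${#params[@]}; j++ )); do
            if (( i == j )); then
                out+="${params[j]%%=*}=$value"   # keep the name, swap the value
            else
                out+=${params[j]}
            fi
            (( j < ${#params[@]} - 1 )) && out+='&'
        done
        printf '%s?%s\n' "$base" "$out"
    done
}

# Usage with the question's urls.txt:
# while IFS= read -r line; do replace_each_param "$line" 123; done < urls.txt
```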

The command 'startx' only works when entered in the terminal and not from a bash script

I wrote a rule in /etc/udev/rules.d/ that runs a bash script, '/home/pi/startx.sh', when a device is plugged into my Raspberry Pi 4. This rule works, in the sense that if my script contains
#!/bin/bash
touch test.txt
the test.txt file is actually created each time I insert e.g. a keyboard. Now I would like this script to run the startx command as the user 'pi'.
In practice, I want the Raspberry Pi to boot in console mode and to start the 'pi' user's desktop, through the script startx.sh, only when this device is inserted.
Now if in my script I write
#!/bin/bash
startx
it doesn't work and the pi's Xorg Desktop doesn't start.
If I run it over ssh, it says (I don't actually need to start it via ssh, it was only a test):
parse_vt_settings /dev/tty0 (permission denied)
The really strange thing is that if I type startx on the keyboard in front of the monitor, it works without problems (always as user pi).
What's more, if I run the script ./startx.sh from the console when I'm physically in front of the monitor it works perfectly.
It is important to specify that I want the Desktop to start as the user 'pi' and not as root or other users.
So sudo startx is not suitable (it works, but obviously gives me a different desktop that is not pi's).
I also don't want to change any permissions on tty, video, etc. because if it works from the console when I'm physically in front of the monitor, I don't understand why it can't work from a bash script.
OS: Raspbian GNU/Linux 10 (buster)
Raspberry pi 4
I tried modifying the script like this:
#!/bin/bash -l
startx
or
#!/bin/bash
sudo -u pi startx
or
#!/bin/bash -l
sudo -u pi startx
and also
#!/bin/bash
sudo -i -u pi xinit
without any result.
"startx" will start the Desktop, as long as the Desktop is not already running.
You are facing either
a conflict with a previous Desktop instance, or
you have not correctly identified the device that the system believes to be the console where the Desktop "must" be displayed, or
you haven't correctly defined the device for a dual-head Pi, or
the configuration for that target device is malformed as an intended X display.
To help you identify process parents and dependencies, below is a script (reused and modified) that displays the processes as a hierarchy tree. The original was simply called pt; I call my version pt.sh.
#!/bin/sh
##################################################################################
###
### $Id: pt.sh,v 2.4 2022/08/17 22:27:05 root Exp $
###
### Script to print a process tree on stdout.
### - easy to see relationships between processes with this script
### Synt: pt [opts] [startPID]
### Opts: -h,H,-V : help,HELP,version
### -w width: screen width
### -o type : output type: 1, 2, or 3 (def 3)
### Parm: startPID: show tree starting at this PID (def: 1)
###
### #(#) pt - display Process Table "Tree"
###
### Original Author: William J. Duncan (prior to May 7 1996)
###
### Synopsis:
### pt [opts] [startPID] | less # (or whatever your fav pager is)
###
### Notes:
### - all recent implementations of awk I have seen have recursion.
### It is a requirement. This is a nice little example of using
### recursion in awk.
###
### - under bsd, there was no real happy mix of options which
### would pick up a user's name, and do everything else we wanted.
### (eg. need to pick up PPID for example)
### So we need to do a separate search through the passwd entries
### ourselves and build a lookup table, or alternatively run ps
### twice.
###
### - notice the ugliness of 3 separate sets of quotes required in
### the line:
### while ("'"$GETPASSWD"'" | getline > 0)
###
### The inside pair of quotes keeps the 2 tokens for the command
### together. The pair of single quotes escapes from within the
### awk script to the "outside" which is the shell script. This
### makes the shell variable "$GETPASSWD" available for use with-
### in the awk script as a literal string. (Which is the reason
### for the outside pair of double quotes.)
###
### - This is the general format of including awk scripts within
### the shell, and passing ENVIRONMENT variables down. -wjd
###
##################################################################################
##################################################################################
###
### Mods by E. Marceau, Ottawa, Canada
###
### - Added logic to determine max length of username for proper display of that field
###
##################################################################################
TMP=/tmp/`basename "$0" ".sh" `.$$
set -u
## Constants
rcsid='$Id: pt.sh,v 2.4 2022/08/17 22:27:05 root Exp $'
P=`basename $0`;
## This command should list the password file on "all" systems, even
## if YP is not running or 'ypcat' does not exist.
## List the local password file first because the UIDS array is assigned
## such that later entries override earlier entries.
if [ -z "`which ypcat 2>>/dev/null `" ]
then
GETPASSWD="(cat /etc/passwd)"
else
GETPASSWD="(ypcat passwd 2>/dev/null)"
fi
## Name: usage; Desc: standard usage description function
usage() { awk 'NF==0{if(n++=='${1:-0}')exit}0==0'<$0; }
maxWidth=0
## check for options
set -- `getopt ehHVw:o: ${*:-}`
test $? -ne 0 && usage 0 && exit 9
for i in $*; do
case $i in
-e) maxWidth=1 ; COLS=512 ; shift ;;
-h) usage 0 && exit 9 ;;
-H) usage 1 && exit 9 ;;
-V) echo "${P} ${rcsid}"|cut -d' ' -f1,4-5; exit;;
-w) COLS=$2 ; shift 2 ;;
-o) outtype=$2 ; shift 2 ;;
--) shift ; break ;;
esac
done
## initialize
startpid="${1:-1}"
SYSTEM=${SYSTEM:-`uname`}
outtype="${outtype:-3}"
case ${SYSTEM} in
# XENIX) # or any other sys5 i think
# PS=/bin/ps
# AWK=/bin/awk
# PSFLAGS=-ef
# SYSTEM=sys5
# SIZE='/bin/stty size'
# ;;
# SunOS) # bsd flavours of ps
# os=`uname -r | cut -c1`
# PS=/bin/ps
# AWK=nawk
# if test "$os" = "4"; then
# PSFLAGS=-axjww
# SYSTEM=bsd
# SIZE='/bin/stty size'
# else
# PSFLAGS=-ef
# SYSTEM=sys5
# SIZE='/usr/ucb/stty size'
# fi
# ;;
# HP-UX)
# PS=/bin/ps
# AWK=/usr/bin/awk
# PSFLAGS=-ef
# SYSTEM=sys5
# SIZE='/bin/stty size'
# ;;
Linux)
PS=/bin/ps
AWK=awk
PSFLAGS=-axjww
SYSTEM=bsd
SIZE='/bin/stty size'
;;
*)
PS=/bin/ps
AWK=awk
PSFLAGS=-axjww
SYSTEM=bsd
SIZE='/bin/stty size'
;;
esac
COLShere=`${SIZE} | awk '{print $2}' `
COLS=${COLS:-$COLShere}
COLS=${COLS:-80}
${PS} ${PSFLAGS} |
${AWK} -v maxWidth="${maxWidth}" 'BEGIN{ lowestPID=9999 ;
FS = ":" ;
while ("'"${GETPASSWD}"'" | getline > 0){
UIDS[ $3 ] = $1 ;
} ;
UIDS[ 0 ] = "root" ; # fix for "extra" root accounts
for ( var in UIDS ){
lenT=length( UIDS[var] ) ;
if ( lenT > lenM ){
lenM=lenT ; # longest=UIDS[var] ;
} ;
} ;
printf("%6s %6s %4s %-"lenM"s %s\n", "PID", "PPID", "TTY", "USER", "COMMAND" ) ;
FS = " " ;
COLS='${COLS}' ;
SYSTEM="'${SYSTEM}'" ;
if (SYSTEM == "sys5"){
fpid = 2 ;
fppid = 3 ;
fuid = 1 ;
}else{
if (SYSTEM == "bsd"){
fpid = 2 ;
fppid = 1 ;
fuid = 8 ;
} ;
} ;
outtype ="'${outtype}'" ;
if (outtype == 1){
SPACES=".............................................................................................." ;
SPREAD=1 ;
CMD_PREFIX=" " ;
}else{
if (outtype == 2){
SPACES="||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||" ;
SPREAD=1 ;
CMD_PREFIX="" ;
}else{
SPACES="| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |" ;
SPREAD=2 ;
CMD_PREFIX="" ;
} ;
} ;
} ;
NR==1 { title = $0 ; next } ;
# All others
{
if ( $fpid < lowestPID ){ lowestPID=$fpid ; } ;
LINE[ $fpid ] = $0 ; # Line indexed by PID
PCNT[ $fppid ]++ ; # Index into PPID
PPID[ $fppid, PCNT[$fppid] ] = $2 ; # Parent to Children unique
INDENT=0 ;
} ;
function doprint(s, a, name, i, nf, tty, cmd){
# the splitting and complications here are mostly for
# sys5, which has a different number of fields depending on STIME
# field. Argh.
nf = split(s,a) ;
for (i=7; i <= nf; i++){
if (a[i] ~ /[0-9]:[0-9][0-9]/){
break # set i here
} ;
} ;
for (i++ ; i <= nf; i++){
name = name " " a[i] ;
} ;
if (a[fuid] in UIDS){
a[fuid] = UIDS[a[fuid]] ; # if username found
} ;
if (SYSTEM == "bsd"){ # if bsd
tty = a[5] ;
}else{ # sys5 2 possible formats
tty = (a[5] ~ /^[0-9]+:/) ? a[6] : a[7] ;
} ;
cmd = substr(SPACES,1,INDENT*SPREAD) CMD_PREFIX substr(name,2) ;
#if ( length(cmd) > COLS-27 && maxWidth == 0 ){
if ( length(cmd) > COLS-27 ){
cmd = substr(cmd,1,COLS-27) ;
} ;
printf("%6d %6d %4s %-"lenM"s %s\n", a[fpid], a[fppid], substr(tty,length(tty)-1), a[fuid], cmd ) ;
} ;
function dotree(pid) { # recursive
if (pid == 0) return
doprint(LINE[ pid ])
INDENT++
while (PCNT[pid] > 0) {
dotree(PPID[ pid, PCNT[pid] ] ) ; # recurse
delete PPID[ pid, PCNT[pid] ] ;
PCNT[pid]-- ;
} ;
INDENT-- ;
} ;
END{
if ( lowestPID > startpid ){ startpid=lowestPID ; } ;
dotree('${startpid}') ;
}' >${TMP}.initial
###########################################################################################################################
###
### Additional coding to present results with PID correctly sorted along with associated children also in sorted order.
###
###########################################################################################################################
head -1 ${TMP}.initial >${TMP}.head
tail --lines=+2 ${TMP}.initial | sort --key=1,1n --key=2,2n >${TMP}.remainder
cat ${TMP}.head
while [ true ]
do
line=`awk '{ if ( NR == 1 ){ print $0 } ; exit }' <${TMP}.remainder `
first=`echo "${line}" | awk '{ print $1 }' `
echo "${line}"
#get children
tail --lines=+2 ${TMP}.remainder | awk -v pid="${first}" '{ if ( $2 == pid ){ print $0 } ; }' >${TMP}.next
tail --lines=+2 ${TMP}.remainder | awk -v pid="${first}" '{ if ( $2 != pid ){ print $0 } ; }' >${TMP}.others
cat ${TMP}.next ${TMP}.others >${TMP}.remainder
if [ ! -s ${TMP}.remainder ] ; then break ; fi
done ### >${TMP}.new ;cat ${TMP}.new
rm -f ${TMP}.initial ${TMP}.head ${TMP}.remainder ${TMP}.next ${TMP}.others
exit 0
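The core technique in pt.sh, building a parent-to-children table and then recursing from a starting PID, fits in a much smaller awk program. This is a minimal sketch, not a replacement for pt.sh: the function name ptree is hypothetical, and it assumes input lines of the form "PID PPID COMMAND", such as those produced by ps -eo pid=,ppid=,comm= with the procps ps found on Raspbian.

```bash
#!/bin/bash
# ptree [START_PID] (hypothetical helper): reads "PID PPID COMMAND" lines on
# stdin and prints the process tree below START_PID (default 1), indented.
ptree() {
    awk -v start="${1:-1}" '
    # Recursive walk; the extra parameters act as awk local variables.
    function walk(pid, depth,    i, n, pad) {
        pad = ""
        for (i = 0; i < depth; i++) pad = pad "  "
        printf("%s%s %s\n", pad, pid, cmd[pid])
        n = kids[pid] + 0
        for (i = 1; i <= n; i++)
            walk(child[pid, i], depth + 1)
    }
    {
        c = $0                        # the command may contain spaces
        sub(/^[ \t]*[0-9]+[ \t]+[0-9]+[ \t]+/, "", c)
        cmd[$1] = c
        kids[$2]++                    # number of children seen for this parent
        child[$2, kids[$2]] = $1      # n-th child of this parent
    }
    END { if (start in cmd) walk(start, 0) }'
}

# Usage on a live system (procps ps assumed):
# ps -eo pid=,ppid=,comm= | ptree 1
```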

Generic "append to file if not exists" function in Bash

I am trying to write a utility function in a bash script that takes a multi-line string and appends it to the supplied file if the file does not already contain it.
This works fine using grep if the pattern does not contain \n.
if grep -qF "$1" $2
then
return 1
else
echo "$1" >> $2
fi
Example usage
append $'sometext\nthat spans\n\tmultiple lines' ~/textfile.txt
I'm on macOS, by the way, and some of the solutions I've seen posted elsewhere are very Linux-specific. I'd also like to avoid installing any other tools to achieve this if possible.
Many thanks
If the files are small enough to slurp into a Bash variable (you should be OK up to a megabyte or so on a modern system), and don't contain NUL (ASCII 0) characters, then this should work:
IFS= read -r -d '' contents <"$2"
if [[ "$contents" == *"$1"* ]]; then
return 1
else
printf '%s\n' "$1" >>"$2"
fi
In practice, the speed of Bash's built-in pattern matching might be more of a limitation than the ability to slurp the file contents.
See the accepted, and excellent, answer to Why is printf better than echo? for an explanation of why I replaced echo with printf.
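Wrapped into a complete function, the slurp-and-match approach could look like the sketch below. Assumptions: the name append_if_missing is hypothetical, a missing file is treated as empty, and the function returns 1 when the text is already present, following the question's convention. Call it with $'...' quoting so that \n becomes a real newline; only bash built-ins are used, so it should also run with the bash 3.2 that ships with macOS.

```bash
#!/bin/bash
# append_if_missing TEXT FILE (hypothetical name)
# Appends TEXT (plus a newline) to FILE unless TEXT already occurs in it.
# Returns 1 if TEXT was already present, 0 after appending.
append_if_missing() {
    local contents=''
    # read -d '' slurps the whole file; it returns non-zero at end of file,
    # which is expected here, so its exit status is deliberately ignored.
    if [[ -f $2 ]]; then
        IFS= read -r -d '' contents < "$2"
    fi
    if [[ $contents == *"$1"* ]]; then
        return 1
    fi
    printf '%s\n' "$1" >> "$2"
}

# Usage:
# append_if_missing $'sometext\nthat spans\n\tmultiple lines' ~/textfile.txt
```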
Using awk:
awk '
BEGIN {
n = 0 # length of pattern in lines
m = 0 # number of matching lines
}
NR == FNR {
pat[n++] = $0
next
}
{
if ($0 == pat[m])
m++
else if (m > 0 && $0 == pat[0])
m = 1
else
m = 0
}
m == n {
exit
}
END {
if (m < n) {
for (i = 0; i < n; i++)
print pat[i] >>FILENAME
}
}
' - "$2" <<EOF
$1
EOF
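If necessary, one would need to properly escape any regex metacharacters inside FS and OFS. The idea: read the whole input as a single record (RS = "^$") and use the text to append as both FS and OFS. If the text is already present, NF is greater than 1 and the record is printed unchanged; otherwise ++NF adds an empty field, which appends OFS (the text) to the output. (_ is set to 1, since 0^0 is 1.) The same program runs under mawk, gawk, or nawk:
jot 7 9 |
awk 'BEGIN { FS = OFS = "11\n12\n13\n"
_^= RS = (ORS = "") "^$" } _<NF || ++NF'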
9
10
11
12
13
14
15
jot 7 -2 | (... awk stuff ...)
-2
-1
0
1
2
3
4
11
12
13

Matching a number against a comma-separated sequence of ranges

I'm writing a bash script which takes a number and a comma-separated sequence of values and ranges, e.g. 3,15,4-7,19-20. I want to check whether the number is contained in the set corresponding to the sequence. For simplicity, assume no comma-separated elements intersect, and that the elements are sorted in ascending order.
Is there a simple way to do this in bash other than the brute-force, naive way? Perhaps some shell utility already does something like this, maybe something related to lpr, which already knows how to process page-range sequences?
Is awk cheating?
$ echo -n 3,15,4-7,19-20 |
awk -v val=6 -v RS=, -F- '(NF==1&&$1==val) || (NF==2&&$1<=val&&$2>=val)' -
Output:
4-7
Another version:
$ echo 19 |
awk -v ranges=3,15,4-7,19-20 '
BEGIN {
split(ranges,a,/,/)
}
{
for(i in a) {
n=split(a[i],b,/-/)
if((n==1 && $1==a[i]) || (n==2 && $1>=b[1] && $1<=b[2]))
print a[i]
}
}' -
Outputs:
19-20
The latter is better as you can feed it more values from a file etc. Then again the former is shorter. :D
Pure bash:
check() {
# local IFS keeps the IFS changes below from leaking out of the function
local IFS a b c
IFS=, a=($2)
for b in "${a[@]}"; do
IFS=- c=($b); c+=(${c[0]})
(( $1 >= c[0] && $1 <= c[1] )) && break
done
}
$ check 6 '3,15,4-7,19-20' && echo "yes" || echo "no"
yes
$ check 42 '3,15,4-7,19-20' && echo "yes" || echo "no"
no
As bash is tagged, why not just
inrange() { for r in ${2//,/ }; do ((${r%-*}<=$1 && $1<=${r#*-})) && break; done; }
Then test it as usual:
$ inrange 6 3,15,4-7,19-20 && echo yes || echo no
yes
$ inrange 42 3,15,4-7,19-20 && echo yes || echo no
no
A function based on @JamesBrown's method:
function match_in_range_seq {
(( $# == 2 )) && [[ -n "$(echo -n "$2" | awk -v val="$1" -v RS=, -F- '(NF==1&&$1==val) || (NF==2&&$1<=val&&$2>=val)' - )" ]]
}
Will return 0 (in $?) if the second argument (the range sequence) contains the first argument, 1 otherwise.
Another awk idea using two input (-v) variables:
# use of function wrapper is optional but cleaner for the follow-on test run
in_range() {
awk -v value="$1" -v r="$2" '
BEGIN { n=split(r,ranges,",")
for (i=1;i<=n;i++) {
low=high=ranges[i]
if (ranges[i] ~ "-") {
split(ranges[i],x,"-")
low=x[1]
high=x[2]
}
if (value >= low && value <= high) {
print value,"found in the range:",ranges[i]
exit
}
}
}'
}
NOTE: the exit assumes no overlapping ranges, i.e., the value will not be found in more than one range
Take for a test spin:
ranges='3,15,4-7,19-20'
for value in 1 6 15 32
do
echo "########### value = ${value}"
in_range "${value}" "${ranges}"
done
This generates:
########### value = 1
########### value = 6
6 found in the range: 4-7
########### value = 15
15 found in the range: 15
########### value = 32
NOTES:
OP did not mention what to generate as output if no range match is found; code could be modified to output a 'not found' message as needed
in a comment OP mentioned possibly running the search for a number of values; code could be modified to support such a requirement but would need more input (eg, format of list of values, desired output and how to be used/captured by calling process, etc)
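Following up on the two notes above, here is one possible modification, a sketch rather than a definitive version: it reads any number of values from stdin (one per line, an assumed input format), prints either a match or "not found" for each, and keeps the ranges in a -v variable as before. The function name in_ranges is hypothetical.

```bash
#!/bin/bash
# in_ranges RANGES (hypothetical): reads values from stdin, one per line,
# and reports for each whether it falls inside any of the comma-separated
# ranges (single numbers or LOW-HIGH pairs).
in_ranges() {
    awk -v r="$1" '
    BEGIN { n = split(r, ranges, ",") }
    NF {
        hit = ""
        for (i = 1; i <= n; i++) {
            low = high = ranges[i]
            if (ranges[i] ~ /-/) {
                split(ranges[i], x, "-")
                low = x[1]; high = x[2]
            }
            # force numeric comparison
            if ($1 + 0 >= low + 0 && $1 + 0 <= high + 0) { hit = ranges[i]; break }
        }
        if (hit != "") print $1, "found in the range:", hit
        else           print $1, "not found"
    }'
}

# Usage:
# printf '%s\n' 1 6 15 32 | in_ranges '3,15,4-7,19-20'
```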

Processing a delimited line in bash

Given a single line of input with 'n' arguments which are space delimited. The input arguments themselves are variable. The input is given through an external file.
I want to move specific elements to variables depending on regular expressions. As such, I was thinking of declaring a pointer variable first to keep track of where on the line I am. In addition, the assignment to variable is independent of numerical order, and depending on input some variables may be skipped entirely.
My current method is to use
awk '{print $1}' file.txt
However, not all elements are fixed and I need to account for elements that may be absent, or may have multiple entries.
UPDATE: I found another method.
file=$(cat /file.txt)
for i in ${file[@]}; do
echo $i >> split.txt;
done
With this approach, instead of a single line with multiple arguments, we get multiple lines with one argument each, so we can now use something like varN=$(grep --regexp="[pattern]" split.txt). Now I just need to figure out how best to use regular expressions to filter this mess.
Let me take an example.
My input strings are:
RON KKND 1534Z AUTO 253985G 034SRT 134OVC 04/32
RON KKND 5256Z 143623G72K 034OVC 074OVC 134SRT 145PRT 13/00
RON KKND 2234Z CON 342523G CLS 01/M12 RMK
So the variable assignment for each of the above would be:
var1=RON var2=KKND var3=1534Z var4=TRUE var5=FALSE var6=253985G varC=2 varC1=034SRT varC2=134OVC var7=04/32
var1=RON var2=KKND var3=5256Z var4=FALSE var5=FALSE var6=143623G72K varC=4 varC1=034OVC varC2=074OVC varC3=134SRT varC4=145PRT var7=13/00
var1=RON var2=KKND var3=2234Z var4=FALSE var5=TRUE var6=342523G varC=0 var7=01/M12
So, the fourth argument might be var4, var5, or var6.
The fifth argument might be var5, var6, or match another criteria.
The sixth argument may or may not be var6. Between var6 and var7 can be determined by matching each argument with */*
Boiling this down even more: the positions on the input of var1, var2 and var3 are fixed, but after that I need to compare, order and assign. In addition, the arguments themselves can vary in character length. The relative position of each section to be divided is fixed in relation to its neighbors: var7 will never be before var6 in the input, for example, and if var4 and var5 are both true, the 4th and 5th arguments will always be 'AUTO CON'. Some segments will always be one argument, and others more than one. The relative position of each is known. As for each pattern, some have a specific character in a specific location, and others may not have any flag indicating what they are aside from their position in the sequence.
So I need awk to recognize a pointer variable as every argument needs to be checked until a specific match is found
#Check to see if var4 or var5 exists. if so, flag and increment pointer
pointer=4
if (awk '{print $$pointer}' file.txt) == "AUTO" ; then
var4="TRUE"
pointer=$pointer+1
else
var4="FALSE"
fi
if (awk '{print $$pointer}' file.txt) == "CON" ; then
var5="TRUE"
pointer=$pointer+1
else
var5="FALSE"
fi
#position of var6 is fixed once var4 and var5 are determined
var6=$(awk '{print $$pointer}' file.txt)
pointer=$pointer+1
#Count the arguments between var6 and var7 (there may be up to ten)
#and separate each to decode later. varC[0-9] is always three upcase
# letters followed by three numbers. Use this counter later when decoding.
varC=0
until (awk '{print $$pointer}' file.txt) == "*/*" ; do
varC($varC+1)=(awk '{print $$pointer}' file.txt)
varC=$varC+1
pointer=$pointer+1
done
#position of var7 is fixed after all arguments of varC are handled
var7=$(awk '{print $$pointer}' file.txt)
pointer=$pointer+1
I know the above syntax is incorrect. The question is how do I fix it.
var7 is not always at the end of the input line. Arguments after var7 however do not need to be processed.
Actually interpreting the patterns I haven't gotten to yet. I intend to handle that using case statements that compare the variables against regular expressions. I don't want to use awk to interpret the patterns directly, as that would get very messy. I have contemplated using for n in $string, but that would mean comparing every argument to every possible combination directly (and there are multiple segments, each with multiple patterns), which is impractical. I'm trying to make this a two-step process.
Please try the following:
#!/bin/bash
# template for variable names (varC holds the number of varC# entries)
declare -a namelist1=( "var1" "var2" "var3" "var4" "var5" "var6" "varC" )
declare -a ary
# read each line and rearrange ary so its elements line up with the names
while read -r -a ary; do
if [[ ${ary[3]} = AUTO ]]; then
ary=( "${ary[@]:0:3}" "TRUE" "FALSE" "${ary[4]}" "" "${ary[@]:5}" )
elif [[ ${ary[3]} = CON ]]; then
ary=( "${ary[@]:0:3}" "FALSE" "TRUE" "${ary[4]}" "" "${ary[@]:5}" )
else
ary=( "${ary[@]:0:3}" "FALSE" "FALSE" "${ary[3]}" "" "${ary[@]:4}" )
fi
# locate the first */* entry (var7), store the varC count in ary[6]
# and extend the variable names with varC1..varCn and var7
declare -a namelist=( "${namelist1[@]}" )
for (( i=7; i<${#ary[@]}; i++ )); do
if [[ ${ary[$i]} == */* ]]; then
ary[6]=$(( i - 7 ))
for (( j=1; j<=i-7; j++ )); do
namelist+=( "$(printf "varC%d" "$j")" )
done
namelist+=( "var7" )
break
fi
done
# assign variables to array elements; arguments after var7 are ignored
for (( i=0; i<${#namelist[@]}; i++ )); do
# echo -n "${namelist[$i]}=${ary[$i]} " # for debugging
declare -n p="${namelist[$i]}"
p="${ary[$i]}"
unset -n p
done
# echo "var1=$var1 var2=$var2 var3=$var3 ..." # for debugging
done < file.txt
Note that the script above just assigns bash variables and does not print anything
unless you explicitly echo or printf the variables.
Updated: this code shows how to decide a variable's value based on a pattern match, multiple times.
One code block is pure bash and the other uses gawk.
The bash code block uses declare -g, which requires bash 4.2 or later.
grep is also required to do pattern matching.
Tested with GNU bash, version 4.2.46(2)-release (x86_64-redhat-linux-gnu) and grep (GNU grep) 2.20.
I stick to printf rather than echo after learning why printf is better than echo.
When using bash, I consider it good practice to be more defensive.
#!/bin/bash
declare -ga outVars
declare -ga lineBuf
declare -g NF
#force valid index starts from 1
#consistent with var* name pattern
outVars=(unused var1 var2 var3 var4 var5 var6 varC var7)
((numVars=${#outVars[@]} - 1))
declare -gr numVars
declare -r outVars
function e_unused {
return
}
function e_var1 {
printf "%s" "${lineBuf[1]}"
}
function e_var2 {
printf "%s" "${lineBuf[2]}"
}
function e_var3 {
printf "%s" "${lineBuf[3]}"
}
function e_var4 {
if [ "${lineBuf[4]}" == "AUTO" ] ;
then
printf "TRUE"
else
printf "FALSE"
fi
}
function e_var5 {
if [ "${lineBuf[4]}" == "CON" ] ;
then
printf "TRUE"
else
printf "FALSE"
fi
}
function e_varC {
local var6_idx=4
if [ "${lineBuf[4]}" == "AUTO" -o "${lineBuf[4]}" == "CON" ] ;
then
var6_idx=5
fi
local var7_idx=$NF
local i
local count=0
for ((i=NF;i>=1;i--));
do
if [ $(grep -cE '^.*/.*$' <<<${lineBuf[$i]}) -eq 1 ];
then
var7_idx=$i
break
fi
done
((varC = var7_idx - var6_idx - 1))
if [ $varC -eq 0 ];
then
printf 0
return;
fi
local cFamily=""
local append
for ((i=var6_idx;i<=var7_idx;i++));
do
if [ $(grep -cE '^[0-9]{3}[A-Z]{3}$' <<<${lineBuf[$i]}) -eq 1 ];
then
((count++))
cFamily="$cFamily varC$count=${lineBuf[$i]}"
fi
done
printf "%s %s" $count "$cFamily"
}
function e_var6 {
if [ "${lineBuf[4]}" == "AUTO" -o "${lineBuf[4]}" == "CON" ] ;
then
printf "%s" "${lineBuf[5]}"
else
printf "%s" "${lineBuf[4]}"
fi
}
function e_var7 {
local i
for ((i=NF;i>=1;i--));
do
if [ $(grep -cE '^.*/.*$' <<<${lineBuf[$i]}) -eq 1 ];
then
printf "%s" "${lineBuf[$i]}"
return
fi
done
}
while read -a lineBuf ;
do
NF=${#lineBuf[@]}
lineBuf=(unused "${lineBuf[@]}")
for ((i=1; i<=numVars; i++));
do
printf "%s=" "${outVars[$i]}"
(e_${outVars[$i]})
printf " "
done
printf "\n"
done <file.txt
The awk code below uses gawk's indirect function call extension (@pickFunc()).
The code assigns a function name to every desired output variable,
so a different pattern or other transformation can be applied in each variable's own function.
Doing so avoids tons of if/else-if chains
and is also easier to read and extend.
For the special varC family, the function pick_varC plays a trick:
once varC is determined, its value consists of multiple output fields.
If varC=2, the value of varC is returned as 2 varC1=034SRT varC2=134OVC,
that is, the actual value of varC followed by all of its members.
Note that for (k in keys) visits the keys in an unspecified order, which is why varC appears before var6 in the script output.
gawk '
BEGIN {
keys["var1"] = "pick_var1";
keys["var2"] = "pick_var2";
keys["var3"] = "pick_var3";
keys["var4"] = "pick_var4";
keys["var5"] = "pick_var5";
keys["var6"] = "pick_var6";
keys["varC"] = "pick_varC";
keys["var7"] = "pick_var7";
}
function pick_var1 () {
return $1;
}
function pick_var2 () {
return $2;
}
function pick_var3 () {
return $3;
}
function pick_var4 () {
for (i=1;i<=NF;i++) {
if ($i == "AUTO") {
return "TRUE";
}
}
return "FALSE";
}
function pick_var5 () {
for (i=1;i<=NF;i++) {
if ($i == "CON") {
return "TRUE";
}
}
return "FALSE";
}
function pick_varC () {
for (i=1;i<=NF;i++) {
if (($i=="AUTO" || $i=="CON")) {
break;
}
}
var6_idx = 5;
if ( i!=4 ) {
var6_idx = 4;
}
var7_idx = NF;
for (i=1;i<=NF;i++) {
if ($i~/.*\/.*/) {
var7_idx = i;
}
}
varC = var7_idx - var6_idx - 1;
if ( varC == 0) {
return varC;
}
count = 0;
cFamily = "";
for (i = 1; i<=varC;i++) {
if ($(var6_idx+i)~/[0-9]{3}[A-Z]{3}/) {
cFamily = sprintf("%s varC%d=%s",cFamily,i,$(var6_idx+i));
count++;
}
}
varC = sprintf("%d %s",count,cFamily);
return varC;
}
function pick_var6 () {
for (i=1;i<=NF;i++) {
if (($i=="AUTO" || $i=="CON")) {
break;
}
}
if ( i!=4 ) {
return $4;
} else {
return $5
}
}
function pick_var7 () {
for (i=1;i<=NF;i++) {
if ($i~/.*\/.*/) {
return $i;
}
}
}
{
for (k in keys) {
pickFunc = keys[k];
printf("%s=%s ",k,@pickFunc());
}
printf("\n");
}
' file.txt
test input
RON KKND 1534Z AUTO 253985G 034SRT 134OVC 04/32
RON KKND 5256Z 143623G72K 034OVC 074OVC 134SRT 145PRT 13/00
RON KKND 2234Z CON 342523G CLS 01/M12 RMK
script output
var1=RON var2=KKND var3=1534Z var4=TRUE var5=FALSE varC=2 varC1=034SRT varC2=134OVC var6=253985G var7=04/32
var1=RON var2=KKND var3=5256Z var4=FALSE var5=FALSE varC=4 varC1=034OVC varC2=074OVC varC3=134SRT varC4=145PRT var6=143623G72K var7=13/00
var1=RON var2=KKND var3=2234Z var4=FALSE var5=TRUE varC=0 var6=342523G var7=01/M12
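The pointer idea from the question also works in plain bash once the line is read into an array: the pointer is just an array index, so there is no need to run awk once per field. A minimal sketch under the rules stated in the question (AUTO and CON are optional flags starting at the 4th argument, and the first argument containing / is var7); parse_line is a hypothetical name, and it keeps the varC# fields in an array named varCs instead of separately named variables. Following the question's expected output, only fields shaped like the samples (three digits followed by three capital letters) are counted as varC entries, so CLS is skipped.

```bash
#!/bin/bash
# parse_line LINE (hypothetical): splits one input line using an index
# ("pointer") that advances past the optional fields. Sets the globals
# var1..var7, varC (count) and the array varCs (varC1..varCn).
parse_line() {
    local -a f
    read -r -a f <<< "$1"
    local p=3                                  # pointer: index of the 4th argument
    var1=${f[0]} var2=${f[1]} var3=${f[2]}
    var4=FALSE var5=FALSE
    if [[ ${f[p]} == AUTO ]]; then var4=TRUE; p=$(( p + 1 )); fi
    if [[ ${f[p]} == CON ]];  then var5=TRUE; p=$(( p + 1 )); fi
    var6=${f[p]}; p=$(( p + 1 ))
    varCs=()
    # collect arguments until the first one containing '/' (var7)
    while (( p < ${#f[@]} )) && [[ ${f[p]} != */* ]]; do
        [[ ${f[p]} =~ ^[0-9]{3}[A-Z]{3}$ ]] && varCs+=( "${f[p]}" )
        p=$(( p + 1 ))
    done
    varC=${#varCs[@]}
    var7=${f[p]}                               # arguments after var7 are ignored
}

# Usage:
# while IFS= read -r line; do
#     parse_line "$line"
#     printf 'var4=%s var5=%s var6=%s varC=%s var7=%s\n' "$var4" "$var5" "$var6" "$varC" "$var7"
# done < file.txt
```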
