How can I create and process an array of variables in shell script? - shell

I have a shell script to read the data from a YAML file and then do some processing. This is how the YAML file is -
view:
schema1.view1:/some-path/view1.sql
schema2.view2:/some-path/view2.sql
tables:
schema1.table1:/some-path/table1.sql
schema2.table2:/some-path/table2.sql
end
I want the output as -
schema:schema1
object:view1
fileloc:/some-path/view1.sql
schema:schema2
object:view2
fileloc:/some-path/view2.sql
schema:schema1
object:table1
fileloc:/some-path/table1.sql
schema:schema2
object:table2
fileloc:/some-path/table2.sql
This is how I'm reading the YAML file using the shell script -
#!/bin/bash
input=./file.yaml
viewData=$(sed '/view/,/tables/!d;/tables/q' $file|sed '1d;$d')
tableData=$(sed '/tables/,/end/!d;/end/q' $file|sed '1d;$d')
so viewData will have this data -
schema1.view1:/some-path/view1.sql
schema2.view2:/some-path/view2.sql
and tableData will have this data -
schema1.table1:/some-path/table1.sql
schema2.table2:/some-path/table2.sql
And then I'm using a for loop to separate the schema, object and SQL file -
for line in $tableData; do
field=`echo $line | cut -d: -f1`
schema=`echo $field | cut -d. -f1`
object=`echo $field | cut -d. -f2`
fileLoc=`echo $line | cut -d: -f2`
echo "schema=$schema"
echo "object=$object"
echo "fileloc=$fileLoc"
done
But I'll have to do the same thing again for the view. Is there any way in shell script like using an array or something else so that I can use the same loop to get data for both view and tables.
Any help would be appreciated. Thanks!

Using (g)awk:
awk -F "[:.]" '/:$/{ s=$1 }{ gsub(" ",""); if($3!=""){ print "schema="$1; print "object="$2; print "fileloc="$3 }}' yaml
-F "[:.]" reads input, and separates this on : or . (But using the regular expression [:.].)
/:$/{ s=$1 } This will store the group (view or tables) you are currently reading. This is not used anymore, so can be ignored.
gsub(" ",""); This will delete all spaced in the input line.
if... When you have three fields, checked by a not empty third field, print the info.
output:
schema=schema1
object=view1
fileloc=/some-path/view1
schema=schema2
object=view2
fileloc=/some-path/view2
schema=schema1
object=table1
fileloc=/some-path/table1
schema=schema2
object=table2
fileloc=/some-path/table2
EDIT: Adding the objectType to the output:
awk -F "[:.]" '/:$/{ s=$1 }{ gsub(" ",""); if($3!=""){ print "objectType="$s; "schema="$1; print "object="$2; print "fileloc="$3 }}' yaml
But I do see that I made a mistake.... 🤔😉
I would have expected the regular expression /:$/ to find a line that end with a :, but for some reason it does not. (I will have to do some more research to look into that)
It should be, for a working work-around:
awk -F "[:.]" 'NF==2{ s=$1 }NF>2{ gsub(" ",""); if($3!=""){ print "objectType="s; "sch
ema="$1; print "object="$2; print "fileloc="$3 }}' yaml
The line with view: has two field, which make NF return the value 2, and view is stored in the variable s.
When we have more than two fields, the contents of the variables is printed.

Related

How to print matching all names given as a argument?

I want to write a script for any name given as an argument and prints the list of paths
to home directories of people with the name.
I am new at scripts. Is there any simple way to do this with awk or egrep command?
Example:
$ show names jakub anna (as an argument)
/home/users/jakubo
/home/students/j_luczka
/home/students/kubeusz
/home/students/jakub5z
/home/students/qwertinx
/home/users/lazinska
/home/students/annalaz
Here is the my friend's code but I have to write it from a different way and it has to be simple like this code
#!/bin/bash
for name in $#
do
awk -v n="$name" -F ':' 'BEGIN{IGNORECASE=1};$5~n{print $6}' /etc/passwd | while read line
do
echo $line
done
done
Possible to use a simple awk script to look for matching names.
The list of names can be passed as a space separated list to awk, which will construct (in the BEGIN section) a combined pattern (e.g. '(names|jakub|anna)'). The pattern is used for testing the user name column ($5) of the password file.
#! /bin/sh
awk -v "L=$*" -F: '
BEGIN {
name_pat = "(" gensub(" ", "|", "g", L) ")"
}
$5 ~ name_pat { print $6 }
' /etc/passwd
Since at present the question as a whole is unclear, this is more of a long comment, and only a partial answer.
There is one easy simplification, since the sample code includes:
... | while read line
do
echo $line
done
All of the code shown above after and including the | is needless, and does nothing, (like a UUoC), and should therefore be removed. (Actually echo $line with an unquoted $line would remove formatting and repeated spaces, but that's not relevant to the task at hand, so we can say the code above does nothing.)

Unix bash - using cut to regex lines in a file, match regex result with another similar line

I have a text file: file.txt, with several thousand lines. It contains a lot of junk lines which I am not interested in, so I use the cut command to regex for the lines I am interested in first. For each entry I am interested in, it will be listed twice in the text file: Once in a "definition" section, another in a "value" section. I want to retrieve the first value from the "definition" section, and then for each entry found there find it's corresponding "value" section entry.
The first entry starts with ' gl_ ', while the 2nd entry would look like ' "gl_ ', starting with a '"'.
This is the code I have so far for looping through the text document, which then retrieves the values I am interested in and appends them to a .csv file:
while read -r line
do
if [[ $line == gl_* ]] ; then (param=$(cut -d'\' -f 1 $line) | def=$(cut -d'\' -f 2 $line) | type=$(cut -d'\' -f 4 $line) | prompt=$(cut -d'\' -f 8 $line))
while read -r glline
do
if [[ $glline == '"'$param* ]] ; then val=$(cut -d'\' -f 3 $glline) |
"$project";"$param";"$val";"$def";"$type";"$prompt" >> /filepath/file.csv
done < file.txt
done < file.txt
This seems to throw some syntax errors related to unexpected tokens near the first 'done' statement.
Example of text that needs to be parsed, and paired:
gl_one\User Defined\1\String\1\\1\Some Text
gl_two\User Defined\1\String\1\\1\Some Text also
gl_three\User Defined\1\Time\1\\1\Datetime now
some\junk
"gl_one\1\Value1
some\junk
"gl_two\1\Value2
"gl_three\1\Value3
So effectively, the while loop reads each line until it hits the first line that starts with 'gl_', which then stores that value (ie. gl_one) as a variable 'param'.
It then starts the nested while loop that looks for the line that starts with a ' " ' in front of the gl_, and is equivalent to the 'param' value. In other words, the
script should couple the lines gl_one and "gl_one, gl_two and "gl_two, gl_three and "gl_three.
The text file is large, and these are settings that have been defined this way. I need to collect the values for each gl_ parameter, to save them together in a .csv file with their corresponding "gl_ values.
Wanted regex output stored in variables would be something like this:
first while loop:
$param = gl_one, $def = User Defined, $type = String, $prompt = Some Text
second while loop:
$val = Value1
Then it stores these variables to the file.csv, with semi-colon separators.
Currently, I have an error for the first 'done' statement, which seems to indicate an issue with the quotation marks. Apart from this,
I am looking for general ideas and comments to the script. I.e, not entirely sure I am looking for the quotation mark parameters "gl_ correctly, or if the
semi-colons as .csv separators are added correctly.
Edit: Overall, the script runs now, but extremely slow due to the inner while loop. Is there any faster way to match the two lines together and add them to the .csv file?
Any ideas and comments?
This will generate a file containing the data you want:
cat file.txt | grep gl_ | sed -E "s/\"//" | sort | sed '$!N;s/\n/\\/' | awk -F'\' '{print $1"; "$5"; "$7"; "$NF}' > /filepath/file.csv
It uses grep to extract all lines containing 'gl_'
then sed to remove the leading '"' from the lines that contain one [I have assumed there are no further '"' in the line]
The lines are sorted
sed removes the return from each pair of lines
awk then prints
the required columns according to your requirements
Output routed to the file.
LANG=C sort -t\\ -sd -k1,1 <file.txt |\
sed '
/^gl_/{ # if definition
N; # append next line to buffer
s/\n"gl_[^\\]*//; # if value, strip first column
t; # and start next loop
}
D; # otherwise, delete the line
' |\
awk -F\\ -v p="$project" -v OFS=\; '{print p,$1,$10,$2,$4,$8 }' \
>>/filepath/file.csv
sort lines so gl_... appears immediately before "gl_... (LANG fixes LC_TYPE) - assumes definition appears before value
sed to help ensure matching definition and value (may still fail if duplicate/missing value), and tidy for awk
awk to pull out relevant fields

Retrieve oracle sid from oratab file

I'm creating this ksh shell script to compare Oracle homes from two database name which user inputs.
I tried using cat and also sed from various threads but somehow not able to put oracle home value into variable to compare them.
Oratab:
db1:/oracle/app/oracle/product/11.2.0.3:Y
db2:/oracle/app/oracle/product/11.2.0.3:N
#db3:/oracle/app/oracle/product/11.2.0.4:Y
Runtime:
./compare_db db1 db2
#!/bin/ksh
sid1=$1;
sid2=$2;
file=/etc/oratab
function compare {
home1= sed -n "s#${sid1}.*/\(.*\)${sid1}.*#\1#p" $file
home2= sed -n "s#${sid2}.*/\(.*\)${sid2}.*#\1#p" $file
if $home1 = $home2; then
echo "Success"
else
echo "Failure"
fi
}
Output: (I don't want to include last part "N/Y" after the : (colon))
home1 = /oracle/app/oracle/product/11.2.0.3
home2 = /oracle/app/oracle/product/11.2.0.3
db1 = db2
success
Obviously above is not working and only test code, does somebody comment and what's missing or how it can be done in elegant way?
Thanks,
awk works:
awk -F: "/^${mysid}/{printf \"%s\n\",\$2}" /etc/oratab
You can do it elegantly with this combination of programs :
home1=$(grep $sid1 $file | cut -d":" -f2) --> output /oracle/app/oracle/product/11.2.0.3
home2=$(grep $sid2 $file | cut -d":" -f2) --> output /oracle/app/oracle/product/11.2.0.3
grep --> finds the line in which the specified SID is
cut --> cuts the second field (specified by -f2), fields are delimited with the specified character -d":"

Want to sort a file based on another file in unix shell

I have 2 files refer.txt and parse.txt
refer.txt contains the following
julie,remo,rob,whitney,james
parse.txt contains
remo/hello/1.0,remo/hello2/2.0,remo/hello3/3.0,whitney/hello/1.0,julie/hello/2.0,julie/hello/3.0,rob/hello/4.0,james/hello/6.0
Now my output.txt should list the files in parse.txt based on the order specified in refer.txt
ex of output.txt should be:
julie/hello/2.0,julie/hello/3.0,remo/hello/1.0,remo/hello2/2.0,remo/hello3/3.0,rob/hello/4.0,whitney/hello/1.0,james/hello/6.0
i have tried the following code:
sort -nru refer.txt parse.txt
but no luck.
please assist me.TIA
You can do that using gnu-awk:
awk -F/ -v RS=',|\n' 'FNR==NR{a[$1] = (a[$1])? a[$1] "," $0 : $0 ; next}
{s = (s)? s "," a[$1] : a[$1]} END{print s}' parse.txt refer.txt
Output:
julie/hello/2.0,julie/hello/3.0,remo/hello/1.0,remo/hello2/2.0,remo/hello3/3.0,rob/hello/4.0,whitney/hello/1.0,james/hello/6.0
Explanation:
-F/ # Use field separator as /
-v RS=',|\n' # Use record separator as comma or newline
NR == FNR { # While processing parse.txt
a[$1]=(a[$1])?a[$1] ","$0:$0 # create an array with 1st field as key and value as all the
# records with keys julie, remo, rob etc.
}
{ # while processing the second file refer.txt
s = (s)?s "," a[$1]:a[$1] # aggregate all values by reading key from 2nd file
}
END {print s } # print all the values
In pure native bash (4.x):
# read each file into an array
IFS=, read -r -a values <parse.txt
IFS=, read -r -a ordering <refer.txt
# create a map from content before "/" to comma-separated full values in preserved order
declare -A kv=( )
for value in "${values[#]}"; do
key=${value%%/*}
if [[ ${kv[$key]} ]]; then
kv[$key]+=",$value" # already exists, comma-separate
else
kv[$key]="$value"
fi
done
# go through refer list, putting full value into "out" array for each entry
out=( )
for value in "${ordering[#]}"; do
out+=( "${kv[$value]}" )
done
# print "out" array in comma-separated form
IFS=,
printf '%s\n' "${out[*]}" >output.txt
If you're getting more output fields than you have input fields, you're probably trying to run this with bash 3.x. Since associative array support is mandatory for correct operation, this won't work.
tr , "\n" refer.txt | cat -n >person_id.txt # 'cut -n' not posix, use sed and paste
cat person_id.txt | while read person_id person_key
do
print "$person_id" > $person_key
done
tr , "\n" parse.txt | sed 's/(^[^\/]*)(\/.*)$/\1 \1\2/' >person_data.txt
cat person_data.txt | while read foreign_key person_data
do
person_id="$(<$foreign_key)"
print "$person_id" " " "$person_data" >>merge.txt
done
sort merge.txt >output.txt
A text book data processing approach, a person id table, a person data table, merged on a common key field, which is the first name of the person:
[person_key] [person_id]
- person id table, a unique sortable 'id' for each person (line number in this instance, since that is the desired sort order), and key for each person (their first name)
[person_key] [person_data]
- person data table, the data for each person indexed by 'person_key'
[person_id] [person_data]
- a merge of the 'person_id' table and 'person_data' table on 'person_key', which can then be sorted on person_id, giving the output as requested
The trick is to implement an associative array using files, the file name being the key (in this instance 'person_key'), the content being the value. [Essentially a random access file implemented using the filesystem.]
This actually adds a step to the otherwise simple but not very efficient task of grepping parse.txt with each value in refer.txt - which is more efficient I'm not sure.
NB: The above code is very unlikely to work out of the box.
NBB: On reflection, probably a better way of doing this would be to use the file system to create a random access file of parse.txt (essentially an index), and to then consider refer.txt as a batch file, submitting it as a job as such, printing out from the parse.txt random access file the data for each of the names read in from refer.txt in turn:
# 1) index data file on required field
cat person_data.txt | while read data
do
key="$(print "$data" | sed 's/(^[^\/]*)/\1/')" # alt. `cut -d'/' -f1` ??
print "$data" >>./person_data/"$key"
done
# 2) run batch job
cat refer_data.txt | while read key
do
print ./person_data/"$key"
done
However having said that, using egrep is probably just as rigorous a solution or at least for small datasets, I would most certainly use this approach given the specific question posed. (Or maybe not! The above could well prove faster as well as being more robust.)
Command
while read line; do
grep -w "^$line" <(tr , "\n" < parse.txt)
done < <(tr , "\n" < refer.txt) | paste -s -d , -
Key points
For both files, newlines are translated to commas using the tr command (without actually changing the files themselves). This is useful because while read and grep work under the assumption that your records are separated by newlines instead of commas.
while read will read in every name from refer.txt, (i.e julie, remo, etc.) and then use grep to retrieve lines from parse.txt containing that name.
The ^ in the regex ensures matching is only performed from the start of the string and not in the middle (thanks to #CharlesDuffy's comment below), and the -w option for grep allows whole-word matching only. For example, this ensures that "rob" only matches "rob/..." and not "robby/..." or "throb/...".
The paste command at the end will comma-separate the results. Removing this command will print each result on its own line.

appending text to specific line in file bash

So I have a file that contains some lines of text separated by ','. I want to create a script that counts how much parts a line has and if the line contains 16 parts i want to add a new one. So far its working great. The only thing that is not working is appending the ',' at the end. See my example below:
Original file:
a,a,a,a,a,a,a,a,a,a,a,a,a,a,a,a
b,b,b,b,b,b
a,a,a,a,a,a,a,a,a,a,a,a,a,a,a,a,a
b,b,b,b,b,b
a,a,a,a,a,a,a,a,a,a,a,a,a,a,a,a
Expected result:
a,a,a,a,a,a,a,a,a,a,a,a,a,a,a,a,xx
b,b,b,b,b,b
a,a,a,a,a,a,a,a,a,a,a,a,a,a,a,a,a
b,b,b,b,b,b
a,a,a,a,a,a,a,a,a,a,a,a,a,a,a,a,xx
This is my code:
while read p; do
if [[ $p == "HEA"* ]]
then
IFS=',' read -ra ADDR <<< "$p"
echo ${#ADDR[#]}
arrayCount=${#ADDR[#]}
if [ "${arrayCount}" -eq 16 ];
then
sed -i "/$p/ s/\$/,xx/g" $f
fi
fi
done <$f
Result:
a,a,a,a,a,a,a,a,a,a,a,a,a,a,a,a
,xx
b,b,b,b,b,b
a,a,a,a,a,a,a,a,a,a,a,a,a,a,a,a,a
b,b,b,b,b,b
a,a,a,a,a,a,a,a,a,a,a,a,a,a,a,a
,xx
What im doing wrong? I'm sure its something small but i cant find it..
It can be done using awk:
awk -F, 'NF==16{$0 = $0 FS "xx"} 1' file
a,a,a,a,a,a,a,a,a,a,a,a,a,a,a,a,xx
b,b,b,b,b,b
a,a,a,a,a,a,a,a,a,a,a,a,a,a,a,a,a
b,b,b,b,b,b
a,a,a,a,a,a,a,a,a,a,a,a,a,a,a,a,xx
-F, sets input field separator as comma
NF==16 is the condition that says execute block inside { and } if # of fields is 16
$0 = $0 FS "xx" appends xx at end of line
1 is the default awk action that means print the output
For using sed answer should be in the following:
Use ${line_number} s/..../..../ format - to target a specific line, you need to find out the line number first.
Use the special char & to denote the matched string
The sed statement should look like the following:
sed -i "${line_number}s/.*/&xx/"
I would prefer to leave it to you to play around with it but if you would prefer i can give you a full working sample.

Resources