How to process file content differently for each line using shell script? - bash

I have a file which has this data -
view:
schema1.view1:/some-path/view1.sql
schema2.view2:/some-path/view2.sql
tables:
schema1.table1:/some-path/table1.sql
schema2.table2:/some-path/table2.sql
end:
I have to read the file and store the contents in different variables.
viewData=$(sed '/view/,/tables/!d;/tables/q' $file|sed '$d')
tableData=$(sed '/tables/,/end/!d;/end/q' $file|sed '$d')
echo "$viewData"
view:
schema1.view1:/some-path/view1.sql
schema2.view2:/some-path/view2.sql
echo "$tableData"
tables:
schema1.table1:/some-path/table1.sql
schema2.table2:/some-path/table2.sql
dataArray=("$viewData" "$tableData")
I need to use a for loop over dataArray so that I get all the components in 4 different variables.
Let's say for $viewData, the loop should be able to print like this -
objType=view
schema=schema1
view=view1
fileLoc=some-path/view1.sql
objType=view
schema=schema2
view=view2
fileLoc=some-path/view2.sql
I have tried sed and cut commands but they are not working properly. And I need to do this using shell script only.
Any help will be appreciated. Thanks!

Remark: if you add a space character between the : and the / in the input, then you would be able to use YAML-aware tools for parsing it robustly.
Given your sample input, you can use this awk for generating the expected blocks:
awk '
match($0, /[^[:space:]]+:/) {
    key = substr($0, RSTART, RLENGTH-1)
    val = substr($0, RSTART+RLENGTH)
    if (i = index(key, ".")) {
        print "objType=" type
        print "schema=" substr(key, 1, i-1)
        print "view=" substr(key, i+1)
        print "fileLoc=" val
        printf "%c", 10
    } else
        type = key
}
' data.txt
objType=view
schema=schema1
view=view1
fileLoc=/some-path/view1.sql
objType=view
schema=schema2
view=view2
fileLoc=/some-path/view2.sql
objType=tables
schema=schema1
view=table1
fileLoc=/some-path/table1.sql
objType=tables
schema=schema2
view=table2
fileLoc=/some-path/table2.sql
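If you then need those key=value pairs in shell variables, one way to consume that output is a read loop keyed on the blank line the script prints after each block. A bash sketch (the variable names are just examples; replace the '...' with the awk program above, and feed the loop with process substitution instead of a pipe if the variables must survive past the loop):
awk '...' data.txt |
while IFS='=' read -r key val; do
    case "$key" in
        objType) objType=$val ;;
        schema)  schema=$val ;;
        view)    view=$val ;;
        fileLoc) fileLoc=$val ;;
        "")      printf 'got %s %s.%s -> %s\n' "$objType" "$schema" "$view" "$fileLoc" ;;
    esac
done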

Related

how to replace delimiter value present within quotes as part of data in file

I want to replace a delimiter which is part of the data in each record. For example:
echo '"hi","how,are,you","bye"'|sed -nE 's/"([^,]*),([^,]*),([^,]*)"/"\1;\2;\3"/gp'
output -->
"hi","how;are;you","bye"
So I am able to replace the delimiter (a comma in this case) that appears inside the data with a semicolon.
But the challenge is that, with real data, we are not sure how many times the delimiter will be present, and it may appear in multiple fields as well.
For example:
"1","2,3,4,5","6","7,8"
"1","2,4,5","6","7,8,9"
"1","4,5","6","7,8,9.2"
All these are valid records.
Can anyone help me out here? How can we write generic code to handle this?
When working with anything but the most trivial CSV data, I prefer to use something that understands the format directly instead of messing with regular expressions to try to handle things like quoted fields. For example (warning: blatant self-promotion ahead!), my Tcl-based awk-like utility tawk, which I wrote in part to make it easier to manipulate CSV files:
$ tawk -csv -quoteall '
line {
    for {set n 1} {$n <= $NF} {incr n} {
        set F($n) [string map {, \;} $F($n)]
    }
    print
}' input.csv
"hi","how;are;you","bye"
"1","2;3;4;5","6","7;8"
"1","2;4;5","6","7;8;9"
"1","4;5","6","7;8;9.2"
Or a perl approach using the Text::CSV_XS module:
$ perl -MText::CSV_XS -e '
    my $csv = Text::CSV_XS->new({binary=>1, always_quote=>1});
    while (my $row = $csv->getline(\*STDIN)) {
        tr/,/;/ foreach @$row;
        $csv->say(\*STDOUT, $row);
    }' < input.csv
"hi","how;are;you","bye"
"1","2;3;4;5","6","7;8"
"1","2;4;5","6","7;8;9"
"1","4;5","6","7;8;9.2"
Assuming the data does not contain embedded double quotes ...
Sample data:
$ cat delim.dat
"hi","how,are,you","bye"
"1","2,3,4,5","6","7,8"
"1","2,4,5","6","7,8,9"
"1","4,5","6","7,8,9.2"
One awk idea whereby we replace , with ; in the even-numbered fields:
awk '
BEGIN { FS=OFS="\"" }
{ for (i=2;i<=NF;i=i+2) gsub(",",";",$i) }
1
' delim.dat
This generates:
"hi","how;are;you","bye"
"1","2;3;4;5","6","7;8"
"1","2;4;5","6","7;8;9"
"1","4;5","6","7;8;9.2"

Assign the value of awk-for loop variable to a bash variable

Content within the tempfile:
123 sam moore IT_Team
235 Rob Xavir Management
What I'm trying to do is get input from the user, search for it in the tempfile, and have the search output give the column number.
The code I have for that:
#!/bin/bash
set -x;
read -p "Enter :" sword6;
awk 'BEGIN{IGNORECASE = 1 }
{
for(i=1;i<=NF;i++) {
if( $i ~ "'$sword6'$" )
print i;
}
} ' /root/scripts/pscripts/tempprint.txt;
This gives exactly the column number.
Output
Enter : sam
2
What I need is for the value of the i variable to be assigned to a bash variable so I can use it as needed in the script.
Any help with this is highly appreciated.
I searched for an existing answer but was not able to find any; if there is one, please let me know.
First of all, you should pass your shell variable to awk in this way (e.g. sword6):
awk -v word="$sword6" '{.. if($i ~ word)...}' ...
To assign a shell variable from the output of another command:
shellVar=$(awk '......')
Then you can continue using $shellVar in your script.
Regarding your awk code:
if the user inputs some special characters, your script may fail, e.g. .*
if one column matches the user input multiple times, you may have duplicated output.
if your file has multiple columns matching the user input, you may want to handle that.
You just need to capture the output of awk. As an aside, I would pass sword6 as an awk variable, not inject it via string interpolation.
i=$(awk -v w="$sword6" '
BEGIN { IGNORECASE = 1 }
{ for (i=1; i<=NF; i++) {
      if ($i ~ w"$") { print i; }
  }
}' /root/scripts/pscripts/tempprint.txt)
The following script may help you with this too:
cat script.ksh
echo "Please enter the user name:"
read var
awk -v val="$var" '{for(i=1;i<=NF;i++){if(tolower($i)==tolower(val)){print i,$i}}}' Input_file
If tempprint.txt is big:
awk -v w="$sword6" '
BEGIN { IGNORECASE = 1 }
$0 ~ ("\\<" w "\\>") {
    for(i=1; i<=NF; i++)
        if($i==w) print i
}' tempprint.txt
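Once the column number has been captured, it can be reused later in the script. A small sketch building on the capture above (gawk assumed for IGNORECASE; the exit keeps only the first match):
col=$(awk -v w="$sword6" '
BEGIN { IGNORECASE = 1 }
{ for (i=1; i<=NF; i++) if ($i ~ w"$") { print i; exit } }
' /root/scripts/pscripts/tempprint.txt)

if [ -n "$col" ]; then
    # e.g. print that column for every line of the file
    awk -v c="$col" '{ print $c }' /root/scripts/pscripts/tempprint.txt
fi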

Adding file information to an AWK comparison

I'm using awk to perform a file comparison against a file listing in found.txt
while read line; do
awk 'FNR==NR{a[$1]++;next}$1 in a' $line compare.txt >> $CHECKFILE
done < found.txt
found.txt contains full path information to a number of files that may contain the data. While I am able to determine that data exists in both files and output that data to $CHECKFILE, I wanted to be able to put the line from found.txt (the filename) where the line was found.
In other words, I want to end up with something like:
File "/xxxx/yyy/zzz/data.txt" contains the following lines in found.txt $line
I'm just not sure how to get the /xxxx/yyy/zzz/data.txt information into the stream.
Appended for clarification:
The file found.txt contains the full path information to several files on the system
/path/to/data/directory1/file.txt
/path/to/data/directory2/file2.txt
/path/to/data/directory3/file3.txt
each of the files has a list of parameters that need to be checked for existence before appending additional information to them later in the script.
so for example, file.txt contains the following fields
parameter1 = true
parameter2 = false
...
parameter35 = true
the compare.txt file contains a number of parameters as well.
So if parameter35 (or any other parameter) shows up in one of the three files, I get its output dropped to the check file.
Both of the scripts (yours and the one I posted) will give me that output, but I would also like to echo the line that is being read at that point in the loop. It sounds like I should just be able to somehow pipe it in, but my awk expertise is limited.
It's not really clear what you want but try this (no shell loop required):
awk '
ARGIND==1 { ARGV[ARGC] = $0; ARGC++; next }
ARGIND==2 { keys[$1]; next }
$1 in keys { print FILENAME, $1 }
' found.txt compare.txt > "$CHECKFILE"
ARGIND is gawk-specific; if you don't have it, add FNR==1{ARGIND++}.
Pass the name into awk inside a variable like this:
awk -v file="$line" '{... print "File: " file }'
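Plugged into the loop from the question, that looks roughly like this (a sketch; the file and variable names are taken from the question):
while read -r line; do
    awk -v file="$line" '
        FNR==NR { a[$1]; next }
        $1 in a { print "File " file " contains: " $1 }
    ' "$line" compare.txt >> "$CHECKFILE"
done < found.txt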

to align rows fetched from shell script output

The output from the shell script is like:
343434345,5454645645645,ACTIVE,2011-05-25 14:34;refid=134053
90092;pep.state=ACTIVATED
343434345,5454645645645,ACTIVE,2011-05-25 14:34;refid=134053
90092;pep.state=ACTIVATED
And it gets pasted into EditPlus in the same manner.
But I want my output to be on one complete line instead of two lines. Like:
343434345,5454645645645,ACTIVE,2011-05-25 14:34;refid=13405390092;pep.state=ACTIVATED
343434345,5454645645645,ACTIVE,2011-05-25 14:34;refid=13405390092;pep.state=ACTIVATED
P.S. The data is fetched from a database.
How can that be done? Please advise!
(g)awk to the rescue:
awk 'NR % 2 == 1 { saved_line=$0 ; next } { print saved_line $0 }' INPUTFILE
will do.
It saves every odd line to a variable, then prints it together with the next line. Note: it can be done in more than one way, e.g. this does the same:
awk '{printf("%s",$0) ; getline ; printf("%s\n",$0)}' INPUTFILE
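Another common idiom for joining every pair of lines uses sed: N appends the next line to the pattern space and the substitution deletes the embedded newline (assuming the records really do come in strict pairs):
sed 'N;s/\n//' INPUTFILE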
HTH

unix ksh retrieve oracle query result

I'm working on a small piece of ksh code for a simple task.
I need to retrieve about 14 million lines from a table and then generate an XML file using this information. I don't do any processing on the information, only some "IF"s.
The problem is that writing the file takes about 30 minutes, and that is not acceptable for me.
This is a piece of the code:
......
query="select field1||','||field2 from table1"
ctl_data=`sqlplus -L -s $ORA_CONNECT @$REQUEST`
for variable in ${ctl_data}
do
var1=$(echo ${variable} | awk -F, '{ print $1 }')
var2=$(echo ${variable} | awk -F, '{ print $2 }')
....... write into the file ......
done
To speed things up I write only 30 lines at a time into the file (more data on each line), so I only access the file 30 times.
It is still slow, so it is not the writing but the looping through the results.
Does anyone have an idea about how to improve it?
Rather than passing from Oracle to ksh, could you do it all in Oracle?
You can use the following to format your output as xml.
select xmlgen.getxml('select field1,field2 from table1') from dual;
You may be able to eliminate the calls to awk:
saveIFS="$IFS"
IFS=,
array=($variable)
IFS="$saveIFS"
var1=${array[0]} # or just use the array's elements in place of var1 and var2
var2=${array[1]}
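In bash or ksh93, a here-string makes the same split a one-liner (a sketch, assuming exactly two comma-separated fields per row):
IFS=, read -r var1 var2 <<< "$variable"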
You can lessen the number of calls to awk by using just one instance, e.g.
query="select codtit||','||crsspt from table1"
.....
sqlplus -L -s $ORA_CONNECT @$REQUEST | awk -F"," 'BEGIN{
    print "xml headers here..."
}
{
    # generate xml here..
    print "<tag1>variable 1 is "$1"</tag1>"
    print "<tag2>variable 2 is "$2" and so on..</tag2>"
    if ( some condition here is true ){
        print "do something here"
    }
}'
Redirect the above to a new file as necessary using > or >>.
I doubt that this is the most efficient way of dumping data to an XML file. You could try Groovy for such a task. Take a look at the Groovy cookbook at http://groovy.codehaus.org/Convert+SQL+Result+To+XML
