How to manipulate rows in a file using a bash script? - bash

I have a file which has records from a table.
It has this format. Each column is separated by tabs
UUID number name
ac500f63-630d-4048-90cf-74bc85c1101c 1 Kane
47493ed9-008b-4dd6-88dc-d91fa64225b3 3 NULL
What I want to do is: the columns need to be comma separated, and the UUID and name columns need to be wrapped in single quotes, except that the name should not be enclosed in single quotes when it is NULL. Every row needs to be comma separated as well.
the sample output for the above is
'ac500f63-630d-4048-90cf-74bc85c1101c', 1, 'Kane'
'47493ed9-008b-4dd6-88dc-d91fa64225b3', 3, NULL
I will need these values for an INSERT query. Is there a way to achieve this with sed or awk commands?

cat l.txt
UUID number name
ac500f63-630d-4048-90cf-74bc85c1101c 1 Kane
47493ed9-008b-4dd6-88dc-d91fa64225b3 3 NULL
cat p.sh
#!/bin/bash
awk '
# skip the header; 39 is the ASCII code for a single quote and 32 for a space,
# so the name is quoted only when it is not NULL
NF == 3 && NR >= 2 { m=$3=="NULL"?32:39;printf("%c%s%c,%s, %c%s%c\n",39,$1,39,$2,m,$3,m);}
' "$1"
./p.sh l.txt
'ac500f63-630d-4048-90cf-74bc85c1101c',1,'Kane'
'47493ed9-008b-4dd6-88dc-d91fa64225b3',3, NULL
I hope this works for you.
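For reference, here is a minimal alternative sketch of the same idea with the quote character passed in as an awk variable; it assumes the columns really are tab separated, and it leaves NULL unquoted by testing the third field:

awk -F'\t' -v q="'" 'NR > 1 {
    name = ($3 == "NULL") ? $3 : q $3 q   # quote the name only when it is not NULL
    printf "%s%s%s, %s, %s\n", q, $1, q, $2, name
}' l.txt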

Related

Replace string with pattern keeping the string intact

I would like to replace the string with either sed or awk, where it identifies a pattern as described below:
Example: it looks for a word starting with "XX" and ending with "XX", and replaces the word by prepending "${hf:" before the leading "XX" and appending "}" after the trailing "XX".
INPUT
CREATE TABLE XX_DB_XX.test_XX_YYYYMMDD_XX
AS
SELECT id
FROM XX_R_DB_XX.usr_XX_YYYYMMDD_XX
WHERE year = XX_YYYY_XX
AND month = XX_MM_XX
AND day = XX_DD_XX;
OUTPUT
CREATE TABLE ${hf:XX_DB_XX}.test_${hf:XX_YYYYMMDD_XX}
AS
SELECT id
FROM ${hf:XX_R_DB_XX}.usr_${hf:XX_YYYYMMDD_XX}
WHERE year = ${hf:XX_YYYY_XX}
AND month = ${hf:XX_MM_XX}
AND day = ${hf:XX_DD_XX};
I tried to replace the matching pattern, but the issue is that in the output I want to replace the $A with the corresponding "XX_(*)_XX" string found in the input file.
cat test.hql | gawk '{ print gensub(/XX_+[A-Z,_]+_XX/, "${hiveconf:$A}", 1) }' | gawk '{ print gensub(/XX_+[A-Z]+_XX/, "${hiveconf:$A}", 1) }'
OUTPUT -> what I received; it needs to be updated with the actual matched string, so how can this be done:
CREATE TABLE ${hiveconf:$A}.test_${hiveconf:$A}
AS
SELECT id
FROM ${hiveconf:$A}.usr_${hiveconf:$A}
WHERE year = ${hiveconf:$A}
AND month = ${hiveconf:$A}
AND day = ${hiveconf:$A};
The following awk may help you with the same.
awk '{gsub(/XX_[a-zA-Z_]+_XX/,"${hf:&}")} 1' Input_file
That's what sed exists for,
sed 's/XX[[:alnum:]_]*XX/${hf:&}/g' file
[[:alnum:]_] stands for an alphanumeric character or an underscore. The appended * means zero or more occurrences of it in the regular expression.
Or you could do
sed 's/\(XX[^X]*XX\)/${hf:\1}/g' file
in cases where there may be non alphanumeric characters as well in between the XXs.
First an XX is matched, after which it waits until another XX is found.
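For completeness, if you want to stay with the gensub call from the original attempt, a minimal gawk sketch is to put the whole match back with & instead of the literal $A (the character class below is an assumption that covers the sample tokens):

gawk '{ print gensub(/XX_[[:alnum:]_]*_XX/, "${hf:&}", "g") }' test.hql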

Parsing a space delimited file and performing operations in bash

I am trying to read in a basic, space delimited file in bash and I want to perform operations on the variables.
What is the nomenclature for referencing certain "columns" in bash?
I am trying to explicitly use bash. If there is useful documentation that specifically covers how to delimit files and perform basic operations, that would be very useful.
An example of the text document I have would be as follows:
123456789 LastName FirstName 1 2 3
123456789 LastName FirstName 1 2 3
123456789 LastName FirstName 1 2 3
123456789 LastName FirstName 1 2 3
123456789 LastName FirstName 1 2 3
I would like to sort it and perform operations on multiple columns.
I have done this using awk, but I would like to do this in bash.
My awk implementation:
awk '{average = ($2 + $3 + $4)/3} {print (average, "["$1"]", $2",", $3); average = 0}' $'readme.txt'
How might this be achieved?
You'll want sort and cut for sorting and splitting respectively.
cut --delimiter=' ' --fields=LIST, where LIST is a comma-separated list of column indexes, returns the sections of each space-split line denoted by the indexes in LIST.
sort --field-separator=' ' --key=POS sorts the lines of your file and outputs them to stdout. --field-separator=' ' causes the fields to be delimited by spaces.
POS is F[.C][OPTS], where F is the field number and C the character position in the field; both are origin 1. If neither -t nor -b is in effect, characters in a field are counted from the beginning of the preceding whitespace. OPTS is one or more single-letter ordering options, which override global ordering options for that key. If no key is given, use the entire line as the key.
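For example, to pull the ID plus the three numeric columns and sort numerically on the first of them (readme.txt is the file name used in the question; GNU cut and sort are assumed):

cut --delimiter=' ' --fields=1,4,5,6 readme.txt | sort --field-separator=' ' --key=2,2n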
You can use expr and bc for the math.
wc -l will give you the count of total lines.
If you have headers which need ignoring, use tail -n +2 to get the whole file starting on the second line.
String everything together with pipes and subshells. In general, though, this sort of processing is exactly why awk has a place.
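If you do want to stay in plain bash, a minimal sketch along those lines might look like this; it assumes the six whitespace-separated columns shown in the sample, the readme.txt file name, and that the three numeric columns are what should be averaged (integer arithmetic):

#!/bin/bash
# Read each space-delimited line into named variables and average the numeric columns.
while read -r id last first n1 n2 n3; do
    avg=$(( (n1 + n2 + n3) / 3 ))          # integer average; pipe to bc for decimals
    printf '%s [%s] %s, %s\n' "$avg" "$id" "$last" "$first"
done < readme.txt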

bash csv file column extraction and deduplication

I have a .csv file I am working with and I need to output another csv file that contains a de-duplicated list of columns 2 and 6 from the first csv, with some caveats.
This is a bit difficult to explain in words but here is an example of what my input is:
"customer_name","cid”,”boolean_status”,”type”,”number”
“conotoso, inc.”,”123456”,”TRUE”,”Inline”,”210”
"conotoso, inc.","123456”,”FALSE”,”Inline”,”411"
“afakename”,”654321”,”TRUE","Inline”,”253”
“bfakename”,”909090”,”FALSE”,”Inline”,”321”
“cfakename”,”121212”,”TRUE","Inline","145”
What I need this to do is create a new .csv file containing only the "customer_name" column and the "boolean_status" column.
I also need there to be only one line per "customer_name", showing "TRUE" if ANY row for that customer_name has a "TRUE" value in the boolean column.
The output from the above input should be this:
"customer_name",”boolean_status”
“conotoso, inc.”,”TRUE”
“afakename”,”TRUE"
“cfakename”,”TRUE"
So far I tried
awk -F "\"*\",\"*\"" '{print $1","$6}' data1.csv >data1out.csv
to give me the output file, but then I attempted to cat data1out.csv | grep 'TRUE' with no good luck
Can someone help me out with what I should do to manipulate this properly?
I'm also running into issues with the awk printing out the leading commas
All I really need at the end is the answer to "how many unique 'customer_names' have at least one 'TRUE' in the 'boolean' column?"
You will get your de-duplicated file by using
sort -u -t, -k2,2 -k6,6 filename > sortedfile
After this, you can write a script to extract the required columns.
while read -r line
do
    # keep only rows that contain TRUE
    if echo "$line" | grep -q "TRUE"
    then
        # pull customer_name and boolean_status (fields 1 and 3)
        a=$(echo "$line" | cut -d',' -f1,3)
        echo "$a" >> outputfile
    fi
done < sortedfile
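If all you really need is the final count, a minimal awk sketch can do it in one pass; it assumes the quotes in the real file are plain ASCII double quotes (so "," works as the field separator) and that customer_name and boolean_status are fields 1 and 3:

awk -F'","' '$3 == "TRUE" && !seen[$1]++ { count++ } END { print count+0 }' data1.csv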

Bash Find Null values of all variables after equal sign in a file

I have a configuration file (conf.file) with a list of variables and their values, generated from a shell script.
cat conf.file
export ORA_HOME=/u01/app/12.1.0
export ORA_SID=test1
export ORA_LOC=
export TW_WALL=
export TE_STAT=YES
I want to find any variable that has a null value after the equals (=) symbol and, if so, report the message "Configuration file has the following list of null variables".
You can use awk for this:
awk -F"[= ]" '$3=="" && NF==3 {print $2}' conf.file
That will split each record by a space or an equal sign, then test the third field in each row. If it's empty, it will print the second field (the variable).
UPDATE: Added in a test for Number of Fields (NF) equal to 3 to avoid null rows.
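Run against the conf.file shown in the question, that prints the variables with empty values:

awk -F"[= ]" '$3=="" && NF==3 {print $2}' conf.file
ORA_LOC
TW_WALL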
try:
awk -F"=" '$2' Input_file
Since you need the field after = to be non-empty, = is made the field separator and the 2nd field is checked for being non-empty; no action is defined in my code, so the default print action happens for any line that satisfies this condition. Let me know if this helps.
EDIT: The above gives only those variables whose values are not null after =; thanks to JNevill for letting me know that the requirement is exactly the opposite. The following may help with that:
awk -F"=" '!$2{gsub(/.* |=/,"",$1);print $1}' Input_file

Analyze a control table by Shell Script

A shell script analyses a control table to get the right parameters for its processing.
Currently, it is simple: grep points to the correct line, and awk '{print $n}' determines the right columns.
Columns are separated by space only. No special rules, just values separated by space.
All is fine and working, the users like it.
As long as none of the columns is left empty, that is. It is OK to leave the last column empty, but if somebody does not fill in a column in the middle, it confuses the awk '{print $n}' logic.
Of course, one could ask the users to fill in every entry, or one could just define the column delimiter as ";".
In case something is skipped, one could then use ";;". However, I would prefer not to change the table style.
So the question is:
How to effectively analyze a table having blanks in column values? The table looks like this:
ApplikationService ServerName     PortNumber ControlValue_1 ControlValue_2
Read               chavez.com     3599       john           doe
Write                             3345       johnny         walker
Update             curiosity.org             jerry
What might be of some help:
If there is a value set in a column, it is (more or less precisely) aligned under its column header.
Cheers,
Tarik
You don't say what your desired output is but this shows you the right approach:
$ cat tst.awk
NR==1 {
    print
    while ( match($0,/[^[:space:]]+[[:space:]]*/) ) {
        width[++i] = RLENGTH
        $0 = substr($0,RSTART+RLENGTH)
    }
    next
}
{
    i = 0
    while ( (fld = substr($0,1,width[++i])) != "" ) {
        gsub(/^ +| +$/,"",fld)
        printf "%-*s", width[i], (fld == "" ? "[empty]" : fld)
        $0 = substr($0,width[i]+1)
    }
    print ""
}
$
$ awk -f tst.awk file
ApplikationService ServerName     PortNumber ControlValue_1 ControlValue_2
Read               chavez.com     3599       john           doe
Write              [empty]        3345       johnny         walker
Update             curiosity.org  [empty]    jerry          [empty]
It uses the width of each field in the title line to determine the width of every field in every line of the file, and then just replaces empty fields with the string "[empty]" and left-aligns every field to pretty it up a bit.
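Once the empty cells carry the [empty] placeholder, the original grep/awk '{print $n}' logic works again; for example, to get the PortNumber for the Write row:

$ awk -f tst.awk file | awk '$1 == "Write" { print $3 }'
3345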
