Need help with a Unix script to read data from a specific position and use the extracted values in a query - shell

Input file:
ADSDWETTYT017775227ACG
ADSDWETTYT029635225HCG
ADSDWETTYC018525223JCG
ADSDWETTYC987415221ACG
ADSDWETTCC891235219ACG
ADSDWETTTT074565217ACG
ADSDWETTYT567895213ACG
ADSDWETTYH037535215ACG
ADSDWETTYC051595211ACG
ADSDWETTYT052465209ACG
ADSDWETTYT067595207ACG
ADSDWETTYT077515205ACG
I need to check whether the 10th character of each line is "T"; if it is, I need to extract the 4 characters starting at position 16.
From the above file I am expecting the output below:
'5227','5225','5217','5213','5209','5207','5205'
This result should be assigned to a shell variable (result, below) and used in the query's WHERE clause like this:
result=$(awk '
BEGIN{
conf="" };
{ if(substr($0,10,1)=="T"){
conf=substr($0,16,4);
{NT==1?s="'\''"conf:s=s"'\'','\''"conf}
}
}
END {print s"'\''"}
' $INPUT_FILE_PATH)
db2 "EXPORT TO ${OUTPUT_FILE} OF DEL select STATUS FROM TRAN where TN_NR in (${result})"
I need some help enhancing the awk command and passing the resulting list into the query's WHERE clause. Kindly help.

With your shown samples and attempts, please try the following awk code.
awk -v s1="'" 'BEGIN{OFS=", "} substr($0,10,1)=="T"{val=(val?val OFS:"") (s1 substr($0,16,4) s1)} END{print val}' Input_file
Here is the non-one-liner form of the above code:
awk -v s1="'" '
BEGIN{ OFS=", " }
substr($0,10,1)=="T"{
val=(val?val OFS:"") (s1 substr($0,16,4) s1)
}
END{
print val
}
' Input_file
To save the output of this code into a shell variable, try the following:
value=$(awk -v s1="'" 'BEGIN{OFS=", "} substr($0,10,1)=="T"{val=(val?val OFS:"") (s1 substr($0,16,4) s1)} END{print val}' Input_file)
Explanation: a detailed explanation of the above code.
awk -v s1="'" ' ##Start the awk program; s1 holds a literal single quote.
BEGIN{ OFS=", " } ##Set OFS to comma-space.
substr($0,10,1)=="T"{ ##If the 10th character of the line is T, do the following.
val=(val?val OFS:"") (s1 substr($0,16,4) s1) ##Append the quoted 4-character value to val, comma-separated.
}
END{ ##The END block runs after all input is read.
print val ##Print the accumulated list.
}
' Input_file ##Mention the Input_file name here.
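The saved variable can then be dropped straight into the db2 command from the question (a sketch reusing the question's table, column, and variable names):
result=$(awk -v s1="'" 'BEGIN{OFS=", "} substr($0,10,1)=="T"{val=(val?val OFS:"") (s1 substr($0,16,4) s1)} END{print val}' "$INPUT_FILE_PATH")
db2 "EXPORT TO ${OUTPUT_FILE} OF DEL select STATUS FROM TRAN where TN_NR in (${result})"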

I'd use sed for this:
sed -En '/^.{9}T/ s/^.{15}(....).*/\1/p' file
And then to get your exact output, pipe that into
... | sed "s/.*/'&'/" | paste -sd,
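Combined into the shell variable from the question (a sketch; the trailing - makes paste read stdin portably):
result=$(sed -En '/^.{9}T/ s/^.{15}(....).*/\1/p' "$INPUT_FILE_PATH" | sed "s/.*/'&'/" | paste -sd, -)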

I'd use perl over awk here for its better arrays (in particular, joining one into a string). Something like:
perl -nE "push @n, substr(\$_, 15, 4) if /^.{9}T/;
END { say join(',', map { \"'\$_'\" } @n) }" "$INPUT_FILE_PATH"
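Against the sample input this prints exactly the list from the question:
'5227','5225','5217','5213','5209','5207','5205'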

Related

awk to get the value of a column from the next line and add it to the current line in a shell script

I have a csv file, let's say lines:
cat lines
1:abc
6:def
17:ghi
21:tyu
I wanted to achieve something like this:
1:6:abc
6:17:def
17:21:ghi
21::tyu
I tried the code below, but it didn't work:
awk 'BEGIN{FS=OFS=":"}NR>1{nln=$1;cl=$2}NR>0{print $1,nln,$2}' lines
1::abc
6:6:def
17:17:ghi
21:21:tyu
Can you please help?
Here is a potential AWK solution:
cat lines
1:abc
6:def
17:ghi
21:tyu
awk -F":" '{num[NR]=$1; letters[NR]=$2}; END{for(i=1;i<=NR;i++) print num[i] ":" num[i + 1] ":" letters[i]}' lines
1:6:abc
6:17:def
17:21:ghi
21::tyu
Formatted:
awk '
BEGIN {FS=OFS=":"}
{
num[NR] = $1;
letters[NR] = $2
}
END {for (i = 1; i <= NR; i++)
print num[i], num[i + 1], letters[i]
}
' lines
1:6:abc
6:17:def
17:21:ghi
21::tyu
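Note this buffers the whole file in the num and letters arrays before printing anything; the streaming answer below avoids that.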
Basically this is your solution, but I switched the order of the code blocks and added an END block to output the last record. You were close:
awk 'BEGIN{FS=OFS=":"}FNR>1{print p,$1,q}{p=$1;q=$2}END{print p,"",q}' file
Explained:
$ awk 'BEGIN {
FS=OFS=":" # delims
}
FNR>1 { # all but the first record
print p,$1,q # output $1 and $2 from the previous round
}
{
p=$1 # store for the next round
q=$2
}
END { # gotta output the last record in the END
print p,"",q # "" feels like cheating
}' file
Output:
1:6:abc
6:17:def
17:21:ghi
21::tyu
1st solution: Here is a tac + awk + tac solution, written and tested with the shown samples only.
tac Input_file |
awk '
BEGIN{
FS=OFS=":"
}
{
prev=(prev?$2=prev OFS $2:$2=OFS $2)
}
{
prev=$1
}
1
' | tac
Explanation: a detailed explanation of the above code.
tac Input_file | ##Printing lines from bottom to top of Input_file.
awk ' ##Getting input from previous command as input to awk.
BEGIN{ ##Starting BEGIN section from here.
FS=OFS=":" ##Setting FS and OFS as colon here.
}
{
prev=(prev?$2=prev OFS $2:$2=OFS $2) ##If prev is set, prepend it to $2 (prev OFS $2); otherwise prepend just OFS so the middle column stays empty.
}
{
prev=$1 ##Setting prev to $1 value here.
}
1 ##printing current line here.
' | tac ##Send the awk output to tac to restore the original line order.
2nd solution: an awk-only solution that passes Input_file to awk twice.
awk '
BEGIN{
FS=OFS=":"
}
FNR==NR{
if(FNR>1){
arr[FNR-1]=$1
}
next
}
{
$2=(FNR in arr)?(arr[FNR] OFS $2):OFS $2
}
1
' Input_file Input_file
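The FNR==NR guard is the standard two-pass idiom: FNR is the per-file record number, so FNR==NR holds only while the first of the two file arguments is being read. A minimal sketch with the question's lines file:
awk 'FNR==NR{print "pass 1: " $0; next} {print "pass 2: " $0}' lines lines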

Merge rows with the same value, and every 100 lines, in a csv file using a command

I have a csv file like below:
http://www.a.com/1,apple
http://www.a.com/2,apple
http://www.a.com/3,apple
http://www.a.com/4,apple
...
http://www.z.com/1,flower
http://www.z.com/2,flower
http://www.z.com/3,flower
...
I want to combine the csv file into a new csv file like below:
"http://www.a.com/1
http://www.a.com/2
http://www.a.com/3
http://www.a.com/4
",apple
"http://www.z.com/1
http://www.z.com/2
http://www.z.com/3
http://www.z.com/4
...
http://www.z.com/100
",flower
"http://www.z.com/101
http://www.z.com/102
http://www.z.com/103
http://www.z.com/104
...
http://www.z.com/200
",flower
I want each cell in the first column to hold at most 100 http urls.
Rows with the same value in column two should be merged into the corresponding cell.
Is there a very simple command pattern to achieve this?
I used the command below:
awk '{if(NR%100!=0)ORS="\t";else ORS="\n"}1' test.csv > result.csv
$ awk -F, '$2!=p || n==100 {if(NR!=1) print "\"," p; printf "\""; p=$2; n=0}
{print $1; n+=1} END {print "\"," p}' test.csv
"http://www.a.com/1
http://www.a.com/2
http://www.a.com/3
http://www.a.com/4
",apple
"http://www.z.com/1
http://www.z.com/2
http://www.z.com/3
",flower
First set the field separator to the comma (-F,). Then:
If the second field changes ($2!=p) or if we already printed 100 lines in the current batch (n==100):
if it is not the first line, print a double quote, a comma, the previous second field and a newline,
print a double quote,
store the new second field in variable p for later comparisons,
reset line counter n.
For all lines print the first field and increment line counter n.
At the end print a double quote, a comma and the last value of second field.
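Since the shown sample has far fewer than 100 lines per group, the n==100 branch never fires; to watch it split a group into batches, generate a larger test file first (a sketch using seq; the file name is arbitrary):
seq 1 250 | awk '{print "http://www.z.com/" $1 ",flower"}' > test.csv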
1st solution: With your shown samples, please try the following awk code.
awk '
BEGIN{
s1="\""
FS=OFS=","
}
prev!=$2 && prev{
print s1 val s1,prev
val=""
}
{
val=(val?val ORS:"")$1
prev=$2
}
END{
if(val){
print s1 val s1,prev
}
}
' Input_file
2nd solution: In case your Input_file is NOT sorted by the 2nd column, try the following sort + awk code.
sort -t, -k2 Input_file |
awk '
BEGIN{
s1="\""
FS=OFS=","
}
prev!=$2 && prev{
print s1 val s1,prev
val=""
}
{
val=(val?val ORS:"")$1
prev=$2
}
END{
if(val){
print s1 val s1,prev
}
}
'
Output will be as follows:
"http://www.a.com/1
http://www.a.com/2
http://www.a.com/3
http://www.a.com/4",apple
"http://www.z.com/1
http://www.z.com/2
http://www.z.com/3",flower
Given:
cat file
http://www.a.com/1,apple
http://www.a.com/2,apple
http://www.a.com/3,apple
http://www.a.com/4,apple
http://www.z.com/1,flower
http://www.z.com/2,flower
http://www.z.com/3,flower
Here is a two-pass awk to do this:
awk -F, 'FNR==NR{seen[$2]=FNR; next}
seen[$2]==FNR{
printf("\"%s%s\"\n,%s\n",data,$1,$2)
data=""
next
}
{data=data sprintf("%s\n",$1)}' file file
If you want to print either at the change of the $2 value or at some fixed line interval (like 100), you can do:
awk -F, -v n=100 'FNR==NR{seen[$2]=FNR; next}
seen[$2]==FNR || FNR%n==0{
printf("\"%s%s\"\n,%s\n",data,$1,$2)
data=""
next
}
{data=data sprintf("%s\n",$1)}' file file
Either prints:
"http://www.a.com/1
http://www.a.com/2
http://www.a.com/3
http://www.a.com/4"
,apple
"http://www.z.com/1
http://www.z.com/2
http://www.z.com/3"
,flower

Copy one csv header to another csv with type modification

I want to copy one csv header to another, row-wise, with some modifications.
Input csv
name,"Mobile Number","mobile1,mobile2",email2,Address,email21
test, 123456789,+123456767676,a@test.com,testaddr,a1@test.com
test1,7867778,8799787899898,b@test,com, test2addr,b2@test.com
In the new csv this should look like the following (and the file should also be created). For a string column I will pass the column name, so only that column will be converted to string:
name.auto()
Mobile Number.auto()
mobile1,mobile2.string()
email2.auto()
Address.auto()
email21.auto()
As you can see above, each header with its type modification should be inserted on a separate row.
I have tried the command below, but it only copies the first row:
sed '1!d' input.csv > output.csv
You may try this alternative GNU awk command as well:
awk -v FPAT='"[^"]+"|[^,]+' 'NR == 1 {
for (i=1; i<=NF; ++i)
print gensub(/"/, "", "g", $i) "." ($i ~ /,/ ? "string" : "auto") "()"
exit
}' file
name.auto()
Mobile Number.auto()
mobile1,mobile2.string()
email2.auto()
Address.auto()
email21.auto()
Or using sed:
sed -i -e '1i 1234567890.string(),My address is test.auto(),abc3#gmail.com.auto(),120000003.auto(),abc-003.auto(),3.com.auto()' -e '1d' test.csv
EDIT: As per the OP's comment, to print only the first line (the header), please try the following.
awk -v FPAT='[^,]*|"[^"]+"' '
FNR==1{
for(i=1;i<=NF;i++){
if($i~/^".*,.*"$/){
gsub(/"/,"",$i)
print $i".string()"
}
else{
print $i".auto()"
}
}
exit
}
' Input_file > output_file
Could you please try the following, written and tested with GNU awk on the shown samples.
awk -v FPAT='[^,]*|"[^"]+"' '
FNR==1{
for(i=1;i<=NF;i++){
if($i~/^".*,.*"$/){
gsub(/"/,"",$i)
print $i".string()"
}
else{
print $i".auto()"
}
}
next
}
1
' Input_file
Explanation: a detailed explanation of the above.
awk -v FPAT='[^,]*|"[^"]+"' ' ##Start the awk program, setting FPAT to [^,]*|"[^"]+".
FNR==1{ ##If this is the first line, do the following.
for(i=1;i<=NF;i++){ ##Loop over all fields, from i=1 to NF.
if($i~/^".*,.*"$/){ ##If the current field starts and ends with " and contains a comma in between, do the following.
gsub(/"/,"",$i) ##Remove all occurrences of " from the current field.
print $i".string()" ##Print the current field with .string() appended.
}
else{ ##Otherwise do the following.
print $i".auto()" ##Print the current field with .auto() appended.
}
}
next ##next skips all further statements for this line.
}
1 ##1 prints the current line.
' Input_file ##Mention the Input_file name here.
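As a quick illustration of what FPAT buys here, compare how the quoted header field is kept intact (a minimal sketch; FPAT is a GNU awk feature):
echo 'name,"mobile1,mobile2",email2' |
awk -v FPAT='[^,]*|"[^"]+"' '{for(i=1;i<=NF;i++) print i": "$i}'
1: name
2: "mobile1,mobile2"
3: email2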

Use sed (or similar) to remove anything between repeating patterns

I'm essentially trying to "tidy" a lot of data in a CSV. I don't need any of the information that's in "quotes".
I tried sed 's/".*"/""/', but it removes the commas when there's more than one quoted section on a line.
I would like to get from this:
1,2,"a",4,"b","c",5
To this:
1,2,,4,,,5
Is there a sed wizard who can help? :)
You may use
sed 's/"[^"]*"//g' file > newfile
See online sed demo:
s='1,2,"a",4,"b","c",5'
sed 's/"[^"]*"//g' <<< "$s"
# => 1,2,,4,,,5
Details
The "[^"]*" pattern matches ", then 0 or more characters other than ", and then ". The matches are removed since RHS is empty. g flag makes it match all occurrences on each line.
Could you please try the following.
awk -v s1="\"" 'BEGIN{FS=OFS=","} {for(i=1;i<=NF;i++){if($i~s1){$i=""}}} 1' Input_file
The non-one-liner form of the solution is:
awk -v s1="\"" '
BEGIN{
FS=OFS=","
}
{
for(i=1;i<=NF;i++){
if($i~s1){
$i=""
}
}
}
1
' Input_file
Detailed explanation:
awk -v s1="\"" ' ##Start the awk program; variable s1 holds a literal ".
BEGIN{ ##Start the BEGIN section of this code.
FS=OFS="," ##Set the field separator and output field separator to a comma (,).
}
{
for(i=1;i<=NF;i++){ ##Loop over all fields of the current line.
if($i~s1){ ##If the current field contains a ", do the following.
$i="" ##Empty the current field.
}
}
}
1 ##1 prints the edited or unedited line.
' Input_file ##Mention the Input_file name here.
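Fed the sample line from the question, this prints the expected result (a quick check; note the field-wise test assumes the quoted sections never contain commas, which holds for the shown sample):
echo '1,2,"a",4,"b","c",5' |
awk -v s1="\"" 'BEGIN{FS=OFS=","} {for(i=1;i<=NF;i++){if($i~s1){$i=""}}} 1'
1,2,,4,,,5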
With Perl:
perl -p -e 's/".*?"//g' file
? forces * to be non-greedy.
Output:
1,2,,4,,,5
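For contrast, the greedy pattern from the question collapses everything between the first and last quote on the line, which is why the commas disappeared (a quick demo using a bash here-string):
perl -pe 's/".*"//' <<< '1,2,"a",4,"b","c",5'
# => 1,2,,5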

Replace the first column in a file with a column from a different file using shell

I have two files, file1 and file2.
file1
Shyam=123=12.3.4.5=user@gmail.com
Shyam=123=12.2.5.4=user@gmail.com
Joshwa=234=14.3.4.67=user@gmail.com
Anil=879=15.3.4.98=user@gmail.com
Anil=765=15.4.5.65=user@gmail.com
.......
file2
Shyam=ShyamLal
Joshwa=JoshwaSam
Anil=AnilAcharya
....
"=" is mentioned as a seperator in file1 and file2.
I want to update names as given in file2. ie.,Shyam will be replaced with ShyamLal, Joshwa will be replaced with JoshwaSam and Anil will be replaced with AnilAcharya. I don't want to use if-else condition, because I have large number of datas.
My output should be like:
ShyamLal=123=12.3.4.5=user@gmail.com
ShyamLal=123=12.2.5.4=user@gmail.com
JoshwaSam=234=14.3.4.67=user@gmail.com
AnilAcharya=879=15.3.4.98=user@gmail.com
AnilAcharya=765=15.4.5.65=user@gmail.com
I tried this, but I don't know whether I am doing it right:
while IFS= read -r line
do
key=`echo $line | awk -F "=" '{print $1}'` < file1.txt
value=`echo $line | awk -F "=" '{print $2}' < file2.txt`
cat file1.txt | sed 's/$key/$value/g'
done
How can I proceed?
Could you please try the following.
awk '
BEGIN{
FS=OFS="="
}
FNR==NR{
a[$1]=$2
next
}
($1 in a){
$1=a[$1]
}
1
' file2 file1
Explanation: a detailed explanation of the above code.
awk ' ##Start the awk program.
BEGIN{ ##Start the BEGIN section.
FS=OFS="=" ##Set FS and OFS to = for all lines.
} ##Close the BEGIN section.
FNR==NR{ ##FNR==NR is TRUE only while the first file argument (file2) is being read.
a[$1]=$2 ##Create array a, indexed by $1, holding the $2 of the current line.
next ##next skips all further statements for this line.
}
($1 in a){ ##If $1 is present in array a; this runs while file1 is being read.
$1=a[$1] ##Replace $1 with the value stored in a for this key.
}
1 ##1 prints the edited or unedited line.
' file2 file1 ##Mention the file names here; file2 must come first.
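For convenience, the same program as a one-liner (file2 must come first so the lookup table is built before file1 is read):
awk 'BEGIN{FS=OFS="="} FNR==NR{a[$1]=$2;next} ($1 in a){$1=a[$1]} 1' file2 file1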
