awk to get the value of a column from the next line and add it to the current line in a shell script - shell

I have a csv file, let's say lines:
cat lines
1:abc
6:def
17:ghi
21:tyu
I want to achieve something like this:
1:6:abc
6:17:def
17:21:ghi
21::tyu
I tried the code below, but it didn't work:
awk 'BEGIN{FS=OFS=":"}NR>1{nln=$1;cl=$2}NR>0{print $1,nln,$2}' lines
1::abc
6:6:def
17:17:ghi
21:21:tyu
Can you please help?

Here is a potential AWK solution:
cat lines
1:abc
6:def
17:ghi
21:tyu
awk -F":" '{num[NR]=$1; letters[NR]=$2}; END{for(i=1;i<=NR;i++) print num[i] ":" num[i + 1] ":" letters[i]}' lines
1:6:abc
6:17:def
17:21:ghi
21::tyu
Formatted:
awk '
BEGIN {FS=OFS=":"}
{
num[NR] = $1;
letters[NR] = $2
}
END {for (i = 1; i <= NR; i++)
print num[i], num[i + 1], letters[i]
}
' lines
1:6:abc
6:17:def
17:21:ghi
21::tyu
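Note that this approach buffers the whole file in the num and letters arrays and prints only in the END block, so memory use grows with input size; that is fine for the samples shown but worth keeping in mind for large files.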

Basically this is your solution, but I switched the order of the code blocks and added an END block to output the last record. You were close:
awk 'BEGIN{FS=OFS=":"}FNR>1{print p,$1,q}{p=$1;q=$2}END{print p,"",q}' file
Explained:
$ awk 'BEGIN {
FS=OFS=":" # delims
}
FNR>1 { # all but the first record
print p,$1,q # output $1 and $2 from the previous round
}
{
p=$1 # store for the next round
q=$2
}
END { # gotta output the last record in the END
print p,"",q # "" feels like cheating
}' file
Output:
1:6:abc
6:17:def
17:21:ghi
21::tyu

1st solution: Here is a tac + awk + tac solution, written and tested with the shown samples only.
tac Input_file |
awk '
BEGIN{
FS=OFS=":"
}
{
prev=(prev?$2=prev OFS $2:$2=OFS $2)
}
{
prev=$1
}
1
' | tac
Explanation: a detailed explanation of the above code.
tac Input_file | ##Printing lines from bottom to top of Input_file.
awk ' ##Getting input from previous command as input to awk.
BEGIN{ ##Starting BEGIN section from here.
FS=OFS=":" ##Setting FS and OFS as colon here.
}
{
prev=(prev?$2=prev OFS $2:$2=OFS $2) ##If prev is NOT NULL, prepend prev plus OFS to $2; otherwise prepend just OFS to $2.
}
{
prev=$1 ##Setting prev to $1 value here.
}
1 ##printing current line here.
' | tac ##Sending awk output to tac to make it in actual sequence.
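One caveat: tac is a GNU coreutils tool and may not exist everywhere; on BSD/macOS without coreutils installed, tail -r can usually stand in.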
2nd solution: an awk-only solution that passes the Input_file to awk twice.
awk '
BEGIN{
FS=OFS=":"
}
FNR==NR{
if(FNR>1){
arr[FNR-1]=$1
}
next
}
{
$2=(FNR in arr)?(arr[FNR] OFS $2):OFS $2
}
1
' Input_file Input_file
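How it works: in the first pass (FNR==NR), arr[FNR-1]=$1 stores each line's first field under the previous line number, so in the second pass arr[FNR] already holds the next line's first field, which is then prepended to $2.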

Related

awk from file using echo and output to file

A.txt contains:
/*333*/
asdfasdfadfg
sadfasdfasgadas
###
/*555*/
hfawehfihohawe
aweihfiwahif
aiwehfwwh
###
/*777*/
jawejfiawjia
ajwiejfjeiie
eiuehhawefjj
###
B.txt contains:
555
777
I want to loop over each string found in B.txt and, for each one, output everything from the '/*'[the string] line up to just before the first '###' into its own file (the string is also used as the file name).
So based on the sample above, the result should be :
555.txt, which contains:
/*555*/
hfawehfihohawe
aweihfiwahif
aiwehfwwh
and 777.txt, which contains:
/*777*/
jawejfiawjia
ajwiejfjeiie
eiuehhawefjj
I tried this script but it outputs nothing:
for i in `cat B.txt`; do echo $i | awk '/{print "/*"$1}/{flag=1} /###/{flag=0} flag' A.txt > $i.txt; done
Thank you in advance
With your shown samples, please try the following awk code. Written and tested in GNU awk; it should work in any awk.
awk '
FNR==NR{
if($0~/^\/\*/){
line=$0
gsub(/^\/\*|\*\/$/,"",line)
arr[++count]=$0
arr1[line]=count
next
}
arr[count]=(arr[count]?arr[count] ORS:"") $0
next
}
($0 in arr1){
outputFile=$0".txt"
print arr[arr1[$0]] >> (outputFile)
close(outputFile)
}
' file1 file2
Explanation: a detailed explanation of the above code.
awk ' ##Starting awk program from here.
FNR==NR{ ##Checking condition FNR==NR which will be TRUE when file1 is being read.
if($0~/^\/\*/){ ##Checking condition if current line starts with /* then do following.
line=$0 ##Setting $0 to line variable here.
gsub(/^\/\*|\*\/$/,"",line) ##using gsub to globally substitute starting /* and ending */ with NULL in line here.
arr[++count]=$0 ##Creating arr with index of ++count and value is $0.
arr1[line]=count ##Creating arr1 with index of line and value of count.
next ##next will skip all further statements from here.
}
arr[count]=(arr[count]?arr[count] ORS:"") $0 ##Creating arr with index of count and keep appending values of same count values with current line value.
next ##next will skip all further statements from here.
}
($0 in arr1){ ##checking if current line is present in arr1 then do following.
outputFile=$0".txt" ##Creating outputFile with current line .txt value here.
print arr[arr1[$0]] >> (outputFile) ##Printing arr value with index of arr1[$0] to outputFile.
close(outputFile) ##Closing outputFile in backend to avoid too many opened files error.
}
' file1 file2 ##Mentioning Input_file names here.
Making a few alterations to your code provides the desired outcome with the example data provided:
while read -r f
do
awk -v var="/[*]$f[*]/" '$0 ~ var {flag=1} /###/{flag=0} flag' A.txt > "$f".txt
done < B.txt
cat 555.txt
/*555*/
hfawehfihohawe
aweihfiwahif
aiwehfwwh
cat 777.txt
/*777*/
jawejfiawjia
ajwiejfjeiie
eiuehhawefjj
Does this solve your problem?
Here is another awk solution for this:
awk '
FNR == NR {
map["/*" $0 "*/"] = $0
next
}
$0 in map {
fn = map[$0] ".txt"
}
/^###$/ {
close(fn)
fn = ""
}
fn {print > fn}' B.txt A.txt
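How it works: the first pass over B.txt maps each reconstructed header line ("/*555*/") back to its bare string; when a matching header appears in A.txt, fn is set to that string plus ".txt", every line (including the header) is printed to fn until a ### line clears it, and close(fn) avoids running out of file descriptors.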

Merge rows with same value and every 100 lines in csv file using command

I have a csv file like below:
http://www.a.com/1,apple
http://www.a.com/2,apple
http://www.a.com/3,apple
http://www.a.com/4,apple
...
http://www.z.com/1,flower
http://www.z.com/2,flower
http://www.z.com/3,flower
...
I want combine the csv file to new csv file like below:
"http://www.a.com/1
http://www.a.com/2
http://www.a.com/3
http://www.a.com/4
",apple
"http://www.z.com/1
http://www.z.com/2
http://www.z.com/3
http://www.z.com/4
...
http://www.z.com/100
",flower
"http://www.z.com/101
http://www.z.com/102
http://www.z.com/103
http://www.z.com/104
...
http://www.z.com/200
",flower
I want each cell in the first column to contain at most 100 http URLs.
The same value in column two will appear in the corresponding cell.
Is there a simple command pattern to achieve this?
I used the command below:
awk '{if(NR%100!=0)ORS="\t";else ORS="\n"}1' test.csv > result.csv
$ awk -F, '$2!=p || n==100 {if(NR!=1) print "\"," p; printf "\""; p=$2; n=0}
{print $1; n+=1} END {print "\"," p}' test.csv
"http://www.a.com/1
http://www.a.com/2
http://www.a.com/3
http://www.a.com/4
",apple
"http://www.z.com/1
http://www.z.com/2
http://www.z.com/3
",flower
First set the field separator to the comma (-F,). Then:
If the second field changes ($2!=p) or if we already printed 100 lines in the current batch (n==100):
  - if it is not the first line, print a double quote, a comma, the previous second field and a newline,
  - print a double quote,
  - store the new second field in variable p for later comparisons,
  - reset line counter n.
For all lines print the first field and increment line counter n.
At the end print a double quote, a comma and the last value of the second field.
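If you want to see the 100-line batching trigger without hand-writing a large file, you can generate one first; this is just a quick test sketch, and big.csv is a made-up scratch file name:
# generate 250 sample rows (big.csv is a scratch file)
seq 250 | awk '{print "http://www.z.com/" $0 ",flower"}' > big.csv
# same command as above, run on the generated file
awk -F, '$2!=p || n==100 {if(NR!=1) print "\"," p; printf "\""; p=$2; n=0}
         {print $1; n+=1} END {print "\"," p}' big.csv
With 250 same-category rows this should yield three quoted cells of 100, 100 and 50 URLs.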
1st solution: With your shown samples, please try the following awk code.
awk '
BEGIN{
s1="\""
FS=OFS=","
}
prev!=$2 && prev{
print s1 val s1,prev
val=""
}
{
val=(val?val ORS:"")$1
prev=$2
}
END{
if(val){
print s1 val s1,prev
}
}
' Input_file
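This streams the file once, accumulating $1 values in val until the 2nd column changes, then printing the accumulated block as one quoted cell; note that as written it does not enforce the 100-URLs-per-cell cap from the question.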
2nd solution: In case your Input_file is NOT sorted by the 2nd column, try the following sort + awk code.
sort -t, -k2 Input_file |
awk '
BEGIN{
s1="\""
FS=OFS=","
}
prev!=$2 && prev{
print s1 val s1,prev
val=""
}
{
val=(val?val ORS:"")$1
prev=$2
}
END{
if(val){
print s1 val s1,prev
}
}
'
Output will be as follows:
"http://www.a.com/1
http://www.a.com/2
http://www.a.com/3
http://www.a.com/4",apple
"http://www.z.com/1
http://www.z.com/2
http://www.z.com/3",flower
Given:
cat file
http://www.a.com/1,apple
http://www.a.com/2,apple
http://www.a.com/3,apple
http://www.a.com/4,apple
http://www.z.com/1,flower
http://www.z.com/2,flower
http://www.z.com/3,flower
Here is a two pass awk to do this:
awk -F, 'FNR==NR{seen[$2]=FNR; next}
seen[$2]==FNR{
printf("\"%s%s\"\n,%s\n",data,$1,$2)
data=""
next
}
{data=data sprintf("%s\n",$1)}' file file
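The trick here is that the first pass records the last line number at which each $2 value occurs (seen[$2]=FNR); on the second pass, reaching that line number marks the end of the group, so the buffered URLs are flushed together with the current line.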
If you want to print either at the change of the $2 value or at some fixed line interval (like 100) you can do:
awk -F, -v n=100 'FNR==NR{seen[$2]=FNR; next}
seen[$2]==FNR || FNR%n==0{
printf("\"%s%s\"\n,%s\n",data,$1,$2)
data=""
next
}
{data=data sprintf("%s\n",$1)}' file file
Either prints:
"http://www.a.com/1
http://www.a.com/2
http://www.a.com/3
http://www.a.com/4"
,apple
"http://www.z.com/1
http://www.z.com/2
http://www.z.com/3"
,flower

Need help with a unix script to read data from a specific position and use the extracted value in a query

Input file:
ADSDWETTYT017775227ACG
ADSDWETTYT029635225HCG
ADSDWETTYC018525223JCG
ADSDWETTYC987415221ACG
ADSDWETTCC891235219ACG
ADSDWETTTT074565217ACG
ADSDWETTYT567895213ACG
ADSDWETTYH037535215ACG
ADSDWETTYC051595211ACG
ADSDWETTYT052465209ACG
ADSDWETTYT067595207ACG
ADSDWETTYT077515205ACG
I need to check whether the 10th position of each line contains/starts with T; if it starts with "T" then I need to take 4 characters starting from position 16 (as in the expected output below).
From the above file I am expecting the below output:
'5227','5225','5217','5213','5209','5207','5205'
This result should be assigned to a constant (result below) and used in the query where clause like this:
result=$(awk '
BEGIN{
conf="" };
{ if(substr($0,10,1)=="T"){
conf=substr($0,16,4);
{NT==1?s="'\''"conf:s=s"'\'','\''"conf}
}
}
END {print s"'\''"}
' $INPUT_FILE_PATH)
db2 "EXPORT TO ${OUTPUT_FILE} OF DEL select STATUS FROM TRAN where TN_NR in (${result})"
I need some help to enhance the awk command and pass the constant into the query's where clause. Kindly help.
With your shown samples and attempts, please try the following awk code.
awk -v s1="'" 'BEGIN{OFS=", "} substr($0,10,1)=="T"{val=(val?val OFS:"") (s1 substr($0,16,4) s1)} END{print val}' Input_file
A non-one-liner form of the above code:
awk -v s1="'" '
BEGIN{ OFS=", " }
substr($0,10,1)=="T"{
val=(val?val OFS:"") (s1 substr($0,16,4) s1)
}
END{
print val
}
' Input_file
To save the output of this code into a shell variable, try the following:
value=$(awk -v s1="'" 'BEGIN{OFS=", "} substr($0,10,1)=="T"{val=(val?val OFS:"") (s1 substr($0,16,4) s1)} END{print val}' Input_file)
Explanation: a detailed explanation of the above code.
awk -v s1="'" ' ##Starting awk program from here setting s1 to ' here.
BEGIN{ OFS=", " } ##Setting OFS as comma space here.
substr($0,10,1)=="T"{ ##Checking condition if 10th character is T then do following.
val=(val?val OFS:"") (s1 substr($0,16,4) s1) ##Creating val which has values from current line as per OP requirement.
}
END{ ##Starting END block of this program from here.
print val ##Printing val here.
}
' Input_file ##Mentioning Input_file name here.
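The shell variable can then be interpolated into the where clause along the lines of the EXPORT statement from the question (assuming OUTPUT_FILE and the db2 environment are already set up):
# hypothetical usage; assumes OUTPUT_FILE and the db2 session exist
db2 "EXPORT TO ${OUTPUT_FILE} OF DEL select STATUS FROM TRAN where TN_NR in (${value})"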
I'd use sed for this:
sed -En '/^.{9}T/ s/^.{15}(....).*/\1/p' file
And then to get your exact output, pipe that into
... | sed "s/.*/'&'/" | paste -sd,
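Put together, the whole pipeline would look something like this (a sketch, tested only against the samples shown):
sed -En '/^.{9}T/ s/^.{15}(....).*/\1/p' file | sed "s/.*/'&'/" | paste -sd, -
which should print '5227','5225','5217','5213','5209','5207','5205'.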
I'd use perl over awk here for its better arrays (in particular, joining one into a string). Something like:
perl -nE "push @n, substr(\$_, 15, 4) if /^.{9}T/;
END { say join(',', map { \"'\$_'\" } @n) }" "$INPUT_FILE_PATH"

Copy one csv header to another csv with type modification

I want to copy one csv's header to another csv, row-wise, with some modifications.
Input csv
name,"Mobile Number","mobile1,mobile2",email2,Address,email21
test, 123456789,+123456767676,a#test.com,testaddr,a1#test.com
test1,7867778,8799787899898,b#test,com, test2addr,b2#test.com
In the new csv it should look like below, and the file should also be created. For a string column I will pass the column name, so only that column will be converted to string:
name.auto()
Mobile Number.auto()
mobile1,mobile2.string()
email2.auto()
Address.auto()
email21.auto()
As you can see above, each header with its type modification should be inserted on a separate row.
I have tried the below command, but it only copies the first row:
sed '1!d' input.csv > output.csv
You may try this alternative GNU awk command as well:
awk -v FPAT='"[^"]+"|[^,]+' 'NR == 1 {
for (i=1; i<=NF; ++i)
print gensub(/"/, "", "g", $i) "." ($i ~ /,/ ? "string" : "auto") "()"
exit
}' file
name.auto()
Mobile Number.auto()
mobile1,mobile2.string()
email2.auto()
Address.auto()
email21.auto()
Or using sed:
sed -i -e '1i 1234567890.string(),My address is test.auto(),abc3#gmail.com.auto(),120000003.auto(),abc-003.auto(),3.com.auto()' -e '1d' test.csv
EDIT: As per OP's comment, to print only the first line (header), please try the following.
awk -v FPAT='[^,]*|"[^"]+"' '
FNR==1{
for(i=1;i<=NF;i++){
if($i~/^".*,.*"$/){
gsub(/"/,"",$i)
print $i".string()"
}
else{
print $i".auto()"
}
}
exit
}
' Input_file > output_file
Could you please try the following, written and tested with GNU awk on the shown samples.
awk -v FPAT='[^,]*|"[^"]+"' '
FNR==1{
for(i=1;i<=NF;i++){
if($i~/^".*,.*"$/){
gsub(/"/,"",$i)
print $i".string()"
}
else{
print $i".auto()"
}
}
next
}
1
' Input_file
Explanation: a detailed explanation of the above.
awk -v FPAT='[^,]*|"[^"]+"' ' ##Starting awk program and setting FPAT to [^,]*|"[^"]+".
FNR==1{ ##Checking condition if this is first line then do following.
for(i=1;i<=NF;i++){ ##Running for loop from i=1 to till NF value.
if($i~/^".*,.*"$/){ ##Checking condition if current field starts from " and ends with " and having comma in between its value then do following.
gsub(/"/,"",$i) ##Substitute all occurrences of " with NULL in current field.
print $i".string()" ##Printing current field and .string() here.
}
else{ ##else do following.
print $i".auto()" ##Printing current field dot auto() string here.
}
}
next ##next will skip all further statements from here.
}
1 ##1 will print current line.
' Input_file ##Mentioning Input_file name here.

How to run a bash script in a loop

I wrote a bash script that pulls substrings from two input files and saves them to an output file. The input files look like this:
input file 1
>genotype1
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
input file 2
gene1 10 20
gene2 40 50
genen x y
my script
>output_file
cat input_file2 | while read row; do
echo $row > temp
geneName=`awk '{print $1}' temp`
startPos=`awk '{print $2}' temp`
endPos=`awk '{print $3}' temp`
length=$(expr $endPos - $startPos)
for i in temp; do
echo ">${geneName}" >> genes_fasta
awk -v S=$startPos -v L=$length '{print substr($0,S,L)}' input_file1 >> output_file
done
done
How can I make it work in a loop for more than one string in input file 1?
new input file looks like this:
>genotype1
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
>genotype2
bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb
>genotypen...
nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn...
I would like a different output file for every genotype, with the genotype name used as the file name.
Thank you!
If I'm understanding correctly, would you try the following:
awk '
FNR==NR {
name[NR] = $1
start[NR] = $2
len[NR] = $3 - $2
count = NR
next
}
/^>/ {
sub(/^>/,"")
genotype=$0
next
}
{
for (i = 1; i <= count; i++) {
print ">" name[i] > genotype
print substr($0, start[i], len[i]) >> genotype
}
close(genotype)
}' input_file2 input_file1
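How it works: the FNR==NR block caches the gene names, start positions and lengths from input_file2; each > header line in input_file1 then selects the output file name, and each sequence line writes every gene's substring into that file before close(genotype) releases it.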
input_file1:
>genotype1
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
>genotype2
bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb
>genotype3
nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
Input_file2:
gene1 10 20
gene2 40 50
gene3 20 25
[Results]
genotype1:
>gene1
aaaaaaaaaa
>gene2
aaaaaaaaaa
>gene3
aaaaa
genotype2:
>gene1
bbbbbbbbbb
>gene2
bbbbbbbbbb
>gene3
bbbbb
genotype3:
>gene1
nnnnnnnnnn
>gene2
nnnnnnnnnn
>gene3
nnnnn
[EDIT]
If you want to store the output files to a different directory,
please try the following instead:
dir="./outdir" # directory name to store the output files
# you can modify the name as you want
mkdir -p "$dir"
awk -v dir="$dir" '
FNR==NR {
name[NR] = $1
start[NR] = $2
len[NR] = $3 - $2
count = NR
next
}
/^>/ {
sub(/^>/,"")
genotype=$0
next
}
{
for (i = 1; i <= count; i++) {
print ">" name[i] > dir"/"genotype
print substr($0, start[i], len[i]) >> dir"/"genotype
}
close(dir"/"genotype)
}' input_file2 input_file1
The first two lines are executed in bash to define and create the destination directory.
Then the directory name is passed to awk via the -v option.
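With the sample files above, ls "$dir" should then list genotype1, genotype2 and genotype3.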
Hope this helps.
Could you please try the following, where I am assuming that the Input_file1 lines starting with > should be compared with the 1st column of Input_file2 (the samples are confusing, so this is based on OP's attempt).
awk '
FNR==NR{
start_point[$1]=$2
end_point[$1]=$3
next
}
/^>/{
sub(/^>/,"")
val=$0
next
}
{
print val ORS substr($0,start_point[val],end_point[val]-start_point[val])
val=""
}
' Input_file2 Input_file1
Explanation: an explanation of the above code.
awk ' ##Starting awk program from here.
FNR==NR{ ##Checking condition FNR==NR which will be TRUE when first Input_file named Input_file2 is being read.
start_point[$1]=$2 ##Creating an array named start_point with index $1 of current line and its value is $2.
end_point[$1]=$3 ##Creating an array named end_point with index $1 of current line and its value is $3.
next ##next will skip all further statements from here.
}
/^>/{ ##Checking condition if a line starts from > then do following.
sub(/^>/,"") ##Substituting starting > with NULL.
val=$0 ##Creating a variable val whose value is $0.
next ##next will skip all further statements from here.
}
{
print val ORS substr($0,start_point[val],end_point[val]-start_point[val]) ##Printing val, a newline (ORS) and the substring of the current line starting at start_point[val] with length end_point[val]-start_point[val].
val="" ##Nullifying variable val here.
}
' Input_file2 Input_file1 ##Mentioning Input_file names here.
