AWK merge values and replace strings - bash

I have two files, one separated by semicolons and the other separated by tabs. Both files have a common ID in the first column. First, I want to combine the values in column 3 that share the same ID. Second, I want to replace the string in column 2 of the first file with the string in column 2 of the second file, matching rows by the common ID.
First file:
ID;String;Category;
2;es un anuncio interesante que le puede servir para alguien;321;0;;
3;es un anuncio de un banco que quiere presentarse;72;0;;
4;es un anuncio de un banco que ofrece prestamos para empresas.;52;0;;
4;es un anuncio de un banco que ofrece prestamos para empresas.;70;0;;
5;credito pyme banamex para hacer crecer tu negocio;50;0;;
5;credito pyme banamex para hacer crecer tu negocio;52;0;;
5;credito pyme banamex para hacer crecer tu negocio;70;0;;
5;credito pyme banamex para hacer crecer tu negocio;71;0;;
Second file:
ID String Category;
2 Es un anuncio interesante que le puede servir para alguien.
3 Es un anuncio de un banco que quiere presentarse.
4 Es un anuncio de un banco que ofrece prestamos para empresas.
5 Credito Pyme Banamex para hacer crecer tu negocio.
Desired output:
ID String Category
2 Es un anuncio interesante que le puede servir para alguien. 321
3 Es un anuncio de un banco que quiere presentarse. 72
4 Es un anuncio de un banco que ofrece prestamos para empresas. 52 70
5 Credito Pyme Banamex para hacer crecer tu negocio. 50 52 70 71
What I have done:
awk 'BEGIN { FS=";";} NR==FNR{ CAT[$1]=CAT[$1]"\t"$3; next;}{FS="\t";textos[$1]=$2;} END{ for (ID in CAT) {print ID,textos[ID],CAT[ID];}}' fileA fileB
My Output:
2 Es un anuncio interesante que le puede servir para alguien.
3 Es un anuncio de un banco que quiere presentarse. 72
4 Es un anuncio de un banco que ofrece prestamos para empresas. 52 70
5 Credito Pyme Banamex para hacer crecer tu negocio 50 52 70 71
In the first line, the value of the third column does not appear!

You can use this awk:
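awk -F';' -v OFS='\t' 'FNR==NR {
    # first file (semicolon-separated): collect every category per ID
    a[$1] = a[$1] OFS $3
    next
}
FNR==1 {
    # first line of the second file: switch to tab-separated input and print the header
    FS="\t"
    print
}
$1 in a {
    # known ID: print the ID, the string from the second file, and the collected categories
    print $1, $2 a[$1]
}' file1 file2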
Output:
ID String Category
2 Es un anuncio interesante que le puede servir para alguien. 321
3 Es un anuncio de un banco que quiere presentarse. 72
4 Es un anuncio de un banco que ofrece prestamos para empresas. 52 70
5 Credito Pyme Banamex para hacer crecer tu negocio. 50 52 70 71
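To try the command without the original files, here is a minimal, self-contained sketch. The shortened sample rows and the temporary directory are made up for illustration, and the extra `next` after printing the header is an addition not in the command above, so that the header line can never also match the `$1 in a` rule.

```shell
#!/bin/sh
# Hypothetical sample data, shortened from the question.
tmp=$(mktemp -d)
printf '%s\n' 'ID;String;Category;' \
  '2;es un anuncio;321;0;;' \
  '4;es otro anuncio;52;0;;' \
  '4;es otro anuncio;70;0;;' > "$tmp/file1"
printf 'ID\tString\tCategory\n2\tEs un anuncio.\n4\tEs otro anuncio.\n' > "$tmp/file2"

merged=$(awk -F';' -v OFS='\t' '
FNR==NR { a[$1] = a[$1] OFS $3; next }   # file1: append each category to its ID
FNR==1  { FS = "\t"; print; next }       # file2 header: switch to tabs, print as-is
$1 in a { print $1, $2 a[$1] }           # file2 rows: ID, new string, categories
' "$tmp/file1" "$tmp/file2")

printf '%s\n' "$merged"
rm -rf "$tmp"
```

Each ID ends up on one line, with the string taken from the tab-separated file and all of its categories appended as extra tab-separated columns.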

Related

Analysis on the basis of comparison of 1st column of 1 files with 1st column of N number of files and print all files based of column 1

I have tab-separated files and need to compare FILE_1 with N (10) other files. If an ID in column 1 of the first file matches column 1 of another file, print the line from file 1 followed by the values from the other file; if the ID is not present, print the line from file 1 followed by NA for each column of the other file. Example input and the expected output are given below.
File 1
A 1.1 0.2 0.3 1.1
B 1.3 2.1 0.2 0.1
C 1.8 0.5 2.6 3.8
D 1.2 5.1 1.7 0.1
E 1.9 4.3 2.8 1.6
F 1.6 5.1 2.9 7.1
G 1.8 2.8 0.3 3.7
H 1.9 3.6 3.7 0.1
I 1.0 2.4 4.9 2.5
J 1.1 2.0 0.1 0.4
File 2
A d1 Q2 Q.3 E.1
B a.3 S.1 A.2 R.1
J a.1 2.0 031 4a4
File 3
E 1d9 4a3 2A8 1D6
F 1a.6 5a1 2W9 7Q1
J QA8 1.8 0W3 3E7
File 4
F 1aa 5a 2Q 7WQ
G ac UW 0QW 3aQ
A QQ aws AW qw
I have tried the following code with two files initially, but I am not getting the expected output:
awk '
FILENAME == "File_2" {
id = $0
val[id] = $2","$3","$5
}
FILENAME == "File_1" {
id = $1
string
if (val[id] == "") {
print id " " "NA"
} else {
print id " " val[id]
}
}
' File_2 File_1
The above code prints File_2 with NA at the end of each line.
My expected output looks like this:
Final Expected Output
A 1.1 0.2 0.3 1.1 d1 Q2 Q.3 E.1 NA NA NA NA QQ aws AW qw
B 1.3 2.1 0.2 0.1 a.3 S.1 A.2 R.1 NA NA NA NA NA NA NA NA
C 1.8 0.5 2.6 3.8 NA NA NA NA NA NA NA NA NA NA NA NA
D 1.2 5.1 1.7 0.1 NA NA NA NA NA NA NA NA NA NA NA NA
E 1.9 4.3 2.8 1.6 NA NA NA NA 1d9 4a3 2A8 1D6 NA NA NA NA
F 1.6 5.1 2.9 7.1 NA NA NA NA 1a.6 5a1 2W9 7Q1 1aa 5a 2Q 7WQ
G 1.8 2.8 0.3 3.7 NA NA NA NA NA NA NA NA ac UW 0QW 3aQ
H 1.9 3.6 3.7 0.1 NA NA NA NA NA NA NA NA NA NA NA NA
I 1.0 2.4 4.9 2.5 NA NA NA NA NA NA NA NA NA NA NA NA
J 1.1 2.0 0.1 0.4 a.1 2.0 031 4a4 QA8 1.8 0W3 3E7 NA NA NA NA
Using GNU awk for arrays of arrays, ARGIND, and gensub():
$ cat tst.awk
BEGIN { FS=OFS="\t" }
ARGIND < (ARGC-1) {
    key = $1
    sub("[^"FS"]+"FS"?","")
    fileNrsKeys2vals[ARGIND][key] = $0
    fileNrs2numFlds[ARGIND] = NF
    next
}
{
    printf "%s", $0
    for ( fileNr=1; fileNr<ARGIND; fileNr++ ) {
        if ( fileNr in fileNrs2numFlds ) {
            numFlds = fileNrs2numFlds[fileNr]
            printf "%s", ( $1 in fileNrsKeys2vals[fileNr] ?
                    OFS fileNrsKeys2vals[fileNr][$1] :
                    gensub(/ /,OFS"NA","g",sprintf("%*s",numFlds,"")) )
        }
    }
    print ""
}
$ awk -f tst.awk file2 file3 file4 file1
A 1.1 0.2 0.3 1.1 d1 Q2 Q.3 E.1 NA NA NA NA QQ aws AW qw
B 1.3 2.1 0.2 0.1 a.3 S.1 A.2 R.1 NA NA NA NA NA NA NA NA
C 1.8 0.5 2.6 3.8 NA NA NA NA NA NA NA NA NA NA NA NA
D 1.2 5.1 1.7 0.1 NA NA NA NA NA NA NA NA NA NA NA NA
E 1.9 4.3 2.8 1.6 NA NA NA NA 1d9 4a3 2A8 1D6 NA NA NA NA
F 1.6 5.1 2.9 7.1 NA NA NA NA 1a.6 5a1 2W9 7Q1 1aa 5a 2Q 7WQ
G 1.8 2.8 0.3 3.7 NA NA NA NA NA NA NA NA ac UW 0QW 3aQ
H 1.9 3.6 3.7 0.1 NA NA NA NA NA NA NA NA NA NA NA NA
I 1.0 2.4 4.9 2.5 NA NA NA NA NA NA NA NA NA NA NA NA
J 1.1 2.0 0.1 0.4 a.1 2.0 031 4a4 QA8 1.8 0W3 3E7 NA NA NA NA
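For an awk without arrays of arrays or ARGIND, roughly the same idea can be sketched portably. This is only a sketch under stated assumptions: `merge_na` is a hypothetical helper name, the files are tab-separated, the main file comes last on the command line, and no input file is empty (an empty file would throw off the file counter).

```shell
#!/bin/sh
# merge_na LOOKUP... MAIN: append the columns of each lookup file (or NA) to the rows of MAIN.
merge_na() {
  awk '
  BEGIN { FS = OFS = "\t" }
  FNR == 1 { fileNr++ }                 # counts files; assumes no input file is empty
  fileNr < ARGC - 1 {                   # every file except the last is a lookup file
    key = $1
    width[fileNr] = NF - 1              # number of value columns in this file
    sub(/^[^\t]*\t/, "")                # drop the key, keep the remaining columns
    val[fileNr, key] = $0
    next
  }
  {                                     # last file: print the row plus matches or NAs
    line = $0
    for (f = 1; f < fileNr; f++) {
      if ((f, $1) in val)
        line = line OFS val[f, $1]
      else
        for (i = 1; i <= width[f]; i++)
          line = line OFS "NA"
    }
    print line
  }' "$@"
}

# Hypothetical two-column sample, much smaller than the files in the question.
tmp=$(mktemp -d)
printf 'A\t1\t2\nB\t3\t4\n' > "$tmp/main"
printf 'A\tx\ty\n'          > "$tmp/look1"
printf 'B\tp\tq\n'          > "$tmp/look2"
merged=$(merge_na "$tmp/look1" "$tmp/look2" "$tmp/main")
printf '%s\n' "$merged"
rm -rf "$tmp"
```

Because the main file is read last and printed line by line, the output keeps the main file's row order and needs no sort.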
This solution needs a trailing " | sort" because the iteration order of for (i in array) is not guaranteed in awk. It is also sensitive to the number of spaces immediately following the index letter ("A", "B", "C", etc.), since it extracts the values with substr(...,5):
Mac_3.2.57$cat mergeLinesV0.awk
BEGIN {
i1=1
i2=1
i3=1
i4=1
} NR == FNR {
ar1[i1]=$0
i1=i1+1
f1size=FNR
next
}{
f1done=1
} NR-f1size == FNR && f1done {
ar2[i2]=$0
i2=i2+1
f2size=FNR
next
}{
f2done=1
} NR-f1size-f2size == FNR && f2done {
ar3[i3]=$0
i3=i3+1
f3size=FNR
next
}{
f3done=1
} NR-f1size-f2size-f3size == FNR && f3done {
ar4[i4]=$0
i4=i4+1
f4size=FNR
next
} END {
for(i1 in ar1){
printf("%s ", ar1[i1])
found2=0
for(i2 in ar2){
if(substr(ar1[i1],1,1)==substr(ar2[i2],1,1)){
printf("%s ", substr(ar2[i2],5))
found2=1
}
}
if(!found2){
printf("NA NA NA NA ")
}
found3=0
for(i3 in ar3){
if(substr(ar1[i1],1,1)==substr(ar3[i3],1,1)){
printf("%s ", substr(ar3[i3],5))
found3=1
}
}
if(!found3){
printf("NA NA NA NA ")
}
found4=0
for(i4 in ar4){
if(substr(ar1[i1],1,1)==substr(ar4[i4],1,1)){
printf("%s\n", substr(ar4[i4],5))
found4=1
}
}
if(!found4){
printf("NA NA NA NA\n")
}
}
}
Mac_3.2.57$awk -f mergeLinesV0.awk File1 File2 File3 File4 | sort
A 1.1 0.2 0.3 1.1 d1 Q2 Q.3 E.1 NA NA NA NA QQ aws AW qw
B 1.3 2.1 0.2 0.1 a.3 S.1 A.2 R.1 NA NA NA NA NA NA NA NA
C 1.8 0.5 2.6 3.8 NA NA NA NA NA NA NA NA NA NA NA NA
D 1.2 5.1 1.7 0.1 NA NA NA NA NA NA NA NA NA NA NA NA
E 1.9 4.3 2.8 1.6 NA NA NA NA 1d9 4a3 2A8 1D6 NA NA NA NA
F 1.6 5.1 2.9 7.1 NA NA NA NA 1a.6 5a1 2W9 7Q1 1aa 5a 2Q 7WQ
G 1.8 2.8 0.3 3.7 NA NA NA NA NA NA NA NA ac UW 0QW 3aQ
H 1.9 3.6 3.7 0.1 NA NA NA NA NA NA NA NA NA NA NA NA
I 1.0 2.4 4.9 2.5 NA NA NA NA NA NA NA NA NA NA NA NA
J 1.1 2.0 0.1 0.4 a.1 2.0 031 4a4 QA8 1.8 0W3 3E7 NA NA NA NA
Mac_3.2.57$cat File1
A 1.1 0.2 0.3 1.1
B 1.3 2.1 0.2 0.1
C 1.8 0.5 2.6 3.8
D 1.2 5.1 1.7 0.1
E 1.9 4.3 2.8 1.6
F 1.6 5.1 2.9 7.1
G 1.8 2.8 0.3 3.7
H 1.9 3.6 3.7 0.1
I 1.0 2.4 4.9 2.5
J 1.1 2.0 0.1 0.4
Mac_3.2.57$cat File2
A d1 Q2 Q.3 E.1
B a.3 S.1 A.2 R.1
J a.1 2.0 031 4a4
Mac_3.2.57$cat File3
E 1d9 4a3 2A8 1D6
F 1a.6 5a1 2W9 7Q1
J QA8 1.8 0W3 3E7
Mac_3.2.57$cat File4
F 1aa 5a 2Q 7WQ
G ac UW 0QW 3aQ
A QQ aws AW qw
Mac_3.2.57$
Given your 4 example files (as file1.txt .. file4.txt), here is a Ruby one-liner that does that:
ruby -lne '
BEGIN{
  require "set"   # Set is only loaded automatically from Ruby 3.2 on
  files={}
  seen=Set.new()
  data=Hash.new { |h, k| h[k] = Hash.new { |hh, kk| hh[kk] = [] } }
}
fields=$_.split(/\t/)
# first line of each input file: remember the file and its number of value columns
if $<.file.lineno==1; files[$<.file.path]=fields.length-1; end
seen<<fields[0]
data[fields[0]][files.keys.last]=fields[1..]
END{
  seen.each{|k| row=[k]
    files.each{|file, width|
      if data[k][file].empty?
        row.push(*["NA"]*width)
      else
        row.push(*data[k][file])
      end
    }
    puts row.join("\t")
  }
}' file?.txt
Prints:
A 1.1 0.2 0.3 1.1 d1 Q2 Q.3 E.1 NA NA NA NA QQ aws AW qw
B 1.3 2.1 0.2 0.1 a.3 S.1 A.2 R.1 NA NA NA NA NA NA NA NA
C 1.8 0.5 2.6 3.8 NA NA NA NA NA NA NA NA NA NA NA NA
D 1.2 5.1 1.7 0.1 NA NA NA NA NA NA NA NA NA NA NA NA
E 1.9 4.3 2.8 1.6 NA NA NA NA 1d9 4a3 2A8 1D6 NA NA NA NA
F 1.6 5.1 2.9 7.1 NA NA NA NA 1a.6 5a1 2W9 7Q1 1aa 5a 2Q 7WQ
G 1.8 2.8 0.3 3.7 NA NA NA NA NA NA NA NA ac UW 0QW 3aQ
H 1.9 3.6 3.7 0.1 NA NA NA NA NA NA NA NA NA NA NA NA
I 1.0 2.4 4.9 2.5 NA NA NA NA NA NA NA NA NA NA NA NA
J 1.1 2.0 0.1 0.4 a.1 2.0 031 4a4 QA8 1.8 0W3 3E7 NA NA NA NA
This produces exact expected output:
gawk '{
    a[$1] = 1
    for (i = 2; i <= 5; ++i)
        b[$1, (ARGIND - 1) * 4 + (i - 2)] = $i
}
END {
    PROCINFO["sorted_in"] = "@ind_str_asc";
    for (i in a) {
        t = i
        for (j = 0; j < ARGIND * 4; ++j)
            t = t OFS (b[i, j] ? b[i, j] : "NA")
        print t
    }
}' File_{1..4} | column -t

making four new columns based on 8 existing columns

Below you can see the reproduced sample of my data.
DATA <- structure(list(ID = c("101", "101", "101", "101", "101", "101","101", "101", "101", "101"), IDA = c("1", "1", "2", "3", "4","5", "5", "1859", "1860", "1861"), DATE = structure(c(1300928400,1277946000, 1277946000, 1278550800, 1278550800, 1453770000, 1329958800,1506474000, 1485133200, 1485133200), tzone = "UTC", class = c("POSIXct","POSIXt")), NR = c("CH-0001", "CH-0001","CH-0002", "CH-0003", "CH-0004", "CH-0005","CH-0005", "CH-1859", "CH-1860", "CH-1861"), PAT = c("101-1", "101-1", "101-2", "101-3", "101-4", "101-5","101-5", "101-1859", "101-1860", "101-1861"), INT1 = c(245005,280040, 280040, 280040, 280040, 240040, 240040, NA, NA, NA),INT2 = c(NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,NA_real_, NA_real_, NA_real_, NA_real_, NA_real_), INT3 = c(NA_real_,NA_real_, 280010, NA_real_, NA_real_, NA_real_, NA_real_,NA_real_, 245035, NA_real_), INT4 = c(NA_real_, NA_real_,NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,NA_real_, NA_real_), INTX1 = c(NA_real_, 275040, NA_real_,NA_real_, NA_real_, NA_real_, 240080, NA_real_, NA_real_,NA_real_), INTX2 = c(276790, NA_real_, 7612645, NA_real_,NA_real_, NA_real_, 5078219, NA_real_, NA_real_, NA_real_), INTX173 = c(NA_real_, NA_real_, NA_real_, 3456878,NA_real_, NA_real_, 3289778, NA_real_, NA_real_, NA_real_), INTX4 = c(NA_real_, NA_real_, 11198767, NA_real_,NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, 7025676), KAT = c(0, 0, 0, 0, 0, 0, 0, 1, 1, 1)), row.names = c(NA,-10L), class = c("tbl_df", "tbl", "data.frame"))
As you can see, I have eight columns called INT1:INT4 and INTX1:INTX4. In each row, at most four of these variables have a value and the rest are NAs. I need to create four new variables called ING1:ING4 and tell R to check the 8 columns one by one per row, assigning the first value it finds in that row to ING1, the second value to ING2, the third value to ING3, and the fourth value to ING4. In the end, a row may have all, some, or none of the ING1:ING4 columns filled with values.
For row 1, I would expect the following ING columns:
ING1 == 245005, ING2 == 276790, ING3 == NA, ING4 == NA
I think I need to write a loop for that, but as a beginner I am lost as to how to do it. Could you kindly help me with it?
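Try this. Note that it sorts each row's values in ascending order (NAs last), so ING1:ING4 are ordered by value rather than by column position (compare rows 2 and 7 in the output below):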
fun <- function(select, prefix = "ING", ncol = -1, data = cur_data()) {
select <- substitute(select)
out <- asplit(t(
apply(subset(data, select = eval(select)), 1, sort, na.last = TRUE)
), 2)
names(out) <- paste0(prefix, seq_along(out))
if (ncol > 0) out <- out[seq_len(ncol)]
do.call(data.frame, out)
}
And its use:
dplyr
library(dplyr)
DATA %>%
mutate(fun(INT1:INTX4, ncol=4))
# # A tibble: 10 × 18
# ID IDA DATE NR PAT INT1 INT2 INT3 INT4 INTX1 INTX2 INTX173 INTX4 KAT ING1 ING2 ING3 ING4
# <chr> <chr> <dttm> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 101 1 2011-03-24 01:00:00 CH-0001 101-1 245005 NA NA NA NA 276790 NA NA 0 245005 276790 NA NA
# 2 101 1 2010-07-01 01:00:00 CH-0001 101-1 280040 NA NA NA 275040 NA NA NA 0 275040 280040 NA NA
# 3 101 2 2010-07-01 01:00:00 CH-0002 101-2 280040 NA 280010 NA NA 7612645 NA 11198767 0 280010 280040 7612645 11198767
# 4 101 3 2010-07-08 01:00:00 CH-0003 101-3 280040 NA NA NA NA NA 3456878 NA 0 280040 3456878 NA NA
# 5 101 4 2010-07-08 01:00:00 CH-0004 101-4 280040 NA NA NA NA NA NA NA 0 280040 NA NA NA
# 6 101 5 2016-01-26 01:00:00 CH-0005 101-5 240040 NA NA NA NA NA NA NA 0 240040 NA NA NA
# 7 101 5 2012-02-23 01:00:00 CH-0005 101-5 240040 NA NA NA 240080 5078219 3289778 NA 0 240040 240080 3289778 5078219
# 8 101 1859 2017-09-27 01:00:00 CH-1859 101-1859 NA NA NA NA NA NA NA NA 1 NA NA NA NA
# 9 101 1860 2017-01-23 01:00:00 CH-1860 101-1860 NA NA 245035 NA NA NA NA NA 1 245035 NA NA NA
# 10 101 1861 2017-01-23 01:00:00 CH-1861 101-1861 NA NA NA NA NA NA NA 7025676 1 7025676 NA NA NA
base R
cbind(DATA, fun(data = DATA, INT1:INTX4, ncol=4))
# ID IDA DATE NR PAT INT1 INT2 INT3 INT4 INTX1 INTX2 INTX173 INTX4 KAT ING1 ING2 ING3 ING4
# 1 101 1 2011-03-24 01:00:00 CH-0001 101-1 245005 NA NA NA NA 276790 NA NA 0 245005 276790 NA NA
# 2 101 1 2010-07-01 01:00:00 CH-0001 101-1 280040 NA NA NA 275040 NA NA NA 0 275040 280040 NA NA
# 3 101 2 2010-07-01 01:00:00 CH-0002 101-2 280040 NA 280010 NA NA 7612645 NA 11198767 0 280010 280040 7612645 11198767
# 4 101 3 2010-07-08 01:00:00 CH-0003 101-3 280040 NA NA NA NA NA 3456878 NA 0 280040 3456878 NA NA
# 5 101 4 2010-07-08 01:00:00 CH-0004 101-4 280040 NA NA NA NA NA NA NA 0 280040 NA NA NA
# 6 101 5 2016-01-26 01:00:00 CH-0005 101-5 240040 NA NA NA NA NA NA NA 0 240040 NA NA NA
# 7 101 5 2012-02-23 01:00:00 CH-0005 101-5 240040 NA NA NA 240080 5078219 3289778 NA 0 240040 240080 3289778 5078219
# 8 101 1859 2017-09-27 01:00:00 CH-1859 101-1859 NA NA NA NA NA NA NA NA 1 NA NA NA NA
# 9 101 1860 2017-01-23 01:00:00 CH-1860 101-1860 NA NA 245035 NA NA NA NA NA 1 245035 NA NA NA
# 10 101 1861 2017-01-23 01:00:00 CH-1861 101-1861 NA NA NA NA NA NA NA 7025676 1 7025676 NA NA NA

Reformating data with awk command depending on a if condition

I have a script that reformats the content of a source file into a target file.
It does this for every file in a directory.
Here is an example source file:
TABLE;APGFPOLI;
Contrat;CHAR(16);Numéro du contrat
Libelle;CHAR(30);Libellé du contrat
DtCreation;CHAR(8);Date de création
DtMaj;CHAR(8);Date de dernière MAJ
DtEffet;CHAR(8);Date d'effet adhésion
MotifAdh;CHAR(2);Motif d'adhésion
DtRadiation;CHAR(8);Date de radiation
DtEnrRad;CHAR(8);Date enregistrement radiat
MotifRad;CHAR(2);Motif de radiation
MtPrime;Numérique 8.2;Montant prime d'origine
DtEffetSusp;CHAR(8);Date d'effet de suspension
DtFinSusp;CHAR(8);Date de fin de suspension
MotifSusp;CHAR(2);Motif de suspension
DestBord;CHAR(1);Destinataire du bordereau
CdDest;CHAR(5);Code du destinataire
NivRupBord;CHAR(1);Niveau rupture bordereau
BordCETIP;CHAR(1);Bordereau CTIP
EnvBordNom;CHAR(1);Envoi bordereau nominatif
Indice;CHAR(2);Indice appliqué
Echeance;CHAR(2);Echéance de l'indice (MM)
Effectif;CHAR(5);Effectif
CdRegr;CHAR(3);Code regroupement 1
CdGroupe;CHAR(3);Code regroupement 2
Periodicite;CHAR(1);Périodicité
Terme;CHAR(1);Terme
Produit;CHAR(6);Code produit affecté
Inspecteur;CHAR(5);Inspecteur
CleInsp;CHAR(1);Clé inspecteur
Filler;CHAR(6);Filler
And here is the target file generated by the shell script:
01 APGFPOLI.
* Numéro du contrat.
05 Contrat PIC X(16).
* Libellé du contrat.
05 Libelle PIC X(30).
* Date de création.
05 DtCreation PIC X(8).
* Date de dernière MAJ.
05 DtMaj PIC X(8).
* Date d'effet adhésion.
05 DtEffet PIC X(8).
* Motif d'adhésion.
05 MotifAdh PIC X(2).
* Date de radiation.
05 DtRadiation PIC X(8).
* Date enregistrement radiat.
05 DtEnrRad PIC X(8).
* Motif de radiation.
05 MotifRad PIC X(2).
* Montant prime d'origine.
05 MtPrime Numérique 8.2.
* Date d'effet de suspension.
05 DtEffetSusp PIC X(8).
* Date de fin de suspension.
05 DtFinSusp PIC X(8).
* Motif de suspension.
05 MotifSusp PIC X(2).
* Destinataire du bordereau.
05 DestBord PIC X(1).
* Code du destinataire.
05 CdDest PIC X(5).
* Niveau rupture bordereau.
05 NivRupBord PIC X(1).
* Bordereau CTIP.
05 BordCETIP PIC X(1).
* Envoi bordereau nominatif.
05 EnvBordNom PIC X(1).
* Indice appliqué.
05 Indice PIC X(2).
* Echéance de l'indice (MM).
05 Echeance PIC X(2).
* Effectif.
05 Effectif PIC X(5).
* Code regroupement 1.
05 CdRegr PIC X(3).
* Code regroupement 2.
05 CdGroupe PIC X(3).
* Périodicité.
05 Periodicite PIC X(1).
* Terme.
05 Terme PIC X(1).
* Code produit affecté.
05 Produit PIC X(6).
* Inspecteur.
05 Inspecteur PIC X(5).
* Clé inspecteur.
05 CleInsp PIC X(1).
* Filler.
05 Filler PIC X(6).
What I am trying to do is change this line:
MtPrime;Numérique 8.2;Montant prime d'origine
into this:
05 MtPrime PIC 9(8).v9(2).
As you can see, "Numérique X.Y" becomes "PIC 9(X).v9(Y)".
The rule is: if there is only one number X after "Numérique", it must be reformatted like this:
"PIC 9(X)"
But if there is a number X, a dot, and another number Y, it must be printed like this:
"PIC 9(X).v9(Y)"
I am a complete beginner with awk and have no clue how to achieve this.
Here is my shell script:
#!/bin/bash
SOURCE_DIRECTORY="/home/yha/AG2R/SOURCE/*"
TARGET_DIRECTORY="/home/yha/AG2R/COPY/"
for f in $SOURCE_DIRECTORY
do
b=$(basename "$f")
echo "Processing $f file..";
awk -F ';' '$1=="TABLE" && $3=="" {printf "01 %s.\n\n", $2; next} {sub(/CHAR/,"PIC X", $2);printf " * %s.\n\n 05 %s %s.\n\n", $3, $1, $2;}' "$f" > "$TARGET_DIRECTORY/$b.cpy"
done
For the awk part, you might use a regex to check for the word followed by digits and an optional part consisting of a dot and more digits.
^Numérique [0-9]+(\.[0-9]+)?$
If there is a match, you can split the field on a space or a dot. Then you can assemble the string to print, starting with the 2nd entry of the resulting array and appending the 3rd entry if there is one.
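The check-and-split step can be tried in isolation on single type strings. A minimal sketch, where `to_pic` is a hypothetical helper name and the input is assumed to use the same encoding as the script (so that "é" matches byte for byte):

```shell
#!/bin/sh
# to_pic TYPE: print the COBOL picture clause for one type string (hypothetical helper).
to_pic() {
  printf '%s\n' "$1" | awk '
  {
    result = $0
    if ($0 ~ /^Numérique [0-9]+(\.[0-9]+)?$/) {
      n = split($0, a, /[ .]/)          # "Numérique 8.2" -> a[1]="Numérique", a[2]="8", a[3]="2"
      result = "PIC 9(" a[2] ")"
      if (n == 3)                       # a third piece means there was a ".Y" part
        result = result ".v9(" a[3] ")"
    }
    sub(/CHAR/, "PIC X", result)        # non-numeric types: CHAR(n) -> PIC X(n)
    print result
  }'
}

pic_decimal=$(to_pic 'Numérique 8.2')
pic_integer=$(to_pic 'Numérique 5')
pic_char=$(to_pic 'CHAR(8)')
printf '%s\n' "$pic_decimal" "$pic_integer" "$pic_char"
```

Counting the pieces returned by split() is what distinguishes the "X" case from the "X.Y" case, so no second regex is needed.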
The data
$cat file
TABLE;APGFPOLI;
DtEnrRad;CHAR(8);Date enregistrement radiat
MotifRad;CHAR(2);Motif de radiation
MtPrime;Numérique 8.2;Montant prime d'origine
MtPrime;Numérique 5;Montant prime d'origine
DtEffetSusp;CHAR(8);Date d'effet de suspension
awk script
awk -F ';' '
$1=="TABLE" && $3=="" {
    printf "01 %s.\n\n", $2;
    next
}
{
    result = $2
    if ($2 ~ /^Numérique [0-9]+(\.[0-9]+)?$/) {
        nr=split($2,a,"[ .]")
        result = "PIC 9(" a[2] ")"
        if (nr == 3) {
            result = result ".v9(" a[3] ")"
        }
    }
    sub(/CHAR/,"PIC X", result);
    printf " * %s.\n\n 05 %s %s.\n\n", $3, $1, result;
}' file
Output
01 APGFPOLI.
* Date enregistrement radiat.
05 DtEnrRad PIC X(8).
* Motif de radiation.
05 MotifRad PIC X(2).
* Montant prime d'origine.
05 MtPrime PIC 9(8).v9(2).
* Montant prime d'origine.
05 MtPrime PIC 9(5).
* Date d'effet de suspension.
05 DtEffetSusp PIC X(8).

Shell: printf alignment is inconsistent for non-ASCII text

I'm writing a makefile that creates directories and displays a special message if the newly created dir is empty. This is done with a macro:
CREADIRVACIO = @printf "Creando directorio %-50s ¡DIRECTORIO VACÍO!\n"
With this macro, I print the message with the appropriate directory name:
$(CREADIRVACIO) "Fundamentos de programación"
@mkdir $(FP_OUT)
My problem here is that the output part that says ¡DIRECTORIO VACÍO! should be aligned in the same column. It's not:
➜ make
==========================
# COMENZANDO COMPILACIÓN #
==========================
Eliminando compilación anterior
Creando raíz de archivos compilados
Creando directorio 1º 1er cuatrimestre
Creando directorio Álgebra lineal y estructuras matemáticas ¡DIRECTORIO VACÍO!
Creando directorio Cálculo ¡DIRECTORIO VACÍO!
Creando directorio Fundamentos físicos y tecnológicos ¡DIRECTORIO VACÍO!
Creando directorio Fundamentos de programación ¡DIRECTORIO VACÍO!
Creando directorio Fundamentos del software ¡DIRECTORIO VACÍO!
Creando directorio 1º 2º cuatrimestre
Creando directorio Estadística ¡DIRECTORIO VACÍO!
Creando directorio Ingeniería, empresa y sociedad ¡DIRECTORIO VACÍO!
Creando directorio Lógica y métodos discretos ¡DIRECTORIO VACÍO!
Creando directorio Metodología de la programación ¡DIRECTORIO VACÍO!
Creando directorio Tecnología y organización de los computadores ¡DIRECTORIO VACÍO!
Creando directorio 2º 1er cuatrimestre
Creando directorio Estructura de computadores ¡DIRECTORIO VACÍO!
Creando directorio Estructura de datos ¡DIRECTORIO VACÍO!
Creando directorio Programación y diseño orientado a objetos ¡DIRECTORIO VACÍO!
Creando directorio Sistemas concurrentes y distribuidos ¡DIRECTORIO VACÍO!
Creando directorio Sistemas operativos ¡DIRECTORIO VACÍO!
Creando directorio 2º 2º cuatrimestre
Creando directorio Arquitectura de computadores ¡DIRECTORIO VACÍO!
Creando directorio Algorítmica ¡DIRECTORIO VACÍO!
Creando directorio Fundamentos de bases de datos ¡DIRECTORIO VACÍO!
Creando directorio Fundamentos de ingeniería del software ¡DIRECTORIO VACÍO!
Creando directorio Inteligencia artificial ¡DIRECTORIO VACÍO!
Creando directorio 3º 1er cuatrimestre
Creando directorio Diseño y desarrollo de sistemas de información ¡DIRECTORIO VACÍO!
Creando directorio Fundamentos de redes ¡DIRECTORIO VACÍO!
Creando directorio Informática gráfica ¡DIRECTORIO VACÍO!
Creando directorio Ingeniería de servidores ¡DIRECTORIO VACÍO!
Creando directorio Modelos de computación ¡DIRECTORIO VACÍO!
What am I doing wrong?
Thanks!
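A likely cause, offered as an assumption since the question does not say which shell make runs: bash's builtin printf and GNU printf count the `%-50s` field width in bytes, not characters, so every accented character (two bytes in UTF-8) costs one column of padding. Below is a minimal sketch of a workaround that pads by character count instead. `utf8_len` and `pad_right` are hypothetical helper names, the input is assumed to be UTF-8, and combining marks or double-width characters are not handled.

```shell
#!/bin/sh
# utf8_len TEXT: count characters by deleting UTF-8 continuation bytes (0x80-0xBF).
utf8_len() {
  printf '%s' "$1" | LC_ALL=C tr -d '\200-\277' | wc -c | tr -d ' '
}

# pad_right TEXT WIDTH: print TEXT followed by spaces up to WIDTH characters.
pad_right() {
  pad=$(( $2 - $(utf8_len "$1") ))
  [ "$pad" -gt 0 ] || pad=0
  printf '%s%*s' "$1" "$pad" ''
}

for name in 'Cálculo' 'Estadística' 'Algorítmica'; do
  printf 'Creando directorio %s ¡DIRECTORIO VACÍO!\n' "$(pad_right "$name" 20)"
done
```

Inside a makefile recipe every `$` meant for the shell has to be written as `$$`, so it may be simpler to keep these helpers in a small script that the recipe calls.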

How to sort lines based on first column ONLY?

Given input.txt :
12 pas
24 chinois
3 22
67 Il
32 Mais
4 héritier
155 vers
56 troupes
5 L
2 83
97 an
My sorting command :
sort -nr ./input.txt > ./out.txt
I get :
3 22
2 83
155 vers
97 an
67 Il
56 troupes
32 Mais
24 chinois
12 pas
5 L
4 héritier
How can I get this instead?
155 vers
97 an
67 Il
56 troupes
32 Mais
24 chinois
12 pas
5 L
4 héritier
3 22
2 83
Use -t and -k, like this:
sort -n -r -t':' -k1,1 input.txt > out.txt
It returns :
155 vers
97 an
67 Il
56 troupes
32 Mais
24 chinois
12 pas
5 L
4 héritier
3 22
2 83
Explanation:
-n: Numeric sort
-r: Reverse (descending)
-t: Changes field separator to ':' character
-k1,1: Sort key starts on field 1 and ends on field 1
Thanks to the question "bash output lines sorted by descending number".
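One caveat: the sample lines contain no ':' character, so with -t':' each whole line counts as field 1. Here is a minimal sketch, assuming the columns are separated by blanks, that restricts the numeric key to the first column using the default separator. LC_ALL=C is an added assumption, there to keep locale-specific number parsing (such as a space used as a thousands separator) out of the picture.

```shell
#!/bin/sh
# Sort numerically, descending, on the first blank-separated field only.
sorted=$(printf '%s\n' '12 pas' '3 22' '155 vers' '2 83' '4 héritier' |
  LC_ALL=C sort -k1,1nr)
printf '%s\n' "$sorted"
```

With the key limited to field 1, a line such as "3 22" is compared as 3, regardless of what follows in the second column.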

Resources