Awk multiple file manipulation - shell

Ok, let's try this again.
How can I open multiple files within AWK, and then just print them all to standard output? The following prints only the first line of each file.
BEGIN {
}
{
    $file = $1;
    (getline < $file)
    print $0;
}
awk -f program.awk myindex
where myindex is a list of files:
file1
file2
file3
file4
An example of file1:
rigrg
gdfgbt
rfghrth
thfg
bhtd
ht
hthrtjhrth
rtg
rthhrthrt
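A side note on the symptom: each getline < file call reads exactly one line, and the program above calls it once per index record, hence one line per file. A minimal sketch of the loop-until-EOF fix, assuming each line of myindex names one file:
{
    file = $1
    while ((getline line < file) > 0)   # getline returns 1 per line, 0 at EOF, -1 on error
        print line
    close(file)                         # close before reading the next file
}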

It sounds like you need something like this:
awk '
NR == FNR { ARGV[ARGC++]=$0; next }
FNR == 1 { found=0 }
$2 == "motd" { found=1 }
found
$1 == "customer" { nextfile }
' myindex
Untested, of course, since you didn't provide testable sample input/output. The above uses GNU awk for nextfile; with other awks, replace nextfile with found=0; next, as spelled out below.
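The portable variant with that substitution applied (equally untested):
awk '
NR == FNR { ARGV[ARGC++]=$0; next }   # pass 1: queue every listed file as an argument
FNR == 1 { found=0 }                  # reset the flag at the top of each file
$2 == "motd" { found=1 }
found
$1 == "customer" { found=0; next }    # POSIX stand-in for GNU nextfile
' myindex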

I'll propose a different approach since getline use needs to be very precise...
$ awk '/motd/{p=1} /Customer/{p=0} p' $(awk '{print $0".info"}' index)
motd
good stuff 1
good stuff 1
motd
good stuff 2
good stuff 2
motd
good stuff 3
good stuff 3
The inner command substitution prepares the file names as arguments to the main script. I added the 1/2/3 suffixes to show that the data comes from the corresponding file, where:
==> index <==
one
two
three
==> one.info <==
blah
blah
blah
motd
good stuff 1
good stuff 1
Customer
blah
blah
end
==> three.info <==
blah
blah
blah
motd
good stuff 3
good stuff 3
Customer
blah
blah
end
==> two.info <==
blah
blah
blah
motd
good stuff 2
good stuff 2
Customer
blah
blah

To print the lines between motd and Customer from all the files listed in the index file, a cat + sed pipeline:
cat index | xargs -I {} sed -n '/^motd$/,/^Customer$/{/^motd$/d; /^Customer$/d;p}' {}".info"
The above outputs the needed lines, excluding the pattern lines themselves.
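Note that the awk version earlier in this thread prints the motd boundary line itself (it appears in its output above); to exclude both boundary lines there as well, one sketch is to reorder the tests:
awk '/Customer/{p=0} p; /motd/{p=1}' $(awk '{print $0".info"}' index)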

Related

add ### at the beginning of a line if there is a match with the content of a list of strings in another file

I have a file with some strings; I need to grep for these strings in another file and, where they match, add ### at the beginning of the matching line.
Assuming this file (1.txt) is the file with the strings:
123
456
789
and this file (2.txt) is the one where the ### should be added:
mko 123 nhy
zaq rte vfr
cde nbv 456
789 bbb aaa
ooo www qqq
I'm expecting this output:
###mko 123 nhy
zaq rte vfr
###cde nbv 456
###789 bbb aaa
ooo www qqq
I've already tried the following without success:
cat 1.txt |while read line ; do sed '/^$line/s/./###&/' 2.txt >2.txt.out; done
With your shown samples, please try the following awk code.
awk '
FNR==NR{
  arr[$0]
  next
}
{
  for(i=1;i<=NF;i++){
    if($i in arr){
      $0="###" $0
      break
    }
  }
}
1
' 1.txt 2.txt
Explanation:
awk '                    ##Starting awk program from here.
FNR==NR{                 ##Condition is true while 1.txt is being read.
  arr[$0]                ##Creating array arr with the current line as index.
  next                   ##next skips all further statements from here.
}
{
  for(i=1;i<=NF;i++){    ##Traversing through all fields.
    if($i in arr){       ##If the current field is present in arr,
      $0="###" $0        ##add ### before the current line.
      break
    }
  }
}
1                        ##Printing the current edited/non-edited line.
' 1.txt 2.txt            ##Mentioning Input_file names here.
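Run against the samples above, it produces the expected output:
$ awk 'FNR==NR{arr[$0]; next} {for(i=1;i<=NF;i++) if($i in arr){$0="###" $0; break}} 1' 1.txt 2.txt
###mko 123 nhy
zaq rte vfr
###cde nbv 456
###789 bbb aaa
ooo www qqq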
This might work for you (GNU sed):
sed 's|.*|/&/s/^#*/###/|' 1.txt | sed -f - 2.txt
Create a sed script from 1.txt and run it against 2.txt. (The # characters in the generated commands mean the outer s command needs a delimiter other than #; | is used here.)
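For the sample 1.txt, the first sed emits this script, which the second sed then applies line by line to 2.txt:
/123/s/^#*/###/
/456/s/^#*/###/
/789/s/^#*/###/
The ^#* pattern makes the substitution idempotent: already-marked lines are not marked twice.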
$ while read -r line; do sed -i "/\<$line\>/s/^/###/" 2.txt; done < 1.txt
$ cat 2.txt
###mko 123 nhy
zaq rte vfr
###cde nbv 456
###789 bbb aaa
ooo www qqq

Complex csv question: how to generate a final csv after comparing multiple csvs (in the following manner) using shell scripting?

Assume
file1.csv
Schemaname.tablename.columns
exam1
exam2
filetomatch.csv
exam1
exam2
exam4
exam5
exam6
I used
awk 'NR==FNR{a[$1];next} ($1) in a' file1.csv filetomatch.csv >> result.csv
to match the results (each time one csv is produced). Result:
exam1
exam2
I have n files to compare to filetomatch.csv. I need the output to be as follows:
file    matchedcolumns
file1   exam1
        exam2
file2   exam4
.
.
.
filen   exam2
        exam3
and so on...
How can I concatenate the result.csvs each time, with the first field set to the file name?
Also, is there a way to show the null columns as well? How can I add null values using this?
Example
File1 Column1
File1 Column1
File2 null
File3 column3
and so on
>> result.csv should be doing the concatenation for you.
for example, create test files
$ for i in {1..4}; do echo $i > file$i.txt; done
$ head file?.txt
==> file1.txt <==
1
==> file2.txt <==
2
==> file3.txt <==
3
==> file4.txt <==
4
Run some awk script on all the files, print the filename as part of the output, and concatenate the results:
$ for f in file{1..4}.txt; do awk '{print FILENAME, $0}' "$f" >> results.csv; done
$ cat results.csv
file1.txt 1
file2.txt 2
file3.txt 3
file4.txt 4
Found these two useful:
awk 'NR==FNR{a[$1];next}($1) in a{ print FILENAME, ($1) }' file1.csv filetomatch.csv
Merge the common values in a column:
awk -F, '{ if (f == $1) { for (c=0; c <length($1) ; c++) printf " "; print FS $2 FS $3 } else { print $0 } } { f = $1 }' file.csv
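The null-column part of the question went unanswered above; here is a hedged sketch (assumption: one awk pass per fileN.csv, emitting the file's name with null when none of its columns matched):
for f in file1.csv file2.csv; do            # extend the list to all n files
  awk -v name="$f" '
    NR==FNR { a[$1]; next }                 # remember the columns of this file
    $1 in a { print name, $1; m=1 }         # matched: tag with the file name
    END     { if (!m) print name, "null" }  # no match at all: emit a null row
  ' "$f" filetomatch.csv
done >> result.csv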

cat multiple files into one using same amount of rows as file B from A B C

This is a strange question; I have been looking around and wasn't able to find anything matching what I wish to do.
What I'm trying to do is:
File A, File B, File C
5 Lines, 3 Lines, 2 Lines.
Join all files in one file matching the same amount of the file B
The output should be
File A, File B, File C
3 Lines, 3 Lines, 3 Lines.
So in file A I have to remove two lines, and in file C I have to duplicate one line, so that both match the line count of file B.
I was thinking to do a count to see how many lines each file has first
count1=`wc -l FileA| awk '{print $1}'`
count2=`wc -l FileB| awk '{print $1}'`
count3=`wc -l FileC| awk '{print $1}'`
Then, if a count is greater than file B's, remove lines; otherwise, add lines.
But I got lost, as I'm not sure how to continue with this; I've never seen anyone try to do this.
Can anyone point me to an idea?
thanks.
Could you please try the following. I have used # as a separator; you could change it as per your need.
paste -d'#' file1 file2 file3 |
awk -v file2_lines="$(wc -l < file2)" '
BEGIN{
  FS=OFS="#"
}
FNR<=file2_lines{
  $1=$1?$1:prev_first
  $3=$3?$3:prev_third
  print
  prev_first=$1
  prev_third=$3
}'
Example of running the above code. Let's say the following are the Input_file(s):
cat file1
File1_line1
File1_line2
File1_line3
File1_line4
File1_line5
cat file2
File2_line1
File2_line2
File2_line3
cat file3
File3_line1
File3_line2
When I run the above code as a script, the following is the output:
./script.ksh
File1_line1#File2_line1#File3_line1
File1_line2#File2_line2#File3_line2
File1_line3#File2_line3#File3_line2
You can get the first n lines of a file with the head command (or with sed), and you can generate new lines with echo:
#!/bin/bash
fix_numlines() {
  local filename=$1
  local wantlines=$2
  local havelines=$(grep -c . "${filename}")
  head -${wantlines} "${filename}"
  if [ $havelines -lt $wantlines ]; then
    for i in $(seq $((wantlines-havelines))); do echo; done
  fi
}

lines=$(grep -c . fileB)
fix_numlines fileA ${lines}
fix_numlines fileB ${lines}
fix_numlines fileC ${lines}
If you want columnar output, it's even simpler:
paste fileA fileB fileC | head -$(grep -c . fileB)
Another for GNU awk that outputs in columns:
$ gawk -v seed=$RANDOM -v n=2 '   # the n parameter is the file index number
BEGIN {                           # ... which defines the record count
  srand(seed)                     # a random record is printed when there are not enough records
}
{
  a[ARGIND][c[ARGIND]=FNR]=$0     # hash all data to a first
}
END {
  for(r=1;r<=c[n];r++)            # loop records
    for(f=1;f<=ARGIND;f++)        # and fields for below output
      printf "%s%s",((r in a[f])?a[f][r]:a[f][int(rand()*c[f])+1]),(f==ARGIND?ORS:OFS)
}' a b c                          # -v n=2 means the second file, ie. b
Output:
a1 b1 c1
a2 b2 c2
a3 b3 c1
If you don't like the random pick of a record, replace int(rand()*c[f])+1 with c[f].
$ gawk '                   # remember, GNU awk only
NR==FNR {                  # count the given file's records
  bnr=FNR
  next
}
{
  print                    # output the records of a b c
  if(FNR==bnr)             # ... up to bnr records
    nextfile               # and skip to the next file
}
ENDFILE {                  # when you get to the end of a file
  if(bnr>FNR)              # but bnr was not reached
    for(i=FNR;i<bnr;i++)   # loop some
      print                # and duplicate the last record of the file
}' b a b c                 # first the file to count, then all the files to print
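Assuming the same sample files as the columnar version (a holding a1, a2, a3, ..., b holding b1..b3, c holding c1 and c2), the output comes out as sequential blocks, with c's last record duplicated:
a1
a2
a3
b1
b2
b3
c1
c2
c2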
To make a file have n lines you can use the following function (usage: toLength n file). This omits lines at the end if the file is too long and repeats the last line if the file is too short.
toLength() {
{ head -n"$1" "$2"; yes "$(tail -n1 "$2")"; } | head -n"$1"
}
To set all files to the length of FileB and show them side by side use
n="$(wc -l < FileB)"
paste <(toLength "$n" FileA) FileB <(toLength "$n" FileC) | column -ts$'\t'
As observed by the user umläute, the side-by-side output makes things even easier. However, they used empty lines to pad out short files. The following solution repeats the last line to make short files longer.
stretch() {
cat "$1"
yes "$(tail -n1 "$1")"
}
paste <(stretch FileA) FileB <(stretch FileC) | column -ts$'\t' |
head -n"$(wc -l < FileB)"
This is a clean way using awk where we read each file only a single time:
awk -v n=2 '
BEGIN{
  while(1) {
    for(i=1;i<ARGC;++i) {
      if (b[i]=(getline tmp < ARGV[i])) a[i] = tmp
    }
    if (b[n]) for(i=1;i<ARGC;++i) print a[i] > (ARGV[i] ".new")
    else break
  }
}' f1 f2 f3 f4 f5 f6
This works in the following way:
The lead file is defined by the index n; here we choose f2 as the lead.
We do not process the files in the standard record-by-record, sequential way; instead we use the BEGIN block, where we read the files in parallel.
We run an infinite while(1) loop, which we break out of when the lead file has no more input.
Per cycle, we read a new line from each file using getline. If file i has a new line, we store it in a[i] and record getline's return value in b[i]. If file i has reached its end, a[i] keeps its last line.
We check the outcome for the lead file with b[n]. If it still read a line, we print all the stored lines to the files f1.new, f2.new, ...; otherwise, we break out of the infinite loop.
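A hypothetical two-file run (file names and contents invented for illustration; f2 is the lead, so f1.new is truncated to f2's two lines):
$ printf 'a\nb\nc\nd\n' > f1 ; printf 'x\ny\n' > f2
$ awk -v n=2 'BEGIN{ while(1) {
    for(i=1;i<ARGC;++i) if (b[i]=(getline tmp < ARGV[i])) a[i]=tmp
    if (b[n]) for(i=1;i<ARGC;++i) print a[i] > (ARGV[i] ".new")
    else break
} }' f1 f2
$ cat f1.new
a
b
$ cat f2.new
x
y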

awk - Compare columns from two files and replace text in first file

I have two files. The first has 1 column and the second has 3 columns. I want to compare the first columns of both files. If there is a match, replace columns 2 and 3 with specific values; if not, print the line unchanged.
File 1:
$ cat file1
26
28
30
File 2:
$ cat file2
1,a,0
2,a,0
22,a,0
23,a,0
24,a,0
25,a,0
26,r,1510139756
27,a,0
28,r,1510244156
29,a,0
30,r,1510157364
31,a,0
32,a,0
33,r,1510276164
34,a,0
40,a,0
Desired output:
$ cat file2
1,a,0
2,a,0
22,a,0
23,a,0
24,a,0
25,a,0
26,a,0
27,a,0
28,a,0
29,a,0
30,a,0
31,a,0
32,a,0
33,r,1510276164
34,a,0
40,a,0
I am using gawk to do this (it's inside a shell script and I am using Solaris) but I can't get the output right. It only prints the lines that match:
$fuente="file2"
gawk -v fuente="$fuente" 'FNR==NR{a[FNR]=$1; next}{print $1,$2="a",$3="0" }' $fuente file1 > file3
The output I got:
$ cat file3
26 a 0
28 a 0
30 a 0
awk one-liner:
awk 'NR==FNR{ a[$1]; next }$1 in a{ $2="a"; $3=0 }1' file1 FS=',' OFS=',' file2
The output:
1,a,0
2,a,0
22,a,0
23,a,0
24,a,0
25,a,0
26,a,0
27,a,0
28,a,0
29,a,0
30,a,0
31,a,0
32,a,0
33,r,1510276164
34,a,0
40,a,0
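For readability, the same one-liner spread out with comments; note that the FS=',' OFS=',' assignments between the file arguments take effect only after file1 (which is not comma-separated) has been read:
awk '
NR==FNR { a[$1]; next }      # file1: remember each key
$1 in a { $2="a"; $3=0 }     # file2: key found, overwrite columns 2 and 3
1                            # print every record, changed or not
' file1 FS=',' OFS=',' file2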
Really spread out for clarity; saved as fuente.awk and called like so:
awk -F \, -v fuente=file1 -f fuente.awk file2 # -F sets FS
BEGIN {
  OFS=","                            # set OFS to make printing easier
  while ((getline x < fuente) > 0)   # safe way; read the file into an array
  {
    a[++i]=x                         # stuff the indexed array
  }
}
{                                    # For each line in file2
  for (k=1 ; k<=i ; k++)             # Loop over the array (elements of file1)
  {
    if (($1==a[k]) && (! flag))
    {
      print($1,"a",0)                # Found: print the new line
      flag=1                         # print only once
    }
  }
  if (! flag)                        # Not found
  {
    print($0)                        # print the original
  }
  flag=0                             # reset flag
}
END { }

Append contents of file A to the end of each line in file B? bash

I really can't get this one.
File A has this:
1.1.1.1
2.2.2.2
3.3.3.3
etc..
File B will always have exactly the same number of lines, and they will always correspond:
oneoneoneone
twotwotwotwo
3ee3ee3ee3ee
I want to append file A to file B so it looks like:
1.1.1.1 oneoneoneone
2.2.2.2 twotwotwotwo
3.3.3.3 3ee3ee3ee3ee
This is what I have, but it's not working like it should:
for z in `cat /tmp/fileB`; do sed "s/(.*)/\1$z/" < /tmp/fileA >> /tmp/c; done
Any suggestions?
If you want to append the lines in fileB to the lines in fileA (as indicated by your desired output), you can simply do:
paste fileA fileB
That uses a tab for the delimiter, so you might prefer
paste -d ' ' fileA fileB
If you want to do it with awk, you can do:
awk '{ getline b < "fileB"; print $0, b}' fileA
This may be possible with sed, but it's not advisable. Similar to what you seem to be trying with the loop, you can also do:
while read -r b; do read -r -u 4 a; echo "$a $b"; done < fileB 4< fileA
