Need an awk script or any other way to do this on unix - bash

I have a small file with around 50 lines and 2 fields, like below:
file1
-----
12345 8373
65236 7376
82738 2872
..
..
..
I have around 100 files which are comma-separated, as below:
file2
-----
1,3,4,4,12345,,,23,3,,,2,8373,1,1
Each file has many lines similar to the one above.
From all of these 100 files, I want to extract the lines whose
5th field is equal to the 1st field in the first file and
13th field is equal to the 2nd field in the first file.
How can I search all 100 files using that single file?
I came up with the command below for a single comma-separated file, but I am not sure whether it is correct, and I have multiple comma-separated files.
awk -F"\t|," 'FNR==NR{a[$1$2]++;next}($5$13 in a)' file1 file2
Can anyone help me, please?
EDIT:
The above command works fine for a single file.

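Here is another approach using an array, which avoids multiple work files:
#!/bin/awk -f
# Lines of file1 are split on whitespace, the default field separator.
FILENAME == "file1" {
    keys[$1, $2] = ""   # store the pair, so both values must match together
    next
}
{
    split($0, fields, ",")
    if ((fields[5], fields[13]) in keys) print "*:", $0
}
I am using split because the field separators in the two files are different. You could swap it around if necessary. You should call the script thus:
runit.awk file1 file2
To search all the files, list them (or a glob that matches them) after file1.
An alternative is to read the first file explicitly in a BEGIN block using getline. awk has no separate "open" or "readline": a file is opened by the first getline that reads from it.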
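A minimal sketch of that BEGIN-block alternative follows. The temporary files and the keyfile variable are assumptions made up for the demo; only the two-column key file and the 5th/13th field test come from the question.

```shell
# Demo data in a scratch directory (hypothetical file names).
tmp=$(mktemp -d)
printf '12345 8373\n65236 7376\n' > "$tmp/file1"
printf '1,3,4,4,12345,,,23,3,,,2,8373,1,1\n1,3,4,4,12345,,,23,3,,,2,7376,1,1\n' > "$tmp/file2"

matches=$(awk -F, -v keyfile="$tmp/file1" '
BEGIN {
    # getline returns 1 for each line read, 0 at end of file, -1 on error
    while ((getline line < keyfile) > 0) {
        split(line, k, " ")      # the key file is whitespace separated
        keys[k[1], k[2]]         # remember the pair, not the single values
    }
    close(keyfile)
}
($5, $13) in keys                # print lines whose (5th, 13th) pair is known
' "$tmp/file2")

echo "$matches"
rm -rf "$tmp"
```

The second demo line carries 12345 and 7376, which both occur in the key file but not on the same line, so it is not printed.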

Here is a simple approach. Read each line of the small file into two variables, then use awk to print the lines of the other files that match both values:
while read -r f1 f2
do
    awk -v f1="$f1" -v f2="$f2" -F, '$5==f1 && $13==f2' file*
done < small_file
This rescans every data file once per line of the small file, which is acceptable for around 50 keys.
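If rescanning becomes too slow, the same search can probably be done in a single pass. The sketch below is one possible way, not the method of either answer: it assumes the key file is whitespace separated and uses awk's standard command-line assignment (FS=, placed between the file names) to switch the separator before the data files are read. The names data1.csv and data2.csv are placeholders.

```shell
# Demo data in a scratch directory (hypothetical file names).
tmp=$(mktemp -d)
printf '12345 8373\n65236 7376\n' > "$tmp/small_file"
printf '1,3,4,4,12345,,,23,3,,,2,8373,1,1\n' > "$tmp/data1.csv"
printf '9,9,9,9,65236,,,0,0,,,0,7376,0,0\n9,9,9,9,65236,,,0,0,,,0,8373,0,0\n' > "$tmp/data2.csv"

found=$(cd "$tmp" && awk '
NR == FNR { keys[$1, $2]; next }                 # first file: store each pair
($5, $13) in keys { print FILENAME ": " $0 }     # data files: report matches
' small_file FS=, data*.csv)

echo "$found"
rm -rf "$tmp"
```

Printing FILENAME shows which of the many files each match came from. The NR == FNR test assumes the key file is not empty.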

Related

How to replace only a column and for the rows contains specific values

I have a |-delimited file and am trying to apply the logic below.
cat list.txt
101
102
103
LIST=`cat list.txt`
Input file
1|Anand|101|1001
2|Raj|103|1002
3|Unix|110|101
Expected result
1|Anand|101|1001
2|Raj|103|1002
3|Unix|UNKNOWN|101
I tried 2 methods:
1. Using fgrep with list.txt as the pattern file to split the input into two files, one matching the list and one not, and then using awk and gsub on the non-matching file to replace the 3rd column with UNKNOWN. The issue is that the 4th column of the 3rd row contains a value that is in list.txt, so I do not get the expected result.
2. Using a one-liner awk, passing the list with -v VAR. This makes no changes to the results.
awk -F"|" -v VAR="$LIST" '{if($3 !~ $VAR) {{{gsub(/.*/,"UNKNOWN", $3)1} else { print 0}' input_file
Can you please suggest how to get the expected results?
There is no need to use cat to read the complete file into a variable.
You may just use this awk:
awk 'BEGIN {FS=OFS="|"}
FNR==NR {a[$1]; next}
!($3 in a) {$3 = "UNKNOWN"} 1' list.txt input_file
1|Anand|101|1001
2|Raj|103|1002
3|Unix|UNKNOWN|101

Bash vlookup kind of solution

I have two files,
File 1
2,1,1,1,Test1,1540584051,52
6,5,1,1,Test2,1540579206,54
3,3,0,0,Test3,1540591243,36
File 2
2,1,0,2,Test1,1540584051,52
6,5,0,2,Test2,1540579206,54
I want to look up the column 7 value from File 1 and check whether it matches the column 7 value in File 2. When it matches, that line in File 2 should be replaced with the corresponding line from File 1.
So the output would be
2,1,1,1,Test1,1540584051,52
6,5,1,1,Test2,1540579206,54
Thanks in advance.
You can do that with the following script:
BEGIN { FS="," }
NR==FNR {
    lookup[$7] = $0
    next
}
{
    if (lookup[$7] != "") {
        $0 = lookup[$7]
    }
    print
}
END {
    print ""
    print "Lookup table used was:"
    for (i in lookup) {
        print " Key '"i"', Value '"lookup[i]"'"
    }
}
The BEGIN section simply sets the field separator to , so individual fields can be easily processed.
The NR and FNR variables are, respectively, the line number of the full input stream (all files) and the line number of the current file in the input stream. When you are processing the first (or only) file, these will be equal, so we use this as a means to simply store the lines from the first file, keyed on field seven.
When NR and FNR are not equal, it's because you've started the second file and this is where we want to replace lines if their key exists in the first file.
This is done by simply checking if a line exists in the lookup table with the desired key and, if it does, replacing the current line with the lookup table line. Then we print the (original or replaced) line.
The END section is there just for debugging purposes, it outputs the lookup table that was created and used, and you can remove it once you're satisfied the script works as expected.
You'll see the output in the following transcript, illustrating hopefully that it is working correctly:
pax$ cat file1
2,1,1,1,Test1,1540584051,52
6,5,1,1,Test2,1540579206,54
3,3,0,0,Test3,1540591243,36
pax$ cat file2
2,1,0,2,Test1,1540584051,52
6,5,0,2,Test2,1540579206,54
pax$ awk -f sudarshan.awk file1 file2
2,1,1,1,Test1,1540584051,52
6,5,1,1,Test2,1540579206,54
Lookup table used was:
Key '36', Value '3,3,0,0,Test3,1540591243,36'
Key '52', Value '2,1,1,1,Test1,1540584051,52'
Key '54', Value '6,5,1,1,Test2,1540579206,54'
If you need it as a "short as possible" one-liner to use from your script, just use:
awk -F, 'NR==FNR{x[$7]=$0;next}{if(x[$7]!=""){$0=x[$7]};print}' file1 file2
though I prefer the readable version myself.
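A small variation, offered as a sketch rather than a correction: testing membership with the in operator avoids creating an empty lookup entry for every key that appears only in the second file, which the lookup[$7] != "" comparison does as a side effect. The third line of the demo file2 is made up to show a key with no match.

```shell
# Demo data in a scratch directory; the Test9 line is hypothetical.
tmp=$(mktemp -d)
printf '2,1,1,1,Test1,1540584051,52\n6,5,1,1,Test2,1540579206,54\n3,3,0,0,Test3,1540591243,36\n' > "$tmp/file1"
printf '2,1,0,2,Test1,1540584051,52\n6,5,0,2,Test2,1540579206,54\n7,7,7,7,Test9,1540000000,99\n' > "$tmp/file2"

result=$(awk -F, '
NR == FNR { lookup[$7] = $0; next }   # first file: remember each line by field 7
$7 in lookup { $0 = lookup[$7] }      # second file: swap in the stored line
{ print }                             # print the replaced or original line
' "$tmp/file1" "$tmp/file2")

echo "$result"
rm -rf "$tmp"
```

Lines of file2 whose key is unknown are printed unchanged, as in the original script.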
This might work for you (GNU sed):
sed -r 's|^([^,]*,){6}([^,]*).*|/^([^,]*,){6}\2/s/.*/&/p|' file1 | sed -rnf - file2
This turns file1 into a sed script, one command per line, that uses the 7th field as a lookup key and replaces any line in file2 that matches with the line from file1.
In your example the 7th field is the last one, so a short version of the above solution is:
sed -r 's|.*,(.*)|/.*,\1/s/.*/&/p|' file1 | sed -nf - file2

Extract first 5 fields from semicolon-separated file

I have a semicolon-separated file with 10 fields on each line. I need to extract only the first 5 fields.
Input:
A.txt
1;abc ;xyz ;0.0000;3.0; ; ;0.00; ; xyz;
Output file:
B.txt
1;abc ;xyz ;0.0000;3.0;
You can cut fields 1 to 5:
cut -d';' -f1-5 file
If the trailing ; is needed, you can append it with another tool, or use grep instead (assuming your grep has the -P option):
kent$ grep -oP '^(.*?;){5}' file
1;abc ;xyz ;0.0000;3.0;
In sed you can match the pattern "string;" 5 times:
sed 's/\(\([^;]*;\)\{5\}\).*/\1/' A.txt
or, when your sed supports -r:
sed -r 's/(([^;]*;){5}).*/\1/' A.txt
cut -f-5 -d";" A.txt > B.txt
Where:
- -f selects the fields (-5 from start to 5)
- -d provides a delimiter, (here the semicolon)
Given that the input is field-based, using awk is another option:
awk 'BEGIN { FS=OFS=";"; ORS=OFS"\n" } { NF=5; print }' A.txt > B.txt
If you're using BSD/macOS, insert $1=$1; after NF=5; to make this work.
FS=OFS=";" sets both the input field separator, FS, and the output field separator, OFS, to a semicolon.
The input field separator is used to break each input record (line) into fields.
The output field separator is used to rebuild the record when individual fields are modified or the number of fields is modified.
ORS=OFS"\n" sets the output record separator to a semicolon followed by a newline, given that a trailing ; should be output.
Simply omit this statement if the trailing ; is undesired.
{ NF=5; print } truncates the input record to 5 fields, by setting NF, the number (count) of fields to 5 and then prints the modified record.
It is at this point that OFS comes into play: the first 5 fields are concatenated to form the output record, using OFS as the separator.
Note: BSD/macOS Awk doesn't modify the record just by setting NF; you must additionally modify a field explicitly for the changed field count to take effect: a dummy operation such as $1=$1 (assigning field 1 to itself) is sufficient.
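For a version that does not depend on how a given awk treats assignments to NF, one option is to build the output from the fields in a loop. This is a sketch of an alternative technique, not the answer's method; the scratch directory is only there for the demo.

```shell
# Demo input taken from the question, written to a scratch directory.
tmp=$(mktemp -d)
printf '1;abc ;xyz ;0.0000;3.0; ; ;0.00; ; xyz;\n' > "$tmp/A.txt"

first5=$(awk -F';' '{
    out = ""
    n = (NF < 5) ? NF : 5                        # cope with short lines
    for (i = 1; i <= n; i++) out = out $i ";"   # keep the trailing ;
    print out
}' "$tmp/A.txt")

echo "$first5"
rm -rf "$tmp"
```

Because NF is never assigned, no $1=$1 workaround is needed.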
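Another answer relies on awk's default whitespace splitting:
awk '{print $1,$2,$3}' A.txt >B.txt
1;abc ;xyz ;0.0000;3.0;
This matches the expected output only because the first five fields of the sample line happen to fall into three whitespace-separated chunks. It will break on input with different spacing.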

Unix Shell Scripting-how can i remove particular characers inside a text file?

I have a text file with 5 rows and 5 columns. The columns are separated by "|". The content of the 2nd column should be 7 characters long.
If the 2nd column is longer than 7 characters, I want to remove the extra characters without opening the file.
For example:
cat file1
ff|hahaha1|kjbsb|122344|jbjbnjuinnv|
df|hadb123_udcvb|sbfuisdbvdkh|122344|jbjbnjuinnv|
gf|harayhe_jnbsnjv|sdbvdkh|12234|jbjbnj|
qq|kkksks2|datetag|7777|jbjbnj|
jj|harisha|hagte|090900|hags|
In the above case, the 2nd column of the 2nd and 3rd rows is longer than 7 characters. I want to remove those extra characters with awk or sed, without opening the input file.
I'm waiting for your responses guys.
Thanks in advance!!
Take a substring of length 7 from the second column with awk:
awk -F'|' -v OFS='|' '{ $2 = substr($2, 1, 7) }1' file
Now any strings longer than 7 characters will be made shorter. Any strings that were shorter will be left as they are.
The 1 at the end is the shortest true condition to trigger the default action, { print }.
If you're happy with the changes, then you can overwrite the original file like this:
awk -F'|' -v OFS='|' '{ $2 = substr($2, 1, 7) }1' file > tmp && mv tmp file
i.e. redirect to a temporary file and then overwrite the original.
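Before overwriting anything, it may help to list the rows that would change. The sketch below is an assumed convenience check, not part of the answer; it prints the line number, the current 2nd column and its truncated form.

```shell
# Three rows from the question, written to a scratch directory.
tmp=$(mktemp -d)
printf 'ff|hahaha1|kjbsb|122344|jbjbnjuinnv|\ndf|hadb123_udcvb|sbfuisdbvdkh|122344|jbjbnjuinnv|\njj|harisha|hagte|090900|hags|\n' > "$tmp/file1"

# Report only rows whose 2nd column is longer than 7 characters.
preview=$(awk -F'|' 'length($2) > 7 { print NR ": " $2 " -> " substr($2, 1, 7) }' "$tmp/file1")

echo "$preview"
rm -rf "$tmp"
```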
First try
sed 's/\(^[^|]*|[^|]\{7\}\)[^|]*/\1/' file1
What is happening here? We construct the command step-by-step:
# Replace something
sed 's/hadb123_udcvb/replaced/' file1
# Remember the matched string (will be used in a later command)
sed 's/\(hadb123_udcvb\)/replaced/' file1
# Replace the first run of exactly 7 characters that are not '|' (once per line)
sed 's/\([^|]\{7\}\)/replaced/' file1
# Remove additional character until a '|'
sed 's/\([^|]\{7\}\)[^|]*/replaced/' file1
# Put back the string you remembered
sed 's/\([^|]\{7\}\)[^|]*/\1/' file1
# Extend the matched string with start-of-line (^), a first field of any length, and '|'
sed 's/\(^[^|]*|[^|]\{7\}\)[^|]*/\1/' file1
When this shows the desired output, you can add the option -i to change the input file in place (this is the GNU sed form; BSD/macOS sed needs -i ''):
sed -i 's/\(^[^|]*|[^|]\{7\}\)[^|]*/\1/' file1

awk script: check if all words(fields) from one file are contained in another file

I am new to awk scripting.
I want to do a field-by-field comparison of two files, File1.txt and File2.txt.
The files contain lists of | (pipe) separated fields.
File 1:
-------------------
aaa|bbb|ccc|eee|fff
lll|mmm|nnn|ooo|ppp
rrr|sss|ttt|uuu|vvv
File 2:
-------------------
aaa|bbb|ccc|eee|fff
rrr|sss|ttt|uuu|vvv
rrr|sss|ttt|uuu|uuu
We compare the same line number in both files.
The fields in line 1 of both files match.
In line 2, none of the fields (lll, mmm, nnn, ooo, ppp) match the fields (rrr, sss, ttt, uuu, vvv) in line 2 of File 2. Similarly, the 5th fields (vvv, uuu) of line 3 in the two files do not match.
Hence line 2 and line 3 should be echoed by bash.
Both files will follow the same order.
This line should do it:
awk 'NR==FNR{a[FNR]=$0;next}a[FNR]!=$0' file1 file2
output:
rrr|sss|ttt|uuu|vvv
rrr|sss|ttt|uuu|uuu
To compare two files, it is better to use the existing sdiff command:
sdiff File1 File2
This displays the two files side by side and marks the lines that differ.
With awk (note that this prints the lines of file2 that do not occur anywhere in file1, regardless of line number):
awk -F '|' 'NR==FNR{a[$0];next}!($0 in a){print $0}' file1 file2
The following lines may be adapted as needed; another language such as Perl may be more appropriate:
i=1
# && reads one line from each file per iteration; the loop stops at the shorter file
while read -r -u4 l1 && read -r -u5 l2; do
    if [[ $l1 != "$l2" ]]; then
        echo "$i: $l1 != $l2"
    fi
    ((i+=1))
done 4<file1 5<file2
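Since the question asks for a field-by-field comparison, a possible extension is to report which fields differ on each line. This is a sketch under the question's assumptions (both files in the same order, pipe-separated); the output wording is made up for the demo.

```shell
# The two sample files from the question, written to a scratch directory.
tmp=$(mktemp -d)
printf 'aaa|bbb|ccc|eee|fff\nlll|mmm|nnn|ooo|ppp\nrrr|sss|ttt|uuu|vvv\n' > "$tmp/File1.txt"
printf 'aaa|bbb|ccc|eee|fff\nrrr|sss|ttt|uuu|vvv\nrrr|sss|ttt|uuu|uuu\n' > "$tmp/File2.txt"

report=$(awk -F'|' '
NR == FNR { line[FNR] = $0; next }            # first file: keep every line
{
    n = split(line[FNR], ref, "|")            # fields of the same line in file 1
    diff = ""
    max = (n > NF) ? n : NF
    for (i = 1; i <= max; i++)
        if ((ref[i] "") != ($i ""))           # force a string comparison
            diff = diff (diff == "" ? "" : ",") i
    if (diff != "") print "line " FNR ": differing fields " diff
}' "$tmp/File1.txt" "$tmp/File2.txt")

echo "$report"
rm -rf "$tmp"
```

Line 1 matches and produces no output; lines 2 and 3 are reported with the positions of the fields that differ.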
