Bash vlookup kind of solution

I have two files,
File 1
2,1,1,1,Test1,1540584051,52
6,5,1,1,Test2,1540579206,54
3,3,0,0,Test3,1540591243,36
File 2
2,1,0,2,Test1,1540584051,52
6,5,0,2,Test2,1540579206,54
I want to look up the column 7 value from File 1 and check whether it matches the column 7 value from File 2; when it matches, replace that line in File 2 with the corresponding line from File 1.
So the output would be
2,1,1,1,Test1,1540584051,52
6,5,1,1,Test2,1540579206,54
Thanks in advance.

You can do that with the following script:
BEGIN { FS="," }
NR==FNR {
lookup[$7] = $0
next
}
{
if (lookup[$7] != "") {
$0 = lookup[$7]
}
print
}
END {
print ""
print "Lookup table used was:"
for (i in lookup) {
print " Key '"i"', Value '"lookup[i]"'"
}
}
The BEGIN section simply sets the field separator to "," so individual fields can be easily processed.
The NR and FNR variables are, respectively, the line number of the full input stream (all files) and the line number of the current file in the input stream. When you are processing the first (or only) file, these will be equal, so we use this as a means to simply store the lines from the first file, keyed on field seven.
When NR and FNR are not equal, it's because you've started the second file and this is where we want to replace lines if their key exists in the first file.
This is done by simply checking if a line exists in the lookup table with the desired key and, if it does, replacing the current line with the lookup table line. Then we print the (original or replaced) line.
The END section is there just for debugging purposes, it outputs the lookup table that was created and used, and you can remove it once you're satisfied the script works as expected.
You'll see the output in the following transcript, illustrating hopefully that it is working correctly:
pax$ cat file1
2,1,1,1,Test1,1540584051,52
6,5,1,1,Test2,1540579206,54
3,3,0,0,Test3,1540591243,36
pax$ cat file2
2,1,0,2,Test1,1540584051,52
6,5,0,2,Test2,1540579206,54
pax$ awk -f sudarshan.awk file1 file2
2,1,1,1,Test1,1540584051,52
6,5,1,1,Test2,1540579206,54

Lookup table used was:
Key '36', Value '3,3,0,0,Test3,1540591243,36'
Key '52', Value '2,1,1,1,Test1,1540584051,52'
Key '54', Value '6,5,1,1,Test2,1540579206,54'
If you need it as a "short as possible" one-liner to use from your script, just use:
awk -F, 'NR==FNR{x[$7]=$0;next}{if(x[$7]!=""){$0=x[$7]};print}' file1 file2
though I prefer the readable version myself.

This might work for you (GNU sed):
sed -r 's|^([^,]*,){6}([^,]*).*|/^([^,]*,){6}\2/s/.*/&/p|' file1 | sed -rnf - file2
This turns file1 into a sed script that uses the 7th field as a lookup key and replaces any matching line in file2.
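To see what the first sed invocation generates, run it on its own. In the outer substitution, \2 is the key (field 7) and & is the whole matched file1 line, so for the sample data it should emit:
/^([^,]*,){6}52/s/.*/2,1,1,1,Test1,1540584051,52/p
/^([^,]*,){6}54/s/.*/6,5,1,1,Test2,1540579206,54/p
/^([^,]*,){6}36/s/.*/3,3,0,0,Test3,1540591243,36/p
The second sed runs this generated script over file2 with -n, so only the replaced (matching) lines are printed.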
In your example the 7th field is the last one, so a short version of the above solution is:
sed -r 's|.*,(.*)|/.*,\1/s/.*/&/p|' file1 | sed -nf - file2

Related

Extract lines from file2 that exist in file1 using a loop

I am very new at shell scripting and I am having some trouble with the following task:
I want to extract the lines from file2 that are also found in file1 and write those lines to a new file3. I am only allowed to use loops for this (I know it works with the basic grep command, but I need to find a way with a loop).
File1
John 5 red books
Ashley 4 yellow music
Susan 8 green films
File2
John
Susan
Desired output for file3 would be:
John 5 red books
Susan 8 green films
The desired output has to be found using bash script and a loop. I have tried the following loop, but I am missing some lines in the results by using this:
while read line
do
    grep "${line}" $file1
done < $file2 >> file3.txt
If anyone has any thoughts on how to improve my script or any new ideas (again using loops) it would be greatly appreciated. Thank you!
Looping here is a good educational exercise but it isn't ideal for this in the real world.
Technically, this AWK solution works and uses a loop, but I'm guessing it's not what your instructor is looking for:
awk 'NR == FNR { find[$1]=1; next } find[$1]' File2 File1 >File3
I've swapped the order of the files so the file with the data (File1) is loaded after the file listing what we want (File2).
This starts with a condition that ensures we're on the first file AWK reads: NR is the number of records (lines) seen so far across all inputs, FNR is the current file's record count, and the two are only equal while the first input file is being read. The action stores each line's first column ($1) as a key in a hash (a data structure with key/value pairs, a.k.a. an associative array or dictionary) so we can look it up later; next then skips the second stanza for that input line.
When the code loops through the next file (File1), the first clause does not fire and instead the first column of input is looked up in the find hash. If it is present, its value is 1 and that evaluates to true, so we print the value. (A clause with no action implies { print })
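With the sample files, a run should look like this:
$ awk 'NR == FNR { find[$1]=1; next } find[$1]' File2 File1 >File3
$ cat File3
John 5 red books
Susan 8 green films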
See Toby Speight's answer for a native bash answer with only builtins. It uses loops and hashes. You'll likely find that solution is slower on larger data sets.
Since you're using Bash, you could create an associative array from File2, and use that to check membership. Something like (untested):
mapfile -t names <File2
declare -A n
for i in "${names[@]}"
do n["$i"]="$i"
done
while read -r name rest
do [ "${n[$name]}" ] && echo "$name $rest"
done <File1 >file3
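With the sample files this should produce the desired output:
$ cat file3
John 5 red books
Susan 8 green films
Note the use of mapfile (a.k.a. readarray) to load one name per line; a plain read -a would only split the first line of File2 into the array.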
Awk solution:
awk 'NR==FNR{ arr[$0]="";next } { for (i in arr) { if (i == $1 ) { print $0 } } }' file2 file1
First we create an array from the data in file2. We then use it to check the first space-delimited field of each line in file1, and print the line if there is a match.
With awk:
$ awk 'NR==FNR{ a[$1];next } $1 in a' file2 file1
With grep:
$ grep -F -f file2 file1
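Note that grep -F matches fixed strings anywhere on the line, so a name like "John" would also match a line starting with "Johnny"; with GNU grep you can add -w (grep -Fw -f file2 file1) to restrict matches to whole words.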

Split one file into multiple files based on pattern with awk

I have a binary file with the following format:
file
04550525023506346054(....)64645634636346346344363468badcafe268664363463463463463463463464647(....)474017497417428badcafe34376362623626(....)262
and I need to split it into multiple files (using awk) that look like this:
file1
045505250235063460546464563463634634634436346
file2
8badcafe26866436346346346346346346346464747401749741742
file3
8badcafe34376362623626262
I have found on stackoverflow the following line:
cat file |
awk -v RS="\x8b\xad\xca\xfe" 'NR > 1 { print RS $0 > "file" (NR-1); close("file" (NR-1)) }'
and it works for all the files but the first.
Indeed, the file I called file1 is not created, because its content does not start with the eye-catcher 8badcafe.
How can I fix the previous command line in order to have the output I need?
Thanks!
try:
awk '{gsub(/8badcafe/,"\n&");num=split($0, a,"\n");for(i=1;i<=num;i++){print a[i] > "file"++e}}' Input_file
This substitutes every occurrence of the string "8badcafe" with a newline followed by the string itself, then splits the current line into an array named a using newline as the separator. It then traverses all of a's values and prints them one by one into file1, file2, ..., building each name from the literal "file" and an increasing count variable named e.
Output files as follows:
cat file1
045505250235063460546464563463634634634436346
cat file2
8badcafe26866436346346346346346346346464747401749741742
cat file3
8badcafe34376362623626262
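If you would rather keep the RS approach from the question, a minimal tweak (assuming GNU awk for the multi-character RS, and assuming the eye-catcher appears as the literal text 8badcafe, as in the sample, rather than as raw bytes) is to write every record and only prefix the separator from the second record on:
awk -v RS="8badcafe" '{ print (NR > 1 ? RS : "") $0 > ("file" NR); close("file" NR) }' file
This way the leading chunk before the first eye-catcher also gets its own file1.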

Unix shell scripting - how can I remove particular characters inside a text file?

I have one text file. This file has 5 rows and 5 columns, all separated by the "|" symbol. The content of the 2nd column should be 7 characters long.
If the 2nd column is longer than 7 characters, I want to remove the extra characters without opening the file.
For example:
cat file1
ff|hahaha1|kjbsb|122344|jbjbnjuinnv|
df|hadb123_udcvb|sbfuisdbvdkh|122344|jbjbnjuinnv|
gf|harayhe_jnbsnjv|sdbvdkh|12234|jbjbnj|
qq|kkksks2|datetag|7777|jbjbnj|
jj|harisha|hagte|090900|hags|
In the above case, the 2nd and 3rd rows have a 2nd column longer than 7 characters. Now I want to remove those extra characters without opening the input file, using an awk or sed command.
I'm waiting for your responses guys.
Thanks in advance!!
Take a substring of length 7 from the second column with awk:
awk -F'|' -v OFS='|' '{ $2 = substr($2, 1, 7) }1' file
Now any strings longer than 7 characters will be made shorter. Any strings that were shorter will be left as they are.
The 1 at the end is the shortest true condition to trigger the default action, { print }.
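For the sample file this gives:
$ awk -F'|' -v OFS='|' '{ $2 = substr($2, 1, 7) }1' file
ff|hahaha1|kjbsb|122344|jbjbnjuinnv|
df|hadb123|sbfuisdbvdkh|122344|jbjbnjuinnv|
gf|harayhe|sdbvdkh|12234|jbjbnj|
qq|kkksks2|datetag|7777|jbjbnj|
jj|harisha|hagte|090900|hags|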
If you're happy with the changes, then you can overwrite the original file like this:
awk -F'|' -v OFS='|' '{ $2 = substr($2, 1, 7) }1' file > tmp && mv tmp file
i.e. redirect to a temporary file and then overwrite the original.
First, try:
sed 's/\(^[^|]*|[^|]\{7\}\)[^|]*/\1/' file1
What is happening here? We construct the command step-by-step:
# Replace something
sed 's/hadb123_udcvb/replaced/' file1
# Remember the matched string (will be used in a later command)
sed 's/\(hadb123_udcvb\)/replaced/' file1
# Replace at most 7 characters without a '|' (once per line)
sed 's/\([^|]\{7\}\)/replaced/' file1
# Remove additional characters up to the next '|'
sed 's/\([^|]\{7\}\)[^|]*/replaced/' file1
# Put back the string you remembered
sed 's/\([^|]\{7\}\)[^|]*/\1/' file1
# Extend the matched string with start-of-line (^), an any-length first field, and '|'
sed 's/\(^[^|]*|[^|]\{7\}\)[^|]*/\1/' file1
When this shows the desired output, you can add the option -i for changing the input file:
sed -i 's/\(^[^|]*|[^|]\{7\}\)[^|]*/\1/' file1

Extracting field from last row of given table using sed

I would like to write a bash script to extract a field in the last row of a table. I will illustrate by example. I have a text file containing tables with space delimited fields like ...
Table 1 (foobar)
num flag name comments
1 ON Frank this guy is frank
2 OFF Sarah she is tall
3 ON Ahmed who knows him

Table 2 (foobar)
num flag name comments
1 ON Mike he is short
2 OFF Ahmed his name is listed twice
I want to extract the first field in the last row of Table1, which is 3. Ideally I would like to be able to use any given table's title to do this. There are guaranteed carriage returns between each table. What would be the best way to accomplish this, preferably using sed and grep?
Awk is perfect for this; this prints the first field of the last row of each table:
$ awk '!$1{print a}{a=$1}END{print a}' file
3
2
Just from the first record:
$ awk '!$1{print a;exit}{a=$1}' file
3
Edit:
For a given table title:
$ awk -v t="Table 1" '$0~t{f=1}!$1&&f{print a;f=0}{a=$1}END{if (f) print a}' file
3
$ awk -v t="Table 2" '$0~t{f=1}!$1&&f{print a;f=0}{a=$1}END{if (f) print a}' file
2
This sed line seems to work for your sample.
table='Table 2'
sed -n "/$table"'/{n;n;:next;h;n;/^$/b last;$b last;b next;:last;g;s/^\s*\(\S*\).*/\1/p;}' file
Explanation: when we find a line matching the table name in $table, we skip that line and the next (the field labels). Starting at :next, we push the current line into the hold space, get the next line, and check whether it is blank or the end of the file; if not, we branch back to :next, pushing the current line into hold and fetching another. If it is blank or EOF, we branch to :last, pull the hold space (the last line of the table) into pattern space, chop out all but the first field, and print it.
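For example, with GNU sed and the sample file:
$ table='Table 1'
$ sed -n "/$table"'/{n;n;:next;h;n;/^$/b last;$b last;b next;:last;g;s/^\s*\(\S*\).*/\1/p;}' file
3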
Just read each block as a record with each line as a field and then print the first sub-field of the last field of whichever record you care about:
$ awk -v RS= -F'\n' '/^Table 1/{split($NF,a," "); print a[1]}' file
3
$ awk -v RS= -F'\n' '/^Table 2/{split($NF,a," "); print a[1]}' file
2
A better tool for that is awk!
Here is a more legible version:
awk '{
    if (NR==1) {
        row=$0;
        next;
    }
    if ($0=="") {
        $0=row;
        print $1;
    } else {
        row=$0;
    }
} END {
    if (row!="") {
        $0=row;
        print $1;
    }
}' input.txt
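Run against the sample input this prints:
3
2
One edge case to be aware of: if the file ends with a trailing blank line, both the blank-line branch and the END block fire for the last table, so its field would be printed twice; the logic assumes no trailing blank line.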

Need an awk script or any other way to do this on unix

I have a small file with around 50 lines and 2 fields, like below:
file1
-----
12345 8373
65236 7376
82738 2872
..
..
..
I have around 100 files which are comma (",") separated, as below:
file2
-----
1,3,4,4,12345,,,23,3,,,2,8373,1,1
Each file has many lines similar to the above line.
I want to extract, from all these 100 files, the lines whose 5th field is equal to the 1st field in the first file and whose 13th field is equal to the 2nd field in the first file.
How can I search all the 100 files using that single file?
I came up with the below for the case of a single comma-separated file. I am not even sure whether this is correct!
But I have multiple comma-separated files.
awk -F"\t|," 'FNR==NR{a[$1$2]++;next}($5$13 in a)' file1 file2
Can anyone help me please?
EDIT:
The above command is working fine in the case of a single file.
Here is another using an array, avoiding multiple work files:
#!/bin/awk -f
FILENAME == "file1" {
keys[$1] = ""
keys[$2] = ""
next
}
{
split($0, fields, "," )
if (fields[5] in keys && fields[13] in keys) print "*:",$0
}
I am using split because the field separators in the two files are different. You could swap it around if necessary. After making the script executable (chmod +x runit.awk), you should call it thus:
./runit.awk file1 file2
An alternative is to read the first file explicitly with getline in a BEGIN block.
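Here is a minimal sketch of that getline alternative, assuming the key file is literally named file1:
#!/bin/awk -f
BEGIN {
    # load both key columns from file1 before any input is processed
    while ((getline line < "file1") > 0) {
        split(line, f, " ")
        keys[f[1]] = ""
        keys[f[2]] = ""
    }
    close("file1")
}
{
    split($0, fields, ",")
    if (fields[5] in keys && fields[13] in keys) print "*:", $0
}
Because the keys are loaded in BEGIN, file1 no longer needs to appear on the command line; the script can be run on the data files alone.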
Here is a simple approach. Extract each line from the small file, split it into fields and then use awk to print lines from the other files which match those fields:
while read line
do
    f1=$(echo $line | awk '{print $1}')
    f2=$(echo $line | awk '{print $2}')
    awk -v f1="$f1" -v f2="$f2" -F, '$5==f1 && $13==f2' file*
done < small_file
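Since read splits on whitespace itself, the two per-line awk extractions can be dropped (a small simplification of the same idea):
while read -r f1 f2
do
    awk -v f1="$f1" -v f2="$f2" -F, '$5==f1 && $13==f2' file*
done < small_file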
