Split one file into multiple files based on pattern with awk - bash

I have a binary file with the following format:
file
04550525023506346054(....)64645634636346346344363468badcafe268664363463463463463463463464647(....)474017497417428badcafe34376362623626(....)262
and I need to split it into multiple files (using awk) that look like this:
file1
045505250235063460546464563463634634634436346
file2
8badcafe26866436346346346346346346346464747401749741742
file3
8badcafe34376362623626262
I have found on stackoverflow the following line:
cat file |
awk -v RS="\x8b\xad\xca\xfe" 'NR > 1 { print RS $0 > "file" (NR-1); close("file" (NR-1)) }'
and it works for all the files but the first.
Indeed, the file I called file1 is not created, because it does not start with the eye-catcher 8badcafe.
How can I fix the previous command line in order to have the output I need?
Thanks!

Try:
awk '{gsub(/8badcafe/,"\n&");num=split($0, a,"\n");for(i=1;i<=num;i++){print a[i] > "file"++e}}' Input_file
The gsub replaces every occurrence of the string "8badcafe" with a newline followed by the matched text (& stands for the match). The line is then split into an array named a, using newline as the separator, and the loop walks through all of a's values and prints them one by one into file1, file2, ..., built from the literal "file" and an increasing counter variable named e.
Output files as follows:
cat file1
045505250235063460546464563463634634634436346
cat file2
8badcafe26866436346346346346346346346464747401749741742
cat file3
8badcafe34376362623626262
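If you would rather keep the RS-based approach from the question, a small change handles the first record as well. This is a sketch, assuming GNU awk (which supports a multi-character RS) and the textual marker 8badcafe as it appears in the sample data; the input string here is a made-up toy stand-in:

```shell
tmp=$(mktemp -d) && cd "$tmp"
# Toy stand-in for the binary file, using the textual marker from the sample.
printf '%s' '04558badcafe26868badcafe3437' > file
# Record 1 is everything before the first marker and gets no prefix;
# every later record gets the marker prepended back before writing.
awk -v RS='8badcafe' '{ printf "%s%s", (NR > 1 ? RS : ""), $0 > ("file" NR); close("file" NR) }' file
```

The ternary prepends RS only from the second record on, so file1 comes out without the marker, exactly as in the desired output.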

Related

Bash vlookup kind of solution

I have two files,
File 1
2,1,1,1,Test1,1540584051,52
6,5,1,1,Test2,1540579206,54
3,3,0,0,Test3,1540591243,36
File 2
2,1,0,2,Test1,1540584051,52
6,5,0,2,Test2,1540579206,54
I want to look up the column 7 value from File 1 to check whether it matches the column 7 value from File 2 and, when it matches, replace that line in File 2 with the corresponding line from File 1.
So the output would be
2,1,1,1,Test1,1540584051,52
6,5,1,1,Test2,1540579206,54
Thanks in advance.
You can do that with the following script:
BEGIN { FS="," }
NR==FNR {
lookup[$7] = $0
next
}
{
if (lookup[$7] != "") {
$0 = lookup[$7]
}
print
}
END {
print ""
print "Lookup table used was:"
for (i in lookup) {
print " Key '"i"', Value '"lookup[i]"'"
}
}
The BEGIN section simply sets the field separator to , so individual fields can be easily processed.
The NR and FNR variables are, respectively, the line number of the full input stream (all files) and the line number of the current file in the input stream. When you are processing the first (or only) file, these will be equal, so we use this as a means to simply store the lines from the first file, keyed on field seven.
When NR and FNR are not equal, it's because you've started the second file and this is where we want to replace lines if their key exists in the first file.
This is done by simply checking if a line exists in the lookup table with the desired key and, if it does, replacing the current line with the lookup table line. Then we print the (original or replaced) line.
The END section is there just for debugging purposes, it outputs the lookup table that was created and used, and you can remove it once you're satisfied the script works as expected.
The following transcript hopefully illustrates that it works correctly:
pax$ cat file1
2,1,1,1,Test1,1540584051,52
6,5,1,1,Test2,1540579206,54
3,3,0,0,Test3,1540591243,36
pax$ cat file2
2,1,0,2,Test1,1540584051,52
6,5,0,2,Test2,1540579206,54
pax$ awk -f sudarshan.awk file1 file2
2,1,1,1,Test1,1540584051,52
6,5,1,1,Test2,1540579206,54
Lookup table used was:
Key '36', Value '3,3,0,0,Test3,1540591243,36'
Key '52', Value '2,1,1,1,Test1,1540584051,52'
Key '54', Value '6,5,1,1,Test2,1540579206,54'
If you need it as a "short as possible" one-liner to use from your script, just use:
awk -F, 'NR==FNR{x[$7]=$0;next}{if(x[$7]!=""){$0=x[$7]};print}' file1 file2
though I prefer the readable version myself.
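To see the NR==FNR idiom end to end, here is a self-contained run of that one-liner against the sample data from the question:

```shell
tmp=$(mktemp -d) && cd "$tmp"
printf '%s\n' '2,1,1,1,Test1,1540584051,52' '6,5,1,1,Test2,1540579206,54' '3,3,0,0,Test3,1540591243,36' > file1
printf '%s\n' '2,1,0,2,Test1,1540584051,52' '6,5,0,2,Test2,1540579206,54' > file2
# While reading file1 (NR==FNR), x is filled keyed on field 7; while reading
# file2, any line whose key was seen is replaced before printing.
awk -F, 'NR==FNR{x[$7]=$0;next}{if(x[$7]!=""){$0=x[$7]};print}' file1 file2 > out
```

The out file contains the two File 2 lines, each replaced by its File 1 counterpart.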
This might work for you (GNU sed):
sed -r 's|^([^,]*,){6}([^,]*).*|/^([^,]*,){6}\2/s/.*/&/p|' file1 | sed -rnf - file2
This turns file1 into a sed script and, using the 7th field as a lookup key, replaces any matching line in file2.
In your example the 7th field is the last one, so a short version of the above solution is:
sed -r 's|.*,(.*)|/.*,\1/s/.*/&/p|' file1 | sed -nf - file2

Merge two pipe separated files into one file based on some condition

I have two files as below:
File1:
a1|f1|c1|d1|e1
a2|f1|c2|d2|e2
a3|f2|c3|d3|e3
a4|f2|c4|d4|e4
a5|f4|c5|d5|e5
File2:
z1|f1|c1|d1|e1
z2|f1|c2|d2|e2
z3|f2|c3|d3|e3
z4|f2|c4|d4|e4
z5|f3|c5|d5|e5
Output file should have lines interleaved from both the files such that the rows are sorted according to 2nd field.
Output file:
a1|f1|c1|d1|e1
a2|f1|c2|d2|e2
z1|f1|c1|d1|e1
z2|f1|c2|d2|e2
a3|f2|c3|d3|e3
a4|f2|c4|d4|e4
z3|f2|c3|d3|e3
z4|f2|c4|d4|e4
z5|f3|c5|d5|e5
a5|f4|c5|d5|e5
I tried appending File2 to File1 and then sort on 2nd field. But it does not maintain the order present in the source files.
file_1:
a1|f1|c1|d1|e1
a2|f1|c2|d2|e2
a3|f2|c3|d3|e3
a4|f2|c4|d4|e4
a5|f4|c5|d5|e5
file_2:
z1|f1|c1|d1|e1
z2|f1|c2|d2|e2
z3|f2|c3|d3|e3
z4|f2|c4|d4|e4
z5|f3|c5|d5|e5
awk -F"|" '{a[$2] = a[$2]"\n"$0;} END {for (var in a) print a[var]}' file_1 file_2 | sed '/^\s*$/d'
awk
-F"|" : tokenize the data on the '|' character.
a[$2] : creates a hash table whose key is the string identified by $2 and whose value is the previous data at a[$2] plus the current complete line ($0), separated by a newline.
sed
used to remove the empty lines from the output (each a[$2] value starts with a newline, so the first printed line of each group is empty).
Note that for (var in a) visits the keys in an unspecified order; here the keys happen to come out sorted, but POSIX awk does not guarantee that.
Output:
a1|f1|c1|d1|e1
a2|f1|c2|d2|e2
z1|f1|c1|d1|e1
z2|f1|c2|d2|e2
a3|f2|c3|d3|e3
a4|f2|c4|d4|e4
z3|f2|c3|d3|e3
z4|f2|c4|d4|e4
z5|f3|c5|d5|e5
a5|f4|c5|d5|e5
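Since the only problem with "append then sort" was that a plain sort does not preserve input order within equal keys, a stable sort alone also produces the requested interleaving. A sketch assuming GNU sort, whose -s flag disables the last-resort whole-line comparison:

```shell
tmp=$(mktemp -d) && cd "$tmp"
printf '%s\n' 'a1|f1|c1|d1|e1' 'a2|f1|c2|d2|e2' 'a3|f2|c3|d3|e3' 'a4|f2|c4|d4|e4' 'a5|f4|c5|d5|e5' > file_1
printf '%s\n' 'z1|f1|c1|d1|e1' 'z2|f1|c2|d2|e2' 'z3|f2|c3|d3|e3' 'z4|f2|c4|d4|e4' 'z5|f3|c5|d5|e5' > file_2
# -s keeps the original relative order of lines whose 2nd field compares equal,
# so within each key the file_1 lines stay ahead of the file_2 lines.
sort -s -t '|' -k2,2 file_1 file_2 > merged
```

Because file_1 is listed first, its lines precede file_2's within every key group, matching the expected output exactly.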

ksh shell script to print and delete matched line based on a string

I have 2 files like below. I need a script that finds each string from File2 in File1, deletes the line containing that string from File1, and puts the result in another file (Output1.txt). It should also report the deleted lines, and put in Output2.txt any string that does not exist in File1.
File1:
Apple
Boy: Goes to school
Cat
File2:
Boy
Dog
I need output like below.
Output1.txt:
Apple
Cat
Output2.txt:
Dog
Can anyone help, please?
If you have awk available on your system:
awk -v FS='[ :]' 'NR==FNR{a[$1]}NR>FNR&&!($1 in a){print $1}' File2 File1 > Output1.txt
awk -v FS='[ :]' 'NR==FNR{a[$1]}NR>FNR&&!($1 in a){print $1}' File1 File2 > Output2.txt
The script stores the first field ($1) of every line of the first file given as an argument in an array a. If the first field of a line of the second file is not in the array, it is printed.
Note that the delimiter is either a space or a :
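Here is a quick end-to-end check of both commands against the sample files. Note that $1 of "Boy: Goes to school" is just "Boy", because the separator class also matches the colon:

```shell
tmp=$(mktemp -d) && cd "$tmp"
printf '%s\n' 'Apple' 'Boy: Goes to school' 'Cat' > File1
printf '%s\n' 'Boy' 'Dog' > File2
# Pass 1 (NR==FNR) collects the keys of the first file named; pass 2 prints
# every first field of the second file that was not collected.
awk -v FS='[ :]' 'NR==FNR{a[$1]}NR>FNR&&!($1 in a){print $1}' File2 File1 > Output1.txt
awk -v FS='[ :]' 'NR==FNR{a[$1]}NR>FNR&&!($1 in a){print $1}' File1 File2 > Output2.txt
```

Swapping the file order flips the direction of the lookup, which is why the same program produces both output files.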

How to add numbers from multiple files and write to another file using unix

I have five files as shown below, each with a single line containing a comma-separated value.
File 1
abc,100
File 2
abc,200
File 3
abc,300
File 4
abc,700
File 5
abc,800
I need the output obtained by adding the numbers from all the files above.
The command should be a one-liner.
Output file
abc,2100
awk -F, '{code=$1; total += $2} END {printf("%s,%d\n", code, total)}' file1 file2 file3 file4 file5 > outputfile
Try:
awk -F\, '{a[$1]+=$2}END{for (i in a){print i","a[i]}}' file* > target
This will also work for input files with multiple keys.
For the (edited) expected output, where all keys are joined with _ and all totals are summed:
awk -F\, '{a[$1]+=$2}END{for (i in a){key=key"_"i;cont+=a[i]};sub(/^_/,"",key);print key","cont}' file*
Results
abc_bbc,2100
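The accumulation in the first one-liner is easy to verify with the sample values; this is a self-contained rerun:

```shell
tmp=$(mktemp -d) && cd "$tmp"
printf 'abc,100\n' > file1; printf 'abc,200\n' > file2; printf 'abc,300\n' > file3
printf 'abc,700\n' > file4; printf 'abc,800\n' > file5
# a[$1] accumulates field 2 per key; the END block prints each key's total.
awk -F, '{a[$1]+=$2} END{for (i in a) print i "," a[i]}' file1 file2 file3 file4 file5 > outputfile
```

With a single key the for loop prints exactly one line, so the unspecified iteration order does not matter here.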

Need an awk script or any other way to do this on unix

I have a small file with around 50 lines and 2 fields, like below:
file1
-----
12345 8373
65236 7376
82738 2872
..
..
..
I have around 100 files which are comma-separated, as below:
file2
-----
1,3,4,4,12345,,,23,3,,,2,8373,1,1
Each file has many lines similar to the one above.
I want to extract, from all these 100 files, the lines whose
5th field is equal to the 1st field in the first file and whose
13th field is equal to the 2nd field in the first file.
I want to search all 100 files using that single file.
I came up with the line below for the case of a single comma-separated file; I am not even sure whether it is correct,
and I have multiple comma-separated files.
awk -F"\t|," 'FNR==NR{a[$1$2]++;next}($5$13 in a)' file1 file2
Can anyone help me, please?
EDIT:
The above command works fine for a single file.
Here is another using an array, avoiding multiple work files:
#!/bin/awk -f
FILENAME == "file1" {
keys[$1] = ""
keys[$2] = ""
next
}
{
split($0, fields, "," )
if (fields[5] in keys && fields[13] in keys) print "*:",$0
}
I am using split because the field separators in the two files are different. You could swap it around if necessary. You should call the script thus (after making it executable with chmod +x):
./runit.awk file1 file2
An alternative is to read the first file explicitly with getline in a BEGIN block.
Here is a simple approach. Extract each line from the small file, split it into fields and then use awk to print lines from the other files which match those fields:
while read -r line
do
    f1=$(echo "$line" | awk '{print $1}')
    f2=$(echo "$line" | awk '{print $2}')
    awk -v f1="$f1" -v f2="$f2" -F, '$5==f1 && $13==f2' file*
done < small_file
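Re-scanning all 100 files once per line of the small file gets expensive; a single awk pass can instead store the two fields of each small-file line as one composite key and test them as a pair, so field 5 and field 13 must come from the same line. A sketch with made-up file names (small_file, data1):

```shell
tmp=$(mktemp -d) && cd "$tmp"
printf '%s\n' '12345 8373' '65236 7376' > small_file
printf '%s\n' '1,3,4,4,12345,,,23,3,,,2,8373,1,1' '1,3,4,4,12345,,,23,3,,,2,7376,1,1' > data1
# The (a,b) in arr syntax tests a composite key joined with SUBSEP; the comma
# files are split by hand because their separator differs from small_file's.
awk 'NR==FNR { pair[$1,$2]; next }
     { split($0, f, ","); if ((f[5], f[13]) in pair) print FILENAME ":", $0 }' small_file data1 > hits
```

The second data1 line pairs 12345 with 7376, a combination that exists only across two different small_file lines, so it is correctly rejected.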
