Bash: Replace value with string

I have two files that go like this:
file1(reference file)
BBB;33
AAA;2
CCC;5
file2
5;.;.;.
33;.;.;.
I would like to replace the value in the first column of file2 with the corresponding string from file1, so as to get:
output
CCC;.;.;.
BBB;.;.;.
Hope this is clear. Thanks for any suggestions.

If I understand you correctly and the order is correct in the files,
$ cat file1
BBB;33
AAA;2
CCC;5
$ cat file2
33;.;.;.
2;.;.;.
5;.;.;.
$ paste file1 file2 | sed 's/\([0-9]\+\)\t\1;//'
BBB;.;.;.
AAA;.;.;.
CCC;.;.;.
Add > file3 to the last command to write the output to file3. Then you can do mv file3 file1.
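Putting both steps together (this relies on GNU sed, just like the command above):
paste file1 file2 | sed 's/\([0-9]\+\)\t\1;//' > file3
mv file3 file1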

Related

Remove the basename (last path component) from each path and store the results in a file

How do I do this in bash?
For example, I have a text file named FILE1 containing 4 paths, separated by newlines:
abc/def/zzz.txt
ghi/jkl/zzz.txt
mno/pqr/zzz.txt
stu/wvx/zzz.txt
I want to create another file named FILE2 from FILE1, which only includes:
abc/def/
ghi/jkl/
mno/pqr/
stu/wvx/
How to do this?
Using sed:
sed -r 's|[^/]+$||' FILE1 > FILE2
The pattern [^/]+$ matches the final path component (everything after the last slash) and replaces it with nothing.
#!/bin/bash
for line in $(cat FILE1)
do
echo "${line%/*}/" >> FILE2
done
or
#!/bin/bash
while read -r line
do
echo "${line%/*}/" >> FILE2
done < FILE1
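In both loops, ${line%/*} strips the shortest suffix matching /* (i.e. the last path component, zzz.txt here), and the trailing slash is added back by the echo. The while read -r form is generally preferable, since it reads one line at a time and is not affected by word splitting or globbing, unlike for line in $(cat FILE1).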

Split one file into multiple files based on pattern with awk

I have a binary file with the following format:
file
04550525023506346054(....)64645634636346346344363468badcafe268664363463463463463463463464647(....)474017497417428badcafe34376362623626(....)262
and I need to split it in multiple files (using awk) that look like this:
file1
045505250235063460546464563463634634634436346
file2
8badcafe26866436346346346346346346346464747401749741742
file3
8badcafe34376362623626262
I have found on stackoverflow the following line:
cat file |
awk -v RS="\x8b\xad\xca\xfe" 'NR > 1 { print RS $0 > "file" (NR-1); close("file" (NR-1)) }'
and it works for all the files but the first.
Indeed, the file I called file1, is not created because it does not start with the eye catcher 8badcafe.
How can I fix the previous command line in order to have the output I need?
Thanks!
Try:
awk '{gsub(/8badcafe/,"\n&");num=split($0, a,"\n");for(i=1;i<=num;i++){print a[i] > "file"++e}}' Input_file
This substitutes each occurrence of the string "8badcafe" with a newline followed by the string itself, then splits the current line into an array named a using newline as the separator, and finally loops over all of a's elements, printing them one by one to file1, file2, ... using the prefix "file" and an increasing counter variable named e.
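The same program, expanded over several lines with comments (functionally identical; the output file name is parenthesized here only for readability):
awk '{
    gsub(/8badcafe/, "\n&")         # put a newline in front of every marker
    num = split($0, a, "\n")        # split the line into chunks on those newlines
    for (i = 1; i <= num; i++)
        print a[i] > ("file" ++e)   # write chunk i to file1, file2, ...
}' Input_file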
Output files as follows:
cat file1
045505250235063460546464563463634634634436346
cat file2
8badcafe26866436346346346346346346346464747401749741742
cat file3
8badcafe34376362623626262

ksh shell script to print and delete matched line based on a string

I have 2 files like below. I need a script that finds each string from File2 in File1, deletes the lines of File1 that contain those strings, and writes the remaining lines to another file (Output1.txt). It should also write to Output2.txt any string from File2 that does not exist in File1.
File1:
Apple
Boy: Goes to school
Cat
File2:
Boy
Dog
I need output like below.
Output1.txt:
Apple
Cat
Output2.txt:
Dog
Can anyone help, please?
If you have awk available on your system:
awk -v FS='[ :]' 'NR==FNR{a[$1]}NR>FNR&&!($1 in a){print $1}' File2 File1 > Output1.txt
awk -v FS='[ :]' 'NR==FNR{a[$1]}NR>FNR&&!($1 in a){print $1}' File1 File2 > Output2.txt
The script stores in array a the first field ($1) of every line of the first file given as an argument. For the second file, it prints $1 whenever it is not already a key of the array.
Note that the field delimiter is either a space or a colon (FS='[ :]').
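The first command, written out with comments (functionally equivalent to the one-liner above):
awk -v FS='[ :]' '
    NR == FNR  { a[$1]; next }    # File2 is read first: remember each string
    !($1 in a) { print $1 }       # File1: print first fields not seen in File2
' File2 File1 > Output1.txt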

How to add numbers from multiple files and write to another file using unix

I have five files, as shown below, each consisting of a single line with a comma-separated value:
File 1
abc,100
File 2
abc,200
File 3
abc,300
File 4
abc,700
File 5
abc,800
I need the output obtained by adding up the numbers from all of the files above.
The command should be a one-liner.
Output file
abc,2100
awk -F, '{code=$1; total += $2} END {printf("%s,%d\n", code, total)}' file1 file2 file3 file4 file5 > outputfile
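This remembers the column 1 key (the same abc in every file here) while summing the column 2 values across all five files, and the END block prints them as a single key,total line, i.e. abc,2100.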
Try:
awk -F\, '{a[$1]+=$2}END{for (i in a){print i","a[i]}}' file* > target
This will also work for input files with multiple keys.
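The same one-liner, spread over several lines with comments:
awk -F, '
    { a[$1] += $2 }                         # sum column 2 per key in column 1
    END { for (i in a) print i "," a[i] }   # print one "key,total" line per key
' file* > target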
For the alternative expected output (all keys joined with "_" and a single combined total):
awk -F\, '{a[$1]+=$2}END{for (i in a){key=key"_"i;cont+=a[i]};sub(/^_/,"",key);print key","cont}' file*
Results
abc_bbc,2100

Shell script - copy lines from file by key

I have two input files such that:
file1
123
456
789
file2
123|foo
456|bar
999|baz
I need to copy the lines from file2 whose keys are in file1, so the end result is:
file3
123|foo
456|bar
Right now, I'm using a shell script that loops through the key file and runs grep for each key:
grep "^${keys[$keyindex]}|" $datafile >&4
But as you can imagine, this is extremely slow. The key file (file1) has approximately 400,000 keys and the data file (file2) has about 750,000 rows. Is there a better way to do this?
You can try using join:
join -t'|' file1.txt file2.txt > file3.txt
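Note that join expects both inputs to be sorted on the join field. If they are not already sorted, one sketch using process substitution (assuming bash and that plain lexicographic ordering of the keys is acceptable):
join -t'|' <(sort file1.txt) <(sort file2.txt) > file3.txt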
I would use something like Python, which would process it pretty fast if you used an optimized data type like set. Not sure of your exact requirements, so you would need to adjust accordingly.
#!/usr/bin/python
# Create a set to store all of the keys in file1
Set1 = set()
for line in open('file1', 'r'):
    Set1.add(line.strip())

# Open a file to write to
file4 = open('file4', 'w')

# Loop over file2, and only write out the lines whose key is found in Set1
for line in open('file2', 'r'):
    if '|' not in line:
        continue
    parts = line.strip().split('|', 1)
    if parts[0] in Set1:
        file4.write(parts[0] + '|' + parts[1] + "\n")
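Since membership tests on a Python set are constant time on average, this makes only a single pass over each file and should handle the 400,000 keys and 750,000 rows mentioned above without trouble.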
join is the best solution, if sorting is OK. An awk solution:
awk -F \| '
FILENAME==ARGV[1] {key[$1];next}
$1 in key
' file1 file2
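To produce file3 as in the question, redirect the output:
awk -F \| 'FILENAME==ARGV[1]{key[$1];next} $1 in key' file1 file2 > file3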
