moving records by matching a specific number's prefix - bash

I want to segregate some records into another file in bash.
I have two files: FILE_A, which contains all the records, and FILE_B, which contains possible prefixes of the numbers in FILE_A.
With the values of FILE_A and FILE_B below, I want to move the records from FILE_A whose prefixes are contained in FILE_B. Please take note that only the 11th to 19th characters of FILE_A must be compared.
To further clarify my query, please also refer to the expected output below.
Thank you.
$cat FILE_A
xxxxxxxxxx234575234xxxx01234
xxxxxxxxxx755602188xxxx02345
xxxxxxxxxx044664690xxxx04567
xxxxxxxxxx044663581xxxx01234
xxxxxxxxxx082550123xxxx08234
note: num=11th to 19th digit
file that contains num_prefix
$cat FILE_B
04466358
0446646
02345
08234
note: num_prefix=all the values above
OUTPUT:
$cat new_generated_file
xxxxxxxxxx234575234xxxx01234
xxxxxxxxxx044664690xxxx04567
xxxxxxxxxx044663581xxxx01234
It is important that the script compares only the 11th-19th characters of FILE_A against FILE_B, because the last 5 digits can otherwise cause false matches.
Like this attempt:
$ sed 's/^0//' File_B > File_C; grep -f File_C File_A
gives me this output
xxxxxxxxxx234575234xxxx01234
xxxxxxxxxx755602188xxxx02345
xxxxxxxxxx044664690xxxx04567
xxxxxxxxxx044663581xxxx01234
xxxxxxxxxx082550123xxxx08234
(xxxxxxxxxx755602188xxxx02345 and xxxxxxxxxx082550123xxxx08234 were not supposed to be there)
because 08234 and 02345 (the last 5 digits) are both in File_C.

You can use:
grep -f <(sed 's/^0//' fileB) fileA
xxxxxxxxxx234575234xxxxx
xxxxxxxxxx044664690xxxxx
xxxxxxxxxx044663581xxxxx
xxxxxxxxxx082340123xxxxx
Update:
sed 's/^0//' fileB > fileC
while read -r f; do
    # test only characters 11-19 of the record against the prefixes
    echo "$f" | cut -c 11-19 | grep -qf fileC && echo "$f"
done < fileA
Update 2:
sed 's/^0//' fileB > fileC
# keep the 9-digit fields that match a prefix, then pull the
# corresponding full records back out of fileA as fixed strings
cut -c 11-19 fileA | grep -f fileC > fileD
grep -F -f fileD fileA
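The two Update 2 steps can also be combined into one pipeline with process substitution, avoiding the temporary files (a sketch of the same idea, not a separate method):
grep -F -f <(cut -c 11-19 fileA | grep -f <(sed 's/^0//' fileB)) fileA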

#!/bin/ksh
sed -n '/^ *$/ !{s/^0\{1,\}/0*/;s/^/^.{10}/;p;}' fileB > /tmp/CleanPrefix.egrep
egrep -f /tmp/CleanPrefix.egrep fileA
rm /tmp/CleanPrefix.egrep
This should also work in bash (drop the first line to stay in bash).
It uses an (e)grep regex as the selector, to avoid interference with the surrounding content.
It assumes each line of fileA starts with 10 characters (any printable characters).
The s/^0\{1,\}/0*/ substitution adapts each prefix so numbers starting with 0 match with or without the leading zeros in fileA.
The sed script is POSIX compliant, so use the --posix option on GNU sed.
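For comparison, a minimal awk sketch of the same task (my own, not from the answers above; it assumes the fixed columns 11-19 as stated): a record is printed when its 9-digit field starts with a FILE_B prefix, with or without the prefix's leading zeros.
awk 'NR==FNR {
    pre[$0]                      # prefix as written
    s = $0; sub(/^0+/, "", s)
    if (s != "") pre[s]          # prefix with leading zeros stripped
    next
}
{
    num = substr($0, 11, 9)
    for (p in pre)
        if (index(num, p) == 1) { print; break }
}' FILE_B FILE_A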

Related

Unix/bash: Print filename as first string before each line in a log file

Looking for help on how to prepend the name of the file as the first string in each row of the file.
I am trying to merge 2 files so that each row of the merged content starts with the name of the file it came from. Let's consider 2 files named FileA and FileB; each has 2 lines.
FileA
Tom is Cat
Jerry is Mouse
FileB
Cat is Tom
Mouse is Jerry
Expected Output of merged file
FileA Tom is Cat
FileA Jerry is Mouse
FileB Cat is Tom
FileB Mouse is Jerry
I am struggling to find a solution to this. Please help
Use sed to substitute the filename at the beginning of each line of the file:
sed 's/^/FileA /' fileA >> mergedFile
sed 's/^/FileB /' fileB >> mergedFile
For an arbitrary number of files you can loop over all the filenames and construct the sed substitution command dynamically using a variable holding each filename (here the names are read from file.txt):
while read -r f
do
    sed "s|^|$f |" "$f"
done < file.txt > merge.txt
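If the filenames come from the command line rather than from file.txt, the same loop can be wrapped in a script (a sketch; merge_with_names.sh is a hypothetical name):
#!/bin/bash
# merge_with_names.sh (hypothetical): prefix each line of every file
# given as an argument with that file's name
for f in "$@"; do
    sed "s|^|$f |" "$f"
done > merge.txt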
Using awk and brace expansion, with tee sending the output both to stdout and to mergefile:
awk '{print FILENAME, $0}' file{A,B} | tee mergefile
The file names can be anything; if that is not what you have, just pass them as arguments to awk:
awk '{print FILENAME, $0}' filefoo filebar filemore ...
This can also be done with grep, if your grep has the -H option/flag:
grep -H . fileA fileB
Again, the filenames can be anything.
If you prefer ripgrep over grep, these two commands produce the same output:
$ grep --with-filename '' File*
$ rg --with-filename --no-heading --no-line-number '' File*
FileA:Tom is Cat
FileA:Jerry is Mouse
FileB:Cat is Tom
FileB:Mouse is Jerry

Unix - Read file line by line, check if a string exists in another file, and do the required operation

I need some assistance on the below.
File1.txt
aaa:/path/to/aaa:777
bob:/path/to/bbb:700
ccc:/path/to/ccc:600
File2.txt
aaa:/path/to/aaa:700
bbb:/path/to/bbb:700
ccc:/path/to/ccc:644
I should iterate over File2.txt, and if aaa exists in File1.txt, then I should compare the file permissions. If the permissions for aaa are the same in both files, ignore it.
If they are different, write the File2.txt line to Output.txt.
So in above case
Output.txt
aaa:/path/to/aaa:700
ccc:/path/to/ccc:644
How can I achieve this in a Unix shell script? Please suggest.
I agree with the comment of @Marc that you should try something before asking here.
However, the following constructions are difficult to find when you have never seen them, so I will give you something to study.
When you want to parse line by line, you can start with
while IFS=: read -r file path mode; do
    comparewith=$(grep "^${file}:${path}:" File2.txt | cut -d: -f3)
    # compare and output: one way to finish it is to print File2's
    # line whenever the mode differs
    [ -n "$comparewith" ] && [ "$mode" != "$comparewith" ] &&
        echo "${file}:${path}:${comparewith}"
done < File1.txt
For large files that will become very slow.
You can first filter, from File2.txt, the lines you want to compare.
You want to grep for strings like aaa:/path/to/aaa:, including the last :. With cut -d: -f1-2 you might be fine for your input file, but it may be better to remove the last three characters (the mode):
sed 's/...$//' File1.txt
You can let grep use that output as a pattern file via process substitution with <():
grep -f <(sed 's/...$//' File1.txt) File2.txt
Your example files don't show the situation where both files have identical lines (which you want to skip); you will need another process substitution to get that working:
grep -v -f File1.txt <(grep -f <(sed 's/...$//' File1.txt ) File2.txt )
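With the sample files above, this prints exactly the desired Output.txt:
aaa:/path/to/aaa:700
ccc:/path/to/ccc:644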
Another solution, worth trying yourself, is using awk (see What is "NR==FNR" in awk? for accessing 2 files).
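A sketch of that awk approach (my own, not the linked answer): index File1.txt by name and path, then print the File2.txt lines whose mode differs:
awk -F: 'NR==FNR { mode[$1 FS $2] = $3; next }
         ($1 FS $2) in mode && mode[$1 FS $2] != $3' File1.txt File2.txt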
comm - compare two sorted files line by line
According to the manual, comm -13 <file1> <file2> prints only lines unique to <file2>:
$ ls
File1.txt File2.txt
$ cat File1.txt
aaa:/path/to/aaa:777
bbb:/path/to/bbb:700
ccc:/path/to/ccc:600
$ cat File2.txt
aaa:/path/to/aaa:700
bbb:/path/to/bbb:700
ccc:/path/to/ccc:644
$ comm -13 File1.txt File2.txt
aaa:/path/to/aaa:700
ccc:/path/to/ccc:644
$ # Nice!
But it doesn't check for lines in <file1> that are merely "similar" to corresponding lines of <file2>. I.e., it won't work as you want if File1.txt has the line BOB:/path/to/BOB:700 and File2.txt has BBB:/path/to/BBB:700, since it will print the latter (while you want it not to be printed). Note also that comm expects both files to be sorted, as the example files here already are.
It also won't do what you want if strings bbb:/path/to/bbb:700 and bbb:/another/path/to/bbb:700 are supposed to be "identical".

Creating a script that checks to see if each word in a file exists in another file

I am pretty new to Bash and scripting in general and could use some help. Each word in the first file is separated by \n, while the second file could contain anything. If a string from the first file is not found in the second file, I want to output it. Pretty much: "check if these words are in these words and tell me the ones that are not".
File1.txt contains something like:
dog
cat
fish
rat
file2.txt contains something like:
dog
bear
catfish
magic ->rat
I know I want to use grep (or do I?) and the command would be (to my best understanding):
$foo.sh file1.txt file2.txt
Now for the script...
I have no idea...
grep -iv $1 $2
Give this a try. It is straightforward and not optimized, but it does the trick (I think):
while read -r line ; do
    fgrep -q "$line" file2.txt || echo "$line"
done < file1.txt
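Wrapped as the foo.sh file1.txt file2.txt script the asker sketched (same loop, with the filenames taken from the arguments):
#!/bin/bash
# foo.sh: print each line of $1 that is not found anywhere in $2
while read -r line; do
    fgrep -q "$line" "$2" || echo "$line"
done < "$1"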
There is a funny version below, with 4 parallel fgrep processes and the use of an additional result.txt file:
> result.txt
nb_parallel=4
while read -r line ; do
    # throttle: wait while too many background jobs are running
    while [ "$(jobs | wc -l)" -gt "$nb_parallel" ]; do sleep 1; done
    fgrep -q "$line" file2.txt || echo "$line" >> result.txt &
done < file1.txt
wait
cat result.txt
You can increase the value 4 in order to run more parallel fgrep processes, depending on the number of CPUs and cores and the IOPS available.
With the -f flag you can tell grep to read the patterns from a file.
grep -vf file2.txt file1.txt
To get a good match on complete lines, use
grep -vFxf file2.txt file1.txt
As @anubhava commented, this will not match substrings. To fix that, we will use the result of grep -Fof file1.txt file2.txt (all the relevant keywords).
Combining these will give
grep -vFxf <(grep -Fof file1.txt file2.txt) file1.txt
Using awk you can do:
awk 'FNR==NR{a[$0]; next} {for (i in a) if (index(i, $0)) next} 1' file2 file1
rat
You can simply do the following:
comm -2 -3 file1.txt file2.txt
and also:
diff -u file1.txt file2.txt
I know you were looking for a script, but I don't think there is any reason for one; if you still want a script, you can just run the commands from it.
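Note that comm expects sorted input, and the sample files here are not sorted, so a variant worth trying is to sort them on the fly:
comm -23 <(sort file1.txt) <(sort file2.txt)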
A similar awk:
$ awk 'NR==FNR{a[$0];next} {for(k in a) if(k~$0) next}1' file2 file1
rat

grep two files but some words not found

I have a list of names in fileA and want to get those lines in fileB.csv that contain a name in the list.
fileA looks like
noah
liam
jacob
mason
william
fileB.csv looks like
noah,1
liam,2
yoyoyo,44
williams,4
william,5
I want to output
noah,1
liam,2
william,5
But I got
noah,1
liam,2
What I did is (sed 's/$/,/' fileA | grep -wf fileA fileB.csv)
or even grep -wf fileA fileB.csv
However, I have no idea why some words do not show up.
You are modifying fileA with sed so that you can use the modified file as the pattern file for grep. If so, you need to do it properly:
grep -wf <(sed 's/$/,/' fileA) fileB.csv
Otherwise, sed ... | grep -wf fileA fileB was just running the grep command, without taking the sed part (cleaning the file) into account.
However, there is no need to add any comma to make this work, since this alone does it:
$ grep -wf fileA fileB
noah,1
liam,2
william,5
Note also that adding a comma will break the matching when using -w:
$ echo "hello,bye" | grep -w "hello"
hello,bye
$ echo "hello,bye" | grep -w "hello,"
$
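An alternative sketch (not from this answer): treat fileB.csv as comma-separated and compare its first field exactly against the fileA names, sidestepping the word-boundary issue entirely:
awk -F, 'NR==FNR { names[$0]; next } $1 in names' fileA fileB.csv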

Searching for Strings

I would like to have a shell script that searches two files and returns a list of strings:
File A contains just a list of unique alphanumeric strings, one per line, like this:
accc_34343
GH_HF_223232
cwww_34343
jej_222
File B contains a list of SOME of those strings (sometimes more than once), and a second column of information, like this:
accc_34343 dog
accc_34343 cat
jej_222 cat
jej_222 horse
I would like to create a third file that contains a list of the strings from File A that are NOT in File B.
I've tried using some loops with grep -v, but that doesn't work. So, in the above example, the new file would have this as its contents:
GH_HF_223232
cwww_34343
Any help is greatly appreciated!
Here's what you can do:
grep -v -f <(awk '{print $1}' file_b) file_a > file_c
Explanation:
grep -v: use the -v option to invert the matching
-f: use the -f option to read the patterns from a file
<(awk '{print $1}' file_b): extract the first column values from file_b without using a temp file; the <( ... ) syntax is process substitution
file_a: the file to be searched
> file_c: write the output to file_c
comm is used to find intersections and differences between files:
comm -23 <(sort fileA) <(cut -d' ' -f1 fileB | sort -u)
result:
GH_HF_223232
cwww_34343
I assume your shell is bash/zsh/ksh (needed for the process substitution).
awk 'FNR==NR{a[$1];next} !($0 in a)' fileB fileA
