Duplicate entries in file - shell

I have a file with content as below,
123 ABC
12345 ABC-test
In the shell script, I need an exact entry instead of two duplicate results, but unable to get the exact entry.
For example:
grep "ABC"
returns both the entries, but I want a specific entry, i.e., if I search for "ABC", I should get only "123 ABC" and not the other entry.

Since you consider words to be whitespace-separated chunks, it is easier to use awk here since it reads lines (records) and splits them into fields (non-whitespace chunks) by default:
awk '$2=="ABC"' file > newfile
awk '/([[:space:]]|^)ABC([[:space:]]|$)/' file > newfile
Here, the first awk will output all lines where the second word is ABC. The second awk outputs all lines with ABC followed/preceded with a whitespace or at start/end of the line.
See the online demo:
#!/bin/bash
s='123 ABC
12345 ABC-test'
awk '$2=="ABC"' <<< "$s"
awk '/([[:space:]]|^)ABC([[:space:]]|$)/' <<< "$s"
Output:
123 ABC

You have to forge proper regex (regular expression) - in this case you want only those lines, where ABC is not surrounded by other characters (is on boundaries):
grep -e '\bABC\b'
should do the work. -e switch enables extended regular expressions in grep. Check also some regex tutorials, i.e. https://www.regular-expressions.info/tutorial.html.

Related

Making bash output a certain word from a .txt file

I have a question on Bash:
Like the title says, I require bash to output a certain word, depending on where it is in the file. In my explicit example I have a simple .txt file.
I already found out that you can count the number of words within a file with the command:
wc -w < myFile.txt
An output example would be:
78501
There certainly is also a way to make "cat" to only show word number x. Something like:
cat myFile.txt | wordno. 3125
desired-word
Notice, that I will welcome any command, that gets this done, not only cat.
Alternatively or in addition, I would be happy to know how you can make certain characters in a file show, based on their place in it. Something like:
cat myFile.txt | characterno. 2342
desired-character
I already know how you can achieve this with a variable:
a="hello, how are you"
echo ${a:9:1}
w
Only problem is a variable can only be so long. Is it as long as a whole .txt file, it won't work.
I look forward to your answers!
You could use awkfor this job it splits the string at spaces and prints the $wordnumber stringpart and tr is used to remove newlines
cat myFile.txt | tr -d '\n' | awk -v wordnumber=5 '{ print $wordnumber }'
And if you want the for example 5th. character you could do this like so
head -c 5 myFile.txt | tail -c 1
Since you have NOT shown samples of Input_file or expected output so couldn't test it. You could simply do this with awk as follows could be an example.
awk 'FNR==1{print substr($0,2342,1);next}' Input_file
Where we are telling awk to look for 1st line FNR==1 and in substr where we tell awk to take character 2342 and next 1 means from that position take only 1 character you could increase its value or keep it as per your need too.
With gawk:
awk 'BEGIN{RS="[[:space:]]+"} NR==12345' file
or
gawk 'NR==12345' RS="[[:space:]]+" file
I'm setting the record separator to a sequences of spaces which includes newlines and then print the 12345th record.
To improve the average performance you can exit the script once the match is found:
gawk 'BEGIN{RS="[[:space:]]+"}NR==12345{print;exit}' file

Remove comma and change word order

I have a text file in which every line has this format (the number of words before and after the comma may vary, but it is at least one before and one after it):
some words,a few other words
And I want its content to be displayed on the terminal like this:
a few other words some words
How can I do this? The only thing I can think of is using the tr command to replace the comma with a space, but I have no clue as to how to change the order of the words.
Any help would be appreciated.
If you don't mind using awk, you could use the -F option:
$>cat f
some words,a few other words
foo,bar
$>awk -F',' '{print $2,$1}' f
a few other words some words
bar foo
An other option is using sed:
$>sed 's/\(.*\),\(.*\)/\2 \1/g' f
a few other words some words
bar foo
Finally, with a while loop:
$>while IFS=, read part1 part2; do echo "$part2" "$part1";done <f
a few other words some words
bar foo
You have to be sure that there is only 1 comma on each line though…
cat yourfile.txt|sed -E 's/(.*),(.*)/\2 \1/'
This is what AWK is for (when defining field separator to be the comma):
$ echo "some words,a few other words" | awk -F, '{ print $2,$1 }'
a few other words some words
Edit:
Just noticed that #fredtantini had the same (accepted) solution. I leave it here as it shows the solution in a more concise (clear) way (i.e., it doesn't utilise an external file).

grep matching specific position in lines using words from other file

I have 2 file
file1:
12342015010198765hello
12342015010188765hello
12342015010178765hello
whose each line contains fields at fixed positions, for example, position 13 - 17 is for account_id
file2:
98765
88765
which contains a list of account_ids.
In Korn Shell, I want to print lines from file1 whose position 13 - 17 match one of account_id in file2.
I can't do
grep -f file2 file1
because account_id in file2 can match other fields at other positions.
I have tried using pattern in file2:
^.{12}98765.*
but did not work.
Using awk
$ awk 'NR==FNR{a[$1]=1;next;} substr($0,13,5) in a' file2 file1
12342015010198765hello
12342015010188765hello
How it works
NR==FNR{a[$1]=1;next;}
FNR is the number of lines read so far from the current file and NR is the total number of lines read so far. Thus, if FNR==NR, we are reading the first file which is file2.
Each ID in in file2 is saved in array a. Then, we skip the rest of the commands and jump to the next line.
substr($0,13,5) in a
If we reach this command, we are working on the second file, file1.
This condition is true if the 5 character long substring that starts at position 13 is in array a. If the condition is true, then awk performs the default action which is to print the line.
Using grep
You mentioned trying
grep '^.{12}98765.*' file2
That uses extended regex syntax which means that -E is required. Also, there is no value in matching .* at the end: it will always match. Thus, try:
$ grep -E '^.{12}98765' file1
12342015010198765hello
To get both lines:
$ grep -E '^.{12}[89]8765' file1
12342015010198765hello
12342015010188765hello
This works because [89]8765 just happens to match the IDs of interest in file2. The awk solution, of course, provides more flexibility in what IDs to match.
Using sed with extended regex:
sed -r 's#.*#/^.{12}&/p#' file2 |sed -nr -f- file1
Using Basic regex:
sed 's#.*#/^.\\{12\\}&/p#' file1 |sed -n -f- file
Explanation:
sed -r 's#.*#/^.{12}&/p#' file2
will generate an output:
/.{12}98765/p
/.{12}88765/p
which is then used as a sed script for the next sed after pipe, which outputs:
12342015010198765hello
12342015010188765hello
Using Grep
The most convenient is to put each alternative in a separate line of the file.
You can look at this question:
grep multiple patterns single file argument list too long

searching/extracting specific word not the complete line in unix

I would like to search a word in a file in Unix that should return only the word not the complete line.
For ex:
Sample.text:
Hello abc hi aeabcft 123abc OK
Expected output:
abc
aeabcft
123abc
If I search abc in file Sample.txt using grep, it will return complete line but I want the words that contains abc
You can use grep -Eo with an enhanced regex to search all matching words
grep -Eo '\b[[:alnum:]]*abc[[:alnum:]]*\b' Sample.text
abc
aeabcft
123abc
As per man grep:
-o, --only-matching
Prints only the matching part of the lines.
If you need to also do other processing to the file that grep can't accomplish, you could use Awk for printing only the regex match on the line.
awk -v r="abc" '{m=match($0,r,a)}m{print a[0]}' file
Otherwise I'd just use anubhava's grep -o suggestion as it's shorter and clearer.

Extract all characters after a match - shell script

I am in need to extract all characters after a pattern match.
For example ,
NAME=John
Age=16
I need to extract all characters after "=". Output should be like
John
16
I cant go with perl or Jython for this purpose because of some restrictions.
I tried with grep , but to my knowledge I came as shown below only
echo "NAME=John" |grep -o -P '=.{0,}'
You were pretty close:
grep -oP '(?<=\w=)\w+' file
makes it.
Explanation
it looks for any word after word= and prints it.
-o stands for "Print only the matched (non-empty) parts of a matching line, with each such part on a separate output line".
-P stands for "Interpret PATTERN as a Perl regular expression".
(?<=\w=)\w+ means: match only \w+ following word=. More info in [Regex tutorial - Lookahead][1] and in [this nice explanation by sudo_O][2].
Test
$ cat file
NAME=John
Age=16
$ grep -oP '(?<=\w=)\w+' file
John
16
One sed solution
sed -ne 's/.*=//gp' <filename>
another awk solution
awk -F= '$0=$2' <filename>
Explanation:
in sed we remove anything from the beginning of a line till a = and print the rest.
in awk we break the string in 2 parts, separated by =, now after that $0=$2 is making replacing the whole string with the second portion

Resources