terminal: extract words ending with ".abc" from a file - bash

I want to do the following through the terminal. I have a file with many lines, each line containing a whole sentence. Some lines are empty. I want to read the file and extract all words that end with .abc. How might I do that?

grep can be very useful:
$ cat input
.abc
.abdadf
assadf.abc
adsfas.abcadf
asdf.abc
$ grep -o '\b[^\.]*\.abc\b' input
assadf.abc
asdf.abc
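What it does
-o prints only the parts of each line that match the given regex
\b[^\.]*\.abc\b is a regex that matches any word which ends with .abc
\b word boundary
[^\.] any character other than a . (note that this includes spaces, so on a line with several words a match can start at an earlier word)
* matches zero or more of the preceding item
\.abc\b matches .abc followed by a word boundary \b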
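Note
If the word can contain more than one . then modify the regex as
\b.*\.abc\b
where .* matches anything, including . (it also matches spaces, so on a line with several words the match can run from the first word up to the last .abc)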
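To see why the character class matters on sentence lines, here is a small sketch (assuming GNU grep; the sample sentence is made up). It runs the answer's pattern and a variant that excludes all whitespace, [^[:space:]]*, which keeps each match inside a single word and still allows several dots in that word.

```shell
#!/usr/bin/env bash
# Hypothetical sentence line, as in the question (several words per line).
line='see notes.abc and todo.abc today'

# The answer's pattern: [^\.]* also matches spaces, so the first match
# starts at "see" and runs across words up to the first .abc.
wide=$(printf '%s\n' "$line" | grep -o '\b[^\.]*\.abc\b')

# Excluding whitespace keeps each match inside one word.
narrow=$(printf '%s\n' "$line" | grep -o '[^[:space:]]*\.abc\b')

printf 'answer pattern:\n%s\n' "$wide"
printf 'whitespace-free pattern:\n%s\n' "$narrow"
```

One difference to keep in mind: the whitespace-free variant also prints a bare .abc (as in the first sample line), which the \b-anchored pattern skips because there is no word boundary before the leading dot.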

To find all the words that end with .abc:
grep -oP '\S*\.abc(?=\s|$)' file
\S* zero or more non-space characters.
(?=\s|$) a positive lookahead asserting that the character following the match is whitespace, or that the match is at the end of the line.
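grep -P needs a grep built with PCRE support, which not every system has. A portable sketch of the same word-level extraction, assuming words are separated by whitespace, puts one word per line with tr and then keeps the words that end in .abc; empty lines disappear on the way. The input text is made up.

```shell
#!/usr/bin/env bash
# Hypothetical input: sentences, including an empty line.
input=$(printf '%s\n' 'first notes.abc here' '' 'x.abcd and y.abc')

# tr turns every whitespace character into a newline and squeezes runs of
# newlines, so each word ends up on its own line; grep keeps the .abc ones.
words=$(printf '%s\n' "$input" | tr -s '[:space:]' '\n' | grep '\.abc$')

printf '%s\n' "$words"
```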

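Try awk, among various other possibilities. It prints each whole line that ends in .abc, which gives the list of words only when every such line holds a single word: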
awk '/\.abc$/' file

You can also use the sed command. It, too, prints whole lines that end in .abc:
sed -n '/\.abc$/ p' file
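Both commands above test the end of the whole line. When a line holds a sentence, a per-word check is needed; a minimal awk sketch (sample text made up) loops over the whitespace-separated fields and prints each field that ends in .abc.

```shell
#!/usr/bin/env bash
input=$(printf '%s\n' 'first notes.abc here' '' 'x.abcd and y.abc')

# Whole-line test, as in the answers: only the line whose last word ends in .abc.
lines=$(printf '%s\n' "$input" | awk '/\.abc$/')

# Per-word test: check every field and print the ones ending in .abc.
words=$(printf '%s\n' "$input" | awk '{ for (i = 1; i <= NF; i++) if ($i ~ /\.abc$/) print $i }')

printf 'lines:\n%s\nwords:\n%s\n' "$lines" "$words"
```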

Related

Replace a specific character at the beginning and end of any word in bash

I need to remove the hyphen '-' character only when it matches the pattern 'space-[A-Z]' or '[A-Z]-space'. (Assuming all letters are uppercase, and space could be a space, or newline)
sample.txt
I AM EMPTY-HANDED AND I- WA-
-ANT SOME COO- COOKIES
I want the output to be
I AM EMPTY-HANDED AND I WA
ANT SOME COO COOKIES
I've looked around for answers using sed, awk and perl, but I could only find answers about removing all characters between two patterns or specific strings, not a specific character between [A-Z] and a space.
Thanks heaps!!
If Perl is an option, would you try the following:
perl -pe 's/(^|(?<=\s))-(?=[A-Z])//g; s/(?<=[A-Z])-((?=\s)|$)//g' sample.txt
(?<=\s) is a zero-width lookbehind assertion: it requires a whitespace character before the dash without including it in the matched substring (the ^ alternative covers a dash at the start of the line).
(?=[A-Z]) is a zero-width lookahead assertion: it requires an uppercase letter after the dash without including it in the matched substring.
As a result, only the dash characters that fit the pattern above are removed from the original text.
The second statement s/..//g is the mirror image of the first: it removes a dash that is preceded by an uppercase letter and followed by whitespace or the end of the line.
Could you please try the following:
awk '{for(i=1;i<=NF;i++){if($i ~ /^-[a-zA-Z]+$|^[a-zA-Z]+-$/){sub(/-/,"",$i)}}} 1' Input_file
A non-one-liner form of the same solution:
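awk '
{
  for(i=1;i<=NF;i++){
    # a word that starts or ends with a dash, such as -ANT or COO-
    if($i ~ /^-[a-zA-Z]+$|^[a-zA-Z]+-$/){
      sub(/-/,"",$i)   # drop that dash
    }
  }
}
1   # print the (possibly rebuilt) line
' Input_file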
Output will be as follows.
I AM EMPTY-HANDED AND I WA
ANT SOME COO COOKIES
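One side effect worth knowing about: when awk assigns to a field (here through sub on $i), it rebuilds the whole line using OFS, a single space by default, so runs of spaces collapse on any line where a dash was removed. A small sketch with a made-up line containing two spaces:

```shell
#!/usr/bin/env bash
line='I-  WA-'   # hypothetical input with two spaces between the words

# Same loop as the answer; modifying $i makes awk rebuild $0 with OFS.
out=$(printf '%s\n' "$line" | awk '{for(i=1;i<=NF;i++){if($i ~ /^-[a-zA-Z]+$|^[a-zA-Z]+-$/){sub(/-/,"",$i)}}} 1')

printf '[%s]\n' "$out"
```

Lines where no field is modified are printed unchanged, because awk only rebuilds $0 after a field assignment.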
If your sed supports Extended Regular Expressions (generally via the -E or -r option), then you can shorten your sed expression to:
sed -E 's/(^|\s)-(\w)/\1\2/g;s/(\w)-(\s|$)/\1\2/g' file
Where the basic form is sed -E 's/find1/replace1/g;s/find2/replace2/g' file which can also be written as separate expressions sed -E -e 's/find1/replace1/g' -e 's/find2/replace2/g' (your choice).
The details of s/find1/replace1/g are:
find1 is
(^|\s) locate and capture either the beginning of the line or a whitespace character,
followed by the '-' hyphen,
then capture the next \w (word-character); and
replace1 is simply \1\2, reinserting both captures with the first two backreferences.
The next substitution expression is similar, except now you are looking for the hyphen followed by whitespace or the end of the line. So you have:
find2 being
a capture of \w (word-character),
followed by the hyphen,
followed by a capture of either a following space or the end (\s|$), then
replace2 is the same as before, just reinsert the captured characters using backreferences.
In each case the g indicates a global replace of all occurrences.
(note: the \w word-character also includes the '_' (underscore), so while unlikely you would have a hyphen and underscore together, if you do, you need to use the [A-Za-z] list instead of \w)
Example Use/Output
In your case, the output is:
$ sed -E 's/(^|\s)-(\w)/\1\2/g;s/(\w)-(\s|$)/\1\2/g' file
I AM EMPTY-HANDED AND I WA
ANT SOME COO COOKIES
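To illustrate the note about \w, here is a sketch (GNU sed assumed, since \s and \w are GNU extensions; the input is made up) that runs the answer's expression and the same expression with [A-Za-z] on a word that ends in _-.

```shell
#!/usr/bin/env bash
line='A_- B'   # hypothetical input: underscore right before the dash

# The answer's expression: \w also matches _, so this dash is removed.
with_w=$(printf '%s\n' "$line" | sed -E 's/(^|\s)-(\w)/\1\2/g;s/(\w)-(\s|$)/\1\2/g')

# Letters only, as the note suggests: the dash after _ is kept.
letters=$(printf '%s\n' "$line" | sed -E 's/(^|\s)-([A-Za-z])/\1\2/g;s/([A-Za-z])-(\s|$)/\1\2/g')

printf '%s\n%s\n' "$with_w" "$letters"
```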
remove the hyphen '-' character only when it matches the pattern 'space-[A-Z]' or '[A-Z]-space'. Assuming all letters are uppercase, and space could be a space, or newline
It's:
sed 's/\( \|^\)-\([A-Z]\)/\1\2/g; s/\([A-Z]\)-\( \|$\)/\1\2/g'
s - substitute
/
\( \|^\) - a space or the beginning of the line
- - the hyphen
\([A-Z]\) - a single upper case character
/
\1\2 - \1 is replaced by whatever the first \(...\) group matched, so by a space or nothing; \2 is replaced by the single upper case character found. Effectively the - is removed.
/
g - apply the substitution to every match on the line
; - separates the two s commands
s/\([A-Z]\)-\( \|$\)/\1\2/g - the same idea mirrored; the $ means the end of the line.
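This awk version only looks at where the dash sits and does not check for an uppercase letter next to it:
awk '{gsub(/ -/," "); sub(/^-/,""); sub(/-$/,""); gsub(/- /," ")}1' file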
I AM EMPTY-HANDED AND I WA
ANT SOME COO COOKIES

grep for a variable content with a dot

I found many similar questions about my issue, but I still haven't found the right one for me.
I need to grep for the content of a variable plus a dot, but it doesn't work when I escape the dot after the variable. For example:
The file content is
item.
newitem.
My variable contains item. and I want to grep for the exact word, therefore I must use -w and not -F, but with this command I can't get the correct output:
cat file | grep -w "$variable\."
Do you have suggestions please?
Hi, I have to correct my scenario. My file contains some FQDNs, and for some reason I have to look for the hostname followed by the dot.
Unfortunately, grep -wF doesn't work:
My file is
hostname1.domain.com
hostname2.domain.com
and the command
cat file | grep -wF hostname1.
doesn't show any output. I have to find another solution and I'm not sure that grep could help.
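If $variable contains item., you're searching for item.\. which is not what you want. What you actually want is -F, which interprets the pattern literally, not as a regular expression. Note that -F alone also prints newitem., because it contains item. as a substring; add -x (grep -Fx "$var") if the whole line must match.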
var=item.
echo $'item.\nnewitem.' | grep -F "$var"
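A small sketch of the difference between plain -F and -F combined with -x, using the question's two lines:

```shell
#!/usr/bin/env bash
var=item.

# -F alone: item. is a substring of newitem., so both lines match.
fixed=$(printf '%s\n' 'item.' 'newitem.' | grep -F "$var")

# -F with -x: the whole line must equal the literal string.
exact=$(printf '%s\n' 'item.' 'newitem.' | grep -Fx "$var")

printf 'fixed:\n%s\nexact:\n%s\n' "$fixed" "$exact"
```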
Try:
grep "\b$word\."
\b: word boundary
\.: a literal dot (this assumes $word holds the name without the trailing dot, e.g. word=item)
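A sketch of this pattern on both of the question's inputs, assuming GNU grep (for \b) and that the variable holds the name without the dot; word=item and host=hostname1 are illustrative values.

```shell
#!/usr/bin/env bash
# Inside double quotes bash keeps \b and \. as-is, so grep receives them.
word=item
items=$(printf '%s\n' 'item.' 'newitem.' | grep "\b$word\.")

host=hostname1
hosts=$(printf '%s\n' 'hostname1.domain.com' 'hostname2.domain.com' | grep "\b$host\.")

printf '%s\n%s\n' "$items" "$hosts"
```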
The following awk solution may help; it prints only the lines that are exactly equal to the variable:
awk -v var="item." '$0==var' Input_file
You are expanding the variable and appending \. to it, which results in running
cat file | grep -w "item.\."
Since grep accepts files as arguments, you don't need cat: grep -w "item\." file should do (without -w it would also match newitem.).
from man grep
-w, --word-regexp
Select only those lines containing matches that form whole words. The test is that the matching substring must either be at the beginning of the line, or preceded by a non-word constituent
character. Similarly, it must be either at the end of the line or followed by a non-word constituent character. Word-constituent characters are letters, digits, and the underscore.
and
The Backslash Character and Special Expressions
The symbols \< and \> respectively match the empty string at the beginning and end of a word. The symbol \b matches the empty string at the edge of a word, and \B matches the empty string
provided it's not at the edge of a word. The symbol \w is a synonym for [[:alnum:]] and \W is a synonym for [^[:alnum:]].
As the last character of the pattern is a ., -w requires the match to be followed by a non-word character (anything other than [A-Za-z0-9_]); however, the next character is d, so the line is rejected.
grep '\<hostname1\.'
should work, as \< ensures that the previous character is not a word constituent.
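A sketch comparing the two on the question's FQDN lines (GNU grep assumed): -w rejects the match because of the d after the dot, while \< only constrains the start of the match.

```shell
#!/usr/bin/env bash
# The question's FQDN lines.
fqdns=$(printf '%s\n' 'hostname1.domain.com' 'hostname2.domain.com')

# -w fails: the match "hostname1." is followed by "d", a word character.
# grep -c prints 0 and exits 1 on no match; || true keeps the script going.
with_w=$(printf '%s\n' "$fqdns" | grep -cwF 'hostname1.' || true)

# \< only requires a word start before "hostname1", so the dot is fine.
with_lt=$(printf '%s\n' "$fqdns" | grep '\<hostname1\.')

printf 'matches with -w: %s\n%s\n' "$with_w" "$with_lt"
```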
You can dynamically construct the search pattern and then call grep
rexp='^hostname1\.'
grep "$rexp" file.txt
The single quotes stop bash from interpreting special characters in the value being assigned. The double quotes let bash replace $rexp with its value while still passing it to grep as a single argument. The caret (^) in the expression tells grep to look for lines starting with 'hostname1.'
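If the hostname comes from a variable, the dots in it have to be escaped before it goes into the regex. A minimal sketch, where host is a hypothetical variable and sed does the escaping (only dots are escaped, which is enough for hostnames but not for arbitrary text):

```shell
#!/usr/bin/env bash
host=hostname1.   # hypothetical: the value searched for, dot included

# Escape every . in the value, then anchor the pattern at the start of the line.
rexp="^$(printf '%s' "$host" | sed 's/[.]/\\./g')"

matches=$(printf '%s\n' 'hostname1.domain.com' 'hostname2.domain.com' 'xhostname1.domain.com' | grep "$rexp")
printf 'pattern: %s\n%s\n' "$rexp" "$matches"
```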

grep for a specific pattern in a file?

I have a file textFile.txt
abc_efg#qwe.asd
abc_aer#
#avret
afd_wer_asd#qweasd.zxcasd
wqe_a#qwea.cae
qwe.caer
I want to use grep to get these specific lines:
abc_efg#qwe.asd
afd_wer_asd#qweasd.zxcasd
wqe_a#qwea.cae
That is the ones that have
[a-z]_[a-z]#[a-z].[a-z]
but the part before the # can have any number of "_"
So far this is what I have:
grep "[a-z]_[a-z]#[a-z].[a-z]" textFile.txt
But I got only one line as the output.
wqe_a#qwea.cae
Could I know a better way to do this ? :)
You can simply add the _ inside the bracket expression, [a-z_], so the new command is:
grep "[a-z_]#[a-z].[a-z]" textFile.txt
Or, if you want it to start with a character other than _, you can use:
grep "[a-z][a-z_]#[a-z].[a-z]" textFile.txt
I would suggest keeping it simple by checking that exactly one # is present in each line, with at least one character on each side of it:
grep -E '^[^#]+#[^#]+$' file
abc_efg#qwe.asd
afd_wer_asd#qweasd.zxcasd
wqe_a#qwea.cae
The following selects lines that have at least one underscore followed by letters before the #, and one or more letters followed by a literal period after the #:
$ grep '_[a-z]\+#[a-z]\+\.' textFile.txt
abc_efg#qwe.asd
afd_wer_asd#qweasd.zxcasd
wqe_a#qwea.cae
Notes
An unescaped period matches any character. If you want to match a literal period, it must be escaped with a backslash: \.
Thus, #[a-z].[a-z] matches a #, followed by a letter, followed by any character at all, followed by a letter.
[a-z] matches a single letter. Thus _[a-z]# would match only if there were exactly one character between the underscore and the #. To match one or more letters, use [a-z]\+ (\+ is a GNU extension in basic regular expressions; with grep -E it is written [a-z]+).
#[a-z]\+\. will match a #, followed by one or more letters, followed by a literal period character.
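A tiny sketch of the escaped versus unescaped period, on a made-up line that has no literal period after the #:

```shell
#!/usr/bin/env bash
# Hypothetical line with no literal period after the #.
line='a#bxc'

# Unescaped: . matches the x, so the line is counted.
unescaped=$(printf '%s\n' "$line" | grep -c '#[a-z].[a-z]' || true)

# Escaped: \. needs a real period, so nothing matches (grep -c prints 0).
escaped=$(printf '%s\n' "$line" | grep -c '#[a-z]\.[a-z]' || true)

printf 'unescaped: %s, escaped: %s\n' "$unescaped" "$escaped"
```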
When you write [a-z], it only matches one character from that set. That's why you only get wqe_a#qwea.cae back from your grep call: it is the only line with exactly one character between the _ and the #.
To match more than one character, you can use + or *: + means one or more of the preceding item, and * means zero or more. Also, an unescaped . means any character.
So something like:
grep "[a-z]\+_[a-z]\+#[a-z]\+\.[a-z]\+" textFile.txt would work for this. There are shorter, less specific ways of doing this as well (as other answers have shown).
Note the backslashes before the + signs and before the literal .
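A sketch showing that the basic-regex form and the -E (extended) form select the same three lines from the question's textFile.txt contents; GNU grep is assumed for \+ in the basic form.

```shell
#!/usr/bin/env bash
# The question's textFile.txt contents.
sample=$(printf '%s\n' 'abc_efg#qwe.asd' 'abc_aer#' '#avret' \
  'afd_wer_asd#qweasd.zxcasd' 'wqe_a#qwea.cae' 'qwe.caer')

# Basic regex: + must be written \+ to act as a quantifier.
bre=$(printf '%s\n' "$sample" | grep '[a-z]\+_[a-z]\+#[a-z]\+\.[a-z]\+')

# Extended regex (-E): + is a quantifier without a backslash.
ere=$(printf '%s\n' "$sample" | grep -E '[a-z]+_[a-z]+#[a-z]+\.[a-z]+')

printf '%s\n' "$bre"
```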
This regex should get all valid email addresses from a text file:
grep -E -o "\b[A-Za-z0-9._%+-]+#[A-Za-z0-9.-]+\.[A-Za-z]{2,6}\b" file
abc_efg#qwe.asd
afd_wer_asd#qweasd.zxcasd
wqe_a#qwea.cae
This matches strings of the form text#domain.suffix, where the final suffix is 2 to 6 letters.

Print all characters upto a matching pattern from a file

This may be a silly question, but I have a text file and need to display everything up to the first match of a pattern, which is a '/'. (No line contains blank spaces.)
Example.txt:
somename/for/example/
something/as/another/example
thisfile/dir/dir/example
Preferred output:
somename
something
thisfile
I know this grep code will display everything after a matching pattern:
grep -o '/[^\n]*' '/my/file.txt'
So is there any way to do the complete opposite, maybe remove everything after the matching pattern, or invert it to display my preferred output?
Thanks.
If you're calling an external command like grep anyway, you can get the same results you require with the sed command, e.g.
echo "something/as/another/example" | sed 's:/.*::'
something
Instead of focusing on what you want to keep, think about what you want to remove, in this case everything after the first '/' char. This is what this sed command does.
The leading s means substitute, and :/.*: delimits the pattern to match, where /.* means the first / char and all characters after it. The second half of the sed command is the replacement; the empty field before the final : means replace with nothing.
The traditional idiom for sed is s/str/rep/, using / chars to delimit the search from the replacement, but you can use any character you want after the initial s (substitute) command.
Any POSIX sed accepts s:/.*:: for that reason, since any character other than backslash or newline can delimit the s command. The backslash form is needed only in addresses: a context address that uses a delimiter other than / must start with a backslash, as in \:/as/:p.
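A sketch on the question's three example lines, showing the s command with : as its delimiter and, for contrast, an address that uses : as its delimiter and therefore starts with a backslash:

```shell
#!/usr/bin/env bash
paths=$(printf '%s\n' 'somename/for/example/' 'something/as/another/example' 'thisfile/dir/dir/example')

# s with : as the delimiter: delete the first / and everything after it.
first=$(printf '%s\n' "$paths" | sed 's:/.*::')

# Address with : as the delimiter: print only lines containing /as/.
with_as=$(printf '%s\n' "$paths" | sed -n '\:/as/:p')

printf '%s\n' "$first"
```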
IHTH.
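You can use a much simpler regexp:
/[^/]*/
The forward slash after the caret is the character being excluded, so the match stops just before the first /. (The outer slashes delimit the regex, as in a JavaScript regex literal.)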
Assuming the file is named file.txt:
cut -d "/" -f 1 file.txt
Here, we are cutting the input line with "/" as the delimiter (-d "/"). Then we select the first field (-f 1).
You just need to include the starting anchor ^ and put the / in a negated character class:
grep -o '^[^/]*' file
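A sketch checking that the cut and grep -o answers agree on the question's example lines:

```shell
#!/usr/bin/env bash
paths=$(printf '%s\n' 'somename/for/example/' 'something/as/another/example' 'thisfile/dir/dir/example')

# Field 1 with / as the delimiter.
via_cut=$(printf '%s\n' "$paths" | cut -d / -f 1)

# Everything from the start of the line up to (not including) the first /.
via_grep=$(printf '%s\n' "$paths" | grep -o '^[^/]*')

printf '%s\n' "$via_cut"
```

They can differ on a line that starts with /: cut prints an empty line for it, while GNU grep -o prints nothing, because it only outputs non-empty matches.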

Searching for a specific word in a shell script

I have a problem. Please give me a solution.
I have to run the command given below, which lists all the files that contain the string "abcde1234".
find /path/to/dir/ * | xargs grep abcde1234
But it also displays the files that contain the string "abcde1234567", and I need only the files that contain the word "abcde1234". What modification do I need in the command?
When I need something like that, I use \< and \>, which match word boundaries. Like this:
grep '\<abcde1234\>'
The symbols \< and \> respectively match the empty string at the beginning and end of a word.
But that's me. The correct way might be to use the -w switch instead (which I tend to forget about):
-w, --word-regexp
Select only those lines containing matches that form whole words. The test is that the matching substring must either be at the beginning of the line, or preceded by a non-word constituent character. Similarly, it
must be either at the end of the line or followed by a non-word constituent character. Word-constituent characters are letters, digits, and the underscore.
One more thing: instead of find + xargs you can use just find with -exec, or actually just grep with -r:
grep -w -r abcde1234 /path/to/dir/
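Since the question asks for the list of files rather than the matching lines, -l can be added so that grep prints only file names. A sketch with a throwaway directory (file names and contents are made up):

```shell
#!/usr/bin/env bash
# Hypothetical directory with three small files.
dir=$(mktemp -d)
printf 'id abcde1234 found\n' > "$dir/a.log"
printf 'id abcde1234567 found\n' > "$dir/b.log"
printf 'ends with abcde1234\n' > "$dir/c.log"

# -r: recurse, -l: print only file names, -w: whole-word match.
# awk keeps just the base name so the output does not depend on $dir.
files=$(grep -rlw abcde1234 "$dir" | awk -F/ '{ print $NF }' | sort)
rm -r "$dir"

printf '%s\n' "$files"
```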
$ grep abcde1234 *
This searches the files in the current directory for abcde1234 and prints each matching line prefixed with the name of the file it was found in.
Ex:
abc.log: abcde1234 found
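Hi, I got the answer for this. By appending $ to the word being searched for, grep only matches lines that end with that word, which excludes abcde1234567. (Note that $ anchors the match to the end of the line, so an abcde1234 in the middle of a line is not found, while xabcde1234 at the end of a line still is.)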
The command will be like this.
find /path/to/dir/ * | xargs grep abcde1234$
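The $ anchor and -w answer different questions. A sketch on three made-up lines shows which one each of them selects:

```shell
#!/usr/bin/env bash
lines=$(printf '%s\n' 'abcde1234 is mid-line' 'id xabcde1234' 'id abcde1234567')

# Anchored at the end of the line: misses the first line, matches the second.
anchored=$(printf '%s\n' "$lines" | grep 'abcde1234$' || true)

# Whole word: matches the first line only.
word=$(printf '%s\n' "$lines" | grep -w 'abcde1234' || true)

printf 'anchored: %s\nword: %s\n' "$anchored" "$word"
```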
Thanks.
