Remove certain lines in string? Shell - shell

Lets say I have a variable containing the following string (separated by blank spaces but when echoed it looks like lines):
animal:whale
animal:dog
animal:2_mice
animal:cat
And I have another variable containing the names of the animals I want to delete from the first string separated by spaces.
Lets say names="cat dog"
How would I go about deleting the lines that contain cat and dog from the first string?
I've tried looking up sed and grep approaches but haven't found any info that would work so far.
I should also point out that the expression needs to match exactly, so if the second variable contained just ca instead of cat, cat should not get deleted.

echo "$variable" | grep -Ev ":($(echo "$names" | sed 's/ /|/g'))$"

Using grep inverse search, assumes values in names variable is space separated and one word each:
$ echo "$var"
animal:whale
animal:dog
animal:2_mice
animal:cat
$ echo "$names"
cat dog
$ echo "$var" | grep -v $(printf ' -e :%s$' $names)
animal:whale
animal:2_mice

Related

How to properly validate a part of the output of a command in BASH [duplicate]

Given a file, for example:
potato: 1234
apple: 5678
potato: 5432
grape: 4567
banana: 5432
sushi: 56789
I'd like to grep for all lines that start with potato: but only pipe the numbers that follow potato:. So in the above example, the output would be:
1234
5432
How can I do that?
grep 'potato:' file.txt | sed 's/^.*: //'
grep looks for any line that contains the string potato:, then, for each of these lines, sed replaces (s/// - substitute) any character (.*) from the beginning of the line (^) until the last occurrence of the sequence : (colon followed by space) with the empty string (s/...// - substitute the first part with the second part, which is empty).
or
grep 'potato:' file.txt | cut -d\ -f2
For each line that contains potato:, cut will split the line into multiple fields delimited by space (-d\ - d = delimiter, \ = escaped space character, something like -d" " would have also worked) and print the second field of each such line (-f2).
or
grep 'potato:' file.txt | awk '{print $2}'
For each line that contains potato:, awk will print the second field (print $2) which is delimited by default by spaces.
or
grep 'potato:' file.txt | perl -e 'for(<>){s/^.*: //;print}'
All lines that contain potato: are sent to an inline (-e) Perl script that takes all lines from stdin, then, for each of these lines, does the same substitution as in the first example above, then prints it.
or
awk '{if(/potato:/) print $2}' < file.txt
The file is sent via stdin (< file.txt sends the contents of the file via stdin to the command on the left) to an awk script that, for each line that contains potato: (if(/potato:/) returns true if the regular expression /potato:/ matches the current line), prints the second field, as described above.
or
perl -e 'for(<>){/potato:/ && s/^.*: // && print}' < file.txt
The file is sent via stdin (< file.txt, see above) to a Perl script that works similarly to the one above, but this time it also makes sure each line contains the string potato: (/potato:/ is a regular expression that matches if the current line contains potato:, and, if it does (&&), then proceeds to apply the regular expression described above and prints the result).
Or use regex assertions: grep -oP '(?<=potato: ).*' file.txt
grep -Po 'potato:\s\K.*' file
-P to use Perl regular expression
-o to output only the match
\s to match the space after potato:
\K to omit the match
.* to match rest of the string(s)
sed -n 's/^potato:[[:space:]]*//p' file.txt
One can think of Grep as a restricted Sed, or of Sed as a generalized Grep. In this case, Sed is one good, lightweight tool that does what you want -- though, of course, there exist several other reasonable ways to do it, too.
This will print everything after each match, on that same line only:
perl -lne 'print $1 if /^potato:\s*(.*)/' file.txt
This will do the same, except it will also print all subsequent lines:
perl -lne 'if ($found){print} elsif (/^potato:\s*(.*)/){print $1; $found++}' file.txt
These command-line options are used:
-n loop around each line of the input file
-l removes newlines before processing, and adds them back in afterwards
-e execute the perl code
You can use grep, as the other answers state. But you don't need grep, awk, sed, perl, cut, or any external tool. You can do it with pure bash.
Try this (semicolons are there to allow you to put it all on one line):
$ while read line;
do
if [[ "${line%%:\ *}" == "potato" ]];
then
echo ${line##*:\ };
fi;
done< file.txt
## tells bash to delete the longest match of ": " in $line from the front.
$ while read line; do echo ${line##*:\ }; done< file.txt
1234
5678
5432
4567
5432
56789
or if you wanted the key rather than the value, %% tells bash to delete the longest match of ": " in $line from the end.
$ while read line; do echo ${line%%:\ *}; done< file.txt
potato
apple
potato
grape
banana
sushi
The substring to split on is ":\ " because the space character must be escaped with the backslash.
You can find more like these at the linux documentation project.
Modern BASH has support for regular expressions:
while read -r line; do
if [[ $line =~ ^potato:\ ([0-9]+) ]]; then
echo "${BASH_REMATCH[1]}"
fi
done
grep potato file | grep -o "[0-9].*"

1. How to use the input not including the first one 2.Using grep and sed to find the pattern entered by the user and how to create the next line

The command that I'm making wants the first input to be a file and search how many times a certain pattern occurs within the file, using grep and sed.
Ex:
$ cat file1
oneonetwotwotwothreefourfive
Intended output:
$ ./command file1 one two three
one 2
two 3
three 1
The problem is the file does not have any lines and is just a long list of letters. I'm trying to use sed to replace the pattern I'm looking for with "FIND" and move the list to the next line and this continues until the end of file. Then, use $grep FIND to get the line that contains FIND. Finally, use wc -l to find a number of lines. However, I cannot find the option to move the list to the next line
Ex:
$cat file1
oneonetwosixone
Intended output:
FIND
FIND
twosixFIND
Another problem that I've been having is how to use the rest of the input, not including the file.
Failed attempt:
file=$1
for PATTERN in 2 3 4 5 ... N
do
variable=$(sed 's/$PATTERN/find/g' $file | grep FIND $file | wc -l)
echo $PATTERN $variable
exit
Another failed attempt:
file=$1
PATTERN=$($2,$3 ... $N)
for PATTERN in $*
do variable=$(sed 's/$PATTERN/FIND/g' $file | grep FIND $file | wc-1)
echo $PATTERN $variable
exit
Any suggestions and help will be greatly appreciated. Thank you in advance.
Non-portable solution with GNU grep:
file=$1
shift
for pattern in "$#"; do
echo "$pattern" $(grep -o -e "$pattern" <"$file" | wc -l)
done
If you want to use sed and your "patterns" are actually fixed strings (which don't contain characters that have special meaning to sed), you could do something like:
file=$1
shift
for pattern in "$#"; do
echo "$pattern" $(
sed "s/$pattern/\n&\n/g" "$file" |\
grep -e "$pattern" | wc -l
)
done
Your code has several issues:
you should quote use of variables where word splitting may happen
don't use ALLCAPS variable names - they are reserved for use by the shell
if you put a string in single-quotes, variable expansion does not happen
if you give grep a file, it won't read standard input
your for loop has no terminating done
This might work for you (GNU bash,sed and uniq):
f(){ local file=$1;
shift;
local args="$#";
sed -E 's/'${args// /|}'/\n&\n/g
s/(\n\S+)\n\S+/\1/g
s/\n+/\n/g
s/.(.*)/echo "\1"|uniq -c/e
s/ *(\S+) (\S+)/\2 \1/mg' $file; }
Separate arguments into file and remaining arguments.
Apply arguments as alternation within a sed substitution command which splits words into lines separated by a newline either side.
Remove unwanted words and unwanted newlines.
Evaluate the manufactured file within a sed substitution using the uniq command with the -c option.
Rearrange the output and print the result.
The problem is the file does not have any lines
Great! So the problem reduces to putting newlines.
func() {
file=$1
shift
rgx=$(printf "%s\\|" "$#" | sed 's#\\|$##');
# put the newline between words
sed 's/\('"$rgx"'\)/&\n/g' "$file" |
# it's just standard here
sort | uniq -c |
# filter only input - i.e. exclude fourfive
grep -xf <(printf " *[0-9]\+ %s\n" "$#")
};
func <(echo oneonetwotwotwothreefourfive) one two three
outputs:
2 one
1 three
3 two

extract string between '$$' characters - $$extractabc$$

I am working on shell script and new to it. I want to extract the string between double $$ characters, for example:
input:
$$extractabc$$
output
extractabc
I used grep and sed but not working out. Any suggestions are welcome!
You could do
awk -F"$" '{print $3}' file.txt
assuming the file contained input:$$extractabc$$ output:extractabc. awk splits your data into pieces using $ as a delimiter. First item will be input:, next will be empty, next will be extractabc.
You could use sed like so to get the same info.
sed -e 's/.*$$\(.*\)$$.*/\1/' file.txt
sed looks for information between $$s and outputs that. The goal is to type something like this .*$$(.*)$$.*. It's greedy but just stay with me.
looks for .* - i.e. any character zero or more times before $$
then the string should have $$
after $$ there'll be any character zero or more times
then the string should have another $$
and some more characters to follow
between the 2 $$ is (.*). String found between $$s is given a placeholder \1
sed finds such information and publishes it
Using grep PCRE (where available) and look-around:
$ echo '$$extractabc$$' | grep -oP "(?<=\\$\\$).*(?=\\$\\$)"
extractabc
echo '$$extractabc$$' | awk '{gsub(/\$\$/,"")}1'
extractabc
Here is an other variation:
echo "$$extractabc$$" | awk -F"$$" 'NF==3 {print $2}'
It does test of there are two set of $$ and only then prints whats between $$
Does also work for input like blabla$$some_data$$moreblabla
How about remove all the $ in the input?
$ echo '$$extractabc$$' | sed 's/\$//g'
extractabc
Same with tr
$ echo '$$extractabc$$' | tr -d '$'
extractabc

read values of txt file from bash [duplicate]

This question already has answers here:
How to grep for contents after pattern?
(8 answers)
Closed 5 years ago.
I'm trying to read values from a text file.
I have test1.txt which looks like
sub1 1 2 3
sub8 4 5 6
I want to obtain values '1 2 3' when I specify 'sub1'.
The closest I get is:
subj="sub1"
grep "$subj" test1.txt
But the answer is:
sub8 4 5 6
I've read that grep gives you the next line to the match, so I've tried to change the text file to the following:
test2.txt looks like:
sub1
1 2 3
sub8
4 5 6
However, when I type
grep "$subj" test2.txt
The answer is:
sub1
It should be something super simple but I've tried awk, seg, grep,egrep, cat and none is working...I've also read some posts somehow related but none was really helpful
Awk works: awk '$1 == "'"$subj"'" { print $2, $3, $4 }' test1.txt
The command outputs fields two, three, and four for all lines in test1.txt where the first field is $subj (i.e.: the contents of the variable named subj).
With your original text file format:
target=sub1
while IFS=$' \t\n' read -r key values; do
if [[ $key = "$target" ]]; then
echo "Found values: $values"
fi
done <test1.txt
This requires no external tools, using only functionality built into bash itself. See BashFAQ #1.
As has come up during debugging in comments, if you have a traditional Apple-format text file (CR newlines only), then you might want something more like:
target=sub1
while IFS=$' \t\n' read -r -d $'\r' key values || [[ $key ]]; do
if [[ $key = "$target" ]]; then
echo "Found values: $values"
fi
done <test1.txt
Alternately, using awk (for a standard UNIX text file):
target="sub1"
awk -v target="$target" '$1 == target { $1 = ""; print; }' <test1.txt
...or, for a file with CR-only newlines:
target="sub1"
tr '\r' '\n' <test1.txt | awk -v target="$target" '$1 == target { $1 = ""; print; }'
This version will be slower if the text file being read is small (since awk, like any other external tool, takes time to start up); but faster if it's large (since awk's operation is much faster than that of bash's built-ins once it's done starting up).
grep "sub1" test1.txt | cut -c6-
or
grep -A 1 "sub1" test2.txt | tail -n 1
You doing it right, but it seems like test1.txt has a wrong value in it.
with grep foo you get all lines with foo in it. use grep -m1 foo to find the first line with foo in it only.
then you can use cut -d" " -f2- to get all the values behind foo, while seperated by empty spaces.
In the end the command would look like this ...
$ subj="sub1"
$ grep -m1 "$subj" test1.txt | cut -d" " -f2-
But this doenst explain why you could not find sub1 in the first place.
Did you read the proper file ?
There's a bunch of ways to do this (and shorter/more efficient answers than what I'm giving you), but I'm assuming you're a beginner at bash, and therefore I'll give you something that's easy to understand:
egrep "^$subj\>" file.txt | sed "s/^\S*\>\s*//"
or
egrep "^$subj\>" file.txt | sed "s/^[^[:blank:]]*\>[[:blank:]]*//"
The first part, egrep, will search for you subject at the beginning of the line in file.txt (that's what the ^ symbol does in the grep string). It also is looking for a whole word (the \> is looking for an end of word boundary -- that way sub1 doesn't match sub12 in the file.) Notice you have to use egrep to get the \>, as grep by default doesn't recognize that escape sequence. Once done finding the lines, egrep then passes it's output to sed, which will strip the first word and trailing whitespace off of each line. Again, the ^ symbol in the sed command, specifies it should only match at the beginning of the line. The \S* tells it to read as many non-whitespace characters as it can. Then the \s* tells sed to gobble up as many whitespace as it can. sed then replaces everything it matched with nothing, leaving the other stuff behind.
BTW, there's a help page in Stack overflow that tells you how to format your questions (I'm guessing that was the reason you got a downvote).
-------------- EDIT ---------
As pointed out, if you are on a Mac or something like that you have to use [:alnum:] instead of \S, and [:blank:] instead of \s in your sed expression (as these are portable to all platforms)
awk '/sub1/{ print $2,$3,$4 }' file
1 2 3
What happens? After regexp /sub1/ the three following fields are printed.
Any drawbacks? It affects the space.
Sed also works: sed -n -e 's/^'"$subj"' *//p' file1.txt
It outputs all lines matching $subj at the beginning of a line after having removed the matching word and the spaces following. If TABs are used the spaces should be replaced by something like [[:space:]].

String substitute in Shell script

I suppose to strip down a substring in my shell script. I am trying as follows:
fileName="Test_VSS_TT.csv.old"
here i want to remove the string ".csv.old" and my
test=${fileName%.*}
but getting bad substitution error.
you are looking for test=${filename%%.*}
the doc for parameter expansion in bash here and in zsh here
%.* will match the first .* pattern, whereas %%.* will match the longest one
[edit]
if sed is available, you could try something like that : echo "filename.txt.bin" | sed "s/\..*//g" which yields filename
Here you go,
$ echo $f
Test_VSS_TT.csv.old
$ test=${f%%.*}
$ echo $test
Test_VSS_TT
%% will do a longest match. So it matches from the first dot upto the last and then removes the matched characters.
If your intention is to extract file name without extension, then how about this?
$ echo ${fileName}
Test_VSS_TT.csv.old
$ test=`echo ${fileName} |cut -d '.' -f1`
$ echo $test
Test_VSS_TT
echo "Test_VSS_TT.csv.old"| awk -F"." '{print $1}'

Resources