Using "awk" to find a string between two other specific strings: - bash

I have a large output of text that'll include several lines like this:
sending:WHATIWANT:output
How would I use awk to make it so that this output would ONLY include WHATIWANT on each line?
Edit: there is a changing amount of text before and after WHATIWANT, so something like awk -F: '{print $2}' would not always work

From what you mention in the comments, this should do it:
perl -n -e'/sending:([^:]+):output/ && print $1' input_file
This runs a simple regex match line-by-line, capturing the interesting part and then printing it. It assumes that WHATIWANT does not contain the character :
If for some reason you absolutely must use awk(1), then I think you don't have much choice but to do this:
awk -F: '{ for (i = 2; i < NF; i++) if ($(i-1) == "sending" && $(i+1) == "output") print $i }' input_file
It basically splits each line on : and iterates through every field, comparing the left and right neighbours until it finds a field that sits between sending and output. Again, it assumes that WHATIWANT does not contain a :
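For example, with a made-up sample line (hypothetical, not from the question) where sending and output appear as whole colon-delimited fields:
echo 'some text:sending:WHATIWANT:output:more text' | awk -F: '{ for (i = 2; i < NF; i++) if ($(i-1) == "sending" && $(i+1) == "output") print $i }'
WHATIWANT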

Can't you just use sed?
echo "asfasfdsf__sending:WHATIWANT:output__asdfadas" | sed -n 's/.*sending\:\([a-zA-Z0-9]*\)\:output.*/\1/p'
Gives you "WHATIWANT"

Related

AWK Finding a way to print lines containing a word from a comma separated string

I want to write a bash script that only prints lines that, on their second column, contain a word from a comma separated string. Example:
words="abc;def;ghi;jkl"
>cat log1.txt
hello;abc;1234
house;ab;987
mouse;abcdef;654
What I want is to print only lines that contain a whole word from the "words" variable. That means that "ab" won't match, and neither will "abcdef". It seems so simple, yet after trying for many hours, I was unable to find a solution.
For example, I tried this as my awk command, but it matched any substring.
-F \; -v b="TSLA;NVDA" 'b ~ $2 { print $0 }'
I will appreciate any help. Thank you.
EDIT:
A sample input would look like this
1;UNH;buy;344.74
2;PG;sell;138.60
3;MSFT;sell;237.64
4;TSLA;sell;707.03
A variable like this would be set
filter="PG;TSLA"
And according to this filter, I want to echo these lines
2;PG;sell;138.60
4;TSLA;sell;707.03
Grep is a good choice here:
grep -Fw -f <(tr ';' '\n' <<<"$words") log1.txt
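The process substitution simply turns the semicolon-separated list into one pattern per line for grep's -f option:
tr ';' '\n' <<< "$words"
abc
def
ghi
jkl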
With awk I'd do
awk -F ';' -v w="$words" '
BEGIN {
    n = split(w, a, /;/)
    # next line moves the words into the _index_ of an array,
    # to make the file processing much easier and more efficient
    for (i = 1; i <= n; i++) words[a[i]] = 1
}
$2 in words
' log1.txt
You may use this awk:
words="abc;def;ghi;jkl"
awk -F';' -v s=";$words;" 'index(s, FS $2 FS)' log1.txt
hello;abc;1234
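And with the sample input from the EDIT (saved here as a hypothetical trades.txt), the same index() trick gives exactly the requested lines; the wrapping FS characters force whole-field matches, so partial words can't sneak in:
filter="PG;TSLA"
awk -F';' -v s=";$filter;" 'index(s, FS $2 FS)' trades.txt
2;PG;sell;138.60
4;TSLA;sell;707.03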

bash grep for string and ignore above one line

One of my scripts returns output like this:
NameComponent=Apache
Fixed=False
NameComponent=MySQL
Fixed=True
From the above output, I am trying to exclude the two lines below using grep -vB1 'False', which does not seem to work:
NameComponent=Apache
Fixed=False
Is it possible to do this using grep, or is there a better way with awk?
<some-command> |tac |sed -e '/False/ { N; d}' |tac
NameComponent=MySQL
Fixed=True
For every line that matches "False", the code in the {} gets executed. N pulls the next line into the pattern space as well, and then d deletes the whole thing before moving on to the next line. Note: using multiple pipes is not considered good practice.
@Karthi1234: If your Input_file is the same as the provided samples, then try:
awk -F' |=' '($2 != "Apache" && $2 != "False")' Input_file
This first sets the field separator to a space or =, then checks that the second field's value is not equal to the string Apache or False; since no action is specified, awk's default print action is performed.
EDIT: as per the OP's request, here is the changed code:
awk '!/Apache/ && !/False/' Input_file
You could change the strings in case these are not the ones you want; the logic stays the same.
EDIT2: You could change the values of string1 and string2 and add more conditions as needed.
awk '!/string1/ && !/string2/' Input_file
If I understand the question correctly, you will always have a line before "Fixed=...", and you want to print both lines if and only if "Fixed=True".
The following awk should do the trick:
<command> | awk 'BEGIN {prev="NA"} {if ($0=="Fixed=True") {print prev; print $0;} prev=$0;}'
Note that if the first line is "Fixed=True" it will print the string "NA" as the first line.
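Run against the sample output from the question (with the quoting fixed as above), this gives:
printf 'NameComponent=Apache\nFixed=False\nNameComponent=MySQL\nFixed=True\n' | awk 'BEGIN {prev="NA"} {if ($0=="Fixed=True") {print prev; print $0;} prev=$0;}'
NameComponent=MySQL
Fixed=True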

Bash split string according to string

In Python, I would do something simple like sRet = sOut.split('Word')
In bash, scrounged from other answers, I have the following two methods that are insufficient in my case, but may be useful to someone in the future:
sOut="I want this Point to matter"
1) sRet=( $sOut )
2) IFS="Point " read -r -a sRet <<< ${sOut}
echo ${sRet[-1]}
I want returned: "to matter"
(1) gives: "matter"
(2) gives: "er"
The first only splits by spaces; the second splits on each character of the IFS string, the last of which is 't' in this case.
How do I split by a full string, as I would in python?
sOut="I want this Point to matter"
s="Point "
[[ $sOut =~ $s(.*) ]] && echo ${BASH_REMATCH[1]}
Output:
to matter
IFS splits on single characters, so you will need to deploy another tool. I'd suggest awk in this case:
$ awk -F 'Point' '{print $NF}' <<< "$sOut"
to matter
You can replace 'Point' with a variable holding the delimiter. You can also change which part of the split you get back. The variable $NF means "the last element". You can also use $1 for the first element, $2 for the second, and so on.
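For instance, with the delimiter in a variable (a small sketch; the trailing space in the delimiter keeps $NF from starting with a blank):
sOut="I want this Point to matter"
s="Point "
awk -F "$s" '{print $NF}' <<< "$sOut"
to matter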
You can use awk for splitting the string:
text="I want this Point to matter"
s='Point'
awk -v s="$s" -v text="$text" 'BEGIN {split(text, a, "[[:blank:]]*" s "[[:blank:]]*");
for (i in a) print a[i]}'
I want this
to matter
To get only the last match:
awk -v s="$s" -v text="$text" 'BEGIN {n=split(text, a, "[[:blank:]]*" s "[[:blank:]]*"); print a[n]}'
to matter
Or:
awk -v s="$s" 'BEGIN{FS="[[:blank:]]*" s "[[:blank:]]*"} {print $NF}' <<< "$text"
to matter
IFS, on the other hand, doesn't work with a multi-character string: IFS='Point' will split the input on each of the characters P, o, i, n, t.
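A quick demonstration of that per-character splitting, reproducing the "er" result from the question:
sOut="I want this Point to matter"
IFS="Point " read -r -a sRet <<< "$sOut"
echo "${sRet[-1]}"
er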
sDelim="Point"
sRet1=$(awk -F "${sDelim}" '{print $1}' <<< "${sOut}")
sRet2=$(awk -F "${sDelim}" '{print $NF}' <<< "${sOut}")
Given all the other excellent answers, I prefer this one for the following reasons:
1) It's short and sweet
2) Everything is fairly explicit when you want to use variables
3) Any element can be selected: $1, $2, ... from the beginning; $NF, $(NF-1), ... from the end
4) If sDelim is not actually in sOut, the script doesn't freak out
Thanks mainly to @bishop for leading me to this
You could use sed's parentheses (capture group) feature to retrieve the matched string.
The below code:
sOut="I want this point to matter"
s="point "
echo "$sOut" | sed "s/.*$s\(.*\)/\1/"
would give me:
to matter
as output.

How can I find unique characters per line of input?

Is there any way to extract the unique characters of each line?
I know I can find the unique lines of a file using
sort -u file
I would like to determine the unique characters of each line (something like sort -u for each line).
To clarify: given this input:
111223234213
111111111111
123123123213
121212122212
I would like to get this output:
1234
1
123
12
Using sed
sed ':a;s/\(.\)\(.*\)\1/\1\2/;ta' file
Basically, what it does is capture a character and check if it appears anywhere else on the line, also capturing all the characters between the two.
Then it replaces all of that, including the second occurrence, with just the first occurrence and what was in between.
t is a test: it jumps back to the :a label if the previous s/// command succeeded. This repeats until the s/// command fails, meaning only unique characters remain.
; just separates commands.
1234
1
123
12
Keeps order as well.
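A quick check against one of the sample lines:
sed ':a;s/\(.\)\(.*\)\1/\1\2/;ta' <<< "121212122212"
12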
It doesn't get things in the original order, but this awk one-liner seems to work:
awk '{for(i=1;i<=length($0);i++){a[substr($0,i,1)]=1} for(i in a){printf("%s",i)} print "";delete a}' input.txt
Split apart for easier reading, it could be stand-alone like this:
#!/usr/bin/awk -f
{
    # Step through the line, assigning each character as a key.
    # Repeated keys overwrite each other.
    for (i = 1; i <= length($0); i++) {
        a[substr($0, i, 1)] = 1;
    }
    # Print items in the array.
    for (i in a) {
        printf("%s", i);
    }
    # Print a newline after we've gone through our items.
    print "";
    # Get ready for the next line.
    delete a;
}
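Saved as, say, unique-chars.awk (a hypothetical filename) and made executable, it can be run like this:
chmod +x unique-chars.awk
./unique-chars.awk input.txt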
Of course, the same concept can be implemented pretty easily in pure bash as well:
#!/usr/bin/env bash
while read -r s; do
    declare -A a
    while [ -n "$s" ]; do
        a[${s:0:1}]=1
        s=${s:1}
    done
    printf "%s" "${!a[@]}"
    echo ""
    unset a
done < input.txt
Note that this depends on bash 4, due to the associative array. And this one does get things in the original order, because bash does a better job of keeping array keys in order than awk.
And I think you've got a solution using sed from Jose, though it has a bunch of extra pipe-fitting involved. :)
The last tool you mentioned was grep. I'm pretty sure you can't do this in traditional grep, but perhaps some brave soul might be able to construct a perl-regexp variant (i.e. grep -P) using -o and lookarounds. They'd need more coffee than is in me right now though.
One way using perl:
perl -F -lane 'print do { my %seen; grep { !$seen{$_}++ } @F }' file
Results:
1234
1
123
12
Another solution:
while read -r line; do
    grep -o . <<< "$line" | sort -u | paste -s -d '\0' -
done < file
grep -o . converts the 'row line' into a 'column' of one character per line
sort -u sorts the characters and removes repeated ones
paste -s -d '\0' - joins the 'column' back into a single 'row line'
The - filename argument tells paste to read standard input.
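For a single line, the pipeline looks like this (an illustrative run on the first sample line):
grep -o . <<< "111223234213" | sort -u | paste -s -d '\0' -
1234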
This awk should work:
awk -F '' '{delete a; for(i=1; i<=NF; i++) a[$i]; for (j in a) printf "%s", j; print ""}' file
1234
1
123
12
Here:
-F '' breaks the record character by character, giving us a single character in $1, $2, etc.
Note: for non-GNU awk use:
awk 'BEGIN{FS=""} {delete a; for(i=1; i<=NF; i++) a[$i];
for (j in a) printf "%s", j; print ""}' file
This might work for you (GNU sed):
sed 's/\B/\n/g;s/.*/echo "&"|sort -u/e;s/\n//g' file
Split each line into a series of lines. Unique sort those lines. Combine the result back into a single line.
A unique and sorted alternative to the others, using sed and GNU tools:
sed 's/\(.\)/\1\n/g' file | sort | uniq
which produces one character per line; if you want those on one line, just do:
sed 's/\(.\)/\1\n/g' file | sort | uniq | sed ':a;N;$!ba;s/\n//g;'
This has the advantage of showing the characters in sorted order, rather than order of appearance.

AWK array parsing issue

My two input files are pipe-separated.
File 1 :
a|b|c|d|1|44
File 2 :
44|ab|cd|1
I want to store all the values of the first file in an array.
awk -F\| 'FNR==NR {a[$6]=$0;next}'
So if I store it the above way, is it possible to access the array's contents? Say I want to know $3 of File 1. How can I get that from a[]?
Also, will I be able to access the array values after I come out of that awk?
Thanks
I'll answer the question as it is stated, but I have to wonder whether it is complete. You state that you have a second input file, but it doesn't play a role in your actual question.
1) It would probably be most sensible to store the fields individually, as in
awk -F \| '{ for(i = 1; i < NF; ++i) a[$NF,i] = $i } END { print a[44,3] }' filename
See the awk documentation for details on multidimensional arrays in awk. You could also use the split function:
awk -F \| '{ a[$NF] = $0 } END { split(a[44], fields); print fields[3] }'
but I don't see the sense in it here.
2) No. At most you can print the data in a way that the surrounding shell understands and use command substitution to build a shell array from it, but POSIX shell doesn't know arrays at all, and bash only knows one-dimensional arrays. If you require that sort of functionality, you should probably use a more powerful scripting language such as Perl or Python.
If, and I'm wildly guessing here, you want to use the array built from the first file while processing the second, you don't have to quit awk for this. A common pattern is
awk -F \| 'FNR == NR { for(i = 1; i < NF; ++i) { a[$NF,i] = $i }; next } { code for the second file here }' file1 file2
Here FNR == NR is a condition that is only true when the first file is processed (the number of the record in the current file is the same as the number of the record overall; this is only true in the first file).
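For instance, to print each File 2 record together with $3 of the matching File 1 record (a sketch of this pattern; that File 2's first field is the join key is my assumption from the samples):
awk -F \| 'FNR == NR { for (i = 1; i < NF; ++i) a[$NF,i] = $i; next } { print $0, a[$1,3] }' file1 file2
44|ab|cd|1 c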
To keep it simple, you can reach your goal of storing (and accessing) values in an array without using awk:
arr=($(tr "|" " " < yourFilename)) # store in array named arr
# accessing individual elements
echo ${arr[0]}
echo ${arr[4]}
# ...or accessing all elements
for n in "${arr[@]}"
do
echo "$n"
done
...even though I wonder if that's what you are looking for. The initial question is not really clear.
