How to print keys from all key-value pairs - bash

Text file looks like this:
key11=val1|key12=val2|key13=val3
key21=val1|key22=val2|key23=val3
How can I extract keys so that:
key11|key12|key13
key21|key22|key23
I have tried this, unsuccessfully:
awk '{ gsub(/[^[|]=]+=/,"") }1' file.txt
It gives back the original data unchanged:
key11=val1|key12=val2|key13=val3
key21=val1|key22=val2|key23=val3

Since you tagged bash
while IFS='=|' read -ra words; do
    n=${#words[@]}
    for ((i=1; i<n; i+=2)); do
        unset 'words[i]'
    done
    ( IFS='|'; echo "${words[*]}" )
done < file

gawk
This can be done with awk by setting FS and OFS:
kent$ awk -F'=[^|]*' -v OFS="" '$1=$1' file
key11|key12|key13
key21|key22|key23
or safer: awk -F.... '{$1=$1}1' file (a bare $1=$1 pattern skips any line whose first field ends up empty or 0)
Or by substitution (with sed, for example):
kent$ sed 's/=[^|]*//g' file
key11|key12|key13
key21|key22|key23

Here's one solution
echo "key11=val1|key12=val2|key13=val3" \
| awk -F'[=|]' '{
for (i=1;i<=NF;i+=2){
printf("%s%s", $i, (i<(NF-1))?"|":"")
}
print""
}'
output
key11|key12|key13
It should also work by passing in the filename as an argument to awk, i.e.
awk -F'[=|]' '{for (i=1;i<=NF;i+=2){printf("%s%s", $i, (i<(NF-1))?"|":"") }print""}' file1 [file_more_as_will_fit]
Discussion
We use a bracket expression as the value for FS (the field separator), so every = and | character marks the start of a new field.
-F'[=|]'
Because we know we want to start with field1 for output and skip every other field, we use
for (i=1;i<=NF;i+=2)
printf formats the output as defined by the format string '%s%s'. There are a zillion options available for printf format strings, but here you only need two values: $i (the field that holds the key) and either a | char or an empty string.
printf("%s%s", $i ...)
And we use awk's ternary operator, which checks which field number is being processed (i<..). As long as the current key is not the last one (its index is below NF-1), the | char is emitted.
(i<(NF-1))?"|":""
IHTH
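As a variation on the loop discussed above, here is a minimal sketch that emits the separator before every key except the first, so the code never has to work out which field is the last one. The function name print_keys is made up for this illustration.

```shell
# print_keys: hypothetical helper; reads "key=val|key=val" lines on stdin
# and prints only the keys, joined by "|".
print_keys() {
  awk -F'[=|]' '{
    out = ""; sep = ""
    for (i = 1; i <= NF; i += 2) {   # odd-numbered fields are the keys
      out = out sep $i               # sep is empty for the first key only
      sep = "|"
    }
    print out
  }'
}

printf '%s\n' 'key11=val1|key12=val2|key13=val3' | print_keys
```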

sed
I did this with sed:
sed -r 's/([[:alnum:]]*)=[[:alnum:]]*/\1/g' < file.txt
tested here and got:
key11|key12|key13
key21|key22|key23
s/<pattern>/<subst>/ means "replace <pattern> with <subst>", and with the g at the end it does so for every match in the line.
The [[:alnum:]]* is equivalent to [0-9a-zA-Z]* (in the C locale), and means any number of letters or digits.
The first pattern between parentheses corresponds to \1 in the substitution, the second to \2, and so on.
So it matches every "key=value" and replaces it with "key".
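One caveat, offered as a hedged sketch: [[:alnum:]] only covers letters and digits, so a value such as a-b would be removed only up to the dash. Assuming values never contain a |, deleting everything from each = up to the next | avoids that. The name keys_only and the sample line are invented for this example.

```shell
# keys_only: hypothetical helper; strips "=value" from every pair on stdin.
keys_only() {
  # [^|]* matches any run of characters other than the pair separator
  sed 's/=[^|]*//g'
}

printf '%s\n' 'k1=a-b|k2=c.d' | keys_only
```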

awk -F'[=|]' '{print $1,$3,$5}' OFS="|" file
key11|key12|key13
key21|key22|key23

Related

AWK Finding a way to print lines containing a word from a comma separated string

I want to write a bash script that only prints lines that, on their second column, contain a word from a comma separated string. Example:
words="abc;def;ghi;jkl"
>cat log1.txt
hello;abc;1234
house;ab;987
mouse;abcdef;654
What I want is to print only lines that contain a whole word from the "words" variable. That means that "ab" won't match, and neither will "abcdef". It seems so simple, yet after trying for many hours I was unable to find a solution.
For example, I tried this as my awk command, but it matched any substring.
-F \; -v b="TSLA;NVDA" 'b ~ $2 { print $0 }'
I will appreciate any help. Thank you.
EDIT:
A sample input would look like this
1;UNH;buy;344.74
2;PG;sell;138.60
3;MSFT;sell;237.64
4;TSLA;sell;707.03
A variable like this would be set
filter="PG;TSLA"
And according to this filter, I want to echo these lines
2;PG;sell;138.60
4;TSLA;sell;707.03
Grep is a good choice here:
grep -Fw -f <(tr ';' '\n' <<<"$words") log1.txt
With awk I'd do
awk -F ';' -v w="$words" '
BEGIN {
n = split(w, a, /;/)
# next line moves the words into the _index_ of an array,
# to make the file processing much easier and more efficient
for (i=1; i<=n; i++) words[a[i]]=1
}
$2 in words
' log1.txt
You may use this awk:
words="abc;def;ghi;jkl"
awk -F';' -v s=";$words;" 'index(s, FS $2 FS)' log1.txt
hello;abc;1234
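Note that grep -w matches a whole word anywhere on the line, while the awk answers test only the second column. A minimal sketch of the array-lookup idea applied to the sample from the question's EDIT (the function name filter_by_column2 and the variable names are assumptions made for this example):

```shell
# filter_by_column2 WORDS: hypothetical helper; WORDS is a ";"-separated
# list, stdin is ";"-separated data. Prints lines whose 2nd column is
# exactly one of the words.
filter_by_column2() {
  awk -F';' -v list="$1" '
    BEGIN {
      n = split(list, tmp, ";")
      for (i = 1; i <= n; i++) wanted[tmp[i]] = 1   # words become array keys
    }
    $2 in wanted   # exact match on column 2 only; matching lines are printed
  '
}

printf '%s\n' '1;UNH;buy;344.74' '2;PG;sell;138.60' '4;TSLA;sell;707.03' |
  filter_by_column2 'PG;TSLA'
```

With the filter set to sell, nothing is printed for this input, because sell only ever appears in the third column.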

print first 3 characters and / rest of the string with stars

I have input like this:
John:boofoo
I want to keep only the first 3 characters of the string after the colon and print the rest of it as stars.
The output should look like this:
John:boo***
This is my command:
awk -F ":" '{print $1,$2 ":***"}'
I want to use only the print command if possible. Thanks
With GNU sed:
echo 'John:boofoo' | sed -E 's/(:...).*/\1***/'
Output:
John:boo***
With GNU awk for gensub():
$ awk 'BEGIN{FS=OFS=":"} {print $1, substr($2,1,3) gensub(/./,"*","g",substr($2,4))}' file
John:boo***
With any awk:
awk 'BEGIN{FS=OFS=":"} {tl=substr($2,4); gsub(/./,"*",tl); print $1, substr($2,1,3) tl}' file
John:boo***
You could try the following. It keeps the first 3 characters of the 2nd field as they are and prints one star for each remaining character in that field.
awk '
BEGIN{
FS=OFS=":"
}
{
stars=""
val=substr($2,1,3)
for(i=4;i<=length($2);i++){
stars=stars"*"
}
$2=val stars
}
1
' Input_file
Output will be as follows.
John:boo***
Explanation of the above code:
awk '
BEGIN{ ##Starting BEGIN section from here.
FS=OFS=":" ##Setting FS and OFS value as : here.
} ##Closing block of BEGIN section here.
{ ##Here starts main block of awk program.
stars="" ##Nullifying variable stars here.
val=substr($2,1,3) ##Creating variable val whose value is 1st 3 letters of 2nd field.
for(i=4;i<=length($2);i++){ ##Starting a for loop from 4 (because we need the 4th character through the last one in the 2nd field) up to the length of the 2nd field.
stars=stars"*" ##Keep concatenating stars variable to its own value with *.
}
$2=val stars ##Assigning value of variable val and stars to 2nd field here.
}
1 ##Mentioning 1 here to print edited/non-edited lines for Input_file here.
' Input_file ##Mentioning Input_file name here.
Or even with good old sed
$ echo "John:boofoo" | sed 's/...$/***/'
Output:
John:boo***
(note: this just replaces the last 3 characters of any string with "***", so if you need to key off the ':', see the GNU sed answer from Cyrus.)
Another awk variant:
awk -F ":" '{print $1 FS substr($2, 1, 3) "***"}' <<< 'John:boofoo'
John:boo***
Since we have the tags awk, bash and sed: for completeness' sake, here is a bash-only solution:
INPUT="John:boofoo"
printf "%s:%s\n" "${INPUT%%:*}" "$(TMP1=${INPUT#*:}; TMP2=${TMP1:3}; echo "${TMP1:0:3}${TMP2//?/*}")"
It passes two arguments to printf after the format string. The first one is INPUT with everything from the first : onward stripped. Let's break down the second argument, $(TMP1=${INPUT#*:}; TMP2=${TMP1:3}; echo "${TMP1:0:3}${TMP2//?/*}"):
$(...): the string is run as a bash command, and its output is substituted as the last argument to printf.
TMP1=${INPUT#*:}; removes everything up to and including the first : and stores the result in TMP1.
TMP2=${TMP1:3}; gets all characters of TMP1 from offset 3 to the end and stores them in TMP2.
echo "${TMP1:0:3}${TMP2//?/*}" outputs the temporary strings: the first three chars of TMP1 unmodified, and every char of TMP2 as *.
The output of that echo is the last argument to printf.
Here is the bash -x output:
+ INPUT=John:boofoo
++ TMP1=boofoo
++ TMP2=foo
++ echo 'boo***'
+ printf '%s:%s\n' John 'boo***'
John:boo***
Another sed : replace all chars after the third by *
sed -E ':A;s/([^:]*:...)(.*)[^*]([*]*)/\1\2\3*/;tA'
Some more awk
awk 'BEGIN{FS=OFS=":"}{s=sprintf("%0*d",length(substr($2,4)),0); gsub(/0/,"*",s);print $1,substr($2,1,3) s}' infile
You can use the %* form of printf, which accepts a variable width. If you print the value 0 with zero padding on the left, you get a string of zeros of the requested width, which gsub then turns into stars.
More readable:
awk 'BEGIN{
FS=OFS=":"
}
{
s=sprintf("%0*d",length(substr($2,4)),0);
gsub(/0/,"*",s);
print $1,substr($2,1,3) s
}
' infile
Test Results:
$ awk --version
GNU Awk 3.1.7
Copyright (C) 1989, 1991-2009 Free Software Foundation.
$ cat f
John:boofoo
$ awk 'BEGIN{FS=OFS=":"}{s=sprintf("%0*d",length(substr($2,4)),0); gsub(/0/,"*",s);print $1,substr($2,1,3) s}' f
John:boo***
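One edge case with the %0*d trick: when the field has three characters or fewer, the requested width is 0, but sprintf still prints the single digit 0, which becomes one stray star. A minimal sketch that pads with spaces via %*s instead and skips the padding when nothing is left to hide (the function name mask_field is an assumption for this example; it relies on the same variable-width printf support the answer above uses):

```shell
# mask_field: hypothetical helper; reads "name:secret" lines on stdin and
# replaces every character after the first 3 of the 2nd field with "*".
mask_field() {
  awk 'BEGIN { FS = OFS = ":" }
  {
    n = length($2) - 3                 # number of characters to hide
    stars = ""
    if (n > 0) {
      stars = sprintf("%*s", n, "")    # n spaces via a variable-width format
      gsub(/ /, "*", stars)            # turn the padding into stars
    }
    print $1, substr($2, 1, 3) stars
  }'
}

printf '%s\n' 'John:boofoo' | mask_field
```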
Another pure Bash, using the builtin regular expression predicate.
input="John:boofoo"
if [[ $input =~ ^([^:]*:...)(.*)$ ]]; then
printf '%s%s\n' "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]//?/*}"
else
echo >&2 "String doesn't match pattern"
fi
We split the string in two parts: the first part being everything up to (and including) the three chars found after the first colon (stored in ${BASH_REMATCH[1]}), the second part being the remaining part of string (stored in ${BASH_REMATCH[2]}). If the string doesn't match this pattern, we just insult the user.
We then print the first part unchanged, and the second part with every character replaced with *.

Iterative replacement of substrings in bash

I'm trying to write a simple script to make several replacements in a big text file. I have a "map" file which contains the records to be searched and replaced, one pair per line, separated by a space, and an "input" file where I need the changes to be made. The example files and the script I wrote are below.
Map file
new_0 old_0
new_1 old_1
new_2 old_2
new_3 old_3
new_4 old_4
Input file
itsa(old_0)single(old_2)string(old_1)with(old_5)ocurrences(old_4)ofthe(old_3)records
Script
#!/bin/bash
while read -r mapline ; do
mapf1=`awk 'BEGIN {FS=" "} {print $1}' <<< "$mapline"`
mapf2=`awk 'BEGIN {FS=" "} {print $2}' <<< "$mapline"`
for line in $(cat "input") ; do
if [[ "${line}" == *"${mapf2}"* ]] ; then
sed "s/${mapf2}/${mapf1}/g" <<< "${line}"
fi
done < "input"
done < "map"
The thing is that the searches and replaces are made correctly, but I can't find a way to save the output of each iteration and work over it in the next. So, my output looks like this:
itsa(new_0)single(old_2)string(old_1)withocurrences(old_4)ofthe(old_3)records
itsa(old_0)single(old_2)string(new_1)withocurrences(old_4)ofthe(old_3)records
itsa(old_0)single(new_2)string(old_1)withocurrences(old_4)ofthe(old_3)records
itsa(old_0)single(old_2)string(old_1)withocurrences(old_4)ofthe(new_3)records
itsa(old_0)single(old_2)string(old_1)withocurrences(new_4)ofthe(old_3)records
Yet, the desired output would look like this:
itsa(new_0)single(new_2)string(new_1)withocurrences(new_4)ofthe(new_3)records
Can anyone shed some light on this? Thanks in advance!
Improving the existing script
Improvements:
Use "$()" instead of ``. It supports whitespace and is easier to read.
Don't execute sed for each line. sed already loops over all lines and is faster than a loop in bash.
The adapted script:
text="$(< input)"
while read -r mapline; do
mapf1="$(awk 'BEGIN {FS=" "} {print $1}' <<< "$mapline")"
mapf2="$(awk 'BEGIN {FS=" "} {print $2}' <<< "$mapline")"
text="$(sed "s/${mapf2}/${mapf1}/g" <<< "$text")"
done < "map"
echo "$text"
The variable $text contains the complete input file and is modified in each iteration. The output of this script is the file after all replacements were done.
Alternative approach
Convert the map file into a pattern for sed and execute sed just once using that pattern.
pattern="$(sed 's#\(.*\) \(.*\)#s/\2/\1/g#' map)"
sed "$pattern" input
The first command is the conversion step. The file
new_0 old_0
new_1 old_1
...
will result in the pattern
s/old_0/new_0/g
s/old_1/new_1/g
...
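Put together as one function, the conversion and the single sed pass might look like the sketch below. The name apply_map is hypothetical, and the sketch assumes each map line is exactly "new old" and that neither word contains /, & or regex metacharacters, which would need escaping first.

```shell
# apply_map MAPFILE INPUTFILE   (hypothetical helper name)
# Assumes "new old" per map line, with no "/", "&" or regex
# metacharacters in either word.
apply_map() {
  # Convert every "new old" line into a sed command "s/old/new/g".
  sed_script=$(sed 's#^\([^ ]*\) \([^ ]*\)$#s/\2/\1/g#' "$1")
  # Run all generated commands in a single sed pass over the input.
  sed "$sed_script" "$2"
}
```

Usage would be apply_map map input. Because the generated commands run in order, a key that is a prefix of another (old_1 and old_10, say) needs the longer one listed first in the map.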
It is possible in GNU Awk as follows,
awk 'FNR==NR{hash[$2]=$1; next} \
{for (i=1; i<=NF; i++)\
{for(key in hash) \
{if (match ($i,key)) {$i=sprintf("(%s)",hash[key]); break}}}print}' \
map-file FS='[()]' OFS= input-file
produces an output as,
itsa(new_0)single(new_2)string(new_1)withold_5ocurrences(new_4)ofthe(new_3)records
Another in Gnu awk, using split and ternary operator(s):
$ awk '
NR==FNR { a[$2]=$1; next }
{
n=split($0,b,"[()]")
for(i=1;i<=n;i++)
printf "%s%s",(i%2 ? b[i] : (b[i] in a? "(" a[b[i]] ")":"")),(i==n?ORS:"")
}' map foo
itsa(new_0)single(new_2)string(new_1)withocurrences(new_4)ofthe(new_3)records
First the map is read into a hash. Then each record of the file is split on ( and ). Every other element (those where i%2==0) was inside parentheses and so could be in the map. While printing, the ternary operator tests whether that element is a key in a; if so, its replacement is output in parentheses, and if not, nothing is output for it.

Repeatedly replace a delimiter at a given count (4), with another character

Given this line:
12,34,56,47,56,34,56,78,90,12,12,34,45
If the count of the commas(,) is greater than four, replace 4th comma(,) with ||.
If the count is lesser or equal to 4 no need replace the comma(,).
I am able to find the count by the following awk:
awk -F\, '{print NF-1}' text.txt
Then I used an if condition to check whether the result is greater than 4, but I am unable to replace the 4th comma with ||.
Find the count of the delimiter in a line and replace the particular position with another character.
Update:
I want to replace comma with || symbol after every 4th occurrence of the comma. Sorry for the confusion.
Expected output:
12,34,56,47||56,34,56,78||90,12,12,34||45
With GNU awk for gensub():
$ echo '12,34,56,47,56,34' | awk -F, 'NF>5{$0=gensub(/,/,"||",4)}1'
12,34,56,47||56,34
$ echo '12,34,56,47,56' | awk -F, 'NF>5{$0=gensub(/,/,"||",4)}1'
12,34,56,47,56
$ echo 12,34,56,47,56,34,56,78,90,12,12,34,45 | sed 's/,/||/4'
12,34,56,47||56,34,56,78,90,12,12,34,45
$ echo 12,34,56,47 | sed 's/,/||/4'
12,34,56,47
Should work with any POSIX sed
Update:
For the updated question you can use
$ echo 12,34,56,47,56,34,56,78,90,12,12,34,45 | sed -e 's/\(\([^,]*,\)\{3\}[^,]*\),/\1||/g'
12,34,56,47||56,34,56,78||90,12,12,34||45
Unfortunately, POSIX sed's s command can take either a number or g as a flag, but not both. GNU sed allows the combination, but it does not do what we want in this case. So you have to spell it out in the regular expression.
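The same regular expression can take the count as a parameter. This is only a sketch: the function name every_nth_comma is invented here, and N is assumed to be a positive integer.

```shell
# every_nth_comma N   (hypothetical helper; reads lines on stdin)
# Replaces every Nth comma with "||".
every_nth_comma() {
  k=$(( $1 - 1 ))   # commas to keep before the one that gets replaced
  # Group 1 = k "field," chunks plus one more field; the comma after it is swapped.
  sed "s/\(\([^,]*,\)\{$k\}[^,]*\),/\1||/g"
}

echo 12,34,56,47,56,34,56,78,90,12,12,34,45 | every_nth_comma 4
```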
Using awk you can do:
s='12,34,56,47,56,34,56,78,90,12,12,34,45'
awk -F, '{for (i=1; i<NF; i++) printf "%s%s", $i, (i%4?FS:"||"); print $i}' <<< "$s"
12,34,56,47||56,34,56,78||90,12,12,34||45
"if the count is greater than four i want to replace 4th comma(,) with ||"
Give this line a try (GNU sed):
sed -r '/([^,]*,){4}.*,/s/,/||/4' file
test:
kent$ echo ",,,,,"|sed -r '/([^,]*,){4}.*,/s/,/||/4'
,,,||,
kent$ echo ",,,,"|sed -r '/([^,]*,){4}.*,/s/,/||/4'
,,,,
kent$ echo ",,,"|sed -r '/([^,]*,){4}.*,/s/,/||/4'
,,,
with awk
awk -F, 'NF-1>4{k=""; for(i=1;i<NF;i++){if(i==4)k=k$i"||";else k=k$i","} $0=k$NF}1' filename

awk - split only by first occurrence

I have a line like:
one:two:three:four:five:six seven:eight
and I want to use awk to get $1 to be one and $2 to be two:three:four:five:six seven:eight
I know I can get it by doing sed before. That is to change the first occurrence of : with sed then awk it using the new delimiter.
However replacing the delimiter with a new one would not help me since I can not guarantee that the new delimiter will not already be somewhere in the text.
I want to know if there is an option to get awk to behave this way
So something like:
awk -F: '{print $1,$2}'
will print:
one two:three:four:five:six seven:eight
I will also want to do some manipulations on $1 and $2 so I don't want just to substitute the first occurrence of :.
Without any substitutions
echo "one:two:three:four:five" | awk -F: '{ st = index($0,":");print $1 " " substr($0,st+1)}'
The index function finds the first occurrence of ":" in the whole string, so in this case the variable st is set to 4. I then use the substr function to grab the rest of the string starting from position st+1; if no length is supplied, it goes to the end of the string. The output is:
one two:three:four:five
If you want to do further processing you could always set the string to a variable for further processing.
rem = substr($0,st+1)
Note this was tested on Solaris AWK but I can't see any reason why this shouldn't work on other flavours.
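A minimal sketch of that "further processing" idea: both halves are taken with substr, so no field separator is involved, and each can then be changed independently. The function name split_first and the toupper() call are assumptions added purely for illustration.

```shell
# split_first: hypothetical helper; splits each stdin line at the first ":".
split_first() {
  awk '{
    st = index($0, ":")
    if (st == 0) { print; next }      # no separator: leave the line as is
    head = substr($0, 1, st - 1)      # everything before the first ":"
    rest = substr($0, st + 1)         # everything after it, other colons intact
    print toupper(head), rest         # example manipulation of the first part
  }'
}

echo "one:two:three:four:five:six seven:eight" | split_first
```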
Something like this?
echo "one:two:three:four:five:six" | awk '{sub(/:/," ")}1'
one two:three:four:five:six
This replaces the first : with a space.
You can then get the two parts into $1 and $2:
echo "one:two:three:four:five:six" | awk '{sub(/:/," ")}1' | awk '{print $1,$2}'
one two:three:four:five:six
Or in same awk, so even with substitution, you get $1 and $2 the way you like
echo "one:two:three:four:five:six" | awk '{sub(/:/," ");$1=$1;print $1,$2}'
one two:three:four:five:six
EDIT:
Using a different separator you can get the first part as field $1 and the rest in $2 like this:
echo "one:two:three:four:five:six seven:eight" | awk -F\| '{sub(/:/,"|");$1=$1;print "$1="$1 "\n$2="$2}'
$1=one
$2=two:three:four:five:six seven:eight
Unique separator
echo "one:two:three:four:five:six seven:eight" | awk -F"#;#." '{sub(/:/,"#;#.");$1=$1;print "$1="$1 "\n$2="$2}'
$1=one
$2=two:three:four:five:six seven:eight
The closest you can get is with GNU awk's FPAT:
$ awk '{print $1}' FPAT='(^[^:]+)|(:.*)' file
one
$ awk '{print $2}' FPAT='(^[^:]+)|(:.*)' file
:two:three:four:five:six seven:eight
$2 will include the leading delimiter, but you can use substr to fix that:
$ awk '{print substr($2,2)}' FPAT='(^[^:]+)|(:.*)' file
two:three:four:five:six seven:eight
So putting it all together:
$ awk '{print $1, substr($2,2)}' FPAT='(^[^:]+)|(:.*)' file
one two:three:four:five:six seven:eight
Storing the results of the substr back in $2 will allow further processing on $2 without the leading delimiter:
$ awk '{$2=substr($2,2); print $1,$2}' FPAT='(^[^:]+)|(:.*)' file
one two:three:four:five:six seven:eight
A solution that should work with mawk 1.3.3:
awk '{n=index($0,":");s=$0;$1=substr(s,1,n-1);$2=substr(s,n+1);print $1}' FS='\0'
one
awk '{n=index($0,":");s=$0;$1=substr(s,1,n-1);$2=substr(s,n+1);print $2}' FS='\0'
two:three:four five:six:seven
awk '{n=index($0,":");s=$0;$1=substr(s,1,n-1);$2=substr(s,n+1);print $1,$2}' FS='\0'
one two:three:four five:six:seven
Just throwing this on here as a solution I came up with where I wanted to split the first two columns on : but keep the rest of the line intact.
Comments inline.
echo "a:b:c:d::e" | \
awk '{
split($0,f,":"); # split $0 into array of fields `f`
sub(/^([^:]+:){2}/,"",$0); # remove first two "fields" from `$0`
print f[1],f[2],$0 # print first two elements of `f` and edited `$0`
}'
Returns:
a b c:d::e
In my input I didn't have to worry about the first two fields containing escaped :, if that was a requirement, this solution wouldn't work as expected.
Amended to match the original requirements:
echo "a:b:c:d::e" | \
awk '{
split($0,f,":");
sub(/^([^:]+:)/,"",$0);
print f[1],$0
}'
Returns:
a b:c:d::e
