print first 3 characters and / rest of the string with stars - bash

I'have this input like this
John:boofoo
I want to print rest of the string with stars and keep only 3 characters of the string.
The output will be like this
John:boo***
this my command
awk -F ":" '{print $1,$2 ":***"}'
I want to use only print command if possible. Thanks

With GNU sed:
echo 'John:boofoo' | sed -E 's/(:...).*/\1***/'
Output:
John:boo***

With GNU awk for gensub():
$ awk 'BEGIN{FS=OFS=":"} {print $1, substr($2,1,3) gensub(/./,"*","g",substr($2,4))}' file
John:boo***
With any awk:
awk 'BEGIN{FS=OFS=":"} {tl=substr($2,4); gsub(/./,"*",tl); print $1, substr($2,1,3) tl}' file
John:boo***

Could you please try following. This will print stars(keeping only first 3 letters same as it is) how many characters are present in 2nd field after first 3 characters.
awk '
BEGIN{
FS=OFS=":"
}
{
stars=""
val=substr($2,1,3)
for(i=4;i<=length($2);i++){
stars=stars"*"
}
$2=val stars
}
1
' Input_file
Output will be as follows.
John:boo***
Explanation: Adding explanation for above code too here.
awk '
BEGIN{ ##Starting BEGIN section from here.
FS=OFS=":" ##Setting FS and OFS value as : here.
} ##Closing block of BEGIN section here.
{ ##Here starts main block of awk program.
stars="" ##Nullifying variable stars here.
val=substr($2,1,3) ##Creating variable val whose value is 1st 3 letters of 2nd field.
for(i=4;i<=length($2);i++){ ##Starting a for loop from 4(becasue we need to have from 4th character to till last in 2nd field) till length of 2nd field.
stars=stars"*" ##Keep concatenating stars variable to its own value with *.
}
$2=val stars ##Assigning value of variable val and stars to 2nd field here.
}
1 ##Mentioning 1 here to print edited/non-edited lines for Input_file here.
' Input_file ##Mentioning Input_file name here.

Or even with good old sed
$ echo "John:boofoo" | sed 's/...$/***/'
Output:
John:boo***
(note: this just replaces the last 3 characters of any string with "***", so if you need to key off the ':', see the GNU sed answer from Cyrus.)

Another awk variant:
awk -F ":" '{print $1 FS substr($2, 1, 3) "***"}' <<< 'John:boofoo'
John:boo***

Since we have the tags awk, bash and sed: for completeness sake here is a bash only solution:
INPUT="John:boofoo"
printf "%s:%s\n" ${INPUT%%:*} $(TMP1=${INPUT#*:};TMP2=${TMP1:3}; echo "${TMP1:0:3}${TMP2//?/*}")
It uses two arguments to printf after the format string. The first one is INPUT stripped of by everything uncluding and after the :. Lets break down the second argument $(TMP1=${INPUT#*:};TMP2=${TMP1:3}; echo "${TMP1:0:3}${TMP2//?/*}"):
$(...) the string is interpreted as a bash command its output is substituted as last argument to printf
TMP1=${INPUT#*:}; remove everything up to and including the :, store the string in TMP1.
TMP2=${TMP1:3}; geht all characters of TMP1 from offset 3 to the end and store them in TMP2.
echo "${TMP1:0:3}${TMP2//?/*}" output the temporary strings: the first three chars from TMP1 unmodified and all chars from TMP2 as *
the output of the last echo is the last argument to printf
Here is the bash -x output:
+ INPUT=John:boofoo
++ TMP1=boofoo
++ TMP2=foo
++ echo 'boo***'
+ printf '%s:%s\n' John 'boo***'
John:boo***

Another sed : replace all chars after the third by *
sed -E ':A;s/([^:]*:...)(.*)[^*]([*]*)/\1\2\3*/;tA'

Some more awk
awk 'BEGIN{FS=OFS=":"}{s=sprintf("%0*d",length(substr($2,4)),0); gsub(/0/,"*",s);print $1,substr($2,1,3) s}' infile
You can use the %* form of printf, which accepts a variable width. And, if you use '0' as your value to print, combined with the right-aligned text that's zero padded on the left..
Better Readable:
awk 'BEGIN{
FS=OFS=":"
}
{
s=sprintf("%0*d",length(substr($2,4)),0);
gsub(/0/,"*",s);
print $1,substr($2,1,3) s
}
' infile
Test Results:
$ awk --version
GNU Awk 3.1.7
Copyright (C) 1989, 1991-2009 Free Software Foundation.
$ cat f
John:boofoo
$ awk 'BEGIN{FS=OFS=":"}{s=sprintf("%0*d",length(substr($2,4)),0); gsub(/0/,"*",s);print $1,substr($2,1,3) s}' f
John:boo***

Another pure Bash, using the builtin regular expression predicate.
input="John:boofoo"
if [[ $input =~ ^([^:]*:...)(.*)$ ]]; then
printf '%s%s\n' "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]//?/*}"
else
echo >&2 "String doesn't match pattern"
fi
We split the string in two parts: the first part being everything up to (and including) the three chars found after the first colon (stored in ${BASH_REMATCH[1]}), the second part being the remaining part of string (stored in ${BASH_REMATCH[2]}). If the string doesn't match this pattern, we just insult the user.
We then print the first part unchanged, and the second part with every character replaced with *.

Related

AWK Finding a way to print lines containing a word from a comma separated string

I want to write a bash script that only prints lines that, on their second column, contain a word from a comma separated string. Example:
words="abc;def;ghi;jkl"
>cat log1.txt
hello;abc;1234
house;ab;987
mouse;abcdef;654
What I want is to print only lines that contain a whole word from the "words" variable. That means that "ab" won't match, neither will "abcdef". It seems so simple yet after trying for manymany hours, I was unable to find a solution.
For example, I tried this as my awk command, but it matched any substring.
-F \; -v b="TSLA;NVDA" 'b ~ $2 { print $0 }'
I will appreciate any help. Thank you.
EDIT:
A sample input would look like this
1;UNH;buy;344.74
2;PG;sell;138.60
3;MSFT;sell;237.64
4;TSLA;sell;707.03
A variable like this would be set
filter="PG;TSLA"
And according to this filter, I want to echo these lines
2;PG;sell;138.60
4;TSLA;sell;707.03
Grep is a good choice here:
grep -Fw -f <(tr ';' '\n' <<<"$words") log1.txt
With awk I'd do
awk -F ';' -v w="$words" '
BEGIN {
n = split(w, a, /;/)
# next line moves the words into the _index_ of an array,
# to make the file processing much easier and more efficient
for (i=1; i<=n; i++) words[a[i]]=1
}
$2 in words
' log1.txt
You may use this awk:
words="abc;def;ghi;jkl"
awk -F';' -v s=";$words;" 'index(s, FS $2 FS)' log1.txt
hello;abc;1234

Condition on Nth character of string in a Mth column in bash

I have a sample
$ cat c.csv
a,1234543,c
b,1231456,d
c,1230654,e
I need to grep only numbers where 4th character of 2nd column but not be 0 or 1
Output must be
a,1234543,c
I know this only
awk -F, 'BEGIN { OFS = FS } $2 ~/^[2-9]/' c.csv
Is it possible to put a condition on 4th character?
Could you please try following.
awk 'BEGIN{FS=","} substr($2,4,1)!=0 && substr($2,4,1)!=1' Input_file
OR as per Ed site's suggestion:
awk 'BEGIN{FS=","} substr($2,4,1)!~[01]' Input_file
Explanation: Adding a detailed explanation for above code here.
awk ' ##Starting awk program from here.
BEGIN{ ##Starting BEGIN section from here.
FS="," ##Setting field separator as comma here.
} ##Closing BLOCK for this program BEGIN section.
substr($2,4,1)!=0 && substr($2,4,1)!=1 ##Checking conditions if 4th character of current line is NOT 0 and 1 then print the current line.
' Input_file ##Mentioning Input_file name here.
This might work for you (GNU sed or grep):
grep -vE '^([^,]*,){1}[^,]{3}[01]' file
or:
sed -E '/^([^,]*,){1}[^,]{3}[01]/d' file
Replace the 1 for the m'th-1 column and the 3 for the n'th-1 character in that column.
Grep is the answer.
But here is another way using array and variable substitution
test=( $(cat c.csv) ) # load c.csv data to an array
echo ${test[#]//*,???[0-1]*/} # print all items from an array,
# but remove the ones that correspond to this regex *,???[0-1]*
# so 'b,1231456,d' and 'c,1230654,e' from example will be removed
# and only 'a,1234543,c' will be printed
There are many ways to do this with awk. the most literal form would be:
4th character of 2nd column is not 0 or 1
$ awk -F, '($2 !~ /^...[01]/)' file
$ awk -F, '($2 ~ /^...[^01]/)' file
These will also match a line a,abcdefg,b
2nd column is an integer and 4th character is not 0 or 1
$ awk -F, '($2+0==$2) && ($2!~[.]) && ($2 !~ /^...[01]/)'
$ awk -F, '($2 ~ /^[0-9][0-9][0-9][^01][0-9]*$/)'

How to print keys from all key-value pairs

Text file looks like this:
key11=val1|key12=val2|key13=val3
key21=val1|key22=val2|key23=val3
How can I extract keys so that:
key11|key12|key13
key21|key22|key23
I have tried unsuccessfully :
awk '{ gsub(/[^[|]=]+=/,"") }1' file.txt
gives back the actual data:
key11=val1|key12=val2|key13=val3
key21=val1|key22=val2|key23=val3
Since you tagged bash
while IFS='=|' read -ra words; do
n=${#words[#]}
for ((i=1; i<n; i+=2)); do
unset words[i]
done
( IFS='|'; echo "${words[*]}" )
done < file
gawk
This can be done by awk, by setting FS and OFS :
kent$ awk -F'=[^|]*' -v OFS="" '$1=$1' file
key11|key12|key13
key21|key22|key23
or safer: awk -F.... '{$1=$1}1' file
substitution (by sed for example):
kent$ sed 's/=[^|]*//g' file
key11|key12|key13
key21|key22|key23
Here's one solution
echo "key11=val1|key12=val2|key13=val3" \
| awk -F'[=|]' '{
for (i=1;i<=NF;i+=2){
printf("%s%s", $i, (i<(NF-1))?"|":"")
}
print""
}'
output
key11|key12|key13
It should also work by passing in the filename as an argument to awk, i.e.
awk -F'[=|]' '{for (i=1;i<=NF;i+=2){printf("%s%s", $i, (i<(NF-1))?"|":"") }print""}' file1 [file_more_as_will_fit]
Discussion
We use a multiple character value for FS (FieldSeperator) so each = and | char mark the beginning of a new field.
-F'[=|]'
Because we know we want to start with field1 for output and skip every other field, we use
for (i=1;i<=NF;i+=2)
printf formats the output as defined by the format string '%s%s' . There area a zillion options available for printf format strs, but you only need the value for $i (the looping value that generates the key) and whether to print a | char or not.
printf("%s%s", $i ...)
And we use awk's ternary operator, which evaluates what element number is being processed (i<..). As long as it is not the 2nd to last field, the | char is emitted.
(i<(NF-1))?"|":""
IHTH
sed
I did this with sed:
sed -r 's/([[:alnum:]]*)=[[:alnum:]]*/\1/g' < file.txt
tested here and got:
key11|key12|key13
key21|key22|key23
s/<pattern>/<subst>/ means "replace <pattern> by <subst>", and with the g in the end it will do it for every pattern found in the line.
The [[:alnum:]]* is equivalent to [0-9a-zA-Z]*, and means any number of letters or digits.
The first pattern between parentesis will correspond to \1 in the substitution, the second \2 and so on.
So, it will match every "key=value" and replace it by "key".
awk -F'[=|]' '{print $1,$3,$5}' OFS="|" file
key11|key12|key13
key21|key22|key23

How do I join lines using space and comma

I have the file that contains content like:
IP
111
22
25
I want to print the output in the format IP 111,22,25.
I have tried tr ' ' , but its not working
Welcome to paste
$ paste -sd " ," file
IP 111,22,25
Normally what paste does is it writes to standard output lines consisting of sequentially corresponding lines of each given file, separated by a <tab>-character. The option -s does it differently. It states to paste each line of the files sequentially with a <tab>-character as a delimiter. When using the -d flag, you can give a list of delimiters to be used instead of the <tab>-character. Here I gave as a list " ," indicating, use space and then only commas.
In pure Bash:
# Read file into array
mapfile -t lines < infile
# Print to string, comma-separated from second element on
printf -v str '%s %s' "${lines[0]}" "$(IFS=,; echo "${lines[*]:1}")"
# Print
echo "$str"
Output:
IP 111,22,25
I'd go with:
{ read a; read b; read c; read d; } < file
echo "$a $b,$c,$d"
This will also work:
xargs printf "%s %s,%s,%s" < file
Try cat file.txt | tr '\n' ',' | sed "s/IP,/IP /g"
tr deletes new lines, sed changes IP,111,22,25 into IP 111,22,25
The following awk script will do the requested:
awk 'BEGIN{OFS=","} FNR==1{first=$0;next} {val=val?val OFS $0:$0} END{print first FS val}' Input_file
Explanation: Adding explanation for above code now.
awk ' ##Starting awk program here.
BEGIN{ ##Starting BEGIN section here of awk program.
OFS="," ##Setting OFS as comma, output field separator.
} ##Closing BEGIN section of awk here.
FNR==1{ ##Checking if line is first line then do following.
first=$0 ##Creating variable first whose value is current first line.
next ##next keyword is awk out of the box keyword which skips all further statements from here.
} ##Closing FNR==1 BLOCK here.
{ ##This BLOCK will be executed for all lines apart from 1st line.
val=val?val OFS $0:$0 ##Creating variable val whose values will be keep concatenating its own value.
}
END{ ##Mentioning awk END block here.
print first FS val ##Printing variable first FS(field separator) and variable val value here.
}' Input_file ##Mentioning Input_file name here which is getting processed by awk.
Using Perl
$ cat captain.txt
IP
111
22
25
$ perl -0777 -ne ' #k=split(/\s+/); print $k[0]," ",join(",",#k[1..$#k]) ' captain.txt
IP 111,22,25
$

how to replace a string at a specific position in a csv file using bash

I have several .csv files and each csv file has lines which look like this.
AA,1,CC,1,EE
AA,FF,6,7,8,9
BB,6,7,8,99,AA
I am reading through each line of each csv file and then trying to replace the 4th position of each line beginning with AA with "ZZ"
Expected output
AA,1,CC,ZZ,EE
EE,FF,6,ZZ,8,9
BB,6,7,8,99,AA
However the variable "y" does contain the 4th variable "1" and "7" respectively, but when I use sed command it replaces the first occurrence of "1" with "ZZ".
How do I modify my code to replace only the 4th position of each line irrespective of what value it holds?
My code looks like this
$file = "name of file which contains list of all csv files"
for i in `cat file`
while IFS = read -r line;
do
if [[ $line == AA* ]] ; then
y=$(echo "$line" | cut -d',' -f 4)
sed -i "s/${y}/ZZ/" $i
fi
done < $i
Using sed, you can also direct that only the 4th field of a comma separated values file be changed to "ZZ" for lines beginning "AA" with:
sed -i '/^AA/s/[^,][^,]*/ZZ/4' file
Explanation
sed -i call sed to edit file in place;
general form /find/s/match/replace/occurrence; where
find is /^AA/ line beginning with "AA";
match [^,][^,]* a character not a comma followed by any number of non-commas;
replace /ZZ/4 the 4th occurrence of match with "ZZ".
Note, both awk and sed provide good solutions in this case so see the answers by #perreal and #RavinderSingh13
Example Input File
$ cat file
AA,1,CC,1,EE
AA,FF,6,7,8,9
BB,6,7,8,99,AA
Example Use/Output
(note: -i not used below so the changes are simply output to stdout)
$ sed '/^AA/s/[^,][^,]*/ZZ/4' file
AA,1,CC,ZZ,EE
AA,FF,6,ZZ,8,9
BB,6,7,8,99,AA
To robustly do this is just:
$ awk 'BEGIN{FS=OFS=","} $1=="AA"{$4="ZZ"} 1' csv
AA,1,CC,ZZ,EE
AA,FF,6,ZZ,8,9
BB,6,7,8,99,AA
Note that the above is doing a literal string comparison and a literal string replacement so unlike the other solutions posted so far it won't fail if the target string (AA in this example) contains regexp metachars like . or *, nor if it can be part of another string like AAX, nor if the replacement string (ZZ in this example) contains backreferences like & or \1.
If you want to map multiple strings in one pass:
$ awk 'BEGIN{FS=OFS=","; m["AA"]="ZZ"; m["BB"]="FOO"} $1 in m{$4=m[$1]} 1' csv
AA,1,CC,ZZ,EE
AA,FF,6,ZZ,8,9
BB,6,7,FOO,99,AA
and just like GNU sed has -i for "inplace" editing, GNU awk has -i inplace, so you can discard the shell loop and just do:
awk -i inplace '
BEGIN { FS=OFS="," }
(NR==FNR) { ARGV[ARGC++]=$0 }
(NR!=FNR) && ($1=="AA") { $4="ZZ" }
{ print }
' file
and it'll operate on all of the files named in file in one call to awk. "file" in that last case is your file containing a list of other CSV file names.
EDIT1: Since OP has changed requirement a bit do adding following now.
awk 'BEGIN{FS=OFS=","} /^AA/||/^BB/{$4="ZZ"} /^CC/||/^DD/{$5="NEW_VALUE"} 1' Input_file > temp_file && mv temp_file Input_file
Could you please try following.
awk -F, '/^AA/{$4="ZZ"} 1' OFS=, Input_file > temp_file && mv temp_file Input_file
OR
awk 'BEGIN{FS=OFS=","} /^AA/{$4="ZZ"} 1' Input_file > temp_file && mv temp_file Input_file
Explanation: Adding explanation to above code too now.
awk '
BEGIN{ ##Starting BEGIN section of awk which will be executed before reading Input_file.
FS=OFS="," ##Setting field separator and output field separator as comma here for all lines of Input_file.
} ##Closing block for BEGIN section of this program.
/^AA/{ ##Checking condition if a line starts from string AA then do following.
$4="ZZ" ##Setting 4th field as ZZ string as per OP.
} ##Closing this condition block here.
1 ##By mentioning 1 we are asking awk to print edited or non-edited line of Input_file.
' Input_file ##Mentioning Input_file name here.
Using sed:
sed -i 's/\(^AA,[^,]*,[^,]*,\)[^,]*/\1ZZ/' input_file

Resources