How do I join lines using space and comma - bash

I have a file that contains content like:
IP
111
22
25
I want to print the output in the format IP 111,22,25.
I have tried tr ' ' , but it's not working.

Welcome to paste
$ paste -sd " ,," file
IP 111,22,25
Normally, paste writes to standard output lines made up of the sequentially corresponding lines of each given file, separated by a <tab> character. The -s option changes this: it joins all the lines of each file, one file at a time, again with a <tab> character as the delimiter. The -d flag takes a list of delimiters to use instead of the <tab> character. The list is applied in order and restarts from its beginning once it is exhausted, so " ,," gives a space, then a comma, then another comma, which covers exactly the three joins in this four-line file. A two-character list " ," would alternate space and comma and print IP 111,22 25.
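Because paste restarts its delimiter list from the beginning once it runs out, a fixed list only fits a file with a known number of lines. Here is a minimal sketch for any number of value lines, assuming the first line is the label. join_header_values is a hypothetical helper name, not part of any tool:

```shell
#!/usr/bin/env bash
# join_header_values (hypothetical helper): print line 1 of FILE, a space,
# then all remaining lines joined with commas.
join_header_values() {
    local file=$1
    # head takes the label line; tail -n +2 feeds lines 2..N to paste,
    # which joins them using a single, repeating delimiter.
    printf '%s %s\n' "$(head -n 1 "$file")" "$(tail -n +2 "$file" | paste -sd, -)"
}

# Demo on a throwaway file with one more value line than the question's sample.
tmp=$(mktemp)
printf '%s\n' IP 111 22 25 30 > "$tmp"
join_header_values "$tmp"   # prints: IP 111,22,25,30
rm -f "$tmp"
```

Since only one delimiter is passed to paste here, the cycling behaviour no longer matters.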

In pure Bash:
# Read file into array
mapfile -t lines < infile
# Print to string, comma-separated from second element on
printf -v str '%s %s' "${lines[0]}" "$(IFS=,; echo "${lines[*]:1}")"
# Print
echo "$str"
Output:
IP 111,22,25

I'd go with:
{ read a; read b; read c; read d; } < file
echo "$a $b,$c,$d"
This will also work:
xargs printf "%s %s,%s,%s" < file

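Try tr '\n' ',' < file.txt | sed 's/^IP,/IP /; s/,$//'
tr replaces every newline with a comma, giving IP,111,22,25, (note the trailing comma); sed then changes the leading IP, into IP followed by a space and strips the trailing comma, leaving IP 111,22,25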

The following awk script will do the requested:
awk 'BEGIN{OFS=","} FNR==1{first=$0;next} {val=val?val OFS $0:$0} END{print first FS val}' Input_file
Explanation of the above code:
awk ' ##Starting awk program here.
BEGIN{ ##Starting BEGIN section here of awk program.
OFS="," ##Setting OFS as comma, output field separator.
} ##Closing BEGIN section of awk here.
FNR==1{ ##Checking whether this is the first line; if so, do the following.
first=$0 ##Creating variable first, whose value is the current (first) line.
next ##next is a built-in awk keyword that skips all further statements for this line.
} ##Closing FNR==1 BLOCK here.
{ ##This BLOCK will be executed for all lines apart from the 1st line.
val=val?val OFS $0:$0 ##Appending the current line to variable val, with OFS between the values.
}
END{ ##Mentioning awk END block here.
print first FS val ##Printing variable first FS(field separator) and variable val value here.
}' Input_file ##Mentioning Input_file name here which is getting processed by awk.

Using Perl
$ cat captain.txt
IP
111
22
25
$ perl -0777 -ne ' @k=split(/\s+/); print $k[0]," ",join(",",@k[1..$#k]) ' captain.txt
IP 111,22,25
$

Related

Issue with a condition on unique rows in bash

I want to print the rows of a table into files. The issue is that when I read the input line by line, the result is printed several times. Here is my input file:
aa ,DEC ,file1.txt
aa ,CHAR ,file1.txt
cc ,CHAR ,file1.txt
dd ,DEC ,file2.txt
bb ,DEC ,file3.txt
bb ,CHAR ,file3.txt
cc ,DEC ,file1.txt
Here is the result I want to have:
printed in file1.txt
aa#DEC,CHAR
cc#CHAR,DEC
printed in file2.txt
dd#DEC
printed in file3.txt
bb#DEC,CHAR
Here is my attempt:
(cat input.txt|while read line
do
table=`echo $line|cut -d"," -f1`
variable=`echo $line|cut -d"," -f2`
file=`echo $line|cut -d"," -f3`
echo ${table}#${variable},
done ) > ${file}
This can be done in a single pass with GNU awk (it relies on arrays of arrays), like this:
awk -F ' *, *' '{
map[$3][$1] = (map[$3][$1] == "" ? "" : map[$3][$1] ",") $2
}
END {
for (f in map)
for (d in map[f])
print d "#" map[f][d] > f
}' file
This will populate this data:
=== file1.txt ===
aa#DEC,CHAR
cc#CHAR,DEC
=== file2.txt ===
dd#DEC
=== file3.txt ===
bb#DEC,CHAR
With your shown samples, please try the following, written and tested with GNU awk.
awk '
{
sub(/^,/,"",$3)
}
FNR==NR{
sub(/^,/,"",$2)
arr[$1,$3]=(arr[$1,$3]?arr[$1,$3]",":"")$2
next
}
(($1,$3) in arr){
close(outputFile)
outputFile=$3
print $1"#"arr[$1,$3] >> (outputFile)
delete arr[$1,$3]
}
' Input_file Input_file
Explanation: a detailed explanation of the above.
awk ' ##Starting awk program from here.
{
sub(/^,/,"",$3) ##Substituting starting comma in 3rd field with NULL.
}
FNR==NR{ ##Checking condition FNR==NR will be true when first time Input_file is being read.
sub(/^,/,"",$2) ##Substituting starting comma with NULL in 2nd field.
arr[$1,$3]=(arr[$1,$3]?arr[$1,$3]",":"")$2
##Creating arr with index of 1st and 3rd fields, which has 2nd field as value.
next ##next will skip all further statements from here.
}
(($1,$3) in arr){ ##Checking condition if 1st and 3rd fields are in arr then do following.
close(outputFile) ##Closing output file, to avoid "too many opened files" error.
outputFile=$3 ##Setting outputFile with value of 3rd field.
print $1"#"arr[$1,$3] >> (outputFile)
##printing 1st field # arr value and output it to outputFile here.
delete arr[$1,$3] ##Deleting array element with index of 1st and 3rd field here.
}
' Input_file Input_file ##Mentioning Input_file 2 times here.
You have several errors in your code. You can use the built-in read to split on a comma, and the parentheses are completely unnecessary.
while IFS=, read -r table variable file
do
echo "${table}#${variable}," >>"$file"
done< input.txt
Using $file in a redirect after done is an error; the shell wants to open the file handle to redirect to before file is defined. Besides, as per your requirements, each line should go to a different file.
Notice also quoting fixes and the omission of the useless cat.
Wrapping fields with the same value onto the same line would be comfortably easy with an Awk postprocessor, but then you might as well do all of this in Awk, as in the other answer you already received.
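For readers who want to stay in the shell, the grouping step can also be sketched with Bash associative arrays (Bash 4 or later). This is only an illustration under stated assumptions: group_rows is a hypothetical helper name, file and table names are assumed not to contain the | character used as an internal key separator, and all blanks inside a field are dropped.

```shell
#!/usr/bin/env bash
# group_rows (hypothetical helper): read "table ,variable ,file" rows from INPUT
# and append one "table#var1,var2,..." line per table to each named file.
group_rows() {
    local input=$1 table variable file key
    local -A values=()   # "file|table" -> comma-joined variables
    local -a order=()    # keys in first-seen order
    while IFS=, read -r table variable file; do
        # Drop the blanks that surround the commas in the sample input.
        table=${table//[[:space:]]/}
        variable=${variable//[[:space:]]/}
        file=${file//[[:space:]]/}
        [[ -n $file ]] || continue
        key="$file|$table"
        if [[ -n ${values[$key]+x} ]]; then
            values[$key]+=",$variable"      # seen before: extend the list
        else
            values[$key]=$variable          # first occurrence
            order+=("$key")
        fi
    done < "$input"
    for key in "${order[@]}"; do
        # ${key#*|} is the table, ${key%%|*} is the output file name.
        printf '%s#%s\n' "${key#*|}" "${values[$key]}" >> "${key%%|*}"
    done
}

# Demo in a throwaway directory, using the rows from the question.
demo_dir=$(mktemp -d)
(
    cd "$demo_dir" || exit 1
    printf '%s\n' 'aa ,DEC ,file1.txt' 'aa ,CHAR ,file1.txt' 'cc ,CHAR ,file1.txt' \
        'dd ,DEC ,file2.txt' 'bb ,DEC ,file3.txt' 'bb ,CHAR ,file3.txt' \
        'cc ,DEC ,file1.txt' > input.txt
    group_rows input.txt
    cat file1.txt    # prints aa#DEC,CHAR then cc#CHAR,DEC
)
rm -rf "$demo_dir"
```

Unlike the awk for-in loops, this keeps the rows in the order in which each table first appears, at the cost of being slower on large inputs.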

Extract specific substring in shell

I have a file which contains following line:
ro fstype=sd timeout=10 console=ttymxc1,115200 show=true
I'd like to extract the value of the fstype attribute ("sd") and store it in a variable.
I did the job using bash
IFS=" " read -r -a args < file
for arg in "${args[@]}"; do
if [[ "$arg" =~ "fstype" ]]; then
id=$(cut -d "=" -f2 <<< "$arg")
echo "$id"
fi
done
and following awk command in another shell script:
awk -F " " '{print $2}' file | cut -d '=' -f2
Because the position of the 'fstype' argument and the file content can differ, how can I do the same thing in a way that stays portable across shell scripts?
Please try the following.
awk 'match($0,/fstype=[^ ]*/){print substr($0,RSTART+7,RLENGTH-7)}' Input_file
OR, to strip whatever comes before the = more generally, try the following:
awk '
match($0,/fstype=[^ ]*/){
val=substr($0,RSTART,RLENGTH)
sub(/.*=/,"",val)
print val
val=""
}
' Input_file
With sed:
sed 's/.*fstype=\([^ ]*\).*/\1/' Input_file
awk code's explanation:
awk ' ##Starting awk program from here.
match($0,/fstype=[^ ]*/){ ##Using match function to match regex fstype= till first space comes in current line.
val=substr($0,RSTART,RLENGTH) ##Creating variable val which has sub-string of current line from RSTART to till RLENGTH.
sub(/.*=/,"",val) ##Substituting everything till = in value of val here.
print val ##Printing val here.
val="" ##Nullifying val here.
}
' Input_file ##mentioning Input_file name here.
Any time you have tag=value pairs in your data I find it best to start by creating an array (f[] below) that maps those tags (names) to their values:
$ awk -v tag='fstype' -F'[ =]' '{for (i=2;i<NF;i+=2) f[$i]=$(i+1); print f[tag]}' file
sd
$ awk -v tag='console' -F'[ =]' '{for (i=2;i<NF;i+=2) f[$i]=$(i+1); print f[tag]}' file
ttymxc1,115200
With the above approach you can do whatever you like with the data just by referencing it by its name as the index in the array, e.g.:
$ awk -F'[ =]' '{
for (i=2;i<NF;i+=2) f[$i]=$(i+1)
if ( (f["show"] == "true") && (f["timeout"] < 20) ) {
print f["console"], f["fstype"]
}
}' file
ttymxc1,115200 sd
If your data has more than 1 row and there can be different fields on each row (doesn't appear to be true for your data) then add delete f as the first line of the script.
If the key and value can be matched by the regex fstype=[^ ]*, grep with the -o option, which prints only the matched part, can be used.
$ grep -o 'fstype=[^ ]*' file
fstype=sd
In addition, the regex \K can be used with the -P option (note that this option is only available in GNU grep).
Patterns that are to the left of \K are not shown with -o.
Therefore, below expression can extract the value only.
$ grep -oP 'fstype=\K[^ ]*' file
sd
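If no external tool is wanted at all, the same lookup can be sketched in plain Bash. This assumes the settings sit on the first line of the file and are separated by blanks. get_tag is a hypothetical helper name introduced only for this illustration:

```shell
#!/usr/bin/env bash
# get_tag (hypothetical helper): print the value of TAG from the first line
# of FILE, whatever position the tag=value pair has on that line.
get_tag() {
    local tag=$1 file=$2 word
    local -a words=()
    # Split the first line into words; "|| true" tolerates a missing final newline.
    read -r -a words < "$file" || true
    for word in "${words[@]}"; do
        if [[ $word == "$tag="* ]]; then
            printf '%s\n' "${word#*=}"   # drop everything up to the first =
            return 0
        fi
    done
    return 1   # tag not found
}

# Demo on a throwaway file holding the line from the question.
tmp=$(mktemp)
printf '%s\n' 'ro fstype=sd timeout=10 console=ttymxc1,115200 show=true' > "$tmp"
id=$(get_tag fstype "$tmp")
echo "$id"   # prints: sd
rm -f "$tmp"
```

The non-zero return status lets a calling script tell a missing tag apart from an empty value.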

Condition on Nth character of string in a Mth column in bash

I have a sample
$ cat c.csv
a,1234543,c
b,1231456,d
c,1230654,e
I need to keep only the lines where the 4th character of the 2nd column is not 0 or 1.
Output must be
a,1234543,c
I know this only
awk -F, 'BEGIN { OFS = FS } $2 ~/^[2-9]/' c.csv
Is it possible to put a condition on 4th character?
Please try the following.
awk 'BEGIN{FS=","} substr($2,4,1)!=0 && substr($2,4,1)!=1' Input_file
OR, as per Ed's suggestion:
awk 'BEGIN{FS=","} substr($2,4,1)!~/[01]/' Input_file
Explanation: a detailed explanation of the above code.
awk ' ##Starting awk program from here.
BEGIN{ ##Starting BEGIN section from here.
FS="," ##Setting field separator as comma here.
} ##Closing BLOCK for this program BEGIN section.
substr($2,4,1)!=0 && substr($2,4,1)!=1 ##Checking that the 4th character of the 2nd field is neither 0 nor 1; if so, print the current line.
' Input_file ##Mentioning Input_file name here.
This might work for you (GNU sed or grep):
grep -vE '^([^,]*,){1}[^,]{3}[01]' file
or:
sed -E '/^([^,]*,){1}[^,]{3}[01]/d' file
For the N'th character of the M'th column, replace the 1 with M-1 and the 3 with N-1.
Grep is the answer.
But here is another way using array and variable substitution
test=( $(cat c.csv) ) # load c.csv data into an array (assumes no whitespace inside the lines)
echo ${test[@]//*,???[0-1]*/} # print all items from the array,
# but blank out the ones that match the glob pattern *,???[0-1]*
# so 'b,1231456,d' and 'c,1230654,e' from example will be removed
# and only 'a,1234543,c' will be printed
There are many ways to do this with awk. The most literal forms would be:
4th character of 2nd column is not 0 or 1
$ awk -F, '($2 !~ /^...[01]/)' file
$ awk -F, '($2 ~ /^...[^01]/)' file
These will also match a line a,abcdefg,b
2nd column is an integer and 4th character is not 0 or 1
$ awk -F, '($2+0==$2) && ($2!~/[.]/) && ($2 !~ /^...[01]/)' file
$ awk -F, '($2 ~ /^[0-9][0-9][0-9][^01][0-9]*$/)' file
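To make the column and the character position parameters instead of hard-coding them, the substr() test can be wrapped in a small shell function. This is a sketch only: keep_rows is a hypothetical name, and note that a field shorter than the requested position yields an empty string, so such rows are kept.

```shell
#!/usr/bin/env bash
# keep_rows (hypothetical helper): print the rows of a comma-separated FILE
# whose COL-th field does not have 0 or 1 as its POS-th character.
keep_rows() {
    local col=$1 pos=$2 file=$3
    # -v passes the shell values in as awk variables; $col is the COL-th field.
    awk -F, -v col="$col" -v pos="$pos" 'substr($col, pos, 1) !~ /[01]/' "$file"
}

# Demo on a throwaway copy of c.csv from the question.
tmp=$(mktemp)
printf '%s\n' 'a,1234543,c' 'b,1231456,d' 'c,1230654,e' > "$tmp"
keep_rows 2 4 "$tmp"   # prints: a,1234543,c
rm -f "$tmp"
```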

how to replace a string at a specific position in a csv file using bash

I have several .csv files and each csv file has lines which look like this.
AA,1,CC,1,EE
AA,FF,6,7,8,9
BB,6,7,8,99,AA
I am reading through each line of each csv file and then trying to replace the 4th position of each line beginning with AA with "ZZ"
Expected output
AA,1,CC,ZZ,EE
AA,FF,6,ZZ,8,9
BB,6,7,8,99,AA
However the variable "y" does contain the 4th variable "1" and "7" respectively, but when I use sed command it replaces the first occurrence of "1" with "ZZ".
How do I modify my code to replace only the 4th position of each line irrespective of what value it holds?
My code looks like this
file="name of file which contains list of all csv files"
for i in $(cat "$file")
do
while IFS= read -r line
do
if [[ $line == AA* ]] ; then
y=$(echo "$line" | cut -d',' -f 4)
sed -i "s/${y}/ZZ/" "$i"
fi
done < "$i"
done
Using sed, you can also direct that only the 4th field of a comma separated values file be changed to "ZZ" for lines beginning "AA" with:
sed -i '/^AA/s/[^,][^,]*/ZZ/4' file
Explanation
sed -i call sed to edit file in place;
general form /find/s/match/replace/occurrence; where
find is /^AA/ line beginning with "AA";
match [^,][^,]* a character not a comma followed by any number of non-commas;
replace /ZZ/4 the 4th occurrence of match with "ZZ".
Note: both awk and sed provide good solutions in this case, so see the answers by @perreal and @RavinderSingh13.
Example Input File
$ cat file
AA,1,CC,1,EE
AA,FF,6,7,8,9
BB,6,7,8,99,AA
Example Use/Output
(note: -i not used below so the changes are simply output to stdout)
$ sed '/^AA/s/[^,][^,]*/ZZ/4' file
AA,1,CC,ZZ,EE
AA,FF,6,ZZ,8,9
BB,6,7,8,99,AA
To robustly do this is just:
$ awk 'BEGIN{FS=OFS=","} $1=="AA"{$4="ZZ"} 1' csv
AA,1,CC,ZZ,EE
AA,FF,6,ZZ,8,9
BB,6,7,8,99,AA
Note that the above is doing a literal string comparison and a literal string replacement so unlike the other solutions posted so far it won't fail if the target string (AA in this example) contains regexp metachars like . or *, nor if it can be part of another string like AAX, nor if the replacement string (ZZ in this example) contains backreferences like & or \1.
If you want to map multiple strings in one pass:
$ awk 'BEGIN{FS=OFS=","; m["AA"]="ZZ"; m["BB"]="FOO"} $1 in m{$4=m[$1]} 1' csv
AA,1,CC,ZZ,EE
AA,FF,6,ZZ,8,9
BB,6,7,FOO,99,AA
and just like GNU sed has -i for "inplace" editing, GNU awk has -i inplace, so you can discard the shell loop and just do:
awk -i inplace '
BEGIN { FS=OFS="," }
(NR==FNR) { ARGV[ARGC++]=$0 }
(NR!=FNR) && ($1=="AA") { $4="ZZ" }
{ print }
' file
and it'll operate on all of the files named in file in one call to awk. "file" in that last case is your file containing a list of other CSV file names.
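Where GNU awk's -i inplace is not available, the same effect can be approximated with any awk and a temporary file per CSV. A minimal sketch, assuming the list file holds one CSV path per line. replace_in_listed_files is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# replace_in_listed_files (hypothetical helper): for every CSV path listed in
# LIST, set field 4 to ZZ on rows whose first field is exactly AA.
replace_in_listed_files() {
    local list=$1 csv tmp
    while IFS= read -r csv; do
        [[ -f $csv ]] || continue            # skip blank lines and missing files
        # Create the temp file next to the CSV so mv stays on one filesystem.
        tmp=$(mktemp "${csv}.XXXXXX") || return 1
        if awk 'BEGIN{FS=OFS=","} $1=="AA"{$4="ZZ"} 1' "$csv" > "$tmp"; then
            mv "$tmp" "$csv"                 # replace only after awk succeeded
        else
            rm -f "$tmp"
        fi
    done < "$list"
}
```

Called as replace_in_listed_files file, it walks the list once and rewrites each CSV. Note that the replaced file gets the permissions of the temporary file, which may differ from the original's.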
EDIT1: Since the OP has changed the requirement a bit, adding the following now.
awk 'BEGIN{FS=OFS=","} /^AA/||/^BB/{$4="ZZ"} /^CC/||/^DD/{$5="NEW_VALUE"} 1' Input_file > temp_file && mv temp_file Input_file
Please try the following.
awk -F, '/^AA/{$4="ZZ"} 1' OFS=, Input_file > temp_file && mv temp_file Input_file
OR
awk 'BEGIN{FS=OFS=","} /^AA/{$4="ZZ"} 1' Input_file > temp_file && mv temp_file Input_file
Explanation: an explanation of the above code.
awk '
BEGIN{ ##Starting BEGIN section of awk which will be executed before reading Input_file.
FS=OFS="," ##Setting field separator and output field separator as comma here for all lines of Input_file.
} ##Closing block for BEGIN section of this program.
/^AA/{ ##Checking condition if a line starts from string AA then do following.
$4="ZZ" ##Setting 4th field as ZZ string as per OP.
} ##Closing this condition block here.
1 ##By mentioning 1 we are asking awk to print edited or non-edited line of Input_file.
' Input_file ##Mentioning Input_file name here.
Using sed:
sed -i 's/\(^AA,[^,]*,[^,]*,\)[^,]*/\1ZZ/' input_file

print first 3 characters and / rest of the string with stars

I have input like this:
John:boofoo
I want to keep only the first 3 characters after the colon and print the rest of the string as stars.
The output will be like this
John:boo***
This is my command:
awk -F ":" '{print $1,$2 ":***"}'
I want to use only print command if possible. Thanks
With GNU sed:
echo 'John:boofoo' | sed -E 's/(:...).*/\1***/'
Output:
John:boo***
With GNU awk for gensub():
$ awk 'BEGIN{FS=OFS=":"} {print $1, substr($2,1,3) gensub(/./,"*","g",substr($2,4))}' file
John:boo***
With any awk:
awk 'BEGIN{FS=OFS=":"} {tl=substr($2,4); gsub(/./,"*",tl); print $1, substr($2,1,3) tl}' file
John:boo***
Please try the following. It keeps the first 3 characters of the 2nd field as they are and prints one star for each character that follows them.
awk '
BEGIN{
FS=OFS=":"
}
{
stars=""
val=substr($2,1,3)
for(i=4;i<=length($2);i++){
stars=stars"*"
}
$2=val stars
}
1
' Input_file
Output will be as follows.
John:boo***
Explanation: an explanation of the above code.
awk '
BEGIN{ ##Starting BEGIN section from here.
FS=OFS=":" ##Setting FS and OFS value as : here.
} ##Closing block of BEGIN section here.
{ ##Here starts main block of awk program.
stars="" ##Nullifying variable stars here.
val=substr($2,1,3) ##Creating variable val whose value is 1st 3 letters of 2nd field.
for(i=4;i<=length($2);i++){ ##Starting a for loop from 4 (because we need the 4th character through the last one in the 2nd field) up to the length of the 2nd field.
stars=stars"*" ##Keep concatenating stars variable to its own value with *.
}
$2=val stars ##Assigning value of variable val and stars to 2nd field here.
}
1 ##Mentioning 1 here to print edited/non-edited lines for Input_file here.
' Input_file ##Mentioning Input_file name here.
Or even with good old sed
$ echo "John:boofoo" | sed 's/...$/***/'
Output:
John:boo***
(note: this just replaces the last 3 characters of any string with "***", so if you need to key off the ':', see the GNU sed answer from Cyrus.)
Another awk variant:
awk -F ":" '{print $1 FS substr($2, 1, 3) "***"}' <<< 'John:boofoo'
John:boo***
Since we have the tags awk, bash and sed: for completeness' sake, here is a bash-only solution:
INPUT="John:boofoo"
printf "%s:%s\n" "${INPUT%%:*}" "$(TMP1=${INPUT#*:};TMP2=${TMP1:3}; echo "${TMP1:0:3}${TMP2//?/*}")"
It passes two arguments to printf after the format string, both quoted so that the stars are not expanded as a glob. The first one is INPUT stripped of everything from the : onwards. Let's break down the second argument, "$(TMP1=${INPUT#*:};TMP2=${TMP1:3}; echo "${TMP1:0:3}${TMP2//?/*}")":
$(...) the string is interpreted as a bash command and its output is substituted as the last argument to printf
TMP1=${INPUT#*:}; remove everything up to and including the :, store the string in TMP1.
TMP2=${TMP1:3}; get all characters of TMP1 from offset 3 to the end and store them in TMP2.
echo "${TMP1:0:3}${TMP2//?/*}" output the temporary strings: the first three chars from TMP1 unmodified and all chars from TMP2 as *
the output of the last echo is the last argument to printf
Here is the bash -x output:
+ INPUT=John:boofoo
++ TMP1=boofoo
++ TMP2=foo
++ echo 'boo***'
+ printf '%s:%s\n' John 'boo***'
John:boo***
Another sed: replace every character after the third with *
sed -E ':A;s/([^:]*:...)(.*)[^*]([*]*)/\1\2\3*/;tA'
Some more awk
awk 'BEGIN{FS=OFS=":"}{s=sprintf("%0*d",length(substr($2,4)),0); gsub(/0/,"*",s);print $1,substr($2,1,3) s}' infile
This uses the %* form of printf, which takes the field width from an argument. Printing the value 0 right-aligned and zero-padded to that width yields a run of zeros of the required length, which gsub then turns into stars.
More readable:
awk 'BEGIN{
FS=OFS=":"
}
{
s=sprintf("%0*d",length(substr($2,4)),0);
gsub(/0/,"*",s);
print $1,substr($2,1,3) s
}
' infile
Test Results:
$ awk --version
GNU Awk 3.1.7
Copyright (C) 1989, 1991-2009 Free Software Foundation.
$ cat f
John:boofoo
$ awk 'BEGIN{FS=OFS=":"}{s=sprintf("%0*d",length(substr($2,4)),0); gsub(/0/,"*",s);print $1,substr($2,1,3) s}' f
John:boo***
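The same zero-padding trick also works with Bash's builtin printf, which accepts * as a width too. One caveat worth guarding against: with a width of 0, %d still prints a single digit, so a value of three characters or fewer would otherwise gain a spurious star. A minimal sketch, with stars and mask_after_three as hypothetical helper names:

```shell
#!/usr/bin/env bash
# stars (hypothetical helper): print N asterisks, or nothing when N is 0.
stars() {
    local n=$1 s
    (( n > 0 )) || return 0       # %0*d would still print one digit at width 0
    printf -v s '%0*d' "$n" 0     # N zeros, e.g. 000 for N=3
    printf '%s' "${s//0/*}"       # turn every zero into a star
}

# mask_after_three (hypothetical helper): keep the name and the first three
# characters after the first colon, and replace the rest with stars.
mask_after_three() {
    local name=${1%%:*} value=${1#*:}
    local rest=${value:3}
    printf '%s:%s%s\n' "$name" "${value:0:3}" "$(stars "${#rest}")"
}

mask_after_three 'John:boofoo'   # prints: John:boo***
```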
Another pure Bash, using the builtin regular expression predicate.
input="John:boofoo"
if [[ $input =~ ^([^:]*:...)(.*)$ ]]; then
printf '%s%s\n' "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]//?/*}"
else
echo >&2 "String doesn't match pattern"
fi
We split the string in two parts: the first part being everything up to (and including) the three chars found after the first colon (stored in ${BASH_REMATCH[1]}), the second part being the remaining part of string (stored in ${BASH_REMATCH[2]}). If the string doesn't match this pattern, we just insult the user.
We then print the first part unchanged, and the second part with every character replaced with *.

Resources