Awk to add [ ] around special characters - bash

I am trying to add [] around a column value if it contains special characters or numbers, not counting the comma at the end. The first line needs to stay as it is in the file.
Current:
CREATE TEST
a,
b,
23_test,
Expectation:
CREATE TEST
a,
b,
[23_test],

Assuming that the special characters are digits, whitespace, minus signs, plus signs, dots, and underscores (please adjust the pattern to your definition), how about the following, which uses GNU awk's gensub():
awk 'NR>1 && /[-0-9_+. ]/ {$0 = "[" gensub(",$", "", 1) "],"} {print}' input.txt
If you can be specific that the special characters are any characters other than letters and commas, try instead:
awk 'NR>1 && /[^a-zA-Z,]/ {$0 = "[" gensub(",$", "", 1) "],"} {print}' input.txt
Hope this helps.

A follow-up variant pipes through sed to tidy a stray ', ]' produced by lines with trailing spaces:
awk 'NR>1 && /[-0-9_+. ]/ {$0 = "[" gensub(",$", "", 1) "],"} {print}' <filename.out> | sed 's/, ]/]/'

awk '{sub(/23_test/,"[23_test]")}1' file
CREATE TEST
a,
b,
[23_test],
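The sub() above is hardcoded to the sample value; a portable generalization with no gensub() (which is GNU awk only), assuming "special" still means anything other than letters and the trailing comma, could be:
awk 'NR>1 && /[^a-zA-Z,]/ {sub(/,$/, ""); $0 = "[" $0 "],"} {print}' input.txt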

Related

AWK print if found three matches, one false

There are several lines in a file that look like:
A B C H
A B C D
and I want to print all lines that match this RE:
/A\tB/
But if the line contains an H in the fourth field, do not print it; the output would be:
A B C D
Could it be written in one line in sed, awk or grep?
The only thing that I know is:
awk '/^A\tB/'
This will work:
awk '$1$2 == "AB" && $4 != "H"' file
If all entries are single characters this will also work:
awk '$1$2$3$4 ~ /^AB.[^H]/' file
With awk one-liner:
awk -F'\t' '$1=="A" && $2=="B" && $4!="H"' file
-F'\t' - the tab character \t is treated as the field separator
The output:
A B C D
This might work for you (GNU sed):
sed '/^A\tB\t.\t[^H]/!d' file
If a line does not contain A, B, any character, and a character other than H, separated by tabs, delete it.
Could be written:
sed -n '/^A\tB\t.\t[^H]/p' file
Use this.
awk '/^A\tB/ { if ( $4 != "H" ) print }'
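For completeness, since the question also asks about grep: with bash's $'...' quoting to embed literal tabs, the same test might be written as:
grep $'^A\tB\t.\t[^H]' file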

How to replace the empty place with next line content in shell script

1,n1,abcd,1234
2,n2,abrt,5666
,h2,yyyy,123x
3,h2,yyyy,123y
3,h2,yyyy,1234
,k1,yyyy,5234
4,22,yyyy,5234
The above is my input file abc.txt. I want every missing first-column value to be filled with the first-column value of the next row.
example:
3,h2,yyyy,123x
3,h2,yyyy,123y
I want output like below,
1,n1,abcd,1234
2,n2,abrt,5666
3,h2,yyyy,123x   // the missing first-column value 3 is filled from the next row's first value
3,h2,yyyy,123y
3,h2,yyyy,1234
4,k1,yyyy,5234
4,22,yyyy,5234
How can I implement this with AWK or some other alternative in shell script? Please help.
Using awk you can do:
awk -F, '$1 ~ /^ *$/ {          # first field empty (or blank): buffer the line
    p = p RS $0
    next
}
p != "" {                       # first complete line after the buffered ones
    gsub(RS " *", RS $1, p)     # prepend this line's first field to each buffered line
    sub("^" RS, "", p)          # drop the leading record separator
    print p
    p = ""
} 1' file
1,n1,abcd,1234
2,n2,abrt,5666
3,h2,yyyy,123x
3,h2,yyyy,123y
3,h2,yyyy,1234
4,k1,yyyy,5234
4,22,yyyy,5234
I would reverse the file, and then replace the value from the previous line:
tac filename | awk -F, -v OFS=, '$1 ~ /^[[:blank:]]*$/ {$1 = prev} {print; prev=$1}' | tac
This will also fill in missing values on multiple lines.
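tac is a GNU coreutils tool; on BSD/macOS, tail -r should work as a stand-in for the same idea:
tail -r filename | awk -F, -v OFS=, '$1 ~ /^[[:blank:]]*$/ {$1 = prev} {print; prev=$1}' | tail -r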
With GNU sed:
$ sed '/^ *,/{N;s/^ *\(.*\n\)\([^,]*\)\(.*\)/\2\1\2\3/}' infile
1,n1,abcd,1234
2,n2,abrt,5666
3,h2,yyyy,123x
3,h2,yyyy,123y
3,h2,yyyy,1234
4,k1,yyyy,5234
4,22,yyyy,5234
The sed command does the following:
/^ *,/ { # If the line begins with an empty (possibly blank) first field
N        # Append the next line
# Extract the value before the comma on the appended line, prepend it to both lines
s/^ *\(.*\n\)\([^,]*\)\(.*\)/\2\1\2\3/
}
BSD sed would require an extra semicolon before the closing brace.
This only works when the lines with missing values are not contiguous.

Use sed to replace percent signs with brackets

I need to replace text surrounded by percent signs with brackets, eg:
This %is% a %test%
should become
This {is} a {test}
I tried: sed 's/\%([^]]*)\%/{\1}/g'
But that resulted in:
This {is% a %test}
Try this:
$ echo "This %is% a %test%" | sed -e 's/%\([^%]*\)%/{\1}/g'
This {is} a {test}
you need to escape the groups: \(...\) (otherwise you get invalid reference \1 on 's' command's RHS)
use [^%]* to match anything but %
you don't need to escape % (but it works with \% as well).
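With extended regexes the grouping backslashes go away entirely; a sketch of the same substitution using sed -E (supported by both GNU and BSD sed):
$ echo "This %is% a %test%" | sed -E 's/%([^%]*)%/{\1}/g'
This {is} a {test}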
I would suggest using awk instead:
s='This %is% a %test%'
awk -F'%' '{p=""; for (i=1; i<NF; i++) p = p $i (i%2 ? "{" : "}"); print p $NF}' <<< "$s"
This {is} a {test}

Modify content inside quotation marks, BASH

Good day to all,
I was wondering how to modify the content inside quotation marks and leave the outside unmodified.
Input line:
,,,"Investigacion,,, desarrollo",,,
Output line:
,,,"Investigacion, desarrollo",,,
Initial try:
sed 's/\"",,,""*/,/g'
But nothing happens. Thanks in advance for any clue.
The idiomatic awk way to do this is simply:
$ awk 'BEGIN{FS=OFS="\""} {sub(/,+/,",",$2)} 1' file
,,,"Investigacion, desarrollo",,,
or if you can have more than one set of quoted strings on each line:
$ cat file
,,,"Investigacion,,, desarrollo",,,"foo,,,,bar",,,
$ awk 'BEGIN{FS=OFS="\""} {for (i=2;i<=NF;i+=2) sub(/,+/,",",$i)} 1' file
,,,"Investigacion, desarrollo",,,"foo,bar",,,
This approach works because everything up to the first " is field 1, everything from there to the second " is field 2, and so on, so everything between "s lands in the even-numbered fields. It can only fail if you have newlines or escaped double quotes inside your fields, but that would affect every other possible solution too, so you'd need to add cases like that to your sample input if you want a solution that handles them.
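To see the field numbering this relies on, here is a quick sketch against the sample line:
$ echo ',,,"Investigacion,,, desarrollo",,,' | awk 'BEGIN{FS="\""} {for (i=1; i<=NF; i++) print i ": " $i}'
1: ,,,
2: Investigacion,,, desarrollo
3: ,,,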
Using a language that has built-in CSV parsing capabilities like perl will help.
perl -MText::ParseWords -ne '
print join ",", map { $_ =~ s/,,,/,/; $_ } parse_line(",", 1, $_)
' file
,,,"Investigacion, desarrollo",,,
Text::ParseWords is a core module so you don't need to download it from CPAN. Using the parse_line function we set the delimiter and a flag to keep the quotes. Then just do a simple substitution and join the line to make your CSV again.
Using egrep, sed and tr:
s=',,,"Investigacion,,, desarrollo",,,'
r=$(egrep -o '"[^"]*"|,' <<< "$s"|sed '/^"/s/,\{2,\}/,/g'|tr -d "\n")
echo "$r"
,,,"Investigacion, desarrollo",,,
Using awk:
awk '{
    p = ""
    while (match($0, /"[^"]*,{2,}[^"]*"/)) {   # a quoted run containing 2+ commas
        t = substr($0, RSTART, RLENGTH)
        gsub(/,+/, ",", t)                      # squeeze the comma runs inside it
        p = p substr($0, 1, RSTART - 1) t
        $0 = substr($0, RSTART + RLENGTH)
    }
    $0 = p $0
} 1'
Test:
$ echo ',,,"Investigacion,,, desarrollo",,,' | awk ...
,,,"Investigacion, desarrollo",,,
$ echo ',,,"Investigacion,,, desarrollo",,,",,, "' | awk ...
,,,"Investigacion, desarrollo",,,", "

Awk consider double quoted string as one token and ignore space in between

Data file - data.txt:
ABC "I am ABC" 35 DESC
DEF "I am not ABC" 42 DESC
cat data.txt | awk '{print $2}'
will print "I" instead of the whole quoted string.
How can I make awk ignore the spaces within the quotes and treat the quoted string as one single token?
Another alternative would be to use GNU awk's FPAT variable, which defines a regular expression describing the contents of each field.
Save this AWK script as parse.awk:
#!/bin/awk -f
BEGIN {
FPAT = "([^ ]+)|(\"[^\"]+\")"
}
{
print $2
}
Make it executable with chmod +x ./parse.awk and parse your data file as ./parse.awk data.txt:
"I am ABC"
"I am not ABC"
Yes, this can be done nicely in awk. It's easy to get all the fields without any serious hacks.
(This example works in both The One True Awk and in gawk.)
{
    split($0, a, "\"")   # split the whole line on double quotes
    $2 = a[2]            # the quoted part is the second chunk
    $3 = $(NF - 1)       # assigning a field does not re-split, so $NF is still valid
    $4 = $NF
    print "and the fields are ", $1, "+", $2, "+", $3, "+", $4
}
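Run against data.txt above, this should print something like:
and the fields are  ABC + I am ABC + 35 + DESC
and the fields are  DEF + I am not ABC + 42 + DESC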
Try this:
$ cat data.txt | awk -F\" '{print $2}'
I am ABC
I am not ABC
The top answer for this question only works for lines with a single quoted field. When I found this question I needed something that could work for an arbitrary number of quoted fields.
Eventually I came upon an answer by Wintermute in another thread, and he provided a good generalized solution to this problem. I've just modified it to remove the quotes. Note that you need to invoke awk with -F\" when running the below program.
BEGIN { OFS = "" }
{
    for (i = 1; i <= NF; i += 2) {
        gsub(/[ \t]+/, ",", $i)
    }
    print
}
This works by observing that, when you split on the " character, every other field will be the part inside the quotes, and so it replaces the whitespace dividing the fields not in quotes with a comma.
You can then easily chain another instance of awk to do whatever processing you need (just use the field separator switch again, -F,).
Note that this might break if the first field is quoted - I haven't tested it. If it does, though, it should be easy to fix by adding an if statement to start at 2 rather than 1 if the first character of the line is a ".
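For instance, a sketch assuming the program above is saved as dequote.awk (a hypothetical name):
$ awk -F'"' -f dequote.awk data.txt
ABC,I am ABC,35,DESC
DEF,I am not ABC,42,DESC
$ awk -F'"' -f dequote.awk data.txt | awk -F, '{print $2}'
I am ABC
I am not ABC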
I've scrunched up together a function that re-splits $0 into an array called B. Spaces between double quotes are not acting as field separators. Works with any number of fields, a mix of quoted and unquoted ones. Here goes:
#!/usr/bin/gawk -f
# Resplit $0 into array B. Spaces between double quotes are not separators.
# Single quotes not handled. No escaping of double quotes.
function resplit(    a, l, i, j, b, k, BNF)   # all are local variables
{
    l = split($0, a, "\"")
    BNF = 0
    delete B
    for (i = 1; i <= l; ++i) {
        if (i % 2) {
            k = split(a[i], b)
            for (j = 1; j <= k; ++j)
                B[++BNF] = b[j]
        } else {
            B[++BNF] = "\"" a[i] "\""
        }
    }
}

{
    resplit()
    for (i = 1; i <= length(B); ++i)
        print i ": " B[i]
}
Hope it helps.
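Run against data.txt from the question, it should print something like:
1: ABC
2: "I am ABC"
3: 35
4: DESC
1: DEF
2: "I am not ABC"
3: 42
4: DESC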
Okay, if you really want all three fields, you can get them, but it takes a lot of piping:
$ cat data.txt | awk -F\" '{print $1 "," $2 "," $3}' | awk -F' ,' '{print $1 "," $2}' | awk -F', ' '{print $1 "," $2}' | awk -F, '{print $1 "," $2 "," $3}'
ABC,I am ABC,35
DEF,I am not ABC,42
By the last pipe you've got all three fields to do whatever you'd like with.
Here is something like what I finally got working that is more generic for my project.
Note it doesn't use awk.
someText="ABC \"I am ABC\" 35 DESC '1 23' testing 456"

putItemsInLines() {
    local items=""
    local firstItem="true"
    while test $# -gt 0; do
        if [ "$firstItem" == "true" ]; then
            items="$1"
            firstItem="false"
        else
            items="$items
$1"
        fi
        shift
    done
    echo "$items"
}

count=0
while read -r valueLine; do
    echo "$count: $valueLine"
    count=$(( $count + 1 ))
done <<< "$(eval putItemsInLines $someText)"
Which outputs:
0: ABC
1: I am ABC
2: 35
3: DESC
4: 1 23
5: testing
6: 456
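Be aware that eval will word-split, glob, and even execute anything embedded in $someText, so this is only safe with trusted input. If you control where the data originates, a bash array avoids eval entirely (a sketch, not part of the original approach):
items=(ABC "I am ABC" 35 DESC '1 23' testing 456)
count=0
for item in "${items[@]}"; do
    echo "$count: $item"
    count=$(( count + 1 ))
done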
