Replace "\n" with newline in awk - bash

I'm tailing logs and they output \n instead of newlines.
I thought I'd pipe the tail to awk and do a simple replace, however I cannot seem to escape the newline in the regex. Here I'm demonstrating my problem with cat instead of tail:
test.txt:
John\nDoe
Sara\nConnor
cat test.txt | awk -F'\\n' '{ print $1 "\n" $2 }'
Desired output:
John
Doe
Sara
Connor
Actual output:
John\nDoe
Sara\nConnor
So it looks like \\n does not match the \n between the first and last names in test.txt but instead the newline at the end of each line.
So it seems \\n is not the right way of escaping this in the terminal. (This style of escaping works fine in, e.g., Sublime Text.)

How about this?
$ cat file
John\nDoe
Sara\nConnor
$ awk '{gsub(/\\n/,"\n")}1' file
John
Doe
Sara
Connor

Using GNU sed, the solution is pretty simple, as @hek2mgl already answered (and that, IMHO, is the way it should work everywhere, but unfortunately doesn't).
But it's a bit tricky when doing it on Mac OS X and other *BSD UNIXes.
The best way looks like this:
sed 's/\\n/\'$'\n''/g' <<< 'ABC\n123'
Then of course there's still AWK; @AvinashRaj has the correct answer if you'd like to use that.

Why use either awk or sed for this? Use perl!
perl -pe 's/\\n/\n/g' file
By using perl you avoid having to think about posix compliance, and it will typically give better performance, and it will be consistent across all (most) platforms.

This will work with any sed on any system as it is THE portable way to use newlines in sed:
$ sed 's/\\n/\
/' file
John
Doe
Sara
Connor
If it is possible for your input to contain a line like foo\\nbar and the \\ is intended to be an escaped backslash then you cannot use a simple substitution approach like you've asked for.
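For instance (a sketch of the ambiguity), suppose the two backslashes in foo\\nbar are meant as one escaped literal backslash; a plain substitution still fires on the second backslash plus the n:

```shell
# Input characters: f o o \ \ n b a r - the intent is a literal backslash
# followed by "nbar", but sed's \\n pattern still matches the second "\"
# plus the "n" and replaces them with a newline
printf '%s\n' 'foo\\nbar' | sed 's/\\n/\
/'
```

The output is foo\ on one line and bar on the next, which mangles the escaped backslash.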

I have struggled with this problem before, but I discovered the cleanest way is to use the builtin printf:
printf "$(cat file.txt)" | less
Beware that this passes the file contents as the printf format string, so literal % characters in the file will be interpreted as format specifiers.
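A variant using printf's %b format (a sketch) keeps the format string fixed, so % characters in the data stay literal while backslash escapes are still expanded:

```shell
# %b expands backslash escapes in its *arguments*, while the format string
# itself stays fixed, so % in the data is not treated as a specifier:
#   printf '%b\n' "$(cat file.txt)"
# Demonstrated on inline data:
printf '%b\n' 'John\nDoe is 100% sure'
```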
Here is a real-world example dealing with an AWS IAM embedded JSON policy in the output; the file file.txt contains:
{
"registryId": "111122223333",
"repositoryName": "awesome-repo",
"policyText": "{\n \"Version\" : \"2008-10-17\",\n \"Statement\" : [ {\n \"Sid\" : \"AllowPushPull\",\n \"Effect\" : \"Allow\",\n \"Principal\" : {\n \"AWS\" : [ \"arn:aws:iam::444455556666:root\", \"arn:aws:iam::444455556666:user/johndoe\" ]\n },\n \"Action\" : [ \"ecr:BatchCheckLayerAvailability\", \"ecr:BatchGetImage\", \"ecr:CompleteLayerUpload\", \"ecr:DescribeImages\", \"ecr:DescribeRepositories\", \"ecr:GetDownloadUrlForLayer\", \"ecr:InitiateLayerUpload\", \"ecr:PutImage\", \"ecr:UploadLayerPart\" ]\n } ]\n}"
}
after applying the above (without the less) you get:
{
"registryId": "111122223333",
"repositoryName": "awesome-repo",
"policyText": "{
"Version" : "2008-10-17",
"Statement" : [ {
"Sid" : "AllowPushPull",
"Effect" : "Allow",
"Principal" : {
"AWS" : [ "arn:aws:iam::444455556666:root", "arn:aws:iam::444455556666:user/johndoe" ]
},
"Action" : [ "ecr:BatchCheckLayerAvailability", "ecr:BatchGetImage", "ecr:CompleteLayerUpload", "ecr:DescribeImages", "ecr:DescribeRepositories", "ecr:GetDownloadUrlForLayer", "ecr:InitiateLayerUpload", "ecr:PutImage", "ecr:UploadLayerPart" ]
} ]
}"
}
Note that the value for "policyText" is itself a string containing json.
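Since that value is itself JSON in a string, an alternative sketch (assuming jq is installed) is to extract it directly; jq's raw output mode decodes the \n escapes for you:

```shell
# -r (raw output) prints the string's characters directly, so the \n
# escapes inside the JSON string come out as real newlines, e.g.:
#   jq -r '.policyText' file.txt
# Demonstrated on a small inline document:
printf '%s\n' '{"policyText":"line1\nline2"}' | jq -r '.policyText'
```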

I would use sed:
sed 's/\\n/\n/g' file

In addition to the accepted answer: the OP asked about tail, and on some systems (e.g. Ubuntu, whose default awk is mawk) you need to add -W interactive so awk doesn't buffer its output:
tail -f error.log | awk -W interactive '{gsub(/\\n/,"\n")}1'
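With GNU awk, which has no -W interactive switch, flushing after each record achieves the same unbuffered behavior (a sketch; fflush() is widely supported across awk implementations):

```shell
# fflush() forces each record out immediately, which matters when tail -f
# feeds a pipe that would otherwise be block-buffered:
#   tail -f error.log | awk '{gsub(/\\n/,"\n"); print; fflush()}'
# Demonstrated here on fixed input:
printf '%s\n' 'John\nDoe' | awk '{gsub(/\\n/,"\n"); print; fflush()}'
```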

Related

Remove one-character words that don't contain "a, i, and o"

I'm creating a shell script that will convert stdin and output to stdout. Currently I have it converting everything to lowercase. I need to also remove single character words that are not "a", "i", or "o".
Here's what I've tried:
grep -o '[a-z]\{2,\}' | while read WORD
This successfully removes all single letter words.
Here's the desire,
./file.sh < myText.txt
Given that myText.txt has something like,
"Sample text t I o im"
Output:
sample text i o im
If awk, sed or other bash built-ins work better, I'd love to hear it.
Any help is appreciated, just trying to learn.
Here is one solution, if one word per line is OK (as your original attempt produced):
echo "Sample text t I o im" | \
tr '[:upper:]' '[:lower:]' | \
grep -o '\([a-z]\{2,\}\|i\|o\|a\)'
If perl is your option, how about:
perl -lane 'print join(" ", grep {tr /A-Z/a-z/; !/^[^aio]$/} @F)' myText.txt
Output:
sample text i o im
Using only GNU sed, if that's allowed,
sed -E '
s/.*/\L&/
s/(^|\s)[bcdefghjklmnpqrstuvwxyz]($|\s)/ /g
' <<< 'Sample text t I o im'
prints
sample text i o im

Need help consolidating three sed calls into one

I have a variable called TR_VERSION that is a JSON list of version numbers that looks something like this:
[
"1.0.1",
"1.0.2",
"1.0.3"
]
I would like to strip all of the JSON-specific characters: [, ", ,, and ]. The following code works, but it would be great to consolidate it into one sed call instead of three.
TR_VERSION=$(echo $VERSION \
| sed 's|[",]||g' \
| sed 's/\[//' \
| sed 's/\]//')
Thanks for the answers!
Never ever use sed to parse json.
This is the way to go:
$ jq -r '.[]' < file.json
Output as expected
1.0.1
1.0.2
1.0.3
If you just want to remove all ", ,, [ and ] chars you may use
TR_VERSION=$(echo "$VERSION" | sed 's/[][",]//g')
Or,
TR_VERSION=$(sed 's/[][",]//g' <<< "$VERSION")
The [][",] pattern matches ], [, " or , chars.
If you really want to avoid a JSON parser, there is still no need to use sed. You could also do it with
TR_VERSION=$(tr -d '[]",' <<<"$VERSION")
which, IMHO, is slightly more readable than the sed counterpart.

Convert first character to capital along with special character separator

I would like to convert first character to capital and character coming after dash(-) needs to be converted to capital using bash.
I can split the individual elements using -,
echo "string" | tr '[:lower:]' '[:upper:]'
and join them all back, but that doesn't seem efficient. Is there an easy way to take care of this in a single line?
Input string:
JASON-CONRAD-983636
Expected string:
Jason-Conrad-983636
I recommend using Python for this:
python3 -c 'import sys; print("-".join(s.capitalize() for s in sys.stdin.read().split("-")))'
Usage:
capitalize() {
python3 -c 'import sys; print("-".join(s.capitalize() for s in sys.stdin.read().split("-")))'
}
echo JASON-CONRAD-983636 | capitalize
Output:
Jason-Conrad-983636
In pure bash (v4+), without any third-party utils:
str=JASON-CONRAD-983636
IFS=- read -ra raw <<<"$str"
final=()
for str in "${raw[@]}"; do
first=${str:0:1}
rest=${str:1}
final+=( "${first^^}${rest,,}" )
done
and print the result
( IFS=- ; printf '%s\n' "${final[*]}" ; )
This might work for you (GNU sed):
sed 's/.*/\L&/;s/\b./\u&/g' file
Lowercase everything. Uppercase first characters of words.
Alternative:
sed -E 's/\b(.)((\B.)*)/\u\1\L\2/g' file
Could you please try the following (in case you are OK with awk):
var="JASON-CONRAD-983636"
echo "$var" | awk -F'-' '{for(i=1;i<=NF;i++){$i=substr($i,1,1) tolower(substr($i,2))}} 1' OFS="-"
Although the party is mostly over, please let me join with a perl solution:
perl -pe 's/(^|-)([^-]+)/$1 . ucfirst lc $2/ge' <<<"JASON-CONRAD-983636"
It may be cunning to use the ucfirst function :)

Replace a multiline pattern using Perl, sed, awk

I need to concatenate multiple JSON files, so
...
"tag" : "description"
}
]
[
{
"tag" : "description"
...
into this :
...
"tag" : "description"
},
{
"tag" : "description"
...
So I need to replace the pattern ] [ with a comma, but the newline character is driving me crazy...
I used several methods, I list some of them:
sed
sed -i '/]/,/[/{s/./,/g}' file.json
but I get this error:
sed: -e expression #1, char 16: unterminated address regex
I tried to delete all the newlines
following this example
sed -i ':a;N;$!ba;s/\n/ /g' file.json
and the output file contains "^M". Although I modified this file on Unix, I ran the dos2unix command on it but nothing happened. I then tried to include the special character "^M" in the search, but with worse results.
Perl
(as proposed here)
perl -i -0pe 's/]\n[/\n,/' file.json
but I get this error:
Unmatched [ in regex; marked by <-- HERE in m/]\n[ <-- HERE / at -e line 1.
I would like to concatenate several JSON files.
If I understand correctly, you have something like the following (where letters represent valid JSON values):
to_combine/file1.json: [a,b,c]
to_combine/file2.json: [d,e,f]
And from that, you want the following:
combined.json: [a,b,c,d,e,f]
You can use the following to achieve this:
perl -MJSON::XS -0777ne'
push @data, @{ decode_json($_) };
END { print encode_json(\@data); }
' to_combine/*.json >combined.json
As for the problem with your Perl solution:
[ has a special meaning in regex patterns. You need to escape it.
You only perform one replacement.
-0 doesn't actually turn on slurp mode. Use -0777.
You place the comma after the newline, when it would be nicer before the newline.
Fix:
cat to_combine/*.json | perl -0777pe's/\]\n\[/,\n/g' >combined.json
Note that a better way to combine multiple JSON files is to parse them all, combine the parsed data structures, and re-encode the result. Simply changing all occurrences of ][ to a comma , may alter data instead of markup.
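If jq is available, that parse-combine-reencode approach is a one-liner (a sketch, assuming each file holds a JSON array): -s (slurp) reads every input into one array of arrays, and add concatenates them:

```shell
# -s wraps all inputs in one array of arrays; add flattens
# [[a,b],[c,d]] into [a,b,c,d]:
#   jq -s 'add' to_combine/*.json > combined.json
# Demonstrated on two small inline arrays:
dir=$(mktemp -d)
printf '%s' '[1,2]' > "$dir/a.json"
printf '%s' '[3,4]' > "$dir/b.json"
jq -sc 'add' "$dir"/*.json
```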
sed is a minimal program that operates on only a single line of a file at a time. Perl encompasses everything that sed or awk can do and a huge amount more besides, so I suggest you stick with it.
To change all ]...[ pairs in file.json (possibly separated by whitespace) to a single comma, use this
perl -0777 -pe "s/\]\s*\[/,/g" file.json > file2.json
The -0 option specifies the input record separator as an octal character code; giving it the value 777 (not a valid character) makes perl read the entire file at once.
One-liners are famously unintelligible, and I always prefer a proper program file, which would look like this
join_brackets.pl
use strict;
use warnings 'all';
my $data = do {
local $/;
<>;
};
$data =~ s/ \] \s* \[ /,/gx;
print $data;
and you would run it as
perl join_brackets.pl file.json > joined.json
I tried with the example in your question:
$ sed -rn '
1{$!N;$!N}
$!N
/\s*}\s*\n\s*]\s*\n\s*\[\s*\n\s*\{\s*/M {
s//\},\n\{/
$!N;$!N
}
P;D
' file
...
"tag" : "description"
},
{
"tag" : "description"
...
...
"tag" : "description"
},
{
"tag" : "description"
...

Capitalize strings in sed or awk

I have three types of strings that I'd like to capitalize in a bash script. I figured sed/awk would be my best bet, but I'm not sure. What's the best way given the following requirements?
single word
e.g. taco -> Taco
multiple words separated by hyphens
e.g. my-fish-tacos -> My-Fish-Tacos
multiple words separated by underscores
e.g. my_fish_tacos -> My_Fish_Tacos
There's no need to use capture groups (although & is one, in a way):
echo "taco my-fish-tacos my_fish_tacos" | sed 's/[^ _-]*/\u&/g'
The output:
Taco My-Fish-Tacos My_Fish_Tacos
The escape sequence \u (an escaped lowercase "u") capitalizes the next character in the matched substring.
Using awk:
echo 'test' | awk '{
for ( i=1; i <= NF; i++) {
sub(".", substr(toupper($i), 1,1) , $i);
print $i;
# or
# print substr(toupper($i), 1,1) substr($i, 2);
}
}'
Try the following:
sed 's/\([a-z]\)\([a-z]*\)/\U\1\L\2/g'
It works for me using GNU sed, but I don't think BSD sed supports \U and \L.
Here is a solution that does not use \u, which is not supported by all seds.
Save this file as capitalize.sed, then run sed -i -f capitalize.sed FILE:
s:^:.:
h
y/qwertyuiopasdfghjklzxcvbnm/QWERTYUIOPASDFGHJKLZXCVBNM/
G
s:$:\n:
:r
/^.\n.\n/{s:::;p;d}
/^[^[:alpha:]][[:alpha:]]/ {
s:.\(.\)\(.*\):x\2\1:
s:\n\(..\):\nx:
tr
}
/^[[:alpha:]][[:alpha:]]/ {
s:\n.\(.\)\(.*\)$:\nx\2\1:
s:..:x:
tr
}
/^[^\n]/ {
s:^.\(.\)\(.*\)$:.\2\1:
s:\n..:\n.:
tr
}
alinsoar's mind-blowing solution doesn't work at all in Plan9 sed, or correctly in busybox sed. But you should still try to figure out how it's supposed to do its thing: you will learn a lot about sed.
Here's a not-as-clever but easier to understand version which works in at least Plan9, busybox, and GNU sed (and probably BSD and MacOS). Plan9 sed needs backslashes removed in the match part of the s command.
#! /bin/sed -f
y/PYFGCRLAOEUIDHTNSQJKXBMWVZ/pyfgcrlaoeuidhtnsqjkxbmwvz/
s/\(^\|[^A-Za-z]\)a/\1A/g
s/\(^\|[^A-Za-z]\)b/\1B/g
s/\(^\|[^A-Za-z]\)c/\1C/g
s/\(^\|[^A-Za-z]\)d/\1D/g
s/\(^\|[^A-Za-z]\)e/\1E/g
s/\(^\|[^A-Za-z]\)f/\1F/g
s/\(^\|[^A-Za-z]\)g/\1G/g
s/\(^\|[^A-Za-z]\)h/\1H/g
s/\(^\|[^A-Za-z]\)i/\1I/g
s/\(^\|[^A-Za-z]\)j/\1J/g
s/\(^\|[^A-Za-z]\)k/\1K/g
s/\(^\|[^A-Za-z]\)l/\1L/g
s/\(^\|[^A-Za-z]\)m/\1M/g
s/\(^\|[^A-Za-z]\)n/\1N/g
s/\(^\|[^A-Za-z]\)o/\1O/g
s/\(^\|[^A-Za-z]\)p/\1P/g
s/\(^\|[^A-Za-z]\)q/\1Q/g
s/\(^\|[^A-Za-z]\)r/\1R/g
s/\(^\|[^A-Za-z]\)s/\1S/g
s/\(^\|[^A-Za-z]\)t/\1T/g
s/\(^\|[^A-Za-z]\)u/\1U/g
s/\(^\|[^A-Za-z]\)v/\1V/g
s/\(^\|[^A-Za-z]\)w/\1W/g
s/\(^\|[^A-Za-z]\)x/\1X/g
s/\(^\|[^A-Za-z]\)y/\1Y/g
s/\(^\|[^A-Za-z]\)z/\1Z/g
This might work for you (GNU sed):
echo "aaa bbb ccc aaa-bbb-ccc aaa_bbb_ccc aaa-bbb_ccc" | sed 's/\<.\|_./\U&/g'
Aaa Bbb Ccc Aaa-Bbb-Ccc Aaa_Bbb_Ccc Aaa-Bbb_Ccc
