replacing specific characters in a line shell script - bash

I have the following contents in a file
{"Hi","Hello","unix":["five","six"]}
I would like to replace the commas within the square brackets only with semicolons. The rest of the commas in the line should not be changed.
Output should be
{"Hi","Hello","unix":["five";"six"]}
I have tried using sed but it is not working. Below is the command I tried. Kindly help.
sed 's/:\[*\,*\]/;/'
Thanks

If your Input_file is the same as the sample shown, then the following may help you.
sed 's/\([^[]*\)\([^,]*\),\(.*\)/\1\2;\3/g' Input_file
Output will be as follows.
{"Hi","Hello","unix":["five";"six"]}
EDIT: Adding an explanation too. The annotated breakdown below is for explanation purposes only; run the command above as-is to get the output.
sed 's/\([^[]*\)\([^,]*\),\(.*\)/\1\2;\3/g' Input_file
s ##Starts a substitution in sed.
\([^[]*\) ##First capture group: everything from the start of the line up to (not including) the first [; referenced later as \1.
\([^,]*\) ##Second capture group: everything from that [ (where the first group stopped) up to the first , after it; referenced later as \2.
, ##Matches the literal comma that is to be replaced.
\(.*\) ##Third capture group: everything after that comma to the end of the current line; referenced later as \3.
/\1\2;\3/g ##Replacement: the groups are written back in order, with a ; (semicolon) put between \2 and \3 in place of the comma, as the OP requested.
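The command above replaces only the first comma after the first [. If the bracketed list can hold more than two items, a label-and-branch loop is one way to keep substituting until no comma is left inside the brackets. This is a minimal sketch and not part of the answer above: the function name brackets_commas_to_semicolons is made up for illustration, and it assumes the brackets are not nested.

```shell
# Hypothetical helper: turn every comma inside [...] into a semicolon.
# Assumes brackets are not nested and that [ and ] do not occur inside the quoted strings.
brackets_commas_to_semicolons() {
    # :a        label to loop back to
    # s/.../    replace one comma that follows "[" with only non-"]", non-"," text between
    # ta        if a replacement was made, jump back to :a and try again
    sed -e ':a' \
        -e 's/\(\[[^],]*\),/\1;/' \
        -e 'ta' "$@"
}

printf '%s\n' '{"Hi","Hello","unix":["five","six","seven"]}' |
    brackets_commas_to_semicolons
# prints {"Hi","Hello","unix":["five";"six";"seven"]}
```

Splitting the script across -e options keeps the label on its own, which some non-GNU sed implementations require.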

Awk would also be useful here
awk -F'[][]' '{gsub(/,/,";",$2); print $1"["$2"]"$3}' file
By using gsub, you can replace all occurrences of the matched symbol inside a specific field.
Input File
{"Hi","Hello","unix":["five","six"]}
{"Hi","Hello","unix":["five","six","seven","eight"]}
Output
{"Hi","Hello","unix":["five";"six"]}
{"Hi","Hello","unix":["five";"six";"seven";"eight"]}

You should definitely use RavinderSingh13's answer instead of mine (it's less likely to break or exhibit unexpected behavior given very complex input) but here's a less robust answer that's a little easier to explain than his:
sed -r 's/(:\[.*),(.*\])/\1;\2/g' test
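() is a capture group. You can see there are two in the search. In the replacement, they are referred to as \1 and \2. This allows you to put chunks of your search back in the replace expression. -r enables extended regular expressions, which keeps the ( and ) from needing to be escaped with a backslash. [ and ] are special and need to be escaped for literal interpretation. Oh, and you wanted .* not *. On its own, * is a glob wildcard in bash and other shells; in a regex it is a quantifier meaning "zero or more of the preceding item", so it needs something like . in front of it.
edit: and /g allows the replacement to happen more than once per line. Note that because .* is greedy, each match runs from :[ to the last ] and only the last comma inside it is replaced, so a list with three or more items still keeps its earlier commas.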

Related

Using shell scripts to remove all commas except for the first on each line

I have a text file consisting of lines which all begin with a numerical code, followed by one or several words, a comma, and then a list of words separated by commas. I need to delete all commas in every line apart from the first comma. For example:
1.2.3 Example question, a, question, that, is, hopefully, not, too, rudimentary
which should be changed to
1.2.3 Example question, a question that is hopefully not too rudimentary
I have tried using sed and shell scripts to solve this, and I can figure out how to delete the first comma on each line (1) and how to delete all commas (2), but not how to delete only the commas after the first comma on each line
(1)
while read -r line
do
echo "${line/,/}"
done <"filename.txt" > newfile.txt
mv newfile.txt filename.txt
(2)
sed 's/,//g' filename.txt > newfile.txt
You need to capture the first comma, and then remove the others. One option is to change the first comma into some otherwise unused character (Control-A for example), then remove the remaining commas, and finally replace the replacement character with a comma:
sed -e $'s/,/\001/; s/,//g; s/\001/,/'
(using Bash ANSI C quoting — the \001 maps to Control-A).
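For shells without ANSI C quoting, the same placeholder idea can be sketched by building the Control-A character with printf. The function name keep_first_comma is invented here for illustration, and the sketch assumes the input never contains a literal Control-A.

```shell
# Hypothetical helper: delete every comma except the first one on each line.
# Assumes the input never contains a literal Control-A byte.
keep_first_comma() {
    ctrl_a=$(printf '\001')      # placeholder character (Control-A)
    sed -e "s/,/$ctrl_a/" \
        -e 's/,//g' \
        -e "s/$ctrl_a/,/" "$@"
}

printf '%s\n' '1.2.3 Example question, a, question, that, is, hopefully, not, too, rudimentary' |
    keep_first_comma
# prints 1.2.3 Example question, a question that is hopefully not too rudimentary
```

It could be used on a file as keep_first_comma filename.txt > newfile.txt.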
An alternative mechanism uses sed's labels and branches, as illustrated by Wiktor Stribiżew's answer.
If using GNU sed, you can specify a number in the flags of sed's s/// command along with g to indicate which match to start replacing at:
$ sed 's/,//2g' <<<'1.2.3 Example question, a, question, that, is, hopefully, not, too, rudimentary'
1.2.3 Example question, a question that is hopefully not too rudimentary
Its manual says:
Note: the POSIX standard does not specify what should happen when you mix the g and NUMBER modifiers, and currently there is no widely agreed upon meaning across sed implementations. For GNU sed, the interaction is defined to be: ignore matches before the NUMBERth, and then match and replace all matches from the NUMBERth on.
so if you're using a different sed, your mileage may vary. (OpenBSD and NetBSD seds raise an error instead, for example).
You can use
sed ':a; s/^\([^,]*,[^,]*\),/\1/;ta' filename.txt > newfile.txt
Details
:a - sets an a label
s/^\([^,]*,[^,]*\),/\1/ - finds 0+ non-commas at the start of string, a comma and again 0+ non-commas, capturing this substring into Group 1, and then just matching a , and replacing the match with the contents of Group 1 (removes the non-first comma)
ta - upon a successful replacement, jumps back to the a label location.
See an online sed demo:
s='1.2.3 Example question, a, question, that, is, hopefully, not, too, rudimentary'
sed ':a; s/^\([^,]*,[^,]*\),/\1/;ta' <<< "$s"
# => 1.2.3 Example question, a question that is hopefully not too rudimentary
awk 'NF>1 {$1=$1","} 1' FS=, OFS= filename.txt
sed ':a;s/,//2;t a' filename.txt
sed 's/,/\
/;s/,//g;y/\n/,/' filename.txt
This might work for you (GNU sed):
sed 's/,/&\n/;h;s/,//g;H;g;s/\n.*\n//' file
Append a newline to the first comma.
Copy the current line to the hold space.
Remove all commas in the current line.
Append the current line to the hold space.
Replace the current line with the contents of the hold space.
Remove everything between the introduced newlines.

unterminated address regex while using sed

I am trying to use the sed command to find and print the number that appears between "\MP2=" and "\" in a portion of a line that appears like this in a large .log file
\MP2=-193.0977448\
I am using the command below and getting the following error:
sed "/\MP2=/,/\/p" input.log
sed: -e expression #1, char 12: unterminated address regex
Advice on how to alter this would be greatly appreciated!
Superficially, you just need to double up the backslashes (and it's generally best to use single quotes around the sed program):
sed '/\\MP2=/,/\\/p' input.log
Why? The double-backslash is necessary to tell sed to look for one backslash. The shell also interprets backslashes inside double quoted strings, which complicates things (you'd need to write 4 backslashes to ensure sed sees 2 and interprets it as 'look for 1 backslash') — using single quoted strings avoids that problem.
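A quick way to see what sed actually receives is to print the quoted string instead of running it. This is only an illustrative check; the variable names are arbitrary.

```shell
# Capture each address exactly as the shell would hand it to sed.
in_double=$(printf '%s\n' "/\\\\MP2=/")   # four backslashes typed, two arrive
in_single=$(printf '%s\n' '/\\MP2=/')     # two backslashes typed, two arrive
printf '%s\n' "$in_double" "$in_single"
```

Both lines of output should be identical, which is why the single-quoted form is the easier one to read and maintain.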
However, the /pat1/,/pat2/ notation refers to two separate lines. It looks like you really want:
sed -n '/\\MP2=.*\\/p' input.log
The -n suppresses the default printing (probably a good idea on the first alternative too), and the pattern looks for a single line containing \MP2= followed eventually by a backslash.
If you want to print just the number (as the question says), then you need to work a little harder. You need to match everything on the line, but capture just the 'number' and remove everything except the number before printing what's left (which is just the number):
sed -n '/.*\\MP2=\([^\]*\)\\.*/ s//\1/p' input.log
You don't need the double backslash in the [^\] (negated) character class, though it does no harm.
If the starting and ending pattern are on the same line, you need a substitution. The range expression /r1/,/r2/ is true from (an entire) line which matches r1, through to the next entire line which matches r2.
You want this instead:
sed -n 's/.*\\MP2=\([^\\]*\)\\.*/\1/p' file
This extracts just the match, by replacing the entire line with just the match (the escaped parentheses create a group which you can refer back to in the substitution; this is called a back reference. Some sed dialects don't want backslashes before the grouping parentheses.)
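As a small check of the substitution, it can be wrapped in a function and fed one line. The function name extract_mp2 and the text surrounding the \MP2= field in the sample line are made up for illustration; only the \MP2=-193.0977448\ part comes from the question.

```shell
# Hypothetical helper: print the value between "\MP2=" and the next backslash.
extract_mp2() {
    sed -n 's/.*\\MP2=\([^\\]*\)\\.*/\1/p' "$@"
}

# Sample line; everything outside \MP2=...\ is invented.
printf '%s\n' 'HF=-192.5\MP2=-193.0977448\RMSD=1.2e-09' | extract_mp2
# prints -193.0977448
```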
awk is a better tool for this:
awk -F= '$1=="MP2" {print $2}' RS='\' input.log
Set the record separator to \ and the field separator to '=', and it's pretty trivial.

Remove rows with too many delimiters

I have a file with fields separated by the '`' character. But sometimes the actual data also contains this character. How can I remove all the erroneous rows and retain only the good quality data.
A sample row is below. Towards the end, 'fff`ff' is the erroneous column. In such a case the row should be eliminated.
xxx`1000165811`2012`2012_q2`05/09/2012 22:02:00`1343`04/07/2004 00:00:00`05/09/2012 00:00:00````F`1`1.000000`9.620000`1.0000````fff`Not`Free`Free`1.000000`9.620000`0.000000`1.0000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`56565666`255.590000`21`0`0.000000```ddd`dddd`FA May 2012 ddd`0.000000`0.000000`0.000000`0.000000`0.000000`05/30/2012 00:00:00`05/30/2012 00:00:00`1.000000`ddd`ddd`OW`DL`dd dd dd`ddd`dd`dd dd`dd 
dd`0.000000`0.000000``````````0.000000`````````Non_Mobile`9.620000`1.000000`1`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`0.000000`9.620000`9.620000`0.000000`0.000000`0.000000`0.000000`28.590000`6.990000`**fff`ff**`````````9.620000`1.000000`1
You need to know what the correct number of delimiters in a line is. You need to count the actual number of delimiters in each line, and reject those lines where the actual count is not the correct number.
Assuming the correct number of separators is n=5, you could try:
n=5
grep -E '^[^`]*(`[^`]*){'"$n"'}$' data
The regex uses extended regular expressions (-E). The regex matches the start of the line, zero or more non-back-ticks, then a sequence of n occurrences of a back tick followed by zero or more non-back-ticks, followed by the end of line. Because the back-tick is a shell metacharacter, it is best to enclose most of the regular expression in single quotes. The variable $n could be used without the double quotes around it, but it's generally best to enclose variables in double quotes. Clearly, you can also use this version too:
grep -E '^([^`]*`){'"$n"'}[^`]*$' data
Given a data file data:
AA`BB`CC`DD`EE`FF
AABB`CC`DD`EE`FF
A`A`BB`CC`DD`EE`FF
`BB`CC`DD`EE`FF
`BB`CC`DD`EE`
``CC`DD`EE`
``CC``EE`
````EE`
`BB```EE`
`````
``````
````
Welcome`to`the`land`of`insanity
The output of the command is:
AA`BB`CC`DD`EE`FF
`BB`CC`DD`EE`FF
`BB`CC`DD`EE`
``CC`DD`EE`
``CC``EE`
````EE`
`BB```EE`
`````
Welcome`to`the`land`of`insanity
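The same filter can also be phrased as a field count, since a line with n delimiters has n + 1 fields. A sketch with awk, assuming the same kind of data and n=5; the function name keep_rows_with_n_delims is invented here:

```shell
# Hypothetical helper: keep only lines with exactly N back-tick delimiters.
# Usage: keep_rows_with_n_delims N [file...]
keep_rows_with_n_delims() {
    n=$1
    shift
    # With a back-tick as field separator, N delimiters give N + 1 fields.
    awk -F'`' -v n="$n" 'NF == n + 1' "$@"
}

printf '%s\n' 'AA`BB`CC`DD`EE`FF' 'AABB`CC`DD`EE`FF' 'A`A`BB`CC`DD`EE`FF' |
    keep_rows_with_n_delims 5
# prints AA`BB`CC`DD`EE`FF
```

Counting fields avoids building a regex from a variable, at the cost of starting an awk process instead of grep.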
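grep -v '[^`]*`[^`]*`[^`]*`'
The [^`]*` group needs to appear one more time than the number of delimiters a correct line has, so that only lines with too many delimiters match and are dropped by -v. Use single quotes here, because inside double quotes the shell treats back-ticks as command substitution.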
In the spirit of "Be careful what you ask for", here is a "one-liner" (spread over three lines for readability) that will do what was asked, using only awk and assuming that $FILE is the relevant filename.
awk -F'`' -v file="$FILE" '
BEGIN{ while((getline<file)>0){if (min==""||NF<min){min=NF}}}
NF==min' "$FILE"
This incantation first determines the minimum number of delimiters per line (without sorting the file), and then rejects all lines with more than that many.
(This is similar to Ed Morton's proposal, but without the bug :-)
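A common alternative to the getline loop is to name the file twice on the command line and use NR==FNR to tell the first pass from the second. This is a sketch under the same assumption as above (the rows with the fewest fields are the good ones); keep_min_field_rows is a made-up name.

```shell
# Hypothetical helper: print only the rows that have the smallest field count.
# $1: path to the data file (it is read twice, so it cannot be a pipe).
keep_min_field_rows() {
    awk -F'`' '
        # Pass 1 (NR == FNR only while reading the first copy): find the minimum.
        NR == FNR { if (min == "" || NF < min) min = NF; next }
        # Pass 2: print rows whose field count equals that minimum.
        NF == min
    ' "$1" "$1"
}
```

As with the getline version, an empty line in the file would set the minimum to zero fields, so this only makes sense on data without blank lines.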

use sed to merge lines and add comma

I found several related questions, but none of them fits what I need, and since I am a real beginner, I can't figure it out.
I have a text file with entries like this, separated by a blank line:
example entry &with/ special characters
next line (any characters)

next %*entry
more words
I would like the output to merge the lines, put a comma between them, and delete the empty lines. I.e., the example should look like this:
example entry &with/ special characters, next line (any characters)
next %*entry, more words
I would prefer sed, because I know it a little bit, but am also happy about any other solution on the linux command line.
Improved per Kent's elegant suggestion:
awk 'BEGIN{RS="";FS="\n";OFS=","}{$1=$1}7' file
which allows any number of lines per block, rather than the 2 rigid lines per block I had. Thank you, Kent. Note: The 7 is Kent's trademark... any non-zero expression will cause awk to print the entire record, and he likes 7.
You can do this with awk:
awk 'BEGIN{RS="";FS="\n";OFS=","}{print $1,$2}' file
That sets the record separator to blank lines, the field separator to newlines and the output field separator to a comma.
Output:
example entry &with/ special characters,next line (any characters)
next %*entry,more words
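The question's expected output has a space after each comma. Assuming that matters, setting OFS to a comma plus a space should produce it. A sketch using the same paragraph-mode settings, with an illustrative function name:

```shell
# Hypothetical helper: join each blank-line-separated block into one line.
join_blocks() {
    # RS=""  : records are blocks separated by blank lines
    # FS="\n": each line of a block is one field
    # $1=$1  : forces awk to rebuild the record using OFS
    awk 'BEGIN { RS = ""; FS = "\n"; OFS = ", " } { $1 = $1 } 1' "$@"
}

printf '%s\n' 'first line' 'second line' '' 'third' 'fourth' 'fifth' | join_blocks
# prints:
# first line, second line
# third, fourth, fifth
```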
Simple sed command,
sed ':a;N;$!ba;s/\n/, /g;s/, , /\n/g' file
:a;N;$!ba;s/\n/, /g -> According to this answer, this code replaces all the new lines with ,(comma and space).
So After running only the first command, the output would be
example entry &with/ special characters, next line (any characters), , next %*entry, more words
s/, , /\n/g -> Replacing , , with a new line in the above output will give you the desired result.
example entry &with/ special characters, next line (any characters)
next %*entry, more words
This might work for you (GNU sed):
sed ':a;$!N;/.\n./s/\n/, /;ta;/^[^\n]/P;D' file
Append the next line to the current line and if there are characters either side of the newline substitute the newline with a comma and a space and then repeat. Eventually an empty line or the end-of-file will be reached, then only print the next line if it is not empty.
Another version, a little more sophisticated (allowing for white space in the empty line), would be:
sed ':a;$!N;/^\s*$/M!s/\n/, /;ta;/\`\s*$/M!P;D' file
sed -n '1h;1!H
$ {x
s/\([^[:cntrl:]]\)\n\([^[:cntrl:]]\)/\1, \2/g
s/\(\n\)\n\{1,\}/\1/g
p
}' YourFile
This loads the whole file into the hold space and makes all the changes at the end. It could also be done "on the fly" while reading the file, depending on whether each line is empty or not.
use -e on GNU sed

Ignoring lines with blank or space after character using sed

I am trying to use sed to extract some assignments being made in a text file. My text file looks like ...
color1=blue
color2=orange
name1.first=Ahmed
name2.first=Sam
name3.first=
name4.first=
name5.first=
name6.first=
Currently, I am using sed to print all the strings after the name#.first's ...
sed 's/name.*.first=//' file
But of course, this also prints all of the lines with no assignment ...
Ahmed
Sam
# I'm just putting this comment here to illustrate the extra carriage returns above; please ignore it
Is there any way I can get sed to ignore the lines with blank or whitespace only assignments and store this to an array? The number of assigned name#.first's is not known, nor are the number of assignments of each type in general.
This is a slight variation on sputnick's answer:
sed -n '/^name[0-9]\.first=\(.\+\)/ s//\1/p'
The first part (/^name[0-9]\.first=\(.\+\)/) selects the lines you want to pass to the s/// command. The empty pattern in the s command re-uses the previous regular expression and the replacement portion (\1) replaces the entire match with the contents of the first parenthesized part of the regex. Use the -n and p flags to control which lines are printed.
sed -n 's/^name[0-9]\.\w\+=\(\w\+\)/\1/p' file
Output
Ahmed
Sam
Explanations
the -n switch suppresses the default behavior of sed: printing all lines
s/// is the skeleton for a substitution
^ matches the beginning of a line
name is a literal string
[0-9] is a single digit
\.\w\+ is a literal dot (without the backslash, . means any character) followed by at least one (\+) word character [a-zA-Z0-9_]
\( \) is a capturing group and \1 is the captured group
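The question also asks to skip whitespace-only values and to store the results in an array. A portable sketch of the extraction step, assuming a value counts only if it contains at least one non-whitespace character; extract_first_names is a hypothetical name:

```shell
# Hypothetical helper: print the value of every name<N>.first= line
# whose value contains at least one non-whitespace character.
extract_first_names() {
    sed -n 's/^name[0-9][0-9]*\.first=\(.*[^[:space:]].*\)$/\1/p' "$@"
}

printf '%s\n' 'color1=blue' 'name1.first=Ahmed' 'name2.first=Sam' 'name3.first=' |
    extract_first_names
# prints:
# Ahmed
# Sam
```

In bash 4 or later, something like mapfile -t names < <(extract_first_names file) should then collect the values into an array called names.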

Resources