Unix shell script to remove new lines preceded with specific characters - shell

Thanks in advance for your help.
I need to replace newlines (\n) with a space in a Unix file when they are not preceded by ';'.
For example, given a file containing:
TestFields;TestFields2
;TestFields3;TestFields4
The output should be:
TestFields;TestFields2 ;TestFields3;TestFields4
So I am using a sed command like this:
sed ':a;N;$!ba;s/[^;]\n/ /g'
The problem is that this command also replaces the character before the \n, so my output is:
TestFields;TestFields ;TestFields3;TestFields4
I lose the '2' in 'TestFields2'.
Does anyone have an idea how to keep that character but still replace the \n?

Capture the matched character and use it in the replacement:
$ sed -r ':a;N;$!ba;s/([^;])\n/\1 /g' file
TestFields;TestFields2 ;TestFields3;TestFields4
The g flag is needed here: the whole file is in the pattern space, so without it only the first matching newline would be replaced.

This might work for you (GNU sed):
sed ':a;N;/;\n/!s/\n/ /;ta;P;D' file
This avoids slurping the whole file into memory and follows the question as worded: if the character preceding the newline is a ;, do nothing; otherwise replace the newline with a space.
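For comparison, the same rule can be written as a streaming awk one-liner that should not depend on GNU extensions. This is a sketch rather than one of the posted answers; the names input, prev and joined are illustrative, and the extra sample lines are made up to exercise the ';' case.

```shell
# Sample input: the third line ends in ';', so the newline after it must stay.
input=$(printf '%s\n' 'TestFields;TestFields2' ';TestFields3;TestFields4' 'A;B;' 'C;D')

# Before each line except the first, emit the separator owed by the previous
# line: a newline if that line ended in ';', otherwise a space.
joined=$(printf '%s\n' "$input" | awk '
  NR > 1 { printf "%s", (prev ~ /;$/ ? "\n" : " ") }
         { printf "%s", $0; prev = $0 }
  END    { if (NR) print "" }
')

printf '%s\n' "$joined"
```

Like the P;D version, this holds only one line at a time, so it should cope with large files.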

Related

Using bash, how can I replace all text between two patterns containing forward slashes?

I'm trying to write a script that replaces text in a text file between two patterns "opt/" and "/".
An example of the data in the file is:
daw9udiwa9diuoawdj098awd89a0909w opt/TEXTTOREPLACE/app-data/version.txt
wdalkwhjf8aufwaoif98fawfojaw98f8 opt/TEXTTOREPLACE/app-data/package.txt
awdhaw9d8yawdf8uaw9f8uwafhiuhawf opt/TEXTTOREPLACE/bin/somefile/somefile
wdalkwhjf8aufwaoif98fawfojaw98f8 opt/TEXTTOREPLACE/bin/someapp/somefile
I've looked at using the 'sed' command, but the pattern matching is confusing me.
I have tried:
sed -e 's!\/[^\/]*\/!\/CHANGE TO ME\/!'
This works - but I would like to add in the "opt" at the beginning to minimise errors
So I tried the following with no luck
sed -e 's!opt\/[^opt\/]*\/!opt\/CHANGE TO ME\/!'
I will be using a variable for the replacement text, for example:
VAR=CHANGED
sed -e 's!opt\/[^opt\/]*\/!opt\/$VAR\/!'
Desired output:
daw9udiwa9diuoawdj098awd89a0909w opt/CHANGED/app-data/version.txt
wdalkwhjf8aufwaoif98fawfojaw98f8 opt/CHANGED/app-data/package.txt
awdhaw9d8yawdf8uaw9f8uwafhiuhawf opt/CHANGED/bin/somefile/somefile
wdalkwhjf8aufwaoif98fawfojaw98f8 opt/CHANGED/bin/someapp/somefile
help appreciated.
Thanks
A few issues with the current sed code:
the bracket expression [^opt\/] does not mean "anything other than the string opt/"; it matches any single character that is not o, p, t, \ or /
to expand shell variables, the sed script must be wrapped in double quotes
the use of ! as a delimiter may cause issues with some shells and/or shell configurations (e.g. ! is a common shorthand for accessing the command line history)
escaping the / is only needed when the / also serves as the sed script delimiter (i.e. no need to escape / if using a different delimiter)
One sed idea that addresses these issues:
sed -e "s|opt/[^/]*|opt/$VAR|" input.txt
Where:
opt/[^/]* - match on the string opt/ plus all following characters that are not a /
opt/$VAR - replace with the string opt/ plus the contents of the shell variable VAR
This generates:
daw9udiwa9diuoawdj098awd89a0909w opt/CHANGED/app-data/version.txt
wdalkwhjf8aufwaoif98fawfojaw98f8 opt/CHANGED/app-data/package.txt
awdhaw9d8yawdf8uaw9f8uwafhiuhawf opt/CHANGED/bin/somefile/somefile
wdalkwhjf8aufwaoif98fawfojaw98f8 opt/CHANGED/bin/someapp/somefile
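One caveat with expanding $VAR inside the script: if the value can contain &, a backslash, or the | delimiter, sed will interpret those characters instead of inserting them. A minimal sketch of escaping the value first; the sample value and the names esc, line and result are made up for illustration, and values containing newlines are not handled.

```shell
VAR='A&B|C'   # hypothetical value containing characters special to sed

# Prefix each &, | and backslash with a backslash so sed treats it literally.
esc=$(printf '%s' "$VAR" | sed 's/[&|\]/\\&/g')

line='daw9udiwa9diuoawdj098awd89a0909w opt/TEXTTOREPLACE/app-data/version.txt'
result=$(printf '%s\n' "$line" | sed "s|opt/[^/]*|opt/$esc|")

printf '%s\n' "$result"
```

If the replacement text is known to be plain alphanumerics, as in the question, this extra step is not needed.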
If you are open to using awk rather than sed, the following may work for you:
$ awk -v rep="CHANGED" -F/ 'BEGIN{OFS="/"} {$2=rep; print}' file1
daw9udiwa9diuoawdj098awd89a0909w opt/CHANGED/app-data/version.txt
wdalkwhjf8aufwaoif98fawfojaw98f8 opt/CHANGED/app-data/package.txt
awdhaw9d8yawdf8uaw9f8uwafhiuhawf opt/CHANGED/bin/somefile/somefile
wdalkwhjf8aufwaoif98fawfojaw98f8 opt/CHANGED/bin/someapp/somefile
Split each line on the forward slash character and replace the second field with your desired replacement text. Then format the output with the output field separator (OFS) set as a forward slash.

Replace special character with sed

I'm trying to replace a special character with sed: the character Þ should be replaced with ;
The lines of the file look like this, for example:
0370ÞA020Þ4000011600ÞRED USADOÞ0,00Þ20190414
0370ÞA020Þ4000011601ÞRED USADOÞ0,00Þ20190414
0370ÞA020Þ4000011602ÞRED USADOÞ0,00Þ20190414
Thanks!
Edit
It worked; solved.
Thanks!!!
Try this; a simple substitution works for me:
sed 's/Þ/;/g'
That's the job tr was created to do but look at these results:
$ tr 'Þ' ';' < file
0370;;A020;;4000011600;;RED USADO;;0,00;;20190414
0370;;A020;;4000011601;;RED USADO;;0,00;;20190414
0370;;A020;;4000011602;;RED USADO;;0,00;;20190414
$ sed 's/Þ/;/g' < file
0370;A020;4000011600;RED USADO;0,00;20190414
0370;A020;4000011601;RED USADO;0,00;20190414
0370;A020;4000011602;RED USADO;0,00;20190414
tr appears to treat every Þ as two separate characters. That is consistent with Þ being two bytes in UTF-8 and GNU tr operating on single bytes. sed may see it the same way, but while tr maps a set of characters to a set of characters, sed replaces a regexp match with a string, so even if it considers Þ to be two characters wide it will still do what you want. So this is a warning about using tr to replace non-ASCII characters. YMMV!
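To check how many bytes Þ occupies and confirm that sed replaces each occurrence with a single ;, something like the following can be used. It is only a sketch: the octal escapes spell out the UTF-8 bytes of Þ so the example does not depend on the script's own encoding, and the names thorn, bytes, line and converted are illustrative.

```shell
# Build Þ from its UTF-8 bytes (0xC3 0x9E).
thorn=$(printf '\303\236')

# Count the bytes in the character.
bytes=$(printf '%s' "$thorn" | wc -c | tr -d ' ')
printf 'bytes: %s\n' "$bytes"

# sed matches the whole byte sequence, so each Þ becomes a single ';'.
line="0370${thorn}A020${thorn}4000011600"
converted=$(printf '%s\n' "$line" | sed "s/$thorn/;/g")
printf '%s\n' "$converted"
```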
If your data is in a file named d, try GNU sed:
sed -E 'y/Þ/;/' d

Remove final comma or final character in a txt file using sed

I have a comma delimited file like so
info,someinfo,moreinfo,123,
I want to remove the final comma in each line.
I have tried many variations of:
sed -i 's/.$//' filename
The above command seems to be the prevailing opinion of the internet, but it is not changing my file in any way.
If every line has , as its last character, you can remove it like this (bash):
str="info,someinfo,moreinfo,123,"
echo "${str::-1}"
Output:
info,someinfo,moreinfo,123
Hope this helps you
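Note that ${str::-1} is bash-specific and drops the last character whatever it is. A sketch of an alternative using the POSIX % suffix-removal expansion, which strips the character only when it is a comma; the names trimmed, plain and unchanged are illustrative.

```shell
str="info,someinfo,moreinfo,123,"
trimmed=${str%,}        # strip one trailing comma, if present
printf '%s\n' "$trimmed"

plain="no,trailing,comma"
unchanged=${plain%,}    # no trailing comma, so the value is left as-is
printf '%s\n' "$unchanged"
```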
You may have trailing spaces after the last comma. You could use this command instead so that it handles the optional spaces well:
sed -i -E 's/,[[:space:]]*$//' file
I am not sure if your intention is to remove the last character, regardless of whether it is a comma or not.
If it is a DOS \r\n line terminated file, remove the \r also:
$ cat file
info,someinfo,moreinfo,123,
$ unix2dos file
unix2dos: converting file file to DOS format...
$ sed 's/.\r$//' file
info,someinfo,moreinfo,123
In awk, redefine the record separator RS so that it optionally accepts the \r (a regex RS is an extension supported by GNU awk, among others). The \r is then dropped from the output:
$ awk 'BEGIN{RS="\r?\n"}{sub(/.$/,"")}1' file
info,someinfo,moreinfo,123
If you are using GNU awk, you can add the \r (conditionally) to the output with ORS=RT; before the print, ie:
$ awk '
BEGIN { RS="\r?\n" } # set the record separator to accept \r if needed
{
ORS=RT # set the used separator as the output separator
sub(/.$/,"") # remove last char
}1' file # output
info,someinfo,moreinfo,123
Can you try this one:
sed 's/\(.\)$//' file_name > filename_output
Just redirecting the output to a new file filename_output.
The command removes the last character regardless of what it is. If you want to remove the last character only if it is a comma, try
sed -i 's/,$//' filename
This will leave the other lines alone, as the regex doesn't match.
(The -i option is not properly portable; on BSD, and thus also macOS, you will need -i ''.)
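If the script has to run on both GNU and BSD systems, one way to sidestep -i altogether is to write to a temporary file and move it into place. A sketch, with the input file created by mktemp purely for the demonstration (mktemp itself is widely available but not specified by POSIX):

```shell
file=$(mktemp)
printf '%s\n' 'info,someinfo,moreinfo,123,' 'no,comma,here' > "$file"

tmp=$(mktemp)
# Remove a trailing comma where there is one; replace the original only on success.
sed 's/,$//' "$file" > "$tmp" && mv "$tmp" "$file"

result=$(cat "$file")
printf '%s\n' "$result"
rm -f "$file"
```

Be aware that after the mv the file carries the temporary file's permissions, so restore them if that matters.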

Remove prefix of each line in a file and output to another file using sed

I have a source code file in which comments are prefixed with "// " (i.e. two slashes and a space). I want to convert the source code into a document, so I tried to cat file.c and pipe it to sed. The idea is to replace the double slash and space with an empty string when a line starts with it, but it looks like the slash has some special meaning in sed. What's the best way of constructing the sed arguments?
Thanks!
If you want to remove the special meaning of / in sed, the following may help:
sed 's/^\/\/ //g' Input_file
Here I am escaping each / with a preceding \, so it is taken as a literal character rather than as the delimiter. If you are happy with the command's result, add -i to save the changes in Input_file itself. Hope this helps.
The slash is only special when it is the delimiter, so pick a different one. Note that + needs -E (or \+ in GNU sed) to act as a quantifier:
sed -E 's#^// +##' < file.c
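A quick way to see what the substitution does and does not touch is to run it on a few sample lines. The sample content and the names src and doc below are made up for illustration; only a "// " at the very start of a line is removed.

```shell
src=$(printf '%s\n' '// Compute the answer.' 'int x = 42; // inline note' '//no space after the slashes')

# '#' is the delimiter, so the slashes need no escaping.
doc=$(printf '%s\n' "$src" | sed 's#^// ##')

printf '%s\n' "$doc"
```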

Sed substitution places characters after back reference at beginning of line

I have a text file that I am trying to convert to a Latex file for printing. One of the first steps is to go through and change lines that look like:
Book 01 Introduction
To look like:
\chapter{Introduction}
To this end, I have devised a very simple sed script:
sed -n -e 's/Book [[:digit:]]\{2\}\s*\(.*\)/\\chapter{\1}/p'
This does the job, except, the closing curly bracket is placed where the initial backslash should be in the substituted output. Like so:
}chapter{Introduction
Any ideas as to why this is the case?
Your call to sed is fine; the problem is that your file uses DOS line endings (CRLF). sed does not treat the CR as part of the line ending; to sed it is just another character on the line. The string Introduction\r is captured, and the result \chapter{Introduction\r} is displayed by first printing everything up to the carriage return (the ^ marks the cursor position):
\chapter{Introduction
^
then moving the cursor to the beginning of the line
\chapter{Introduction
^
then printing the rest of the result (}) over what has already been printed
}chapter{Introduction
^
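The solution is either to convert the file to standard POSIX line endings (linefeed only), or to strip the carriage return before the substitution so that it is never captured. Simply appending \r?$ to the pattern is not enough, because the greedy .* would still swallow the \r. With GNU sed, which understands \r:
sed -n -e 's/\r$//' -e 's/Book [[:digit:]]\{2\}\s*\(.*\)/\\chapter{\1}/p'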
As an alternative to sed, awk using gsub might work well in this situation:
awk '{gsub(/Book [0-9]+/,"\\chapter"); print $1"{"$2"}"}'
Result:
\chapter{Introduction}
Another solution is to narrow the capture group. In this case, since all book chapter names consist only of alphabetic characters, I was able to use [[:alpha:]]*. This gave a revised sed script of:
sed -n -e 's/Book [[:digit:]]\{2\}\s*\([[:alpha:]]*\)/\\chapter{\1}/p'
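For titles that contain spaces, where [[:alpha:]]* would stop at the first word, one option is to delete the carriage returns with tr before sed sees the line. The following is a sketch using only POSIX features; the second sample line and the name chapters are made up for illustration.

```shell
# Two sample lines with DOS (CRLF) line endings; the second title has two words.
chapters=$(printf 'Book 01 Introduction\r\nBook 02 Getting Started\r\n' |
  tr -d '\r' |
  sed -n 's/^Book [[:digit:]]\{2\}[[:space:]]*\(.*\)$/\\chapter{\1}/p')

printf '%s\n' "$chapters"
```

Because the \r is gone before the capture group runs, the closing brace lands at the end of the line as intended.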

Resources