Deleting lines beginning with a CR in a file directly - shell

I want to write a ksh script that deletes all lines of a file that begin with a carriage return. A later part of the same script reuses the modified file, so the change has to be made directly in the file.
For example, here is my file as shown in Notepad++ (line endings are displayed as CRLF because it is a Windows-format file):
CE1;CPr1;CRLF
CE2;CPr2;CRLF
CRLF
CE3;CPr3;CRLF
CRLF
CRLF
and I want to obtain:
CE1;CPr1;CRLF
CE2;CPr2;CRLF
CE3;CPr3;CRLF
The script I wrote so far is:
sed -i '/^\n/d' ListeTable.lst
I also tried \r and \R, but nothing works.
As mentioned, a later part of the script reuses the modified file. It looks like this (there is more):
echo -n "(CE = '$(tail -n 1 ListeTable.lst | cut -d$';' -f1)'and CPr = '$(tail -n 1 ListeTable.lst | cut -d$';' -f2)')"

OK, so I found a regex that works for this problem: '/^\s*$/d'. Here \s matches any whitespace character (spaces, tabs, and the carriage return left at the end of each line), * means the preceding item may repeat any number of times or be absent, and $ anchors the match at the end of the line. Note that \s is a GNU sed extension.
So the working code is: sed -i '/^\s*$/d' ListeTable.lst
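The attempt with '/^\n/d' cannot match because sed removes the trailing line feed before it applies the script, so an "empty" CRLF line reaches the regex as a single carriage return. Below is a minimal sketch of the same deletion that avoids both \s and -i (neither is POSIX), in case the ksh host does not have GNU sed. The name strip_blank_lines is hypothetical, and the temporary-file step is an assumption about how to emulate in-place editing.

```shell
# strip_blank_lines FILE   (hypothetical helper, not from the original post)
# Deletes lines that are empty or contain only whitespace, including a lone
# carriage return, then writes the result back into FILE.
strip_blank_lines() {
    tmp=$(mktemp) || return 1
    # [[:space:]] is the POSIX character class; it covers space, tab and \r.
    sed '/^[[:space:]]*$/d' "$1" > "$tmp" && cat "$tmp" > "$1"
    rm -f "$tmp"
}
```

After calling strip_blank_lines ListeTable.lst, the tail -n 1 in the follow-up command should see the CE3;CPr3; line rather than a bare carriage return.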

Related

Looping and grep writes output for the last line only

I am looping through the lines of a text file and running grep for each line across directories, as shown below:
while IFS="" read -r p || [ -n "$p" ]
do
echo "This is the field: $p"
grep -ilr $p * >> Result.txt
done < fields.txt
But the above writes results only for the last line of the file, not for the other lines.
If I manually run the command with the other lines, it works (which means matches were found). Is there anything I am missing here? Thanks
fields.txt looks like this:
annual_of_measure__c
attached_lobs__c
apple
When fields.txt uses the DOS/Windows line-ending convention, which consists of two characters (carriage return and line feed), and is processed by Unix tools that expect Unix line endings consisting of a single character (line feed), the line read by the read command and stored in $p is, for the first line, annual_of_measure__c\r (note the additional \r for the carriage return). grep then finds no match.
From your description in the question and the confirmation in the comments, it seems that the last line in fields.txt has no line ending at all, so $p is the plain string apple and grep can find a match for the last line of the file.
There are tools for converting line endings, e.g. see this answer or even more options in this answer.
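If converting the file is not an option, the carriage return can also be stripped inside the loop. The following is a sketch only: clean_fields is a hypothetical helper name, and it uses printf to obtain the CR character so that it does not depend on the $'\r' syntax.

```shell
# clean_fields FILE   (hypothetical helper)
# Prints every line of FILE with one trailing carriage return removed,
# including a last line that has no line ending at all.
clean_fields() {
    cr=$(printf '\r')            # command substitution keeps the CR
    while IFS= read -r p || [ -n "$p" ]; do
        printf '%s\n' "${p%"$cr"}"
    done < "$1"
}

# Possible use with the loop from the question (pattern quoted, "--" guards
# against patterns that start with a dash):
#   clean_fields fields.txt | while IFS= read -r p; do
#       grep -ilr -- "$p" *
#   done >> Result.txt
```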

Remove final comma or final character in a txt file using sed

I have a comma delimited file like so
info,someinfo,moreinfo,123,
I want to remove the final comma in each line.
I have tried many variations of:
sed -i 's/.$//' filename
The above command seems to be the prevailing opinion of the internet, but it is not changing my file in any way.
If every line ends with a comma, you can remove the last character with bash substring expansion (a negative length needs bash 4.2 or later):
str="info,someinfo,moreinfo,123,"
echo "${str::-1}"
Output:
info,someinfo,moreinfo,123
Hope this helps you
You may have trailing spaces after the last comma. You could use this command instead so that it handles the optional spaces well:
sed -i -E 's/,[[:space:]]*$//' file
I am not sure if your intention is to remove the last character, regardless of whether it is a comma or not.
If it is a DOS \r\n line terminated file, remove the \r also:
$ cat file
info,someinfo,moreinfo,123,
$ unix2dos file
unix2dos: converting file file to DOS format...
$ sed 's/.\r$//' file
info,someinfo,moreinfo,123
In awk, you can redefine the record separator RS so that it optionally accepts the \r. The \r is then dropped from the output:
$ awk 'BEGIN{RS="\r?\n"}{sub(/.$/,"")}1' file
info,someinfo,moreinfo,123
If you are using GNU awk, you can add the \r (conditionally) to the output with ORS=RT; before the print, ie:
$ awk '
BEGIN { RS="\r?\n" } # set the record separator to accept \r if needed
{
ORS=RT # set the used separator as the output separator
sub(/.$/,"") # remove last char
}1' file # output
info,someinfo,moreinfo,123
Can you try this one:
sed 's/\(.\)$//' file_name > filename_output
Just redirecting the output to a new file filename_output.
The command removes the last character regardless of what it is. If you want to remove the last character only if it is a comma, try
sed -i 's/,$//' filename
This will leave the other lines alone, as the regex doesn't match.
(The -i option is not properly portable; on BSD, and thus also on macOS, you will need -i ''.)
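On a CRLF file, s/.$// can look like it does nothing because the last character of each line is the carriage return, so that is what gets removed while the comma stays. The sketch below removes the comma together with an optional carriage return and does not rely on sed understanding \r. The name strip_trailing_comma is hypothetical. Note that it also drops the carriage return, so matching lines come out with Unix line endings.

```shell
# strip_trailing_comma [FILE...]   (hypothetical helper)
# Removes a final comma, plus the carriage return after it if there is one.
# Reads standard input when no file is given.
strip_trailing_comma() {
    cr=$(printf '\r')
    # \{0,1\} is the portable BRE spelling of "zero or one"; \$ keeps the
    # shell from treating the end-of-line anchor as a variable.
    sed "s/,${cr}\{0,1\}\$//" "$@"
}

# Why the plain "remove last character" form leaves the comma in place:
printf 'info,someinfo,moreinfo,123,\r\n' | sed 's/.$//' | od -c
```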

Newline is not '\n'

I have a text file which was created by Matlab (I don't have the source code), and was in the form:
a b c d
e f g h
I used
sed -i '' $'s/\t/,/g' filename
to replace all the tabs with commas and ended up with a file that looks like this:
a,b,c,d
e,f,g,h
then, I tried to remove all the line breaks using
tr '\n' ' ' < filename
It gave me only the last line. But when I manually edited the text file by placing the cursor at the end of each line, pressing "del" and then "enter", and re-ran the command, it worked fine.
So the newline in the text file is probably not represented by \n. What other characters are used to represent line breaks?
P.S If I run the tr line on the file before I remove the tabs I get an empty output.
Thank you.
Sounds like your newlines are \r\n (Windows-style ones). One option would be to remove them first using this command:
tr -s '\r\n' ' ' < file
The -s switch squeezes each run of translated characters into a single space, so a \r\n pair becomes one space rather than two. Thanks to glenn jackman for pointing this out.
Guessing your intention slightly, you may want to use something like this, to replace all spaces including line breaks with commas:
tr -s '[:space:]' ',' < file
You could then pipe this to sed to remove the trailing comma if you wanted.
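The original tr '\n' ' ' most likely did join the lines. The output only looked like the last line because each remaining \r sent the terminal cursor back to the start of the line, so every line was drawn over the previous one. A sketch of the complete pipeline described above follows; join_fields is a hypothetical name.

```shell
# join_fields FILE   (hypothetical helper)
# Turns every run of whitespace (tabs, spaces, \r, \n) into one comma and
# drops the comma left over from the final line ending.
join_fields() {
    tr -s '[:space:]' ',' < "$1" | sed 's/,$//'
}

# To inspect what a file really contains, od shows \r and \n explicitly:
#   od -c filename | head
```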

bash script: write string with double quotes and blanks to file

I try to use sed to read a line from an ASCII file, parse it and write it slightly changed to a defined line number in an output file.
The line format in the input file is as follows:
linenumber:designator,"variable text content"
e.g.
3:string1,"this is text of string 1"
So the outfile should look as follows in line 3:
string1,"this is text of string 1"
The line includes the double quotes and the blanks. All old lines are moved one line down.
The user is responsible to provide a proper input file regarding the order of lines and has to consider that lines in the output file are moved down with each new line in the input file. The script does not know about any order except for the line number given in the input file.
A script shall read all lines and put the content of those lines into an output file at the given line numbers, including the double quotes and blanks, but without the line number part and the colon.
The command I use successfully with the shell is e.g.:
sed -i '3istring1,"this is text of string 1"' outfile
No trouble with quotes, double quotes and blanks there.
Using the bash script
while read line
do
linenum=$(echo $line | cut -f1 -d:)
linestr=$(echo $line | cut -f2 -d:)
sedcmd="sed -i '"
sedcmd=${sedcmd}${linenum}
sedcmd=${sedcmd}i
sedcmd=${sedcmd}${linestr}
sedcmd=${sedcmd}"' outfile"
echo "---> $sedcmd"
$sedcmd
done < script/new_records.txt
shows exactly the same sed command with echo but returns with:
sed: -e expression #1, char 1: unknown command: `''
Apparently executing the sed command from within a bash script is different from executing it directly in the bash shell.
I tried a variety of escape sequences "\" before quotes, double quotes and blanks...but rather randomly, and neither of those was successful.
What do I have to do in order to write the string including blanks and double quotes to a specified line in a text file?
# Assumes OutFile exists and has enough lines
while read ThisLine
do
LineNum=$(echo "${ThisLine}" | cut -f1 -d ":" )
echo "${ThisLine##*:}" > /tmp/LineContent.txt
sed -i -n "${LineNum} !{p;b;};r /tmp/LineContent.txt" OutFile
done < script/new_records.txt
This is not ideal, because it relies on several assumptions: that OutFile has enough lines, and that reading the line causes no problems (what about escaped characters inside the quoted string, for example?).
Okay, I'll give it a shot. If I understand what you're trying to do correctly, and if you're certain the code input file is not malformed, then
sed -i -f <(sed 's/:/i/' insertions.txt) datafile.txt
is the most straightforward way. This works because with an input specification of
number:text
all one has to do is replace the : with an i to get a sed command that says: "When handling line number, insert text". The <() bit is bash-style process substitution, which expands to the name of a FIFO (or /dev/fd entry) from which the output of the command can be read.
It might be prudent to guard against mistakes by saying something like
sed -i -f <(sed '/^[0-9]\+:/!d; s/:/i/' insertions.txt) datafile.txt
This removes all lines from insertions.txt that don't begin with a number followed by a colon because those are obviously broken.
Note that this all-in-one-go approach treats line numbers as they were in the input file. That is to say, given an insertions file with content
2:foo,"bar "
4:baz,"qux "
baz,"qux " will appear in line 5 of the output (before line 4 of the input). If this is not desired, sed will have to be called multiple times to handle each insertion individually, as in
while IFS= read -r insertion; do
sed -i "${insertion/:/i}" datafile.txt
done < insertions.txt
${insertion/:/i} is another bashism that replaces the first : in a shell variable with i and expands to the result, i.e., if insertion=1:2:3, then ${insertion/:/i} is 1i2:3.
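As for why the loop in the question fails: quote characters stored inside a variable are not parsed again when the variable is expanded. $sedcmd is only split on whitespace, so sed receives a literal ' as the start of its script, which is exactly what the error message reports. The sketch below passes the script as a single quoted argument instead. The name insert_records is hypothetical, and the temporary file is an assumption made to avoid depending on sed -i. Backslashes in the text would still be interpreted by sed.

```shell
# insert_records RECORDS OUTFILE   (hypothetical helper)
# Each line of RECORDS has the form  linenumber:text
insert_records() {
    while IFS= read -r line || [ -n "$line" ]; do
        linenum=${line%%:*}     # everything before the first colon
        linestr=${line#*:}      # everything after the first colon
        tmp=$(mktemp) || return 1
        # The whole sed script is ONE argument: "Ni\", a newline, the text.
        sed "${linenum}i\\
${linestr}" "$2" > "$tmp" && cat "$tmp" > "$2"
        rm -f "$tmp"
    done < "$1"
}
```

Unlike cut -f2 -d:, the ${line#*:} expansion keeps any further colons that appear inside the quoted text.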

Shell Scripting unwanted '?' character at the end of file name

I get an unwanted '?' at the end of my file name while doing this:
emplid=$(grep -a "Student ID" "$i".txt | sed 's/(Student ID: //g' | sed 's/)Tj//g' )
#gets emplid by doing a grep from some text file
echo "$emplid" #prints employee id correctly
cp "$i" "$emplid".pdf #getting an extra '?' character after emplid and before .pdf
i.e., instead of getting a file name like 123456.pdf, I get 123456?.pdf.
Why is this happening if the echo prints correctly?
How can I remove the trailing question mark character?
It sounds like your script file has DOS-style line endings (\r\n) instead of unix-style (just \n) -- when a script is in this format, the \r gets treated as part of the commands. In this instance, it's getting included in $emplid and therefore in the filename.
Many platforms support the dos2unix command to convert the file to unix-style line endings. And once it's converted, stick to text editors that support unix-style text files.
EDIT: I had assumed the problem line endings were in the shell script, but it looks like they're in the input file ("$i".txt) instead. You can use dos2unix on the input file to clean it and/or add a cleaning step to the sed command in your script. BTW, you can have a single instance of sed apply several edits with the -e option:
emplid=$(grep -a "Student ID" "$i".txt | sed -e 's/(Student ID: //g' -e 's/)Tj//g' -e $'s/\r$//' )
I'd recommend against using sed 's/.$//' -- if the file is in unix format, that'll cut off the last character of the filename.
You can use the file command to detect whether a file is pure Unix or contains DOS line endings.
For a DOS file it reports: ASCII text, with CRLF line terminators
For a Unix file it reports plain ASCII text.
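Where file is not installed, or a script needs a yes/no answer, a similar check can be sketched with grep. The name has_crlf is hypothetical. It only tests whether at least one line ends in a carriage return, which is an assumption about what counts as a DOS file.

```shell
# has_crlf FILE   (hypothetical helper)
# Succeeds (exit status 0) if at least one line of FILE ends with a CR.
has_crlf() {
    cr=$(printf '\r')
    grep -q "${cr}\$" "$1"
}

# Possible use before processing an input file:
#   if has_crlf "$i".txt; then echo "DOS line endings found in $i.txt" >&2; fi
```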

Resources