cygwin awk print adds strange character to filename - bash

I'm using Cygwin on a Windows machine to grab some information from a remote Linux machine and write the result to a file. Here is my command:
ssh user@remotemachine ps -aef | grep vnc | grep -v grep | awk '{print "<size>"$11"</size>""\n""<colorDepth>"$13"</colorDepth>"}' > myfile.txt
However, when I then run
ls -l
on the directory where myfile.txt was written, it shows that the name of the file is actually myfile.txt? (with an added question mark). Where did that extra character come from, and how can I get the command to name the file correctly as simply myfile.txt?
I would just run another command such as
mv myfile.txt? myfile.txt
or
mv myfile.txt^M myfile.txt
but in my bash script neither command finds the file to rename. (Interestingly, from the terminal, not in the script, I can start typing
mv myf
and tab-complete the file name, then finish the line with a new name, and that renames the file successfully.)

Most likely your script uses Windows-style line endings. The end of the line looks like
... myfile.txt
but it's really:
... myfile.txt\r\n
where \r\n is the Windows CR-LF line ending. That is how lines in Windows text files are supposed to end, but the shell doesn't recognize Windows-style line endings: it sees a valid line of text with the CR character as part of it, so it treats "myfile.txt\r" as the file name.
How did you create the bash script file? If you used a Windows native editor, that explains the line endings.
Many editors (vim included) will automatically adapt to the line endings of a file, so you may not be able to delete the extra \r from your editor.
And ls displays non-printable characters like CR as ?.
Running file on the script will probably tell you about the line endings.
Filter the script through the dos2unix command. (Read the man page first; unlike most text filters, dos2unix updates its input file rather than writing to stdout.)
This should also work:
mv foo.sh foo.sh.bad
tr -d '\r' < foo.sh.bad > foo.sh
chmod +x foo.sh
(I created a backup copy first just in case something goes wrong, so you don't clobber your script.)
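As a minimal sketch of the fix (the demo file name is made up), the CR bytes can be stripped and the result verified like this:

```shell
# Create a script with Windows CRLF line endings (hypothetical demo file)
printf 'echo hello\r\n' > demo.sh.bad
# Strip every carriage return byte
tr -d '\r' < demo.sh.bad > demo.sh
# Verify: the cleaned copy contains no CR bytes
if grep -q $'\r' demo.sh; then echo "still has CR"; else echo "clean"; fi
rm -f demo.sh demo.sh.bad
```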

This:
ssh user@remotemachine ps -aef | grep vnc | grep -v grep | awk '{print "<size>"$11"</size>""\n""<colorDepth>"$13"</colorDepth>"}' > myfile.txt
can be rewritten to:
ssh user@remotemachine ps -aef | awk '/[v]nc/ {print "<size>"$11"</size>""\n""<colorDepth>"$13"</colorDepth>"}' > myfile.txt
To prevent grep from finding itself in the ps output, wrap the first letter of the pattern in a bracket expression, so that
grep vnc | grep -v grep
becomes
grep '[v]nc'
and since awk can match the pattern itself, the whole filter collapses to
awk '/[v]nc/ {some code}'
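The trick works because the string "[v]nc" in the grep process's own command line does not contain the substring that the regex [v]nc matches. A small sketch with simulated ps output (the process lines are made up):

```shell
# Simulated ps output: one real vnc process, plus the grep itself
ps_out='user  101  Xvnc :1 -geometry 1024x768
user  202  grep [v]nc'
# The regex [v]nc matches the literal text "vnc", which appears in the
# Xvnc line but not in the "grep [v]nc" command line, so only the real
# process line is printed
echo "$ps_out" | grep '[v]nc'
```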

How to delete a line (matching a pattern) from a text file? [duplicate]

How would I use sed to delete all lines in a text file that contain a specific string?
To remove the line and print the output to standard out:
sed '/pattern to match/d' ./infile
To directly modify the file – does not work with BSD sed:
sed -i '/pattern to match/d' ./infile
Same, but for BSD sed (Mac OS X and FreeBSD) – does not work with GNU sed:
sed -i '' '/pattern to match/d' ./infile
To directly modify the file (and create a backup) – works with BSD and GNU sed:
sed -i.bak '/pattern to match/d' ./infile
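A quick sketch of the backup variant (file name and contents are made up):

```shell
# The -i.bak form edits in place but keeps the original as demo.txt.bak
printf 'keep me\ndrop pattern to match\n' > demo.txt
sed -i.bak '/pattern to match/d' demo.txt
cat demo.txt       # only "keep me" remains
cat demo.txt.bak   # original two lines preserved
rm -f demo.txt demo.txt.bak
```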
There are many other ways to delete lines containing a specific string besides sed:
AWK
awk '!/pattern/' file > temp && mv temp file
Ruby (1.9+)
ruby -i.bak -ne 'print if not /pattern/' file
Perl
perl -ni.bak -e "print unless /pattern/" file
Shell (bash 3.2 and later)
while read -r line
do
[[ ! $line =~ pattern ]] && echo "$line"
done <file > o
mv o file
GNU grep
grep -v "pattern" file > temp && mv temp file
And of course sed (printing the inverse is faster than actual deletion):
sed -n '/pattern/!p' file
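A quick sanity check, using a throwaway input file, shows that the sed, awk, and grep variants above produce identical output:

```shell
printf 'keep\nhas pattern\nalso keep\n' > in.txt
sed '/pattern/d' in.txt  > out.sed    # delete matching lines
awk '!/pattern/' in.txt  > out.awk    # print non-matching lines
grep -v 'pattern' in.txt > out.grep   # invert match
cmp out.sed out.awk && cmp out.sed out.grep && echo "all identical"
rm -f in.txt out.sed out.awk out.grep
```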
You can use sed to replace lines in place in a file. However, it seems to be much slower than using grep for the inverse into a second file and then moving the second file over the original.
e.g.
sed -i '/pattern/d' filename
or
grep -v "pattern" filename > filename2; mv filename2 filename
The first command takes 3 times longer on my machine anyway.
The easy way to do it, with GNU sed:
sed --in-place '/some string here/d' yourfile
You may consider using ex (which is a standard Unix command-based editor):
ex +g/match/d -cwq file
where:
+cmd executes the given Ex command (see man ex), the same as -c; here -cwq executes wq (write and quit)
g/match/d is the Ex command that deletes lines matching the given pattern, see: Power of g
The above example is a POSIX-compliant method for in-place editing a file as per this post at Unix.SE and POSIX specifications for ex.
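A minimal sketch of the ex invocation, assuming ex is installed and using a made-up demo file:

```shell
# In-place deletion of matching lines with ex
printf 'one\nmatch me\ntwo\n' > demo.txt
ex +g/match/d -cwq demo.txt   # delete lines matching "match", write, quit
cat demo.txt                  # "one" and "two" remain
rm -f demo.txt
```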
The difference with sed is that:
sed is a Stream EDitor, not a file editor (BashFAQ),
so using it for in-place edits brings unportable code, I/O overhead, and other bad side effects. Some parameters (such as in-place/-i) are non-standard extensions and may not be available on other operating systems.
I was struggling with this on Mac. Plus, I needed to do it using variable replacement.
So I used:
sed -i '' "/$pattern/d" $file
where $file is the file where deletion is needed and $pattern is the pattern to be matched for deletion.
I picked the '' from this comment.
The thing to note here is the use of double quotes in "/$pattern/d". The variable won't expand when we use single quotes.
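A minimal illustration of the quoting point, without -i so it works with any sed (the pattern value is made up):

```shell
pattern='temp'
# Double quotes: the shell expands $pattern before sed sees it
printf 'keep\ntemp line\n' | sed "/$pattern/d"   # prints only "keep"
# Single quotes: sed receives the literal text $pattern, nothing matches
printf 'keep\ntemp line\n' | sed '/$pattern/d'   # prints both lines
```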
You can also use this:
grep -v 'pattern' filename
Here -v will print only the lines that do not match your pattern (that means invert match).
To get an in-place-like result with grep you can do this:
echo "$(grep -v "pattern" filename)" >filename
I made a small benchmark with a file containing approximately 345,000 lines. The grep approach seems to be around 15 times faster than the sed method in this case.
I tried both with and without the setting LC_ALL=C; it does not seem to change the timings significantly. The search string (CDGA_00004.pdbqt.gz.tar) is somewhere in the middle of the file.
Here are the commands and the timings:
time sed -i "/CDGA_00004.pdbqt.gz.tar/d" /tmp/input.txt
real 0m0.711s
user 0m0.179s
sys 0m0.530s
time perl -ni -e 'print unless /CDGA_00004.pdbqt.gz.tar/' /tmp/input.txt
real 0m0.105s
user 0m0.088s
sys 0m0.016s
time (grep -v CDGA_00004.pdbqt.gz.tar /tmp/input.txt > /tmp/input.tmp; mv /tmp/input.tmp /tmp/input.txt )
real 0m0.046s
user 0m0.014s
sys 0m0.019s
Delete lines from all files that match the pattern:
grep -rl 'text_to_search' . | xargs sed -i '/text_to_search/d'
To delete lines containing James or John:
SED:
sed '/James\|John/d' file
sed -n '/James\|John/!p' file
AWK:
awk '!/James|John/' file
awk '/James|John/ {next;} {print}' file
GREP:
grep -v 'James\|John' file
perl -i -nle'/regexp/||print' file1 file2 file3
perl -i.bk -nle'/regexp/||print' file1 file2 file3
The first command edits the file(s) inplace (-i).
The second command does the same thing but keeps a copy or backup of the original file(s) by adding .bk to the file names (.bk can be changed to anything).
You can also delete a range of lines in a file.
For example to delete stored procedures in a SQL file.
sed '/CREATE PROCEDURE.*/,/END ;/d' sqllines.sql
This will remove all lines between CREATE PROCEDURE and END ;.
I have cleaned up many SQL files with this sed command.
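The range deletion can be sketched on a tiny made-up SQL snippet:

```shell
# Everything from CREATE PROCEDURE through END ; is deleted, inclusive
printf 'SELECT 1;\nCREATE PROCEDURE p()\nBEGIN\nEND ;\nSELECT 2;\n' > demo.sql
sed '/CREATE PROCEDURE/,/END ;/d' demo.sql   # only the two SELECT lines remain
rm -f demo.sql
```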
echo -e "/thing_to_delete\ndd\033:x\n" | vim file_to_edit.txt
Just in case someone wants to do it for exact matches of strings, you can use the -w flag in grep (w for whole word). That is, for example, if you want to delete the lines that contain the number 11 but keep the lines with the number 111:
-bash-4.1$ head file
1
11
111
-bash-4.1$ grep -v "11" file
1
-bash-4.1$ grep -w -v "11" file
1
111
It also works with the -f flag if you want to exclude several exact patterns at once. If "blacklist" is a file with one pattern per line that you want to delete from "file":
grep -w -v -f blacklist file
To show the treated text in the console:
cat filename | sed '/text to remove/d'
To save the treated text into a file:
cat filename | sed '/text to remove/d' > newfile
To append the treated text to an existing file:
cat filename | sed '/text to remove/d' >> newfile
To treat already-treated text (in this case, remove more lines of what has been removed):
cat filename | sed '/text to remove/d' | sed '/remove this too/d' | more
The | more will show the text in chunks of one page at a time.
Curiously enough, the accepted answer does not actually answer the question directly. The question asks about using sed to delete lines matching a string, but the answer presupposes knowledge of how to convert an arbitrary string into a regex.
Many programming language libraries have a function to perform such a transformation, e.g.
python: re.escape(STRING)
ruby: Regexp.escape(STRING)
java: Pattern.quote(STRING)
But how to do it on the command line?
Since this is a sed-oriented question, one approach would be to use sed itself:
sed 's/\([\[\/({.*+^$?]\)/\\\1/g'
So given an arbitrary string $STRING we could write something like:
re=$(sed 's/\([\[\/({.*+^$?]\)/\\\1/g' <<< "$STRING")
sed "/$re/d" FILE
or as a one-liner:
sed "/$(sed 's/\([\[\/({.*+^$?]\)/\\\1/g' <<< "$STRING")/d"
with variations as described elsewhere on this page.
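For instance, with a string containing regex metacharacters (the sample values are made up), escaping makes sed treat it literally; note the / inside the bracket expression is backslash-escaped so it is not taken as the s command delimiter:

```shell
STRING='a.b*c'
# Escape the metacharacters, then delete only the exact literal match
re=$(sed 's/\([\[\/({.*+^$?]\)/\\\1/g' <<< "$STRING")
printf 'a.b*c\naXbc\n' | sed "/$re/d"   # only "aXbc" survives
```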
cat filename | grep -v "pattern" > filename.1
mv filename.1 filename
You can use good old ed to edit a file in a similar fashion to the answer that uses ex. The big difference in this case is that ed takes its commands via standard input, not as command line arguments like ex can. When using it in a script, the usual way to accommodate this is to use printf to pipe commands to it:
printf "%s\n" "g/pattern/d" w | ed -s filename
or with a heredoc:
ed -s filename <<EOF
g/pattern/d
w
EOF
This solution is for doing the same operation on multiple files:
for file in *.txt; do grep -v "Matching Text" "$file" > temp_file.txt; mv temp_file.txt "$file"; done
I found most of the answers not useful for me. If you use vim, I found this very easy and straightforward:
:g/<pattern>/d

Echoing awk output to file to remove duplicates has strange output

I made a small shell script to try to remove duplicate entries (lines) from a text file. When the script is run and the file has three lines, all identical, strange output occurs.
The shell script is run on an Ubuntu distribution.
The contents of my text file:
one
one
one
The script I am running to remove duplicates:
echo -e $(awk '!a[$0]++' /test/test.txt) > /test/test.txt
The awk is intended to delete duplicates, while the echo is intended to output it to a file.
Upon running my script, I receive the following output in the file:
one
one
It should also be noted that there is an additional newline after the second line, and a space at the start of the second line.
Writing to a file at the same time that you are reading from it usually leads to disaster.
If you have GNU awk, then use the -i inplace option:
$ cat text
one
one
one
$ gawk -i inplace '!a[$0]++' text
$ cat text
one
If you have BSD awk, then use:
awk '!a[$0]++' text >tmp && mv tmp text
Alternatively, if you have sponge installed:
awk '!a[$0]++' text | sponge text
sponge does not update the file until the pipeline has finished reading and processing it.
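The temp-file variant can be sketched end to end (file name made up); first occurrences are kept in order:

```shell
printf 'one\none\ntwo\none\n' > t.txt
# a[$0]++ is 0 (false) only the first time a line is seen, so !a[$0]++
# prints each distinct line once; the temp file avoids the read/write clash
awk '!a[$0]++' t.txt > t.tmp && mv t.tmp t.txt
cat t.txt   # "one" then "two"
rm -f t.txt
```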

bash script prepending ? to file name

I am using the below script. When I have it echo $f as shown below, it gives the correct result:
#!/bin/bash
var="\/home\/"
while read p; do
f=$(echo $p | sed "s/${var}/\\n/g")
f=${f%.sliced.bam}.fastq
echo $f
~/bin/samtools view $p | awk '{print "#"$1"\n"$10"\n+\n"$11}' > $f
./run.sh $f ${f%.fastq}
rm ${f%.sliced.bam}.fastq
done < $1
I get the output as expected
test.fastq
But the file being created by awk > $f has the name
?test.fastq
Note that the overall goal here is to run this loop on every file listed in a file with absolute paths but then write locally (which is what the sed call is for)
Edit: run directly on the command line (without variables), the samtools | awk pipeline runs correctly.
Awk cannot possibly have anything to do with your problem. The shell is completely responsible for file redirection, so f MUST have a weird character in it.
Most likely whatever you are sending to this script has a special character in it (perhaps a UTF-8 character while your terminal displays only ASCII). When you echo it, the shell doesn't know how to display the character and probably shows it as whitespace; when it goes through ls (which might do things like colorization), it ends up displayed as ?.
Oh wait... why are you putting a newline into the filename with sed? That is probably your problem. Try just:
sed "s/${var}//g"

Adding a tab character before external script output

So, I've got a shell script to automate some SVN commands. I output to both a logfile and stdout during the script, and direct the SVN output to /dev/null. Now I'd like to include the SVN output in my logging, but to separate it from my own output I'd like to prepend a \t to each line of the SVN output. Can this be done with shell scripting?
Edit
Is this something I could use AWK for? I'll investigate!
Edit
So, using AWK seems to do the trick. Sadly I can't get it to work with the svn commands though.
svn add * | awk '{ print "\t"$0 }'
outputs without the prepended tab character. But if I run, for example, ls
ls -l | awk '{ print "\t"$0 }'
The directory is listed with a tab character in front of each line.
Edit
Thanks @daniel! I ended up with this:
svn add * 2>&1 | sed 's/^/\t/'
Might as well note that awk works well for this when used correctly:
svn add * 2>&1 | awk '{print "\t"$0 }'
You can use Sed. Instead of redirecting the output of your SVN command to /dev/null, you can pipe it to Sed.
svn ls https://svn.example.com 2>&1 | sed 's/^/ /'

Extracting all lines from a file that are not commented out in a shell script

I'm trying to extract lines from certain files that do not begin with # (commented out). How would I run through a file, ignore everything with a # in front of it, and copy each line that does not start with a # into a different file?
Thanks
Simpler: grep -v '^[[:space:]]*#' input.txt > output.txt
This assumes that you're using Unix/Linux shell and the available Unix toolkit of commands AND that you want to keep a copy of the original file.
cp file file.orig
mv file file.fix
sed '/^[ ]*#/d' file.fix > file
rm file.fix
Or if you've got a nice shiny new GNU sed, that can all be summarized as:
cp file file.orig
sed -i '/^[ ]*#/d' file
In both cases, the bracket expression in the sed command is meant to contain a space character and a tab character.
So it says: delete any line that begins with optional space or tab characters followed by a #, but print everything else.
I hope this helps.
grep -v ^\# file > newfile
grep -v ^\# file | grep -v ^$ > newfile
Not fancy regex, but I provide this method to Jr. Admins as it helps with understanding of pipes and redirection.
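Both variants can be compared on a small made-up sample; the bracket-expression form also drops indented comments, while the simple form leaves them:

```shell
printf '# comment\nreal line\n   # indented comment\n\n' > conf.txt
# Drops plain and indented comments, keeps blank lines
grep -v '^[[:space:]]*#' conf.txt
# Drops only comments starting in column one, plus blank lines
grep -v ^\# conf.txt | grep -v ^$
rm -f conf.txt
```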
