How to ensure file written with sed w command is closed - shell

I'm using the sed 'w' command to get the labels from a TeX document using:
/\\label{[a-zA-Z0-9]*}/w labels.list
This script is part of a pipeline in which, later on, awk reads the file that sed has just written, e.g.:
cat bob | sed -f sedScript | awk -f awkScript labels.list -
Sometimes the pipeline produces the correct output, sometimes it doesn't (for exactly the same input file 'bob'). It's random.
I can only conclude that sometimes awk tries to read the file before sed has closed it properly. Is there any way I can force sed to close the file at the end of the script, or any other suggestions as to what the problem may be?

All stages in a pipeline run in parallel. This is an extremely important and defining feature of pipes, and there is nothing you can or should attempt to do in order to prevent or circumvent that.
Instead, you should rewrite your script so that all data dependencies are executed and finished in the order you need them to be. In the general case, you'd do
cat bob | sed -f sedScript > tempfile
cat tempfile | awk -f awkScript labels.list -
or equivalently in your case:
grep '\\label{[a-zA-Z0-9]*}' bob > labels.list
awk -f awkScript labels.list bob
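A quick way to convince yourself that pipeline stages really do start together: two sleeps joined by a pipe finish in about 2 seconds, not 4.
time sh -c 'sleep 2 | sleep 2'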

tail -f with different command

I'm trying to make a shell script to monitor a log file, but I have a problem: I can't run two tails at the same time.
The script basically searches for a word; when it matches, the 3 lines including the matched word are redirected into a file, and then I prune the useless information to extract what I want.
I tried the commands below and each works fine on its own, but when I chain them through a file it doesn't work.
Please advise :)
Below is the relevant part of the script:
#!/bin/bash
#grep error log
tail -f /FileLogging.log | grep 'error' >>/home/hello/tech.txt
#pruning useless information
tail -f /home/hello/tech.txt | perl -nle 'print $1 if /sam-(.+?)\",\"jack/' >>/home/hello/non.txt
As written, only one source is being watched. This command combines both of your steps:
tail -f /FileLogging.log | grep 'error' | tee -a /home/hello/tech.txt | perl ... >>/home/hello/non.txt
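Spelled out with the perl one-liner from your script (using single quotes so the shell does not expand $1), the full pipeline would look something like:
tail -f /FileLogging.log | grep 'error' | tee -a /home/hello/tech.txt | perl -nle 'print $1 if /sam-(.+?)\",\"jack/' >>/home/hello/non.txt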

Unexpected or empty output from tee command [duplicate]

This question already has answers at: Why does reading and writing to the same file in a pipeline produce unreliable results?
echo "hello" | tee test.txt
cat test.txt
sudo sed -e "s|abc|def|g" test.txt | tee test.txt
cat test.txt
The output of the 2nd command and the last command differ, even though the command (cat test.txt) is the same both times.
Question:
The following line in the above script produces output, so why is it not redirected to the output file?
sudo sed -e "s|abc|def|g" test.txt
sudo sed -e "s|abc|def|g" test.txt | tee test.txt
Reading from and writing to test.txt in the same command line is error-prone. sed is trying to read from the file at the same time that tee wants to truncate it and write to it.
You can use sed -i to modify a file in place. There's no need for tee. (There's also no need for sudo. You made the file, no reason to ask for root access to read it.)
sed -e "s|abc|def|g" -i test.txt
You shouldn't use the same file for both input and output.
tee test.txt empties the output file when it starts up. If this happens before sed reads the file, sed sees an empty file. Since you're running sed through sudo, sed takes longer to start up, which makes this outcome very likely.
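If you prefer to keep the pipeline shape instead of using sed -i, the safe pattern is to write to a temporary file and rename it afterwards; sponge from the moreutils package (if installed) does the buffering for you:
sed -e "s|abc|def|g" test.txt > test.txt.tmp && mv test.txt.tmp test.txt
sed -e "s|abc|def|g" test.txt | sponge test.txt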

How to delete a line (matching a pattern) from a text file? [duplicate]

How would I use sed to delete all lines in a text file that contain a specific string?
To remove the line and print the output to standard out:
sed '/pattern to match/d' ./infile
To directly modify the file – does not work with BSD sed:
sed -i '/pattern to match/d' ./infile
Same, but for BSD sed (Mac OS X and FreeBSD) – does not work with GNU sed:
sed -i '' '/pattern to match/d' ./infile
To directly modify the file (and create a backup) – works with BSD and GNU sed:
sed -i.bak '/pattern to match/d' ./infile
There are many other ways to delete lines containing a specific string besides sed:
AWK
awk '!/pattern/' file > temp && mv temp file
Ruby (1.9+)
ruby -i.bak -ne 'print if not /pattern/' file
Perl
perl -ni.bak -e "print unless /pattern/" file
Shell (bash 3.2 and later)
while read -r line
do
[[ ! $line =~ pattern ]] && echo "$line"
done <file > o
mv o file
GNU grep
grep -v "pattern" file > temp && mv temp file
And of course sed (printing the inverse is faster than actual deletion):
sed -n '/pattern/!p' file
You can use sed to delete lines in place in a file. However, it seems to be much slower than grepping the inverse into a second file and then moving the second file over the original.
e.g.
sed -i '/pattern/d' filename
or
grep -v "pattern" filename > filename2; mv filename2 filename
The first command takes about 3 times longer on my machine, anyway.
The easy way to do it, with GNU sed:
sed --in-place '/some string here/d' yourfile
You may consider using ex (which is a standard Unix command-based editor):
ex +g/match/d -cwq file
where:
+ executes the given Ex command (man ex), same as -c; here -cwq executes wq (write and quit)
g/match/d - Ex command to delete lines with given match, see: Power of g
The above example is a POSIX-compliant method for in-place editing a file as per this post at Unix.SE and POSIX specifications for ex.
The difference from sed is that:
sed is a Stream EDitor, not a file editor. (BashFAQ)
Unless you enjoy unportable code, I/O overhead and some other bad side effects: in-place editing parameters such as -i are non-standard FreeBSD extensions and may not be available on other operating systems.
I was struggling with this on Mac. Plus, I needed to do it using variable replacement.
So I used:
sed -i '' "/$pattern/d" "$file"
where $file is the file where deletion is needed and $pattern is the pattern to be matched for deletion.
I picked the '' from this comment.
The thing to note here is the use of double quotes in "/$pattern/d". Variables are not expanded inside single quotes.
You can also use this:
grep -v 'pattern' filename
Here -v prints only the lines that do not match your pattern (invert match).
To get an in-place-like result with grep you can do this:
echo "$(grep -v "pattern" filename)" >filename
This works because the command substitution runs to completion before the redirection truncates the file.
I have made a small benchmark with a file which contains approximately 345 000 lines. The way with grep seems to be around 15 times faster than the sed method in this case.
I have tried both with and without the setting LC_ALL=C; it does not seem to change the timings significantly. The search string (CDGA_00004.pdbqt.gz.tar) is somewhere in the middle of the file.
Here are the commands and the timings:
time sed -i "/CDGA_00004.pdbqt.gz.tar/d" /tmp/input.txt
real 0m0.711s
user 0m0.179s
sys 0m0.530s
time perl -ni -e 'print unless /CDGA_00004.pdbqt.gz.tar/' /tmp/input.txt
real 0m0.105s
user 0m0.088s
sys 0m0.016s
time (grep -v CDGA_00004.pdbqt.gz.tar /tmp/input.txt > /tmp/input.tmp; mv /tmp/input.tmp /tmp/input.txt )
real 0m0.046s
user 0m0.014s
sys 0m0.019s
To delete matching lines from all files that contain the match:
grep -rl 'text_to_search' . | xargs sed -i '/text_to_search/d'
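If the matched filenames may contain spaces or other special characters, the NUL-separated variants (GNU grep and xargs) are safer:
grep -rlZ 'text_to_search' . | xargs -0 sed -i '/text_to_search/d'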
SED:
sed '/James\|John/d' file
sed -n '/James\|John/!p' file
AWK:
awk '!/James|John/' file
awk '/James|John/ {next} {print}' file
GREP:
grep -v 'James\|John' file
perl -i -nle'/regexp/||print' file1 file2 file3
perl -i.bk -nle'/regexp/||print' file1 file2 file3
The first command edits the file(s) in place (-i).
The second command does the same thing but keeps a backup of the original file(s) by adding .bk to the file names (.bk can be changed to anything).
You can also delete a range of lines in a file.
For example, to delete stored procedures in a SQL file:
sed '/CREATE PROCEDURE.*/,/END ;/d' sqllines.sql
This will remove all lines between CREATE PROCEDURE and END ;.
I have cleaned up many SQL files with this sed command.
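A quick illustration of the range delete on a toy input:
printf '%s\n' 'keep me' 'CREATE PROCEDURE p()' 'BEGIN' 'END ;' 'keep me too' | sed '/CREATE PROCEDURE.*/,/END ;/d'
This prints only the two "keep me" lines; everything from the start pattern through the end pattern is gone.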
You can even drive vim non-interactively by piping keystrokes into it:
echo -e "/thing_to_delete\ndd\033:x\n" | vim file_to_edit.txt
Just in case someone wants to do this for exact string matches, you can use the -w flag in grep (w for whole word). For example, if you want to delete the lines that contain the number 11 but keep the lines with the number 111:
-bash-4.1$ head file
1
11
111
-bash-4.1$ grep -v "11" file
1
-bash-4.1$ grep -w -v "11" file
1
111
It also works with the -f flag if you want to exclude several exact patterns at once. If "blacklist" is a file with several patterns on each line that you want to delete from "file":
grep -w -v -f blacklist file
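If your blacklist entries are literal strings rather than regular expressions, adding -F (fixed strings) avoids surprises with metacharacters like . or *:
grep -w -v -F -f blacklist file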
To show the treated text in the console:
cat filename | sed '/text to remove/d'
To save the treated text into a file:
cat filename | sed '/text to remove/d' > newfile
To append the treated text to an existing file:
cat filename | sed '/text to remove/d' >> newfile
To treat already-treated text, in this case removing more lines beyond what has been removed:
cat filename | sed '/text to remove/d' | sed '/remove this too/d' | more
The | more shows the text in chunks of one page at a time.
Curiously enough, the accepted answer does not actually answer the question directly. The question asks about using sed to delete lines containing a specific string, but the answer presupposes knowledge of how to convert an arbitrary string into a regex.
Many programming language libraries have a function to perform such a transformation, e.g.
python: re.escape(STRING)
ruby: Regexp.escape(STRING)
java: Pattern.quote(STRING)
But how to do it on the command line?
Since this is a sed-oriented question, one approach would be to use sed itself:
sed 's/\([\[/({.*+^$?]\)/\\\1/g'
So given an arbitrary string $STRING we could write something like:
re=$(sed 's/\([\[/({.*+^$?]\)/\\\1/g' <<< "$STRING")
sed "/$re/d" FILE
or as a one-liner:
sed "/$(sed 's/\([\[/({.*+^$?]\)/\\\1/g' <<< "$STRING")/d"
with variations as described elsewhere on this page.
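For instance, with GNU sed and a made-up string containing regex metacharacters:
STRING='a.b[c]'
re=$(sed 's/\([\[/({.*+^$?]\)/\\\1/g' <<< "$STRING")
echo "$re"
This prints a\.b\[c], which is safe to embed in the /.../d address.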
cat filename | grep -v "pattern" > filename.1
mv filename.1 filename
You can use good old ed to edit a file in a similar fashion to the answer that uses ex. The big difference in this case is that ed takes its commands via standard input, not as command line arguments like ex can. When using it in a script, the usual way to accommodate this is to use printf to pipe commands to it:
printf "%s\n" "g/pattern/d" w | ed -s filename
or with a heredoc:
ed -s filename <<EOF
g/pattern/d
w
EOF
This solution does the same operation on multiple files:
for file in *.txt; do grep -v "Matching Text" "$file" > temp_file.txt; mv temp_file.txt "$file"; done
I found most of the other answers not useful for me. If you use vim, this is very easy and straightforward:
:g/<pattern>/d

Optimize sed for multiple replacements

I have a file, users.txt, with words like,
user1
user2
user3
I want to find these words in another file, data.txt, and add a prefix to them. data.txt has nearly 500K lines. For example, user1 should be replaced with New_user1, and so on. I have written a simple shell script like:
for user in `cat users.txt`
do
sed -i 's/'${user}'/New_&/' data.txt
done
For ~1000 words, this program takes minutes to run, which surprised me because sed is very fast when it comes to find and replace. I tried the suggestions in Optimize shell script for multiple sed replacements, but did not see much improvement.
Is there any other way to make this process faster?
sed is known to be very fast (probably second only to hand-written C).
Instead of sed 's/X/Y/g' input.txt, try sed '/X/ s/X/Y/g' input.txt. The latter is known to be faster.
Since sed processes one line at a time, you can also run it with parallel (on multi-core CPUs) like this:
cat huge-file.txt | parallel --pipe sed -e '/xxx/ s/xxx/yyy/g'
If you are working with plain ASCII files, you can speed things up by using the C locale, which avoids multibyte character handling:
LC_ALL=C sed -i -e '/xxx/ s/xxx/yyy/g' huge-file.txt
You can turn your users.txt into sed commands like this:
$ sed 's|.*|s/&/New_&/|' users.txt
s/user1/New_user1/
s/user2/New_user2/
s/user3/New_user3/
And then use this to process data.txt, either by writing the output of the previous command to an intermediate file, or with process substitution:
sed -f <(sed 's|.*|s/&/New_&/|' users.txt) data.txt
Your approach goes through all of data.txt for every single line in users.txt, which makes it slow.
If you can't use process substitution, you can use
sed 's|.*|s/&/New_&/|' users.txt | sed -f - data.txt
instead.
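awk can also do all the replacements in a single pass: read users.txt into an array first, then prefix every occurrence while scanning data.txt once. A sketch, assuming the user names are safe to use as regexes:
awk 'NR==FNR { users[$0]; next } { for (w in users) gsub(w, "New_" w) } 1' users.txt data.txt > data.txt.new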
Or, in one go, we can do something like this. Let us say we have a data file with 500K lines.
$> wc -l data.txt
500001 data.txt
$> ls -lrtha data.txt
-rw-rw-r--. 1 gaurav gaurav 16M Oct 5 00:25 data.txt
$> head -2 data.txt ; echo ; tail -2 data.txt
0|This is a test file maybe
1|This is a test file maybe
499999|This is a test file maybe
500000|This is a test file maybe
Let us say that our users.txt has 3-4 keywords which are to be prefixed with "ab_" in the file data.txt:
$> cat users.txt
file
maybe
test
So we want to read users.txt and, for every word, change that word to a new word, e.g. "file" to "ab_file", "maybe" to "ab_maybe", and so on.
We can run a while loop that reads the input words one by one and runs a perl command over the file, with the current word passed to perl as ${word}.
I timed this task and it completes fairly quickly. I ran it on a CentOS 7 VM hosted on my Windows 10 machine.
time cat users.txt | while read word; do perl -pi -e "s/${word}/ab_${word}/g" data.txt; done
real 0m1.973s
user 0m1.846s
sys 0m0.127s
$> head -2 data.txt ; echo ; tail -2 data.txt
0|This is a ab_test ab_file ab_maybe
1|This is a ab_test ab_file ab_maybe
499999|This is a ab_test ab_file ab_maybe
500000|This is a ab_test ab_file ab_maybe
In the above code, we read the words test, file and maybe, and changed them to ab_test, ab_file and ab_maybe in the data.txt file. The head and tail output confirms the operation.

log parsing with sed or grep

I want to grab data from this kind of log.
Nov 12 13:46:14 Home cxxd[8892]: 208 11/12 13:46:14| qc=IN (1), qt=A (1), query="www.yahoo.com."
I implemented the command below, which gives me the URL, but it does not work with tail -f, which would let me monitor just the URLs live.
tail -100 /var/log/system.log | grep "query=" | sed -e "s/.*query=//" | sed -e "s/\"//g" | sed -e "s/.$/ /"
Please suggest improvements or enhancements.
I expect your multiple sed scripts do work with tail -F output, just not as you expect.
The C standard IO libraries will perform buffering to improve performance. The IO library can do (a) no buffering (b) line-buffering (c) block-buffering. The line-buffering is normally chosen if the output is going to a terminal. But if the output is going to a file or pipe, then block buffering is normally chosen. (It's more complicated than this -- the behavior changes if the file descriptor in question is being used for stdout or stderr or another file. See setvbuf(3) for full details.)
So, while the block-buffering you're seeing now is probably better for performance, it does mean you can wait a while before ever seeing any output, as each command will eventually accumulate a block of data. At least grep(1) allows the --line-buffered command line option to use line-buffering -- and sed(1) allows the --unbuffered command line option to flush output buffers more often. So try this:
tail -f /var/log/system.log | grep --line-buffered "query=" | sed -u -e "s/.*query=//" | sed -u -e "s/\"//g" | sed -u -e "s/.$/ /"
(I didn't find any similar options for tail(1), but even if it sends blocks of data to the others, the changes to grep(1) and sed(1) will drastically help.)
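On GNU systems, stdbuf from coreutils can also force line-buffered output for tools that have no such flag of their own, for example tr:
tail -f /var/log/system.log | grep --line-buffered "query=" | stdbuf -oL tr -d '"'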
Try reducing the number of pipes by replacing multiple calls to grep and sed to one with awk:
tail -f /var/log/system.log | awk -F'=' '/query=/ { sub(/^"/, "", $NF); sub(/."$/, "", $NF); print $NF }'
...which takes every line matching "query=" and grabs everything after the last '=', replaces the first '"' and the trailing '."' and prints the result.
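For example, feeding it the sample line from the question:
echo 'Nov 12 13:46:14 Home cxxd[8892]: 208 11/12 13:46:14| qc=IN (1), qt=A (1), query="www.yahoo.com."' | awk -F'=' '/query=/ { sub(/^"/, "", $NF); sub(/."$/, "", $NF); print $NF }'
prints www.yahoo.com.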
Try tail -f with grep's --line-buffered argument.
