To process a batch of data and get it ready to insert into our database, we generate a bunch of shell scripts. Each of them has about 15 lines, one for each table the data is going into. On a recent import batch, some of the import files failed going into one particular table. So I have a bunch of shell scripts (about 600) where I need to comment out the first 7 lines and then rerun the file. There are about 6000 shell scripts in this folder, and nothing about a particular file can tell me whether it needs the edit. I've got a list of the affected files, which I pulled from the database output.
So how do I write a bash script (or anything else that would work better) to take this list of file names and for each of them, comment out the first 7 lines, and run the script?
EDIT:
#!/usr/bin/env sh
cmd1
cmd2
cmd3
cmd4
cmd5
cmd6
cmd7
cmd8
Not sure how readable that is. Basically, the first 7 lines (not counting the shebang line) need to have a # added to the beginning of them. Note: the lines have been shortened here and were partially cut off when copying out of Vim, but in the main part of each file there is a line starting with echo, then a line starting with sqlldr.
Using sed, you can specify a line number range in the file to be changed.
#!/bin/bash
while read -r line
do
    # comment out lines 2 through 8 (the 7 lines after the shebang) into a new copy
    sed '2,8 s/^/#/' "$line" > "$line.new"
    # run the edited copy; don't use exec here, or the loop would stop after the first file
    sh "$line.new"
done < "filelist.txt"
You may wish to test this before running it on all of those scripts...
EDIT: changed the line numbers to reflect the comments.
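For example, to preview the change on a single script before rewriting anything (the file name here is just a placeholder):
sed '2,8 s/^/#/' one_of_the_scripts.sh | head -12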
Roughly speaking:
#!/bin/sh
for file in "$@"
do
    out=/tmp/$file.$$
    sed '2,8 s/^/#/' < "$file" > "$out"
    $SHELL "$out"
    rm -f "$out"
done
Assuming you don't care about checking for race conditions etc.
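If you do care, one option is to let mktemp pick a safe temporary name instead of the predictable /tmp/$file.$$; a sketch, otherwise identical to the loop above:
#!/bin/sh
for file in "$@"
do
    out=$(mktemp) || exit 1
    sed '2,8 s/^/#/' < "$file" > "$out"
    $SHELL "$out"
    rm -f "$out"
done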
ex seems made for what you want to do.
For instance, for editing one file, with a here document:
#!/bin/sh
ex test.txt << END
1,12s/^/#/
wq
END
That'll comment out the first 12 lines in "test.txt". In place of test.txt you could use "$FILE" or similar (including the quotes!).
Then run them the usual way, i.e. ./"$FILE"
Edit: $SHELL "$FILE" is probably a better way to run them (suggested by one of the commenters above).
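Tying that together with the file list from the question, a sketch (filelist.txt is assumed to hold one script path per line, as in the sed answers above, and lines 2 through 8 are the ones to comment out):
#!/bin/sh
while read -r FILE
do
ex "$FILE" << 'END'
2,8s/^/#/
wq
END
$SHELL "$FILE"
done < filelist.txt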
Ultimately you're going to want to use the Linux command sed. Whatever logic you wrap around it in your script is up to you, but the script will ultimately call sed. http://lowfatlinux.com/linux-sed.html
I have a while loop in a bash script:
Example:
while read LINE
do
echo $LINE >> $log_file
done < ./sample_file
My question is: why, when I delete sample_file while the script is running, doesn't the loop end? I can see that log_file is still being updated. How can the loop continue when there is no input?
In unix, a file isn't truly deleted until the last directory entry for it is removed (e.g. with rm) and the last open file handle for it is closed. See this question (especially MarkR's answer) for more info. In the case of your script, the file is opened as stdin for the while read loop, and until that loop exits (or closes its stdin), rming the file will not actually delete it off disk.
You can see this effect pretty easily if you want. Open three terminal windows. In the first, run the command cat >/tmp/deleteme. In the second, run tail -f /tmp/deleteme. In the third, after running the other two commands, run rm /tmp/deleteme. At this point, the file has been unlinked, but both the cat and tail processes have open file handles for it, so it hasn't actually been deleted. You can prove this by typing into the first terminal window (running cat), and every time you hit return, tail will see the new line added to the file and display it in the second window.
The file will not actually be deleted until you end those two commands (Control-D will end cat, but you need Control-C to kill tail).
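If lsof is available, you can also confirm that the handles are still there while cat and tail are running; on Linux, +L1 selects open files whose link count is zero, i.e. unlinked but still open:
lsof +L1 | grep deleteme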
See "Why file is accessible after deleting in unix?" for an excellent explanation of what you are observing here.
In short...
Underlying rm and any other command that may appear to delete a file there is the system call unlink. And it's called unlink, not remove or deletefile or anything similar, because it doesn't remove a file. It removes a link (a.k.a. directory entry) which is an association between a file and a name in a directory.
You can use the truncate command to destroy the actual contents (or shred if you need to be more secure), which would make your example loop end at its next read.
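For example, either of these empties the file in place without unlinking it, so the loop's next read hits end-of-file (sample_file is the file from the question):
truncate -s 0 sample_file
: > sample_file    # same effect using plain shell redirection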
The moment the shell starts the while loop it has already opened sample_file, and from then on it does not matter whether the directory entry still exists: the open file descriptor keeps the contents readable.
Test script:
$ cat test.sh
#!/bin/bash
while read line
do
echo $line
sleep 1
done < data_file
Test file:
$ seq 1 10 > data_file
Now run the script in one terminal; in another terminal, delete data_file. You will still see the numbers 1 through 10 printed by the script.
I'm writing commands that do something like ./script > output.txt so that I can use the files in later scripts like ./script2 output.txt otherFile.txt > output2.txt. I remove them all at the end of the script, but when I'm testing certain things or debugging, it's tricky to search through all the subdirectories and files that the script has created.
Is the best option just to create a hidden file?
As always, there are numerous ways to do so. If you want to avoid files altogether, you can save the output (STDOUT) of a command in a variable and pass it to the next command as a file using process substitution, <():
output=$(cat /usr/include/stdio.h)
cat <(echo "$output")
Alternatively, you can do so in a single command line:
cat <(cat /usr/include/stdio.h)
This assumes that the next command strictly requires a file for input.
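Applied to the example in the question (script, script2, otherFile.txt and output2.txt are the names used there), the intermediate output.txt can be dropped entirely in bash:
./script2 <(./script) otherFile.txt > output2.txt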
Unless large amounts of data have to be processed, I tend to avoid temporary files whenever possible; that also eliminates the need for a cleanup step that has to run in every case.
I have a bash script that prints a line of text into a file, and then calls a second script that prints some more data into the same file. Let's call them script1.sh and script2.sh. The reason it's split into two scripts is that I have different versions of script2.sh.
script1.sh:
rm -f output.txt
echo "some text here" > output.txt
source script2.sh
script2.sh:
./read_time >> output.txt
./run_program
./read_time >> output.txt
Variations on the three lines in script2.sh are repeated.
This seems to work most of the time, but every once in a while the file output.txt does not contain the line "some text here". At first I thought it was because I was calling script2.sh like this: ./script2.sh. But even using source the problem still occurs.
The problem is not reproducible, so even when I try to change something I don't know if it's actually fixed.
What could be causing this?
Edit:
The scripts are very simple. script1 is exactly as you see here, but with different file names. script2 is what I posted, but with the same 3 lines repeated, and ./run_program can have different arguments. I did a grep for the output file and for >, but it doesn't show up anywhere unexpected.
The way these scripts are used is that script1 is created by a program (the only difference between the versions is the source script2.sh line). This script1.sh is then run on a different computer (Linux on an FPGA, actually) using ssh. Before that is done, the output file is also deleted using ssh. I don't know why; I didn't write all of this. Also, I've checked the code running on the host. The only mention of the output file is when it is deleted using ssh and when it is copied back to the host after script1 is done.
Edit 2:
I finally managed to make the problem reproducible at a reasonable rate by stripping script2.sh of everything but a single line printing into the file. This also let me do the testing a bit faster. Once I had this I got the problem between 1 and 4 times for every 10 runs. Removing the command that was deleting the file over ssh before the script was run seems to have solved the problem. I will test it some more to be sure, but I think it's solved. Although I'm still not sure why it would be a problem. I thought that the ssh command would not exit before all the remove commands were executed.
It is hard to tell without seeing the real code. The most likely explanation is that you have a typo, > instead of >>, somewhere in one of the script2.sh files.
To verify this, set the noclobber option with set -o noclobber. The shell will then refuse to overwrite an existing file with > and report an error, so the offending command will reveal itself.
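A quick illustration (somefile is just an example and assumed not to exist beforehand):
set -o noclobber
echo first > somefile     # succeeds, the file did not exist yet
echo second > somefile    # fails instead of silently truncating the file
echo second >> somefile   # appending is still allowed
echo second >| somefile   # >| explicitly overrides noclobber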
Another possibility is that the file is removed under certain rare conditions. Or it is damaged by some command that has random access to it; look for commands using this file without >>. Or it is used by some command as both input and output, so the two step on each other; look for the file used with <.
Lastly, you could have a race condition with a command writing to the file in the background, started before that echo.
Can you grep all your scripts for 'output.txt'? What about scripts called inside read_time and run_program?
It looks like something in one of the script2.sh scripts must be either overwriting, truncating or doing a substitution on output.txt.
For example, there could be a '> output.txt' buried inside a conditional branch that rarely triggers. Just a guess, but it would explain why you don't always see it.
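A recursive grep that flags single > redirections to the file while skipping the legitimate >> appends might help narrow it down (the pattern is only a starting point):
grep -rnE '(^|[^>])>[[:space:]]*output\.txt' .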
This is an interesting problem. Please post the solution when you find it!
I wrote the following code
var=0
cat $file | while read line; do
var=$line
done
echo $var
Now as I understand it, the pipe (|) will cause a subshell to be created, and therefore the variable var set on line 1 still has that same value on the last line; the assignments made inside the loop are lost.
However this will solve it:
var=0
while read line; do
var=$line
done < $file
echo $var
My question is why does the redirect not cause a subshell to be created, or if you like why does pipe cause one to be created?
Thanks
The cat command is an external command, which means it needs its own process with its own STDIN and STDOUT. You're basically taking the STDOUT produced by the cat command and piping it into the while loop, which therefore runs in a subshell of its own.
When you use redirection, you're not using a separate process. Instead, you're merely redirecting the STDIN of the while loop from the console to the lines of the file.
Needless to say, the second way is more efficient. In the old Usenet days, before all of you little whippersnappers got ahold of our Internet (Hey you kids! Get off of my Internet!) and destroyed it with your fancy graphics and all them web pages, some people used to give out the Useless Use of Cat award to people who posted to the comp.unix.shell group with a spurious cat command, because the use of cat is almost never necessary and usually less efficient.
If you're using a cat in your code, you probably don't need it. The cat command comes from concatenate and is supposed to be used only to concatenate files together. For example, back when we used SneakerNet on 800K floppies, we would have to split up long files with the Unix split command and then use cat to merge them back together.
A pipe is there to hook the stdout of one program to the stdin of another one. Two processes, possibly two shells. When you do redirection (> and <), all you're doing is remapping stdin (or stdout) to a file. Reading/writing a file can be done without another process or shell.
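If the input really does have to come from a command rather than a plain file, bash's process substitution keeps the loop in the current shell, so the variable survives; a sketch, where the grep is just a placeholder command:
var=0
while read -r line
do
    var=$line
done < <(grep pattern "$file")
echo $var    # reflects the last matching line, because the loop ran in the current shell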
UPDATE: this is a repost of How to make shell scripts robust to source being changed as they run
This is a little thing that bothers me every now and then:
I write a shell script (bash) for a quick and dirty job
I run the script, and it runs for quite a while
While it's running, I edit a few lines in the script, configuring it for a different job
But the first process is still reading the same script file and gets all screwed up.
Apparently, the script is interpreted by loading each line from the file as it is needed. Is there some way that I can have the script indicate to the shell that the entire script file should be read into memory all at once? For example, Perl scripts seem to do this: editing the code file does not affect a process that's currently interpreting it (because it's initially parsed/compiled?).
I understand that there are many ways I could get around this problem. For example, I could try something like:
cat script.sh | sh
or
sh -c "`cat script.sh`"
... although those might not work correctly if the script file is large and there are limits on the size of stream buffers and command-line arguments. I could also write an auxiliary wrapper that copies a script file to a locked temporary file and then executes it, but that doesn't seem very portable.
So I was hoping for the simplest solution that would involve modifications only to the script, not the way in which it is invoked. Can I just add a line or two at the start of the script? I don't know if such a solution exists, but I'm guessing it might make use of the $0 variable...
The best answer I've found is a very slight variation on the solutions offered to How to make shell scripts robust to source being changed as they run. Thanks to camh for noting the repost!
#!/bin/sh
{
# Your stuff goes here
exit
}
This ensures that all of your code is parsed initially; note that the 'exit' is critical to ensuring that the file isn't accessed later to see if there are additional lines to interpret. Also, as noted on the previous post, this isn't a guarantee that other scripts called by your script will be safe.
Thanks everyone for the help!
Use an editor that doesn't modify the existing file, and instead creates a new file then replaces the old file. For example, using :set writebackup backupcopy=no in Vim.
How about tackling it from the editing side?
If the script is running, before editing it, do this:
mv script script-old
cp script-old script
rm script-old
Since the shell keeps the file open, everything will work okay as long as you don't change the contents of the open inode.
The above works because mv will preserve the old inode while cp will create a new one. Since a file's contents will not actually be removed if it is opened, you can remove it right away and it will be cleaned up once the shell closes the file.
According to the bash documentation, if instead of
#!/bin/bash
body of script
you try
#!/bin/bash
script=$(cat <<'SETVAR'
body of script
SETVAR
)
eval "$script"
then I think you will be in business.
Consider creating a new bang path for your quick-and-dirty jobs. If you start your scripts with:
#!/usr/local/fastbash
or something, then you can write a fastbash wrapper that uses one of the methods you mentioned. For portability, one can just create a symlink from fastbash to bash, or have a comment in the script saying one can replace fastbash with bash.
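A sketch of what such a wrapper could look like (the path and behaviour are hypothetical; it simply copies the script to a temporary file and runs the copy):
#!/bin/sh
# hypothetical /usr/local/fastbash wrapper
tmp=$(mktemp) || exit 1
cp "$1" "$tmp"
shift
bash "$tmp" "$@"
status=$?
rm -f "$tmp"
exit "$status"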
If you use Emacs, try M-x customize-variable break-hardlink-on-save. Setting this variable will tell Emacs to write to a temp file and then rename the temp file over the original instead of editing the original file directly. This should allow the running instance to keep its unmodified version while you save the new version.
Presumably, other semi-intelligent editors would have similar options.
A self contained way to make a script resistant to this problem is to have the script copy and re-execute itself like this:
#!/bin/bash
if [[ $0 != /tmp/copy-* ]] ; then
    rm -f /tmp/copy-$$
    cp "$0" /tmp/copy-$$
    exec /tmp/copy-$$ "$@"
    echo "error copying and execing script"
    exit 1
fi
rm "$0"
# rest of script...
(This will not work if the original script's path begins with /tmp/copy-.)
(This is inspired by R Samuel Klatchko's answer)