Using Shell to Check Whether a File Exists, and only if it does, Execute a Set of Commands - shell

I have a few lines of code in Stata. I'd like the lines to be executed only if the .txt file to which they refer already exists. I am wondering whether there is a shell command for this that I can embed in an if statement.
For example, might something like the following exist and be possible:
insheet using "file.txt" if ('file.txt')
My intent is to say insheet the file file.txt only if it exists. My concern is that the program would otherwise stop, fail, die, or whatever you call it due to a syntax error if I have that insheet statement but the file does not exist.

The immediate answer is no. There is nothing like that syntax, for several reasons.
The if qualifier tests whether some condition is true separately for each observation, and whether a file exists is not an appropriate condition to test observation by observation.
The quite different if command tests once and once only whether something is true and might seem more appropriate. In practice it is not used for this purpose, but to learn more, see help ifcmd.
Stata has no special syntax based on paired identical single quotes ' '.
However, Stata provides a separate construct here:
confirm file file.txt
In practice that is going to stop a do-file or program whenever the file does not exist. A general scheme to catch the error is something like:
capture confirm file file.txt
if _rc == 0 insheet using file.txt
else {
    // code to run if the file does not exist
}
capture is to be thought of as eating the return code from the confirm command. In general the return code _rc from any command is 0 if the command was valid and executed and some non-zero value otherwise. Sometimes one tests for a specific non-zero code. Experiment shows that file not found is return code 601. The main reason for looking up error codes (in [P] error) is to deliver official-looking error messages, but in practice knowing the zero/non-zero rule is the main detail under this heading.
The example here uses == to test for equality.
Note that insheet using file.txt is not strictly a syntax error if the file does not exist. As far as Stata's language is concerned, that is legal syntax. However, that is a fine distinction: it is an error in every ordinary sense.
(LATER) It would be possible to short-circuit the entire process:
capture insheet using file.txt
if _rc != 0 {
    // code to run if the file does not exist
}
as in this case the non-existence of the file is the presumed explanation for any failure of the insheet command. If, however, the insheet call were more complicated, with a varlist and/or options, then failure of the command could arise for other reasons. So in general separating out a check for the existence of the file seems a better strategy.

The confirm command has what you're looking for.
capture confirm file "file.txt"
if !_rc {    // confirm sets _rc to 0 when the file exists
    insheet using "file.txt"
}
Alternatively, you could put a capture before the insheet command, which will catch the error (strictly speaking, a missing file is not a syntax error). Check the [P] manual for more on capture and confirm.
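The question title asks about the shell, so for comparison here is a POSIX shell sketch of the same two strategies: check for the file first, or just attempt the command and branch on its exit status (0 for success, non-zero otherwise, much like Stata's _rc). The function names are hypothetical, and cat stands in for whatever command should really run.

```shell
# load_if_exists FILE: check first, then run the command only if FILE exists.
# [ -f ] tests for a regular file; use [ -e ] to accept any kind of file.
load_if_exists() {
    if [ -f "$1" ]; then
        cat -- "$1"                       # stand-in for the real processing step
    else
        printf 'file not found: %s\n' "$1" >&2
        return 1
    fi
}

# try_load FILE: just try the command and inspect its exit status.
# Any failure is treated as "file missing", with the same caveat as above:
# a more complicated command could fail for other reasons.
try_load() {
    if ! cat -- "$1" 2>/dev/null; then
        printf 'could not read: %s\n' "$1" >&2
        return 1
    fi
}
```

For example, `load_if_exists file.txt || echo "skipping import"` runs the fallback only when the file is absent.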

Related

Assign BASH variable from file with specific criteria

I have a config file whose last line contains data: I want to assign everything to the RIGHT of the = sign to a variable that I can display and use later in the script.
Example: /path/to/magic.conf:
foo
bar
ThisOption=foo.bar.address:location.555
What would be the best method in a bash shell script to read the last line of the file and assign everything to the right of the equal sign? In this case, foo.bar.address:location.555.
The last line always has what I want to target and there will only ever be a single = sign in the file that happens to be the last line.
Google and searching here yielded many close but not quite relevant results using sed/awk, but I couldn't come up with exactly what I'm looking for.
Use sed:
variable=$(sed -n 's/^ThisOption=//p' /path/to/magic.conf)
echo "The option is: $variable"
This works by finding and removing the ThisOption= marker at the start of the line, and printing the result.
IMPORTANT: This method absolutely requires that the file be trusted 100%. As mentioned in the comments, anytime you "eval" code without any sanitization there are grave risks (a la "rm -rf /" magnitude - don't run that...)
Pure, simple bash. (well...using the tail utility :-) )
The advantage of this method is that it only requires you to know that the assignment is on the last line of the file; it does not require you to know anything else about that line (such as the name of the variable to the left of the = sign, which you'd need in order to use the sed option).
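assignment_line=$(tail -n 1 /path/to/magic.conf)   # e.g. ThisOption=foo.bar.address:location.555
eval "$assignment_line"                            # defines ThisOption in the current shell
var_name=${assignment_line%%=*}                    # text left of the first '=': ThisOption
var_to_give_that_value=${!var_name}                # bash indirect expansion: the value of $ThisOption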
Of course, if the var that you want to have the value is the one that is listed on the left side of the "=" in the file then you can skip the last assignment and just use "${!var_name}" wherever you need it.
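If eval feels too risky, a minimal sketch that combines the tail idea with plain parameter expansion avoids executing anything from the file. It assumes, as the question states, that the wanted value follows the first = on the last line; the function name get_last_value is made up for illustration.

```shell
# get_last_value FILE: print everything after the first '=' on FILE's last line.
# Nothing from the file is executed, so an untrusted file cannot run commands.
get_last_value() {
    last_line=$(tail -n 1 "$1") || return 1
    # ${var#*=} removes the shortest prefix ending in '=' (the name and the '=').
    # If the line has no '=', the whole line is printed unchanged.
    printf '%s\n' "${last_line#*=}"
}
```

Usage: `variable=$(get_last_value /path/to/magic.conf)` followed by `echo "The option is: $variable"`. Unlike the sed version it does not need to know the option name, and unlike the eval version it is plain POSIX sh.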

'no such file or directory' on a file that isn't accessed

I'm writing a small Ruby script that does a statistical analysis on a list of names generated by another script of mine.
When I run it with this command:
ruby [first script] [args] | ruby -- [second script] _
it throws this error:
./name_gen_test.rb:15:in `gets': No such file or directory @ rb_sysopen - _ (Errno::ENOENT)
from ./name_gen_test.rb:15:in `gets'
from ./name_gen_test.rb:15:in `<main>'
(Apologies for typos; Powershell wouldn't let me copy/paste)
This is line 15:
until (cur_line = gets).nil?
Then there's the body of a loop, the rest of the code, etc. However, if I put this line:
gets
as the very first line, I get the same error. In fact, if I totally empty the file and have nothing but a call to gets, I get the error that the file '_' cannot be found.
How can I make it understand that '_' is a command line argument and not a file to be... read from, I guess? Why doesn't gets work like I expect it to (i.e. reading from the standard input)?
I'm running it with Powershell, if that makes a difference.
Sorry if this is a duplicate; simply Googling the error message leads to a dozen different issues and a dozen different solutions, none of which apply, and I couldn't figure out how to put this problem into a Google query.
STDIN.gets will do what you want. By default, gets is (pretty much) equivalent to ARGF.gets. ARGF reads from standard input if ARGV is empty, and from the files named in ARGV if it is not. Your _ ends up in ARGV, so gets tries to open a file called _.
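The same convention is common among Unix filters, which may make the error easier to picture. As a rough shell analogy (not Ruby), the hypothetical function below uses sed, which likewise reads the named files when given arguments and falls back to standard input only when given none.

```shell
# first_line [FILE...]: print the first line of the input.
# With no arguments sed reads standard input; with arguments it opens them
# as files, so a stray "_" is treated as a file name and the read fails.
first_line() {
    sed -n '1p' "$@"
}

# printf 'Ann\nBob\n' | first_line      -> reads the pipe, prints Ann
# printf 'Ann\nBob\n' | first_line _    -> tries to open a file named _ and fails
```

In the Ruby case that means either using STDIN.gets as suggested, or, if the script does not need it, dropping the trailing _ so that ARGV is empty and gets falls back to standard input.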

using sylfilter with procmail

I have been using sylfilter for over a year now (it is available from http://sylpheed.sraoss.jp/sylfilter/) and it works great as a filtering tool (no complaints). However, I have been trying to use procmail with sylfilter, but have been having a lot of trouble.
The web page for the filter shows:
sylfilter ~/Mail/inbox/1234
as the example to classify a message.
The return values are as follows:
0 junk (spam)
1 clean (non-spam)
2 uncertain
127 other errors
I have been trying to incorporate sylfilter with procmail but not with much success. The big issue, compared with other spam tools like bogofilter, is that sylfilter does not make any changes to the e-mail message itself (unlike bogofilter, for which examples abound on the web, and which puts an X-Bogosity field in the message header). I want everything that is classified as Junk to go to $HOME/Mail/Junk and everything that is not to be further sorted into folders by my other procmail rules. Perhaps the stuff that returns 2 can go to $HOME/Mail/uncertain.
Here is my latest attempt based on suggestions made in the Fedora mailing list.
:0 Wc
| /usr/bin/sylfilter /dev/stdin
:0 a
$HOME/Mail/Junk/.
However, this does not process the e-mail message using sylfilter (and the logfile says "No input file." before going on to process the other rules). So, I was wondering if anyone here knew of a similar case and knew the answer to this question.
I am not familiar with sylfilter, and the (somewhat vague) problem description makes me think there is something wrong with feeding it a message on standard input. But if you can make that work, the following is how you examine a program's exit code in Procmail.
:0
* ? sylfilter /dev/stdin
$HOME/Mail/Junk/.
# You should now have the exit code in $? if you want it for further processing
SYLSTATUS=$?
:0
* SYLSTATUS ?? ^^1^^
$HOME/Mail/INBOX/.
# ... etc
The condition succeeds if sylfilter returns a success (zero) exit code, which per the table above means junk; if it returns anything else, we fall through to subsequent recipes. We save $? to a named variable so that we can examine its value even if a subsequent recipe resets the system global $? by invoking some other external program.
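The three-way split the question asks for (junk, clean, uncertain) can also be prototyped outside Procmail as a plain shell case statement on the exit code. This is a sketch assuming the exit codes listed in the question; the function name is hypothetical, and sending errors (127 or anything unexpected) to the inbox is my assumption, chosen so that mail is not silently lost.

```shell
# folder_for_status CODE: print the destination folder for a sylfilter exit
# code, using the codes from the question (0 junk, 1 clean, 2 uncertain,
# 127 other errors).
folder_for_status() {
    case "$1" in
        0) printf '%s\n' "$HOME/Mail/Junk" ;;
        1) printf '%s\n' "$HOME/Mail/INBOX" ;;
        2) printf '%s\n' "$HOME/Mail/uncertain" ;;
        *) printf '%s\n' "$HOME/Mail/INBOX" ;;   # errors: keep the mail visible
    esac
}
```

Typical use would be `sylfilter "$msgfile"` followed immediately by `folder=$(folder_for_status "$?")`, where msgfile is a hypothetical path to a saved message; as with SYLSTATUS, read $? before running anything else.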
By the by, you should not need to hard-code the path to sylfilter. If it's in a nonstandard location, amend the PATH at the beginning of your .procmailrc rather than littering your code with explicit paths to executables. So if it's in /usr/local/really/sf/sylfilter, you'd put
PATH=/usr/local/really/sf:$PATH
If you need the message in a temporary file, try something like this:
TMP=`mktemp -t sylf.XXXXXXXX`
TRAP='rm -f $TMP'
:0c
$TMP
:0
* ? sylfilter $TMP
$HOME/Mail/Junk/.
# etc as above
The mktemp command creates a unique temporary file. The TRAP assignment sets up a command sequence to run when Procmail terminates; this takes care of cleaning out the temporary file when we are done. Because we will be the only writer to this file, we don't care about locking while writing a copy of the message to this file.
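Outside Procmail, the same copy-to-a-temporary-file idea looks roughly like this as a shell function. This is a sketch: the function name is hypothetical, and cleanup is an explicit rm after the command rather than Procmail's TRAP. The command's exit code is passed through unchanged so that it can still be inspected.

```shell
# run_on_tempcopy CMD [ARGS...]: save standard input to a unique temporary
# file, run "CMD ARGS... TMPFILE", delete the file, and return CMD's status.
run_on_tempcopy() {
    tmp=$(mktemp) || return 127
    cat > "$tmp"
    "$@" "$tmp"
    status=$?
    rm -f "$tmp"
    return "$status"
}
```

For example, `run_on_tempcopy sylfilter < message.eml` (message.eml being a made-up file name) leaves sylfilter's exit code in $? for the caller.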
For more nitty-gritty syntax details, see also http://www.iki.fi/era/procmail/quickref.html

Shell scripting return values not correct, why?

In a shell script I wrote to test how functions return values, I came across some unexpected behavior. The code below assumes that when entering the function fntmpfile, the first echo statement would print to the console and the second echo statement would actually return the string to the calling main. Well, that's what I assumed, but I was wrong!
#!/bin/sh
fntmpfile() {
TMPFILE=/tmp/$1.$$
echo "This is my temp file dude!"
echo "$TMPFILE"
}
mainname=main
retval=$(fntmpfile "$mainname")
echo "main retval=$retval"
What actually happens is the reverse: the first echo goes to the calling function and the second echo goes to STDOUT. Why is this, and is there a better way?
main retval=This is my temp file dude!
/tmp/main.19121
The whole reason for this test is because I am writing a shell script to do some database backups and decided to use small functions to do specific things, ya know make it clean instead of spaghetti code. One of the functions I was using was this:
log_to_console() {
# arg1 = calling function name
# arg2 = message to log
# '%s' keeps any % characters in the message from being read as format specifiers
printf '%s - %s\n' "$1" "$2"
}
The problem is that the function that was returning a string value also captures the log_to_console output, depending on the order of things. I guess this is one of those shell scripting gotchas that I wasn't aware of.
No, what's happening is that you are running your function, and it outputs two lines to stdout:
This is my temp file dude!
/tmp/main.4059
When you run it inside $(...), the shell captures the output and stores it in the variable. The stored string keeps the first line break but not the trailing one (command substitution strips trailing newlines). So what is really in your "retval" variable is the following C-style string:
"This is my temp file dude!\n/tmp/main.4059"
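A quick way to see exactly what $(...) keeps: in this small sketch (two_lines is a made-up helper), the newline between the two words survives, while all trailing newlines, not just the last one, are stripped.

```shell
# two_lines writes "first" and "second", followed by two extra newlines.
two_lines() { printf 'first\nsecond\n\n\n'; }

captured=$(two_lines)
# The brackets make the boundaries visible.
printf '[%s]\n' "$captured"
```

This prints `[first`, then `second]` on the next line: one embedded newline, nothing after second.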
This is not really returning a string (you can't do that in a shell script); it's just capturing whatever output your function writes to stdout, which is why it doesn't work. Call your function normally, without $(...), if you want to log to the console.
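Another common pattern, not part of the answer above, is to send log messages to standard error so that $(...) captures only the value. A minimal sketch reusing the names from the question:

```shell
# log_to_console writes to stderr (fd 2), which $(...) does not capture,
# so the messages still appear on the terminal.
log_to_console() {
    printf '%s - %s\n' "$1" "$2" >&2
}

fntmpfile() {
    TMPFILE=/tmp/$1.$$
    log_to_console fntmpfile "This is my temp file dude!"
    printf '%s\n' "$TMPFILE"    # the only stdout output, so the only thing captured
}

mainname=main
retval=$(fntmpfile "$mainname")
echo "main retval=$retval"
```

The log line still shows up on the terminal, and retval holds only the path, e.g. main retval=/tmp/main.<pid>.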

How to get Aruba to expand wildcards

I'm writing a simple command line gem.
The library that does the actual work was developed with rspec and so far that works.
I'm trying to test the command line portion with Aruba/Cucumber, but I've come across some strange behaviour.
Just to test this, I've set the binary to puts ARGV, and I've got test files in tmp/aruba.
When I run bundle exec gem_name tmp/aruba/*.* I am presented with the list of shell expanded file names.
Now my features file has:
Given files to work on # I set up files in tmp/aruba in this step
When I run `gem_name *.*` # standard step
Then the output should contain "Wibble"
The last step is obviously going to fail, but it shows me a diff between what it expects and the actual output. Rather than seeing a list of shell expanded filenames, all I get is "*.*"
So I'm left in the position of having an app that actually works as expected, but I can't get the tests to pass. I could take the "." and generate the list of files from there, but then I'm writing extra production code just to get the app to work under test - which I don't think is the correct way to go about it. And all because shell expansion isn't happening.
If you look at my profile, you'll see that Ruby isn't my main bag, feel free to point me at any resources that I may have missed about this, but is this just me missing something, or expected behaviour that somebody knows how to work around?
After a little digging in the Aruba source I figured out that the When I run step ends up in a code block like this:
def run!(&block)
@process = ChildProcess.build(*shellwords(@cmd))
...
begin
@process.start
...
Further digging into ChildProcess ends up here:
def launch_process
...
begin
exec(*@args)
...
And therein lies the problem. exec does not do shell expansion when the argument list is split into multiple array elements:
If exec is given a single argument, that argument is taken as a line that is subject to shell expansion before being executed. If multiple arguments are given, the second and subsequent arguments are passed as parameters to command with no shell expansion.
However, playing with shellwords a bit, we find:
Shellwords.shellwords('gem_name *.*')
=> ["gem_name", "*.*"] # No good
Shellwords.shellwords('"gem_name *.*"')
=> ["gem_name *.*"] # Aha!
Therefore the solution might be as simple as:
When I run `"gem_name *.*"`
If that doesn't work then you are pretty much out of luck. I would suggest you expand the file names manually since you're not really testing shell expansion here - we know that works: you are testing multiple arguments.
Therefore you should instead do:
When I run `gem_name your_file1 your_file2 your_file3`
