I am trying to use taint mode. I want to open a file based on user input and read data from it. Below is my code
#!/usr/bin/perl -w
use strict;
use warnings;
my $name = $ARGV[0];
my $file = "/Desktop/data/$name";
open MYFILE, "$file" or die $!;
while (<MYFILE>) {
chomp;
print "$_\n";
}
close(MYFILE);
Case 1) When I run the file using
perl -w filename.pl input.txt
I am able to read data from the file.
Case 2) When I change the
#!/usr/bin/perl -w
to
#!/usr/bin/perl -T
and run the file using
perl -T filename.pl input.txt
I am still able to read the data.
Case 3) When I change the file to open in write mode and run in taint mode, I get the expected error:
Insecure dependency in open while running with -t switch at test1.pl line 8.
What might be the issue in the case 2 scenario? Or is that correct behavior?
Is it allowed to open a file for reading in taint mode?
This is correct behaviour for taint mode. The documentation specifies:
You may not use data derived from outside your program to affect something else outside your program--at least, not by accident.
[...]
$arg = shift; # $arg is tainted
[...]
If you try to do something insecure, you will get a fatal error saying something like "Insecure dependency" or "Insecure $ENV{PATH}".
(edit: missed some stuff):
Tainted data may not be used directly or indirectly in any command that invokes a sub-shell, nor in any command that modifies files, directories, or processes, with the following exceptions:
Arguments to print and syswrite are not checked for taintedness.
(This is why the read-mode example doesn't complain about the file data.)
Command-line arguments are potentially insecure, and so are tainted until specified otherwise.
To determine whether data is tainted:
To test whether a variable contains tainted data, and whose use would thus trigger an "Insecure dependency" message, you can use the tainted() function of the Scalar::Util module, available in your nearby CPAN mirror, and included in Perl starting from the release 5.8.0.
To untaint data:
[...]the only way to bypass the tainting mechanism is by referencing subpatterns from a regular expression match. Perl presumes that if you reference a substring using $1, $2, etc., that you knew what you were doing when you wrote the pattern. That means using a bit of thought--don't just blindly untaint anything, or you defeat the entire mechanism. It's better to verify that the variable has only good characters (for certain values of "good") rather than checking whether it has any bad characters. That's because it's far too easy to miss bad characters that you never thought of.
(with a warning for use locale):
If you are writing a locale-aware program, and want to launder data with a regular expression containing \w, put no locale ahead of the expression in the same block. See SECURITY in perllocale for further discussion and examples.
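For example, a minimal untainting sketch in the spirit of that advice (the whitelist of "good" characters here is an assumption; adjust it to your own definition of good):

use Scalar::Util qw(tainted);

my $name = $ARGV[0];                  # tainted, per the rules above
if ($name =~ /\A([\w.-]+)\z/) {
    $name = $1;                       # referencing the capture untaints it
} else {
    die "Refusing suspicious filename\n";
}
print "still tainted\n" if tainted($name);   # prints nothing now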
Taint mode thus prevents the following from wiping out your hard drive:
perl script.pl '| rm -rf /'
Solution: Use the form of open that only accepts a file name.
open(my $fh, '<', $ARGV[0]) or die "Can't open $ARGV[0]: $!";
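Applied to the script from the question, a minimal sketch (same paths as the original):

#!/usr/bin/perl -T
use strict;
use warnings;

my $name = $ARGV[0];
my $file = "/Desktop/data/$name";

# With an explicit '<' mode, an argument like '| rm -rf /' is treated as
# an odd filename rather than a command to run.
open(my $fh, '<', $file) or die "Can't open $file: $!";
while (<$fh>) {
    chomp;
    print "$_\n";
}
close($fh);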
I have a script that I call from an application; I can't run it from the command line. I derive the directory the script is called from, and in the next variable go up one level to where my files are stored. From there I have three variables with the full path and file names (with wildcards), which I will refer to as "masks".
I need to find and "do something with" (copy, write their names to a new file, whatever else) each of these masks. The "do something" part isn't my obstacle, as I've done this fine when working with a single mask, but I would like to do it cleanly in a single loop instead of duplicating the loop and referencing each mask separately, if possible.
Assume in my $FILESFOLDER directory below that I have 2 existing files, aaa0.csv & bbb0.csv, but no file matching the ccc*.csv mask.
#!/bin/bash
SCRIPTFOLDER=${0%/*}
FILESFOLDER="$(dirname "$SCRIPTFOLDER")"
ARCHIVEFOLDER="$FILESFOLDER"/archive
LOGFILE="$SCRIPTFOLDER"/log.txt
FILES1="$FILESFOLDER"/"aaa*.csv"
FILES2="$FILESFOLDER"/"bbb*.csv"
FILES3="$FILESFOLDER"/"ccc*.csv"
ALLFILES="$FILES1
$FILES2
$FILES3"
#here as an example I would like to do a loop through $ALLFILES and copy anything that matches to $ARCHIVEFOLDER.
for f in $ALLFILES; do
cp -v "$f" "$ARCHIVEFOLDER" > "$LOGFILE"
done
echo "$ALLFILES" >> "$LOGFILE"
The thing that really spins my head is that when I run something like this (I haven't done it with the copy command in place), the log file at the end shows:
filesfolder/aaa0.csv filesfolder/bbb0.csv filesfolder/ccc*.csv
where I would expect echoing $ALLFILES to just show me the masks:
filesfolder/aaa*.csv filesfolder/bbb*.csv filesfolder/ccc*.csv
In my "do something" area, I need to be able to find the files by their full path/name, with the wildcard, if at all possible. Sometimes my network is down for maintenance and I don't want to risk a failed change of directory. I rarely work in Linux (primarily SQL background), so feel free to poke holes in everything I've done wrong. Thanks in advance!
Here's a light refactoring with significantly fewer distracting variables.
#!/bin/bash
script=${0%/*}
folder="$(dirname "$script")"
archive="$folder"/archive
log="$folder"/log.txt # you would certainly want this in the folder, not $script/log.txt
shopt -s nullglob
all=()
for prefix in aaa bbb ccc; do
cp -v "$folder/$prefix"*.csv "$archive" >>"$log" # append, don't overwrite
all+=("$folder/$prefix"*.csv)
done
echo "${all[@]}" >> "$log"
The change in the loop to append the output of cp -v instead of overwriting is a bug fix; otherwise the log would only contain the output from the last loop iteration.
I would probably prefer to have the files echoed from inside the loop as well, one per line, instead of collecting them all on one humongous line. Then you can remove the array all and instead simply
printf '%s\n' "$folder/$prefix"*.csv >>"$log"
shopt -s nullglob is a Bash extension (so it won't work with sh) which says to discard any wildcard that doesn't match any files (the default behavior is to leave globs unexpanded if they don't match anything). If you want a different solution, perhaps see Test whether a glob has any matches in Bash.
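A quick way to see the difference, in a directory where nothing matches ccc*.csv:

$ echo ccc*.csv     # default: the unmatched glob is left as-is
ccc*.csv
$ shopt -s nullglob
$ echo ccc*.csv     # nullglob: the unmatched glob expands to nothing
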
You should use lower case for your private variables so I changed that, too. Notice also how the script variable doesn't actually contain a folder name (or "directory" as we adults prefer to call it); fixing that uncovered a bug in your attempt.
If your wildcards are more complex, you might want to create an array for each pattern.
tmpspaces=(/tmp/*\ *)
homequest=($HOME/*\?*)
for file in "${tmpspaces[#]}" "${homequest[#]}"; do
: stuff with "$file", with proper quoting
done
The only robust way to handle file names which could contain shell metacharacters is to use an array variable; using string variables for file names is notoriously brittle.
Perhaps see also https://mywiki.wooledge.org/BashFAQ/020
I need to rename the files inside a folder that has a space in its name, e.g. (Deco/main library/file1.txt).
code:
while IFS="," read orig new pat
do
mv -v $pat$new $pat$orig
done < new.csv
csv file:
newname,file1.txt,Deco/main\\\ library/
error:
mv: invalid option -- '\'
Welcome to Stack Overflow!
First: Use quotes around your uses of variables. That means, except in very rare occasions, you should always write "$foo" instead of $foo, because with the latter the shell will interpret spaces in the variable as word delimiters, which you rarely want, and especially not in your case.
Second: Your CSV file seems to contain backslashes to quote the spaces, and some additional step seems to have added another level of quoting, so that you now end up with three backslashes and a space for each original space. If this really is the case (please double-check that what you wrote in your question is correct; otherwise my answer doesn't fit), you need to unquote this before you can use it.
There are security issues involved in using eval, so do not use it lightly (this disclaimer is necessary whenever proposing eval), but if you trust the input you are handling not to contain anything nasty, then you can do it with this code:
while IFS="," read orig new pat
do
eval eval mv -v "$pat$new" "$pat$orig"
done < new.csv
Using this, two levels of quoting are evaluated (that's what eval does) before the mv command is executed.
I strongly suggest doing a dry run first by adding echo before the mv. Then, instead of being executed, your commands are merely printed.
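A sketch of such a dry run; note that the echo goes after both evals, so you see the command exactly as it would finally be executed:

while IFS="," read orig new pat
do
eval eval echo mv -v "$pat$new" "$pat$orig"
done < new.csv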
I am new to scripting languages, and I have a task that says I need to extract the name from a given command-line argument in Perl.
I am calling the Perl script like this
./perl.plx file.txt
and I need to get only the base file name, not the whole file.txt.
The command line arguments to a Perl script appear in @ARGV, described in perldoc perlvar.
Parsing filenames seems trivial, but appearances may be misleading. Perl ships with a module called File::Basename that handles edge cases you might not immediately consider. One edge case that a simple split wouldn't handle is the potential for dots to appear elsewhere in the filename aside from the final suffix.
You can review File::Basename's documentation by typing perldoc File::Basename at the command prompt.
Here is an example:
use strict;
use warnings;
use File::Basename qw(fileparse);
my ($fname, $dirs, $suffix) = fileparse($ARGV[0], qr/\.txt/);
print "Base file name is $fname\n";
print "Suffix is $suffix\n";
print "Path to $fname$suffix is $dirs\n";
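For instance, invoked with a hypothetical path like ./perl.plx data/file.txt, it would print:

Base file name is file
Suffix is .txt
Path to file.txt is data/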
Because this module ships with Perl, you don't need to install anything to use it. In taking advantage of the core Perl modules that ship with every Perl distribution, you leverage best practices and debugging embodied within these tools.
To get the name of the file, you need to extract it from @ARGV and split it on the dot.
my $fileName = (split /\./, $ARGV[0])[0];
Explanation:
split /\./, $ARGV[0] splits on a dot (the character "." is special in regular expressions; it means 'any character', so to match it literally you need to escape it, thus /\./).
(...)[0] takes the first element, i.e. the file name.
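Note the edge case mentioned earlier, though: with more than one dot in the name, this keeps only the part before the first dot, which may not be what you want:

my $fileName = (split /\./, 'archive.tar.gz')[0]; # 'archive', not 'archive.tar'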
I'm setting up a regex learning environment purely in bash/tmux, with a pane for the file containing a regex, a pane for a text file to process, and a pane for the bash shell. I'm at the start of the regex chapter of "The Bastards Book of Ruby".
The 'Bastards Book' shows an example of a 'negative lookahead' regex (perfect, let's learn), where perl is recommended over sed. As I'm going for a CLI approach -> Bash command: $ perl -p file_with_regex.pl test.txt
(This prints the lines from test.txt with the intended substitutions)
Question: How would I add a second regex (on a new line) to the regex.pl file, and have perl execute both the first and this second instruction when processing the text file?
# regex.pl
s/^(?!Mr)/Ms./g
s/Ms./Mrs./g
(Adding the second regex results in "Execution of regex.pl aborted due to compilation errors.")
The overall aim here is to progress in Ruby while testing regular expressions as concisely as possible. Picking up a bare minimum of sed/perl while doing so would be a plus, as a proper dive into perl would take time away from Ruby (and when it's time for the perl dive, I'll have had some time with the basics). The more I look at this, the more it seems necessary to just do it in Ruby, if there isn't a perl switch that would enable a command-line-with-files approach.
The basic answer is that you need a semicolon after each line.
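With the semicolons in place, the file compiles:

# regex.pl
s/^(?!Mr)/Ms./g;
s/Ms./Mrs./g;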
Paraphrased from perlrun: -p reads all lines of input, runs the commands you specified, and then prints out the value of $_ (the implicit variable your substitution commands operate on in this script).
So, removing the magic, -p transformed your code into:
LINE:
while (<>) {
# regex.pl
s/^(?!Mr)/Ms./g
s/Ms./Mrs./g
} continue {
print or die "-p destination: $!\n";
}
Perl requires a semicolon between statements (though a terminating semicolon at the end of a block is optional), hence the error.
I personally would recommend writing the whole script above into the file instead of using -p, because it is far less magical, but you're welcome to do it either way.
If you were going to write the whole script, I would recommend something more like the following:
use strict;
use warnings;
while ( my $line = <ARGV> ) {
$line =~ s/^(?!Mr)/Ms./g;
print "After first subst: $line";
$line =~ s/Ms./Mrs./g;
print "After second subst: $line";
}
use strict and use warnings are the boilerplate you want at the top of any perl script (to catch typos and other common mistakes), and explicitly naming the variable $line gives you a better understanding of how the script works ($_ is very magical for beginners and the source of many errors IMO, but great when you know what's what).
If you're wondering about <> vs. <ARGV>: they are the same thing and mean "read through all the lines of the files provided as command-line arguments to this script, or standard input if no files are provided."
I tried to play with Strawberry Perl, and one of the things that stumped me was reading files.
I tried to do:
open(FH, "D:\test\numbers.txt");
But it cannot find the file (despite the file being there, and no permission issues).
An equivalent code (100% of the script other than the filename was identical) worked fine on Linux.
As per Perl FAQ 5, you should be using forward slashes in your DOS/Windows filenames (or, as an alternative, escaping the backslashes).
Why can't I use "C:\temp\foo" in DOS paths? Why doesn't `C:\temp\foo.exe` work?
Whoops! You just put a tab and a formfeed into that filename! Remember that within double quoted strings ("like\this"), the backslash is an escape character. The full list of these is in Quote and Quote-like Operators in perlop. Unsurprisingly, you don't have a file called "c:(tab)emp(formfeed)oo" or "c:(tab)emp(formfeed)oo.exe" on your legacy DOS filesystem.
Either single-quote your strings, or (preferably) use forward slashes. Since all DOS and Windows versions since something like MS-DOS 2.0 or so have treated / and \ the same in a path, you might as well use the one that doesn't clash with Perl--or the POSIX shell, ANSI C and C++, awk, Tcl, Java, or Python, just to mention a few. POSIX paths are more portable, too.
So your code should be open(FH, "D:/test/numbers.txt"); instead, to avoid trying to open a file named "D:<TAB>est<NEWLINE>umbers.txt".
As an aside, you could further improve your code by using a lexical filehandle (instead of a global named one), the 3-argument form of open, and, most importantly, by error-checking ALL your IO operations, especially open() calls:
open(my $fh, "<", "D:/test/numbers.txt") or die "Could not open file: $!";
Or, better yet, don't hard-code filenames in IO calls (the following practice MAY have let you figure out a problem sooner):
my $filename = "D:/test/numbers.txt";
open(my $fh, "<", $filename) or die "Could not open file $filename: $!";
Never use interpolated strings when you don't need interpolation! You are trying to open a file name with a tab character and a newline character in it, from the \t and the \n!
Use single quotes when you don't need (or want) interpolation.
One of the biggest problems novice Perl programmers seem to run into is that they automatically use "" for everything without thinking. You need to understand the difference between "" and '' and you need to ALWAYS think before you type so that you choose the right one. It's a hard habit to get into, but it's vital if you're going to write good Perl.
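A minimal illustration of the difference:

print "D:\test\numbers.txt"; # interpolated: \t becomes a tab, \n a newline
print 'D:\test\numbers.txt'; # single-quoted: the backslashes stay literal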