Random file name generated in Perl contains unusual characters - bash

Using the Perl code below, I am trying to write some names to a randomly named file. But the files are created with weird characters in their names, like this:
"snp-list-boo.dwjEUq5Wu^J.txt"
And, obviously, when my code looks for these files it says "No such file". Also, when I try to open the files using "vi", they open like this:
vi 'temporary-files/snp-list-boo.dwjEUq5Wu
.txt'
i.e. with a newline in the file name. Could someone please help me understand and solve this weird issue? Thanks very much!
Code:
my $tfile = `mktemp boo.XXXXXXXXX`;
my $fh = "";
foreach my $keys (keys %POS_HASH){
open ($fh, '>>', "temporary-files/snp-list-$tfile.txt");
print $fh "$keys $POS_HASH{$keys}\n";
close $fh;
}

The output of mktemp ends with a line feed character, and the backticks capture it along with the name. Remove it with chomp() (or chop()) before you use the value.
Instead of using the external mktemp program, why not use File::Temp?
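A minimal sketch of the chomp() fix, for illustration only. The helper name trimmed_name and the sample string are assumptions made for this example; they are not part of the original code.

```perl
use strict;
use warnings;

# Return a copy of captured command output without its trailing newline.
# chomp() removes only a trailing newline, so it is safer than chop(),
# which removes the last character whatever it is.
sub trimmed_name {
    my ($raw) = @_;
    chomp(my $name = $raw);
    return $name;
}

# Hypothetical value, shaped like what `mktemp boo.XXXXXXXXX` prints.
my $captured = "boo.dwjEUq5Wu\n";
my $tfile    = trimmed_name($captured);
print "temporary-files/snp-list-$tfile.txt\n";
```

In the question's code the same effect would come from calling chomp($tfile) right after the backtick line.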

Using external programs unnecessarily is a bad idea for a few reasons.
The external program that you use might not be available on all of the systems where your code runs. You are therefore making your program less portable.
Spawning a new sub-shell to run an external program is a lot slower than just doing the work in your current environment.
The values you get back from the external program are likely to have a newline character attached. And you might forget to remove it.
It's the last one that is burning you here. But the others still apply as well.
Perl's standard library has, for many, many years included the File::Temp module which creates temporary files for you without the need to use an external program.
use File::Temp qw/ tempfile /;
# It even opens it and gives you the filehandle.
my ($fh, $filename) = tempfile();
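As a rough sketch of how this could replace the mktemp call in the question, assuming the output should still land in a directory such as temporary-files and keep the snp-list-boo prefix. The sub name write_snp_list and its arguments are made up for this example.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Write every key/value pair of a hash to a fresh, uniquely named file
# inside $dir and return the file name. The file is opened once, not
# once per key as in the question.
sub write_snp_list {
    my ($dir, $pos_hash) = @_;
    my ($fh, $filename) = tempfile(
        'snp-list-boo.XXXXXXXXX',   # trailing X's are replaced with random characters
        DIR    => $dir,
        SUFFIX => '.txt',
        UNLINK => 0,                # keep the file after the program exits
    );
    for my $key (sort keys %$pos_hash) {
        print {$fh} "$key $pos_hash->{$key}\n";
    }
    close $fh or die "Cannot close $filename: $!";
    return $filename;
}
```

A call such as write_snp_list('temporary-files', \%POS_HASH) would then return a name with no stray newline, because no external command output is involved.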

Related

Perl code doesn't run in a bash script with scheduling of crontab

I want to schedule my Perl code to run every day at a specific time, so I put the code below in a shell script:
Automate.sh
#!/bin/sh
perl /tmp/Taps/perl.pl
The schedule is specified as follows:
10 17 * * * sh /tmp/Taps/Automate.sh > /tmp/Taps/result.log
When the time reached 17:10, the .sh file did not run. However, when I run ./Automate.sh manually, it runs and I see the result. I don't know what the problem is.
Perl code:
#!/usr/bin/perl -w
use strict;
use warnings;
use Data::Dumper;
use XML::Dumper;
use TAP3::Tap3edit;
$Data::Dumper::Indent=1;
$Data::Dumper::Useqq=1;
my $dump = new XML::Dumper;
use File::Basename;
my $perl='';
my $xml='';
my $tap3 = TAP3::Tap3edit->new();
foreach my $file(glob '/tmp/Taps/X*')
{
$files= basename($file);
$tap3->decode($files) || die $tap3->error;
}
my $filename=$files.".xml\n";
$perl = $tap3->structure;
$dump->pl2xml($perl, $filename);
print "Done \n";
error:
No such file or directory for file X94 at /tmp/Taps/perl.pl line 22.
X94.xml
foreach my $file(glob 'Taps/X*'): when you're running from cron, the current directory is usually not the one you tested in (cron typically starts jobs in the user's home directory). You'll want to provide the full path to that Taps directory. Also specify the output directory for Out.xml.
Cron uses a minimal environment and a short $PATH, which may not include the path to perl. Try specifying the full path to perl, or source your shell settings before running the script.
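One way to check both points is to have the script log what it actually sees when cron starts it. This is only a diagnostic sketch, and env_report is a name chosen for this example.

```perl
use strict;
use warnings;
use Cwd qw(getcwd);

# Describe the working directory, the PATH and the perl binary in use,
# so the cron run can be compared with a manual run.
sub env_report {
    my $path = defined $ENV{PATH} ? $ENV{PATH} : '(unset)';
    return sprintf "cwd=%s\nPATH=%s\nperl=%s\n", getcwd(), $path, $^X;
}

print env_report();
```

With the crontab line from the question, this output would end up in /tmp/Taps/result.log, since standard output is redirected there.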
There are a lot of things that can go wrong here. The most obvious and certain one is this: you use a glob to find the file in the directory "Taps", then strip the directory from the file name with basename, so Perl cannot find the file. I am not quite sure what you are trying to achieve there. The file names from the glob will be, for example, Taps/Xfoo, a path relative to the working directory. If you try to access Xfoo from the working directory, that file will not be found (or the wrong file will be found).
This should also (probably) lead to a fatal error, which should be reported in your error log. (Assuming that the decode function returns a false value upon error, which is not certain.) If no errors are reported in your error log, that is a sign the program does not run at all. Or it could be that decode does not return false on missing file, and the file is considered to be empty.
I assume that when you test the program, you cd to /tmp and run it, or your "Taps" directory is in your home directory. So you are making assumptions about where your program looks for the files. You should be certain where it looks for files, probably by using only absolute paths.
Another simple error might be that the user the cron job runs as does not have permission to execute the file, or has no read access to "Taps".
Edit:
Other complications in your code:
You include Data::Dumper, but never actually use that module.
$xml variable is not used.
$files variable not declared (this code would never run with use strict)
The code that uses $files to build the output file name is outside your foreach loop, which means it only runs once. Since you use glob, I assumed you were reading more than one file, in which case this will probably not do what you want. It is also possible that you are using a glob because the file name can change, e.g. X93, X94, etc. In that case you will use the last file name returned by the glob. But this looks like a weak link in your logic.
You add a newline \n to a file name, which is strange.
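The points above could be addressed along these lines. This is a sketch under assumptions: xml_targets is a hypothetical helper, it assumes each input should get an XML file next to it, and the TAP3::Tap3edit and XML::Dumper calls are copied from the question and shown only as comments, since they are not exercised here.

```perl
use strict;
use warnings;
use File::Spec;

# Map each input file named X* in $dir to the XML file to write for it.
# Full paths are kept, so the result does not depend on the current
# directory, and no newline is appended to the output name.
sub xml_targets {
    my ($dir) = @_;
    my %target;
    opendir my $dh, $dir or die "Cannot open $dir: $!";
    for my $name (sort readdir $dh) {
        # Skip earlier output such as X94.xml.
        next unless $name =~ /^X/ && $name !~ /\.xml\z/;
        my $path = File::Spec->catfile($dir, $name);
        next unless -f $path;
        $target{$path} = "$path.xml";
    }
    closedir $dh;
    return %target;
}

# Intended use with the modules from the question (not run here):
#   my %target = xml_targets('/tmp/Taps');
#   for my $file (sort keys %target) {
#       my $tap3 = TAP3::Tap3edit->new();
#       $tap3->decode($file) || die $tap3->error;
#       $dump->pl2xml($tap3->structure, $target{$file});
#   }
```

Doing the decode and the dump inside the same loop means every matching file is converted, not just the last one.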

How to remove a recurring word/character, and everything after it, from the filenames of multiple files?

I have several folders of video files where, due to the download manager I use, all the files are named in the format "FILENAME.mp4; filename= FILENAME.mp4". All I want to do is remove everything after (and including) "; filename", so that only "FILENAME.mp4" remains. However, I haven't found a way to do this.
I have tried some free software (such as Renamer, Namechanger, Name Munger for Mac, Transnomino), but I could not get any of them to do what I need.
I'm working on Mac OSX 10.13.6.
Any help with this issue would be appreciated.
You can do this in Terminal. Go to the folder containing the files with the cd command, for example:
cd ~/Documents/Videos
And run this command to rename all files recursively:
find . -iname "*.mp4;*" | sed -E 's/(\.[^\.]*)(\.mp4)(.*)/mv "\1\2\3" "\1\2"/' | sh
This command keeps only the FILENAME.mp4 part of a file name such as FILENAME.mp4; filename= FILENAME.mp4.
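The same rename can also be sketched in Perl, which avoids building shell commands out of file names (names containing quotes or extra dots can trip up a generated mv command). The sub names cleaned_name and rename_downloads are invented for this example, and the sketch assumes every affected name contains ".mp4;".

```perl
use strict;
use warnings;
use File::Find;

# "FILENAME.mp4; filename= FILENAME.mp4" -> "FILENAME.mp4".
# Names without ".mp4;" are returned unchanged.
sub cleaned_name {
    my ($name) = @_;
    (my $clean = $name) =~ s/(\.mp4);.*\z/$1/is;
    return $clean;
}

# Rename matching files under $root, recursively. Returns the original
# paths of the files that were renamed.
sub rename_downloads {
    my ($root) = @_;
    my @renamed;
    find(sub {
        # File::Find changes into each directory, so $_ is a bare file name.
        return unless -f $_;
        my $clean = cleaned_name($_);
        return if $clean eq $_;
        if (-e $clean) {
            warn "Skipping $File::Find::name: $clean already exists\n";
            return;
        }
        rename $_, $clean or die "Cannot rename $File::Find::name: $!";
        push @renamed, $File::Find::name;
    }, $root);
    return @renamed;
}
```

It may be worth trying this on a copy of one folder first, for example with rename_downloads("$ENV{HOME}/Documents/Videos").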
I used to make extensive use of a Windows rename tool called Renamer 6.0, and it had a "pattern rename" facility called "Multi change" that could have handled this.
In that tool you would supply a source pattern like %a= %b and a destination pattern like %b. Everything after the = is stored in the %b variable, so renaming the file to just %b drops everything up to and including the =.
See if your preferred rename tool has a similar facility.
If your tool supports regex, then find: .*?=\s*(.*) and replace with $1
I'm also minded that asking this question on https://unix.stackexchange.com/ might elicit some help crafting a shell script that will perform this rename (though there are also plenty of shell-capable people here, and one of them may see it; it's just that it's not quite as hardcore programmer-y a question as most).
If you're willing to learn/use Java, then that could be another good way to get the problem solved. It would (at a guess) look something like this:
// Requires: import java.io.File;
for (final File f : new File("C:\\temp").listFiles()) {
if (f.isFile()) {
String n = f.getName();
if (n.contains("=")) {
// Keep the part after "=", trimmed, and rename within the same folder.
f.renameTo(new File(f.getParentFile(), n.substring(n.indexOf("=") + 1).trim()));
}
}
}

Detecting that files are being copied in a folder

I am running a script that copies a folder from a specific location if it does not exist (or is not consistent). The problem appears when I run the script two or more times concurrently. While the first instance is copying the files, the second one comes along and tries the same thing, resulting in a mess. How can I avoid this situation? I am looking for something like a system-wide mutex.
I tried a simple test with -w: I copied the folder manually and, while it was copying, ran this script:
use strict;
use warnings;
my $filename = 'd:\\folder_to_copy';
if (-w $filename) {
print "i can write to the file\n";
} else {
print "yikes, i can't write to the file!\n";
}
Of course this doesn't work, because I still have write access to that folder.
Any idea how I could check whether the folder is being copied, in Perl or using batch commands?
Sounds like a job for a lock file. There are myriads of CPAN modules that implement lock files, but most of them don't work on Windows. Here are a few that seem to support Windows according to CPAN Testers:
File::Lockfile
File::TinyLock
File::Flock::Tiny
After a quick look at the source code, the only module I can recommend is File::Flock::Tiny. The others seem racy.
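For comparison, here is a sketch that uses only Perl's built-in flock rather than one of the modules above. It assumes flock behaves as usual on the target platform, which is worth verifying on Windows. The sub name try_lock and the lock file path are made up for this example.

```perl
use strict;
use warnings;
use Fcntl qw(:flock);

# Try to take an exclusive, non-blocking lock on $lock_path.
# Returns the open filehandle on success (keep it alive while copying),
# or undef if another holder already has the lock.
sub try_lock {
    my ($lock_path) = @_;
    open my $fh, '>>', $lock_path or die "Cannot open $lock_path: $!";
    return flock($fh, LOCK_EX | LOCK_NB) ? $fh : undef;
}

# Hypothetical usage:
#   my $lock = try_lock('d:\\folder_to_copy.lock')
#       or die "another instance is already copying\n";
#   ... copy the folder ...
#   close $lock;    # releases the lock
```

The operating system drops the lock when the handle is closed or the process exits, so a crashed script does not leave a stale lock behind.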
If you need a systemwide mutex, then one "trick" is to (ab)use a directory. The command mkdir is usually atomic and either works or doesn't (if the directory already exists).
Change your script as follows:
my $mutex_dir = '/tmp/my-mutex-dir';
if ( mkdir $mutex_dir ) {
# run your copy-code here
# when finished:
rmdir $mutex_dir;
} else {
print "another instance is already running.\n";
}
The only thing you need to make sure is that you're allowed to create a directory in /tmp (or wherever).
Note that I intentionally do NOT test for the existence of $mutex_dir first, because between the if (not -d $mutex_dir) and the mkdir someone else could create the directory, and the mkdir would fail anyway. So simply call mkdir. If it worked, then you can do your stuff. Don't forget to remove the $mutex_dir after you're done.
That's also the downside of this approach: If your copy-code crashes and the script prematurely dies then the directory isn't deleted. Presumably the lock file mechanism suggested in nwellnhof's answer behaves better in that case and automatically unlocks the file.
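The crash problem can be reduced, though not eliminated, by wrapping the work in eval so that the directory is removed even when the copy code dies. A sketch, with with_dir_lock as a made-up helper name:

```perl
use strict;
use warnings;

# Run $work while holding a mkdir-based lock on $mutex_dir.
# Returns 0 if the lock could not be taken (mkdir failed, most likely
# because another instance holds it), 1 if the work ran. An exception
# from $work is re-thrown after the directory has been removed.
sub with_dir_lock {
    my ($mutex_dir, $work) = @_;
    mkdir $mutex_dir or return 0;
    my $ok  = eval { $work->(); 1 };
    my $err = $@;
    rmdir $mutex_dir or warn "Could not remove $mutex_dir: $!";
    die $err unless $ok;
    return 1;
}

# Hypothetical usage:
#   with_dir_lock('/tmp/my-mutex-dir', sub { copy_the_folder() })
#       or print "another instance is already running.\n";
```

This covers die, but not a killed process or a power failure; in those cases the directory still has to be removed by hand.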
As the first script is trying to copy the files, the second comes and
tries the same thing resulting in a mess
The simplest approach would be to create a file that contains 1 while another instance of the script is running. Then you can add a conditional based on that.
my $data = do { local $/; open my $fh, '<', 'flag' or die $!; <$fh> };
die "another instance of script is running" if $data == 1;
Another approach would be to set an environment variable within the script and check it in a BEGIN block.
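You can use the Windows mutex or semaphore objects provided by the Win32-IPC package:
http://search.cpan.org/~cjm/Win32-IPC-1.11/
use Win32::Mutex;
use Digest::MD5 qw(md5_hex);
# A named mutex is visible system-wide; derive the name from the folder path.
my $mutex = Win32::Mutex->new(0, md5_hex($filename));
# new() also succeeds when the mutex already exists, so try to acquire it without waiting.
if ($mutex && $mutex->wait(0)) {
do_your_job();
$mutex->release;
} else {
# fail...
}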

Writing to popen and reading back several files in Ruby

I need to run some shell commands on a number of files and sometimes I get back more than one file in response. The question is: How can I read back several files from IO.popen in Ruby?
For instance, imagine the following case:
file = grid.get(record['_id']) # fetch a file from database
IO.popen('tar -Oxmz', 'ab') {|pipe| pipe.write(file.read)} # pass to tar and extract
This requires me to reread all the extracted files from the filesystem. I figured out that this is the speed bottleneck of my script, and I wonder if I can accomplish the same task in memory. I tried the following:
file = grid.get(record['_id'])
IO.popen('tar -Oxmz', 'w+b') do |pipe|
pipe.write(file.read)
pipe.close_write
output = pipe.read
end
It works, but I get the whole response, here including several extracted files, in one piece (in variable output). I need the files separate from each other and possibly with their names. Is there any way to do this?
By the way, the resulting files are text most of the time, but sometimes binary. Running a pipe for each output file is not a solution, because the overhead of running the commands for each file outweighs the benefits of doing the transformation in memory.
P.S. The actual use case does not rely on tar only. I use software that does not have Ruby wrappers.

A PWM with gapped alignments in Biopython

I'm trying to generate a Position-Weighted Matrix (PWM) in Biopython from Clustalw multiple sequence alignments. I get a "Wrong Alphabet" error every time I do it with gapped alignments. From reading the documentation, I think I need to utilize the Gapped Alphabet to deal with the '-' character in gapped alignments. But when I do this, it still doesn't resolve the error. Does anyone see the problem with this code, or have a better way to generate a PWM from gapped Clustal alignments?
from Bio.Alphabet import Gapped
alignment = AlignIO.read("filename.clustalw", "clustal", alphabet=Gapped)
m = Motif.Motif()
for a in alignment:
m.add_instance(a.seq)
m.pwm()
So you want to use clustal to make these gapped alignments? I use Perl and you are using Python, but the logic is basically the same. I use a system call to the clustal executable instead of using BioPerl/Biopython. I believe the clustalw2 executable handles gapped alignments without the need to specify an alphabet. I'm not 100 percent sure, but this is a script that works for me. Create a directory with all of your alignment files in it (I use .fasta, but you can change the flags on the system call to accept other formats). In the Perl script below, you must modify the executable path in the last line to match clustal's location on your computer. Hope this helps a bit. As a side note, this is good for making many alignments very quickly, which is what I use it for; if you only want to align a few files, you might want to skip creating a directory and modify the code to accept a file path instead of a directory path.
#!/usr/bin/perl
use strict;
use warnings;
print "Please type the directory of protein fasta files to align (end the directory path with a / or this will fail!): ";
my $directory = <STDIN>;
chomp $directory;
opendir(my $dh, $directory) or die "Cannot open $directory: $!";
# Skip ".", ".." and anything else that is not a plain file.
my @files = grep { -f "$directory$_" } readdir $dh;
closedir $dh;
my $add = "_align.fasta";
foreach my $file (@files) {
my $infile = "$directory$file";
(my $fileprefix = $infile) =~ s/\.[^.]+$//;
my $outfile = "$fileprefix$add";
system "/Users/Wes/Desktop/eggNOG_files/clustalw-2.1-macosx/clustalw2 -INFILE=$infile -OUTFILE=$outfile -OUTPUT=FASTA -tree";
}
Cheers,
Wes
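One possible refinement of the system call above is to pass the arguments as a list, so that Perl runs the program directly and the shell never gets to split a path that contains spaces. This is a sketch only: clustalw_args is a hypothetical helper, the flags are the ones used in the script above, and whether clustalw2 itself accepts such paths is not verified here.

```perl
use strict;
use warnings;

# Build the clustalw2 argument list for one input file. The output name
# follows the script above: the extension is replaced by "_align.fasta".
sub clustalw_args {
    my ($infile) = @_;
    (my $prefix = $infile) =~ s/\.[^.\/]+\z//;
    my $outfile = $prefix . '_align.fasta';
    return ("-INFILE=$infile", "-OUTFILE=$outfile", '-OUTPUT=FASTA', '-tree');
}

# Hypothetical usage (the executable path is a placeholder):
#   my $clustalw = '/path/to/clustalw2';
#   system($clustalw, clustalw_args($infile)) == 0
#       or die "clustalw2 failed for $infile: $?";
```

Checking the return value of system also makes a failed alignment visible instead of silently moving on to the next file.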

Resources