Perl print single quotes from command line script - bash

The following
echo text | perl -lnE 'say "word: $_\t$_"'
prints
word: text text
I need
word: 'text' 'text'
Tried:
echo text | perl -lnE 'say "word: \'$_\' \'$_\'";' #didn't works
echo text | perl -lnE 'say "word: '$_' '$_'";' #neither
How to correctly escape the single quotes for bash?
Edit:
I want to prepare a shell script with a couple of mv lines (to check it before actually renaming the files), e.g. I tried to solve the following:
find . -type f -print | \
perl \
-MText::Unaccent::PurePerl=unac_string \
-MUnicode::Normalize=NFC -CASD \
-lanE 'BEGIN{$q=chr(39)}$o=$_;$_=unac_string(NFC($_));s/[{}()\[\]\s\|]+/_/g;say "mv $q$o$q $_"' >do_rename
e.g. from the filenames like:
Somé filénamé ČŽ (1980) |Full |Movie| Streaming [360p] some.mp4
I want to get the following output in the file do_rename:
mv 'Somé filénamé ČŽ (1980) |Full |Movie| Streaming [360p] some.mp4' Some_filename_CZ_1980_Full_Movie_Streaming_360p_some.mp4
and after manual inspection I want to run:
bash do_rename
for running the actual rename...

You can use ASCII code 39 for ' to avoid escape hell:
echo text | perl -lnE 'BEGIN{ $q=chr(39) } say "word: $q$_$q\t$q$_$q"'

You can use:
echo text | perl -lnE "say \"word: '\$_'\t'\$_'\""
word: 'text' 'text'
Bash allows you to include an escaped double quote inside a double-quoted string, but the same doesn't apply to single quotes. While doing so, we need to escape $ so that Bash doesn't interpolate it before Perl sees it.
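If you prefer to keep the Perl code single-quoted, the standard Bash idiom is to close the quoted string, emit an escaped quote, and reopen it ('\''). A minimal sketch of the same one-liner written that way:
echo text | perl -lnE 'say "word: '\''$_'\''\t'\''$_'\''"'
Bash concatenates the adjacent pieces, so Perl still receives the program say "word: '$_'\t'$_'" unchanged.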

OK, based on the statement of the problem you're having, my suggestion would be: don't pipe find to perl - that's just asking for all kinds of annoyance.
I'm not entirely familiar with the modules, but would suggest you try something like this:
#!/usr/bin/perl
use strict;
use warnings;
use File::Find;
use Text::Unaccent::PurePerl qw ( unac_string );
use Unicode::Normalize qw ( NFC );
use Getopt::Std;
use File::Copy qw ( move );
use Encode qw(decode_utf8);
my %opts;
#x to execute, p to specify a path.
#p mandatory.
getopts('xp:',\%opts);
#this sub is called for each file, by the find function.
#$File::Find::name is the full path to the file.
#$_ is just the filename.
sub rename_unicode_files {
    #skip if it's not a file.
    return unless -f $File::Find::name;
    #convert the name with the functions from your example.
    my $newname = unac_string( NFC( decode_utf8($File::Find::name) ) );
    $newname =~ s/[{}()\[\]\s\|]+/_/g;
    #could apply other transforms here, such as regular expressions.
    #if the two names are different, consider moving.
    unless ( $newname eq $File::Find::name ) {
        print "Would rename: $File::Find::name to $newname\n";
        #actually do it, if '-x' is specified.
        if ( $opts{x} ) { move( $File::Find::name, $newname ); }
    }
}
#require -p <pathname> or otherwise print how to use.
unless ( -d $opts{p} ) {
    print "Usage: $0 -p <pathname> [-x]\n";
    exit;
}
#trigger find with a callback to the subroutine, over the '-p <path>'.
find( \&rename_unicode_files, $opts{p} );
Extend with something like Getopt::Std (as done above) to check whether you've specified an option - so if you run it normally, you get 'this is what I would do', and if you specify a particular flag, it actually does it.
And either use the Perl builtin rename or the move function available from File::Copy.
This will neatly avoid a lot of the escaping and interpolating problems you're having, and I think leave you with generally more readable and useful code.
Edit: Given a comment suggesting that the above is 'too long', how about:
#!/usr/bin/perl
use File::Find; use Text::Unaccent::PurePerl qw( unac_string ); use Unicode::Normalize qw( NFC ); find( sub { return unless -f $File::Find::name; print "mv '$File::Find::name' '", unac_string( NFC($File::Find::name) ), "'\n"; }, "." );
Still not convinced of the value of the approach. Even if it is only run occasionally - that's even more reason to make it as clear as possible.

There is a trivial solution using the zsh shell and the SQL-like (at least PostgreSQL and Oracle) quoting style:
$ setopt rc_quotes
$ echo text | perl -lnE 'say "word: ''$_''\t''$_''"'
word: 'text' 'text'
In this mode, to quote a ' you simply double it and write ''.

Related

Bash Bulk Rename Folders with 3-Digit Prefix and Delimiter

I have a series of folders that I'd like to rename with a prefix number and delimited text. For instance:
% ls
blue green keyboard pictures red tango yellow
flyer gum orange pop runner videos
rename to:
% ls
001-blue 002-green 003-keyboard 004-pictures 005-red 006-tango 007-yellow
008-flyer 009-gum 010-orange 011-pop 012-runner 013-videos
I am using the following to rename except that after 009, I then have 0010, 0011, and so on. I would like to keep prefix numbers to 3 digits.
% i=0; for x in *; do; mv "$x" "00$i-$x" ; i=$((i + 1)); done
I know the problem is in the mv command because of the hard-coded 00 in the destination name, but I don't know how to build a zero-padded 3-digit prefix from the $i variable.
Thanks in advance.
Use this Perl one-liner, passing the file names as arguments:
perl -le '$cmd = sprintf( "mv $_ %03d-$_", ++$i ) and system $cmd for @ARGV;' *
To do a dry run and print the intended commands without renaming any files, use print instead of system, like so:
perl -le '$cmd = sprintf( "mv $_ %03d-$_", ++$i ) and print $cmd for @ARGV;' *
The Perl one-liner uses these command line flags:
-e : Tells Perl to look for code in-line, instead of in a file.
-l : Strip the input line separator ("\n" on *NIX by default) before executing the code in-line, and append it when printing.
See also the docs for sprintf.
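If you'd rather stay in the shell, the zero-padding can also be done with printf's %03d format. A minimal sketch of the loop from the question, fixed that way (the counter starts at 1 to match the desired 001- prefix):
i=1; for x in *; do mv "$x" "$(printf '%03d' "$i")-$x"; i=$((i + 1)); done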

bash: transform scaffold fasta

I have a fasta file with the following sequences:
>NZ_OCNF01123018.1
TACAAATACAACAAATACAAGTACACCAAGTACAAATACAAGTATCCCAAGTACAAATACAAGTA
TCCCAAGTACAAATACAAGTATTCCAAGTACAAATACAAAACCTGTTGAGCAACCTAAACCTGTTGAAC
AGCCCAAACCTGTTGAACAGCNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNAAACCTTTATCCGCACTTA
CGAGCAAATACACCAATACCGCTTTATCGGCACAGTCTGCCCAAATTGACGGATGCACCATGTTACCCAACAC
ATCAATCAACGTTTGTGGGATCACCTGAAAAAGGGCGCGGTTTGTGGTTGATG
>NZ_OCNF01123018.2
AATTGTCGTGTAAAGCCACACCAAACCCCATTATAGCCCCAAAAACACCAAAAAGGCTGCCTGAACCACATTTCAGACAG
And I want to split all sequences in the file that contain a run of multiple Ns at the site where it occurs, making two sequences out of each.
Expected solution:
>NZ_OCNF01123018.1
TACAAATACAACAAATACAAGTACACCAAGTACAAATACAAGTATCCCAAGTACAAATACAAGTA
TCCCAAGTACAAATACAAGTATTCCAAGTACAAATACAAAACCTGTTGAGCAACCTAAACCTGTTGAAC
AGCCCAAACCTGTTGAACAGC
>contig1
AAACCTTTATCCGCACTTA
CGAGCAAATACACCAATACCGCTTTATCGGCACAGTCTGCCCAAATTGACGGATGCACCATGTTACCCAACAC
ATCAATCAACGTTTGTGGGATCACCTGAAAAAGGGCGCGGTTTGTGGTTGATG
>NZ_OCNF01123018.2
AATTGTCGTGTAAAGCCACACCAAACCCCATTATAGCCCCAAAAACACCAAAAAGGCTGCCTGAACCACATTTCAGACAG
my (inelegant) approach would be this:
perl -pe 's/[N]+/\*/g' $file | perl -pe 's/\*/\n>contig1\n/g'
of course that also replaces the N of the sequence header and creates headers without a sequence. As a plus, it would be nice to number the new 'contigs' from 1 to x in case there are multiple sequences with N.
What do you suggest?
I'd suggest using split instead of trying to get a regex just right, and a script instead of a brittle and crammed "one"-liner.
use warnings;
use strict;
use feature 'say';
my $file = shift @ARGV;
die "Usage: $0 filename\n" if !$file; # also check submitted $file
my $content = do { # or: my $content = Path::Tiny::path($file)->slurp;
    local $/;
    open my $fh, '<', $file or die "Can't open $file: $!";
    <$fh>;
};
my @f = grep { /\S/ } split /(?<!>)NN+/, $content;
say shift @f;
my $cnt;
for (@f) {
    say "\n>contig", (++$cnt), "\n$_";
}
This slurps the file into $content since NN+ can span multiple lines; the Path::Tiny module can make that cleaner. The first element of the resulting array needs no >contig header, so it is shifted off and printed as is.
The negative lookbehind (?<!...) makes the regex in split's separator pattern match NN+ only when not preceded by >, thus protecting (excluding) header lines that may start with that. If headers may contain consecutive N which are not right after > then you need to refine this.
I expanded your Perl one-liner a bit:
cat file.fasta | \
perl -pe 's/\n//g unless /^>/; s/>/\n>/g;' | \
perl -pe 's/N+(?{$n++})/\n>contig${n}\n/g unless /^>/'
The first part removes the newlines between bases; the second part replaces each continuous run of 'N', with the embedded code block (?{$n++}) numbering the new contigs.

remove only *some* fullstops from a csv file

If I have lines like the following:
1,987372,987372,C,T,.,.,.,.,.,.,.,.,1,D,.,.,.,.,.,.,.,1.293,12.23,0.989,0.973,D,.,.,.,.,0.253,0,4.08,0.917,1.048,1.000,1.000,12.998
1,987393,987393,C,T,.,.,.,.,.,.,.,.,1,D,.,.,.,.,.,.,0.152,1.980,16.09,0.999,0.982,D,-0.493,T,0.335,T,0.696,0,5.06,0.871,0.935,0.998,0.997,16.252
how can I replace all instances of ,., with ,?,
I want to preserve actual decimal places in the numbers so I can't just do
sed 's/./?/g' file
however when doing:
sed 's/,.,/,?,/g' file
this only appears to work in some cases. i.e. there are still instances of ,., hanging around.
anyone have any pointers?
Thanks
This should work:
sed ':a;s/,\.,/,?,/g;ta' file
With successive ,., strings, after a substitution succeeds, the next character to be processed is the following . , which on its own doesn't match the pattern, so you need a second pass.
:a is a label for the upcoming loop.
,\., will match a dot between commas. Note that the dot must be escaped, because an unescaped . matches any character (,a, would match the pattern ,., too).
g is for global substitution.
ta tests the previous substitution and, if it succeeded, loops back to the :a label for the remaining substitutions.
Using sed it is possible by running a loop as shown in the above answer; however, the problem is easily solved on the Perl command line with lookarounds:
perl -pe 's/(?<=,)\.(?=,)/?/g' file
1,987372,987372,C,T,?,?,?,?,?,?,?,?,1,D,?,?,?,?,?,?,?,1.293,12.23,0.989,0.973,D,?,?,?,?,0.253,0,4.08,0.917,1.048,1.000,1.000,12.998
1,987393,987393,C,T,?,?,?,?,?,?,?,?,1,D,?,?,?,?,?,?,0.152,1.980,16.09,0.999,0.982,D,-0.493,T,0.335,T,0.696,0,5.06,0.871,0.935,0.998,0.997,16.252
This command doesn't need a loop because instead of matching surrounding commas we're just asserting their position using a lookbehind and lookahead.
All that's necessary is a single substitution:
$ perl -pe 's/,\.(?=,)/,?/g' dots.csv
1,987372,987372,C,T,?,?,?,?,?,?,?,?,1,D,?,?,?,?,?,?,?,1.293,12.23,0.989,0.973,D,?,?,?,?,0.253,0,4.08,0.917,1.048,1.000,1.000,12.998
1,987393,987393,C,T,?,?,?,?,?,?,?,?,1,D,?,?,?,?,?,?,0.152,1.980,16.09,0.999,0.982,D,-0.493,T,0.335,T,0.696,0,5.06,0.871,0.935,0.998,0.997,16.252
You have an example using sed-style regular expressions. I'll offer an alternative - parse the CSV, and then treat each thing as a 'field':
#!/usr/bin/perl
use strict;
use warnings;
#iterate the input row by row
while ( <DATA> ) {
    #remove linefeeds
    chomp;
    #split this row on ,
    my @row = split /,/;
    #iterate each field
    foreach my $field ( @row ) {
        #replace this field with "?" if it's "."
        $field = "?" if $field eq ".";
    }
    #stick this row together again.
    print join( ",", @row ), "\n";
}
__DATA__
1,987372,987372,C,T,.,.,.,.,.,.,.,.,1,D,.,.,.,.,.,.,.,1.293,12.23,0.989,0.973,D,.,.,.,.,0.253,0,4.08,0.917,1.048,1.000,1.000,12.998
1,987393,987393,C,T,.,.,.,.,.,.,.,.,1,D,.,.,.,.,.,.,0.152,1.980,16.09,0.999,0.982,D,-0.493,T,0.335,T,0.696,0,5.06,0.871,0.935,0.998,0.997,16.252
This is more verbose than it needs to be, in order to illustrate the concept. It could be reduced to:
perl -F, -lane 'print join ",", map { $_ eq "." ? "?" : $_ } @F'
If your CSV also has quoting, then you can break out the Text::CSV module, which handles that neatly.
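For instance, a minimal sketch using Text::CSV (assuming the module is installed; reads STDIN, writes STDOUT):
#!/usr/bin/perl
use strict;
use warnings;
use Text::CSV;
# binary => 1 tolerates embedded special characters; eol appends "\n" on output
my $csv = Text::CSV->new( { binary => 1, eol => "\n" } );
while ( my $row = $csv->getline( \*STDIN ) ) {
    # replace any field that is exactly "." with "?", preserving quoting
    $csv->print( \*STDOUT, [ map { $_ eq "." ? "?" : $_ } @$row ] );
}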
You just need 2 passes since the trailing , found on a ,., match isn't available to match the leading , on the next ,.,:
$ sed 's/,\.,/,?,/g; s/,\.,/,?,/g' file
1,987372,987372,C,T,?,?,?,?,?,?,?,?,1,D,?,?,?,?,?,?,?,1.293,12.23,0.989,0.973,D,?,?,?,?,0.253,0,4.08,0.917,1.048,1.000,1.000,12.998
1,987393,987393,C,T,?,?,?,?,?,?,?,?,1,D,?,?,?,?,?,?,0.152,1.980,16.09,0.999,0.982,D,-0.493,T,0.335,T,0.696,0,5.06,0.871,0.935,0.998,0.997,16.252
The above will work in any sed on any OS.

Using sed on text files with a csv

I've been trying to do bulk find and replace on two text files using a csv. I've seen the questions that SO suggests, and none seem to answer my question.
I've created two variables for the two text files I want to modify. The csv has two columns and hundreds of rows. The first column contains strings (none have whitespaces) already in the text file that need to be replaced with the corresponding strings in same row in the second column.
As a test, I tried the script
#!/bin/bash
test1='long_file_name.txt'
find='string1'
replace='string2'
sed -e "s/$find/$replace/g" $test1 > $test1.tmp && mv $test1.tmp $test1
This was successful, except that I need to do it once for every row in the csv, using the values given by the csv in each row. My hunch is that my while loop was used wrongly, but I can't find the error. When I execute the script below, I get the command line prompt, which makes me think that something has happened. When I check the text files, nothing's changed.
The two text files, this script, and the csv are all in the same folder (it's also been my working directory when I do this).
#!/bin/bash
textfile1='long_file_name1.txt'
textfile2='long_file_name2.txt'
while IFS=, read f1 f2
do
    sed -e "s/$f1/$f2/g" $textfile1 > $textfile1.tmp && \
    mv $textfile1.tmp $textfile1
    sed -e "s/$f1/$f2/g" $textfile2 > $textfile2.tmp && \
    mv $textfile2.tmp $textfile2
done <'findreplace.csv'
It seems to me that this code should do what I want it to do (but doesn't); perhaps I'm misunderstanding something fundamental (I'm new to bash scripting)?
The csv looks like this, but with hundreds of rows. All a_i's should be replaced with their counterpart b_i in the next column over.
a_1 b_1
a_2 b_2
a_3 b_3
Something to note: All the strings actually contain underscores, just in case this affects something. I've tried wrapping the variable name in braces a la ${var}, but it still doesn't work.
I appreciate the solutions, but I'm also curious to know why the above doesn't work. (Also, I would vote everyone up, but I lack the reputation to do so. However, know that I appreciate and am learning a lot from your answers!)
If you are going to process a lot of data and your patterns can contain special characters, I would consider using Perl, especially if you are going to have a lot of pairs in findreplace.csv. You can use the following script as a filter or for in-place modification of many files. As a side effect, it loads the replacements and builds the Aho-Corasick automaton only once per invocation, which makes this solution pretty efficient (O(M+N) instead of O(M*N) in your solution).
#!/usr/bin/perl
use strict;
use warnings;
use autodie;
my $in_place = ( @ARGV and $ARGV[0] =~ /^-i(.*)/ )
    ? do {
        shift;
        my $backup_extension = $1;
        # a '*' in the extension is a placeholder for the original file name
        my $backup_name = $backup_extension =~ /\*/
            ? sub { ( my $fn = $backup_extension ) =~ s/\*/$_[0]/; $fn }
            : sub { shift . $backup_extension };
        my $oldargv = '-';
        # emulate perl's -i switch: on each new input file, rename it to the
        # backup name and redirect the default output to the original name
        sub {
            if ( $ARGV ne $oldargv ) {
                rename( $ARGV, $backup_name->($ARGV) );
                open( ARGVOUT, '>', $ARGV );
                select(ARGVOUT);
                $oldargv = $ARGV;
            }
        };
    }
    : sub { };

die "$0: File with replacements required." unless @ARGV;

# build the search/replace map and one big alternation regex from the CSV
my ( $re, %replace );
do {
    my $filename = shift;
    open my $fh, '<', $filename;
    %replace = map { chomp; split ',', $_, 2 } <$fh>;
    close $fh;
    $re = join '|', map quotemeta, keys %replace;
    $re = qr/($re)/;
};

while (<>) {
    $in_place->();
    s/$re/$replace{$1}/g;
}
continue { print }
Usage:
./replace.pl replace.csv <file.in >file.out
as well as
./replace.pl replace.csv file.in >file.out
or in-place
./replace.pl -i replace.csv file1.csv file2.csv file3.csv
or with backup
./replace.pl -i.orig replace.csv file1.csv file2.csv file3.csv
or with a backup name containing a placeholder
./replace.pl -ithere.is.\*.original replace.csv file1.csv file2.csv file3.csv
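Note that the script assumes the replacements file is comma-separated, one pair per line (it does split ',', $_, 2), e.g.:
a_1,b_1
a_2,b_2
a_3,b_3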
You can convert your CSV file to a sed script with the following command:
cat replace.csv | awk -F, '{print "s/" $1 "/" $2 "/g";}' > sed.script
And then you will be able to do a one-pass replacement:
sed -i -f sed.script longfilename.txt
This will be a faster implementation of what you want to do.
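For example, assuming the rows are actually comma-separated (a_1,b_1 and so on), the generated sed.script would contain:
s/a_1/b_1/g
s/a_2/b_2/g
s/a_3/b_3/g
Keep in mind that column 1 is used as a regex and column 2 as a sed replacement, so any /, & or regex metacharacters in the CSV would need escaping first.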
BTW, sorry, but I do not understand what is wrong with your script, which should work unless your CSV file has more than 2 columns.

How to substitute the characters à, è, ì, ò, ù in bash script

I've got to rename a file such as IndennitàMalattia.doc by replacing the character à with a'.
The following sed command works in the command line, but not inside a .sh file.
echo $FILE | sed -e s/à/a\'/g
Can someone please help me?
Thanks!
Change your sed command like below:
echo $FILE | sed "s/à/a'/g"
mv "${File}" "$( echo "${File}" | sed "s/à/a'/g;s/è/e'/g;s/ì/i'/g;s/ò/o'/g;s/ù/u'/g" )"
and likewise for any other accented character.
You may find this Perl script useful. It will rename specified files by turning all grave accents into apostrophes:
#!/usr/bin/env perl
use v5.14;
use autodie;
use warnings;
use warnings qw( FATAL utf8 );
use utf8;
use open qw ( :encoding(UTF-8) :std );
use charnames qw( :full :short );
use Unicode::Normalize;
# if no args are specified, use the example from the question
@ARGV = qw(IndennitàMalattia.doc) unless @ARGV;
foreach my $old_name (@ARGV) {
    (my $new_name = NFD($old_name)) =~ s/\N{COMBINING GRAVE ACCENT}/'/g;
    say qq{Renaming "$old_name" to "$new_name"};
    rename $old_name, NFC($new_name);
}
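Assuming the script is saved as strip_graves.pl (the name is hypothetical), you would run it over a set of files like this:
perl strip_graves.pl *.doc
With no arguments it falls back to the example filename from the question.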
