How to substitute the characters à, è, ì, ò, ù in bash script


I need to rename a file such as IndennitàMalattia.doc by replacing the character à with a'.
The following sed command works in the command line, but not inside a .sh file.
echo $FILE | sed -e s/à/a\'/g
Can someone please help me?
Thanks!

Change your sed like below,
echo $FILE | sed "s/à/a'/g"

mv "${File}" "$( echo "${File}" | sed "s/à/a'/g;s/è/e'/g;s/ì/i'/g;s/ò/o'/g;s/ù/u'/g" )"
Extend the expression in the same way for any other accented characters.
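The same substitutions can also be done without sed. The following is a minimal sketch using bash parameter expansion; the function name replace_grave is made up for illustration, and it assumes the script file is saved in the same encoding (for example UTF-8) as the file names it processes.

```bash
#!/usr/bin/env bash
# Hypothetical helper: replace grave-accented vowels with vowel + apostrophe.
# Assumes this script and the file names share one encoding (e.g. UTF-8).
replace_grave() {
  local name=$1 apos="'"
  name=${name//à/a$apos}
  name=${name//è/e$apos}
  name=${name//ì/i$apos}
  name=${name//ò/o$apos}
  name=${name//ù/u$apos}
  printf '%s\n' "$name"
}

replace_grave 'IndennitàMalattia.doc'   # prints Indennita'Malattia.doc
```

To rename a file with it, something like mv -- "$FILE" "$(replace_grave "$FILE")" should work; keeping the apostrophe in a variable avoids nesting quotes inside the expansion.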

You may find this Perl script useful. It will rename specified files by turning all grave accents into apostrophes:
#!/usr/bin/env perl
use v5.14;
use autodie;
use warnings;
use warnings qw( FATAL utf8 );
use utf8;
use open qw ( :encoding(UTF-8) :std );
use charnames qw( :full :short );
use Unicode::Normalize;
# if no args specified, use example from question
@ARGV = qw(IndennitàMalattia.doc) unless @ARGV;
foreach my $old_name (@ARGV) {
(my $new_name = NFD($old_name)) =~ s/\N{COMBINING GRAVE ACCENT}/'/g;
say qq{Renaming "$old_name" to "$new_name"};
rename $old_name, NFC($new_name);
}

Related

Bash Bulk Rename Folders with 3-Digit Prefix and Delimiter

I have a series of folders that I'd like to rename with a prefix number and delimited text. For instance:
% ls
blue green keyboard pictures red tango yellow
flyer gum orange pop runner videos
rename to:
% ls
001-blue 002-green 003-keyboard 004-pictures 005-red 006-tango 007-yellow
008-flyer 009-gum 010-orange 011-pop 012-runner 013-videos
I am using the following to rename except that after 009, I then have 0010, 0011, and so on. I would like to keep prefix numbers to 3 digits.
% i=0; for x in *; do; mv "$x" "00$i-$x" ; i=$((i + 1)); done
I know the problem is in the mv command because of the hard-coded 00 in the destination name, but I don't know how to change that to a 3-digit exclusive destination name with the $i variable.
Thanks in advance.

Use this Perl one-liner:
perl -le '$cmd = sprintf( "mv $_ %03d-$_", ++$i ) and system $cmd for @ARGV;'
To do a dry run and print the intended commands without renaming any files, use print instead of system, like so:
perl -le '$cmd = sprintf( "mv $_ %03d-$_", ++$i ) and print $cmd for @ARGV;'
The Perl one-liner uses these command line flags:
-e : Tells Perl to look for code in-line, instead of in a file.
-l : Strip the input line separator ("\n" on *NIX by default) before executing the code in-line, and append it when printing.
See also the docs for sprintf.
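For comparison, the zero padding can also be done in the shell itself. The following is a minimal sketch, assuming bash and its printf builtin; the function name prefixed_name is made up for illustration and is not part of the question or the answer above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: build "NNN-name" with a zero-padded 3-digit prefix.
prefixed_name() {
  local index=$1 name=$2
  printf '%03d-%s\n' "$index" "$name"
}

# Dry run: print the intended mv commands for every directory in the
# current directory without renaming anything. Replace "echo mv" with
# "mv" once the output looks right.
i=1
for dir in */; do
  [ -d "$dir" ] || continue   # skip the literal "*/" when nothing matches
  dir=${dir%/}                # drop the trailing slash added by the glob
  echo mv -- "$dir" "$(prefixed_name "$i" "$dir")"
  i=$((i + 1))
done
```

The %03d conversion pads with zeros to a fixed width of three, so 9 becomes 009 and 10 becomes 010, which is what the hard-coded 00 prefix could not do.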

Convert first character to capital along with special character separator

I would like to capitalize the first character of a string, and every character that follows a dash (-), using bash.
I can split out the individual elements on -, convert them with
echo "string" | tr [:lower:] [:upper:]
and join them again, but that doesn't seem effective. Is there an easy way to take care of this in a single line?
Input string:
JASON-CONRAD-983636
Expected string:
Jason-Conrad-983636

I recommend using Python for this:
python3 -c 'import sys; print("-".join(s.capitalize() for s in sys.stdin.read().split("-")))'
Usage:
capitalize() {
python3 -c 'import sys; print("-".join(s.capitalize() for s in sys.stdin.read().split("-")))'
}
echo JASON-CONRAD-983636 | capitalize
Output:
Jason-Conrad-983636

In pure bash (v4+) without any third party utils
str=JASON-CONRAD-983636
IFS=- read -ra raw <<<"$str"
final=()
for str in "${raw[@]}"; do
first=${str:0:1}
rest=${str:1}
final+=( "${first^^}${rest,,}" )
done
and print the result
( IFS=- ; printf '%s\n' "${final[*]}" ; )

This might work for you (GNU sed):
sed 's/.*/\L&/;s/\b./\u&/g' file
Lowercase everything. Uppercase first characters of words.
Alternative:
sed -E 's/\b(.)((\B.)*)/\u\1\L\2/g' file

Could you please try the following (in case you are OK with awk).
var="JASON-CONRAD-983636"
echo "$var" | awk -F'-' '{for(i=1;i<=NF;i++){$i=substr($i,1,1) tolower(substr($i,2))}} 1' OFS="-"

Although the party is mostly over, please let me join with a perl solution:
perl -pe 's/(^|-)([^-]+)/$1 . ucfirst lc $2/ge' <<<"JASON-CONRAD-983636"
It may be cunning to use the ucfirst function :)

How to convert actual Unicode to \u0123

I want to turn Unicode text into pure ASCII encoding using escape sequences.
Input: Ɏɇ衳 should produce the output "\u024E\u0247\u8873"
Basically the opposite of this.
$ echo -e "\u024E\u0247\u8873"
Ɏɇ衳
I want the encoding to stay UTF-8; all I'm doing is changing the form.
I've tried:
iconv -f utf8 -t utf8 $file
iconv -f utf8 -t utf16 $file

Your mentioned codes 024E, 0247, .. are called Unicode code points and are independent from UTF-8 or UTF-16.
If perl is your option, you can retrieve the codes with:
perl -C -ne 'map {printf "\\u%04X", ord} (/./g)' <<< "Ɏɇ衳"; echo
which outputs:
\u024E\u0247\u8873
Explanation
The perl code above is mostly equivalent to:
#!/usr/bin/perl
use utf8;
$str = "Ɏɇ衳";
foreach $chr ($str =~ /./g) {
printf "\\u%04X", ord($chr);
}
print "\n";
use utf8 tells Perl that the script source is encoded in UTF-8 (needed only because the string is embedded in the script).
($str =~ /./g) breaks the string into an array of characters.
foreach iterates over the array of characters.
ord returns the code point of the given character.
EDIT
If you want to auto-scale the number of digits considering the out-of-BMP characters, try instead:
#!/usr/bin/perl
use utf8;
$str = "Ɏɇ衳";
foreach $chr ($str =~ /./g) {
$n = ord($chr);
$d = $n > 0xffff ? 8 : 4;
printf "\\u%0${d}X", $n;
}

If you have that in a file you can use iconv.
iconv -f $input_encoding -t $output_encoding $file
check "man iconv" for more details
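On its own, iconv only changes the encoding and does not produce \u escapes. As a rough sketch, it can be combined with od, tr and sed to get them; the function name to_u_escapes is made up for illustration, and the pipeline assumes an iconv that supports UTF-16BE. Because it prints UTF-16 code units, characters outside the BMP come out as surrogate pairs.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the \uXXXX escapes for a UTF-8 string.
to_u_escapes() {
  printf '%s' "$1" |
    iconv -f UTF-8 -t UTF-16BE |   # two bytes per UTF-16 code unit
    od -An -v -tx1 |               # dump the bytes as hex pairs
    tr -d ' \n' |                  # join them into one hex string
    sed 's/..../\\u&/g' |          # prefix every four hex digits with \u
    tr 'a-f' 'A-F'                 # upper-case the hex digits
  echo
}

to_u_escapes 'Ɏɇ衳'   # prints \u024E\u0247\u8873
```

The input is given on the command line here; for a file, feeding it to iconv directly instead of using printf should work the same way.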

Find if null exists in csv file

I have a csv file. The file has some anomalies as it contains some unknown characters.
The characters appear at line 1535 in popular editors. Printing this line with sed in the terminal does not show anything unusual.
$ sed '1535!d' sample.csv
"sample_id","sample_column_text_1","sample_"sample_id","sample_column_text_1","sample_column_text_2","sample_column_text_3"
However, the unknown characters do show up when the file is opened in various editors (screenshots from Sublime Text, Nano and Vi).
The directory has various csv files that contain this character/chain of characters.
I need to write a bash script to determine the files that have such characters. How can I achieve this?

The following is from:
http://www.linuxquestions.org/questions/programming-9/how-to-check-for-null-characters-in-file-509377/
#!/usr/bin/perl -w
use strict;
my $null_found = 0;
foreach my $file (@ARGV) {
if ( ! open(F, "<$file") ) {
warn "couldn't open $file for reading: $!\n";
next;
}
while(<F>) {
if ( /\000/ ) {
print "detected NULL at line $. in file $file\n";
$null_found = 1;
last;
}
}
close(F);
}
exit $null_found;
If it works as desired, save it to a file, nullcheck.pl, and make it executable:
chmod +x nullcheck.pl
It accepts a list of file names as input, but it exits with a failure status if it finds a NULL in any of them, so I'd pass in only one file at a time. The command below runs the script:
for f in $(find . -type f -exec grep -Iq . {} \; -and -print) ; do perl ./nullcheck.pl $f || echo "$f has nulls"; done
The above find command is lifted from Linux command: How to 'find' only text files?

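You can try grep and tr:
grep -Pa '\x00' filename to find out whether a file contains \000 (NUL) characters. This form assumes GNU grep built with -P support; a plain '\000' pattern is not reliably treated as a NUL byte.
You can use this to remove the NULs and produce a NUL-free file: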
tr < file-with-nulls -d '\000' > file-without-nulls
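The same tr idea can also serve as the detection step. Below is a minimal sketch that compares a file's byte count before and after deleting NUL bytes; the function name has_nul is made up for illustration, and the loop assumes the csv files sit in the current directory.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed (exit status 0) if the file contains NUL bytes.
has_nul() {
  local total stripped
  total=$(wc -c < "$1")                      # size including NUL bytes
  stripped=$(tr -d '\000' < "$1" | wc -c)    # size after deleting NUL bytes
  [ "$total" -ne "$stripped" ]
}

# Report every csv file in the current directory that contains NUL bytes.
for f in *.csv; do
  [ -f "$f" ] || continue   # skip the literal "*.csv" when nothing matches
  if has_nul "$f"; then
    echo "$f has nulls"
  fi
done
```

This relies only on wc and tr, so it should not depend on how a particular grep handles NUL bytes in patterns.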

Perl print single quotes from command line script

The following
echo text | perl -lnE 'say "word: $_\t$_"'
prints
word: text text
I need
word: 'text' 'text'
Tried:
echo text | perl -lnE 'say "word: \'$_\' \'$_\'";' #didn't works
echo text | perl -lnE 'say "word: '$_' '$_'";' #neither
How to correctly escape the single quotes for bash?
Edit:
I want to prepare a shell script with a couple of mv lines (to check them before really renaming the files), e.g. I tried the following:
find . -type f -print | \
perl \
-MText::Unaccent::PurePerl=unac_string \
-MUnicode::Normalize=NFC -CASD \
-lanE 'BEGIN{$q=chr(39)}$o=$_;$_=unac_string(NFC($_));s/[{}()\[\]\s\|]+/_/g;say "mv $q$o$q $_"' >do_rename
e.g. from file names like:
Somé filénamé ČŽ (1980) |Full |Movie| Streaming [360p] some.mp4
I want to get the following output in the file do_rename:
mv 'Somé filénamé ČŽ (1980) |Full |Movie| Streaming [360p] some.mp4' Some_filename_CZ_1980_Full_Movie_Streaming_360p_some.mp4
and after a manual inspection I want to run:
bash do_rename
for running the actual rename...

You can use ASCII code 39 for ' to avoid escape hell,
echo text | perl -lnE 'BEGIN{ $q=chr(39) } say "word: $q$_$q\t$q$_$q"'

You can use:
echo text | perl -lnE "say \"word: '\$_'\t'\$_'\""
word: 'text' 'text'
Bash allows an escaped double quote inside a double-quoted string, but the same does not apply to single quotes inside a single-quoted string. When using double quotes, however, the $ must be escaped so that Bash does not expand it before Perl sees it.
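Another common way to get a single quote into a single-quoted shell string is the '\'' sequence, which closes the string, adds an escaped quote, and reopens the string. The following is a minimal sketch using the printf builtin instead of perl so that it stays self-contained; the function name show_quoted is made up for illustration.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print "word: 'X'<TAB>'X'" for a given word.
# Each '\'' ends the quoted string, emits a literal ', and starts a new one.
show_quoted() {
  printf 'word: '\''%s'\''\t'\''%s'\''\n' "$1" "$1"
}

show_quoted text
```

The same sequence should work inside the perl one-liner from the question, at the cost of readability.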

OK, based on your statement of the problem, my suggestion would be: don't pipe find to perl; that's just asking for all kinds of annoyance.
I'm not entirely familiar with the modules, but would suggest you try something like this:
#!/usr/bin/perl
use strict;
use warnings;
use File::Find;
use Text::Unaccent::PurePerl qw ( unac_string );
use Unicode::Normalize qw ( NFC );
use Getopt::Std;
use File::Copy qw ( move );
use Encode qw(decode_utf8);
my %opts;
#x to execute, p to specify a path.
#p mandatory.
getopts('xp:',\%opts);
#this sub is called for each file, by the find function.
#$File::Find::name is the full path to the file.
#$_ is just the filename.
sub rename_unicode_files {
#skip if it's not a file.
return unless -f $File::Find::name;
#convert name with functions from your example.
my $newname = unac_string(NFC(decode_utf8($File::Find::name)));
$newname =~ s/[{}()\[\]\s\|]+/_/g;
#could apply other transforms here, such as regular expressions.
#if the two names are different, consider moving.
unless ( $newname eq $File::Find::name ) {
print "Would rename: $File::Find::name to $newname\n";
#actually do it, if '-x' is specified.
if ( $opts{x} ) { move ( $File::Find::name, $newname ); };
}
}
#require -p <pathname> or otherwise print how to use.
unless ( -d $opts{p} ) {
print "Usage: $0 -p <pathname> [-x]\n";
exit;
}
#trigger find with callback to subroutine, over the '-p <path>'.
find ( \&rename_unicode_files, $opts{p} );
Extend with something like Getopt::Std to check whether you've specified an option, so that when you run it normally you get 'this is what I would do', and when you specify a particular flag it actually does it.
And either use the perl builtin rename or the one available from File::Copy
This will neatly avoid a lot of the escaping and interpolating problems you're having, and I think leave you with generally more readable and useful code.
Edit: Given a comment suggesting that the above is 'too long' how about:
#!/usr/bin/perl
use File::Find; use Text::Unaccent::PurePerl qw ( unac_string ); use Unicode::Normalize qw ( NFC ); find( sub { return unless -f $File::Find::name; print "mv \'$File::Find::name\' \'",unac_string( NFC($File::Find::name) )."\'\n"; }, "." );
Still not convinced of the values of the approach. Even if it is only run occasionally - that's even more reason to make it as clear as possible.
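If the goal is only to write the mv lines into a file for later inspection, the shell can do the quoting itself. The following is a minimal sketch, assuming bash, whose printf builtin supports %q; the function name emit_mv is made up for illustration, and the new name is passed in rather than computed.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print one mv command with both names quoted so
# that the line can be read back by bash unchanged.
emit_mv() {
  printf 'mv -- %q %q\n' "$1" "$2"
}

# Example: print one line of the kind that would go into do_rename.
emit_mv 'Full |Movie| [360p] some.mp4' 'Full_Movie_360p_some.mp4'
```

The quoting style chosen by %q is backslash-based rather than single quotes, so the output looks different from the example in the question, but bash should read it back as the same file names.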

There is a trivial solution using zsh shell and SQL-like (at least PostgreSQL and Oracle) quoting style:
$ setopt rc_quotes
$ echo text | perl -lnE 'say "word: ''$_''\t''$_''"'
word: 'text' 'text'
To quote a ' you simply double it and use '' in this mode.
