Bash Bulk Rename Folders with 3-Digit Prefix and Delimiter - bash

I have a series of folders that I'd like to rename with a prefix number and delimited text. For instance:
% ls
blue green keyboard pictures red tango yellow
flyer gum orange pop runner videos
rename to:
% ls
001-blue 002-green 003-keyboard 004-pictures 005-red 006-tango 007-yellow
008-flyer 009-gum 010-orange 011-pop 012-runner 013-videos
I am using the following to rename them, except that after 009 I get 0010, 0011, and so on. I would like to keep the prefix numbers to 3 digits.
% i=0; for x in *; do; mv "$x" "00$i-$x" ; i=$((i + 1)); done
I know the problem is the hard-coded 00 in the destination name of the mv command, but I don't know how to make the prefix a fixed three digits using the $i variable.
Thanks in advance.

Use this Perl one-liner:
perl -le '$cmd = sprintf( "mv $_ %03d-$_", ++$i ) and system $cmd for @ARGV;'
To do a dry run and print the intended commands without renaming any files, use print instead of system, like so:
perl -le '$cmd = sprintf( "mv $_ %03d-$_", ++$i ) and print $cmd for @ARGV;'
The Perl one-liner uses these command line flags:
-e : Tells Perl to look for code in-line, instead of in a file.
-l : Strip the input line separator ("\n" on *NIX by default) before executing the code in-line, and append it when printing.
See also the docs for sprintf.
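If you'd rather stay in the shell, the same zero-padding can be done with printf's %03d format. A minimal sketch (the echo makes it a dry run; remove it to actually rename):

```shell
# Zero-pad the counter with printf instead of hard-coding "00".
i=0
for x in *; do
  i=$((i + 1))
  prefix=$(printf '%03d' "$i")   # 1 -> 001, 10 -> 010, 100 -> 100
  echo mv -- "$x" "$prefix-$x"   # drop "echo" once the output looks right
done
```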

Related

bash remove/change values from one field with a loop

I have a file where the 10th column in Excel contains prices.
CASPER,N,CUSIP,0000000000,WOWMOM,USD,USD,US,B,"5000",19.50,justin,20160506,0,,N,E,,,,,,
CASPER,N,CUSIP,0000000000,WOWMOM,USD,USD,US,B,"75,000",19.50,bieber,20160506,0,,N,E,,,,,,
CASPER,N,CUSIP,0000000000,WOWMOM,USD,USD,US,B,"100,000",19.50,selena,20160506,0,,N,E,,,,,,
CASPER,N,CUSIP,0000000000,WOWMOM,USD,USD,US,B,"5500",19.50,gomez,20160506,0,,N,E,,,,,,
CASPER,N,CUSIP,0000000000,WOWMOM,USD,USD,US,B,"50,000",19.50,gomez,20160506,0,,N,E,,,,,,
CASPER,N,CUSIP,0000000000,WOWMOM,USD,USD,US,B,"350,000",19.50,bieber,20160506,0,,N,E,,,,,,
CASPER,N,CUSIP,0000000000,WOWMOM,USD,USD,US,B,"50000",19.50,bieber,20160506,0,,N,E,,,,,,
When it goes to CSV, the quotes and the commas stay.
I need to pick out the column that is surrounded by quotes (I use grep -o), and then after clearing the commas, I get rid of the quotes.
I can't use quotes or comma to delimit in awk because the prices get broken up into different fields.
cat /tmp/wowmom | awk -F ',' '{print $10}'
"5000"
"75
"100
"5500"
"50
"350
"50000"
while read line
do
clean_price=$(grep -o '".*"' $line)
echo "$clean_price" | tr -d',' > cleanprice1
echo "cleanprice1" | tr -d'"' > clearnprice2
done </tmp/wowmom
I get "No such file or directory" errors on the grep, though:
grep:CASPER,N,CUSIP,0000000000,WOWMOM,USD,USD,US,B,"5000",19.50,justin,20160506,0,,N,E,,,,,,:No such file or directory
grep:CASPER,N,CUSIP,0000000000,WOWMOM,USD,USD,US,B,"75,000",19.50,bieber,20160506,0,,N,E,,,,,,:No such file or directory
grep:CASPER,N,CUSIP,0000000000,WOWMOM,USD,USD,US,B,"100,000",19.50,selena,20160506,0,,N,E,,,,,,:No such file or directory
grep:CASPER,N,CUSIP,0000000000,WOWMOM,USD,USD,US,B,"50,000",19.50,gomez,20160506,0,,N,E,,,,,,:No such file or directory
grep:CASPER,N,CUSIP,0000000000,WOWMOM,USD,USD,US,B,"350,000",19.50,bieber,20160506,0,,N,E,,,,,,:No such file or directory
I want some way to isolate the value within the quotes with grep -o, take the commas out of the number, and then use awk to take the quotes out of field 10.
I am doing this manually right now. It is a surprisingly long job - there are thousands of lines in this file.
You can use FPAT with gnu-awk for this:
awk -v FPAT='"[^"]+",|[^,]*' '{gsub(/[",]+/, "", $10)} 1' OFS=, file
CASPER,N,CUSIP,0000000000,WOWMOM,USD,USD,US,B,5000,19.50,justin,20160506,0,,N,E,,,,,,
CASPER,N,CUSIP,0000000000,WOWMOM,USD,USD,US,B,75000,19.50,bieber,20160506,0,,N,E,,,,,,
CASPER,N,CUSIP,0000000000,WOWMOM,USD,USD,US,B,100000,19.50,selena,20160506,0,,N,E,,,,,,
CASPER,N,CUSIP,0000000000,WOWMOM,USD,USD,US,B,5500,19.50,gomez,20160506,0,,N,E,,,,,,
CASPER,N,CUSIP,0000000000,WOWMOM,USD,USD,US,B,50000,19.50,gomez,20160506,0,,N,E,,,,,,
CASPER,N,CUSIP,0000000000,WOWMOM,USD,USD,US,B,350000,19.50,bieber,20160506,0,,N,E,,,,,,
CASPER,N,CUSIP,0000000000,WOWMOM,USD,USD,US,B,50000,19.50,bieber,20160506,0,,N,E,,,,,,
You are using the wrong tool here.
sed -r 's/^(([^,]+,){9})"([^,]+),?([^,]+)"/\1\3\4/' file.csv > newfile.csv
The regular expression captures the first nine fields into the first back reference (the second holds the last of those nine fields), the digits before the thousands comma in the third, and the rest of the number in the fourth; the substitution then glues them back together without the skipped characters.
If you have numbers with more than one thousands separator (i.e. above one million), you will need a slightly more complex script.
As for what's wrong with your original script: the second argument to grep is the name of a file to search, not the string to search. You can use a here string (in Bash) or pipe the string to grep, but again, this is not the proper approach here.
grep -o '"[^"]*"' <<<"$line"
or
printf '%s' "$line" | grep -o '"[^"]*"'
Notice also the quotes -- omitting quotes is a common newbie error; you can get away with it for a while, and then it bites you.
A pure Bash solution:
while IFS=\" read -r l n r; do
printf '%s\n' "$l${n//,/}$r"
done < input_file.txt
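To see how the IFS=\" split works, here is a single-line demo (assuming exactly one quoted field per line, as in the sample data):

```shell
# IFS=\" splits the line on the double quotes into a left part, the quoted
# number, and a right part; ${n//,/} then strips the commas from the number.
line='CASPER,N,B,"75,000",19.50,bieber'
IFS=\" read -r l n r <<<"$line"
printf '%s\n' "$l${n//,/}$r"   # prints CASPER,N,B,75000,19.50,bieber
```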
If you're looking for perl:
#!perl
use strict;
use warnings;
use Text::CSV;
use autodie;

my $csv = Text::CSV->new({binary=>1, eol=>"\n"});
my $filename = shift @ARGV;
open my $fh, "<", $filename;
while (my $row = $csv->getline($fh)) {
    $row->[9] =~ s/,//g;
    $csv->print(*STDOUT, $row);
}
close $fh;
demo:
$ perl csv.pl file
CASPER,N,CUSIP,0000000000,WOWMOM,USD,USD,US,B,5000,19.50,justin,20160506,0,,N,E,,,,,,
CASPER,N,CUSIP,0000000000,WOWMOM,USD,USD,US,B,75000,19.50,bieber,20160506,0,,N,E,,,,,,
CASPER,N,CUSIP,0000000000,WOWMOM,USD,USD,US,B,100000,19.50,selena,20160506,0,,N,E,,,,,,
CASPER,N,CUSIP,0000000000,WOWMOM,USD,USD,US,B,5500,19.50,gomez,20160506,0,,N,E,,,,,,
CASPER,N,CUSIP,0000000000,WOWMOM,USD,USD,US,B,50000,19.50,gomez,20160506,0,,N,E,,,,,,
CASPER,N,CUSIP,0000000000,WOWMOM,USD,USD,US,B,350000,19.50,bieber,20160506,0,,N,E,,,,,,
CASPER,N,CUSIP,0000000000,WOWMOM,USD,USD,US,B,50000,19.50,bieber,20160506,0,,N,E,,,,,,

Getting different output files

I'm doing a test with these files:
comp900_c0_seq1_Glicose_1_ACTTGA_merge_R1_001.fastq
comp900_c0_seq1_Glicose_1_ACTTGA_merge_R2_001.fastq
comp900_c0_seq2_Glicose_1_ACTTGA_merge_R1_001.fastq
comp900_c0_seq2_Glicose_1_ACTTGA_merge_R2_001.fastq
comp995_c0_seq1_Glicose_1_ACTTGA_merge_R2_001.fastq
comp995_c0_seq1_Xilano_1_AGTCAA_merge_R1_001.fastq
comp995_c0_seq1_Xilano_1_AGTCAA_merge_R2_001.fastq
I want the files that share the same code up to the first _ (underscore) and that have the code R1 to go into separate output files. Each output file should be named after the code before the first _ (underscore).
This is my code, but I'm having trouble producing the output files.
#!/bin/bash
for i in {900..995}; do
if [[ ${i} -eq ${i} ]]; then
cat comp${i}_*_R1_001.fastq
fi
done
I want to have two outputs:
One output will have all lines from:
comp900_c0_seq1_Glicose_1_ACTTGA_merge_R1_001.fastq
comp900_c0_seq2_Glicose_1_ACTTGA_merge_R1_001.fastq
and its name should be comp900_R1.out
The other output will have lines from:
comp995_c0_seq1_Xilano_1_AGTCAA_merge_R1_001.fastq
and its name should be comp995_R1.out
Finally, as I said, this is a small test. I want my script to work with a lot of files that have the same characteristics.
Using awk:
ls -1 *.fastq | awk -F_ '$8 == "R1" {system("cat " $0 ">>" $1 "_R1.out")}'
List all files *.fastq into awk, splitting on _. If the 8th part $8 is R1, append (cat ... >>) the file to the name built from the first part $1 plus _R1.out, which will be comp900_R1.out or comp995_R1.out. It is assumed that no filenames contain spaces or other special characters.
Result:
File comp900_R1.out containing all lines from
comp900_c0_seq1_Glicose_1_ACTTGA_merge_R1_001.fastq
comp900_c0_seq2_Glicose_1_ACTTGA_merge_R1_001.fastq
and file comp995_R1.out containing all lines from
comp995_c0_seq1_Xilano_1_AGTCAA_merge_R1_001.fastq
My stab at a general solution:
#!/bin/bash
for f in *_R1_*; do
    code=$(echo "$f" | cut -d _ -f 1)
    cat "$f" >> "${code}_R1.out"
done
It iterates over files with _R1_ in the name, then appends their contents to a file named after the code.
cut pulls out the code by splitting the filename (-d _) and returning the first field (-f 1).
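As an aside, the cut call (and its unquoted echo) can be replaced with bash parameter expansion, which is faster and safe with odd filenames:

```shell
# ${f%%_*} strips the longest suffix starting at "_", leaving the part
# before the first underscore.
f=comp900_c0_seq1_Glicose_1_ACTTGA_merge_R1_001.fastq
code=${f%%_*}
echo "$code"   # prints comp900
```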

Need to pick Latest File From a Dir Using Shell Script

I am new to shell scripting, and I need to pick the latest files from a directory using a shell script.
Directory Name : FTPDIR
Files in this directory look like this:
APC5502015VP072020121826.csv
APC5502015VP082020122314.csv
APC5502015VP092020121451.csv
CBC5502015VP092020122045.csv
CBC5502015VP102020122045.csv
S5502015VP072020121620.csv
S5502015VP072020122314.csv
S5502015VP092020122045.csv
Note: I need to pick one latest file from each group. Below is the output I need to get after executing the shell script:
APC5502015VP092020121451.csv
CBC5502015VP102020122045.csv
S5502015VP092020122045.csv
Ex: In the latest file APC5502015VP092020121451.csv, the number 092020121451 is the date part in the format MMDDYYYYHHMM, and the string part is APC5502015VP (the length of the string part is not fixed).
I need to pick those three files from the dir using shell script
Can you help me to resolve this?
It's going to be really problematic to do this safely in just bash. As Jonathan mentioned, "special" characters like spaces or newlines may bung up your script.
If we can assume that there won't be any of those, then we can do most of job in bash, without involving other tools.
# Make an associative array to record types, in the second loop...
declare -A a
for file in *.csv; do
    # First, we convert the filenames into something that can be sorted.
    # The next three lines account for your "unknown length" in the first part
    # of the filename. We assume the date+time is the 12 chars before ".csv".
    new="$(rev <<<"$file")"
    new="${new:4:12}"
    new="$(rev <<<"$new")"
    new="${new:4:4}${new:0:2}${new:2:2}${new:8:4}"
    len=$(( ${#file} - 16 ))
    echo "$new ${file:0:$len} $file"
done | sort | while read date type file; do
    # Next, we print only the first of each "type"...
    if [[ ${a[$type]} -eq 0 ]]; then
        a[$type]=1
        echo "$file"
    fi
    # And stop once we have collected three types.
    if [[ ${#a[*]} -ge 3 ]]; then
        break
    fi
done
As I say, this doesn't handle newlines in filenames.
Note also that this uses rev and sort, which are not built in to bash. The rev parts could be done internally, using more code, which might make them execute faster, but you'd only see a difference in very extreme cases. There's not much we can do about sort, since there isn't a built-in within bash.
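For what it's worth, the two rev calls exist only to grab the 12 characters before ".csv"; bash's own substring expansion can do that directly. A sketch, assuming every name ends in MMDDYYYYHHMM.csv:

```shell
file=APC5502015VP092020121451.csv
stem=${file%.csv}                                   # drop the extension
date=${stem: -12}                                   # last 12 chars: MMDDYYYYHHMM
key=${date:4:4}${date:0:2}${date:2:2}${date:8:4}    # reorder to YYYYMMDDHHMM
echo "$key"                                         # prints 201209201451
```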
This Perl script works on the given data. No doubt it could be improved.
#!/usr/bin/env perl
use strict;
use warnings;

my %bases;
while (<>)
{
    chomp;
    my $name = $_;
    my($prefix, $mmdd, $yyyy, $hhmm) = ($name =~ m/(.*)(\d{4})(\d{4})(\d{4})\.csv/);
    #print "$name = $prefix $yyyy $mmdd $hhmm\n";
    my $stamp = "$yyyy$mmdd$hhmm";
    if (!exists($bases{$prefix}) || ($stamp > $bases{$prefix}->{stamp}))
    {
        $bases{$prefix} = { name => $name, stamp => $stamp };
    }
}
foreach my $prefix (sort keys %bases)
{
    print "$bases{$prefix}->{name}\n";
}
Output:
APC5502015VP092020121451.csv
CBC5502015VP102020122045.csv
S5502015VP092020122045.csv
Here is an awk solution (note that it compares the raw MMDDYYYYHHMM strings, so the ordering is only reliable within a single year):
cd FTPDIR
ls -1|awk -F"VP" '{split($2,a,".");if(a[1]>b[$1]){b[$1]=$2}}END{for(i in b)print i"VP"b[i]}'
Tested below:
> cat temp
APC5502015VP072020121826.csv
APC5502015VP082020122314.csv
APC5502015VP092020121451.csv
CBC5502015VP092020122045.csv
CBC5502015VP102020122045.csv
S5502015VP072020121620.csv
S5502015VP072020122314.csv
S5502015VP092020122045.csv
> awk -F"VP" '{split($2,a,".");if(a[1]>b[$1]){b[$1]=$2}}END{for(i in b)print i"VP"b[i]}' temp
CBC5502015VP102020122045.csv
S5502015VP092020122045.csv
APC5502015VP092020121451.csv

gnuplot for cycle and spaces in filename

I have a small script in bash which generates graphs via gnuplot.
Everything works fine until the names of the input files contain space(s).
Here's what I've got:
INPUTFILES=("data1.txt" "data2 with spaces.txt" "data3.txt")
...
#MAXROWS is set earlier, not relevant.
for LINE in $( seq 0 $(( MAXROWS - 1 )) );do
gnuplot << EOF
reset
set terminal png
set output "out/graf_${LINE}.png"
filenames="${INPUTFILES[@]}"
set multiplot
plot for [file in filenames] file every ::0::${LINE} using 1:2 with line title "graf_${LINE}"
unset multiplot
EOF
done
This code works, but only when the input filenames contain no spaces.
In the example, gnuplot evaluates this:
1 iteration: file=data1.txt - CORRECT
2 iteration: file=data2 - INCORRECT
3 iteration: file=with - INCORRECT
4 iteration: file=spaces.txt - INCORRECT
The quick answer is that you can't do exactly what you want to do. Gnuplot splits the string in an iteration on spaces, and there's no way around that (AFAIK). Depending on what you want, there may be a workaround. You can write a (recursive) function in gnuplot to replace one character string with another --
#S,C & R stand for STRING, CHARS and REPLACEMENT to help this be a little more legible.
replace(S,C,R)=(strstrt(S,C)) ? \
replace( S[:strstrt(S,C)-1].R.S[strstrt(S,C)+strlen(C):] ,C,R) : S
Bonus points to anyone who can figure out how to do this without recursion...
Then your (bash) loop looks something like:
INPUTFILES_BEFORE=("data1.txt" "data2 with spaces.txt" "data3.txt")
INPUTFILES=()
#C style loop to avoid changing IFS -- sorry, SO doesn't like the @...
#This loop pre-processes files and changes spaces to '#_#'
for (( i=0; i < ${#INPUTFILES_BEFORE[@]}; i++)); do
FILE=${INPUTFILES_BEFORE[${i}]}
INPUTFILES+=( "`echo ${FILE} | sed -e 's/ /#_#/g'`" ) #replace ' ' with '#_#'
done
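The same substitution can be done without spawning echo and sed at all, using bash's built-in pattern replacement:

```shell
# ${FILE// /#_#} replaces every space in FILE with '#_#'.
INPUTFILES_BEFORE=("data1.txt" "data2 with spaces.txt" "data3.txt")
INPUTFILES=()
for FILE in "${INPUTFILES_BEFORE[@]}"; do
  INPUTFILES+=( "${FILE// /#_#}" )
done
printf '%s\n' "${INPUTFILES[@]}"   # data2 becomes data2#_#with#_#spaces.txt
```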
which preprocesses your input files to add '#_#' to the filenames which have spaces in them... Finally, the "complete" script:
...
INPUTFILES_BEFORE=("data1.txt" "data2 with spaces.txt" "data3.txt")
INPUTFILES=()
for (( i=0; i < ${#INPUTFILES_BEFORE[@]}; i++)); do
FILE=${INPUTFILES_BEFORE[${i}]}
INPUTFILES+=( "`echo ${FILE} | sed -e 's/ /#_#/g'`" ) #replace ' ' with '#_#'
done
for LINE in $( seq 0 $(( MAXROWS - 1 )) );do
gnuplot <<EOF
filenames="${INPUTFILES[@]}"
replace(S,C,R)=(strstrt(S,C)) ? \
replace( S[:strstrt(S,C)-1].R.S[strstrt(S,C)+strlen(C):] , C ,R) : S
#replace '#_#' with ' ' in filenames.
plot for [file in filenames] replace(file,'#_#',' ') every ::0::${LINE} using 1:2 with line title "graf_${LINE}"
EOF
done
However, I think the take-away here is that you shouldn't use spaces in filenames ;)
Escape the spaces:
"data2\ with\ spaces.txt"
EDIT
It seems that even with escape sequences, as you have mentioned, the iteration will still split the input on the spaces.
Can you convert your script to work in a while loop fashion:
http://ubuntuforums.org/showthread.php?t=83424
This also may be a solution, but it's new to me and I'm still playing with it to understand exactly what it's doing:
http://www.cyberciti.biz/tips/handling-filenames-with-spaces-in-bash.html

Setting a BASH environment variable directly in AWK (in an AWK one-liner)

I have a file that has two columns of floating point values. I also have a C program that takes a floating point value as input and returns another floating point value as output.
What I'd like to do is the following: for each row in the original, execute the C program with the value in the first column as input, and then print out the first column (unchanged) followed by the second column minus the result of the C program.
As an example, suppose c_program returns the square of the input and behaves like this:
$ c_program 4
16
$
and suppose data_file looks like this:
1 10
2 11
3 12
4 13
What I'd like to return as output, in this case, is
1 9
2 7
3 3
4 -3
To write this in really sketchy pseudocode, I want to do something like this:
awk '{print $1, $2 - `c_program $1`}' data_file
But of course, I can't just pass $1, the awk variable, into a call to c_program. What's the right way to do this, and preferably, how could I do it while still maintaining the "awk one-liner"? (I don't want to pull out a sledgehammer and write a full-fledged C program to do this.)
You can just do everything in awk:
awk '{cmd="c_program "$1; cmd|getline l; close(cmd); print $1,$2-l}' file
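As a runnable sketch of the same pattern, with expr squaring standing in for the hypothetical c_program (and with close() so awk doesn't run out of file descriptors on large inputs):

```shell
printf '1 10\n2 11\n3 12\n4 13\n' |
awk '{cmd = "expr " $1 " \\* " $1   # build the command line per input line
      cmd | getline l               # run it and read its one line of output
      close(cmd)                    # close the pipe before the next line
      print $1, $2 - l}'
```

This prints the question's expected output (1 9, 2 7, 3 3, 4 -3).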
This shows how to execute a command in awk:
ls | awk '/^a/ {system("ls -ld " $1)}'
You could use a bash script instead:
while read line
do
    FIRST=`echo $line | cut -d' ' -f1`
    SECOND=`echo $line | cut -d' ' -f2`
    OUT=`c_program $FIRST`
    echo $FIRST `expr $SECOND - $OUT`
done < data_file
The shell is a better tool for this using a little used feature. There is a shell variable IFS which is the Input Field Separator that sh uses to split command lines when parsing; it defaults to <Space><Tab><Newline> which is why ls foo is interpreted as two words.
When set is given arguments not beginning with - it sets the positional parameters of the shell to the contents of the arguments as split via IFS, thus:
#!/bin/sh
while read line ; do
    set -- $line
    subtrahend=`c_program $1`
    echo $1 `expr $2 - $subtrahend`
done < data_file
Pure Bash, without using any external executables other than your program:
#!/bin/bash
while read num1 num2
do
    (( result = num2 - $(c_program "$num1") ))
    echo "$num1 $result"
done < data_file
As others have pointed out: awk is not well equipped for this job. Here is a suggestion in bash:
#!/bin/bash
data_file=$1
while read column_1 column_2 the_rest
do
    (( result = column_2 - $(c_program "$column_1") ))
    echo $column_1 $result "$the_rest"
done < "$data_file"
Save this to a file, say myscript.sh, then invoke it as:
bash myscript.sh data_file
The read command reads each line from the data file (which is redirected to standard input) and assigns the first two columns to the $column_1 and $column_2 variables. The rest of the line, if there is any, is stored in $the_rest.
The script then calculates the result and prints out the line. Note that $the_rest is surrounded by quotes to preserve spacing; without them, multiple spaces in the input file would be squeezed into one.
