I have found how to rename from the Terminal using the rename command.
However, I couldn't find how to start the numbering from a particular value. Let's say I want to rename from title223 onwards (not necessarily from title001).
What would be the syntax for this?
Thank you.
I hope I've understood your problem correctly.
If you want to rename multiple files with this command and also want to use some Perl function, variable, etc., you have to declare any variable explicitly. Thus this command is not correct:
~ ❱ rename -n -v 's/_/printf("%.2d",$n++)/e' *.txt
Global symbol "$n" requires explicit package name (did you forget to declare "my $n"?) at (user-supplied code).
but this one is correct:
~ ❱ rename -n -v 'my $n;s/_/printf("%.2d",$n++)/e' *.txt
00rename(0_file.txt, 01file.txt)
00rename(1_file.txt, 11file.txt)
00rename(2_file.txt, 21file.txt)
00rename(3_file.txt, 31file.txt)
00rename(4_file.txt, 41file.txt)
The problem with this code is that printf does not put its result into the substitution; it prints to standard output instead (that is the stray 00 glued to the rename messages above), and the substitution only receives printf's return value of 1. To fix this we can use sprintf.
Putting a zero-padded number and _ at the beginning:
~ ❱ rename -v -n 'my $n;s/^/sprintf("%.3d_",$n++)/ge' *.txt
rename(0_file.txt, 000_0_file.txt)
rename(1_file.txt, 000_1_file.txt)
rename(2_file.txt, 000_2_file.txt)
rename(3_file.txt, 000_3_file.txt)
rename(4_file.txt, 000_4_file.txt)
As you can see, every prefix is 000_, because $n is lexical and is declared anew for each file.
Thus we can make it global with the our keyword, like this:
~ ❱ rename -v -n 'our $n; s/^/sprintf("%.3d_",$n++)/ge' *.txt
rename(0_file.txt, 000_0_file.txt)
rename(1_file.txt, 001_1_file.txt)
rename(2_file.txt, 002_2_file.txt)
rename(3_file.txt, 003_3_file.txt)
rename(4_file.txt, 004_4_file.txt)
And finally, your actual request: I want to rename from title223.
I am NOT sure, but I don't think you can initialize the counter this way with the rename command itself. It should work, but in my tests it did not!
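For reference, here is one way you might try to initialize the counter inside rename itself. This is only a sketch: the "unless defined" initialization is my own assumption, and (as the note at the end of this answer suggests) it may not persist between files in every rename version:
~ ❱ rename -v -n 'our $n; $n = 223 unless defined $n; s/^/sprintf("%.3d_",$n++)/e' *.txt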
In such a case I prefer to use a Perl one-liner.
Renaming with Perl non-recursive version:
~ ❱ perl -le '($old=$_) && s/^/sprintf("%.3d_",$n++)/e && print "$old => $_" for <*.txt>'
0_file.txt => 000_0_file.txt
1_file.txt => 001_1_file.txt
2_file.txt => 002_2_file.txt
3_file.txt => 003_3_file.txt
4_file.txt => 004_4_file.txt
And to begin from a specific value like 223:
~ ❱ perl -le '$n=223; ($old=$_) && s/^/sprintf("%.3d_",$n++)/e && print "$old => $_" for <*.txt>'
0_file.txt => 223_0_file.txt
1_file.txt => 224_1_file.txt
2_file.txt => 225_2_file.txt
3_file.txt => 226_3_file.txt
4_file.txt => 227_4_file.txt
Just replace print "$old => $_" with the rename function: rename $old, $_.
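A minimal sketch of the complete renaming one-liner (my adaptation of the command above, with explicit parentheses around rename for safety):
~ ❱ perl -le '$n=223; ($old=$_) && s/^/sprintf("%.3d_",$n++)/e && rename($old,$_) for <*.txt>'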
Renaming with Perl recursive version:
For this one you have to use the find command:
~ ❱ find . -name '*.txt'
./4_file.txt
./2_file.txt
./0_file.txt
./3_file.txt
./1_file.txt
~ ❱ find . -name '*.txt' > find.log
~ ❱
~ ❱ # run perl with print just for test
~ ❱
~ ❱ perl -lne '$n=223; ($old=$_) && s/(?<=\.\/)(?=)/sprintf("%.3d_",$n++)/e && print "$old => $_"' find.log
./4_file.txt => ./223_4_file.txt
./2_file.txt => ./223_2_file.txt
./0_file.txt => ./223_0_file.txt
./3_file.txt => ./223_3_file.txt
./1_file.txt => ./223_1_file.txt
~ ❱
~ ❱ # Ohhhhh Nooooo it is still 223 for every file!
~ ❱
~ ❱ # first solution: using BEGIN{ $n=223 }
~ ❱
~ ❱ perl -lne 'BEGIN{$n=223}; ($old=$_) && s/(?<=\.\/)(?=)/sprintf("%.3d_",$n++)/e && print "$old => $_"' find.log
./4_file.txt => ./223_4_file.txt
./2_file.txt => ./224_2_file.txt
./0_file.txt => ./225_0_file.txt
./3_file.txt => ./226_3_file.txt
./1_file.txt => ./227_1_file.txt
~ ❱
~ ❱ # second solution is: using static declaration
~ ❱ # in Perl it is: state
~ ❱ # thus we should use: state $n=233;
~ ❱ # but we have to enable Perl 5.10 features with -E
~ ❱
~ ❱ perl -lnE 'state $n=233; ($old=$_) && s/(?<=\.\/)(?=)/sprintf("%.3d_",$n++)/e && print "$old => $_"' find.log
./4_file.txt => ./233_4_file.txt
./2_file.txt => ./234_2_file.txt
./0_file.txt => ./235_0_file.txt
./3_file.txt => ./236_3_file.txt
./1_file.txt => ./237_1_file.txt
~ ❱
~ ❱ # test it with rename $old,$_
~ ❱ perl -lnE 'state $n=233; ($old=$_) && s/(?<=\.\/)(?=)/sprintf("%.3d_",$n++)/e && rename $old,$_' find.log
~ ❱
~ ❱ # see the result:
~ ❱
~ ❱ ls *.txt | cat -n
1 233_4_file.txt
2 234_2_file.txt
3 235_0_file.txt
4 236_3_file.txt
5 237_1_file.txt
NOTE: BEGIN{} and state do not work with the rename command. Why? I do not know :)
I have the following files:
100005.txt 107984.txt 116095.txt 124152.txt 133339.txt 139345.txt 18147.txt 25750.txt 32647.txt 40390.txt 48979.txt 56502.txt 64234.txt 72964.txt 80311.txt 888.txt 95969.txt
100176.txt 108084.txt 116194.txt 124321.txt 133435.txt 139438.txt 18331.txt 25940.txt 32726.txt 40489.txt 49080.txt 56506.txt 64323.txt 73063.txt 80481.txt 88958.txt 9601.txt
100347.txt 108255.txt 116378.txt 124494.txt 133531.txt 139976.txt 18420.txt 26034.txt 32814.txt 40589.txt 49082.txt 56596.txt 64414.txt 73163.txt 80580.txt 89128.txt 96058.txt
100447.txt 108343.txt 116467.txt 124594.txt 133627.txt 140519.txt 18509.txt 26128.txt 32903.txt 40854.txt 49254.txt 56768.txt 64418.txt 73498.txt 80616.txt 89228.txt 96148.txt
100617.txt 108432.txt 11647.txt 124766.txt 133728.txt 14053.txt 1866.txt 26227.txt 32993.txt 41026.txt 49308.txt 56857.txt 6449.txt 73670.txt 80704.txt 89400.txt 96239.txt
10071.txt 108521.txt 116556.txt 124854.txt 133830.txt 141062.txt 18770.txt 26327.txt 33093.txt 41197.txt 49387.txt 57029.txt 64508.txt 7377.txt 80791.txt 89500.txt 96335.txt
100788.txt 10897.txt 116746.txt 124943.txt 133866.txt 141630.txt 18960.txt 2646.txt 33194.txt 41296.txt 4971.txt 57128.txt 64680.txt 73841.txt 80880.txt 89504.txt 96436.txt
Some of the files look like:
spec:
  annotations:
    name: "ubuntu4"
    labels:
      key: "cont_name"
      value: "ubuntuContainer4"
    labels:
      key: "cont_service"
      value: "UbuntuService4"
  task:
    container:
      image: "ubuntu:latest"
      args: "tail"
      args: "-f"
      args: "/dev/null"
      mounts:
        source: "/home/testVolume"
        target: "/opt"
  replicated:
    replicas: 1
I want to get every filename that contains ubuntu AND replicas.
I have tried awk '/ubuntu/ && /replicas/{print FILENAME}' *.txt but it doesn't seem to work for me.
Any ideas on how to fix this?
Grep can return a list of the files that match a string. You can nest that grep call so that you first get a list of files that match ubuntu, then use that list of files to get a list of files that match replicas.
grep -l replicas $( grep -l ubuntu *.txt )
This does assume that at least one file will match ubuntu. To get around that limitation, you can add a test for the existence of one file first, and then do the combined search:
grep -q ubuntu *.txt && grep -l replicas $( grep -l ubuntu *.txt )
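An alternative sketch that avoids the command substitution, assuming GNU xargs (its -r flag skips running the second grep when the first finds nothing; filenames containing spaces would still need extra care):
grep -l ubuntu *.txt | xargs -r grep -l replicas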
Check if both strings appear in a given file by using a counter for each and then checking if they were incremented. You can do this with BEGINFILE, available on GNU awk:
awk 'BEGINFILE {ub=0; re=0}
/ubuntu/ {ub++}
/replicas/ {re++}
(ub>0 && re>0) {print FILENAME; nextfile}' *.txt
This sets two counters to 0 when it starts to read a file: one for one string and another one for the other. When one of the patterns is found, it increments its corresponding counter. Then it keeps checking if the two counters have been incremented. If so, it prints its filename using the FILENAME variable that contains that string. Also, it skips the rest of the file using nextfile, since there is no need to continue checking for the patterns.
awk '/ubuntu/ && /replicas/{print FILENAME}' *.txt
looks for both regexps on the same line. To find them both in the same file, but possibly on separate lines, with GNU awk for ENDFILE:
awk '/ubuntu/{u=1} /replicas/{r=1} ENDFILE{if (u && r) print FILENAME; u=r=0}' *.txt
or, more efficiently, adding gawk's nextfile construct and preferably switching to BEGINFILE (as @fedorqui already showed) instead of ENDFILE, since all that remains between file reads is to set the 2 variables:
awk 'BEGINFILE{u=r=0} /ubuntu/{u=1} /replicas/{r=1} u && r{print FILENAME; nextfile}' *.txt
With other awks it'd be:
awk '
FNR==1{prt()} /ubuntu/{u=1} /replicas/{r=1} END{prt()}
function prt() {if (u && r) print fname; fname=FILENAME; u=r=0}
' *.txt
If no subdirectories have to be visited:
for f in *.txt
do
    grep -q -m1 'ubuntu' "$f" && grep -q -m1 'replicas' "$f" && echo "found: $f"
done
or as a one-liner:
for f in *.txt ; do grep -q -m1 'ubuntu' "$f" && grep -q -m1 'replicas' "$f" && echo "found: $f" ; done
The -q makes grep quiet, so the matches aren't displayed, and the -m1 stops after the first match, so grep can report a match fast.
The && is short circuiting, so if the first grep doesn't find anything, the second isn't tried.
For working on the files further down the pipeline, you will of course eliminate the chatty "found: ".
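For example, a pipeline-friendly sketch that prints bare filenames only (the trailing wc -l is just a hypothetical consumer for illustration):
for f in *.txt ; do grep -q -m1 'ubuntu' "$f" && grep -q -m1 'replicas' "$f" && printf '%s\n' "$f" ; done | xargs wc -l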
I want my bash prompt paths to be shortened:
~/workspace/project/my-project
# Should be
~/w/p/my-project
This could be achieved by just shortening each path component between slashes to its first character.
Is there a way to do this for example in sed?
Edit:
Thought someone else looking into this might find what I ended up with useful, so I'm editing it in here.
.bashrc:
dir_chomp () {
pwd | sed "s|^$HOME|~|" 2> /dev/null | sed 's:\(\.\?[^/]\)[^/]*/:\1/:g'
}
parse_git_branch() {
git branch 2> /dev/null | sed -e '/^[^*]/d' -e 's/* \(.*\)/ (\1)/'
}
export PS1="\[\033[32m\]\$(dir_chomp)\[\033[33m\]\$(parse_git_branch)\[\033[00m\] $ "
prompt examples (coloring doesn't show):
~/w/e/coolstuff (master) $
~/.c/A/Cache $
If you want to unconditionally shorten all path components, you can do it quite easily with sed:
sed 's:\([^/]\)[^/]*/:\1/:g'
If you want to also insert ~ at the beginning of paths which start with $HOME, you can add that to the sed command (although this naive version assumes that $HOME does not include a colon).
sed 's:^'"$HOME"':~:;s:\([^/]\)[^/]*/:\1/:g'
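For illustration, a hypothetical run of that command (assuming HOME=/home/user):
$ echo /home/user/workspace/project/my-project | sed 's:^'"$HOME"':~:;s:\([^/]\)[^/]*/:\1/:g'
~/w/p/my-project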
A better solution is to use bash substitution:
short_pwd() {
local pwd=$(pwd)
pwd=${pwd/#$HOME/\~}
sed 's:\([^/]\)[^/]*/:\1/:g' <<<"$pwd"
}
With that bash function, you can then "call" it from your PS1 string:
$ PS1='$(short_pwd)\$ '
~/s/tmp$ PS1='\$ '
$
Use PROMPT_COMMAND to set your prompt dynamically each time it is displayed.
shorten_path () {
cwd=${PWD/workspace/w}
cwd=${cwd/project/p}
cwd=${cwd/$HOME/~}
PS1="$cwd "'\$ '
}
PROMPT_COMMAND=shorten_path
This replaces the use of the \w escape with custom code to shorten the current working directory. It has the unfortunate side effect of replacing ~ with the full name of your home directory, though, which is why the third line is necessary to put it back, if desired.
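A hypothetical session with the function above installed (note that ${cwd/pattern/replacement} replaces only the first literal occurrence of each word, so my-project survives intact):
$ cd ~/workspace/project/my-project
~/w/p/my-project $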
I use this to shorten each component to 3 characters plus "..":
shortpath()
{
dir=${1%/*} && last=${1##*/}
res=$(for i in ${dir//\// } ; do echo -n "${i:0:3}../" ; done)
echo "/$res$last"
}
Version to shorten to one character:
shortpath()
{
dir=${1%/*} && last=${1##*/}
res=$(for i in ${dir//\// } ; do echo -n "${i:0:1}/" ; done)
echo "/$res$last"
}
And then:
export PS1="\$(shortpath \$(pwd)) $"
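A quick usage sketch of the one-character version on a hypothetical path:
$ shortpath /home/user/workspace/project/my-project
/h/u/w/p/my-project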
I have a file like this called new.samples.dat
-4.5000000000E-01 8.0000000000E+00 -1.3000000000E-01
5.0000000000E-02 8.0000000000E+00 3.4000000000E-01
...
I have to search for all of these numbers in another file called Remaining.Simulations.dat and copy the matching lines into another file. I did it like this:
for sample_index in $(seq 1 100)
do
sample=$(awk 'NR=='$sample_index'' new.samples.dat)
grep "$sample" Remaining.Simulations.dat >> Previous.Training.dat
done
It works almost fine, but it does not copy all of the $sample lines into Previous.Training.dat, even though I am sure they are in Remaining.Simulations.dat.
This error appears:
grep: invalid option -- '.'
Usage: grep [OPTION]... PATTERN [FILE]...
Try `grep --help' for more information.
Do you have any idea how to solve it? Thank you.
It's because you're trying to grep for something like -4.5 and grep is treating that as an option rather than a search string. If you use -- to indicate there are no more options, this should work okay:
pax> echo -4.5000000000E-01 | grep -4.5000000000E-01
grep: invalid option -- '.'
Usage: grep [OPTION]... PATTERN [FILE]...
Try 'grep --help' for more information.
pax> echo -4.5000000000E-01 | grep -- -4.5000000000E-01
-4.5000000000E-01
In addition, if you pass the string 7.2 to grep, it will match any line containing 7 followed by any character followed by 2 since:
Regular expressions treat . as a special character; and
Without start and end markers, 7.2 will also match 47.2, 7.25 and so on.
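As a side note (my suggestion, not part of the original answer): both of those points can be addressed by telling grep to treat the sample as a fixed string with -F, combined with -- for the leading dash:
grep -F -- "$sample" Remaining.Simulations.dat >> Previous.Training.dat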
With awk you can try something like:
awk '
# first pass: remember every number that appears in new.samples.dat
NR==FNR {
    for (i=1;i<=NF;i++) {
        numbers[$i]++
    }
    next
}
# second pass: print each line of Remaining.Simulations.dat that contains
# any remembered number (next avoids printing the same line twice)
{
    for (number in numbers)
        if (index($0,number) > 0) {
            print $0
            next
        }
}' new.samples.dat Remaining.Simulations.dat > anotherfile
I need a command that will help me accomplish what I am trying to do. At the moment, I am looking for all the ".html" files in a given directory, and seeing which ones contain the string "jacketprice" in any of them.
Is there a way to do this? And also, for the second (but separate) command, I will need a way to replace every instance of "jacketprice" with "coatprice", all in one command or script. If this is feasible feel free to let me know. Thanks
find . -name "*.html" -exec grep -l jacketprice {} \;
As for the second question:
for i in `find . -name "*.html"`
do
    sed -i "s/jacketprice/coatprice/g" "$i"
done
or in a single command:
find . -name "*.html" -exec sed -i "s/jacketprice/coatprice/g" {} \;
Use recursive grep to search through your files:
grep -r --include="*.html" jacketprice /my/dir
Alternatively turn on bash's globstar feature (if you haven't already), which allows you to use **/ to match directories and sub-directories.
$ shopt -s globstar
$ cd /my/dir
$ grep jacketprice **/*.html
$ sed -i 's/jacketprice/coatprice/g' **/*.html
Depending on whether you want this recursively or not, perl is a good option:
Find, non-recursive:
perl -nwe 'print "Found $_ in file $ARGV\n" if /jacketprice/' *.html
Will print the line where the match is found, followed by the file name. Can get a bit verbose.
Replace, non-recursive:
perl -pi.bak -we 's/jacketprice/coatprice/g' *.html
Will store original with .bak extension tacked on.
Find, recursive:
perl -MFile::Find -nwE '
BEGIN { find(sub { /\.html$/i && push @ARGV, $File::Find::name }, '/dir'); };
say $ARGV if /jacketprice/'
It will print the file name for each match. Somewhat less verbose might be:
perl -MFile::Find -nwE '
BEGIN { find(sub { /\.html$/i && push @ARGV, $File::Find::name }, '/dir'); };
$found{$ARGV}++ if /jacketprice/; END { say for keys %found }'
Replace, recursive:
perl -MFile::Find -pi.bak -we '
BEGIN { find(sub { /\.html$/i && push @ARGV, $File::Find::name }, '/dir'); };
s/jacketprice/coatprice/g'
Note: In all recursive versions, /dir is the top-level directory you wish to search. Also, if your perl version is less than 5.10, say can be replaced with print followed by a newline, e.g. print "$_\n" for keys %found.
This is the script in question:
for file in `ls products`
do
echo -n `cat products/$file \
| grep '<td>.*</td>' | grep -v 'img' | grep -v 'href' | grep -v 'input' \
| head -1 | sed -e 's/^ *<td>//g' -e 's/<.*//g'`
done
I'm going to run it on 50000+ files, which would take about 12 hours with this script.
The algorithm is as follows:
Find only lines containing table cells (<td>) that do not contain any of 'img', 'href', or 'input'.
Select the first of them, then extract the data between the tags.
The usual bash text filters (sed, grep, awk, etc.) are available, as well as perl.
Looks like that can all be replaced by one gawk command:
gawk '
/<td>.*<\/td>/ && !(/img/ || /href/ || /input/) {
sub(/^ *<td>/,""); sub(/<.*/,"")
print
nextfile
}
' products/*
This uses the gawk extension nextfile.
If the wildcard expansion is too big, then
find products -type f -print | xargs gawk '...'
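If filenames under products might contain spaces or other odd characters, a null-delimited variant (assuming GNU find and xargs) is a safer sketch:
find products -type f -print0 | xargs -0 gawk '...'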
Here's some quick perl to do the whole thing; it should be a lot faster.
#!/usr/bin/perl

process_files($ARGV[0]);

# process each file in the supplied directory
sub process_files($)
{
    my $dirpath = shift;
    my $dh;
    opendir($dh, $dirpath) or die "Cant readdir $dirpath. $!";
    # get a list of files
    my @files;
    do {
        @files = readdir($dh);
        foreach my $ent ( @files ){
            if ( -f "$dirpath/$ent" ){
                get_first_text_cell("$dirpath/$ent");
            }
        }
    } while ($#files > 0);
    closedir($dh);
}

# return the content of the first html table cell
# that does not contain img,href or input tags
sub get_first_text_cell($)
{
    my $filename = shift;
    my $fh;
    open($fh,"<$filename") or die "Cant open $filename. $!";
    my $found = 0;
    while ( ( my $line = <$fh> ) && ( $found == 0 ) ){
        ## capture html and text inside a table cell
        if ( $line =~ /<td>([&;\d\w\s"'<>]+)<\/td>/i ){
            my $cell = $1;
            ## omit anything with the following tags
            if ( $cell !~ /<(img|href|input)/ ){
                $found++;
                print "$cell\n";
            }
        }
    }
    close($fh);
}
Simply invoke it by passing the directory to be searched as the first argument:
$ perl parse.pl /html/documents/
What about this (should be much faster and clearer):
for file in products/*; do
    grep -P -o '(?<=<td>).*(?=<\/td>)' "$file" | grep -vP -m 1 '(img|input|href)'
done
the for loop will look at every file in products. Note the difference from your syntax.
the first grep will output just the text between <td> and </td> without those tags for every cell as long as each cell is in a single line.
finally, the second grep will output just the first line (which is what I believe you wanted to achieve with that head -1) of those lines that don't contain img, href or input (and it will exit right then, reducing the overall time and allowing the next file to be processed faster)
I would have loved to use just a single grep, but then the regex will be really awful. :-)
Disclaimer: of course I haven't tested it