"filename too long" bash mv command old files - bash

#! /bin/sh -
cd /PHOTAN || exit
fn=$(ls -t | tail -n -30)
mv -f -- "${fn}" /old
All I want to do is keep the most recent 30 files... but I can't get past the mv
"File name too long" problem.
Please help.

The notation "${fn}" adds all the file names into a single argument string, separated by spaces. Just for once, assuming you don't have to worry about file names with spaces in them, you need:
mv -f -- ${fn} /old
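A quick way to see the difference is to print each argument on its own line; here with a hypothetical value of fn:
$ fn='alpha.jpg beta.jpg gamma.jpg'
$ printf '<%s>\n' "${fn}"
<alpha.jpg beta.jpg gamma.jpg>
$ printf '<%s>\n' ${fn}
<alpha.jpg>
<beta.jpg>
<gamma.jpg>
The quoted form hands mv one long argument that it treats as a single (overlong) file name, which is where "File name too long" comes from; the unquoted form hands it each name separately.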
If you have file names with spaces in them, then you've got problems starting with parsing the output of the ls command.
But what if you do have to worry about spaces in your filenames?
Then, as I stated, you have major problems, starting with the issues of parsing the output of ls.
$ echo > 'a b'
$ echo > ' c d '
$
Two nice file names with spaces in them. They cause merry hell. I'm about to assume you're on Linux or something similar enough. You need to use bash arrays, the stat command, printf, sort -z, sed -z. Or you should simply outlaw filenames with spaces; it is probably easier.
names=( * )
The array names contains each file name as a separate array element, leading and trailing and embedded blanks all handled correctly.
names=( * )
for file in "${names[#]}"
do printf "%s\0" "$(stat -c '%Y' "$file") $file"
done |
sort -nzr |
sed -nze '1,30s/^[0-9][0-9]* //p' |
tr '\0' '\n'
The for loop evaluates the modification time of each file separately, and combines the modification time, a space, and the file name into a single string followed by a null byte to mark the end of the string. The sort command sorts the 'lines' numerically, assuming the lines are terminated by null bytes because of the -z option, and places the most recent file names first. The sed command prints the first 30 'lines' (file names) only; the tr command replaces null bytes with newlines (but in doing so, loses the identity of file name boundaries).
The code works even with file names containing newlines, but only on systems where sed and sort support the (non-standard) -z option to process null-terminated input 'lines' — that means systems using GNU sed and sort (even BSD sed as found on Mac OS X does not, though the Mac OS X sort is GNU sort and does support -z).
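If the goal is actually to move everything except the newest 30 into /old, rather than just to list the newest 30, one sketch (assuming GNU xargs and GNU mv for the -r and -t options) changes the sed range to select record 31 onwards and feeds the still null-terminated names to xargs -0:
names=( * )
for file in "${names[@]}"
do printf "%s\0" "$(stat -c '%Y' "$file") $file"
done |
sort -nzr |
sed -nze '31,$s/^[0-9][0-9]* //p' |
xargs -0 -r mv -f -t /old --
The only change from the listing pipeline above is the sed address (31,$ instead of 1,30) and the final xargs stage, which receives the null-terminated names safely.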
Ugh! The shell was designed for spaces to appear between and not within file names.
As noted by BroSlow in a comment, if you assume 'no newlines in filenames', then the code can be simpler and more nearly portable — but it is still tricky:
ls -t |
tail -30 |
{
list=()
while IFS='' read -r file
do list+=( "$file" )
done
mv -f -- "${list[#]}" /old
}
The IFS='' is needed so that leading and trailing spaces in filenames are preserved (and tabs, too).
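To see what IFS='' changes, a quick demonstration with a name that starts with two spaces:
$ printf '  leading spaces\n' | { read -r line; echo "<$line>"; }
<leading spaces>
$ printf '  leading spaces\n' | { IFS='' read -r line; echo "<$line>"; }
<  leading spaces>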
I note in passing that the Korn shell would not require the braces but Bash does.


Deleting first n rows and column x from multiple files using Bash script

I am aware that the "deleting n rows" and "deleting column x" questions have both been answered individually before. My current problem is that I'm writing my first bash script, and am having trouble making that script work the way I want it to.
file0001.csv (there are several hundred files like these in one folder)
Data number of lines 540
No.,Profile,Unit
1,1027.84,µm
2,1027.92,µm
3,1028,µm
4,1028.81,µm
Desired output
1,1027.84
2,1027.92
3,1028
4,1028.81
I am able to use sed and cut individually but for some reason the following bash script doesn't take cut into account. It also gives me an error "sed: can't read ls: No such file or directory", yet sed is successful and the output is saved to the original files.
sem2csv.sh
for files in 'ls *.csv' #list of all .csv files
do
sed '1,2d' -i $files | cut -f '1-2' -d ','
done
Actual output:
1,1027.84,µm
2,1027.92,µm
3,1028,µm
4,1028.81,µm
I know there may be awk one-liners but I would really like to understand why this particular bash script isn't running as intended. What am I missing?
The -i option of sed modifies the file in place. Your pipeline to cut receives no input because sed -i produces no output. Without this option, sed would write the results to standard output, instead of back to the file, and then your pipeline would work; but then you would have to take care of writing the results back to the original file yourself.
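For instance, a sketch of that non-in-place approach, writing each result to a temporary file and then moving it back over the original (the .tmp suffix is just an illustration):
for f in *.csv; do
    sed '1,2d' "$f" | cut -d ',' -f 1-2 > "$f.tmp" && mv "$f.tmp" "$f"
done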
Moreover, single quotes inhibit expansion -- you are "looping" over the single literal string ls *.csv. The fact that you are not quoting it properly then causes the string to be subject to wildcard expansion inside the loop. So after variable interpolation, your sed command expands to
sed -i 1,2d ls *.csv
and then the shell expands *.csv because it is not quoted. (You should have been receiving a warning that there is no file named ls in the current directory, too.) You probably attempted to copy an example which used backticks (ASCII 96) instead of single quotes (ASCII 39) -- the difference is quite significant.
Anyway, the ls is useless -- the proper idiom is
for files in *.csv; do
sed '1,2d' "$files" ... # the double quotes here are important
done
Mixing sed and cut is usually not a good idea because you can express anything cut can do in terms of a simple sed script. So your entire script could be
for f in *.csv; do
sed -i -e '1,2d' -e 's/,[^,]*$//' "$f"
done
which says to remove the last comma and everything after it. (If your sed does not like multiple -e options, try with a semicolon separator: sed -i '1,2d;s/,[^,]*$//' "$f")
You may use awk,
$ awk 'NR>2{sub(/,[^,]*$/,"",$0);print}' file
1,1027.84
2,1027.92
3,1028
4,1028.81
or
sed -i '1,2d;s/,[^,]*$//' file
1,2d deletes the first two lines.
s/,[^,]*$// removes the last comma and everything after it on the remaining lines.

A bash command that adds newlines to *.cpp and *.h (or any specified filetypes)?

I tried running
echo "\n" >> *.cpp *.h
But I got
*.cpp: Ambiguous.
A simple command would do, but it would be an added bonus if it only adds the newline to non-empty files that don't already end with one.
This doesn't win any beauty awards, but it's explicit and functional at least, and matches the requirements exactly (only adds newlines to files that are not empty and that do not end with a newline):
for f in *.cpp *.h; do
if [[ $(tail -c1 "$f"; echo x) =~ [^$'\n']x ]]; then
echo >> "$f"
fi
done
Explanation:
tail -c1 "$f" outputs the last character from the file, if any.
echo x is a hack to make sure newlines are kept. (Trailing newlines are deleted from $() command substitutions.)
=~ compares against the regular expression [^\n]x, i.e., some character (there must be a character) that isn't a newline followed by x. We need $'\n' (a bashism) instead of \n to match a literal newline.
echo >> "$f" adds a newline to the end of a file. (echo will print a terminating newline unless the -n flag is used.)
Update:
Here's a simpler version:
for f in *.cpp *.h; do
if tail -c1 "$f" | egrep -q .; then
echo >> "$f"
fi
done
egrep matches line-by-line, and so won't match against a newline. The regular expression . will therefore match any character (and there must be a character) that isn't a newline. (if on a pipeline checks the exit status of the last command in the pipeline -- egrep in this case. -q makes egrep silent.)
If what you are attempting is to ensure that every file ends with a newline character (as opposed to ending with a single blank line), then you could do this with either GNU or BSD sed (Posix standard sed doesn't have to recognize the -i option, but most do):
sed -n -i.bak 'G;P' *.{cpp,h}
The -n option suppresses automatic printing. -i.bak edits in place, keeping a backup file with the .bak extension. (If you don't want the backup file, supply an empty string as the extension.)
The program:
G append a newline and the hold buffer (which is empty)
P print the pattern space up to the first newline.

Bash variables not acting as expected

I have a bash script which parses a file line by line, extracts the date using a cut command and then makes a folder using that date. However, it seems like my variables are not being populated properly. Do I have a syntax issue? Any help or direction to external resources is very appreciated.
#!/bin/bash
ls | grep .mp3 | cut -d '.' -f 1 > filestobemoved
cat filestobemoved | while read line
do
varYear= $line | cut -d '_' -f 3
varMonth= $line | cut -d '_' -f 4
varDay= $line | cut -d '_' -f 5
echo $varMonth
mkdir $varMonth'_'$varDay'_'$varYear
cp ./$line'.mp3' ./$varMonth'_'$varDay'_'$varYear/$line'.mp3'
done
You have many errors and non-recommended practices in your code. Try the following:
for f in *.mp3; do
f=${f%%.*}
IFS=_ read _ _ varYear varMonth varDay <<< "$f"
echo $varMonth
mkdir -p "${varMonth}_${varDay}_${varYear}"
cp "$f.mp3" "${varMonth}_${varDay}_${varYear}/$f.mp3"
done
The actual error is that you need to use command substitution. For example, instead of
varYear= $line | cut -d '_' -f 3
you need to use
varYear=$(cut -d '_' -f 3 <<< "$line")
A secondary error there is that $foo | some_command on its own line does not pipe the contents of $foo to the next command as input; instead, the contents of $foo are executed as a command, and that command's output is piped to the next one.
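A small demonstration of the difference, using a hypothetical file name in the same underscore-separated format:
line=show_recording_2020_05_12
varYear= $line | cut -d '_' -f 3       # runs show_recording_2020_05_12 as a command: "command not found"
varYear=$(cut -d '_' -f 3 <<< "$line") # captures cut's output instead
echo "$varYear"                        # prints 2020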
Some best practices and tips to take into account:
Use a portable shebang line - #!/usr/bin/env bash (disclaimer: That's my answer).
Don't parse ls output.
Avoid useless uses of cat.
Use More Quotes™
Don't use files for temporary storage if you can use pipes. It is literally orders of magnitude faster, and generally makes for simpler code if you want to do it properly.
If you have to use files for temporary storage, put them in the directory created by mktemp -d. Preferably add a trap to remove the temporary directory cleanly.
There's no need for a var prefix in variables.
grep searches for basic regular expressions by default, so .mp3 matches any single character followed by the literal string mp3. If you want to search for a dot, you need to either use grep -F to search for literal strings or escape the regular expression as \.mp3 (see the short demonstration after this list).
You generally want to use read -r (defined by POSIX) to treat backslashes in the input literally.
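To illustrate the .mp3 point (keeping the OP's ls | grep pipeline purely for illustration), suppose a directory contains one stray text file:
$ ls
notes_about_mp3s.txt  show_2020_05_12.mp3
$ ls | grep .mp3
notes_about_mp3s.txt
show_2020_05_12.mp3
$ ls | grep '\.mp3'
show_2020_05_12.mp3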

perform an operation for *each* item listed by grep

How can I perform an operation for each item listed by grep individually?
Background:
I use grep to list all files containing a certain pattern:
grep -l '<pattern>' directory/*.extension1
I want to delete all listed files but also all files having the same file name but a different extension: .extension2.
I tried using the pipe, but it seems to take the output of grep as a whole.
In find there is the -exec option, but grep has nothing like that.
If I understand your specification, you want:
grep --null -l '<pattern>' directory/*.extension1 | \
xargs -n 1 -0 -I{} bash -c 'rm "$1" "${1%.*}.extension2"' -- {}
This is essentially the same as what @triplee's comment describes, except that it's newline-safe.
What's going on here?
grep with --null will return output delimited with nulls instead of newline. Since file names can have newlines in them delimiting with newline makes it impossible to parse the output of grep safely, but null is not a valid character in a file name and thus makes a nice delimiter.
By default xargs takes a stream of whitespace-delimited items (blanks and newlines both separate items) and executes a given command, passing as many of those items as possible, each as a separate argument, to that command (or to echo if no command is given). Thus if you said:
printf 'one\ntwo three \nfour\n' | xargs echo
xargs would execute echo one two three four. This is not safe for file names because file names might contain embedded blanks and newlines.
The -0 switch to xargs changes the delimiter from whitespace to a null byte. This makes it match the output we got from grep --null and makes it safe for processing a list of file names.
Normally xargs simply appends the input to the end of a command. The -I switch to xargs changes this to substituting the input for the specified replacement string. To get the idea try this experiment:
printf 'one\ntwo three \nfour\n' | xargs -I{} echo foo {} bar
And note the difference from the earlier printf | xargs command.
In the case of my solution the command I execute is bash, to which I pass -c. The -c switch causes bash to execute the commands in the following argument (and then terminate) instead of starting an interactive shell. The next block 'rm "$1" "${1%.*}.extension2"' is the first argument to -c and is the script which will be executed by bash. Any arguments following the script argument to -c are assigned as the arguments to the script. Thus, if I were to say:
bash -c 'echo $0' "Hello, world"
Then Hello, world would be assigned to $0 (the first argument to the script) and inside the script I could echo it back.
Since $0 is normally reserved for the script name I pass a dummy value (in this case --) as the first argument and, then, in place of the second argument I write {}, which is the replacement string I specified for xargs. This will be replaced by xargs with each file name parsed from grep's output before bash is executed.
The mini shell script might look complicated but it's rather trivial. First, the entire script is single-quoted to prevent the calling shell from interpreting it. Inside the script I invoke rm and pass it two file names to remove: the $1 argument, which was the file name passed when the replacement string was substituted above, and ${1%.*}.extension2. This latter is a parameter substitution on the $1 variable. The important part is %.* which says
% "Match from the end of the variable and remove the shortest string matching the pattern.
.* The pattern is a single period followed by anything.
This effectively strips the extension, if any, from the file name. You can observe the effect yourself:
foo='my file.txt'
bar='this.is.a.file.txt'
baz='no extension'
printf '%s\n'"${foo%.*}" "${bar%.*}" "${baz%.*}"
Since the extension has been stripped I concatenate the desired alternate extension .extension2 to the stripped file name to obtain the alternate file name.
If this does what you want, pipe the output through /bin/sh.
grep -l 'RE' folder/*.ext1 | sed 's/\(.*\).ext1/rm "&" "\1.ext2"/'
Or if sed makes you itchy:
grep -l 'RE' folder/*.ext1 | while read file; do
echo rm "$file" "${file%.ext1}.ext2"
done
Remove echo if the output looks like the commands you want to run.
But you can do this with find as well:
find /path/to/start -name \*.ext1 -exec grep -q 'RE' {} \; -print | ...
where ... is either the sed script or the three lines from while to done.
The idea here is that find will ... well, "find" things based on the qualifiers you give it -- namely, that things match the file glob "*.ext1", AND that the result of the "exec" is successful. The -q tells grep to look for RE in {} (the file supplied by find), and exit with TRUE or FALSE without generating any output of its own.
The only real difference between doing this in find vs doing it with grep is that you get to use find's awesome collection of conditions to narrow down your search further if required. man find for details. By default, find will recurse into subdirectories.
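Put together with the while loop from above, the find variant might look like this (still a dry run; remove echo to actually delete):
find /path/to/start -name '*.ext1' -exec grep -q 'RE' {} \; -print |
while read file; do
    echo rm "$file" "${file%.ext1}.ext2"
done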
You can pipe the list to xargs:
grep -l '<pattern>' directory/*.extension1 | xargs rm
As for the second set of files with a different extension, I'd do this (as usual use xargs echo rm when testing to make a dry run; I haven't tested it, it may not work correctly with filenames with spaces in them):
filelist=$(grep -l '<pattern>' directory/*.extension1)
echo $filelist | xargs rm
echo ${filelist//.extension1/.extension2} | xargs rm
Pipe the result to xargs; it will allow you to run a command for each match.

Bash and filenames with spaces

The following is a simple Bash command line:
grep -li 'regex' "filename with spaces" "filename"
No problems. Also the following works just fine:
grep -li 'regex' $(<listOfFiles.txt)
where listOfFiles.txt contains a list of filenames to be grepped, one
filename per line.
The problem occurs when listOfFiles.txt contains filenames with
embedded spaces. In all cases I've tried (see below), Bash splits the
filenames at the spaces so, for example, a line in listOfFiles.txt
containing a name like ./this is a file.xml ends up trying to run
grep on each piece (./this, is, a and file.xml).
I thought I was a relatively advanced Bash user, but I cannot find a
simple magic incantation to get this to work. Here are the things I've
tried.
grep -li 'regex' `cat listOfFiles.txt`
Fails as described above (I didn't really expect this to work), so I
thought I'd put quotes around each filename:
grep -li 'regex' `sed -e 's/.*/"&"/' listOfFiles.txt`
Bash interprets the quotes as part of the filename and gives "No such
file or directory" for each file (and still splits the filenames with
blanks)
for i in $(<listOfFiles.txt); do grep -li 'regex' "$i"; done
This fails as for the original attempt (that is, it behaves as if the
quotes are ignored) and is very slow since it has to launch one 'grep'
process per file instead of processing all files in one invocation.
The following works, but requires some careful double-escaping if
the regular expression contains shell metacharacters:
eval grep -li 'regex' `sed -e 's/.*/"&"/' listOfFiles.txt`
Is this the only way to construct the command line so it will
correctly handle filenames with spaces?
Try this:
(IFS=$'\n'; grep -li 'regex' $(<listOfFiles.txt))
IFS is the Internal Field Separator. Setting it to $'\n' tells Bash to use the newline character to delimit filenames. Its default value is $' \t\n' and can be printed using cat -etv <<<"$IFS".
Enclosing the command in parentheses starts a subshell so that only commands within the parentheses are affected by the custom IFS value.
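A small demonstration of what the custom IFS changes, using the OP's problem file name (other.xml is just a made-up second entry):
$ printf '%s\n' './this is a file.xml' 'other.xml' > listOfFiles.txt
$ printf '<%s>\n' $(<listOfFiles.txt)
<./this>
<is>
<a>
<file.xml>
<other.xml>
$ (IFS=$'\n'; printf '<%s>\n' $(<listOfFiles.txt))
<./this is a file.xml>
<other.xml>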
cat listOfFiles.txt |tr '\n' '\0' |xargs -0 grep -li 'regex'
The -0 option on xargs tells xargs to use a null character rather than white space as a filename terminator. The tr command converts the incoming newlines to a null character.
This meets the OP's requirement that grep not be invoked multiple times. It has been my experience that for a large number of files avoiding the multiple invocations of grep improves performance considerably.
This scheme also avoids a bug in the OP's original method: that approach will break when listOfFiles.txt contains enough files to exceed the maximum command-line length. xargs knows about that limit and will invoke grep multiple times to stay under it.
A related problem with using xargs and grep is that grep prefixes its output with the filename only when invoked with multiple files. Because xargs may end up invoking grep with a single file (when listOfFiles.txt contains only one filename, or when the last of several invocations receives only one filename), the output is inconsistent. To achieve consistent output add /dev/null to the grep command:
cat listOfFiles.txt |tr '\n' '\0' |xargs -0 grep -i 'regex' /dev/null
Note that was not an issue for the OP because he was using the -l option on grep; however it is likely to be an issue for others.
This works:
while read file; do grep -li 'regex' "$file"; done < listOfFiles.txt
With Bash 4, you can also use the mapfile builtin to read each line into an array and then iterate over that array:
$ tree
.
├── a
│   ├── a 1
│   └── a 2
├── b
│   ├── b 1
│   └── b 2
└── c
    ├── c 1
    └── c 2
3 directories, 6 files
$ mapfile -t files < <(find -type f)
$ for file in "${files[#]}"; do
> echo "file: $file"
> done
file: ./a/a 2
file: ./a/a 1
file: ./b/b 2
file: ./b/b 1
file: ./c/c 2
file: ./c/c 1
Though it may overmatch, this is my favorite solution:
grep -i 'regex' $(cat listOfFiles.txt | sed -e "s/ /?/g")
Do note that if you somehow ended up with a list file that has Windows line endings (\r\n), none of the notes above about the internal field separator $IFS (and quoting the argument) will work; so make sure the line endings are plain \n (I use SciTE to show the line endings and to convert them from one form to the other).
Also cat piped into while read ... seems to work (apparently without any need to set separators):
cat <(echo -e "AA AA\nBB BB") | while read file; do echo $file; done
... although for me it was more relevant for a "grep" through a directory with spaces in filenames:
grep -rlI 'search' "My Dir"/ | while read file; do echo $file; grep 'search\|else' "$ix"; done
