shell script to batch replace specific string in .csv file - shell

I want to replace some strings in my raw csv file for further use, and I searched the internet and created the script below. But it doesn't seem to work. I hope someone can help me.
The csv file looks like this, and I want to delete "^M" and "# Columns: " so that I can read my file.
# Task: bending1^M
# Frequency (Hz): 20^M
# Clock (millisecond): 250^M
# Duration (seconds): 120^M
# Columns: time,avg_rss12,var_rss12,avg_rss13,var_rss13,avg_rss23,var_rss23^M
#!/usr/bin/env bash
function scandir(){
cd `dirname $0`
echo `pwd`
local cur_dir parent_dir workir
workdir=$1
cd ${workdir}
if [ ${workdir}="/" ]
then
cur_dir=""
else
cur_dir=$(pwd)
fi
for dirlist in $(ls ${cur_dir})
do
if test -d ${dirlist}
then
cd ${dirlist}
scandir ${cur_dir}/${dirlist}
cd ..
else
vi ${cur_dir}/${dirlist} << EOF
:%s/\r//g
:%s/\#\ Columns:\ //g
:wq
EOF
fi
done
}

Your whole script can be reduced to just:
find "$workdir" -type f | xargs -n1 sed -i -e 's/\r//g; s/^# Columns://'
Notes on your script:
Check your scripts for validity on https://www.shellcheck.net/
The use of the << EOF here-document is invalid. The closing word EOF has to start at the beginning of the line inside the script:
vi ${cur_dir}/${dirlist} << EOF
:%s/\r//g
:%s/\#\ Columns:\ //g
:wq
EOF
#^^ no spaces in front of EOF, also no spaces/tabs after EOF
# the whole line needs to be exactly 'EOF'
There cannot be any spaces or tabs in front of it. Also, I don't think vi is the best tool to run substitutions on a file, and I don't know how it acts with tabs or spaces in front of the :. You may want to try running it with no whitespace characters in front of the : commands:
vi ${cur_dir}/${dirlist} << EOF
:%s/\r//g
:%s/\#\ Columns:\ //g
:wq
EOF
Backticks ` are deprecated, less readable, and don't allow easy nesting. Use $( ... ) command substitution instead.
echo `pwd` is a useless use of echo; just use pwd.
for dirlist in $(ls ${cur_dir}) — parsing ls output is bad. Use the find command instead, or if you have to, shell globbing, i.e. for dirlist in *.
if [ ${workdir}="/" ] is invalid. This tests whether the string "${workdir}=/" is non-empty. Bash is whitespace aware; it needs a space between = and its operands. It should be if [ "${workdir}" = "/" ].
Always quote your variables. Don't do cd ${dirlist}; do cd "${dirlist}", and so on.
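Putting these notes together, a whitespace-safe variant of that one-liner (a sketch using NUL-delimited names):
# -print0 and xargs -0 keep filenames with spaces or newlines intact
find "$workdir" -type f -print0 | xargs -0 sed -i -e 's/\r//g; s/^# Columns: //'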

The posted answers are correct, but I would recommend this syntax:
find "$1" -type f -name '*.csv' -exec sed -e 's/\r$//;s/^# Columns: //' -i~ {} +
Using + instead of \; at the end of the find command permits sed to work on many files at once, reducing forks and making the whole job quicker.
The ~ after the -i option makes sed keep the existing files by renaming them with a tilde appended, instead of overwriting them.
Using -type f ensures we work on regular files only (no symlinks, dirs, sockets, fifos, devices...).
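If you later want to drop or restore those ~ backups, a sketch (assuming GNU or BSD find for -delete):
# remove the backups once the result looks right
find "$1" -type f -name '*.csv~' -delete
# or restore them, overwriting the modified files
find "$1" -type f -name '*.csv~' -exec sh -c 'mv -- "$0" "${0%~}"' {} \;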

You can reduce the entire script to one command, and you do not have to use Vim to process the files:
find ${workdir} -name '*.csv' -exec sed -i -e 's/\r$//; /^#/d' '{}' \;
Explanation:
find <dir> -name <pattern> -exec <command> \; will search <dir> for files matching <pattern> and execute <command> on each file. You want to search for CSV files and do something with them (run a command on them).
The command run on every (CSV) file that was found will be sed -i -e 's/\r$//; /^#/d'. This means to edit the files in-place (-i) and run two transformations on them. s/\r$// will remove the ^M from each line, and /^#/d will remove all lines that start with a #.
'{}' is replaced with the files found by find and \; marks the end of the command run by find (see the find manual page for this).
Most of your script emulates part of the find command. That is not a good idea.
Also, for simple text processing, it is easier and faster to use sed instead of invoking an editor such as Vim.
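To check the result, cat -A (GNU cat) makes carriage returns visible as ^M; a quick sketch on one of the cleaned files (data.csv is a stand-in name):
# before the cleanup lines end in ^M$; afterwards they should end in plain $
head -n 5 data.csv | cat -A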

Related

How to find files with specific extensions recursively using the for/in syntax? [duplicate]

x=$(find . -name "*.txt")
echo $x
if I run the above piece of code in a Bash shell, what I get is a string containing several file names separated by blanks, not a list.
Of course, I can further split it on blanks to get a list, but I'm sure there is a better way to do it.
So what is the best way to loop through the results of a find command?
TL;DR: If you're just here for the most correct answer, you probably want my personal preference (see the bottom of this post):
# execute `process` once for each file
find . -name '*.txt' -exec process {} \;
If you have time, read through the rest to see several different ways and the problems with most of them.
The full answer:
The best way depends on what you want to do, but here are a few options. As long as no file or folder in the subtree has whitespace in its name, you can just loop over the files:
for i in $x; do # Not recommended, will break on whitespace
process "$i"
done
Marginally better, cut out the temporary variable x:
for i in $(find -name \*.txt); do # Not recommended, will break on whitespace
process "$i"
done
It is much better to glob when you can. White-space safe, for files in the current directory:
for i in *.txt; do # Whitespace-safe but not recursive.
process "$i"
done
By enabling the globstar option, you can glob all matching files in this directory and all subdirectories:
# Make sure globstar is enabled
shopt -s globstar
for i in **/*.txt; do # Whitespace-safe and recursive
process "$i"
done
In some cases, e.g. if the file names are already in a file, you may need to use read:
# IFS= makes sure it doesn't trim leading and trailing whitespace
# -r prevents interpretation of \ escapes.
while IFS= read -r line; do # Whitespace-safe EXCEPT newlines
process "$line"
done < filename
read can be used safely in combination with find by setting the delimiter appropriately:
find . -name '*.txt' -print0 |
while IFS= read -r -d '' line; do
process "$line"
done
For more complex searches, you will probably want to use find, either with its -exec option or with -print0 | xargs -0:
# execute `process` once for each file
find . -name \*.txt -exec process {} \;
# execute `process` once with all the files as arguments*:
find . -name \*.txt -exec process {} +
# using xargs*
find . -name \*.txt -print0 | xargs -0 process
# using xargs with arguments after each filename (implies one run per filename)
find . -name \*.txt -print0 | xargs -0 -I{} process {} argument
find can also cd into each file's directory before running a command by using -execdir instead of -exec, and can be made interactive (prompt before running the command for each file) using -ok instead of -exec (or -okdir instead of -execdir).
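For example, -ok prints each constructed command and waits for confirmation before running it:
# asks for a y/n confirmation before removing each matched file
find . -name '*.txt' -ok rm {} \;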
*: Technically, both find and xargs (by default) will run the command with as many arguments as they can fit on the command line, as many times as it takes to get through all the files. In practice, unless you have a very large number of files it won't matter, and if you exceed the length but need them all on the same command line, you're out of luck: find a different way.
Whatever you do, don't use a for loop:
# Don't do this
for file in $(find . -name "*.txt")
do
…code using "$file"
done
Three reasons:
For the for loop to even start, the find must run to completion.
If a file name has any whitespace (including space, tab or newline) in it, it will be treated as two separate names.
Although now unlikely, you can overrun your command line buffer. Imagine if your command line buffer holds 32KB, and your for loop returns 40KB of text. That last 8KB will be dropped right off your for loop and you'll never know it.
Always use a while read construct:
find . -name "*.txt" -print0 | while read -d $'\0' file
do
…code using "$file"
done
The loop will execute while the find command is executing. Plus, this command will work even if a file name is returned with whitespace in it. And, you won't overflow your command line buffer.
The -print0 will use the NUL character as a file separator instead of a newline and the -d $'\0' will use NUL as the separator while reading.
find . -name "*.txt"|while read fname; do
echo "$fname"
done
Note: this method and the (second) method shown by bmargulies are safe to use with white space in the file/folder names.
In order to also have the - somewhat exotic - case of newlines in the file/folder names covered, you will have to resort to the -exec predicate of find like this:
find . -name '*.txt' -exec echo "{}" \;
The {} is the placeholder for the found item and the \; is used to terminate the -exec predicate.
And for the sake of completeness let me add another variant - you gotta love the *nix ways for their versatility:
find . -name '*.txt' -print0|xargs -0 -n 1 echo
This separates the printed items with a \0 character which, to my knowledge, isn't allowed in file or folder names on any file system, and therefore should cover all bases. xargs then picks them up one by one ...
Filenames can include spaces and even control characters. Spaces are (by default) delimiters for shell word splitting in bash, and as a result of that, x=$(find . -name "*.txt") from the question is not recommended at all. If find gets a filename with spaces, e.g. "the file.txt", you will get two separate strings for processing if you process x in a loop. You can improve this by changing the delimiter (the bash IFS variable), e.g. to \r\n, but filenames can also include control characters - so this is not a (completely) safe method.
From my point of view, there are 2 recommended (and safe) patterns for processing files:
1. Use for loop & filename expansion:
for file in ./*.txt; do
[[ ! -e $file ]] && continue # continue, if file does not exist
# single filename is in $file
echo "$file"
# your code here
done
2. Use find-read-while & process substitution
while IFS= read -r -d '' file; do
# single filename is in $file
echo "$file"
# your code here
done < <(find . -name "*.txt" -print0)
Remarks
on Pattern 1:
bash returns the search pattern ("*.txt") if no matching file is found - so the extra line "continue, if file does not exist" is needed. see Bash Manual, Filename Expansion
shell option nullglob can be used to avoid this extra line.
"If the failglob shell option is set, and no matches are found, an error message is printed and the command is not executed." (from Bash Manual above)
shell option globstar: "If set, the pattern ‘**’ used in a filename expansion context will match all files and zero or more directories and subdirectories. If the pattern is followed by a ‘/’, only directories and subdirectories match." see Bash Manual, Shopt Builtin
other options for filename expansion: extglob, nocaseglob, dotglob & shell variable GLOBIGNORE
on Pattern 2:
Filenames can contain blanks, tabs, newlines, ... To process filenames in a safe way, find with -print0 is used: each filename is printed with all its control characters and terminated with NUL. See also Gnu Findutils Manpage, Unsafe File Name Handling, safe File Name Handling, unusual characters in filenames. See David A. Wheeler below for a detailed discussion of this topic.
There are some possible patterns to process find results in a while loop. Others (kevin, David W.) have shown how to do this using pipes:
files_found=1
find . -name "*.txt" -print0 |
while IFS= read -r -d '' file; do
# single filename in $file
echo "$file"
files_found=0 # not working example
# your code here
done
[[ $files_found -eq 0 ]] && echo "files found" || echo "no files found"
When you try this piece of code, you will see that it does not work: files_found is always "true" and the code will always echo "no files found". The reason: each command of a pipeline is executed in a separate subshell, so the variable changed inside the loop (in a separate subshell) does not change the variable in the main shell script. This is why I recommend using process substitution as the "better", more useful, more general pattern. See I set variables in a loop that's in a pipeline. Why do they disappear... (from Greg's Bash FAQ) for a detailed discussion of this topic.
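For contrast, the same loop rewritten with process substitution; the loop body now runs in the current shell, so the flag survives:
files_found=1
while IFS= read -r -d '' file; do
    echo "$file"
    files_found=0    # visible after the loop: no subshell involved
done < <(find . -name "*.txt" -print0)
[[ $files_found -eq 0 ]] && echo "files found" || echo "no files found"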
Additional References & Sources:
Gnu Bash Manual, Pattern Matching
Filenames and Pathnames in Shell: How to do it Correctly, David A. Wheeler
Why you don't read lines with "for", Greg's Wiki
Why you shouldn't parse the output of ls(1), Greg's Wiki
Gnu Bash Manual, Process Substitution
(Updated to include @Socowi's excellent speed improvement)
With any $SHELL that supports it (dash/zsh/bash...):
find . -name "*.txt" -exec $SHELL -c '
for i in "$#" ; do
echo "$i"
done
' {} +
Done.
Original answer (shorter, but slower):
find . -name "*.txt" -exec $SHELL -c '
echo "$0"
' {} \;
If you can assume the file names don't contain newlines, you can read the output of find into a Bash array using the following command:
readarray -t x < <(find . -name '*.txt')
Note:
-t causes readarray to strip newlines.
It won't work if readarray is in a pipe, hence the process substitution.
readarray is available since Bash 4.
Bash 4.4 and up also supports the -d parameter for specifying the delimiter. Using the null character, instead of newline, to delimit the file names works also in the rare case that the file names contain newlines:
readarray -d '' x < <(find . -name '*.txt' -print0)
readarray can also be invoked as mapfile with the same options.
Reference: https://mywiki.wooledge.org/BashFAQ/005#Loading_lines_from_a_file_or_stream
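For instance, a short usage sketch iterating over the array that readarray filled:
readarray -d '' x < <(find . -name '*.txt' -print0)
for f in "${x[@]}"; do    # quoted expansion keeps each name intact
    printf '%s\n' "$f"
done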
# Doesn't handle whitespace
for x in `find . -name "*.txt" -print`; do
process_one $x
done
or
# Handles whitespace and newlines
find . -name "*.txt" -print0 | xargs -0 -n 1 process_one
I like to assign the output of find to a variable first, with IFS switched to newline, as follows:
FilesFound=$(find . -name "*.txt")
IFSbkp="$IFS"
IFS=$'\n'
counter=1;
for file in $FilesFound; do
echo "${counter}: ${file}"
let counter++;
done
IFS="$IFSbkp"
As commented by @Konrad Rudolph, this will not work with newlines in file names. I still think it is handy, as it covers most of the cases when you need to loop over command output.
As already posted in the top answer by Kevin, the best solution is to use a for loop with a bash glob, but since bash globbing is not recursive by default, this can be fixed with a recursive bash function:
#!/bin/bash
set -x
set -eu -o pipefail

all_files=();

function get_all_the_files()
{
    directory="$1";
    for item in "$directory"/* "$directory"/.[^.]*;
    do
        if [[ -d "$item" ]];
        then
            get_all_the_files "$item";
        else
            all_files+=("$item");
        fi;
    done;
}

get_all_the_files "/tmp";

for file_path in "${all_files[@]}"
do
    printf 'My file is "%s"\n' "$file_path";
done;
Related questions:
Bash loop through directory including hidden file
Recursively list files from a given directory in Bash
ls command: how can I get a recursive full-path listing, one line per file?
List files recursively in Linux CLI with path relative to the current directory
Recursively List all directories and files
bash script, create array of all files in a directory
How can I creates array that contains the names of all the files in a folder?
How to get the list of files in a directory in a shell script?
Based on other answers and the comment of @phk, using fd 3
(which still allows you to use stdin inside the loop):
while IFS= read -r f <&3; do
echo "$f"
done 3< <(find . -iname "*filename*")
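The benefit of fd 3 is that stdin remains free for the loop body; a sketch that asks for confirmation per file:
while IFS= read -r f <&3; do
    # this read uses stdin (the terminal), not fd 3
    read -r -p "process $f? [y/N] " answer
    [[ $answer == y ]] && echo "processing $f"
done 3< <(find . -iname "*filename*")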
You can put the filenames returned by find into an array like this:
array=()
while IFS= read -r -d ''; do
array+=("$REPLY")
done < <(find . -name '*.txt' -print0)
Now you can just loop through the array to access individual items and do whatever you want with them.
Note: It's white space safe.
You can store your find output in an array if you wish to use the output later:
array=($(find . -name "*.txt"))
Now to print each element on a new line, you can either use a for loop iterating over all the elements of the array, or you can use a printf statement.
for i in "${array[@]}"; do echo "$i"; done
or
printf '%s\n' "${array[@]}"
You can also use:
for file in "`find . -name "*.txt"`"; do echo "$file"; done
This will print each filename on a new line.
To only print the find output in list form, you can use either of the following:
find . -name "*.txt" -print 2>/dev/null
or
find . -name "*.txt" -print | grep -v 'Permission denied'
This will remove error messages and only give the filenames as output, one per line.
If you wish to do something with the filenames, storing them in an array is good; otherwise there is no need to consume that space and you can directly print the output from find.
I think using this piece of code (redirecting the command output into while after done):
while read fname; do
echo "$fname"
done <<< "$(find . -name "*.txt")"
is better than the piped answers, because when you pipe into a while loop, the loop is executed in a subshell (see here), so if you modify variables inside the loop, the changes cannot be seen after the loop finishes.
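A quick demonstration that variables modified inside the loop stay visible afterwards:
count=0
while read -r fname; do
    count=$((count + 1))
done <<< "$(find . -name "*.txt")"
# count keeps its value here (caveat: an empty result still yields one empty line)
echo "$count files found"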
function loop_through(){
    length_="$(find . -name '*.txt' | wc -l)"
    length_="${length_#"${length_%%[![:space:]]*}"}"
    length_="${length_%"${length_##*[![:space:]]}"}"
    for ((i=1; i<=length_; i++))
    do
        x=$(find . -name '*.txt' | sort | head -n "$i" | tail -n 1)
        echo "$x"
    done
}
To get the length of the list of files for the loop, I used the "wc -l" command and saved its output in a variable.
Then I remove the leading and trailing whitespace from the variable so the loop can use it; a C-style for loop is used because brace expansion ({1..$length_}) does not work with variables.
find <path> -xdev -type f -name '*.txt' -exec ls -l {} \;
This will list the files and give details about attributes.
Another alternative is to not use bash, but call Python to do the heavy lifting. I resorted to this because bash solutions such as my other answer were too slow.
With this solution, we build a bash array of files from an inline Python script:
#!/bin/bash
set -eu -o pipefail
dsep=":" # directory_separator
base_directory=/tmp
all_files=()
all_files_string="$(python3 -c '#!/usr/bin/env python3
import os
import sys

dsep="'"$dsep"'"
base_directory="'"$base_directory"'"

def log(*args, **kwargs):
    print(*args, file=sys.stderr, **kwargs)

def check_invalid_character(file_path):
    for thing in ("\\", "\n"):
        if thing in file_path:
            raise RuntimeError(f"The character {thing!r} is not allowed in \"{file_path}\"!")

def absolute_path_to_relative(base_directory, file_path):
    relative_path = os.path.commonprefix( [ base_directory, file_path ] )
    relative_path = os.path.normpath( file_path.replace( relative_path, "" ) )
    # if you use Windows Python, it accepts / instead of \\
    # if you have \ on your files names, rename them or comment this
    relative_path = relative_path.replace("\\", "/")
    if relative_path.startswith( "/" ):
        relative_path = relative_path[1:]
    return relative_path

for directory, directories, files in os.walk(base_directory):
    for file in files:
        local_file_path = os.path.join(directory, file)
        local_file_name = absolute_path_to_relative(base_directory, local_file_path)
        log(f"local_file_name {local_file_name}.")
        check_invalid_character(local_file_name)
        print(f"{base_directory}{dsep}{local_file_name}")
' | dos2unix)";
if [[ -n "$all_files_string" ]];
then
readarray -t temp <<< "$all_files_string";
all_files+=("${temp[#]}");
fi;
for item in "${all_files[#]}";
do
OLD_IFS="$IFS"; IFS="$dsep";
read -r base_directory local_file_name <<< "$item"; IFS="$OLD_IFS";
printf 'item "%s", base_directory "%s", local_file_name "%s".\n' \
"$item" \
"$base_directory" \
"$local_file_name";
done;
Related:
os.walk without hidden folders
How to do a recursive sub-folder search and return files in a list?
How to split a string into an array in Bash?
How about using grep instead of find?
ls | grep .txt$ > out.txt
Now you can read this file, and the filenames are in the form of a list.

shell script does not find the directory

I'm starting out with shell scripting. I need to compute checksums of a lot of files, so I thought I would automate the process using a shell script.
I made two scripts. The first script runs a recursive ls command with an egrep -v, taking the file path I enter as a parameter; the command output is saved in a variable, converting it to a string, and a for loop then cuts that string into lines and passes each line as a parameter to the second script. The second script passes its parameter to the hashdeep command, whose output is, as in the previous script, saved in a variable, converted to a string and split using IFS; finally, I take the line of interest and put it into a text file.
The output is:
/home/douglas/Trampo/shell_scripts/2016-10-27-001757.jpg: No such file
or directory
----Checksum FILE: 2016-10-27-001757.jpg
----Checksum HASH:
The issue is: I passed the directory ~/Pictures as the parameter, but the error output refers to another directory, /home/douglas/Trampo/shell_scripts/ (the script's own directory). The file 2016-10-27-001757.jpg is in the ~/Pictures directory. Why is the script looking in its own directory?
First script:
#/bin/bash
arquivos=$(ls -R $1 | egrep -v '^d')
for linha in $arquivos
do
bash ./task2.sh $linha
done
second script:
#/bin/bash
checksum=$(hashdeep $1)
concatenado=''
for i in $checksum
do
concatenado+=$i
done
IFS=',' read -ra ADDR <<< "$concatenado"
echo
echo '----Checksum FILE:' $1
echo '----Checksum HASH:' ${ADDR[4]}
echo
echo ${ADDR[4]} >> ~/Trampo/shell_scripts/txt2.txt
I think that's it... sorry about the English grammar errors.
I hope the question has become clear.
Thanks in advance!
There are several things wrong in the first script alone.
When running ls in recursive mode using -R, the output is listed per directory and each file is listed relative to its parent instead of with its full pathname.
ls -R doesn't list the directory in long format, as implied by | egrep -v '^d', where it seems you are looking for files (non-directories).
In your specific case, the missing file 2016-10-27-001757.jpg is in a subdirectory, but you lost its location by using ls -R.
Do not parse the output of ls. Use find and you won't have the same issue.
The first script can be replaced by a single line.
Try this:
#!/bin/bash
find "$1" -type f -exec ./task2.sh "{}" \;
Or if you prefer using xargs, try this:
#!/bin/bash
find "$1" -type f -print0 | xargs -0 -n1 -I{} ./task2.sh "{}"
Note: enclosing {} in quotes ensures that task2.sh receives a complete filename even if it contains spaces.
In task2.sh the parameter $1 should also be quoted "$1".
If task2.sh is executable, you are all set. If not, add bash in the line so it reads as:
find "$1" -type f -exec bash ./task2.sh "{}" \;
As it turned out, task2.sh was not executable; it was missing the execute permission.
Add execute permission to it by running chmod like:
chmod a+x task2.sh
Good luck.

Shell script: Check if a Directory is of YYYY_MM_DD_HH this format

I have a script that creates a file list of directories available in another path.
Now, I would like to do some tasks only if the Directory is of the format "YYYY_MM_DD_HH" in this file list.
My file list has following entries:
2014_04_21_01
asdf
2012_01_19_10
2010_01
Now I would like to move the directories with names as YYYY_MM_DD_HH to another path. I.e., only 2014_04_21_01 & 2012_01_19_10 MUST be MOVED.
Please advise.
Use bash regex pattern matching:
for dir in $list
do if [[ "$dir" =~ ^[0-9]{4}_[0-9]{2}_[0-9]{2}_[0-9]{2}$ ]]
then mv "$dir" newdir/
fi
done
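Since the names come from a file list, a sketch that reads the list line by line (assuming the list is in a file named file_list):
while IFS= read -r dir; do
    if [[ $dir =~ ^[0-9]{4}_[0-9]{2}_[0-9]{2}_[0-9]{2}$ ]]; then
        mv "$dir" newdir/
    fi
done < file_list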
Assuming you have a GNU version of sed on your computer, you could use it to easily parse your directory names and execute a command.
Say we have following input file:
2014_04_21_01
asdf
2012_01_19_10
2010_01
2012_01_19_10_09
62012_01_19_10
You can search for your regex with sed and replace it with a mv command as follows:
sed 's/^[0-9]\{4\}\(_[0-9]\{2\}\)\{3\}$/mv "&" "other_dir"/' file_list
will output:
mv "2014_04_21_01" "other_dir" # We want to run this
asdf
mv "2012_01_19_10" "other_dir" # and this
2010_01
2012_01_19_10_09
62012_01_19_10
Now if you add the (GNU sed) e option at the end of the sed substitution (and the -n option before the sed script, to ensure only successful substitutions are executed), the generated commands will be piped into your shell:
sed -n 's/^[0-9]\{4\}\(_[0-9]\{2\}\)\{3\}$/mv "&" "other_dir"/e' file_list
# ^^ ^
I would recommend running it first without the e option, so as to check that the mv commands are properly formatted.
Why make a separate file for the file list? Just go into that directory and execute the following command. I have taken the destination directory to be /home/newdir/.
ls | grep [0-9][0-9][0-9][0-9]_[01][0-9]_[0123][0-9]_[012][0-9] | awk '{print $0" /home/newdir/"}' | xargs mv
Be careful while working with dates. Since the file names are in the format YYYY_MM_DD_HH, there are restrictions on valid values of MM, DD and HH: we know how a calendar is constructed, so 9999_99_99_99 is an invalid file name even though it matches the pattern.
A complete script would have to encode those restrictions, or the whole calendar. I'm still working on it.
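Until then, here is a minimal sketch of such a check, assuming GNU date: the regex validates the shape, and date -d rejects impossible dates such as 2014_02_30:
is_valid_stamp() {
    local name="$1"
    [[ $name =~ ^([0-9]{4})_([0-9]{2})_([0-9]{2})_([0-9]{2})$ ]] || return 1
    (( 10#${BASH_REMATCH[4]} <= 23 )) || return 1   # hour must be 00-23
    # GNU date exits non-zero for dates that do not exist on the calendar
    date -d "${BASH_REMATCH[1]}-${BASH_REMATCH[2]}-${BASH_REMATCH[3]}" >/dev/null 2>&1
}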
Example:
perl -nle 'system("mv $_ dir/year$1") if /^(\d{4})_\d\d_\d\d_\d\d$/' flist
would extract the year and move directory 2014_04_21_01 to dir/year2014
This single find command with -regex option should take care of this:
cd /base/path/of/these/dirs
find . -type d -regextype posix-egrep -regex '.*/[0-9]{4}_[0-9]{2}_[0-9]{2}_[0-9]{2}$' \
-exec mv '{}' /dest/dir/ \;

How can I convert tabs to spaces in every file of a directory?

How can I convert tabs to spaces in every file of a directory (possibly recursively)?
Also, is there a way of setting the number of spaces per tab?
Simple replacement with sed is okay but not the best possible solution. If there are "extra" spaces between the tabs they will still be there after substitution, so the margins will be ragged. Tabs expanded in the middle of lines will also not work correctly. In bash, we can say instead
find . -name '*.java' ! -type d -exec bash -c 'expand -t 4 "$0" > /tmp/e && mv /tmp/e "$0"' {} \;
to apply expand to every Java file in the current directory tree. Remove or replace the -name argument if you're targeting some other file types. As one of the comments mentions, be very careful when removing -name or using a weak wildcard: you can easily clobber repository and other hidden files without intending to. This is why the original answer included this:
You should always make a backup copy of the tree before trying something like this in case something goes wrong.
Try the command line tool expand.
expand -i -t 4 input | sponge output
where
-i is used to expand only leading tabs on each line;
-t 4 means that each tab will be converted to 4 space characters (8 by default).
sponge is from the moreutils package, and avoids clearing the input file. On macOS, the package moreutils is available via Homebrew (brew install moreutils) or MacPorts (sudo port install moreutils).
Finally, you can use gexpand on macOS, after installing coreutils with Homebrew (brew install coreutils) or MacPorts (sudo port install coreutils).
Warning: This will break your repo.
This will corrupt binary files, including those under svn, .git! Read the comments before using!
find . -iname '*.java' -type f -exec sed -i.orig 's/\t/ /g' {} +
The original file is saved as [filename].orig.
Replace '*.java' with the file ending of the file type you are looking for. This way you can prevent accidental corruption of binary files.
Downsides:
Will replace tabs everywhere in a file.
Will take a long time if you happen to have a 5GB SQL dump in this directory.
Collecting the best comments from Gene's answer, the best solution by far is to use sponge from moreutils.
sudo apt-get install moreutils
# The complete one-liner:
find ./ -iname '*.java' -type f -exec bash -c 'expand -t 4 "$0" | sponge "$0"' {} \;
Explanation:
./ is recursively searching from current directory
-iname is a case insensitive match (for both *.java and *.JAVA likes)
type -f finds only regular files (no directories, binaries or symlinks)
-exec bash -c execute following commands in a subshell for each file name, {}
expand -t 4 expands all TABs to 4 spaces
sponge soak up standard input (from expand) and write to a file (the same one)*.
NOTE: * A simple file redirection (> "$0") won't work here because it would overwrite the file too soon.
Advantage: All original file permissions are retained and no intermediate tmp files are used.
Use sed with ANSI-C quoting ($'...') so that the backslash-escaped tab (\t) is understood.
On linux:
Replace all tabs with 1 hyphen in place, in all *.txt files:
sed -i $'s/\t/-/g' *.txt
Replace all tabs with 1 space in place, in all *.txt files:
sed -i $'s/\t/ /g' *.txt
Replace all tabs with 4 spaces in place, in all *.txt files:
sed -i $'s/\t/    /g' *.txt
On a Mac:
Replace all tabs with 4 spaces in place, in all *.txt files:
sed -i '' $'s/\t/    /g' *.txt
You can use the generally available pr command (man page here). For example, to convert tabs to four spaces, do this:
pr -t -e=4 file > file.expanded
-t suppresses headers
-e=num expands tabs to num spaces
To convert all files in a directory tree recursively, while skipping binary files:
#!/bin/bash
num=4
shopt -s globstar nullglob
for f in **/*; do
    [[ -f "$f" ]] || continue       # skip if not a regular file
    ! grep -qI . "$f" && continue   # skip binary files (-I treats them as non-matching)
    pr -t -e=$num "$f" > "$f.expanded.$$" && mv "$f.expanded.$$" "$f"
done
The logic for skipping binary files is from this post.
NOTE:
Doing this could be dangerous in a git or svn repo
This is not the right solution if you have code files that have bare tabs embedded in string literals
My recommendation is to use:
find . -name '*.lua' -exec ex '+%s/\t/ /g' -cwq {} \;
Comments:
Use in place editing. Keep backups in a VCS. No need to produce *.orig files. It's good practice to diff the result against your last commit to make sure this worked as expected, in any case.
sed is a stream editor. Use ex for in place editing. This avoids creating extra temp files and spawning shells for each replacement as in the top answer.
WARNING: This messes with all tabs, not only those used for indentation. Also it does not do context aware replacement of tabs. This was sufficient for my use case. But might not be acceptable for you.
EDIT: An earlier version of this answer used find | xargs instead of find -exec. As pointed out by @gniourf_gniourf, this leads to problems with spaces, quotes and control chars in file names; cf. Wheeler.
You can use find with tabs-to-spaces package for this.
First, install tabs-to-spaces
npm install -g tabs-to-spaces
then, run this command from the root directory of your project;
find . -name '*' -exec t2s --spaces 2 {} \;
This will replace every tab character with 2 spaces in every file.
How can I convert tabs to spaces in every file of a directory (possibly recursively)?
This is usually not what you want.
Do you want to do this for png images? PDF files? The .git directory? Your Makefile (which requires tabs)? A 5GB SQL dump?
You could, in theory, pass a whole lot of exclude options to find or whatever else you're using; but this is fragile, and will break as soon as you add other binary files.
What you want is at least:
Skip files over a certain size.
Detect if a file is binary by checking for the presence of a NULL byte.
Only replace tabs at the start of a line (expand does this, sed doesn't).
As far as I know, there is no "standard" Unix utility that can do this, and it's not very easy to do with a shell one-liner, so a script is needed.
A while ago I created a little script called sanitize_files which does exactly that. It also fixes some other common stuff like replacing \r\n with \n, adding a trailing \n, etc.
You can find a simplified script without the extra features and command-line arguments below, but I recommend you use the above script, as it's more likely to receive bugfixes and other updates than this post.
I would also like to point out, in response to some of the other answers here, that using shell globbing is not a robust way of doing this, because sooner or later you'll end up with more files than will fit in ARG_MAX (on modern Linux systems it's 128k, which may seem a lot, but sooner or later it's not enough).
#!/usr/bin/env python
#
# http://code.arp242.net/sanitize_files
#
import os

indent_width = 4  # spaces per tab; a command-line option in the full script

def is_binary(data):
    return data.find(b'\000') >= 0

def should_ignore(path):
    keep = [
        # VCS systems
        '.git/', '.hg/', '.svn/', 'CVS/',
        # These files have significant whitespace/tabs, and cannot be edited
        # safely
        # TODO: there are probably more of these files..
        'Makefile', 'BSDmakefile', 'GNUmakefile', 'Gemfile.lock'
    ]
    for k in keep:
        if '/%s' % k in path:
            return True
    return False

def run(files):
    indent_find = b'\t'
    indent_replace = b' ' * indent_width
    for f in files:
        if should_ignore(f):
            print('Ignoring %s' % f)
            continue
        try:
            size = os.stat(f).st_size
        # Unresolvable symlink, just ignore those
        except FileNotFoundError as exc:
            print('%s is unresolvable, skipping (%s)' % (f, exc))
            continue
        if size == 0:
            continue
        if size > 1024 ** 2:
            print("Skipping `%s' because it's over 1MiB" % f)
            continue
        try:
            data = open(f, 'rb').read()
        except (OSError, PermissionError) as exc:
            print("Error: Unable to read `%s': %s" % (f, exc))
            continue
        if is_binary(data):
            print("Skipping `%s' because it looks binary" % f)
            continue
        data = data.split(b'\n')
        for i, line in enumerate(data):
            # Fix indentation: replace each leading tab with indent_width spaces
            repl_count = 0
            while line.startswith(indent_find):
                repl_count += 1
                line = line.replace(indent_find, b'', 1)
            if repl_count > 0:
                line = indent_replace * repl_count + line
            data[i] = line
        try:
            open(f, 'wb').write(b'\n'.join(data))
        except (OSError, PermissionError) as exc:
            print("Error: Unable to write to `%s': %s" % (f, exc))

if __name__ == '__main__':
    allfiles = []
    for root, dirs, files in os.walk(os.getcwd()):
        for f in files:
            allfiles.append('%s/%s' % (root, f))
    run(allfiles)
I like the "find" example above for the recursive application. To adapt it to be non-recursive, only changing files in the current directory that match a wildcard, the shell glob expansion can be sufficient for small amounts of files:
ls *.java | awk '{print "expand -t 4 ", $0, " > /tmp/e; mv /tmp/e ", $0}' | sh -v
If you want it silent after you trust that it works, just drop the -v on the sh command at the end.
Of course you can pick any set of files in the first command. For example, list only a particular subdirectory (or directories) in a controlled manner like this:
ls mod/*/*.php | awk '{print "expand -t 4 ", $0, " > /tmp/e; mv /tmp/e ", $0}' | sh
Or in turn run find(1) with some combination of depth parameters etc:
find mod/ -name '*.php' -mindepth 1 -maxdepth 2 | awk '{print "expand -t 4 ", $0, " > /tmp/e; mv /tmp/e ", $0}' | sh
I used astyle to re-indent all my C/C++ code after finding mixed tabs and spaces. It also has options to force a particular brace style if you'd like.
One can use vim for that:
find -type f \( -name '*.css' -o -name '*.html' -o -name '*.js' -o -name '*.php' \) -execdir vim -c retab -c wq {} \;
As Carpetsmoker stated, it will retab according to your vim settings, and to modelines in the files, if any. Also, it will replace tabs not only at the beginning of lines, which is not what you generally want: e.g., you might have string literals containing tabs.
To convert all Java files recursively in a directory to use 4 spaces instead of a tab:
find . -type f -name '*.java' -exec bash -c 'expand -t 4 "$0" > /tmp/stuff && mv /tmp/stuff "$0"' {} \;
Nobody mentioned rpl? Using rpl you can replace any string.
To convert tabs to spaces,
rpl -R -e "\t" " " .
Very simple.
Download and run the following script to recursively convert hard tabs to soft tabs in plain text files.
Execute the script from inside the folder which contains the plain text files.
#!/bin/bash
find . -type f -and -not -path './.git/*' -exec grep -Iq . {} \; -and -print | while read -r file; do {
echo "Converting... "$file"";
data=$(expand --initial -t 4 "$file");
rm "$file";
echo "$data" > "$file";
}; done;
Git repository friendly method
git-tab-to-space() (
    d="$(mktemp -d)"
    git grep --cached -Il '' | grep -E "${1:-.}" | \
        xargs -I'{}' bash -c '\
            f="${1}/f" \
            && expand -t 4 "$0" > "$f" && \
            chmod --reference="$0" "$f" && \
            mv "$f" "$0"' \
        '{}' "$d" \
    ;
    rmdir "$d"
)
Act on all files under the current directory:
git-tab-to-space
Act only on C or C++ files:
git-tab-to-space '\.(c|h)(|pp)$'
You likely want this notably because of those annoying Makefiles which require tabs.
The command git grep --cached -Il '':
lists only the tracked files, so nothing inside .git
excludes directories, binary files (would be corrupted), and symlinks (would be converted to regular files)
as explained at: How to list all text (non-binary) files in a git repository?
chmod --reference keeps the file permissions unchanged: https://unix.stackexchange.com/questions/20645/clone-ownership-and-permissions-from-another-file Unfortunately I can't find a succinct POSIX alternative.
If your codebase had the crazy idea to allow functional raw tabs in strings, use:
expand -i
and then have fun going over all non-start-of-line tabs one by one, which you can list with: Is it possible to git grep for tabs?
Tested on Ubuntu 18.04.
The use of expand as suggested in other answers seems the most logical approach for this task alone.
That said, it can also be done with Bash and Awk in case you may want to do some other modifications along with it.
If using Bash 4.0 or greater, the shopt builtin globstar can be used to search recursively with **.
With GNU Awk version 4.1 or greater, sed like "inplace" file modifications can be made:
shopt -s globstar
gawk -i inplace '{gsub("\t"," ")}1' **/*.ext
In case you want to set the number of spaces per tab:
gawk -i inplace -v n=4 'BEGIN{for(i=1;i<=n;i++) c=c" "}{gsub("\t",c)}1' **/*.ext
Converting tabs to spaces in ".lua" files only [tabs -> 2 spaces]:
find . -iname "*.lua" -exec sed -i "s#\t#  #g" '{}' \;
Use the vim-way:
$ ex +'bufdo retab' -cxa **/*.*
Make a backup before executing the above command, as it can corrupt your binary files.
To use globstar (**) for recursion, activate by shopt -s globstar.
To specify specific file type, use for example: **/*.c.
To modify tabstop, add +'set ts=2'.
However, the downside is that it can replace tabs inside strings.
So for slightly better solution (by using substitution), try:
$ ex -s +'bufdo %s/^\t\+/ /ge' -cxa **/*.*
Or by using ex editor + expand utility:
$ ex -s +'bufdo!%!expand -t2' -cxa **/*.*
For trailing spaces, see: How to remove trailing whitespaces for multiple files?
You may add the following function into your .bash_profile:
# Convert tabs to spaces.
# Usage: retab *.*
# See: https://stackoverflow.com/q/11094383/55075
retab() {
ex +'set ts=2' +'bufdo retab' -cxa "$@"
}

How to do something to every file in a directory using bash?

I started with this:
command *
But it doesn't work when the directory is empty; the * wildcard becomes a literal "*" character. So I switched to this:
for i in *; do
...
done
which works, but again, not if the directory is empty. I resorted to using ls:
for i in `ls -A`
but of course, then file names with spaces in them get split. I tried tacking on the -Q switch:
for i in `ls -AQ`
which causes the names to still be split, only with a quote character at the beginning and end of each name. Am I missing something obvious here, or is this harder than it ought to be?
Assuming you only want to do something to files, the simple solution is to test if $i is a file:
for i in *
do
if test -f "$i"
then
echo "Doing somthing to $i"
fi
done
You should really always make such tests, because you almost certainly don't want to treat files and directories in the same way. Note the quotes around the "$i" which prevent problems with filenames containing spaces.
find could be what you want.
find . | while read file; do
# do something with $file
done
Or maybe like this:
find . -exec <command> {} \;
If you do not want the search to include subdirectories you might need to add a combination of -type f and -maxdepth 1 to the find command. See the find man page for details.
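For example, to act only on regular files directly in the current directory:
find . -maxdepth 1 -type f -exec <command> {} \;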
It depends whether you're going to type this at a command prompt, and which command you're applying to the files.
If it's typed, you could go with your second choice and substitute something harmless for the command. I like to use echo instead of mv or rm, for example.
Put it all on one line:
for i in * ; do command $i; done
When that works - or you can see where it fails, and whether it's harmless, you can press up-arrow, edit the command and try again.
Use shopt with nullglob to prevent the unmatched pattern from expanding to a literal *.txt:
shopt -s nullglob
for myfile in *.txt
do
# your code here
echo $myfile
done
This should do the trick:
find -type d -print0 | xargs -0 -I{} echo "your folder: {} !"
find -type f -print0 | xargs -0 -I{} echo "your file: {} !"
The -print0 / -0 are there to avoid problems with whitespace, and -I{} makes xargs substitute each name into the message.
