How to use the sed command to search and replace recursively in subdirectories? - shell

I want to search for var gaJsHost = (("https:" == document.location.protocol) ? "https://ssl." : "http://www."); and replace it with 'google analytics code' in all the files under my home directory.
The PHP Eclipse editor shows all the occurrences but cannot replace them all at once.
So I am thinking of using a shell script for it.
/home/project/news4u/web is my root directory. How can I use the sed command to achieve this?
I tried sed s/'var gaJsHost = (("https:" == document.location.protocol) ? "https://ssl." : "http://www.");'/'google analytics' *

If you have GNU sed, it has the -i option, which can edit files in place.
And if you have bash you need to enable globstar:
shopt -s globstar
because, as far as I remember, sed can't walk through the directory tree itself.
sed -i.bak "s#var gaJsHost = ((\"https:\" == document.location.protocol) ? \"https://ssl.\" : \"http://www.\");#google analytics code#" **
-i.bak will back up each file with a .bak extension. It is useful if something goes wrong.
I advise you to check first with a "dry" run, without the -i option, on some specific file to make sure everything works fine, and only after that run it with -i.
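For example, a dry run on a single file might look like this (some-file.php is just a stand-in for one of your files); -n together with the p flag prints only the lines that would be changed:
sed -n "s#var gaJsHost = ((\"https:\" == document.location.protocol) ? \"https://ssl.\" : \"http://www.\");#google analytics code#p" some-file.php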
P.S. If you don't have GNU sed or bash (or both), you can do it in a more complicated but POSIX-compatible way:
find . -type f -exec sh -c 'sed "s#var gaJsHost = ((\"https:\" == document.location.protocol) ? \"https://ssl.\" : \"http://www.\");#google analytics code#" "$1" > "$1.tmp" && mv "$1.tmp" "$1"' sh {} \;

Related

How to use sed to replace "word{word" or "word}word"

I tried to write a bash-script to replace things like:
{"Ausstattung":"
"},{"Ausstattung":"
"}
in all CSV files of a subfolder.
This is what I am using so far:
function suche_und_ersetze() { # search and replace
LC_ALL=C find ./subfolder -type f ! -name ".DS_Store" -exec sed -i "" "s/${OLD}/${NEW}/g" {} \;
}
OLD='{\"Ausstattung\":\"'
NEW="#"
suche_und_ersetze
OLD='\"},{\"Ausstattung\":\"'
NEW=" #"
suche_und_ersetze
OLD='\"}'
NEW=""
suche_und_ersetze
Somehow I can't make it work. The script replaces text that doesn't contain ", }, or ,, but not the strings listed above.
Sorry, I should have looked into the CSV with an editor. It is
""},{""Ausstattung"":""
and not
"},{"Ausstattung":"
In sed you don't even need to escape " if you use '.
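For instance, with the corrected doubly-quoted strings, a single-quoted sed script needs no escaping at all; a sketch of the second replacement (assuming BSD sed, hence -i "" as in the question):
LC_ALL=C find ./subfolder -type f ! -name ".DS_Store" -exec sed -i "" 's/""},{""Ausstattung"":""/ #/g' {} \;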

Bash - Search and Replace operation with reporting the files and lines that got changed

I have an input file "test.txt" as below -
hostname=abc.com hostname=xyz.com
db-host=abc.com db-host=xyz.com
In each line, the value before the space is the old value, which needs to be replaced by the new value after the space, recursively in a folder named "test". I am able to do this using the shell script below.
#!/bin/bash
IFS=$'\n'
for f in `cat test.txt`
do
OLD=$(echo $f| cut -d ' ' -f 1)
echo "Old = $OLD"
NEW=$(echo $f| cut -d ' ' -f 2)
echo "New = $NEW"
find test -type f | xargs sed -i.bak "s/$OLD/$NEW/g"
done
"sed" replaces the strings on the fly in 100s of files.
Is there a trick or an alternative way by which I can get a report of the files changed, like the absolute path of the file and the exact lines that got changed?
PS - I understand that sed and stream editors don't support this functionality out of the box. I don't want to use versioning as it would be overkill for this task.
Let's start with a simple rewrite of your script, to make it a little bit more robust at handling a wider range of replacement values, but also faster:
#!/bin/bash
# escape regexp and replacement strings for sed
escapeRegex() { sed 's/[^^]/[&]/g; s/\^/\\^/g' <<<"$1"; }
escapeSubst() { sed 's/[&/\]/\\&/g' <<<"$1"; }
while read -r old new; do
find test -type f -exec sed "s/$(escapeRegex "$old")/$(escapeSubst "$new")/g" -i '{}' \;
done <test.txt
So, we loop over pairs of whitespace-separated fields (old, new) in lines from test.txt and run a standard sed in-place replace on all files found with find.
Pretty similar to your script, but we properly read lines from test.txt (no word splitting, pathname/variable expansion, etc.), we use Bash builtins whenever possible (no need to call external tools like cat, cut, xargs); and we escape sed metacharacters in old/new values for proper use as sed's regexp and replacement expressions.
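To see what the helpers produce, here's a quick illustration with one of the values from test.txt; escapeRegex wraps each ordinary character in a bracket expression (and backslash-escapes ^), so metacharacters like . match only literally:
$ escapeRegex 'db-host=abc.com'
[d][b][-][h][o][s][t][=][a][b][c][.][c][o][m]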
Now let's add logging from sed:
#!/bin/bash
# escape regexp and replacement strings for sed
escapeRegex() { sed 's/[^^]/[&]/g; s/\^/\\^/g' <<<"$1"; }
escapeSubst() { sed 's/[&/\]/\\&/g' <<<"$1"; }
while read -r old new; do
find test -type f -printf '\n[%p]\n' -exec sed "/$(escapeRegex "$old")/{
h
s//$(escapeSubst "$new")/g
H
x
s/\n/ --> /
w /dev/stdout
x
}" -i '{}' > >(tee -a change.log) \;
done <test.txt
The sed script above changes each old to new, but it also writes an old --> new line to /dev/stdout (Bash-specific), which we in turn append to the change.log file. The -printf action in find outputs a "header" line with the file name for each file processed.
With this, your "change log" will look something like:
[file1]
hostname=abc.com --> hostname=xyz.com
[file2]
[file1]
db-host=abc.com --> db-host=xyz.com
[file2]
db-host=abc.com --> db-host=xyz.com
Just for completeness, a quick walk-through of the sed script. We act only on lines containing the old value. For each such line, we store it in hold space (h), change it to new, and append that new value to the hold space (joined with a newline, H), which now holds old\nnew. We swap hold and pattern space (x), so we can run an s command that converts it to old --> new. After writing that to stdout with w, we move the new value back from hold to pattern space, so it gets written (in place) to the file being processed.
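The same script, annotated (old and new stand in for the escaped search and replacement values; GNU sed allows # comments inside scripts):
/old/{
h               # save the matching line (still old) to hold space
s//new/g        # substitute; pattern space now holds new
H               # append it to hold space: old\nnew
x               # swap; pattern space is now old\nnew
s/\n/ --> /     # turn it into: old --> new
w /dev/stdout   # write the log line
x               # swap back so new is what gets written to the file
}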
From man sed:
-i[SUFFIX], --in-place[=SUFFIX]
edit files in place (makes backup if SUFFIX supplied)
This can be used to create a backup file when replacing. You can then look for any backup files, which indicate which files were changed, and diff those with the originals. Once you're done inspecting the diff, simply remove the backup files.
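For example, after a run with -i.bak you could review and clean up like this (a sketch, assuming the "test" folder from the question):
find test -name '*.bak' | while IFS= read -r bak; do
diff -u "$bak" "${bak%.bak}"      # show what changed in each file
done
find test -name '*.bak' -delete   # remove the backups once satisfied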
If you formulate your replacements as sed statements rather than a custom format, you can go one step further and use either a sed shebang line or pass the file to -f/--file to do all the replacements in one operation.
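A sketch of that approach: put one s command per line into a file, say replacements.sed (a hypothetical name), mirroring test.txt:
s/hostname=abc.com/hostname=xyz.com/g
s/db-host=abc.com/db-host=xyz.com/g
and apply them all in a single pass:
find test -type f -exec sed -i -f replacements.sed '{}' \;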
There are several problems with your script; just replace it all with this (using GNU awk instead of GNU sed for in-place editing):
mapfile -t files < <(find test -type f)
awk -i inplace '
NR==FNR { map[$1] = $2; next }
{ for (old in map) gsub(old,map[old]) }
' test.txt "${files[@]}"
You'll find that is orders of magnitude faster than what you were doing.
That still has the issues your existing script has: failing when the "test.txt" strings contain regexp or backreference metacharacters, modifying previously-modified strings, and matching partial words. If any of that is an issue, let us know, as it's easy to work around with awk (and extremely difficult with sed!).
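For the metacharacter problem specifically, here's a sketch of the literal-match workaround: awk's index() does plain substring search, so nothing from test.txt is ever treated as a regexp:
mapfile -t files < <(find test -type f)
awk -i inplace '
NR==FNR { map[$1] = $2; next }
{
for (old in map) {
out = ""
# replace every literal occurrence of old, left to right
while ((pos = index($0, old)) > 0) {
out = out substr($0, 1, pos-1) map[old]
$0 = substr($0, pos + length(old))
}
$0 = out $0
}
}
' test.txt "${files[@]}"
(This still doesn't guard against one mapping rewriting the output of another; that needs more bookkeeping.)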
To get whatever kind of report you want you just tweak the { for ... } line to print them, e.g. to print a record of the changes to stderr:
mapfile -t files < <(find test -type f)
awk -i inplace '
NR==FNR { map[$1] = $2; next }
{
orig = $0
for (old in map) {
gsub(old,map[old])
}
if ($0 != orig) {
printf "File %s, line %d: \"%s\" became \"%s\"\n", FILENAME, FNR, orig, $0 | "cat>&2"
}
}
' test.txt "${files[@]}"

Find if null exists in csv file

I have a CSV file. The file has some anomalies, as it contains some unknown characters.
The characters appear at line 1535 in popular editors (images attached below). The sed command in the terminal for this line does not show anything.
$ sed '1535!d' sample.csv
"sample_id","sample_column_text_1","sample_"sample_id","sample_column_text_1","sample_column_text_2","sample_column_text_3"
However, snapshots of the file in various editors (Sublime Text, Nano, and Vi) were attached, each showing the unknown characters at that line.
The directory has various csv files that contain this character/chain of characters.
I need to write a bash script to determine the files that have such characters. How can I achieve this?
The following is from:
http://www.linuxquestions.org/questions/programming-9/how-to-check-for-null-characters-in-file-509377/
#!/usr/bin/perl -w
use strict;
my $null_found = 0;
foreach my $file (@ARGV) {
if ( ! open(F, "<$file") ) {
warn "couldn't open $file for reading: $!\n";
next;
}
while(<F>) {
if ( /\000/ ) {
print "detected NULL at line $. in file $file\n";
$null_found = 1;
last;
}
}
close(F);
}
exit $null_found;
If it works as desired, you can save it to a file, nullcheck.pl and make it executable;
chmod +x nullcheck.pl
It seems to take an array of file names as input, but it will exit non-zero if it finds a null in any of them, so I'd only pass in one file at a time. The command below is used to run the script.
for f in $(find . -type f -exec grep -Iq . {} \; -and -print) ; do perl ./nullcheck.pl "$f" || echo "$f has nulls"; done
The above find command is lifted from Linux command: How to 'find' only text files?
You can try grep and tr. To find whether a file contains \000 characters:
grep '\000' filename
You can then use tr to remove the NULLs and make a NULL-free copy of the file:
tr < file-with-nulls -d '\000' > file-without-nulls
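If you only need to know which files are affected, GNU grep can also search for the NUL byte directly; a sketch for the directory of CSV files (-P enables Perl-style regexps, -l lists just the matching file names):
grep -rlP '\x00' --include='*.csv' .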

Using bash: how do you find a way to search through all your files in a directory (recursively?)

I need a command that will help me accomplish what I am trying to do. At the moment, I am looking through all the ".html" files in a given directory, to see which ones contain the string "jacketprice".
Is there a way to do this? Also, for a second (but separate) command, I will need a way to replace every instance of "jacketprice" with "coatprice", all in one command or script. If this is feasible, feel free to let me know. Thanks.
find . -name "*.html" -exec grep -l jacketprice {} \;
As for the second question,
find . -name "*.html" | while read -r i
do
sed -i "s/jacketprice/coatprice/g" "$i"
done
or, as a single command,
find . -name "*.html" -exec sed -i "s/jacketprice/coatprice/g" {} \;
Use recursive grep to search through your files:
grep -r --include="*.html" jacketprice /my/dir
Alternatively turn on bash's globstar feature (if you haven't already), which allows you to use **/ to match directories and sub-directories.
$ shopt -s globstar
$ cd /my/dir
$ grep jacketprice **/*.html
$ sed -i 's/jacketprice/coatprice/g' **/*.html
Depending on whether you want this recursively or not, perl is a good option:
Find, non-recursive:
perl -nwe 'print "Found $_ in file $ARGV\n" if /jacketprice/' *.html
Will print the line where the match is found, followed by the file name. Can get a bit verbose.
Replace, non-recursive:
perl -pi.bak -we 's/jacketprice/coatprice/g' *.html
Will store original with .bak extension tacked on.
Find, recursive:
perl -MFile::Find -nwE '
BEGIN { find(sub { /\.html$/i && push @ARGV, $File::Find::name }, "/dir"); };
say $ARGV if /jacketprice/'
It will print the file name for each match. Somewhat less verbose might be:
perl -MFile::Find -nwE '
BEGIN { find(sub { /\.html$/i && push @ARGV, $File::Find::name }, "/dir"); };
$found{$ARGV}++ if /jacketprice/; END { say for keys %found }'
Replace, recursive:
perl -MFile::Find -pi.bak -we '
BEGIN { find(sub { /\.html$/i && push @ARGV, $File::Find::name }, "/dir"); };
s/jacketprice/coatprice/g'
Note: In all recursive versions, /dir is the top-level directory you wish to search. Also, if your perl version is less than 5.10, say can be replaced with print followed by a newline, e.g. print "$_\n" for keys %found.

How can I copy all my disorganized files into a single directory? (on linux)

I have thousands of mp3s inside a complex folder structure which resides within a single folder. I would like to move all the mp3s into a single directory with no subfolders. I can think of a variety of ways of doing this using the find command but one problem will be duplicate file names. I don't want to replace files since I often have multiple versions of a same song. Auto-rename would be best. I don't really care how the files are renamed.
Does anyone know a simple and safe way of doing this?
You could change an a/b/c.mp3 path into a - b - c.mp3 when copying. Here's a solution in Bash:
find srcdir -name '*.mp3' -printf '%P\n' |
while IFS= read -r i; do
j="${i//\// - }"
cp -v "srcdir/$i" "dstdir/$j"
done
And in a shell without ${//} substitution:
find srcdir -name '*.mp3' -printf '%P\n' |
sed -e 'p;s:/: - :g' |
while IFS= read -r i; do
read -r j
cp -v "srcdir/$i" "dstdir/$j"
done
For a different scheme, GNU's cp and mv can make numbered backups instead of overwriting -- see -b/--backup[=CONTROL] in the man pages.
find srcdir -name '*.mp3' -exec cp -v --backup=numbered {} dstdir/ \;
bash-like pseudocode, made concrete:
find . -name "*.mp3" | while IFS= read -r i; do
NEW_NAME=$(basename "$i")
X=0
while [ -f "move_to_dir/$NEW_NAME" ]; do
X=$((X + 1))
NEW_NAME="$(basename "$i").$X"
done
mv "$i" "move_to_dir/$NEW_NAME"
done
#!/bin/bash
NEW_DIR=/tmp/new/
IFS="
"; for a in `find . -type f `
do
echo "$a"
new_name="`basename $a`"
while test -e "$NEW_DIR/$new_name"
do
new_name="${new_name}_"
done
cp "$a" "$NEW_DIR/$new_name"
done
I'd tend to do this in a simple script rather than try to fit in in a single command line.
For instance, in python, it would be relatively trivial to do a walk() through the directory, copying each mp3 file found to a different directory with an automatically incremented number.
If you want to get fancier, you could have a dictionary of existing file names, and simply append a number to the duplicates. (the index of the dictionary being the file name, and the value being the number of files found so far, which would become your suffix)
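A sketch of that dictionary idea, in bash rather than python to match the other answers here (srcdir and dstdir are placeholders; the associative array counts how often each basename has been seen):
declare -A seen
while IFS= read -r -d '' f; do
name=$(basename "$f")
n=${seen[$name]:-0}
seen[$name]=$((n + 1))
[ "$n" -gt 0 ] && name="$name.$n"   # duplicates get a numeric suffix
cp -v "$f" "dstdir/$name"
done < <(find srcdir -type f -name '*.mp3' -print0)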
find /path/to/mp3s -name '*.mp3' -exec mv \{\} /path/to/target/dir \;
At the risk of many downvotes, a perl script could be written in a short time to accomplish this.
Pseudocode:
while (-e $filename) {
$filename .= "1";
}
In python: to actually move the files, set debug = False.
import os, re

from_dir = "/from/dir"
to_dir = "/target/dir"
re_ext = "\.mp3"
debug = True

# walk the source tree; os.walk yields (dir, subdirs, filenames) tuples
for d, subdirs, names in os.walk(from_dir):
    # keep only the files with the wanted extension
    names = [fn for fn in names if re.match(".*(%s)$" % re_ext, fn, re.I)]
    for fn in names:
        from_fn = os.path.join(d, fn)
        target_fn = os.path.join(to_dir, fn)
        file_exists = os.path.exists(target_fn)
        if not debug:
            if not file_exists:
                os.rename(from_fn, target_fn)
            else:
                print "DO NOT MOVE - FILE EXISTS ", from_fn
        else:
            print "MOVE ", from_fn, " TO ", target_fn
Since you don't care how the duplicate files are named, utilize the 'backup' option on move:
find /path/to/mp3s -name '*.mp3' -exec mv --backup=numbered {} /path/to/target/dir \;
Will get you:
song.mp3
song.mp3.~1~
song.mp3.~2~
