How to use sed to replace "word{word" or "word}word" - bash

I tried to write a bash-script to replace things like:
{"Ausstattung":"
"},{"Ausstattung":"
"}
in all CSV files of a subfolder.
This is what I am using so far:
function suche_und_ersetze() { # search and replace
LC_ALL=C find ./subfolder -type f ! -name ".DS_Store" -exec sed -i "" "s/${OLD}/${NEW}/g" {} \;
}
OLD='{\"Ausstattung\":\"'
NEW="#"
suche_und_ersetze
OLD='\"},{\"Ausstattung\":\"'
NEW=" #"
suche_und_ersetze
OLD='\"}'
NEW=""
suche_und_ersetze
Somehow I can't make it work. The script replaces text that contains none of ", } or , — but not the strings listed above.

Sorry, I should have looked at the CSV in an editor first.
It actually contains
""},{""Ausstattung"":""
and not
"},{"Ausstattung":"
In sed you don't even need to escape " if you wrap the expression in single quotes.
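A minimal sketch of the fixed replacement against the doubled-quote format (the file name and sample line are made up for illustration):

```shell
# hypothetical sample line using the doubled-quote CSV format
printf '%s\n' 'A""},{""Ausstattung"":""B' > sample.csv
# single quotes around the sed expression, so the inner " need no escaping
sed 's/""},{""Ausstattung"":""/ #/g' sample.csv   # prints: A #B
```

For an in-place edit on macOS you would add `-i ''` (BSD sed), or `-i` with GNU sed.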

Related

Want to add the headers in text file in shell script

I want to add a header at the start of each file. I use the following code, but it does not add the header. Can you please help me see where I am going wrong?
start_MERGE_JobRec()
{
FindBatchNumber
export TEMP_SP_FORMAT="Temp_${file_indicator}_SP_[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]_INSTANCE[0-9][0-9].txt"
export TEMP_SS_FORMAT="Temp_${file_indicator}_SS_[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]_INSTANCE[0-9][0-9].txt"
export TEMP_SG_FORMAT="Temp_${file_indicator}_SG_[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]_INSTANCE[0-9][0-9].txt"
export TEMP_GS_FORMAT="Temp_${file_indicator}_GS_[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]_INSTANCE[0-9][0-9].txt"
export SP_OUTPUT_FILE="RTBCON_${file_indicator}_SP_${ONLINE_DATE}${TIME}_${BATCH_NUMBER}.txt"
export SS_OUTPUT_FILE="RTBCON_${file_indicator}_SS_${ONLINE_DATE}${TIME}_${BATCH_NUMBER}.txt"
export SG_OUTPUT_FILE="RTBCON_${file_indicator}_SG_${ONLINE_DATE}${TIME}_${BATCH_NUMBER}.txt"
export GS_OUTPUT_FILE="RTBCON_${file_indicator}_GS_${ONLINE_DATE}${TIME}_${BATCH_NUMBER}.txt"
#---------------------------------------------------
# Add header at the start for each file
#---------------------------------------------------
awk '{print "recordType|lifetimeId|MSISDN|status|effectiveDate|expiryDate|oldMSISDN|accountType|billingAccountNumber|usageTypeBen|IMEI|IMSI|cycleCode|cycleMonth|firstBillExperience|recordStatus|failureReason"$0}' >> $SP_OUTPUT_FILE
find . -maxdepth 1 -type f -name "${TEMP_SP_FORMAT}" -exec cat {} + >> $SP_OUTPUT_FILE
find . -maxdepth 1 -type f -name "${TEMP_SS_FORMAT}" -exec cat {} + >> $SS_OUTPUT_FILE
find . -maxdepth 1 -type f -name "${TEMP_SG_FORMAT}" -exec cat {} + >> $SG_OUTPUT_FILE
find . -maxdepth 1 -type f -name "${TEMP_GS_FORMAT}" -exec cat {} + >> $GS_OUTPUT_FILE
}
I use awk to add the header but it's not working.
Awk requires an input file before it will print anything.
A common way to force Awk to print something even when there is no input is to put the print statement in a BEGIN block;
awk 'BEGIN { print "something" }' /dev/null
but if you want to prepend a header to all the output files, I don't see why you are using Awk here at all, let alone printing the header in front of every output line. Are you looking for this, instead?
echo 'recordType|lifetimeId|MSISDN|status|effectiveDate|expiryDate|oldMSISDN|accountType|billingAccountNumber|usageTypeBen|IMEI|IMSI|cycleCode|cycleMonth|firstBillExperience|recordStatus|failureReason' |
tee "$SS_OUTPUT_FILE" "$SG_OUTPUT_FILE" "$GS_OUTPUT_FILE" >"$SP_OUTPUT_FILE"
Notice also how we always quote shell variables unless we specifically want the shell to perform word splitting and wildcard expansion on their values, and how we avoid upper case for private variables.
There also does not seem to be any reason to export your variables -- neither Awk nor find pays any attention to them, and there are no other processes here. The purpose of export is to make a variable visible to the environment of subprocesses. You might want to declare them as local, though.
Perhaps break out a second function to avoid all this code repetition, anyway?
merge_individual_job() {
echo 'recordType|lifetimeId|MSISDN|status|effectiveDate|expiryDate|oldMSISDN|accountType|billingAccountNumber|usageTypeBen|IMEI|IMSI|cycleCode|cycleMonth|firstBillExperience|recordStatus|failureReason'
find . -maxdepth 1 -type f -name "$1" -exec cat {} +
}
start_MERGE_JobRec()
{
FindBatchNumber
local id
for id in SP SS SG GS; do
merge_individual_job \
"Temp_${file_indicator}_${id}_[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]_INSTANCE[0-9][0-9].txt" \
>"RTBCON_${file_indicator}_${id}_${ONLINE_DATE}${TIME}_${BATCH_NUMBER}.txt"
done
}
If FindBatchNumber sets the variable file_indicator, a more idiomatic and less error-prone approach is to have it just echo it, and have the caller assign it:
file_indicator=$(FindBatchNumber)
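A tiny sketch of that idiom (the function body and batch value here are stand-ins; the real FindBatchNumber would compute the value):

```shell
# hypothetical: the real function would look the batch number up somewhere
FindBatchNumber() {
  echo "BATCH042"
}
file_indicator=$(FindBatchNumber)   # the caller captures the output
echo "$file_indicator"              # prints: BATCH042
```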

Bash - Search and Replace operation with reporting the files and lines that got changed

I have a input file "test.txt" as below -
hostname=abc.com hostname=xyz.com
db-host=abc.com db-host=xyz.com
In each line, the value before space is the old value which needs to be replaced by the new value after the space recursively in a folder named "test". I am able to do this using below shell script.
#!/bin/bash
IFS=$'\n'
for f in `cat test.txt`
do
OLD=$(echo $f| cut -d ' ' -f 1)
echo "Old = $OLD"
NEW=$(echo $f| cut -d ' ' -f 2)
echo "New = $NEW"
find test -type f | xargs sed -i.bak "s/$OLD/$NEW/g"
done
"sed" replaces the strings on the fly in 100s of files.
Is there a trick or an alternative way by which I can get a report of the files changed, like the absolute path of the file and the exact lines that got changed?
PS - I understand that sed or stream editors doesn't support this functionality out of the box. I don't want to use versioning as it will be an overkill for this task.
Let's start with a simple rewrite of your script, to make it a little bit more robust at handling a wider range of replacement values, but also faster:
#!/bin/bash
# escape regexp and replacement strings for sed
escapeRegex() { sed 's/[^^]/[&]/g; s/\^/\\^/g' <<<"$1"; }
escapeSubst() { sed 's/[&/\]/\\&/g' <<<"$1"; }
while read -r old new; do
find test -type f -exec sed "s/$(escapeRegex "$old")/$(escapeSubst "$new")/g" -i '{}' \;
done <test.txt
So, we loop over pairs of whitespace-separated fields (old, new) in lines from test.txt and run a standard sed in-place replace on all files found with find.
Pretty similar to your script, but we properly read lines from test.txt (no word splitting, pathname/variable expansion, etc.), we use Bash builtins whenever possible (no need to call external tools like cat, cut, xargs); and we escape sed metacharacters in old/new values for proper use as sed's regexp and replacement expressions.
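To see what the escaping buys you: escapeRegex wraps every ordinary character in a one-character bracket expression, so sed treats it literally. For example:

```shell
escapeRegex() { sed 's/[^^]/[&]/g; s/\^/\\^/g' <<<"$1"; }
escapeRegex 'a.c'   # prints: [a][.][c]  -- the dot can no longer match "any character"
```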
Now let's add logging from sed:
#!/bin/bash
# escape regexp and replacement strings for sed
escapeRegex() { sed 's/[^^]/[&]/g; s/\^/\\^/g' <<<"$1"; }
escapeSubst() { sed 's/[&/\]/\\&/g' <<<"$1"; }
while read -r old new; do
find test -type f -printf '\n[%p]\n' -exec sed "/$(escapeRegex "$old")/{
h
s//$(escapeSubst "$new")/g
H
x
s/\n/ --> /
w /dev/stdout
x
}" -i '{}' > >(tee -a change.log) \;
done <test.txt
The sed script above changes each old to new, but it also writes an old --> new line to /dev/stdout (Bash-specific), which we in turn append to the change.log file. The -printf action in find outputs a "header" line with the file name for each file processed.
With this, your "change log" will look something like:
[file1]
hostname=abc.com --> hostname=xyz.com
[file2]
[file1]
db-host=abc.com --> db-host=xyz.com
[file2]
db-host=abc.com --> db-host=xyz.com
Just for completeness, a quick walk-through of the sed script. We act only on lines containing the old value. For each such line, we store it in hold space (h), change it to new, and append that new value to the hold space (joined with a newline, H), which now holds old\nnew. We swap hold with pattern space (x), so we can run an s command that converts it to old --> new. After writing that to stdout with w, we move the new value back from hold to pattern space, so it gets written (in-place) to the file being processed.
From man sed:
-i[SUFFIX], --in-place[=SUFFIX]
edit files in place (makes backup if SUFFIX supplied)
This can be used to create a backup file when replacing. You can then look for any backup files, which indicate which files were changed, and diff those with the originals. Once you're done inspecting the diff, simply remove the backup files.
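A sketch of that workflow, assuming GNU or BSD sed and throwaway file names:

```shell
# hypothetical file; "old" is the string being replaced
printf 'the old value\n' > demo.txt
sed -i.bak 's/old/new/g' demo.txt
# any .bak whose content differs from its original was actually changed
for bak in *.bak; do
  cmp -s "$bak" "${bak%.bak}" || echo "changed: ${bak%.bak}"
done
```

After inspecting the diffs, `rm *.bak` cleans up.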
If you formulate your replacements as sed statements rather than a custom format you can go one further, and use either a sed shebang line or pass the file to -f/--file to do all the replacements in one operation.
There are several problems with your script; just replace it all with this (using GNU awk instead of GNU sed for in-place editing):
mapfile -t files < <(find test -type f)
awk -i inplace '
NR==FNR { map[$1] = $2; next }
{ for (old in map) gsub(old,map[old]) }
' test.txt "${files[@]}"
You'll find it is orders of magnitude faster than what you were doing.
That still shares some issues with your existing script: it fails when the "test.txt" strings contain regexp or backreference metacharacters, it can modify previously-modified strings, and it matches partial words. If any of that is a problem, let us know, as it's easy to work around with awk (and extremely difficult with sed!).
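For illustration, here is one way to do literal (non-regexp) replacement in awk, using index()/substr() instead of gsub(); the file names and contents are stand-ins:

```shell
printf 'a.c X\n' > test.txt     # mapping: replace literal "a.c" with "X"
printf 'abc a.c\n' > data.txt   # "abc" must NOT match, "a.c" must
awk '
NR==FNR { map[$1] = $2; next }
{
  for (old in map) {
    out = ""
    # scan for literal occurrences; gsub() would treat a.c as a regexp
    while ((i = index($0, old)) > 0) {
      out = out substr($0, 1, i - 1) map[old]
      $0 = substr($0, i + length(old))
    }
    $0 = out $0
  }
  print
}' test.txt data.txt   # prints: abc X
```

With a regexp-based gsub("a.c", ...), the unrelated "abc" would have been rewritten too.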
To get whatever kind of report you want you just tweak the { for ... } line to print them, e.g. to print a record of the changes to stderr:
mapfile -t files < <(find test -type f)
awk -i inplace '
NR==FNR { map[$1] = $2; next }
{
orig = $0
for (old in map) {
gsub(old,map[old])
}
if ($0 != orig) {
printf "File %s, line %d: \"%s\" became \"%s\"\n", FILENAME, FNR, orig, $0 | "cat>&2"
}
}
' test.txt "${files[@]}"

Find if null exists in csv file

I have a csv file. The file has some anomalies as it contains some unknown characters.
The characters appear at line 1535 in popular editors (images attached below). The sed command in the terminal for this line does not show anything.
$ sed '1535!d' sample.csv
"sample_id","sample_column_text_1","sample_"sample_id","sample_column_text_1","sample_column_text_2","sample_column_text_3"
Snapshots of the file in Sublime Text, Nano, and Vi all show the stray characters (images not reproduced here).
The directory has various csv files that contain this character/chain of characters.
I need to write a bash script to determine the files that have such characters. How can I achieve this?
The following is from;
http://www.linuxquestions.org/questions/programming-9/how-to-check-for-null-characters-in-file-509377/
#!/usr/bin/perl -w
use strict;
my $null_found = 0;
foreach my $file (@ARGV) {
if ( ! open(F, "<$file") ) {
warn "couldn't open $file for reading: $!\n";
next;
}
while(<F>) {
if ( /\000/ ) {
print "detected NULL at line $. in file $file\n";
$null_found = 1;
last;
}
}
close(F);
}
exit $null_found;
If it works as desired, you can save it to a file, nullcheck.pl and make it executable;
chmod +x nullcheck.pl
It takes a list of file names as input, but its exit status only tells you whether a NUL was found in any of them, so I'd only pass in one at a time. The command below is used to run the script.
for f in $(find . -type f -exec grep -Iq . {} \; -and -print) ; do perl ./nullcheck.pl "$f" || echo "$f has nulls"; done
The above find command is lifted from Linux command: How to 'find' only text files?
You can use grep to find whether files contain NUL characters; with GNU grep:
grep -qP '\x00' filename
And you can use tr to remove the NULs and produce a NUL-free copy of the file:
tr -d '\000' < file-with-nulls > file-without-nulls
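A quick sketch of both steps, assuming GNU grep (whose -P flag allows the \x00 escape); file names are made up:

```shell
printf 'abc\000def\n' > file-with-nulls            # embed a NUL byte
tr -d '\000' < file-with-nulls > file-without-nulls
grep -qP '\x00' file-with-nulls && echo "has NUL"  # prints: has NUL
grep -qP '\x00' file-without-nulls || echo "clean" # prints: clean
```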

Using bash: how do you find a way to search through all your files in a directory (recursively?)

I need a command that will help me accomplish what I am trying to do. At the moment, I am looking for all the ".html" files in a given directory, and seeing which ones contain the string "jacketprice" in any of them.
Is there a way to do this? And also, for the second (but separate) command, I will need a way to replace every instance of "jacketprice" with "coatprice", all in one command or script. If this is feasible, feel free to let me know. Thanks
find . -name "*.html" -exec grep -l jacketprice {} \;
for i in `find . -name "*.html"`
do
sed -i "s/jacketprice/coatprice/g" "$i"
done
As for the second question,
find . -name "*.html" -exec sed -i "s/jacketprice/coatprice/g" {} \;
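The two steps can also be combined so that sed only rewrites files that actually match (sketch, assuming GNU sed's -i):

```shell
# grep -l lists only matching files; sed then edits just those
find . -name '*.html' -exec grep -l jacketprice {} + |
while IFS= read -r f; do
  sed -i 's/jacketprice/coatprice/g' "$f"
done
```

Non-matching files keep their timestamps, since sed never rewrites them.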
Use recursive grep to search through your files:
grep -r --include="*.html" jacketprice /my/dir
Alternatively turn on bash's globstar feature (if you haven't already), which allows you to use **/ to match directories and sub-directories.
$ shopt -s globstar
$ cd /my/dir
$ grep jacketprice **/*.html
$ sed -i 's/jacketprice/coatprice/g' **/*.html
Depending on whether you want this recursively or not, perl is a good option:
Find, non-recursive:
perl -nwe 'print "Found $_ in file $ARGV\n" if /jacketprice/' *.html
Will print the line where the match is found, followed by the file name. Can get a bit verbose.
Replace, non-recursive:
perl -pi.bak -we 's/jacketprice/coatprice/g' *.html
Will store original with .bak extension tacked on.
Find, recursive:
perl -MFile::Find -nwE '
BEGIN { find(sub { /\.html$/i && push @ARGV, $File::Find::name }, "/dir"); };
say $ARGV if /jacketprice/'
It will print the file name for each match. Somewhat less verbose might be:
perl -MFile::Find -nwE '
BEGIN { find(sub { /\.html$/i && push @ARGV, $File::Find::name }, "/dir"); };
$found{$ARGV}++ if /jacketprice/; END { say for keys %found }'
Replace, recursive:
perl -MFile::Find -pi.bak -we '
BEGIN { find(sub { /\.html$/i && push @ARGV, $File::Find::name }, "/dir"); };
s/jacketprice/coatprice/g'
Note: In all recursive versions, /dir is the bottom level directory you wish to search. Also, if your perl version is less than 5.10, say can be replaced with print followed by newline, e.g. print "$_\n" for keys %found.

How can I copy all my disorganized files into a single directory? (on linux)

I have thousands of mp3s inside a complex folder structure which resides within a single folder. I would like to move all the mp3s into a single directory with no subfolders. I can think of a variety of ways of doing this using the find command but one problem will be duplicate file names. I don't want to replace files since I often have multiple versions of a same song. Auto-rename would be best. I don't really care how the files are renamed.
Does anyone know a simple and safe way of doing this?
You could change an a/b/c.mp3 path into a - b - c.mp3 after copying. Here's a solution in Bash:
find srcdir -name '*.mp3' -printf '%P\n' |
while read i; do
j="${i//\// - }"
cp -v "srcdir/$i" "dstdir/$j"
done
And in a shell without ${//} substitution:
find srcdir -name '*.mp3' -printf '%P\n' |
sed -e 'p;s:/: - :g' |
while read i; do
read j
cp -v "srcdir/$i" "dstdir/$j"
done
For a different scheme, GNU's cp and mv can make numbered backups instead of overwriting -- see -b/--backup[=CONTROL] in the man pages.
find srcdir -name '*.mp3' -exec cp -v --backup=numbered {} dstdir/ \;
bash-like pseudocode, fleshed out into working bash:
find . -name '*.mp3' | while IFS= read -r i; do
  new_name=$(basename "$i")
  x=0
  while [ -e "move_to_dir/$new_name" ]; do
    x=$((x + 1))
    new_name="$(basename "$i").$x"
  done
  mv "$i" "move_to_dir/$new_name"
done
#!/bin/bash
NEW_DIR=/tmp/new/
IFS=$'\n'; for a in $(find . -type f)
do
echo "$a"
new_name="`basename $a`"
while test -e "$NEW_DIR/$new_name"
do
new_name="${new_name}_"
done
cp "$a" "$NEW_DIR/$new_name"
done
I'd tend to do this in a simple script rather than try to fit in in a single command line.
For instance, in python, it would be relatively trivial to do a walk() through the directory, copying each mp3 file found to a different directory with an automatically incremented number.
If you want to get fancier, you could have a dictionary of existing file names, and simply append a number to the duplicates. (the index of the dictionary being the file name, and the value being the number of files found so far, which would become your suffix)
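The same dictionary idea can be sketched directly in bash (4+) with an associative array; the directory layout and names here are made up:

```shell
# hypothetical source tree with a duplicate file name
mkdir -p srcdir/a srcdir/b dstdir
printf '1\n' > srcdir/a/song.mp3
printf '2\n' > srcdir/b/song.mp3

declare -A seen                   # file name -> number of copies seen so far
for f in srcdir/*/*.mp3; do
  name=$(basename "$f")
  n=${seen[$name]:-0}
  if [ "$n" -gt 0 ]; then
    target="dstdir/${name%.mp3}_$n.mp3"   # suffix duplicates: song_1.mp3, ...
  else
    target="dstdir/$name"
  fi
  seen[$name]=$((n + 1))
  cp "$f" "$target"
done
```

After the loop, dstdir contains song.mp3 and song_1.mp3 rather than one file overwriting the other.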
find /path/to/mp3s -name '*.mp3' -exec mv {} /path/to/target/dir \;
At the risk of many downvotes, a perl script could be written in short time to accomplish this.
Pseudocode:
while (-e filename)
change filename to filename . "1";
In Python (to actually move the files, set debug = False):
import os, re

from_dir = "/from/dir"
to_dir = "/target/dir"
re_ext = r"\.mp3"
debug = True

for d, dirs, names in os.walk(from_dir):
    names = [fn for fn in names if re.match(".*(%s)$" % re_ext, fn, re.I)]
    for fn in names:
        from_fn = os.path.join(d, fn)
        target_fn = os.path.join(to_dir, fn)
        if not debug:
            if not os.path.exists(target_fn):
                os.rename(from_fn, target_fn)
            else:
                print("DO NOT MOVE - FILE EXISTS", from_fn)
        else:
            print("MOVE", from_fn, "TO", target_fn)
Since you don't care how the duplicate files are named, utilize the 'backup' option on move:
find /path/to/mp3s -name '*.mp3' -exec mv --backup=numbered {} /path/to/target/dir \;
Will get you:
song.mp3
song.mp3.~1~
song.mp3.~2~
