Replace third line of nth file with nth line of a single file - bash

Say I have hundreds of *.xml in /train/xml/, in the following format
# this is the content of /train/xml/RIGHT_NAME.xml
<annotation>
<path>/train/img/WRONG_NAME.jpg</path> # this is the WRONG_NAME
</annotation>
The file name WRONG_NAME in <path>...</path> should match that of the .xml file, so that it looks like this:
# this is the content of /train/xml/RIGHT_NAME.xml
<annotation>
<path>/train/img/RIGHT_NAME.jpg</path> # this is the **RIGHT_NAME**
</annotation>
One solution I can think of is to:
1. export all file names into a text file:
ls -1 *.xml > filenames.txt
which generates a file with the content:
RIGHT_NAME_0.xml
RIGHT_NAME_1.xml
...
2. then edit filenames.txt, so that it becomes:
# tab at beginning of each line
<path>/train/img/RIGHT_NAME_0.jpg</path>
<path>/train/img/RIGHT_NAME_1.jpg</path>
...
3. Then, replace the third line of nth .xml file with the nth line from filenames.txt.
Thus the question title.
I've hammered around with sed and awk but had no success. How should I do it (EDIT: on a macOS machine)? Also, is there a more elegant solution?
Thanks in advance for helping out!
---things I've tried (and didn't work out)---
# this replaces the fifth line with an empty string
for i in *.xml ; do perl -i.bak -pe 's/.*/$i/ if $.==5' RIGHT_NAME.xml ; done
# this appends the contents of filenames.txt after the third line
sed -i.bak -e '/\<path\>/r filenames.txt' RIGHT_NAME.xml
# also, trying to utilize the <path>...</path> pattern...

Untested:
for xml in *.xml; do
sed -E -i.bak '3s/([^/]*\.jpg)/'"${xml/.xml/.jpg}/" "$xml"
done
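Since the loop above is marked untested, here is a minimal sketch that tries the same idea on two throwaway files in a scratch directory. The helper name fix_third_line and the sample file names are made up for illustration. It assumes the <path> element sits on line 3 and that file names contain no |, & or backslash, which sed would interpret.

```shell
#!/bin/sh
# fix_third_line FILE (hypothetical helper): on line 3 only, replace the
# last path component ending in .jpg with FILE's own base name.
fix_third_line() {
  base=$(basename "$1" .xml)
  # -i.bak keeps a backup and is accepted by both GNU and BSD/macOS sed.
  sed -i.bak "3s|[^/]*\\.jpg|$base.jpg|" "$1"
}

# Demo: two sample files whose third line carries the wrong name.
work=$(mktemp -d)
for n in RIGHT_NAME_0 RIGHT_NAME_1; do
  printf '# comment line\n<annotation>\n\t<path>/train/img/WRONG_NAME.jpg</path>\n</annotation>\n' > "$work/$n.xml"
done
for xml in "$work"/*.xml; do
  fix_third_line "$xml"
done
grep -h '<path>' "$work"/*.xml
```

Once the result looks right, the .bak files can be deleted.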

If ed is acceptable (it should be installed by default on a Mac):
#!/bin/sh
for file in ./*.xml; do
printf 'Processing %s\n' "$file"
f=${file%.*}; f=${f#*./}
printf '%s\n' H "g/<annotation>/;/<\/annotation>/\
s|^\([[:blank:]]*<path>.*/\)[^.]*\(.*</path>\)|\1${f}\2|" %p Q |
ed -s "$file" || break
done
Will give desired results even if you have
/foo/bar/baz/more/train/img/WRONG_NAME.jpg
Will only edit/parse the string inside the path tag which is inside the annotation tag.
Change Q to w if in-place editing is needed.
Remove the %p to silence the output.
Caveat:
ed is not an xml editor/parser.

Using GNU awk (which you can easily install on MacOS if it's not already present on your system) for "inplace" editing, gensub() and the 3rd arg to match():
$ cat tst.awk
match($0,"(^\\s*<path>.*/).*([.][^.]+</path>)",a) {
name = gensub("(.*/)?(.*)[.][^.]+$","\\2",1,FILENAME)
$0 = a[1] name a[2]
}
{ print }
$ head *.xml
==> RIGHT_NAME_1.xml <==
# this is the content of /train/xml/RIGHT_NAME_1.xml
<annotation>
<path>/train/img/WRONG_NAME.xml.jpg</path>
</annotation>
==> RIGHT_NAME_2.xml <==
# this is the content of /train/xml/RIGHT_NAME_2.xml
<annotation>
<path>/train/img/WRONG_NAME.xml.jpg</path>
</annotation>
$ awk -i inplace -f tst.awk *.xml
$ head *.xml
==> RIGHT_NAME_1.xml <==
# this is the content of /train/xml/RIGHT_NAME_1.xml
<annotation>
<path>/train/img/RIGHT_NAME_1.jpg</path>
</annotation>
==> RIGHT_NAME_2.xml <==
# this is the content of /train/xml/RIGHT_NAME_2.xml
<annotation>
<path>/train/img/RIGHT_NAME_2.jpg</path>
</annotation>
Just call it as awk -i inplace -f tst.awk /train/xml/* on your system. Note that the above just replaces the name in the <path> tag content wherever it occurs on its own line, so it will work whether that's the 3rd line in any given file or some other line. If you REALLY only want to do this for the 3rd line then just change match(... to FNR==3 && match(....
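If installing GNU awk is not an option, the same idea can be sketched with plain POSIX awk features and a temporary file in place of -i inplace. This is an illustrative variant, not the answer's own code. The helper name fix_path is hypothetical, and it assumes nothing follows </path> on the line and that the file name contains no backslashes.

```shell
#!/bin/sh
# fix_path FILE (hypothetical helper): rewrite the file name inside
# <path>...</path> to match FILE's base name, keeping the directory part
# and the final extension.
fix_path() {
  name=$(basename "$1")
  name=${name%.*}
  awk -v name="$name" '
    $0 ~ "<path>.*/.*</path>" {
      body = $0
      sub("</path>.*", "", body)   # drop the closing tag
      dir = body
      sub("[^/]*$", "", dir)       # keep everything up to the last slash
      ext = body
      sub(".*[.]", "", ext)        # keep the text after the last dot
      $0 = dir name "." ext "</path>"
    }
    { print }
  ' "$1" > "$1.tmp" && mv "$1.tmp" "$1"
}

# Demo on a scratch copy of the sample input shown above.
work=$(mktemp -d)
printf '<annotation>\n\t<path>/train/img/WRONG_NAME.xml.jpg</path>\n</annotation>\n' > "$work/RIGHT_NAME_1.xml"
fix_path "$work/RIGHT_NAME_1.xml"
cat "$work/RIGHT_NAME_1.xml"
```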

This might work for you (GNU sed & parallel):
parallel --dry sed -i '3s#[^/]*.jpg#{/.}.jpg#' {} ::: /train/xml/*.xml
In parallel the {} represents the file name and its path whereas the {/.} represents the filename less the path and its extension.
Once the output from the above solution has been checked, the option --dry, which is short for --dry-run, can be removed.

Related

Update version number in property file using bash

I am new in bash scripting and I need help with awk. So the thing is that I have a property file with version inside and I want to update it.
version=1.1.1.0
and I use awk to do that
file="version.properties"
awk -F'["]' -v OFS='"' '/version=/{
split($4,a,".");
$4=a[1]"."a[2]"."a[3]"."a[4]+1
}
;1' $file > newFile && mv newFile $file
but I am getting strange result version="1.1.1.0""...1
Could someone help me please with this.
You mentioned in your comment you want to update the file in place. You can do that in a one-liner with perl:
perl -pe '/^version=/ and s/(\d+\.\d+\.\d+\.)(\d+)/$1 . ($2+1)/e' -i version.properties
Explanation
-e is followed by a script to run. With -p and -i, the effect is to run that script on each line, and modify the file in place if the script changes anything.
The script itself, broken down for explanation, is:
/^version=/ and # Do the following on lines starting with `version=`
s/ # Make a replacement on those lines
(\d+\.\d+\.\d+\.)(\d+)/ # Match x.y.z.w, and set $1 = `x.y.z.` and $2 = `w`
$1 . ($2+1)/ # Replace x.y.z.w with a copy of $1, followed by w+1
e # This tells Perl the replacement is Perl code rather
# than a text string.
Example run
$ cat foo.txt
version=1.1.1.2
$ perl -pe '/^version=/ and s/(\d+\.\d+\.\d+\.)(\d+)/$1 . ($2+1)/e' -i foo.txt
$ cat foo.txt
version=1.1.1.3
This is not the best way, but here's one fix.
Test case
I am assuming the input file has at least one line that is exactly version=1.1.1.0.
$ awk -F'["]' -v OFS='"' '/version=/{
> split($4,a,".");
> $4=a[1]"."a[2]"."a[3]"."a[4]+1
> }
> ;1' <<<'version=1.1.1.0'
Output:
version=1.1.1.0"""...1
The """ is because you are assigning to field 4 ($4). When you do that, awk adds field separators (OFS) between fields 1 and 2, 2 and 3, and 3 and 4. Three OFS => """, in your example.
Minimal change
$ awk -F'["]' -v OFS='"' '/version=/{
split($1,a,".");
$1=a[1]"."a[2]"."a[3]"."a[4]+1;
print
}
' <<<'version=1.1.1.0'
version=1.1.1.1
Two changes:
Change $4 to $1
Since the input field separator (-F) is ["], $4 is whatever would be after the third " (if there were any in the input). Therefore, split($4, ...) splits an empty field. The contents of the line, before the first " (if any), are in $1.
print at the end instead of ;1
The 1 after the closing curly brace is the next condition, and there is no action specified. The default action is to print the current line, as modified, so the 1 triggers printing. Instead, just print within your action when you are done processing. That way your action is self-contained. (Of course, if you needed to do other processing, you might want to print later, after that processing.)
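To see the OFS padding in isolation, here is a small sketch with the same -F and OFS settings. The function name show_ofs_padding is made up for the demonstration.

```shell
#!/bin/sh
# Assigning to a field beyond NF makes awk rebuild $0 with OFS between
# every field, including the empty fields it had to create.
show_ofs_padding() {
  printf '%s\n' "$1" | awk -F'"' -v OFS='"' '{ $4 = "x"; print }'
}

# The input has no double quote, so $1 is the whole line and $2, $3 are
# created empty when $4 is assigned.
show_ofs_padding 'version=1.1.1.0'   # prints: version=1.1.1.0"""x
```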
You can use the = as the delimiter, like this:
awk -F= -v v=1.0.1 '$1=="version"{printf "version=\"%s\"\n", v}' file.properties
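The one-liner above prints only the version line, with a fixed value. If the goal is to increment the last component and keep every other line of the file, something along these lines may work. It is a sketch with a hypothetical helper name, bump_version. It assumes the value is unquoted and consists of dot-separated numbers, and it writes through a temporary file because POSIX awk has no in-place option.

```shell
#!/bin/sh
# bump_version FILE (hypothetical helper): add 1 to the last dot-separated
# component of the "version=" line and pass all other lines through.
bump_version() {
  awk -F= '
    $1 == "version" {
      n = split($2, part, ".")
      part[n]++
      v = part[1]
      for (i = 2; i <= n; i++) v = v "." part[i]
      print $1 "=" v
      next
    }
    { print }
  ' "$1" > "$1.tmp" && mv "$1.tmp" "$1"
}

# Demo on a scratch properties file.
work=$(mktemp -d)
printf 'name=demo\nversion=1.1.1.0\n' > "$work/version.properties"
bump_version "$work/version.properties"
cat "$work/version.properties"
```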

awk to add extracted prefix from file to filename

The awk below executes as is, but it renames fields within each file that matches $p (which is extracted from each text file's name) instead of adding $x, the prefix to add (from $1 of rename), to each filename in the directory. Each $x should be followed by an _ and then the filename. I can see from echo $p that the correct value to use in the lookup for $2 is extracted, but each file in the directory is unchanged. Not every entry in rename will have a file in the directory, but each file will always have a match to $p. Maybe there is a better way, as I am not sure what I am doing wrong. Thank you :).
rename tab-delimited
00-0000 File-01
00-0001 File-02
00-0002 File-03
00-0003 File-04
file1
File-01_xxxx.txt
file2
File-02_yyyy.txt
desired output
00-0000_File-01-xxxx.txt
00-0001_File-02-yyyy.txt
bash
for file1 in /path/to/folders/*.txt
do
# Grab file prefix
bname=`basename $file1` # strip of path
p="$(echo $bname|cut -d_ -f1,1)" # remove after second underscore
echo $p
# add prefix to matching file
awk -v var="$p" '$2~var{x=$1}(NR=x){print $x"_",$bname}' $file1 rename OFS="\t" > tmp && mv tmp $file1
done
This script :
touch File-01-azer.txt
touch File-02-ytrf.txt
touch File-03-fdfd.txt
touch File-04-dfrd.txt
while read p f;
do
f=$(ls $f*)
mv ${f} "${p}_${f}"
done << EEE
00-0000 File-01
00-0001 File-02
00-0002 File-03
00-0003 File-04
EEE
ls -1
outputs :
00-0000_File-01-azer.txt
00-0001_File-02-ytrf.txt
00-0002_File-03-fdfd.txt
00-0003_File-04-dfrd.txt
You can use a file as input using done < rename_map.txt or cat rename_map.txt | while
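A variant of the loop above that reads the map from a file, quotes its expansions and skips entries with no matching file could look like the following. The helper name add_prefixes and the file name rename_map.txt are assumptions for illustration. Note that a stem such as File-01 would also match a file named File-010..., so this assumes stems are not prefixes of one another.

```shell
#!/bin/sh
# add_prefixes MAPFILE DIR (hypothetical helper): for each "prefix stem"
# line in MAPFILE, rename DIR/stem* to DIR/prefix_stem*.
add_prefixes() {
  while read -r prefix stem; do
    [ -n "$stem" ] || continue          # skip blank or incomplete lines
    for f in "$2/$stem"*; do
      [ -e "$f" ] || continue           # no file for this map entry
      mv -- "$f" "$2/${prefix}_${f##*/}"
    done
  done < "$1"
}

# Demo: two files, three map entries (the third has no matching file).
work=$(mktemp -d)
touch "$work/File-01_xxxx.txt" "$work/File-02_yyyy.txt"
printf '00-0000\tFile-01\n00-0001\tFile-02\n00-0002\tFile-03\n' > "$work/rename_map.txt"
add_prefixes "$work/rename_map.txt" "$work"
ls -1 "$work"
```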

How to process tr across all files in a directory and output to a different name in another directory?

mpu3$ echo * | xargs -n 1 -I {} | tr "|" "/n"
which outputs:
#.txt
ag.txt
bg.txt
bh.txt
bi.txt
bid.txt
dh.txt
dw.txt
er.txt
ha.txt
jo.txt
kc.txt
lfr.txt
lg.txt
ng.txt
pb.txt
r-c.txt
rj.txt
rw.txt
se.txt
sh.txt
vr.txt
wa.txt
is what I have so far. What is missing is the output; I get none. What I really want is to get a list of txt files, use their name up to the extension, process out the "|" and replace it with a LF/CR and put the new file in another directory as [old-name].ics. HALP. THX in advance. - Idiot me.
You can loop over the files and use sed to process the file:
for i in *.txt; do
sed -e 's/|/\n/g' "$i" > other_directory/"${i%.txt}".ics
done
No need to use xargs, especially with echo which would risk the filenames getting word split and having globbing apply to them, so could well do the wrong thing.
We then use sed's s command to substitute | with \n; the g flag makes it a global replace. We redirect the output to the other directory you want and use bash's parameter expansion to strip the .txt from the end of the name.
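Since the question asks about tr specifically, the same loop can be sketched with tr reading each file on standard input. The function name convert_all and the directory arguments are placeholders, not anything from the question.

```shell
#!/bin/sh
# convert_all SRC_DIR DEST_DIR (hypothetical helper): for every .txt file
# in SRC_DIR, turn each | into a newline and write DEST_DIR/<name>.ics.
convert_all() {
  mkdir -p "$2"
  for f in "$1"/*.txt; do
    [ -e "$f" ] || continue             # the glob matched nothing
    name=${f##*/}
    tr '|' '\n' < "$f" > "$2/${name%.txt}.ics"
  done
}

# Demo with one sample file.
src=$(mktemp -d)
dest=$(mktemp -d)
printf 'BEGIN|SUMMARY|END\n' > "$src/ag.txt"
convert_all "$src" "$dest"
cat "$dest/ag.ics"
```

tr works on single characters only, which is enough here. It needs the input redirected because it does not take file name arguments.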
Here's an awk solution:
$ awk '
FNR==1 { # for first record of every file
close(f) # close previous file f
f="path_to_dir/" FILENAME # new filename with path
sub(/txt$/,"ics",f) } # replace txt with ics
{
gsub(/\|/,"\n") # replace | with \n
print > f }' *.txt # print to new file

Bash - Search and Replace operation with reporting the files and lines that got changed

I have a input file "test.txt" as below -
hostname=abc.com hostname=xyz.com
db-host=abc.com db-host=xyz.com
In each line, the value before space is the old value which needs to be replaced by the new value after the space recursively in a folder named "test". I am able to do this using below shell script.
#!/bin/bash
IFS=$'\n'
for f in `cat test.txt`
do
OLD=$(echo $f| cut -d ' ' -f 1)
echo "Old = $OLD"
NEW=$(echo $f| cut -d ' ' -f 2)
echo "New = $NEW"
find test -type f | xargs sed -i.bak "s/$OLD/$NEW/g"
done
"sed" replaces the strings on the fly in 100s of files.
Is there a trick or an alternative way by which I can get a report of the files changed, such as the absolute path of each file and the exact lines that got changed?
PS - I understand that sed and other stream editors don't support this functionality out of the box. I don't want to use version control, as it would be overkill for this task.
Let's start with a simple rewrite of your script, to make it a little bit more robust at handling a wider range of replacement values, but also faster:
#!/bin/bash
# escape regexp and replacement strings for sed
escapeRegex() { sed 's/[^^]/[&]/g; s/\^/\\^/g' <<<"$1"; }
escapeSubst() { sed 's/[&/\]/\\&/g' <<<"$1"; }
while read -r old new; do
find test -type f -exec sed "s/$(escapeRegex "$old")/$(escapeSubst "$new")/g" -i '{}' \;
done <test.txt
So, we loop over pairs of whitespace-separated fields (old, new) in lines from test.txt and run a standard sed in-place replace on all files found with find.
Pretty similar to your script, but we properly read lines from test.txt (no word splitting, pathname/variable expansion, etc.), we use Bash builtins whenever possible (no need to call external tools like cat, cut, xargs); and we escape sed metacharacters in old/new values for proper use as sed's regexp and replacement expressions.
Now let's add logging from sed:
#!/bin/bash
# escape regexp and replacement strings for sed
escapeRegex() { sed 's/[^^]/[&]/g; s/\^/\\^/g' <<<"$1"; }
escapeSubst() { sed 's/[&/\]/\\&/g' <<<"$1"; }
while read -r old new; do
find test -type f -printf '\n[%p]\n' -exec sed "/$(escapeRegex "$old")/{
h
s//$(escapeSubst "$new")/g
H
x
s/\n/ --> /
w /dev/stdout
x
}" -i '{}' > >(tee -a change.log) \;
done <test.txt
The sed script above changes each old to new, but it also writes old --> new line to /dev/stdout (Bash-specific), which we in turn append to change.log file. The -printf action in find outputs a "header" line with file name, for each file processed.
With this, your "change log" will look something like:
[file1]
hostname=abc.com --> hostname=xyz.com
[file2]
[file1]
db-host=abc.com --> db-host=xyz.com
[file2]
db-host=abc.com --> db-host=xyz.com
Just for completeness, a quick walk-through of the sed script. We act only on lines containing the old value. For each such line, we store it in the hold space (h), change it to new, and append that new value to the hold space (joined with a newline, H), which now holds old\nnew. We swap the hold and pattern spaces (x), so we can run the s command that converts it to old --> new. After writing that to stdout with w, we move the new value back from the hold space to the pattern space, so it gets written (in-place) to the file being processed.
From man sed:
-i[SUFFIX], --in-place[=SUFFIX]
edit files in place (makes backup if SUFFIX supplied)
This can be used to create a backup file when replacing. You can then diff each backup file against its original to see which files were changed and how (sed writes a backup for every file it processes, so a backup identical to its original means nothing changed). Once you're done inspecting the diffs, simply remove the backup files.
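A rough sketch of that backup-and-diff idea follows, assuming the replacements were run with sed -i.bak as in the question. The helper name report_changes is hypothetical. It uses cmp to skip backups that are identical to their originals, because sed -i.bak leaves a backup for every file it processes.

```shell
#!/bin/sh
# report_changes DIR (hypothetical helper): for every *.bak file under DIR
# that differs from its original, print the file name and the diff.
report_changes() {
  find "$1" -type f -name '*.bak' | while IFS= read -r bak; do
    orig=${bak%.bak}
    if ! cmp -s "$bak" "$orig"; then
      printf '[%s]\n' "$orig"
      diff "$bak" "$orig" || true       # diff exits 1 when files differ
    fi
  done
}

# Demo: one file that changes and one that does not.
work=$(mktemp -d)
printf 'hostname=abc.com\n' > "$work/file1"
printf 'nothing to see\n' > "$work/file2"
sed -i.bak 's/hostname=abc\.com/hostname=xyz.com/g' "$work/file1" "$work/file2"
report_changes "$work"
```

Running find on an absolute directory makes the reported paths absolute. This reads file names line by line, so it assumes no newlines in file names.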
If you formulate your replacements as sed statements rather than a custom format you can go one further, and use either a sed shebang line or pass the file to -f/--file to do all the replacements in one operation.
There are several problems with your script; just replace it all with this (using GNU awk instead of GNU sed for inplace editing):
mapfile -t files < <(find test -type f)
awk -i inplace '
NR==FNR { map[$1] = $2; next }
{ for (old in map) gsub(old,map[old]) }
' test.txt "${files[@]}"
You'll find that is orders of magnitude faster than what you were doing.
That still has the issue your existing script does of failing when the "test.txt" strings contain regexp or backreference metacharacters and modifying previously-modified strings and handling partial matches - if that's an issue let us know as it's easy to work around with awk (and extremely difficult with sed!).
To get whatever kind of report you want you just tweak the { for ... } line to print them, e.g. to print a record of the changes to stderr:
mapfile -t files < <(find test -type f)
awk -i inplace '
NR==FNR { map[$1] = $2; next }
{
orig = $0
for (old in map) {
gsub(old,map[old])
}
if ($0 != orig) {
printf "File %s, line %d: \"%s\" became \"%s\"\n", FILENAME, FNR, orig, $0 | "cat>&2"
}
}
' test.txt "${files[@]}"
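On the metacharacter caveat mentioned above: one possible workaround is to replace plain strings with index() and substr() instead of gsub(). The sketch below is an assumption about how that could look, not the answerer's own code. The helper name replace_literal is hypothetical, it sticks to POSIX awk and a temporary file rather than -i inplace, and it handles one file per call. It assumes the map file is non-empty. It does not address the other two caveats (re-modifying already modified strings and partial matches).

```shell
#!/bin/sh
# replace_literal MAP FILE (hypothetical helper): apply the "old new" pairs
# from MAP to FILE as plain strings, reporting changed lines on stderr.
replace_literal() {
  awk '
    # Replace every occurrence of the string old in s with new, literally.
    function replace_all(s, old, new,    out, i) {
      out = ""
      while ((i = index(s, old)) > 0) {
        out = out substr(s, 1, i - 1) new
        s = substr(s, i + length(old))
      }
      return out s
    }
    NR == FNR { if (NF >= 2) map[$1] = $2; next }
    {
      orig = $0
      for (old in map) $0 = replace_all($0, old, map[old])
      if ($0 != orig)
        printf "File %s, line %d: \"%s\" became \"%s\"\n", FILENAME, FNR, orig, $0 | "cat>&2"
      print
    }
  ' "$1" "$2" > "$2.tmp" && mv "$2.tmp" "$2"
}

# Demo: "a.c" is matched as a literal string, so "abc" is left alone.
work=$(mktemp -d)
printf 'a.c x&y\n' > "$work/map.txt"
printf 'a.c abc\nuntouched\n' > "$work/data.txt"
replace_literal "$work/map.txt" "$work/data.txt"
cat "$work/data.txt"
```

To cover a whole tree, the function could be called once per file from a find loop.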

inserting text into a specific line

I've got a text file, and using Bash I wish to insert text into a specific line.
Text to be inserted for example is !comment: http://www.test.com into line 5
!aaaa
!bbbb
!cccc
!dddd
!eeee
!ffff
becomes,
!aaaa
!bbbb
!cccc
!dddd
!comment: http://www.test.com
!eeee
!ffff
sed '4a\
!comment: http://www.test.com' file.txt > result.txt
i inserts before the current line, a appends after the line.
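To illustrate the i form, here is a small sketch that inserts before a given line number. The helper name insert_before is made up, and it assumes the text contains no backslashes or newlines, since sed would interpret them.

```shell
#!/bin/sh
# insert_before N TEXT FILE (hypothetical helper): print FILE with TEXT
# inserted before line N, using sed's "i" command.
insert_before() {
  sed "${1}i\\
$2" "$3"
}

# Demo with the sample input from the question.
work=$(mktemp -d)
printf '%s\n' '!aaaa' '!bbbb' '!cccc' '!dddd' '!eeee' '!ffff' > "$work/file.txt"
insert_before 5 '!comment: http://www.test.com' "$work/file.txt"
```

Inserting before line 5 gives the same result as appending after line 4 with the a command shown above.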
you can use awk as well
$ awk 'NR==5{$0="!comment: http://www.test.com\n"$0}1' file
!aaaa
!bbbb
!cccc
!dddd
!comment: http://www.test.com
!eeee
!ffff
Using man 1 ed (which reads entire file into memory and performs in-place file editing without previous backup):
# cf. http://wiki.bash-hackers.org/doku.php?id=howto:edit-ed
line='!comment: http://www.test.com'
#printf '%s\n' H '/!eeee/i' "$line" . wq | ed -s file
printf '%s\n' H 5i "$line" . wq | ed -s file
