Adding a header to source files with Bash

I have a script that adds a string to C++ source files:
for f in $(find . -name '*.h' -or -name '*.cpp');
do sed -i '1s/^/<added text> /' $f;
done;
How can I make it add a multiline variable value?
I tried this, but with no success:
h=$'//Copyright 2021 My Corporation\n//Authors: me and others\n'
The value of h gets inserted as a single line.

Use ed, not sed, to edit files.
You also don't need to use find to locate the files if you're using bash:
#!/usr/bin/env bash
h=$'//Copyright 2021 My Corporation\n//Authors: me and others\n'
# Turn on recursive globbing and extended globbing and have patterns that
# fail to match expand to an empty string
shopt -s globstar extglob nullglob
for f in **/*.@(h|cpp); do
# Add the contents of the variable h at the beginning of the file
# Note: no lines in $h should be just a period as that signals end of
# text input mode.
ed -s "$f" <<EOF
1i
$h
.
w
EOF
done
If you must use sed for some reason, you also want to use its i\ command, not s (people tend to forget that sed even has other commands). i\ inserts the following lines of text, with a line that does not end in a backslash marking the last line of the input. Some bash parameter expansion to massage your variable into the appropriate form is needed, of course:
#!/usr/bin/env bash
h=$'//Copyright 2021 My Corporation\n//Authors: me and others\n'
# Add a newline at the very beginning, and a backslash at the end
# of each line
h=$'\n'"${h//$'\n'/$'\\\n'}"
# Or just format the original string appropriately at the start
# h=$'\n//Copyright 2021 My Corporation\\\n//Authors: me and others\\\n'
shopt -s globstar extglob nullglob
for f in **/*.@(h|cpp); do
sed -i -e "1i\\$h" "$f"
done
# If using GNU sed, you can avoid the loop:
# sed -si -e "1i\\$h" **/*.@(h|cpp)

sed can be a bit temperamental when dealing with embedded (single-character) \n's; we can get around this a couple ways ...
Sample input file:
$ cat xx
1
2
3
A couple sed ideas depending on how the ${h} variable is defined:
1) ${h} defined with literal/single-char \n:
$ h=$'//Copyright 2021 My Corporation\n//Authors: me and others\n'
$ sed "1 s|^|${h//$'\n'/\\n}|" xx
Where:
1 s|^| - line 1, replace start of line with ...
${h//$'\n'/\\n} - change each single-char \n into the 2-char sequence \n (backslash + n), which sed then interprets as a newline in the replacement
NOTE: because / characters show up in the data we need a different delimiter for the sed command, hence the |
2) ${h} defined with 2-char \ + n:
$ h='//Copyright 2021 My Corporation\n//Authors: me and others\n'
$ sed "1 s|^|${h}|" xx
Where:
1 s|^| - line 1, replace start of line with ...
${h} - used as-is; GNU sed interprets each embedded 2-char \n in the replacement as a newline (NOTE: this could cause issues with \ sequences that are not meant to be interpreted as control characters)
NOTE: because / characters show up in the data we need a different delimiter for the sed command, hence the |
Both of these generate:
//Copyright 2021 My Corporation
//Authors: me and others
1
2
3
Once OP is satisfied with the result, the -i flag can be added to perform the in-place update.
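For example, applied to the first variant above (GNU sed):
sed -i "1 s|^|${h//$'\n'/\\n}|" xx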

Using sed is overkill and limits you to sed syntax.
Use cat instead:
#!/usr/bin/env bash
# Some helpful settings to avoid editing the core code
corporation='Example INC'
authors=(
'me'
'someone'
'someone-else'
)
# Capture the year (requires bash 4.2+)
# If older bash, use year="$(date '+%Y')"
printf -v year '%(%Y)T'
# Be sure tmp_file variable does not exist (may have been exported)
unset tmp_file
# Prepare EXIT trap to clean-up temporary file
trap 'rm -f -- "$tmp_file"' EXIT
# Safely create a temporary file to hold changes
tmp_file="$(mktemp)"
# Find .h or .cpp files and make it a null delimited records stream
find . \( -name '*.h' -or -name '*.cpp' \) -print0 |
# Iterate the null delimited file names
while read -r -d '' cpp_file; do
# Skip files already containing the copyright header
head -n2 "$cpp_file" | grep -qF ' * Copyright' && continue
# If successfully merging here-document header with file
# into the temporary file
if cat - "$cpp_file" >"$tmp_file" <<EOF
/*
 * Copyright ${year} ${corporation}
 * Authors: ${authors[*]}
 *
 */
EOF
# then copy the temporary file over
then cp "$tmp_file" "$cpp_file"
fi
done
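With the settings above, the block prepended to each file looks like this (the year comes from the current date; 2021 is shown only as an example):
/*
 * Copyright 2021 Example INC
 * Authors: me someone someone-else
 *
 */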

Related

BASH: File sorting according to file name

I need to sort 12000 files into 1000 groups, according to their names, and create for each group a new folder containing the files of that group. The name of each file is given in a multi-column format (with _ as separator), where the second column varies from 1 to 12 (the number of the part) and the last column ranges from 1 to 1000 (the number of the system), indicating that 1000 different systems (last column) were initially split into 12 separate parts (second column).
Here is an example of a small subset based on 3 systems divided into 12 parts, 36 files in total.
7000_01_lig_cne_1.dlg
7000_02_lig_cne_1.dlg
7000_03_lig_cne_1.dlg
...
7000_12_lig_cne_1.dlg
7000_01_lig_cne_2.dlg
7000_02_lig_cne_2.dlg
7000_03_lig_cne_2.dlg
...
7000_12_lig_cne_2.dlg
7000_01_lig_cne_3.dlg
7000_02_lig_cne_3.dlg
7000_03_lig_cne_3.dlg
...
7000_12_lig_cne_3.dlg
I need to group these files based on the second column of their names (01, 02, 03 .. 12), thus creating 1000 folders, each of which should contain the 12 files of one system, in the following manner:
Folder1, name: 7000_lig_cne_1, contains 12 files: 7000_{this is from 01 to 12}_lig_cne_1.dlg
Folder2, name: 7000_lig_cne_2, contains 12 files: 7000_{this is from 01 to 12}_lig_cne_2.dlg
...
Folder1000, name: 7000_lig_cne_1000, contains 12 files: 7000_{this is from 01 to 12}_lig_cne_1000.dlg
Assuming that all *.dlg files are present within the same dir, I propose a bash loop workflow, which only lacks some sorting function (sed, awk ??), organized in the following manner:
#set the name of folder with all DLG
home=$PWD
FILES=${home}/all_DLG/7000_CNE
# set the name of protein and ligand library to analyse
experiment="7000_CNE"
#name of the output
output=${home}/sub_folders_to_analyse
#now here all magic comes
rm -r ${output}
mkdir ${output}
# sed sollution
for i in ${FILES}/*.dlg # define this better to suit your needs
do
n=$( <<<"$i" sed 's/.*[^0-9]\([0-9]*\)\.dlg$/\1/' )
# move the file to proper dir
mkdir -p ${output}/"${experiment}_lig$n"
cp "$i" ${output}/"${experiment}_lig$n"
done
! Note: here I set the beginning of the name of each folder to ${experiment}, to which I add the number from the final column, $n, at the end. Would it instead be possible to set the name of each new folder automatically, based on the names of the copied files? Manually it could be achieved by skipping the second column in the name of the folder:
cp ./all_DLG/7000_*_lig_cne_987.dlg ./output/7000_lig_cne_987
Iterate over files. Extract the destination directory name from the filename. Move the file.
for i in *.dlg; do
# extract last number with your favorite tool
n=$( <<<"$i" sed 's/.*[^0-9]\([0-9]*\)\.dlg$/\1/' )
# move the file to proper dir
echo mkdir -p "folder$n"
echo mv "$i" "folder$n"
done
Notes:
Do not use upper case variables in your scripts. Use lower case variables.
Remember to quote variable expansions.
Check your scripts with http://shellcheck.net
Tested on repl
Update: for OP's folder-naming convention:
for i in *.dlg; do
foldername="$HOME/output/${i%%_*}_${i#*_*_}"
foldername="${foldername%.dlg}" # drop the .dlg left over from the filename
echo mkdir -p "$foldername"
echo mv "$i" "$foldername"
done
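For example, with i=7000_03_lig_cne_987.dlg the expansions work out as:
${i%%_*}   -> 7000
${i#*_*_}  -> lig_cne_987.dlg
so foldername ends up as $HOME/output/7000_lig_cne_987 after the .dlg is stripped.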
This might work for you (GNU parallel):
ls *.dlg |
parallel --dry-run 'd={=s/^(7000_).*(lig.*)\.dlg/$1$2/=};mkdir -p $d;mv {} $d'
Pipe the output of the ls command listing files ending in .dlg to parallel, which creates the directories and moves the files into them.
Run the solution as is, and when satisfied the output of the dry run is ok, remove the option --dry-run.
The solution could be one instruction:
parallel 'd={=s/^(7000_).*(lig.*)\.dlg/$1$2/=};mkdir -p $d;mv {} $d' ::: *.dlg
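For one of the sample names, the substitution inside {= =} works out as:
7000_03_lig_cne_987.dlg -> d=7000_lig_cne_987
so that file is moved into the 7000_lig_cne_987 directory.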
Using POSIX shell's built-in grammar only and sort:
#!/usr/bin/env sh
curdir=
# Create list of files with newline
# Safe since we know there are no special
# characters in the names
printf -- %s\\n *.dlg |
# Sort the list by 5th key with _ as field delimiter
sort -t_ -k5 |
# Iterate reading the _ delimited fields of the sorted list
while IFS=_ read -r _ _ c d e; do
# Compose the new directory name
newdir="${c}_${d}_${e%.dlg}"
# If we enter a new group / directory
if [ "$curdir" != "$newdir" ]; then
# Make the new directory current
curdir="$newdir"
# Create the new directory
echo mkdir -p "$curdir"
# Move all its files into it
echo mv -- *_"$curdir.dlg" "$curdir/"
fi
done
Optionally as a sort and xargs arguments stream:
printf -- %s\\n * |
sort -u -t_ -k5 |
xargs -n1 sh -c
'd="lig_cne_${0##*_}"
d="${d%.dlg}"
echo mkdir -p "$d"
echo mv -- *"_$d.dlg" "$d/"
'
Here is a very simple awk script that does the trick in a single sweep.
script.awk
BEGIN{FS="[_.]"} # make field separator "_" or "."
{ # for each filename
dirName=$1"_"$3"_"$4"_"$5; # compute the target dir name from fields
sysCmd = "mkdir -p " dirName"; cp "$0 " "dirName; # prepare bash command
system(sysCmd); # run bash command
}
running script.awk
ls -1 *.dlg | awk -f script.awk
oneliner awk script
ls -1 *.dlg | awk 'BEGIN{FS="[_.]"}{d=$1"_"$3"_"$4"_"$5;system("mkdir -p "d"; cp "$0 " "d);}'

How to process tr across all files in a directory and output to a different name in another directory?

mpu3$ echo * | xargs -n 1 -I {} | tr "|" "/n"
which outputs:
#.txt
ag.txt
bg.txt
bh.txt
bi.txt
bid.txt
dh.txt
dw.txt
er.txt
ha.txt
jo.txt
kc.txt
lfr.txt
lg.txt
ng.txt
pb.txt
r-c.txt
rj.txt
rw.txt
se.txt
sh.txt
vr.txt
wa.txt
is what I have so far. What is missing is the output; I get none. What I really want is to get a list of txt files, use their name up to the extension, process out the "|" and replace it with a LF/CR and put the new file in another directory as [old-name].ics. HALP. THX in advance. - Idiot me.
You can loop over the files and use sed to process the file:
for i in *.txt; do
sed -e 's/|/\n/g' "$i" > other_directory/"${i%.txt}".ics
done
No need to use xargs, especially with echo, which would risk the filenames getting word-split and having globbing applied to them, so it could well do the wrong thing.
Then we use sed's s command to substitute | with \n; the g flag makes it a global replace. We redirect that to the other directory you want and use bash's parameter expansion to strip the .txt off the end.
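Since the question asks about tr, the same loop also works with tr instead of sed (a minimal sketch; it assumes other_directory already exists):
for i in *.txt; do
    # translate every | into a newline and write the result as .ics
    tr '|' '\n' < "$i" > other_directory/"${i%.txt}".ics
done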
Here's an awk solution:
$ awk '
FNR==1 { # for first record of every file
close(f) # close previous file f
f="path_to_dir/" FILENAME # new filename with path
sub(/txt$/,"ics",f) } # replace txt with ics
{
gsub(/\|/,"\n") # replace | with \n
print > f }' *.txt # print to new file

Bash - Search and Replace operation with reporting the files and lines that got changed

I have an input file "test.txt" as below -
hostname=abc.com hostname=xyz.com
db-host=abc.com db-host=xyz.com
In each line, the value before the space is the old value, which needs to be replaced by the new value after the space, recursively in a folder named "test". I am able to do this using the shell script below.
#!/bin/bash
IFS=$'\n'
for f in `cat test.txt`
do
OLD=$(echo $f| cut -d ' ' -f 1)
echo "Old = $OLD"
NEW=$(echo $f| cut -d ' ' -f 2)
echo "New = $NEW"
find test -type f | xargs sed -i.bak "s/$OLD/$NEW/g"
done
"sed" replaces the strings on the fly in 100s of files.
Is there a trick or an alternative way by which I can get a report of the files changed, like the absolute path of the file & the exact lines that got changed?
PS - I understand that sed and other stream editors don't support this functionality out of the box. I don't want to use versioning, as it would be overkill for this task.
Let's start with a simple rewrite of your script, to make it a little bit more robust at handling a wider range of replacement values, but also faster:
#!/bin/bash
# escape regexp and replacement strings for sed
escapeRegex() { sed 's/[^^]/[&]/g; s/\^/\\^/g' <<<"$1"; }
escapeSubst() { sed 's/[&/\]/\\&/g' <<<"$1"; }
while read -r old new; do
find test -type f -exec sed "s/$(escapeRegex "$old")/$(escapeSubst "$new")/g" -i '{}' \;
done <test.txt
So, we loop over pairs of whitespace-separated fields (old, new) in lines from test.txt and run a standard sed in-place replace on all files found with find.
Pretty similar to your script, but we properly read lines from test.txt (no word splitting, pathname/variable expansion, etc.), we use Bash builtins whenever possible (no need to call external tools like cat, cut, xargs); and we escape sed metacharacters in old/new values for proper use as sed's regexp and replacement expressions.
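To see what that escaping produces, here is the output of the two helpers for a couple of sample values (the second value is made up just to show the & and / escaping):
$ escapeRegex 'db-host=abc.com'
[d][b][-][h][o][s][t][=][a][b][c][.][c][o][m]
$ escapeSubst 'xyz.com/path&more'
xyz.com\/path\&more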
Now let's add logging from sed:
#!/bin/bash
# escape regexp and replacement strings for sed
escapeRegex() { sed 's/[^^]/[&]/g; s/\^/\\^/g' <<<"$1"; }
escapeSubst() { sed 's/[&/\]/\\&/g' <<<"$1"; }
while read -r old new; do
find test -type f -printf '\n[%p]\n' -exec sed "/$(escapeRegex "$old")/{
h
s//$(escapeSubst "$new")/g
H
x
s/\n/ --> /
w /dev/stdout
x
}" -i '{}' > >(tee -a change.log) \;
done <test.txt
The sed script above changes each old to new, but it also writes an old --> new line to /dev/stdout (Bash-specific), which we in turn append to the change.log file. The -printf action in find outputs a "header" line with the file name for each file processed.
With this, your "change log" will look something like:
[file1]
hostname=abc.com --> hostname=xyz.com
[file2]
[file1]
db-host=abc.com --> db-host=xyz.com
[file2]
db-host=abc.com --> db-host=xyz.com
Just for completeness, a quick walk-through of the sed script. We act only on lines containing the old value. For each such line, we store it in the hold space (h), change it to new, and append the new value to the hold space (joined with a newline, H), which now holds old\nnew. We swap hold and pattern space (x), so we can run an s command that converts it to old --> new. After writing that to stdout with w, we move the new value back from the hold space to the pattern space, so it gets written (in place) to the file being processed.
From man sed:
-i[SUFFIX], --in-place[=SUFFIX]
edit files in place (makes backup if SUFFIX supplied)
This can be used to create a backup file when replacing. You can then look for any backup files, which indicate which files were changed, and diff those with the originals. Once you're done inspecting the diff, simply remove the backup files.
If you formulate your replacements as sed statements rather than a custom format you can go one further, and use either a sed shebang line or pass the file to -f/--file to do all the replacements in one operation.
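A minimal sketch of the backup-and-diff reporting, assuming the replacements were run with sed -i.bak so every processed file has a .bak sibling (and that file names contain no newlines):
find test -type f -name '*.bak' | while read -r bak; do
    orig="${bak%.bak}"
    # only report files whose content actually changed
    if ! cmp -s "$bak" "$orig"; then
        echo "=== $orig"
        diff -u "$bak" "$orig"
    fi
    rm -- "$bak"   # clean up the backup once inspected
done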
There are several problems with your script; just replace it all with (using GNU awk instead of GNU sed for in-place editing):
mapfile -t files < <(find test -type f)
awk -i inplace '
NR==FNR { map[$1] = $2; next }
{ for (old in map) gsub(old,map[old]) }
' test.txt "${files[@]}"
You'll find that is orders of magnitude faster than what you were doing.
That still has the issues your existing script has: failing when the "test.txt" strings contain regexp or backreference metacharacters, modifying previously-modified strings, and matching partial words. If that's an issue, let us know, as it's easy to work around with awk (and extremely difficult with sed!).
To get whatever kind of report you want you just tweak the { for ... } line to print them, e.g. to print a record of the changes to stderr:
mapfile -t files < <(find test -type f)
awk -i inplace '
NR==FNR { map[$1] = $2; next }
{
orig = $0
for (old in map) {
gsub(old,map[old])
}
if ($0 != orig) {
printf "File %s, line %d: \"%s\" became \"%s\"\n", FILENAME, FNR, orig, $0 | "cat>&2"
}
}
' test.txt "${files[@]}"
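With the sample test.txt above, the report on stderr looks something like this (file names and line numbers here are hypothetical):
File test/config1, line 3: "hostname=abc.com" became "hostname=xyz.com"
File test/config2, line 7: "db-host=abc.com" became "db-host=xyz.com"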

Insert dirnames into respectively named filenames in bash

In my directory I have plenty of *.yml files named like:
work-arran.yml
work-cap.yml
work-exposed.yml
work-humax.yml
work-instruc.yml
work-kiln.yml
work-lex.yml
work-merc.yml
and also directories with the same names but using underscores instead of dashes:
work_cap
work_exposed
work_humax
work_instruc
work_kiln
work_lex
work_merc
I want to put yaml record grid_pool: dir_name_here into each respective
*.yml file automatically, where dir_name_here would be the name of each
file's respective directory.
I tried this, and while it would work, it appended the desired line to an underscored version of the *.yml file instead of the existing dashed one.
How do I change that last fn.yml so it replaces _ with - again?
ls *.yml | sed 's/-/_/g' | sed 's/.yml//g' | xargs -n1 -I fn bash -c "echo ' grid_pool: fn' >> fn.yml"
Use parameter expansion with substitution:
for file in *.yml ; do
dirname=${file/-/_}
echo " grid_pool: ${dirname%.yml}" >> "$file"
done
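For example, for file=work-cap.yml the expansions give:
${file/-/_}      -> work_cap.yml
${dirname%.yml}  -> work_cap
so " grid_pool: work_cap" gets appended to work-cap.yml.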
You can use the same method with xargs but I fear it'll be slower, as it starts a new shell for each file:
... | xargs -I fn bash -c 's=fn; echo " grid_pool: fn" >> ${s/_/-}.yml'

Bash script 'sed: first RE may not be empty' error

I have written the following bash script; it is not finished yet, so it is still a little messy. The script looks for directories at the same level as the script, then searches within each directory for a particular file to which it makes some changes.
When I run the script it returns the following error:
sed: first RE may not be empty
sed: first RE may not be empty
sed: first RE may not be empty
sed: first RE may not be empty
sed: first RE may not be empty
sed: first RE may not be empty
sed: first RE may not be empty
My research tells me that it may have something to do with the '/'s in the directory name strings, but I have not been able to solve the issue.
Despite the error messages the script seems to be working fine and is making the changes to the files correctly. Can anyone help explain why I am getting the error message above?
#!/bin/bash
FIND_DIRECTORIES=$(find . -type d -maxdepth 1 -mindepth 1)
FIND_IN_DIRECTORIES=$(find $FIND_DIRECTORIES"/app/design/adminhtml" -name "login.phtml")
for i in $FIND_IN_DIRECTORIES
do
# Generate Random Number
RANDOM=$[ ( $RANDOM % 1000 ) + 1 ]
# Find the line where password is printed out on the page
# Grep for the whole line, then remove all but the numbers
# This will leave the old password number
OLD_NUM_HOLDER=$(cat $i | grep "<?php echo Mage::helper('adminhtml')->__('Password: ')" )
OLD_NUM="${OLD_NUM_HOLDER//[!0-9]}"
# Add old and new number to the end of text string
# Beginning text string is used so that sed can find
# Replace old number with new number
OLD_NUM_FULL="password\" ?><?php echo \""$OLD_NUM
NEW_NUM_FULL="password\" ?><?php echo \""$RANDOM
sed -ie "s/$OLD_NUM_FULL/$NEW_NUM_FULL/g" $i
# GREP for the setNewPassword function line
# GREP for new password that has just been set above
SET_NEW_GREP=$(cat $i | grep "setNewPassword(" )
NEW_NUM_GREP=$(cat $i | grep "<?php echo \"(password\" ?><?php echo" )
NEW_NUM_GREPP="${NEW_NUM_GREP//[!0-9]}"
# Add new password to string for sed
# Find and replace old password for setNewPassword function
FULL_NEW_PASS="\$user->setNewPassword(password"$NEW_NUM_GREPP")"
sed -ie "s/$SET_NEW_GREP/$FULL_NEW_PASS/g" $i
done
Thanks in advance for any help with this.
UPDATE -- ANSWER
The issue here was that the directory list was not being handled as I expected. I thought it was searching /first/directory"/app/design/adminhtml", then /second/directory"/app/design/adminhtml", and so on. It was actually searching /first/directory, then /second/directory"/app/design/adminhtml" — the "/app/design/adminhtml" suffix was only being attached to the last item in the list. I have fixed the issue in the script below:
#!/bin/bash
for i in $(find . -type d -maxdepth 1 -mindepth 1); do
FIND_IN_DIRECTORIES=$i"/app/design/adminhtml/default"
FIND_IN_DIRECTORIES=$(find $FIND_IN_DIRECTORIES -name "login.phtml")
# Generate Random Number
RANDOM=$[ ( $RANDOM % 1000 ) + 1 ]
# Find the line where password is printed out on the page
# Grep for the whole line, then remove all but the numbers
# This will leave the old password number
OLD_NUM_HOLDER=$(cat $FIND_IN_DIRECTORIES | grep "<?php echo Mage::helper('adminhtml')->__('Password: ')" )
OLD_NUM="${OLD_NUM_HOLDER//[!0-9]}"
# Add old and new number to the end of text string
# Beginning text string is used so that sed can find
# Replace old number with new number
OLD_NUM_FULL="password\" ?><?php echo \""$OLD_NUM
NEW_NUM_FULL="password\" ?><?php echo \""$RANDOM
sed -ie "s/$OLD_NUM_FULL/$NEW_NUM_FULL/g" $FIND_IN_DIRECTORIES
# GREP for the setNewPassword function line
# GREP for new password that has just been set above
SET_NEW_GREP=$(cat $FIND_IN_DIRECTORIES | grep "setNewPassword(" )
NEW_NUM_GREP=$(cat $FIND_IN_DIRECTORIES | grep "<?php echo \"(password\" ?><?php echo" )
NEW_NUM_GREPP="${NEW_NUM_GREP//[!0-9]}"
# Add new password to string for sed
# Find and replace old password for setNewPassword function
FULL_NEW_PASS="\$user->setNewPassword(password"$NEW_NUM_GREPP")"
sed -ie "s/$SET_NEW_GREP/$FULL_NEW_PASS/g" $FIND_IN_DIRECTORIES
done
Without debugging your whole setup, note that you can use an alternate character to delimit sed regex/match values, i.e.
sed -i "s\#$OLD_NUM_FULL#$NEW_NUM_FULL#g" $i
and
sed -i "s\#$SET_NEW_GREP#$FULL_NEW_PASS#g" $i
You don't need the -e, so I have removed it.
Some seds require the leading '\' before the #, so I include it. It is possible that some will be confused by it, so if this doesn't work, try removing the leading '\'
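For instance, with # as the delimiter (and without the leading backslash), a value containing slashes needs no escaping, as long as the value itself contains no # characters (the paths below are only illustrative):
old='/app/design/adminhtml'
new='/app/design/frontend'
sed "s#$old#$new#g" login.phtml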
You should also turn on shell debugging to see exactly which sed command (and what values) is causing the problem. Add a line with set -vx near the top of your script to turn on debugging.
I hope this helps.
