bash/sed/awk: replace arbitrary substring in arbitrary files - bash

Update:
I have tried markp-fuso's answer and it worked like a charm
I'm starting to get frustrated here as I'm not a daily user of bash/sed and the like.
Starting point:
I have many subfolders with many source files (.c,.cpp,.cxx).
These source files are referenced for compilation in project files (.vcxproj).
What I want to do:
I want to find all source files that contain the string #import. I then want to find all project files that reference those source files.
I then want to edit all occurrences of these references in place within these project files
e.g. <Include="folder/file.cpp"/> -> <Include="folder/file.cpp" Attribute="Value"/>
What I have tried:
egrep -lir --include=*.{c,cpp,cxx} "(#import)" ./e3 | xargs -L 1 basename | egrep -ir --include=*.vcxproj -f - ./e3 | sed 's/:/ /g'
which produces a list like that:
./src/base/base.vcxproj <ClCompile Include="Folder1\Folder1File1.cpp" />
./src/mod/mod.vcxproj <ClCompile Include="Folder2\Folder2File1.cpp" />
./src/ext/ext.vcxproj <ClCompile Include="Folder3\Folder3File1.cpp" />
So I then tried
egrep -lir --include=*.{c,cpp,cxx} "(#import)" ./e3 | xargs -L 1 basename | egrep -ir --include=*.vcxproj -f - ./e3 | sed 's/:/ /g' | awk '{ sed -iE 's/($2,$3)/\1 Attribute="Value"/g' }'
which errors out with
bash: syntax error near unexpected token `('
I have tried a solution with a shell script, which didn't work either, and I don't know whether or how the above error message can be solved. I'm open to any solution as long as it runs within bash; it can even be more gross than what I came up with.

Setup:
mkdir -p src/{base,mod,ext}
echo 'some stuff on this line
<ClCompile Include="Folder1\Folder1File1.cpp" />
some more stuff on this line' > src/base/base.vcxproj
echo 'some stuff on this line
<ClCompile Include="Folder2\Folder2File1.cpp" />
some more stuff on this line' > src/mod/mod.vcxproj
echo 'some stuff on this line
<ClCompile Include="Folder3\Folder3File1.cpp" />
some more stuff on this line' > src/ext/ext.vcxproj
For the sake of getting something working in my environment I've placed the intermediate data in a local file:
$ cat proj.dat
./src/base/base.vcxproj <ClCompile Include="Folder1\Folder1File1.cpp" />
./src/mod/mod.vcxproj <ClCompile Include="Folder2\Folder2File1.cpp" />
./src/ext/ext.vcxproj <ClCompile Include="Folder3\Folder3File1.cpp" />
One idea using parameter substitution:
while read -r fname oldstring # 2nd-Nth space-delimited fields go into the single variable "oldstring"
do
oldstring="${oldstring//\\/\\\\}" # escape literal backslashes
newstring="${oldstring//\/>/ Attribute=\"Value\"\/>}" # replace /> with Attribute="Value"/>
echo "##################### ${fname}"
sed "s|${oldstring}|${newstring}|g" "${fname}"
done < proj.dat
NOTES:
the sed replacement is applied to all occurrences in a file
if additional datasets cause sed to abort with errors, it may be necessary to add further parameter expansions to escape other problematic characters
added a space on the front of the Attribute string since the textual description suggested a space may not exist before the /> (e.g., ...file.cpp"/>)
OP should be able to pipe the current egrep | xargs | egrep | sed output to this while loop (replacing done < proj.dat with done)
once OP is satisfied with the results, the -i flag can be added to the sed call to perform an in-place update of ${fname}
Generates:
##################### ./src/base/base.vcxproj
some stuff on this line
<ClCompile Include="Folder1\Folder1File1.cpp" Attribute="Value"/>
some more stuff on this line
##################### ./src/mod/mod.vcxproj
some stuff on this line
<ClCompile Include="Folder2\Folder2File1.cpp" Attribute="Value"/>
some more stuff on this line
##################### ./src/ext/ext.vcxproj
some stuff on this line
<ClCompile Include="Folder3\Folder3File1.cpp" Attribute="Value"/>
some more stuff on this line
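The last two notes (feeding the search results straight into the loop and switching sed to in-place mode) could be sketched as a small function. This is only a sketch under stated assumptions: add_attribute is a hypothetical name, GNU sed is assumed for -i, and the file names are assumed to contain no spaces and the matched lines no | or & characters.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the answer above): reads
# "file matched-line" pairs on stdin and edits each file in place.
# Assumes GNU sed for the -i flag.
add_attribute() {
    local fname oldstring newstring
    local suffix=' Attribute="Value"/>'
    while read -r fname oldstring; do
        oldstring="${oldstring//\\/\\\\}"       # escape literal backslashes for sed
        newstring="${oldstring//\/>/$suffix}"   # replace /> with  Attribute="Value"/>
        sed -i "s|${oldstring}|${newstring}|g" "${fname}"
    done
}

# Usage, with the pipeline from the question in place of "...":
#   ... | sed 's/:/ /g' | add_attribute
```

Running it on copies of the project files first is a cheap way to check the result before touching the real ones.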

Related

Bash change number to another value on specific line

I'm new to bash scripting, and I'm looking for a way to change a number to another value on a specific line.
I have a file named foo.config, and this file contains about 100 lines of configuration.
For example i have
<UpdateInterval>2</UpdateInterval>
and I need to find this line in foo.config and replace the number (it can be any number from 0 to 10; in my example it is 2) with 0, always.
Like this :
<UpdateInterval>0</UpdateInterval>
How can I do it with sed? Please suggest.
the part of lines:
<InstallUrl />
<TargetCulture>en</TargetCulture>
<ApplicationVersion>1.0.1.8</ApplicationVersion>
<AutoIncrementApplicationRevision>true</AutoIncrementApplicationRevision>
<UpdateEnabled>true</UpdateEnabled>
<UpdateInterval>2</UpdateInterval>
<UpdateIntervalUnits>hours</UpdateIntervalUnits>
<ProductName>xxxxxxxxxxxx</ProductName>
<PublisherName />
<SupportUrl />
<FriendlyName>xxxxxxxxxxxx</FriendlyName>
<OfficeApplicationDescription />
<LoadBehavior>3</LoadBehavior>
sed and similar tools (grep, awk) are never good tools for parsing XML/HTML data. Use a proper XML/HTML parser, like xmlstarlet:
xmlstarlet ed -L -O -u "//UpdateInterval" -v 0 foo.config
ed - edit mode
-L - edit the file inplace
-O - omit xml declaration
-u - update action
"//UpdateInterval" - xpath expression
-v 0 - the new value of the element to be updated
The final (exemplary) foo.config contents:
<root>
<InstallUrl/>
<TargetCulture>en</TargetCulture>
<ApplicationVersion>1.0.1.8</ApplicationVersion>
<AutoIncrementApplicationRevision>true</AutoIncrementApplicationRevision>
<UpdateEnabled>true</UpdateEnabled>
<UpdateInterval>0</UpdateInterval>
<UpdateIntervalUnits>hours</UpdateIntervalUnits>
<ProductName>xxxxxxxxxxxx</ProductName>
<PublisherName/>
<SupportUrl/>
<FriendlyName>xxxxxxxxxxxx</FriendlyName>
<OfficeApplicationDescription/>
<LoadBehavior>3</LoadBehavior>
</root>
The <root> tag was added for demonstration purposes; your XML/HTML structure should have its own root (outermost parent) tag.
In a very simple way, you may try:
sed -E 's/^<UpdateInterval>[0-9]+/<UpdateInterval>0/' foo.config
This will search for <UpdateInterval> at the beginning of a line (note the ^) and then a number ([0-9] stands for a digit and + for a repetition of one or more). This bit will be replaced with <UpdateInterval>0. The / characters separate what you search and what will replace it. The s command is a search and replace.
It will take the file foo.config as input and you will get the output on standard output. If you want your output on the same file, you may do:
sed -E 's/^<UpdateInterval>[0-9]+/<UpdateInterval>0/' foo.config >foo.temp
mv foo.temp foo.config
Or more simply:
sed -i -E 's/^<UpdateInterval>[0-9]+/<UpdateInterval>0/' foo.config
Note that this is not a good way to do the substitution if your config file contains general XML. It will only work in the simplest of cases (but will do for your example.) If your XML bit may be in the middle of a line, remove the ^ character. The search and replace expression assumes that there is no whitespace around the XML tags.
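If the element may be indented, a slightly more tolerant variant of the same substitution could look like the sketch below. The function name set_update_interval and its two parameters are made up for illustration. It still treats the file as plain text, so the caveat about general XML applies unchanged.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print FILE with every
# <UpdateInterval>N</UpdateInterval> line changed to the given value,
# allowing leading whitespace before the opening tag.
set_update_interval() {
    local value=$1 file=$2
    sed -E "s|^([[:space:]]*<UpdateInterval>)[0-9]+(</UpdateInterval>)|\1${value}\2|" "$file"
}

# Example: set_update_interval 0 foo.config > foo.temp && mv foo.temp foo.config
```

Matching the closing tag as well keeps the substitution from touching a line where the number is followed by something unexpected.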
A solution using an XML parsing tool:
{ echo '<root>'; cat foo.config; echo '</root>'; } |
xmlstarlet ed -O -P -u //UpdateInterval -v 0 |
sed '1d;$d' |
sponge foo.config
The first line is to make the config file into a proper XML file.
The second line updates the value.
The third line removes the root tags.
The last line rewrites the config file. sponge comes from the moreutils package, which needs to be installed.

Remove the space from the beginning of file name inside the zip file, without extracting the zip file

I have a number of zip files in a directory. Inside these zip files, the file names begin with a space. I just want to remove this space without extracting the zip file.
I tried a shell script, as below.
#!/bin/sh
for zip in *.zip
do
unzipped=unzip $zip
trimmed=echo "${unzipped}" | sed -e 's/^[ \t]*//'
#Want to zip the file here with the same name
done
I need help with this script; any shorter method would also be most welcome.
You could use the tool zipnote which is part of Info-ZIP's zip package. Based on your code example above, a minimal working example could look like this:
#!/bin/bash
for zip in *.zip
do
ZIPNOTES=$(zipnote "$zip")
TRIMMED=$(echo "$ZIPNOTES" | sed -e 's/^#[ \t]\{2,\}\(.*\)/&\n#=\1/')
echo "$TRIMMED" | zipnote -w "$zip"
done
or as one-liner:
#!/bin/bash
for zip in *.zip; do zipnote "$zip" | sed -e 's/^#[ \t]\{2,\}\(.*\)/&\n#=\1/' | zipnote -w "$zip"; done
Long answer
Let's assume you have a zip archive archive.zip containing the files \ \ foo and bar, the utility zipnote outputs the following lines:
$ zipnote archive.zip
# foo
# (comment above this line)
# bar
# (comment above this line)
# (zip file comment below this line)
According to man zipnote:
The names of the files in the zipfile can also be changed in this way.
This is done by following lines like
"# name"
in the created temporary file (called foo.tmp here) with lines like
"#=newname"
and then using the -w option as above.
Hence, to rename the file \ \ foo within the zip archive, you have to append the line
#=foo
just beneath the corresponding line
# foo
and pipe it into zipnote -w archive.zip. The latter is done with a slight modification of your sed command, which matches lines containing a # symbol followed by two or more whitespace characters, captures the rest of the line, and appends the trimmed name on a new #= line beneath.

Bash sed in loop

I'm trying to use the following to substitute the tab with a comma in several files:
#!/bin/sh
for i in *output_*.txt
do
sed 's/ /;/g' $i > $i
done
But it is not working: the output files still contain the tab delimiter. It only works when I use it on a single file without the for loop.
Any help?
Thanks.
Several things are wrong: the variables are unquoted, and the output is redirected into the same file that is being read. The loop is also not needed.
Try:
sed -i 's/\t/;/g' *output_*.txt
The correct script you need is as follows:
find . -name '*output_*.txt' | while IFS= read -r FILENAME; do
(sed -e "s/\\t/;/g" <"${FILENAME}" >"${FILENAME%.txt}.tmp") && (mv "${FILENAME%.txt}.tmp" "${FILENAME}");
done
This script has several important features:
It finds all files called *output_*.txt in the current directory and all subdirectories. If you do not want to recurse into subdirectories, then use:
find . -maxdepth 1 -name '*output_*.txt' | while IFS= read -r FILENAME; do
as the first line.
It does not overwrite your original input file if sed encounters an error. sed generates its output to a temporary file (<filename>.tmp) and it only replaces the original file if it is successful.
As pointed out by other posters, the tab character is represented by \t in sed scripts.
An example transformation performed by this script is as follows (the sequence <tab> represents a tab character):
Input:
<tab><tab><tab><tab><tab>line 1<tab><tab>
<tab><tab><tab>line 2<tab><tab>
<tab><tab>line 3<tab><tab>
<tab><tab><tab>line 4<tab><tab>
<tab><tab><tab><tab><tab>line<tab><tab> 5
Output:
;;;;;line 1;;
;;;line 2;;
;;line 3;;
;;;line 4;;
;;;;;line;; 5
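The temporary-file approach described above could also be written so that it copes with spaces in file names and does not depend on sed understanding \t. This is a sketch only: tabs_to_semicolons is a hypothetical name, and it assumes bash (for process substitution and $'\t') and a find that supports -print0.

```shell
#!/usr/bin/env bash
# Hypothetical helper: replace tabs with semicolons in every *output_*.txt
# file below a directory, going through a temporary file.
tabs_to_semicolons() {
    local dir=${1:-.} file tmp
    local tab=$'\t'                      # literal tab, so sed need not understand \t
    while IFS= read -r -d '' file; do    # NUL-delimited names survive spaces
        tmp=$(mktemp "${file}.XXXXXX") || return 1
        if sed "s/${tab}/;/g" "$file" > "$tmp"; then
            mv -- "$tmp" "$file"         # replace the original only on success
        else
            rm -f -- "$tmp"              # leave the original untouched on failure
        fi
    done < <(find "$dir" -type f -name '*output_*.txt' -print0)
}

# Example: tabs_to_semicolons .
```

One side effect to be aware of: the rewritten file takes on the permissions of the file created by mktemp.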

Shell script syntax error in expression

I'm trying to make self extracting file using the following Ant tasks:
...
<fixcrlf file="${src.dir}/scripts/install.sh" eol="unix"/>
<concat destfile="${build.dir}/my_program.exe" binary="yes">
<fileset file="${src.dir}/scripts/install.sh" />
<fileset file="${build.dir}/program.tar.gz" />
</concat>
.. and my_program.exe looks like this:
#!/bin/bash
begin=`head -30 $0 | grep -n ^START | cut -d ':' -f -1` # find line number of the marker
start=$(($begin+1)) # beginning of the binary archive which will be extracted
echo $start
...
START
#binary file starts
When I run my_program.exe I get the following error:
./my_program.exe: line 4: Binary file (standard input) matches: syntax error in expression (error token is "file (standard input) matches")
tail: +: invalid number of lines
When I run install.sh separately it finds line number just fine.
My guess is that something is wrong with the Ant task. Am I missing some properties that would fix it?
I guess you should use the -a option of grep (also spelled --text, meaning to process a binary file as if it were text). Otherwise grep will only output "Binary file matches".
So the line that sets begin should probably be:
begin=`head -30 $0 | grep -na ^START | cut -d ':' -f -1` # find line number of the marker
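The effect of -a can be tried on a small file that mixes text and binary data. The sketch below is an illustration, not the asker's installer: marker_line is a hypothetical name, and -m 1 (stop after the first match) is an addition that GNU and BSD grep both support.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the line number of the first line starting
# with START among the first 30 lines of a file that may contain binary data.
marker_line() {
    head -n 30 "$1" | grep -a -n -m 1 '^START' | cut -d ':' -f 1
}

# Example: start=$(( $(marker_line "$0") + 1 ))
```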

Get just the filename from a path in a Bash script [duplicate]

How would I get just the filename without the extension and no path?
The following gives me no extension, but I still have the path attached:
source_file_filename_no_ext=${source_file%.*}
Many UNIX-like operating systems have a basename executable for a very similar purpose (and dirname for the path):
pax> full_name=/tmp/file.txt
pax> base_name=$(basename ${full_name})
pax> echo ${base_name}
file.txt
That unfortunately just gives you the file name, including the extension, so you'd need to find a way to strip that off as well.
So, given you have to do that anyway, you may as well find a method that can strip off the path and the extension.
One way to do that (and this is a bash-only solution, needing no other executables):
pax> full_name=/tmp/xx/file.tar.gz
pax> xpath=${full_name%/*}
pax> xbase=${full_name##*/}
pax> xfext=${xbase##*.}
pax> xpref=${xbase%.*}
pax> echo "path='${xpath}', pref='${xpref}', ext='${xfext}'"
path='/tmp/xx', pref='file.tar', ext='gz'
That little snippet sets xpath (the file path), xpref (the file prefix, what you were specifically asking for) and xfext (the file extension).
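For the specific case asked about (no path, no extension), the two relevant expansions can be wrapped in a small function. The name strip_path_and_ext is made up for this sketch; like xpref above, it removes only the last extension.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the file name without its directory and
# without its last extension, using only parameter expansion.
strip_path_and_ext() {
    local name=${1##*/}          # remove the longest prefix ending in /
    printf '%s\n' "${name%.*}"   # remove the shortest suffix starting with .
}

strip_path_and_ext /tmp/xx/file.tar.gz   # prints: file.tar
```

A name without any dot is printed unchanged, while a name that starts with its only dot (such as .bashrc) comes out empty, which may or may not be what is wanted.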
The basename and dirname solutions are more convenient, but here are alternative commands using sed:
FILE_PATH="/opt/datastores/sda2/test.old.img"
echo "$FILE_PATH" | sed "s/.*\///"
This returns test.old.img like basename.
This is the filename without the extension:
echo "$FILE_PATH" | sed -r "s/.+\/(.+)\..+/\1/"
It returns test.old.
And the following statement gives the directory path, like the dirname command:
echo "$FILE_PATH" | sed -r "s/(.+)\/.+/\1/"
It returns /opt/datastores/sda2
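Here is an easy way to get the file name from a path (stored here in a variable named file_path, since PATH itself is the shell's command search path):
echo "$file_path" | rev | cut -d"/" -f1 | rev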
To remove the extension you can use, assuming the file name has only ONE dot (the extension dot):
cut -d"." -f1
$ file=$(basename "$file_path")
$ file=${file%.*}
Some more alternative options because regexes (regi ?) are awesome!
Here is a Simple regex to do the job:
regex="[^/]*$"
Example (grep):
FP="/hello/world/my/file/path/hello_my_filename.log"
echo $FP | grep -oP "$regex"
#Or using standard input
grep -oP "$regex" <<< $FP
Example (awk):
echo "$FP" | awk -v re="$regex" 'match($0, re) { print substr($0, RSTART, RLENGTH) }'
#Or using standard input
awk -v re="$regex" 'match($0, re) { print substr($0, RSTART, RLENGTH) }' <<< "$FP"
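The same simple pattern can also be applied without starting an external process, using bash's own =~ operator. This is a sketch assuming bash; last_component is a hypothetical name, and the pattern is the [^/]*$ regex from above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the last path component using bash's
# built-in regex matching instead of grep or awk.
last_component() {
    local re='[^/]*$'            # kept in a variable so it is used as a regex, not a literal
    [[ $1 =~ $re ]] && printf '%s\n' "${BASH_REMATCH[0]}"
}

last_component "/hello/world/my/file/path/hello_my_filename.log"
# prints: hello_my_filename.log
```

Note that =~ uses POSIX extended regular expressions, so the Perl-style lookarounds that grep -P accepts are not available here.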
If you need a more complicated regex:
For example your path is wrapped in a string.
StrFP="my string is awesome file: /hello/world/my/file/path/hello_my_filename.log sweet path bro."
#this regex matches a string not containing / and ends with a period
#then at least one word character
#so its useful if you have an extension
regex="[^/]*\.\w{1,}"
#usage
grep -oP "$regex" <<< $StrFP
#alternatively you can get a little more complicated and use lookarounds
#this regex matches a part of a string that starts right after a / and does not contain a /
##then uses the lazy operator ? to match as few characters as possible
##up to the point that is followed by a space
##this allows us to match just a file name in a string with a file path, whether it has an extension or not
##also if the path doesn't have a file it will match the last directory in the file path
##however this will break if the file path has a space in it.
regex="(?<=/)[^/]*?(?=\s)"
#to fix the above problem you can use sed to remove spaces from the file path only
## as a side note, unfortunately sed has limited regex capability and it must be written out in longhand.
NewStrFP=$(echo $StrFP | sed 's:\(/[a-z]*\)\( \)\([a-z]*/\):\1\3:g')
grep -oP "$regex" <<< $NewStrFP
Total solution with Regexes:
This function can give you the filename with or without extension of a linux filepath even if the filename has multiple "."s in it.
It can also handle spaces in the filepath and if the file path is embedded or wrapped in a string.
#you may notice that the sed replace has gotten really crazy looking
#I just added all of the allowed characters in a linux file path
function Get-FileName(){
local FileString="$1"
local NoExtension="$2"
local FileString=$(echo $FileString | sed 's:\(/[a-zA-Z0-9\<\>\|\\\:\)\(\&\;\,\?\*]*\)\( \)\([a-zA-Z0-9\<\>\|\\\:\)\(\&\;\,\?\*]*/\):\1\3:g')
local regex="(?<=/)[^/]*?(?=\s)"
local FileName=$(echo $FileString | grep -oP "$regex")
if [[ "$NoExtension" != "" ]]; then
sed 's:\.[^\.]*$::g' <<< $FileName
else
echo "$FileName"
fi
}
## call the function with extension
Get-FileName "my string is awesome file: /hel lo/world/my/file test/path/hello_my_filename.log sweet path bro."
##call function without extension
Get-FileName "my string is awesome file: /hel lo/world/my/file test/path/hello_my_filename.log sweet path bro." "1"
If you have to mess with a windows path you can start with this one:
[^\\]*$
$ source_file_filename_no_ext=${source_file%.*}
$ echo ${source_file_filename_no_ext##*/}
