Bash: change a number to another value on a specific line

I'm new to bash scripting, and I'm looking for a way to change a number to another value on a specific line.
I have a file named foo.config with about 100 lines of configuration.
For example, I have:
<UpdateInterval>2</UpdateInterval>
I need to find this line in foo.config and always replace the number (which can be anything from 0 to 10; in my example it is 2) with 0,
like this:
<UpdateInterval>0</UpdateInterval>
How can I do it with sed? Please suggest.
Here is part of the file:
<InstallUrl />
<TargetCulture>en</TargetCulture>
<ApplicationVersion>1.0.1.8</ApplicationVersion>
<AutoIncrementApplicationRevision>true</AutoIncrementApplicationRevision>
<UpdateEnabled>true</UpdateEnabled>
<UpdateInterval>2</UpdateInterval>
<UpdateIntervalUnits>hours</UpdateIntervalUnits>
<ProductName>xxxxxxxxxxxx</ProductName>
<PublisherName />
<SupportUrl />
<FriendlyName>xxxxxxxxxxxx</FriendlyName>
<OfficeApplicationDescription />
<LoadBehavior>3</LoadBehavior>

sed and other line-oriented tools (grep, awk) are never good tools for parsing XML/HTML data. Use a proper XML/HTML parser, such as xmlstarlet:
xmlstarlet ed -L -O -u "//UpdateInterval" -v 0 foo.config
ed - edit mode
-L - edit the file in place
-O - omit the XML declaration
-u - update action
"//UpdateInterval" - xpath expression
-v 0 - the new value of the element to be updated
The resulting foo.config contents (example):
<root>
<InstallUrl/>
<TargetCulture>en</TargetCulture>
<ApplicationVersion>1.0.1.8</ApplicationVersion>
<AutoIncrementApplicationRevision>true</AutoIncrementApplicationRevision>
<UpdateEnabled>true</UpdateEnabled>
<UpdateInterval>0</UpdateInterval>
<UpdateIntervalUnits>hours</UpdateIntervalUnits>
<ProductName>xxxxxxxxxxxx</ProductName>
<PublisherName/>
<SupportUrl/>
<FriendlyName>xxxxxxxxxxxx</FriendlyName>
<OfficeApplicationDescription/>
<LoadBehavior>3</LoadBehavior>
</root>
The <root> tag is there only for demonstration purposes; your XML/HTML document should have its own root (outermost) element.

In a very simple way, you may try:
sed -E 's/^<UpdateInterval>[0-9]+/<UpdateInterval>0/' foo.config
This searches for <UpdateInterval> at the beginning of a line (note the ^) followed by a number ([0-9] matches a digit and + means one or more repetitions). The matched text is replaced with <UpdateInterval>0. The / characters separate the search pattern from its replacement, and the s command performs the search and replace.
It takes foo.config as input and writes the result to standard output. If you want the result written back to the same file, you can do:
sed -E 's/^<UpdateInterval>[0-9]+/<UpdateInterval>0/' foo.config >foo.temp
mv foo.temp foo.config
Or more simply:
sed -i -E 's/^<UpdateInterval>[0-9]+/<UpdateInterval>0/' foo.config
Note that this is not a good way to do the substitution if your config file contains general XML. It only works in the simplest of cases (though it will do for your example). If the XML element may appear in the middle of a line, remove the ^ character. The search-and-replace expression also assumes that there is no whitespace around the XML tags.
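A minimal sketch of the same substitution as a reusable shell function, assuming the value is always a plain integer and the whole element sits on one line. The function name reset_update_interval is hypothetical. Matching the closing tag too, and using # as the delimiter so the / in </UpdateInterval> needs no escaping, lets the line carry leading whitespace while leaving similarly named tags such as <UpdateIntervalUnits> alone.

```shell
#!/bin/sh
# Hypothetical helper: rewrite <UpdateInterval>N</UpdateInterval>
# (N = one or more digits) to <UpdateInterval>0</UpdateInterval>.
# Reads stdin, writes stdout; any indentation before the tag is kept
# because only the element itself is matched.
reset_update_interval() {
    sed -E 's#<UpdateInterval>[0-9]+</UpdateInterval>#<UpdateInterval>0</UpdateInterval>#'
}

# Usage on a few lines from the question (one of them indented):
result=$(printf '%s\n' \
    '<UpdateEnabled>true</UpdateEnabled>' \
    '    <UpdateInterval>2</UpdateInterval>' \
    '<UpdateIntervalUnits>hours</UpdateIntervalUnits>' | reset_update_interval)
printf '%s\n' "$result"
```

To change the file itself, the same expression can be used with sed -i, or redirected to a temporary file that is then moved over foo.config.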

A solution using an XML parsing tool:
{ echo '<root>'; cat foo.config; echo '</root>'; } |
xmlstarlet ed -O -P -u //UpdateInterval -v 0 |
sed '1d;$d' |
sponge foo.config
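The first line wraps the config in a <root> element so that it becomes well-formed XML.
The second line updates the value (-P preserves the original formatting).
The third line removes the root tags again (the first and last lines).
The last line writes the result back to the config file; sponge comes from the moreutils package, which you may need to install.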

Related

Remove first two characters from a column in a text file excluding the headers

I want to remove the first two characters of a column in a text file.
I am using the command below, but it also truncates the header:
sed -i 's/^..//' file1.txt
Below is my file:
FileName,Age
./Acct_Bal_Tgt.txt,7229
./IDQ_HB1.txt,5367
./IDQ_HB_LOGC.txt,5367
./IDQ_HB.txt,5367
./IGC_IDQ.txt,5448
./JobSchedule.txt,3851
I want the ./ removed from the file name on each line.
Transferring comments to an answer, as requested.
Modify your script to:
sed -e '2,$s/^..//' file1.txt
The 2,$ prefix limits the change to lines 2 to the end of the file, leaving line 1 unchanged.
An alternative is to remove . and / as the first two characters on a line:
sed -e 's%^[.]/%%' file1.txt
I tend to use -e to specify that the script option follows; it isn't necessary unless you split the script over several arguments (so it isn't necessary here where there's just one argument for the script). You could use \. instead of [.]; I'm allergic to backslashes (as you would be if you ever spent time working out whether you needed 8 or 16 consecutive backslashes to get the right result in a troff document).
Advice: Don't use the -i option until you've got your script working correctly. It overwrites your file with incorrect output just as happily as with correct output. Consequently, if you're asking on SO how to write a sed script, it isn't safe to be using the -i option. Also note that the -i option is non-standard and behaves differently with different versions of sed (when it is supported at all). Specifically, on macOS, BSD sed requires a suffix argument; if you don't want a backup, you have to pass an empty one as a separate argument: -i ''.
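Following that advice, a quick way to try both scripts on the sample data before reaching for -i. This is a sketch: the sample function reproduces the first few lines of the question's file, and the variable names are illustrative.

```shell
#!/bin/sh
# Header line plus a few data lines from the question.
sample() {
    printf '%s\n' 'FileName,Age' './Acct_Bal_Tgt.txt,7229' \
        './IDQ_HB1.txt,5367' './JobSchedule.txt,3851'
}

# Variant 1: drop the first two characters on lines 2..$ only.
by_range=$(sample | sed -e '2,$s/^..//')

# Variant 2: drop a literal "./" where it starts a line; the header
# is untouched because it does not start with "./".
by_pattern=$(sample | sed -e 's%^[.]/%%')

printf '%s\n' "$by_range"
```

The two only differ on data lines that don't start with ./: the range version still chops two characters from them, while the pattern version leaves them alone.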
Use this Perl one-liner:
perl -pe 's{^[.]/}{}' file1.txt > output.txt
The Perl one-liner uses these command line flags:
-e : Tells Perl to look for code in-line, instead of in a file.
-p : Loop over the input one line at a time, assigning it to $_ by default. Add print $_ after each loop iteration.
s{^[.]/}{} : Replace a literal dot ([.]) followed by a slash ('/'), found at the beginning of the line (^), with nothing (delete them). This does not modify the header since it does not match the regex.
If you prefer to modify the file in-place, you can use this:
perl -i.bak -pe 's{^[.]/}{}' file1.txt
This creates the backup file file1.txt.bak.
SEE ALSO:
perldoc perlrun: how to execute the Perl interpreter: command line switches
perldoc perlrequick: Perl regular expressions quick start

How to remove this special character at the end of the file

This is the output of the cat command, and I don't know what the special character at the end of the file is called, so I don't even know what to search for. How do I remove this special character in bash?
EDIT:
Here is the actual XML file (copied and pasted):
<?xml version="1.0" encoding="UTF-8"?>
<Package xmlns="http://soap.sforce.com/2006/04/metadata">
<types>
<name>ApexClass</name>
<members>CreditNotesManager</members>
<members>CreditNotesManagerTest</members>
</types>
<version>47.0</version>
</Package>%
It's unclear how the % (percent sign) is ending up in your file; it's easy to remove with sed:
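sed -i '' 's#\(</.*>\)%.*#\1#g' file.xml
This removes the percent sign and saves the file. Using # as the delimiter avoids clashing with the / in the closing tag. The -i '' form is for BSD/macOS sed; with GNU sed, use -i on its own. For a dry run, omit the -i '' part, since that is what tells sed to edit the file in place.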
As mentioned in the comments, there are many ways to do it. Just be sure you aren't removing something that you want to keep.
If it is only on the last line, this should work, using ed(1):
printf '%s\n' '$s/%//' w | ed -s file.xml
If you don't need to save changes, you could use grep:
grep -v "%" <file.xml
This uses grep with its inverse-matching flag -v and prints the result to stdout. Be aware that -v drops every line containing %, not just the % character itself, so here the whole </Package>% line would be removed. The < character redirects file.xml to grep's standard input.
EDIT: actually you don't even need the redirection, so:
grep -v "%" file.xml
This is actually a feature of zsh, not bash.
To disable it: unsetopt prompt_cr prompt_sp
The inverse-video % that shows up means the output ended without a final ASCII line feed (newline) character, i.e. the file's last line has no terminating newline.
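Since the marker only means the last line lacks a newline, one way to make it disappear is to append a newline when the file doesn't already end with one. A minimal sketch; the helper name ensure_trailing_newline is hypothetical. It relies on command substitution stripping trailing newlines: if the file's last byte is a newline, $(tail -c 1 file) is empty.

```shell
#!/bin/sh
# Hypothetical helper: append "\n" to a file whose last byte isn't one.
ensure_trailing_newline() {
    # Empty exactly when the last byte is "\n" (or the file is empty).
    if [ -n "$(tail -c 1 "$1")" ]; then
        printf '\n' >> "$1"
    fi
}

# Demo on a temporary file holding the end of the question's XML,
# written without a final newline.
tmp=$(mktemp)
printf '<version>47.0</version>\n</Package>' > "$tmp"
ensure_trailing_newline "$tmp"
ensure_trailing_newline "$tmp"   # second call changes nothing
```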

How to split a text file content by a string?

Suppose I've got a text file that consists of two parts separated by the delimiter string ---:
aa
bbb
---
cccc
dd
I am writing a bash script to read the file and assign the first part to var part1 and the second part to var part2:
part1= ... # should be aa\nbbb
part2= ... # should be cccc\ndd
How would you suggest writing this in bash?
You can use awk:
foo="$(awk 'NR==1' RS='---\n' ORS='' file.txt)"
bar="$(awk 'NR==2' RS='---\n' ORS='' file.txt)"
This reads the file twice, but handling text files in the shell, i.e. storing their content in variables, should generally be limited to small files anyway. Given that your file is small, this shouldn't be a problem. (A multi-character RS such as '---\n' is an extension to POSIX awk; GNU awk supports it.)
Note: Depending on your actual task, you may be able to use awk for the whole thing. Then you don't need to store the content in shell variables or read the file twice.
A solution using sed:
foo=$(sed -n '/^---$/q;p' file.txt)
bar=$(sed -n '1,/^---$/b;p' file.txt)
The -n command line option tells sed not to print the input lines as it processes them (by default it prints them). sed runs the script once for each input line.
The first sed script
/^---$/q;p
contains two commands (separated by ;):
/^---$/q - quit when you reach the line matching the regex ^---$ (a line that contains exactly three dashes);
p - print the current line.
The second sed script
1,/^---$/b;p
contains two commands:
1,/^---$/b - starting with line 1 until the first line matching the regex ^---$ (a line that contains only ---), branch to the end of the script (i.e. skip the second command);
p - print the current line.
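A self-contained run of the two sed scripts on the sample from the question, as a sketch. The file is recreated in a temporary location, and -n is placed before the script, which is the portable order. Writing ;p right after b relies on GNU sed; other seds may treat ;p as part of the label name.

```shell
#!/bin/sh
# Recreate the sample file from the question.
file=$(mktemp)
printf '%s\n' aa bbb --- cccc dd > "$file"

# Part 1: print lines until the first "---" line, then quit
# (q runs before p, so the delimiter itself is not printed).
part1=$(sed -n '/^---$/q;p' "$file")

# Part 2: branch past p for line 1 through the first "---" line,
# so only the lines after the delimiter are printed.
part2=$(sed -n '1,/^---$/b;p' "$file")

rm -f "$file"
printf 'part1:\n%s\npart2:\n%s\n' "$part1" "$part2"
```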
Using csplit:
csplit --elide-empty-files --quiet --prefix=foo_bar file.txt "/---/" "{*}" && sed -i '/---/d' foo_bar*
If your coreutils version is 8.22 or later, the --suppress-matched option can be used and the sed step is not required:
csplit --suppress-matched --elide-empty-files --quiet --prefix=foo_bar file.txt "/---/" "{*}"
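To get the two parts into variables, the pieces csplit writes (foo_bar00, foo_bar01, ...) can simply be read back. A sketch assuming GNU csplit from coreutils 8.22 or later; the scratch directory is only there to keep the pieces out of the current directory.

```shell
#!/bin/sh
# Scratch directory for the input and the foo_bar00, foo_bar01 pieces.
dir=$(mktemp -d)
printf '%s\n' aa bbb --- cccc dd > "$dir/file.txt"

# Split at each "---" line and drop the delimiter line itself.
csplit --suppress-matched --elide-empty-files --quiet \
    --prefix="$dir/foo_bar" "$dir/file.txt" '/---/' '{*}'

# $(...) strips each piece's trailing newline.
part1=$(cat "$dir/foo_bar00")
part2=$(cat "$dir/foo_bar01")
rm -rf "$dir"
```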

Extract text matching pattern X after having searched for pattern Y (bash)

In a bash script, how can I extract the text between <abc> and </abc> in an XML file, where that block comes after a pattern I need to look for?
Example of the input file:
<111>
<abc>
text
</abc>
<def>
text
</def>
</111>
<222>
<abc>
text to extract
</abc>
</222>
My goal is to display "text to extract", given that I'm looking for the pattern <222>.
Your XML example doesn't have a root element.
<111> and <222> are not valid XML tag names (an XML name can't start with a digit).
If you are not sure your XML format is fixed, don't use regexes to parse it;
XPath is the way to go.
Assuming the 111 and 222 tags are renamed t111 and t222 and there is a root element:
xmllint --xpath "//t222/abc/text()" your.xml
This is really ugly and you really should use Kent's answer, but if you really, really insist:
grep -A 999 "<222>" file.xml | grep -A1 "<abc>" | tail -n 1
It takes the line matching <222> plus up to 999 lines after it; from those, it keeps each line containing <abc> together with the line that follows it; finally, tail -n 1 keeps only the last of those lines.
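If you want to stay line-oriented without the 999-line cap, nested sed address ranges can do the same job. This is only a sketch under the same caveats as the grep pipeline: it assumes every tag sits on its own line exactly as in the example, with no attributes and no nested <222> blocks. The function name extract_abc_after_222 is hypothetical.

```shell
#!/bin/sh
# Outer range: from the <222> line to the </222> line.
# Inner range: from <abc> to </abc>, only checked inside the outer one.
# Lines containing "abc>" (the tags themselves) are not printed.
extract_abc_after_222() {
    sed -n '
        /<222>/,/<\/222>/{
            /<abc>/,/<\/abc>/{
                /abc>/!p
            }
        }
    '
}

# Usage on the input from the question:
text=$(printf '%s\n' '<111>' '<abc>' 'text' '</abc>' '<def>' 'text' \
    '</def>' '</111>' '<222>' '<abc>' 'text to extract' '</abc>' '</222>' |
    extract_abc_after_222)
printf '%s\n' "$text"
```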
Using GNU awk for multi-char RS and gensub():
$ awk -v RS='^$' '{print gensub(/.*<222>.*<abc>\n(.*)\n<\/abc>.*/,"\\1","")}' file
text to extract

remove absolute path using sed command

I have a file with content like the following:
abc...
include /home/user/file.txt'
some text
I need to remove include and also the complete path after it.
I used the following command, which removes include but not the path:
sed -i -r 's#include##g' 'filename'
I am also trying to understand the above command (copied from somewhere), but I don't understand the following:
i - modify file change
r - read file
s - need input
g - need input
Try this:
$ sed '/^include /s/.*//g' file.txt
abc...
some text
It removes all the text on any line that starts with include (the line itself stays, now empty). s means substitute, so s/.*//g means replace all the text with nothing. g means global: the substitution is applied to every match on the line, not just the first.
OR
$ sed '/^include /d' file.txt
abc...
some text
d means delete.
It deletes every line that starts with include. To save the changes (in-place edit), your commands should be:
sed -i '/^include /s/.*//g' file.txt
sed -i '/^include /d' file.txt
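To see the difference between the two commands without touching the real file, you can run them on a copy of the sample first. A short sketch: the s variant leaves an empty line where the include line was, while d removes the line entirely.

```shell
#!/bin/sh
# The three sample lines from the question.
sample() {
    printf '%s\n' 'abc...' "include /home/user/file.txt'" 'some text'
}

# Blank out lines starting with "include " (the line stays, empty).
blanked=$(sample | sed '/^include /s/.*//g')

# Delete lines starting with "include " altogether.
deleted=$(sample | sed '/^include /d')

printf '%s\n---\n%s\n' "$blanked" "$deleted"
```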
In your case, if you just want to delete the second line, you can use:
sed -i '2d' file
If you want to learn more about Linux commands, the man pages are there for you.
Just open a terminal and type:
man sed
As for your question: without -i, the command above prints the file content to the terminal with the second line deleted, but the input file remains unchanged. To make the change permanent in the source file, use the -i option.
-i[SUFFIX], --in-place[=SUFFIX] :
edit files in place (makes backup if extension supplied)
-r or --regexp-extended :
option is to use extended regular expressions in the script.
s/regexp/replacement/ :
Attempt to match regexp against the pattern space. If successful, replace that portion matched with replacement. The replacement may contain the special character & to refer to that portion of the pattern space which matched, and the special escapes \1 through \9 to refer to the corresponding matching sub-expressions in the regexp.
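g G : Copy/append hold space to pattern space. (These are the g and G commands; the g at the end of s#include##g is a different thing, a flag that replaces every match on the line instead of only the first.)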
grep -v
This is not about learning sed, but as an alternative (and short) solution, there is:
grep -v '^include' filename_in
Or with output redirection:
grep -v '^include' filename_in > filename_out
The -v option makes grep invert the match, so it prints only the non-matching lines.
For simple deletion that's what I'd use; if you have to modify your path after the include, stick with sed instead.
You can use awk to just delete the line:
awk '/^include/ {next}1' file
sed -i -r 's#include##g' 'filename'
-i: modify the processed file directly. By default, sed reads a file and writes the modified content to stdout (the original file stays the same).
-r: use extended regular expressions (instead of POSIX basic ones). This is not necessary here, because the action (the s### command) is simple and works with basic regular expressions.
s#pattern#NewValue#: substitute the pattern (a regular expression) in the current line with NewValue (which may also refer to parts of the match). The traditional form is s///, but since a path contains /, an alternate delimiter is used here to avoid escaping every / in the pattern or new value.
g: a flag of s### that changes EVERY occurrence rather than only the first (the default).
So here it replaces EVERY occurrence of include with nothing (i.e. removes it), directly in your file.
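A short sketch of what the question's command actually does, and of one way to also drop the path while still using # as the delimiter. The pattern ^include /[^ ]* is an assumption: it removes include plus the following space-free path, which covers the sample line but not paths containing spaces. -r is omitted because, as noted above, it isn't needed here.

```shell
#!/bin/sh
line="include /home/user/file.txt'"

# The question's command: only the word "include" goes, the path stays.
original=$(printf '%s\n' "$line" | sed 's#include##g')

# Also remove the path after it; with "#" as the delimiter, the "/"
# in the pattern needs no backslash.
with_path=$(printf '%s\n' "$line" | sed 's#^include /[^ ]*##')

printf '[%s]\n[%s]\n' "$original" "$with_path"
```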
Avinash Raj's solution gives you what you want, but you also asked for an explanation of some of the parameters used in the sed command.
The first one is
command: s for substitution
The substitute command s changes occurrences of the regular expression into a new value (by default only the first occurrence on each line; see /g below). A simple example is changing "my" in file1 to "yours" and writing the result to file2:
sed s/my/yours/ file1 >file2
The character after the s is the delimiter. It is conventionally a slash, because this is what ed, more, and vi use. It can be anything you want, however. If you want to change a pathname that contains a slash - say /usr/local/bin to /common/bin - you could use the backslash to quote the slash:
sed 's/\/usr\/local\/bin/\/common\/bin/' <old >new
/g - Global replacement
Replace all matches, not just the first match.
If you tell it to change a word, it will only change the first occurrence of the word on a line. To make the change for every occurrence on the line instead of just the first, add a g after the last delimiter, e.g. sed 's/my/yours/g' file1 >file2.
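A tiny illustration of the difference the g flag makes, using a made-up input line with two occurrences of "my":

```shell
#!/bin/sh
line='my cat and my dog'

# Without g: only the first "my" on the line is replaced.
first_only=$(printf '%s\n' "$line" | sed 's/my/yours/')

# With g: every "my" on the line is replaced.
all=$(printf '%s\n' "$line" | sed 's/my/yours/g')

printf '%s\n%s\n' "$first_only" "$all"
```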
Delete with d
Delete the pattern space; immediately start next cycle.
You can delete a line by specifying its line number, like:
sed '$d' filename.txt
It removes the last line of the file.
sed '2 d' file.txt
It deletes the second line of the file.
-i option
This option specifies that files are to be edited in-place. GNU sed does this by creating a temporary file and sending output to this file rather than to the standard output.
To actually modify the file, use the -i option; without it, sed writes the changes to stdout, not to the file. You can keep a backup of the original file by using the -i.bak option.
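A sketch combining line-number deletion with an in-place edit that keeps a backup. The suffix is attached directly (-i.bak, no space), a form that both GNU and BSD sed accept; the temporary file stands in for your real file.

```shell
#!/bin/sh
file=$(mktemp)
printf '%s\n' 'abc...' "include /home/user/file.txt'" 'some text' > "$file"

# Delete line 2 in place; the original is saved as "$file.bak".
sed -i.bak '2d' "$file"

edited=$(cat "$file")
backup=$(cat "$file.bak")
rm -f "$file" "$file.bak"
printf 'edited:\n%s\nbackup:\n%s\n' "$edited" "$backup"
```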
-r option
--regexp-extended
Use extended regular expressions rather than basic regular expressions. Extended regexps are those that egrep accepts; they can be clearer because they usually have less backslashes, but are a GNU extension and hence scripts that use them are not portable.
