Replace k-th to n-th characters in 1st line and last line using bash? - bash

I want to replace some characters in the header and footer of a file. If, say, I want to replace the 5th to 9th characters, how do I do it? I need to use bash or a shell command.
I want to do something like this
s="abcdabcd"
s=s[0]+"12"+s[3:]
>a12dabcd
I have a replacement string of exactly the right length, plus the start and end positions of the span to replace. I want to put the generated replacement back into the file.
Example:
I have this header:
HEADER 22aabbccdd23aabbccdd
I get these start and end indices : 2,10
I get this string: xyz56789
I want this: HEADER 22xyz5678923aabbccdd
to replace the existing 1st line in the file.

This can be done with Perl:
perl -i -lpe 'if ($. == 1 || eof) { substr($_, 1, 2) = "12" }' input.txt
-i: modify file in place
-l: automatically strip newlines from input and add them back on output
-p: iterate over lines of the input file and print them back out
-e CODE: what to do for each line
First we check whether the current line number ($.) is 1 (i.e. we're processing the first line of the file) or we have reached the end of the file (i.e. the line currently being processed is the last line of the file). If the condition is true, we take the substring of the current line ($_) starting from offset 1 of length 2 and set it to "12".

Simply with sed:
input.txt:
$ cat input.txt
22aabbccdd23aabbccdd
asasdfsdfd234234234234
$ sed -Ei '1 s/(..).{8}/\1xyz56789/' input.txt
Result:
22xyz5678923aabbccdd
asasdfsdfd234234234234
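Both commands above hard-code the offset and the replacement, and the sed one only touches line 1. As a rough sketch, assuming the replacement overwrites exactly as many characters as it is long, an awk filter can patch the first and the last line with the offset passed in. The function name replace_span and its argument order are made up for illustration, and awk -v interprets backslash escapes in the replacement string.

```bash
#!/usr/bin/env bash
# replace_span START REPLACEMENT   (hypothetical helper; reads stdin, writes stdout)
# Keeps the first START characters of the first and last line, then
# overwrites the following length(REPLACEMENT) characters.
replace_span() {
  awk -v start="$1" -v rep="$2" '
    function patch(s) {
      return substr(s, 1, start) rep substr(s, start + length(rep) + 1)
    }
    NR == 1 { prev = patch($0); next }   # patch the first line and hold it
    { print prev; prev = $0 }            # emit the held line, hold the current one
    END {
      if (NR > 1) prev = patch(prev)     # the held line is now the last line
      if (NR) print prev
    }
  '
}
```

With the example from the question this would be called as replace_span 2 xyz56789 < input.txt > tmp && mv tmp input.txt.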

Related

sed insert line after a match only once [duplicate]

UPDATED:
Using sed, how can I insert (NOT SUBSTITUTE) a new line on only the first match of keyword for each file.
Currently I have the following but this inserts for every line containing Matched Keyword and I want it to only insert the New Inserted Line for only the first match found in the file:
sed -ie '/Matched Keyword/ i\New Inserted Line' *.*
For example:
Myfile.txt:
Line 1
Line 2
Line 3
This line contains the Matched Keyword and other stuff
Line 4
This line contains the Matched Keyword and other stuff
Line 6
changed to:
Line 1
Line 2
Line 3
New Inserted Line
This line contains the Matched Keyword and other stuff
Line 4
This line contains the Matched Keyword and other stuff
Line 6
You can sort of do this in GNU sed:
sed '0,/Matched Keyword/s//New Inserted Line\n&/'
But it's not portable. Since portability is good, here it is in awk:
awk '/Matched Keyword/ && !x {print "Text line to insert"; x=1} 1' inputFile
Or, if you want to pass a variable to print:
awk -v "var=$var" '/Matched Keyword/ && !x {print var; x=1} 1' inputFile
These both insert the text line before the first occurrence of the keyword, on a line by itself, per your example.
Remember that with both sed and awk, the matched keyword is a regular expression, not just a keyword.
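The awk commands above write to standard output, while the question edits several files in place. A minimal sketch of a wrapper, assuming mktemp is available; the name insert_before_first and its argument order are invented for this example, and the keyword is still treated as a regular expression.

```bash
#!/usr/bin/env bash
# insert_before_first KEYWORD_REGEX NEW_LINE FILE...   (hypothetical helper)
insert_before_first() {
  local re=$1 text=$2 f tmp
  shift 2
  for f in "$@"; do
    tmp=$(mktemp) || return 1
    # print the new line once, before the first matching line; copy everything else
    awk -v re="$re" -v text="$text" \
      '$0 ~ re && !seen { print text; seen = 1 } 1' "$f" > "$tmp" &&
      cat "$tmp" > "$f"        # overwrite in place, keeping the file permissions
    rm -f "$tmp"
  done
}
```

Because awk runs once per file, the seen flag starts fresh for each file, which gives one insertion per file.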
UPDATE:
Since this question is also tagged bash, here's a simple solution that is pure bash and doesn't require sed:
#!/bin/bash
n=0
# IFS= and -r keep leading whitespace and backslashes intact
while IFS= read -r line; do
    # the quoted right-hand side is matched literally, not as a regex
    if [[ "$line" =~ 'Matched Keyword' && $n = 0 ]]; then
        echo "New Inserted Line"
        n=1
    fi
    printf '%s\n' "$line"
done
As it stands, this acts as a filter in a pipe. You can easily wrap it in something that acts on files instead.
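If you want one with sed*:
sed '0,/Matched Keyword/s//Matched Keyword\nNew Inserted Line/' myfile.txt
*only works with GNU sed. Note that this places the new text right after the keyword itself, so the rest of the matched line ("and other stuff") ends up on the inserted line.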
This might work for you:
sed -i -e '/Matched Keyword/{i\New Inserted Line' -e ':a;n;ba}' file
You're nearly there! Just create a loop to read from the Matched Keyword to the end of the file.
After inserting a line, the remainder of the file can be printed out by:
Introducing a loop placeholder :a (here a is an arbitrary name).
Printing the current line and fetching the next into the pattern space with the n command.
Redirecting control back using the ba command, which is essentially a goto to the a placeholder. The end-of-file condition is naturally taken care of by the n command, which terminates any further sed commands if it tries to read past the end-of-file.
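The same three steps can also be written one command per line, which is the layout POSIX sed expects for i\ and for labels. This is only a sketch; the function name insert_once is made up, and it reads standard input rather than editing a file.

```bash
#!/usr/bin/env bash
# insert_once: hypothetical filter, stdin to stdout
insert_once() {
  sed '/Matched Keyword/{
i\
New Inserted Line
:a
n
ba
}'
}
```

Once the first match is reached, the :a;n;ba loop copies every remaining line without testing it against the pattern again, so later matches are left alone.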
With a little help from bash, a true one liner can be achieved:
sed $'/Matched Keyword/{iNew Inserted Line\n:a;n;ba}' file
Alternative:
sed 'x;/./{x;b};x;/Matched Keyword/h;//iNew Inserted Line' file
This uses the Matched Keyword as a flag in the hold space and once it has been set any processing is curtailed by bailing out immediately.
If you want to append a line after the first match only, use awk instead of sed, as below:
awk '{print} /Matched Keyword/ && !n {print "New Inserted Line"; n++}' myfile.txt
Output:
Line 1
Line 2
Line 3
This line contains the Matched Keyword and other stuff
New Inserted Line
Line 4
This line contains the Matched Keyword and other stuff
Line 6

sed removing # and ; comments from files up to certain keyword

I have files from which comments and blank lines need to be removed, up to a keyword. The keyword's line number varies. Is it possible to limit a chain of sed substitutions to the lines before a keyword?
This removes all comments and blank lines from the file:
sed -i -e 's/#.*$//' -e 's/;.*$//' -e '/^$/d' file
For example, something like this:
# string1
# string2
some string
; string3
; string4
####
<Keyword_Keep_this_line_and_comments_white_space_after_this>
# More comments that need to be here
; etc.
sed -i '1,/Keyword/{/^[#;]/d;/^$/d;}' file
I would suggest using awk and setting a flag when you reach your keyword:
awk '/Keyword/ { stop = 1 } stop || !/^[[:blank:]]*([;#]|$)/' file
Set stop to true when the line contains Keyword. Do the default action (print the line) when stop is true or when the line doesn't match the regex. The regex matches lines whose first non-blank character is a semicolon or hash, or blank lines. It's slightly different to your condition but I think it does what you want.
The command prints to standard output so you should redirect to a new file and then overwrite the original to achieve an "in-place edit":
awk '...' input > tmp && mv tmp input
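The awk answer above drops whole comment lines only, whereas the sed command in the question also cuts comments that trail other text. A sketch that combines the stop flag with that behaviour follows; the function name strip_until_keyword is hypothetical, and it naively treats every # or ; as the start of a comment, even inside quoted text.

```bash
#!/usr/bin/env bash
# strip_until_keyword KEYWORD_REGEX   (hypothetical filter, stdin to stdout)
strip_until_keyword() {
  awk -v re="$1" '
    $0 ~ re { stop = 1 }          # keyword reached: copy verbatim from here on
    stop    { print; next }
    { sub(/[#;].*$/, "") }        # cut everything from the first # or ;
    !/^[ \t]*$/                   # print what is left unless it is blank
  '
}
```

It can be combined with the temporary-file pattern shown above: strip_until_keyword Keyword < input > tmp && mv tmp input.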
Use grep -n keyword to get the number of the line that contains the keyword.
Use sed -i -e '1,N s/#..., where N is that line number, to remove comments only on lines 1 to N.

How can I retrieve the matching records from mentioned file format in bash

XYZNA0000778800Z
16123000012300321000000008000000000000000
16124000012300322000000007000000000000000
17234000012300323000000005000000000000000
17345000012300324000000004000000000000000
17456000012300325000000003000000000000000
9
XYZNA0000778900Z
16123000012300321000000008000000000000000
16124000012300322000000007000000000000000
17234000012300323000000005000000000000000
17345000012300324000000004000000000000000
17456000012300325000000003000000000000000
9
I have the above file format, from which I want to find a matching record. For example, match a number (7789) on a line starting with XYZ and, once matched, look for a matching number (7345) in the lines below that start with 1, until reaching the line starting with 9. Retrieve the entire line record. How can I accomplish this using a shell script, awk, sed, or any combination?
Expected Output:
XYZNA0000778900Z
17345000012300324000000004000000000000000
With sed one can do:
$ sed -n '/^XYZ.*7789/,/^9$/{/^1.*7345/p}' file
17345000012300324000000004000000000000000
Breakdown:
sed -n ' '        # -n disables automatic printing
/^XYZ.*7789/,     # Range start: a line starting with XYZ and
                  # containing 7789
/^9$/             # Range end: a line that is exactly 9
{ /^1.*7345/p }   # Within the range, print lines starting with 1
                  # and containing 7345
range { stuff } will execute stuff when it's inside range, in this case the range is starting at /^XYZ.*7789/ and ending with /^9$/.
.* will match anything but newlines zero or more times.
If you want to print the whole block matching the conditions, one can use:
$ sed -n '/^XYZ.*7789/{:s;N;/\n9$/!bs;/\n1.*7345/p}' file
XYZNA0000778900Z
16123000012300321000000008000000000000000
16124000012300322000000007000000000000000
17234000012300323000000005000000000000000
17345000012300324000000004000000000000000
17456000012300325000000003000000000000000
9
This works by reading the lines between ^XYZ.*7789 and ^9$ into the pattern space, and then printing the whole thing if ^1.*7345 can be matched:
sed -n ' ' # -n disables printing
/^XYZ.*7789/{ } # Match line starting
# with XYZ that also contains 7789
:s; # Define label s
N; # Append next line to pattern space
/\n9$/!bs; # Goto s unless \n9$ matches
/\n1.*7345/p # Print whole pattern space
# if \n1.*7345 matches
I'd use awk:
awk -v rid=7789 -v fid=7345 -v RS='\n9\n' -F '\n' 'index($1, rid) { for(i = 2; i <= NF; ++i) { if(index($i, fid)) { print $i; next } } }' filename
This works as follows:
-v RS='\n9\n' is the meat of the whole thing. Awk separates its input into records (by default lines). This sets the record separator to \n9\n, which means that records are separated by lines with a single 9 on them. These records are further separated into fields, and
-F '\n' tells awk that fields in a record are separated by newlines, so that each line in a record becomes a field.
-v rid=7789 -v fid=7345 sets two awk variables rid and fid (meant by me as record identifier and field identifier, respectively; the names are arbitrary) to your search strings. You could encode these in the awk script directly, but this way makes it easier and safer to replace the values with those of shell variables (which I expect you'll want to do).
Then the code:
index($1, rid) {                 # In records whose first field contains rid
    for(i = 2; i <= NF; ++i) {   # Walk through the fields from the second to the last (NF)
        if(index($i, fid)) {     # When you find one that contains fid
            print $i             # Print it,
            next                 # and continue with the next record.
        }                        # Remove the "next" line if you want all matching
    }                            # fields.
}
Note that multi-character record separators are not strictly required by POSIX awk, and I'm not certain if BSD awk accepts it. Both GNU awk and mawk do, though.
EDIT: Misread question the first time around.
An extendable awk script could be:
$ awk '/^9$/{s=0} s&&/7345/; /^XYZ/&&/7789/{s=1} ' file
Set flag s when a line starts with XYZ and contains 7789; reset it when the line is just 9; and print when the flag is set and the line contains the pattern 7345.
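The expected output in the question also shows the XYZ header line, which the flag-based script above does not print. A sketch that remembers the header and prints it before the first matching detail line; the function name find_record is made up, and it uses plain substring matching via index() rather than regular expressions.

```bash
#!/usr/bin/env bash
# find_record HEADER_ID DETAIL_ID   (hypothetical filter, stdin to stdout)
find_record() {
  awk -v rid="$1" -v fid="$2" '
    /^XYZ/ { hdr = $0; active = (index($0, rid) > 0); shown = 0; next }
    /^9$/  { active = 0; next }          # trailer line closes the record
    active && /^1/ && index($0, fid) {
      if (!shown) { print hdr; shown = 1 }   # header once per record
      print
    }
  '
}
```

For the sample data, find_record 7789 7345 < file would be the matching call.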
This might work for you (GNU sed):
sed -n '/^XYZ/h;//!H;/^9/!b;x;/^XYZ[^\n]*7789/!b;/7345/p' file
Use the option -n for the grep-like nature of sed. Gather up records beginning with XYZ and ending in 9. Reject any records which do not have 7789 in the header. Print any remaining records that contain 7345.
If the 7345 will always follow the header, this could be shortened to:
sed -n '/^XYZ/h;//!H;/^9/!b;x;/^XYZ[^\n]*7789.*7345/p' file
If all records are well-formed (begin XYZ and end in 9) then use:
sed -n '/^XYZ/h;//!H;/^9/!b;x;/^[^\n]*7789.*7345/p' file

shell: how to read a certain column in a certain line into a variable

I want to extract the first column of the last line of a text file. Instead of writing the content of interest to another file and reading it in again, can I just use some command to read it into a variable directly?
For example, if my file is like this:
...
123 456 789(this is the last line)
What I want is to read 123 into a variable in my shell script. How can I do that?
One approach is to extract the line you want, read its columns into an array, and emit the array element you want.
For the last line:
#!/bin/bash
# ^^^^- not /bin/sh, to enable arrays and process substitution
read -r -a columns < <(tail -n 1 "$filename") # put last line's columns into an array
echo "${columns[0]}" # emit the first column
Alternately, awk is an appropriate tool for the job:
line=2
column=1
var=$(awk -v line="$line" -v col="$column" 'NR == line { print $col }' <"$filename")
echo "Extracted the value: $var"
That said, if you're looking for a line close to the start of a file, it's often faster (in a runtime-performance sense) and easier to stick to shell builtins. For instance, to take the third column of the second line of a file:
{
read -r _ # throw away first line
read -r _ _ value _ # extract third value of second line
} <"$filename"
This works by using _s as placeholders for values you don't want to read.
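The same builtin-only idea can be generalised to an arbitrary line and column. This is a sketch rather than a tuned implementation: the name get_field and its argument order are invented here, and it reads line by line, so it only pays off for lines near the top of the file.

```bash
#!/bin/bash
# get_field LINE COLUMN FILE   (hypothetical helper; needs bash for arrays)
get_field() {
  local line=$1 col=$2 file=$3 i fields
  {
    # discard the lines before the one we want
    for ((i = 1; i < line; i++)); do
      read -r _ || return 1
    done
    # split the wanted line on whitespace; tolerate a missing final newline
    read -r -a fields || [ "${#fields[@]}" -gt 0 ] || return 1
  } < "$file"
  printf '%s\n' "${fields[col - 1]}"
}
```

A call such as value=$(get_field 2 3 "$filename") would then correspond to the example above.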
I guess by "first column" you mean "first word"?
If it is guaranteed that the last line doesn't start with a space, you can do
tail -n 1 YOUR_FILE | cut -d ' ' -f 1
You could also use sed:
$> var=$(sed -nr '$s/(^[^ ]*).*/\1/p' "file.txt")
The -nr tells sed not to output data by default (-n) and to use extended regular expressions (-r, which avoids escaping the parentheses; otherwise you would have to write \( \)). The $ is an address that specifies the last line. The regular expression anchors at the beginning of the line with ^, then matches everything that is not a space with [^ ]* and puts the result into a capture group ( ), then matches the rest of the line with .* and replaces the whole line with the capture group \1. Finally, p prints the line.
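If the last line might start with spaces or tabs, awk's default field splitting avoids the problem, since it ignores leading whitespace. A small sketch, with a made-up function name:

```bash
#!/usr/bin/env bash
# first_of_last FILE   (hypothetical helper)
first_of_last() {
  # remember the first field of every line; the last one remembered wins
  awk '{ first = $1 } END { print first }' "$1"
}
```

The value can then be captured with var=$(first_of_last "$filename").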

Replace text with sed

A program creates HTML files from a database. There are headings and stuff in between the headings.
There are not a set amount of headings.
After each heading the program places the text:
$WHITE*("5")$

$WHITE*("20")$
$HRULE$
I need every occurrence of these 4 lines to be replaced with:
$WHITE*("20")$
$HRULE$
$WHITE*("10")$
I am not fussed what program is used :)
I have tried:
sed 's:\$WHITE\*(\"5\")\$\n\n\$WHITE\*(\"20\")\$\n\$HRULE\$:\$WHITE\*(\"20\")\$\
\$HRULE$\
\$WHITE*("10")$:g'
and various other permutations
If that's your input file, and this is the spec, you can do:
sed -n '3,$p;$a$WHITE*("10")$' INPUTFILE
But I assume that's not the case, so you might want to rephrase your question and/or give some more details.
More specific solution with sed:
sed '/^\$WHITE\*("5")\$$/,/^$/d;/\$HRULE\$/ a$WHITE*("10")$' INPUTFILE
(Searches for the $WHITE*("5")$ line and deletes from there up to and including the next empty line. Then searches for the next $HRULE$ line and appends a $WHITE*("10")$ line.)
awk solution:
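awk '/\$WHITE\*\("5"\)\$/ { getline ; next }   # drop this line and the line after it
/\$WHITE\*\("20"\)\$/ { print ;
    getline ;
    if ($0 ~ /\$HRULE\$/) { print ;
        print "$WHITE*(\"10\")$" ;
    }
    else { print }
    next   # this line is already printed; skip the catch-all rule
}
1 ' INPUTFILE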
This reads the file and prints every line (that's what the 1 is for), except that when it finds the $WHITE*("5") pattern it drops that line, reads the next line, and drops that too. If it finds $WHITE*("20"), it prints it and reads the next line; if that is $HRULE$, it prints it followed by the appended $WHITE*("10") line. Otherwise it just prints the line.
HTH
UPDATE #2
From the sed faq, section 4.23.3
If you need to match a static block of text (which may occur any number of times throughout a file), where the contents of the block are known in advance, then this script is easy to use
UPDATE #1
Python?
$ cat input
first line
second line
3rd line
$WHITE*("5")$
$WHITE*("20")$
$HRULE$
some more lines
yet another
$WHITE*("5")$
$WHITE*("20")$
$HRULE$
THE END
the script:
#!/usr/bin/env python
## Use these 3 lines for python version < 2.5
#fd=open('input')
#text=fd.read()
#fd.close()
## Use these 2 lines for python version >= 2.5
with open('input') as fd:
    text=fd.read()
old="""$WHITE*("5")$
$WHITE*("20")$
$HRULE$
"""
new="""$WHITE*("20")$
$HRULE$
$WHITE*("10")$
"""
print(text.replace(old,new))
output:
first line
second line
3rd line
$WHITE*("20")$
$HRULE$
$WHITE*("10")$
some more lines
yet another
$WHITE*("20")$
$HRULE$
$WHITE*("10")$
THE END
Try something like
sed -e '${p;};/$WHITE\*("5")\$/,/$HRULE\$/{H;/$HRULE\$/{g;s/$HRULE\$//;s/20/10/;s/5/20/;s/\n/&$HRULE$/2p;s/.*//p;x;d;};d;};' white.txt
Crude, but it should work.
This might work for you:
sed '/^\$WHITE\*(\"5\")\$/{N;N;N;s/.*\n\n\(\(\$WHITE\*(\"\)20\(\")\$\s*\)\n\$HRULE\$\s*$\)/\1\n\210\3/}' file
Explanation:
Match on first string $WHITE*("5")$, read the next 3 lines and match on remainder. Use grouping and back references to formulate output lines.
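Since the block to replace is fixed text, it can also be matched with exact string comparisons, which avoids escaping $, * and ( in regular expressions. The following awk sketch assumes the block is exactly four lines with an empty second line, as the \n\n in the question's attempt suggests; the function name fix_spacing is made up for illustration.

```bash
#!/usr/bin/env bash
# fix_spacing: hypothetical filter, stdin to stdout
fix_spacing() {
  awk '
    BEGIN {
      w5  = "$WHITE*(\"5\")$"
      w20 = "$WHITE*(\"20\")$"
      w10 = "$WHITE*(\"10\")$"
      hr  = "$HRULE$"
    }
    # Print lines held from an incomplete block, unchanged.
    function flush(   i) { for (i = 1; i <= n; i++) print held[i]; n = 0 }
    n == 0 && $0 == w5  { held[++n] = $0; next }
    n == 1 && $0 == ""  { held[++n] = $0; next }
    n == 2 && $0 == w20 { held[++n] = $0; next }
    n == 3 && $0 == hr  { print w20; print hr; print w10; n = 0; next }
    # Anything else: release held lines, then restart or pass the line through.
    { flush(); if ($0 == w5) { held[++n] = $0 } else { print } }
    END { flush() }
  '
}
```

Lines are held back only while they match the block so far, so a partial block (for example a $WHITE*("5")$ line followed by ordinary text) passes through unchanged.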

Resources