create simple lingvo .dsl dict using sed or awk? - bash

I have some text files and every file contains a definition for a word and looks like this:
word1
<TAB> some text
<TAB> some text
title 1
<TAB> some text
<TAB> some text
title 2
<TAB> some text
.
.
I want to create a simple lingvo .DSL dictionary so the desired output should be like this :
word1
[m2][trn]
<TAB> some text
<TAB> some text
[b]title 1[/b]
<TAB> some text
<TAB> some text
[b]title 2[/b]
<TAB> some text
<TAB> some text
.
.
[/m2][/trn]
so what I need to do is :
add [m2][trn] after the first word .
if a line begins with a letter or a number (not a tab) so it's a title and should be [b]title[/b]
Add [/m2][/trn] to the end of the file .
Any help will be appreciated.

This sed command should do it:
sed -e '1s/$/\n[m2][trn]/' \
-e '1!s/^[[:alnum:]].*/[b]&[\/b]/' \
-e '$s/$/\n[\/m2][\/trn]/' \
file

I'd say
sed '1! { /^[[:alnum:]]/ s/.*/[b]&[\/b]/; }; 1 s/$/\n[m2][trn]/; $ s/$/\n[\/trn][\/m2]/' filename
That is:
1! { /^[[:alnum:]]/ s/.*/[b]&[\/b]/; } # If the current line is not the first and
# starts with a letter or number, encase
# it in [b][/b]
1 s/$/\n[m2][trn]/ # If the current line is the first, put
# [m2][trn] behind it
$ s/$/\n[\/trn][\/m2]/ # If the current line is the last, put
# [/trn][/m2] behind it.

analysis
add [m2][trn] after the first word.
use a head splitter to handle the first line differently from the rest.
and just printf this start tag.
if a line begins with a letter or a number (not a tab) so it's a title and should be [b]title[/b]
sed to search for lines starting with word characters \w
Add [/m2][/trn] to the end of the file .
printf to add end tag
example script
head -n 1 input.txt 1>output.txt;
printf "[m2][trn]\n" 1>>output.txt;
tail -n +2 input.txt |
sed 's/^\(\w\+.\+\)/[b]\1[\/b]/g' 1>>output.txt;
printf "[/m2][/trn]\n" 1>>output.txt;
output
word1
[m2][trn]
some text
some text
[b]title 1[/b]
some text
some text
[b]title 2[/b]
some text
[/m2][/trn]
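Since the question also mentions awk, the same three rules can be sketched as one awk program. This is only a sketch and is not taken from any of the answers: the sample input is made up, and body lines are assumed to start with a literal tab.

```shell
# Hypothetical sample input; body lines start with a real tab character.
input=$(mktemp)
printf 'word1\n\tsome text\ntitle 1\n\tsome text\n' > "$input"

dsl=$(awk '
  NR == 1        { print; print "[m2][trn]"; next }  # headword, then the opening tags
  /^[[:alnum:]]/ { print "[b]" $0 "[/b]"; next }     # title line: wrap it in [b]...[/b]
                 { print }                           # body line: print unchanged
  END            { print "[/m2][/trn]" }             # closing tags after the last line
' "$input")

printf '%s\n' "$dsl"
rm -f "$input"
```

Unlike the sed one-liners, this does not rely on \n being understood in the replacement text, which is a GNU sed extension.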

Related

How to replace a whole line (between 2 words) using sed?

Suppose I have text as:
This is a sample text.
I have 2 sentences.
text is present there.
I need to replace whole text between two 'text' words. The required solution should be
This is a sample text.
I have new sentences.
text is present there.
I tried using the below command but its not working:
sed -i 's/text.*?text/text\
\nI have new sentence/g' file.txt
With your shown samples, please try the following. sed doesn't support lazy matching in regular expressions. Using awk with RS set to the empty string (paragraph mode), you can do the substitution in one step; note that this is written against your shown samples only. Create a variable val that holds the new value; a simple sub() call in awk then does the rest to produce your expected output.
awk -v val="your_new_line_Value" -v RS="" '
{
sub(/text\.\n*[^\n]*\n*text/,"text.\n"val"\ntext")
}
1
' Input_file
The above code prints its output to the terminal. Once you are happy with the result and want to save it back into Input_file itself, try the following code.
awk -v val="your_new_line_Value" -v RS="" '
{
sub(/text\.\n*[^\n]*\n*text/,"text.\n"val"\ntext")
}
1
' Input_file > temp && mv temp Input_file
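To see why paragraph mode matters here, the following sketch runs the same substitution on the three sample lines from the question. Because the sample contains no blank lines, RS="" makes awk read the whole file as one record, so the regex can match across newlines. The temporary file and the result variable are illustrative only.

```shell
# Sample file from the question; no blank lines, so it is a single record.
f=$(mktemp)
printf '%s\n' 'This is a sample text.' 'I have 2 sentences.' 'text is present there.' > "$f"

result=$(awk -v val='I have new sentences.' -v RS="" '
{
  # Replace everything from the first "text." through the next "text".
  sub(/text\.\n*[^\n]*\n*text/, "text.\n" val "\ntext")
}
1
' "$f")

printf '%s\n' "$result"
rm -f "$f"
```

If the real file had blank lines between the two markers, they would fall into different records and the pattern would not match. The value of val is also assumed to contain no & or backslashes, which are special in the replacement string of sub().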
You have already solved your problem using awk, but in case anyone else will be looking for a sed solution in the future, here's a sed script that does what you needed. Granted, the script is using some advanced sed features, but that's the fun part of it :)
replace.sed
#!/usr/bin/env -S sed -nEf
# This pattern determines the start marker for the range of lines where we
# want to perform the substitution. In our case the pattern is any line that
# ends with "text." — the `$` symbol meaning end-of-line.
/text\.$/ {
# [p]rint the start-marker line.
p
# Next, we'll read lines (using `n`) in a loop, so mark this point in
# the script as the beginning of the loop using a label called `loop`.
:loop
# Read the next line.
n
# If the last read line doesn't match the pattern for the end marker,
# just continue looping by [b]ranching to the `:loop` label.
/^text/! {
b loop
}
# If the last read line matches the end marker pattern, then just insert
# the text we want and print the last read line. The net effect is that
# all the previous read lines will be replaced by the inserted text.
/^text/ {
# Insert the replacement text
i\
I have a new sentence.
# [print] the end-marker line
p
}
# Exit the script, so that we don't hit the [p]rint command below.
b
}
# Print all other lines.
p
Usage
$ cat lines.txt
foo
This is a sample text.
I have many sentences.
I have many sentences.
I have many sentences.
I have many sentences.
text is present there.
bar
$
$ ./replace.sed lines.txt
foo
This is a sample text.
I have a new sentence.
text is present there.
bar
Substitute
sed -i 's/I have 2 sentences./I have new sentences./g' file.txt
sed -i 's/[A-Z]\s[a-z].*/I have new sentences./g' file.txt
Insert
sed -i -e '2iI have new sentences.' -e '2d' file.txt
I need to replace whole text between two 'text' words.
If I understand correctly, the first text. (with a dot) is at the end of the first line and the second text is at the beginning of the third line. With awk you can get the required output by accumulating the pieces in the variable s:
awk -v s='\nI have new sentences.\n' '/text.?$/ {s=$0 s;next} /^text/ {s=s $0;print s;s=""}' file
This is a sample text.
I have new sentences.
text is present there.
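The one-liner above prints only the lines around the two markers. For a file with other content before and after, a small flag-based variant might look like the sketch below. The variable names inside and new and the sample data are assumptions for illustration.

```shell
f=$(mktemp)
printf '%s\n' foo 'This is a sample text.' 'I have many sentences.' \
  'I have many sentences.' 'text is present there.' bar > "$f"

result=$(awk -v new='I have new sentences.' '
  inside && /^text/ { print new; inside = 0 }  # end marker: emit replacement, stop skipping
  !inside           { print }                  # outside the block: print the line unchanged
  /text\.$/         { inside = 1 }             # start marker: skip the lines that follow
' "$f")

printf '%s\n' "$result"
rm -f "$f"
```

The order of the rules matters: the start marker is printed before the flag is set, and the end marker clears the flag before the print rule runs, so both marker lines survive.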

replace ending brackets with another string in bash

I would like to replace the last 3 lines with another string.. using sed, tr, or other bash solution.
Given file:
{
[
{
text text text
text text text
text text text
}
],
[
{
text text text
text text text
text text text
}
]
}
desired result:
{
[
{
text text text
text text text
text text text
}
],
[
{
text text text
text text text
text text text
bar
I tried this with sed
sed -i '' 's/\}\s+\]\s+\}/bar/g' foobar.hcl
tried this with tr
tr -s 's/\}[:blank:]\][:blank:]\}/bar/g' <foobar.hcl
With perl, you can read the entire input as a single string using the -0777 option. This is not suitable if the input is large enough to exhaust the available memory.
# this will replace all remaining whitespaces at the end
# with a single newline
perl -0777 -pe 's/\}\s+]\s+\}\s*\z/bar\n/' foobar.hcl
# this will preserve all remaining whitespaces, if any
perl -0777 -pe 's/\}\s+]\s+\}(?=\s*\z)/bar/' foobar.hcl
Once it is working, you can use perl -i -0777 ... for in-place editing.
This might work for you (GNU sed):
sed '1N;:a;N;/^\s*}\s*\n\s*]\s*\n}\s*$/{s//bar/;N;ba};P;D' file
Open a 3 line window and pattern match.
Using an array - assumes "text text text" has some actual nonspace, non-punctuation characters.
mapfile -t x < file # read the lines into an array, without trailing newlines
c=${#x[@]} # count the lines
until [[ "${x[-1]}" =~ [^[:space:][:punct:]] ]] # until the last line has data
do let c-- # one line fewer to keep
x=( "${x[@]:0:$c}" ) # reassign array without last line
done
x+=( bar ) # add the desired string
printf '%s\n' "${x[@]}" > file # write file without unwanted lines
Allows for any number of blank lines &c. Even }]} and such, so long as it isn't on the same line with the data.
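If the file is known to end with exactly those three bracket lines and no trailing blank lines, a simpler sketch is to count the lines and keep all but the last three. This is an assumption-laden shortcut, not a replacement for the pattern-based answers above, and the sample file is shortened for illustration.

```shell
f=$(mktemp)
printf '%s\n' '{' '[' '{' 'text text text' '}' ']' '}' > "$f"

total=$(wc -l < "$f")   # number of lines in the file
keep=$((total - 3))     # everything except the last three lines
result=$(head -n "$keep" "$f"; printf 'bar\n')

printf '%s\n' "$result"
rm -f "$f"
```

To edit the file in place, the output would be redirected to a temporary file and moved over the original. A file with fewer than three lines would need an extra check before calling head.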

Replace one string with the content pulled from other file

I have a TARGET.md file. I'm looking for a string in it that I want to replace with the content of another md file. I have tried many combinations, but sed seems to choke on the newlines in the file. I need to do this in pure bash (it doesn't have to be sed), because that is how the whole script runs:
This works:
local search="##### Header"
local replace="##### Header\\
\\
Line 1\\
Line 2\\
Line 3\\
Line 4"
sed -i '' -e "s/${search}/${replace}/" TARGET.md
But this won't:
file1.md content:
##### Header
Line 1
Line 2
Line 3
Line 4
Script:
local search="##### Header"
local replace=$(curl "path/to/file/in/other/place/file1.md")
sed -i '' -e "s/${search}/${replace}/" TARGET.md
NOTE: I don't have the file1.md in the same place, I'm doing a curl to get the raw content from it, this is why the replace is in a variable.
I assume this is possible and that my sed syntax is simply wrong, since sed can handle newlines out of the box, but I'm not sure what the proper way to do it is.
I've been searching for some days now, any help, tip or guide is appreciated!
You are using the wrong tool. sed is a line editor at heart. While you can repeatedly append to pattern space in some instances, awk with getline provides a more flexible solution. For example with your file1.md:
##### Header
Line 1
Line 2
Line 3
Line 4
and your TARGET.md as:
##### Unreleased
my dog
has fleas
Then, to replace "##### Unreleased" with the content of file1.md, you can do:
awk -v replace="file1.md" -v search="##### Unreleased" '
$0 == search {while ((getline line < replace) > 0) { print line }; next }
{ print }
' TARGET.md
Above, you have your replace and search variables as with sed, but instead of using the line editor, you use awk to locate the line that equals search and then read all lines from replace using getline. The second rule just prints all other lines as is.
Example Use/Output
In the directory containing each file, you can simply select-copy the above and middle-mouse paste into the terminal to test:
$ awk -v replace="file1.md" -v search="##### Unreleased" '
> $0 == search {while ((getline line < replace) > 0) { print line }; next }
> { print }
> ' TARGET.md
##### Header
Line 1
Line 2
Line 3
Line 4
my dog
has fleas
Look things over and let me know if you have further questions.
Taking TARGET.md file from David's answer:
cat TARGET.md
##### Unreleased
my dog
has fleas
You can run sed with r command like this:
search="##### Unreleased"
sed -e "/$search/{r file1.md" -e ';d;}' TARGET.md
##### Header
Line 1
Line 2
Line 3
Line 4
my dog
has fleas
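Both answers read the replacement from a local file, while the question holds it in a shell variable filled by curl. One way to use the variable directly, sketched here under that assumption, is to hand both strings to awk through the environment. The printf call stands in for the curl command, and the temporary file stands in for TARGET.md.

```shell
target=$(mktemp)
printf '%s\n' '##### Unreleased' 'my dog' 'has fleas' > "$target"

# Stand-in for: replace=$(curl "path/to/file/in/other/place/file1.md")
replace=$(printf '%s\n' '##### Header' 'Line 1' 'Line 2')
search='##### Unreleased'

# Pass both strings through the environment: unlike -v assignments,
# ENVIRON values are not subject to backslash-escape processing.
result=$(search="$search" replace="$replace" awk '
  $0 == ENVIRON["search"] { print ENVIRON["replace"]; next }  # swap the matching line
  { print }                                                   # keep all other lines
' "$target")

printf '%s\n' "$result"
rm -f "$target"
```

Because the comparison is a plain string equality, characters such as # or / in the search text need no escaping. It only matches a line that equals the search string exactly.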

Replace k-th to n-th characters in 1st line and last line using bash?

I want to replace some characters in header and footer of a file. If say, I want to replace 5th to 9th character how do I do it? I need to use bash or a shell command.
I want to do something like this
s="abcdabcd"
s=s[0]+"12"+s[3:]
>a12dabcd
I have a string of exact length I can substitute and the start and end of replacement. I want to put the generated replacement back into the file.
Example:
I have this header:
HEADER 22aabbccdd23aabbccdd
I get these start and end indices : 2,10
I get this string: xyz56789
I want this: HEADER 22xyz5678923aabbccdd
to replace the existing 1st line in the file.
This can be done with Perl:
perl -i -lpe 'if ($. == 1 || eof) { substr($_, 1, 2) = "12" }' input.txt
-i: modify file in place
-l: automatically strip newlines from input and add them back on output
-p: iterate over lines of the input file and print them back out
-e CODE: what to do for each line
First we check whether the current line number ($.) is 1 (i.e. we're processing the first line of the file) or we have reached the end of the file (i.e. the line currently being processed is the last line of the file). If the condition is true, we take the substring of the current line ($_) starting from offset 1 of length 2 and set it to "12".
Simply with sed:
input.txt:
$ cat input.txt
22aabbccdd23aabbccdd
asasdfsdfd234234234234
$ sed -Ei '1 s/(..).{8}/\1xyz56789/' input.txt
Result:
22xyz5678923aabbccdd
asasdfsdfd234234234234
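The sed command above only touches the first line and hard-codes the offsets. A sketch that applies the same splice to both the first and the last line, with the offset and replacement in shell variables, could look like this. The variable names start, new and len are made up for illustration, and the replacement is assumed to contain no /, & or backslash.

```shell
f=$(mktemp)
printf '%s\n' '22aabbccdd23aabbccdd' 'asasdfsdfd234234234234' '22aabbccdd23aabbccdd' > "$f"

start=2            # number of characters to keep before the replacement
new='xyz56789'     # replacement text; its length decides how many characters are overwritten
len=${#new}

# In pure bash the same splice on a single string would be:
#   s="${s:0:start}${new}${s:start+len}"
result=$(sed -E -e "1s/^(.{$start}).{$len}/\\1$new/" \
                -e "\$s/^(.{$start}).{$len}/\\1$new/" "$f")

printf '%s\n' "$result"
rm -f "$f"
```

The address 1 selects the header and $ selects the footer; every line in between passes through unchanged.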

sed removing # and ; comments from files up to certain keyword

I have files from which comments and blank lines need to be removed, but only up to a keyword. The line number of the keyword varies. Is it possible to limit several chained sed substitutions to the lines before the keyword?
This removes all comments and white spaces from file :
sed -i -e 's/#.*$//' -e 's/;.*$//' -e '/^$/d' file
For example something like this :
# string1
# string2
some string
; string3
; string4
####
<Keyword_Keep_this_line_and_comments_white_space_after_this>
# More comments that need to be here
; etc.
sed -i '1,/Keyword/{/^[#;]/d;/^$/d;}' file
I would suggest using awk and setting a flag when you reach your keyword:
awk '/Keyword/ { stop = 1 } stop || !/^[[:blank:]]*([;#]|$)/' file
Set stop to true when the line contains Keyword. Do the default action (print the line) when stop is true or when the line doesn't match the regex. The regex matches lines whose first non-blank character is a semicolon or hash, or blank lines. It's slightly different to your condition but I think it does what you want.
The command prints to standard output so you should redirect to a new file and then overwrite the original to achieve an "in-place edit":
awk '...' input > tmp && mv tmp input
Use grep -n keyword to get the line number that contains the keyword.
Use sed -i -e '1,N s/#..., where N is the line number that contains the keyword, to only remove comments on the lines 1 to N.
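Put together, the two steps might look like the sketch below. It reuses the three expressions from the question and restricts each one to lines 1 through N. The sample file and the literal keyword Keyword are assumptions for illustration.

```shell
f=$(mktemp)
printf '%s\n' '# string1' 'some string' '; string3' '' '####' \
  '<Keyword>' '# More comments that need to be here' '; etc.' > "$f"

# Line number of the first line containing the keyword.
n=$(grep -n 'Keyword' "$f" | head -n 1 | cut -d: -f1)

# Apply the original three expressions only to lines 1..n.
result=$(sed -e "1,${n}s/#.*\$//" -e "1,${n}s/;.*\$//" -e "1,${n}{/^\$/d;}" "$f")

printf '%s\n' "$result"
rm -f "$f"
```

If the keyword is not found, n is empty and sed reports an address error, so a real script would want to check for that before calling sed.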
