Replace file line with multi-line, special char string - bash

I'm trying to automate generating a README.md.
The idea is:
1. Generate markdown table string like...
table="| Image | Name | Description | Notes |\n"
table+="| --- | --- | --- | --- |\n"
table+="| $img1 | $name1 | $desc1 | $notes1 |\n"
table+="| $img2 | $name2 | $desc2 | $notes2 |\n"
...
*simplified
*contains special characters like e.g. |-()[]/<>
2. Replace <!-- insert-table-here --> in a readme_template.md file with the full table
## Header
<!-- insert-table-here -->
<sub>More info...</sub>
3. Save new file as README.md
I can't get step 2 working.
How do you replace a line in a file with a multi-line string that is full of special characters?
Every sed, awk, perl, or even head/tail command I try seems to not work. Are heredocs the better approach?
I have found some hack solutions for specific cases with specific chars but I want to identify a more robust method.
EDIT: Thanks to @potong, this is what ended up working for me.
echo -e "${table}" | sed -e '/<!-- insert-table-here -->/{r /dev/stdin' -e 'd}' readme_template.md > README.md
EDIT 2: After spending some more time on this, I found a nice multi-match option through awk
awk \
-v t1="$(generate_table1)" \
-v t2="$(generate_table2)" \
'{
gsub(/<!-- insert-table-1 -->/,t1)
gsub(/<!-- insert-table-2 -->/,t2)
}1' \
readme_template.md > README.md
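One caveat with the awk version: values passed with -v have backslash escapes processed, and an & in the gsub replacement stands for the matched text, so a table cell containing & or \ may come out altered. Below is a minimal sketch of a literal replacement, assuming the text is handed over through the environment (ENVIRON) and spliced in with index and substr. The name replace_marker and the sample file are hypothetical, not part of the original post.

```shell
#!/usr/bin/env bash
# replace_marker MARKER TEXT FILE
# Hypothetical helper: prints FILE with every occurrence of MARKER replaced
# by TEXT, treating both as literal strings (no regex, no & or \ handling).
# Assumes MARKER is non-empty.
replace_marker() {
  local marker=$1 text=$2 file=$3
  MARKER="$marker" TEXT="$text" awk '
    BEGIN { m = ENVIRON["MARKER"]; t = ENVIRON["TEXT"] }
    {
      out = ""; line = $0
      # Copy everything before each marker, then the replacement text.
      while ((i = index(line, m)) > 0) {
        out = out substr(line, 1, i - 1) t
        line = substr(line, i + length(m))
      }
      print out line
    }' "$file"
}

# Demo on a throwaway template.
tmp=$(mktemp)
printf '%s\n' '## Header' '<!-- insert-table-1 -->' '<sub>More info...</sub>' > "$tmp"
replace_marker '<!-- insert-table-1 -->' '| a & b | c\d |' "$tmp"
rm -f "$tmp"
```

Calling it once per marker and piping the result along (or extending the awk loop) would cover the multi-table case.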

This might work for you (GNU sed and bash):
cat <<\! | sed -e '/<!-- insert-table-here -->/{r /dev/stdin' -e 'd}' file
Here is a heredoc
with special symbols
{}-()[]/<>
!
The heredoc is piped through to the sed command using /dev/stdin as a file for the r command, then the original line is deleted using the d command.
N.B. Note the use of the -e command line option to split the sed script (a one-liner) into two parts. This is necessary because the r command must be terminated by a newline, and the -e option provides this functionality.
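The same idea can be wrapped in a small function. This is only a sketch under a few assumptions: GNU sed, a placeholder that contains no slashes or regex metacharacters, and a hypothetical helper name (fill_template). It uses printf '%s\n' rather than echo -e so that backslashes in the table text are passed through untouched.

```shell
#!/usr/bin/env bash
# fill_template TEMPLATE PLACEHOLDER TEXT
# Hypothetical helper: prints TEMPLATE with each line matching PLACEHOLDER
# replaced by TEXT. Relies on GNU sed reading /dev/stdin for the r command.
fill_template() {
  local template=$1 placeholder=$2 text=$3
  # printf '%s\n' writes the text verbatim; sed queues it with r and then
  # deletes the placeholder line with d.
  printf '%s\n' "$text" |
    sed -e "/$placeholder/{r /dev/stdin" -e 'd}' "$template"
}

# Demo on a throwaway template.
tmp=$(mktemp)
printf '%s\n' '## Header' '<!-- insert-table-here -->' '<sub>More info...</sub>' > "$tmp"
table='| Image | Name |
| --- | --- |
| ![a](img/a.png) | a\b & <c> |'
fill_template "$tmp" '<!-- insert-table-here -->' "$table"
rm -f "$tmp"
```

Redirecting the function's output to README.md gives the final file.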

Related

Bash script generates line break - Why?

I made a small bash script which gets the current air pressure from a website and writes it into a variable. After the air pressure I would like to add the current date and write everything into a text file. The target should be a kind of CSV file.
My problem: I always get a line break between the air pressure and the date. Attempts to remove the line break with sed or tr '\n' have failed.
2nd guess from me: wget is done "too late" and echo is already done.
So I tried it with && between all commands. Same result.
Operating system is Linux. Where is my thinking error?
I can't get any further right now. Thanks in advance.
Sven
PS.: These are my first attempts with sed. This can be written certainly nicer ;)
#!/bin/bash
luftdruck=$(wget 'https://www.fg-wetter.de/aktuelle-messwerte/' -O aktuell.html && cat aktuell.html | grep -A 0 hPa | sed -e 's/<[^>]*>//g' | sed -e '/^-/d' | sed -e '/title/d' | sed -e 's/ hPa//g')
datum=$(date)
echo -e "${luftdruck} ${datum}" >> ausgabe.txt
Replace sed -e 's/ hPa//g') with sed -e 's/ hPa//g' | dos2unix) to convert the DOS/Windows line ending (carriage return + line feed) into a plain Unix line feed.
The html file you download is using Windows line endings (Carriage Return \r + Line Feed \n). I assume your bash script only removes \ns, but the editor you are using to view the file is showing the \r as a linebreak.
Therefore, you could pipe everything through tr -d \\r\\n which would remove all line breaks.
But there is a better alternative: Extract only the important part instead of whole lines.
luftdruck=$(
wget 'https://www.fg-wetter.de/aktuelle-messwerte/' -O - |
grep -o '[^>]*hPa' | tr -dc 0-9.
)
echo "$luftdruck $(date)" >> ausgabe.txt
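To see the effect without hitting the website, the extraction can be tried on a made-up line. The sketch below assumes a sample that merely imitates a page served with CRLF line endings; the sample value and the helper name extract_pressure are hypothetical.

```shell
#!/usr/bin/env bash
# extract_pressure is a hypothetical helper that reads HTML on stdin.
extract_pressure() {
  # Keep only the text directly in front of "hPa", then drop every
  # character that is not a digit or a dot (this also removes \r and \n).
  grep -o '[^>]*hPa' | tr -dc '0-9.'
}

# Made-up sample line with a Windows line ending.
sample=$'<td>1013.2 hPa</td>\r\n'

# Whole-line approach from the question: od -c shows that the \r is still there.
printf '%s' "$sample" | sed -e 's/<[^>]*>//g' -e 's/ hPa//g' | od -c | head -n 2

# Extraction approach: value and date end up on one line.
luftdruck=$(printf '%s' "$sample" | extract_pressure)
printf '%s %s\n' "$luftdruck" "$(date)"
```

If the page contained several hPa values, tr would glue their digits together, so the grep pattern would need to be narrowed in that case.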

Shorten sed substitution or possible alternative

I have some data being fed in from a few files. The requirement is to format the textual contents of these files and add newlines after formatting.
Requirement of substitution:
Text | Substituted
-----------------------
#Network | #Network
# Network | #Network
#Daemon | #Daemon
# Daemon | #Daemon
#Service | #Service
# Service | #Service
----------------------
I've tried using sed to do this, but the command gets huge and cluttered, as the substitution is not limited to the letters N, D and S: more and more capital letters get added to the requirement day by day.
cat results_090316.out | sed -e 's/ //g' -e 's/#N/#N/g' -e 's/#S/#S/g' -e 's/#D/#D/g' -e 's/# N/#N/g' -e 's/# S/#S/g' -e 's/# D/#D/g' | tr '#' '\n'
If sed is not the proper tool to perform such substitutions, could you suggest an alternative?
The code is written in bash on RHEL 6 / Solaris 10 OS.
You can shorten it using a character class and optional space matching:
sed 's/ //g; s/# *\([NDS]\)/#\1/g' results_090316.out
Your choice of tool is alright, but you're not using the full power of regular expressions. For example, below I use a "character class" to create a custom group of characters to match, e.g. [NSD], and then use a "backreference" (\1) by first "capturing" a piece of the search (with \( and \)):
cat results_090316.out | sed -e 's/ //g' -e 's/#\([NSD]\)/#\1/g' -e 's/# \([NSD]\)/#\1/g' | tr '#' '\n'
But we can do better and use the ? "quantifier" (zero or one of the preceding atom) to combine the no-space and space cases. Note that \? is a GNU sed extension; the portable spelling is \{0,1\}, which may matter on Solaris:
cat results_090316.out | sed -e 's/ //g' -e 's/# \?\([NSD]\)/#\1/g' | tr '#' '\n'
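Since new capital letters keep being added to the requirement, the character class can be widened to a range so the command never has to change. This is a sketch under the assumption that every tag starts with an ASCII capital letter and that only the spaces between # and that letter should go; normalize_tags is a hypothetical name. It sticks to `*`, which is plain POSIX syntax.

```shell
#!/usr/bin/env bash
# normalize_tags is a hypothetical helper that reads text on stdin.
normalize_tags() {
  # "# *" matches the hash plus any number of spaces (including none);
  # \([A-Z]\) captures the capital letter that follows so it can be kept.
  # LC_ALL=C keeps [A-Z] limited to ASCII capitals.
  LC_ALL=C sed 's/# *\([A-Z]\)/#\1/g'
}

printf '%s\n' '# Network #Daemon #  Service # Zebra' | normalize_tags
# prints: #Network #Daemon #Service #Zebra
```

The tr '#' '\n' step from the question can be appended to the pipeline unchanged.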

Removing text in unix shell

Sorry, I'm pretty new to coding. I'm just trying to remove the CST that follows the end of the string. The final output that I'm trying to get says "Sunset: 4:38 PM CST". Exclude the quotation marks.
Here is the code that I'm using within the shell.
curl http://m.wund.com/US/MN/Winona.html | grep 'Sunset' | sed -e :a -e 's/<[^>]*>//g;/</N;//ba' | sed -e 's/Sunset/Sunset: /g' | sed -e 's/PST//g'
Just change:
... | sed -e 's/PST//g'
to
... | sed -e 's/CST//g'
You might also want to invoke curl -s instead of just curl to suppress the download progress output.
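Hard-coding the abbreviation is what caused the PST/CST mix-up in the first place. A hedged alternative is to strip whatever upper-case zone abbreviation ends the line; this sketch assumes the zone is three or four ASCII capitals at the very end of the line, and strip_zone is a hypothetical name. The sample line is taken from the question rather than fetched from the site.

```shell
#!/usr/bin/env bash
# strip_zone is a hypothetical helper that reads text on stdin.
strip_zone() {
  # Remove a trailing run of 3-4 capital letters and the spaces before it.
  # "PM" is only two letters, so it is left alone.
  LC_ALL=C sed 's/ *[A-Z]\{3,4\}$//'
}

printf '%s\n' 'Sunset: 4:38 PM CST' | strip_zone
# prints: Sunset: 4:38 PM
```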

Concise and portable "join" on the Unix command-line

How can I join multiple lines into one line, with a separator where the new-line characters were, and avoiding a trailing separator and, optionally, ignoring empty lines?
Example. Consider a text file, foo.txt, with three lines:
foo
bar
baz
The desired output is:
foo,bar,baz
The command I'm using now:
tr '\n' ',' <foo.txt |sed 's/,$//g'
Ideally it would be something like this:
cat foo.txt |join ,
What's:
1. the most portable, concise, readable way?
2. the most concise way using non-standard Unix tools?
Of course I could write something, or just use an alias. But I'm interested to know the options.
Perhaps a little surprisingly, paste is a good way to do this:
paste -s -d","
This won't deal with the empty lines you mentioned. For that, pipe your text through grep, first:
grep -v '^$' | paste -s -d"," -
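Wrapped in a function, this gets close to the `join ,` syntax wished for in the question. A small sketch, with join_lines as a hypothetical name (chosen to avoid clashing with the standard join utility):

```shell
#!/usr/bin/env bash
# join_lines [SEP]
# Hypothetical helper: joins the non-empty lines from stdin with SEP
# (default: a comma). paste cycles through the characters of a
# multi-character SEP, so a single character is assumed here.
join_lines() {
  local sep=${1:-,}
  grep -v '^$' | paste -s -d "$sep" -
}

printf '%s\n' foo '' bar baz | join_lines ,
# prints: foo,bar,baz
```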
This sed one-liner should work:
sed -e :a -e 'N;s/\n/,/;ba' file
Test:
[jaypal:~/Temp] cat file
foo
bar
baz
[jaypal:~/Temp] sed -e :a -e 'N;s/\n/,/;ba' file
foo,bar,baz
To handle empty lines, you can remove the empty lines and pipe it to the above one-liner.
sed -e '/^$/d' file | sed -e :a -e 'N;s/\n/,/;ba'
How about using xargs?
For your case:
$ cat foo.txt | sed 's/$/, /' | xargs
Note that this leaves a trailing comma after the last item. Also be careful about the input length limit of the xargs command: a very long input file cannot be handled this way.
Perl:
cat data.txt | perl -pe 'if(!eof){chomp;$_.=","}'
or yet shorter and faster, surprisingly:
cat data.txt | perl -pe 'if(!eof){s/\n/,/}'
or, if you want:
cat data.txt | perl -pe 's/\n/,/ unless eof'
Just for fun, here's an all-builtins solution
IFS=$'\n' read -r -d '' -a data < foo.txt ; ( IFS=, ; echo "${data[*]}" ; )
You can use printf instead of echo if the trailing newline is a problem.
This works by setting IFS, the delimiters that read will split on, to just newline and not other whitespace, then telling read to not stop reading until it reaches a nul, instead of the newline it usually uses, and to add each item read into the array (-a) data. Then, in a subshell so as not to clobber the IFS of the interactive shell, we set IFS to , and expand the array with *, which delimits each item in the array with the first character in IFS
I needed to accomplish something similar, printing a comma-separated list of fields from a file, and was happy with piping STDOUT to xargs and ruby, like so:
cat data.txt | cut -f 16 -d ' ' | grep -o "[0-9]\+" | xargs ruby -e "puts ARGV.join(', ')"
I had a log file where some data was broken into multiple lines. When this occurred, the last character of the first line was the semi-colon (;). I joined these lines by using the following commands:
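# Turn spaces into "|" so that the for loop splits on lines, not on words.
for LINE in $(tr -s " " "|" < "$FILE")
do
# A line ending in ";" continues on the next line: write it without a newline.
if printf '%s\n' "$LINE" | grep -q ';$'
then
printf '%s' "$LINE" | tr -s "|" " " >> "$MYFILE"
else
printf '%s\n' "$LINE" | tr -s "|" " " >> "$MYFILE"
fi
done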
The result is a file where lines that were split in the log file were one line in my new file.
A simple way to join the lines with spaces in place (also ignoring blank lines) is to use ex:
ex +%j -cwq foo.txt
If you want to print the results to the standard output, try:
ex +%j +%p -scq! foo.txt
To join lines without spaces, use +%j! instead of +%j.
To use a different delimiter, it's a bit more tricky:
ex +"g/^$/d" +"%s/\n/_/e" +%p -scq! foo.txt
where g/^$/d (or v/\S/d) removes blank lines and s/\n/_/ is a substitution that works much like its sed counterpart, but applied to all lines (%). When parsing is done, the buffer is printed (%p). Finally, -cq! executes the vi q! command, which quits without saving (-s silences the output).
Please note that ex is equivalent to vi -e.
This method is quite portable, as most Linux/Unix systems ship with ex/vi by default. It is also more compatible than sed, whose in-place option (-i) is a non-standard extension; sed itself is stream-oriented, so in-place editing with it is less portable.
POSIX shell:
( set -- $(cat foo.txt) ; IFS=+ ; printf '%s\n' "$*" )
My answer is:
awk '{printf "%s%s", sep, $0; sep=","} END{print ""}' foo.txt
printf is enough. We don't need -F"\n" to change field separator.

Add Tab Separator to Grep

I am new to grep and awk, and I would like to create tab separated values in the "frequency.txt" file output (this script looks at a large corpus and then outputs each individual word and how many times it is used in the corpus - I modified it for the Khmer language). I've looked around ( grep a tab in UNIX ), but I can't seem to find an example that makes sense to me for this bash script (I'm too much of a newbie).
I am using this bash script in cygwin:
#!/bin/bash
# Create a tally of all the words in the corpus.
#
echo Creating tally of word frequencies...
#
sed -e 's/[a-zA-Z]//g' -e 's/​/ /g' -e 's/\t/ /g' \
-e 's/[«|»|:|;|.|,|(|)|-|?|។|”|“]//g' -e 's/[0-9]//g' \
-e 's/ /\n/g' -e 's/០//g' -e 's/១//g' -e 's/២//g' \
-e 's/៣//g' -e 's/៤//g' -e 's/៥//g' -e 's/៦//g' \
-e 's/៧//g' -e 's/៨//g' -e 's/៩//g' dictionary.txt | \
tr [:upper:] [:lower:] | \
sort | \
uniq -c | \
sort -rn > frequency.txt
grep -Fwf dictionary.txt frequency.txt | awk '{print $2 "," $1}'
Awk is printing with a comma, but that is only on-screen. How can I place a tab (a comma would work as well), between the frequency and the term?
Here's a small part of the dictionary.txt file (Khmer does not use spaces, but in this corpus there is a non-breaking space between each word which is converted to a space using sed and regular expressions):
ព្រះ​វិញ្ញាណ​នឹង​ប្រពន្ធ​ថ្មោង​ថ្មី​ពោល​ថា
អញ្ជើញ​មក ហើយ​អ្នក​ណា​ដែល​ឮ​ក៏​ថា
អញ្ជើញ​មក​ដែរ អ្នក​ណា​ដែល​ស្រេក
នោះ​មាន​តែ​មក ហើយ​អ្នក​ណា​ដែល​ចង់​បាន
មាន​តែ​យក​ទឹក​ជីវិត​នោះ​ចុះ
ឥត​ចេញ​ថ្លៃ​ទេ។
Here is an example output of frequency.txt as it is now (frequency and then term):
25605 នឹង
25043 ជា
22004 បាន
20515 នោះ
I want the output frequency.txt to look like this (where TAB is an actual tab character):
25605TABនឹង
25043TABជា
22004TABបាន
20515TABនោះ
Thanks for your help!
You should be able to replace the whole lengthy sed command with this:
tr -d '[a-zA-Z][0-9]«»:;.,()-?។”“|០១២៣៤៥៦៧៨៩'
tr '\t' ' '
Comments:
's/​/ /g' - the first two slashes mean re-use the previous match which was [a-z][A-Z] and replace them with spaces, but they were deleted so this is a no-op
's/[«|»|:|;|.|,|(|)|-|?|។|”|“]//g' - the pipe characters don't delimit alternatives inside square brackets, they are literal (and more than one is redundant), the equivalent would be 's/[«»:;.,()-?។”“|]//g' (leaving one pipe in case you really want to delete them)
's/ /\n/g' - earlier, you replaced tabs with spaces, now you're replacing the spaces with newlines
You should be able to have the tabs you want by inserting this in your pipeline right after the uniq:
sed 's/^ *\([0-9]\+\) /\1\t/'
If you want the AWK command to output a tab:
awk 'BEGIN{OFS="\t"} {print $2, $1}'
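Put together, the counting half of the pipeline can emit tab-separated output directly. This is a minimal sketch assuming one word per input line and no spaces inside a word; tally is a hypothetical name and the sample words are placeholders rather than real corpus data.

```shell
#!/usr/bin/env bash
# tally is a hypothetical helper: reads one word per line on stdin and
# prints "count<TAB>word", most frequent first.
tally() {
  sort | uniq -c | sort -rn |
    # uniq -c pads the count with spaces; awk re-splits the line and
    # joins the two fields with a tab (OFS).
    awk -v OFS='\t' '{print $1, $2}'
}

printf '%s\n' b a b c b a | tally
```

Redirecting the output with `> frequency.txt` writes the tab-separated result to the file instead of the screen.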
What about writing the awk output to a file with ">"?
The following script should get you where you need to go. The pipe to tee will let you see output on the screen while at the same time writing the output to ./outfile
#!/bin/sh
sed ':a;N;s/[a-zA-Z0-9។០១២៣៤៥៦៧៨៩\n«»:;.,()?”“-]//g;ta' < dictionary.txt | \
gawk '{$0=toupper($0);for(i=1;i<=NF;i++)a[$i]++}
END{for(item in a)printf "%s\t%d ", item, a[item]}' | \
tee ./outfile

Resources