Using sed to replace tabs if input is not guaranteed to contain tabs? - bash

I'm trying to extract a list of names from a website using sed, but I'm not sure how to go about replacing the tab characters separating them.
This code:
curl -s "https://namnidag.se/?year=2022&month=9&day=12" | sed -nE -e "s#<div class='names'>([^<]*)</div>#\1#p" | html2text
gives me the names for September 12th, but they are separated by a tab character:
Åsa Åslög
If I change the sed script to replace tabs with comma and space, like this:
curl -s "https://namnidag.se/?year=2022&month=9&day=12" | sed -nE -e "s#<div class='names'>([^<]*)</div>#\1#" -e 's/\t/, /p' | html2text
it works as expected:
Åsa, Åslög
However, if I try on a day that only has one name, such as September 13th:
curl -s "https://namnidag.se/?year=2022&month=9&day=13" | sed -nE -e "s#<div class='names'>([^<]*)</div>#\1#" -e 's/\t/, /p' | html2text
I get no output; the first sed script without the tab replacement works fine in this case though. What am I doing wrong here?
I'm using GNU sed 4.8, if that helps.
Thanks!

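The problem is the p flag on the second substitution. With -n, sed prints a line only when a command tells it to, and s/\t/, /p prints only if a tab was actually replaced, so a day with a single name produces no output. Keep the p on the first substitution and do the tab replacement in a separate sed, without p (the g flag also handles days with more than two names):
curl -s "https://namnidag.se/?year=2022&month=9&day=12" | sed -nE -e "s#<div class='names'>([^<]*)</div>#\1#p" | sed -e 's/\t/, /g' | html2text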
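If you'd rather keep everything in one sed call, a minimal sketch (assuming GNU sed for \t, and that the names div sits on one line, as the original regex already requires) is to run both substitutions only on the matching line and then print it unconditionally. The function name extract_names is just an illustration.

```shell
#!/bin/bash
# Sketch: both substitutions in one sed call, applied only to the line that
# holds the names div; p runs whether or not a tab was found.
# Assumes GNU sed (\t in the regex). extract_names is a hypothetical name.
extract_names() {
  sed -nE "/<div class='names'>/{
    s#<div class='names'>([^<]*)</div>#\1#
    s/\t/, /g
    p
  }"
}

# Usage:
# curl -s "https://namnidag.se/?year=2022&month=9&day=13" | extract_names | html2text
```

Because p is no longer tied to the tab substitution, a single-name day is printed as-is.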

curl -s "https://namnidag.se/?year=2022&month=9&day=12" > f1
cat > ed1 <<EOF
71W f2
q
EOF
ed -s f1 < ed1
cat f2 | tail -c +20 | head -c -6 > file
rm -v ./ed1
rm -v ./f2
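This writes the names to ./file, whether there is one name or two. It assumes the names are always on line 71 of the page and that the markup around them has a fixed length (tail and head cut fixed byte counts), so it will break if the page layout changes. If there are two names they are still separated by a tab, so you can split them with cut, whose default delimiter is a tab.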
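A small sketch of the cut step, assuming the names are tab-separated as above (the helper names are hypothetical). Note the -s on the second field: without it, cut prints lines that contain no tab in full, so a single-name day would show up as the "second" name too.

```shell
#!/bin/bash
# Print the first name; tab is cut's default delimiter.
first_name() { cut -f1 -- "$@"; }

# Print the second name, or nothing if the line has no tab.
# -s suppresses lines without a delimiter; without it cut echoes the whole line.
second_name() { cut -s -f2 -- "$@"; }

# Usage: first_name file; second_name file
```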

Related

User input into variables and grep a file for pattern

Hi!
So I am trying to write a script that looks for a string pattern.
For example, I want to find two words that appear separately in a file:
"I like toast, toast is amazing. Bread is just toast before it was toasted."
I want to invoke it from the command line using something like this:
./myscript.sh myfile.txt "toast bread"
My code so far:
text_file=$1
keyword_first=$2
keyword_second=$3
find_keyword=$(cat $text_file | grep -w "$keyword_first""$keyword_second" )
echo $find_keyword
I have tried a few different ways. Directly from the command line I can make it work using:
cat myfile.txt | grep -E 'toast|bread'
I'm trying to put the user input into variables and use the variables to grep the file.
You seem to be looking simply for
grep -E "$2|$3" "$1"
What works on the command line will also work in a script, though you will need to switch to double quotes for the shell to replace variables inside the quotes.
In this case, the -E option can be replaced with multiple -e options, too.
grep -e "$2" -e "$3" "$1"
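To accept any number of keywords rather than exactly two, one possible sketch builds one -e option per keyword in a bash array. The function name find_any is hypothetical, and keywords are passed as separate arguments (not as one quoted "toast bread" string).

```shell
#!/bin/bash
# Sketch: print the lines of FILE that contain ANY of the keywords given
# after it, using one -e option per keyword. find_any is a hypothetical name.
# Usage: ./myscript.sh myfile.txt toast bread
find_any() {
  local file=$1
  shift
  local -a args=()
  local kw
  for kw in "$@"; do
    args+=(-e "$kw")          # each keyword becomes its own pattern
  done
  grep "${args[@]}" -- "$file"
}

# Only run when called with a file and at least one keyword.
if [ "$#" -ge 2 ]; then
  find_any "$@"
fi
```

The array keeps keywords containing spaces intact, which a hand-built "$2|$3" string would not do as cleanly.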
If you want the lines that contain both words, you can pipe to grep twice:
find_keyword=$(grep -w "$keyword_first" "$text_file" | grep -w "$keyword_second")
Note that your search word "bread" is not found because the string contains the uppercase "Bread". If you want to find the words regardless of case, use grep's case-insensitive option -i:
find_keyword=$(grep -w -i "$keyword_first" "$text_file" | grep -w -i "$keyword_second")
In a full script:
#!/bin/bash
#
# usage: ./myscript.sh myfile.txt "toast" "bread"
text_file=$1
keyword_first=$2
keyword_second=$3
find_keyword=$(grep -w -i "$keyword_first" "$text_file" | grep -w -i "$keyword_second")
echo "$find_keyword"
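The two chained greps can be generalized to any number of keywords. A sketch, with find_all as a hypothetical name: each pass keeps only the lines that also contain the next keyword, and it stops early once nothing is left.

```shell
#!/bin/bash
# Sketch: print the lines of FILE that contain ALL of the keywords (whole
# words, case-insensitive). find_all is a hypothetical helper name.
find_all() {
  local file=$1
  shift
  local lines kw
  lines=$(<"$file")
  for kw in "$@"; do
    # Narrow down to lines that also contain this keyword; grep exits
    # non-zero when nothing matches, so we return failure right away.
    lines=$(grep -w -i -- "$kw" <<< "$lines") || return 1
  done
  printf '%s\n' "$lines"
}

# Usage: find_all myfile.txt toast bread
```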

How to write a script that will use regex to output only the heading and paragraph text from the http://example.com website

I am a beginner in scripting and am working on bash scripting for my work.
For this task I tried the sed command, which didn't work.
For your problem, the following should work:
#!/bin/bash
curl -s http://example.com/ | grep -P "\s*\<h1\>.*\<\/h1\>" |sed -n 's:.*<h1>\(.*\)</h1>.*:\1:p'
curl -s http://example.com/ | grep -P "\s*\<p\>.*\<\/p\>" |sed -n 's:.*<p>\(.*\)</p>.*:\1:p'
The first line fetches the page with curl, uses grep to keep the line containing the <h1>...</h1> element (assuming there is only one, as in your example), and then uses sed to print only the first capturing group, \(.*\), via \1.
The second line does the same for the <p> tag.
I could combine these two lines into one command, but they work fine as they are.
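Combining them is straightforward, though: give sed two -e expressions, each printing its capture group. A sketch under the same assumption that each <h1> and <p> element sits on a single line (extract_h1_p is a hypothetical name):

```shell
#!/bin/bash
# Sketch: one sed call, two expressions. Each s///p prints the captured text
# when the line holds a one-line <h1> or <p> element; other lines are dropped.
extract_h1_p() {
  sed -n -e 's:.*<h1>\(.*\)</h1>.*:\1:p' -e 's:.*<p>\(.*\)</p>.*:\1:p'
}

# Usage: curl -s http://example.com/ | extract_h1_p
```

The grep step is not needed here, since sed -n with the p flag already prints only the lines where a substitution matched.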
Edit:
If a <p> element spans several lines, the commands above won't work; you may have to use pcregrep in multiline mode (-M):
curl -s http://example.com/ | pcregrep -M "\s*\<p\>(\n|.)*\<\/p\>"
You can use the following one-liner:
curl -s http://example.com/ | sed -n '2,$p' > /tmp/tempfile && cat /tmp/tempfile | xmllint --xpath '/html/head/title/text()' - && echo ; cat /tmp/tempfile | xmllint --xpath '/html/body/div/p/text()' -
This uses xmllint's --xpath option to extract the text inside the <title> and <p> elements.

Curl and xargs in piped commands

I want to process an old database where passwords are stored in plain text (comma-separated; the password is the 5th field of the CSV file the database was exported to) and hash them for further use by DokuWiki. Here is my bash command (grep and sed are there to extract the hashed password from the curl output):
cat users.csv | awk 'FS="," { print $4 }' | xargs -l bash -c 'curl -s --data-binary "pass1=$0&pass2=$0" "https://sprhost.com/tools/SMD5.php" -o - ' | xargs | grep -o '<tt.*tt>' | sed -e 's/tt//g' | sed -e 's/<[^>]*>//g'
I get the following message from xargs:
xargs: unmatched single quote; by default quotes are special to xargs unless you use the -0 option
Only the first line of the file is processed, and nothing happens after that.
Using the -0 option and playing around with quotes doesn't solve anything. Where is my command line wrong? Maybe a more advanced language would be more suitable for this.
Thanks for the help, LM
In general, if you have such a long pipeline, it is better to split it up when things go wrong. Going through your pipe:
cat users.csv |
Nothing unexpected there.
awk 'FS="," { print $4 }' |
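You probably wanted awk 'BEGIN {FS=","} { print $4 }' (or awk -F, '{ print $4 }'). As written, FS="," is a pattern that is evaluated on each line after that line has already been split, so the new separator only takes effect from the second line on. Try the first two commands of the pipe on their own and check that they produce the correct output.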
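A quick way to check the field extraction in isolation, as a sketch (fourth_field is a hypothetical name). Setting the separator with -F means it is in effect before the first record is read:

```shell
#!/bin/bash
# Print the 4th comma-separated field of every line.
# -F sets FS before awk reads the first record, so line 1 is split correctly.
fourth_field() { awk -F',' '{ print $4 }' "$@"; }

# Usage: fourth_field users.csv
```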
xargs -l bash -c 'curl -s --data-binary "pass1=$0&pass2=$0" "https://sprhost.com/tools/SMD5.php" -o - ' |
Nothing wrong there, although there might be better ways to do an MD5 hash.
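One such alternative is to hash locally instead of posting every plain-text password to a web service. A sketch, assuming the format you need is the standard MD5-crypt string ($1$salt$hash, which matches the $1$... hash shown in this answer) and that OpenSSL is installed; hash_md5crypt is a hypothetical name:

```shell
#!/bin/bash
# Sketch: MD5-crypt a password locally with openssl passwd -1.
# Assumption: the target format is "$1$salt$hash". hash_md5crypt is hypothetical.
hash_md5crypt() {
  local pass=$1
  local salt=${2:-$(openssl rand -hex 4)}   # random 8-char salt if none given
  # -stdin keeps the password out of the process list
  printf '%s\n' "$pass" | openssl passwd -1 -salt "$salt" -stdin
}
```

To hash the field your awk selects, something like while IFS=, read -r f1 f2 f3 pass _; do hash_md5crypt "$pass"; done < users.csv would replace both the curl call and the xargs steps.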
xargs |
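What is this xargs doing in the pipe? It is most likely the source of the "unmatched single quote" error: by default xargs treats quotes in its input specially, and the HTML returned by curl can contain single quotes. It serves no purpose here and should be removed.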
grep -o '<tt.*tt>' |
Note that this will produce two lines:
<tt>$1$17ab075e$0VQMuM3cr5CtElvMxrPcE0</tt>
<tt><your_docuwiki_root>/conf/users.auth.php</tt>
which is probably not what you expected.
sed -e 's/tt//g' |
sed -e 's/<[^>]*>//g'
which will remove the HTML tags, though s/tt//g also deletes any "tt" that happens to occur inside the hash itself. The following removes only the tags:
sed 's/<tt>//;s/<.tt>//'
So I'd say a wrong awk and an xargs too many.
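To also get rid of the second <tt> line, one option is to match only the element that starts with $1$. A sketch, assuming the hash sits on a single line inside <tt>...</tt> as in the sample output above (extract_hash is a hypothetical name):

```shell
#!/bin/bash
# Sketch: keep only the <tt> element holding an MD5-crypt hash ($1$...),
# then strip its opening and closing tags. extract_hash is hypothetical.
extract_hash() {
  grep -o '<tt>\$1\$[^<]*</tt>' | sed -e 's/<tt>//' -e 's#</tt>##'
}

# Usage: ... | xargs -l bash -c 'curl ...' | extract_hash
```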

Removing text in unix shell

Sorry, I'm pretty new to coding. I'm just trying to remove the CST at the end of the string. The final output I currently get is "Sunset: 4:38 PM CST" (without the quotation marks).
Here is the code that I'm using within the shell.
curl http://m.wund.com/US/MN/Winona.html | grep 'Sunset' | sed -e :a -e 's/<[^>]*>//g;/</N;//ba' | sed -e 's/Sunset/Sunset: /g' | sed -e 's/PST//g'
Just change:
... | sed -e 's/PST//g'
to
... | sed -e 's/CST//g'
You might also want to run curl -s instead of plain curl to suppress the progress meter.
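Note that s/CST//g leaves the space before CST behind. A sketch that removes the trailing time zone together with that space, assuming CST is the last thing on the line (strip_tz is a hypothetical name):

```shell
#!/bin/bash
# Sketch: drop " CST" only at the end of the line, including the space(s)
# before it and any trailing whitespace. strip_tz is a hypothetical name.
strip_tz() { sed -e 's/ *CST[[:space:]]*$//'; }

# Usage: ... | sed -e 's/Sunset/Sunset: /g' | strip_tz
```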

How to compose custom command-line argument from file lines?

I know about the xargs utility, which allows me to convert lines into multiple arguments, like this:
echo -e "a\nb\nc\n" | xargs
Results in:
a b c
But I want to get:
a:b:c
The character : is used for an example. I want to be able to insert any separator between lines to get a single argument. How can I do it?
If you have a file with multiple lines that you want to turn into a single argument by replacing the newlines with a single character, paste is what you need:
$ echo -en "a\nb\nc\n" | paste -s -d ":"
a:b:c
Then, your command becomes:
your_command "$(paste -s -d ":" your_file)"
EDIT:
If you want to insert more than a single character as a separator, you could use sed before paste:
your_command "$(sed -e '2,$s/^/<your_separator>/' your_file | paste -s -d '\0')"
Or use a single, more complicated sed:
your_command "$(sed -n -e '1h;2,$H;${x;s/\n/<your_separator>/gp}' your_file)"
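A bash-only alternative for a multi-character separator, as a sketch (requires bash 4+ for mapfile; join_lines is a hypothetical name):

```shell
#!/bin/bash
# Sketch: join the lines of FILE with SEP, which may be several characters
# long, using bash builtins only. join_lines is a hypothetical helper name.
join_lines() {
  local sep=$1 file=$2
  local -a lines
  local i out=""
  mapfile -t lines < "$file"   # one array element per line, newline stripped
  for i in "${!lines[@]}"; do
    if (( i > 0 )); then
      out+=$sep                # separator between lines, not after the last
    fi
    out+=${lines[i]}
  done
  printf '%s\n' "$out"
}

# Usage: your_command "$(join_lines ' -> ' your_file)"
```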
The example you gave is not working for me. You would need:
echo -e "a\nb\nc\n" | xargs
to get a b c.
Coming back to your need, you could do this:
echo "a b c" | awk 'OFS=":" {print $1, $2, $3}'
It changes the output separator from a space to : (or whatever you set OFS to).
You can also use sed:
echo "a b c" | sed -e 's/ /:/g'
that will output a:b:c.
After all this processing, you can pipe the result to xargs to run whatever command you want.
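The awk version above only prints three fields. A sketch that works for any number of fields (join_fields is a hypothetical name): assigning $1 = $1 makes awk rebuild the record using OFS.

```shell
#!/bin/bash
# Sketch: join all whitespace-separated fields with ":", whatever their count.
# $1 = $1 forces awk to rebuild $0 with OFS between the fields.
join_fields() { awk -v OFS=':' '{ $1 = $1; print }'; }

# Usage: echo "a b c d" | join_fields
```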
Hope it helps.
You can join the lines with xargs and then replace the spaces with : using sed (this assumes the lines themselves contain no spaces):
echo -e "a\nb\nc"|xargs| sed -e 's/ /:/g'
will result in
a:b:c
You can then pass this output as an argument to another command with another xargs:
echo -e "a\nb\nc"|xargs| sed -e 's/ /:/g'|xargs
