Why is sed not able to process piped output from xpath? - macos

When I run the following in Terminal on Mac OS X Lion, rather than getting just the first line as output, I simply get the full output from xpath.
curl -s http://wordsmith.org/awad/rss1.xml | xpath //item/description | sed q
Outputs:
Found 1 nodes:
-- NODE --
<description>...</description>
Instead of:
Found 1 nodes:
Why is sed not able to process the output from xpath? What am I missing?

I don't have Mac OS but I can guess your problem. If I do the equivalent under Linux I get the following output:
$ curl -s http://wordsmith.org/awad/rss1.xml | xpath -e "//item/description" | sed q
Found 1 nodes in stdin:
-- NODE --
<description>Ending life for humane reasons, such as to avoid pain from an incurable condition.</description>
That's because part of the output is going to stdout and part is going to stderr, and sed only sees stdout. So if I redirect stderr to stdout as well, I get this:
$ curl -s http://wordsmith.org/awad/rss1.xml | xpath -e "//item/description" 2>&1 | sed q
Found 1 nodes in stdin:
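To see the split in isolation, here is a minimal sketch (the strings are made up). sed only reads the pipe, which carries stdout; anything written to stderr goes straight to the terminal:
{ echo "on stdout"; echo "on stderr" >&2; } | sed q
# "on stdout" is printed by sed (the first line of its stdin);
# "on stderr" bypasses the pipe entirely and appears on its own.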

I do not have the exact answer, but I have come up against this exact problem, although I was using awk rather than sed. The solution was setting the -q flag. You also forgot the -e flag to identify the expression; that might have something to do with me being on Ubuntu and you being on OS X, but my output was the same.
So what you want is:
curl -s http://wordsmith.org/awad/rss1.xml | xpath -q -e //item/description | sed q
SYNOPSIS
xpath [-s suffix] [-p prefix] [-q] -e query [-e query] ... [file] ...
-q
Be quiet. Output only errors (and no separator) on stderr.

On OS X 10.7.4:
I am not exactly sure what you wanted as output. I wanted to get rid of the STDERR noise ("Found X nodes ...") and print only the actual item (the actual title and description). Hopefully this helps.
> cat wordsmith.sh
#!/bin/bash
/usr/bin/curl -s http://wordsmith.org/awad/rss1.xml > file.xml
title=`xpath file.xml //item/title 2> /dev/null | sed 's/<[^>]*>//g'`
description=`xpath file.xml //item/description 2> /dev/null | sed 's/<[^>]*>//g'`
echo $title : $description
/bin/rm file.xml
> ./wordsmith.sh
versal : Universal; whole.
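For reference, the tag-stripping sed expression used above works like this on its own (made-up input):
echo '<title>versal</title>' | sed 's/<[^>]*>//g'
# versal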

I have additional information that may bring more clarity to the question and even solve some problems, as it did mine. The errors encountered are partly related to the version of xpath you are running.
The -q (quiet) flag is available in the version I installed on my Ubuntu system via apt-get, but not in the versions installed on OS X or RHEL. There are also slight syntax differences between the versions, such as the order of the query and the input file being reversed.
But the most helpful part is that you can copy an Ubuntu-installed version to the other systems and it works fine with the rest of the already-installed XPath libraries. You need to have xpath installed on the target and can then just migrate the core xpath script (usually at /usr/bin/xpath). Then you can take advantage of the extremely helpful -q parameter and skip the sed/regex post-processing.
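A sketch of that migration, assuming the target machine already has the XML::XPath Perl module that the script drives (hostname and destination path are hypothetical):
scp /usr/bin/xpath user@rhel-host:/usr/local/bin/xpath
ssh user@rhel-host 'chmod +x /usr/local/bin/xpath'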

If you don't have the -q flag on OS X, you could comment out those lines that print "-- NODE --" and "Found x nodes". Something like this:
murphy:~ pdurbin$ diff -u /usr/bin/xpath5.12.orig /usr/bin/xpath5.12
--- /usr/bin/xpath5.12.orig 2012-12-06 06:29:14.000000000 -0500
+++ /usr/bin/xpath5.12 2014-05-15 14:32:14.000000000 -0400
@@ -48,17 +48,18 @@
 }
 if ($nodes->size) {
-    print STDERR "Found ", $nodes->size, " nodes:\n";
+    #print STDERR "Found ", $nodes->size, " nodes:\n";
     foreach my $node ($nodes->get_nodelist) {
-        print STDERR "-- NODE --\n";
+        #print STDERR "-- NODE --\n";
         print $node->toString;
+        print "\n";
     }
 }
 else {
     print STDERR "No nodes found";
 }
-print STDERR "\n";
+#print STDERR "\n";
 exit;
murphy:~ pdurbin$

A lot late, but I had a similar problem in a bash script recently: I was trying to suppress the "Found # nodes:" line, and the "-- NODE --" line printed for every item returned, which appeared above the values the command was inserting into an array.
For example:
Found 9 nodes:
-- NODE --
-- NODE --
-- NODE --
-- NODE --
-- NODE --
-- NODE --
-- NODE --
-- NODE --
-- NODE --
Please select your option from the menu:
1) Option 1
2) Option 2
etc.
I fixed it by redirecting STDERR to /dev/null (2>/dev/null) in my xpath command. This eliminated the "Found # nodes" noise and returned only the values I loaded into the select menu. Hope this helps whoever stumbles across this in the future.
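A sketch of that pattern (the feed file, query, and Ubuntu-style -e syntax are placeholders for whatever your script uses):
# Load node values into an array; 2>/dev/null discards "Found # nodes:" and "-- NODE --".
options=( $(xpath -e '//item/title' feed.xml 2>/dev/null | sed 's/<[^>]*>//g') )
# Simple word splitting: values containing spaces would need mapfile instead.
select opt in "${options[@]}"; do
    echo "You picked: $opt"
    break
done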

Related

When I pipe the service --status-all command to grep, why do extra lines show up in the output?

For example,
sudo service --status-all | grep -oE 'php[0-9]+.[0-9]+'
generates the following output.
[ ? ] hwclock.sh
[ ? ] networking
php7.0
php7.3
My goal is to extract the version of another software package and put it into a configuration script so that the script won't break if that package gets upgraded or downgraded. If my understanding of regular expressions and the piping operator is correct, the first two lines shouldn't even show up in the output.
Can anyone explain to me why this is happening?
Redirecting stderr to /dev/null eliminated the unwanted lines: service --status-all writes those bracketed [ ? ] lines to stderr, which bypasses the pipe entirely, so grep never had a chance to filter them. I also piped the output to tail -1 to get the last line only, the one with the latest version.
sudo service --status-all 2>/dev/null | grep -oE 'php[0-9]+.[0-9]+' | tail -1
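You can confirm that the bracketed lines arrive on stderr by keeping only that stream; the 2>&1 must come before >/dev/null, so stderr is first duplicated onto the pipe and stdout is then discarded:
sudo service --status-all 2>&1 >/dev/null | head
# prints only the [ ? ] status lines, i.e. exactly the output that was bypassing grep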

Writing a Bash script that takes a text file as input and pipes the text file through several commands

I keep text files with definitions in a folder. I like to convert them to spoken word so I can listen to them. I already do this manually by running a few commands to insert some pre-processing codes into the text files and then convert the text to spoken word like so:
sed 's/\..*$/[[slnc 2000]]/' input.txt   # inserts a control code after the first period
sed 's/$/[[slnc 2000]]/' input.txt       # inserts a control code at the end of each line
cat input.txt | say -v Alex -o input.aiff
Instead of having to retype these each time, I would like to create a Bash script that pipes the output of these commands to the final product. I want to call the script with the script name, followed by an input file argument for the text file. I want to preserve the original text file so that if I open it again, none of the control codes are actually inserted, as the only purpose of the control codes is to insert pauses in the audio file.
I've tried writing
#!/bin/bash
FILE=$1
sed 's/$/ [[slnc 2000]]/' FILE -o FILE
But I get hung up immediately as it says sed: -o: No such file or directory. Can anyone help out?
If you just want to use foo.txt to generate foo.aiff with control characters, you can do:
#!/bin/sh
for file; do
test "${file%.txt}" = "${file}" && continue
sed -e 's/\..*$/[[slnc 2000]]/' "$file" |
sed -e 's/$/[[slnc 2000]]/' |
say -v Alex -o "${file%.txt}".aiff
done
Call the script with your .txt files as arguments (eg, ./myscript *.txt) and it will generate the .aiff files. Be warned, if say overwrites files, then this will as well. You don't really need two sed invocations, and the sed that you're calling can be cleaned up, but I don't want to distract from the core issue here, so I'm leaving that as you have it.
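As hinted above, the two sed invocations can be merged into one call; the two expressions are applied in order to each line, so the result is identical (a sketch, not tested with say):
sed -e 's/\..*$/[[slnc 2000]]/' -e 's/$/[[slnc 2000]]/' "$file" |
say -v Alex -o "${file%.txt}".aiff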
This will:
a} Make a list of your text files to process in the current directory, with find.
b} Apply your sed commands to each text file in the list, but only for the current use, preserving the originals intact.
c} Call say with the edited text.
I don't have say, so I can't test that or the control codes; but as long as you have ed, the loop works. I've used it many times. I learned it as a result of exposure to FORTH, a language that still permits unterminated loops. I used to have problems with remembering to invoke next at the end of the script in order to start it, but I got over that by defining my words (functions) first, in FORTH style, and then always placing my single-use commands at the end.
#!/bin/bash
next() {
[[ -s stack ]] && main
end
}
main() {
line=$(ed -s stack < edprint+.txt)
infile=$(cat "${line}" | sed 's/\..*$/[[slnc 2000]]/' | sed 's/$/[[slnc 2000]]/')
say "${infile}" -v Alex -o input.aiff
ed -s stack < edpop+.txt
next
}
end() {
rm -v ./stack
rm -v ./edprint+.txt
rm -v ./edpop+.txt
exit 0
}
find *.txt -type f > stack
cat >> edprint+.txt << EOF
1
q
EOF
cat >> edpop+.txt << EOF
1d
wq
EOF
next

How to get the highest numbered link from curl result?

I have created a small program consisting of a couple of shell scripts that work together. It is almost finished and everything seems to work fine, except for one thing that I'm not really sure how to do, and which I need in order to finish this project. There seem to be many routes that can be taken, but I just can't get there.
I have some curl results with lots of unused data, including different links, and among all that data there is a bunch of similar links. I only need to get (into a variable) the link with the highest number (without the always-same text).
The links are all similar and have this structure:
<a href="https://always/same/link/unique-name_17.html">always same text</a>
<a href="https://always/same/link/unique-name_18.html">always same text</a>
<a href="https://always/same/link/unique-name_19.html">always same text</a>
I was thinking about something like:
content="$(curl -s "$url/$param")"
linksArray= get from $content all links that are in the href section of the links
that contain "always same text"
declare highestnumber;
for file in $linksArray
do
href=${1##*/}
fullname=${href%.html}
OIFS="$IFS"
IFS='_'
read -a nameparts <<< "${fullname}"
IFS="$OIFS"
if ${nameparts[1]} > $highestnumber;
then
highestnumber=${nameparts[1]}
fi
done
echo ${nameparts[1]}_${highestnumber}.html
Desired result:
https://always/same/link/unique-name_19.html
This was just my guess; any working code that can be run from a bash script is OK. Thanks!
Update
I found this nice program, which is easily installed by:
# 64bit version
wget -O xidel/xidel_0.9-1_amd64.deb https://sourceforge.net/projects/videlibri/files/Xidel/Xidel%200.9/xidel_0.9-1_amd64.deb/download
apt-get -y install libopenssl
apt-get -y install libssl-dev
apt-get -y install libcrypto++9
dpkg -i xidel/xidel_0.9-1_amd64.deb
It looks awesome, but I'm not really sure how to tweak it to my needs.
Based on that link and the answer below, I guess a possible solution would be: use xidel, or use sed -n 's/.*href="\([^"]*\)".*/\1/p' file as suggested in this link, but tweaked to get the link from HTML tags like:
<a href="https://always/same/link/same-name_17.html">always same text</a>
then filter out everything that doesn't end with ">always same text</a>", and then use the grep/sort approach mentioned below.
Continuing from the comment, you can use grep, sort and tail to isolate the highest number of your list of similar links without too much trouble. For example, if your list of links is as you have described (I've saved them in a file dat/links.txt for the purpose of the example), you can easily isolate the highest number in a variable:
Example List
$ cat dat/links.txt
<a href="https://always/same/link/same-name_17.html">always same text</a>
<a href="https://always/same/link/same-name_18.html">always same text</a>
<a href="https://always/same/link/same-name_19.html">always same text</a>
Parsing the Highest Numbered Link
$ myvar=$(grep -o 'https:.*[.]html' dat/links.txt | sort | tail -n1); \
echo "myvar : '$myvar'"
myvar : 'https://always/same/link/same-name_19.html'
(note: the command above is all one line, separated by the line-continuation '\')
Applying Directly to Results of curl
Whether your list is in a file, or returned by curl -s, you can apply the same approach to isolate the highest number link in the returned list. You can use process substitution with the curl command alone, or you can pipe the results to grep. E.g. as noted in my original comment,
$ myvar=$(grep -o 'https:.*[.]html' < <(curl -s "$url/$param") | sort | tail -n1); \
echo "myvar : '$myvar'"
or pipe the result of curl to grep,
$ myvar=$(curl -s "$url/$param" | grep -o 'https:.*[.]html' | sort | tail -n1); \
echo "myvar : '$myvar'"
(same line continuation note.)
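One caveat beyond the original answer: plain sort compares lexically, so a hypothetical same-name_9.html would sort after same-name_19.html and tail -n1 would pick the wrong link. If the numbers can reach double digits, GNU sort's version ordering is safer:
$ myvar=$(grep -o 'https:.*[.]html' dat/links.txt | sort -V | tail -n1)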
Why not use Xidel with xquery to sort the links and return the last?
xidel -q links.txt --xquery '(for $i in //@href order by $i return $i)[last()]' --input-format xml
The input-format parameter makes sure you don't need any html tags at the start and end of your txt file.
If I'm not mistaken, in the latest Xidel the -q (quiet) param is replaced by -s (silent).

Prepend message to rsstail

I am trying to prepend a message to the output of rsstail, this is what I have right now:
rsstail -o -i 15 --initial 0 http://feeds.bbci.co.uk/news/world/europe/rss.xml | awk -v time=$( date +\[%H:%M:%S_%d/%m/%Y\] ) '{print time,$0}' | tee someFile.txt
which should give me the following:
[23:46:49_23/10/2014] Title: someTitle
After the command I have a | while read line; do ...; done loop, which never gets called because the above command does not output a single thing. What am I doing wrong?
PS: I am using the python version of rsstail, since the other one kept on crashing (https://github.com/gvalkov/rsstail.py)
EDIT:
As requested in the comments the command:
rsstail -o -i 15 --initial 0 http://feeds.bbci.co.uk/news/world/europe/rss.xml
Will give back a message like the following when a new article is found
Title: Sweden calls off search for sub
It seems that my rsstail is different from yours, but mine supports the option
-Z x add heading 'x'
so that
rsstail -Z"$( date +\[%H:%M:%S_%d/%m/%Y\] ) " ...
does the job without awk; on the other hand, you do have some problem with buffering. Is it possible to ask rsstail to stop after a given number of titles?
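On the buffering point: awk block-buffers its output when writing to a pipe rather than a terminal, so the downstream while read may wait a long time before seeing anything. Flushing after every line is a common workaround (fflush() is supported by gawk and mawk; this is a sketch, not tested against rsstail.py):
rsstail -o -i 15 --initial 0 http://feeds.bbci.co.uk/news/world/europe/rss.xml |
    awk -v time="$(date +\[%H:%M:%S_%d/%m/%Y\])" '{print time, $0; fflush()}' |
    tee someFile.txt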

Connecting Two Bash Commands

I have Ubuntu Linux. I found one command will let me download unread message subjects from Gmail:
curl -u USERNAME:PASSWORD --silent "https://mail.google.com/mail/feed/atom" | tr -d '\n' | awk -F '<entry>' '{for (i=2; i<=NF; i++) {print $i}}' | sed -n "s/<title>\(.*\)<\/title.*name>\(.*\)<\/name>.*/\2 - \1/p"
...and then another command to let me send mail easily (once I installed the sendemail command via apt-get):
sendEmail -f EMAIL@DOMAIN.COM -v -t PHONE@SMS.COM -u "Gmail Notifier" -m test -s MAILSERVER:PORT -xu EMAIL@DOMAIN.COM -xp PASSWORD
(Note when in production I'll probably swap -v above with -q.)
So, if one command downloads one-line subjects, how can I pipe these into the sendEmail command?
For instance, I tried using a pipe character between the two, where I used "$1" after the -m parameter, but what happened was that when I had no unread emails it would still send me at least one empty message.
If you help me with this, I'll use this information to share on StackOverflow how to build a Gmail Notifier that one can hook up to SMS messages on their phone.
I think if you mix viraptor & DigitalRoss' answers you get what you want. I created a sample test by creating a fake file with the following input:
File contents:
foo
bar
baz
Then I ran this command:
% cat ~/tmp/baz | while read x; do if [[ $x != "" ]]; then echo "x: '$x'"; fi; done
This will only print lines with input out. I'm not familiar with sendEmail; does it need the body to be on stdin or can you pass it on the cmdline?
You do know you can do that directly in Gmail by using a filter and your SMS email gateway, right?
But back to the question...
You can get control in a shell script for command output with the following design pattern:
command1 | while read a b c restofline; do
: execute commands here
: command2
done
read puts the first word in a, the second in b, and the rest of the line in restofline. If the loop consists of only a single command, the xargs program will probably do just what you want; see in particular its -I parameter, which allows you to place the substituted argument anywhere in the command.
Sometimes the loop looks like ... | while read x; do, which puts the entire line into x.
Try this structure:
while read line
do
sendemailcommand ... -m $line ...
done < <(curlcommand)
I'd look at the xargs command, which provides all the features you need (as far as I can tell).
http://unixhelp.ed.ac.uk/CGI/man-cgi?xargs
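For example, a sketch reusing the placeholders from the question: -I{} substitutes each subject line, spaces included, into the -m argument, giving one sendEmail call per subject (GNU xargs' -r flag additionally ensures nothing runs when there are no unread messages):
curl -u USERNAME:PASSWORD --silent "https://mail.google.com/mail/feed/atom" |
    tr -d '\n' | awk -F '<entry>' '{for (i=2; i<=NF; i++) {print $i}}' |
    sed -n "s/<title>\(.*\)<\/title.*name>\(.*\)<\/name>.*/\2 - \1/p" |
    xargs -r -I{} sendEmail -f EMAIL@DOMAIN.COM -t PHONE@SMS.COM -u "Gmail Notifier" -m "{}" -s MAILSERVER:PORT -xu EMAIL@DOMAIN.COM -xp PASSWORD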
Maybe something like this:
curl_command > some_file
if [[ $(wc -l < some_file) -gt 0 ]] ; then
email_command < some_file
fi
