Getting the first paragraph of Wikipedia, and storing it into a text file

I want to build a system where a search term is typed into the terminal of a Raspberry Pi and the Pi reads the result aloud.
I've solved the text-to-speech part using pico TTS. What I want to do now is go to the Wikipedia page for the search term and store the first paragraph of that page in a text file.
For example, the input Tiger on Simple English Wikipedia should produce a text file containing:
The tiger (Panthera tigris) is a carnivorous mammal. It is the largest living member of the cat family, the Felidae. It lives in Asia, mainly India, Bhutan, China and Siberia.
I tried using this, but it didn't seem to work.
The error message from the install command:
$ pip install wikipedia
...
Command /usr/bin/python -c "import setuptools, tokenize;__file__='/tmp/pip-build-qdTIZY/wikipedia/setup.py';exec(compile(getattr(tokenize, 'open', open)(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" install --record /tmp/pip-9CPD6D-record/install-record.txt --single-version-externally-managed --compile
failed with error code 1 in /tmp/pip-build-qdTIZY/wikipedia
Storing debug log for failure in /home/pi/.pip/pip.log

This seems to work:
title=Tiger
n_sentences=2
# Quote the URL so the shell does not treat each & as a background operator
curl -s "http://simple.wikipedia.org/w/api.php?action=query&prop=extracts&titles=${title}&exsentences=${n_sentences}&explaintext=&format=json" |
sed 's/.*"extract":"\|"}}}}$//g'
It correctly yields:
The tiger (Panthera tigris) is a carnivorous mammal. It is the largest living member of the cat family, the Felidae.
Also tested with title=Albert_Einstein:
Albert Einstein (14 March 1879 \u2013 18 April 1955) was a German-born theoretical physicist who developed the general theory of relativity, one of the two pillars of modern physics (alongside quantum mechanics).\nHe received the Nobel Prize in Physics in 1921, but not for relativity.
(Note that title="Albert Einstein", title=albert_einstein, and title=albert%20einstein all don't work, so you'll eventually want another command to find the best matching real simple.wikipedia article title.)
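One way to handle titles typed with spaces is to normalize them before building the URL. The following is only a sketch: build_extract_url is a hypothetical helper name, and the redirects=1 parameter (a standard MediaWiki query option) helps only when a redirect page actually exists for the spelling you typed. It does not fix capitalization on its own.

```sh
# Endpoint used in the answer above.
API_BASE="http://simple.wikipedia.org/w/api.php"

# build_extract_url is a hypothetical helper, not part of any tool.
# $1 = page title (spaces allowed), $2 = number of sentences to request.
build_extract_url() {
    # MediaWiki page titles use underscores where the display title has spaces.
    title=$(printf '%s' "$1" | tr ' ' '_')
    printf '%s?action=query&prop=extracts&titles=%s&exsentences=%s&explaintext=&format=json&redirects=1\n' \
        "$API_BASE" "$title" "$2"
}
```

To store the result in a text file, the URL can be passed to the same curl and sed pipeline, for example: curl -s "$(build_extract_url "Albert Einstein" 2)" | sed 's/.*"extract":"\|"}}}}$//g' > einstein.txt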
The curl command makes an HTTP request to simple.wikipedia.org. To see this in action, try:
curl "http://simple.wikipedia.org/w/api.php?action=query&prop=extracts&titles=Tiger&exsentences=2&explaintext=&format=json"
The sed command then extracts the desired part of the response.
Updated to increase the chance of working with the Raspberry Pi's curl and sed: changed https to http and rewrote the sed command without -e.
ref:
MediaWiki API?

Related

How to generate flamegraphs from macOS process samples?

Does anyone have a clean process for converting samples on macOS to FlameGraphs?
After a bit of fiddling I thought I could use a tool such as flamegraph-sample, but it gives me an error, so perhaps there are more up-to-date options that I'm missing:
$ sudo sample PID -file ~/tmp/sample.txt -fullPaths 1
Sampling process 198 for 1 second with 1 millisecond of run time between samples
Sampling completed, processing symbols...
Sample analysis of process 35264 written to file ~/tmp/sample.txt
$ python stackcollapse-sample.py ~/tmp/sample.txt > ~/tmp/sample_collapsed.txt
$ flamegraph.pl ~/tmp/sample_collapsed.txt > ~/tmp/sample_collapsed_flamegraph.svg
Ignored 2335 lines with invalid format
ERROR: No stack counts found
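The "invalid format" message suggests the collapsed file is not in the folded-stack form that flamegraph.pl reads: one stack per line, frames separated by semicolons, followed by whitespace and a sample count. A quick check along these lines may help narrow down whether the collapse step is at fault. This is a sketch, count_bad_lines is a hypothetical helper name, and the pattern is a simplified approximation of what flamegraph.pl accepts.

```sh
# count_bad_lines is a hypothetical helper.
# $1 = collapsed stack file; prints how many lines do NOT end in
# "<whitespace><count>", which is the folded-stack shape flamegraph.pl reads.
count_bad_lines() {
    # grep -c exits non-zero when the count is 0, so ignore its status.
    grep -Evc '[[:space:]][0-9]+(\.[0-9]*)?$' "$1" || true
}
```

If count_bad_lines ~/tmp/sample_collapsed.txt reports roughly the number of lines in the file, the problem lies in the output of the collapse script rather than in flamegraph.pl.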

lcov + gcov-9 performance regression because of json usage

I updated the compiler in my build environment from gcc 5.5.0 to gcc 9.3.0 and noticed a performance regression in coverage calculation.
It became about 10 times slower (5 hours vs 48 hours for the whole project).
My investigation shows that gcov-9 uses a JSON format instead of the intermediate text format.
This slowed down the creation and parsing of the intermediate gcov files.
Minimal example below:
> time geninfo --gcov-tool gcov-5 test5/CPrimitiveLayerTest.cpp.gcno
Found gcov version: 5.5.0
Using intermediate gcov format
Processing test5/CPrimitiveLayerTest.cpp.gcno
Finished .info-file creation
real 0m0.351s
user 0m0.298s
sys 0m0.047s
> time geninfo --gcov-tool gcov-9 test9/CPrimitiveLayerTest.cpp.gcno
Found gcov version: 9.3.0
Using intermediate gcov format
Processing test9/CPrimitiveLayerTest.cpp.gcno
Finished .info-file creation
real 0m8.024s
user 0m7.929s
sys 0m0.084s
I couldn't find a way to return to the old format, but maybe there are workarounds or patches.
P.S. I know about gcov's --json-format argument, but lcov 1.15 can process only the JSON format or the so-called intermediate text format, while gcov 9 can output only the JSON format or the so-called logfile format.

Further investigation shows that the slowdown is because lcov 1.15 uses the JSON::PP module for JSON parsing.
Replacing JSON::PP with JSON::XS (a fast parser) gives the required speedup.
I used the following commands to do it:
# patch geninfo to use fast json parser
> sudo sed -i 's/use JSON::PP/use JSON::XS/g' /usr/local/bin/geninfo
# install perl module
> sudo cpan install JSON::XS
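Because the patch edits an installed script in place, it may be worth keeping a copy of the original so the change can be undone. A minimal sketch of the same substitution with a backup follows. patch_json_parser is a hypothetical helper name, and the .orig suffix is an arbitrary choice.

```sh
# patch_json_parser is a hypothetical helper.
# $1 = path to the geninfo script (e.g. /usr/local/bin/geninfo).
patch_json_parser() {
    # Keep the untouched script next to the patched one,
    # then write the substituted text back over the original path.
    cp "$1" "$1.orig" &&
    sed 's/use JSON::PP/use JSON::XS/g' "$1.orig" > "$1"
}
```

Run it with the same privileges as the sed command above. Copying the .orig file back over the script restores the JSON::PP behaviour.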

FreeBSD script to show active connections and append number remote file

I am using NetScaler FreeBSD, which recognizes many UNIX-like commands: grep, awk, crontab, etc.
I run the following command to get the number of users currently connected to the system:
#> nsconmsg -g aaa_cur_ica_conn -d stats
OUTPUT (numbered lines):
Line1: Displaying current counter value information
Line2: NetScaler V20 Performance Data
Line3: NetScaler NS11.1: Build 63.9.nc, Date: Oct 11 2019, 06:17:35
Line4:
Line5: reltime:mili second between two records Sun Jun 28 23:12:15 2020
Line6: Index reltime counter-value symbol-name&device-no
Line7: 1 2675410 605 aaa_cur_ica_conn
…
…
From the output above, I only need the number of connected users (Line 7, 3rd column: 605), along with the hostname and the time the script ran.
To extract this 3rd-column value, along with the hostname and the time the data was collected, I wrote the following script:
printf '%s - %s - %s\n' "$(hostname)" "$(date '+%H:%M')" "$(nsconmsg -g aaa_cur_ica_conn -d stats | grep aaa_cur_ica_conn | awk '{print $3}')"
The result is perfect, showing hostname, time, and the number of connected users as follows:
Hostname - 09:00 - 605
Can anyone please shed light on how I can:
Run this script every day from 5am to 5pm (12 hours)?
Append the output to a file on a remote Unix share each time the script runs?
I appreciate this might be a bit of a challenge, but I would be grateful to any bash scripting wizards out there who can create magic!
Thanks in advance!

I would suggest a quick look at the FreeBSD Handbook or For People New to Both FreeBSD and UNIX® to get familiar with the operating system and the tools that could help you achieve what you want.
For example, there is a utility named cron:
The software utility cron is a time-based job scheduler in Unix-like computer operating systems.
For example, to run something every minute between 05:00 and 17:59 every day, you could use something like:
* 05-17 * * * command
Try more options here: https://crontab.guru/#*_05-17_*_*_*.
There are other tools for scheduling commands, for example at (https://en.wikipedia.org/wiki/At_(command)), but this is something you need to evaluate and read more about.
Regarding the command you are using to get the "number of connected users", you could avoid grep and use only awk, for example:
awk '/aaa_cur_ica_conn/ {print $3}'
This prints column 3 only if the line contains aaa_cur_ica_conn. As before, I invite you to read more about the topic to get a better overview and a better understanding of the commands.
Last but not least, check the link How do I ask a good question? The better you format and elaborate your question, the easier it is for others to give an answer.
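Putting the pieces together, a small script that cron could call might look like the sketch below. It rests on several assumptions: the remote share is already mounted on the NetScaler at the hypothetical path in OUTFILE, and the function names (extract_count, format_record, log_sample) are made up for illustration. The nsconmsg invocation and the awk filter are the ones from the question and answer.

```sh
#!/bin/sh
# Hypothetical location: assumes the remote share is already mounted here.
OUTFILE="${OUTFILE:-/mnt/share/ica_conn.log}"

# Read nsconmsg output on stdin and print the third column of the counter line.
extract_count() {
    awk '/aaa_cur_ica_conn/ {print $3}'
}

# Build one "host - HH:MM - count" record from three arguments.
format_record() {
    printf '%s - %s - %s\n' "$1" "$2" "$3"
}

# Take one sample and append it to the log file (>> appends, it never truncates).
log_sample() {
    count=$(nsconmsg -g aaa_cur_ica_conn -d stats | extract_count)
    format_record "$(hostname)" "$(date '+%H:%M')" "$count" >> "$OUTFILE"
}

# In the real script, call log_sample here as the last line.
```

With log_sample called at the end of the script, a crontab entry such as 0 5-17 * * * /path/to/script (path hypothetical) would run it once an hour, on the hour, from 05:00 through 17:00.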

Create PCL by hand?

Is it more or less feasible to create PCL files "artisanally", by hand? I have done it for PostScript and found it not particularly difficult, though it takes a lot of time and effort to create even a simple drawing. Now I am faced with an OKI C823 connected to an Ubuntu PC. It prints fine but does not understand PostScript, which might explain why it was so inexpensive (for such a big printer).
I found the sample below in the "PCL XL Feature Reference", but when I fed it to the printer, it just printed the sample as text instead of drawing the intended line.
eInch Measure
600 600 UnitsPerMeasure
BeginSession // attribute: basic measure for the session is inches
// attribute: 600 units in both X and Y direction
// operator: begin the imaging session
ePortraitOrientation Orientation
eLetterPaper MediaSize
BeginPage // attribute: page orientation is portrait
// attribute: size of media for page is letter
// operator: begin the page description
1200 800 Point
SetCursor // attribute: point at which to set the current cursor
// operator: set the cursor
2400 800 EndPoint
LinePath // attribute: endpoint of a 2 inch line
// operator: add the line to the current path
PaintPath // operator: paint the current path
EndPage // operator: end the page description
EndSession // operator: end the imaging session

Edit
You can convert ps to pcl with ghostscript
sudo apt-get install ghostscript
gs -o ~/test.pcl -sDEVICE=pxlcolor -f ~/test.ps
or
gs -o ~/test.pcl -sDEVICE=pxlmono -f ~/test.ps
If you need to go backward for some reason (convert pcl to ps), see the more complicated instructions below.
You can convert from pcl6 to ps using GhostPDL from Ghostscript. It's a separate product from Ghostscript, and as far as I know the only way to install it is to build it from source.
Build It
I'm using Ubuntu 18 LTS.
Some prerequisites I needed; your system might already have them:
sudo apt-get install autoconf
sudo apt-get install g++
sudo apt-get install make
download the source, untar, and build
wget https://github.com/ArtifexSoftware/ghostpdl-downloads/releases/download/gs950/ghostpdl-9.50.tar.gz
tar xvf ghostpdl-9.50.tar.gz
cd ghostpdl-9.50
sh ./autogen.sh
make
The binaries are in the bin folder
cd ./bin
Sample Usage
I copied a test.ps file from Wikipedia that prints "Hello World" in Courier.
Convert ps to pcl, then convert the pcl to pdf:
./gs -o ~/test.pcl -sDEVICE=pxlcolor -f ~/test.ps
./gpcl6 -o ~/test.pdf -sDEVICE=pdfwrite ~/test.pcl
And everything worked as expected.

OS X Yosemite - Adding a Printer - UI Vs lpadmin

My problem is that printing works when I add a printer using the Printers and Scanners UI, but not when I add the same printer using lpadmin.
To add it through the UI I did the following:
From Printers and Scanners I selected the IP tab.
Address: 10.20.30.40, Protocol HP Jetdirect - Socket, Queue left blank, Name: TEST_01, Location "Top Floor", Use -> Select software -> HP LaserJet P3010 Series
After doing this, the Printer works as expected.
This is a (segment from a) script containing my lpadmin command that doesn't work
SUBNET=socket://10.20.30.
TEST_01=40
PPD_DIR=/Library/Printers/PPDs/Contents/Resources
TEST_01_PPD="hp LaserJet P3010 Series.gz"
lpadmin -E -p TEST_01 -v $SUBNET$TEST_01 -P "$PPD_DIR/$TEST_01_PPD" -D "TEST_01" -L "Top Floor"
The printer appears correctly in the UI but shows as paused.
I did find a message in system.log that may or may not be relevant - I was using Notes to test the printer:
Notes[502]: Failed to connect (_delegate) outlet from (com_hp_psdriver_19_11_0_PDEView) to (com_hp_psdriver_19_11_0_PDEAccountingController): missing setter or instance variable
Notes[2198]: Printing failed because PMSessionEndDocumentNoDialog() returned -30871.
The reason I want to use a script is that there are 20 printers to add on each of 30 new Macs. The actual script uses a series of arrays with lpadmin in a for loop. Everything I have read says it should work. What am I missing?

I think -E specified before the printer name enables encryption, whereas specified after it, it enables the printer, effectively "unpausing" it. Madness, I know!
Mad Apple Documentation - see second sentence
I think you want:
lpadmin -p TEST_01 -v $SUBNET$TEST_01 -P "$PPD_DIR/$TEST_01_PPD" -D "TEST_01" -L "Top Floor" -E
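For the 20-printers-per-Mac case, the corrected argument order can be wrapped in a loop. The sketch below is an illustration only: add_printer and add_all_printers are hypothetical helper names, the LPADMIN variable exists just so the command can be swapped out for a dry run, and it assumes every printer uses the same PPD and subnet as in the question.

```sh
SUBNET="socket://10.20.30."
PPD="/Library/Printers/PPDs/Contents/Resources/hp LaserJet P3010 Series.gz"
# Override LPADMIN with another command name to do a dry run.
LPADMIN="${LPADMIN:-lpadmin}"

# $1 = queue name, $2 = last octet of the printer address, $3 = location.
add_printer() {
    # -E comes last so that it enables the queue instead of requesting encryption.
    "$LPADMIN" -p "$1" -v "${SUBNET}$2" -P "$PPD" -D "$1" -L "$3" -E
}

# Read "name octet location" lines from stdin and add each printer.
add_all_printers() {
    while read -r name octet location; do
        # </dev/null keeps the command from consuming the list on stdin.
        add_printer "$name" "$octet" "$location" </dev/null
    done
}
```

The list can then be supplied from a here-document or a file with one printer per line, for example "TEST_01 40 Top Floor".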

I don't have a direct answer, but I can suggest an alternate approach: set up all 20 printers by hand on one computer, then copy the /etc/cups directory from that one to the other 29.
