Highlight part of code block - python-sphinx

I have a very large code block in my .rst file, which I would like to highlight just a small portion of and make it bold. Consider the following rst:
wall of text. wall of text. wall of text.wall of text. wall of text. wall of text.wall of text. wall of text. wall of text.
wall of text. wall of text. wall of text.wall of text. wall of text. wall of text.wall of text. wall of text. wall of text.
**Example 1: Explain showing a table scan operation**::
EXPLAIN FORMAT=JSON
SELECT * FROM Country WHERE continent='Asia' and population > 5000000;
{
"query_block": {
"select_id": 1,
"cost_info": {
"query_cost": "53.80" # This query costs 53.80 cost units
},
"table": {
"table_name": "Country",
"access_type": "ALL", # ALL is a table scan
"rows_examined_per_scan": 239, # Accessing all 239 rows in the table
"rows_produced_per_join": 11,
"filtered": "4.76",
"cost_info": {
"read_cost": "51.52",
"eval_cost": "2.28",
"prefix_cost": "53.80",
"data_read_per_join": "2K"
},
"used_columns": [
"Code",
"Name",
"Continent",
"Region",
"SurfaceArea",
"IndepYear",
"Population",
"LifeExpectancy",
"GNP",
"GNPOld",
"LocalName",
"GovernmentForm",
"HeadOfState",
"Capital",
"Code2"
],
"attached_condition": "((`world`.`Country`.`Continent` = 'Asia') and (`world`.`Country`.`Population` > 5000000))"
}
}
}
When it converts to html, it syntax highlights by default (good), but I also want to specify a few lines that should be bold (the ones with comments on them, but possibly others too.)
I was thinking of adding a trailing character sequence on the line (.e.g. ###) and then writing a post-parser script to modify the html files generated. Is there a better way?

The code-block directive has an emphasize-lines option. The following highlights the lines with comments in your code.
**Example 1: Explain showing a table scan operation**
.. code-block:: python
:emphasize-lines: 7, 11, 12
EXPLAIN FORMAT=JSON
SELECT * FROM Country WHERE continent='Asia' and population > 5000000;
{
"query_block": {
"select_id": 1,
"cost_info": {
"query_cost": "53.80" # This query costs 53.80 cost units
},
"table": {
"table_name": "Country",
"access_type": "ALL", # ALL is a table scan
"rows_examined_per_scan": 239, # Accessing all 239 rows in the table
"rows_produced_per_join": 11,
"filtered": "4.76",
"cost_info": {
"read_cost": "51.52",
"eval_cost": "2.28",
"prefix_cost": "53.80",
"data_read_per_join": "2K"
},
"used_columns": [
"Code",
"Name",
"Continent",
"Region",
"SurfaceArea",
"IndepYear",
"Population",
"LifeExpectancy",
"GNP",
"GNPOld",
"LocalName",
"GovernmentForm",
"HeadOfState",
"Capital",
"Code2"
],
"attached_condition": "((`world`.`Country`.`Continent` = 'Asia') and (`world`.`Country`.`Population` > 5000000))"
}
}
}

Related

TextMate Grammar - Problem with `end` expression

I'm having huge problems with the end portion of a regex in TextMate:
It looks like end becomes the part of the pattern that's returned between begin and end
Trying to apply multiple endings with one negative lookbehind proves unsuccessful
Here is an example code:
property_name: {
test1: [1, 50, 5000]
test2: something ;;
test3: [
1,
50,
5000
]
test4: "string"
test5: [
"text",
"text2"
]
test6: something2
test7: something3
}
I'm using the following code:
"begin": "\\b([a-z_]+):",
"beginCaptures": {
"1": {
"name" : "parameter.name"
}
}
"end": "(?<!,)\\n(?!\\])",
"patterns": [
{
"name": "parameter.value",
"match": "(.+)"
}
]
My logic for the end regular expression is to consider it ended if there's a new line but only if it's not preceded by a comma (list of values in an array) or followed by a closing square bracket (last value in an array).
Unfortunately it's not working as expected.
What I would like to achieve is that all property_name# and test are matched as parameter.name and the values are matched as parameter.value apart from ;;

Looping through URL array to parse html, does not loop

I am trying to extract products description, the first loop runs through each product and nested loop enters each product page and grabs description to extract.
for page in range(1, 2):
guitarPage =
requests.get('https://www.guitarguitar.co.uk/guitars/acoustic/page-
{}'.format(page)).text
soup = BeautifulSoup(guitarPage, 'lxml')
guitars = soup.find_all(class_='col-xs-6 col-sm-4 col-md-4 col-lg-3')
this is the loop for each product
for guitar in guitars:
title_text = guitar.h3.text.strip()
print('Guitar Name: ', title_text)
price = guitar.find(class_='price bold small').text.strip()
print('Guitar Price: ', price)
priceSave = guitar.find('span', {'class': 'price save'})
if priceSave is not None:
priceOf = priceSave.text
print(priceOf)
else:
print("No discount!")
image = guitar.img.get('src')
print('Guitar Image: ', image)
productLink = guitar.find('a').get('href')
linkProd = url + productLink
print('Link of product', linkProd)
here i am adding the links collected to an array
productsPage.append(linkProd)
here is my attempt at entering each product page and extracting the description
for products in productsPage:
response = requests.get(products)
soup = BeautifulSoup(response.content, "lxml")
productsDetails = soup.find("div", {"class":"description-preview"})
if productsDetails is not None:
description = productsDetails.text
# print('product detail: ', description)
else:
print('none')
time.sleep(0.2)
if None not in(title_text,price,image,linkProd, description):
products = {
'title': title_text,
'price': price,
'discount': priceOf,
'image': image,
'link': linkProd,
'description': description,
}
result.append(products)
with open('datas.json', 'w') as outfile:
json.dump(result, outfile, ensure_ascii=False, indent=4, separators=(',', ': '))
# print(result)
print('--------------------------')
time.sleep(0.5)
The outcome should be
{
"title": "Yamaha NTX700 Electro Classical Guitar (Pre-Owned) #HIM041005",
"price": "£399.00",
"discount": null,
"image": "https://images.guitarguitar.co.uk/cdn/large/150/PXP190415342158006-3115645f.jpg?h=190&w=120&mode=crop&bg=ffffff&quality=70&anchor=bottomcenter",
"link": "https://www.guitarguitar.co.uk/product/pxp190415342158006-3115645--yamaha-ntx700-electro-classical-guitar-pre-owned-him",
"description": "\nProduct Overview\nThe versatile, contemporary styled NTX line is designed with thinner bodies, narrower necks, 14th fret neck joints, and cutaway designs to provide greater comfort and playability f... read more\n"
},
but the description works for the first one and does not change later on.
[
{
"title": "Yamaha APX600FM Flame Maple Tobacco Sunburst",
"price": "£239.00",
"discount": "Save £160.00",
"image": "https://images.guitarguitar.co.uk/cdn/large/150/190315340677008f.jpg?h=190&w=120&mode=crop&bg=ffffff&quality=70&anchor=bottomcenter",
"link": "https://www.guitarguitar.co.uk/product/190315340677008--yamaha-apx600fm-flame-maple-tobacco-sunburst",
"description": "\nProduct Overview\nOne of the world's best-selling acoustic-electric guitars, the APX600 series introduces an upgraded version with a flame maple top. APX's thinline body combines incredible comfort,... read more\n"
},
{
"title": "Yamaha APX600FM Flame Maple Amber",
"price": "£239.00",
"discount": "Save £160.00",
"image": "https://images.guitarguitar.co.uk/cdn/large/150/190315340676008f.jpg?h=190&w=120&mode=crop&bg=ffffff&quality=70&anchor=bottomcenter",
"link": "https://www.guitarguitar.co.uk/product/190315340676008--yamaha-apx600fm-flame-maple-amber",
"description": "\nProduct Overview\nOne of the world's best-selling acoustic-electric guitars, the APX600 series introduces an upgraded version with a flame maple top. APX's thinline body combines incredible comfort,... read more\n"
},
{
"title": "Yamaha AC1R Acoustic Electric Concert Size Rosewood Back And Sides with SRT Pickup",
"price": "£399.00",
"discount": "Save £267.00",
"image": "https://images.guitarguitar.co.uk/cdn/large/105/11012414211132.jpg?h=190&w=120&mode=crop&bg=ffffff&quality=70&anchor=bottomcenter",
"link": "https://www.guitarguitar.co.uk/product/11012414211132--yamaha-ac1r-acoustic-electric-concert-size-rosewood-back-and-sid",
"description": "\nProduct Overview\nOne of the world's best-selling acoustic-electric guitars, the APX600 series introduces an upgraded version with a flame maple top. APX's thinline body combines incredible comfort,... read more\n"
}
]
this is the result I am getting, It changes all the time, sometimes it shows the previous description of the product
It does loop but it seems there are protective measures in place server side and the pages which fail change. The pages which did fail I checked and they had the searched for content. No single measure seems to suffice in my testing (I didn't try sleep over 2 but did try some IP and User-Agent changes with sleeps <=2.)
You could try alternating IPs and User-Agents, back off retries, changing time between requests.
Changing proxies: https://www.scrapehero.com/how-to-rotate-proxies-and-ip-addresses-using-python-3/
Changing User-Agent: https://pypi.org/project/fake-useragent/

Ruby, Parsing a JSON response for an array of values

I'm new to ruby so please excuse any ignorance I may bear. I was wondering how to parse a JSON reponse for every value belonging to a specific key. The response is in the format,
[
{
"id": 10008,
"name": "vpop-fms-inventory-ws-client",
"msr": [
{
"key": "blocker_violations",
"val": 0,
"frmt_val": "0"
},
]
},
{
"id": 10422,
"name": "websample Maven Webapp",
"msr": [
{
"key": "blocker_violations",
"val": 0,
"frmt_val": "0"
}...
There's some other entries in the response, but for the sake of not having a huge block of code, I've shortened it.The code I've written is:
require 'uri'
require 'net/http'
require 'JSON'
url = URI({my url})
http = Net::HTTP.new(url.host, url.port)
request = Net::HTTP::Get.new(url)
request["cache-control"] = 'no-cache'
request["postman-token"] = '69430784-307c-ea1f-a488-a96cdc39e504'
response = http.request(request)
parsed = response.read_body
h = JSON.parse(parsed)
num = h["msr"].find {|h1| h1['key']=='blocker_violations'}['val']
I am essentially looking for the val for each blocker violation (the json reponse contains hundreds of entries, so im expecting hundreds of blocker values). I had hoped num would contain an array of all the 'val's. If you have any insight in this, it would be of great help!
EDIT! I'm getting a console output of
scheduler caught exception:
no implicit conversion of String into Integer
C:/dashing/test_board/jobs/issue_types.rb:20:in `[]'
C:/dashing/test_board/jobs/issue_types.rb:20:in `block (2 levels) in <top (requi
red)>'
C:/dashing/test_board/jobs/issue_types.rb:20:in `select'
I suspect that might have too much to do with the question, but some help is appreciated!
You need to do 2 things. Firstly, you're being returned an array and you're only interested in a subset of the elements. This is a common pattern that is solved by a filter, or select in Ruby. Secondly, the condition by which you wish to select these elements also depends on the values of another array, which you need to filter using a different technique. You could attempt it like this:
res = [
{
"id": 10008,
"name": "vpop-fms-inventory-ws-client",
"msr": [
{
"key": "blocker_violations",
"val": 123,
"frmt_val": "0"
}
]
},
{
"id": 10008,
"name": "vpop-fms-inventory-ws-client",
"msr": [
{
"key": "safe",
"val": 0,
"frmt_val": "0"
}
]
}
]
# define a lambda function that we will use later on to filter out the blocker violations
violation = -> (h) { h[:key] == 'blocker_violations' }
# Select only those objects who contain any msr with a key of blocker_violations
violations = res.select {|h1| h1[:msr].any? &violation }
# Which msr value should we take? Here I just take the first.
values = violations.map {|v| v[:msr].first[:val] }
The problem you may have with this code is that msr is an array. So theoretically, you could end up with 2 objects in msr, one that is a blocker violation and one that is not. You have to decide how you handle that. In my example, I include it if it has a single blocker violation through the use of any?. However, you may wish to only include them if all msr objects are blocker violations. You can do this via the all? method.
The second problem you then face is, which value to return? If there are multiple blocker violations in the msr object, which value do you choose? I just took the first one - but this might not work for you.
Depending on your requirements, my example might work or you might need to adapt it.
Also, if you've never come across the lambda syntax before, you can read more about it here

Terminfo: left arrow is mapped to backspace code (8) on default OSX terminal. Why is it so? And how to change it?

I am writing simple terminal so I am trying to understand some basic things about control symbols, escape sequences and how all that is related to terminfo and termios.
What I find awkward is that according to the input I have from tty and according to tput left key is mapped to Backspace code (ASCII code 8)
tput cub1 | od -tx1
0000000 08
0000001
while I would expect it to be \033[D because
$ tput cuf1 | od -tx1
0000000 1b 5b 43
0000003
which is \033[C code that is right key indeed according to various documents about terminal codes.
So the absence of symmetry here is quite confusing to me. Is there a specific reason behind it?
The second question: is there way to change it?
I have created example of raw terminal which demonstrates this behaviour: RawTerminal.
The cub1 capability is not left arrow, but instead a cursor movement capability. Referring to terminfo(5), you may find
cursor_left cub1 le move left one space
which is named similarly to the key marked "left-arrow":
key_left kcub1 kl left-arrow key
The cursor-movement capabilities do just that: move the cursor around on the screen. In some cases, the similarly-named cursor-movement and cursor-keys have the same string, just because (long ago) it was useful to have keys which one could setup to echo locally (rather than send to a host application).
In this particular case, the two are different because the main use of these terminal descriptions is for curses applications, which minimize the number of characters sent to the screen (as well as minimizing the time spent sending the characters). Making cub1 send an ASCII backspace is fewer characters than the escape sequence.
While there is no requirement, longstanding convention tells you that terminfo capabilities which begin with "k" are probably for the keyboard.
bash uses readline for reading keys and updating the line which you are
typing on. Checking its source-code, e.g., from bash-4.2 I am looking
at lib/readline/terminal.c, it has a table of the termcap strings that
it may use:
static const struct _tc_string tc_strings[] =
{
{ "#7", &_rl_term_at7 },
{ "DC", &_rl_term_DC },
{ "IC", &_rl_term_IC },
{ "ce", &_rl_term_clreol },
{ "cl", &_rl_term_clrpag },
{ "cr", &_rl_term_cr },
{ "dc", &_rl_term_dc },
{ "ei", &_rl_term_ei },
{ "ic", &_rl_term_ic },
{ "im", &_rl_term_im },
{ "kD", &_rl_term_kD }, /* delete */
{ "kH", &_rl_term_kH }, /* home down ?? */
{ "kI", &_rl_term_kI }, /* insert */
{ "kd", &_rl_term_kd },
{ "ke", &_rl_term_ke }, /* end keypad mode */
{ "kh", &_rl_term_kh }, /* home */
{ "kl", &_rl_term_kl },
{ "kr", &_rl_term_kr },
{ "ks", &_rl_term_ks }, /* start keypad mode */
{ "ku", &_rl_term_ku },
{ "le", &_rl_term_backspace },
{ "mm", &_rl_term_mm },
{ "mo", &_rl_term_mo },
{ "nd", &_rl_term_forward_char },
{ "pc", &_rl_term_pc },
{ "up", &_rl_term_up },
{ "vb", &_rl_visible_bell },
{ "vs", &_rl_term_vs },
{ "ve", &_rl_term_ve },
};
Using "infocmp -Cr xterm", I can see this:
xterm|xterm terminal emulator (X Window System):\
:am:bs:km:mi:msn:\
:co#80:it#8:li#24:\
:AL=\E[%dLC=\E[%dPL=\E[%dMO=\E[%dB:IC=\E[%d#:\
:K2=\EOE:LE=\E[%dD:RI=\E[%dC:SF=\E[%dS:SR=\E[%dT:\
:UP=\E[%dA:ae=\E(B:al=\E[L:as=\E(0:bl=^G:bt=\E[Z:cd=\E[J:\
:ce=\E[K:cl=\E[H\E[2J:cm=\E[%i%d;%dH:cr=^M:\
:cs=\E[%i%d;%dr:ct=\E[3gc=\E[Pl=\E[Mo=^J:ec=\E[%dX:\
:ei=\E[4l:ho=\E[H:im=\E[4h:is=\E[!p\E[?3;4l\E[4l\E>:\
:k1=\EOP:k2=\EOQ:k3=\EOR:k4=\EOS:k5=\E[15~:k6=\E[17~:\
:k7=\E[18~:k8=\E[19~:k9=\E[20~:kD=\E[3~:kI=\E[2~:kN=\E[6~:\
:kP=\E[5~:kb=\177:kd=\EOB:ke=\E[?1l\E>:kh=\EOH:kl=\EOD:\
:kr=\EOC:ks=\E[?1h\E=:ku=\EOA:le=^H:mb=\E[5m:md=\E[1m:\
:me=\E[0m:mh=\E[2m:mm=\E[?1034h:mo=\E[?1034l:mr=\E[7m:\
:nd=\E[C:rc=\E8:sc=\E7:se=\E[27m:sf=^J:so=\E[7m:sr=\EM:\
:st=\EH:ta=^I:te=\E[?1049l:ti=\E[?1049h:ue=\E[24m:up=\E[A:\
:us=\E[4m:vb=\E[?5h\E[?5l:ve=\E[?12l\E[?25h:vi=\E[?25l:\
:vs=\E[?12;25h:
or with "infocmp -Cr nsterm":
nsterm|Apple_Terminal|AppKit Terminal.app:\
:am:hs:mi:msno:\
:co#80:it#8:li#24:ws#50:\
:AL=\E[%dLC=\E[%dPL=\E[%dMO=\E[%dB:IC=\E[%d#:\
:K1=\EOq:K2=\EOr:K3=\EOs:K4=\EOp:K5=\EOn:LE=\E[%dD:\
:RI=\E[%dC:UP=\E[%dA:ae=^O:al=\E[L:as=^N:bl=^G:cd=\E[J:\
:ce=\E[K:cl=\E[H\E[J:cm=\E[%i%d;%dH:cr=^M:cs=\E[%i%d;%dr:\
:ct=\E[3gc=\E[Pl=\E[Mo=^Js=\E]2;\007:ei=\E[4l:\
:fs=^G:ho=\E[H:ic=\E[#:im=\E[4h:k1=\EOP:k2=\EOQ:k3=\EOR:\
:k4=\EOS:k5=\E[15~:k6=\E[17~:k7=\E[18~:k8=\E[19~:\
:k9=\E[20~:kD=\E[3~:kN=\E[6~:kP=\E[5~:kb=\177:kd=\EOB:\
:ke=\E[?1l\E>:kh=\EOH:kl=\EOD:kr=\EOC:ks=\E[?1h\E=:\
:ku=\EOA:le=^H:mb=\E[5m:md=\E[1m:me=\E[0m:mh=\E[2m:\
:mr=\E[7m:nd=\E[C:rc=\E8:\
:rs=\E>\E[?3l\E[?4l\E[?5l\E[?7h\E[?8h:sc=\E7:se=\E[m:\
:sf=^J:so=\E[7m:sr=\EM:st=\EH:ta=^I:te=\E[2J\E[?47l\E8:\
:ti=\E7\E[?47h:ts=\E]2;:ue=\E[m:up=\E[A:us=\E[4m:\
:vb=\E[?5h\E[?5l:ve=\E[?25h:vi=\E[?25l:
The ":le=^H:" part is what you are seeing.
From (ncurses) terminfo(5):
cursor_left cub1 le move left one space
if you made a termcap (for "xterm") setting that to \e[D, then bash should
echo back \e[D rather than ^H. But ncurses uses ^H to reduce the number of
characters from from 3 (\e[D) to 1 (^H)
Rather than modify the terminal description, you should modify your program,
e.g., to read the termcap strings and handle those.

JSON to CSV via FasterCSV

I'm new to Ruby and had a question. I'm trying to create a .rb file that converts JSON to CSV.
I came across some disparate sources that got me to make:
require "rubygems"
require 'fastercsv'
require 'json'
csv_string = FasterCSV.generate({}) do |csv|
JSON.parse(File.open("small.json").read).each do |hash|
csv << hash
end
end
puts csv_string
Now, it does in fact output text but they are all squashed together without spaces, commas etc. How do I make it more customised, clear for a CSV file so I can export that file?
The JSON would look like:
{
"results": [
{
"reportingId": "s",
"listingType": "Business",
"hasExposureProducts": false,
"name": "Medeco Medical Centre World Square",
"primaryAddress": {
"geoCodeGranularity": "PROPERTY",
"addressLine": "Shop 9.01 World Sq Shopng Cntr 644 George St",
"longitude": "151.206172",
"suburb": "Sydney",
"state": "NSW",
"postcode": "2000",
"latitude": "-33.876416",
"type": "VANITY"
},
"primaryContacts": [
{
"type": "PHONE",
"value": "(02) 9264 8500"
}
]
},xxx
}
The CSV to just have something like:
reportingId, s, listingType, Business, name, Medeco Medical...., addressLine, xxxxx, longitude, xxxx, latitude, xxxx, state, NSW, postcode, 2000, type, phone, value, (02) 92648544
Since your JSON structure is a mix of hashes and lists, and also has levels of different heights, it is not as trivial as the code you show. However (assuming your input files always look the same) it shouldn't be hard to write an appropriate converter. On the lowest level, you can transform a hash to CSV by
hash.to_a.flatten
E.g.
input = JSON.parse(File.open("small_file.json").read)
writer = FasterCSV.open("out.csv", "w")
writer << input["results"][0]["primaryAddress"].to_a.flatten
will give you
type,VANITY,latitude,-33.876416,postcode,2000,state,NSW,suburb,Sydney,longitude,151.206172,addressLine,Shop 9.01 World Sq Shopng Cntr 644 George St,geoCodeGranularity,PROPERTY
Hope that guides you the direction.
Btw, your JSON looks invalid. You should change the },xxx line to }].

Resources