How to specify ranges in YAML? - syntax

I can express
3rd page is the title page
in YAML
title: 3
What about the following?
Pages 10 to 15 contain chapter 1
One way is
chapter 1: [10, 11, 12, 13, 14, 15]
I would prefer a range here. Is there anything like that in YAML?
chapter 1: (10..15)
** Update **
The following would be my alternative if there is no such thing as range in YAML
chapter 1:
  start page: 10
  end page: 15

There is no direct way to specify ranges in YAML, but some YAML implementations can store serialized objects, for example in Ruby:
...
normal range: !ruby/range 10..20
exclusive range: !ruby/range 11...20
negative range: !ruby/range -1..-5
...

Range is application specific. The following may be meaningful for some applications:
-1 .. Q
a .. Щ
23 .. -23.45
1 .. 12:01:14 (both are integers in YAML !)
But the Ruby way is also unclear, since it does not say whether the end values are included or not: 10 .. 15
(Are you only talking about ranges of integers?)

Andrey is right: there is no such thing as a basic range. Ranges can be defined on top of totally ordered data types. YAML does not even know the concept of ordering, so it makes no sense to talk about ranges in YAML. YAML only knows the concept of node types, the concept of equality, and some predefined kinds of links between nodes. By the way, I don't know of any other data serialization language (JSON, XML, CSV, Hessian, Protocol Buffers...) that natively supports ranges.
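Since the range has to be interpreted by the application, one option is to normalise whatever the YAML parser returns after loading. The following is a minimal Python sketch, not a YAML feature: the function name pages_from_node is hypothetical, and it assumes the parser hands back either an integer (title: 3), a plain string such as "10..15", or a mapping with the "start page" / "end page" keys from the question. Both ends are treated as inclusive, which is a choice made here, not something YAML defines.

```python
import re

# Matches strings like "10..15" (optionally with spaces around the dots).
RANGE_RE = re.compile(r"^\s*(-?\d+)\s*\.\.\s*(-?\d+)\s*$")


def pages_from_node(node):
    """Turn an already-parsed YAML value into an inclusive range of pages."""
    if isinstance(node, int):
        # A single page, e.g. `title: 3`.
        return range(node, node + 1)
    if isinstance(node, str):
        match = RANGE_RE.match(node)
        if match is None:
            raise ValueError(f"not a page range: {node!r}")
        start, end = int(match.group(1)), int(match.group(2))
    elif isinstance(node, dict):
        # The explicit start/end mapping from the question's update.
        start, end = node["start page"], node["end page"]
    else:
        raise TypeError(f"unsupported node type: {type(node).__name__}")
    return range(start, end + 1)


print(list(pages_from_node("10..15")))
print(list(pages_from_node({"start page": 10, "end page": 15})))
```

Keeping the interpretation in one function means the inclusive/exclusive decision is made in exactly one place.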

Related

Is there a way to restart figure numbering in Sphinx?

I have a set of three related documents in Sphinx (4.2.0). The top-level table of contents looks like this:
.. toctree::
   :maxdepth: 2

   Requirements_Specification/index.rst
   User_Guide/index.rst
   Release_Report/index.rst
In conf.py, I have:
numfig = True
numfig_secnum_depth = 1
This numbers all the figures and tables consecutively across the three documents. For example, if there are 10 figures in each document, then:
the figures in the Requirements Specification doc are numbered 1 to 10
the figures in the User Guide doc are numbered 11 to 20
the figures in the Release Report doc are numbered 21 to 30.
Instead, I would like the numbering to start at 1 in each doc, so in this example each doc would have figures numbered 1 to 10. Is there some way of achieving this?
I've tried putting :numbered: in the toctree directive, but that treats each doc as a chapter (i.e. Requirements Specification is chapter 1, User Guide is chapter 2, Release Report is chapter 3). I need each doc to be able to stand alone. I build the docs both as one set of HTML and also as three separate PDFs.

Extracting data from text file in AMPL without adding indexes

I'm new to AMPL and I have data in a text file in matrix form from which I need to use certain values. However, I don't know how to use the matrices directly without having to manually add column and row indexes to them. Is there a way around this?
So the data I need to use looks something like this, with hundreds of rows and columns (and several more matrices like this), and I would like to use it as a parameter with index i for rows and j for columns.
t=1
0.0 40.95 40.36 38.14 44.87 29.7 26.85 28.61 29.73 39.15 41.49 32.37 33.13 59.63 38.72 42.34 40.59 33.77 44.69 38.14 33.45 47.27 38.93 56.43 44.74 35.38 58.27 31.57 55.76 35.83 51.01 59.29 39.11 30.91 58.24 52.83 42.65 32.25 41.13 41.88 46.94 30.72 46.69 55.5 45.15 42.28 47.86 54.6 42.25 48.57 32.83 37.52 58.18 46.27 43.98 33.43 39.41 34.0 57.23 32.98 33.4 47.8 40.36 53.84 51.66 47.76 30.95 50.34 ...
I'm not aware of an easy way to do this. The closest thing is probably the table format given in section 9.3 of the AMPL Book. This avoids needing to give indices for every term individually, but it still requires explicitly stating row and column indices.
AMPL doesn't seem to do a lot with position-based input formats, probably because it defaults to treating index sets as unordered so the concept of "first row" etc. isn't meaningful.
If you really wanted to do it within AMPL, you could probably put together a work-around along these lines:
declare a single-index param with length equal to the total size of your matrix (e.g. if your matrix is 10 x 100, this param has length 1000)
edit the beginning and end of your "matrix" data file to turn it into appropriate format for a single-index parameter indexed from 1 to n
then define your matrix something like this:
param m{i in 1..nrows, j in 1..ncols} := x[j + (i-1)*ncols];
(not tested, I won't promise that I have rows and columns the right way around there!)
But you're probably better off editing the input file into one of the standard AMPL matrix formats. AMPL isn't really designed for data wrangling - you can do it in a pinch but if you're doing this kind of thing repeatedly it may be less trouble to code it in a general-purpose language e.g. Python.
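As a rough illustration of the "general-purpose language" route, here is a Python sketch that rewrites a whitespace-separated matrix into AMPL's two-dimensional table data format, generating the row and column indices 1..n automatically. The function name matrix_to_ampl and the parameter name m are placeholders, and the sketch assumes that non-numeric lines such as "t=1" can simply be skipped.

```python
def matrix_to_ampl(text, name="m"):
    """Convert a whitespace-separated matrix into an AMPL table data block."""
    rows = []
    for line in text.splitlines():
        tokens = line.split()
        if not tokens:
            continue
        try:
            rows.append([float(tok) for tok in tokens])
        except ValueError:
            # Assumption: lines that are not purely numeric (e.g. "t=1")
            # are headers and can be ignored.
            continue
    if not rows:
        raise ValueError("no numeric rows found")
    ncols = len(rows[0])
    if any(len(row) != ncols for row in rows):
        raise ValueError("rows have differing lengths")

    # Column indices go on the header line, row indices lead each data row.
    header = " ".join(str(j) for j in range(1, ncols + 1))
    lines = [f"param {name} : {header} :="]
    for i, row in enumerate(rows, start=1):
        lines.append(f"{i} " + " ".join(repr(value) for value in row))
    lines.append(";")
    return "\n".join(lines)


print(matrix_to_ampl("t=1\n0.0 40.95\n40.95 0.0\n"))
```

The output can be pasted into a .dat file for a parameter declared over 1..nrows and 1..ncols in the model.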

How do I properly parse the range information in a unified diff?

Basically, what I want to do is look at the range information of a unified diff and know exactly which lines of code I should pay attention to.
For instance, this:
@@ -1827,7 +1827,7 @@
This tells me that in total only 1 line has changed, because the diff shows 3 lines above and below the change (so 7 - 6 = 1), and it also points me to the line 1830 (i.e. 1827 + 3).
To be more pedantic, this particular range information actually tells me that at line 1830, a line was removed (-), and at line 1830 a line was added (+).
Or to make that more obvious consider this range information for another diff:
@@ -878,15 +878,13 @@
What this is telling me is that at line 881 (878 + 3) 9 lines were deleted (15 - 6), but at line 881 only 7 lines were added (13 - 6).
So the question is, using a regex or some other Ruby string method, how do I pull out the above information easily?
i.e. how do I easily pull out this info:
Both line numbers (i.e. just the 1827 or 878), to which I can then add 3 to determine the actual line number I care about. It has to be both because the two line numbers may not always be identical.
The number of lines affected (i.e. the 7, 15 or 13 right after the comma in the above examples)
While I do that, how do I make sure to track the operation (addition or deletion) for each of these numbers?
I tried slicing the string and going directly for a character -- e.g. myString[3] which gives me -, but that's the only character it reliably works for because the line numbers can be 1, 10, 100, 1000, 10000, etc. So the only way is to just scan the string and then parse it.
Edit 1
To add some code to show what I have tried.
Assume I have the contents of a diff in a variable called @diff_lines:
# Array#second comes from ActiveSupport (Rails)
@diff_lines.each do |diff_line|
  if diff_line.start_with?("@@")
    del_line_num_start = diff_line.split(/@@ /).second.split.first.split(/-/).second.split(/,/).first.to_i + 3
    num_deleted_lines = diff_line.split(/@@ /).second.split.first.split(/-/).second.split(/,/).second.to_i - 6
    add_line_num_start = diff_line.split(/@@ /).second.split.second.split(/\+/).second.split(/,/).first.to_i + 3
    num_added_lines = diff_line.split(/@@ /).second.split.second.split(/\+/).second.split(/,/).second.to_i - 6
  end
end
As you can see, the above works....but it is quite horrendous to look at and is OBVIOUSLY not very DRY.
Ideally I would like to be able to achieve the same thing, but just cleaner.
The general idea is to write a regular expression that has capture groups in it ((...)) to pick apart that string into something useful. For example:
diff_line.match(/\A@@\s+\-(\d+),(\d+)\s+\+(\d+),(\d+)\s+@@/)
This yields a MatchData object on a successful match. You can then apply this to some variables like:
if (m = diff_line.match(...))
  a_start, a_len, b_start, b_len = m[1..4].map(&:to_i)
end
Then you can do whatever computations you need to do with these numbers.
If you're ever having trouble visualizing what a regular expression does, try a tool like Rubular to better illustrate the internals.
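The same capture-group idea carries over to other languages. Below is a small Python sketch of it; the function name parse_hunk_header and the dictionary keys are made up for illustration. One detail the pattern above does not cover: in the unified format the ",length" part may be omitted when the length is 1, so the sketch makes those groups optional and defaults them to 1.

```python
import re

# "@@ -old_start[,old_len] +new_start[,new_len] @@"
HUNK_RE = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


def parse_hunk_header(line):
    """Return the four numbers of a unified-diff hunk header, or None."""
    match = HUNK_RE.match(line)
    if match is None:
        return None
    old_start, old_len, new_start, new_len = match.groups()
    return {
        "old_start": int(old_start),
        # A missing length means the hunk covers exactly one line.
        "old_len": int(old_len) if old_len is not None else 1,
        "new_start": int(new_start),
        "new_len": int(new_len) if new_len is not None else 1,
    }


print(parse_hunk_header("@@ -878,15 +878,13 @@"))
```

The "-" side always describes the old file and the "+" side the new file, so the operation is implied by which pair of keys you read.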

Python3 Make tie-breaking lambda sort more pythonic?

As an exercise in python lambdas (just so I can learn how to use them more properly) I gave myself an assignment to sort some strings based on something other than their natural string order.
I scraped the Apache archive for version number strings and then came up with a lambda to sort them based on numbers I extracted with regexes. It works, but I think it can be better; I just don't know how to improve it so it's more robust.
from lxml import html
import requests
import re
# Send GET request to page and parse it into a list of html links
jmeter_archive_url='https://archive.apache.org/dist/jmeter/binaries/'
jmeter_archive_get=requests.get(url=jmeter_archive_url)
page_tree=html.fromstring(jmeter_archive_get.text)
list_of_links=page_tree.xpath('//a[@href]/text()')
# Filter out all the non-md5s. There are a lot of links, and ultimately
# it's more data than needed for this exercise
jmeter_md5_list=list(filter(lambda x: x.endswith('.tgz.md5'), list_of_links))
# Here's where the 'magic' happens. We use two different regexes to rip the first
# and then the second number out of the string and turn them into integers. We
# then return them in the order we grabbed them, allowing us to tie break.
jmeter_md5_list.sort(key=lambda val: (int(re.search(r'(\d+)\.\d+', val).group(1)), int(re.search(r'\d+\.(\d+)', val).group(1))))
print(jmeter_md5_list)
This does have the desired effect. The output is:
['jakarta-jmeter-2.5.1.tgz.md5', 'apache-jmeter-2.6.tgz.md5', 'apache-jmeter-2.7.tgz.md5', 'apache-jmeter-2.8.tgz.md5', 'apache-jmeter-2.9.tgz.md5', 'apache-jmeter-2.10.tgz.md5', 'apache-jmeter-2.11.tgz.md5', 'apache-jmeter-2.12.tgz.md5', 'apache-jmeter-2.13.tgz.md5']
So we can see that the strings are sorted into an order that makes sense. Lowest version first and highest version last. Immediate problems that I see with my solution are two-fold.
First, we have to create two different regexes to get the numbers we want instead of just capturing groups 1 and 2. Mainly because I know there are no multiline lambdas, I don't know how to reuse a single regex object instead of creating a second.
Secondly, this only works as long as the version numbers are two numbers separated by a single period. The first element is 2.5.1, which is sorted into the correct place but the current method wouldn't know how to tie break for 2.5.2, or 2.5.3, or for any string with an arbitrary number of version points.
So it works, but there's got to be a better way to do it. How can I improve this?
This is not a full answer, but it will get you far along the road to one.
The return value of the key function can be a tuple, and tuples sort naturally. You want the output from the key function to be:
((2, 5, 1), 'jakarta-jmeter')
((2, 6), 'apache-jmeter')
etc.
Do note that this is a poor use case for a lambda regardless.
Originally, I came up with this:
jmeter_md5_list.sort(key=lambda val: list(map(int, re.compile(r'(\d+(?!$))').findall(val))))
However, based on Ignacio Vazquez-Abrams's answer, I made the following changes.
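def sortable_key_from_string(value):
    # All digit runs except one that ends the string (the "5" in "md5")
    version_tuple = tuple(map(int, re.compile(r'(\d+(?!$))').findall(value)))
    # Leading non-digit prefix, e.g. "apache-jmeter-"
    match = re.match(r'^(\D+)', value)
    version_name = ''
    if match:
        version_name = match.group(1)
    return (version_tuple, version_name)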
and this:
jmeter_md5_list.sort(key=sortable_key_from_string)
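A variation on the same idea, as a sketch only: anchor a single regex on the file suffix so that just the dotted version is captured, then split it into a tuple of integers. The name version_key and the assumption that every file name ends in "-<version>.tgz.md5" are mine; names that do not match get an empty tuple and therefore sort first.

```python
import re

# Captures "2.5.1" from "jakarta-jmeter-2.5.1.tgz.md5".
VERSION_RE = re.compile(r"-(\d+(?:\.\d+)*)\.tgz\.md5$")


def version_key(filename):
    """Sort key: the version as a tuple of ints, any number of components."""
    match = VERSION_RE.search(filename)
    if match is None:
        return ()  # unmatched names sort before everything else
    return tuple(int(part) for part in match.group(1).split("."))


names = [
    "apache-jmeter-2.10.tgz.md5",
    "apache-jmeter-2.6.tgz.md5",
    "jakarta-jmeter-2.5.1.tgz.md5",
    "apache-jmeter-2.9.tgz.md5",
]
print(sorted(names, key=version_key))
```

Because tuples compare element by element, (2, 5, 1) sorts before (2, 6) and (2, 9) before (2, 10) with no special tie-breaking code.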

Ruby on Rails - generating bit.ly style identifiers

I'm trying to generate UUIDs with the same style as bit.ly urls like:
http://bit [dot] ly/aUekJP
or cloudapp ones:
http://cl [dot] ly/1hVU
which are even smaller
how can I do it?
I'm now using the UUID gem for Ruby, but I'm not sure if it's possible to limit the length and get something like this.
I am currently using this:
UUID.generate.split("-")[0] => b9386070
But I would like something even shorter, while still knowing that it will be unique.
Any help would be pretty much appreciated :)
edit note: replaced dot letters with [dot] for workaround of banned short link
You are confusing two different things here. A UUID is a universally unique identifier. It has a very high probability of being unique even if millions of them were being created all over the world at the same time. It is generally displayed as a 36-character string (32 hexadecimal digits plus four hyphens). You cannot keep only the first 8 characters and expect the result to be unique.
Bitly, TinyURL et al. store links and generate a short code to represent each link. They do not reconstruct the URL from the code; they look it up in a data store and return the corresponding URL. These are not UUIDs.
Without knowing your application it is hard to advise on what method you should use. However, you could store whatever you are pointing at in a data store with a numeric key and then rebase the key to base 32 using the 10 digits and 22 lowercase letters, perhaps avoiding the obvious typo-prone letters like 'o', 'i', 'l', etc.
EDIT
On further investigation, there is a Ruby base32 gem available that implements Douglas Crockford's Base 32 encoding.
A 5 character Base32 string can represent over 33 million integers and a 6 digit string over a billion.
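To make the rebasing step concrete, here is a short Python sketch (not the gem's API) that converts a numeric key to and from a base-32 code. The alphabet is the 10 digits plus 22 lowercase letters, leaving out i, l, o and u as Crockford's scheme does; the function names encode_id and decode_code are invented for this example.

```python
# 10 digits + 22 lowercase letters; i, l, o and u are left out.
ALPHABET = "0123456789abcdefghjkmnpqrstvwxyz"
BASE = len(ALPHABET)  # 32


def encode_id(number):
    """Encode a non-negative integer key as a short base-32 code."""
    if number < 0:
        raise ValueError("number must be non-negative")
    if number == 0:
        return ALPHABET[0]
    digits = []
    while number:
        number, remainder = divmod(number, BASE)
        digits.append(ALPHABET[remainder])
    return "".join(reversed(digits))


def decode_code(code):
    """Decode a base-32 code back into the integer key."""
    number = 0
    for char in code:
        # str.index raises ValueError for characters outside the alphabet.
        number = number * BASE + ALPHABET.index(char)
    return number


print(encode_id(32 ** 5 - 1))  # largest key that still fits in 5 characters
```

Uniqueness comes from the numeric key in the data store, not from the encoding, so the codes are only as unique as the keys they are derived from.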
If you are working with numbers, you can use the built in ruby methods
6175601989.to_s(30)
=> "8e45ttj"
to go back
"8e45ttj".to_i(30)
=>6175601989
So you don't have to store anything, you can always decode an incoming short_code.
This works ok for proof of concept, but you aren't able to avoid ambiguous characters like: 1lji0o. If you are just looking to use the code to obfuscate database record IDs, this will work fine. In general, short codes are supposed to be easy to remember and transfer from one medium to another, like reading it on someone's presentation slide, or hearing it over the phone. If you need to avoid characters that are hard to read or hard to 'hear', you might need to switch to a process where you generate an acceptable code, and store it.
I found this to be short and reliable:
def create_uuid(prefix=nil)
  time = (Time.now.to_f * 10_000_000).to_i
  jitter = rand(10_000_000)
  key = "#{jitter}#{time}".to_i.to_s(36)
  [prefix, key].compact.join('_')
end
This spits out unique keys that look like this: '3qaishe3gpp07w2m'
Reduce the 'jitter' size to reduce the key size.
Caveat:
This is not guaranteed unique (use SecureRandom.uuid for that), but it is highly reliable:
10_000_000.times.map {create_uuid}.uniq.length == 10_000_000
The only way to guarantee uniqueness is to keep a global count and increment it for each use: 0000, 0001, etc.

Resources