Case-insensitive text list sorting - elisp

I have a list named "спсВыбора":
("ФайлыКаталоги" "Клиент Проверка Существования Каталога" "Клиент Проверка Существование Файла" "СтандартныеСтруктурыМодуля" "стндОбрОтв" "элКлючаОтветУспехОбрОтв" "элКлючаОтветОшибкаОбрОтв" "элКлючаОтветПроцедура" "элКлючаОтветМодуль" "стндОтчОтв")
To sort it, I use this expression:
(setq спсВыбора (sort спсВыбора (lambda (a b) (string> a b))))
As a result, "спсВыбора" becomes:
("элКлючаОтветУспехОбрОтв" "элКлючаОтветПроцедура" "элКлючаОтветОшибкаОбрОтв" "элКлючаОтветМодуль" "стндОтчОтв" "стндОбрОтв" "ФайлыКаталоги" "СтандартныеСтруктурыМодуля" "Клиент Проверка Существования Каталога" "Клиент Проверка Существование Файла")
The sort orders lowercase and uppercase letters separately. How can I sort the list so that case is ignored? Example:
"caB" => "aBc"

Use string-collate-lessp as the predicate:
string-collate-lessp is a built-in function in ‘src/fns.c’.
(string-collate-lessp S1 S2 &optional LOCALE IGNORE-CASE)
Return t if first arg string is less than second in collation order.
Symbols are also allowed; their print names are used instead.
This function obeys the conventions for collation order in your
locale settings. For example, punctuation and whitespace characters
might be considered less significant for sorting:
(sort '("11" "12" "1 1" "1 2" "1.1" "1.2") 'string-collate-lessp)
=> ("11" "1 1" "1.1" "12" "1 2" "1.2")
The optional argument LOCALE, a string, overrides the setting of your
current locale identifier for collation. The value is system
dependent; a LOCALE "en_US.UTF-8" is applicable on POSIX systems,
while it would be, e.g., "enu_USA.1252" on MS-Windows systems.
If IGNORE-CASE is non-nil, characters are converted to lower-case
before comparing them.
To emulate Unicode-compliant collation on MS-Windows systems,
bind ‘w32-collate-ignore-punctuation’ to a non-nil value, since
the codeset part of the locale cannot be "UTF-8" on MS-Windows.
If your system does not support a locale environment, this function
behaves like ‘string-lessp’.

As you noticed, string< and string> are case-sensitive. To get case-insensitive sorting, downcase (or upcase) both operands before comparing them:
(setq спсВыбора (sort спсВыбора (lambda (a b) (string> (downcase a) (downcase b)))))
Note that this sorts in reverse alphabetical order, as in your example.

Related

What does the ' represent in '+ in racket?

I have been searching Google and experimenting in DrRacket, trying to understand what the apostrophe ' before a procedure means in Racket and how I could remove it. What I'm trying to do is take a + from inside a list, i.e. '(+ 1 2). However, every time I do something like (first x) (where x is the list in the example) I receive '+ instead of just + (notice the apostrophe). How can I remove the apostrophe, and what is its purpose?
The ' apostrophe, pronounced quote, means that the stuff inside will be interpreted as data for an s-expression, not evaluated as code.
'x, 'hello, and '+ are all symbols, which are like strings with unique-identity properties. They contain only text, not "meaning", so the '+ symbol does not contain a reference to the + function.
If you use parentheses under a ' quote, it will create a list with all the elements also ' quoted. In other words, '(x y z) is equivalent to (list 'x 'y 'z). You can think of this as quote "distributing itself" over all the elements inside it.
In your case, '(+ 1 2) is equivalent to (list '+ '1 '2), which is the same as (list '+ 1 2) because numbers are already literal data.
In most cases, the best way to get rid of a ' quote is to not add one on the outside in the first place. Instead of '(+ 1 2) you could use list: (list + 1 2), or the more advanced forms ` quasiquote and , unquote: `(,+ 1 2). In either of these cases, the + never gets put under a quote in the first place. It never becomes a symbol like '+. The + outside of any quote has meaning as the addition function.
In other cases, you can't avoid having the symbol '+ because it comes from intrinsically textual data. In this case you can assign meaning to it with an interpreter. Somewhere in that interpreter you might want code like this
(match sym ['+ +] ['- -] ['* *] ['/ /] [_ (error "unrecognized symbol")])
Something is needed to assign meaning externally, because the symbol '+ does not have that meaning internally. You can either define the interpreter yourself or use an existing one such as eval, as long as all the meanings in the interpreter correspond exactly to what you intend.

Use variable padding in iteration directive in FORMAT

Is there a way to do something like the following?
(format t "~{~va~}" '("aa" "bb" "cc") 4)
I need to iterate through a list. Each element of that list should be padded with a variable number of spaces (specified at runtime, so I cannot use "~4a").
Or more generally, is there a way to refer to a specific argument in the argument list of FORMAT?
By nesting calls to format, you can do what you want.
(format t (format nil "~~{~~~Aa~~}" 4) '("aa" "bb" "cc"))
;; prints: aa  bb  cc  (each element padded to 4 columns)
Here is the inner format call. With nil as its first argument, format returns a string instead of printing:
(format nil "~~{~~~Aa~~}" 4)
;; returns: "~{~4a~}" - and this is exactly what you want to give
;; to the outer `format` as second argument!
You can of course write a function for this:
(defun format-by-padding-over (lst padding)
  (format t (format nil "~~{~~~Aa~~}" padding) lst))
And then:
(format-by-padding-over '("aa" "bb" "cc") 4)
;; aa  bb  cc  (each element padded to 4 columns)
;; NIL
I learned this trick here from @Sylwester (many thanks!).
You could also interleave the list with repetitions of the padding:
(format t "~{~va~}"
        (mapcan (lambda (element)
                  (list 4 element))
                list))
You can build the format control string using nested format calls, but then you have to take care to escape the tildes. When working with regular expressions (using CL-PPCRE), one can define regular expressions using trees, like (:alternation #\\ #\*), which helps prevent bugs and headaches related to escaping special characters. The same can be done with format strings, using format-string-builder, available in Quicklisp:
(lambda (v)
  (make-format-string `((:map () (:str ,v)))))
This returns a closure, which can be used to build format strings:
(funcall * 10)
=> "~{~10a~}"

How to extract the last character of a string of unknown length?

I am writing a function that takes stringA and stringB as parameters and compares the first character of stringB with the last character of stringA. If they are equal, the function returns true; otherwise it returns false.
I have nearly the whole function ready, however I can't find a way to take the last character of stringA because its length is unknown. I checked the documentation and I found nothing. Any suggestions?
(cond
[(string=? (substring stringA ???) (substring stringB 0 2))"True"]
[else "False"])
The index of the last character of a string is one less than its string-length:
(string-ref str (sub1 (string-length str)))
Note that a character is different from a string of length 1. Thus the correct way to extract a character is with string-ref or the like, rather than substring.
It seems Chris answered your question. Just a reminder: since string-ref returns a character, you should compare the results with char=? (or equal?).
I'd like to add another solution, which is more general but requires installing the collections package first. With that package, the same function works on any collection rather than just strings, using the (last ..) and (first ..) functions of the module.
(require data/collection)
(let ([stringA "abcd"]
      [stringB "dcba"])
  (cond
    [(equal? (last stringA)
             (first stringB)) "True"]
    [else "False"]))
You could also use the SRFI-13 function string-take-right, which returns the last n characters of the argument string as a string.
Most languages have a length function for strings. In Racket it is string-length:
https://docs.racket-lang.org/reference/strings.html#%28def.%28%28quote.~23~25kernel%29._string-length%29%29
Its signature is (string-length str).
Call it to get the length; the last character is then at index length minus one.

Emacs Lisp: getting ascii value of character

I'd like to translate a character in Emacs to its numeric ASCII code, similar to casting char a = 'a'; int i = (int)a in C. I've tried string-to-number and a few other functions, but none seem to make Emacs read the char as a number in the end.
What's the easiest way to do this?
To get the ASCII code of a character, as Drew said, put a question mark before the character and evaluate that expression:
?a ==> 97
With C-x C-e the number appears in the echo area; with a C-u prefix it is inserted after the expression.
The inverse also works:
(insert 97) inserts an "a" into the buffer.
In some cases the character must be escaped with a backslash:
?\" evaluates to 34
A character is an integer in Emacs Lisp. There is no separate character data type.
The built-in function string-to-char does what you want. (string-to-char "foo") is equivalent to (aref "foo" 0), which is @abo-abo's answer, but it is implemented in C.
A string is an array:
(aref "foo" 0)

RegExp Counting System

I'm trying to create a system where I can convert regular expressions to integers and vice versa, where zero would be the most basic regex (probably "/./") and each subsequent number would be a more complex regex.
My best approach so far is to put all the possible values that a regex could contain into an array:
values = [ "!", ".", "\/", "[", "]", "(", ")", "a", "b", "-", "0", "9", .... ]
and then to take from that array as follows:
def get( integer )
  if( integer.zero? )
    return '';
  end
  integer = integer - 1;
  if( integer < values.length )
    return values[integer]
  end
  get(( integer / values.length ).floor) + get( integer % values.length);
end
sample_regex = /#{get( 100 )}/;
The biggest problem with this approach is that an invalid RegExp can easily be generated.
Is there an established algorithm for what I'm trying to achieve? If not, any suggestions?
Thanx
Steve
Since regular expressions can be formally defined by recursively applying a finite number of rules to a finite set of elements, this can be done: instead of simply concatenating elements, combine them according to the rules of regular expressions. Because the set of expressions built this way is recursively enumerable, every valid expression is eventually assigned a number.
However, it's quite probably overkill to implement this. What do you need this for? Would a simple dictionary of Number -> RegExp key-value pairs not be better suited to associate regular expressions with unique numbers?
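The recursive construction described in this answer could look roughly like the sketch below. It is an illustration under assumptions, not an established algorithm: the two-letter alphabet `ATOMS`, the function name `regexes`, and the restriction to three combining rules (star, concatenation, alternation) are all made up for the example. Because every string is produced by a grammar rule rather than by gluing arbitrary characters together, each result compiles.

```ruby
# Hypothetical two-letter alphabet; a real character set would be much larger.
ATOMS = %w[a b].freeze

# Returns every regex source obtainable by applying at most `depth` rounds
# of combining rules to ATOMS. The list grows very quickly with depth.
def regexes(depth)
  return ATOMS.dup if depth.zero?

  smaller = regexes(depth - 1)
  starred = smaller.map { |r| "(#{r})*" }
  paired  = smaller.product(smaller).flat_map do |l, r|
    ["(#{l})(#{r})", "(#{l}|#{r})"]
  end
  # Earlier entries keep their positions; duplicates from later rounds are dropped.
  (smaller + starred + paired).uniq
end

# The position in the list serves as the number of the regex.
table = regexes(1)
puts table.length
puts Regexp.new(table[3]).inspect
```

The index into the returned list plays the role of the integer, and `table.index(source)` gives the reverse mapping for any source the list contains.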
I would say that // is the simplest regex (it matches anything). /./ is fairly complex since it is just shorthand for /[^\n]/, which itself is just shorthand for a much longer expression (what that expression is depends on your character set). The next simplest expression would be /a/ where a is the first character in your character set. That last statement brings up an interesting problem for your enumeration: what character set will you use? Any enumeration will be tied to a given character set. Assuming you start with // as 0, /\x{00}/ (match the nul character) as 1, /\x{01}/ as 2, and so on, you would start to get into interesting regexes (ones that match more than one string) around 129 if you used the ASCII set, but it would take up to 1114112 for Unicode 5.0.
All in all, I would say a better solution is to treat the number as a sequence of bytes, map those bytes into whatever character set you are using, use a regex compiler to determine whether the result is a valid regex, and discard numbers that are not valid.
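A minimal sketch of that filter-and-discard idea follows. The names `ALPHABET`, `candidate`, `valid_regex?`, and `valid_regexes` are hypothetical, and the short alphabet stands in for the full list that the question left elided. The number is turned into a string by bijective base-k numeration (so every string corresponds to exactly one integer), and Ruby's own compiler decides whether to keep it.

```ruby
# Hypothetical alphabet; the question's full list of values was not given.
ALPHABET = ["a", "b", ".", "*", "|", "(", ")", "[", "]"].freeze

# Maps an integer to a string over ALPHABET using bijective numeration:
# 0 -> "", 1 -> "a", ..., 9 -> "]", 10 -> "aa", and so on.
def candidate(n)
  out = +""
  while n > 0
    n -= 1
    out.prepend(ALPHABET[n % ALPHABET.length])
    n /= ALPHABET.length
  end
  out
end

# Lets the regex compiler decide whether a source string is valid.
def valid_regex?(source)
  Regexp.new(source)
  true
rescue RegexpError
  false
end

# Lazily enumerates only the candidates that compile.
def valid_regexes
  (0..Float::INFINITY).lazy
                      .map { |n| candidate(n) }
                      .select { |source| valid_regex?(source) }
end

p valid_regexes.first(5)
```

With this approach the numbering has gaps: the n-th valid regex is not the regex for the integer n, so a caller would have to count the survivors or store them if a dense numbering is needed.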
