How to sort the strings in the order I want? - sorting

I'm following this demo
Here are the strings:
a
Aaa
áa
Aa
A
á
I want them in this order:
a
A
Aa
Aaa
á
áa
But the ICU demo sorts them as:
a
A
á
Aa
áa
Aaa
How can I get them sorted the way I want?

You can use .localeCompare() in JavaScript as the compare function passed to .sort(). localeCompare() also accepts the locale of your choice as its second argument.
let stringToBeSorted = "a Aaa áa Aa A á";
// Split into words, sort them with a locale-aware comparison, then join them back together.
let sortedString = stringToBeSorted.split(" ").sort((obj1, obj2) => obj1.localeCompare(obj2)).join(" ");
console.log(sortedString);
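A locale-aware comparison like the one above follows the locale's collation rules, so it will typically reproduce the ICU demo's order rather than the one asked for. The requested order compares the strings case-insensitively, puts á after every plain a, and lists lowercase first when two strings differ only in case. A minimal Ruby sketch of that specific rule, assuming only these characters need handling: it sorts by the downcased string (in UTF-8, a comes before á byte-wise) and breaks ties with the swapcased string. It relies on code-point order and is not a general collation.

```ruby
words = "a Aaa áa Aa A á".split(" ")

# Primary key: the downcased string, compared byte by byte.
# In UTF-8, "a" (0x61) sorts before "á" (0xC3 0xA1), so every a-word
# comes before every á-word.
# Secondary key: the swapcased string, so "a" (key "A") sorts before "A" (key "a").
sorted = words.sort_by { |w| [w.downcase, w.swapcase] }

puts sorted.join(" ")
# => a A Aa Aaa á áa
```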

Related

golang sort.Strings vs unix sort, how do I sort according to a given collation?

I was puzzled for a while because the two methods of sorting behave differently, at least in my environment. Eventually I realized this is because they use different sorting rules.
A quick experiment illustrates my point:
$ foo() { echo "o'4" "o'neil" o-ciclo-music ó-do-forró o-1 o-2 o-3 "o'" "o'1" "o'2" aaa b bb; }
$ foo | tr ' ' '\n' | LC_ALL=C sort > /tmp/s.c
$ foo | tr ' ' '\n' | LC_ALL="en_US.UTF-8" sort > /tmp/s.u
$ paste /tmp/s.{c,u} | awk '{ printf "%16s %s\n", $1 , $2 ; }'
aaa aaa
b b
bb bb
o' o'
o'1 o'1
o'2 o-1
o'4 o'2
o'neil o-2
o-1 o-3
o-2 o'4
o-3 o-ciclo-music
o-ciclo-music ó-do-forró
ó-do-forró o'neil
The left column is sorted by byte value (the C/ASCII rules), while the right column is sorted according to the collation rules of the en_US.UTF-8 locale. Go's sort.Strings() behaves like the left column: it uses the < operator on strings, which compares them byte by byte, so the words are sorted by the numeric values of their bytes (for ASCII text, the ASCII codes of their letters).
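The left column can be reproduced with any plain byte-wise string comparison. As an illustration in Ruby, whose String#<=> compares raw bytes just as Go's < does on strings, sorting the same words gives exactly the LC_ALL=C column; the ó in ó-do-forró is encoded starting with byte 0xC3, which is why that word lands last.

```ruby
words = ["o'4", "o'neil", "o-ciclo-music", "ó-do-forró", "o-1", "o-2", "o-3",
         "o'", "o'1", "o'2", "aaa", "b", "bb"]

# String#<=> compares the raw bytes, so ' (0x27) < - (0x2D) < digits < letters,
# and the multi-byte "ó" (0xC3 0xB3) sorts after every ASCII character.
byte_order = words.sort

puts byte_order
```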
Is it possible to influence how sort.Strings() behaves through some environment variable? I have tried the LC_ALL variable, but the 'a' < 'b' behavior in Go is not influenced by it. This is probably for the better.
I am wondering which package provides a function with the same signature (and return type) as strings.Compare(), but that heeds an arbitrary collation rule set (or at least Unicode).
I want a function that indicates that o'neil should come after ó-do-forró.
foobar.SetCollation(unicode)
foobar.Compare(`o'neil`, `ó-do-forró`) -> 1
What package am I looking for?
package main

import (
	"fmt"

	"golang.org/x/text/collate"
	"golang.org/x/text/language"
)

func main() {
	cull := collate.New(language.English,
		collate.IgnoreCase, collate.IgnoreDiacritics, collate.IgnoreWidth,
		collate.Loose,
		collate.Force,
		collate.Numeric)
	fmt.Println(cull.CompareString("o'neil", "ó-do-forró")) // prints 1
	fmt.Println(cull.CompareString("b", "a"))               // prints 1
}

replace multiple words in string with specific words from list

How can I, using M-language, replace specific words in a string with other specific words that are specified in a table?
See my example data:
Source code:
let
    someTable = Table.FromColumns({
        {"aa &bb &cc dd", "&ee ff &gg hh &ii"},
        {Table.FromColumns({{"&bb", "&cc"}, {"ReplacementForbb", "ccReplacement"}}, {"StringToFind", "ReplaceWith"}),
         Table.FromColumns({{"&ee", "&gg", "&ii"}, {"OtherReplacementForee", "SomeReplacementForgg", "Replacingii"}}, {"StringToFind", "ReplaceWith"})},
        {"aa ReplacementForbb ccReplacement dd", "OtherReplacementForee ff SomeReplacementForgg hh Replacingii"}},
        {"OriginalString", "Replacements", "WantedResult"})
in
    someTable
This is a neat question. You can do this with some table and list M functions as a custom column like this:
= Text.Combine(
    List.ReplaceMatchingItems(
        Text.Split([OriginalString], " "),
        List.Transform(Table.ToList([Replacements]),
            each Text.Split(_, ",")
        )
    ),
    " ")
I'll walk through how this works using the first row as an example.
The [OriginalString] is "aa &bb &cc dd" and we use Text.Split to convert it to a list.
"aa &bb &cc dd" --Text.Split--> {"aa", "&bb", "&cc", "dd"}
Now we need to work on the [Replacements] table and convert it into a list of lists. It starts out:
StringToFind   ReplaceWith
------------------------------
&bb            ReplacementForbb
&cc            ccReplacement
Using Table.ToList, this becomes a two-element list (since the table has two rows):
{"&bb,ReplacementForbb","&cc,ccReplacement"}
Using Text.Split on the comma, we can transform each element into a list to get
{{"&bb","ReplacementForbb"},{"&cc","ccReplacement"}}
which is the form we need for the List.ReplaceMatchingItems function.
List.ReplaceMatchingItems(
{"aa", "&bb", "&cc", "dd"},
{{"&bb","ReplacementForbb"},{"&cc","ccReplacement"}}
)
This does the replacement and returns the list
{"aa","ReplacementForbb","ccReplacement","dd"}
Finally, we use Text.Combine to concatenate the list above into a single string.
"aa ReplacementForbb ccReplacement dd"
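The same split, replace, and join pipeline can be sketched outside Power Query. Below is a Ruby analogue for illustration only (the helper name replace_words is hypothetical). It assumes each StringToFind is unique; note also that Ruby's split(" ") collapses runs of whitespace, whereas Text.Split would produce empty items for them.

```ruby
# Hypothetical helper mirroring the M custom column step by step.
def replace_words(original, pairs)
  lookup = pairs.to_h                              # {{find, replace}, ...} as a hash
  words = original.split(" ")                      # like Text.Split(..., " ")
  replaced = words.map { |w| lookup.fetch(w, w) }  # like List.ReplaceMatchingItems
  replaced.join(" ")                               # like Text.Combine(..., " ")
end

row1 = replace_words("aa &bb &cc dd",
                     [["&bb", "ReplacementForbb"], ["&cc", "ccReplacement"]])
row2 = replace_words("&ee ff &gg hh &ii",
                     [["&ee", "OtherReplacementForee"], ["&gg", "SomeReplacementForgg"],
                      ["&ii", "Replacingii"]])

puts row1  # aa ReplacementForbb ccReplacement dd
puts row2  # OtherReplacementForee ff SomeReplacementForgg hh Replacingii
```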

How to replace a character by regex in ruby

How can I replace the letter 'b' with 'c' when it follows a run of 'a's whose length is a multiple of 2?
For example:
ab => ab
aab => aac
aaab => aaab
aaaab => aaaac
aaaabaaabaab => aaaacaaabaac
Match runs of aa pairs that are followed by b, then rebuild the match from the captured group.
Regex: (?<!a)((?:a{2})+)b
Explanation:
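(?<!a) is a negative lookbehind: the match cannot start right after another a, so the group that follows must consume the entire run of a's. If the run has an odd length, it cannot be split into pairs and the whole match fails.
((?:a{2})+)b captures an even number of a's followed by b. The outer group is capturing and is numbered \1.
Replacement: \1c, i.e. the first captured group followed by c.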
Test String:
ab
aab
aaab
aaaab
aaaabaaabaab
After replacement:
ab
aac
aaab
aaaac
aaaacaaabaac
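In Ruby, the same pattern and replacement can be applied with String#gsub. Ruby's regex engine supports lookbehind, and '\1c' in single quotes passes the backreference through to gsub unchanged. A minimal check against the test strings above:

```ruby
# (?<!a)      : the run of a's must not be preceded by another a
# ((?:a{2})+) : one or more pairs of a, captured as group 1
# b           : the b that follows the run
EVEN_A_THEN_B = /(?<!a)((?:a{2})+)b/

inputs = %w[ab aab aaab aaaab aaaabaaabaab]
results = inputs.map { |s| s.gsub(EVEN_A_THEN_B, '\1c') }

inputs.zip(results) { |before, after| puts "#{before} => #{after}" }
```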

How to select words that are made up of the same letter using regex?

I have a dictionary text file that contains some words that I don't want.
Example:
aa
aaa
aaaa
bb
b
bbb
etc
I want to use a regular expression to select these words and remove them. However, what I have seems to be getting too long, and there must be a more efficient approach.
Here is my code so far:
/^a{1,6}$|^b{1,6}$|^c{1,6}$|^d{1,6}$|^e{1,6}$|^f{1,6}$|^g{1,6}$|^[i]{2,3}$/
It seems that I have to do this for every letter. How could I do this more succinctly?
It's a lot easier to collapse the word down to unique letters and remove all of those with just one letter in them:
words = "aa aaa aaaa bb b bbb etc aab abcabc"
words.split(/\s+/).select do |word|
word.chars.uniq.length > 1
end
# => ["etc", "aab", "abcabc"]
This splits your string into words, then selects only the words that contain more than one distinct character (.chars.uniq).
^([a-z])\1?\1?\1?\1?\1?$
Match any single letter, followed by 5 optional backreferences to the initial letter.
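This might work too in Ruby, where {,5} means {0,5}. Some other regex engines treat {,5} as literal text, so ^([a-z])\1{0,5}$ is the more portable spelling:
^([a-z])\1{,5}$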
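Applied to a word list in Ruby, the backreference approach might look like the sketch below. It uses \A and \z instead of ^ and $ because in Ruby ^ and $ match at every line boundary, and it writes {0,5} (at most six letters in total, as in the question's a{1,6}). The word list is the question's sample plus a few mixed words for contrast.

```ruby
words = %w[aa aaa aaaa bb b bbb etc aab abcabc]

# One lowercase letter, then up to five more copies of that same letter.
# \A and \z anchor the whole string (Ruby's ^ and $ are line anchors).
SAME_LETTER = /\A([a-z])\1{0,5}\z/

kept = words.reject { |word| SAME_LETTER.match?(word) }
p kept  # => ["etc", "aab", "abcabc"]
```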
Try this:
\b([a-zA-Z])\1*\b
If you also want to match repeated digits or underscores (in addition to letters), use:
\b([\w])\1*\b
Update:
To keep the single-letter word I from being removed (while still matching ii, iii, and so on):
(?i)ii+|\b((?i)[a-hj-z])\1*\b
The (?i) flag makes the letter matching case-insensitive.
Demo:
https://regex101.com/r/gFUWE8/7
You can try this regex:
\b([a-z])\1{0,}\b
and replace each match with an empty string.
Ruby code sample:
re = /\b([a-z])\1{0,}\b/m
str = 'aa aaa aaaa bb b bbb abc aa a pqaaa '
result = str.gsub(re,'')
puts result

What are the available variable assignments and expand methods that can be used in Makefile files?

I am trying to parse some Makefile files to read configuration values from them, and I have encountered a wide range of expressions like:
AAA := Some, text
BBB_NAME := #AAA# (c)
CCC = value
DDD = Some other $(CCC) xxx
I would like to know whether all of these are valid and whether there is any difference between them (so I can parse them properly).
They are all valid, as you can tell by putting them in a Makefile and running it. If you want to know what values they actually take, you can try
$(info $(AAA))
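(The only real problem is BBB_NAME: # starts a comment in a makefile, so everything from #AAA# to the end of the line is ignored and BBB_NAME ends up empty. Write \# if you need a literal #. Even then, a value containing (c) can cause problems if you pass it into other functions or to the shell.)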
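Since the goal is to parse these lines, here is a minimal Ruby sketch of a line-based parser under loudly stated assumptions: it handles only single-line assignments using =, :=, ::=, ?= or +=; it treats everything from # onward as a comment (escaped \# is not handled); it strips trailing whitespace, which make itself would keep; and it ignores define blocks, line continuations, conditionals, and target-specific variables. The names ASSIGNMENT and parse_assignments are hypothetical.

```ruby
# Hypothetical pattern: NAME, an assignment operator, then the value.
ASSIGNMENT = /\A\s*([A-Za-z_][A-Za-z0-9_.-]*)\s*(::=|:=|\?=|\+=|=)\s*(.*)\z/

def parse_assignments(text)
  text.each_line.map do |raw|
    # '#' starts a comment in a makefile (escaped \# is not handled here).
    line = raw.chomp.sub(/#.*/, "")
    m = ASSIGNMENT.match(line)
    next nil unless m # rule lines, blank lines, etc. are skipped
    { name: m[1], op: m[2], value: m[3].rstrip }
  end.compact
end

makefile = <<~'MAKE'
  AAA := Some, text
  BBB_NAME := #AAA# (c)
  CCC = value
  DDD = Some other $(CCC) xxx
MAKE

parse_assignments(makefile).each { |a| p a }
```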
The one tricky part is the difference between = and := (and the other assignment operators). Full details are in the manual, but basically := expands the right-hand side immediately, while = stores it unexpanded and expands it each time the variable itself is expanded. Consider
CCC = value
DDD := Some other $(CCC) xxx
EEE = Some other $(CCC) xxx
The value of DDD is now Some other value xxx, while the value of EEE is Some other $(CCC) xxx. If you use them somewhere:
$(info $(DDD))
$(info $(EEE))
Make expands $(DDD) and $(EEE) to the same thing and you see
Some other value xxx
Some other value xxx
But there are differences:
CCC = value
DDD := Some other $(CCC) xxx
EEE = Some other $(CCC) xxx
DDD := $(DDD) and yyy # This is perfectly legal.
# EEE = $(EEE) and yyy # Infinite recursion: make stops with an error as soon as EEE is expanded.
CCC = dimension
$(info $(DDD)) # Produces "Some other value xxx and yyy"
$(info $(EEE)) # Produces "Some other dimension xxx"
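To make the two flavors concrete, here is a toy Ruby model. The class MakeVars and its methods are hypothetical and handle only $(NAME) references, not make functions or ${NAME}. := stores the already-expanded text, = stores the raw text and expands it on every use, and a recursive variable that reaches itself raises an error, much as make stops. Replaying the example above reproduces its output.

```ruby
# Toy model of GNU make's two variable flavors (hypothetical, illustration only).
class MakeVars
  def initialize
    @vars = {} # name => [flavor, text]
  end

  # VAR = text : store the raw text; references are expanded on each use.
  def recursive(name, text)
    @vars[name] = [:recursive, text]
  end

  # VAR := text : expand references now and store the result.
  def simple(name, text)
    @vars[name] = [:simple, expand(text)]
  end

  # Expand $(NAME) references; undefined variables expand to "".
  def expand(text, seen = [])
    text.gsub(/\$\((\w+)\)/) do
      name = Regexp.last_match(1)
      flavor, value = @vars.fetch(name, [:simple, ""])
      next value if flavor == :simple
      raise "Recursive variable '#{name}' references itself" if seen.include?(name)
      expand(value, seen + [name])
    end
  end
end

v = MakeVars.new
v.recursive("CCC", "value")
v.simple("DDD", "Some other $(CCC) xxx")
v.recursive("EEE", "Some other $(CCC) xxx")
v.simple("DDD", "$(DDD) and yyy")
v.recursive("CCC", "dimension")
puts v.expand("$(DDD)") # Some other value xxx and yyy
puts v.expand("$(EEE)") # Some other dimension xxx

v.recursive("FFF", "$(FFF) and yyy")
begin
  v.expand("$(FFF)")
rescue RuntimeError => e
  puts e.message # Recursive variable 'FFF' references itself
end
```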
