CharInSet is much slower than IN; should I fix the W1050 warning?

I use IN a lot in my project and I have lots of these warnings:
[DCC Warning] Unit1.pas(40): W1050 WideChar reduced to byte char in set expressions. Consider using CharInSet function in SysUtils unit.
I made a quick test, and using CharInSet instead of IN is 65% to 100% slower:
if s1[i] in ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z'] then
vs
if CharInSet(s1[i], ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z']) then
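Here is the code for two tests: one loops many times over a short string, the other loops once over a very long string. (TStopWatch is declared in System.Diagnostics and DupeString in System.StrUtils, so both units need to be in the uses clause.)
With two buttons on a form, I tested the short string with: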
procedure TForm1.Button1Click(Sender: TObject);
var s1: string;
t1, t2: TStopWatch;
a, i, cnt, vMaxLoop: Integer;
begin
s1 := '[DCC Warning] Unit1.pas(40): W1050 WideChar reduced to byte char in set expressions. Consider using CharInSet function in SysUtils unit.';
vMaxLoop := 10000000;
cnt := 0;
t1 := TStopWatch.Create;
t1.Start;
for a := 1 to vMaxLoop do
for i := 1 to Length(s1) do
if s1[i] in ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z'] then
inc(cnt);
t1.Stop;
cnt := 0;
t2 := TStopWatch.Create;
t2.Start;
for a := 1 to vMaxLoop do
for i := 1 to Length(s1) do
if CharInSet(s1[i], ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z']) then
inc(cnt);
t2.Stop;
Button1.Caption := inttostr(t1.ElapsedMilliseconds) + ' - ' + inttostr(t2.ElapsedMilliseconds);
end;
And this for one long string:
procedure TForm1.Button2Click(Sender: TObject);
var s1: string;
t1, t2: TStopWatch;
a, i, cnt, vMaxLoop: Integer;
begin
s1 := '[DCC Warning] Unit1.pas(40): W1050 WideChar reduced to byte char in set expressions. Consider using CharInSet function in SysUtils unit.';
s1 := DupeString(s1, 1000000);
s1 := s1 + s1 + s1 + s1; // DupeString is limited, use this to create longer string
cnt := 0;
t1 := TStopWatch.Create;
t1.Start;
for i := 1 to Length(s1) do
if s1[i] in ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z'] then
inc(cnt);
t1.Stop;
cnt := 0;
t2 := TStopWatch.Create;
t2.Start;
for i := 1 to Length(s1) do
if CharInSet(s1[i], ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z']) then
inc(cnt);
t2.Stop;
Button2.Caption := inttostr(t1.ElapsedMilliseconds) + ' - ' + inttostr(t2.ElapsedMilliseconds);
end;
Why do they recommend the slower option, and how can I fix this warning without a performance penalty?

The warning is telling you that your code may be defective. Because sets can only be based on types with ordinality of 256 or less, the base type is truncated to that size. Now, Char is an alias for WideChar and has ordinality 65536. So the warning is there to tell you that your program may not behave as you expect. For instance, one might ask what this expression evaluates to:
['A', chr(256)] = ['A']
One might expect it to evaluate false, but in fact it evaluates true. So I think you should certainly take heed of the compiler when it issues this warning.
Now, it so happens that your set, which can and should be written more concisely as ['A'..'Z'], is made up entirely of ASCII characters. And it happens (thanks to commenters Andreas and ventiseis) that in that case the compiler generates correct code for such a set, regardless of the ordinal value of the character to the left of the in operator. So
if s1[i] in ['A'..'Z'] then
will result in correct code, in spite of the warning. And the compiler is able to detect that the set's elements are contiguous and generate efficient code.
Note that this depends on the set being a literal, so that the compiler can perform the optimisation. That is why it can perform so much better than CharInSet: CharInSet is a function, and because the Delphi optimiser has limited power, it cannot take advantage of the contiguous nature of this specific set literal.
The warning is annoying, though, and do you really want to rely on remembering the very specific details of when it can safely be ignored? Another way to implement the test, and sidestep the warning, is to use comparison operators:
if (c >= 'A') and (c <= 'Z') then
....
You'd probably wrap this in an inlined function to make the code even easier to read.
function IsUpperCaseEnglishLetter(c: Char): Boolean; inline;
begin
Result := (c >= 'A') and (c <= 'Z');
end;
You should also ask yourself whether this code is actually a performance bottleneck. Time your real program rather than an artificial benchmark like this one. I'd bet this code isn't a bottleneck, and if it isn't, performance should not be the key driver.

Related

Nesting depth and check of valid or invalid values

I have a script that computes the parenthesis nesting depth of a text. My function counts the depth, checks for unmatched parentheses, and is supposed to return the tuple (depth, valid, balanced).
Depth is how deeply the parentheses in the text are nested.
Valid checks whether there are too many closing parentheses, i.e. any closing parenthesis missing its counterpart, which makes the count negative.
Balanced checks whether the final count is 0.
s = '((This teXt)((is)(five deep((, valid))and))balanced)\t'
te = ''.join(s).lower()
par = ''
for ch in te:
    if ch not in (' ', '\n', ',', '.', '-', '–', '—', '*',
                  '«', '»', ':', ';', '’', '?', "'", '"',
                  '/', '!', '…', '´', '`', '+', '[', ']',
                  '0', '1', '2', '3', '4', '5', '6', '7',
                  '8', '9','a', 'b', 'c', 'd', 'e', 'f',
                  'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n',
                  'o', 'p', 'q', 'r', 's', 't', 'u', 'v',
                  'w', 'x', 'y', 'z', 'æ', 'ø', 'å', '\t'):
        par += ch
def max_dep():
    count = 0
    max_num = 0
    for i in par:
        if i == '(':
            count += 1
            if max_num < count:
                max_num = count
        if i == ')':
            count -= 1
    val = 0
    for t in par:
        if t == '(':
            val += 1
        if t == ')':
            val -= 1
        if val == 0:
            val = True
        else:
            val = False
    bal = 0
    for x in par:
        if x == '(':
            bal += 1
        if x == ')':
            bal -= 1
        if bal == 0:
            bal = True
        else:
            bal = False
    return max_num, val, bal
print(max_dep())
Since val and bal both end up at 0, I was hoping to print (5, True, True), but as I have come to understand, 0 is never True. Is there any hope of getting this function to print True for 0, or do I have to start over?
In short: the solution was to move the checks out of the for loops, so that they run once on the final counts instead of on every character (and to treat only a negative count as invalid):
s = '((This teXt)((is)(five deep((, valid))and))balanced)\t'
te = ''.join(s).lower()
par = ''
for ch in te:
    if ch not in (' ', '\n', ',', '.', '-', '–', '—', '*',
                  '«', '»', ':', ';', '’', '?', "'", '"',
                  '/', '!', '…', '´', '`', '+', '[', ']',
                  '0', '1', '2', '3', '4', '5', '6', '7',
                  '8', '9','a', 'b', 'c', 'd', 'e', 'f',
                  'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n',
                  'o', 'p', 'q', 'r', 's', 't', 'u', 'v',
                  'w', 'x', 'y', 'z', 'æ', 'ø', 'å', '\t'):
        par += ch
def max_dep():
    count = 0
    max_num = 0
    for i in par:
        if i == '(':
            count += 1
            if max_num < count:
                max_num = count
        if i == ')':
            count -= 1
    val = 0
    for t in par:
        if t == '(':
            val += 1
        if t == ')':
            val -= 1
    if val < 0:
        val = False
    else:
        val = True
    bal = 0
    for x in par:
        if x == '(':
            bal += 1
        if x == ')':
            bal -= 1
    if bal == 0:
        bal = True
    else:
        bal = False
    return max_num, val, bal
print(max_dep())
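For comparison, here is a minimal single-pass sketch (not the poster's code; the name `paren_stats` is made up for illustration). It walks the text once, tracks the running depth, marks the text invalid as soon as the depth drops below zero (a closing parenthesis with no partner), and reports balanced when the final depth is 0. It only looks at '(' and ')', so the character filtering step is not needed. Unlike the fixed code above, which only inspects the final count, it also flags text like ')(' whose count dips below zero partway through.

```python
def paren_stats(text):
    """Return (max_depth, valid, balanced) for the parentheses in text."""
    depth = 0        # current nesting level
    max_depth = 0    # deepest level seen so far
    valid = True     # becomes False if a ')' ever appears without a matching '('
    for ch in text:
        if ch == '(':
            depth += 1
            max_depth = max(max_depth, depth)
        elif ch == ')':
            depth -= 1
            if depth < 0:
                valid = False
    return max_depth, valid, depth == 0


s = '((This teXt)((is)(five deep((, valid))and))balanced)\t'
print(paren_stats(s))  # (5, True, True)
```

Returning `depth == 0` directly gives a real bool, which sidesteps the 0-versus-True confusion from the question.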

Mistake where it shouldn't be

function generateRandomString($minlen = 7, $maxlen = 10, $randomCase = 0) {
$length = rand($minlen, $maxlen);
$symbols = array('A', 'B', 'C', 'D', 'E', 'F',
'G', 'H', 'I', 'J', 'K', 'L',
'M', 'N', 'O', 'P', 'R', 'S',
'T', 'U', 'V', 'X', 'Y', 'Z',
'1', '2', '3', '4', '5', '6',
'7', '8', '9', '0');
$string = '';
for ($i = 0; $i < $length; $i++) {
$index = rand(0, strlen($symbols) - 1);
$symbol = $symbols[$index];
if ($randomCase)
$symbol = (rand(0, 1)) ? strtolower($symbol) : $symbol;
$string .= $symbol;
}
return $string;
Where could the error in this code be?

Can someone please tell me why previousValue in this code is an array?

// collect the values of myArray, without duplicates, into a new array and log it
let myArray = ['a', 'b', 'a', 'b', 'c', 'e', 'e', 'c', 'd', 'd', 'd', 'd']
let myArrayWithNoDuplicates = myArray.reduce(function (previousValue, currentValue) {
if (previousValue.indexOf(currentValue) === -1) {
previousValue.push(currentValue)
}
return previousValue
}, [])
console.log(myArrayWithNoDuplicates)
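previousValue is the accumulator: the value returned by the previous call of the callback. On the first call it is the initial value you pass as reduce's second argument, which here is [] (an empty array). Since the callback always returns previousValue, it stays an array on every call.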
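The same accumulator behaviour can be sketched in Python with `functools.reduce` (a Python analogue, not JavaScript): its optional third argument plays the role of the `[]` passed to JavaScript's reduce. Whatever you pass there is what the callback receives first, and whatever the callback returns becomes the next accumulator. The helper name `add_if_new` is just for illustration.

```python
from functools import reduce


def add_if_new(acc, item):
    # acc starts as the initial value passed to reduce (here an empty list)
    if item not in acc:
        acc.append(item)
    return acc  # returning acc keeps it a list for the next call


my_list = ['a', 'b', 'a', 'b', 'c', 'e', 'e', 'c', 'd', 'd', 'd', 'd']
print(reduce(add_if_new, my_list, []))  # ['a', 'b', 'c', 'e', 'd']

# With 0 as the initial value, the accumulator is a number instead
print(reduce(lambda acc, item: acc + 1, my_list, 0))  # 12
```

The type of the accumulator is decided entirely by the initial value and by what the callback returns, not by the elements being reduced.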

Arranging Tasks within a specified duration Algorithmic puzzle

I have an interesting DS problem. I have a fixed duration (1 hour) and a variable duration (1 to 2 hours), and I am given a number of tasks, detailed below. How would I find every grouping of tasks whose durations add up to the specified durations?
Task examples:
Task A - 25 min,
Task B - 20 min,
Task C - 25 min,
Task D - 35 min,
Task E - 25 min,
Task F - 30 min,
Task G - 30 min.
For the 1 hour duration, a sample answer would be:
Task F + G
Task D + E
For the 1 to 2 hour duration, a sample answer would be:
Task F + G
Task A + B + C
The proper algorithm would help me to identify all combinations.
Try this
from itertools import chain, combinations
tasks = {
    "A": 25,
    "B": 20,
    "C": 25,
    "D": 35,
    "E": 25,
    "F": 30,
    "G": 30
}
def powerset(iterable):
    s = list(iterable)
    return chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))
def get_possible_tasks(fixed_duration):
    for comb in powerset(tasks):
        if sum([tasks[i] for i in comb]) == fixed_duration:
            yield comb
def get_possible_tasks_inrange(min_duration, max_duration):
    for comb in powerset(tasks):
        if min_duration <= sum([tasks[i] for i in comb]) <= max_duration:
            yield comb
print("TASKS completed in onehour")
for i in get_possible_tasks(60):
    print(i)
print("TASKS In RANGE 60-120 minutes")
for i in get_possible_tasks_inrange(60,120):
    print(i)
Output:
TASKS completed in onehour
('A', 'D')
('C', 'D')
('D', 'E')
('F', 'G')
TASKS In RANGE 60-120 minutes
('A', 'D')
('C', 'D')
('D', 'E')
('D', 'F')
('D', 'G')
('F', 'G')
('A', 'B', 'C')
('A', 'B', 'D')
('A', 'B', 'E')
('A', 'B', 'F')
('A', 'B', 'G')
('A', 'C', 'D')
('A', 'C', 'E')
('A', 'C', 'F')
('A', 'C', 'G')
('A', 'D', 'E')
('A', 'D', 'F')
('A', 'D', 'G')
('A', 'E', 'F')
('A', 'E', 'G')
('A', 'F', 'G')
('B', 'C', 'D')
('B', 'C', 'E')
('B', 'C', 'F')
('B', 'C', 'G')
('B', 'D', 'E')
('B', 'D', 'F')
('B', 'D', 'G')
('B', 'E', 'F')
('B', 'E', 'G')
('B', 'F', 'G')
('C', 'D', 'E')
('C', 'D', 'F')
('C', 'D', 'G')
('C', 'E', 'F')
('C', 'E', 'G')
('C', 'F', 'G')
('D', 'E', 'F')
('D', 'E', 'G')
('D', 'F', 'G')
('E', 'F', 'G')
('A', 'B', 'C', 'D')
('A', 'B', 'C', 'E')
('A', 'B', 'C', 'F')
('A', 'B', 'C', 'G')
('A', 'B', 'D', 'E')
('A', 'B', 'D', 'F')
('A', 'B', 'D', 'G')
('A', 'B', 'E', 'F')
('A', 'B', 'E', 'G')
('A', 'B', 'F', 'G')
('A', 'C', 'D', 'E')
('A', 'C', 'D', 'F')
('A', 'C', 'D', 'G')
('A', 'C', 'E', 'F')
('A', 'C', 'E', 'G')
('A', 'C', 'F', 'G')
('A', 'D', 'E', 'F')
('A', 'D', 'E', 'G')
('A', 'D', 'F', 'G')
('A', 'E', 'F', 'G')
('B', 'C', 'D', 'E')
('B', 'C', 'D', 'F')
('B', 'C', 'D', 'G')
('B', 'C', 'E', 'F')
('B', 'C', 'E', 'G')
('B', 'C', 'F', 'G')
('B', 'D', 'E', 'F')
('B', 'D', 'E', 'G')
('B', 'D', 'F', 'G')
('B', 'E', 'F', 'G')
('C', 'D', 'E', 'F')
('C', 'D', 'E', 'G')
('C', 'D', 'F', 'G')
('C', 'E', 'F', 'G')
('D', 'E', 'F', 'G')
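The powerset approach checks all 2^n subsets, which is fine for seven tasks but grows quickly. A backtracking sketch with pruning (a swapped-in technique, not what the answer above uses; the name `combos_in_range` is hypothetical) sorts the tasks by duration and abandons a branch as soon as adding the next task would exceed the maximum, since every later task is at least as long. It still lists every qualifying combination, but never explores subsets that cannot fit.

```python
tasks = {"A": 25, "B": 20, "C": 25, "D": 35, "E": 25, "F": 30, "G": 30}


def combos_in_range(tasks, min_duration, max_duration):
    """Return every combination of task names whose total lies in [min_duration, max_duration]."""
    # Shortest tasks first, so once one task overshoots, every later one does too
    items = sorted(tasks.items(), key=lambda kv: kv[1])
    results = []

    def backtrack(start, chosen, total):
        if chosen and total >= min_duration:
            results.append(tuple(sorted(chosen)))
        for i in range(start, len(items)):
            name, duration = items[i]
            if total + duration > max_duration:
                break  # prune: this and all remaining tasks would exceed the maximum
            chosen.append(name)
            backtrack(i + 1, chosen, total + duration)
            chosen.pop()

    backtrack(0, [], 0)
    return results


print(sorted(combos_in_range(tasks, 60, 60)))  # [('A', 'D'), ('C', 'D'), ('D', 'E'), ('F', 'G')]
print(len(combos_in_range(tasks, 60, 120)))    # 76
```

The 76 matches the output listed above: 6 pairs, 35 triples and 35 groups of four.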
import java.util.ArrayList;
import java.util.List;
public class TaskCombinations {
// Recursively builds every r-element combination of the durations in arr.
// data holds the chosen durations, task holds the matching task letters.
// flag == 0: print combinations whose total equals sum exactly.
// flag == 1: print combinations whose total is between sum and 120 (inclusive).
static void showCombinations(int flag,List<String> task,int sum,int start, List<Integer> arr, int n, int r, int eAdded, List<Integer> data, List<Integer> endList) {
int k = start + 1;
if (eAdded == r) {
int temp=0;
for(int i=0;i<data.size();i++){
temp+=data.get(i);
}
if(temp==sum&&flag==0){
task.forEach((it)->{
System.out.print(it);
});
System.out.println();
}
else if(temp>=sum && temp<=120&&flag==1){
task.forEach((it)->{
System.out.print(it);
});
System.out.println();
}
} else {
for (int i = start; i <= endList.get(eAdded); i++) {
data.add(arr.get(i));
task.add(Character.toString((char) (i+ 65))); // index 0 -> "A", 1 -> "B", ...
showCombinations(flag,task,sum,k, arr, n, r, eAdded + 1, data, endList);
task.remove(task.size() - 1);
data.remove(data.size() - 1);
k++;
}
}
}
public static void main(String args[]){
List<Integer> resultList = new ArrayList<>();
List<Integer> data = new ArrayList<>();
resultList.add(25); // Each element of resultList is the duration
resultList.add(20); // taken by one task: A, B, C, ... in order
resultList.add(25);
resultList.add(35);
resultList.add(25);
resultList.add(30);
resultList.add(30);
int n=resultList.size();
List<Integer> end = new ArrayList<>();// List that contains the end value for each starting index
List<String> task=new ArrayList<>();
System.out.println("These completes within an hour: ");
for(int k=1;k<=resultList.size();k++){
for (int i = 0; i <= k; i++) {
end.add(n - (k) + i);
}
showCombinations(0,task, 60, 0, resultList, resultList.size(), k, 0, data, end);
data.clear();
end.clear();
task.clear();
}
System.out.println("These completes within 1 to 2 hour(both Inclusive) : ");
for(int k=1;k<=resultList.size();k++){
for (int i = 0; i <= k; i++) {
end.add(n - (k) + i);
}
showCombinations(1,task, 60, 0, resultList, resultList.size(), k, 0, data, end);
data.clear();
end.clear();
task.clear();
}
}
}
Output:
These completes within an hour:
AD
CD
DE
FG
These completes within 1 to 2 hour(both Inclusive) :
AD
CD
DE
DF
DG
FG
ABC
ABD
ABE
ABF
ABG
ACD
ACE
ACF
ACG
ADE
ADF
ADG
AEF
AEG
AFG
BCD
BCE
BCF
BCG
BDE
BDF
BDG
BEF
BEG
BFG
CDE
CDF
CDG
CEF
CEG
CFG
DEF
DEG
DFG
EFG
ABCD
ABCE
ABCF
ABCG
ABDE
ABDF
ABDG
ABEF
ABEG
ABFG
ACDE
ACDF
ACDG
ACEF
ACEG
ACFG
ADEF
ADEG
ADFG
AEFG
BCDE
BCDF
BCDG
BCEF
BCEG
BCFG
BDEF
BDEG
BDFG
BEFG
CDEF
CDEG
CDFG
CEFG
DEFG

Algorithm for generating huge wordlist

Alright, I know this is going to sound bad, like I'm going to use this for unethical things, but you have my word that I am not.
I am writing a paper for my Computer and Information Security course, and the topic I chose was hashing methods. One of the points I cover in my paper is that MD5 is one-way only, and that the only way to crack an MD5 hash is to keep generating strings, hash each one with an MD5 function, and compare the result with the hash you want to crack.
I would like to build a really simple mock-up program to show alongside my paper (we do a presentation and this would be an awesome thing to have), so I wanted to work out an algorithm that makes a string with every possible character combination up to 8 characters. For example the output will be:
a, b, c, ..., aa, ab, ac, ... ba, bb, bc etc etc etc.
It needs to include letters, numbers and, if possible, symbols.
I got partly through the algorithm for this, but unfortunately my programming skills are not up to the task. If anyone can provide a complete algorithm for this I'd be extremely thankful.
Again, if you think I'm a liar and I'm going to use this for hacking purposes you don't have to leave an answer.
Thank you. :)
In Python, itertools.product does almost all you require -- though it does it for just one "number of repeats", so you'll have to iterate from 1 to 8 (not hard;-). In essence:
import itertools
import string
# whatever you wish as alphabet (lower/upper, digits, punct, &c)
myalphabet = string.ascii_lowercase + string.digits
def prods(maxlen, alphabet=myalphabet):
    for i in range(1, maxlen+1):
        for s in itertools.product(alphabet, repeat=i):
            yield ''.join(s)
Of course, for an alphabet of length N and K repetitions (8 in your case) this does produce N + N^2 + ... + N^K possibilities (2,901,713,047,668 possibilities for N=36 and K=8), but, what's a few trillion outputs among friends!-)
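The count above is a geometric series, so it can be checked directly. This small sketch (the name `count_words` is hypothetical) compares the term-by-term sum with the closed form N(N^K - 1)/(N - 1), which is algebraically the same as the (N^(K+1) - N)/(N - 1) that the C# answer below uses for totalValues. It assumes N > 1.

```python
def count_words(n_symbols, max_len):
    """Number of strings of length 1..max_len over an alphabet of n_symbols (n_symbols > 1)."""
    direct = sum(n_symbols ** k for k in range(1, max_len + 1))         # N + N^2 + ... + N^K
    closed = n_symbols * (n_symbols ** max_len - 1) // (n_symbols - 1)  # geometric-series closed form
    assert direct == closed
    return direct


print(f"{count_words(36, 8):,}")  # 2,901,713,047,668
```

Python integers are arbitrary precision, so the result is exact, unlike a double-based Math.Pow calculation at larger sizes.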
To implement this, I would probably encode integers in base 36 (or a higher base if you wanted symbols):
1 = 1
2 = 2
...
a = 10
b = 11
..
and so on.
Then you take a number, like 38, and do some divisions, i.e.:
38/36 = 1 remainder 2, so 38 = 12 in base 36
Then just run a for loop up to the largest number you want to encode, something very large, and output the encoded numbers.
Just for fun, I wrote this for you: http://pastebin.antiyes.com/index.php?id=327
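A sketch of the base-36 encoding described above, plus a variant. Plain base-36 counting never produces strings with a leading 0 (other than "0" itself), so strings such as "00" or "0a" would be skipped. Bijective numeration (a swapped-in technique, where each position uses digit values 1..k with no zero digit) covers every string and yields exactly the a, b, ..., aa, ab order the question asks for. Both function names are hypothetical.

```python
DIGITS = "0123456789abcdefghijklmnopqrstuvwxyz"


def to_base36(n):
    """Ordinary base-36 encoding of a non-negative integer, as described above."""
    if n == 0:
        return "0"
    out = []
    while n > 0:
        n, rem = divmod(n, 36)   # e.g. 38 -> quotient 1, remainder 2
        out.append(DIGITS[rem])
    return "".join(reversed(out))


def nth_word(n, alphabet):
    """Bijective base-k: 1 -> first symbol, k -> last symbol, k+1 -> first symbol twice."""
    k = len(alphabet)
    out = []
    while n > 0:
        n -= 1                   # shift so each position uses digit values 0..k-1
        n, rem = divmod(n, k)
        out.append(alphabet[rem])
    return "".join(reversed(out))


print(to_base36(38))                               # 12
print([nth_word(i, "abc") for i in range(1, 8)])   # ['a', 'b', 'c', 'aa', 'ab', 'ac', 'ba']
```

Looping n from 1 up to the count of all words of length 1..K with `nth_word` visits every word exactly once, in length order.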
It is not true that "the only way to crack an MD5 hash" is to generate every possible string and look for a match. In fact, if you have access to the original, it is possible to modify it so that its MD5 matches that of another file you can create. This is described in a paper at infosec.edu.
Even if you cannot modify the original file, rainbow tables of precomputed MD5 hashes exist, which can be used to look up an input that produces a given hash.
These facts make MD5 unsuitable for passwords or cryptography, and in fact the U.S. government has forbidden its continued use for secure applications.
If you already have access to the hashed version of the password, then MD5 is broken to begin with. That said, when it comes to breaking a hashed value, you'd likely be better off using rainbow tables, dictionary attacks, and social engineering than your brute-force method. Still, since you asked for an algorithm to generate all the values, maybe the following will be helpful (C#):
using System;
using System.Text;
namespace PossibiltyIterator
{
class Program
{
static readonly char[] Symbols = {
'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q',
'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z', 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H',
'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z',
'1', '2', '3', '4', '5', '6', '7', '8', '9', '0', '!', '@', '#', '$', '%', '^', '&',
'*', '(', ')', '-', '_', '+', '=', '/', '\\', '[', ']', '{', '}', ';', ':', '\'', '"',
',', '.', '<', '>', '?', '`', '~'
};
const int MaxLength = 8;
static void BuildWord(int currentLength, int desiredLength, char[] word)
{
if (currentLength == desiredLength)
{
Console.WriteLine(new string(word, 0, currentLength)); // print only the filled-in characters, not the unused '\0' slots
}
else
{
for (int value = 0; value < Symbols.Length; ++value)
{
word[currentLength] = Symbols[value];
BuildWord(currentLength + 1, desiredLength, word);
}
}
}
static void Main(String[] args)
{
double totalValues = (Math.Pow(Symbols.Length, MaxLength + 1) - Symbols.Length)/(Symbols.Length - 1);
Console.WriteLine("Warning! You are about to print: {0} values", totalValues);
Console.WriteLine("Press any key to continue...");
Console.ReadKey(true /* intercept */);
for (int desiredLength = 1; desiredLength <= MaxLength; ++desiredLength)
{
BuildWord(0 /* currentLength */, desiredLength, new char[MaxLength]);
}
}
}
}
To be completely honest, this can be optimized further: it builds all the "words" of length 1, then does that work a second time when building the words of length 2, and so on. It would be smarter to build the words of length MaxLength, then truncate one letter to build a word of MaxLength-1.
Here is the optimized version... note that it does NOT return the words in the order originally requested.
using System;
using System.Text;
namespace PossibiltyIterator
{
class Program
{
static readonly char[] Symbols = {
'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q',
'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z', 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H',
'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z',
'1', '2', '3', '4', '5', '6', '7', '8', '9', '0', '!', '@', '#', '$', '%', '^', '&',
'*', '(', ')', '-', '_', '+', '=', '/', '\\', '[', ']', '{', '}', ';', ':', '\'', '"',
',', '.', '<', '>', '?', '`', '~'
};
const int MaxLength = 8;
static void BuildWord(int currentLength, int desiredLength, char[] word)
{
if (currentLength != desiredLength)
{
for (int value = 0; value < Symbols.Length; ++value)
{
word[currentLength] = Symbols[value];
BuildWord(currentLength + 1, desiredLength, word);
}
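}
// Print only the current prefix; the rest of the buffer may still hold characters from deeper calls
if (currentLength > 0)
{
Console.WriteLine(new string(word, 0, currentLength));
}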
}
static void Main(String[] args)
{
double totalValues = (Math.Pow(Symbols.Length, MaxLength + 1) - Symbols.Length)/(Symbols.Length - 1);
char[] word = new char[MaxLength];
Console.WriteLine("Warning! You are about to print: {0} values", totalValues);
Console.WriteLine("Press any key to continue...");
Console.ReadKey(true /* intercept */);
BuildWord(0 /* currentLength */, MaxLength, word);
}
}
}
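The prefix-sharing idea from the optimized version can be sketched in Python as a recursive generator (the name `all_words` is hypothetical). Each prefix is built once, emitted, extended by every symbol, and then truncated by one letter. It visits words in depth-first pre-order (each word is followed by its extensions), so like the optimized C# version it does not produce the a, b, ..., aa, ab order from the question.

```python
def all_words(symbols, max_len):
    """Yield every word of length 1..max_len over symbols, sharing prefixes."""
    word = []  # the current prefix, grown and shrunk in place

    def build():
        if word:
            yield "".join(word)  # emit the prefix itself before its extensions
        if len(word) == max_len:
            return
        for sym in symbols:
            word.append(sym)
            yield from build()
            word.pop()  # truncate one letter and try the next symbol

    yield from build()


print(list(all_words("ab", 2)))  # ['a', 'aa', 'ab', 'b', 'ba', 'bb']
```

Because it is a generator, it produces words lazily and never holds the whole list in memory.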
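To round out the thread, here is a Java example that prints the hex-encoded MD5 of every string produced by counting upward in base 36 (digits 0-9 and a-z). Note that counting skips strings with a leading 0, such as "00" or "0a":
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
public class Md5Enumerator {
public static void main(String[] args) throws Exception {
MessageDigest digest = MessageDigest.getInstance("MD5");
// Character.MAX_RADIX is 36, so the strings use 0-9 and a-z; runs until stopped
for (long i = 0; ; i++) {
String raw = Long.toString(i, Character.MAX_RADIX);
byte[] md5 = digest.digest(raw.getBytes(StandardCharsets.UTF_8));
// %032x keeps leading zeros, so every digest prints as 32 hex digits
String hex = String.format("%032x", new BigInteger(1, md5));
System.out.println(raw + " = " + hex);
}
}
}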
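Putting the pieces together, here is a Python sketch of the brute-force search the question describes for the presentation: enumerate candidate strings shortest first and compare each MD5 hex digest with a target. The function name `brute_force_md5` and the defaults (lowercase letters plus digits, `max_len=4`) are illustrative assumptions; with 36 symbols and 8 characters the search space is the roughly 2.9 trillion strings counted above, so a demo should stay small.

```python
import hashlib
import itertools
import string


def brute_force_md5(target_hex, alphabet=string.ascii_lowercase + string.digits, max_len=4):
    """Try every string up to max_len, shortest first; return the first whose MD5 matches."""
    for length in range(1, max_len + 1):
        for chars in itertools.product(alphabet, repeat=length):
            candidate = "".join(chars)
            if hashlib.md5(candidate.encode("utf-8")).hexdigest() == target_hex:
                return candidate
    return None  # nothing up to max_len hashes to target_hex


target = hashlib.md5(b"ab1").hexdigest()
print(brute_force_md5(target))  # ab1
```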
