Count number of different values in chosen (large) range in VBA? - algorithm

How can I count the number of different values (numbers and strings mixed) in a chosen (large) range in VBA?
I think about this in this way:
1. Read in data into one dimensional array.
2. Sort array (quick or merge sort) need to test which
3. Simply count number of different values if sorted array : if(a[i]<>a[i+1]) then counter=counter+1.
Is it the most efficient way to solve this problem?
Edit: I want to do it in Excel.

Here is a VBA Solution
You don't need an Array to get this done. You can also use a collection. Example
Sub Samples()
Dim scol As New Collection
With Sheets("Sheet1")
For i = 1 To 100 '<~~ Assuming the range is from A1 to A100
On Error Resume Next
scol.Add .Range("A" & i).Value, Chr(34) & _
.Range("A" & i).Value & Chr(34)
On Error GoTo 0
Next i
End With
Debug.Print scol.Count
'For Each itm In scol
' Debug.Print itm
'Next
End Sub
FOLLOWUP
Sub Samples()
Dim scol As New Collection
Dim MyAr As Variant
With Sheets("Sheet1")
'~~> Select your range in a column here
MyAr = .Range("A1:A10").Value
For i = 1 To UBound(MyAr)
On Error Resume Next
scol.Add MyAr(i, 1), Chr(34) & _
MyAr(i, 1) & Chr(34)
On Error GoTo 0
Next i
End With
Debug.Print scol.Count
'For Each itm In scol
' Debug.Print itm
'Next
End Sub

Instead of steps 2 and 3, perhaps you could use a Scripting.Dictionary and add each value to the dictionary. Any duplicate entries would cause a runtime error which you could either trap or ignore (resume next). Finally, you could then just return the dictionary's count which would give you the count of unique entries.
Here's a scrap of code I hurriedly threw together:
Function UniqueEntryCount(SourceRange As Range) As Long
Dim MyDataset As Variant
Dim dic As Scripting.Dictionary
Set dic = New Scripting.Dictionary
MyDataset = SourceRange
On Error Resume Next
Dim i As Long
For i = 1 To UBound(MyDataset, 1)
dic.Add MyDataset(i, 1), ""
Next i
On Error GoTo 0
UniqueEntryCount = dic.Count
Set dic = Nothing
End Function
I know that resume next can be considered a 'code smell', but the alternative could be to use the exists function of the dictionary to test whether the specified key already exists and then add the value if did not. I just have a feeling that when I did a similar thing in the past that it was faster to just ignore any errors raised for duplicate keys rather than using exists YMMY. For completeness, here's the other method using exists:
Function UniqueEntryCount(SourceRange As Range) As Long
Dim MyDataset As Variant
Dim dic As Scripting.Dictionary
Set dic = New Scripting.Dictionary
MyDataset = SourceRange
Dim i As Long
For i = 1 To UBound(MyDataset, 1)
if not dic.Exists(MyDataset(i,1)) then dic.Add MyDataset(i, 1), ""
Next i
UniqueEntryCount = dic.Count
Set dic = Nothing
End Function
Whilst the above code is simpler than your proposed method, it would be worth to test the performance of it against your solution.

Building on the idea presented by i_saw_drones, I strongly recommend the Scripting.Dictionary. However, this can be done without On Error Resume Next as shown below. Also, his example requires linking the Microsoft Scripting Runtime library. My example will demonstrate how to do this without needing to do any linking.
Also, since you're doing this in Excel, then you don't need to create the array in step 1 at all. The function below will accept a range of cells, which will be iterated through completely.
(i.e. UniqueCount = UniqueEntryCount(ActiveSheet.Cells) or UniqueCount = UniqueEntryCount(MySheet.Range("A1:D100"))
Function UniqueEntryCount(SourceRange As Range) As Long
Dim MyDataset As Variant
Dim MyRow As Variant
Dim MyCell As Variant
Dim dic As Object
Dim l1 As Long, l2 As Long
Set dic = CreateObject("Scripting.Dictionary")
MyDataset = SourceRange
For l1 = 1 To UBound(MyDataset)
' There is no function to get the UBound of the 2nd dimension
' of an array (that I'm aware of), so use this division to
' get this value. This does not work for >=3 dimensions!
For l2 = 1 To SourceRange.Count / UBound(MyDataset)
If Not dic.Exists(MyDataset(l1, l2)) Then
dic.Add MyDataset(l1, l2), MyDataset(l1, l2)
End If
Next l2
Next l1
UniqueEntryCount = dic.Count
Set dic = Nothing
End Function
It might also be important to note that the above will count a null string "" as a distinct value. If you do not want this to be the case, simply change the code to this:
For l1 = 1 To UBound(MyDataset)
For l2 = 1 To SourceRange.Count / UBound(MyDataset)
If Not dic.Exists(MyDataset(l1, l2)) And MyDataset(l1, l2) <> "" Then
dic.Add MyDataset(l1, l2), MyDataset(l1, l2)
End If
Next l2
Next l1

Sorry this is written in C#. This is how I would do it.
// first copy the array so you don't lose any data
List<value> copiedList = new List<value>(yourArray.ToList());
//for through your list so you test every value
for (int a = 0; a < copiedList.Count; a++)
{
// copy instances to a new list so you can count the values and do something with them
List<value> subList = new List<value>(copiedList.FindAll(v => v == copiedList[i]);
// do not do anything if there is only 1 value found
if(subList.Count > 1)
// You would want to leave 1 'duplicate' in
for (int i = 0; i < subList.Count - 1; i++)
// remove every instance from the array but one
copiedList.Remove(subList[i]);
}
int count = copiedList.Count; //this is your actual count
Have not tested it, please try.
You should wrap this inside a method so there is no messing around with the garbage. Otherwise you would lose the copy of the array only later. (return count)
EDIT: You need a list for this to work, use Array.ToList();

Related

How to check the key is exists in collection or not

I want to check collection variable contains the key or not in visual basic 6.0
Below is the collection variable I am having
pcolFields As Collection
and I want to check whether it contains the field Event_Code. I am doing this as below but it not worked for me.
If IsMissing(pcolFields("Event_Code")) = False Then
'Do Something
End If
Here is an example solution with try-catch:
Private Function IsMissing(col As Collection, field As String)
On Error GoTo IsMissingError
Dim val As Variant
val = col(field)
IsMissing = False
Exit Function
IsMissingError:
IsMissing = True
End Function
Use it like this:
Private Sub Form_Load()
Dim x As New Collection
x.Add "val1", "key1"
Dim testkey As String
testkey = "key2"
If IsMissing(x, testkey) Then
Debug.Print "Key is Missing"
Else
Debug.Print "Val is " + x(testkey)
End If
Exit Sub
End Sub
You could also try a to Implement or Subclass the Collection and add a "has" Function
Collections are not useful if you need to check for existence, but they're useful for iteration. However, collections are sets of Variants and so are inherently slower than typed variables.
In nearly every case it's more useful (and more optimal) to use a typed array. If you need to have a keyed collection you should use the Dictionary object.
Some examples of general ways of using typed arrays:
Dim my_array() As Long ' Or whichever type you need
Dim my_array_size As Long
Dim index As Long
Dim position As Long
' Add new item (push)
ReDim Preserve my_array(my_array_size)
my_array(my_array_size) = 123456 ' something to add
my_array_size = my_array_size + 1
' Remove item (pop)
my_array_size = my_array_size - 1
If my_array_size > 0 Then
ReDim Preserve my_array(my_array_size - 1)
Else
Erase my_array
End If
' Remove item (any position)
position = 3 'item to remove
For index = position To my_array_size - 2
my_array(index) = my_array(index + 1)
Next
my_array_size = my_array_size - 1
ReDim Preserve my_array(my_array_size - 1)
' Insert item (any position)
ReDim Preserve my_array(my_array_size)
my_array_size = my_array_size + 1
For index = my_array_size - 1 To position + 1 Step -1
my_array(index) = my_array(index - 1)
Next
my_array(position) = 123456 ' something to insert
' Find item
For index = 0 To my_array_size - 1
If my_array(index) = 123456 Then
Exit For
End If
Next
If index < my_array_size Then
'found, position is in index
Else
'not found
End If
Whilst it may seem like a lot code. It is way faster. Intellisense will also work, which is a bonus. The only caveat is if you have very large data sets, then redim starts to get slow and you have to use slightly different techniques.
You can also use a Dictionary, be sure to include the Microsoft Scripting Runtime reference in your project:
Dim dict As New Dictionary
Dim value As Long
dict.Add "somekey", 123456
dict.Remove "somekey"
value = dict.Item("somekey")
If dict.Exists("somekey") Then
' found!
Else
' not found
End If
Dictionaries like collections just hold a bunch of Variants, so can hold objects etc.
We can check following code into vb.net code
If Collection.ContainsKey(KeyString) Then
'write code
End if
Collection is variable of Dictionary and KeyString is a key string which we need to find into collection
The method from efkah will fail if the Collection contains objects rather than primitive types. Here is a small adjustment:
'Test if a key is available in a collection
Public Function HasKey(coll As Collection, strKey As String) As Boolean
On Error GoTo IsMissingError
Dim val As Variant
' val = coll(strKey)
HasKey = IsObject(coll(strKey))
HasKey = True
On Error GoTo 0
Exit Function
IsMissingError:
HasKey = False
On Error GoTo 0
End Function

vba loop performance too much time

I have 250.000 rows and I wanted to erase all rows that have a 0 in col AR. This takes too much time using a filter and deleting only visible cells, so I wrote a code. But still takes 1 minute for 1000 lines. So I will have to take 250 minutes!!! Besides after the first 6 minutes (6k lines) the number showed in AS3 (see code below) freezes, so I don't know if it's still running.
Is there a way to do this more efficiently (using less time)?
My code is:
Sub delrow()
Application.Calculation=xlCalculationManual
With Sheets("bners")
LR3 = Range("A" & Rows.Count).End(xlUp).Row
For i3 = 3 To LR3
range("AS2")=i3
a = Sheets("bners").Range("AR" & i3).Value
If a = 0 Then
Rows(i3).Delete
Else
End If
Next i3
End With
Application.calculate
End Sub
thanks!
Yes, definitely Step -1. But does that alone make it fast?
This batches the deletes 10 at a time (if needed now).
Option Explicit
Dim ws as Range
Sub delrow1()
Dim LR3&, i3&, a&
Set ws = Sheets("bners")
LR3 = ws.Range("A" & Rows.Count).End(xlUp).Row
For i3 = LR3 To 3 Step -1
a = ws.Cells(i3, "AR").Value
If a = 0 Then
Call delrow2(i3)
End If
Next i3
Call delrow2(0) ' flush
End Sub
Sub delrow2(delRow&) ' deletes 10 rows at a time
Static a1&(10), na1&
Dim i1&, zRange As Range
If delRow = 0 Then ' finish;end;flush
For i1 = 1 To na1
ws.Rows(a1(i1)).Delete
Next i1
na1 = 0
Else ' store row in array a1
na1 = na1 + 1
a1(na1) = delRow
If na1 = 10 Then ' del 10 rows
Set zRange = Union( _
Rows(a1(1)), Rows(a1(2)), Rows(a1(3)), Rows(a1(4)), Rows(a1(5)), _
Rows(a1(6)), Rows(a1(7)), Rows(a1(8)), Rows(a1(9)), Rows(a1(10)))
ws.Range(zRange).Rows.Delete
na1 = 0
End If
End If
I liked this method I found a couple weeks ago but didn't remember until last night http://goo.gl/NYtY9R that could easily be adapted for yours
Sub RowKiller()
Dim F As Range, rKill As Range
Set F = Range("A2:A250000")
Set rKill = Nothing
For Each r In F
v = r.Text
If InStr(1, v, "0") = 1 Then
If rKill Is Nothing Then
Set rKill = r
Else
Set rKill = Union(r, rKill)
End If
End If
Next r
If Not rKill Is Nothing Then
rKill.EntireRow.Delete
End If
End Sub
To me very efficient in that it builds up into the Union and then deletes all at once instead of deleting one at a time.
in the example , you with sheets() is totally useless, as you forgot every dot "." before the words cells or range or rows.
I'll try an other approach, by using two VBA arrays (not tested, and might memory overflow).
first array is original data before the macro.
second array is data after the macro
I won't delete rows, i just write my second array from the good lines of the 1rst array,
and then paste it over the sheet
Sub RowKill()
'Declaring Variables :
Dim MaxRows as long 'number of lines in the First Array
Dim NewRows as Long 'number of lines in the Second Array
Dim q as long 'simple loop counter
Dim i as long 'simple loop counter , for the purpose of copying line
Dim Rg As Range 'Range of the original Data (number of lines = MaxRows-2, because the Original example code starts at 3, not 1)
Dim Sh as Worksheet
Dim Array1() as variant 'First VBA Array
Dim Array2() as variant 'Second VBA Array
with Application
.enableevents=false
.screnupdating=false
.Calculation=xlCalculationManual
end with
set Sh=thisworkbook.Sheets("bners")
with Sh
MaxRows = .Range( .Rows.Count , 44).End(xlUp).Row ' note the .rows, and i read on cloumn 44 and not 1
Set Rg = .Range( .cells(3,44) , .cells ( MaxRows,44) ) '44 is the column of .range("AR")
'The Range Rg is important , later we delete the whole thing ^^
Redim Array1 ( 1 to MaxRows, 1 to 44) 'Only if "AR" is your last column
Array1 = Rg.value2 'if you work with dates or time format in your cells, please replace by : Array1 = Rg.value
for q= 3 to MaxRows
if Array1 (q , 44) <> 0 Then 'wasn't sure, because empty cells will trigger too, in wich case: <>"" would be better, or: If not IsEmpty( Array1 (q,44)) .....
call CopyRowToSecondArray ( q , NewRows , Array2)
End If
next q
End With 'Sh
'Rg.delete 'old version
With Sh
.range ( .cells(1,1) , .cells (44 , NewRows).Value2 = Array2 ' again use .value, if you have date or time formating inside the data cells
if NewRows<MaxRows then .range ( .cells(1,NewRows+1) , .cells (44 , MaxRows).Value2 = ""
End with
with Application
.enableevents= True
.screnupdating= True
.Calculation=xlCalculationAutomatic
end with
Set Rg = Nothing
Ser Sh = Nothing
Erase Array1, Array 2
End Sub
Sub CopyRowToSecondArray ( byval q as long , byref NewRows as long , byref Array2 as variant)
Dim i as long
NewRows=NewRows+1
Redim Preserve Array2 (1 to NewRows, 1 to 44)
for i = 1 to 44 'this entire for i loop, might be faster with unknown vba array function (i'm new), please share with me
Array2 ( NewRows , i) = Array1 ( q , i )
next i
end sub
Maybe there is a better way to simply copy a whole line from one array to an other, i don't know...
The code is untested, and , I assumed 44 is the last column (change only in loops and Rg if needed), so copy your work before testing my code.
Hope this helps, and is faster.

Excel copy/sort data while counting/removing duplicates

Ok so I've searched and searched and can't quite find what I'm looking for.
I have a workbook and what I'm basically trying to do is take the entries from certain ranges (Sheet1 - E4:E12, E14:E20, I4:I7, I9:I12, I14:I17, & I19:I21) and put them in a separate list on Sheet2. I then want the new list on Sheet2 to be sorted by how many times an entry appeared on Sheet1 as well as display the amount.
example http://demonik.doomdns.com/images/excel.png
Obviously as can be seen by the ranges I listed above, this sample is much smaller lol, was just having trouble trying to figure out how to describe everything and figured an image would help.
Basically I am trying to use VBA (the update would be initialized by hitting a button) to copy data from Sheet1 and put all the ranges into one list in Sheet2 that is sorted by how many times it appeared on Sheet1, and then alphabetically.
If a better discription is needed just comment and let me know, I've always been horrible at trying to describe stuff like this lol.
Thanks in advance!
Another detail: I cant have it search for specific things as the data in the ranges on Sheet1 may change. Everything must be dynamic.
I started out with this data
and used the following code to read it into an array, sort the array, and count the duplicate values, then output the result to sheet2
Sub Example()
Dim vCell As Range
Dim vRng() As Variant
Dim i As Integer
ReDim vRng(0 To 0) As Variant
Sheets("Sheet2").Cells.Delete
Sheets("Sheet1").Select
For Each vCell In ActiveSheet.UsedRange
If vCell.Value <> "" Then
ReDim Preserve vRng(0 To i) As Variant
vRng(i) = vCell.Value
i = i + 1
End If
Next
vRng = CountDuplicates(vRng)
Sheets("Sheet2").Select
Range(Cells(1, 1), Cells(UBound(vRng), UBound(vRng, 2))) = vRng
Rows(1).Insert
Range("A1:B1") = Array("Entry", "Times Entered")
ActiveSheet.UsedRange.Sort Range("B1"), xlDescending
End Sub
Function CountDuplicates(List() As Variant) As Variant()
Dim CurVal As String
Dim NxtVal As String
Dim DupCnt As Integer
Dim Result() As Variant
Dim i As Integer
Dim x As Integer
ReDim Result(1 To 2, 0 To 0) As Variant
List = SortAZ(List)
For i = 0 To UBound(List)
CurVal = List(i)
If i = UBound(List) Then
NxtVal = ""
Else
NxtVal = List(i + 1)
End If
If CurVal = NxtVal Then
DupCnt = DupCnt + 1
Else
DupCnt = DupCnt + 1
ReDim Preserve Result(1 To 2, 0 To x) As Variant
Result(1, x) = CurVal
Result(2, x) = DupCnt
x = x + 1
DupCnt = 0
End If
Next
Result = WorksheetFunction.Transpose(Result)
CountDuplicates = Result
End Function
Function SortAZ(MyArray() As Variant) As Variant()
Dim First As Integer
Dim Last As Integer
Dim i As Integer
Dim x As Integer
Dim Temp As String
First = LBound(MyArray)
Last = UBound(MyArray)
For i = First To Last - 1
For x = i + 1 To Last
If MyArray(i) > MyArray(x) Then
Temp = MyArray(x)
MyArray(x) = MyArray(i)
MyArray(i) = Temp
End If
Next
Next
SortAZ = MyArray
End Function
End Result:
Here is a possible solution that I have started for you. What you are asking to be done gets rather complicated. Here is what I have so far:
Option Explicit
Sub test()
Dim items() As String
Dim itemCount() As String
Dim currCell As Range
Dim currString As String
Dim inArr As Boolean
Dim arrLength As Integer
Dim iterator As Integer
Dim x As Integer
Dim fullRange As Range
Set fullRange = Range("E1:E15")
iterator = 0
For Each cell In fullRange 'cycle through the range that has the values
inArr = False
For Each currString In items 'cycle through all values in array, if
'values is found in array, then inArr is set to true
If currCell.Value = currString Then 'if the value in the cell we
'are currently checking is in the array, then set inArr to true
inArr = True
End If
Next
If inArr = False Then 'if we did not find the value in the array
arrLength = arrLength + 1
ReDim Preserve items(arrLength) 'resize the array to fit the new values
items(iterator) = currCell.Value 'add the value to the array
iterator = iterator + 1
End If
Next
'This where it gets tricky. Now that you have all unique values in the array,
'you will need to count how many times each value is in the range.
'You can either make another array to hold those values or you can
'put those counts on the sheet somewhere to store them and access them later.
'This is tough stuff! It is not easy what you need to be done.
For x = 1 To UBound(items)
Next
End Sub
All that this does so far is get unique values into the array so that you can count how many times each one is in the range.

Random selection from list

I have a list of items in an Excel worksheet, A1-B115. At the moment I can enter 10 variables which retrieves the correct data from the list.
Code now:
C1=1 - run through A1-A115 and check for the value to be between 1000-2000; if so, copy the B value somewhere.
C2=1 - run through A1-A115 and check for the value to be between 2001-3000; if so, copy the B value somewhere.
....
What I would like to do is that I can enter a value (example: 25 or 30) and that my macro randomly selects the right amount of values.
Code I would like to do: C1: 30 -> randomly selects 30 values from B1-B115
This will do the trick.
Sub PickRandomItemsFromList()
Const nItemsToPick As Long = 10
Const nItemsTotal As Long = 115
Dim rngList As Range
Dim varRandomItems() As Variant
Dim i As Long
Set rngList = Range("B1").Resize(nItemsTotal, 1)
ReDim varRandomItems(1 To nItemsToPick)
For i = 1 To nItemsToPick
varRandomItems(i) = rngList.Cells(Int(nItemsTotal * Rnd + 1), 1)
Next i
' varRandomItems now contains nItemsToPick random items from range rngList.
End Sub
As discussed in the comments, this will allow individual items to be picked more than once within the nItemsToPick picked, if for example number 63 happens to be randomly picked twice. If you don't want this to happen, then an additional loop will have to be added to check whether the item about to be picked is already in the list, for example like so:
Sub PickRandomItemsFromList()
Const nItemsToPick As Long = 10
Const nItemsTotal As Long = 115
Dim rngList As Range
Dim idx() As Long
Dim varRandomItems() As Variant
Dim i As Long
Dim j As Long
Dim booIndexIsUnique As Boolean
Set rngList = Range("B1").Resize(nItemsTotal, 1)
ReDim idx(1 To nItemsToPick)
ReDim varRandomItems(1 To nItemsToPick)
For i = 1 To nItemsToPick
Do
booIndexIsUnique = True ' Innoncent until proven guilty
idx(i) = Int(nItemsTotal * Rnd + 1)
For j = 1 To i - 1
If idx(i) = idx(j) Then
' It's already there.
booIndexIsUnique = False
Exit For
End If
Next j
If booIndexIsUnique = True Then
Exit Do
End If
Loop
varRandomItems(i) = rngList.Cells(idx(i), 1)
Next i
' varRandomItems now contains nItemsToPick unique random
' items from range rngList.
End Sub
Note that this will loop forever if nItemsToPick > nItemsTotal !
I would use a collection to make sure you don't get any duplicates.
Function cItemsToPick(NrOfItems As Long, NrToPick As Long) As Collection
Dim cItemsTotal As New Collection
Dim K As Long
Dim I As Long
Set cItemsToPick = New Collection
If NrToPick > NrOfItems Then Exit Function
For I = 1 To NrOfItems
cItemsTotal.Add I
Next I
For I = 1 To NrToPick
K = Int(cItemsTotal.Count * Rnd + 1)
cItemsToPick.Add cItemsTotal(K)
cItemsTotal.Remove (K)
Next I
Set cItemsTotal = Nothing
End Function
You can test this function with the following code:
Sub test()
Dim c As New Collection
Dim I As Long
Set c = cItemsToPick(240, 10)
For I = 1 To c.Count
Debug.Print c(I)
Next I
End Sub

Excel VBA: Sorting a range using a permutation array as the key

I'd like to sort a range using a permutation stored in an array. I tried something like this:
Sub PermSort()
Dim Perm() As Variant
Perm = Array(1, 6, 7, 8, 5, 2, 4, 3)
Range("A1:A8").Sort Key1:=Perm, order1:=xlAscending
End Sub
but it doesn't work, since Key1 is expected to be a Range.
Is it possible to somehow use the build in .Sort method?
From here:
The VBA language has no support for sorting the values stored in an array. One method that can be used to sort arrays is to put the data on to a worksheet, sort the data on the worksheet, and then read back the values from the worksheet into the array. The other method for sorting arrays is to use the QSort algorithm to sort the array in place. This page describes both methods, with variations on the QSort method.
HTH!
PS:
The code for the first method is:
Sub SortViaWorksheet()
Dim Arr(1 To 5) As String ' this is the array to be sorted
Dim WS As Worksheet ' temporary worksheet
Dim R As Range
Dim N As Long
' fill up the array with some
' aribtrary values.
Arr(1) = "aaa"
Arr(2) = "zzz"
Arr(3) = "mmm"
Arr(4) = "ttt"
Arr(5) = "bbb"
Application.ScreenUpdating = False
' create a new sheet
Set WS = ThisWorkbook.Worksheets.Add
' put the array values on the worksheet
Set R = WS.Range("A1").Resize(UBound(Arr) - LBound(Arr) + 1, 1)
R = Application.Transpose(Arr)
' sort the range
R.Sort key1:=R, order1:=xlAscending, MatchCase:=False
' load the worksheet values back into the array
For N = 1 To R.Rows.Count
Arr(N) = R(N, 1)
Next N
' delete the temporary sheet
Application.DisplayAlerts = False
WS.Delete
Application.DisplayAlerts = True
Application.ScreenUpdating = True
' test/debug/confirmation
For N = LBound(Arr) To UBound(Arr)
Debug.Print Arr(N)
Next N
End Sub
For the second one there is a Qsort standard routine. You may download it from the original site.

Resources