Sorting Algorithm for expensive swapping? - algorithm

I came across the following problem in the application I'm developing:
I'm given two lists:
list1 = { Z, K, A, B, A, C }
list2 = { A, A, B, C, K, Z }
list2 is guaranteed to be the sorted version of list1.
My objective is to sort list1 only by swapping elements within list1. So, for example, I cannot simply iterate through list2 and assign each of its elements to the corresponding position in list1.
Using list2 as a resource, I need to sort list1 in the absolute minimum number of swaps possible.
Is there a set of algorithms specifically for this purpose? I've not heard of such a thing.

I wrote this code in Java in order to do the minimal number of swaps.
Since the second list is guaranteed to be sorted, for each element in it we can look up its index in the first list and then swap the element at the current index with the one we found.
Update: I modified findElementToSwapIndex so that it checks whether the swapped element will end up at the right index (according to list2) after the swap.
public class Testing {

    private static String[] unorderedList = {"Z", "C", "A", "B", "A", "K"};
    private static String[] orderedList = {"A", "A", "B", "C", "K", "Z"};
    private static int numberOfSwaps;

    public static void main(String[] args) {
        for (int i = 0; i < unorderedList.length; i++) {
            if (!unorderedList[i].equals(orderedList[i])) {
                int index = findElementToSwapIndex(i, orderedList[i]);
                swapElements(unorderedList, i, index);
            }
        }
        System.out.println(numberOfSwaps);
    }

    private static void swapElements(String[] list, int indexOfFirstElement, int indexOfSecondElement) {
        String temp = list[indexOfFirstElement];
        list[indexOfFirstElement] = list[indexOfSecondElement];
        list[indexOfSecondElement] = temp;
        numberOfSwaps++;
    }

    private static int findElementToSwapIndex(int currentIndexOfUnorderedList, String letter) {
        int lastElementToSwapIndex = 0;
        for (int i = 0; i < unorderedList.length; i++) {
            if (unorderedList[i].equals(letter)) {
                lastElementToSwapIndex = i;
                // Check if the swapped element will be in the right place with regard to list2.
                if (unorderedList[currentIndexOfUnorderedList].equals(orderedList[lastElementToSwapIndex])) {
                    return lastElementToSwapIndex;
                }
            }
        }
        return lastElementToSwapIndex;
    }
}
The minimum number of swaps produced by this code was the same as in https://stackoverflow.com/a/40507589/6726632.
Hopefully this can help you.
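To answer the "is there a known algorithm" part of the question: when every element's target position is known, the minimum number of swaps for distinct elements equals n minus the number of cycles in the position permutation (each cycle of length L costs L - 1 swaps). Below is a minimal Java sketch of that cycle-decomposition count; it is my illustration, not the code from the linked answer, and with duplicate values the left-to-right assignment of targets is only a simple heuristic that may not always be optimal.

public class MinSwapsByCycles {

    // Counts the minimum number of swaps needed to turn 'unordered' into 'ordered',
    // assuming both arrays contain the same elements. Duplicates are assigned to
    // target slots in a fixed left-to-right order (a heuristic, see note above).
    static int minSwaps(String[] unordered, String[] ordered) {
        int n = unordered.length;
        int[] target = new int[n];          // target[i] = where unordered[i] should go
        boolean[] taken = new boolean[n];   // target slots already assigned (handles duplicates)
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                if (!taken[j] && ordered[j].equals(unordered[i])) {
                    target[i] = j;
                    taken[j] = true;
                    break;
                }
            }
        }
        // Each cycle of length L in the permutation needs L - 1 swaps.
        boolean[] visited = new boolean[n];
        int swaps = 0;
        for (int i = 0; i < n; i++) {
            int cycleLength = 0;
            for (int j = i; !visited[j]; j = target[j]) {
                visited[j] = true;
                cycleLength++;
            }
            if (cycleLength > 1) {
                swaps += cycleLength - 1;
            }
        }
        return swaps;
    }

    public static void main(String[] args) {
        String[] list1 = {"Z", "K", "A", "B", "A", "C"};
        String[] list2 = {"A", "A", "B", "C", "K", "Z"};
        System.out.println(minSwaps(list1, list2));
    }
}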

Related

Sort two lists the same way

I need to sort a list of DateTime from earliest to latest.
List<DateTime> datetimeList = [
  DateTime.parse("2021-01-15 12:26:40.709246"),
  DateTime.parse("2021-02-25 13:26:40.709246"),
  DateTime.parse("2021-02-20 19:26:40.709246"),
];
datetimeList.sort();
I have another list of Strings.
List<String> stringList = ["one", "two", "three"];
The indexes of stringList have to match the indexes of datetimeList. So the index of "one" always has to be the same as the index of 2021-01-15 12:26:40.709246 and so on.
If I sort the lists individually, the DateTime is sorted by DateTime and the Strings are sorted alphabetically. This way, the String does not go with its initial date anymore.
How can I sort one list (datetimeList) with the other list (stringList) sorting exactly the same way?
The easiest solution would be to create a struct/class to combine both variables so you don't have to worry about keeping the objects in the arrays aligned. The last thing you need to do is to sort the array of new objects by the date. For that, I cannot help you due to missing knowledge of Dart.
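Since that answer stops short of Dart code, here is a minimal sketch of the same combine-and-sort idea in Java, purely as an illustration: the Entry record and its field names are made up for this example, and it assumes Java 16+ for records.

import java.time.LocalDateTime;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class PairSortExample {

    // A small value type that keeps a date and its label together,
    // so sorting can never misalign the two lists.
    record Entry(LocalDateTime when, String label) {}

    public static void main(String[] args) {
        List<Entry> entries = new ArrayList<>(List.of(
                new Entry(LocalDateTime.parse("2021-01-15T12:26:40"), "one"),
                new Entry(LocalDateTime.parse("2021-02-25T13:26:40"), "two"),
                new Entry(LocalDateTime.parse("2021-02-20T19:26:40"), "three")));

        // Sort the combined objects by date; the labels travel with their dates.
        entries.sort(Comparator.comparing(Entry::when));

        for (Entry e : entries) {
            System.out.println(e.when() + " : " + e.label());
        }
    }
}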
You could use a SplayTreeMap as well: https://api.dart.dev/stable/2.8.4/dart-collection/SplayTreeMap-class.html.
A SplayTreeMap ensures that its keys are in sorted order. You could use your DateTime as the key and the corresponding element of the other list as the value.
import 'dart:collection';

main() {
  final SplayTreeMap<DateTime, String> map = new SplayTreeMap<DateTime, String>();
  map[DateTime.parse("2021-01-15 12:26:40.709246")] = "one";
  map[DateTime.parse("2021-02-25 13:26:40.709246")] = "three";
  map[DateTime.parse("2021-02-20 19:26:40.709246")] = "two";
  for (final DateTime key in map.keys) {
    print("$key : ${map[key]}");
  }
}
I recommend the simpler suggestions given here.
For completeness, I'll provide one more approach: Compute the permutation by sorting a list of indices:
List<int> sortedPermutation<T>(List<T> elements, int compare(T a, T b)) =>
[for (var i = 0; i < elements.length; i++) i]
..sort((i, j) => compare(elements[i], elements[j]));
Then you can reorder the existing lists to match:
List<T> reorder<T>(List<T> elements, List<int> permutation) =>
[for (var i = 0; i < permutation.length; i++) elements[permutation[i]]];
If you do:
var sorted = reorder(original, sortedPermutation(original, compare));
it should give you a sorted list.
It's less efficient than sorting in-place because you create a new list,
but you can apply the same reordering to multiple lists afterwards.
A fast and very effective way:
void main() {
  final l1 = [3, 1, 2];
  final l2 = ['three', 'one', 'two'];
  final l3 = ['drei', 'ein', 'zwei'];
  print(l1);
  print(l2);
  print(l3);
  myCompare(int x, int y) => x.compareTo(y);
  l1.sortLists([l2, l3], myCompare);
  print('============');
  print(l1);
  print(l2);
  print(l3);
}

extension SortListByList<E> on List<E> {
  void sortLists(Iterable<List> lists, int Function(E, E) compare) {
    for (final list in lists) {
      if (list.length != length) {
        throw StateError('The length of lists must be equal');
      }
    }
    // Record every comparison result while sorting this list...
    final rules = <int>[];
    sort((x, y) {
      final rule = compare(x, y);
      rules.add(rule);
      return rule;
    });
    // ...then replay the same comparison results to sort the other lists identically.
    for (final list in lists) {
      var rule = 0;
      list.sort((x, y) => rules[rule++]);
    }
  }
}
Output:
[3, 1, 2]
[three, one, two]
[drei, ein, zwei]
============
[1, 2, 3]
[one, two, three]
[ein, zwei, drei]

Remove duplicates in O(n) by hand

I need to remove all duplicates in a list, but only if the item in list a is the same in list b as well. This is my current code, but at 100k items it's taking literally days. Is there a faster way to do this?
Any help appreciated.
List<int> ind = new List<int>();
List<int> used = new List<int>();
for (int i = 0; i < a.Count; i++)
{
    for (int j = 0; j < a.Count; j++)
    {
        if (i != j && !used.Contains(i))
        {
            if (a[j] == a[i] && b[i] == b[j])
            {
                ind.Add(j);
                used.Add(j);
            }
        }
    }
}
List<string> s2 = new List<string>();
List<string> a2 = new List<string>();
for (int i = 0; i < a.Count; i++)
{
    if (!ind.Contains(i))
    {
        s2.Add(a[i]);
        a2.Add(b[i]);
    }
}
The key to many such problems is the correct data structure. To avoid duplicates, you need to use Sets, as they remove duplicates automatically.
Here is the code in Java; I hope it is similar in C#:
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;

class Duplicates
{
    static List<Integer> list1 = new ArrayList<>();
    static List<Integer> list2 = new ArrayList<>();
    static final int SIZE = 100_000;
    static final int MAX_VALUE = 1_000_000;

    public static void main(String[] args)
    {
        // populate the lists with random values for testing
        Random r = new Random();
        for (int i = 0; i < SIZE; i++)
        {
            list1.add(r.nextInt(MAX_VALUE));
            list2.add(r.nextInt(MAX_VALUE));
        }
        Set<Integer> set1 = new HashSet<>(list1);
        Set<Integer> set2 = new HashSet<>(list2);
        // items that are in both lists
        Set<Integer> intersection = new HashSet<>(set1);
        intersection.retainAll(set2);
        Set<Integer> notSeenYet = new HashSet<>(intersection);
        List<Integer> list1Unique = new ArrayList<Integer>();
        for (int n : list1)
        {
            if (intersection.contains(n)) // we may have to skip this one
            {
                if (notSeenYet.contains(n)) // no, don't skip, it's the first occurrence
                {
                    notSeenYet.remove(n);
                }
                else
                {
                    continue;
                }
            }
            list1Unique.add(n);
        }
        System.out.println("list 1 contains " + list1Unique.size() + " values after removing all duplicates that are also in list 2");
    }
}
It takes less than a second for 100k values.
Output
list 1 contains 99591 values after removing all duplicates that are also in list 2
Create a HashSet.
First, iterate through the list b and add all elements into the HashSet.
Then, iterate through each element of the list a. When you visit an element, ask the HashSet if it already contains that element. If it doesn't, it's a new element, so just keep it. If it does, it is a duplicate and you can remove it from a.
HashSets can answer the "do you have this element?" question in O(1), so for the whole list you get O(n).
For more information, check the documentation.
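A minimal Java sketch of one literal reading of that description (names are illustrative, and it assumes plain integer lists rather than the asker's paired lists):

import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class HashSetFilter {
    public static void main(String[] args) {
        List<Integer> a = List.of(1, 2, 3, 2, 4, 5);
        List<Integer> b = List.of(2, 4, 6);

        // One pass over b to build the set, one pass over a to filter: O(n) overall.
        Set<Integer> seenInB = new HashSet<>(b);
        List<Integer> kept = new ArrayList<>();
        for (Integer x : a) {
            if (!seenInB.contains(x)) {
                kept.add(x);   // not present in b, so keep it
            }
        }
        System.out.println(kept);   // prints [1, 3, 5]
    }
}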
Here is a general algorithm to consider. We can start by sorting both lists in ascending order. Using a good sorting algorithm such as merge sort, this takes O(NlgN) time, where N is the length of the list. Once we have paid this penalty, we only need to maintain a pointer into each of the lists. The general algorithm is basically to walk up both lists, looking for duplicates in the a list whenever the value in question matches the pointer into the b list. If there is a match, the duplicates are removed from the a list; otherwise we keep walking until we reach the end of the a list. This process is only O(N), making the biggest penalty the initial sort, which is O(NlgN).
To "remove duplicates" I understand to mean "from n identical items, leave the first and remove the remaining n - 1". If so then this is the algorithm:
Convert list b to set B. Also introduce set A_dup. Run through list a and for each item:
if item is found in A_dup then remove it from a,
else if item is found in set B then add it to A_dup.
Repeat.
Checking for existence in a set (both A_dup and B) is an O(1) operation, as is adding a new item to a set. So you're left with iterating through list a, which in total gives us O(n).
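Here is a minimal Java sketch of that algorithm, assuming integer lists; setB and aDup are just illustrative names:

import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class DedupAgainstB {

    // Keeps the first occurrence of each value in 'a' and drops later occurrences,
    // but only for values that also appear in 'b', as described above.
    static List<Integer> dedup(List<Integer> a, List<Integer> b) {
        Set<Integer> setB = new HashSet<>(b);   // values "protected" by b
        Set<Integer> aDup = new HashSet<>();    // values of a already seen once
        List<Integer> result = new ArrayList<>();
        for (Integer item : a) {
            if (aDup.contains(item)) {
                continue;                       // later duplicate: drop it
            }
            if (setB.contains(item)) {
                aDup.add(item);                 // first occurrence: remember it
            }
            result.add(item);
        }
        return result;
    }

    public static void main(String[] args) {
        // The duplicates of 2 are removed (2 is also in b); the repeated 1 stays
        // because 1 is not in b. Prints [1, 2, 3, 1].
        System.out.println(dedup(List.of(1, 2, 2, 3, 2, 1), List.of(2, 9)));
    }
}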
I think what you are trying to do is find distinct pairs, right?
If so, you can do that in one line using Zip and Distinct and a C# Tuple (or use an anonymous type).
var result = a.Zip(b, (x,y) => (x, y)).Distinct();
import java.util.*;
import java.util.stream.Collectors;

public class Test {
    public static void main(String args[]) {
        List<String> dupliKhaneList = new ArrayList<>();
        dupliKhaneList.add("Vaquar");
        dupliKhaneList.add("Khan");
        dupliKhaneList.add("Vaquar");
        dupliKhaneList.add("Vaquar");
        dupliKhaneList.add("Khan");
        dupliKhaneList.add("Vaquar");
        dupliKhaneList.add("Zidan");
        // Solution 1: if you want to remove duplicates within one list
        List<String> uniqueList = dupliKhaneList.stream().distinct().collect(Collectors.toList());
        System.out.println("DupliKhane => " + dupliKhaneList);
        System.out.println("Unique 1 => " + uniqueList);
        // Solution 2: if you want to remove using 2 lists
        List<String> list1 = new ArrayList<>();
        list1.add("Vaquar");
        list1.add("Khan");
        list1.add("Vaquar");
        list1.add("Vaquar");
        list1.add("Khan");
        list1.add("Vaquar");
        list1.add("Zidan");
        List<String> list2 = new ArrayList<>();
        list2.add("Zidan");
        System.out.println("list1 => " + list1);
        System.out.println("list2 => " + list2);
        list1.removeAll(list2);
        System.out.println("removeAll duplicate => " + list1);
    }
}
Results :
DupliKhane => [Vaquar, Khan, Vaquar, Vaquar, Khan, Vaquar, Zidan]
Unique 1 => [Vaquar, Khan, Zidan]
list1 => [Vaquar, Khan, Vaquar, Vaquar, Khan, Vaquar, Zidan]
list2 => [Zidan]
removeAll duplicate => [Vaquar, Khan, Vaquar, Vaquar, Khan, Vaquar]

Generate ordered list of sum between elements in large lists

I'm not sure whether this question should be posted on Math or on Stack Overflow, but here we go.
I have an arbitrary amount of ordered lists (say 3 for example) with numerical values. These lists can be long enough that trying all combinations of values becomes too computationally heavy.
What I need is to get an ordered list of possible sums when picking one value from each of the lists. Since the lists can be large, I only want the N smallest sums.
What I've considered is to step down one of the lists for each iteration. This, however, misses many cases that would have been possible if another list had been chosen for that step.
An alternative would be a recursive solution, but that would generate many duplicate cases instead.
Are there any known methods that could solve such a problem?
Say we have K lists.
Make a min-heap.
a) Push a structure containing the sum of the first element from every list and the list of indexes
key = Sum(L[i][0]), [ix0=0, ix1=0, ix2=0]
b) Pop the smallest element from the heap and output its key (sum) value
c) Construct K new elements from the popped one - for each list, increment the corresponding index and update the sum
key - L[0][ix0] + L[0][ix0 + 1], [ix0 + 1, ix1, ix2]
key - L[1][ix1] + L[1][ix1 + 1], [ix0, ix1 + 1, ix2]
same for ix2
d) Push them into the heap
e) Repeat from b) until the N smallest sums have been extracted
A Java implementation of the min heap algorithm with a simple test case:
The algorithm itself is just as described by @MBo.
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

class MinHeapElement {
    int sum;
    List<Integer> idx;
}

public class SumFromKLists {

    public static List<Integer> sumFromKLists(List<List<Integer>> lists, int N) {
        List<Integer> ans = new ArrayList<>();
        if (N == 0) {
            return ans;
        }
        PriorityQueue<MinHeapElement> minPq = new PriorityQueue<>(new Comparator<MinHeapElement>() {
            @Override
            public int compare(MinHeapElement e1, MinHeapElement e2) {
                return e1.sum - e2.sum;
            }
        });
        // Seed the heap with the smallest possible sum: the first element of every list.
        MinHeapElement smallest = new MinHeapElement();
        smallest.idx = new ArrayList<>();
        for (int i = 0; i < lists.size(); i++) {
            smallest.sum += lists.get(i).get(0);
            smallest.idx.add(0);
        }
        minPq.add(smallest);
        ans.add(smallest.sum);
        while (ans.size() < N) {
            MinHeapElement curr = minPq.poll();
            if (ans.get(ans.size() - 1) != curr.sum) {
                ans.add(curr.sum);
            }
            List<MinHeapElement> candidates = nextPossibleCandidates(lists, curr);
            if (candidates.size() == 0) {
                break;
            }
            minPq.addAll(candidates);
        }
        return ans;
    }

    private static List<MinHeapElement> nextPossibleCandidates(List<List<Integer>> lists, MinHeapElement minHeapElement) {
        List<MinHeapElement> candidates = new ArrayList<>();
        for (int i = 0; i < lists.size(); i++) {
            List<Integer> currList = lists.get(i);
            int newIdx = minHeapElement.idx.get(i) + 1;
            // Skip over equal values so the same sum is not generated repeatedly.
            while (newIdx < currList.size() && currList.get(newIdx).equals(currList.get(newIdx - 1))) {
                newIdx++;
            }
            if (newIdx < currList.size()) {
                MinHeapElement nextElement = new MinHeapElement();
                nextElement.sum = minHeapElement.sum + currList.get(newIdx) - currList.get(minHeapElement.idx.get(i));
                nextElement.idx = new ArrayList<>(minHeapElement.idx);
                nextElement.idx.set(i, newIdx);
                candidates.add(nextElement);
            }
        }
        return candidates;
    }

    public static void main(String[] args) {
        List<Integer> list1 = new ArrayList<>();
        list1.add(2); list1.add(4); list1.add(7); list1.add(8);
        List<Integer> list2 = new ArrayList<>();
        list2.add(1); list2.add(3); list2.add(5); list2.add(8);
        List<List<Integer>> lists = new ArrayList<>();
        lists.add(list1); lists.add(list2);
        System.out.println(sumFromKLists(lists, 11));
    }
}

Best approach to fit numbers

I have the following set of integers {2,9,4,1,8}. I need to divide this set into two subsets so that the sums of the subsets are 14 and 10 respectively. In my example the answer is {2,4,8} and {9,1}. I am not looking for any code. I am pretty sure there must be a standard algorithm to solve this problem. Since I was not successful in googling and finding it out myself, I posted my query here. So what would be the best way to approach this problem?
My try was like this...
import java.util.Stack;

public class Test {
    public static void main(String[] args) {
        int[] input = {2, 9, 4, 1, 8};
        int target = 14;
        Stack<Integer> stack = new Stack<>();
        for (int i = 0; i < input.length; i++) {
            stack.add(input[i]);
            for (int j = i + 1; j < input.length; j++) {
                int sum = sumInStack(stack);
                if (sum < target) {
                    stack.add(input[j]);
                    continue;
                }
                if (target == sum) {
                    System.out.println("Eureka");
                }
                stack.remove(input[i]);
            }
        }
    }

    private static int sumInStack(Stack<Integer> stack) {
        int sum = 0;
        for (Integer integer : stack) {
            sum += integer;
        }
        return sum;
    }
}
I know this approach is not even close to solving the problem.
I need to divide this set into two subsets so that the sum of the sets results in 14 and 10 respectively.
If the subsets have to sum to certain values, then it had better be true that the sum of the entire set is the sum of those values, i.e. 14+10=24 in your example. If you only have to find the two subsets, then the problem isn't very difficult — find any subset that sums to one of those values, and the remaining elements of the set must sum to the other value.
For the example set you gave, {2,9,4,1,8}, you said that the answer is {9,1}, {2,4,8}, but notice that that's not the only answer; there's also {2,8}, {9,4,1}.
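Since the question asks for an approach rather than code, here is a rough Java sketch of the standard subset-sum recursion for finding one subset with a given sum (names and structure are mine, not from the answer above); the elements that are not chosen then form the second subset.

import java.util.ArrayList;
import java.util.List;

public class SubsetSumSketch {

    // Returns a subset of values[from..] that sums to 'target', or null if none exists.
    // Classic exponential-time recursion; fine for small sets like {2, 9, 4, 1, 8}.
    static List<Integer> subsetWithSum(int[] values, int from, int target) {
        if (target == 0) {
            return new ArrayList<>();       // the empty subset sums to 0
        }
        if (from == values.length) {
            return null;                    // ran out of elements
        }
        // Option 1: take values[from].
        List<Integer> with = subsetWithSum(values, from + 1, target - values[from]);
        if (with != null) {
            with.add(values[from]);
            return with;
        }
        // Option 2: skip values[from].
        return subsetWithSum(values, from + 1, target);
    }

    public static void main(String[] args) {
        int[] input = {2, 9, 4, 1, 8};
        List<Integer> part1 = subsetWithSum(input, 0, 14);
        System.out.println("Subset summing to 14: " + part1);   // e.g. prints [8, 4, 2]
        // Everything not chosen sums to 24 - 14 = 10 and forms the second subset.
    }
}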

Efficient tuple search algorithm

Given a store of 3-tuples where:
All elements are numeric, e.g. (1, 3, 4) (1300, 3, 15) (1300, 3, 15) …
Tuples are removed and added frequently
At any time the store is typically under 100,000 elements
All Tuples are available in memory
The application is interactive requiring 100s of searches per second.
What are the most efficient algorithms/data structures to perform wild card (*) searches such as:
(1, *, 6) (3601, *, *) (*, 1935, *)
The aim is to have a Linda-like tuple space, but at the application level.
Well, there are only 8 possible arrangements of wildcards, so you can easily construct 6 multi-maps and a set to serve as indices: one for each arrangement of wildcards in the query. You don't need an 8th index because the query (*,*,*) trivially returns all tuples. The set is for tuples with no wildcards; only a membership test is needed in this case.
A multimap takes a key to a set. In your example, e.g., the query (1,*,6) would consult the multimap for queries of the form (X,*,Y), which takes key <X,Y> to the set of all tuples with X in the first position and Y in third. In this case, X=1 and Y=6.
With any reasonable hash-based multimap implementation, lookups ought to be very fast. Several hundred a second ought to be easy, and several thousand per second doable (with, e.g., a contemporary x86 CPU).
Insertions and deletions require updating the maps and set. Again this ought to be reasonably fast, though not as fast as lookups of course. Again several hundred per second ought to be doable.
With only ~10^5 tuples, this approach ought to be fine for memory as well. You can save a bit of space with tricks, e.g. keeping a single copy of each tuple in an array and storing indices in the map/set to represent both key and value. Manage array slots with a free list.
To make this concrete, here is pseudocode. I'm going to use angle brackets <a,b,c> for tuples to avoid too many parens:
# Definitions
For a query Q <k2,k1,k0> where each of k_i is either * or an integer,
Let I(Q) be a 3-digit binary number b2|b1|b0 where
b_i=0 if k_i is * and 1 if k_i is an integer.
Let N(i) be the number of 1's in the binary representation of i
Let M(i) be a multimap taking a tuple with N(i) elements to a set
of tuples with 3 elements.
Let t be a 3 element tuple. Then T(t,i) returns a new tuple with
only the elements of t in positions where i has a 1. For example
T(<1,2,3>,0) = <> and T(<1,2,3>,6) = <2,3>
Note that function T works fine on query tuples with wildcards.
# Algorithm to insert tuple T into the database:
fun insert(t)
for i = 0 to 7
add the entry T(t,i)->t to M(i)
# Algorithm to delete tuple T from the database:
fun delete(t)
for i = 0 to 7
delete the entry T(t,i)->t from M(i)
# Query algorithm
fun query(Q)
let i = I(Q)
return M(i).lookup(T(Q, i)) # lookup failure returns empty set
Note that for simplicity, I've not shown the "optimizations" for M(0) and M(7). For M(0), the algorithm above would create a multimap taking the empty tuple to the set of all 3-tuples in the database. You can avoid this merely by treating i=0 as a special case. Similarly M(7) would take each tuple to a set containing only itself.
An "optimized" version:
fun insert(t)
for i = 1 to 6
add the entry T(t,i)->t to M(i)
add t to set S
fun delete(t)
for i = 1 to 6
delete the entry T(t,i)->t from M(i)
remove t from set S
fun query(Q)
let i = I(Q)
if i = 0, return S
elsif i = 7, return { Q } if Q ∈ S else {}
else return M(i).lookup(T(Q, i))
Addition
For fun, a Java implementation:
package hacking;

import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Random;
import java.util.Scanner;
import java.util.Set;

public class Hacking {
    public static void main(String [] args) {
        TupleDatabase db = new TupleDatabase();
        int n = 200000;
        long start = System.nanoTime();
        for (int i = 0; i < n; ++i) {
            db.insert(db.randomTriple());
        }
        long stop = System.nanoTime();
        double elapsedSec = (stop - start) * 1e-9;
        System.out.println("Inserted " + n + " tuples in " + elapsedSec
                + " seconds (" + (elapsedSec / n * 1000.0) + "ms per insert).");
        Scanner in = new Scanner(System.in);
        for (;;) {
            System.out.print("Query: ");
            int a = in.nextInt();
            int b = in.nextInt();
            int c = in.nextInt();
            System.out.println(db.query(new Tuple(a, b, c)));
        }
    }
}

class Tuple {
    static final int [] N_ONES = new int[] { 0, 1, 1, 2, 1, 2, 2, 3 };
    static final int STAR = -1;
    final int [] vals;

    Tuple(int a, int b, int c) {
        vals = new int[] { a, b, c };
    }

    Tuple(Tuple t, int code) {
        vals = new int[N_ONES[code]];
        int m = 0;
        for (int k = 0; k < 3; ++k) {
            if (((1 << k) & code) > 0) {
                vals[m++] = t.vals[k];
            }
        }
    }

    @Override
    public boolean equals(Object other) {
        if (other instanceof Tuple) {
            Tuple triple = (Tuple) other;
            return Arrays.equals(this.vals, triple.vals);
        }
        return false;
    }

    @Override
    public int hashCode() {
        return Arrays.hashCode(this.vals);
    }

    @Override
    public String toString() {
        return Arrays.toString(vals);
    }

    int code() {
        int c = 0;
        for (int k = 0; k < 3; k++) {
            if (vals[k] != STAR) {
                c |= (1 << k);
            }
        }
        return c;
    }

    Set<Tuple> setOf() {
        Set<Tuple> s = new HashSet<>();
        s.add(this);
        return s;
    }
}

class Multimap extends HashMap<Tuple, Set<Tuple>> {
    @Override
    public Set<Tuple> get(Object key) {
        Set<Tuple> r = super.get(key);
        return r == null ? Collections.<Tuple>emptySet() : r;
    }

    void put(Tuple key, Tuple value) {
        if (containsKey(key)) {
            super.get(key).add(value);
        } else {
            super.put(key, value.setOf());
        }
    }

    void remove(Tuple key, Tuple value) {
        Set<Tuple> set = super.get(key);
        set.remove(value);
        if (set.isEmpty()) {
            super.remove(key);
        }
    }
}

class TupleDatabase {
    final Set<Tuple> set;
    final Multimap [] maps;

    TupleDatabase() {
        set = new HashSet<>();
        maps = new Multimap[7];
        for (int i = 1; i < 7; i++) {
            maps[i] = new Multimap();
        }
    }

    void insert(Tuple t) {
        set.add(t);
        for (int i = 1; i < 7; i++) {
            maps[i].put(new Tuple(t, i), t);
        }
    }

    void delete(Tuple t) {
        set.remove(t);
        for (int i = 1; i < 7; i++) {
            maps[i].remove(new Tuple(t, i), t);
        }
    }

    Set<Tuple> query(Tuple q) {
        int c = q.code();
        switch (c) {
            case 0: return set;
            case 7: return set.contains(q) ? q.setOf() : Collections.<Tuple>emptySet();
            default: return maps[c].get(new Tuple(q, c));
        }
    }

    Random gen = new Random();

    int randPositive() {
        return gen.nextInt(1000);
    }

    Tuple randomTriple() {
        return new Tuple(randPositive(), randPositive(), randPositive());
    }
}
Some output:
Inserted 200000 tuples in 2.981607358 seconds (0.014908036790000002ms per insert).
Query: -1 -1 -1
[[504, 296, 987], [500, 446, 184], [499, 482, 16], [488, 823, 40], ...
Query: 500 446 -1
[[500, 446, 184], [500, 446, 762]]
Query: -1 -1 500
[[297, 56, 500], [848, 185, 500], [556, 351, 500], [779, 986, 500], [935, 279, 500], ...
If you think of the tuples like an IP address, then a radix tree (trie) type structure might work. Radix trees are used for IP lookups.
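As a rough illustration of that trie-like idea (my own sketch, not code from the answer): index the tuples level by level with nested maps. Note this only handles queries whose wildcards form a suffix, such as (1, *, *) or (1, 3, *); other wildcard patterns would still need something like the multimap scheme above.

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TupleTrieSketch {

    // Level-by-level index: first element -> second element -> list of third elements.
    private final Map<Integer, Map<Integer, List<Integer>>> root = new HashMap<>();

    void insert(int a, int b, int c) {
        root.computeIfAbsent(a, k -> new HashMap<>())
            .computeIfAbsent(b, k -> new ArrayList<>())
            .add(c);
    }

    // Query with trailing wildcards only: (a, *, *) when b is null, or (a, b, *).
    List<int[]> queryPrefix(int a, Integer b) {
        List<int[]> result = new ArrayList<>();
        Map<Integer, List<Integer>> second = root.getOrDefault(a, Map.of());
        for (Map.Entry<Integer, List<Integer>> e : second.entrySet()) {
            if (b != null && !b.equals(e.getKey())) {
                continue;   // second component is fixed and does not match
            }
            for (int c : e.getValue()) {
                result.add(new int[] { a, e.getKey(), c });
            }
        }
        return result;
    }

    public static void main(String[] args) {
        TupleTrieSketch index = new TupleTrieSketch();
        index.insert(1, 3, 4);
        index.insert(1300, 3, 15);
        index.insert(1, 5, 6);
        System.out.println(index.queryPrefix(1, null).size());  // (1, *, *) -> 2 matches
        System.out.println(index.queryPrefix(1, 3).size());     // (1, 3, *) -> 1 match
    }
}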
Another way might be to use bit operations: calculate a bit hash for the tuple, and in your search use bitwise OR/AND for quick discovery.
