Performance of Mass-Evaluating Expressions in IronPython

In a C# 4.0 application, I have a Dictionary of strongly typed ILists of equal length: a dynamically built, strongly typed, column-based table.
I want the user to provide one or more Python expressions over the available columns, which are then aggregated over all rows. In a static context it would be:
IDictionary<string, IList> table;
// ...
IList<int> a = table["a"] as IList<int>;
IList<int> b = table["b"] as IList<int>;
double sum = 0;
for (int i = 0; i < n; i++)
sum += (double)a[i] / b[i]; // Expression to sum up
For n = 10^7 this runs in 0.270 sec on my laptop (win7 x64). Replacing the expression by a delegate with two int arguments it takes 0.580 sec, for a nontyped delegate 1.19 sec.
Creating the delegate from IronPython with
IDictionary<string, IList> table;
// ...
var options = new Dictionary<string, object>();
options["DivisionOptions"] = PythonDivisionOptions.New;
var engine = Python.CreateEngine(options);
string expr = "a / b";
Func<int, int, double> f = engine.Execute("lambda a, b : " + expr);
IList<int> a = table["a"] as IList<int>;
IList<int> b = table["b"] as IList<int>;
double sum = 0;
for (int i = 0; i < n; i++)
sum += f(a[i], b[i]);
it takes 3.2 sec (and 5.1 sec with Func<object, object, object>), which is a factor of 4 to 5.5 slower than the corresponding C# delegates. Is this the expected overhead for what I'm doing? What could be improved?
If I have many columns, the approach chosen above will no longer be sufficient. One solution could be to determine the required columns for each expression and pass only those as arguments. The other solution, which I tried unsuccessfully, was to use a ScriptScope and resolve the columns dynamically. For that I defined a RowIterator that has a RowIndex for the active row and a property for each column.
class RowIterator
{
IList<int> la;
IList<int> lb;
public RowIterator(IList<int> a, IList<int> b)
{
this.la = a;
this.lb = b;
}
public int RowIndex { get; set; }
public int a { get { return la[RowIndex]; } }
public int b { get { return lb[RowIndex]; } }
}
A ScriptScope can be created from an IDynamicMetaObjectProvider, which I expected C#'s dynamic to implement. At runtime, however, the call binds to engine.CreateScope(IDictionary), which fails.
dynamic iterator = new RowIterator(a, b) as dynamic;
var scope = engine.CreateScope(iterator);
var expr = engine.CreateScriptSourceFromString("a / b").Compile();
double sum = 0;
for (int i = 0; i < n; i++)
{
iterator.RowIndex = i;
sum += expr.Execute<double>(scope);
}
Next I let RowIterator inherit from DynamicObject and got a running example, but with terrible performance: 158 sec.
class DynamicRowIterator : DynamicObject
{
Dictionary<string, object> members = new Dictionary<string, object>();
IList<int> la;
IList<int> lb;
public DynamicRowIterator(IList<int> a, IList<int> b)
{
this.la = a;
this.lb = b;
}
public int RowIndex { get; set; }
public int a { get { return la[RowIndex]; } }
public int b { get { return lb[RowIndex]; } }
public override bool TryGetMember(GetMemberBinder binder, out object result)
{
if (binder.Name == "a") // Why does this happen?
{
result = this.a;
return true;
}
if (binder.Name == "b")
{
result = this.b;
return true;
}
if (base.TryGetMember(binder, out result))
return true;
if (members.TryGetValue(binder.Name, out result))
return true;
return false;
}
public override bool TrySetMember(SetMemberBinder binder, object value)
{
if (base.TrySetMember(binder, value))
return true;
members[binder.Name] = value;
return true;
}
}
I was surprised that TryGetMember is called with the names of the properties. From the documentation I would have expected TryGetMember to be called only for undefined properties.
For sensible performance I would probably need to implement IDynamicMetaObjectProvider for my RowIterator to make use of dynamic CallSites, but I couldn't find a suitable example to start with. In my experiments I didn't know how to handle __builtins__ in BindGetMember:
class Iterator : IDynamicMetaObjectProvider
{
IList<int> la;
IList<int> lb;
public Iterator(IList<int> a, IList<int> b)
{
this.la = a;
this.lb = b;
}
public int RowIndex { get; set; }
public int a { get { return la[RowIndex]; } }
public int b { get { return lb[RowIndex]; } }
public DynamicMetaObject GetMetaObject(Expression parameter)
{
return new MetaObject(parameter, this);
}
private class MetaObject : DynamicMetaObject
{
internal MetaObject(Expression parameter, Iterator self)
: base(parameter, BindingRestrictions.Empty, self) { }
public override DynamicMetaObject BindGetMember(GetMemberBinder binder)
{
switch (binder.Name)
{
case "a":
case "b":
// Read the typed property on the Iterator instance and box the result,
// because the binder expects a return type of object.
return new DynamicMetaObject(
Expression.Convert(
Expression.Property(
Expression.Convert(Expression, LimitType),
binder.Name),
typeof(object)),
BindingRestrictions.GetTypeRestriction(Expression, LimitType));
default:
return base.BindGetMember(binder);
}
}
}
}
I'm sure my code above is suboptimal; at the very least it doesn't handle the IDictionary of columns yet. I would be grateful for any advice on how to improve the design and/or performance.
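One way to approach the "only pass the required columns" idea from the question is to parse the expression and collect the names it references. The sketch below is plain Python 3 using the standard `ast` module; whether `ast` is usable inside a hosted IronPython engine is an assumption, and `referenced_columns` and `compile_row_function` are hypothetical helper names, not IronPython APIs.

```python
import ast


def referenced_columns(expr, available):
    """Return the available column names that the expression actually uses."""
    tree = ast.parse(expr, mode="eval")
    names = {node.id for node in ast.walk(tree) if isinstance(node, ast.Name)}
    # Keep the table's column order so the argument list is deterministic.
    return [name for name in available if name in names]


def compile_row_function(expr, available):
    """Build a lambda whose parameters are only the columns used by expr."""
    columns = referenced_columns(expr, available)
    func = eval("lambda " + ", ".join(columns) + ": " + expr)
    return columns, func


# Hypothetical three-column table; only "a" and "b" appear in the expression.
table = {"a": [1, 2, 3], "b": [2, 4, 8], "c": [7, 7, 7]}
columns, f = compile_row_function("a / b", list(table))
total = sum(f(*row) for row in zip(*(table[name] for name in columns)))
```

Here `columns` contains only `a` and `b`, so column `c` is never read. Because the expression is passed to `eval`, this is only suitable where user expressions are trusted.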

I also compared the performance of IronPython against a C# implementation. The expression is simple: it just adds the values of two arrays at a specified index. Accessing the arrays directly provides the baseline and theoretical optimum. Accessing the values via a symbol dictionary still has acceptable performance.
The third test creates a delegate from a naive (and intentionally bad) expression tree without any fancy stuff like call-site caching, but it's still faster than IronPython.
Scripting the expression via IronPython takes the most time. My profiler shows that most of the time is spent in PythonOps.GetVariable, PythonDictionary.TryGetValue and PythonOps.TryGetBoundAttr. I think there's room for improvement.
Timings:
Direct: 00:00:00.0052680
via Dictionary: 00:00:00.5577922
Compiled Delegate: 00:00:03.2733377
Scripted: 00:00:09.0485515
Here's the code:
public static void PythonBenchmark()
{
var engine = Python.CreateEngine();
int iterations = 1000;
int count = 10000;
int[] a = Enumerable.Range(0, count).ToArray();
int[] b = Enumerable.Range(0, count).ToArray();
Dictionary<string, object> symbols = new Dictionary<string, object> { { "a", a }, { "b", b } };
Func<int, object> calculate = engine.Execute("lambda i: a[i] + b[i]", engine.CreateScope(symbols));
var sw = Stopwatch.StartNew();
int sum = 0;
for (int iteration = 0; iteration < iterations; iteration++)
{
for (int i = 0; i < count; i++)
{
sum += a[i] + b[i];
}
}
Console.WriteLine("Direct: " + sw.Elapsed);
sw.Restart();
for (int iteration = 0; iteration < iterations; iteration++)
{
for (int i = 0; i < count; i++)
{
sum += ((int[])symbols["a"])[i] + ((int[])symbols["b"])[i];
}
}
Console.WriteLine("via Dictionary: " + sw.Elapsed);
var indexExpression = Expression.Parameter(typeof(int), "index");
var indexerMethod = typeof(IList<int>).GetMethod("get_Item");
var lookupMethod = typeof(IDictionary<string, object>).GetMethod("get_Item");
Func<string, Expression> getSymbolExpression = symbol => Expression.Call(Expression.Constant(symbols), lookupMethod, Expression.Constant(symbol));
var addExpression = Expression.Add(
Expression.Call(Expression.Convert(getSymbolExpression("a"), typeof(IList<int>)), indexerMethod, indexExpression),
Expression.Call(Expression.Convert(getSymbolExpression("b"), typeof(IList<int>)), indexerMethod, indexExpression));
var compiledFunc = Expression.Lambda<Func<int, object>>(Expression.Convert(addExpression, typeof(object)), indexExpression).Compile();
sw.Restart();
for (int iteration = 0; iteration < iterations; iteration++)
{
for (int i = 0; i < count; i++)
{
sum += (int)compiledFunc(i);
}
}
Console.WriteLine("Compiled Delegate: " + sw.Elapsed);
sw.Restart();
for (int iteration = 0; iteration < iterations; iteration++)
{
for (int i = 0; i < count; i++)
{
sum += (int)calculate(i);
}
}
Console.WriteLine("Scripted: " + sw.Elapsed);
Console.WriteLine(sum); // make sure cannot be optimized away
}

Although I don't know all the specific details of your case, a slowdown of only 5x for doing anything this low-level in IronPython is actually pretty good. Most entries in the Computer Language Benchmarks Game show a 10-30x slowdown.
A major part of the reason is that IronPython has to allow for the possibility that you've done something sneaky at runtime, and thus can't produce code of the same efficiency.
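To make the "something sneaky at runtime" point concrete, here is a small plain-Python sketch (not IronPython-specific; the `scope` dictionary is only an illustration standing in for a script scope). The names `a` and `b` are resolved on every call, so the function has to see a rebinding that happens between two calls:

```python
# The dictionary plays the role of the scope the expression is evaluated in.
scope = {"a": [1, 2, 3], "b": [10, 20, 30]}
calculate = eval("lambda i: a[i] + b[i]", scope)

first = calculate(0)            # looks up "a" and "b" in scope at call time

scope["a"] = [100, 200, 300]    # rebinding "a" after the lambda was created
second = calculate(0)           # the same lambda now sees the new list
```

A statically compiled C# loop can bind the arrays once; a dynamic runtime has to keep the per-call lookup (or guard a cached one), which is consistent with the dictionary and variable lookups reported by the profiler above.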

Related

Insertion Sort for Singly Linked List [EXTERNAL]

I'm not sure where to start, because this is messy. I need to write an Insertion Sort method for a singly linked list, which causes enough problems by itself: Insertion Sort usually walks backwards through the array or list elements, and that seems pointless for a singly linked list, where you can only move forwards. In addition, I need to perform the "swap" operations externally (on a file), and I do not fully understand how to do that with a list structure.
This is my ArrayClass and Swap method that I used:
class MyFileArray : DataArray
{
public MyFileArray(string filename, int n, int seed)
{
double[] data = new double[n];
length = n;
Random rand = new Random(seed);
for (int i = 0; i < length; i++)
{
data[i] = rand.NextDouble();
}
if (File.Exists(filename)) File.Delete(filename);
try
{
using (BinaryWriter writer = new BinaryWriter(File.Open(filename,
FileMode.Create)))
{
for (int j = 0; j < length; j++)
writer.Write(data[j]);
}
}
catch (IOException ex)
{
Console.WriteLine(ex.ToString());
}
}
public FileStream fs { get; set; }
public override double this[int index]
{
get
{
Byte[] data = new Byte[8];
fs.Seek(8 * index, SeekOrigin.Begin);
fs.Read(data, 0, 8);
double result = BitConverter.ToDouble(data, 0);
return result;
}
}
public override void Swap(int j, double a)
{
Byte[] data = new Byte[16];
BitConverter.GetBytes(a).CopyTo(data, 0);
fs.Seek(8 * (j + 1), SeekOrigin.Begin);
fs.Write(data, 0, 8);
}
}
And this is my Insertion Sort for array:
public static void InsertionSort(DataArray items)
{
double key;
int j;
for (int i = 1; i < items.Length; i++)
{
key = items[i];
j = i - 1;
while (j >= 0 && items[j] > key)
{
items.Swap(j, items[j]);
j = j - 1;
}
items.Swap(j, key);
}
}
Now I somehow have to do the same exact thing - however using Singly Linked List, I'm given this kind of class to work with (allowed to make changes):
class MyFileList : DataList
{
int prevNode;
int currentNode;
int nextNode;
public MyFileList(string filename, int n, int seed)
{
length = n;
Random rand = new Random(seed);
if (File.Exists(filename)) File.Delete(filename);
try
{
using (BinaryWriter writer = new BinaryWriter(File.Open(filename,
FileMode.Create)))
{
writer.Write(4);
for (int j = 0; j < length; j++)
{
writer.Write(rand.NextDouble());
writer.Write((j + 1) * 12 + 4);
}
}
}
catch (IOException ex)
{
Console.WriteLine(ex.ToString());
}
}
public FileStream fs { get; set; }
public override double Head()
{
Byte[] data = new Byte[12];
fs.Seek(0, SeekOrigin.Begin);
fs.Read(data, 0, 4);
currentNode = BitConverter.ToInt32(data, 0);
prevNode = -1;
fs.Seek(currentNode, SeekOrigin.Begin);
fs.Read(data, 0, 12);
double result = BitConverter.ToDouble(data, 0);
nextNode = BitConverter.ToInt32(data, 8);
return result;
}
public override double Next()
{
Byte[] data = new Byte[12];
fs.Seek(nextNode, SeekOrigin.Begin);
fs.Read(data, 0, 12);
prevNode = currentNode;
currentNode = nextNode;
double result = BitConverter.ToDouble(data, 0);
nextNode = BitConverter.ToInt32(data, 8);
return result;
}
}
To be completely honest, I'm not sure how to implement Insertion Sort here, nor how to then turn it into an external sort. I previously used this code for in-memory (non-external) sorting:
public override void InsertionSort()
{
sorted = null;
MyLinkedListNode current = headNode;
while (current != null)
{
MyLinkedListNode next = current.nextNode;
sortedInsert(current);
current = next;
}
headNode = sorted;
}
void sortedInsert(MyLinkedListNode newnode)
{
if (sorted == null || sorted.data >= newnode.data)
{
newnode.nextNode = sorted;
sorted = newnode;
}
else
{
MyLinkedListNode current = sorted;
while (current.nextNode != null && current.nextNode.data < newnode.data)
{
current = current.nextNode;
}
newnode.nextNode = current.nextNode;
current.nextNode = newnode;
}
}
So if someone could maybe give some kind of tips/explanations - or maybe if you have ever tried this - code examples how to solve this kind of problem, would be appreciated!
I actually have solved this fairly recently.
Here's the code sample that you can play around with, it should work out of the box.
public class SortLinkedList {
public static class LinkListNode {
private Integer value;
LinkListNode nextNode;
public LinkListNode(Integer value, LinkListNode nextNode) {
this.value = value;
this.nextNode = nextNode;
}
public Integer getValue() {
return value;
}
public void setValue(Integer value) {
this.value = value;
}
public LinkListNode getNextNode() {
return nextNode;
}
public void setNextNode(LinkListNode nextNode) {
this.nextNode = nextNode;
}
@Override
public String toString() {
return this.value.toString();
}
}
public static void main(String...args) {
LinkListNode f = new LinkListNode(12, null);
LinkListNode e = new LinkListNode(11, f);
LinkListNode c = new LinkListNode(13, e);
LinkListNode b = new LinkListNode(1, c);
LinkListNode a = new LinkListNode(5, b);
print(sort(a));
}
public static void print(LinkListNode aList) {
LinkListNode iterator = aList;
while (iterator != null) {
System.out.println(iterator.getValue());
iterator = iterator.getNextNode();
}
}
public static LinkListNode sort(LinkListNode aList){
LinkListNode head = new LinkListNode(null, aList);
LinkListNode fringePtr = aList.getNextNode();
LinkListNode ptrBeforeFringe = aList;
LinkListNode findPtr;
LinkListNode prev;
while(fringePtr != null) {
Integer valueToInsert = fringePtr.getValue();
findPtr = head.getNextNode();
prev = head;
while(findPtr != fringePtr) {
System.out.println("fringe=" + fringePtr);
System.out.println(findPtr);
if (valueToInsert <= findPtr.getValue()) {
LinkListNode tmpNode = fringePtr.getNextNode();
fringePtr.setNextNode(findPtr);
prev.setNextNode(fringePtr);
ptrBeforeFringe.setNextNode(tmpNode);
fringePtr = ptrBeforeFringe;
break;
}
findPtr = findPtr.getNextNode();
prev = prev.getNextNode();
}
fringePtr = fringePtr.getNextNode();
if (ptrBeforeFringe.getNextNode() != fringePtr) {
ptrBeforeFringe = ptrBeforeFringe.getNextNode();
}
}
return head.getNextNode();
}
}
From a high level, you keep track of a fringe ptr and insert its node so that it ends up in the correct spot in the sorted sublist before the fringe.
For instance, suppose I have this LL.
3->2->5->4
In the first iteration, I have fringePtr at 2, and I want to insert 2 somewhere in the sublist before the fringe ptr, so I traverse from the head towards the fringe ptr until the value to insert is less than or equal to the current value. I also keep a prev ptr that tracks the node before the current one (to avoid a null check, I have a sentinel node at the start of my traversal so I can also insert at the head).
Then, when I see that it's less than the current, I know I need to insert it right after the previous, so I have to:
use a temporary ptr to keep track of my previous's current next.
bind previous's next to my toInsert node.
bind my toInsert node's next to my temp node.
Then, to continue, you just advance your fringe ptr and try again, basically building up a sorted sublist as you move along until the fringe hits the end.
i.e. the iterations will look like
1. 3->2->5->4
      ^
2. 2->3->5->4
         ^
3. 2->3->5->4
            ^
4. 2->3->4->5 FIN.
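The walk-through above can also be sketched in Python. This is a minimal variant written for illustration, not a translation of the Java code: the names `Node`, `insertion_sort` and `to_list` are made up here, and it tracks the last node of the sorted prefix instead of a separate pointer before the fringe. It only ever follows `next` links forwards, which is the property that matters for a singly linked (or file-backed) list.

```python
class Node:
    def __init__(self, value, next_node=None):
        self.value = value
        self.next_node = next_node


def insertion_sort(first):
    """Sort a singly linked list in place and return its new first node."""
    sentinel = Node(None, first)   # dummy node, so inserting at the front is not a special case
    last_sorted = first            # last node of the already sorted prefix
    while last_sorted is not None and last_sorted.next_node is not None:
        fringe = last_sorted.next_node
        if fringe.value >= last_sorted.value:
            last_sorted = fringe   # already in place: the sorted prefix grows by one
            continue
        last_sorted.next_node = fringe.next_node   # unlink the fringe node
        prev = sentinel
        # Walk forwards from the front until the next node is larger than the fringe.
        while prev.next_node.value <= fringe.value:
            prev = prev.next_node
        fringe.next_node = prev.next_node          # relink the fringe after prev
        prev.next_node = fringe
    return sentinel.next_node


def to_list(node):
    """Collect the values of a linked list into a Python list."""
    values = []
    while node is not None:
        values.append(node.value)
        node = node.next_node
    return values


# The example list from the explanation: 3->2->5->4
result = to_list(insertion_sort(Node(3, Node(2, Node(5, Node(4))))))
```

The inner walk cannot run off the end of the list, because it is only entered when the fringe value is smaller than the last sorted value, so it stops at that node at the latest.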

Merge two text input files, each line of the files one after the other. See example

I was trying to solve a problem using Java 8 that I have already solved using a simple for loop. However, I have no idea how to do this.
The problem is:
File1 :
1,sdfasfsf
2,sdfhfghrt
3,hdfxcgyjs
File2 :
10,xhgdfgxgf
11,hcvcnhfjh
12,sdfgasasdfa
13,ghdhtfhdsdf
Output should be like
1,sdfasfsf
10,xhgdfgxgf
2,sdfhfghrt
11,hcvcnhfjh
3,hdfxcgyjs
12,sdfgasasdfa
13,ghdhtfhdsdf
I already have this basically working.
The core logic is:
List<String> left = readFile(lhs);
List<String> right = readFile(rhs);
int leftSize = left.size();
int rightSize = right.size();
int size = leftSize > rightSize? leftSize : right.size();
for (int i = 0; i < size; i++) {
if(i < leftSize) {
merged.add(left.get(i));
}
if(i < rightSize) {
merged.add(right.get(i));
}
}
MergeInputs.java
UnitTest
Input files are in src/test/resources/com/linux/test/merge/list of the same repo (only allowed to post two links)
However, I boasted I could do this easily using streams and now I am not sure if this can even be done.
Help is really appreciated.
You may simplify your operation to have less conditionals per element:
int leftSize = left.size(), rightSize = right.size(), min = Math.min(leftSize, rightSize);
List<String> merged = new ArrayList<>(leftSize+rightSize);
for(int i = 0; i < min; i++) {
merged.add(left.get(i));
merged.add(right.get(i));
}
if(leftSize!=rightSize) {
merged.addAll(
(leftSize<rightSize? right: left).subList(min, Math.max(leftSize, rightSize)));
}
Then, you may replace the first part by a stream operation:
int leftSize = left.size(), rightSize = right.size(), min = Math.min(leftSize, rightSize);
List<String> merged=IntStream.range(0, min)
.mapToObj(i -> Stream.of(left.get(i), right.get(i)))
.flatMap(Function.identity())
.collect(Collectors.toCollection(ArrayList::new));
if(leftSize!=rightSize) {
merged.addAll(
(leftSize<rightSize? right: left).subList(min, Math.max(leftSize, rightSize)));
}
But it isn’t really simpler than the loop variant. The loop variant may be even more efficient due to its presized list.
Incorporating both operations into one stream operation would be even more complicated (and probably even less efficient).
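For comparison, the same "alternate while both lists have lines, then append the tail of the longer one" logic can be sketched in Python with the standard `itertools` module. This is only an illustration in a different language, not a Java streams solution; `interleave` is a name chosen here.

```python
from itertools import chain, zip_longest


def interleave(left, right):
    """Alternate lines from both lists, then append the tail of the longer one."""
    missing = object()   # unique marker, so empty strings in the input are kept
    pairs = zip_longest(left, right, fillvalue=missing)
    return [line for line in chain.from_iterable(pairs) if line is not missing]


# The two files from the question, already read into lists of lines.
file1 = ["1,sdfasfsf", "2,sdfhfghrt", "3,hdfxcgyjs"]
file2 = ["10,xhgdfgxgf", "11,hcvcnhfjh", "12,sdfgasasdfa", "13,ghdhtfhdsdf"]
merged = interleave(file1, file2)
```

`zip_longest` pads the shorter list with the marker object, and the final filter drops those padding entries again.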
The code logic should look like this:
int leftSize = left.size();
int rightSize = right.size();
int minSize = Math.min(leftSize,rightSize);
for (int i = 0; i < minSize; i++) {
merged.add(left.get(i));
merged.add(right.get(i));
}
// adding remaining elements
merged.addAll(
minSize < leftSize ? left.subList(minSize, leftSize)
: right.subList(minSize, rightSize)
);
Another option is to alternate between the two lists through an Iterator, for example:
toggle(left, right).forEachRemaining(merged::add);
//OR using stream instead
List<String> merged = Stream.generate(toggle(left, right)::next)
.limit(left.size() + right.size())
.collect(Collectors.toList());
The toggle method is shown below:
<T> Iterator<? extends T> toggle(List<T> left, List<T> right) {
return new Iterator<T>() {
private final int RIGHT = 1;
private final int LEFT = 0;
int cursor = -1;
Iterator<T>[] pair = arrayOf(left.iterator(), right.iterator());
@SafeVarargs
private final Iterator<T>[] arrayOf(Iterator<T>... iterators) {
return iterators;
}
@Override
public boolean hasNext() {
for (Iterator<T> each : pair) {
if (each.hasNext()) {
return true;
}
}
return false;
}
@Override
public T next() {
return pair[cursor = next(cursor)].next();
}
private int next(int cursor) {
cursor=pair[LEFT].hasNext()?pair[RIGHT].hasNext()?cursor: RIGHT:LEFT;
return (cursor + 1) % pair.length;
}
};
}

Using minHash to compare more than 2 sets

I have a class called FindSimilar which uses minHash to find similarities between 2 sets (and for this goal, it works great). My problem is that I need to compare more than 2 sets, more specifically, I need to compare a given set1 with an unknown amount of other sets. Here is the class:
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Random;
import java.util.Set;
public class FindSimilar<T>
{
private int hash[];
private int numHash;
public FindSimilar(int numHash)
{
this.numHash = numHash;
hash = new int[numHash];
Random r = new Random(11);
for (int i = 0; i < numHash; i++)
{
int a = (int) r.nextInt();
int b = (int) r.nextInt();
int c = (int) r.nextInt();
int x = hash(a * b * c, a, b, c);
hash[i] = x;
}
}
public double similarity(Set<T> set1, Set<T> set2)
{
int numSets = 4;
Map<T, boolean[]> bitMap = buildBitMap(set1, set2);
int[][] minHashValues = initializeHashBuckets(numSets, numHash);
computeFindSimilarForSet(set1, 0, minHashValues, bitMap);
computeFindSimilarForSet(set2, 1, minHashValues, bitMap);
return computeSimilarityFromSignatures(minHashValues, numHash);
}
private static int[][] initializeHashBuckets(int numSets,
int numHashFunctions)
{
int[][] minHashValues = new int[numSets][numHashFunctions];
for (int i = 0; i < numSets; i++)
{
for (int j = 0; j < numHashFunctions; j++)
{
minHashValues[i][j] = Integer.MAX_VALUE;
}
}
return minHashValues;
}
private static double computeSimilarityFromSignatures(
int[][] minHashValues, int numHashFunctions)
{
int identicalFindSimilares = 0;
for (int i = 0; i < numHashFunctions; i++)
{
if (minHashValues[0][i] == minHashValues[1][i])
{
identicalFindSimilares++;
}
}
return (1.0 * identicalFindSimilares) / numHashFunctions;
}
private static int hash(int x, int a, int b, int c)
{
int hashValue = (int) ((a * (x >> 4) + b * x + c) & 131071);
return Math.abs(hashValue);
}
private void computeFindSimilarForSet(Set<T> set, int setIndex,
int[][] minHashValues, Map<T, boolean[]> bitArray)
{
int index = 0;
for (T element : bitArray.keySet())
{
/*
* for every element in the bit array
*/
for (int i = 0; i < numHash; i++)
{
/*
* for every hash
*/
if (set.contains(element))
{
/*
* if the set contains the element
*/
int hindex = hash[index];
if (hindex < minHashValues[setIndex][index])
{
/*
* if current hash is smaller than the existing hash in
* the slot then replace with the smaller hash value
*/
minHashValues[setIndex][i] = hindex;
}
}
}
index++;
}
}
public Map<T, boolean[]> buildBitMap(Set<T> set1, Set<T> set2)
{
Map<T, boolean[]> bitArray = new HashMap<T, boolean[]>();
for (T t : set1)
{
bitArray.put(t, new boolean[] { true, false });
}
for (T t : set2)
{
if (bitArray.containsKey(t))
{
// item is present in set1
bitArray.put(t, new boolean[] { true, true });
}
else if (!bitArray.containsKey(t))
{
// item is not present in set1
bitArray.put(t, new boolean[] { false, true });
}
}
return bitArray;
}
public static void main(String[] args)
{
Set<String> set1 = new HashSet<String>();
set1.add("FRANCISCO");
set1.add("abc");
set1.add("SAN");
Set<String> set2 = new HashSet<String>();
set2.add("b");
set2.add("a");
set2.add("SAN");
set2.add("USA");
FindSimilar<String> minHash = new FindSimilar<String>(set1.size() + set2.size());
System.out.println("Set1 : " + set1);
System.out.println("Set2 : " + set2);
System.out.println("Similarity between two sets: "
+ minHash.similarity(set1, set2));
}
}
I need to use the similarity method on more than 2 sets. The problem is that I can't find a way to go over all of them. If I write a for loop, I don't know how to express that I want to compare set1 with the i-th set. I am not sure if I am making sense; I must admit I am a bit confused.
The goal of the program is to compare users. A user has a list of contacts (other users) and similar users have similar contacts. Each set is a user and the contents of the sets will be their contacts.
In implementations of set similarity join algorithms, sets are usually converted to an array of integers. Each integer represents a set element, and the conversion is typically done with a hash map. The arrays are sorted, such that the overlap between two sets can be computed in a merge like manner. If you are interested in these algorithms and their pruning techniques, the paper at http://ssjoin.dbresearch.uni-salzburg.at/ could be a good start.
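The conversion and merge-style overlap described above can be sketched as follows. This is an illustration under stated assumptions, not code from the linked work: it computes the exact Jaccard similarity rather than a minHash estimate, and the names `encode_sets`, `overlap` and `jaccard` are invented for this sketch. It also shows one given set being compared against any number of other sets.

```python
def encode_sets(sets):
    """Map every distinct element to an integer id; return each set as a sorted list of ids."""
    ids = {}
    encoded = []
    for s in sets:
        encoded.append(sorted(ids.setdefault(element, len(ids)) for element in s))
    return encoded


def overlap(x, y):
    """Count the common ids of two sorted lists in one merge-style pass."""
    i = j = common = 0
    while i < len(x) and j < len(y):
        if x[i] == y[j]:
            common += 1
            i += 1
            j += 1
        elif x[i] < y[j]:
            i += 1
        else:
            j += 1
    return common


def jaccard(x, y):
    """Exact Jaccard similarity of two encoded sets."""
    common = overlap(x, y)
    union = len(x) + len(y) - common
    return common / union if union else 1.0


# Each set holds one user's contacts; the first user is compared with all others.
users = [{"FRANCISCO", "abc", "SAN"}, {"b", "a", "SAN", "USA"}, {"abc", "SAN"}]
first, *others = encode_sets(users)
scores = [jaccard(first, other) for other in others]
```

With the two example sets from the question, the first score is 1/6, since only "SAN" is shared among six distinct elements.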
I have found a (not sure if) cheesy solution for my problem by placing all sets inside an ArrayList structure and then converting it to an actual array:
ArrayList<Set<String>> list = new ArrayList<Set<String>>();
for(int i = 0; i < numPeople; i++){
Set<String> set1 = new HashSet<String>();
list.add(set1);
//another for goes here later on
}
Set<String>[] bs = list.toArray(new Set[0]);
.
.
.
public static void main(String[] args)
{
.
.
.
for(int i = 1; i<bs.length; i++){
System.out.format("Set %d: ", i+1);
System.out.println(bs[i]);
System.out.println("Similarity between two sets: "
+ minHash.similarity(bs[0], bs[i]));
}
}
This produces a "The expression of type Set[] needs unchecked conversion to conform to Set<String>[]" warning, but runs fine. It does exactly what I wanted (I still need a for loop to put data into the sets, but that shouldn't be hard). If anyone could tell me whether this solution should be used or whether there is a better alternative, I'd like to hear it, since I am still learning and any info would be useful.

Finding the index of the first word starting with a given letter from an alphabetically sorted list

Based on the current implementation, I get an ArrayList from some source that contains about 1000 unique names in alphabetical order (A-Z or Z-A).
I need to find the index of the first word starting with a given letter.
To be more precise, when I select a letter, e.g. "M", it should give me the index of the first word starting with "M" in the sorted list.
That way I should be able to find the index of the first word for each of the 26 letters.
Please help me find a solution that doesn't compromise on speed.
UPDATE:
Actually, after getting the 1000 unique names, the sorting is also done by my own code.
If this can be done during the sorting itself, I can avoid iterating over the list again after sorting to find the indices for the letters.
Is that possible?
Thanks,
Sen
I hope this little piece of code will help you. I guessed the question is related to Java, because you mentioned ArrayList.
String[] unsorted = {"eve", "bob", "adam", "mike", "monica", "Mia", "marta", "pete", "Sandra"};
ArrayList<String> names = new ArrayList<String>(Arrays.asList(unsorted));
String letter = "M"; // find index of this
class MyComp implements Comparator<String>{
String first = "";
String letter;
MyComp(String letter){
this.letter = letter.toUpperCase();
}
public String getFirst(){
return first;
}
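@Override
public int compare(String s0, String s1) {
if(s0.toUpperCase().startsWith(letter)){
// compareTo only guarantees the sign of its result, so test for < 0 rather than == -1;
// ignore case to stay consistent with the case-insensitive sort order below.
if(first.equals("") || s0.compareToIgnoreCase(first) < 0){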
first = s0;
}
}
return s0.toUpperCase().compareTo(s1.toUpperCase());
}
};
MyComp mc = new MyComp(letter);
Collections.sort(names, mc);
int index = names.indexOf(mc.getFirst()); // the index of first name starting with letter
I'm not sure if it's possible to also store the index of the first name in the comparator without much overhead. Anyway, if you implement your own sorting algorithm, e.g. quicksort, you know the indices of the elements and can calculate the index while sorting. This depends on your chosen sorting algorithm and implementation. If I knew how your sorting is implemented, we could insert the index calculation.
So I came up with my own solution for this.
package test.binarySearch;
import java.util.Random;
/**
*
* Binary search to find the index of the first word starting with a given letter
*
* @author Navaneeth Sen <navaneeth.sen@multichoice.co.za>
*/
class SortedWordArray
{
private final String[] a; // ref to array a
private int nElems; // number of data items
public SortedWordArray(int max) // constructor
{
a = new String[max]; // create array
nElems = 0;
}
public int size()
{
return nElems;
}
public int find(String searchKey)
{
return recFind(searchKey, 0, nElems - 1);
}
String array = null;
int arrayIndex = 0;
private int recFind(String searchKey, int lowerBound,
int upperBound)
{
int curIn;
curIn = (lowerBound + upperBound) / 2;
if (a[curIn].startsWith(searchKey))
{
array = a[curIn];
if ((curIn == 0) || !a[curIn - 1].startsWith(searchKey))
{
return curIn; // found it
}
else
{
return recFind(searchKey, lowerBound, curIn - 1);
}
}
else if (lowerBound > upperBound)
{
return -1; // can't find it
}
else // divide range
{
if (a[curIn].compareTo(searchKey) < 0)
{
return recFind(searchKey, curIn + 1, upperBound);
}
else // it's in lower half
{
return recFind(searchKey, lowerBound, curIn - 1);
}
} // end else divide range
} // end recFind()
public void insert(String value) // put element into array
{
int j;
for (j = 0; j < nElems; j++) // find where it goes
{
if (a[j].compareTo(value) > 0) // (linear search)
{
break;
}
}
for (int k = nElems; k > j; k--) // move bigger ones up
{
a[k] = a[k - 1];
}
a[j] = value; // insert it
nElems++; // increment size
} // end insert()
public void display() // displays array contents
{
for (int j = 0; j < nElems; j++) // for each element,
{
System.out.print(a[j] + " "); // display it
}
System.out.println("");
}
} // end class SortedWordArray
class BinarySearchWordApp
{
static final String AB = "12345aqwertyjklzxcvbnm";
static Random rnd = new Random();
public static String randomString(int len)
{
StringBuilder sb = new StringBuilder(len);
for (int i = 0; i < len; i++)
{
sb.append(AB.charAt(rnd.nextInt(AB.length())));
}
return sb.toString();
}
public static void main(String[] args)
{
int maxSize = 100000; // array size
SortedWordArray arr; // reference to array
int[] indices = new int[27];
arr = new SortedWordArray(maxSize); // create the array
for (int i = 0; i < 100000; i++)
{
arr.insert(randomString(10)); //insert it into the array
}
arr.display(); // display array
String searchKey;
for (int i = 97; i < 124; i++)
{
searchKey = (i == 123)?"1":Character.toString((char) i);
long time_1 = System.currentTimeMillis();
int result = arr.find(searchKey);
long time_2 = System.currentTimeMillis() - time_1;
if (result != -1)
{
indices[i - 97] = result;
System.out.println("Found " + result + " in " + time_2 + " ms");
}
else
{
if (!(i == 97))
{
indices[i - 97] = indices[i - 97 - 1];
}
System.out.println("Can't find " + searchKey);
}
}
for (int i = 0; i < indices.length; i++)
{
System.out.println("Index [" + i + "][" + (char)(i+97)+"] = " + indices[i]);
}
} // end main()
}
All comments welcome.
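For comparison, the same binary-search idea is very short in Python with the standard `bisect` module. This is a sketch under two assumptions: the list is sorted in ascending order, and all names use the same letter case. `first_index_with_prefix` is a name made up for this example.

```python
from bisect import bisect_left
import string


def first_index_with_prefix(sorted_names, prefix):
    """Return the index of the first name starting with prefix, or -1 if there is none."""
    # bisect_left finds the first position whose entry is >= prefix in O(log n).
    index = bisect_left(sorted_names, prefix)
    if index < len(sorted_names) and sorted_names[index].startswith(prefix):
        return index
    return -1


names = sorted(["eve", "bob", "adam", "mike", "monica", "mia", "marta", "pete", "sandra"])
# One lookup per letter gives the whole index table.
indices = {letter: first_index_with_prefix(names, letter) for letter in string.ascii_lowercase}
```

In this sample list, "marta" is the first name starting with "m", so `indices["m"]` is its position in the sorted list, and letters without any name map to -1.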

Interview - Oracle

In a game, the only scores that can be made are 2, 3, 4, 5, 6, 7 and 8, and each can be made any number of times.
In how many combinations can the team reach a total score of exactly 50?
For example, 8,8,8,8,8,8,2 is valid, and 8,8,8,8,8,4,4,2 is also valid, etc.
The problem can be solved with dynamic programming, with 2 parameters:
i - the index up to which we have considered the scores
s - the total score.
f(i, s) will contain the total number of ways to achieve score s using only score[0] .. score[i].
Let score[] be the list of unique positive scores that can be made.
The formulation for the DP solution:
f(0, s) = 1, for all s divisible by score[0]
f(0, s) = 0, otherwise
f(i + 1, s) = Sum [for k = 0 .. floor(s/score[i + 1])] f(i, s - score[i + 1] * k)
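The recurrence above translates almost line by line into Python. The following is a minimal sketch, assuming the scores are distinct positive integers; `count_ways` is a name chosen for this example, and only one row of f is kept in memory at a time.

```python
def count_ways(scores, target):
    """Number of combinations of the given scores that sum to target (order ignored)."""
    # Base case f(0, s): only multiples of the first score are reachable.
    f = [1 if s % scores[0] == 0 else 0 for s in range(target + 1)]
    for score in scores[1:]:
        g = [0] * (target + 1)
        for s in range(target + 1):
            # f(i + 1, s): sum over k = 0 .. floor(s / score) uses of this score.
            g[s] = sum(f[s - score * k] for k in range(s // score + 1))
        f = g
    return f[target]


small = count_ways([2, 3], 6)   # 2+2+2 and 3+3
```

For the question itself the call would be `count_ways([2, 3, 4, 5, 6, 7, 8], 50)`; the other answers in this thread report 3095 for that case.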
This looks like a coin change problem. I wrote some Python code for it a while back.
Edited Solution:
from collections import defaultdict
my_dicto = defaultdict(dict)
def row_analysis(v, my_dicto, coins):
    temp = 0
    for coin in coins:
        if v >= coin:
            if v - coin == 0: # changed from if v - coin in (0, 1):
                temp += 1
                my_dicto[coin][v] = temp
            else:
                temp += my_dicto[coin][v - coin]
                my_dicto[coin][v] = temp
        else:
            my_dicto[coin][v] = temp
    return my_dicto
def get_combs(coins, value):
    '''
    Returns answer for coin change type problems.
    Coins are assumed to be sorted.
    Example:
    >>> get_combs([1,2,3,5,10,15,20], 50)
    2955
    '''
    dicto = defaultdict(dict)
    for v in range(value + 1): # range instead of xrange, so this also runs on Python 3
        dicto = row_analysis(v, dicto, coins)
    return dicto[coins[-1]][value]
In your case:
>>> get_combs([2,3,4,5,6,7,8], 50)
3095
It is like visiting a decision tree with 7 branches at each node.
The code is:
class WinScore{
static final int totalScore=50;
static final int[] list={2,3,4,5,6,7,8};
public static int methodNum=0;
static void visitTree( int achieved , int index){
if (achieved >= totalScore ){
return;
}
for ( int i=index; i< list.length; i++ ){
if ( achieved + list[i] == totalScore ) {
methodNum++;
}else if ( achieved + list[i] < totalScore ){
visitTree( achieved + list[i], i );
}
}
}
public static void main( String[] args ){
visitTree(0, 0);
System.out.println("number of methods are:" + methodNum );
}
}
output:
number of methods are:3095
Just stumbled on this question. Here's a C# variation which allows you to explore the different combinations:
static class SlotIterator
{
public static IEnumerable<string> Discover(this int[] set, int maxScore)
{
var st = new Stack<Slot>();
var combinations = 0;
set = set.OrderBy(c => c).ToArray();
st.Push(new Slot(0, 0, set.Length));
while (st.Count > 0)
{
var m = st.Pop();
for (var i = m.Index; i < set.Length; i++)
{
if (m.Counter + set[i] < maxScore)
{
st.Push(m.Clone(m.Counter + set[i], i));
}
else if (m.Counter + set[i] == maxScore)
{
m.SetSlot(i);
yield return m.Slots.PrintSlots(set, ++combinations, maxScore);
}
}
}
}
public static string PrintSlots(this int[] slots, int[] set, int numVariation, int maxScore)
{
var sb = new StringBuilder();
var accumulate = 0;
for (var j = 0; j < slots.Length; j++)
{
if (slots[j] <= 0)
{
continue;
}
var plus = "+";
for (var k = 0; k < slots[j]; k++)
{
accumulate += set[j];
if (accumulate == maxScore) plus = "";
sb.AppendFormat("{0}{1}", set[j], plus);
}
}
sb.AppendFormat("={0} - Variation nr. {1}", accumulate, numVariation);
return sb.ToString();
}
}
public class Slot
{
public Slot(int counter, int index, int countSlots)
{
this.Slots = new int[countSlots];
this.Counter = counter;
this.Index = index;
}
public void SetSlot(int index)
{
this.Slots[index]++;
}
public Slot Clone(int newval, int index)
{
var s = new Slot(newval, index, this.Slots.Length);
this.Slots.CopyTo(s.Slots, 0);
s.SetSlot(index);
return s;
}
public int[] Slots { get; private set; }
public int Counter { get; set; }
public int Index { get; set; }
}
Example:
static void Main(string[] args)
{
using (var sw = new StreamWriter(@"c:\test\comb50.txt"))
{
foreach (var s in new[] { 2, 3, 4, 5, 6, 7, 8 }.Discover(50))
{
sw.WriteLine(s);
}
}
}
Yields 3095 combinations.
