Problems calculating CRC for NRPE protocol on .NET Micro Framework (Netduino) - endianness

I am trying to write an NRPE interpreter for a Netduino board. This is an Arduino-type board running .NET Micro Framework 4.3. I'm having trouble calculating the CRC that the protocol calls for, which looks like this (original C++ header file snippet):
typedef struct packet_struct {
int16_t packet_version;
int16_t packet_type;
uint32_t crc32_value;
int16_t result_code;
char buffer[1024];
} packet;
There are definitely byte-ordering problems, because I'm moving from big-endian (network order) to little-endian (Netduino/.NET). I have been careful to reverse and re-reverse the Int16 and UInt32 values as they go in and out of my structure. When I re-output a packet I've read from the wire, it is identical, so I believe that much is handled properly. But the CRC I calculate for it does not match. The routine I'm calling is Utility.ComputeCRC from the Micro Framework.
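For reference, a 32-bit field in network byte order has to be reassembled across all four bytes, not pair by pair. A minimal sketch in Java (used here only for illustration; `readUInt32BigEndian` is a hypothetical helper, not part of NRPE or the Micro Framework), applied to the CRC bytes `74 D1 3F D5` from the packet dump below:

```java
class BigEndianSketch {
    // Hypothetical helper: reads 4 bytes starting at offset as an unsigned
    // 32-bit big-endian (network order) value, returned in a long.
    public static long readUInt32BigEndian(byte[] data, int offset) {
        return ((data[offset] & 0xFFL) << 24)
             | ((data[offset + 1] & 0xFFL) << 16)
             | ((data[offset + 2] & 0xFFL) << 8)
             |  (data[offset + 3] & 0xFFL);
    }

    public static void main(String[] args) {
        // First 8 bytes of the captured packet: version, type, CRC.
        byte[] header = {0x00, 0x02, 0x00, 0x01, 0x74, (byte) 0xD1, 0x3F, (byte) 0xD5};
        long crc = readUInt32BigEndian(header, 4);
        System.out.println(Long.toHexString(crc).toUpperCase()); // prints 74D13FD5
    }
}
```

Swapping the bytes within each 16-bit pair instead gives 3FD574D1, which appears to be where the "Original CRC" value in the output below comes from.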
Others have had similar problems in this area, so I'm fortunate enough to have some clues what the problem might be:
The NRPE Protocol Explained
Stack Overflow post about CRC'ing NRPE posts in Python
CRC implementations for Micro
For example, it seems clear the original message is 1034 bytes, padded to 1036. Where I'm not so fortunate is that I'm on the constrained Micro environment, and all the example code for CRC I can find generally involves templates, Linq, or other libraries I don't have access to.
All help appreciated. Here's some sample code where I attempt to re-compute a CRC from an existing valid packet unsuccessfully.
Code Output:
Original 1036 bytes: 0002000174D13FD5426E5F4E5250455F434845434B0000000000000000...
Original CRC: 3FD574D1
1036 bytes with zeroed out checksum: 0002000100000000426E5F4E5250455F434845434B00000000000000....
Re-computed checksum (0xFFFF seed): F5B1C55A
Actual Code:
using System;
using System.Text;
// .NET Micro Framework 4.3
using Microsoft.SPOT;
using Microsoft.SPOT.Hardware;
namespace TestApp
{
public class Program
{
/// <summary>
/// These are the bytes as-received from the wire, hex encoded for readability here.
/// </summary>
private const string OriginalNetworkBytes = "0002000174D13FD5426E5F4E5250455F434845434B00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000";
/// <summary>
/// Index the CRC starts at in the original message
/// </summary>
private const int CrcIndex = 4;
public static void Main()
{
byte[] bufferBytes = StringToByteArrayFastest(OriginalNetworkBytes);
PrintBytesInHex("Original " + bufferBytes.Length + " bytes: ", bufferBytes);
UInt32 originalCrc = ParseSwappedUInt32(bufferBytes, CrcIndex);
Debug.Print("Original CRC: " + originalCrc.ToString("X"));
// Zero out CRC, then attempt to recompute the CRC
ZeroOutChecksum(bufferBytes);
PrintBytesInHex(bufferBytes.Length + " bytes with zeroed out checksum: ", bufferBytes);
uint computedCrc = Utility.ComputeCRC(bufferBytes, 0, bufferBytes.Length, 0xFFFF);
Debug.Print("Re-computed checksum (0xFFFF seed): " + computedCrc.ToString("X"));
}
/// <summary>
/// From this fine Stack Overflow post:
/// https://stackoverflow.com/questions/321370/convert-hex-string-to-byte-array
/// Because as the author points out, "also works on .NET Micro Framework where (in SDK4.3) byte.Parse(string) only
/// permits integer formats."
/// </summary>
/// <param name="hex"></param>
/// <returns></returns>
public static byte[] StringToByteArrayFastest(string hex)
{
if (hex.Length%2 == 1)
throw new Exception("The binary key cannot have an odd number of digits");
var arr = new byte[hex.Length >> 1];
for (int i = 0; i < hex.Length >> 1; ++i)
{
arr[i] = (byte) ((GetHexVal(hex[i << 1]) << 4) + (GetHexVal(hex[(i << 1) + 1])));
}
return arr;
}
public static int GetHexVal(char hex)
{
int val = hex;
//For uppercase A-F letters:
return val - (val < 58 ? 48 : 55);
//For lowercase a-f letters:
//return val - (val < 58 ? 48 : 87);
//Or the two combined, but a bit slower:
//return val - (val < 58 ? 48 : (val < 97 ? 55 : 87));
}
public static UInt32 ParseSwappedUInt32(byte[] byteArray, int arrayIndex)
{
byte[] swappedBytes = ByteSwapper(byteArray, arrayIndex, 4);
return BitConverter.ToUInt32(swappedBytes, 0);
}
public static byte[] ByteSwapper(byte[] array, int incomingArrayIndex, int countOfBytesToSwap)
{
if (countOfBytesToSwap%2 != 0)
{
throw new Exception("Bytes to be swapped must be divisible by 2; you requested " + countOfBytesToSwap);
}
int outgoingArrayIndex = 0;
byte lastByte = 0;
var arrayToReturn = new byte[countOfBytesToSwap];
int finalArrayIndex = incomingArrayIndex + countOfBytesToSwap;
for (int arrayIndex = incomingArrayIndex; arrayIndex < finalArrayIndex; arrayIndex++)
{
bool isEvenIndex = arrayIndex%2 == 0 || arrayIndex == 0;
byte currentByte = array[arrayIndex];
if (isEvenIndex)
{
// Store current byte for next pass through
lastByte = currentByte;
}
else
{
// Swap two bytes, put into outgoing array
arrayToReturn[outgoingArrayIndex] = currentByte;
arrayToReturn[outgoingArrayIndex + 1] = lastByte;
outgoingArrayIndex += 2;
}
}
return arrayToReturn;
}
private static void ZeroOutChecksum(byte[] messageBytesToClear)
{
messageBytesToClear[CrcIndex] = 0;
messageBytesToClear[CrcIndex + 1] = 0;
messageBytesToClear[CrcIndex + 2] = 0;
messageBytesToClear[CrcIndex + 3] = 0;
}
/// <summary>
/// Debug function to output the message as a hex string
/// </summary>
public static void PrintBytesInHex(string messageLabel, byte[] messageBytes)
{
string hexString = BytesToHexString(messageBytes);
Debug.Print(messageLabel + hexString);
}
private static string BytesToHexString(byte[] messageBytes)
{
var sb = new StringBuilder();
foreach (byte b in messageBytes)
{
sb.Append(b.ToString("X2"));
}
string hexString = sb.ToString();
return hexString;
}
}
}

I eventually worked out a solution.
The full thing is documented here:
http://www.skyscratch.com/2014/04/02/rats-ate-the-washing-machine-or-a-nagios-nrpe-environmental-monitor-for-netduino/
The relevant CRC code goes like this:
using System;
namespace FloodSensor
{
/// <summary>
/// Ported from https://github.com/KristianLyng/nrpe/blob/master/src/utils.c
/// I am not sure if this was strictly necessary, but then I could not seem to get Utility.ComputeCRC
/// (http://msdn.microsoft.com/query/dev11.query?appId=Dev11IDEF1&l=EN-US&k=k%28Microsoft.SPOT.Hardware.Utility.ComputeCRC%29;k%28TargetFrameworkMoniker-.NETMicroFramework)
/// to return the same result as this function, no matter what seed I tried with it.
/// </summary>
class NrpeCrc
{
private const int CrcTableLength = 256;
static private readonly UInt32[] Crc32Table = new UInt32[CrcTableLength];
public NrpeCrc()
{
generateCrc32Table();
}
// Build the crc table - must be called before calculating the crc value
private void generateCrc32Table()
{
const uint poly = 0xEDB88320;
for (int i = 0; i < 256; i++)
{
var crc = (UInt32)i;
for (int j = 8; j > 0; j--)
{
if ((crc & (UInt32)1) > 0)
{
crc = (crc >> 1) ^ poly;
}
else
{
crc >>= 1;
}
}
Crc32Table[i] = crc;
}
}
/// <summary>
/// Calculates the CRC 32 value for a buffer
/// </summary>
public UInt32 CalculateCrc32(byte[] buffer, int bufferSize)
{
int currentIndex;
uint crc = 0xFFFFFFFF;
for (currentIndex = 0; currentIndex < bufferSize; currentIndex++)
{
int thisChar = buffer[currentIndex];
crc = ((crc >> 8) & 0x00FFFFFF) ^ Crc32Table[(crc ^ thisChar) & 0xFF];
}
return (crc ^ 0xFFFFFFFF);
}
}
}
See also https://github.com/StewLG/NetduinoNrpe/blob/master/FloodSensor/NrpeServer/NrpeCrc.cs
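The table-driven routine above is the common reflected CRC-32 (polynomial 0xEDB88320, initial value and final XOR of 0xFFFFFFFF). As a cross-check, here is a minimal Java sketch of the same algorithm, compared against `java.util.zip.CRC32` and the well-known check value 0xCBF43926 for the ASCII string "123456789". Java is used only for illustration, and the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

class Crc32Sketch {
    private static final int[] TABLE = new int[256];

    static {
        // Build the lookup table for the reflected polynomial 0xEDB88320.
        for (int i = 0; i < 256; i++) {
            int crc = i;
            for (int bit = 0; bit < 8; bit++) {
                crc = (crc & 1) != 0 ? (crc >>> 1) ^ 0xEDB88320 : crc >>> 1;
            }
            TABLE[i] = crc;
        }
    }

    // Returns the CRC-32 of data as an unsigned value held in a long.
    public static long crc32(byte[] data) {
        int crc = 0xFFFFFFFF;
        for (byte b : data) {
            crc = (crc >>> 8) ^ TABLE[(crc ^ b) & 0xFF];
        }
        return (crc ^ 0xFFFFFFFF) & 0xFFFFFFFFL;
    }

    public static void main(String[] args) {
        byte[] check = "123456789".getBytes(StandardCharsets.US_ASCII);
        CRC32 reference = new CRC32();
        reference.update(check);
        System.out.println(Long.toHexString(crc32(check)));       // cbf43926
        System.out.println(crc32(check) == reference.getValue()); // true
    }
}
```

As in the question's test code, the CRC field of the packet is zeroed before the checksum is computed over the whole buffer.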

Related

Incompatible types in assignment of int to char[16] error - Arduino UNO

I've created an array of structs but I'm getting the error written on the title of this question. I'm still new to this so I was wondering if I could get some help.
Code:
void setup() {
// put your setup code here, to run once:
Serial.begin(9600);
lcd.begin(16, 2);
i = 0;
}
#define LIMIT 27
struct protocol {
char create[16];
char character;
int values;
int minimum;
int maximum;
};
struct protocol channels[LIMIT];
int i;
void create_channels() {
if (Serial.available() > 0) {
Serial.print("Enter the channel description");
channels[i].create = Serial.read();
Serial.print("Enter the starting character: ");
channels[i].character = Serial.read();
if (i == LIMIT) {
for (i = 0; i < LIMIT; i++)
{
Serial.println(channels[i].create);
Serial.println(channels[i].character);
}
i = 0;
}
}
}
Error:
cw.ino:24:38: error: incompatible types in assignment of 'int' to 'char [16]'
channels[i].create = Serial.read();

Implementing FNV hash in Swift

I am trying to implement a version of the FNV hash in Swift. Here it is in Objective-C:
+ (uint32_t)hash:(uint8_t *)a length:(uint32_t)length
{
uint8_t *p;
uint32_t x;
p = a;
x = *p << 7;
for (int i=0; i<length; i++) {
x = (1000003 * x) ^ *p++;
x ^= length;
}
if (x == -1) {
x = -2;
}
return x;
}
Here is my attempt at porting it to Swift:
func hashFNV(data: UInt8[]) -> UInt32 {
var x = data[0] << 7
for byte in data {
x *= 1000003
x ^= byte
x ^= data.count
}
if x == -1 {
x = -2
}
return x
}
It compiles but results in an error at runtime:
EXC_BAD_INSTRUCTION (code=EXC_I386_INVOP,subcode=0x0)
Same error when I try in the playground:
Playground execution failed: error: Execution was interrupted, reason: EXC_BAD_INSTRUCTION (code=EXC_I386_INVOP, subcode=0x0).
The process has been left at the point where it was interrupted, use "thread return -x" to return to the state before expression evaluation.
* thread #1: tid = 0x619fa, 0x000000010d119aad, queue = 'com.apple.main-thread', stop reason = EXC_BAD_INSTRUCTION (code=EXC_I386_INVOP, subcode=0x0)
* frame #0: 0x000000010d119aad
frame #1: 0x0000000100204880 libswift_stdlib_core.dylib`value witness table for Swift.Int + 160
I thought that maybe it was related to the overflow, but the following code also fails with the same error:
func hashFNV(data: UInt8[]) -> UInt32 {
var x = UInt32(data[0]) << 7
for byte in data {
x = 1000003 &* x
x ^= byte
x ^= data.count
}
if x == -1 {
x = -2
}
return x
}
EDIT:
Actually, shouldn't assigning -2 to x result in a compile error? I thought Swift won't implicitly convert what looks like an Int (-2) to a UInt32 (x).
The same goes for the x ^= byte line: byte should be UInt8 and x is UInt32.
EDIT 2:
This was a compile error (see comments below).
Fixed the compile error, still fails at runtime:
func hashFNV(data: UInt8[]) -> UInt32 {
var x = Int(data[0]) << 7
for byte in data {
x = 1000003 &* x
x ^= Int(byte)
x ^= data.count
}
if x == -1 {
x = -2
}
return UInt32(x)
}
If you are still looking for an implementation, here is mine. It is built much like the regular default Hasher from the standard lib.
struct HasherFNV1a {
private var hash: UInt = 14_695_981_039_346_656_037
private let prime: UInt = 1_099_511_628_211
mutating func combine<S: Sequence>(_ sequence: S) where S.Element == UInt8 {
for byte in sequence {
hash ^= UInt(byte)
hash = hash &* prime
}
}
func finalize() -> Int {
Int(truncatingIfNeeded: hash)
}
}
extension HasherFNV1a {
mutating func combine(_ string: String) {
combine(string.utf8)
}
mutating func combine(_ bool: Bool) {
combine(CollectionOfOne(bool ? 1 : 0))
}
}
Keep in mind that this is FNV-1a; if you truly need FNV-1, you can swap the two lines in the loop.
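To make the difference between the two variants concrete, here is a minimal 64-bit sketch in Java (chosen only for illustration; the class and method names are hypothetical). It uses the same offset basis and prime as the Swift code above and relies on Java's `long` multiplication wrapping modulo 2^64.

```java
import java.nio.charset.StandardCharsets;

class FnvSketch {
    private static final long OFFSET_BASIS_64 = 0xcbf29ce484222325L; // 14695981039346656037
    private static final long PRIME_64 = 0x100000001b3L;             // 1099511628211

    // FNV-1a: XOR the byte in first, then multiply by the prime.
    public static long fnv1a64(byte[] data) {
        long hash = OFFSET_BASIS_64;
        for (byte b : data) {
            hash ^= (b & 0xFF);
            hash *= PRIME_64; // wraps on overflow, like &* in Swift
        }
        return hash;
    }

    // FNV-1: multiply by the prime first, then XOR the byte in.
    public static long fnv1_64(byte[] data) {
        long hash = OFFSET_BASIS_64;
        for (byte b : data) {
            hash *= PRIME_64;
            hash ^= (b & 0xFF);
        }
        return hash;
    }

    public static void main(String[] args) {
        byte[] a = "a".getBytes(StandardCharsets.UTF_8);
        System.out.println(Long.toHexString(fnv1a64(a))); // af63dc4c8601ec8c
        System.out.println(Long.toHexString(fnv1_64(a))); // af63bd4c8601b7be
    }
}
```

For empty input both variants return the offset basis unchanged.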
I found this GPL Swift implementation:
//
// FNVHash.swift
//
// A Swift implementation of the Fowler–Noll–Vo (FNV) hash function
// See http://www.isthe.com/chongo/tech/comp/fnv/
//
// Created by Mauricio Santos on 3/9/15.
import Foundation
// MARK:- Constants
private struct Constants {
// FNV parameters
#if arch(arm64) || arch(x86_64) // 64-bit
static let OffsetBasis: UInt = 14695981039346656037
static let FNVPrime: UInt = 1099511628211
#else // 32-bit
static let OffsetBasis: UInt = 2166136261
static let FNVPrime: UInt = 16777619
#endif
}
// MARK:- Public API
/// Calculates FNV-1 hash from a raw byte array.
public func fnv1(bytes: [UInt8]) -> UInt {
var hash = Constants.OffsetBasis
for byte in bytes {
hash = hash &* Constants.FNVPrime // &* means multiply with overflow
hash ^= UInt(byte)
}
return hash
}
/// Calculates FNV-1a hash from a raw byte array.
public func fnv1a(bytes: [UInt8]) -> UInt {
var hash = Constants.OffsetBasis
for byte in bytes {
hash ^= UInt(byte)
hash = hash &* Constants.FNVPrime
}
return hash
}
/// Calculates FNV-1 hash from a String using its UTF8 representation.
public func fnv1(str: String) -> UInt {
return fnv1(bytesFromString(str))
}
/// Calculates FNV-1a hash from a String using its UTF8 representation.
public func fnv1a(str: String) -> UInt {
return fnv1a(bytesFromString(str))
}
/// Calculates FNV-1 hash from an integer type.
public func fnv1<T: IntegerType>(value: T) -> UInt {
return fnv1(bytesFromNumber(value))
}
/// Calculates FNV-1a hash from an integer type.
public func fnv1a<T: IntegerType>(value: T) -> UInt {
return fnv1a(bytesFromNumber(value))
}
/// Calculates FNV-1 hash from a floating point type.
public func fnv1<T: FloatingPointType>(value: T) -> UInt {
return fnv1(bytesFromNumber(value))
}
/// Calculates FNV-1a hash from a floating point type.
public func fnv1a<T: FloatingPointType>(value: T) -> UInt {
return fnv1a(bytesFromNumber(value))
}
// MARK:- Private helper functions
private func bytesFromString(str: String) -> [UInt8] {
var byteArray = [UInt8]()
for codeUnit in str.utf8 {
byteArray.append(codeUnit)
}
return byteArray
}
private func bytesFromNumber<T>(var value: T) -> [UInt8] {
return withUnsafePointer(&value) {
Array(UnsafeBufferPointer(start: UnsafePointer<UInt8>($0), count: sizeof(T)))
}
}

unable to create second deck from same code for a queue

I have to set up a queue class that is implemented on top of a deque class. I need to use it to set up two decks of cards in random order. I have the code below; it works when the first deck is created, but for some reason it does not work with the second deck, even though it's the same code being reused.
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 5
at prog.pkg4.Deque.insertOnBack(Prog4.java:93)
at prog.pkg4.Queue.insert(Prog4.java:153)
at prog.pkg4.Prog4.createDeck(Prog4.java:465)
at prog.pkg4.Prog4.topTrump(Prog4.java:444)
at prog.pkg4.Prog4.main(Prog4.java:287)
Code that initializes the two decks:
Queue player = new Queue();
Queue computer = new Queue();
player = createDeck(player, cards);
computer = createDeck(computer, cards);
Code that creates a random deck:
public static Queue createDeck(Queue queue, GreekHero[] cards){
Random rand = new Random();
int temp = 0;
int r;
for(int i = 0; i < 30; i++){
r = rand.nextInt(30);
cards[temp] = cards[i];
cards[i] = cards[r];
cards[r] = cards[temp];
}
for(int i = 0; i < 29; i++){
queue.insert(cards[i]);
System.out.println(queue.insertions());
System.out.println(queue);
}
return queue;
}
class Queue{
private Deque queue;
public Queue(){
queue = new Deque();
}
public void insert(Object o){
queue.insertOnBack(o);
}
public Object delete(){
return queue.deleteFromFront();
}
public boolean isEmpty(){
return queue.isEmpty();
}
public String toString(){
return queue.toString();
}
public int insertions(){
return queue.getInsertions();
}
}
I've tested the deque code several times and I know it works, as demonstrated by the first deck that is created. I'm just not sure what could be causing the problem for the second deck.
EDIT: I've added the Deque class code below. The way I have this set up, if the number of insertions equals the size of the array, the array should double in size. As mentioned before, it works with the first deque, but on the second deque it stops at the size of the array minus 1. I've increased the size to test it, and I could make it big enough to satisfy this project, but I need to create a deque with a growing array.
class Deque{
private Object[] arrayObject;
private int beggining; //tracks first element in array
private int insertions; //counts the items in the array
private static int SIZE = 30; //size of array
public Deque(){
arrayObject = new Object[SIZE];
beggining = 0;
insertions = 0;
}
// displays position of first element in circular array
public Object getBeggining(){
int temp = beggining + 1;
if(temp == SIZE)
temp = 0;
return temp;
}
public int getInsertions(){
return insertions;
}
public Object indexOne(){
int temp = beggining + 1;
if(temp == SIZE)
temp = 0;
return arrayObject[temp];
}
public String toString(){
if(isEmpty())
return "Empty";
int temp = beggining + 1;
if( temp >= SIZE)
temp = 0;
String s = "Current Index:\n[("+arrayObject[temp]+")";
int loops = 0;
for(int i = temp + 1; loops < insertions - 1; i++){
if(i >= SIZE)
i = 0;
s += ", ("+arrayObject[i]+")";
loops++;
}
s += "]";
return s;
}
public String toStore(){
String s = "Store Index:\n[(1: "+arrayObject[1]+")";
for(int i = 1; i <= SIZE - 1; i++)
s += ", ("+(i+1)+": "+arrayObject[i]+")";
s += "]";
return s;
}
public void insertOnFront(Object o){
if(insertions == SIZE)
arrayObject = increaseArray();
arrayObject[beggining] = o;
beggining--;
if(beggining < 0)
beggining = SIZE - 1;
insertions++;
}
public Object deleteFromFront(){
if(isEmpty())
return null;
int count = beggining + 1;
if(count >= SIZE)
count = 0;
Object temp = arrayObject[count];
beggining += 1;
insertions--;
if(insertions > 0)
insertions = 0;
return temp;
}
public void insertOnBack(Object o){
int temp = beggining + insertions + 1;
if(insertions == SIZE - 1)
arrayObject = increaseArray();
if(temp >= SIZE)
temp = 0 + (temp - SIZE);
arrayObject[temp] = o;
insertions++;
}
public Object deleteFromBack(){
if(isEmpty())
return null;
int count = beggining + insertions;
Object temp = arrayObject[count];
insertions--;
if(insertions >= 0)
insertions = 0;
return temp;
}
public boolean isEmpty(){
if(insertions > 0)
return false;
else
return true;
}
public Object[] increaseArray(){
SIZE *= 2;
int loops = 0;
int j = beggining;
Object[] newArray = new Object[SIZE];
for(int i = j; loops <= SIZE/2; i++){
if(j >= SIZE/2)
j = 0;
newArray[i] = arrayObject[j];
loops++;
j++;
}
return newArray;
}
}
I solved the issue by making SIZE an instance variable of the class and removing static from it. I don't know why the issue appeared on the second deck rather than on the first; I'll look it up later. If anyone knows, please post it here.
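A likely explanation, assuming both queues are constructed before either one is filled (as in the snippet above): a static field is shared by every instance, so when the first deque grows and doubles SIZE, the second deque's bounds checks use the new SIZE while its own array still has the original length. The sketch below reproduces that mismatch with hypothetical class names; it is not the original Deque.

```java
import java.util.Arrays;

class StaticSizeSketch {
    // Mirrors the problematic design: one capacity shared by all instances.
    static class SharedCapacity {
        static int size = 5;
        Object[] items = new Object[size];

        void grow() {
            size *= 2; // changes the value seen by every other instance too
            items = Arrays.copyOf(items, size);
        }
    }

    // The fix described above: each instance tracks its own capacity.
    static class OwnCapacity {
        int size = 5;
        Object[] items = new Object[size];

        void grow() {
            size *= 2;
            items = Arrays.copyOf(items, size);
        }
    }

    // True when the second instance's array no longer matches the shared size.
    public static boolean sharedCapacityDiverges() {
        SharedCapacity.size = 5; // reset so the method can be called repeatedly
        SharedCapacity first = new SharedCapacity();
        SharedCapacity second = new SharedCapacity();
        first.grow();
        return second.items.length != SharedCapacity.size;
    }

    // True when the second instance's array no longer matches its own size.
    public static boolean ownCapacityDiverges() {
        OwnCapacity first = new OwnCapacity();
        OwnCapacity second = new OwnCapacity();
        first.grow();
        return second.items.length != second.size;
    }

    public static void main(String[] args) {
        System.out.println(sharedCapacityDiverges()); // true
        System.out.println(ownCapacityDiverges());    // false
    }
}
```

The index 5 in the stack trace would be consistent with an initial SIZE of 5: the second deque's array has length 5, but the wrap-around check compares against the enlarged shared value, so the index is never wrapped back to 0.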

Finding the index of the first word starting with a given letter from an alphabetically sorted list

Based on the current implementation, I will get an ArrayList containing some 1000 unique names in alphabetical order (A-Z or Z-A) from some source.
I need to find the index of the first word starting with a given letter.
To be more precise, when I select a letter, e.g. "M", it should give me the index of the first word starting with "M" in the sorted list.
That way I should be able to find the index of the first word for each of the 26 letters.
Please help me find a solution that doesn't compromise on speed.
UPDATE:
Actually, after getting the 1000 unique names, the sorting is also done by my own code.
If this can be done during the sort itself, I can avoid iterating over the list again afterwards to find the indices for the letters.
Is that possible?
Thanks,
Sen
I hope this little piece of code will help you. I guessed the question is related to Java, because you mentioned ArrayList.
String[] unsorted = {"eve", "bob", "adam", "mike", "monica", "Mia", "marta", "pete", "Sandra"};
ArrayList<String> names = new ArrayList<String>(Arrays.asList(unsorted));
String letter = "M"; // find index of this
class MyComp implements Comparator<String>{
String first = "";
String letter;
MyComp(String letter){
this.letter = letter.toUpperCase();
}
public String getFirst(){
return first;
}
@Override
public int compare(String s0, String s1) {
if(s0.toUpperCase().startsWith(letter)){
// compareTo may return any negative value, not just -1
if(first.equals("") || s0.compareTo(first) < 0){
first = s0;
}
}
return s0.toUpperCase().compareTo(s1.toUpperCase());
}
};
MyComp mc = new MyComp(letter);
Collections.sort(names, mc);
int index = names.indexOf(mc.getFirst()); // the index of first name starting with letter
I'm not sure if it's possible to also store the index of the first name in the comparator without much overhead. Anyway, if you implement your own version of a sorting algorithm, e.g. quicksort, you know the index of each element and could calculate the index while sorting. This depends on your chosen sorting algorithm and implementation. In fact, if I knew how your sorting is implemented, we could insert the index calculation.
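If the list is already sorted, one linear pass afterwards is enough to record the first index for every letter, which may be simpler than hooking into the comparator. A minimal sketch, assuming names start with ASCII letters and comparison ignores case; `firstIndexByLetter` is a hypothetical helper:

```java
import java.util.Arrays;
import java.util.List;

class FirstIndexScan {
    // Returns 26 entries; entry k is the index of the first name starting
    // with the letter ('A' + k), ignoring case, or -1 if there is none.
    public static int[] firstIndexByLetter(List<String> sortedNames) {
        int[] first = new int[26];
        Arrays.fill(first, -1);
        for (int i = 0; i < sortedNames.size(); i++) {
            String name = sortedNames.get(i);
            if (name.isEmpty()) {
                continue;
            }
            int k = Character.toUpperCase(name.charAt(0)) - 'A';
            if (k >= 0 && k < 26 && first[k] == -1) {
                first[k] = i; // only the first occurrence is recorded
            }
        }
        return first;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList(
            "adam", "bob", "eve", "marta", "Mia", "mike", "monica", "pete", "Sandra");
        int[] first = firstIndexByLetter(names);
        System.out.println(first['M' - 'A']); // 3
        System.out.println(first['Z' - 'A']); // -1
    }
}
```

Because only the first occurrence of each letter is kept, the same pass works for a list sorted Z-A as well.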
So I came up with my own solution for this.
package test.binarySearch;
import java.util.Random;
/**
*
* Binary search to find the index of the first word starting with a given letter
*
* @author Navaneeth Sen <navaneeth.sen@multichoice.co.za>
*/
class SortedWordArray
{
private final String[] a; // ref to array a
private int nElems; // number of data items
public SortedWordArray(int max) // constructor
{
a = new String[max]; // create array
nElems = 0;
}
public int size()
{
return nElems;
}
public int find(String searchKey)
{
return recFind(searchKey, 0, nElems - 1);
}
String array = null;
int arrayIndex = 0;
private int recFind(String searchKey, int lowerBound,
int upperBound)
{
int curIn;
curIn = (lowerBound + upperBound) / 2;
if (a[curIn].startsWith(searchKey))
{
array = a[curIn];
if ((curIn == 0) || !a[curIn - 1].startsWith(searchKey))
{
return curIn; // found it
}
else
{
return recFind(searchKey, lowerBound, curIn - 1);
}
}
else if (lowerBound > upperBound)
{
return -1; // can't find it
}
else // divide range
{
if (a[curIn].compareTo(searchKey) < 0)
{
return recFind(searchKey, curIn + 1, upperBound);
}
else // it's in lower half
{
return recFind(searchKey, lowerBound, curIn - 1);
}
} // end else divide range
} // end recFind()
public void insert(String value) // put element into array
{
int j;
for (j = 0; j < nElems; j++) // find where it goes
{
if (a[j].compareTo(value) > 0) // (linear search)
{
break;
}
}
for (int k = nElems; k > j; k--) // move bigger ones up
{
a[k] = a[k - 1];
}
a[j] = value; // insert it
nElems++; // increment size
} // end insert()
public void display() // displays array contents
{
for (int j = 0; j < nElems; j++) // for each element,
{
System.out.print(a[j] + " "); // display it
}
System.out.println("");
}
} // end class SortedWordArray
class BinarySearchWordApp
{
static final String AB = "12345aqwertyjklzxcvbnm";
static Random rnd = new Random();
public static String randomString(int len)
{
StringBuilder sb = new StringBuilder(len);
for (int i = 0; i < len; i++)
{
sb.append(AB.charAt(rnd.nextInt(AB.length())));
}
return sb.toString();
}
public static void main(String[] args)
{
int maxSize = 100000; // array size
SortedWordArray arr; // reference to array
int[] indices = new int[27];
arr = new SortedWordArray(maxSize); // create the array
for (int i = 0; i < 100000; i++)
{
arr.insert(randomString(10)); //insert it into the array
}
arr.display(); // display array
String searchKey;
for (int i = 97; i < 124; i++)
{
searchKey = (i == 123)?"1":Character.toString((char) i);
long time_1 = System.currentTimeMillis();
int result = arr.find(searchKey);
long time_2 = System.currentTimeMillis() - time_1;
if (result != -1)
{
indices[i - 97] = result;
System.out.println("Found " + result + " in "+ time_2 +" ms");
}
else
{
if (!(i == 97))
{
indices[i - 97] = indices[i - 97 - 1];
}
System.out.println("Can't find " + searchKey);
}
}
for (int i = 0; i < indices.length; i++)
{
System.out.println("Index [" + i + "][" + (char)(i+97)+"] = " + indices[i]);
}
} // end main()
}
All comments welcome.
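The same lookup can also be written as an iterative lower-bound binary search, which avoids recursion and handles a missing letter explicitly. A sketch under the assumption that the array is sorted ascending by `String.compareTo` (as `insert` above does) and that the prefix is non-empty; the names are hypothetical:

```java
class FirstPrefixIndex {
    // Returns the index of the first element starting with prefix, or -1.
    public static int firstIndexWithPrefix(String[] sorted, String prefix) {
        int lo = 0;
        int hi = sorted.length; // search the half-open range [lo, hi)
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (sorted[mid].compareTo(prefix) < 0) {
                lo = mid + 1; // everything up to mid sorts before the prefix
            } else {
                hi = mid;     // mid could be the first match
            }
        }
        // lo is now the first element >= prefix; it matches only if it starts with it.
        return (lo < sorted.length && sorted[lo].startsWith(prefix)) ? lo : -1;
    }

    public static void main(String[] args) {
        String[] sorted = {"adam", "bob", "eve", "marta", "mike", "pete"};
        System.out.println(firstIndexWithPrefix(sorted, "m")); // 3
        System.out.println(firstIndexWithPrefix(sorted, "z")); // -1
    }
}
```

This works because every string that starts with the prefix compares greater than or equal to the prefix itself, so the matches form one contiguous block beginning at the lower bound.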

Finding the longest repeated substring

What would be the best approach (performance-wise) in solving this problem?
I was recommended to use suffix trees. Is this the best approach?
Check out this link: http://introcs.cs.princeton.edu/java/42sort/LRS.java.html
/*************************************************************************
* Compilation: javac LRS.java
* Execution: java LRS < file.txt
* Dependencies: StdIn.java
*
* Reads a text corpus from stdin, replaces all consecutive blocks of
* whitespace with a single space, and then computes the longest
* repeated substring in that corpus. Suffix sorts the corpus using
* the system sort, then finds the longest repeated substring among
* consecutive suffixes in the sorted order.
*
* % java LRS < mobydick.txt
* ',- Such a funny, sporty, gamy, jesty, joky, hoky-poky lad, is the Ocean, oh! Th'
*
* % java LRS
* aaaaaaaaa
* 'aaaaaaaa'
*
* % java LRS
* abcdefg
* ''
*
*************************************************************************/
import java.util.Arrays;
public class LRS {
// return the longest common prefix of s and t
public static String lcp(String s, String t) {
int n = Math.min(s.length(), t.length());
for (int i = 0; i < n; i++) {
if (s.charAt(i) != t.charAt(i))
return s.substring(0, i);
}
return s.substring(0, n);
}
// return the longest repeated string in s
public static String lrs(String s) {
// form the N suffixes
int N = s.length();
String[] suffixes = new String[N];
for (int i = 0; i < N; i++) {
suffixes[i] = s.substring(i, N);
}
// sort them
Arrays.sort(suffixes);
// find longest repeated substring by comparing adjacent sorted suffixes
String lrs = "";
for (int i = 0; i < N - 1; i++) {
String x = lcp(suffixes[i], suffixes[i+1]);
if (x.length() > lrs.length())
lrs = x;
}
return lrs;
}
// read in text, replacing all consecutive whitespace with a single space
// then compute longest repeated substring
public static void main(String[] args) {
String s = StdIn.readAll();
s = s.replaceAll("\\s+", " ");
StdOut.println("'" + lrs(s) + "'");
}
}
Have a look at http://en.wikipedia.org/wiki/Suffix_array as well. Suffix arrays are quite space-efficient, and there are reasonably programmable algorithms to build them, such as "Simple Linear Work Suffix Array Construction" by Kärkkäinen and Sanders.
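One caveat with the LRS.java program above: since Java 7u6, `String.substring` copies its characters, so materializing all N suffixes takes quadratic memory. A minimal suffix-array-style sketch that sorts starting indices instead is shown below. It still uses a plain comparison sort, not the linear-time construction cited above, and the names are hypothetical.

```java
import java.util.Arrays;

class LrsBySuffixIndex {
    // Length of the common prefix of the suffixes of s starting at i and j.
    static int lcp(String s, int i, int j) {
        int len = 0;
        while (i + len < s.length() && j + len < s.length()
                && s.charAt(i + len) == s.charAt(j + len)) {
            len++;
        }
        return len;
    }

    public static String lrs(String s) {
        int n = s.length();
        Integer[] sa = new Integer[n];
        for (int i = 0; i < n; i++) {
            sa[i] = i;
        }
        // Order suffix start positions lexicographically without copying text.
        Arrays.sort(sa, (a, b) -> {
            if (a.equals(b)) return 0;
            int k = lcp(s, a, b);
            if (a + k == n) return -1; // suffix a is a prefix of suffix b
            if (b + k == n) return 1;  // suffix b is a prefix of suffix a
            return Character.compare(s.charAt(a + k), s.charAt(b + k));
        });
        // The longest repeat is the longest common prefix of two adjacent suffixes.
        int bestStart = 0;
        int bestLen = 0;
        for (int i = 0; i + 1 < n; i++) {
            int len = lcp(s, sa[i], sa[i + 1]);
            if (len > bestLen) {
                bestLen = len;
                bestStart = sa[i];
            }
        }
        return s.substring(bestStart, bestStart + bestLen);
    }

    public static void main(String[] args) {
        System.out.println(lrs("banana"));      // ana
        System.out.println(lrs("mississippi")); // issi
    }
}
```

Each comparison can still scan O(n) characters, so highly repetitive input remains slow; the gain here is memory, not asymptotic time.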
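Here is a simple implementation of longest repeated substring using the simplest form of suffix tree (an uncompressed trie of all suffixes, one character per node). A suffix tree is very easy to implement this way, though this form takes quadratic time and space in the worst case.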
#include <iostream>
#include <vector>
#include <unordered_map>
#include <string>
using namespace std;
class Node
{
public:
char ch;
unordered_map<char, Node*> children;
vector<int> indexes; //store the indexes of the substring from where it starts
Node(char c):ch(c){}
};
int maxLen = 0;
string maxStr = "";
void insertInSuffixTree(Node* root, string str, int index, string originalSuffix, int level=0)
{
root->indexes.push_back(index);
// it is repeated and length is greater than maxLen
// then store the substring
if(root->indexes.size() > 1 && maxLen < level)
{
maxLen = level;
maxStr = originalSuffix.substr(0, level);
}
if(str.empty()) return;
Node* child;
if(root->children.count(str[0]) == 0) {
child = new Node(str[0]);
root->children[str[0]] = child;
} else {
child = root->children[str[0]];
}
insertInSuffixTree(child, str.substr(1), index, originalSuffix, level+1);
}
int main()
{
string str = "banana"; //"abcabcaacb"; //"banana"; //"mississippi";
Node* root = new Node('#');
//insert all suffixes into the suffix tree
for(int i=0; i<str.size(); i++){
string s = str.substr(i);
insertInSuffixTree(root, s, i, s);
}
cout << maxLen << "->" << maxStr << endl;
return 0;
}
/*
s = "mississippi", return "issi"
s = "banana", return "ana"
s = "abcabcaacb", return "abca"
s = "aababa", return "aba"
*/
The LRS problem is best solved using either a suffix tree or a suffix array. Both approaches have a best time complexity of O(n).
Here is an O(n log n) solution to the LRS problem using a suffix array. My solution can be improved to O(n) if you have a linear-time construction algorithm for the suffix array (which is quite hard to implement). The code was taken from my library. If you want more information on how suffix arrays work, make sure to check out my tutorials.
/**
* Finds the longest repeated substring(s) of a string.
*
* Time complexity: O(nlogn), bounded by suffix array construction
*
* @author William Fiset, william.alexandre.fiset@gmail.com
**/
import java.util.*;
public class LongestRepeatedSubstring {
// Example usage
public static void main(String[] args) {
String str = "ABC$BCA$CAB";
SuffixArray sa = new SuffixArray(str);
System.out.printf("LRS(s) of %s is/are: %s\n", str, sa.lrs());
str = "aaaaa";
sa = new SuffixArray(str);
System.out.printf("LRS(s) of %s is/are: %s\n", str, sa.lrs());
str = "abcde";
sa = new SuffixArray(str);
System.out.printf("LRS(s) of %s is/are: %s\n", str, sa.lrs());
}
}
class SuffixArray {
// ALPHABET_SZ is the default alphabet size, this may need to be much larger
int ALPHABET_SZ = 256, N;
int[] T, lcp, sa, sa2, rank, tmp, c;
public SuffixArray(String str) {
this(toIntArray(str));
}
private static int[] toIntArray(String s) {
int[] text = new int[s.length()];
for(int i=0;i<s.length();i++)text[i] = s.charAt(i);
return text;
}
// Designated constructor
public SuffixArray(int[] text) {
T = text;
N = text.length;
sa = new int[N];
sa2 = new int[N];
rank = new int[N];
c = new int[Math.max(ALPHABET_SZ, N)];
construct();
kasai();
}
private void construct() {
int i, p, r;
for (i=0; i<N; ++i) c[rank[i] = T[i]]++;
for (i=1; i<ALPHABET_SZ; ++i) c[i] += c[i-1];
for (i=N-1; i>=0; --i) sa[--c[T[i]]] = i;
for (p=1; p<N; p <<= 1) {
for (r=0, i=N-p; i<N; ++i) sa2[r++] = i;
for (i=0; i<N; ++i) if (sa[i] >= p) sa2[r++] = sa[i] - p;
Arrays.fill(c, 0, ALPHABET_SZ, 0);
for (i=0; i<N; ++i) c[rank[i]]++;
for (i=1; i<ALPHABET_SZ; ++i) c[i] += c[i-1];
for (i=N-1; i>=0; --i) sa[--c[rank[sa2[i]]]] = sa2[i];
for (sa2[sa[0]] = r = 0, i=1; i<N; ++i) {
if (!(rank[sa[i-1]] == rank[sa[i]] &&
sa[i-1]+p < N && sa[i]+p < N &&
rank[sa[i-1]+p] == rank[sa[i]+p])) r++;
sa2[sa[i]] = r;
} tmp = rank; rank = sa2; sa2 = tmp;
if (r == N-1) break; ALPHABET_SZ = r + 1;
}
}
// Use Kasai algorithm to build LCP array
private void kasai() {
lcp = new int[N];
int [] inv = new int[N];
for (int i = 0; i < N; i++) inv[sa[i]] = i;
for (int i = 0, len = 0; i < N; i++) {
if (inv[i] > 0) {
int k = sa[inv[i]-1];
while( (i + len < N) && (k + len < N) && T[i+len] == T[k+len] ) len++;
lcp[inv[i]-1] = len;
if (len > 0) len--;
}
}
}
// Finds the LRS(s) (Longest Repeated Substring) that occurs in a string.
// Traditionally we are only interested in substrings that appear at
// least twice, so this method returns an empty set if this is not the case.
// @return an ordered set of longest repeated substrings
public TreeSet <String> lrs() {
int max_len = 0;
TreeSet <String> lrss = new TreeSet<>();
for (int i = 0; i < N; i++) {
if (lcp[i] > 0 && lcp[i] >= max_len) {
// We found a longer LRS
if ( lcp[i] > max_len )
lrss.clear();
// Append substring to the list and update max
max_len = lcp[i];
lrss.add( new String(T, sa[i], max_len) );
}
}
return lrss;
}
public void display() {
System.out.printf("-----i-----SA-----LCP---Suffix\n");
for(int i = 0; i < N; i++) {
int suffixLen = N - sa[i];
String suffix = new String(T, sa[i], suffixLen);
System.out.printf("% 7d % 7d % 7d %s\n", i, sa[i],lcp[i], suffix );
}
}
}
public class LongestSubString {
public static void main(String[] args) {
String s = findMaxRepeatedString("ssssssssssss this is a ddddddd word with iiiiiiiiiis and loads of these are ppppppppppppps");
System.out.println(s);
}
private static String findMaxRepeatedString(String s) {
Processor p = new Processor();
char[] c = s.toCharArray();
for (char ch : c) {
p.process(ch);
}
System.out.println(p.bigger());
return new String(new char[p.bigger().count]).replace('\0', p.bigger().letter);
}
static class CharSet {
int count;
Character letter;
boolean isLastPush;
boolean assign(char c) {
if (letter == null) {
count++;
letter = c;
isLastPush = true;
return true;
}
return false;
}
void reassign(char c) {
count = 1;
letter = c;
isLastPush = true;
}
boolean push(char c) {
if (isLastPush && letter == c) {
count++;
return true;
}
return false;
}
@Override
public String toString() {
return "CharSet [count=" + count + ", letter=" + letter + "]";
}
}
static class Processor {
Character previousLetter = null;
CharSet set1 = new CharSet();
CharSet set2 = new CharSet();
void process(char c) {
if ((set1.assign(c)) || set1.push(c)) {
set2.isLastPush = false;
} else if ((set2.assign(c)) || set2.push(c)) {
set1.isLastPush = false;
} else {
set1.isLastPush = set2.isLastPush = false;
smaller().reassign(c);
}
}
CharSet smaller() {
return set1.count < set2.count ? set1 : set2;
}
CharSet bigger() {
return set1.count < set2.count ? set2 : set1;
}
}
}
I had an interview and I needed to solve this problem. This is my solution:
import java.util.Hashtable;
public class FindLargestSubstring {
public static void main(String[] args) {
String test = "ATCGATCGA";
System.out.println(hasRepeatedSubString(test));
}
private static String hasRepeatedSubString(String string) {
Hashtable<String, Integer> hashtable = new Hashtable<>();
int length = string.length();
for (int subLength = length - 1; subLength > 1; subLength--) {
for (int i = 0; i <= length - subLength; i++) {
String sub = string.substring(i, subLength + i);
if (hashtable.containsKey(sub)) {
return sub;
} else {
hashtable.put(sub, subLength);
}
}
}
return "No repeated substring!";
}
}
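A possible refinement of the hashing idea: if a substring of length k repeats, so does one of every shorter length, so the answer length can be found by binary search instead of trying every length from the top down. The sketch below follows that observation with hypothetical names. Each round still hashes O(n) substrings, so for large inputs the suffix-array approaches above remain preferable.

```java
import java.util.HashSet;
import java.util.Set;

class LrsByLengthSearch {
    // Returns some substring of the given length that occurs at least twice, or null.
    static String repeatedOfLength(String s, int length) {
        Set<String> seen = new HashSet<>();
        for (int i = 0; i + length <= s.length(); i++) {
            String sub = s.substring(i, i + length);
            if (!seen.add(sub)) {
                return sub; // add() returns false when sub was already present
            }
        }
        return null;
    }

    // Binary search on the length of the repeated substring.
    public static String longestRepeated(String s) {
        int lo = 1;
        int hi = s.length() - 1;
        String best = "";
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            String hit = repeatedOfLength(s, mid);
            if (hit != null) {
                best = hit;   // a repeat of length mid exists, try longer
                lo = mid + 1;
            } else {
                hi = mid - 1; // no repeat of length mid, try shorter
            }
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(longestRepeated("ATCGATCGA")); // ATCGA
    }
}
```

Unlike the version above, this returns an empty string when nothing repeats and also considers repeats of length 1.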
There are way too many things that affect performance for us to answer this question with only what you've given us. (Operating System, language, memory issues, the code itself)
If you're just looking for a mathematical analysis of the algorithm's efficiency, you probably want to change the question.
EDIT
When I mentioned "memory issues" and "the code", I didn't provide all the details. The length of the strings you will be analyzing is a BIG factor. Also, the code doesn't operate alone; it must sit inside a program to be useful. What are the characteristics of that program that affect this algorithm's use and performance?
Basically, you can't performance tune until you have a real situation to test. You can make very educated guesses about what is likely to perform best, but until you have real data and real code, you'll never be certain.

Resources