How do I make my JSON less verbose? - ajax

I'm currently developing a web application and using JSON for Ajax requests and responses. I have an area where I return a very large dataset to the client in the form of an array of over 10000 objects. Here's part of the example (it's been simplified somewhat):
"schedules": [
{
"codePractice": 35,
"codeScheduleObject": 576,
"codeScheduleObjectType": "",
"defaultCodeScheduleObject": 12,
"name": "Dr. 1"
},
{
"codePractice": 35,
"codeScheduleObject": 169,
"codeScheduleObjectType": "",
"defaultCodeScheduleObject": 43,
"name": "Dr. 2"
},
{
"codePractice": 35,
"codeScheduleObject": 959,
"codeScheduleObjectType": "",
"defaultCodeScheduleObject": 76,
"name": "Dr. 3"
}
]
As you can imagine, with a very large number of objects in this array, the size of the JSON response can be quite large.
My question is: is there a JSON stringifier/parser that would convert the "schedules" array to look something like this as a JSON string:
"schedules": [
["codePractice", "codeScheduleObject", "codeLogin", "codeScheduleObjectType", "defaultCodeScheduleObject","name"],
[35, 576, "", 12, "Dr. 1"],
[35, 169, "", 43, "Dr. 2"],
[35, 959, "", 76, "Dr. 3"],
]
i.e., there would be an array at the beginning of the "schedules" array that holds the keys of the objects in this array, and all of the other container arrays would hold the values.
I could, if I wanted, do the conversion on the server and parse it on the client, but I'm wondering if there's a standard library for parsing/stringifying large JSON?
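For illustration, here's a rough sketch of the server-side conversion I have in mind, in plain Java (the class and method names are just placeholders; it assumes every object carries the same keys in a stable order, e.g. LinkedHashMap):

import java.util.*;

// Illustrative packer: turns a list of uniform key->value maps into the
// header-plus-rows form shown above.
class SchedulePacker {
    static List<List<Object>> pack(List<Map<String, Object>> objects) {
        List<List<Object>> packed = new ArrayList<>();
        if (objects.isEmpty()) return packed;
        List<Object> header = new ArrayList<>(objects.get(0).keySet());
        packed.add(header); // first row holds the keys
        for (Map<String, Object> obj : objects) {
            List<Object> row = new ArrayList<>();
            for (Object key : header) row.add(obj.get(key));
            packed.add(row); // remaining rows hold only the values
        }
        return packed;
    }
}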
I could also run it through a minifier, but I'd like to keep the keys I have currently as they give some context within the application.
I'm also hoping you might critique my approach here or suggest alternatives?

HTTP compression (i.e. gzip or deflate) already does exactly that. Repeated patterns, like your JSON keys, are replaced with tokens so that the verbose pattern only has to occur once per transmission.

Not an answer, but to give a rough estimate of "savings" based on 10k entries and some bogus data :-) This is in response to a comment I posted. Will the added complexity make the schema'ized approach worth it?
"It depends."
This C# is a LINQPad script and is ready to go for testing/modifying:
string LongTemplate (int n1, int n2, int n3, string name) {
    return string.Format(@"
{{
""codePractice"": {0},
""codeScheduleObject"": {1},
""codeScheduleObjectType"": """",
""defaultCodeScheduleObject"": {2},
""name"": ""Dr. {3}""
}}," + "\n", n1, n2, n3, name);
}

string ShortTemplate (int n1, int n2, int n3, string name) {
    return string.Format("[{0}, {1}, \"\", {2}, \"Dr. {3}\"],\n",
        n1, n2, n3, name);
}

string MinTemplate (int n1, int n2, int n3, string name) {
    return string.Format("[{0},{1},\"\",{2},\"Dr. {3}\"],",
        n1, n2, n3, name);
}

long GZippedSize (string s) {
    var ms = new MemoryStream();
    using (var gzip = new System.IO.Compression.GZipStream(ms, System.IO.Compression.CompressionMode.Compress, true))
    using (var sw = new StreamWriter(gzip)) {
        sw.Write(s);
    }
    return ms.Position;
}

void Main()
{
    var r = new Random();
    var l = new StringBuilder();
    var s = new StringBuilder();
    var m = new StringBuilder();
    for (int i = 0; i < 10000; i++) {
        var n1 = r.Next(10000);
        var n2 = r.Next(10000);
        var n3 = r.Next(10000);
        var name = "bogus" + r.Next(50);
        l.Append(LongTemplate(n1, n2, n3, name));
        s.Append(ShortTemplate(n1, n2, n3, name));
        m.Append(MinTemplate(n1, n2, n3, name));
    }
    var lc = GZippedSize(l.ToString());
    var sc = GZippedSize(s.ToString());
    var mc = GZippedSize(m.ToString());
    Console.WriteLine(string.Format("Long:\tNormal={0}\tGZip={1}\tCompressed={2:P}", l.Length, lc, (float)lc / l.Length));
    Console.WriteLine(string.Format("Short:\tNormal={0}\tGZip={1}\tCompressed={2:P}", s.Length, sc, (float)sc / s.Length));
    Console.WriteLine(string.Format("Min:\tNormal={0}\tGZip={1}\tCompressed={2:P}", m.Length, mc, (float)mc / m.Length));
    Console.WriteLine(string.Format("Short/Long\tRegular={0:P}\tGZip={1:P}",
        (float)s.Length / l.Length, (float)sc / lc));
    Console.WriteLine(string.Format("Min/Long\tRegular={0:P}\tGZip={1:P}",
        (float)m.Length / l.Length, (float)mc / lc));
}
My results:
Long: Normal=1754614 GZip=197053 Compressed=11.23 %
Short: Normal=384614 GZip=128252 Compressed=33.35 %
Min: Normal=334614 GZip=128252 Compressed=38.33 %
Short/Long Regular=21.92 % GZip=65.09 %
Min/Long Regular=19.07 % GZip=65.09 %
Conclusion:
The single biggest savings comes from using GZIP (better than schema'izing alone).
GZIP + schema'izing will be the smallest overall.
With GZIP there is no point in using a normal JavaScript minimizer (in this scenario).
Use GZIP (i.e. DEFLATE); it performs very well on repetitive structured text (roughly 9:1 on the verbose form above!).
Happy coding.

Here's an article that does pretty much what you're looking to do:
http://stevehanov.ca/blog/index.php?id=104
At first glance, it looks like your example would be compressed down to the following after the first step of the algorithm (which will actually do more work on it in subsequent steps):
{
    "templates": [
        ["codePractice", "codeScheduleObject", "codeScheduleObjectType", "defaultCodeScheduleObject", "name"]
    ],
    "values": [
        { "type": 1, "values": [ 35, 576, "", 12, "Dr. 1" ] },
        { "type": 1, "values": [ 35, 169, "", 43, "Dr. 2" ] },
        { "type": 1, "values": [ 35, 959, "", 76, "Dr. 3" ] }
    ]
}
You can start to see the benefit of the algorithm already. Here's the final output after running it through the compressor:
{
    "f": "cjson",
    "t": [
        [0, "schedules"],
        [0, "codePractice", "codeScheduleObject", "codeScheduleObjectType", "defaultCodeScheduleObject", "name"]
    ],
    "v": {
        "": [ 1, [
            { "": [2, 35, 576, "", 12, "Dr. 1"] },
            { "": [2, 35, 169, "", 43, "Dr. 2"] },
            { "": [2, 35, 959, "", 76, "Dr. 3"] }
        ]]
    }
}
One can obviously see the improvement if you have several thousand records. The output is still readable, but I think the other guys are right too: a good compression algorithm is going to remove the blocks of text that are repeated anyway.

Before you change your JSON schema, give this a shot:
http://httpd.apache.org/docs/2.0/mod/mod_deflate.html
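For example, a minimal mod_deflate setup just adds the DEFLATE output filter for the relevant media types (the first three types are the stock example from the Apache docs; application/json is the addition relevant here):

AddOutputFilterByType DEFLATE text/html text/plain text/xml application/json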

For the record, I am doing exactly this in PHP. It's a list of objects from a database.
$comp = base64_encode(gzcompress(json_encode($json)));
json: string of length 22501
gzcompressed: string of length 711, but it's a binary format.
gzcompressed + base64: string of length 948, a text format.
So it's considerably smaller, at the cost of a fraction of a second.


d3 stack on data without header

My CSV is dynamically generated and doesn't have any headers, because the number of columns and rows vary with each run. An example below:
A, 30, 40, 35, 25
B, 25, 35, 45, 35
Which, if there were headers, would look as below:
Age1, Age2, Age1, Age2
A, 30, 40, 35, 25
B, 25, 35, 45, 35
For each row the data is in pairs, i.e. col[1] & col[2] need to be stacked, and col[3] & col[4] need to be stacked. The goal is to have a clustered stacked bar chart with A and B on the X axis and two stacked bars for each pair.
I was trying to follow the example at https://bl.ocks.org/SpaceActuary/6233700e7f443b719855a227f4749ee5 but I am not able to get, how to use the stack function in the absence of headers/keys.
You can use d3.text to load the CSV data, and then iterate over the text to create an array of objects with named values, which can then be stacked or otherwise processed as you normally would in D3:
d3.text("data.csv", function(text) {
  console.log(text);
  var data = [];
  d3.csvParseRows(text).forEach(function(row) {
    let obj = {};
    row.forEach(function(value, i) {
      let pairIndex = Math.floor((i - 1) / 2);
      // assume the first value is the index or name for the row, e.g. A, B, etc.
      if (i == 0) {
        obj.index = value;
      } else if (i % 2 == 0) {
        let v = "age2-" + pairIndex;
        obj[v] = +value; // coerce to a number for stacking
      } else {
        let v = "age1-" + pairIndex;
        obj[v] = +value;
      }
    });
    data.push(obj);
  });
  console.log(data);
  // continue with your code
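  // For the clustered-stacked goal you can then derive one stack per pair
  // (hypothetical continuation; the series names follow the age1-/age2-
  // scheme built above):
  var stack0 = d3.stack().keys(["age1-0", "age2-0"])(data);
  var stack1 = d3.stack().keys(["age1-1", "age2-1"])(data);
});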

Algorithm to maximize the distance (avoid repetition) of picking elements from sets

I'm looking for an algorithm that generalizes the following problem to n sets, but for simplicity assume that there are 4 different sets, each containing 4 elements. We can also assume that each set always contains an equal number of elements, though that number can be anything. So if there are 37 elements in the first set, we can assume there are also 37 elements in each of the other sets.
A combination of elements is formed by taking 1 element from the first set and putting it into first place, 1 element from the second set and putting it in the second place, and so on. For example say the first set contains {A0,A1,A2,A3}, the second set contains {B0,B1,B2,B3}, third is {C0,C1,C2,C3} and fourth is {D0,D1,D2,D3}. One possible combination would be [A0, B2, C1, D3].
The goal is to find the path that maximizes the distance when cycling through all the possible combinations, avoiding repetition as much as possible. And avoiding repetition applies to contiguous groups as well as individual columns. For example:
Individual columns
[A0, B0, C0, D0]
[A1, B1, C1, D1]
[A2, B0, C2, D2]
This is incorrect because B0 is repeated sooner than it had to be.
Contiguous groups
[A0, B0, C0, D0]
[A1, B1, C1, D1]
[A2, B2, C2, D2]
[A3, B3, C3, D3]
[A0, B0, C1, D2]
This is incorrect because the contiguous pair (A0, B0) was repeated sooner than it had to be. However if the last one was instead [A0, B1, C0, D1] then this would be alright.
When cycling through all possible combinations the contiguous groups will have to be repeated, but the goal is to maximize the distance between them. So for example if (A0, B0) is used, then ideally all the other first pairs would be used before it's used again.
I was able to find a solution for when there are 3 sets, but I'm having trouble generalizing it to n sets and even solving for 4 sets. Any ideas?
Can you post your solution for three sets?
Sure, first I wrote down all possible combinations. Then I made three 3x3 matrices of entries by grouping the entries where the non-contiguous (first and third) elements were repeated:
(A0,B0,C0)^1  (A1,B0,C1)^4  (A2,B0,C2)^7      (A0,B0,C1)^13  (A1,B0,C2)^16  (A2,B0,C0)^10     (A0,B0,C2)^25  (A1,B0,C0)^19  (A2,B0,C1)^22
(A0,B1,C0)^8  (A1,B1,C1)^2  (A2,B1,C2)^5      (A0,B1,C1)^11  (A1,B1,C2)^14  (A2,B1,C0)^17     (A0,B1,C2)^23  (A1,B1,C0)^26  (A2,B1,C1)^20
(A0,B2,C0)^6  (A1,B2,C1)^9  (A2,B2,C2)^3      (A0,B2,C1)^18  (A1,B2,C2)^12  (A2,B2,C0)^15     (A0,B2,C2)^21  (A1,B2,C0)^24  (A2,B2,C1)^27
Then I realized if I traversed in a diagonal pattern (order indicated by the superscript index) that it would obey the rules. I then wrote the following code to take advantage of this visual pattern:
@Test
public void run() {
    List<String> A = new ArrayList<String>();
    A.add("0");
    A.add("1");
    A.add("2");
    List<String> B = new ArrayList<String>();
    B.add("0");
    B.add("1");
    B.add("2");
    List<String> C = new ArrayList<String>();
    C.add("0");
    C.add("1");
    C.add("2");
    int numElements = A.size();
    List<String> output = new ArrayList<String>();
    int offset = 0;
    int nextOffset = 0;
    for (int i = 0; i < A.size()*B.size()*C.size(); i++) {
        int j = i % numElements;
        int k = i / numElements;
        if (j == 0 && k%numElements == numElements-1) {
            nextOffset = (j+k+offset) % numElements;
        }
        if (j == 0 && k%numElements == 0) {
            offset = nextOffset;
        }
        String first = A.get((j+k+offset) % numElements);
        String second = B.get(j);
        String third = C.get((j+k) % numElements);
        System.out.println(first + " " + second + " " + third);
        output.add(first + second + third);
    }
}
However I just realized that this isn't ideal either, since it looks like the pair (A0,B1) is repeated too soon, at indices 8 and 11 :( However I think maybe this is unavoidable, when crossing over from one group to another?.. This is a difficult problem! Harder than it looks
If you can think about and revise your actual requirements
Okay so I decided to remove the restriction of traversing through all possible combinations, and instead reduce the yield a little bit to improve the quality of the results.
The whole point of this is to take elements belonging to a particular set and combine them to form a combination of elements that appear unique. So if I start out with 3 combinations and there are 3 sets, I can break each combination into 3 elements and place the elements into their respective sets. I can then use the algorithm to mix and match the elements and produce 27 seemingly unique combinations -- of course they're formed from derivative elements so they only appear unique as long as you don't look too closely!
So the 3 combinations formed by hand can be turned into 3^3 = 27 combinations, saving a lot of time and energy. Of course this scales up pretty nicely too: if I form 10 combinations by hand then the algorithm can generate 1000 combinations. I probably don't need quite that many combinations anyway, so I can sacrifice some entries to better avoid repetition. In particular, with 3 sets I noticed that while my solution was decent, there was some bunching that occurred every numElements^2 entries. Here is an example of 3 sets of 5 elements, with an obvious repetition after 25 combinations:
19) A1 B3 C1
20) A2 B4 C2
21) A4 B0 C4 <--
22) A0 B1 C0
23) A1 B2 C1
24) A2 B3 C2
25) A3 B4 C3
26) A0 B0 C4 <--
27) A1 B1 C0
28) A2 B2 C1
29) A3 B3 C2
30) A4 B4 C3
31) A1 B0 C0
32) A2 B1 C1
To fix this we can introduce the following statement and get rid of this bad block:
if (k % numElements == 0) continue;
However this only works when numElements > numSets, otherwise the Individual Columns rule will be broken. (In case you were wondering, I also switched the ordering of the first and third sets in this example; I did this initially so I wasn't opening with the bad repetition.)
Aaannd I'm still completely stuck on how to form an approach for n or even 4 sets. It certainly gets trickier because there are now different sizes of contiguous groups to avoid, contiguous trios as well as pairs.. Any thoughts? Am I crazy for even trying to do this?
Even after the modifications in your question, I'm still not sure exactly what you want. It seems that what you would really like is impossible, but I'm not sure exactly how much relaxation in the conditions is acceptable. Nevertheless, I'll give it a crack.
Oddly there seems to be little literature (that I can find, anyway) covering the subject of your problem, so I had to invent something myself. This is the idea: you are looking for a sequence of points on a multidimensional torus such that elements of the sequence are as far apart as possible in a complicated metric. What this reminds me of is something I learned years ago in a mechanics class, strangely enough. If you have a line on a flat torus with rational slope, the line will loop back onto itself after a few cycles, but if you have a line with irrational slope, the line will densely cover the entire torus.
I don't expect that to mean a lot to many people, but it did give me an idea. The index for each set could step by an irrational amount. You would have to take the floor, of course, and then modulo whatever, but it does seem to cover the bases well, so to speak. The irrational step for each set could be different (and mutually irrational, to use rather loose language).
To make the idea more precise, I wrote a short program. Please check it out.
class Equidistributed {
    static final double IRRATIONAL1 = Math.sqrt(2);
    static final double IRRATIONAL2 = Math.sqrt(3);
    static final double IRRATIONAL3 = Math.sqrt(5) - 1;

    // four sets of 7 elements each
    static int setSize = 7;

    public static void main(String[] args) {
        for (int i = 0; i < Math.pow(setSize, 4); i++) {
            String tuple = "";
            int j = i % setSize;
            tuple += j + ",";
            j = ((int) Math.floor(i*IRRATIONAL1)) % setSize;
            tuple += j + ",";
            j = ((int) Math.floor(i*IRRATIONAL2)) % setSize;
            tuple += j + ",";
            j = ((int) Math.floor(i*IRRATIONAL3)) % setSize;
            tuple += j;
            System.out.println(tuple);
        }
    }
}
I "eyeballed" the results, and they aren't perfect, but they're pretty nice. Plus the program runs quickly. It's for four sets with a variable number of elements (I chose 7 for the example). The irrational numbers I'm using are based on square roots of prime numbers; I subtracted 1 from sqrt(5) so that the result would be in the range between 1 and 2. Each tuple is basically
(i, floor(i*irrational1), floor(i*irrational2), floor(i*irrational3)) mod 7
Statistically that should make the sequence evenly distributed, which is a consequence of what you want. Whether that translates into the right "distance" properties, I can't really be sure. You should probably write a program to test whether a sequence has the property you want, and then pipe the output from my program into the test.
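For what it's worth, here is a minimal sketch of such a check (my own helper, not part of the program above): for one column it reports the smallest gap between two occurrences of the same value, so larger is better.

import java.util.*;

// Reports the smallest gap between repeats of the same value in one column.
class DistanceCheck {
    static int minRepeatGap(List<int[]> rows, int column) {
        Map<Integer, Integer> lastSeen = new HashMap<>();
        int minGap = Integer.MAX_VALUE;
        for (int i = 0; i < rows.size(); i++) {
            int value = rows.get(i)[column];
            Integer previous = lastSeen.put(value, i);
            if (previous != null) minGap = Math.min(minGap, i - previous);
        }
        return minGap; // MAX_VALUE means the value never repeated
    }
}

The same map-of-last-positions idea extends to contiguous pairs and trios by keying on the tuple of adjacent values instead of a single value.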
Define an array of all possible combinations.
For each possible order of the array, compute your distance score. If greater than the previous best (default start = 0), then copy the array to your output, overwriting the previous best array.
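A minimal sketch of that brute force, with score() left as a placeholder for whatever distance metric you settle on; since it enumerates all n! orderings, it is only feasible for a handful of combinations.

import java.util.*;

// Exhaustive search over orderings, keeping the best-scoring one.
class BruteForce {
    List<String> best;
    double bestScore = -1;

    double score(List<String> order) { return 0; /* plug in your distance metric */ }

    void permute(List<String> order, int k) {
        if (k == order.size()) {
            double s = score(order);
            if (s > bestScore) { bestScore = s; best = new ArrayList<>(order); }
            return;
        }
        for (int i = k; i < order.size(); i++) {
            Collections.swap(order, k, i);
            permute(order, k + 1);
            Collections.swap(order, k, i); // backtrack
        }
    }
}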
Assuming values are 1 dimensional, you do not need to compare the distance between every single element. Instead, you can find the maximum and minimum value within each set before comparing it with other sets.
Step 1: Find the element with the maximum value and the element with the minimum value within each set (e.g. A1, A34, B4, B32, C5, C40, where the smaller subscript denotes the smaller value in this example).
Step 2: Compare A1 with the maximum values of all other sets, and repeat the process for all minimum values.
I generalized the algorithm and wrote code to do performance testing:
import java.util.*;

public class Solution {
    public static void main(String[] args) throws Exception {
        List<String> A = new ArrayList<>();
        A.add("A0"); A.add("A1"); A.add("A2");
        A.add("A3"); A.add("A4"); A.add("A5"); A.add("A6");
        List<String> B = new ArrayList<>();
        B.add("B0"); B.add("B1"); B.add("B2");
        B.add("B3"); B.add("B4"); B.add("B5"); B.add("B6");
        List<String> C = new ArrayList<>();
        C.add("C0"); C.add("C1"); C.add("C2");
        C.add("C3"); C.add("C4"); C.add("C5"); C.add("C6");
        List<String> D = new ArrayList<>();
        D.add("D0"); D.add("D1"); D.add("D2");
        D.add("D3"); D.add("D4"); D.add("D5"); D.add("D6");
        List<List<String>> columns = new ArrayList<>();
        columns.add(A); columns.add(B); columns.add(C); columns.add(D);
        List<String> output = equidistribute(columns);
        // for (String row : output) {
        //     System.out.println(row);
        // }
        // new Solution().test(output, columns.size(), A.size());
        new Solution().testAllTheThings();
    }

    public static List<String> equidistribute(List<List<String>> columns) {
        List<String> output = new ArrayList<>();
        int[] primeNumbers = {2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41,
                              43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97,
                              101, 103, 107, 109, 113, 127, 131, 137, 139, 149,
                              151, 157, 163, 167, 173, 179, 181, 191, 193, 197,
                              199, 211, 223, 227, 229, 233, 239, 241, 251, 257,
                              263, 269, 271, 277, 281, 283, 293, 307, 311, 313,
                              317, 331, 337, 347, 349, 353, 359, 367, 373, 379,
                              383, 389, 397, 401, 409, 419, 421, 431, 433, 439,
                              443, 449, 457, 461, 463, 467, 479, 487, 491, 499,
                              503, 509, 521, 523, 541};
        int numberOfColumns = columns.size();
        int numberOfElements = columns.get(0).size();
        for (int i = 0; i < Math.pow(numberOfElements, numberOfColumns); i++) {
            String row = "";
            for (int j = 0; j < numberOfColumns; j++) {
                if (j == 0) {
                    row += columns.get(0).get(i % numberOfElements);
                } else {
                    int index = ((int) Math.floor(i * Math.sqrt(primeNumbers[j-1]))) % numberOfElements;
                    row += " " + columns.get(j).get(index);
                }
            }
            output.add(row);
        }
        return output;
    }

    class MutableInt {
        int value = 0;
        public void increment() { value++; }
        public int get() { return value; }
        public String toString() { return String.valueOf(value); }
    }

    public void test(List<String> columns, int numberOfColumns, int numberOfElements) throws Exception {
        List<HashMap<String, MutableInt>> pairMaps = new ArrayList<>();
        List<HashMap<String, MutableInt>> individualElementMaps = new ArrayList<>();
        // initialize structures for calculating min distance
        for (int i = 0; i < numberOfColumns; i++) {
            if (i != numberOfColumns-1) {
                HashMap<String, MutableInt> pairMap = new HashMap<>();
                pairMaps.add(pairMap);
            }
            HashMap<String, MutableInt> individualElementMap = new HashMap<>();
            individualElementMaps.add(individualElementMap);
        }
        int minDistancePair = Integer.MAX_VALUE;
        int minDistanceElement = Integer.MAX_VALUE;
        String pairOutputMessage = "";
        String pairOutputDebugMessage = "";
        String elementOutputMessage = "";
        String elementOutputDebugMessage = "";
        String outputMessage = numberOfColumns + " columns, " + numberOfElements + " elements";
        for (int i = 0; i < columns.size(); i++) {
            String[] elements = columns.get(i).split(" ");
            for (int j = 0; j < numberOfColumns; j++) {
                // pair stuff
                if (j != numberOfColumns-1) {
                    String pairEntry = elements[j] + " " + elements[j+1];
                    MutableInt count = pairMaps.get(j).get(pairEntry);
                    if (pairMaps.get(j).containsKey(pairEntry)) {
                        if (count.get() <= minDistancePair) {
                            minDistancePair = count.get();
                            pairOutputMessage = "minDistancePair = " + minDistancePair;
                            pairOutputDebugMessage += "(" + pairEntry + " at line " + (i+1) + ") min = " + minDistancePair + "\n";
                        }
                        count = null;
                    }
                    if (count == null) {
                        pairMaps.get(j).put(pairEntry, new MutableInt());
                    }
                }
                // element stuff
                String elementEntry = elements[j];
                MutableInt count = individualElementMaps.get(j).get(elementEntry);
                if (individualElementMaps.get(j).containsKey(elementEntry)) {
                    if (count.get() <= minDistanceElement) {
                        minDistanceElement = count.get();
                        elementOutputMessage = "minDistanceElement = " + minDistanceElement;
                        elementOutputDebugMessage += "(" + elementEntry + " at line " + (i+1) + ") min = " + minDistanceElement + "\n";
                    }
                    count = null;
                }
                if (count == null) {
                    individualElementMaps.get(j).put(elementEntry, new MutableInt());
                }
            }
            // increment counters
            for (HashMap<String, MutableInt> pairMap : pairMaps) {
                Iterator it = pairMap.entrySet().iterator();
                while (it.hasNext()) {
                    Map.Entry mapEntry = (Map.Entry) it.next();
                    ((MutableInt) mapEntry.getValue()).increment();
                }
            }
            for (HashMap<String, MutableInt> elementMap : individualElementMaps) {
                Iterator it = elementMap.entrySet().iterator();
                while (it.hasNext()) {
                    Map.Entry mapEntry = (Map.Entry) it.next();
                    ((MutableInt) mapEntry.getValue()).increment();
                }
            }
        }
        System.out.println(outputMessage + " -- " + pairOutputMessage + ", " + elementOutputMessage);
        // System.out.print(elementOutputDebugMessage);
        // System.out.print(pairOutputDebugMessage);
    }

    public void testAllTheThings() throws Exception {
        char[] columnPrefix = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz".toCharArray();
        int maxNumberOfColumns = columnPrefix.length;
        int maxNumberOfElements = 30;
        for (int i = 2; i < maxNumberOfColumns; i++) {
            for (int j = i; j < maxNumberOfElements; j++) {
                List<List<String>> columns = new ArrayList<>();
                for (int k = 0; k < i; k++) {
                    List<String> column = new ArrayList<>();
                    for (int l = 0; l < j; l++) {
                        column.add(String.valueOf(columnPrefix[k]) + l);
                    }
                    columns.add(column);
                }
                List<String> output = equidistribute(columns);
                test(output, i, j);
            }
        }
    }
}
edit: removed restriction that each set must have same number of elements
public List<String> equidistribute(List<List<String>> columns) {
    List<String> output = new ArrayList<>();
    int[] primeNumbers = {2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41,
                          43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97,
                          101, 103, 107, 109, 113, 127, 131, 137, 139, 149,
                          151, 157, 163, 167, 173, 179, 181, 191, 193, 197,
                          199, 211, 223, 227, 229, 233, 239, 241, 251, 257,
                          263, 269, 271, 277, 281, 283, 293, 307, 311, 313,
                          317, 331, 337, 347, 349, 353, 359, 367, 373, 379,
                          383, 389, 397, 401, 409, 419, 421, 431, 433, 439,
                          443, 449, 457, 461, 463, 467, 479, 487, 491, 499,
                          503, 509, 521, 523, 541};
    int numberOfColumns = columns.size();
    int numberOfCombinations = 1;
    for (List<String> column : columns) {
        numberOfCombinations *= column.size();
    }
    for (int i = 0; i < numberOfCombinations; i++) {
        String row = "";
        for (int j = 0; j < numberOfColumns; j++) {
            int numberOfElementsInColumn = columns.get(j).size();
            if (j == 0) {
                row += columns.get(0).get(i % numberOfElementsInColumn);
            } else {
                int index = ((int) Math.floor(i * Math.sqrt(primeNumbers[j-1]))) % numberOfElementsInColumn;
                row += "|" + columns.get(j).get(index);
            }
        }
        output.add(row);
    }
    return output;
}

Parallel for loop in Swift

What is the closest Swift equivalent of the following C & OpenMP code (assume that n is huge and f is simple):
#pragma omp parallel for
for (int i = 0; i < n; ++i) {
    a[i] = f(b[i]);
}
Parallelising a for loop with striding and dispatch_apply seems like a lot of work for such a routine task. Is there any clever shortcut?
If your code has loops, and the work being done each time through the loop is independent of the work being done in the other iterations, you might consider reimplementing that loop code using the dispatch_apply or dispatch_apply_f function. These functions submit each iteration of a loop separately to a dispatch queue for processing. When used in conjunction with a concurrent queue, this feature lets you perform multiple iterations of the loop concurrently.
Read here: https://developer.apple.com/library/content/documentation/General/Conceptual/ConcurrencyProgrammingGuide/ThreadMigration/ThreadMigration.html#//apple_ref/doc/uid/TP40008091-CH105-SW2
For swift: https://developer.apple.com/documentation/dispatch/dispatchqueue/2016088-concurrentperform
DispatchQueue.concurrentPerform(iterations: 1000) { (index) in
    print("current: \(index)")
}
It appears (from the iBook) that there's not yet a swift-specific API/language feature for parallelism. Using GCD seems like the best option at this point, performance-wise. If you're looking for code brevity, you can just use the standard Objective-C idiom for concurrent array enumeration:
var array: Int[] = [1, 2, 3, 4]
array.bridgeToObjectiveC().enumerateObjectsWithOptions(NSEnumerationOptions.Concurrent, { (obj: AnyObject!, index: Int, outStop: CMutablePointer<ObjCBool>) -> Void in
    // Do stuff
});
The hint in the answer above about using concurrentPerform is correct, but not expanded on. So here's the simplest solution:
class ConcurrentCalculator<T> {
    func calc(for b: [T], _ f: (T) -> T) -> [T?] {
        let n = b.count
        var a: [T?] = Array(repeating: nil, count: n)
        DispatchQueue.concurrentPerform(iterations: n) { i in
            a[i] = f(b[i])
        }
        return a
    }
}
And here's the test:
let myB = Array(1...100)
func myF(b: Int) -> Int {
    b * b
}
let myCalc = ConcurrentCalculator<Int>()
let myA = myCalc.calc(for: myB, myF)
print(myA)
The output will look like
[1, 4, 9, 16, 25, 36, 49, 64, 81, 100, 121, 144, 169, 196, 225, 256, 289, 324, 361, 400, 441, 484, 529, 576, 625, 676, 729, 784, 841, 900, 961, 1024, 1089, 1156, 1225, 1296, 1369, 1444, 1521, 1600, 1681, 1764, 1849, 1936, 2025, 2116, 2209, 2304, 2401, 2500, 2601, 2704, 2809, 2916, 3025, 3136, 3249, 3364, 3481, 3600, 3721, 3844, 3969, 4096, 4225, 4356, 4489, 4624, 4761, 4900, 5041, 5184, 5329, 5476, 5625, 5776, 5929, 6084, 6241, 6400, 6561, 6724, 6889, 7056, 7225, 7396, 7569, 7744, 7921, 8100, 8281, 8464, 8649, 8836, 9025, 9216, 9409, 9604, 9801, 10000]
Of course this is a basic solution: it's not thread safe, and also it may run into memory issues with large arrays. This can be added later.
The beauty of this approach is that you don't need to count available threads, etc. It will just use all available threads.
To parallelize a for loop in Swift, use
DispatchQueue.concurrentPerform(iterations:execute:)
You can convert a Swift array to NSArray and use enumerateObjectsWithOptions without bridging - see simple example below:
let kCompanyListFileNames: [String] = [
    "nyse_companylist",
    "nasdaq_companylist",
    "amex_companylist"
]
let companyListFileNames: NSArray = kCompanyListFileNames as NSArray
companyListFileNames.enumerateObjectsWithOptions(NSEnumerationOptions.Concurrent) {
    (companyName: AnyObject!, index: Int, stop: UnsafeMutablePointer<ObjCBool>) -> Void in
    println(companyName)
}

Java8 equivalent for ruby's each_with_index

I wonder if there's some stream operation that can do the same as each_with_index in Ruby, where each_with_index iterates over the value as well as the index of the value.
There is no stream operation specifically for that purpose. But you can mimic the functionality in several ways.
Index variable: The following approach works fine for sequential streams.
int[] index = { 0 };
stream.forEach(item -> System.out.printf("%s %d\n", item, index[0]++));
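The same trick with an AtomicInteger reads a little more cleanly (and avoids the array wrapper), though it still assumes a sequential, ordered stream:

import java.util.concurrent.atomic.AtomicInteger;

AtomicInteger index = new AtomicInteger();
stream.forEach(item -> System.out.printf("%s %d\n", item, index.getAndIncrement()));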
External iteration: The following approach works fine for parallel streams, as long as the original collection supports random access.
List<String> tokens = ...;
IntStream.range(0, tokens.size()).forEach(
index -> System.out.printf("%s %d\n", tokens.get(index), index));
You can do it with reduce:
<T> void forEachIndexed(Stream<T> stream, BiConsumer<Integer, T> consumer) {
    stream.reduce(0, (index, t) -> {
        consumer.accept(index, t);
        return index + 1;
    }, Integer::max);
}
and call it this way:
List<Integer> ints = Arrays.asList(1, 2, 4, 6, 8, 16, 32);
forEachIndexed(ints.stream(), (idx, el) -> {
    System.out.println(idx + ": " + el);
});
You can use forEachWithIndex() in Eclipse Collections (formerly GS Collections).
MutableList<Integer> elements = FastList.newList();
IntArrayList indexes = new IntArrayList();
MutableList<Integer> collection = this.newWith(1, 2, 3, 4);
collection.forEachWithIndex((Integer object, int index) -> {
    elements.add(object);
    indexes.add(index);
});
Assert.assertEquals(FastList.newListWith(1, 2, 3, 4), elements);
Assert.assertEquals(IntArrayList.newListWith(0, 1, 2, 3), indexes);
If you cannot convert your Collection to a GS Collections type, you can use one of the adapters, like ListAdapter.
List<Integer> list = Arrays.asList(1, 2, 3, 4);
ListIterable<Integer> collection = ListAdapter.adapt(list);
collection.forEachWithIndex((object, index) -> {
    elements.add(object);
    indexes.add(index);
});
Note: I am a committer for Eclipse Collections.
An alternative using the stream reduce operation, with the accumulator (2nd parameter) used for its side effect. The 3rd parameter could be any function if you don't need the result of the reduce operation.
List<String> tokens = Arrays.asList("A", "B", "C", "D");
tokens.stream().reduce(1, (i, str) -> {
    System.out.printf("%s %d\n", str, i);
    return i + 1;
}, Integer::max);
PS: Although it is possible, I am personally not satisfied with abuse of reduce function. :)
This is easy to do with the utility library protonpack: https://github.com/poetix/protonpack
Stream<String> source = Stream.of("Foo", "Bar", "Baz");
List<Indexed<String>> zipped = StreamUtils.zipWithIndex(source).collect(Collectors.toList());
assertThat(zipped, contains(
    Indexed.index(0, "Foo"),
    Indexed.index(1, "Bar"),
    Indexed.index(2, "Baz")));

Merge ranges in intervals

Given a set of intervals: {1-4, 6-7, 10-12} add a new interval: (9,11) so that the final solution is 'merged': Output: {1-4, 6-7, 9-12}. The merger can happen on both sides (low as well as high range).
I saw this question answered in multiple places; someone even suggested using interval trees, but did not explain how exactly they would use them. The only solution I know of is to arrange the intervals in ascending order of their start time, iterate over them, and try to merge them appropriately.
If someone can help me understand how we can use interval trees in this use case, that will be great!
[I have been following interval trees in CLRS book, but they do not talk about merging, all they talk about is insertion and search.]
(I'm assuming that this means that intervals can never overlap, since otherwise they'd be merged.)
One way to do this would be to store a balanced binary search tree with one node per endpoint of a range. Each node would then be marked as either an "open" node marking the start of an interval or a "close" node marking the end of an interval.
When inserting a new range, one of two cases will occur regarding the start point of the range:
It's already inside a range, which means that you will extend an already-existing range as part of the insertion.
It's not inside a range, so you'll be creating a new "open" node.
To determine which case you're in, you can do a predecessor search in the tree for the range's start point. If you get NULL or a close node, you need to insert a new open node representing the start point of the range. If you get an open node, you will just keep extending that interval.
From there, you need to determine how far the range extends. To do this, continuously compute the successor of the initial node you inserted until one of the following occurs:
You have looked at all nodes in the tree. In that case, you need to insert a close node marking the end of this interval.
You see a close node after the end of the range. In that case, you're in the middle of an existing range when the new range ends, so you don't need to do anything more. You're done.
You see a close or open node before the end of the range. In that case, you need to remove that node from the tree, since the old range is subsumed by the new one.
You see an open node after the end of the range. In that case, insert a new close node into the tree, since you need to terminate the current range before seeing the start of this new one.
Implemented naively, the runtime of this algorithm is O(log n + k log n), where n is the number of intervals and k is the number of intervals removed during this process (since you have to do k deletes). However, you can speed this up to O(log n) by using the following trick. Since the deletion process always deletes nodes in a sequence, you can use a successor search for the endpoint to determine the end of the range to remove. Then, you can splice the subrange to remove out of the tree by doing two tree split operations and one tree join operation. On a suitable balanced tree (red-black or splay, for example), this can be done in O(log n) total time, which is much faster if a lot of ranges are going to get subsumed.
Hope this helps!
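If you don't want to build the open/close-node tree by hand, the same insert-and-merge idea can be sketched with a sorted map of start -> end standing in for the balanced BST (my own sketch in Java using TreeMap, not the exact structure described above):

import java.util.*;

// Disjoint intervals kept as start -> end in a sorted map.
class IntervalSet {
    private final TreeMap<Integer, Integer> map = new TreeMap<>();

    void add(int start, int end) {
        // Predecessor search: if the interval before us reaches our start,
        // extend from it instead of creating a new one.
        Map.Entry<Integer, Integer> prev = map.floorEntry(start);
        if (prev != null && prev.getValue() >= start) {
            start = prev.getKey();
            end = Math.max(end, prev.getValue());
        }
        // Successor walk: absorb every interval that begins inside our range.
        Map.Entry<Integer, Integer> next = map.ceilingEntry(start);
        while (next != null && next.getKey() <= end) {
            end = Math.max(end, next.getValue());
            map.remove(next.getKey());
            next = map.ceilingEntry(start);
        }
        map.put(start, end);
    }
}

Adding (9, 11) to {1-4, 6-7, 10-12} leaves the map as {1=4, 6=7, 9=12}, which matches the expected output from the question.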
import java.util.*;

public class MergeIntervals {

    public static class Interval {
        public double start;
        public double end;

        public Interval(double start, double end) {
            this.start = start;
            this.end = end;
        }
    }

    public static List<Interval> mergeInterval(List<Interval> nonOverlapInt, Interval another) {
        List<Interval> merge = new ArrayList<>();
        for (Interval current : nonOverlapInt) {
            if (current.end < another.start || another.end < current.start) {
                // no overlap: keep the existing interval as-is
                merge.add(current);
            } else {
                // overlap: grow the new interval to cover the current one
                another.start = Math.min(current.start, another.start);
                another.end = Math.max(current.end, another.end);
            }
        }
        merge.add(another);
        return merge;
    }
}
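A quick sanity check against the question's example (note that the untouched intervals come out first and the merged interval is appended last):

List<MergeIntervals.Interval> base = Arrays.asList(
    new MergeIntervals.Interval(1, 4),
    new MergeIntervals.Interval(6, 7),
    new MergeIntervals.Interval(10, 12));
List<MergeIntervals.Interval> result =
    MergeIntervals.mergeInterval(base, new MergeIntervals.Interval(9, 11));
// result now holds [1-4], [6-7], [9-12]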
Check this out, it may help you: http://www.boost.org/doc/libs/1_46_0/libs/icl/doc/html/index.html
The library offers these functionalities:
1) interval_set
2) separate_interval_set
3) split_interval_set
C#
public class Interval
{
    public Interval(int start, int end) { this.start = start; this.end = end; }
    public int start;
    public int end;
}

void AddInterval(List<Interval> list, Interval interval)
{
    int lo = 0;
    int hi = 0;
    for (lo = 0; lo < list.Count; lo++)
    {
        if (interval.start < list[lo].start)
        {
            list.Insert(lo, interval);
            hi++;
            break;
        }
        if (interval.start >= list[lo].start && interval.start <= list[lo].end)
        {
            break;
        }
    }
    if (lo == list.Count)
    {
        list.Add(interval);
        return;
    }
    for (hi = hi + lo; hi < list.Count; hi++)
    {
        if (interval.end < list[hi].start)
        {
            hi--;
            break;
        }
        if (interval.end >= list[hi].start && interval.end <= list[hi].end)
        {
            break;
        }
    }
    if (hi == list.Count)
    {
        hi = list.Count - 1;
    }
    list[lo].start = Math.Min(interval.start, list[lo].start);
    list[lo].end = Math.Max(interval.end, list[hi].end);
    if (hi - lo > 0)
    {
        list.RemoveRange(lo + 1, hi - lo);
    }
}
This is simply done by adding the interval in question to the end of the interval set, then performing a merge on all the elements of the interval set.
The merge operation is well-detailed here: http://www.geeksforgeeks.org/merging-intervals/
If you're not in the mood for C++ code, here is the same thing in Python:
def mergeIntervals(self, intervalSet):
    # intervalSet is an array.
    # each interval is a dict w/ keys: startTime, endTime.
    # algorithm from: http://www.geeksforgeeks.org/merging-intervals/
    import copy
    intArray = copy.copy(intervalSet)
    if len(intArray) <= 1:
        return intArray
    intArray.sort(key=lambda x: x.get('startTime'))
    print "sorted array: %s" % (intArray)
    myStack = []  # append and pop
    myStack.append(intArray[0])
    for i in range(1, len(intArray)):
        top = myStack[-1]  # peek at the top of the stack
        # if the current interval is NOT overlapping with the stack top, push it on
        if top['endTime'] < intArray[i]['startTime']:
            myStack.append(intArray[i])
        # otherwise, if the end of the current one is later, update top's endTime in place
        elif top['endTime'] < intArray[i]['endTime']:
            top['endTime'] = intArray[i]['endTime']
    print "merged array: %s" % (myStack)
    return myStack
Don't forget your nosetests to verify you actually did the work right:
import unittest

class TestMyStuff(unittest.TestCase):
    def test_mergeIntervals(self):
        t = [ { 'startTime' : 33, 'endTime' : 35 }, { 'startTime' : 11, 'endTime' : 15 }, { 'startTime' : 72, 'endTime' : 76 }, { 'startTime' : 44, 'endTime' : 46 } ]
        mgs = MyClassWithMergeIntervalsMethod()
        res = mgs.mergeIntervals(t)
        assert res == [ { 'startTime' : 11, 'endTime' : 15 }, { 'startTime' : 33, 'endTime' : 35 }, { 'startTime' : 44, 'endTime' : 46 }, { 'startTime' : 72, 'endTime' : 76 } ]
        t = [ { 'startTime' : 33, 'endTime' : 36 }, { 'startTime' : 11, 'endTime' : 35 }, { 'startTime' : 72, 'endTime' : 76 }, { 'startTime' : 44, 'endTime' : 46 } ]
        mgs = MyClassWithMergeIntervalsMethod()
        res = mgs.mergeIntervals(t)
        assert res == [{'endTime': 36, 'startTime': 11}, {'endTime': 46, 'startTime': 44}, {'endTime': 76, 'startTime': 72}]
