I'm trying to measure the impact of string interning in an application.
I came up with this:
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

class Program
{
    static void Main(string[] args)
    {
        _ = BenchmarkRunner.Run<Benchmark>();
    }
}

[MemoryDiagnoser]
public class Benchmark
{
    [Params(10000, 100000, 1000000)]
    public int Count { get; set; }

    [Benchmark]
    public string[] NotInterned()
    {
        var a = new string[this.Count];
        for (var i = this.Count; i-- > 0;)
        {
            a[i] = GetString(i);
        }
        return a;
    }

    [Benchmark]
    public string[] Interned()
    {
        var a = new string[this.Count];
        for (var i = this.Count; i-- > 0;)
        {
            a[i] = string.Intern(GetString(i));
        }
        return a;
    }

    private static string GetString(int i)
    {
        var result = (i % 10).ToString();
        return result;
    }
}
But I always end up with the same amount of memory allocated.
Is there any other measure or diagnostic that gives me the memory savings of using string.Intern()?
The main question here is what kind of impact you want to measure. To be more specific: what are your target metrics? Here are some examples: performance metrics, memory traffic, memory footprint.
In the BenchmarkDotNet Allocated column, you get the memory traffic. string.Intern doesn't help to optimize it in your example: each (i % 10).ToString() call will allocate a new string. Thus, it's expected that BenchmarkDotNet shows the same numbers in the Allocated column.
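To make the distinction concrete, here is a small self-contained sketch (my own illustration, not code from the question). string.Intern returns a canonical instance, but the argument you pass to it has already been allocated, so the traffic stays the same:

using System;

class InternTrafficDemo
{
    static void Main()
    {
        // Each ToString() call allocates a fresh string: that's memory traffic.
        string a = 42.ToString();
        string b = 42.ToString();
        Console.WriteLine(ReferenceEquals(a, b)); // False: two separate allocations

        // Intern maps both to one canonical instance: that's what shrinks the footprint.
        Console.WriteLine(ReferenceEquals(string.Intern(a), string.Intern(b))); // True
    }
}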
However, string.Intern should help you to optimize the memory footprint of your application (the total managed heap size, which can be fetched via GC.GetTotalMemory()). This can be verified with a simple console application without BenchmarkDotNet:
using System;

namespace ConsoleApp24
{
    class Program
    {
        private const int Count = 100000;
        private static string[] notInterned, interned;

        static void Main(string[] args)
        {
            var memory1 = GC.GetTotalMemory(true);
            notInterned = NotInterned();
            var memory2 = GC.GetTotalMemory(true);
            interned = Interned();
            var memory3 = GC.GetTotalMemory(true);
            Console.WriteLine(memory2 - memory1);
            Console.WriteLine(memory3 - memory2);
            Console.WriteLine((memory2 - memory1) - (memory3 - memory2));
        }

        public static string[] NotInterned()
        {
            var a = new string[Count];
            for (var i = Count; i-- > 0;)
            {
                a[i] = GetString(i);
            }
            return a;
        }

        public static string[] Interned()
        {
            var a = new string[Count];
            for (var i = Count; i-- > 0;)
            {
                a[i] = string.Intern(GetString(i));
            }
            return a;
        }

        private static string GetString(int i)
        {
            var result = (i % 10).ToString();
            return result;
        }
    }
}
On my machine (Linux, .NET Core 3.1), I got the following results:
802408
800024
2384
The first number and the second number are the memory footprint impacts of the two cases. Both are large because the string array itself needs a lot of memory just to hold the references to all the string instances: 100,000 references × 8 bytes ≈ 800,000 bytes on a 64-bit runtime, which matches the numbers above.
The third number is the difference between the footprint impact of the interned and the not-interned versions. You may ask why it's so small. This can be easily explained: Stephen Toub implemented a special cache for single-digit strings in dotnet/coreclr#18383; it's described in his blog post.
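The cache is easy to observe directly (a minimal sketch; it assumes a runtime that ships this cache, such as .NET Core 3.0 or later):

using System;

class DigitCacheDemo
{
    static void Main()
    {
        // With the single-digit string cache, both calls are expected to return
        // the same cached instance instead of allocating a new string each time.
        string a = 5.ToString();
        string b = 5.ToString();
        Console.WriteLine(ReferenceEquals(a, b)); // True on runtimes with the cache
    }
}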
So, it doesn't make sense to measure interning of the "0".."9" strings on .NET Core. We can easily modify our program to fix this problem:
private static string GetString(int i)
{
    var result = "x" + (i % 10).ToString();
    return result;
}
Here are the updated results:
4002432
800344
3202088
Now the impact difference (the third number) is pretty huge (3202088). It means that interning helped us to save 3202088 bytes in the managed heap.
So, here are the most important recommendations for your future experiments:
Carefully define the metrics that you actually want to measure. Don't say "I want to find all kinds of affected metrics": any change in the source code may affect hundreds of different metrics, and it's pretty hard to measure all of them in each experiment. Think carefully about which metrics are really important to you.
Try to use input data that are close to your actual usage scenarios. Benchmarking with "dummy" data may lead to incorrect results, because the runtime has many tricky optimizations that work especially well on such "dummy" cases.
Hi, I am new to Java programming. I've created a program that allocates 20 jobs into 10 memory blocks.
Here's the code
import java.util.*;
import java.io.*;

public class BestFit
{
    private int[] job; // f
    private int[] memBlock; // b
    private int[] jobStatus;
    private int[] jobAT;
    static private int[] memTaken;
    static int[] ff;
    private int[] jobCC;
    private int[] ArrivalTime;
    private int[] waitingTime;
    private int[] turnaroundTime;

    public BestFit()
    {
        job = new int[]{5040,4600,1060,1950,6950,6410,2960,3070,2770,7790,5680,9150,7880,3870,7160,8880,4410,6130,6750,2560};
        memBlock = new int[]{4400,6200,9300,1000,4200,8200,4600,3700,6300,2900};
        memTaken = new int[20];
        ff = new int[20]; // to store the no. of the block used by a particular file
        jobCC = new int[]{2,8,10,1,10,8,4,2,6,7,1,1,1,8,8,2,5,7,6,7}; // cpu cycle
        ArrivalTime = new int[]{1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20};
        waitingTime = new int[]{0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0};
        turnaroundTime = new int[]{0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0};
    }

    public void BestFitAlgo()
    {
        int[] frag = new int[25];
        int i, j, nb, nf, sizeDifference;
        int lowest = 10000;
        nf = 20;
        nb = 10;
        int startTime = 1;
        int complete = 1;
        int totalTime = 1;
        int waitTime;
        int tTime = 1;
        Arrays.sort(memBlock);
        for (i = 0; i < nf; i++)
        {
            if (complete != 20)
            {
                for (j = 0; j < nb; j++)
                {
                    sizeDifference = memBlock[j] - job[i];
                    if (sizeDifference >= 0)
                        if (lowest > sizeDifference)
                        {
                            ff[i] = j; // no. of block = j
                            lowest = sizeDifference;
                            complete++;
                            System.out.println("Job: " + i + " is added to block: " + ff[i] + " and being process");
                            for (int k = 1; k < jobCC[i]; k++)
                            {
                                startTime++;
                            }
                            if (startTime == jobCC[i])
                            {
                                waitingTime[i] = tTime - ArrivalTime[i];
                                turnaroundTime[i] = jobCC[i] + waitingTime[i];
                                System.out.println("Job: " + i + " is fully processed.Block: " + ff[i] + " is free");
                                System.out.println("Arrival Time: " + ArrivalTime[i]);
                                System.out.println("Start time: " + totalTime);
                                System.out.println("CPU cycle: " + jobCC[i]);
                                totalTime += startTime;
                                startTime = 1;
                                tTime = totalTime;
                                System.out.println("Waiting time: " + waitingTime[i]);
                                System.out.println("Turnaround time: " + turnaroundTime[i] + "\n");
                            }
                        }
                }
            }
            frag[i] = lowest;
            lowest = 10000;
        }
        System.out.println("File No:\tFile_Size:\tBlock_No:\tBlock_Size:\tFragment");
        for (i = 0; i < nf && ff[i] != 0; i++)
        {
            System.out.println(i + "\t\t" + job[i] + "\t\t" + ff[i] + "\t\t" + memBlock[ff[i]] + "\t\t" + frag[i]);
        }
        System.out.println("\nTotal time: " + totalTime);
    }

    public static void main(String[] args)
    {
        BestFit b = new BestFit();
        b.BestFitAlgo();
    }
}
For now, jobs are allocated to memory blocks by FCFS, but the problem is that the next job can't enter the memory list (where all the blocks are) until the previous job is done. So there are 9 free memory blocks every time a job enters.
How do I make it so that jobs can enter the blocks simultaneously (on the condition that the desired memory block is not occupied, and based on arrival time)?
I know how FCFS works, but only with one memory block. I've been googling all day trying to find out how FCFS works with multiple memory blocks, but to no avail.
I hope someone can help me understand how it works, and maybe give a hint on how to implement it in code.
Thanks in advance.
EDIT: I put my code in instead so anyone can get a clear view of my problem.
I have a functional interface in Java 8:
public interface IFuncLambda1 {
    public int someInt();
}
in main:
IFuncLambda1 iFuncL1 = () -> 5;
System.out.println("\niFuncL1.someInt: " + iFuncL1.someInt());
iFuncL1 = () -> 1;
System.out.println("iFuncL1.someInt: " + iFuncL1.someInt());
Running this will yield:
iFuncL1.someInt: 5
iFuncL1.someInt: 1
Is this functionality OK as it is? Is it intended?
If the overriding were done in an implementing class, and the implementation changed at some point, then in every place that method is called the behaviour would be the same; we would have consistency. But if I change the behaviour/implementation through lambda expressions as in the example, the behaviour is only valid until the next change later in the flow. This feels unreliable and hard to follow.
EDIT:
@assylias I don't see how someInt() has its behaviour changed...
What if I added a param to someInt and had this code:
IFuncLambda1 iFuncL1 = (x) -> x - 1;
System.out.println("\niFuncL1.someInt: " + iFuncL1.someInt(var));
iFuncL1 = (x) -> x + 1;
System.out.println("iFuncL1.someInt: " + iFuncL1.someInt(var));
with var even being final, how would you rewrite that with classes?
In your example, () -> 5 is one object and () -> 1 is another object. You happen to use the same variable to refer to them, but that is just how references work in Java.
By the way, it behaves exactly the same way as if you had used anonymous classes:
IFuncLambda1 iFuncL1 = new IFuncLambda1() { public int someInt() { return 5; } };
System.out.println("\niFuncL1.someInt: " + iFuncL1.someInt());
iFuncL1 = new IFuncLambda1() { public int someInt() { return 1; } };
System.out.println("iFuncL1.someInt: " + iFuncL1.someInt());
Or using "normal" classes:
public static class A implements IFuncLambda1 {
    private final int i;

    public A(int i) { this.i = i; }

    public int someInt() { return i; }
}
IFuncLambda1 iFuncL1 = new A(5);
System.out.println("\niFuncL1.someInt: " + iFuncL1.someInt());
iFuncL1 = new A(1);
System.out.println("iFuncL1.someInt: " + iFuncL1.someInt());
There again, there are two instances of A, but you lose the reference to the first instance when you reassign iFuncL1.
I wrote a simple import/export application that transforms data from source->destination using EntityFramework and AutoMapper. It basically:
selects batchSize of records from the source table
'maps' data from source->destination entity
add new destination entities to destination table and saves context
I move around 500k records in under 5 minutes. After I refactored the code using generics, the performance dropped drastically to 250 records in 5 minutes.
Are my delegates that return DbSet<T> properties on the DbContext causing these problems? Or is something else going on?
Fast non-generic code:
public class Importer
{
    public void ImportAddress()
    {
        const int batchSize = 50;
        int done = 0;
        var src = new SourceDbContext();
        var count = src.Addresses.Count();
        while (done < count)
        {
            using (var dest = new DestinationDbContext())
            {
                var list = src.Addresses.OrderBy(x => x.AddressId).Skip(done).Take(batchSize).ToList();
                list.ForEach(x => dest.Address.Add(Mapper.Map<Addresses, Address>(x)));
                done += batchSize;
                dest.SaveChanges();
            }
        }
        src.Dispose();
    }
}
(Very) slow generic code:
public class Importer<TSourceContext, TDestinationContext>
    where TSourceContext : DbContext
    where TDestinationContext : DbContext
{
    public void Import<TSourceEntity, TSourceOrder, TDestinationEntity>(
        Func<TSourceContext, DbSet<TSourceEntity>> getSourceSet,
        Func<TDestinationContext, DbSet<TDestinationEntity>> getDestinationSet,
        Func<TSourceEntity, TSourceOrder> getOrderBy)
        where TSourceEntity : class
        where TDestinationEntity : class
    {
        const int batchSize = 50;
        int done = 0;
        var ctx = Activator.CreateInstance<TSourceContext>();
        //Does this getSourceSet delegate cause problems perhaps?
        //Added this
        var set = getSourceSet(ctx);
        var count = set.Count();
        while (done < count)
        {
            using (var dctx = Activator.CreateInstance<TDestinationContext>())
            {
                var list = set.OrderBy(getOrderBy).Skip(done).Take(batchSize).ToList();
                //Or is the db-side paging mechanism broken by the getSourceSet delegate?
                //Added this
                var destSet = getDestinationSet(dctx);
                list.ForEach(x => destSet.Add(Mapper.Map<TSourceEntity, TDestinationEntity>(x)));
                done += batchSize;
                dctx.SaveChanges();
            }
        }
        ctx.Dispose();
    }
}
The problem is how many times you invoke the Func delegates. Cache the resulting values in variables and it'll be fine.
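Here is a minimal standalone sketch of the principle (it uses a deliberately expensive stand-in delegate of my own, not the EF code itself):

using System;
using System.Diagnostics;
using System.Linq;

class DelegateCachingDemo
{
    // Stand-in for an expensive factory delegate such as getSourceSet (hypothetical).
    static readonly Func<int[]> GetSource = () => Enumerable.Range(0, 1_000_000).ToArray();

    static void Main()
    {
        var sw = Stopwatch.StartNew();
        long sum1 = 0;
        for (int i = 0; i < 100; i++)
        {
            sum1 += GetSource()[i]; // delegate re-invoked on every iteration
        }
        Console.WriteLine($"Re-invoked: {sw.ElapsedMilliseconds} ms (sum {sum1})");

        sw.Restart();
        var cached = GetSource(); // invoked once, result cached in a variable
        long sum2 = 0;
        for (int i = 0; i < 100; i++)
        {
            sum2 += cached[i];
        }
        Console.WriteLine($"Cached:     {sw.ElapsedMilliseconds} ms (sum {sum2})");
    }
}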
I faced a rather stupid performance issue in my code. After a small investigation, I found that the AsQueryable method I used to cast my generic list slows the code down by up to 8000 times.
So the question is: why is that?
Here is the example
class Program
{
    static void Main(string[] args)
    {
        var c = new ContainerTest();
        c.FillList();

        var s = Environment.TickCount;
        for (int i = 0; i < 10000; ++i)
        {
            c.TestLinq(true);
        }
        var e = Environment.TickCount;
        Console.WriteLine("TestLinq AsQueryable - {0}", e - s);

        s = Environment.TickCount;
        for (int i = 0; i < 10000; ++i)
        {
            c.TestLinq(false);
        }
        e = Environment.TickCount;
        Console.WriteLine("TestLinq as List - {0}", e - s);

        Console.WriteLine("Press enter to finish");
        Console.ReadLine();
    }
}
class ContainerTest
{
    private readonly List<int> _list = new List<int>();
    private IQueryable<int> _q;

    public void FillList()
    {
        _list.Clear();
        for (int i = 0; i < 10; ++i)
        {
            _list.Add(i);
        }
        _q = _list.AsQueryable();
    }

    public Tuple<int, int> TestLinq(bool useAsQ)
    {
        var upperBorder = useAsQ ? _q.FirstOrDefault(i => i > 7) : _list.FirstOrDefault(i => i > 7);
        var lowerBorder = useAsQ ? _q.TakeWhile(i => i < 7).LastOrDefault() : _list.TakeWhile(i => i < 7).LastOrDefault();
        return new Tuple<int, int>(upperBorder, lowerBorder);
    }
}
UPD: As I understand it, I have to avoid the AsQueryable method as much as possible (unless IQueryable is already in the container's inheritance chain), because otherwise I'll immediately get a performance issue.
"and avoid the moor in those hours of darkness when the powers of evil are exalted"
Just faced the same issue.
The thing is that IQueryable<T> takes an Expression<Func<T, bool>> as the filtering parameter in Where()/FirstOrDefault() calls, as opposed to the plain pre-compiled Func<T, bool> delegate taken by the corresponding IEnumerable methods.
That means there will be a compile phase to transform the Expression into a delegate, and it costs quite a lot.
Need that in a loop (just as I did)? You'll get into some trouble...
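To see just that compilation cost in isolation, here is a small standalone sketch (my own illustration, not the original code):

using System;
using System.Diagnostics;
using System.Linq.Expressions;

class ExpressionCompileCost
{
    static void Main()
    {
        // What IQueryable methods receive: an expression tree, not a delegate.
        Expression<Func<int, bool>> expr = i => i > 7;

        var sw = Stopwatch.StartNew();
        for (int i = 0; i < 10000; i++)
        {
            var compiled = expr.Compile(); // the hidden per-call compile step
            compiled(8);
        }
        Console.WriteLine($"Compile every time: {sw.ElapsedMilliseconds} ms");

        sw.Restart();
        var once = expr.Compile(); // compile once, reuse the delegate
        for (int i = 0; i < 10000; i++)
        {
            once(8);
        }
        Console.WriteLine($"Compile once:       {sw.ElapsedMilliseconds} ms");
    }
}

If you must query an IQueryable in a hot loop, compiling the expression once and reusing the resulting delegate sidesteps the repeated cost.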
PS: It seems .NET Core/.NET 5 improves this significantly. Unfortunately, our projects are not there yet...
At least use LINQ with the List too; a manual implementation will always be faster than LINQ.
EDIT
You know that the two tests don't give the same result?
Because AsQueryable returns an IQueryable, which has a completely different set of extension methods for the LINQ standard query operators than the set intended for things like List.
Queryable collections are meant to have a backing store such as an RDBMS or something similar, and you build a different, more complex expression tree when you call IQueryable.FirstOrDefault() as opposed to List<>.FirstOrDefault().
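A small sketch of how overload resolution picks the different operators (my own illustration):

using System;
using System.Collections.Generic;
using System.Linq;

class OverloadResolutionDemo
{
    static void Main()
    {
        var list = new List<int> { 1, 2, 3, 8, 9 };

        // List<int> binds to Enumerable.FirstOrDefault(Func<int, bool>):
        // the lambda is compiled to a ready-to-run delegate.
        int fromList = list.FirstOrDefault(i => i > 7);

        // IQueryable<int> binds to Queryable.FirstOrDefault(Expression<Func<int, bool>>):
        // the lambda becomes an expression tree that has to be interpreted or compiled.
        IQueryable<int> q = list.AsQueryable();
        int fromQuery = q.FirstOrDefault(i => i > 7);

        Console.WriteLine("{0} {1}", fromList, fromQuery); // 8 8
    }
}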