I am using Ignite's Data Grid and wanted to test the off-heap tiered mode. I have 1 server and 1 client in the grid, on different machines. Here are the steps that I follow to create the cache:
Start the server on one node.
Start the client on another node (using the discovery SPI to connect to the server), create a cache along with a near cache, and load 10,000 entries into the cache.
The cache memory mode is OFFHEAP_TIERED and the off-heap memory limit is set to zero (i.e. unlimited) using CacheConfiguration#setOffHeapMaxMemory.
Open the Ignite CLI (visor) and check the number of entries stored off heap and the ones stored on heap.
The strange thing that I encounter is that not even a single entry is stored off heap. The visor shows all the entries on the client and on the server being stored on heap. But if I do not use a near cache, then all the entries are stored off heap.
I want to know whether this is a problem with the statistics shown by the visor, or whether there is a change in how Ignite stores entries when a near cache is enabled.
This is my client-side code:
import java.util.Arrays;

import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.Ignition;
import org.apache.ignite.cache.CacheMemoryMode;
import org.apache.ignite.cache.CachePeekMode;
import org.apache.ignite.configuration.CacheConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;
import org.apache.ignite.configuration.NearCacheConfiguration;
import org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi;
import org.apache.ignite.spi.discovery.tcp.ipfinder.vm.TcpDiscoveryVmIpFinder;

public class IgniteClient {
    public static void main(String[] args) {
        TcpDiscoveryVmIpFinder ipFinder = new TcpDiscoveryVmIpFinder();
        // IP has not been shown intentionally
        ipFinder.setAddresses(Arrays.asList("*.*.*.*"));
        TcpDiscoverySpi spi = new TcpDiscoverySpi();
        spi.setIpFinder(ipFinder);

        IgniteConfiguration icfg = new IgniteConfiguration();
        icfg.setMetricsUpdateFrequency(-1);
        icfg.setClientMode(true);
        // Connect to the server through the discovery SPI configured above
        icfg.setDiscoverySpi(spi);
        Ignite grid = Ignition.start(icfg);

        CacheConfiguration<Integer, String> ccfg = new CacheConfiguration<>();
        NearCacheConfiguration<Integer, String> ncfg = new NearCacheConfiguration<>();
        ccfg.setMemoryMode(CacheMemoryMode.OFFHEAP_TIERED);
        ccfg.setOffHeapMaxMemory(0);
        ccfg.setName("data");
        ncfg.setNearStartSize(1000);
        IgniteCache<Integer, String> dataCache = grid.getOrCreateCache(ccfg, ncfg);

        for (int i = 1; i <= 10000; i++) {
            dataCache.put(i, Integer.toString(i));
        }
        System.out.println("The entries in data cache are " + dataCache.size(CachePeekMode.ALL));
    }
}
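As a cross-check that does not depend on the visor, the cache can also report where its entries live. Here is a small sketch using CachePeekMode (an assumption of mine, not part of the original code; it would go after the load loop above and reuses the same dataCache):

// Where do the entries live, as seen from this node?
System.out.println("On-heap entries:    " + dataCache.size(CachePeekMode.ONHEAP));
System.out.println("Off-heap entries:   " + dataCache.size(CachePeekMode.OFFHEAP));
System.out.println("Near cache entries: " + dataCache.size(CachePeekMode.NEAR));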
This is my server-side code:
import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.IgniteConfiguration;

public class IgniteMain {
    public static void main(String[] args) {
        IgniteConfiguration icfg = new IgniteConfiguration();
        icfg.setMetricsUpdateFrequency(-1);
        Ignite grid = Ignition.start(icfg);
    }
}
This is the output of the 'cache' command in the Ignite visor, which is running on the client machine:
Time of the snapshot: 01/28/17, 18:23:41
+===================================================================================================================+
| Name(#) | Mode | Nodes | Entries (Heap / Off heap) | Hits | Misses | Reads | Writes |
+===================================================================================================================+
| data(#c0) | PARTITIONED | 2 | min: 10000 (10000 / 0) | min: 0 | min: 0 | min: 0 | min: 0 |
| | | | avg: 10000.00 (10000.00 / 0.00) | avg: 0.00 | avg: 0.00 | avg: 0.00 | avg: 0.00 |
| | | | max: 10000 (10000 / 0) | max: 0 | max: 0 | max: 0 | max: 0 |
+-------------------------------------------------------------------------------------------------------------------+
As you can see, the visor shows that all the entries are on the heap and none of them are stored off heap.
Also, if I create and load the cache from the server and then start the client (which does nothing), all the entries are stored off heap.
To add to this, there is other behavior which might throw more light on it.
After the steps above, if you start another server node, the new server node stores the cache entries in off-heap memory (assuming a backup is set).
When you run the client again to clear the existing cache and add the data again, then on the servers part of the data is on heap and part is off heap.
I investigated, and Ignite does work this way, as you have seen.
You can track this issue for a fix: https://issues.apache.org/jira/browse/IGNITE-4662
Or simply do not use a near cache.
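For reference, a minimal sketch of creating the same OFFHEAP_TIERED cache from the client without a near cache (based on the client code in the question; only the cache creation changes):

CacheConfiguration<Integer, String> ccfg = new CacheConfiguration<>();
ccfg.setName("data");
ccfg.setMemoryMode(CacheMemoryMode.OFFHEAP_TIERED);
ccfg.setOffHeapMaxMemory(0); // 0 = no limit on off-heap memory

// No NearCacheConfiguration is passed, so no near cache is created on the client
// and the entries end up in the off-heap tier on the server.
IgniteCache<Integer, String> dataCache = grid.getOrCreateCache(ccfg);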
We want to increase the speed of our bulk load.
We currently use Java to bulk load documents into Elasticsearch. We plan to import 10 million documents, each almost 8 MB in size. At the moment we can only import 400K documents per day, i.e. about 5 documents per second.
Our ES infrastructure is 3 master nodes with 4 GB ES_JAVA_OPTS (heap size), plus 2 data nodes and 2 client nodes with 2 GB of memory each. When we try to increase the bulk-load speed, we run into the heap-size issue. We set up the ES cluster on Kubernetes.
The disk I/O is below:
dd if=/dev/zero of=/data/tmp/test1.img bs=1G count=10 oflag=dsync
10737418240 bytes (11 GB) copied, 50.7528 s, 212 MB/s
dd if=/dev/zero of=/data/tmp/test2.img bs=512 count=100000 oflag=dsync
51200000 bytes (51 MB) copied, 336.107 s, 152 kB/s
Any advice for improving this?
for (int x = 0; x < 200000; x++) {
    BulkRequest bulkRequest = new BulkRequest();
    for (int k = 0; k < 50; k++) {
        Order order = generateOrder();
        IndexRequest indexRequest = new IndexRequest("orderpot", "orderpot");
        Object esDataMap = objectToMap(order);
        String source = JSONObject.valueToString(esDataMap);
        indexRequest.source(source, XContentType.JSON);
        bulkRequest.add(indexRequest);
    }
    rhlclient.bulk(bulkRequest, RequestOptions.DEFAULT);
}
This is where we run over the heap size.
It seems you need more memory for the data nodes. 10 million documents at 8 MB each will cost a lot of memory. You can reduce the memory of the master nodes and give it to the data nodes; master nodes need less memory than data nodes. And if there are no more nodes available, you can combine the client nodes with the data nodes, so more data nodes share the pressure.
Some other advice:
1. Disable refresh by setting index.refresh_interval to -1 and set index.number_of_replicas to 0 while indexing (a sketch of doing this from the Java client is shown at the end of this answer).
2. Set an explicit mapping for your index instead of relying on the default dynamic mapping. For example, some fields can be integer with no need for long, and some string fields only need text (or only keyword) instead of the default text plus keyword.
See the official guide on tuning for indexing speed: https://www.elastic.co/guide/en/elasticsearch/reference/master/tune-for-indexing-speed.html
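For point 1, here is a minimal sketch of toggling those settings around the bulk load, using the same Java high-level REST client as in the question (rhlclient and the orderpot index are taken from the question; the exact request classes may vary between Elasticsearch client versions, so treat this as an illustration rather than a drop-in solution):

import java.io.IOException;

import org.elasticsearch.action.admin.indices.settings.put.UpdateSettingsRequest;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.common.settings.Settings;

public final class BulkLoadSettings {
    // Relax the index settings while loading, restore them afterwards.
    static void setBulkLoadSettings(RestHighLevelClient client, boolean loading) throws IOException {
        UpdateSettingsRequest request = new UpdateSettingsRequest("orderpot");
        request.settings(Settings.builder()
                .put("index.refresh_interval", loading ? "-1" : "1s") // no refresh during the load
                .put("index.number_of_replicas", loading ? 0 : 1));   // no replicas during the load
        client.indices().putSettings(request, RequestOptions.DEFAULT);
    }
}

Call setBulkLoadSettings(rhlclient, true) before the loop from the question and setBulkLoadSettings(rhlclient, false) after it, so that refresh and replicas come back once the load has finished.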
I am trying to solve the following homework problem for an operating systems class:
The following processes are being scheduled using a preemptive, round robin scheduling algorithm. Each process is assigned a numerical priority, with a higher number indicating a higher relative priority.
In addition to the processes listed below, the system also has an idle task (which consumes no CPU resources and is identified as Pidle ). This task has priority 0 and is scheduled whenever the system has no other available processes to run.
The length of a time quantum is 10 units.
If a process is preempted by a higher-priority process, the preempted process is placed at the end of the queue.
+--------+----------+-------+---------+
| Thread | Priority | Burst | Arrival |
+--------+----------+-------+---------+
| P1     | 40       | 15    | 0       |
| P2     | 30       | 25    | 25      |
| P3     | 30       | 20    | 30      |
| P4     | 35       | 15    | 50      |
| P5     | 5        | 15    | 100     |
| P6     | 10       | 10    | 105     |
+--------+----------+-------+---------+
a. Show the scheduling order of the processes using a Gantt chart.
b. What is the turnaround time for each process?
c. What is the waiting time for each process?
d. What is the CPU utilization rate?
My question is: what role does priority play when we consider that this uses the round-robin algorithm? I have been thinking about it a lot, and what I have come up with is that priority only makes sense if it is considered at the time a process arrives, in order to decide whether it should preempt the currently running process. The reason I have concluded this is that if priority were checked at every context switch, the process with the highest priority would always run indefinitely and the other processes would starve. That goes against the idea of round robin, which ensures that no process executes for longer than one time quantum and that a process goes to the end of the queue after it executes.
Using this logic I have worked out the problem as such:
Could you please advise me whether I am on the right track about the role priority plays in this situation, and whether I am approaching it the right way?
I think you are on the wrong track. Round robin controls the run order within a priority. It is as if each priority has its own queue, and corresponding round robin scheduler. When a given priority’s queue is empty, the subsequent lower priority queues are considered. Eventually, it will hit idle.
If you didn’t process it this way, how would you prevent idle from eventually being scheduled, despite having actual work ready to go?
Most high-priority processes are reactive, that is, they execute for a short burst in response to an event, so for the most part they are not on a run/ready queue.
In code:
/* Pick the next process to run: scan the priority queues from high to low. */
void Next() {
    for (int i = PRIO_HI; i >= PRIO_LO; i--) {
        Proc *p;
        if ((p = prioq[i].head) != NULL) {
            Resume(p);
            /*NOTREACHED*/
        }
    }
    panic("Idle not on runq!");
}

/* The current process stops running: remove it from its queue and pick another. */
void Stop() {
    unlink(prioq + curp->prio, curp);
    Next();
}

/* A process becomes runnable: reset its quantum and append it to its priority's queue. */
void Start(Proc *p) {
    p->countdown = p->reload;
    append(prioq + p->prio, p);
    Next();
}

/* Timer tick: when the quantum expires, move the current process to the back of its queue. */
void Tick() {
    if (--(curp->countdown) == 0) {
        unlink(prioq + curp->prio, curp);
        Start(curp);
    }
}
I am working on writing some performance tests using Taurus & JMeter.
After executing a set of tests on some URLs, I see the stats on the console as below.
19:03:40 INFO: Percentiles:
+---------------+---------------+
| Percentile, % | Resp. Time, s |
+---------------+---------------+
| 95.0 | 2.731 |
+---------------+---------------+
19:03:40 INFO: Request label stats:
+--------------+--------+---------+--------+-------+
| label | status | succ | avg_rt | error |
+--------------+--------+---------+--------+-------+
| /v1/brands | OK | 100.00% | 2.730 | |
| /v1/catalogs | OK | 100.00% | 1.522 | |
+--------------+--------+---------+--------+-------+
I am wondering if there is a way to display other stats per URL (per request label), for example the percentile response time per URL.
Below are all the stats that can be captured by Taurus (according to the Taurus documentation), but I could not figure out the configuration required to display them on the console. I would appreciate any help.
label - is the sample group for which this CSV line presents the stats. Empty label means total of all labels
concurrency - average number of Virtual Users
throughput - total count of all samples
succ - total count of not-failed samples
fail - total count of failed samples
avg_rt - average response time
stdev_rt - standard deviation of response time
avg_ct - average connect time if present
avg_lt - average latency if present
rc_200 - counts for specific response codes
perc_0.0 .. perc_100.0 - percentile levels for response time, 0 is also minimum response time, 100 is maximum
bytes - total download size
Looking into the documentation on the Taurus Console Reporter, it is possible to amend only the following parameters:
modules:
  console:
    # disable console reporter
    disable: false  # default: auto
    # configure screen type
    screen: console
    # valid values are:
    # - console (ncurses-based dashboard, default for *nix systems)
    # - gui (window-based dashboard, default for Windows, requires Tkinter)
    # - dummy (text output into console for non-tty cases)
    dummy-cols: 140  # width for dummy screen
    dummy-rows: 35   # height for dummy screen
If you can understand and write Python code, you can try amending the reporting.py file, which is responsible for generating the stats and summary table. This is a good place to start:
def __report_summary_labels(self, cumulative):
    data = [("label", "status", "succ", "avg_rt", "error")]
    justify = {0: "left", 1: "center", 2: "right", 3: "right", 4: "left"}
    sorted_labels = sorted(cumulative.keys())
    for sample_label in sorted_labels:
        if sample_label != "":
            data.append(self.__get_sample_element(cumulative[sample_label], sample_label))
    table = SingleTable(data) if sys.stdout.isatty() else AsciiTable(data)
    table.justify_columns = justify
    self.log.info("Request label stats:\n%s", table.table)
Alternatively, you can use Online Interactive Reports or configure your JMeter test to use Grafana and InfluxDB as third-party metrics storage and visualisation systems.
I want to know if there is any way to limit CPU usage by user name in Windows. For example, there are 8 cores and I want to limit the global CPU usage of a user to 6 cores, so that they cannot run more than 6 serial jobs (each using one core).
In Linux this can be done via scripting, but I have not seen anything similar even with PowerShell scripts. Does that mean it cannot be done?
The keyword for this is Affinity.
Affinity is a bitmask, with bit 0 (value 1) representing the first core:
00000001 = first core
00000010 = second core
00000011 = first and second core
00000100 = third core
00000101 = first and third core
00000111 = first, second and third core
function Set-Affinity([string]$Username, [int[]]$core) {
    [int]$affinity = 0
    $core | %{ $affinity += [math]::pow(2, $_) }
    Get-Process -IncludeUserName | ?{ $_.UserName -eq $Username } | %{
        $_.ProcessorAffinity = $affinity
    }
}
Set-Affinity -username "TESTDOMAIN\TESTUSER" -core 0,1,2,3
I have implemented an Apache Pig script. When I execute the script it results in many mappers for a specific step, but has only one reducer for that step. Because of this condition (many mappers, one reducer) the Hadoop cluster is almost idle while the single reducer executes. In order to better use the resources of the cluster I would like to also have many reducers running in parallel.
Even if I set the parallelism in the Pig script using the SET DEFAULT_PARALLEL command, I still end up with only one reducer.
The part of the code causing the problem is the following:
SET DEFAULT_PARALLEL 5;
inputData = LOAD 'input_data.txt' AS (group_name:chararray, item:int);
inputDataGrouped = GROUP inputData BY (group_name);
-- The GeneratePairsUDF generates a bag containing pairs of integers, e.g. {(1, 5), (1, 8), ..., (8, 5)}
pairs = FOREACH inputDataGrouped GENERATE GeneratePairsUDF(inputData.item) AS pairs_bag;
pairsFlat = FOREACH pairs GENERATE FLATTEN(pairs_bag) AS (item1:int, item2:int);
The 'inputData' and 'inputDataGrouped' aliases are computed in the mapper.
The 'pairs' and 'pairsFlat' aliases are computed in the reducer.
If I change the script by removing the line with the FLATTEN command (pairsFlat = FOREACH pairs GENERATE FLATTEN(pairs_bag) AS (item1:int, item2:int);) then the execution results in 5 reducers (and thus in a parallel execution).
It seems that the FLATTEN command is the problem and prevents many reducers from being created.
How can I achieve the same result as FLATTEN, but with the script executed in parallel (with many reducers)?
Edit:
EXPLAIN plan when having two FOREACH (as above):
Map Plan
inputDataGrouped: Local Rearrange[tuple]{chararray}(false) - scope-32
| |
| Project[chararray][0] - scope-33
|
|---inputData: New For Each(false,false)[bag] - scope-29
| |
| Cast[chararray] - scope-24
| |
| |---Project[bytearray][0] - scope-23
| |
| Cast[int] - scope-27
| |
| |---Project[bytearray][1] - scope-26
|
|---inputData: Load(file:///input_data.txt:org.apache.pig.builtin.PigStorage) - scope-22--------
Reduce Plan
pairsFlat: Store(fakefile:org.apache.pig.builtin.PigStorage) - scope-42
|
|---pairsFlat: New For Each(true)[bag] - scope-41
| |
| Project[bag][0] - scope-39
|
|---pairs: New For Each(false)[bag] - scope-38
| |
| POUserFunc(GeneratePairsUDF)[bag] - scope-36
| |
| |---Project[bag][1] - scope-35
| |
| |---Project[bag][1] - scope-34
|
|---inputDataGrouped: Package[tuple]{chararray} - scope-31--------
Global sort: false
EXPLAIN plan when having only one FOREACH with FLATTEN wrapping the UDF:
Map Plan
inputDataGrouped: Local Rearrange[tuple]{chararray}(false) - scope-29
| |
| Project[chararray][0] - scope-30
|
|---inputData: New For Each(false,false)[bag] - scope-26
| |
| Cast[chararray] - scope-21
| |
| |---Project[bytearray][0] - scope-20
| |
| Cast[int] - scope-24
| |
| |---Project[bytearray][1] - scope-23
|
|---inputData: Load(file:///input_data.txt:org.apache.pig.builtin.PigStorage) - scope-19--------
Reduce Plan
pairs: Store(fakefile:org.apache.pig.builtin.PigStorage) - scope-36
|
|---pairs: New For Each(true)[bag] - scope-35
| |
| POUserFunc(GeneratePairsUDF)[bag] - scope-33
| |
| |---Project[bag][1] - scope-32
| |
| |---Project[bag][1] - scope-31
|
|---inputDataGrouped: Package[tuple]{chararray} - scope-28--------
Global sort: false
There is no guarantee that Pig uses the DEFAULT_PARALLEL value for every step in the Pig script. Try PARALLEL on the specific join/group step that you feel is taking the time (in your case the GROUP step):
inputDataGrouped = GROUP inputData BY (group_name) PARALLEL 67;
If it is still not working, then you might have to look at your data for a skew issue.
I think there is skew in the data: only a small number of mappers are producing exponentially large output. Look at the distribution of keys in your data, e.g. the data may contain only a few groups with a very large number of records.
I tried "set default parallel" and "PARALLEL 100" but no luck. Pig still uses 1 reducer.
It turned out I have to generate a random number from 1 to 100 for each record and group these records by that random number.
We are wasting time on grouping, but it is much faster for me because now I can use more reducers.
Here is the code (SUBMITTER is my own UDF):
tmpRecord = FOREACH record GENERATE (int)(RANDOM()*100.0) as rnd, data;
groupTmpRecord = GROUP tmpRecord BY rnd;
result = FOREACH groupTmpRecord GENERATE FLATTEN(SUBMITTER(tmpRecord));
To answer your question, we must first know how many reducers Pig enforces to accomplish the global rearrange process. As per my understanding, the GENERATE/projection should not require a single reducer, but I cannot say the same thing about FLATTEN. However, we know from common sense that during a flatten the aim is to un-nest the tuples from bags (and vice versa), and to do that all the tuples belonging to a bag must be available in the same reducer. I might be wrong, but can anyone add something here to give this user an answer, please?