Disk fragmentation calculation

So if I have a disk with 10 free spaces and add two files, A with size 3 and B with size 4:
A A A B B B B - - -
After removing A, - - - B B B - - -
Is B now fragmented or not?

No.
B will stay fragmented if it was fragmented before, and it will stay unfragmented if it was unfragmented before (the latter is the case in your example).
Removing a different file or object generally does not affect the fragmentation of an object unless the object itself is moved.
(Note that B was truncated to length 3 in your example as a consequence of deleting A; this does not normally happen either.)
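To make the answer concrete, here is a minimal sketch (a hypothetical block layout, not any real filesystem's on-disk format): a file is fragmented exactly when its block list is not one contiguous run, and deleting another file only clears bits in the free-space map, leaving that list untouched.

    #include <stdbool.h>
    #include <stdio.h>

    /* A file is fragmented iff its blocks do not form one contiguous run. */
    static bool is_fragmented(const int *blocks, int n) {
        for (int i = 1; i < n; i++)
            if (blocks[i] != blocks[i - 1] + 1)
                return true;
        return false;
    }

    int main(void) {
        /* B occupies blocks 3..6, right after A's blocks 0..2. */
        int b_blocks[] = {3, 4, 5, 6};
        printf("B fragmented: %d\n", is_fragmented(b_blocks, 4)); /* prints 0 */
        /* Deleting A merely marks blocks 0..2 free; b_blocks is not
           modified, so B's fragmentation state cannot change. */
        return 0;
    }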

Are all processor cores in a cache-coherent system required to see the same value of shared data at any point in time?

From what I've learnt, cache coherence is defined by the following three requirements:
A read R from an address X on a core C returns the value written by the most recent write W to X on C, provided no other core has written to X between W and R.
If a core C1 writes to X and a core C2 reads after a sufficient time, and there are no other writes in between, C2's read returns the value from C1's write.
Writes to the same location are serialized: any two writes to X must be seen to occur in the same order on all cores.
As far as I understand these rules, they basically require all threads to see updates made by other threads within some reasonable time and in the same order, but there seems to be no requirement about seeing the same data at any point in time. For example, say thread A wrote a value to a shared memory location X, then thread B wrote another value to X. Threads C and D reading from X must see the same order of updates: A, B. Imagine that thread C has already seen both updates A and B, while thread D has only observed A (the event B is yet to be seen). Provided that the time interval between writes to X and reads from X is small enough (less than what we consider a sufficient time), this situation doesn't violate any rules of coherence, does it?
On the other hand, coherence protocols such as MSI use write invalidation to guarantee that all cores have an up-to-date value of a shared variable. Wikipedia says: "The intention is that two clients must never see different values for the same shared data". If what I wrote about the coherence rules is true, I don't understand where this point comes from. I mean, I realize it's useful, but I don't see where it is defined.
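For what it's worth, the scenario in the question can be written down as a sketch (C11 atomics and pthreads, purely illustrative): write serialization constrains the order in which the two writes are observed, but nothing forces two readers to be in sync at a single instant.

    #include <pthread.h>
    #include <stdatomic.h>
    #include <stdio.h>

    atomic_int x;                 /* the shared location X */

    void *writer(void *val) {     /* threads A and B from the question */
        atomic_store(&x, (int)(long)val);
        return NULL;
    }

    void *reader(void *id) {      /* threads C and D from the question */
        int first  = atomic_load(&x);
        int second = atomic_load(&x);
        /* Write serialization forbids one reader observing 1 -> 2 while
           the other observes 2 -> 1: all cores must see the writes to X
           in the same order. It does not forbid C having already seen
           the later write at an instant when D has not. */
        printf("reader %ld saw %d then %d\n", (long)id, first, second);
        return NULL;
    }

    int main(void) {
        pthread_t t[4];
        pthread_create(&t[0], NULL, writer, (void *)1L);
        pthread_create(&t[1], NULL, writer, (void *)2L);
        pthread_create(&t[2], NULL, reader, (void *)1L);
        pthread_create(&t[3], NULL, reader, (void *)2L);
        for (int i = 0; i < 4; i++)
            pthread_join(t[i], NULL);
        return 0;
    }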

Dirty bit value after changing data back to its original state

If the value in some part of the cache is 4 and we change it to 5, that sets the dirty bit for that data to 1. But if we then set the value back to 4, will the dirty bit stay 1 or change back to 0?
I am interested in this because such a mechanism would allow a higher-level optimization of read-write traffic between main memory and the cache.
For a cache to work the way you describe, it would need to reserve half of its data space to store the old values.
Caches are expensive precisely because they have a high cost per bit, and consider that:
That mechanism would only detect a two-level write history, A -> B -> A, and nothing deeper (like A -> B -> C -> A).
Every write would imply copying the current values into the old-value store.
The minimum amount of taggable data in a cache is the line, and the whole line would need to be changed back to its original value. Given that a line is on the order of 64 bytes, that is very unlikely to happen.
The hierarchical structure of caches (L1, L2, L3, ...) is there exactly to mitigate the cost of eviction.
The solution you propose has little benefit compared to its costs and thus is not implemented.
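A minimal sketch of the usual behavior (a toy write-back line, not any real cache design): the dirty bit records "this line was written", not "this line differs from memory", because the latter would require keeping a second copy of the line and comparing on every store.

    #include <stdbool.h>
    #include <stdint.h>

    #define LINE_SIZE 64

    struct cache_line {
        uint8_t data[LINE_SIZE];
        bool dirty;
    };

    /* Every store sets the dirty bit unconditionally; detecting a
       revert (4 -> 5 -> 4) would need a 64-byte copy of the original
       line plus a full comparison on each write. */
    static void line_write(struct cache_line *l, int off, uint8_t v) {
        l->data[off] = v;
        l->dirty = true;
    }

    int main(void) {
        struct cache_line l = { .data = {4}, .dirty = false };
        line_write(&l, 0, 5);    /* 4 -> 5: dirty becomes 1 */
        line_write(&l, 0, 4);    /* back to 4: dirty stays 1 */
        return l.dirty ? 0 : 1;  /* line is still written back on eviction */
    }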

Ext2 File system Block bitmap

I was reading Ext2 file system details, and I am not clear on why the number of blocks in a block group is b x 8, where b is the block size.
How did they arrive at this figure? What is the significance of 8?
For each block group in an ext2 filesystem there is a block bitmap, which keeps track of which blocks are used (bit equal to 1) and which are still free (bit equal to 0). This structure is designed to occupy exactly one block. Hence, the number of bits in the block bitmap equals b x 8, where b is the block size expressed in bytes.
Blocks in the group must not outnumber the bits in the block bitmap; otherwise we would not be able to keep information on their availability. At the same time, we want groups to manage the maximal possible number of blocks in order to limit the space occupied by metadata. Therefore, the number of blocks in a group equals that maximum: b x 8.
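As a quick sanity check of the arithmetic: with b-byte blocks, the one-block bitmap holds b x 8 bits, so a group manages b x 8 blocks and spans b x b x 8 bytes.

    #include <stdio.h>

    int main(void) {
        long sizes[] = {1024, 2048, 4096};      /* common ext2 block sizes */
        for (int i = 0; i < 3; i++) {
            long b = sizes[i];
            long blocks_per_group = b * 8;      /* bits in one bitmap block */
            long group_bytes = blocks_per_group * b;
            printf("block size %4ld B -> %5ld blocks/group -> %3ld MiB per group\n",
                   b, blocks_per_group, group_bytes / (1024 * 1024));
        }
        return 0;
    }

For 4096-byte blocks this gives 32768 blocks per group, i.e. 128 MiB groups.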

Which memory access pattern is more efficient for a cached GPU?

So let's say I have a global array in memory:
|a|b|c| |e|f|g| |i|j|k| |
There are four 'threads' (local work items in OpenCL) accessing this memory, and two possible patterns for this access (columns are time slices, rows are threads):
     0 -> 1 -> 2 -> 3
t1   a -> b -> c -> .
t2   e -> f -> g -> .
t3   i -> j -> k -> .
t4   . -> . -> . -> .
The above pattern splits the array into blocks, with each thread iterating to and accessing the next element in its block per time slice. I believe this sort of access would work well for CPUs because it maximizes cache locality per thread. Also, loops utilizing this pattern can be easily unrolled by the compiler.
The second pattern:
     0 -> 1 -> 2 -> 3
t1   a -> e -> i -> .
t2   b -> f -> j -> .
t3   c -> g -> k -> .
t4   . -> . -> . -> .
The above pattern accesses memory in strides: for example, thread 1 accesses a, then e, then i, etc. This maximizes cache locality per unit time. Consider 64 work-items 'striding' at any given time slice. With a cache-line size of 64 bytes and elements of sizeof(float), work-items 1-16's reads are all served by the cache line brought in for work-item 1's read. The data width/count per cell (where 'a' is a cell from above) has to be chosen carefully to avoid misaligned access. These loops don't seem to unroll as easily (or at all, using Intel's Kernel Builder with the CPU). I believe this pattern would work well on a GPU.
I'm targeting GPUs with cache hierarchies. Specifically AMD's latest architecture (GCN). Is the second access pattern an example of 'coalescing'? Am I wrong in my thought process somewhere?
I think the answer depends on whether the accesses are to global or local memory. If you are pulling the data from global memory, then you need to worry about coalescing the reads (i.e. contiguous blocks, as in your second example). However, if you are pulling the data from local memory, then you need to worry about bank conflicts. I have some but not a lot of experience, so I'm not stating this as absolute truth.
Edit: After reading up on GCN, I don't think the caches make a difference here. You can basically think of them as just speeding up global memory if you repeatedly read/write the same elements. On a side note, thanks for asking the question, because reading up on the new architecture is pretty interesting.
Edit 2: Here's a nice Stack Overflow discussion of banks for local and global memory: Why aren't there bank conflicts in global memory for Cuda/OpenCL?
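To make the two patterns concrete: they differ only in the index each work-item computes per time slice. A sketch in plain C (standing in for the OpenCL kernel body; names and sizes are hypothetical):

    #include <stdio.h>

    #define N_THREADS 64   /* work-items per group */
    #define CHUNK 4        /* elements per work-item */

    /* Pattern 1 (blocked): work-item t walks its own contiguous chunk.
       Good per-thread locality on a CPU. */
    static int blocked_index(int t, int step) { return t * CHUNK + step; }

    /* Pattern 2 (strided): at each step the group touches N_THREADS
       consecutive elements. With 4-byte floats and 64-byte lines,
       work-items 1-16 fall in one cache line, 17-32 in the next, etc.,
       which is what GPU memory controllers coalesce into wide
       transactions. */
    static int strided_index(int t, int step) { return step * N_THREADS + t; }

    int main(void) {
        for (int t = 0; t < 4; t++)  /* first few work-items, time slice 0 */
            printf("t%d: blocked -> %3d, strided -> %3d\n",
                   t + 1, blocked_index(t, 0), strided_index(t, 0));
        return 0;
    }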

Dijkstra's Bankers Algorithm

Could somebody please provide a step-by-step approach to solving the following problem using the Banker's Algorithm? How do I determine whether a "safe state" exists? What is meant by a process being able to "run to completion"?
In this example, I have four processes and 10 instances of the same resource.
          Resources Allocated | Resources Needed
Process A           1         |        6
Process B           1         |        5
Process C           2         |        4
Process D           4         |        7
Per Wikipedia,
A state (as in the above example) is considered safe if it is possible for all processes to finish executing (terminate). Since the system cannot know when a process will terminate, or how many resources it will have requested by then, the system assumes that all processes will eventually attempt to acquire their stated maximum resources and terminate soon afterward. This is a reasonable assumption in most cases since the system is not particularly concerned with how long each process runs (at least not from a deadlock avoidance perspective). Also, if a process terminates without acquiring its maximum resources, it only makes it easier on the system.
A process can run to completion when the number of each type of resource that it needs is available, between itself and the system. If a process needs 8 units of a given resource, and has allocated 5 units, then it can run to completion if there are at least 3 more units available that it can allocate.
Given your example, the system is managing a single resource, with 10 units available. The running processes have already allocated 8 (1+1+2+4) units, so there are 2 units left. The amount that any process needs to complete is its maximum less whatever it has already allocated, so at the start, A needs 5 more (6-1), B needs 4 more (5-1), C needs 2 more (4-2), and D needs 3 more (7-4). There are 2 available, so Process C is allowed to run to completion, thus freeing up 2 units (leaving 4 available). At this point, either B or D can be run (we'll assume D). Once D has completed, there will be 8 units available, after which either A or B can be run (we'll assume A). Once A has completed, there will be 9 units available, and then B can be run, which will leave all 10 units left for further work. Since we can select an ordering of processes that will allow all processes to be run, the state is considered 'safe'.
          Resources Allocated | Resources Needed | Remaining claim
Process A           1         |        6         |        5
Process B           1         |        5         |        4
Process C           2         |        4         |        2
Process D           4         |        7         |        3
Total resources allocated is 8
Hence 2 resources are still unallocated, and these are given to process C. After finishing, process C releases 4 resources, which can be given to process B. Process B, after finishing, releases 5 resources, which are allocated to process A. Finally, process A, after finishing, releases 6 resources, more than enough for process D, which needs 3 more. So the order C, B, A, D runs every process to completion, and the state is safe.
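The two walkthroughs above are exactly the safety algorithm; here is a minimal single-resource sketch of it:

    #include <stdbool.h>
    #include <stdio.h>

    #define NPROC 4

    int main(void) {
        int allocated[NPROC] = {1, 1, 2, 4};   /* A, B, C, D */
        int maximum[NPROC]   = {6, 5, 4, 7};
        int available = 10 - (1 + 1 + 2 + 4);  /* = 2 */
        bool done[NPROC] = {false};

        /* Repeatedly pick any process whose remaining claim
           (maximum - allocated) fits in the available pool, let it run
           to completion, and reclaim its resources. The state is safe
           iff every process can be retired this way. */
        int finished = 0;
        bool progress = true;
        while (progress) {
            progress = false;
            for (int i = 0; i < NPROC; i++) {
                if (!done[i] && maximum[i] - allocated[i] <= available) {
                    /* It borrows its claim and returns its maximum,
                       so the pool grows by its current allocation. */
                    available += allocated[i];
                    done[i] = true;
                    finished++;
                    progress = true;
                    printf("run %c to completion, %d now available\n",
                           'A' + i, available);
                }
            }
        }
        puts(finished == NPROC ? "state is SAFE" : "state is UNSAFE");
        return 0;
    }

On this input it retires C first, then D, A, and B, matching the first answer's ordering and reporting a safe state.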
