How much memory of a process is paged out?

Is there a performance counter that indicates how much of a specific process's memory is paged out? I have a server with 40 GB of available RAM (out of 128 GB physical memory), but the amount of paged-out data is over 100 GB.
How can I find out which of my processes are responsible for that huge page file consumption?
Some xperf tracing that shows when the page-out activity happens would also be fine. But apart from many writes to the page file, I cannot see which processes' memory is being written to the page file.
Reference Set tracing, as far as I understand it, only shows me how big the physical memory consumption of my process is. It does not seem to track page-out activity.
Update
The OS is Windows Server 2012 R2

The ETW provider "Microsoft-Windows-Kernel-Memory" has a keyword "KERNEL_MEM_KEYWORD_WS_SWAP" (0x80). It contains events that are raised when data is paged out/in:
<event value="4" symbol="WorkingSetOutSwapStart" version="0" task="WorkingSetOutSwap" opcode="win:Start" level="win:Informational" keywords="KERNEL_MEM_KEYWORD_WS_SWAP" template="WorkingSetOutSwapStartArgs"/>
<event value="4" symbol="WorkingSetOutSwapStart_V1" version="1" task="WorkingSetOutSwap" opcode="win:Start" level="win:Informational" keywords="KERNEL_MEM_KEYWORD_WS_SWAP" template="WorkingSetOutSwapStartArgs_V1"/>
<event value="5" symbol="WorkingSetOutSwapStop" version="0" task="WorkingSetOutSwap" opcode="win:Stop" level="win:Informational" keywords="KERNEL_MEM_KEYWORD_WS_SWAP" template="WorkingSetOutSwapStopArgs"/>
<event value="5" symbol="WorkingSetOutSwapStop_V1" version="1" task="WorkingSetOutSwap" opcode="win:Stop" level="win:Informational" keywords="KERNEL_MEM_KEYWORD_WS_SWAP" template="WorkingSetOutSwapStopArgs_V1"/>
<event value="6" symbol="WorkingSetInSwapStart" version="0" task="WorkingSetInSwap" opcode="win:Start" level="win:Informational" keywords="KERNEL_MEM_KEYWORD_WS_SWAP" template="WorkingSetOutSwapStartArgs"/>
<event value="6" symbol="WorkingSetInSwapStart_V1" version="1" task="WorkingSetInSwap" opcode="win:Start" level="win:Informational" keywords="KERNEL_MEM_KEYWORD_WS_SWAP" template="WorkingSetOutSwapStartArgs_V1"/>
<event value="7" symbol="WorkingSetInSwapStop" version="0" task="WorkingSetInSwap" opcode="win:Stop" level="win:Informational" keywords="KERNEL_MEM_KEYWORD_WS_SWAP" template="WorkingSetInSwapStopArgs"/>
The stop events carry some data, such as the number of pages processed (PagesProcessed):
<template tid="WorkingSetOutSwapStartArgs">
<data name="ProcessId" inType="win:UInt32"/>
</template>
<template tid="WorkingSetOutSwapStopArgs">
<data name="ProcessId" inType="win:UInt32"/>
<data name="Status" inType="win:HexInt32"/>
<data name="PagesProcessed" inType="win:UInt32"/>
</template>
<template tid="WorkingSetInSwapStopArgs">
<data name="ProcessId" inType="win:UInt32"/>
<data name="Status" inType="win:HexInt32"/>
</template>
<template tid="WorkingSetOutSwapStartArgs_V1">
<data name="ProcessId" inType="win:UInt32"/>
<data name="Flags" inType="win:HexInt32"/>
</template>
<template tid="WorkingSetOutSwapStopArgs_V1">
<data name="ProcessId" inType="win:UInt32"/>
<data name="Status" inType="win:HexInt32"/>
<data name="PagesProcessed" inType="win:Pointer"/>
<data name="WriteCombinePagesProcessed" inType="win:Pointer"/>
<data name="UncachedPagesProcessed" inType="win:Pointer"/>
<data name="CleanPagesProcessed" inType="win:Pointer"/>
</template>
Play with it and see whether it includes all the data you need.
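To capture these events with xperf, something along these lines should work (a sketch; the session name and output file are arbitrary, and 0x80 is the keyword mask from the manifest above):
xperf -start WsSwapSession -on Microsoft-Windows-Kernel-Memory:0x80
rem ... reproduce the page-out activity ...
xperf -stop WsSwapSession -d wsswap.etl
The resulting trace should show the WorkingSetOutSwap/InSwap events (e.g. under Generic Events in WPA), keyed by ProcessId.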

In xperf you want to look for hard faults - note that a hard fault is a type of page fault, but page faults can often be handled in software without touching the drive. You can add a column in Task Manager to show page faults for each process.
You can get some information on a process by using a tool like https://technet.microsoft.com/en-us/sysinternals/vmmap.aspx, which tells you, for each block of memory in the process address space, what type it is and how much is committed. However, it's the committed memory that can be paged out, and VirtualQueryEx() doesn't tell you about that.
It's also worth noting that a large quantity of paged-out memory isn't always a bad thing - it's the hard faults that are slow.
Edit: Hmm, if you want an intrusive one-off test, I guess there's the hacky option of combining VirtualQueryEx() and ReadProcessMemory() to touch every committed page in a process, so you can count the hard faults!
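For what it's worth, a minimal sketch of that hack could look like the following (untested; takes a PID on the command line, needs read access to the target, and simply touches one byte per committed page so that any paged-out page costs a hard fault you can watch in perfmon or Task Manager):
#include <windows.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    if (argc < 2) { fprintf(stderr, "usage: touchpages <pid>\n"); return 1; }

    HANDLE h = OpenProcess(PROCESS_QUERY_INFORMATION | PROCESS_VM_READ,
                           FALSE, (DWORD)atoi(argv[1]));
    if (!h) { fprintf(stderr, "OpenProcess failed: %lu\n", GetLastError()); return 1; }

    SYSTEM_INFO si;
    GetSystemInfo(&si);

    SIZE_T touched = 0;
    unsigned char *p = (unsigned char *)si.lpMinimumApplicationAddress;
    while (p < (unsigned char *)si.lpMaximumApplicationAddress) {
        MEMORY_BASIC_INFORMATION mbi;
        if (!VirtualQueryEx(h, p, &mbi, sizeof(mbi)))
            break;
        if (mbi.State == MEM_COMMIT &&
            !(mbi.Protect & (PAGE_GUARD | PAGE_NOACCESS))) {
            SIZE_T off;
            for (off = 0; off < mbi.RegionSize; off += si.dwPageSize) {
                unsigned char byte;
                SIZE_T got;
                /* Reading one byte is enough to fault a paged-out page back in. */
                if (ReadProcessMemory(h, (unsigned char *)mbi.BaseAddress + off,
                                      &byte, 1, &got))
                    ++touched;
            }
        }
        p = (unsigned char *)mbi.BaseAddress + mbi.RegionSize;
    }
    printf("Touched %lu pages.\n", (unsigned long)touched);
    CloseHandle(h);
    return 0;
}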

Related

How can you keep Kristen FPGA from implementing excess registers?

I am using Kristen to generate a Verilog FPGA host interface for a neuromorphic processor. I have implemented the basic host as follows:
<module name= "nmp" duplicate="1000">
<register name="start0" type="rdconst" mask="0xFFFFFFFF" default="0x00000000" description="Lower 64 bit start pointer of persitant NMP storage."></register>
<register name="start1" type="rdconst" mask="0xFFFFFFFF" default="0x00000020" description="Upper 64 bit start pointer of persitant NMP storage."></register>
<register name="size" type="rdconst" mask="0xFFFFFFFF" default="0x10000000" description="Size of NMP persitant storage in Mbytes."></register>
<register name="c_start0" type="rdconst" mask="0xFFFFFFFF" default="0x10000000" description="Lower 64 bit start pointer of cached shared storage."></register>
<register name="c_start1" type="rdconst" mask="0xFFFFFFFF" default="0x00000020" description="Upper 64 bit start pointer of cached shared storage."></register>
<register name="c_size" type="rdconst" mask="0xFFFFFFFF" default="0x10000000" description="Size of cached shared storage in Mbytes."></register>
<register name="row" type="rdwr" mask="0xFFFFFFFF" default="0x00000000" description="Configurable row location for this NMP."></register>
<register name="col" type="rdwr" mask="0xFFFFFFFF" default="0x00000000" description="Configurable col location for this NMP."></register>
<register name="threshold" type="rdwr" mask="0xFFFFFFFF" default="0x00000000" description="Configurable synaptic sum threshold for this instance."></register>
<memory name="learn" memsize="0x00004000" type="mem_ack" description="Learning interface - Map input synapsys to node intensity">
<field name="input_id" size="22b" description="Input ID this map data is intended for."></field>
<field name="scale" size="16b" description="The intensity scale for this input ID."></field>
</memory>
</module>
The end result is that I am seeing a ton of registers being generated, and I have to scale my NMP size down to fit within the constraints of my FPGA. Is there a way to control the number of registers generated here? Obviously I need to store settings for these different fields. Am I missing something?
I should add that I am trying to get to a 2048 scale on my NMP, but the best I can do is just over 1000 (not quite 1024). If I implement without PCIe or host control, I can get to 2048 without issue.
If I understand correctly, each NMP instance has been coded with an internal register to store its data, and the configuration you have shown results in Kristen creating Verilog registers as well. Effectively there is double-buffered storage occurring, so the number of registers is roughly double what it needs to be. One way of dealing with this is to use another RAM interface, 32 bits wide. I do note that your config calls for 9 x 32-bit words, which is an odd size for a memory, so there will be some wasted address space: Kristen creates RAMs on binary boundaries, so you get a 16 x 32-bit memory region that you can overlay on that interface. Then keep a second RAM, just like the one you already have for the learn memory.
<module>
<memory name="regs" memsize="0x10" type="mem_ack" description="Register mapping per NMP instance">
<field name "start0" size="32b" description="Start0"></field>
<field name "start1" size="32b" description="Start1"></field>
....
<field name "threshold" size="32b" description="Threshold"></field>
</memory>
<memory name="learn" memsize="0x00004000" type="mem_ack" description="Learning interface - Map input synapsys to node intensity">
<field name="input_id" size="22b" description="Input ID this map data is intended for."></field>
<field name="scale" size="16b" description="The intensity scale for this input ID."></field>
</memory>
</module>
Generate this and take a look at the new interface. That should reduce the number of registers generated in your Verilog code and the subsequent synthesis.

Device Management for non-oneM2M Devices?

I already discussed how to manage devices in oneM2M in another topic, but I noticed that I still have some misunderstandings.
What is the exact relation between MgmtObj and MgmtCmd? It seems MgmtObj keeps status, like the current software or firmware, the battery, device info, etc., and ObjectIds and ObjectPaths are used for mapping this information to a device management standard like LWM2M or TR-069. Is that correct?
I don't understand why a Node has multiple reboot objects in it.
Let's assume we have multiple different firmwares on a node, each controlling a different part of the hardware. Then I guess I should create a MgmtCmd for each firmware, but how does a MgmtCmd know which firmware (MgmtObj) it relates to? There is no link between them in the resource definitions in oneM2M. This actually comes back to my first question about the relationship between MgmtObj and MgmtCmd, because when a MgmtCmd runs and finishes its job, the related firmware should somehow be updated in the related Node.
Let's assume that I am not going to implement any device management standard like TR-069 or LWM2M. We are using non-oneM2M devices, which have their own proprietary way of device management. What is the simplest way to do that?
What we thought is that we should put some device management logic into the IPE (Interworking Proxy Entity), which can subscribe to all the events that occur in any related MgmtCmd for the devices, such as updates of its ExecEnabled status and the creation of an ExecInstance. The IPE is then notified with that ExecInstance and manages the whole procedure. Is it suitable to use the subscription/notification mechanism for device management?
The mgmtCmd resource represents a method to execute management procedures or to model commands and remote procedure calls (RPC) required by existing management protocols (e.g. BBF TR-069 [i.4]), and enables AEs to request management procedures to be executed on a remote entity. It also enables cancellation of cancellable and initiated but unfinished management procedures or commands.
The mgmtObj resource contains management data which enables individual M2M management functions. It provides a general structure to map to external management technology, e.g. OMA DM [i.5], BBF TR-069 [i.4] and LWM2M [i.6] data models. Each instance of a mgmtObj resource shall be mapped to a single external management technology.
-------------------------------- CLARIFICATION --------------------------------
When we look at the XSD of node, it contains child resources like:
List of firmwares
List of softwares
List of reboots
etc.
Actually, the example above is just made up; it's not a real-world scenario. I am trying to understand why a node can have multiple of those resources, like reboot and software; even multiple deviceInfo seems weird. What do they refer to?
<xs:schema xmlns="http://www.w3.org/2001/XMLSchema" targetNamespace="http://www.onem2m.org/xml/protocols"
xmlns:m2m="http://www.onem2m.org/xml/protocols" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
elementFormDefault="unqualified" xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:include schemaLocation="CDT-commonTypes-v3_9_0.xsd" />
<xs:include schemaLocation="CDT-memory-v3_9_0.xsd" />
<xs:include schemaLocation="CDT-battery-v3_9_0.xsd" />
<xs:include schemaLocation="CDT-areaNwkInfo-v3_9_0.xsd" />
<xs:include schemaLocation="CDT-areaNwkDeviceInfo-v3_9_0.xsd" />
<xs:include schemaLocation="CDT-firmware-v3_9_0.xsd" />
<xs:include schemaLocation="CDT-software-v3_9_0.xsd" />
<xs:include schemaLocation="CDT-deviceInfo-v3_9_0.xsd" />
<xs:include schemaLocation="CDT-deviceCapability-v3_9_0.xsd" />
<xs:include schemaLocation="CDT-reboot-v3_9_0.xsd" />
<xs:include schemaLocation="CDT-eventLog-v3_9_0.xsd" />
<xs:include schemaLocation="CDT-cmdhPolicy-v3_9_0.xsd" />
<xs:include schemaLocation="CDT-activeCmdhPolicy-v3_9_0.xsd" />
<xs:include schemaLocation="CDT-subscription-v3_9_0.xsd" />
<xs:include schemaLocation="CDT-semanticDescriptor-v3_9_0.xsd" />
<xs:include schemaLocation="CDT-transaction-v3_9_0.xsd"/>
<xs:include schemaLocation="CDT-schedule-v3_9_0.xsd"/>
<xs:element name="node" substitutionGroup="m2m:sg_announceableResource">
<xs:complexType>
<xs:complexContent>
<!-- Inherit common attributes for announceable Resources -->
<xs:extension base="m2m:announceableResource">
<!-- Resource Specific Attributes -->
<xs:sequence>
<xs:element name="nodeID" type="m2m:nodeID" />
<xs:element name="hostedCSELink" type="m2m:ID" minOccurs="0" />
<xs:element name="hostedAELinks" type="m2m:listOfM2MID" minOccurs="0" />
<xs:element name="hostedServiceLinks" type="m2m:listOfM2MID" minOccurs="0" />
<xs:element name="mgmtClientAddress" type="xs:string" minOccurs="0" />
<xs:element name="roamingStatus" type="xs:boolean" minOccurs="0" />
<xs:element name="networkID" type="xs:string" minOccurs="0" />
<!-- Child Resources -->
<xs:choice minOccurs="0" maxOccurs="1">
<xs:element name="childResource" type="m2m:childResourceRef" minOccurs="1" maxOccurs="unbounded" />
<xs:choice minOccurs="1" maxOccurs="unbounded">
<xs:element ref="m2m:memory" />
<xs:element ref="m2m:battery" />
<xs:element ref="m2m:areaNwkInfo" />
<xs:element ref="m2m:areaNwkDeviceInfo" />
<xs:element ref="m2m:firmware" />
<xs:element ref="m2m:software" />
<xs:element ref="m2m:deviceInfo" />
<xs:element ref="m2m:deviceCapability" />
<xs:element ref="m2m:reboot" />
<xs:element ref="m2m:eventLog" />
<xs:element ref="m2m:cmdhPolicy" />
<xs:element ref="m2m:activeCmdhPolicy" />
<xs:element ref="m2m:subscription" />
<xs:element ref="m2m:semanticDescriptor" />
<xs:element ref="m2m:transaction" />
<xs:element ref="m2m:schedule" />
</xs:choice>
</xs:choice>
</xs:sequence>
</xs:extension>
</xs:complexContent>
</xs:complexType>
</xs:element>
----------------------- MORE CLARIFICATION -----------------------------
By the way, there is already a discussion about deviceInfo. I think they went with multiple deviceInfo per Node because the current version of oneM2M supports multiple deviceInfo per Node. I am also curious about the meaning of multiple reboot or firmware resources per Node.
To answer your questions one by one:
A specialisation of a <mgmtObj> holds the actual management information, or represents an aspect of a device or node to be managed. Some of those specialisations define "trigger" attributes that execute a local action on a node: if one updates such an attribute, the action is executed on the associated device.
A <mgmtCmd> represents a method to execute a remote command or action on a node or device. It offers a way to implement management functionality that is not provided by the <mgmtObj> specialisations. It can also be used for tunnelling management functionality through oneM2M rather than abstracting it.
According to TS-0001, Table 9.6.18-1 "Child resources of <node> resource", the <node> resource shall only have 0 or 1 child resource of the reboot specialization.
Actually, it seems that the XSD, which you also quote in your question, is not correct, because it does not reflect the written spec (the same holds for some other attributes).
The assumption here is that the firmware is the basic, non-modular software stack or operating system of a device. You can use the [software] specialization to support modular OS architectures, where you install "drivers" or packages for various aspects of the device. Each of those software packages can be managed independently of the firmware. TR-069, for example, supports this kind of management.
The reason why a node may support multiple firmwares is that a device can store multiple firmware versions or generations, and you may want to support this feature. Of course, only one firmware is active at a time.
In general, what you want to do is define and implement a management adapter for your proprietary protocol. This would be an IPE that implements the logic to map between the oneM2M management resources and the proprietary aspects, as well as the local protocol to communicate with the proprietary devices.
Regarding the question about subscriptions and notifications: this depends on your concrete deployment architecture, but yes, using subscriptions/notifications would be an efficient way to implement this. The alternative would be for the management IPE to poll for changes to the relevant resources, which is in general more resource intensive.
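As an illustration only (using oneM2M short names; the resource name and notification endpoint are made up), the subscription the IPE creates under a <mgmtCmd> could look roughly like this:
<m2m:sub xmlns:m2m="http://www.onem2m.org/xml/protocols" rn="mgmtCmdWatch">
<!-- notificationEventType 1 = update of the subscribed-to resource -->
<enc>
<net>1</net>
</enc>
<!-- the IPE's notification endpoint (hypothetical) -->
<nu>http://ipe.example.com:8080/notify</nu>
</m2m:sub>
The CSE then notifies the IPE whenever the <mgmtCmd> is updated (e.g. execEnable is triggered), and the IPE translates that into your proprietary protocol.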

Speeding up "Mapcache_seed" with Mapserver

I'm using mapcache_seed from the MapCache package to create a large image cache by calling my MapServer WMS, which renders vector data.
Currently, this is the command I'm using:
sudo -u www-data mapcache_seed -c mapcache.xml -g WGS84 -n 8 -t Test -e Foo,Bar,Baz,Fwee -M 8,8 -z 12,13 --thread-delay 0 --rate-limit 10000
Where www-data is my Nginx system user, mapcache.xml is my config, WGS84 is my SRS, -n 8 is my logical thread count (on an i7-6700HQ at 3200 MHz), -z 12,13 is one zoom level that needs to be seeded, the thread delay is off, and the tile creation rate limit is set to 10000.
However, I get at most 50% total CPU utilization, and most of the time only a single core goes above 50%, with an average of 500 tiles per second - independent of how many threads or processes I specify. I've been trying to get all zoom levels (4 to 27) seeded for the last couple of days, but I've only managed to get through 4-12 before being severely bottlenecked, at a mere 3 GB of a couple million tiles.
Memory utilization is stable at 2.4% of 8 GB PC4-2133 for mapcache_seed (0.5% for the WMS). Write speeds are at 100 MB/s, unbuffered writes are also 100 MB/s, while buffered+cached writes are at 6.7-8.7 GB/s, on a 1 TB SATA III HDD. I have another SSD in my machine that gets 6 GB/s writes and 8 GB/s reads, but it's too small for storage and I'm afraid of drive failure from too many writes.
The cached tiles are around 4 KB each, which means I get around 2 MB worth of tiles every second. The majority of them aren't even tiles, but symlinks to a catch-all blank tile for empty areas.
How would I go about speeding this process up? Messing with threads and limits through mapcache_seed does not make any discernible difference. This is on a Debian Wheezy machine.
This is also being run through FastCGI, using 256x256 px images and a disk cache with the extent restricted to a single country (otherwise mapcache starts generating nothing but symlinks to blank tiles, because more than 90% of the world is blank!).
Mapserver mapfile (redacted):
MAP
NAME "MAP"
SIZE 1200 800
EXTENT Foo Bar Baz Fwee
UNITS DD
SHAPEPATH "."
IMAGECOLOR 255 255 255
IMAGETYPE PNG
WEB
IMAGEPATH "/path/to/image"
IMAGEURL "/path/to/imageurl"
METADATA
"wms_title" "MAP"
"wms_onlineresource" "http://localhost/cgi-bin/mapserv?MAP=/path/to/map.map"
"wms_srs" "EPSG:4326"
"wms_feature_info_mime_type" "text/plain"
"wms_abstract" "Lorem ipsum"
"ows_enable_request" "*"
"wms_enable_request" "*"
END
END
PROJECTION
"init=epsg:4326"
END
LAYER
NAME base
TYPE POLYGON
STATUS OFF
DATA polygon.shp
CLASS
NAME "Polygon"
STYLE
COLOR 0 0 0
OUTLINECOLOR 255 255 255
END
END
END
LAYER
NAME outline
TYPE LINE
STATUS OFF
DATA line.shp
CLASS
NAME "Line"
STYLE
OUTLINECOLOR 255 255 255
END
END
END
END
mapcache.xml (redacted):
<?xml version="1.0" encoding="UTF-8"?>
<mapcache>
<source name="ms4wserver" type="wms">
<getmap>
<params>
<LAYERS>base</LAYERS>
<MAP>/path/to/map.map</MAP>
</params>
</getmap>
<http>
<url>http://localhost/wms/</url>
</http>
</source>
<cache name="disk" type="disk">
<base>/path/to/cache/</base>
<symlink_blank/>
</cache>
<tileset name="test">
<source>ms4wserver</source>
<cache>disk</cache>
<format>PNG</format>
<grid>WGS84</grid>
<metatile>5 5</metatile>
<metabuffer>10</metabuffer>
<expires>3600</expires>
</tileset>
<default_format>JPEG</default_format>
<service type="wms" enabled="true">
<full_wms>assemble</full_wms>
<resample_mode>bilinear</resample_mode>
<format>JPEG</format>
<maxsize>4096</maxsize>
</service>
<service type="wmts" enabled="false"/>
<service type="tms" enabled="false"/>
<service type="kml" enabled="false"/>
<service type="gmaps" enabled="false"/>
<service type="ve" enabled="false"/>
<service type="mapguide" enabled="false"/>
<service type="demo" enabled="false"/>
<errors>report</errors>
<locker type="disk">
<directory>/path/</directory>
<timeout>300</timeout>
</locker>
</mapcache>
So, for anyone coming across this ten years later, when what's left of the little documentation for these tools has rotted away: I messed with my mapcache and mapfile settings to get something better. It wasn't that my generation was too slow, it was that I was generating TOO MANY GODDAMN SYMLINKS to blank files. First, the mapfile extent was incorrect. Second, I was using the built-in "WGS84" grid, which by default seeds the entire world extent. That meant 90% of all my tiles were just symlinks to a blank.png, and it ate up ALL of my inodes. I recommend mkdir blank; rsync -a --delete blank/ /path/to/cache for quickly clearing away that mess.
I fixed the above by taking the WGS84 grid specification and changing the extent to the one I specified in my mapfile, so now only my map's extent gets seeded. Lastly, I amended the grid XML element like so:
<grid restricted_extent="MAP FILE EXTENT HERE">GRIDNAME</grid>
With restricted_extent it is now certain that only my map gets seeded. I had over 100 million tiles, but they were all goddamn symlinks! Without this I got a "ran out of space" error or some such, even though df showed the partition wasn't full - which is misleading: symlinks take up inode space, not logical space. To see inode usage, run df -hi. I was at 100% inode usage but only 1% logical space on a 1 TB drive - filled with goddamn symlinks!
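For completeness, the custom grid amounts to copying the stock WGS84 grid and swapping in your own extent - a sketch (the extent placeholders match the mapfile above; the resolutions are just the first few WGS84 levels, so extend the list to your max zoom):
<grid name="mygrid">
<srs>EPSG:4326</srs>
<units>dd</units>
<size>256 256</size>
<!-- your mapfile extent: minx miny maxx maxy -->
<extent>Foo Bar Baz Fwee</extent>
<!-- one resolution per zoom level, halving each time -->
<resolutions>0.703125 0.3515625 0.17578125 0.087890625</resolutions>
</grid>
Then reference it from the tileset with <grid>mygrid</grid> (plus restricted_extent, as shown above).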

Why is virtio-scsi much slower than virtio-blk in my experiment (over a Ceph RBD image)?

I recently ran an experiment with virtio-scsi over RBD through the QEMU target (for its DISCARD/TRIM support), and compared the throughput and IOPS with those of a virtio-blk over RBD setup on the same machine, using fio in the guest. It turned out the throughput in sequential read/write was 7 times smaller (42.3 MB/s vs 309 MB/s) and the IOPS in random read/write were 10 times smaller (546 vs 5705).
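(A typical guest-side fio invocation for this kind of test looks something like the following - illustrative only, with assumed flags and an assumed target device, not my exact job files:)
fio --name=seqread --filename=/dev/vdb --direct=1 --ioengine=libaio --rw=read --bs=4M --size=4G
fio --name=randrw --filename=/dev/vdb --direct=1 --ioengine=libaio --rw=randrw --bs=4k --iodepth=32 --size=4G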
What I did was set up a virtual machine using OpenStack Juno, which gives me the virtio-blk over RBD setup. Then I modified the relevant part of the libvirt configuration XML, from this:
<disk type='network' device='disk'>
<driver name='qemu' type='raw' cache='writeback'/>
<auth username='cinder'>
<secret type='ceph' uuid='482b83f9-be95-448e-87cc-9fa602196590'/>
</auth>
<source protocol='rbd' name='vms/c504ea8b-18e6-491e-9470-41c60aa50b81_disk'>
<host name='192.168.20.105' port='6789'/>
</source>
<target dev='vda' bus='virtio'/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
</disk>
to this:
<disk type='network' device='disk'>
<driver name='qemu' type='raw' cache='writeback' discard='unmap'/>
<auth username='cinder'>
<secret type='ceph' uuid='482b83f9-be95-448e-87cc-9fa602196590'/>
</auth>
<source protocol='rbd' name='vms/c504ea8b-18e6-491e-9470-41c60aa50b81_disk'>
<host name='192.168.20.105' port='6789'/>
</source>
<target dev='vda' bus='scsi'/>
<controller type='scsi' model='virtio-scsi' index='0'/>
</disk>
The software versions are:
qemu 2.5.1
libvirt 1.2.2
kernel 3.18.0-031800-generic #201412071935 SMP Mon Dec 8 00:36:34 UTC 2014 x86_64 (a Ubuntu 14.04 kernel)
And the hypervisor is KVM.
I don't think the performance difference between virtio-scsi and virtio-blk should be that large. So please point out what I did wrong, and how to achieve reasonable performance.
A constraint is that I want a solution that works for OpenStack (ideally Juno) without much patching or coding around it. E.g., I have heard of virtio-scsi + vhost-scsi + scsi-mq, but that does not seem to be available in OpenStack right now.
The simple answer is that VirtIO-SCSI is slightly more complex than VirtIO-Block. Borrowing the simple description from here:
VirtIO Block has the following layers:
guest: app -> Block Layer -> virtio-blk
host: QEMU -> Block Layer -> Block Device Driver -> Hardware
Whereas VirtIO SCSI looks like this:
guest: app -> Block Layer -> SCSI Layer -> scsi_mod
host: QEMU -> Block Layer -> SCSI Layer -> Block Device Driver -> Hardware
In essence, VirtIO SCSI has to go through another translation layer compared to VirtIO Block.
For most cases using local devices, it will as a result be slower. There are a couple of specific cases where the reverse is sometimes true, though:
Direct passthrough of host SCSI LUNs to the VirtIO SCSI adapter. This is marginally faster because it bypasses the block layer on the host side.
QEMU native access to iSCSI devices. This is sometimes faster because it avoids the host block and SCSI layers entirely, and doesn't have to translate from VirtIO Block commands to SCSI commands.
For the record, though, there are three non-performance-related benefits to using VirtIO SCSI over VirtIO Block:
It supports far more devices. VirtIO Block exposes one PCI device per block device, which limits things to around 21-24 devices, whereas VirtIO SCSI uses only one PCI device and can handle an absolutely astronomical number of LUNs on that device.
VirtIO SCSI supports the SCSI UNMAP command (TRIM in ATA terms, DISCARD in Linux kernel terms). This is important if you're on thinly provisioned storage.
VirtIO SCSI exposes devices as regular SCSI nodes, whereas VirtIO Block uses a special device major. This isn't usually very important, but can be helpful when converting from a physical system.
You also enabled discard in your modified configuration XML:
<driver name='qemu' type='raw' cache='writeback' discard='unmap'/>
This scrubs the blocks on the fly, which is extra work that your virtio-blk setup was not doing.
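(With discard='unmap', the actual UNMAPs are issued by the guest - e.g. by mounting filesystems with -o discard, or by running fstrim periodically:)
# inside the guest: trim all mounted filesystems that support discard
fstrim -av
If you want to rule it out as a factor, re-run the fio comparison without discard='unmap'.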

"Insufficient content duration available" when playing a stream through the SmoothStreamingMediaElement

I am working on an application that features IIS Smooth Streaming using the SmoothStreamingMediaElement. Because of the nature of the project I can't disclose the source of the stream; I can, however, provide full technical information on the problem I encountered.
I separated the Smooth Streaming part into a separate application for testing purposes. Everything seems to be working well, since the test stream provided by Microsoft works the way it should (http://video3.smoothhd.com.edgesuite.net/ondemand/Big%20Buck%20Bunny%20Adaptive.ism/Manifest)
I took the restrictions for Smooth Streaming on Windows Phone into account:
- In the ManifestReady event the available tracks are filtered to show only one available resolution
- The device is not connected through Zune while testing.
The error message presented is very clear:
"3108 Insufficient content duration available to begin playback.
Available = 3840 ms, Required = 7250 ms"
I have not been able to find any references to this error, but I did find some more information on where the required duration of 7250 ms originates. This MSDN page suggests it has something to do with the LivePlaybackOffset, which defaults to 7 seconds and cannot be changed in the WP7 SmoothStreamingMediaElement. The same code works fine in a browser Silverlight application.
I don't have direct access to the server providing the stream. Is there a way to address this issue client-side, or does it require server-side configuration? If it helps I can share parts of the source code; please let me know which parts would be relevant. Your help is highly appreciated!
This is the manifest file:
<SmoothStreamingMedia MajorVersion="2" MinorVersion="2" TimeScale="10000000" Duration="0" LookAheadFragmentCount="2" IsLive="TRUE" DVRWindowLength="300000000">
<StreamIndex Type="audio" QualityLevels="1" TimeScale="10000000" Name="audio" Chunks="7" Url="http://xxxx/xxx.isml/QualityLevels({bitrate})/Fragments(audio={start time})">
<QualityLevel Index="0" Bitrate="128000" CodecPrivateData="1190" SamplingRate="48000" Channels="2" BitsPerSample="16" PacketSize="4" AudioTag="255" FourCC="AACL"/>
<c t="3485836800000" d="38400000" r="7"/>
</StreamIndex>
<StreamIndex Type="video" QualityLevels="6" TimeScale="10000000" Name="video" Chunks="7" Url="http://xxxx/xxx.isml/QualityLevels({bitrate})/Fragments(video={start time})" MaxWidth="1024" MaxHeight="576" DisplayWidth="1024" DisplayHeight="576">
<QualityLevel Index="0" Bitrate="350000" CodecPrivateData="000000016742E01596540D0FF3CFFF80980097A440000003004000000CA10000000168CE060CC8" MaxWidth="405" MaxHeight="228" FourCC="AVC1" NALUnitLengthField="4"/>
<QualityLevel Index="1" Bitrate="700000" CodecPrivateData="000000016742E01E965404814F2FFF8140013FA440000003004000000CA10000000168CE060CC8" MaxWidth="568" MaxHeight="320" FourCC="AVC1" NALUnitLengthField="4"/>
<QualityLevel Index="2" Bitrate="1000000" CodecPrivateData="000000016742E01E965405217F7FFE0B800B769100000300010000030032840000000168CE060CC8" MaxWidth="654" MaxHeight="368" FourCC="AVC1" NALUnitLengthField="4"/>
<QualityLevel Index="3" Bitrate="1300000" CodecPrivateData="00000001674D4028965605819FDE029100000300010000030032840000000168EA818332" MaxWidth="704" MaxHeight="396" FourCC="AVC1" NALUnitLengthField="4"/>
<QualityLevel Index="4" Bitrate="1600000" CodecPrivateData="00000001674D402A965605A1AFCFFF80CA00CAA440000003004000000CA10000000168EA818332" MaxWidth="718" MaxHeight="404" FourCC="AVC1" NALUnitLengthField="4"/>
<QualityLevel Index="5" Bitrate="2000000" CodecPrivateData="00000001674D4032965300800936029100000300010000030032840000000168E96060CC80" MaxWidth="1024" MaxHeight="576" FourCC="AVC1" NALUnitLengthField="4"/>
<c t="3485836800000" d="38400000" r="7"/>
</StreamIndex>
</SmoothStreamingMedia>
I know this question is a bit old but I had a very similar problem today, so I thought I should answer it...
The problem is with the r="7".
This attribute is not documented by Microsoft and is only found in Smooth Streaming version 2.2 and above (not 2.0).
r="7" means that the chunk entry in the manifest is repeated 7 times, so there is 7 * 3.84 s of content in total (with TimeScale="10000000", d="38400000" ticks is 3.84 s per chunk). A client that doesn't understand r sees just one 3.84 s chunk - exactly the "Available = 3840 ms" from the error message - which is less than the required 7250 ms.
There is a blog post which explains it here:
http://blogs.iis.net/samzhang/archive/2011/03/10/how-to-troubleshoot-live-smooth-streaming-issues-part-5-client-manifest.aspx
