Can't create a striped disk volume from multiple disks in an Azure VM - Windows

I created an Azure Medium instance running Windows Server 2012 and I'm having a problem striping multiple Azure data disks into a single volume using the Server Manager tool.
In Azure I provisioned the Medium instance and then created 4 data disks of 60 GB each. I then RDP'd into the server, and in Server Manager under File and Storage Services\Volumes I saw my 4 data disks listed in the Disks section alongside the C:\ and D:\ drives that come with this instance. I initialized my 4 data disks (later I also tried NOT initializing them), but when I clicked on "Storage Pools" in the nav bar, under the Virtual Disk section I only saw 1 of my data disks.
I saw no way to add any of the other 3 data disks to my Storage Pool, and then of course to the subsequent Virtual Disk. This problem limits me to just one data disk in my Virtual Disk. I have tried this many times and the result is always the same.
Does anyone know what can be causing this or have steps to do the same thing I'm trying to do?
Thanks
If you're wondering why I'm trying to stripe these instead of using just 1 large data disk, this article explains the performance benefits of doing so:
http://blog.aditi.com/cloud/windows-azure-virtual-machines-lessons-learned/

In my blog post I explain how to do this, although perhaps the level of detail you are looking for isn't there. Still, everyone who followed this post (it was a lab) was able to create the striped volume. The blog post is a complete lab; scroll about halfway down to find the section on the striped volume. Let me know if you have any questions.
http://geekswithblogs.net/hroggero/archive/2013/03/20/windows-azure-it-roadshow-lab-i.aspx
Thanks

I hit the same problem and some Googling revealed that this is a bug in Server Manager (sorry, can't find the link). The workaround is to use PowerShell to create the pool. These commands will create a new Storage Pool called "Storage" and assign all the available disks to it:
# find the Storage Spaces subsystem and create a pool from every disk that can be pooled
$spaces = (Get-StorageSubSystem | where {$_.Model -eq "Storage Spaces"})[0].UniqueId
New-StoragePool -FriendlyName "Storage" -StorageSubSystemUniqueId $spaces -PhysicalDisks (Get-PhysicalDisk -CanPool $true)
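If you would rather finish the whole job from PowerShell instead of going back into Server Manager, something along the lines of the sketch below should carve a striped virtual disk and an NTFS volume out of that pool. This continuation is not part of the original workaround: "Storage" matches the pool name above, while the virtual disk name "StripedData" and the drive letter F: are just example values.
# create a striped ("Simple" resiliency) virtual disk that uses all the space in the pool
New-VirtualDisk -StoragePoolFriendlyName "Storage" -FriendlyName "StripedData" -ResiliencySettingName Simple -UseMaximumSize
# initialise it, create a single partition and format it as NTFS on F:
Get-VirtualDisk -FriendlyName "StripedData" | Get-Disk | Initialize-Disk -PartitionStyle GPT -PassThru |
    New-Partition -DriveLetter F -UseMaximumSize
Format-Volume -DriveLetter F -FileSystem NTFS -Confirm:$false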

Related

Databricks and Delta cache setting

I am trying to follow the instructions on the MSFT website to use the delta cache, and I'm hoping someone can help me understand it a little better:
https://learn.microsoft.com/en-us/azure/databricks/delta/optimizations/delta-cache
In the guide it mentions that I should use Standard_E or L series VMs. Our workload currently runs on F-series machines, and when I tried only E- or L-series the job seemed to run longer and use more DBUs.
I did, however, notice that the Dv3 series also allows delta caching (e.g. Standard_D16s_v3 VMs). I ran some of our workloads on those machine types and noticed that the Storage tab now shows a screen similar to the one in the MSFT docs.
The problem is that I am not sure this is the right way to go about it. The reason I wanted to try the Dv3 VMs was that they are relatively comparable to the F series, but also seem to allow delta caching.
I am also wondering if the MSFT recommendation of using the following settings is correct or if they can be different:
spark.databricks.io.cache.maxDiskUsage 50g
spark.databricks.io.cache.maxMetaDataCache 1g
spark.databricks.io.cache.compression.enabled false
If anyone else has played with this and can share what they did, it would be much appreciated.
As background, we spin up the Databricks clusters using a Databricks linked service (from ADF), and we put these Spark config settings in that linked service.
That is what sends the config settings to the automated clusters that are spun up when we execute Databricks notebooks through ADF.
Thank you

Windows: Automatically Detect, Stripe, Provision, Mount AWS EC2 Ephemeral Disks for Jenkins Agents

I have Windows build agents for Jenkins running in EC2, and I would like to make use of the ephemeral disks that come with the "d" type instances (a C5ad.4xl, for example, gives 2 x 300GB NVMe) to take advantage of the high IO available on those disks.
Since they are build agents, the ephemeral nature of the drives is fine. I need something that will detect, provision and mount those disks as a drive in Windows, basically regardless of size and number. I can do this easily in Linux (LVM, software RAID, etc.), and although there is a guide from 2014 here for achieving this, it doesn't seem to work on Windows Server 2019 and the latest instances.
That same post references new cmdlets added in Server 2012 R2, but those do not support converting the disks to dynamic (a key step in striping them, done by diskpart in the original post's code), so they cannot be used to do what is required directly.
Are there any other options to make this work dynamically, ideally with powershell (or similar) that can be passed to the Jenkins agent at boot time as part of its config?
Windows now has Storage Pools, and these can be used to do what is needed here. The following code successfully detected multiple disks, added them to a pool, striped them, used the maximum size available, and mounted the new volume on the drive letter E:
# get a list of the disks that can be pooled
$PhysicalDisks = (Get-PhysicalDisk -CanPool $true)
# only take action if there actually are disks
if ($PhysicalDisks) {
    # create a storage pool called "ephemeral" in the standard subsystem, using the discovered disks
    New-StoragePool -FriendlyName ephemeral -StorageSubSystemFriendlyName "Windows Storage*" -PhysicalDisks $PhysicalDisks
    # create a virtual disk, striped ("Simple" resiliency in Storage Spaces terms), using all space
    New-VirtualDisk -StoragePoolFriendlyName "ephemeral" -FriendlyName "stripedephemeral" -ResiliencySettingName Simple -UseMaximumSize
    # initialise the disk
    Get-VirtualDisk -FriendlyName 'stripedephemeral' | Initialize-Disk -PartitionStyle GPT -PassThru
    # create a partition using all available space (disk number 3 assumes the boot disk plus two ephemeral disks);
    # run interactively this pops up a format prompt, which is not a problem when running as user data via the Jenkins config
    New-Partition -DiskNumber 3 -DriveLetter 'E' -UseMaximumSize
    # format as NTFS to make it usable
    Format-Volume -DriveLetter E -FileSystem NTFS -Confirm:$false
    # create a folder on the drive for the agent to use as its workspace
    New-Item -ItemType Directory -Force -Path E:\jenkins\workspace
}
There are some assumptions here about the number of disks, which will vary based on the instance type, but generally it will take any ephemeral disks it finds, stripe across them if there is more than one, and then use the entire available size once the volume has been created and formatted. This can all be wrapped in <powershell></powershell> and added to the user data section of the Jenkins agent config so that it is run at boot, as sketched below.
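For completeness, a minimal user data wrapper might look like the following. This is a sketch rather than part of the original setup: it assumes the EC2Launch user data format used on Windows Server 2019, and the <persist>true</persist> tag (so the script runs on every boot, since ephemeral disks are wiped on stop/start) and the Test-Path guard are additions made here.
<powershell>
# rebuild the striped ephemeral volume only if it is not already present
if (-not (Test-Path 'E:\')) {
    # ... the detect/pool/stripe/format script from above goes here ...
}
</powershell>
<persist>true</persist>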

What's the proper storage location for a database for a cross platform command line program?

I wrote a simple note-taking program that's nothing more than a dictionary mapping a key to a value, e.g.
$ hlp -key age -value 25
$ hlp age
25
and it just stores the information in a JSON file hardcoded to ~/.hlp.json. But I was wondering if there's some standard location I should be putting this file. Is there a standard location for databases like this?
A useful resource here is the hier(7) man page. (http://linux.die.net/man)
Data that is only going to be used by you belongs in $HOME, traditionally hosted under /home.
For something that is used to support the system itself, you'd be using /var. For applications that are just hosted on the system, you'd use /var/opt.
If the application is something big that could be replicated or moved to another system, you'd create a separate filesystem with a mount point outside any of those listed in hier(7). This could be a filesystem mounted from a SAN or NAS, which would help with mobility of the application.
Once you actually need to access the data from different machines, you'd have to move it to a network-accessible key/value store or SQL database.

DFS starts a new metadata refresh of the namespace every hour and never finishes - replication halted

My organisation uses DFS to replicate three servers - a hub at one site, a spoke at one site, and a spoke at a remote site. They contain a number of folders that are separate shares, all within the same replication group. So the namespace is CompFileShare, and it contains IT, Public, etc as shares that users can map based on permissions.
Our remote site's drive recently filled up, and since the drive it was hosted on was MBR and maxed out at 2 TB, we created a new 4 TB GPT drive. We then robocopied the MBR drive to the GPT drive using the recommended flags and let it finish. After it finished, we unshared the original IT folder, shared the new IT folder on the GPT drive, changed the replication target from MBR to GPT, and let it replicate.
For some reason, possibly unrelated, we are now seeing the remote site's fileshare server throwing Event ID 516 every hour:
DFSN service has started performing complete refresh of metadata for namespace TTFileShare. This task can take time if the namespace has large number of folders and may delay namespace administration operations.
Because it is running, all replication to the remote site is halted, even for shares still on the old MBR drive. It had been sitting like this for a day before I really got a chance to look at it. Any tips or places I should look to resolve this issue?
So...
Part of the reason for the migration was that our MBR drive was completely full - 100 KB left.
We deleted a folder (that had already been robocopied to the new drive) from the full drive, which freed up ~20 MB, and restarted the server.
The server then replicated everything, no problems. So lesson learned - don't let drives get so full that DFS can't even manage their data.
EDIT: Also, the namespace refresh is flagged as a warning when it should be flagged as informational. If you look under the informational messages you will see the "namespace refresh finished" message shortly after the "namespace refresh started" message. This is the actual cause of the problem in the title - a Microsoft bug mislabeling the namespace refresh as a warning. It's just informational.

How to give RW permissions on a folder in Windows Azure?

I am deploying an MVC 3.0 web app to Windows Azure. I have an action method that takes a file uploaded by the user and stores it in a folder within my web app.
How can I give RW permissions on that folder to the running process? I read about startup tasks and have a basic understanding, but I don't know:
How to give the permission itself, and
Which running process (user) should I give the permission to.
Many thanks for the help.
EDIT
In addition to @David's answer below, I found this link extremely useful:
https://www.windowsazure.com/en-us/develop/net/how-to-guides/blob-storage/
For local storage, I wouldn't get caught up with granting access permissions to various directories. Instead, take advantage of the storage resources available specifically to your running VMs. With a given instance size, you have local storage available to you ranging from 20GB to almost 2TB (full sizing details here). To take advantage of this space, you'd create local storage resources within your project.
Then, in code, grab the root path of that storage resource:
var storageRoot = RoleEnvironment.GetLocalResource("moreStorage").RootPath;
Now you're free to use that storage. And... none of that requires any startup tasks or granting of permissions.
Now for the caveat: This is storage that's local to each running instance, and isn't shared between instances. Further, it's non-durable - if the disk crashes, the data is gone.
For persistent, durable file storage, Blob Storage is a much better choice, as it's durable (triple-replicated within the datacenter, and geo-replicated to another datacenter) and it's external to your role instances, accessible from any instance (or any app, including your on-premises apps).
Since blob storage is organized by container, and blobs within container, it's fairly straightforward to organize your blobs (and store pretty much anything in a given blob, up to 200GB each). Also, it's trivial to upload/download files to/from blobs, either to file streams or local files (in the storage resources you allocated, as illustrated above).
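To give a feel for how little is involved, the snippet below uploads and then downloads a blob using the classic Azure PowerShell storage cmdlets. This is only an illustration of the workflow described above (inside the MVC app itself you would use the .NET storage client library); the account name, key, container name and file paths are placeholders.
# build a storage context from the account name and key (both placeholders here)
$ctx = New-AzureStorageContext -StorageAccountName "mystorageaccount" -StorageAccountKey "<account-key>"
# create a container and upload a local file into it as a blob
New-AzureStorageContainer -Name "uploads" -Context $ctx
Set-AzureStorageBlobContent -File "C:\temp\report.pdf" -Container "uploads" -Blob "report.pdf" -Context $ctx
# download the blob again to a local folder
Get-AzureStorageBlobContent -Container "uploads" -Blob "report.pdf" -Destination "C:\temp\downloaded\" -Context $ctx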
