Accelerating VAAI

Posted: December 9th, 2010 | Filed under: Virtualization

This post originally appeared on Juku, but I find it technical enough to be featured on my personal blog :-)

By now everyone and their dog has already posted about VAAI, so I will not bother you with an extensive explanation of what VAAI is and why it’s crucial to virtualization. I will simply point you to the existing posts that explain its current implementation in detail.

My focus will be on how I envision accelerating VAAI even further by enhancing its storage side.

To explain my point of view I will draw an analogy with a common feature found in storage arrays today: Point-in-Time Copies.

Point-in-time copies (sometimes referred to as snapshots) are a really valuable feature: they provide a consistent point-in-time image of a specified data set in order to perform various tasks such as backups, environment duplication and so on.

Traditionally PIT copies were made using a technique called Copy-on-Write, which is suitable for a small number of PITs on a single LUN, but its performance issues take their toll as soon as the first PIT is created. The PIT copy concept was pioneered by IBM with its FlashCopy functionality.

NetApp innovated the approach to PIT copies with a different, pointer-based snapshot technique. This almost completely eliminated the performance issue and made a massive number of snapshots per LUN possible, unlocking the full potential of the PIT concept. There is a post explaining in detail how the Compellent Storage Center pointer-based snapshots work; this is not specific to Compellent, however, since almost all next-generation storage arrays (like IBM XIV, NetApp FAS, 3Par InServ, Dell Equallogic, HP Lefthand and many others) use the same approach.
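To make the difference between the two techniques more tangible, here is a minimal Python sketch. It is a toy model, not how any real array is implemented: the class names, the `io_count` counter and the choice to let each Copy-on-Write snapshot keep its own preserved copy are all assumptions made for illustration. It only shows why the first overwrite after a snapshot costs extra I/O with Copy-on-Write, while a pointer-based volume just writes the new block and updates a pointer.

```python
class CopyOnWriteVolume:
    """Toy volume: the first overwrite after a snapshot copies the old block aside."""

    def __init__(self, blocks):
        self.blocks = list(blocks)
        self.snapshots = []   # one dict per snapshot: block index -> preserved data
        self.io_count = 0     # hypothetical counter of block reads/writes

    def snapshot(self):
        self.snapshots.append({})
        return len(self.snapshots) - 1

    def write(self, index, data):
        for preserved in self.snapshots:
            if index not in preserved:
                # Read the old block and write it to the snapshot area.
                preserved[index] = self.blocks[index]
                self.io_count += 2
        self.blocks[index] = data
        self.io_count += 1

    def read_snapshot(self, snap_id, index):
        return self.snapshots[snap_id].get(index, self.blocks[index])


class PointerBasedVolume:
    """Toy volume: new data goes to a new location, snapshots are pointer maps."""

    def __init__(self, blocks):
        self.store = list(blocks)                 # append-only block store
        self.live_map = list(range(len(blocks)))  # logical index -> store position
        self.snapshots = []
        self.io_count = 0

    def snapshot(self):
        # Only pointers are copied; no data block is touched.
        self.snapshots.append(list(self.live_map))
        return len(self.snapshots) - 1

    def write(self, index, data):
        self.store.append(data)
        self.live_map[index] = len(self.store) - 1
        self.io_count += 1

    def read_snapshot(self, snap_id, index):
        return self.store[self.snapshots[snap_id][index]]


cow = CopyOnWriteVolume(["a", "b", "c", "d"])
ptr = PointerBasedVolume(["a", "b", "c", "d"])
for volume in (cow, ptr):
    volume.snapshot()
    volume.write(0, "A")    # first overwrite after the snapshot
    volume.write(0, "AA")   # second overwrite of the same block
```

In this toy run both volumes still return the original data from the snapshot, but the Copy-on-Write volume needed twice as many block I/Os to get there.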

So basically we have a great concept (PIT copies) whose potential was mostly locked by its implementation (Copy-on-Write), and then an innovator who unlocked its full potential with a clever implementation. I’m pretty sure that VAAI is still in its “Copy-on-Write” stage of life :-)

As you already know, VAAI is implemented using an extended SCSI command set. Let’s take as an example the most sought-after feature: the Hardware Offloaded Copy.
In my opinion the hardware offloaded copy can be accelerated by as much as 100,000x, making every cloning task a matter of a few seconds. Here’s how:

Keep in mind how a pointer-based snapshot works and bear with me through the explanation:

A 16GB VM sitting in a 128GB Datastore is currently accessed by an ESX host.

Then a VAAI-enabled clone request is issued by the host. The storage array, instead of doing a real block-by-block copy, simply creates a “map” of pointers for the cloned VM on another portion of the datastore, locking its space but without copying a single block. This operation should take the same time as a normal snapshot: a few seconds.

Then the host starts writing to the newly cloned VM, and the delta differences are stored in the blocks locked by the “map” previously created.
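The three steps above can be sketched in a few lines of Python. This is only a toy model of the idea, not an actual VAAI or array implementation: the `Datastore` class, its method names and the 1 GB block granularity are hypothetical, chosen so that the 16GB VM becomes 16 blocks in a 128-block datastore.

```python
class Datastore:
    """Toy datastore whose clones share blocks until one side writes."""

    def __init__(self, capacity_blocks):
        self.capacity = capacity_blocks
        self.blocks = {}      # physical block id -> data
        self.refcount = {}    # physical block id -> number of VMs pointing at it
        self.files = {}       # VM name -> list of physical block ids (the "map")
        self.reserved = 0     # blocks locked for clones but not yet written
        self.next_id = 0

    def free_blocks(self):
        return self.capacity - len(self.blocks) - self.reserved

    def _allocate(self, data):
        block_id = self.next_id
        self.next_id += 1
        self.blocks[block_id] = data
        self.refcount[block_id] = 1
        return block_id

    def create_vm(self, name, data_blocks):
        if len(data_blocks) > self.free_blocks():
            raise ValueError("datastore full")
        self.files[name] = [self._allocate(d) for d in data_blocks]

    def clone_vm(self, source, target):
        pointers = self.files[source]
        if len(pointers) > self.free_blocks():
            raise ValueError("not enough space to lock for the clone")
        for block_id in pointers:
            self.refcount[block_id] += 1
        self.files[target] = list(pointers)   # copy pointers, not data
        self.reserved += len(pointers)        # lock the space up front

    def write(self, name, index, data):
        block_id = self.files[name][index]
        if self.refcount[block_id] > 1:
            # Shared block: store the delta in one of the locked blocks.
            self.refcount[block_id] -= 1
            self.reserved -= 1
            self.files[name][index] = self._allocate(data)
        else:
            self.blocks[block_id] = data

    def read(self, name, index):
        return self.blocks[self.files[name][index]]


ds = Datastore(capacity_blocks=128)
ds.create_vm("vm-original", [f"block-{i}" for i in range(16)])
ds.clone_vm("vm-original", "vm-clone")
blocks_after_clone = len(ds.blocks)   # still only the original 16 blocks
ds.write("vm-clone", 3, "changed")    # the first delta lands in a locked block
```

The clone itself only touches metadata, which is why it could complete in roughly the time a snapshot takes; data blocks are written only when the host changes something.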

A similar task can already be done today using snapshots, but it immediately becomes cumbersome because every clone needs to reside on its own LUN and datastore. This approach, instead, can be applied “inside” a datastore, streamlining deployments. Just imagine a VDI infrastructure relying on such a cloning technique! :-)

I’m sure that storage vendors will try to integrate and innovate their respective VAAI implementations. I hope this post made you realize how powerful the still-evolving VAAI approach can be.



vSphere 4.1 and its new CPU scheduler

Posted: July 22nd, 2010 | Filed under: Virtualization

As you may already know, vSphere 4.1 was released last week with much fanfare from VMware, and it’s definitely a worthwhile upgrade, which comes for free (as usual) if you have a support contract.

This new vSphere release, which is a major milestone (as much as ESX 3.5 was for version 3.0), comes with a truckload of new features that other bloggers have already covered in depth. Here’s a list of the most interesting posts:

Frank Denneman: Load Based Teaming, DPM scheduled tasks and VM to Hosts affinity rule

Duncan Epping: Cluster Operational Status

Chad Sakac: vStorage APIs for Array Integration

But I would like to spend some time talking about the new CPU scheduler, which in my opinion is a great improvement. Let’s focus on some of the changes:

- Further Relaxed Co-Scheduling

Co-scheduling enforcement is now a per-vCPU operation. This means that the VM is no longer stopped as a whole when the accumulated skew among its vCPUs crosses the threshold; only the individual vCPU involved is stopped until the skew is recovered.

- Elimination of CPU Scheduler Cell

The cell mechanism worked well with 2-way and 4-way vSMP on dual- and quad-core CPUs, but it was becoming a limiting factor in the 8- and 12-core era. Now a VM can be scheduled on every pCPU available on the system (not just within a single cell/socket), thus utilizing all the processor cache and memory bandwidth available.

- Wide-VM NUMA Support

This is an enhancement meant to improve performance on large systems that run big vSMP VMs. A Wide-VM is a VM that has more vCPUs than the cores available on a single NUMA node, for example a 4-way vSMP VM on a dual-core AMD Opteron. With ESX 4.1 such VMs can take advantage of NUMA management.
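The Wide-VM definition above boils down to simple arithmetic, sketched here in Python. The function names are hypothetical, and the grouping of vCPUs into node-sized chunks is only an assumption about how a NUMA-aware scheduler could place such a VM, not a description of what ESX actually does internally.

```python
import math


def is_wide_vm(vcpus, cores_per_numa_node):
    """A VM is 'wide' when it has more vCPUs than one NUMA node has cores."""
    return vcpus > cores_per_numa_node


def numa_nodes_needed(vcpus, cores_per_numa_node):
    """Minimum number of NUMA nodes needed to give every vCPU its own core."""
    return math.ceil(vcpus / cores_per_numa_node)


def split_vcpus(vcpus, cores_per_numa_node):
    """Group vCPU indices into node-sized chunks (illustrative placement only)."""
    ids = list(range(vcpus))
    return [ids[i:i + cores_per_numa_node]
            for i in range(0, vcpus, cores_per_numa_node)]


# The example from the text: a 4-way vSMP VM on dual-core Opteron nodes.
wide = is_wide_vm(4, 2)
nodes = numa_nodes_needed(4, 2)
groups = split_vcpus(4, 2)
```

For the 4-way VM on dual-core nodes this gives a wide VM spanning two nodes, with two vCPUs per group.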

You can also find a very interesting paper directly from VMware which explains all the features described above in great detail, and shows some benchmarks too.

 
