Accelerating VAAI

Posted: December 9th, 2010 | Filed under: Virtualization | 1 Comment

This post originally appeared on Juku, but I find it technical enough to be featured on my personal blog :-)

By now everyone and his dog has already posted about VAAI, so I will not bother you with an extensive explanation of what VAAI is and why it’s crucial to virtualization. I will simply refer you to the posts that explain its current implementation in detail.

My focus will be on how I envision accelerating VAAI even further by enhancing its storage side.

To explain my point of view I will draw an analogy with a common feature found in storage arrays today: Point-in-Time Copies.

Point-in-time copies (sometimes referred to as Snapshots) are a really valuable feature: they provide a consistent point-in-time image of a specified data set that can be used for various tasks such as backups, environment duplication and so on.

Traditionally, PIT copies were made using a technique called Copy-On-Write, which is suitable for a small number of PIT copies of a single LUN, but its performance issues take their toll as soon as the first PIT copy is created. The PIT copy concept was pioneered by IBM with its FlashCopy functionality.

NetApp innovated the approach to PIT copies with a different, pointer-based snapshot technique. This almost completely eliminated the performance issue and made a massive number of snapshots per LUN possible, unlocking the full potential of the PIT concept. This post explains in detail how the Compellent Storage Center pointer-based snapshots work; however, this is not specific to Compellent: almost all next-generation storage arrays (like IBM XIV, NetApp FAS, 3Par InServ, Dell EqualLogic, HP LeftHand and many others) use the same approach.
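To make the difference concrete, here is a minimal toy model of the two techniques. It is only a sketch under my own assumptions: every backend block read or write counts as one I/O, pointer updates are treated as free metadata operations, and the class and method names are purely illustrative (they are not any vendor's API).

```python
class CopyOnWriteVolume:
    """Toy Copy-on-Write: old data is copied away before its first overwrite."""

    def __init__(self, blocks):
        self.blocks = list(blocks)  # live data, always updated in place
        self.snapshots = []         # one {block_index: old_data} dict per snapshot
        self.io_count = 0

    def snapshot(self):
        self.snapshots.append({})

    def write(self, index, data):
        for preserved in self.snapshots:
            if index not in preserved:
                # Read the old block and write it to the snapshot area: 2 I/Os.
                preserved[index] = self.blocks[index]
                self.io_count += 2
        self.blocks[index] = data   # the actual write: 1 I/O
        self.io_count += 1

    def read_live(self, index):
        return self.blocks[index]

    def read_snapshot(self, snap_id, index):
        # Blocks never overwritten since the snapshot are still in the live area.
        return self.snapshots[snap_id].get(index, self.blocks[index])


class PointerBasedVolume:
    """Toy pointer-based snapshot: new data lands in a free block, pointers move."""

    def __init__(self, blocks):
        self.store = list(blocks)                 # physical blocks, never overwritten
        self.live_map = list(range(len(blocks)))  # logical index -> physical index
        self.snapshots = []
        self.io_count = 0

    def snapshot(self):
        # Only the pointer map is copied; no data block is touched.
        self.snapshots.append(list(self.live_map))

    def write(self, index, data):
        self.store.append(data)                   # the actual write: 1 I/O
        self.live_map[index] = len(self.store) - 1
        self.io_count += 1

    def read_live(self, index):
        return self.store[self.live_map[index]]

    def read_snapshot(self, snap_id, index):
        return self.store[self.snapshots[snap_id][index]]


cow = CopyOnWriteVolume(["a", "b", "c", "d"])
ptr = PointerBasedVolume(["a", "b", "c", "d"])
for volume in (cow, ptr):
    volume.snapshot()
    volume.write(0, "A")
    volume.write(1, "B")
    volume.write(0, "AA")

print("copy-on-write I/Os:", cow.io_count)
print("pointer-based I/Os:", ptr.io_count)
```

In this model the first overwrite of each block after a snapshot costs three I/Os with Copy-on-Write and one with the pointer-based layout, while both still return the original data when the snapshot is read.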

So basically we have a great concept (PIT copies) with most of its potential still locked by its implementation (Copy-on-Write), and then an innovator that unlocks its full potential with a clever implementation. I’m pretty sure that VAAI is still in its “Copy-on-Write” stage of life :-)

As you already know, VAAI is implemented using an extended SCSI command set. Let’s take as an example the most sought-after feature: the Hardware Offloaded Copy.
In my opinion the hardware offloaded copy can be accelerated by something like 100,000x, making every cloning task a matter of a few seconds. Here’s how:

Keep in mind how a pointer-based snapshot works and bear with me through this explanation:

A 16GB VM sitting in a 128GB Datastore is currently accessed by an ESX host.

Then the host issues a VAAI-enabled clone request. The storage array, instead of doing a real block-to-block copy, simply creates a “map” of pointers for the cloned VM on another portion of the datastore, locking (reserving) its space but without copying a single block. This operation should take the same time as a normal snapshot: a few seconds.

Then the host starts to write to the new cloned VM, and the delta differences are stored in the blocks locked by the “map” previously created.

A similar task can already be done today using snapshots, but it quickly becomes cumbersome because every clone needs to reside on its own LUN and datastore. This approach, instead, can be applied “inside” a datastore, streamlining deployments. Just imagine a VDI infrastructure relying on such a cloning technique! :-)
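The walkthrough above can be sketched as a toy model. This is not how any array or VAAI primitive is actually implemented: the Datastore class, its methods and the "1 block = 1 GB" scale are all hypothetical, chosen only to show that a clone can be a copy of pointers plus a space reservation, with data written only when one of the two VMs changes a block.

```python
class Datastore:
    """Toy datastore where clones share blocks until one side writes to them."""

    def __init__(self, total_blocks):
        self.total_blocks = total_blocks
        self.data = []      # contents of every physical block written so far
        self.refcount = []  # how many files point at each physical block
        self.files = {}     # file name -> list of physical block indexes
        self.reserved = 0   # blocks locked for future clone deltas

    def free_blocks(self):
        return self.total_blocks - len(self.data) - self.reserved

    def _allocate(self, contents):
        self.data.append(contents)
        self.refcount.append(1)
        return len(self.data) - 1

    def create(self, name, contents):
        if len(contents) > self.free_blocks():
            raise RuntimeError("not enough free space in the datastore")
        self.files[name] = [self._allocate(block) for block in contents]

    def clone(self, source, target):
        pointers = self.files[source]
        if len(pointers) > self.free_blocks():
            raise RuntimeError("not enough free space to reserve for the clone")
        for ptr in pointers:
            self.refcount[ptr] += 1
        self.files[target] = list(pointers)  # the "map": pointers only, no data copy
        self.reserved += len(pointers)       # lock the space the clone may need

    def write(self, name, index, contents):
        ptr = self.files[name][index]
        if self.refcount[ptr] > 1:
            # Shared block: store the delta in one of the reserved blocks.
            self.refcount[ptr] -= 1
            self.reserved -= 1
            self.files[name][index] = self._allocate(contents)
        else:
            self.data[ptr] = contents        # private block: overwrite in place

    def read(self, name, index):
        return self.data[self.files[name][index]]


ds = Datastore(total_blocks=128)                     # 128GB datastore
ds.create("vm1", [f"blk{i}" for i in range(16)])     # 16GB VM
ds.clone("vm1", "vm2")
print("blocks written after clone:", len(ds.data))   # still only the original VM
ds.write("vm2", 3, "delta")
print(ds.read("vm1", 3), ds.read("vm2", 3))
```

The clone itself touches no data blocks; only the later write to vm2 consumes one of the reserved blocks, and vm1 keeps reading its original contents.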

I’m sure that storage vendors will try to integrate and innovate their respective VAAI implementations. I hope this post made you realize how powerful the still-evolving VAAI approach can be.


My new blogging effort: Juku.it !

Posted: December 7th, 2010 | Filed under: Storage, Virtualization | 1 Comment

It’s been a long time since my last post. As you may already know, I’ve been very busy obtaining the VCDX certification, and I’ve also been knee-deep in getting a new blog online: Juku.it

Enrico and I decided to channel our blogging efforts into a more open and agnostic form, without being tied to a specific vendor (I work as an Architect at a small consulting firm called Cinetica), and so Juku was born. As told in the “Why Juku?” section, jukus are private Japanese schools intended to help students improve their performance in regular school work and better prepare for exams. That’s precisely our goal: not to replace the traditional information channels, but to augment them with our opinions in the most unbiased manner possible.

I will continue to post the more technical articles and my personal thoughts on P2V It!, so don’t unsubscribe just yet :-)


My new Storage Tools page

Posted: September 14th, 2010 | Filed under: Storage | 1 Comment

Last week I updated my old WWN Decoder page (renamed “Storage Tools”) with three useful storage widgets: a RAID Space Calculator, a RAW IOPS Calculator and a Replication Bandwidth Calculator.

The IOPS calculator is a bit simplistic right now; I’m trying to improve it to include latency and other determining factors in the IOPS calculation.

Comments and suggestions are VERY welcome!


How to join a NetApp FAS to Active Directory

Posted: July 29th, 2010 | Filed under: Storage | No Comments

A couple of weeks ago I was preparing a demo lab for a technology event held by my company here in San Marino and I had to join a couple of NetApp filers to an Active Directory environment.

The process itself is very simple, but there are a couple of things to keep in mind regarding the system time, so I thought it would be nice to share them.

Before starting, here’s a bit of background on why the clock is so important:

Active Directory authentication is based on a protocol called Kerberos, which uses a ticketing system to grant access. The system time is very important because:

[...] In order to prevent intruders from resetting their system clocks in order to continue to use expired tickets, Kerberos V5 is set up to reject ticket requests from any host whose clock is not within the specified maximum clock skew of the KDC. Similarly, hosts are configured to reject responses from any KDC whose clock is not within the specified maximum clock skew of the host. The default value for maximum clock skew is 300 seconds, or five minutes. [...]

(taken from the Kerberos V5 System Administrator’s Guide).

So, basically, if the system clock of a machine is more than 5 minutes off, Kerberos denies the authentication with a “clock skew too great” error.
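The rule quoted above boils down to a simple comparison. The following is only an illustrative sketch, not Kerberos code: the function name is made up, and I am assuming that a difference of exactly the maximum skew is still accepted.

```python
from datetime import datetime, timedelta, timezone

# Default maximum clock skew from the Kerberos V5 guide quoted above.
MAX_CLOCK_SKEW = timedelta(seconds=300)


def within_clock_skew(host_time, kdc_time, max_skew=MAX_CLOCK_SKEW):
    """Return True if the two clocks differ by no more than max_skew."""
    return abs(host_time - kdc_time) <= max_skew


kdc = datetime(2010, 2, 17, 14, 54, tzinfo=timezone.utc)
filer_ok = kdc + timedelta(minutes=4)
filer_bad = kdc - timedelta(minutes=6)

print(within_clock_skew(filer_ok, kdc))   # 4 minutes ahead: accepted
print(within_clock_skew(filer_bad, kdc))  # 6 minutes behind: "clock skew too great"
```

Note that the check is symmetric: a filer that is too far ahead is rejected just like one that is too far behind.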

To avoid this we need to make sure that the clock of our NetApp FAS is within the acceptable range, because even the join cannot complete if the clocks are not aligned. So, first of all, issue a date command with this syntax:

demo02> date 201002171454
Warning: syncing time to an external time source which will eventually override the time set by the date command.

201002171454, which is in the format YYYYMMDDhhmm, means:

February 17th, 2010, 2:54pm
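If you want to double-check a timestamp before typing it on the filer, the YYYYMMDDhhmm layout maps directly onto a strptime pattern. This is just a small helper sketch; the function names are my own and nothing here talks to the filer.

```python
from datetime import datetime

STAMP_FORMAT = "%Y%m%d%H%M"  # YYYYMMDDhhmm, as used in the date command above


def parse_stamp(stamp):
    """Turn a YYYYMMDDhhmm string into a datetime (raises ValueError if invalid)."""
    return datetime.strptime(stamp, STAMP_FORMAT)


def format_stamp(moment):
    """Turn a datetime into the YYYYMMDDhhmm string."""
    return moment.strftime(STAMP_FORMAT)


parsed = parse_stamp("201002171454")
print(parsed.isoformat())        # 2010-02-17T14:54:00
print(format_stamp(parsed))      # 201002171454
```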

And then we need to configure NTP on the filer to keep its time in sync with the Domain Controllers:

demo02> options timed.enable off
demo02> options timed.proto ntp
demo02> options timed.servers <NTP SERVER ADDRESS>
demo02> options timed.max_skew 5m
demo02> options timed.enable on

Now you can proceed with the domain join, which is a very simple wizard-like interactive procedure. The command is cifs setup, and here you can find a transcript:

demo02> cifs setup              
This process will enable CIFS access to the filer from a Windows(R) system.
Use "?" for help at any prompt and Ctrl-C to exit without committing changes.
 
        Your filer does not have WINS configured and is visible only to
        clients on the same subnet.
Do you want to make the system visible via WINS? [n]: 
        A filer can be configured for multiprotocol access, or as an NTFS-only
        filer. Since multiple protocols are currently licensed on this filer,
        we recommend that you configure this filer as a multiprotocol filer
 
(1) Multiprotocol filer
(2) NTFS-only filer
 
Selection (1-2)? [2]: 2
        CIFS requires local /etc/passwd and /etc/group files and default files
        will be created.  The default passwd file contains entries for 'root',
        'pcuser', and 'nobody'.
Enter the password for the root user []: 
Retype the password: 
        The default name for this CIFS server is 'DEMO02'.
Would you like to change this name? [n]: 
        Data ONTAP CIFS services support four styles of user authentication.
        Choose the one from the list below that best suits your situation.
 
(1) Active Directory domain authentication (Active Directory domains only)
(2) Windows NT 4 domain authentication (Windows NT or Active Directory domains)
(3) Windows Workgroup authentication using the filer's local user accounts
(4) /etc/passwd and/or NIS/LDAP authentication
 
Selection (1-4)? [1]: 1
What is the name of the Active Directory domain? [HANDS-ON.LOCAL]: HANDS-ON.LOCAL
        In order to create an Active Directory machine account for the filer,
        you must supply the name and password of a Windows account with
        sufficient privileges to add computers to the HANDS-ON.LOCAL domain.
Enter the name of the Windows user [Administrator@HANDS-ON.LOCAL]: Administrator@HANDS-ON.LOCAL
Password for Administrator@HANDS-ON.LOCAL: 
CIFS - Logged in as Administrator@HANDS-ON.LOCAL.
        The user that you specified has permission to create the filer's
        machine account in several (2) containers. Please choose where you
        would like this account to be created.
 
(1) CN=computers
(2) OU=Domain Controllers
(3) None of the above
 
Selection (1-3)? [1]: 1
CIFS - Starting SMB protocol...
        It is highly recommended that you create the local administrator
        account (DEMO02\administrator) for this filer. This account allows
        access to CIFS from Windows when domain controllers are not
        accessible.
Do you want to create the DEMO02\administrator account? [y]: 
Enter the new password for DEMO02\administrator: 
 
Retype the password: 
        Currently the user "DEMO02\administrator" and members of the group
        "HANDS-ON\Domain Admins" have permission to administer CIFS on this
        filer. You may specify an additional user or group to be added to the
        filer's "BUILTIN\Administrators" group, thus giving them
        administrative privileges as well.
Would you like to specify a user or group that can administer CIFS? [n]: n
Welcome to the HANDS-ON.LOCAL (HANDS-ON) Active Directory(R) domain.
 
CIFS local server is running.

As you can see it’s a really simple and straightforward process, and you can even fire up compmgmt.msc from your Windows box and point it to the NetApp to see and map shares!


Simple way to extend an aggregate in a NetApp FAS

Posted: July 26th, 2010 | Filed under: Storage | 4 Comments

In the last couple of days I had the pleasure of playing around with a FAS2020, the smallest unified storage system made by NetApp. It’s a very nice machine indeed: it’s really “user friendly” (from a UNIX admin perspective :-) ), it’s packed with great features (Deduplication, Snapshots and so on) and it gives you the maximum degree of flexibility when it comes down to troubleshooting.

During my tests with this FAS I found myself with a wrong aggregate layout:

fas2020-01> sysconfig -r
Aggregate aggr0 (online, raid_dp) (block checksums)
  Plex /aggr0/plex0 (online, normal, active)
    RAID group /aggr0/plex0/rg0 (normal)
 
      RAID Disk	Device  	HA  SHELF BAY CHAN Pool Type  RPM  Used (MB/blks)    Phys (MB/blks)
      ---------	------  	------------- ---- ---- ---- ----- --------------    --------------
      dparity 	0c.00.0 	0c    0   0   SA:B   -  SAS  15000 272000/557056000  274845/562884296 
      parity  	0c.00.1 	0c    0   1   SA:B   -  SAS  15000 272000/557056000  274845/562884296 
      data    	0c.00.2 	0c    0   2   SA:B   -  SAS  15000 272000/557056000  274845/562884296 
 
 
Spare disks
 
RAID Disk	Device  	HA  SHELF BAY CHAN Pool Type  RPM  Used (MB/blks)    Phys (MB/blks)
---------	------  	------------- ---- ---- ---- ----- --------------    --------------
Spare disks for block or zoned checksum traditional volumes or aggregates
spare   	0c.00.3 	0c    0   3   SA:B   -  SAS  15000 272000/557056000  274845/562884296 
spare   	0c.00.4 	0c    0   4   SA:B   -  SAS  15000 272000/557056000  274845/562884296 
spare   	0c.00.5 	0c    0   5   SA:B   -  SAS  15000 272000/557056000  274845/562884296 
spare   	0c.00.6 	0c    0   6   SA:B   -  SAS  15000 272000/557056000  274845/562884296 
spare   	0c.00.7 	0c    0   7   SA:B   -  SAS  15000 272000/557056000  274845/562884296 
spare   	0c.00.8 	0c    0   8   SA:B   -  SAS  15000 272000/557056000  274845/562884296 
spare   	0c.00.9 	0c    0   9   SA:B   -  SAS  15000 272000/557056000  274845/562884296 
spare   	0c.00.10	0c    0   10  SA:B   -  SAS  15000 272000/557056000  274845/562884296 
spare   	0c.00.11	0c    0   11  SA:B   -  SAS  15000 272000/557056000  274845/562884296

As you can see, my aggregate “aggr0” was made up of just 3 disks. This is in fact a kind of “best practice” in the NetApp world, because the system volume “vol0” resides on the first aggregate and is usually kept separate from the real data to preserve the system in case something bad happens to the data disks.

But in my test situation I had to extend aggr0 to span 11 disks (leaving just 1 as a spare), using this command:

aggr add aggr0 8@300G

Immediately a stream of messages comes up on the console stating that the disks have been added to aggr0:

Wed Mar 31 13:46:21 GMT [raid.vol.disk.add.done:notice]: Addition of Disk /aggr0/plex0/rg0/0c.00.10 Shelf 0 Bay 10 [NETAPP   X287_HVPBP288A15 NA00] S/N [JLXJLK5C] to aggregate aggr0 has completed successfully
Wed Mar 31 13:46:21 GMT [raid.vol.disk.add.done:notice]: Addition of Disk /aggr0/plex0/rg0/0c.00.9 Shelf 0 Bay 9 [NETAPP   X287_HVPBP288A15 NA00] S/N [JLXJM20C] to aggregate aggr0 has completed successfully
Wed Mar 31 13:46:21 GMT [raid.vol.disk.add.done:notice]: Addition of Disk /aggr0/plex0/rg0/0c.00.8 Shelf 0 Bay 8 [NETAPP   X287_HVPBP288A15 NA00] S/N [JLXJLT7C] to aggregate aggr0 has completed successfully
Wed Mar 31 13:46:21 GMT [raid.vol.disk.add.done:notice]: Addition of Disk /aggr0/plex0/rg0/0c.00.7 Shelf 0 Bay 7 [NETAPP   X287_HVPBP288A15 NA00] S/N [JLXK5P2C] to aggregate aggr0 has completed successfully
Wed Mar 31 13:46:21 GMT [raid.vol.disk.add.done:notice]: Addition of Disk /aggr0/plex0/rg0/0c.00.6 Shelf 0 Bay 6 [NETAPP   X287_HVPBP288A15 NA00] S/N [JLXHWVGC] to aggregate aggr0 has completed successfully
Wed Mar 31 13:46:21 GMT [raid.vol.disk.add.done:notice]: Addition of Disk /aggr0/plex0/rg0/0c.00.5 Shelf 0 Bay 5 [NETAPP   X287_HVPBP288A15 NA00] S/N [JLXK4T2C] to aggregate aggr0 has completed successfully
Wed Mar 31 13:46:21 GMT [raid.vol.disk.add.done:notice]: Addition of Disk /aggr0/plex0/rg0/0c.00.4 Shelf 0 Bay 4 [NETAPP   X287_HVPBP288A15 NA00] S/N [JLXK5VXC] to aggregate aggr0 has completed successfully
Wed Mar 31 13:46:21 GMT [raid.vol.disk.add.done:notice]: Addition of Disk /aggr0/plex0/rg0/0c.00.3 Shelf 0 Bay 3 [NETAPP   X287_HVPBP288A15 NA00] S/N [JLXJZ2VC] to aggregate aggr0 has completed successfully
Addition of 8 disks to the aggregate has completed.

And if we check the system configuration again we find that our aggregate has been extended:

fas2020-01> sysconfig -r         
Aggregate aggr0 (online, raid_dp) (block checksums)
  Plex /aggr0/plex0 (online, normal, active)
    RAID group /aggr0/plex0/rg0 (normal)
 
      RAID Disk	Device  	HA  SHELF BAY CHAN Pool Type  RPM  Used (MB/blks)    Phys (MB/blks)
      ---------	------  	------------- ---- ---- ---- ----- --------------    --------------
      dparity 	0c.00.0 	0c    0   0   SA:B   -  SAS  15000 272000/557056000  274845/562884296 
      parity  	0c.00.1 	0c    0   1   SA:B   -  SAS  15000 272000/557056000  274845/562884296 
      data    	0c.00.2 	0c    0   2   SA:B   -  SAS  15000 272000/557056000  274845/562884296 
      data    	0c.00.3 	0c    0   3   SA:B   -  SAS  15000 272000/557056000  274845/562884296 
      data    	0c.00.4 	0c    0   4   SA:B   -  SAS  15000 272000/557056000  274845/562884296 
      data    	0c.00.5 	0c    0   5   SA:B   -  SAS  15000 272000/557056000  274845/562884296 
      data    	0c.00.6 	0c    0   6   SA:B   -  SAS  15000 272000/557056000  274845/562884296 
      data    	0c.00.7 	0c    0   7   SA:B   -  SAS  15000 272000/557056000  274845/562884296 
      data    	0c.00.8 	0c    0   8   SA:B   -  SAS  15000 272000/557056000  274845/562884296 
      data    	0c.00.9 	0c    0   9   SA:B   -  SAS  15000 272000/557056000  274845/562884296 
      data    	0c.00.10	0c    0   10  SA:B   -  SAS  15000 272000/557056000  274845/562884296 
 
 
Spare disks
 
RAID Disk	Device  	HA  SHELF BAY CHAN Pool Type  RPM  Used (MB/blks)    Phys (MB/blks)
---------	------  	------------- ---- ---- ---- ----- --------------    --------------
Spare disks for block or zoned checksum traditional volumes or aggregates
spare   	0c.00.11	0c    0   11  SA:B   -  SAS  15000 272000/557056000  274845/562884296
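As a rough sanity check on the two sysconfig outputs, the data capacity of the RAID group before and after the extension can be estimated with simple arithmetic. This is only a back-of-the-envelope sketch: it takes the 272000 MB per disk from the “Used (MB/blks)” column, counts the two parity disks shown for raid_dp (dparity and parity), and ignores any filesystem overhead or reserves, so the real usable space will be lower.

```python
USED_MB_PER_DISK = 272000  # from the "Used (MB/blks)" column of sysconfig -r
PARITY_DISKS = 2           # dparity + parity, as listed for the raid_dp group


def data_capacity_mb(disks_in_group, used_mb=USED_MB_PER_DISK, parity=PARITY_DISKS):
    """Capacity of the data disks only, before any filesystem overhead."""
    data_disks = disks_in_group - parity
    if data_disks < 1:
        raise ValueError("the RAID group needs at least one data disk")
    return data_disks * used_mb


before = data_capacity_mb(3)   # original aggr0: 1 data disk
after = data_capacity_mb(11)   # extended aggr0: 9 data disks

print(f"before: {before} MB, after: {after} MB, ratio: {after // before}x")
```

Going from 1 to 9 data disks multiplies both the raw data capacity and the number of spindles serving the aggregate by nine.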

Now, if we have volumes on this aggregate that we would like to “restripe” to use the new disks, we can issue the reallocate command, like this:

reallocate start -f /vol/vol0

and then check the progress with reallocate status:

fas2020-01> reallocate status            
Reallocation scans are on
/vol/vol0: 
        State: Reallocating: Inode 35941, block 0 of 1 (0%)
     Schedule: n/a
     Interval: 1 day
 Optimization: 1

It’s really as simple as that.