When VMotion breaks ZFS…

Posted: April 26th, 2010 | Filed under: Storage

Last Thursday I was sitting in a customer meeting when my phone rang furiously. It was one of my co-workers with a desperate cry for help: during a VMotion from ESX 3.5 to vSphere 4 (he was carrying out an upgrade), a Solaris 10 VM with a big ZPool configured suddenly hung.

The machine was completely unresponsive, so he reset it from VMware. It rebooted correctly, but the ZPool was gone…

Imagine that after a reboot your 8TB file share suddenly disappears… creepy, huh?

I thought it was impossible, but after some investigation I finally found the problem: the vSphere disk size limit of 2TB minus 512 bytes.

The VM was built on ESX 3.5 with 2TB RDMs. During the VMotion something went wrong: VMware chopped 512 bytes off the disk as seen by the guest, damaging the disk label. Fortunately, after a couple of hours of investigation (and trial and error, made safe by Compellent Replays) I found the correct partition alignment, and after relabeling the disk everything came back to life!
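To put the 512 bytes in perspective, here is the arithmetic as a small Python sketch. It assumes the RDM was exactly 2 TiB and that the guest addresses it in 512-byte sectors; the post does not state either explicitly. Under those assumptions the guest loses exactly one sector at the very end of the disk, which is where a label that records the disk size may expect data (an EFI/GPT label, for example, keeps its backup header in the last LBA).

```python
# Sketch: sector counts before and after the vSphere 4 size limit.
# Assumptions (not from the post): the RDM was exactly 2 TiB and the guest
# addresses it in 512-byte logical sectors.
SECTOR = 512
TIB = 2**40


def sector_count(size_bytes: int) -> int:
    """Number of whole 512-byte sectors in a device of size_bytes bytes."""
    return size_bytes // SECTOR


original_size = 2 * TIB            # RDM size as created on ESX 3.5 (assumed)
vsphere4_max = 2 * TIB - SECTOR    # vSphere 4 limit: 2TB minus 512 bytes

before = sector_count(original_size)
after = sector_count(vsphere4_max)

print("sectors before:", before)          # 4294967296
print("sectors after: ", after)           # 4294967295
print("last LBA before:", before - 1)     # 4294967295
print("last LBA after: ", after - 1)      # 4294967294
```

In other words, the last sector the label was written against (LBA 4294967295) simply no longer exists after the move.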

It was a fun ride after all :-)


A Next Generation Multiprotocol storage for your homelab (on a budget) – Part 2

Posted: April 22nd, 2010 | Filed under: Storage, Virtualization

Welcome back to the second part of this tutorial. This time we’ll see how to configure and use the iSCSI target portion of COMSTAR, so let’s start immediately!

root@opensolaris:~# ifconfig -a
lo0: flags=2001000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4,VIRTUAL> mtu 8232 index 1
        inet 127.0.0.1 netmask ff000000 
igb0: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 2
        inet 192.168.0.174 netmask ffffff00 broadcast 192.168.0.255
        ether 0:14:4f:cb:15:90 
igb1: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 3
        inet 10.10.1.1 netmask ffffff00 broadcast 10.10.1.255
        ether 0:14:4f:cb:15:91 
igb2: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 4
        inet 10.10.2.1 netmask ffffff00 broadcast 10.10.2.255
        ether 0:14:4f:cb:15:92 
igb3: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 5
        inet 10.10.3.1 netmask ffffff00 broadcast 10.10.3.255
        ether 0:14:4f:cb:15:93 
lo0: flags=2002000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv6,VIRTUAL> mtu 8252 index 1
        inet6 ::1/128

Here’s the ifconfig output. Our network interfaces for iSCSI traffic are igb1 and igb2, so we start by creating a target portal group (TPG) for each of them:

root@opensolaris:~# itadm create-tpg igb1 10.10.1.1
root@opensolaris:~# itadm create-tpg igb2 10.10.2.1

Then we create a target bound to both of them:

root@opensolaris:~# itadm create-target -t igb1,igb2
Target iqn.1986-03.com.sun:02:ea4ee368-d1dc-cd43-ef63-fbf61e4b4ccb successfully created
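With more portals, the create-tpg and create-target commands can be generated from a table of portals. The helper below is hypothetical (it is not part of COMSTAR): it only prints command lines using the same itadm syntax shown above. Note that the first argument of create-tpg is just a TPG name; this post happens to reuse the interface names.

```python
# Hypothetical helper: print the itadm commands for a set of iSCSI portals.
# It does not run anything; copy the output into a root shell.
from typing import Dict, List


def tpg_commands(portals: Dict[str, str]) -> List[str]:
    """Map TPG name -> portal IP to one create-tpg per portal plus a target."""
    cmds = [f"itadm create-tpg {tpg} {ip}" for tpg, ip in portals.items()]
    # The target is bound to every TPG at once, comma-separated, via -t
    cmds.append("itadm create-target -t " + ",".join(portals))
    return cmds


if __name__ == "__main__":
    iscsi_portals = {"igb1": "10.10.1.1", "igb2": "10.10.2.1"}
    for cmd in tpg_commands(iscsi_portals):
        print(cmd)
```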

And let’s see the outcome:

root@opensolaris:~# itadm list-target -v
TARGET NAME                                                  STATE    SESSIONS 
iqn.1986-03.com.sun:02:ea4ee368-d1dc-cd43-ef63-fbf61e4b4ccb  online   0        
        alias:                  -
        auth:                   none (defaults)
        targetchapuser:         -
        targetchapsecret:       unset
        tpg-tags:               igb2 = 3,igb1 = 2

Now I’m going to add an alias to this target so it’s easier to refer to on the initiator side:

root@opensolaris:~# itadm modify-target -l comstar iqn.1986-03.com.sun:02:ea4ee368-d1dc-cd43-ef63-fbf61e4b4ccb
Target iqn.1986-03.com.sun:02:ea4ee368-d1dc-cd43-ef63-fbf61e4b4ccb successfully modified

And list the targets again:

root@opensolaris:~# itadm list-target -v
TARGET NAME                                                  STATE    SESSIONS 
iqn.1986-03.com.sun:02:ea4ee368-d1dc-cd43-ef63-fbf61e4b4ccb  online   0        
        alias:                  comstar
        auth:                   none (defaults)
        targetchapuser:         -
        targetchapsecret:       unset
        tpg-tags:               igb2 = 3,igb1 = 2

Now let’s do the setup on the VMware side: open the iSCSI software initiator and add the two IPs in the Dynamic Discovery tab.

After this step, if we issue the itadm list-target -v command again, we can see that the “SESSIONS” count has increased to two, showing that the VMware host has opened sessions to our new target:

root@opensolaris:~# itadm list-target -v
TARGET NAME                                                  STATE    SESSIONS 
iqn.1986-03.com.sun:02:ea4ee368-d1dc-cd43-ef63-fbf61e4b4ccb  online   2        
        alias:                  comstar
        auth:                   none (defaults)
        targetchapuser:         -
        targetchapsecret:       unset
        tpg-tags:               igb2 = 3,igb1 = 2
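If you want to script the “did SESSIONS go up?” check, the listing can be parsed. This is a minimal sketch that assumes the exact layout shown above (a header row, one unindented row per target, then indented “key: value” lines); other itadm versions may print something different, and the function name is hypothetical.

```python
# Sketch: parse `itadm list-target -v` output (layout as shown above) into
# one dict per target. Hypothetical helper, not part of COMSTAR.
from typing import Dict, List


def parse_list_target(text: str) -> List[Dict[str, str]]:
    targets: List[Dict[str, str]] = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("TARGET NAME"):
            continue  # skip blank lines and the column header
        if not line[0].isspace():
            # Target row: name, state and session count
            name, state, sessions = line.split()
            targets.append({"name": name, "state": state, "sessions": sessions})
        elif targets:
            # Indented detail row, e.g. "alias:   comstar"
            key, _, value = line.strip().partition(":")
            targets[-1][key.strip()] = value.strip()
    return targets


if __name__ == "__main__":
    sample = """\
TARGET NAME                                                  STATE    SESSIONS
iqn.1986-03.com.sun:02:ea4ee368-d1dc-cd43-ef63-fbf61e4b4ccb  online   2
        alias:                  comstar
        auth:                   none (defaults)
        tpg-tags:               igb2 = 3,igb1 = 2
"""
    target = parse_list_target(sample)[0]
    print(target["alias"], int(target["sessions"]))  # comstar 2
```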

Now we’re ready for some Volume mapping and creation!

We start by creating a 50G volume with zfs create:

root@opensolaris:~# zfs create -V 50G DataPool/iSCSIDatastore2

Then we create a host group for the iSCSI connection:

root@opensolaris:~# stmfadm create-hg ESX4-iSCSI

And add the IQN of our VMware host (you can find it in the iSCSI software initiator tab):

root@opensolaris:~# stmfadm add-hg-member -g ESX4-iSCSI iqn.1998-01.com.vmware:esxvdi-04c37843

Now we can create the LUN:

root@opensolaris:~# sbdadm create-lu /dev/zvol/rdsk/DataPool/iSCSIDatastore2
 
Created the following LU:
 
              GUID                    DATA SIZE           SOURCE
--------------------------------  -------------------  ----------------
600144f03ebec50000004ba8a1090001      53687091200     /dev/zvol/rdsk/DataPool/iSCSIDatastore2
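The GUID printed here is what the later stmfadm commands need, and stmfadm list-lu shows it in upper case. A minimal sketch for pulling it out of the sbdadm output, assuming the table layout above (the helper is hypothetical), also confirms that the data size is exactly the 50G requested with zfs create:

```python
# Sketch: extract GUID, size and source from `sbdadm create-lu` output,
# assuming the table layout shown above. Hypothetical helper.
import re
from typing import Tuple

GIB = 2**30
ROW = re.compile(r"\s*([0-9a-fA-F]{32})\s+(\d+)\s+(\S+)")


def parse_create_lu(text: str) -> Tuple[str, int, str]:
    """Return (GUID in upper case, data size in bytes, source path)."""
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            return m.group(1).upper(), int(m.group(2)), m.group(3)
    raise ValueError("no LU row found in sbdadm output")


if __name__ == "__main__":
    out = (
        "              GUID                    DATA SIZE           SOURCE\n"
        "--------------------------------  -------------------  ----------------\n"
        "600144f03ebec50000004ba8a1090001      53687091200     "
        "/dev/zvol/rdsk/DataPool/iSCSIDatastore2\n"
    )
    guid, size, source = parse_create_lu(out)
    print(guid)                 # 600144F03EBEC50000004BA8A1090001
    print(size == 50 * GIB)     # True: 50 * 2**30 bytes
```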

Let’s check if everything is set:

root@opensolaris:~# stmfadm list-lu -v
LU Name: 600144F03EBEC50000004BA8A1090001
    Operational Status: Online
    Provider Name     : sbd
    Alias             : /dev/zvol/rdsk/DataPool/iSCSIDatastore2
    View Entry Count  : 0

Finally we map the LUN to the Host:

root@opensolaris:~# stmfadm add-view -h ESX4-iSCSI -n 0 600144F03EBEC50000004BA8A1090001
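The whole iSCSI provisioning sequence above lends itself to scripting. The following hypothetical helper (not part of COMSTAR) only prints the commands, using exactly the syntax from this post. Because the GUID is known only after sbdadm create-lu has run, the add-view line is built separately from the GUID you get back.

```python
# Hypothetical helper: print the iSCSI provisioning steps of this post.
# Run the first batch, read the GUID from sbdadm, then run add-view.
from typing import List


def provision_commands(pool: str, volume: str, size: str,
                       host_group: str, initiators: List[str]) -> List[str]:
    dataset = f"{pool}/{volume}"
    return [
        f"zfs create -V {size} {dataset}",              # backing ZFS volume
        f"stmfadm create-hg {host_group}",              # host group
        f"stmfadm add-hg-member -g {host_group} " + " ".join(initiators),
        f"sbdadm create-lu /dev/zvol/rdsk/{dataset}",   # logical unit
    ]


def add_view_command(host_group: str, guid: str, lun: int = 0) -> str:
    # Upper-case GUID, as printed by stmfadm list-lu and used above
    return f"stmfadm add-view -h {host_group} -n {lun} {guid.upper()}"


if __name__ == "__main__":
    for cmd in provision_commands("DataPool", "iSCSIDatastore2", "50G",
                                  "ESX4-iSCSI",
                                  ["iqn.1998-01.com.vmware:esxvdi-04c37843"]):
        print(cmd)
    print(add_view_command("ESX4-iSCSI", "600144f03ebec50000004ba8a1090001"))
```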

Now let’s do a refresh on VMware and check that everything’s correct,

and also check that multipathing is working fine.

And voilà! Our iSCSI LUN is ready to be used.

This is the end of part 2. In part 3 I will explore NAS datastore configuration, in part 4 I will explain how to use snapshots and clones, and part 5 will focus on deduplication, so stay tuned for more!


A Next Generation Multiprotocol storage for your homelab (on a budget) – Part 1

Posted: April 15th, 2010 | Filed under: Storage, Virtualization

So you’ve built your Virtualization home lab using whitebox hardware and you’re running vSphere on it quite nicely, but what about storage?

Don’t tell me you’re using one of those expensive NAS/iSCSI boxes like the ix4… Yeah, they can be quite nice for an out-of-the-box experience, but they can’t replicate a real storage infrastructure!

So what can I do to get multiprotocol storage with NFS/CIFS support for NAS and iSCSI/FC support for SAN, maybe even with multiple snapshots, cloning capabilities and DEDUPLICATION?

Here’s your answer: OpenSolaris and COMSTAR.

COMSTAR is an open-source OpenSolaris project that is currently the foundation of the Sun Storage 7000 family. Its name is an acronym for “COmmon Multiprotocol Scsi TARget”, and coupled with the almighty ZFS it gives you a powerful storage solution at no cost.

This post is the first of a series where I’ll explain how to build a complete storage system using off-the-shelf parts. Let’s start with the shopping list:

Hardware

  • Any x86 hardware listed in this HCL will do fine.
  • If you need iSCSI target support, you need a NIC supported by OpenSolaris (see the HCL above).
  • If you need FC target support, you need a 4Gb or 8Gb QLogic HBA (2Gb cards are not supported) or an Emulex LP10000 or newer card; you can find them quite cheap on eBay.

Software

  • That’s the easy part: you only need an OpenSolaris 2009.06 ISO. After the install, everything else is downloaded with the IPS package manager that comes with OpenSolaris.

Now we’re ready to start. First of all we need to install OpenSolaris on our brand new machine. This is beyond the scope of this how-to, so I will just point you to the official Sun installation guide (the installation is very simple, by the way); you can find it here.

The first step is to install the COMSTAR stack. We’ll use the IPS package manager with superuser privileges (either via su - or pfexec):

pkg install -v storage-server

This should take care of everything; you’ll end up with something like this:

root@opensolaris:/# pkg install -v storage-server
Creating Plan - Before evaluation:
UNEVALUATED:
+pkg:/storage-server@0.1,5.11-0.111:20090508T165041Z
 
After evaluation:
None -> pkg:/storage-server@0.1,5.11-0.111:20090508T165041Z
None -> pkg:/SUNWvscan@0.5.11,5.11-0.111:20090508T164122Z
None -> pkg:/SUNWmda@0.5.11,5.11-0.111:20090508T162120Z
None -> pkg:/SUNWvscankr@0.5.11,5.11-0.111:20090508T164123Z
None -> pkg:/SUNWndmp@0.5.11,5.11-0.111:20090508T162452Z
None -> pkg:/SUNWstmf@0.5.11,5.11-0.111:20090508T163712Z
None -> pkg:/SUNWii@0.5.11,5.11-0.111:20090508T160911Z
None -> pkg:/SUNWscm@0.5.11,5.11-0.111:20090508T163449Z
None -> pkg:/SUNWspsv@0.5.11,5.11-0.111:20090508T163647Z
None -> pkg:/SUNWsmba@3.0.34,5.11-0.111:20090508T163557Z
None -> pkg:/SUNWdmgt@0.5.11,5.11-0.111:20090508T153928Z
None -> pkg:/SUNWisns@0.5.11,5.11-0.111:20090508T161051Z
None -> pkg:/SUNWrdc@0.5.11,5.11-0.111:20090508T163217Z
None -> pkg:/SUNWiscsitgt@0.5.11,5.11-0.111:20090508T161048Z
None -> pkg:/SUNWsmbfskr@0.5.11,5.11-0.111:20090508T163611Z
None -> pkg:/SUNWmms@0.5.11,5.11-0.111:20090508T162204Z
None -> pkg:/SUNWpostgr-83-libs@8.3.7,5.11-0.111:20090508T163014Z
None -> pkg:/SUNWsmbs@0.5.11,5.11-0.111:20090508T163612Z
None -> pkg:/SUNWsmbskr@0.5.11,5.11-0.111:20090508T163614Z
None -> pkg:/SUNWfilebench@0.5.11,5.11-0.111:20090508T154334Z
None -> pkg:/SUNWiscsi@0.5.11,5.11-0.111:20090508T161040Z
Actuators:
      restart_fmri: svc:/system/manifest-import:default
None
DOWNLOAD                                    PKGS       FILES     XFER (MB)
Completed                                  21/21     934/934   32.33/32.33 
 
PHASE                                        ACTIONS
Install Phase                              1898/1898

Next we need to install the new COMSTAR-based iSCSI target module. This is a separate step because Sun shipped a “standard” iSCSI target long before COMSTAR. You can install it with:

pkg install pkg:/SUNWiscsit

That’s it for the installs. Everything is now in place, but we need a reboot to continue. If you don’t need to configure any FC HBA for SAN support (FC target), you can skip the following driver steps; otherwise, proceed with them.

First, we need to identify the device bindings of the standard initiator driver, which we can do with mdb:

mdb -k

Now we’re at the mdb prompt. To show the device bindings for the qlc driver (qlc is the initiator driver; the target driver is qlt), issue this command:

::devbindings -q qlc

Here’s a sample output:

root@opensolaris:/# mdb -k
Loading modules: [ unix genunix specfs dtrace mac cpu.generic uppc pcplusmp scsi_vhci zfs sd sockfs ip hook neti sctp arp usba uhci qlc fctl md lofs fcip fcp cpc random crypto logindmux ptm ufs nsmb sppp ipc mpt emlxs ]
> ::devbindings -q qlc
ffffff06f68dccc0 pciex1077,2432, instance #0 (driver name: qlc)
ffffff06f68dca40 pciex1077,2432, instance #1 (driver name: qlc)
> $q

My card is a dual-port card, so both instances share the same device ID; in my case it is pciex1077,2432. Now we need to detach the qlc driver and attach qlt (the change takes effect during the reboot).

First detach:

update_drv -d -i 'pciex1077,2432' qlc

Then attach the target driver:

update_drv -a -i 'pciex1077,2432' qlt
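On a box with several HBAs it is easy to miss a device ID. The sketch below turns the ::devbindings lines shown above into the two update_drv commands, de-duplicating IDs that appear once per port. It is a hypothetical helper that assumes the “<address> <device-id>, instance #N (driver name: qlc)” layout from the sample output; note that the device ID itself contains a comma.

```python
# Sketch: derive update_drv commands from `::devbindings -q qlc` output.
# Assumes the line layout shown above; hypothetical helper.
import re
from typing import List

BINDING = re.compile(r"\s*[0-9a-f]+\s+(\S+?),\s+instance #\d+")


def device_ids(devbindings: str) -> List[str]:
    """Unique device IDs, in order of appearance (one per card, not per port)."""
    ids: List[str] = []
    for line in devbindings.splitlines():
        m = BINDING.match(line)
        if m and m.group(1) not in ids:
            ids.append(m.group(1))
    return ids


def update_drv_commands(dev_id: str) -> List[str]:
    return [
        f"update_drv -d -i '{dev_id}' qlc",   # detach the initiator driver
        f"update_drv -a -i '{dev_id}' qlt",   # bind the COMSTAR target driver
    ]


if __name__ == "__main__":
    sample = """> ::devbindings -q qlc
ffffff06f68dccc0 pciex1077,2432, instance #0 (driver name: qlc)
ffffff06f68dca40 pciex1077,2432, instance #1 (driver name: qlc)
> $q"""
    for dev in device_ids(sample):
        for cmd in update_drv_commands(dev):
            print(cmd)
```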

If your system tells you that the unload will be done upon reboot and that the qlt driver failed to attach, don’t worry: that’s completely normal, and everything will be done during the reboot.

We’re almost ready to reboot. Before that, let’s enable all the required services so we can start working as soon as the reboot is done:

svcadm enable stmf
svcadm enable svc:/network/iscsi/target:default

Then we can reboot with:

reboot -- -r

After the reboot, let’s quickly check that the qlt driver attached correctly with this command:

fcinfo hba-port

The output should be similar to the following; look for the “Port Mode:” field, which should say “Target”:

HBA Port WWN: 2100001b329725a5
        Port Mode: Target
        Port ID: 10000
        OS Device Name: Not Applicable
        Manufacturer: QLogic Corp.
        Model: QLE2462
        Firmware Version: 4.5.0
        FCode/BIOS Version: N/A
        Serial Number: not available
        Driver Name: COMSTAR QLT
        Driver Version: 1.0
        Type: F-port
        State: online
        Supported Speeds: 1Gb 2Gb 4Gb
        Current Speed: 4Gb
        Node WWN: 2000001b329725a5
HBA Port WWN: 2101001b32b725a5
        Port Mode: Target
        Port ID: 10100
        OS Device Name: Not Applicable
        Manufacturer: QLogic Corp.
        Model: QLE2462
        Firmware Version: 4.5.0
        FCode/BIOS Version: N/A
        Serial Number: not available
        Driver Name: COMSTAR QLT
        Driver Version: 1.0
        Type: F-port
        State: online
        Supported Speeds: 1Gb 2Gb 4Gb
        Current Speed: 4Gb
        Node WWN: 2001001b32b725a5

Now we’re ready to create the ZPOOL, the “core” of the storage backend. Explaining every protection type and combination possible with ZFS is outside the scope of this document, so refer to this guide to choose what fits you best.

In this example my machine is fitted with 4 identical drives. One of them is dedicated to “rpool”, the root pool where the system is installed. Here’s the format output:

root@opensolaris:/# echo | format
Searching for disks...done
 
AVAILABLE DISK SELECTIONS:
       0. c7t0d0
          /pci@0,0/pci8086,340a@3/pci108e,286@0/disk@0,0
       1. c7t1d0
          /pci@0,0/pci8086,340a@3/pci108e,286@0/disk@1,0
       2. c7t2d0
          /pci@0,0/pci8086,340a@3/pci108e,286@0/disk@2,0
       3. c7t3d0
          /pci@0,0/pci8086,340a@3/pci108e,286@0/disk@3,0
Specify disk (enter its number): Specify disk (enter its number):

The c7t0d0 disk is already used by the system, so I will create a RAID-Z ZPOOL made of the other three disks with this command:

root@opensolaris:/# zpool create DataPool raidz c7t1d0 c7t2d0 c7t3d0

And check the outcome with:

root@opensolaris:/# zpool status DataPool
  pool: DataPool
 state: ONLINE
 scrub: none requested
config:
 
	NAME        STATE     READ WRITE CKSUM
	DataPool    ONLINE       0     0     0
	  raidz1    ONLINE       0     0     0
	    c7t1d0  ONLINE       0     0     0
	    c7t2d0  ONLINE       0     0     0
	    c7t3d0  ONLINE       0     0     0
 
errors: No known data errors
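As a rule of thumb, a single RAID-Z vdev gives you roughly the capacity of (disks minus parity disks), so this three-disk raidz1 pool offers about two disks’ worth of space. The sketch below only encodes that approximation; it ignores ZFS metadata, reserved space and padding, so real zfs list numbers come out somewhat lower. The 146 GB disk size in the example is hypothetical.

```python
# Rough sketch: approximate usable capacity of one RAID-Z vdev.
# Ignores metadata, reservations and padding (real numbers are lower).
def raidz_usable(disks: int, disk_size: float, parity: int = 1) -> float:
    """Usable space in the same unit as disk_size."""
    if parity < 1:
        raise ValueError("RAID-Z needs at least one parity device")
    if disks <= parity:
        raise ValueError("a RAID-Z vdev needs more disks than parity devices")
    return (disks - parity) * disk_size


if __name__ == "__main__":
    # Three hypothetical 146 GB disks in raidz1, as in the zpool create above
    print(raidz_usable(3, 146))  # 292
```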

By default ZFS creates a filesystem with the same name as the ZPOOL and mounts it under / (here /DataPool). Let’s check:

root@opensolaris:~# zfs list -r DataPool
NAME                      USED  AVAIL  REFER  MOUNTPOINT
DataPool                  100G   167G  25,3K  /DataPool

OK, everything looks fine: we have the FC and iSCSI targets up and running, and we’ve created a disk backend to put our data on. Let’s start with the fun part: creating and publishing volumes.

First of all we need to create a volume on the disk backend to hold our LUN; we can accomplish this task with a one-liner:

root@opensolaris:~# zfs create -V 100G DataPool/TestDatastore1

This command creates a 100G volume called “TestDatastore1” in our DataPool ZPOOL. Let’s check:

root@opensolaris:~# zfs list -r DataPool
NAME                      USED  AVAIL  REFER  MOUNTPOINT
DataPool                  100G   167G  25,3K  /DataPool
DataPool/TestDatastore1   100G   267G  21,3K  -

Now we need to create the logical unit (LU) object so we can map it to a target port:

root@opensolaris:~# sbdadm create-lu /dev/zvol/rdsk/DataPool/TestDatastore1 
 
Created the following LU:
 
              GUID                    DATA SIZE           SOURCE
--------------------------------  -------------------  ----------------
600144f03ebec50000004ba86e460001      107374116864     /dev/zvol/rdsk/DataPool/TestDatastore1

The path /dev/zvol/rdsk/DataPool/TestDatastore1 follows a simple convention:

/dev/zvol/rdsk/<ZPOOL NAME>/<VOLUME NAME>
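Every ZFS volume gets a raw (character) device node under /dev/zvol/rdsk/ and a block device node under /dev/zvol/dsk/, followed by the full dataset name; for a volume nested in a filesystem, such as the hypothetical DataPool/vmware/Datastore1, the whole dataset path is used. A tiny sketch of that rule (the function name is made up for illustration):

```python
# Sketch of the zvol device-path convention: /dev/zvol/{rdsk,dsk}/<dataset>.
import posixpath


def zvol_device(dataset: str, raw: bool = True) -> str:
    """Raw device path by default (what sbdadm create-lu is given above)."""
    base = "/dev/zvol/rdsk" if raw else "/dev/zvol/dsk"
    return posixpath.join(base, dataset.strip("/"))


if __name__ == "__main__":
    print(zvol_device("DataPool/TestDatastore1"))
    print(zvol_device("DataPool/vmware/Datastore1", raw=False))  # hypothetical
```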

Let’s check if everything looks good:

root@opensolaris:~# stmfadm list-lu -v
LU Name: 600144F03EBEC50000004BA86E460001
    Operational Status: Online
    Provider Name     : sbd
    Alias             : /dev/zvol/rdsk/DataPool/TestDatastore1
    View Entry Count  : 0

Now we can create a group of initiators (server HBAs) to map the LUN to. If you’re familiar with NetApp, this is the same concept as igroups:

root@opensolaris:~# stmfadm create-hg ESX4-group

And then add the two WWPNs taken from my ESX4 host:

root@opensolaris:~# stmfadm add-hg-member -g ESX4-group wwn.2100001b329711bd wwn.2101001b32b711bd

(note the wwn. prefix)
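The vSphere client typically shows WWPNs as colon-separated bytes, while stmfadm wants the wwn. prefix followed by 16 hex digits, as in the command above. A minimal conversion sketch (hypothetical helper; it lower-cases the digits to match the form used in this post):

```python
# Sketch: convert a WWPN such as "21:00:00:1b:32:97:11:bd" to stmfadm's
# "wwn.2100001b329711bd" form. Hypothetical helper.
import re


def to_stmf_wwn(wwpn: str) -> str:
    digits = re.sub(r"[:\-\s]", "", wwpn).lower()  # drop separators
    if digits.startswith("wwn."):
        digits = digits[4:]                         # already prefixed
    if not re.fullmatch(r"[0-9a-f]{16}", digits):
        raise ValueError(f"not a 64-bit WWN: {wwpn!r}")
    return "wwn." + digits


if __name__ == "__main__":
    ports = ["21:00:00:1b:32:97:11:bd", "21:01:00:1b:32:b7:11:bd"]
    print("stmfadm add-hg-member -g ESX4-group "
          + " ".join(to_stmf_wwn(p) for p in ports))
```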

Now we’re ready to map the LUN to the host with this command:

root@opensolaris:~# stmfadm add-view -h ESX4-group -n 0 600144F03EBEC50000004BA86E460001

And check if everything is fine:

root@opensolaris:~# stmfadm list-lu -v
LU Name: 600144F03EBEC50000004BA86E460001
    Operational Status: Online
    Provider Name     : sbd
    Alias             : /dev/zvol/rdsk/DataPool/TestDatastore1
    View Entry Count  : 1
 
root@opensolaris:~# stmfadm list-view -l 600144F03EBEC50000004BA86E460001
View Entry: 0
    Host group   : ESX4-group
    Target group : All
    LUN          : 0

As you can see, I mapped the LUN as ID 0, and because the view’s target group is “All”, the host can see the LUN through its multiple FC connections. Let’s see how the OpenSolaris storage presents itself on the VMware side:

This screenshot shows our LUN correctly presented after an HBA rescan. If we take a look at the paths to the storage,

we can see that everything looks correct; you can even use the Round Robin multipathing policy!

That’s enough for today. In the next part we will create and publish a LUN via iSCSI and configure NAS access.


Starting my VCDX Certification

Posted: April 15th, 2010 | Filed under: VCDX, Virtualization

I’m really excited: last night I got a call from a nice lady at VMware confirming my schedule for the Enterprise Administration Exam on June 24th. This is where my (long) journey to VCDX begins. If anyone can suggest a good study guide (apart from the official one), let me know in the comments!


How Compellent Storage Center works – Part 2

Posted: April 4th, 2010 | Filed under: Storage

Here we are again with part 2 of How Compellent works. I’m writing this post just after a Lucullan Easter lunch with my whole family, so please forgive some of my expressions :-)

In the last post we talked about how data is written into PAGES and progressed using Storage Profiles and Data Progression; this time we’ll focus on how Replays work.

Replays are, in effect, point-in-time snapshots of your volume data. To apply this concept to the PAGES that Storage Center uses, each PAGE carries an additional piece of metadata that identifies its category. The categories are:

  • Accessible Recently Accessed – These are the active pages the volume is using the most
  • Accessible Non-recently accessed – Read-write pages that have not been recently used
  • Historical Accessible – Read-only pages that may be read by a volume, applies to Snapshot Volumes only
  • Historical Non-Accessible – Read-only data pages that are not being currently accessed by a volume, applies to Snapshot Volumes only, Snapshot maintains these pages for recovery purposes and they should be placed on the lowest cost storage possible

(the descriptions are copy/pasted from a Compellent document)

Let’s use a simple graphic to see how a volume is represented inside Storage Center when a Replay is taken; each blue block is a 2MB PAGE:

According to this diagram, once the C1 PAGE has been written, the old C PAGE changes its state to Historical Non-Accessible, while C1 is an Accessible Recently Accessed PAGE; all the other PAGES below the Replay separator are Historical Accessible PAGES. So in this case, the C PAGE will be automatically demoted to lower-cost storage during the first Data Progression cycle.

In this graphic another Replay is taken. As you can see, even with multiple Replays only the changed PAGES are newly allocated; in this scenario the C1 and E PAGES become Historical Non-Accessible PAGES.
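The two diagrams can be replayed with a toy model. This is a Python sketch under loudly stated assumptions, not Compellent’s implementation: one PAGE per logical block, PAGES not captured by any Replay are simply updated in place, and “recently” versus “non-recently accessed” are collapsed into a single “Accessible” state because access times are not modelled.

```python
# Toy model of Replays at PAGE granularity (assumptions in the lead-in).
from itertools import count
from typing import Dict, List, Set


class ToyVolume:
    def __init__(self, blocks: List[str]) -> None:
        self._next_id = count()
        self.pages: Dict[int, str] = {}    # page id -> content label
        self.active: Dict[str, int] = {}   # logical block -> page id
        self.frozen: Set[int] = set()      # page ids captured by some Replay
        for block in blocks:
            self.write(block, block)

    def write(self, block: str, content: str) -> None:
        pid = self.active.get(block)
        if pid is not None and pid not in self.frozen:
            self.pages[pid] = content      # not in any Replay: update in place
            return
        # Frozen (or brand-new) block: allocate a new PAGE, e.g. C1 after C
        pid = next(self._next_id)
        self.pages[pid] = content
        self.active[block] = pid

    def take_replay(self) -> None:
        # A Replay only freezes the PAGES the volume currently maps: no copy
        self.frozen.update(self.active.values())

    def category(self, pid: int) -> str:
        if pid not in self.frozen:
            return "Accessible"
        if pid in self.active.values():
            return "Historical Accessible"      # read-only, still read by the volume
        return "Historical Non-Accessible"      # kept only for recovery

    def categories(self) -> Dict[str, str]:
        return {label: self.category(pid) for pid, label in self.pages.items()}


if __name__ == "__main__":
    vol = ToyVolume(["A", "B", "C", "D", "E"])
    vol.take_replay()
    vol.write("C", "C1")                      # first diagram
    print(vol.categories())
    vol.take_replay()
    vol.write("C", "C2")                      # second diagram
    vol.write("E", "E1")
    print(vol.categories())
```

After the second Replay, C1 and E end up Historical Non-Accessible, matching the diagram, and only three new PAGES (C1, C2, E1) were ever allocated.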

Here is the Replay expiration. Expiration can be manual or scheduled: when creating a Replay you can choose between a “never expire” Replay and a scheduled expiration, which can later be changed on the fly. The grey-tinted PAGES are pushed to the new oldest Replay available, while the orange-tinted PAGE is released back into the pool, immediately available for other purposes.
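The expiration rule can be sketched the same way. In this toy Python model (an assumption, not Compellent’s code) each Replay stores only the PAGES frozen during its own interval, oldest first. When the oldest Replay expires, a PAGE whose block was rewritten in the next Replay is released (the orange PAGE); every other PAGE is handed over to the new oldest Replay (the grey PAGES). With no Replay left, a PAGE survives only if the live volume still maps it.

```python
# Toy sketch of Replay expiration with delta-style Replays (assumption).
from typing import Dict, List


def expire_oldest(replays: List[Dict[str, str]],
                  active: Dict[str, str]) -> List[str]:
    """replays: oldest first, block -> PAGE captured in that interval.
    active: block -> PAGE of the live volume.
    Removes replays[0] and returns the PAGES released to the free pool."""
    oldest = replays.pop(0)
    released: List[str] = []
    for block, page in oldest.items():
        if replays:
            newer = replays[0]
            if block in newer:
                released.append(page)   # superseded: nothing can read it
            else:
                newer[block] = page     # still part of the newer point in time
        elif active.get(block) != page:
            released.append(page)       # no Replay left and not live either
    return released


if __name__ == "__main__":
    replays = [{"A": "A", "B": "B", "C": "C", "D": "D", "E": "E"},
               {"C": "C1"}]
    active = {"A": "A", "B": "B", "C": "C2", "D": "D", "E": "E1"}
    print(expire_oldest(replays, active))   # ['C']
    print(replays)
```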

But what if we need to access the data of a single Replay, maybe mapping it to another server to replicate the production environment? Here comes the VIEW. Now the graphic gets complicated: in the following diagram, the upper part represents the VIEW created from a Replay (in purple), and the lower part represents the volume with multiple snapshots, as in the examples above:

Here we have created this VIEW and mapped it to another server (every VIEW is R/W), where we decided to build a new development environment based on production data captured by a Replay. Every READ operation now goes through the Replay, as the arrows show; if a new PAGE needs to be written, it is written in the VIEW space, so the VIEW consumes only the ∆ changes. Now the development environment needs to be snapshotted itself: can we do it? OF COURSE we can :-) Let’s see how it works with the usual diagram:
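The read-through/write-aside behaviour of a VIEW can be sketched in a few lines. This toy Python model is an assumption for illustration, not Compellent’s implementation: the Replay is a read-only block-to-PAGE map, the VIEW keeps its own delta, reads prefer the delta and fall through to the Replay, and writes never touch the Replay. For simplicity, snapshot() returns a full merged map rather than a delta, which is enough to show that a VIEW can itself back another VIEW.

```python
# Toy sketch of a R/W VIEW on top of a read-only Replay (assumption).
from typing import Dict, Optional


class ToyView:
    def __init__(self, replay: Dict[str, str]) -> None:
        self._replay = dict(replay)        # frozen point-in-time image (copy)
        self.delta: Dict[str, str] = {}    # PAGES written through the VIEW

    def read(self, block: str) -> Optional[str]:
        # The VIEW's own PAGES win; everything else is read through the Replay
        if block in self.delta:
            return self.delta[block]
        return self._replay.get(block)

    def write(self, block: str, content: str) -> None:
        # New data consumes space only in the VIEW (the ∆ changes)
        self.delta[block] = content

    def snapshot(self) -> Dict[str, str]:
        # Toy simplification: a full merged image, usable as another Replay
        merged = dict(self._replay)
        merged.update(self.delta)
        return merged


if __name__ == "__main__":
    production_replay = {"A": "A", "B": "B", "C": "C1"}
    dev = ToyView(production_replay)
    dev.write("B", "B-dev")
    print(dev.read("A"), dev.read("B"), len(dev.delta))   # A B-dev 1
    nested = ToyView(dev.snapshot())    # a VIEW of a Replay of a VIEW
    print(nested.read("B"))             # B-dev
```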

So you can clearly see how powerful and space-efficient this technology is. It also applies recursively, so you can create a VIEW from a Replay of a VIEW that was itself created from a Replay, and so on… Next time I’ll cover replication with QoS and deduplication. As always, feedback and comments are very welcome!