Posted: April 26th, 2010 | Author: Fabio Rapposelli | Filed under: Storage | Tags: rebuild, VMotion, VMware, ZFS, ZFS labels | 1 Comment »
Last Thursday I was sitting in a customer meeting when my phone rang furiously. It was one of my co-workers with a desperate cry for help: during a VMotion from ESX 3.5 to vSphere 4 (he was conducting an upgrade), a Solaris 10 VM with a big ZPool configured had suddenly hung.
The machine was completely unresponsive, so he restarted it with a reset from VMware. The machine rebooted correctly, but the ZPool was gone…
Imagine that, after a reboot, your 8TB fileshare suddenly disappears… creepy, huh?
I thought it was impossible, but after some investigation I finally found the problem: the 2TB minus 512 bytes limit of vSphere.
The VM was built on ESX 3.5 with 2TB RDMs. During the VMotion something bad happened: VMware chopped 512 bytes off the disk view, harming the disk label. Fortunately, after a couple of hours of investigation (and trial and error, thanks to Compellent Replays) I found the correct partition alignment, and after a relabel of the disk everything came back to life!
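To put numbers on that limit, here is a small arithmetic sketch. It assumes that “2TB” means 2 TiB (2 × 1024^4 bytes) and that a sector is 512 bytes; the variable names are mine and purely illustrative.

```shell
# Assumption: "2TB" means 2 TiB and one sector is 512 bytes.
two_tib=$(( 2 * 1024 * 1024 * 1024 * 1024 ))
limit=$(( two_tib - 512 ))

echo "full 2 TiB disk : $two_tib bytes"
echo "largest visible : $limit bytes"
echo "sectors lost    : $(( (two_tib - limit) / 512 ))"
```

Under those assumptions a disk sized at exactly 2 TiB ends up one sector short, which is enough to damage anything stored at the very end of the device.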
It was a fun ride after all.
Posted: April 22nd, 2010 | Author: Fabio Rapposelli | Filed under: Storage, Virtualization | Tags: Storage, SUN, Virtualization, VMware, ZFS | No Comments »
Welcome back to the second part of this tutorial. This time we’ll see how to configure and use the iSCSI target portion of COMSTAR, so let’s start immediately!
root@opensolaris:~# ifconfig -a
lo0: flags=2001000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4,VIRTUAL> mtu 8232 index 1
inet 127.0.0.1 netmask ff000000
igb0: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 2
inet 192.168.0.174 netmask ffffff00 broadcast 192.168.0.255
ether 0:14:4f:cb:15:90
igb1: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 3
inet 10.10.1.1 netmask ffffff00 broadcast 10.10.1.255
ether 0:14:4f:cb:15:91
igb2: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 4
inet 10.10.2.1 netmask ffffff00 broadcast 10.10.2.255
ether 0:14:4f:cb:15:92
igb3: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 5
inet 10.10.3.1 netmask ffffff00 broadcast 10.10.3.255
ether 0:14:4f:cb:15:93
lo0: flags=2002000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv6,VIRTUAL> mtu 8252 index 1
inet6 ::1/128
Here’s the output of ifconfig. Our network interfaces for iSCSI traffic are igb1 and igb2, so we start by creating a target portal group for each:
root@opensolaris:~# itadm create-tpg igb1 10.10.1.1
root@opensolaris:~# itadm create-tpg igb2 10.10.2.1
And then create a target based on both of them:
root@opensolaris:~# itadm create-target -t igb1,igb2
Target iqn.1986-03.com.sun:02:ea4ee368-d1dc-cd43-ef63-fbf61e4b4ccb successfully created
And let’s see the outcome:
root@opensolaris:~# itadm list-target -v
TARGET NAME STATE SESSIONS
iqn.1986-03.com.sun:02:ea4ee368-d1dc-cd43-ef63-fbf61e4b4ccb online 0
alias: -
auth: none (defaults)
targetchapuser: -
targetchapsecret: unset
tpg-tags: igb2 = 3,igb1 = 2
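If you have more than two interfaces, the commands above follow a pattern that is easy to generate. The sketch below only prints the itadm commands, it does not run them; print_target_setup is a hypothetical helper of mine, and naming each portal group after its interface is just the convention used in this post.

```shell
# Hypothetical helper: prints (does not execute) the itadm commands
# for a list of "interface=address" pairs.
print_target_setup() {
    tpgs=""
    for pair in "$@"; do
        nic=${pair%%=*}      # text before the first "="
        addr=${pair#*=}      # text after the first "="
        echo "itadm create-tpg $nic $addr"
        tpgs="${tpgs:+$tpgs,}$nic"   # build the comma-separated tpg list
    done
    echo "itadm create-target -t $tpgs"
}

print_target_setup igb1=10.10.1.1 igb2=10.10.2.1
```

Review the printed commands before pasting them into a root shell on the storage box.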
Now I’m going to add an alias to this target, so it’s easier to refer to on the initiator side:
root@opensolaris:~# itadm modify-target -l comstar iqn.1986-03.com.sun:02:ea4ee368-d1dc-cd43-ef63-fbf61e4b4ccb
Target iqn.1986-03.com.sun:02:ea4ee368-d1dc-cd43-ef63-fbf61e4b4ccb successfully modified
And the list:
root@opensolaris:~# itadm list-target -v
TARGET NAME STATE SESSIONS
iqn.1986-03.com.sun:02:ea4ee368-d1dc-cd43-ef63-fbf61e4b4ccb online 0
alias: comstar
auth: none (defaults)
targetchapuser: -
targetchapsecret: unset
tpg-tags: igb2 = 3,igb1 = 2
Now let’s do the setup on the VMware side: open the iSCSI software initiator and put the two IPs in the Dynamic Discovery tab:

After this step, if we issue the itadm list-target -v command again, we can see that the “SESSIONS” number has increased to two, indicating that the VMware host has opened sessions to our new target:
root@opensolaris:~# itadm list-target -v
TARGET NAME STATE SESSIONS
iqn.1986-03.com.sun:02:ea4ee368-d1dc-cd43-ef63-fbf61e4b4ccb online 2
alias: comstar
auth: none (defaults)
targetchapuser: -
targetchapsecret: unset
tpg-tags: igb2 = 3,igb1 = 2
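If you want to check the session count from a script instead of reading it by eye, something like this should do. It is only a sketch: the sample text is copied from the output above, and it assumes the target line keeps the three-column layout shown there.

```shell
# Sample text copied from the post; on a live system you would pipe
# "itadm list-target -v" into the same awk program instead.
sample='TARGET NAME STATE SESSIONS
iqn.1986-03.com.sun:02:ea4ee368-d1dc-cd43-ef63-fbf61e4b4ccb online 2
alias: comstar'

# The target line starts with "iqn." and its third field is SESSIONS.
sessions=$(printf '%s\n' "$sample" | awk '$1 ~ /^iqn\./ { print $3 }')
echo "open sessions: $sessions"
```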
Now we’re ready for some Volume mapping and creation!
We start by creating a 50G volume with zfs create:
root@opensolaris:~# zfs create -V 50G DataPool/iSCSIDatastore2
And then create a host group for the iSCSI connection:
root@opensolaris:~# stmfadm create-hg ESX4-iSCSI
And then add the IQN for our VMware host (you can check it in the iSCSI software initiator tab):
root@opensolaris:~# stmfadm add-hg-member -g ESX4-iSCSI iqn.1998-01.com.vmware:esxvdi-04c37843
Now we can create the LUN:
root@opensolaris:~# sbdadm create-lu /dev/zvol/rdsk/DataPool/iSCSIDatastore2
Created the following LU:
GUID DATA SIZE SOURCE
-------------------------------- ------------------- ----------------
600144f03ebec50000004ba8a1090001 53687091200 /dev/zvol/rdsk/DataPool/iSCSIDatastore2
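A quick sanity check on the DATA SIZE column: assuming the “G” passed to zfs create means GiB (1024^3 bytes), 50G should come out as exactly the number sbdadm printed. gib_to_bytes is a throwaway helper of mine.

```shell
# Hypothetical helper: convert a whole number of GiB to bytes.
gib_to_bytes() {
    echo $(( $1 * 1024 * 1024 * 1024 ))
}

gib_to_bytes 50
```

The result matches the 53687091200 bytes reported for the LU.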
Let’s check if everything is set:
root@opensolaris:~# stmfadm list-lu -v
LU Name: 600144F03EBEC50000004BA8A1090001
Operational Status: Online
Provider Name : sbd
Alias : /dev/zvol/rdsk/DataPool/iSCSIDatastore2
View Entry Count : 0
Finally we map the LUN to the Host:
root@opensolaris:~# stmfadm add-view -h ESX4-iSCSI -n 0 600144F03EBEC50000004BA8A1090001
Now let’s do a refresh on VMware and check if everything’s correct

And also check if the Multipath is working fine:

And voilà! Our iSCSI LUN is ready to be used.
This is the end of part 2. In part 3 I will explore NAS datastore configuration, in part 4 I will explain how to use Snapshots and Clones, and part 5 will focus on Deduplication, so stay tuned for more!
Posted: April 15th, 2010 | Author: Fabio Rapposelli | Filed under: Storage, Virtualization | Tags: Storage, SUN, Virtualization, VMware, ZFS | 3 Comments »
So you’ve built your Virtualization home lab using whitebox hardware and you’re running vSphere on it quite nicely, but what about storage?
Don’t tell me you’re using one of those expensive NAS/iSCSI boxes like the ix4… Yeah they can be quite nice for an out-of-the-box experience, but they can’t replicate a real storage infrastructure!
So what can I do to get multiprotocol storage with NFS / CIFS support for NAS and iSCSI / FC support for SAN? And maybe even with multiple snapshots, cloning capabilities and DEDUPLICATION?
Here’s your answer: OpenSolaris and COMSTAR.
COMSTAR is an open source OpenSolaris project that’s currently the foundation for the Sun Storage 7000 family. The name is an acronym for “COmmon Multiprotocol Scsi TARget”, and coupled with the almighty ZFS it brings you a powerful storage solution at no cost.
This post is the first of a series where I’ll explain how to build a complete storage system using off-the-shelf parts. Let’s start with the shopping list:
Hardware
- Any x86 hardware that fits in this HCL will do fine.
- If you need iSCSI target support you need a NIC supported by OpenSolaris (see HCL above).
- If you need FC target support you need a 4Gb or 8Gb QLogic HBA (2Gb cards are not supported), or an Emulex LP10000 or newer card. You can find them on eBay quite cheap.
Software
- That’s the easy part: you only need an OpenSolaris 2009.06 ISO. After the install, everything else will be downloaded using the IPS package manager that comes with OpenSolaris.
Now we’re ready to start. First of all we need to install OpenSolaris on our brand new machine. This is beyond the scope of this howto, so I will just point you to the official guide from Sun that walks you through the installation (which is very simple, by the way); you can find it here.
The first step is to install the COMSTAR stack. We’ll use the IPS package manager with superuser privileges (either with su - or pfexec):
pkg install -v storage-server
This should take care of everything; you’ll end up with something like this:
root@opensolaris:/# pkg install -v storage-server
Creating Plan - Before evaluation:
UNEVALUATED:
+pkg:/storage-server@0.1,5.11-0.111:20090508T165041Z
After evaluation:
None -> pkg:/storage-server@0.1,5.11-0.111:20090508T165041Z
None -> pkg:/SUNWvscan@0.5.11,5.11-0.111:20090508T164122Z
None -> pkg:/SUNWmda@0.5.11,5.11-0.111:20090508T162120Z
None -> pkg:/SUNWvscankr@0.5.11,5.11-0.111:20090508T164123Z
None -> pkg:/SUNWndmp@0.5.11,5.11-0.111:20090508T162452Z
None -> pkg:/SUNWstmf@0.5.11,5.11-0.111:20090508T163712Z
None -> pkg:/SUNWii@0.5.11,5.11-0.111:20090508T160911Z
None -> pkg:/SUNWscm@0.5.11,5.11-0.111:20090508T163449Z
None -> pkg:/SUNWspsv@0.5.11,5.11-0.111:20090508T163647Z
None -> pkg:/SUNWsmba@3.0.34,5.11-0.111:20090508T163557Z
None -> pkg:/SUNWdmgt@0.5.11,5.11-0.111:20090508T153928Z
None -> pkg:/SUNWisns@0.5.11,5.11-0.111:20090508T161051Z
None -> pkg:/SUNWrdc@0.5.11,5.11-0.111:20090508T163217Z
None -> pkg:/SUNWiscsitgt@0.5.11,5.11-0.111:20090508T161048Z
None -> pkg:/SUNWsmbfskr@0.5.11,5.11-0.111:20090508T163611Z
None -> pkg:/SUNWmms@0.5.11,5.11-0.111:20090508T162204Z
None -> pkg:/SUNWpostgr-83-libs@8.3.7,5.11-0.111:20090508T163014Z
None -> pkg:/SUNWsmbs@0.5.11,5.11-0.111:20090508T163612Z
None -> pkg:/SUNWsmbskr@0.5.11,5.11-0.111:20090508T163614Z
None -> pkg:/SUNWfilebench@0.5.11,5.11-0.111:20090508T154334Z
None -> pkg:/SUNWiscsi@0.5.11,5.11-0.111:20090508T161040Z
Actuators:
restart_fmri: svc:/system/manifest-import:default
None
DOWNLOAD PKGS FILES XFER (MB)
Completed 21/21 934/934 32.33/32.33
PHASE ACTIONS
Install Phase 1898/1898
Then we need to install the new COMSTAR-based iSCSI target module. This is a separate step because Sun had already shipped a “standard” iSCSI target long before COMSTAR. You can install it with:
pkg install pkg:/SUNWiscsit
Now we’re done with the installs. Everything is in place, but we need a reboot to continue. If you don’t need to configure any FC HBA for SAN support (FC target) you can skip the next step; otherwise proceed as follows.
First, we need to identify the device bindings for the standard initiator driver. We can do it with mdb, started as mdb -k.
Once at the mdb prompt, the ::devbindings -q qlc command shows the device bindings for the qlc driver (qlc is the initiator driver; the target driver is qlt).
Here’s a sample session:
root@opensolaris:/# mdb -k
Loading modules: [ unix genunix specfs dtrace mac cpu.generic uppc pcplusmp scsi_vhci zfs sd sockfs ip hook neti sctp arp usba uhci qlc fctl md lofs fcip fcp cpc random crypto logindmux ptm ufs nsmb sppp ipc mpt emlxs ]
> ::devbindings -q qlc
ffffff06f68dccc0 pciex1077,2432, instance #0 (driver name: qlc)
ffffff06f68dca40 pciex1077,2432, instance #1 (driver name: qlc)
> $q
My card is a dual-port card, so both instances show the same device, in my case pciex1077,2432. Now we need to detach the qlc driver from it and attach the qlt driver (the change takes effect at the next reboot).
First detach:
update_drv -d -i 'pciex1077,2432' qlc
Then attach the target driver:
update_drv -a -i 'pciex1077,2432' qlt
If your system tells you that the unload will be done upon reboot and that the qlt driver failed to attach, don’t worry: it’s completely normal, and everything will be done during the reboot.
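If you’d rather not retype the device name by hand, it can be extracted from the devbindings output. This is just a sketch: the sample lines are the ones from my mdb session, and the script only prints the two update_drv commands so you can review them before running anything.

```shell
# Sample lines copied from the ::devbindings output in this post.
sample='ffffff06f68dccc0 pciex1077,2432, instance #0 (driver name: qlc)
ffffff06f68dca40 pciex1077,2432, instance #1 (driver name: qlc)'

# Field 2 is the binding name with a trailing comma: strip the comma,
# then collapse the two ports of the same card into one entry.
device=$(printf '%s\n' "$sample" | awk '{ sub(/,$/, "", $2); print $2 }' | sort -u)

# Print (do not run) the detach/attach pair for that device.
echo "update_drv -d -i '$device' qlc"
echo "update_drv -a -i '$device' qlt"
```

With more than one HBA model in the box, device would hold several lines and you would want to loop over them; I have only tried this layout with a single dual-port card.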
Now we’re almost ready to reboot. Before that, let’s enable the subsystems we need so we can start working immediately after the reboot:
svcadm enable svc:/network/iscsi/target:default
Then we can reboot with:
After the reboot, let’s do a quick check to see whether the qlt driver attached correctly, let’s issue this command:
The output should be similar to this. Look for the “Port Mode:” field, which should say “Target”:
HBA Port WWN: 2100001b329725a5
Port Mode: Target
Port ID: 10000
OS Device Name: Not Applicable
Manufacturer: QLogic Corp.
Model: QLE2462
Firmware Version: 4.5.0
FCode/BIOS Version: N/A
Serial Number: not available
Driver Name: COMSTAR QLT
Driver Version: 1.0
Type: F-port
State: online
Supported Speeds: 1Gb 2Gb 4Gb
Current Speed: 4Gb
Node WWN: 2000001b329725a5
HBA Port WWN: 2101001b32b725a5
Port Mode: Target
Port ID: 10100
OS Device Name: Not Applicable
Manufacturer: QLogic Corp.
Model: QLE2462
Firmware Version: 4.5.0
FCode/BIOS Version: N/A
Serial Number: not available
Driver Name: COMSTAR QLT
Driver Version: 1.0
Type: F-port
State: online
Supported Speeds: 1Gb 2Gb 4Gb
Current Speed: 4Gb
Node WWN: 2001001b32b725a5
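Checking “Port Mode” on every port gets tedious with several HBAs, so here is a small sketch that counts ports and target-mode ports in that kind of output. The sample is trimmed from the listing above. The listing looks like what fcinfo hba-port prints, but treat that as my assumption and feed the script whatever command produced your output.

```shell
# Trimmed sample from the listing above (only the lines we look at).
sample='HBA Port WWN: 2100001b329725a5
Port Mode: Target
HBA Port WWN: 2101001b32b725a5
Port Mode: Target'

# Count the ports and how many of them report Target mode.
ports=$(printf '%s\n' "$sample" | grep -c 'HBA Port WWN:')
targets=$(printf '%s\n' "$sample" | grep -c 'Port Mode: Target')

if [ "$ports" -eq "$targets" ]; then
    echo "all $ports ports are in target mode"
else
    echo "only $targets of $ports ports are in target mode"
fi
```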
Now we’re ready to create the ZPOOL, which is the “core” of the storage backend. Explaining each protection type and combination possible with ZFS is outside the scope of this document, so you can refer to this guide to choose what fits you best.
In this example my machine is fitted with 4 identical drives. One of them is dedicated to “rpool”, the root ZPOOL where the system is installed. Here’s the format output:
root@opensolaris:/# echo | format
Searching for disks...done
AVAILABLE DISK SELECTIONS:
0. c7t0d0
/pci@0,0/pci8086,340a@3/pci108e,286@0/disk@0,0
1. c7t1d0
/pci@0,0/pci8086,340a@3/pci108e,286@0/disk@1,0
2. c7t2d0
/pci@0,0/pci8086,340a@3/pci108e,286@0/disk@2,0
3. c7t3d0
/pci@0,0/pci8086,340a@3/pci108e,286@0/disk@3,0
Specify disk (enter its number): Specify disk (enter its number):
The c7t0d0 disk is already used by the system, so I will create a RAID-Z ZPOOL made up of the other three disks with this command:
root@opensolaris:/# zpool create DataPool raidz c7t1d0 c7t2d0 c7t3d0
And check the outcome with:
root@opensolaris:/# zpool status DataPool
pool: DataPool
state: ONLINE
scrub: none requested
config:
NAME STATE READ WRITE CKSUM
DataPool ONLINE 0 0 0
raidz1 ONLINE 0 0 0
c7t1d0 ONLINE 0 0 0
c7t2d0 ONLINE 0 0 0
c7t3d0 ONLINE 0 0 0
errors: No known data errors
By default ZFS creates a filesystem named after the ZPOOL and mounts it under /. Let’s check that this is true:
root@opensolaris:~# zfs list -r DataPool
NAME USED AVAIL REFER MOUNTPOINT
DataPool 100G 167G 25,3K /DataPool
OK, everything looks fine. We have the FC and iSCSI targets up and running and we’ve created a disk backend to put our data on, so let’s start with the fun part: creating and publishing volumes.
First of all we need to create a LUN on the disk backend. We can accomplish this with a one-liner:
root@opensolaris:~# zfs create -V 100G DataPool/TestDatastore1
This command creates a 100G volume called “TestDatastore1” under our DataPool ZPOOL. Let’s check:
root@opensolaris:~# zfs list -r DataPool
NAME USED AVAIL REFER MOUNTPOINT
DataPool 100G 167G 25,3K /DataPool
DataPool/TestDatastore1 100G 267G 21,3K -
Now we need to create the LUN object to be able to map it to a target port:
root@opensolaris:~# sbdadm create-lu /dev/zvol/rdsk/DataPool/TestDatastore1
Created the following LU:
GUID DATA SIZE SOURCE
-------------------------------- ------------------- ----------------
600144f03ebec50000004ba86e460001 107374116864 /dev/zvol/rdsk/DataPool/TestDatastore1
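Notice that the DATA SIZE is slightly smaller than the 100G we asked for. Here is the arithmetic, assuming “100G” means 100 GiB; the reported figure is copied from the sbdadm output above.

```shell
requested=$(( 100 * 1024 * 1024 * 1024 ))   # 100G read as 100 GiB
reported=107374116864                        # DATA SIZE from sbdadm above
delta=$(( requested - reported ))

echo "requested: $requested bytes"
echo "reported : $reported bytes"
echo "delta    : $delta bytes ($(( delta / 1024 )) KiB)"
```

The difference is exactly 64 KiB. My guess is that this is space the sbd provider keeps for its own LU metadata, but that is an assumption on my part and not something the output itself tells us.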
The path /dev/zvol/rdsk/DataPool/TestDatastore1 follows a simple standard, which is:
/dev/zvol/rdsk/<ZPOOL NAME>/<VOLUME NAME>
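Since the path is built mechanically from the pool and volume names, the two commands used so far can be generated from three values. The helper below is hypothetical and only prints the commands rather than executing them.

```shell
# Hypothetical helper: print the volume and LU creation commands.
#   $1 = ZPOOL name, $2 = volume name, $3 = size as accepted by zfs create -V
print_lu_commands() {
    echo "zfs create -V $3 $1/$2"
    echo "sbdadm create-lu /dev/zvol/rdsk/$1/$2"
}

print_lu_commands DataPool TestDatastore1 100G
```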
Let’s check if everything looks good:
root@opensolaris:~# stmfadm list-lu -v
LU Name: 600144F03EBEC50000004BA86E460001
Operational Status: Online
Provider Name : sbd
Alias : /dev/zvol/rdsk/DataPool/TestDatastore1
View Entry Count : 0
Now we can create a group of initiators (server HBAs) to map the LUN to. If you’re familiar with NetApp, this is the same concept as igroups:
root@opensolaris:~# stmfadm create-hg ESX4-group
And then add the two WWPNs taken from my ESX4 host:
root@opensolaris:~# stmfadm add-hg-member -g ESX4-group wwn.2100001b329711bd wwn.2101001b32b711bd
(note the wwn. format)
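If your initiator shows its WWPNs with colons between the bytes (I’m assuming a layout like 21:00:00:1B:32:97:11:BD here), a tiny conversion helper saves some typos. to_stmf_wwn is my own name for it, and the lowercase output simply mirrors the form used in the command above.

```shell
# Hypothetical helper: turn a colon-separated WWPN into the
# "wwn.<hex>" form used with stmfadm in this post.
to_stmf_wwn() {
    # Drop the colons, lowercase the hex digits, add the "wwn." prefix.
    printf 'wwn.%s\n' "$(printf '%s' "$1" | tr -d ':' | tr 'A-F' 'a-f')"
}

to_stmf_wwn 21:00:00:1B:32:97:11:BD
to_stmf_wwn 21:01:00:1B:32:B7:11:BD
```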
Now we’re ready to map the LUN to the host, with this command:
root@opensolaris:~# stmfadm add-view -h ESX4-group -n 0 600144F03EBEC50000004BA86E460001
And check if everything is fine:
root@opensolaris:~# stmfadm list-lu -v
LU Name: 600144F03EBEC50000004BA86E460001
Operational Status: Online
Provider Name : sbd
Alias : /dev/zvol/rdsk/DataPool/TestDatastore1
View Entry Count : 1
root@opensolaris:~# stmfadm list-view -l 600144F03EBEC50000004BA86E460001
View Entry: 0
Host group : ESX4-group
Target group : All
LUN : 0
As you can see, I mapped the LUN as ID 0. Now the host can see the LUN via its multiple FC connections. Let’s see how the OpenSolaris storage presents itself on the VMware side:

This screenshot shows our LUN correctly presented after an HBA rescan. And if we take a look at the paths to the storage:

We can see that everything looks correct. You can even use the Round Robin multipath policy!
Enough for today. In the next part we will create and publish a LUN via iSCSI and we’ll configure NAS access.
Posted: April 4th, 2010 | Author: Fabio Rapposelli | Filed under: Storage | Tags: Compellent, Data Instant Replay, Data Progression, Snapshots, Storage, Tiering | 4 Comments »
Here we are again with How Compellent works, part 2. I’m writing this post just after a Lucullan Easter lunch with my whole family, so please forgive some of my expressions.
In the last post we talked about how data is written into PAGES and progressed using Storage Profiles and Data Progression. This time we’ll focus on how Replays work.
Replays are, in fact, snapshots of your volume data at a certain point in time. To apply this concept to the PAGES that Storage Center uses, each PAGE carries one more piece of metadata that identifies its category. The categories are:
- Accessible Recently Accessed – These are the active pages the volume is using the most
- Accessible Non-recently accessed – Read-write pages that have not been recently used
- Historical Accessible – Read-only pages that may be read by a volume, applies to Snapshot Volumes only
- Historical Non-Accessible – Read-only data pages that are not being currently accessed by a volume, applies to Snapshot Volumes only, Snapshot maintains these pages for recovery purposes and they should be placed on the lowest cost storage possible
(the descriptions are copy/pasted from a Compellent document)
Let’s see how it works with a simple graphic showing how a Volume is represented inside Storage Center when a Replay is taken. Each blue block is a 2MB PAGE:

According to this diagram, when the C1 PAGE is written the old C PAGE changes its state to Historical Non-Accessible, while C1 is an Accessible Recently Accessed PAGE. All the other PAGES that reside below the Replay separator are Historical Accessible PAGES. So in this case the C PAGE will be automatically demoted to lower-cost storage during the first Data Progression cycle.
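To make the bookkeeping concrete, here is a toy model of that diagram. It is only my own illustration of the category changes described above, not how Storage Center is actually implemented; the page names come from the diagram and everything else (variable and function names) is made up.

```shell
# Toy model: pages are just names in space-separated lists.
frozen="A B C D E"   # pages captured by the Replay (now read-only)
active=""            # pages written after the Replay was taken
superseded=""        # frozen pages that have since been overwritten

# $1 = frozen page being overwritten, $2 = name of the new page.
write_page() {
    superseded="$superseded $1"
    active="$active $2"
}

# Print the category of every page, following the rules in the text.
classify() {
    for p in $frozen; do
        case " $superseded " in
            *" $p "*) echo "$p: Historical Non-Accessible" ;;
            *)        echo "$p: Historical Accessible" ;;
        esac
    done
    for p in $active; do
        echo "$p: Accessible Recently Accessed"
    done
}

write_page C C1
classify
```

Only the overwritten page changes category, and only one new page is allocated, which is why a Replay costs no more space than the data that changes after it.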

In this graphic another Replay is taken. As you can see, even with multiple Replays only the changed PAGES are allocated as new. In this scenario the C1 PAGE and the E PAGE become Historical Non-Accessible PAGES.

Here is the Replay expiration. Expiration can be manual or scheduled: when creating a Replay you can choose between a “never expire” Replay and a scheduled expiration, which can be changed on the fly if needed. The grey-tinted PAGES are pushed toward the new oldest Replay available, while the orange-tinted PAGE is released back into the pool, immediately available for other purposes.
But what if we need to access the data of a single Replay, maybe to map it to another server to replicate the production environment? Here comes the VIEW. Now the graphic gets more complicated: in the following diagram the upper part, in purple, represents the VIEW created from a Replay, and the lower part represents the Volume with multiple snapshots like the examples above:

Here we have created this VIEW and mapped it to another server (every VIEW is R/W), where we decided to create a new development environment based on production data that we snapshotted using a Replay. Every READ operation now goes through the Replay, as the arrows show; if a new PAGE needs to be written, it is written in the VIEW space, so the VIEW consumes only the ∆ changes. Now the development environment needs to be snapshotted itself. Can we do it? OF COURSE we can; let’s see how it works with the usual diagram:

So you can clearly see how powerful and space-efficient this technology is. It is also applicable recursively, so you can create a view made from a replay of a view which is a view of a replay, etc… Next time I’ll cover replication with QoS and deduplication. As always, feedback and comments are very welcome!