Archive for April, 2010
I figured I would post my views on Duncan Epping’s question about what we would like to see pertaining to tiering in the “Cloud”.
As with most things, my view is that it should be my option.
I know many cloud providers would like to just do it in the background, and for a large majority of data sets I would choose to let them tier (assuming it is a cheaper option). That being said, if we ever get to the point of hosting mission-critical applications on cloud storage/VM environments, there are data sets where I would either A) want to ensure a level of performance by not allowing tiering, or B) know that the access patterns of my data set would not play well with being tiered.
This brings up the concern that all the current tiering technologies are different. The way tiering works on one vendor’s platform is so completely different from the next that, without experience on all those vendors and/or a LOT of research, it may not be possible to know whether my data will work well on tiered storage.
Part of this may be the storage engineer in me wanting to maintain control, but any vendor I choose to provide cloud storage would have to be exceptionally willing to troubleshoot performance problems if they wanted to tier my data. Troubleshooting storage problems can sometimes be very, very difficult, and I would have to know they could do it before introducing another variable like autonomic tiering.
All in all I am a fan of tiering, but to consider it out of house I would require the same granular control over it that I get in house.
I am sure someday they will get to that level with open storage platforms.
This is probably going to be one of a few posts on this topic. I spend a lot of time working with details, features, and fixes, but today I was asked a question that seemed so basic that I was flustered and could not answer it cohesively. When you work with a technology so much, you start to have trouble seeing the forest for the trees, so sometimes it is worth stepping back to explain it to those who are not already familiar with it.
So you may be asking: why do I (we) need a SAN?
This is a general topic discussion rather than a delve into a specific product or brand.
One of the big reasons for a SAN is performance. Most SANs today allow wide striping across many disks.
Why is this good, you may ask? It allows a volume to share the performance resources of ALL those disks. Instead of your SQL database being limited to the 160 or so IOPS you can get from a single SATA disk, you can spread it across 100 disks so your SQL server can burst to 16,000 IOPS if need be. This enables you to use larger, cheaper disks and just spread out the load. Of course, doing this there are lots of details to watch (average performance need, hot spots, configuration challenges, and such).
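To make that concrete, here is a back-of-the-envelope sketch of the striping math in Python. The 160-IOPS-per-SATA-spindle figure is just the rule of thumb used above, and real arrays will not scale perfectly linearly, so treat this as napkin math rather than a sizing tool.

```python
# Rough striping math. Assumptions: ~160 IOPS per 7.2K SATA spindle
# (a common rule of thumb, not a guarantee) and a perfectly even
# stripe with no hot spots -- real arrays rarely achieve either.

SATA_IOPS_PER_DISK = 160

def aggregate_iops(disks: int, iops_per_disk: int = SATA_IOPS_PER_DISK) -> int:
    """Best-case burst IOPS for a volume striped across `disks` spindles."""
    return disks * iops_per_disk

def disks_needed(target_iops: int, iops_per_disk: int = SATA_IOPS_PER_DISK) -> int:
    """Minimum spindle count to hit a target IOPS number (ceiling division)."""
    return -(-target_iops // iops_per_disk)

print(aggregate_iops(100))   # 16000 -- the example from the post
print(disks_needed(5000))    # 32 spindles for a 5,000 IOPS workload
```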
Also to consider under this topic is tiering.
The ability to move workloads between performance profiles is the generic description I will use. Most vendors include the ability to manually move whole volumes between different pools of disks (for example, between a 15K SAS pool and a SATA pool) while online. Some even offer (sometimes controversial) autonomic tiering, where blocks are moved or prioritized at a sub-LUN level based on block-level performance statistics (so the active part of your SQL database could move to SSD while the inactive part stays on SATA).
There are many ways tiering is done, many of them with upsides and downsides, but having the option can be a lifesaver when the performance demands of a server workload are either misjudged or change.
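For illustration only, here is a toy Python sketch of the sub-LUN idea: track per-block access counts, then promote the hottest blocks to a faster tier. The tier names, the slot count, and the rebalance policy are all invented; no vendor’s implementation works exactly like this.

```python
from collections import Counter

# Toy sketch of autonomic sub-LUN tiering: count I/Os per block,
# promote the hottest blocks to the fast tier, demote the rest.
# Real arrays use far more elaborate heuristics than this.

FAST_TIER_SLOTS = 2   # pretend only two blocks fit on SSD

class TieringEngine:
    def __init__(self):
        self.heat = Counter()   # block id -> access count
        self.tier = {}          # block id -> "ssd" | "sata"

    def record_io(self, block: int):
        self.heat[block] += 1

    def rebalance(self):
        # Hottest blocks go to SSD; everything else stays on SATA.
        hot = {b for b, _ in self.heat.most_common(FAST_TIER_SLOTS)}
        for block in self.heat:
            self.tier[block] = "ssd" if block in hot else "sata"

engine = TieringEngine()
for block in [1, 1, 1, 2, 3, 3, 4, 5, 1, 3]:   # simulated I/O trace
    engine.record_io(block)
engine.rebalance()
print(engine.tier)   # blocks 1 and 3 land on "ssd"
```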
Dynamic Resource Allotment and Shared Resources
Resource sharing is a key aspect of efficiency. Rather than having islands of storage in each server, you can share that disk and give it only to the servers that need it. You are also able to increase the space allotted to a server as time goes on, without interruption to service.
When you live with local disks, you end up with a lot of stranded, wasted space, because you have to plan for what the server WILL need in the future and allot that space from the start; otherwise you need downtime to add new disks to the server. With a SAN you just have a logical LUN, which can be moved and expanded live and on the fly, making way for growth.
Most new SANs take this to the next level with thin provisioning: even when the server sees 100 GB of space given to it, the SAN only allots the 50 GB you are ACTUALLY using. This further reduces the number of times you have to touch a server, on top of the space savings on the SAN side.
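A minimal sketch of the allocate-on-write idea behind thin provisioning, assuming a simple dict-backed block map (real arrays do this with extent maps and pool accounting, of course):

```python
# Thin provisioning in miniature: the host is promised a large
# virtual size, but backing blocks are only allocated the first
# time each block is actually written.

BLOCK_SIZE = 4096  # bytes

class ThinVolume:
    def __init__(self, virtual_size_gb: int):
        self.virtual_size_gb = virtual_size_gb   # what the server sees
        self.allocated = {}                      # block id -> data, grown on demand

    def write(self, block: int, data: bytes):
        self.allocated[block] = data             # allocate on first write

    def used_bytes(self) -> int:
        return len(self.allocated) * BLOCK_SIZE

vol = ThinVolume(virtual_size_gb=100)    # server sees 100 GB
for block in range(1000):                # but only a sliver is written
    vol.write(block, b"\x00" * BLOCK_SIZE)
print(vol.virtual_size_gb, "GB promised,", vol.used_bytes(), "bytes actually allocated")
```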
Large shared pools of disk also have availability advantages over local disk. Unlike low-disk-count local RAID groups, you can build larger disk groups with more redundant RAID sets without throwing TONS of disks into each server. An example would be RAID 6 or RAID-DP, which provide dual parity so you can survive a double disk failure.
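As a quick worked example of why the larger group matters, here is the standard usable-capacity arithmetic for mirroring versus single and dual parity; the 14-disk count is an arbitrary example:

```python
# Usable capacity and failure tolerance for common RAID layouts.

def usable_disks(total: int, layout: str) -> int:
    if layout == "raid10":
        return total // 2    # everything is mirrored, survives 1 failure per pair
    if layout == "raid5":
        return total - 1     # one disk of parity, survives 1 failure
    if layout == "raid6":
        return total - 2     # two disks of parity, survives 2 failures
    raise ValueError(layout)

for layout in ("raid10", "raid5", "raid6"):
    print(f"{layout}: 14 disks -> {usable_disks(14, layout)} disks of usable capacity")
# raid10: 7, raid5: 13, raid6: 12
```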
Add to that the features most arrays support, like the ability to replicate all your data to another SAN; some even have high-availability cluster capability where you can separate the heads by a great distance.
In the end, having a shared storage network keeps the data separate from the server, so replacing servers, connecting new servers, moving data, and replicating data all become fluid, dynamic operations.
You want to copy the data that is on one server to a test server? Easy: clone the volume and attach it to the test server!
You want a Microsoft cluster quorum drive that all servers can talk to? Easy: just have them all log into the same volume.
Your server died? Easy: just rebuild the server from scratch or from an image and connect it back to the old server’s data.
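To show how those scenarios hang together, here is a purely hypothetical sketch of the workflow. The SanArray class and every method on it are made up for illustration, since each vendor exposes these operations through its own CLI, GUI, or API:

```python
# Hypothetical SAN management sketch -- every name here is invented.

class SanArray:
    def __init__(self):
        self.volumes = {}   # volume name -> set of attached servers

    def create_volume(self, name: str):
        self.volumes[name] = set()

    def clone_volume(self, source: str, clone_name: str):
        # A point-in-time copy; no downtime on the source volume.
        self.volumes[clone_name] = set()

    def attach(self, volume: str, server: str):
        self.volumes[volume].add(server)

san = SanArray()
san.create_volume("sql-prod-data")
san.attach("sql-prod-data", "prod-sql01")

# Copy production data to a test box: clone, then attach the clone.
san.clone_volume("sql-prod-data", "sql-test-data")
san.attach("sql-test-data", "test-sql01")

# Shared quorum: multiple cluster nodes log into the same volume.
san.create_volume("cluster-quorum")
for node in ("node-a", "node-b", "node-c"):
    san.attach("cluster-quorum", node)
```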
A SAN is a huge enabler of technology for the enterprise, allowing a dynamic and fluid allocation of resources in a highly available way: when, where, and how you need it, and only as much as you need.
Catching this press release, I am a bit confused by this claim:
Data Domain Encryption is a new software option that provides the industry’s first encryption of data at rest on deduplication storage
Maybe I am misreading the claim, but haven’t other vendors, such as Asigra and other MSP-style backup vendors, both transmitted and stored data in a deduplicated and encrypted fashion, including at rest? Maybe the claim was intended to be narrower in scope, but I wonder why it is stated so broadly when it looks like it is simply not true.
Or maybe it is just a misunderstanding on my part.