Data – it isn’t all about Terabytes!

At a recent event we were fortunate to have a Dell customer speaking as an advocate for the Dell Compellent Storage solution. He spoke in great detail about the process of moving from a traditional storage solution to Dell Compellent. He made some great points, but the one that struck me most was the discovery they made when they started the automated tiering on the system.

First, let me give you some background on tiering. Within the storage system you typically have multiple types of disk. These range from fast, high-end and expensive disks to slower and therefore cheaper disks. The strategy of tiering is that you do not want rarely accessed data sitting on fast, expensive disks. Similarly, if data is accessed regularly by many people, you want to place it on the fastest, most accessible disks. The target is that all data is accessible at an appropriate speed for the user base. The real benefit to the customer is that they do not have to estimate how much fast disk they need to meet the needs of their user base. The tiering process on Compellent not only tells them where the data should be, but also moves the data to the appropriate disk.
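To make the idea concrete, here is a minimal sketch of an access-frequency tiering pass. It is not how Compellent implements tiering; the tier names, the `Block` type and both thresholds are assumptions made up for illustration. It only shows the principle described above: count how often data is used, then place it on the tier that matches.

```python
from dataclasses import dataclass

# Hypothetical tier names, fastest first. Real arrays define their own.
TIERS = ["tier1_fast", "tier2_mid", "tier3_slow"]


@dataclass
class Block:
    name: str
    accesses_this_week: int  # how often the data was read or written
    tier: str                # where the data currently lives


def retier(blocks, hot_threshold=100, cold_threshold=5):
    """Pick a tier for each block from its access count.

    The thresholds are illustrative assumptions, not vendor values.
    Returns the list of moves made as (name, old_tier, new_tier).
    """
    moves = []
    for block in blocks:
        if block.accesses_this_week >= hot_threshold:
            target = TIERS[0]    # busy data goes to the fast disks
        elif block.accesses_this_week <= cold_threshold:
            target = TIERS[-1]   # idle data goes to the cheap disks
        else:
            target = TIERS[1]    # everything else sits in the middle
        if target != block.tier:
            moves.append((block.name, block.tier, target))
            block.tier = target
    return moves
```

Run once a week over the whole system, a pass like this both reports where data should be and records the moves needed to get it there, without anyone having to guess in advance how much fast disk is required.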

So back to our customer. He gave details on how they set the system up to analyse the data once a week for the purpose of tiering. After the first week they were very surprised by the results. These results were borne out week after week, and the simple fact was that they had completely misunderstood the lifecycle of the data within their systems. Data that they believed to be critical, and that needed to be on the first tier for fast access, was never accessed after creation. Other data that they attached little or no importance to, and believed should be on slow, cheap disks, was in fact accessed regularly by multiple users. It is very easy for system administrators to get stuck in a frame of mind where they think solely in terms of megabytes, terabytes and IOPS, and lose sight of why they are providing the system and, more importantly, how it is used on a day-to-day basis.

Let me give you a hypothetical example. A patient comes to a hospital for an MRI scan. What is the most important data, the actual picture of the MRI scan or the patient’s address? Obviously the MRI scan is critically important to the diagnosis of whatever health issue they may be dealing with, and the patient’s address is very much secondary. After all, they have the patient’s name and some form of reference number. Anyone looking at the system would feel able to make an objective decision based on this information and decide where best to keep this data. But they could be wrong, and this is what the tiering process would highlight.

The MRI scan is stored once, accessed by the MRI technician and sent to the doctor or consultant dealing directly with the patient. This results in the data being accessed twice. There may be a follow-up some months after that or, if the patient is fortunate, the scan may never be needed again. However, almost every time someone accesses the patient’s records, they validate that they are looking at the right patient by confirming the address. So everyone from the consultant and the MRI technician to the hospital reception and ultimately the billing department looks at the address.

In terms of the system, the address is far more important than the actual scan. It is accessed by far more people, and far more often, than the scan. Important is of course the wrong word, and we have to start thinking about data in different terms. It is critical for system administrators to understand their data and, more importantly, to be able to articulate this to their own internal customers. You can imagine the conversation between the system administrator and the doctor where the administrator tells the doctor that the MRI scan is not “important”.
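The hospital example can be tallied in a few lines. The access log below is invented to match the story (two reads of the scan, one read of the address by each of the four roles); it is an illustration, not real data.

```python
from collections import Counter

# Hypothetical access log for one patient record: (role, item read).
access_log = [
    ("mri_technician", "mri_scan"),
    ("mri_technician", "address"),
    ("consultant", "mri_scan"),
    ("consultant", "address"),
    ("reception", "address"),
    ("billing", "address"),
]


def access_counts(log):
    """Count how many times each item was read, whoever read it."""
    return Counter(item for _role, item in log)


counts = access_counts(access_log)
# Items ordered from most accessed to least accessed.
ranked = [item for item, _count in counts.most_common()]
```

With this made-up log the address is read four times and the scan twice, so a purely access-driven view ranks the address first, whatever our instinct says about clinical importance.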

When analysing data, the system has no natural bias or prejudice, unlike people, who will make subjective decisions based on their own “feel” for what they believe to be right. The Compellent solution provides a mechanism for organisations to get to grips with the lifecycle of their data. The results may surprise many, except those that have been using Compellent for some time. It is not budgetary constraints or penny-pinching that drives post-installation sales on Compellent. It is simple, unbiased analysis that results in over 80% of sales on existing systems being slower, cheaper disks.


14 thoughts on “Data – it isn’t all about Terabytes!”

  1. Right… but any storage system should be able to give you usage statistics…
    Now, the fact that most SAN admins don’t look at them this way is another problem.
    I believe that auto-tiering is not the best solution to this problem.

  2. Dave, thanks for reading. Even if you have usage statistics and a good admin, how long would it take to move individual bits of data around to the optimum layer each day? Good auto-tiering removes the management task and therefore frees up the admin’s time so they can spend more time delivering real value to an organisation. Most organisations today are trying to do more without increasing resources, and a lot are trying to just do the same amount of work with even fewer resources. Freeing up time is the key factor here. This isn’t like some functions that fail to deliver on expectations. We all know that snapshot functionality is one of those things that admins often turn on and a day later turn off after not properly spec’ing out the capacity requirements. Compellent customers have one thing in common: they all turn on the tiering function and leave it on. That’s 100% of Compellent customers. Makes a compelling argument! No pun intended.

    • I know.
      But I do not believe that auto-tiering is the answer to performance and disk usage problems.
      It is down to how you access the data.
      Moving it back and forth just uses resources that you could put to better purposes.
      I believe that efficient Quality of Service is superior to auto-tiering.
      Because your performance doesn’t come essentially from the media you use, but from the resources and the priority you allocate to the I/O requests.
      Moving data around is just fooling people.

      • I have to add: of course 100% of Compellent users turn auto-tiering on. That’s why they bought it in the first place 🙂

  3. Dave, we would be better having that conversation in front of our customers. Can your system automatically allocate those resources and prioritise? Not sure about the fooling people comment. If the system performs better with tiering turned on rather than off, is somebody being fooled?

    • I don’t think that having this conversation in front of our customers would benefit Compellent. In fact, the product I’m a supporter of won many deals against Compellent. More than it lost anyway.
      My point is: Compellent is basing its performance claims on physical spindle characteristics only (or mainly?).
      Everyone knows that physical disks are limited in terms of performance. Now multiplying disks is a solution. But it has a cost that the customer is not always willing to pay… because he doesn’t need the space that comes with the increased number of spindles.
      Using faster disks or different technologies (SATA, SAS, FC, SSD, etc…) is another solution.
      But the mere fact of having to analyze data flow and move data (by arbitrary chunk size) takes resources away from the customer.

      Take payroll as an example: files are accessed 5 days per month (roughly). It takes 48h to distinguish a pattern before moving data to faster spindles.
      Now since payroll data hasn’t been accessed in a while, it’s going to take 48h for your system to distinguish the new access. In the meantime, users are suffering from slow disks. Then during the data movement, your system takes resources away from everyone because it has to move regularly accessed data off the fast disks to make room for the payroll data. It may take one day, two? I don’t know.
      Let’s be optimistic and say 1.
      So on days 1 and 2, payroll applications are slow.
      Day 3: everyone is slightly slower because payroll data has to be promoted.
      Day 4/5: payroll data is ok, but the users who were used to fast access are now suffering from some latency because payroll data took over some space on the fast spindles.
      Day 7/8: payroll data is not accessed anymore but is still sitting on fast spindles.
      Day 9: payroll data is demoted to slower spindles (the process is taking resources away from the system).
      Day 10: back to normal.
      20 days later: the process starts again…

      And in the meantime, other applications will trigger the auto-tiering feature, monopolizing system resources.
      Now if you have only a few applications, the auto-tiering feature will probably be used much less. You create a bunch of volumes and you let the system decide once and for all where the data will be. I think it’s a waste of money to pay for an “essential feature” that will just be used once.
      So it’ll be a great idea if you have no clue what you’re doing with your SAN and you have a bunch of applications.
      I may be wrong, but I believe that the resources taken by auto-tiering would be much more useful if employed for decent QoS.

      So I believe that auto-tiering based mainly/only on disk technology is not the most appropriate way of providing the best performance at a reasonable cost.

  4. Hmm, interesting read Pete! … I’m a fan of the Compellent solution, have been ever since I first saw it about 5 years ago, when in a previous role! I like the auto-tiering and the virtual provisioning capability, not to mention the dynamic block allocation that is available, and many other features! I assume the other SAN manufacturers have responded with their own technical answers in the last few years.

    Data lifecycle discussions usually tend to be interesting! … and then someone does an analysis which yields often interesting results!!

  5. Ok.
    So it doesn’t take resources away from the users… if the user doesn’t fully use his HW.
    So it is more efficient if the customer has periods of quiet use long enough for the system to move the data.
    So he buys HW but he has to leave it idle regularly if he wants it to perform at its best?
    And if he doesn’t have an idle system (why would he buy it if not to use it?) then the migration process happens as I described 🙂

  6. Dave, I am up-front about who I represent. Who are you with now? Pillar?
    And while I agree that everyone wants to use 100% of their assets 100% of the time, you would have to have an extreme environment to get that. The most I managed in the past was a system supporting an organisation with a presence in India, Ireland and the US. So we had near 24/7 utilisation. But we still had times of the day when the system was less busy than at others.

    If your system is running continuously at 100%, do you not have to reduce the priority of some processes at the time your theoretical payroll kicks in, so that users will see a degradation of service on other processes?

    Our system is designed for ease of management and thereby reducing the overhead of administration. Not everybody can afford to have somebody dedicated to managing the priority of processes on a SAN on a daily basis.

    • Pillar Data Systems was acquired by Oracle last June :-).

      Now having an asset running at a high % of use is just data consolidation because it can.
      If you have to pay for a system but leave some resources free for it to work correctly, you still pay for the unusable spindles.

      If your system runs slower as soon as you allocate more than 30/40% of the space (I’m not talking about Compellent here), you still pay for the 70/60% that you can’t use…

      Now, the Oracle Pillar Axiom doesn’t need anyone to manage the QoS on a daily basis. In fact, it doesn’t need anyone after the LUN creation.
      I’m fully available for a demo whenever you want. 🙂

    • Also: what is more critical to your business: the nurse/patient waiting to get the address displayed or the doctor waiting for a patient file to be displayed to diagnose and treat ?

      I would say it’s more important to display the patient MRI fast by giving its I/O a higher priority, and not only a fast media.

      But I think this discussion deserves better than a few written exchanges 🙂

      When do you drop by for a cuppa? I’ll demonstrate it to you, as I happen to have such a SAN array at home.

  7. Okay, let’s take those points in turn. First, on critical to the business: that’s exactly the point of the article. We attach emotion to our decisions, even when you set up priorities at the time of LUN creation. The system is unemotional and knows it can move the MRI to a slower disk with NO loss of performance, because more people are using the faster disks. Move that data to a slower disk and you also have fewer people there. So having something on the fastest disk doesn’t translate directly into better performance.

    Second, I agree we should avoid taking this much further. People who are reading this may not realise that we are actually friends who go back a good way and we have shared some good times on motorbikes together. Remember the IBF Airborne Division? 🙂
    And we have had many a debate over motorcycle matters on the Irish Biker Forum when we both always wanted the last word 😉

    If I drop over to your place I will expect a bit more than a cuppa!! I would expect you to lay on a full BBQ seeing as I will have taken the motorbike on the ferry to France!!!

  8. Well said !
    I let you have the last word… for now ^o^
    I hope to be able to spend a few days in Dublin in October.
    So, if you can’t make it to France before, we’ll finish this then 🙂
    BBQ is ready and I’m putting new wallpaper in the guest room, so you have no excuse 🙂
