Friday 16 March 2012

Why I don’t like maintenance plans


If you create a maintenance plan (SQL 9.0.3042) to back up databases and select All databases as below:

Then, at some point in the future, if one of the databases is taken offline, the job created by the maintenance plan will fail.

From what I have seen, the job will still back up the rest of the databases (which is the only saving grace); you just get a lot of errors in your SQL log like the following:

BACKUP failed to complete the command BACKUP DATABASE SQLServer365. Check the backup application log for detailed messages.

Shortly followed by:

BackupDiskFile::OpenMedia: Backup device 'D:\Backups\SQLServer365\SQLServer365_backup_201203160500.bak' failed to open. Operating system error 2(error not found).

Yes, I know: you should know when a database is taken offline and amend the maintenance plan accordingly. In my experience, though, people create maintenance plans as a quick temporary fix on non-critical servers (which becomes permanent). These servers are often forgotten about until there is a problem.

I avoid creating maintenance plans; yes, they are quick and easy, but they are often all or nothing. You are much better off creating maintenance routines tailored to your environment. I am in the process of creating a maintenance routine for backups, consistency checks and index maintenance that works across all flavours of SQL Server with a whole host of configurable parameters.
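
As a minimal sketch of the tailored approach (the backup path is a placeholder; adjust for your environment), a routine can check sys.databases and simply skip anything that is not online, which avoids the failure above entirely:

```sql
-- Sketch: back up every user database that is currently online.
-- The backup folder D:\Backups\ is an assumption.
DECLARE @name sysname, @sql nvarchar(max);

DECLARE db_cursor CURSOR FOR
    SELECT [name]
    FROM sys.databases
    WHERE state_desc = 'ONLINE'  -- skip offline/restoring databases
      AND database_id > 4;       -- skip system databases

OPEN db_cursor;
FETCH NEXT FROM db_cursor INTO @name;
WHILE @@FETCH_STATUS = 0
BEGIN
    SET @sql = N'BACKUP DATABASE [' + @name + N'] TO DISK = ''D:\Backups\'
             + @name + N'_' + CONVERT(nvarchar(8), GETDATE(), 112) + N'.bak'';';
    EXEC sp_executesql @sql;
    FETCH NEXT FROM db_cursor INTO @name;
END
CLOSE db_cursor;
DEALLOCATE db_cursor;
```

Offline databases are simply filtered out, so the job carries on cleanly instead of logging errors.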

I will provide this once complete.

Chris

Wednesday 14 March 2012

How Resilient Is Your Recovery Plan?


In this modern age of technology, DBAs face ever-increasing demands from businesses: our databases must perform, be secure, highly available, scalable and, equally as important, recoverable.  There is an obsession with the five 9s within the industry, and rightly so, but just how resilient is your recovery plan?  Can you honestly say, hand on heart, that you have covered everything?  I can't, but it isn't through want of trying.

Performance and Security
Let's say, then, that you are happy with the performance of your databases: you have normalised to third normal form in the vast majority of cases without compromising performance.  All the right security measures are in place: Active Directory groups with access available only to production DBAs, and a controlled, staged release procedure through Development, UAT and Live.  You monitor a wide variety of metrics and analyse them on a weekly basis for auditing and capacity planning.  You have a solid index strategy to squeeze out every possible ounce of performance.  Performance and security you have covered pretty well, but being a DBA you are always looking for ways to make things better.

Highly Available
You have a two-node active/passive Windows failover cluster, have tested all the possible failover scenarios and are happy that failovers and failbacks occur successfully.  Again, you are happy with the high availability of your environment.

Scalability
You have plenty of room for "growth", both in size and load on the environment, and there is also a contingency for additional resources (for ITIL people, I mean storage, CPU and memory ;)) if required.  Performance and utilisation are monitored and reviewed weekly as part of your standards.  You are comfortable that the environment scales.

Recoverable
You have configured database mirroring in high-safety mode with manual failover (to prevent false failovers), with the DR site in a different country.  Failover has been tested and the databases are served to the applications successfully; failback also works a treat.  You take full backups on a Sunday, differentials every night and transaction log backups every 15 minutes.  The backups are then backed up to tape and sent to an offsite storage facility.
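
In T-SQL terms (the database name and backup paths here are illustrative), that schedule boils down to three commands scheduled at the appropriate frequencies:

```sql
-- Sunday: full backup (INIT overwrites last week's full)
BACKUP DATABASE Orders TO DISK = 'D:\Backups\Orders_full.bak' WITH INIT;

-- Every night: differential backup (changes since the last full)
BACKUP DATABASE Orders TO DISK = 'D:\Backups\Orders_diff.bak' WITH DIFFERENTIAL;

-- Every 15 minutes: transaction log backup
BACKUP LOG Orders TO DISK = 'D:\Backups\Orders_log.trn';
```
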

Good times!  So far, so good: you have ticked all the required boxes up to this point.  Now let's throw in some scenarios and see how this environment would hold up.

Performance problems reported
                Your solid release process highlighted a potential risk and the affected processes are rolled back to guarantee performance while the issues are being addressed.

Potential Security Breach
                This was found proactively: during your weekly review you noticed login failures and raised the issue with your security team to resolve.

Running low on disk space
Again this is proactive; based on current growth rates you predict that a volume will run out of space in six months, so a change request is made and the volume expanded.

A Server in the cluster fails
                No problem here; failover is automatic.  You resolve the error with the problematic node and add it back into the cluster.

Entire cluster / primary site failure
                Again, no problem: you manually fail over from the principal to the failover partner, the application and operations teams follow their DR plans and, hey presto, with little downtime your environment is up and serving customers again.

Data Loss
                At 12:00, reports of errors in the application reach your team.  Investigation finds that a disgruntled DBA, working their notice period, deleted 100,000 records from the order table at 11:05 on a Friday morning.  A decision is made to recover from backup; the backups are kept on disk for one week, so you look at restoring the affected database.  This is where the problem arises.

Your restore script consists of restoring the full backup from Sunday, the differential from Thursday night and then all transaction logs from Friday, with STOPAT set to the last transaction before the delete.  The full backup restores fine, but the differential backup from Thursday night fails as it is incomplete.  You investigate further and have to resort to restoring the full backup from Sunday, the differential backup from Wednesday and all transaction logs from Thursday up until 23:45, which is the last transaction log before the incomplete differential.
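
The intended restore sequence looks something like this (file names and the stop time are illustrative; in practice there would be one RESTORE LOG per 15-minute log backup, all but the last taken WITH NORECOVERY):

```sql
-- Full backup from Sunday; leave the database restoring
RESTORE DATABASE Orders FROM DISK = 'D:\Backups\Orders_full.bak'
    WITH NORECOVERY, REPLACE;

-- Differential from Thursday night
RESTORE DATABASE Orders FROM DISK = 'D:\Backups\Orders_diff.bak'
    WITH NORECOVERY;

-- Final transaction log from Friday, stopping just before the delete
RESTORE LOG Orders FROM DISK = 'D:\Backups\Orders_log_1100.trn'
    WITH STOPAT = '2012-03-16 11:04:59', RECOVERY;
```
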

Yes, that's right: you have lost more than 11 hours' worth of data!  Things didn't have to be this way, though.  The solution is to verify your backups; if you had verified the consistency of your backups, this could have been avoided.
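
A lightweight first line of defence, short of a full test restore, is to take backups WITH CHECKSUM and verify each one straight after it is written (the path here is a placeholder):

```sql
-- Write page checksums into the backup so corruption is detectable
BACKUP DATABASE Orders TO DISK = 'D:\Backups\Orders_full.bak'
    WITH CHECKSUM;

-- Check the backup set is complete and readable without restoring it
RESTORE VERIFYONLY FROM DISK = 'D:\Backups\Orders_full.bak'
    WITH CHECKSUM;
```

VERIFYONLY would have caught the incomplete differential in the scenario above, though it still isn't a guarantee of restorability; only an actual restore proves that.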

Summary
I know from experience that trying to justify the cost of a server can be a painstaking task, but justifying one that does not serve clients is even more difficult.  The spin I always use is:

“How much would it cost the company if we lost a day’s worth of data?” 

As a techie it is a no-brainer, but persuading someone to part with money for something they see no day-to-day benefit from is a different matter.  From their standpoint it is not revenue generating, and the risk is worth taking.

In an ideal world all backups would be verified by restoring them, but this is simply not practical for everyone, as the budget may simply not be available.  Take into consideration also that most environments I have worked in have had 40+ SQL Servers running different versions; this would potentially require more than one server to verify the backups.

Recommendations
What I would recommend is:

1 - Get buy-in from other team members and line managers; the more people you have that agree with you on this, the stronger the case will be when it is presented to the person with the cheque book!

2 - Prioritise your SQL Servers and verify the backups of the most critical databases as often as possible.  If you lose data in some databases you can get by; for example, IT internal databases (SQLSentry, Solarwinds).

Believe me it may one day save you and quite possibly your job!

Saturday 3 March 2012

A Script A Day Finished!

Well, that's it: my "A Script A Day" series is finished.  I hope you found the scripts I provided useful; I've used all of them in real-world DBA tasks.  I tried to provide a variety of scripts covering a number of different areas.  This series, I feel, will appeal more to new or inexperienced DBAs, but I'm sure that some more seasoned DBAs will be able to take something away from it.  It's been a thoroughly enjoyable project that I have taken great pride in doing.  As promised, here is the PDF with all the scripts in for easy access.


Thanks to everyone who commented on the posts and special thanks to David Riley, Chris Taylor and Paul Anderton with whom I continue to learn and share my experiences.

Friday 2 March 2012

A Script A Day - Day 29 - The Importance of Being Idle

Today’s script is also one I used in my migration on Wednesday.  It again uses string manipulation to generate a script, this time the restore database script.  Now, granted, this quick script wouldn't work if there are any secondary data files, and it is reliant on the logical file names, file locations and so on.

The point of me using this script is that I am very anal when it comes to standards; I like to make sure drive letters and paths are consistent, as well as naming conventions for all databases and objects.  Adhering to standards makes your life as a DBA much easier, especially when it comes to tasks like migrations.

/*
      -----------------------------------------------------------------
      The Importance of Being Idle (Results To Text Ctrl+T)
      -----------------------------------------------------------------
     
      For more SQL resources, check out SQLServer365.blogspot.co.uk

      -----------------------------------------------------------------

      You may alter this code for your own purposes.
      You may republish altered code as long as you give due credit.
      You must obtain prior permission before blogging this code.
 
      THIS CODE AND INFORMATION ARE PROVIDED "AS IS"
     
      -----------------------------------------------------------------
*/
-- Change database context
USE master;
GO
-- Create restore database script
SELECT
      'RESTORE DATABASE ' + [name] + ' FROM DISK = ''D:\Migration\Backup\' + [name] + '_migration_20120301.bak''' +
      ' WITH REPLACE, MOVE ''' + [name] + '_Data''' + ' TO ''D:\Data\' + [name] + '_Data.mdf'',' + ' MOVE ''' + [name] + '_Log''' + ' TO ''L:\Log\' + [name] + '_Log.ldf'';'
FROM
      sysdatabases
WHERE
      dbid > 4; -- exclude system databases, which can't be restored this way
GO
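
For databases that do have secondary data files, a variation on the same idea can pull every file from sys.master_files and generate one MOVE clause per file.  This is only a sketch: the target folders and backup path are assumptions, and the physical file names are derived from the logical names purely for illustration.

```sql
-- Sketch: build RESTORE statements with a MOVE for every file.
-- D:\Data, L:\Log and the backup path are assumed; physical names
-- are generated from logical names, so adjust before running.
SELECT
      'RESTORE DATABASE [' + d.name + '] FROM DISK = ''D:\Migration\Backup\'
      + d.name + '_migration_20120301.bak'' WITH REPLACE'
      + (SELECT ', MOVE ''' + mf.name + ''' TO '''
                + CASE WHEN mf.type = 1 THEN 'L:\Log\' ELSE 'D:\Data\' END
                + mf.name
                + CASE WHEN mf.type = 1 THEN '.ldf'
                       WHEN mf.file_id = 1 THEN '.mdf'
                       ELSE '.ndf' END + ''''
         FROM sys.master_files mf
         WHERE mf.database_id = d.database_id
         FOR XML PATH('')) + ';'
FROM sys.databases d
WHERE d.database_id > 4; -- exclude system databases
```
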

Enjoy!

Chris

A Script A Day - Day 28 - String Manipulation

Today’s script is one that I used earlier this week.  On Thursday I migrated a server from SQL Server 2000 to SQL Server 2008 R2.  Now, I love migrations; I don’t think I’m weird, but I buzz off the adrenaline rush when working under pressure and I get a great deal of satisfaction when a migration is complete.  As part of the migration I backed up the databases ready to be restored to the new server.  As there were a number of databases to back up, and of course I don’t use the GUI where I can help it, the easiest way for me to generate the backup script prior to the migration was using the script below.  The query results I then save to a script file ready for use.

/*
      -----------------------------------------------------------------
      String Manipulation (Results To Text Ctrl+T)
      -----------------------------------------------------------------
     
      For more SQL resources, check out SQLServer365.blogspot.co.uk

      -----------------------------------------------------------------

      You may alter this code for your own purposes.
      You may republish altered code as long as you give due credit.
      You must obtain prior permission before blogging this code.
 
      THIS CODE AND INFORMATION ARE PROVIDED "AS IS"
     
      -----------------------------------------------------------------
*/
-- Change database context
USE master;
GO
-- Create backup database script
SELECT
      'BACKUP DATABASE ' + [name] + ' TO DISK = ''D:\Migration\Backup\' + [name] + '_migration_20120301.bak''' + ';'
FROM
      sysdatabases;
GO

Enjoy!

Chris