Replacing Cursors with Set-Based SQL Queries – Part 2

July 6, 2011

Part 2 of a discussion about replacing cursors with SQL statements for significant speed improvements.

In a recent post I discussed cursors and mentioned, at a high level, an example of replacing a cursor with a set-based series of SQL statements. In this post, I’m going to provide a simplified version of the SQL for both the cursor and the replacement. I’ll also discuss how I did the SQL conversion and why the new version is so much faster.

The following SQL has been shortened and cleaned up slightly from the original version. Although the version as shown will not run stand-alone, it will illustrate how the cursor was transformed to the set-based version.

Here is the original cursor which took about 20 minutes to run:

DECLARE  CornerCursor   CURSOR FAST_FORWARD FOR
	SELECT 
		StudentNumber,
		SchoolCode,
		CurTransEligibility,
		GradeLevel,
		GeoCode,
		HomeZone,
		BilingualCode,
		SpedCode
	FROM 
		--  [Tables for getting the student data]
 
OPEN CornerCursor
 
FETCH NEXT FROM CornerCursor INTO 
	@StudentNumber,
	@SchoolCode,
	@CurTransEligibility,
	@GradeLevel,
	@GeoCode,
	@HomeZone,
	@BilingualCode,
	@SpedCode
 
WHILE @@FETCH_STATUS = 0
BEGIN
	-- For the grade level, see if the student is ineligible 
              -- for transportation [too close to school].
	IF EXISTS(SELECT Sch FROM TransWalkingDistance 
                       WHERE SchCode = @SchoolCode 
                           AND GeoCode = @GeoCode 
                           AND GradeLevel = @GradeLevel)
	BEGIN
		UPDATE 
		      EnrollmentRecord
		SET
		      TransCode = '',	
                                 LastUpdateUserID= 'CScrpW' -- ineligible
		WHERE 
		      StudentNo = @StudentNo 
                                  AND SchCode = @SchoolCode
 
		GOTO FetchNext
	END
 
 
              IF ((rtrim(@BilingualCode) = '' 
                   OR RTRIM(isnull(@BilingualCode,''))  = 'S')
                   AND (@SpedCode = '' 
                           OR SUBSTRING(@SpedCode, 1, 1) IN ('S', 'R') 
                           OR @SpedCode IN ('V1', 'V2', 'P1', 'P2'))) 
	BEGIN
		IF @GradeLevel <> 'H' 
			AND NOT EXISTS
			(SELECT School FROM BufferZoneLookup 
			      WHERE School = @School  
                                               AND GradeLevel = @GradeLevel 
			      AND (GeoCode = @GeoCode 
                                               OR HomeZone IN (@HomeZone, 'C','H')))
		BEGIN
			-- Code here to run a similar update to the update 
                                         -- statement above, but setting the transportation
			-- code to 'C0RR' and the LastUpdateUser to 'CScrpZ'
		END
 
		GOTO FetchNext
	END
 
	-- Two additional update statements here, each with different 
              -- conditions.  There's a default one as well if none of the above 
              -- updates were used.
 
	FetchNext:
		FETCH NEXT FROM CornerCursor INTO 
		@StudentNumber,
		@SchoolCode,
		@CurTransEligibility,
		@GradeLevel,
		@GeoCode,
		@HomeZone,
		@BilingualCode,
		@SpedCode
	END
END
CLOSE CornerCursor
DEALLOCATE CornerCursor

Notice with the above SQL that there are a sequence of update statements surrounded by if/else conditions. The updates also run in a way where if one update runs, none of the following ones can run. I used these clues to help me design the set-based version. I decided to do things in the following order:

1) Encapsulate the if/else conditions referred to in the cursor into a temporary table with multiple bit columns. For these, more than one of the conditions [bits] can be set.
2) Use the temp table with the bit field settings to determine what the transportation eligibility code should be. The update statements simply modify the transportation code and last update user, so that’s what I set when populating the second temporary table. In the logic, the order of the case statement matches the order of the updates in the cursor.
3) Use the transportation codes set in #2 to actually do the updates.

Here’s the resulting SQL:

-- This query will populate a table with different bit fields 
-- representing the if/else conditions used throughout the cursor.  
-- So essentially this is centralizing disperse if/else conditions into 
-- one record for each student that can be then used by the next query.
SELECT
	studentdata.StudentNumber,
	studentdata.SchoolCode,
	studentdata.TransCode,
	studentdata.GradeLevel,
	studentdata.BilingualCode,
	studentdata.SpedCode,
	studentdata.GeoCode,
	studentdata.HomeZone,
	(CASE WHEN studentdata.sch IN('4261','1010','1020','1340')
		THEN 1 ELSE 0 END) AS 'CanNotBeOutOfZone', -- Citywide
              (CASE WHEN isnull(SchZone.sch, '') <> '' THEN 1 ELSE 0 END) 
                                                            AS 'InZoneOrBufferZone',
	(CASE WHEN isnull(TransWalkingDistance.sch, '') <> '' THEN 1 ELSE 0 END)
                                                            AS 'IsInWalkingDistance',
	(CASE WHEN studentdata.grade IN ('K0', 'K1') 
                     AND studentdata.isExtendedDay = 0 THEN 1 ELSE 0 END) 
                                                            AS 'IsHalfDayKindergarten',
	-- Bilingual non-spanish students are always transportation eligible.
	(CASE WHEN NOT isnull(studentdata.BilingualCode, '') = '' 
                  AND NOT (isnull(studentdata.BilingualCode,'')) = 'S' THEN 1 ELSE 0 END)
                                                            AS 'BilingualIneligibleForC0RR',
	-- A few additional case statement options omitted [SpedIneligibleForC0RR].
INTO #StudentsToProcess
FROM
	--  [List of tables] similar to the one in the cursor definition.   
 
-- Use the bit fields to set what the transportation code would 
-- be for the student.  The case statement to set PropsedTransCodeBasedOnData 
-- runs in the same precedence order as the original cursor.  In other words, 
-- order is important to keep the results the same.
SELECT  studentdata.studentno, studentdata.transcode AS 'curTransCode',
	-- in the walkeligible range
	(CASE WHEN IsInWalkingDistance = 1 THEN ''
	-- Set out of zone.  High school grades and some other 
              -- categories cannot be out of zone.
	WHEN BilingualIneligibleForC0RR = 0
	    AND NOT sys.grade IN('09','10','11','12') 
	    AND InZoneOrBufferZone = 0 
	    AND CanNotBeOutOfZone = 0 THEN 'C0RR'
	WHEN (IsHalfDayKindergarten = 1 
                                   AND isnull(sys.transcode, '') = '') THEN 'C5*R'
	WHEN (IsHalfDayKindergarten = 1 
                                   AND isnull(sys.transcode, '') <> '') THEN sys.transcode
	ELSE 'C5RR' END) AS 'PropsedTransCodeBasedOnData'
INTO #ProposedTranscodeChanges
FROM 
	-- [List of tables] 
 
-- Run the update - set trans eligibility codes [c5rr, c0rr, etc].  
-- Each update used a different user ID tag, so set this appropriately.
UPDATE studentdata SET transcode = propsedTransCodeBasedOnData,
LastUpdateUser = (CASE isnull(propsedTransCodeBasedOnData, '') 
              WHEN 'c5rr' THEN 'CScrpC'
	WHEN 'c0rr' THEN 'CScrpZ' 
              WHEN '' THEN 'CScrpW' 
              WHEN 'C5*R' THEN 'CScrpK' 
              ELSE 'CScrpX' END)
FROM 
	-- [List of tables]

I could have likely combined the SQL statements above so only 1 or 2 queries would be necessary. But doing so would have been at the cost of increased query complexity and maintainability. The queries run in under 10 seconds, a significant performance improvement over the cursor-based version!

Why is the cursor so slow? It is simple – each student is processed one by one. To make matters worse, multiple queries are run within the body of the cursor, so over 250,000 queries could theoretically be run for 55,000 students. Why run that many queries when 3 queries will do?


Find this post helpful or interesting? Add site to favorites

School Data Move Done!

June 30, 2011

A brief mention that we are done with the school year flip process for transitioning to the new school year.

I know I haven’t been quite as active on the site the last week or so – I do hope to post more over the upcoming days/weeks. But today I’ll just share a short update – at BPS, we got through the annual school year flip process. In other words, we archived the 2010-2011 school data and moved the 2011-2012 school data into the active tables. Yay! So while school doesn’t start until September for most at BPS, the new school year has already begun for us at the central office.

I certainly plan to discuss the school year flip process in more detail in a future post. Needless to say, there was a lot of SQL involved with the process.


Find this post helpful or interesting? Add site to favorites

Replacing Cursors with Set-Based SQL Queries – Part 1

June 26, 2011

Part 1 of a discussion about replacing cursors with SQL statements for significant speed improvements.

In a previous post I discussed where it is best to put logic – SQL or code. In my view, SQL is great for processing sets of data and poorly structured for row-based processing. To put it bluntly, I try to avoid cursors in most circumstances. Is this rule absolute? Not necessarily. Sometimes the operation cannot be done in [...] Continue Reading…


Find this post helpful or interesting? Add site to favorites

Google as the Ultimate SQL Repository

June 22, 2011

A discussion of how Google can serve as an invaluable resource when facing a SQL-based challenge.

Today I briefly want to mention a question a colleague at work asked: How can a comma delimited string be split using T-SQL but without using a cursor/loop? I wasn’t sure at first, but as is often the case, Google served as an invaluable resource. I found a useful solution here.

This example question and solution shows something important – in general, [...] Continue Reading…


Find this post helpful or interesting? Add site to favorites

Table Backups and the TableArchive Database

June 20, 2011

A discussion of how a separate database is used for storing table backups.

In a recent article, one of the things I mentioned was saving table backups before important and/or risky processes. For example, I generally do this before truncating the next year school assignment and school waitlist tables and reloading the new assignments/waitlists. Sometimes these backups are removed soon after, but some backups are kept on-hand for a few months or a year in [...] Continue Reading…


Find this post helpful or interesting? Add site to favorites

Being Cautious With SQL Based Processes

June 16, 2011

Why being careful is such an important mindset when working with SQL and databases.

At the Boston Public Schools, certain processes need to be run at particular times of the school year. There’s a couple of specific windows of time where we print choice forms for students. There’s a few occurrences where we run a process to assign students to schools. There’s a specific day [generally June 30th] where we have a significant process to [...] Continue Reading…


Find this post helpful or interesting? Add site to favorites

SQL Books Arrived Today

June 14, 2011

A small post regarding a couple SQL books I bought from Amazon.

A couple new books arrived in the mail from Amazon.com. Both are by Stéphane Faroult, whose videos I mentioned a few days ago.

The books are:

1) Refactoring SQL Applications
2) The Art of SQL

I’m not sure I’m going to read them cover to cover – more likely I’ll scan through and read through parts that interest me or that I’d like to learn more [...] Continue Reading…


Find this post helpful or interesting? Add site to favorites

Database Documentation with a Data Dictionary Part 2

June 13, 2011

A continuation of the data dictionary discussion with information about how the documentation is done.

Today I’m going to continue discussing the data dictionary tables we’ve set up at the Boston Public Schools. The SQL for creating the tables and inserting the baseline example records is in my previous data dictionary post. The data inserted uses a few tables in the AdventureWorks database, and I’ll continue with this example.

At BPS, there were well over a hundred TableInfo [...] Continue Reading…


Find this post helpful or interesting? Add site to favorites

Some Good SQL-Related YouTube Videos

June 11, 2011

A brief discussion and links to a series of YouTube videos from Stéphane Faroult.

I’ll continue the discussion of the table dictionary tomorrow. Today I’m sharing an informative and funny short series of youtube videos I found from Stéphane Faroult:Part 1Part 2Part 3
The videos are humorous, thought provoking, and informative. The humorous nature of the discussion and the visuals he uses are important as these aspects make the information memorable. Keep in mind he’s presenting [...] Continue Reading…


Find this post helpful or interesting? Add site to favorites

Database Table Documentation with a Data Dictionary

June 9, 2011

Why table and column documentation is useful and some SQL for setting up the data dictionary tables.

At Boston Public Schools, we have many hundreds of data tables. Some are used all the time, others are used only during certain times of the year, while others are simply backups or are no longer used. We recently went through the process of identifying and documenting the important tables in the system. The result of this was a data dictionary [...] Continue Reading…


Find this post helpful or interesting? Add site to favorites

 
Copyright 2011 Andrew Zwicker. Theme by Shlomi Noach, openark.org. Help With SQL - Blogged