How a cross join-based SQL statement can be used to quickly populate a numbers tables.
When I write joins in SQL statements, 99% of the time I write inner and left joins. Why such a high frequency? Because these two join types cover almost all the conceivable table interactions one might need. But every once in a while I have an unconventional task to do, and sometimes a different join type is warranted.
In a future post I’ll cover full joins and why they could be helpful in some [albeit rare] circumstances. For now, I’ll cover an example of using a Cartesian [cross] join.
At Boston Public Schools, we assign students MBTA passes, and the passes need to be in a certain range. I needed to come up with some logic to populate a table with a list of numbers. Logic afterwards is used to assign passes to students as appropriate. The sql below is similar to what I came up with.
The example assumes 800,000 numbers need to be inserted into the [already existing but empty] TPassNumbers table. While this is far more records than would be needed for the number of students at BPS, having such a high number illustrates the speed improvement using a cross join.
Here’s the conventional solution:
DECLARE @counter INT SET @counter = 0 While @counter < 800000 BEGIN INSERT INTO TpassNumbers (tpassNumber) VALUES(@counter) SET @counter = @counter + 1 END
It is simple and gets the job done. Unfortunately, it takes over a minute to run. While this isn’t a terribly long time, this solution doesn’t seem to scale very well.
What does scale well? Joins scale. So here’s an example using cross joins that functionally does the same thing as the while loop above.
----- Use a table variable DECLARE @numbers TABLE ( num INT ) ----- Populate a table with numbers 0 to 9 DECLARE @counter INT SET @counter = 0 While @counter < 10 BEGIN INSERT INTO @numbers VALUES(@counter) SET @counter = @counter + 1 END INSERT INTO TpassNumbers (tpassNumber) SELECT ( n100000place.num*100000 + n10000place.num*10000 + n1000place.num*1000 + n100place.num*100 + n10place.num*10 + n1place.num ) AS 'number' FROM @numbers n1place CROSS JOIN @numbers n10place CROSS JOIN @numbers n100place CROSS JOIN @numbers n1000place CROSS JOIN @numbers n10000place CROSS JOIN @numbers n100000place WHERE n100000place.num < 8 ----- < 800000 ORDER BY NUMBER
It is a bit more complicated than the while loop. But it scales – it runs in 3 seconds. That’s a 20 fold speed improvement. I’ll take that any day.
Find this post helpful or interesting? Add site to favorites