Suping Chen's Notes: November 2013

optimize sql for inner join with huge data.
We have a table include more than 1M records,
we want get data that the 3 fields is duplicate or only once's list
before we use "in" or "exists",
the sql when records is less, it is no problem, give us good return,
when records become huge, the sql could get return ,the main problem is inner jion,
So, the change is using CTE (common table expression),
the big improve is before even no return till timeout(more than 20 minutes)
now is only 30 senconds.
And another thing I learn from here is create index view
and deterministic and nondeteministic functions,
because the 1 column is caclute by user funtion, so I could not just create index for improve performance.
Any way, problem resovled.

SELECT Fl.*
, ROW_NUMBER() OVER (ORDER BY Fl.Name ) as num
FROM [dbo].[a] Fl


where exists
(
     select Fl1.r1
    from [dbo].[a]
    where

    and Fl.r1= Fl1.r1
    and Fi.r2= Fi1.r2
    and Fi.r3= Fi1.r3
    group by
     Fl1.r1
      ,Fi1.r2
      ,Fi1.r3
    having count(*) <= 1
)

;with R as
(
select Fl1.r1 as r1, Fl1.r2 as r2 ,Fl1.r3 as r3
    from [dbo].[a] Fl1
    group by
    r1
    ,r2
    ,r3
    having count(*) > 1
)
select t.*
from R inner join   [a] as t
on t.r1= R.r1
and t.r2= R.r2
and t.r3= R.r3

Refrence
http://dba.stackexchange.com/questions/28584/select-the-newest-data-inner-join-group-by-maybe

Suping Chen's Notes

Friday, November 1, 2013

Optimising join on large table ( (inner join + group by)