Friday, November 1, 2013

Optimising join on large table ( (inner join + group by)

optimize sql for inner join with huge data.
We have a table include more than 1M records,
we want get data that the 3 fields is duplicate or  only once's list
before we use "in" or "exists",
the sql when records is less, it is no problem, give us good return,
when records become huge, the sql could get return ,the main problem is inner jion,
So, the change is using CTE (common table expression),
the big improve is before even no return till timeout(more than 20 minutes)
now is only 30 senconds.
And another thing I learn from here is create index view
and deterministic and nondeteministic functions,
because the 1 column is caclute by user funtion, so I could not just create index for improve performance.
Any way, problem resovled.

SELECT Fl.*
 , ROW_NUMBER() OVER (ORDER BY Fl.Name ) as num
 FROM [dbo].[a] Fl
   
   
 where  exists 
 (
     select Fl1.r1
    from [dbo].[a]     
    where
    
    and Fl.r1= Fl1.r1
    and Fi.r2= Fi1.r2
    and Fi.r3= Fi1.r3
    group by
     Fl1.r1
      ,Fi1.r2
      ,Fi1.r3
    having count(*) <= 1
 )


;with R as
 (
 select Fl1.r1 as r1, Fl1.r2 as r2 ,Fl1.r3 as r3
    from [dbo].[a] Fl1                  
    group by
    r1
    ,r2
    ,r3
    having count(*) > 1
 )
 select t.*
 from R inner join   [a] as t
 on t.r1= R.r1
 and t.r2= R.r2
 and t.r3= R.r3

Refrence
http://dba.stackexchange.com/questions/28584/select-the-newest-data-inner-join-group-by-maybe