如何高效删除表中重复记录,仅保留一条?
网上比较多的方法是如下的sql:
delete from 表 a
where (a.Id, a.seq) in
(select Id, seq from 表 group by Id, seq having count(*) > 1)
and rowid not in
(select min(rowid) from 表 group by Id, seq having count(*) > 1);
没有在(id,seq)上建立联合 index的时候,将会对表进行三次全表扫描,如果表很大的话,并不能高效删除重复记录.
如果改造为下面的SQL
delete 表
where rowid in (with t_save as (select Id, seq, min(rowid) min_rowid
from 表
group by Id, seq
having count(*) > 1)
select rowid
from 表 a
where exists (select 1
from t_save b
where b.id = a.id
and b.seq = a.seq
and b.min_rowid <> a.rowid)
)
将对表只有两次全表扫描.
如果重复的记录比较少,我们可以增加如下的hint:
delete 表
where rowid in (with t_save as (select Id, seq, min(rowid) min_rowid
from 表
group by Id, seq
having count(*) > 1)
select /*+ leading(b) use_nl(b a) */rowid
from 表 a
where exists (select 1
from t_save b
where b.id = a.id
and b.seq = a.seq
and b.min_rowid <> a.rowid)
)
并在(id,seq)上建立联合index,速度将会更快.
--转自