본문 바로가기
Server/Oracle

[Oracle] Oracle DB 12c에서 달라진 Top-N 쿼리 (스크랩)

by 뒹굴거리는프로도 2017. 12. 20.
반응형

 


 

결론 : 오라클 12c의 SQL은 SQL:2008을 따라서 Top-N 쿼리문이 DB2, PostgreSQL, Sybase의 쿼리문과 같아졌습니다!

 

오라클 11g 이전의 버전에서는 Top-N 쿼리를 수행하기 위해 인라인 뷰를 사용하여 where 조건절에 rownum을 사용하여 걸러내는 방식으로 쿼리를 작성해왔습니다.

SELECT *
  FROM (SELECT employee_id, last_name
          FROM hr.employees
         ORDER BY employee_id)
 WHERE rownum <= 5;

EMPLOYEE_ID LAST_NAME
----------- -------------------------
        100 King
        101 Kochhar
        102 De Haan
        103 Hunold
        104 Ernst

 

Execution Plan ---------------------------------------------------------- ----------------------------------------------------------------------------------------------- | Id  | Operation                     | Name          | Rows  | Bytes | Cost (%CPU)| Time     | ----------------------------------------------------------------------------------------------- |   0 | SELECT STATEMENT              |               |     5 |   135 |     2   (0)| 00:00:01 | |*  1 |  COUNT STOPKEY                |               |       |       |            |          | |   2 |   VIEW                        |               |     5 |   135 |     2   (0)| 00:00:01 | |   3 |    TABLE ACCESS BY INDEX ROWID| EMPLOYEES     |   107 |  1284 |     2   (0)| 00:00:01 | |   4 |     INDEX FULL SCAN           | EMP_EMP_ID_PK |     5 |       |     1   (0)| 00:00:01 | -----------------------------------------------------------------------------------------------  Predicate Information (identified by operation id): ---------------------------------------------------    1 - filter(ROWNUM<=5)

 

타 DBMS를 사용해왔던 개발자들에게 오라클의 인라인 뷰를 사용한 Top-N 쿼리문은 몹시나 거추장스러운 것으로 불만의 대상이었죠.

(타 DBMS에서의 Top-N 쿼리 처리방식은 오라클 성능 고도화 2권에서도 나오고 있습니다.)

그런데 오라클 12c에서는 이런 개발자들의 불만을 해결할만한 새로운 방식의 쿼리 작성법이 추가 되었습니다.

바로 로우 제한 절(row limiting clause)의 추가입니다. 먼저 상위 5개의 결과를 추출하는 쿼리는 아래와 같습니다.

 

SELECT employee_id, last_name
  FROM hr.employees
 ORDER BY employee_id
 FETCH FIRST 5 ROWS ONLY;

EMPLOYEE_ID LAST_NAME
----------- -------------------------
        100 King
        101 Kochhar
        102 De Haan
        103 Hunold
        104 Ernst

 

 

Execution Plan ---------------------------------------------------------- --------------------------------------------------------------------------------------------- | Id  | Operation                | Name             | Rows  | Bytes | Cost (%CPU)| Time     | --------------------------------------------------------------------------------------------- |   0 | SELECT STATEMENT         |                  |   107 |  5671 |     2   (0)| 00:00:01 | |*  1 |  VIEW                    |                  |   107 |  5671 |     2   (0)| 00:00:01 | |*  2 |   WINDOW SORT PUSHED RANK|                  |   107 |  1284 |     2   (0)| 00:00:01 | |   3 |    VIEW                  | index$_join$_001 |   107 |  1284 |     2   (0)| 00:00:01 | |*  4 |     HASH JOIN            |                  |       |       |            |          | |   5 |      INDEX FAST FULL SCAN| EMP_EMP_ID_PK    |   107 |  1284 |     1   (0)| 00:00:01 | |   6 |      INDEX FAST FULL SCAN| EMP_NAME_IX      |   107 |  1284 |     1   (0)| 00:00:01 | ---------------------------------------------------------------------------------------------  Predicate Information (identified by operation id): ---------------------------------------------------    1 - filter("from$_subquery$_003"."rowlimit_$$_rownumber"<=5)    2 - filter(ROW_NUMBER() OVER ( ORDER BY "EMPLOYEE_ID")<=5)    4 - access(ROWID=ROWID)

 

ORDER BY 절을 사용하여 정렬을 한 상태에서 FETCH FIRST 5 ROWS ONLY 절로 상위 5개의 결과값만 추출하도록 제한을 걸어두었지요.

만약 1~5위의 결과는 생략하고 6~10위의 결과만 보기 원한다면 어떻게 하면 될까요?

 

SELECT employee_id, last_name
  FROM hr.employees
 ORDER BY employee_id
OFFSET 5 ROWS FETCH NEXT 5 ROWS ONLY;

EMPLOYEE_ID LAST_NAME
----------- -------------------------
        105 Austin
        106 Pataballa
        107 Lorentz
        108 Greenberg
        109 Faviet

 

바로 OFFSET 5 ROWS를 추가하여 5개의 결과값을 건너뛰고서 그 이후의 결과값을 추출하라고 DB에게 요구할 수 있습니다.

당연한 내용이겠지만 OFFSET에 음수 값을 줄 경우, 오라클은 이를 0으로 간주하여 맨 처음값부터 출력을 하게되고, 마찬가지로 NULL 값을 입력하거나 row의 개수보다 더 큰 값을 주었을 때에도 0으로 간주합니다. (참 똑똑하지요?)

뿐만 아니라 백분률을 토대로 Top-N 쿼리를 수행해내기도 하는데요.

 

SELECT employee_id, last_name, salary
  FROM hr.employees
 ORDER BY salary
 FETCH FIRST 5 PERCENT ROWS ONLY;

EMPLOYEE_ID LAST_NAME                     SALARY
----------- ------------------------- ----------
        132 Olson                           2100
        136 Philtanker                      2200
        128 Markle                          2200
        127 Landry                          2400
        135 Gee                             2400
        191 Perkins                         2500

6 rows selected.

 

위의 예에서 보시다시피 salary로 정렬된 값에서 상위 5% 이내의 결과만 출력하도록 PERCENT 키워드를 넣어서 쿼리문을 작성할 수도 있습니다.

그런데 위의 쿼리를 보면 뭔가 석연찮은 점을 발견할 수 있습니다. 바로 중복값의 처리입니다.

보다시피 Perkins와 같이 salary가 2500인 사람이 더 있는데, 단지 정렬 순위에서 맨 위에 있다는 이유로 나머지 사람들은 잘려버리고 Perkins만 상위 5%로 살아남은 것이지요. (저 사람, 운이 좋은건가요?)

나머지 salary가 2500인 사람들도 억울하지 않게 쿼리를 수정해야겠지요? 아래의 쿼리문처럼 WITH TIES 옵션을 주어서 말입니다.

 

SELECT employee_id, last_name, salary
  FROM hr.employees
 ORDER BY salary
 FETCH FIRST 5 PERCENT ROWS WITH TIES;

EMPLOYEE_ID LAST_NAME                     SALARY
----------- ------------------------- ----------
        132 Olson                           2100
        136 Philtanker                      2200
        128 Markle                          2200
        127 Landry                          2400
        135 Gee                             2400
        191 Perkins                         2500
        119 Colmenares                      2500
        140 Patel                           2500
        144 Vargas                          2500
        182 Sullivan                        2500
        131 Marlow                          2500

11 rows selected.

 

이렇게 편리한 구문의 추가로 인해 12c부터는 인라인 뷰를 만들기 위한 괄호와의 숨바꼭질은 어느 정도 사라질 것 같군요!



 

반응형