[Binary Classification Score Model] Population Stability Validation - (1) PSI (Population Stability Index) (with Python)

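Two indicators are used to validate the stability of a credit scoring model: PSI, which checks the stability of the population, and CAR, which checks the stability of the individual scoring characteristics.

This post covers PSI.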

PSI (Population Stability Index)

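PSI checks whether the distribution of customers across credit grades during actual operation is still similar to the distribution observed when the credit scoring model was developed, or whether the grade distribution has shifted.

PSI matters because financial institutions typically restrict loans or credit card issuance for applicants whose credit grade falls below a certain level.

Suppose that at development time, grade 8 or below was expected to account for no more than 5% of the population, so at least 95% of all customers were predicted to be approved for a loan or a credit card.

Now suppose that, after the model has been in operation for a while, the share of grade 8 or below has risen to 10%. From the institution's point of view, lending or issuing cards to customers who fall short of the grade cutoff raises risk sharply, and the simulation figures produced at development time become unreliable.

Conversely, suppose the share of grade 8 or below has fallen to 3% of the population. For the same level of risk, the institution could then have afforded to approve some loans and credit cards for grade 8 customers.

For this reason, the grade distribution needs to be checked for shifts on an ongoing basis after the model is developed.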

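Formula

PSI is computed as follows.

PSI = sum((E_i - A_i) * ln(E_i / A_i)), summed over all grades i, where E_i is the share of the population in grade i at development time and A_i is the share in grade i during actual operation.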
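As a rough illustration of the formula, the sketch below applies it to two made-up grade distributions. The ten shares in each array and the helper name `psi_from_shares` are assumptions chosen for the example, not figures from a real model, and the helper assumes every share is strictly positive.

```python
import numpy as np

# Hypothetical share of customers in each of 10 credit grades (each sums to 1).
dev_share = np.array([0.05, 0.10, 0.15, 0.20, 0.20, 0.12, 0.08, 0.05, 0.03, 0.02])
ops_share = np.array([0.04, 0.08, 0.13, 0.18, 0.20, 0.14, 0.10, 0.06, 0.04, 0.03])

def psi_from_shares(expected, actual):
    """PSI = sum((expected - actual) * ln(expected / actual)) over grades.

    Assumes both inputs are already proportions and contain no zeros.
    """
    expected = np.asarray(expected, dtype=float)
    actual = np.asarray(actual, dtype=float)
    return float(np.sum((expected - actual) * np.log(expected / actual)))

psi = psi_from_shares(dev_share, ops_share)
print(round(psi, 4))
```

Each term is non-negative, because the difference and the log ratio always share the same sign, so PSI is zero only when the two distributions are identical.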

The larger the PSI, the more the grade distribution has shifted.

The value can be interpreted using the thresholds in the table below.

PSI range	Interpretation
PSI < 0.1	Stable
0.1 <= PSI < 0.2	Slight change
0.2 <= PSI	Significant change; replacing the model is recommended
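The thresholds in the table can be wrapped in a small helper for monitoring reports. This is only a sketch: the function name `interpret_psi` and the returned labels are assumptions made for illustration.

```python
def interpret_psi(psi_value):
    """Map a PSI value to the interpretation bands from the table above."""
    if psi_value < 0.1:
        return "stable"
    if psi_value < 0.2:
        return "slight change"
    return "significant change; model replacement recommended"

for value in (0.05, 0.15, 0.30):
    print(value, "->", interpret_psi(value))
```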

Example with Python

PSI can be implemented in Python as the function below.

import numpy as np

def calculate_psi(expected, actual, buckettype='bins', buckets=10, axis=0):
    '''Calculate the PSI (population stability index) across all variables

    Args:
       expected: numpy matrix of original values
       actual: numpy matrix of new values, same size as expected
       buckettype: type of strategy for creating buckets, bins splits into even splits, quantiles splits into quantile buckets
       buckets: number of quantiles to use in bucketing variables
       axis: axis by which variables are defined, 0 for vertical, 1 for horizontal

    Returns:
       psi_values: ndarray of psi values for each variable

    Author:
       Matthew Burke
       github.com/mwburke
       worksofchart.com
    '''

    def psi(expected_array, actual_array, buckets):
        '''Calculate the PSI for a single variable

        Args:
           expected_array: numpy array of original values
           actual_array: numpy array of new values, same size as expected
           buckets: number of percentile ranges to bucket the values into

        Returns:
           psi_value: calculated PSI value
        '''

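        def scale_range(values, lower, upper):
            # Linearly rescale a copy of `values` onto [lower, upper]
            # (avoids shadowing the built-ins input/min/max and mutating the argument)
            values = values - np.min(values)
            values = values / (np.max(values) / (upper - lower))
            values = values + lower
            return values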


        breakpoints = np.arange(0, buckets + 1) / (buckets) * 100

        if buckettype == 'bins':
            breakpoints = scale_range(breakpoints, np.min(expected_array), np.max(expected_array))
        elif buckettype == 'quantiles':
            breakpoints = np.stack([np.percentile(expected_array, b) for b in breakpoints])



        expected_percents = np.histogram(expected_array, breakpoints)[0] / len(expected_array)
        actual_percents = np.histogram(actual_array, breakpoints)[0] / len(actual_array)

        def sub_psi(e_perc, a_perc):
            '''Calculate the actual PSI value from comparing the values.
               Update the actual value to a very small number if equal to zero
            '''
            if a_perc == 0:
                a_perc = 0.0001
            if e_perc == 0:
                e_perc = 0.0001

            value = (e_perc - a_perc) * np.log(e_perc / a_perc)
            return(value)

        # Use the built-in sum: passing a generator to np.sum is deprecated in NumPy
        psi_value = sum(sub_psi(expected_percents[i], actual_percents[i]) for i in range(0, len(expected_percents)))

        return(psi_value)

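    if len(expected.shape) == 1:
        psi_values = np.empty(len(expected.shape))
    else:
        # One PSI per variable: with axis=0 the variables are columns (shape[1]),
        # with axis=1 they are rows (shape[0]); shape[axis] only works for square input
        psi_values = np.empty(expected.shape[1 - axis])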

    for i in range(0, len(psi_values)):
        if len(psi_values) == 1:
            psi_values = psi(expected, actual, buckets)
        elif axis == 0:
            psi_values[i] = psi(expected[:,i], actual[:,i], buckets)
        elif axis == 1:
            psi_values[i] = psi(expected[i,:], actual[i,:], buckets)

    return(psi_values)
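The function above bins continuous scores, while the discussion earlier is framed in terms of credit grades. As a complement, here is a minimal sketch that computes PSI directly from integer grade labels. The name `psi_from_grades`, the assumption that grades are integers from 1 to `n_grades`, the synthetic grade probabilities, and the 0.0001 floor for empty grades (mirroring the zero handling above) are all choices made for this example.

```python
import numpy as np

def psi_from_grades(expected_grades, actual_grades, n_grades=10, floor=0.0001):
    """PSI over integer grade labels, assumed to lie in 1..n_grades."""
    expected_grades = np.asarray(expected_grades)
    actual_grades = np.asarray(actual_grades)
    # Count customers per grade; index 0 is dropped because grades start at 1.
    e_counts = np.bincount(expected_grades, minlength=n_grades + 1)[1:n_grades + 1]
    a_counts = np.bincount(actual_grades, minlength=n_grades + 1)[1:n_grades + 1]
    e_share = e_counts / e_counts.sum()
    a_share = a_counts / a_counts.sum()
    # Replace empty grades with a small floor so the logarithm stays finite.
    e_share = np.where(e_share == 0, floor, e_share)
    a_share = np.where(a_share == 0, floor, a_share)
    return float(np.sum((e_share - a_share) * np.log(e_share / a_share)))

# Synthetic data: grade labels drawn from two hypothetical distributions.
rng = np.random.default_rng(0)
grades = np.arange(1, 11)
dev_probs = [0.05, 0.10, 0.15, 0.20, 0.20, 0.12, 0.08, 0.05, 0.03, 0.02]
ops_probs = [0.04, 0.08, 0.13, 0.18, 0.20, 0.14, 0.10, 0.06, 0.04, 0.03]
dev_grades = rng.choice(grades, size=10_000, p=dev_probs)
ops_grades = rng.choice(grades, size=10_000, p=ops_probs)

print(psi_from_grades(dev_grades, ops_grades))
```

Working from grade labels avoids the binning step entirely, which keeps the result aligned with the grade cutoffs the institution actually uses for approval decisions.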

References

https://www.niceinfo.co.kr/creditrating/bi_score_4.nice

https://github.com/mwburke/population-stability-index/blob/master/psi.py
