Credit Risk Management using Python

Credit risk is the risk of an economic loss from the failure of a counterparty to fulfil its contractual obligations. Its effect is measured by the cost of replacing cash flows if the other party defaults.

Credit risk affects everyone. If you fail to pay your insurance premium, the insurer faces credit risk; if the insurer fails to cover your claim, you face credit risk. As an analyst, your job is to minimize this risk and to anticipate the loss that can occur, so the business is better prepared against it.

Drivers of Credit Risk

The distribution of credit risk can be viewed as a compound process driven by these variables:

Default, which is a discrete state for the counterparty — either the counterparty is in default or it is not. Default occurs with some probability of default (PD): the likelihood that the counterparty will fail to repay a loan.

Credit exposure (CE), which is the economic or market value of the claim on the counterparty. Measured at the time of default, it is also called exposure at default (EAD).

Loss given default (LGD), which represents the fractional loss due to default. As an example, take a situation where default results in a recovery rate of only 30%; LGD is then 70% of the exposure.
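The relationship between recovery rate and LGD can be shown with a quick sketch. The exposure value below is a hypothetical number for illustration only:

```python
# Illustrative only: loss given default from a recovery rate
exposure_at_default = 100_000  # hypothetical claim value (EAD)
recovery_rate = 0.30           # 30% of the exposure is recovered
lgd = 1 - recovery_rate        # fractional loss given default
loss = exposure_at_default * lgd
print(loss)  # 70000.0
```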

Measurement of Credit Risk

The evolution of credit risk measurement tools has gone through these four steps:

  1. Notional amounts: adding up simple exposures.
  2. Risk-weighted amounts: adding up exposures with a rough adjustment for risk.
  3. Notional amounts combined with credit ratings: adding up exposures adjusted for default probabilities.
  4. Internal portfolio credit models: integrating all dimensions of credit risk.


Using Python for Credit Risk

Probabilities of default can be produced as the outcome of a machine learning model. The model learns from data organized in columns (features), and a classification model predicts one of two classes: default or non-default. The two most common models for this task are logistic regression and the decision tree.

Training a logistic regression:

Logistic regression is available within the scikit-learn package:

from sklearn.linear_model import LogisticRegression

It is called as a function, with or without parameters:

clf_logistic = LogisticRegression(solver='lbfgs')

It uses the method .fit() to train:

clf_logistic.fit(training_columns, np.ravel(training_labels))
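A minimal end-to-end sketch of this training step, using synthetic data in place of the article's cr_loan features and loan_status labels (the random data, shapes, and seed here are illustrative assumptions):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic stand-ins for the training columns and 0/1 labels
rng = np.random.default_rng(123)
X = rng.normal(size=(200, 3))                         # hypothetical feature columns
y = (X[:, 0] + rng.normal(size=200) > 0).astype(int)  # hypothetical loan_status labels

# Train the logistic regression
clf_logistic = LogisticRegression(solver='lbfgs')
clf_logistic.fit(X, np.ravel(y))

# Column 1 of predict_proba is the probability of default (class 1)
prob_default = clf_logistic.predict_proba(X)[:, 1]
print(prob_default[:3])
```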

Training columns: all of the columns in our data except loan_status
Labels: loan_status (0 or 1)

Separate the data into training columns and labels:

from sklearn.model_selection import train_test_split

X = cr_loan.drop('loan_status', axis=1)
y = cr_loan[['loan_status']]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=.4, random_state=123)

Model predictions can be used to set better thresholds, and also to approve or deny new loans. For all new loans, we want to deny probable defaults, using the test data as an example of new loans. The acceptance rate is the percentage of new loans that are accepted in order to keep the number of defaults in the portfolio low. Accepted loans which turn out to be defaults have an impact similar to false negatives.

import numpy as np

# Compute the threshold (quantile) for an 85% acceptance rate
threshold = np.quantile(prob_default, 0.85)

# Apply the threshold to the probabilities of default (here threshold ≈ 0.804)
preds_df['loan_status'] = preds_df['prob_default'].apply(lambda x: 1 if x > threshold else 0)
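A self-contained sketch of this thresholding step, with randomly generated probabilities standing in for the model's actual prob_default output (the data and seed are illustrative assumptions):

```python
import numpy as np
import pandas as pd

# Hypothetical predicted default probabilities for "new" (test) loans
rng = np.random.default_rng(42)
preds_df = pd.DataFrame({'prob_default': rng.uniform(size=100)})

# Accept the 85% of loans with the lowest predicted default probability
threshold = np.quantile(preds_df['prob_default'], 0.85)
preds_df['loan_status'] = preds_df['prob_default'].apply(lambda x: 1 if x > threshold else 0)

print(preds_df['loan_status'].mean())  # roughly 0.15 of loans are flagged as defaults
```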

Even with a calculated threshold, some of the accepted loans will be defaults. These are loans whose prob_default values fall in a range where our model is not well calibrated.

Bad Rate = Accepted Defaults / Total Accepted Loans

# Calculate the bad rate
np.sum(accepted_loans['true_loan_status']) / accepted_loans['true_loan_status'].count()
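A tiny worked example of the bad-rate formula, with a hypothetical set of accepted loans and their true outcomes (1 = default):

```python
import numpy as np
import pandas as pd

# Hypothetical accepted loans: 2 defaults out of 10 accepted
accepted_loans = pd.DataFrame({'true_loan_status': [0, 0, 1, 0, 0, 0, 0, 1, 0, 0]})

# Bad rate = accepted defaults / total accepted loans
bad_rate = np.sum(accepted_loans['true_loan_status']) / accepted_loans['true_loan_status'].count()
print(bad_rate)  # 0.2
```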

Selecting acceptance rates

The first acceptance rate was set to 85%, but other rates might be selected as well. There are two options for testing different rates: calculate the threshold, bad rate, and losses manually, or automatically create a table of these values and select an acceptance rate from it. The table of all the possible values is called a strategy table.

# Set all the acceptance rates to test
accept_rates = [1.0, 0.95, 0.9, 0.85, 0.8, 0.75, 0.7, 0.65, 0.6, 0.55, 0.5, 0.45, 0.4, 0.35, 0.3, 0.25, 0.2, 0.15, 0.1, 0.05]

# Create lists to store thresholds and bad rates
thresholds = []
bad_rates = []

for rate in accept_rates:
    # Calculate the threshold for this acceptance rate
    thresh = np.quantile(preds_df['prob_default'], rate).round(3)
    # Store the threshold value
    thresholds.append(thresh)
    # Apply the threshold to reassign loan_status
    preds_df['pred_loan_status'] = \
        preds_df['prob_default'].apply(lambda x: 1 if x > thresh else 0)
    # Accepted loans are the predicted non-defaults
    accepted_loans = preds_df[preds_df['pred_loan_status'] == 0]
    # Calculate and store the bad rate
    bad_rates.append((np.sum(accepted_loans['true_loan_status']) / accepted_loans['true_loan_status'].count()).round(3))

strat_df = pd.DataFrame(zip(accept_rates, thresholds, bad_rates),
                        columns=['Acceptance Rate', 'Threshold', 'Bad Rate'])
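A runnable sketch of the strategy table, using synthetic probabilities and outcomes in place of the model's actual predictions (the random data, seed, and shortened rate list are illustrative assumptions):

```python
import numpy as np
import pandas as pd

# Hypothetical model output: predicted default probabilities and true outcomes
rng = np.random.default_rng(123)
preds_df = pd.DataFrame({
    'prob_default': rng.uniform(size=500),
    'true_loan_status': rng.integers(0, 2, size=500),
})

accept_rates = [1.0, 0.85, 0.7, 0.55, 0.4]
thresholds, bad_rates = [], []

for rate in accept_rates:
    # Threshold for this acceptance rate
    thresh = np.quantile(preds_df['prob_default'], rate).round(3)
    thresholds.append(thresh)
    # Accepted loans are those at or below the threshold
    accepted = preds_df[preds_df['prob_default'] <= thresh]
    # Bad rate = accepted defaults / total accepted loans
    bad_rates.append((np.sum(accepted['true_loan_status']) / accepted['true_loan_status'].count()).round(3))

strat_df = pd.DataFrame(zip(accept_rates, thresholds, bad_rates),
                        columns=['Acceptance Rate', 'Threshold', 'Bad Rate'])
print(strat_df)
```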

Total expected loss is the sum, over all loans, of PD × EAD × LGD, where:

# Probability of default (PD)
# Exposure at default = loan amount (EAD)
# Loss given default = 1.0 for total loss (LGD)
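Putting those three components together, here is a sketch of the total expected loss for a small hypothetical portfolio (the probabilities and loan amounts are made-up numbers for illustration):

```python
import pandas as pd

# Hypothetical portfolio: per-loan PD and exposure (EAD = loan amount)
portfolio = pd.DataFrame({
    'prob_default': [0.10, 0.25, 0.05],
    'loan_amnt':    [10_000, 5_000, 20_000],
})
lgd = 1.0  # assume total loss on default

# Expected loss per loan = PD x LGD x EAD; total is the sum over the portfolio
portfolio['expected_loss'] = portfolio['prob_default'] * lgd * portfolio['loan_amnt']
total_expected_loss = portfolio['expected_loss'].sum()
print(round(total_expected_loss, 2))  # 3250.0
```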

This article covers the basics of credit risk management. After calculating the total expected loss, we should also calculate the unexpected loss, which depends on the standard deviation of LGD.

“Not taking risks one doesn’t understand is often the best form of risk management.”― Raghuram G. Rajan, Fault Lines: How Hidden Fractures Still Threaten the World Economy