After working weeks on excel going through the balance sheet of banks making data entries, the ex-coder in me shouted: “Its time to use some grey cells, use python”.So for this weekend, I tried and understand how do I combine ROE and .py.

So I started by understanding the length to which python could help me.The first thing that I came across were pandas not the animal.So pandas is a great library for anyone who wants to do data visualisation .Then I came across Anaconda it almost feels like a jungle safari.Anaconda is more of a platform which contains things like Jupiter notebook Spyder and other great IDE’s.My hunt for the collection was completed by numpy and matplotlib.pyplot.

Along with these great libraries and IDE’s I just had to find out what I needed as an output .So I started with regression ,I tried to find out

#### Regression between debt/equity and equity:

import pandas as pd

import matplotlib.pyplot as plt

from scipy.stats import linregress

import numpy as np

import seaborn as sns

dataframe = pd.read_excel("Ratio.xlsx",sheet_name='NBFC')

y= dataframe['Equity']

x = dataframe['Debt/Equity']

slope, intercept, r_value, p_value, std_err = linregress(x, y)

line_x = np.arange(x.min(), x.max())

line_y = slope*line_x + intercept

plt.plot(line_x, line_y, label='$%.2fx + %.2f$, $R^2=%.2f$' % (slope, intercept, r_value**2))

plt.show()

I read the excel and applied linregress function The following was the output

I read the excel and applied linregress function The following was the output

I guess in my use case D/E increased with increase in equity does that mean that if a company has more equity it would have taken more debt also and that to increasing in a significant proportion ?I don’t know let me know if you think the same.

After this I wanted to do cluster analysis to understand the distribution of NBFC with respect to their loan books and balance sheet size .Cluster analysis helps us form renters around which data is spread.This helps in forming groups and really helps in marketing segmentation.

#### Cluster analysis

import numpy as np

import pandas as pd

importx as plt

# import hierarchical clustering libraries

import scipy.cluster.hierarchy as sch

from sklearn.cluster import AgglomerativeClustering

# create dendrogram

df = pd.read_excel("Ratio.xlsx",sheet_name='NBFC')

# create clusters

hc = AgglomerativeClustering(n_clusters=4, affinity = 'euclidean', linkage = 'ward')

# save clusters for chart

points=np.vstack((df['Net Interest Income'],df['Operating Profit'] )).T

y_hc = hc.fit_predict(points)

plt.scatter(points[y_hc ==0,0], points[y_hc == 0,1], s=100, c='red')

plt.scatter(points[y_hc==1,0], points[y_hc == 1,1], s=100, c='black')

plt.scatter(points[y_hc ==2,0], points[y_hc == 2,1], s=100, c='blue')

plt.scatter(points[y_hc ==3,0], points[y_hc == 3,1], s=100, c='cyan')

This helped me realize that most nbfc’s have a loan book size of less than 2000 Cr. And also w can form 2 major cluster’s of nbfc’s one with loan book of 0–2000 and balance sheet of 0–2000 and other of loan book of 2000–4000 and balance sheet of 3000–6000 Cr.

I am still learning finance and python and would love to know if there are any great tutorials for it. Let me know in the comments