Stock Market and Modern Portfolio Analysis

Stock market prediction has always been the philosopher’s stone for those who seek to turn lead into gold. It has been elusive for most of us; however, as we enter the age of information, the tables seem to have turned.

#### Getting Started

When I say get rich, I mean rich in data; after all, data is the new currency. Before we begin, let’s make one thing clear: we cannot predict prices with 100% accuracy, and even a 90% confidence interval is unattainable. For this article, I am referencing the DataCamp course Machine Learning for Finance in Python.

#### Exploratory Data Analysis:

The first step is to sanity-check the data. Here we generally look at the columns and datatypes.

```python
import pandas as pd
import matplotlib.pyplot as plt

print(lng_df.head())  # examine the LNG DataFrame
print(spy_df.head())  # examine the SPY DataFrame

# Plot the Adj_Close columns for SPY and LNG
spy_df['Adj_Close'].plot(label='SPY', legend=True)
lng_df['Adj_Close'].plot(label='LNG', legend=True, secondary_y=True)
plt.show()  # show the plot
plt.clf()   # clear the plot space
```

#### Handling Anomalies:

To handle anomalies, we generally use scatter plots and box plots to locate the offending data points, then replace them with the mean, the median, or a forward fill (ffill) or backward fill (bfill). Since this is time-series data, we won’t be checking for anomalies here.
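As a quick illustration of the forward-fill approach mentioned above, here is a minimal sketch using a made-up price series (the series and its values are hypothetical, not from the course data):

```python
import pandas as pd
import numpy as np

# Hypothetical price series with one missing value (illustrative only)
prices = pd.Series([70.1, 70.4, np.nan, 71.0],
                   index=pd.date_range('2018-01-01', periods=4))

# Forward-fill: carry the last known price forward in time,
# which avoids leaking future information into past rows
filled = prices.ffill()
print(filled.isna().sum())  # 0
```

Forward fill is usually preferred over the mean or median for time series, since those would mix future values into past observations.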

#### Feature Engineering

Feature engineering creates new variables from the existing dataset that have a high correlation with the target variable. Here we will generate two features.

Pct_change is the percent change in value with respect to the previous value.

```python
# Create 5-day % changes of Adj_Close for the current day, and 5 days in the future
lng_df['5d_future_close'] = lng_df['Adj_Close'].shift(-5)
lng_df['5d_close_future_pct'] = lng_df['5d_future_close'].pct_change(5)
lng_df['5d_close_pct'] = lng_df['Adj_Close'].pct_change(5)

# Calculate the correlation matrix between the 5d close percentage changes (current and future)
corr = lng_df[['5d_close_pct', '5d_close_future_pct']].corr()
print(corr)

# Scatter the current 5-day percent change vs the future 5-day percent change
plt.scatter(lng_df['5d_close_pct'], lng_df['5d_close_future_pct'])
plt.show()
```

Another common technical indicator is the relative strength index (RSI). This is defined by:

RSI = 100 − 100 / (1 + RS)

RS = (average gain over n periods) / (average loss over n periods)

```python
import talib

feature_names = ['5d_close_pct']  # a list of the feature names for later

# Create moving averages and RSI for timeperiods of 14, 30, 50, and 200
for n in [14, 30, 50, 200]:
    # Create the moving average indicator and divide by Adj_Close
    lng_df['ma' + str(n)] = talib.SMA(lng_df['Adj_Close'].values,
                                      timeperiod=n) / lng_df['Adj_Close']
    # Create the RSI indicator
    lng_df['rsi' + str(n)] = talib.RSI(lng_df['Adj_Close'].values, timeperiod=n)

    # Add RSI and moving average to the feature name list
    feature_names = feature_names + ['ma' + str(n), 'rsi' + str(n)]

print(feature_names)
```
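To make the RSI formula concrete, here is a hypothetical pandas-only sketch using a simple rolling mean for the average gain and loss. Note that `talib.RSI` uses Wilder’s smoothing rather than a simple mean, so its values will differ slightly; the toy price series below is made up for illustration.

```python
import pandas as pd
import numpy as np

def simple_rsi(close, n=14):
    # Gains and losses between consecutive closes
    delta = close.diff()
    gain = delta.clip(lower=0)
    loss = -delta.clip(upper=0)
    # RS = average gain over n periods / average loss over n periods
    rs = gain.rolling(n).mean() / loss.rolling(n).mean()
    # RSI = 100 - 100 / (1 + RS)
    return 100 - 100 / (1 + rs)

# Toy upward-trending price series with some oscillation
close = pd.Series(np.linspace(100, 110, 30) + np.sin(np.arange(30)))
print(simple_rsi(close).iloc[-1])
```

Since the toy series trends upward, the average gain exceeds the average loss and the RSI lands above the neutral level of 50.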

#### Modelling

Neural nets can capture complex interactions between variables but are difficult to set up and understand. Recently, they have been beating human experts in many fields, including image recognition and gaming (check out AlphaGo) — so they have great potential to perform well.

To build our nets we’ll use the `keras` library. This is a high-level API that allows us to quickly make neural nets, yet still exercise a lot of control over the design. The first thing we’ll do is create almost the simplest net possible: a 3-layer net that takes our inputs and predicts a single value. Much like `sklearn` models, `keras` models have a `.fit()` method that takes arguments of `(features, targets)`.

```python
import tensorflow as tf
from keras.models import Sequential
from keras.layers import Dense

# Create a custom loss function that heavily penalizes predictions
# with the wrong sign (it must be defined before the model is compiled)
def sign_penalty(y_true, y_pred):
    penalty = 100.
    loss = tf.where(tf.less(y_true * y_pred, 0),
                    penalty * tf.square(y_true - y_pred),
                    tf.square(y_true - y_pred))
    return tf.reduce_mean(loss, axis=-1)

# Create the model
model_1 = Sequential()
model_1.add(Dense(100, input_dim=scaled_train_features.shape[1], activation='relu'))
model_1.add(Dense(20, activation='relu'))
model_1.add(Dense(1, activation='linear'))

# Fit the model
model_1.compile(optimizer='adam', loss=sign_penalty)
history = model_1.fit(scaled_train_features, train_targets, epochs=25)

# Plot the losses from the fit
plt.plot(history.history['loss'])
# Use the last loss as the title
plt.title('loss:' + str(round(history.history['loss'][-1], 6)))
plt.show()
```

```python
from sklearn.metrics import r2_score

# Calculate R^2 scores on train and test
train_preds = model_1.predict(scaled_train_features)
test_preds = model_1.predict(scaled_test_features)
print(r2_score(train_targets, train_preds))
print(r2_score(test_targets, test_preds))

# Plot predictions vs actual
plt.scatter(train_preds, train_targets, label='train')
plt.scatter(test_preds, test_targets, label='test')
plt.legend()
plt.show()
```

#### Modern Portfolio Theory

Modern portfolio theory (MPT) is a theory on how risk-averse investors can construct portfolios to optimize or maximize expected return based on a given level of market risk, emphasizing that risk is an inherent part of higher reward. According to the theory, it’s possible to construct an “efficient frontier” of optimal portfolios offering the maximum possible expected return for a given level of risk.

Our first step towards calculating modern portfolio theory (MPT) portfolios is to get daily and monthly returns. Eventually, we’re going to get the best portfolios of each month based on the Sharpe ratio.

The Sharpe ratio was developed by Nobel laureate William F. Sharpe and is used to help investors understand the return of an investment compared to its risk. The ratio is the average return earned in excess of the risk-free rate per unit of volatility, or total risk. Subtracting the risk-free rate from the mean return lets an investor better isolate the profits associated with risk-taking. Generally, the greater the Sharpe ratio, the more attractive the risk-adjusted return.
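As a small worked example of the definition above, here is a sketch computing the Sharpe ratio for one portfolio (the monthly returns are made-up numbers, not from the dataset):

```python
import numpy as np

# Illustrative monthly returns for a single portfolio (hypothetical values)
returns = np.array([0.02, -0.01, 0.03, 0.01, -0.02, 0.015])

risk_free = 0.0  # assume a zero risk-free rate, as we do later in this article
excess = returns - risk_free

# Sharpe ratio: mean excess return per unit of volatility (sample std dev)
sharpe = excess.mean() / excess.std(ddof=1)
print(round(sharpe, 3))
```

With a zero risk-free rate the ratio reduces to mean return over volatility, which is exactly the form we use when ranking the random portfolios below.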

The easiest way to do this is to put all our stock prices into one DataFrame, then to resample them to the daily and monthly time frames. We need daily price changes to calculate volatility, which we will use as our measure of risk.

```python
full_df = pd.concat([lng_df, spy_df, smlv_df], axis=1).dropna()

# Resample the full DataFrame to a monthly timeframe
monthly_df = full_df.resample('BMS').first()

# Calculate daily returns of the stocks
returns_daily = full_df.pct_change()

# Calculate monthly returns of the stocks
returns_monthly = monthly_df.pct_change().dropna()
print(returns_monthly.tail())
```

```python
# Daily covariance of stocks (for each monthly period)
covariances = {}
rtd_idx = returns_daily.index
for i in returns_monthly.index:
    # Mask daily returns for each month and year, and calculate covariance
    mask = (rtd_idx.month == i.month) & (rtd_idx.year == i.year)
    # Use the mask to get daily returns for the current month and year of the monthly returns index
    covariances[i] = returns_daily[mask].cov()

print(covariances[i])
```

```python
import numpy as np

portfolio_returns, portfolio_volatility, portfolio_weights = {}, {}, {}

# Get portfolio performances for each month
for date in sorted(covariances.keys()):
    cov = covariances[date]
    for portfolio in range(10):
        # Generate random weights for the 3 stocks and normalize them to sum to 1
        weights = np.random.random(3)
        weights /= np.sum(weights)
        returns = np.dot(weights, returns_monthly.loc[date])
        volatility = np.sqrt(np.dot(weights.T, np.dot(cov, weights)))
        portfolio_returns.setdefault(date, []).append(returns)
        portfolio_volatility.setdefault(date, []).append(volatility)
        portfolio_weights.setdefault(date, []).append(weights)

print(portfolio_weights[date][0])
```

```python
# Get the latest date of available data
date = sorted(covariances.keys())[-1]

# Plot the efficient frontier
# warning: this can take at least 10s for the plot to execute...
plt.scatter(x=portfolio_volatility[date], y=portfolio_returns[date], alpha=.1)
plt.xlabel('Volatility')
plt.ylabel('Returns')
plt.show()
```

#### Get best Sharpe ratios

We need to find the “ideal” portfolio for each date so we can use them as targets for machine learning. We’ll loop through each date, then loop through the portfolios we generated in `portfolio_returns[date]`. We’ll then calculate the Sharpe ratio, which is the return divided by the volatility (assuming a risk-free return of 0).

We use `enumerate()` to loop through the returns for the current date (`portfolio_returns[date]`) and keep track of the index with `i`. Then we use the current date and index to get the volatility of each portfolio with `portfolio_volatility[date][i]`. Finally, we get the index of the best Sharpe ratio for each date using `np.argmax()`. We’ll use this index to get the ideal portfolio weights shortly.

```python
# Empty dictionaries for Sharpe ratios and best Sharpe indexes by date
sharpe_ratio, max_sharpe_idxs = {}, {}

# Loop through dates and get the Sharpe ratio for each portfolio
for date in portfolio_returns.keys():
    for i, ret in enumerate(portfolio_returns[date]):
        # Divide returns by the volatility for the date and index, i
        sharpe_ratio.setdefault(date, []).append(ret / portfolio_volatility[date][i])

    # Get the index of the best Sharpe ratio for each date
    max_sharpe_idxs[date] = np.argmax(sharpe_ratio[date])
```

```python
# Calculate the exponentially-weighted moving average of daily returns
ewma_daily = returns_daily.ewm(span=30).mean()

# Resample daily returns to the first business day of each month
ewma_monthly = ewma_daily.resample('BMS').first()

# Shift the monthly ewma 1 month forward so we can use it as a feature for future predictions
ewma_monthly = ewma_monthly.shift(1).dropna()
print(ewma_monthly.iloc[-1])
```

#### Make features and targets

To use machine learning to pick the best portfolio, we need to generate features and targets. Our features were just created in the last step: the exponentially weighted moving averages of daily returns. Our targets will be the weights of the best portfolios we found via the highest Sharpe ratio.

```python
targets, features = [], []

# Create features from price history and targets from the ideal portfolios
for date, ewma in ewma_monthly.iterrows():
    # Get the index of the best Sharpe ratio
    best_idx = max_sharpe_idxs[date]
    targets.append(portfolio_weights[date][best_idx])
    features.append(ewma)  # add ewma to features

targets = np.array(targets)
features = np.array(features)
print(targets[-5:])
```

```python
# Get the most recent (current) returns and volatility
date = sorted(covariances.keys())[-1]
cur_returns = portfolio_returns[date]
cur_volatility = portfolio_volatility[date]

# Plot the efficient frontier with the best Sharpe point highlighted
plt.scatter(x=cur_volatility, y=cur_returns, alpha=0.1, color='blue')
best_idx = max_sharpe_idxs[date]

# Place an orange "X" on the point with the best Sharpe ratio
plt.scatter(x=cur_volatility[best_idx], y=cur_returns[best_idx], marker='x', color='orange')
plt.xlabel('Volatility')
plt.ylabel('Returns')
plt.show()
```

#### Make predictions with a random forest

To fit a machine learning model to predict ideal portfolios, we need to create train and test sets for evaluating performance. We will do this as in previous chapters: take our `features` and `targets` arrays and split them based on a `train_size` we set. Often the train size may be around 70-90% of the data.

```python
from sklearn.ensemble import RandomForestRegressor

# Make train and test sets
train_size = int(0.85 * features.shape[0])
train_features = features[:train_size]
test_features = features[train_size:]
train_targets = targets[:train_size]
test_targets = targets[train_size:]

# Fit the model and check scores on train and test
rfr = RandomForestRegressor(n_estimators=300, random_state=42)
rfr.fit(train_features, train_targets)
print(rfr.score(train_features, train_targets))
print(rfr.score(test_features, test_targets))
```

```python
# Get predictions from the model on train and test
train_predictions = rfr.predict(train_features)
test_predictions = rfr.predict(test_features)

# Calculate and plot returns from our RF predictions and the SPY returns
test_returns = np.sum(returns_monthly.iloc[train_size:] * test_predictions, axis=1)
plt.plot(test_returns, label='algo')
plt.plot(returns_monthly['SPY'].iloc[train_size:], label='SPY')
plt.legend()
plt.show()
```

#### Evaluate returns

Let’s now see how our portfolio selection would perform compared with just investing in SPY. We’ll do this to see whether our predictions are promising, despite the low R² value.

We will set a starting investment of Rs. 1000, then loop through the returns from our predictions as well as from SPY. We’ll apply the monthly returns from our portfolio selection and from SPY to our starting cash balance. From this, we get a month-by-month picture of how our investment is doing, and we can see how our predictions did overall vs SPY. Then we can plot the growth of our portfolio and compare it to SPY.

```python
# Calculate the effect of our portfolio selection on a hypothetical Rs. 1000 investment
cash = 1000
algo_cash, spy_cash = [cash], [cash]  # set equal starting cash amounts
for r in test_returns:
    cash *= 1 + r
    algo_cash.append(cash)

# Calculate performance for SPY
cash = 1000  # reset cash amount
for r in returns_monthly['SPY'].iloc[train_size:]:
    cash *= (1 + r)
    spy_cash.append(cash)

# Plot algo_cash and spy_cash to compare overall returns
plt.plot(algo_cash, label='algo')
plt.plot(spy_cash, label='SPY')
plt.legend()  # show the legend
plt.show()
```

The algorithm doesn’t take into account event-based changes such as an election or a sudden rise in petrol prices. Other datasets we could look into include the movement of shipments and the number of vehicles sold. We need to consider various macro factors to improve our model.