Mathematical Foundations of Machine Learning: Linear Algebra

Rahul
4 min read · Aug 22, 2022

Introduction to Linear Algebra

Linear algebra is the first topic in this series because it lies at the heart of machine learning and is a core component of deep learning. It is used to solve for unknown values in high-dimensional spaces, which is what enables machines to recognize patterns and make predictions.

“Solving for unknowns within the system of linear equations”

Consider the following example:

  • The sheriff's car travels at 180 km/h
  • The robber's car travels at 150 km/h and has a five-minute head start
  • How long does it take the sheriff to catch the robber?
  • What distance will they have traveled at that point?

With this problem statement in mind, let’s begin with the data structures for algebra.

What Linear Algebra is

The plot we are going to create has two dependencies: NumPy, the numerical library, and matplotlib, for creating plots.

We start by simulating the time t, from 0 to 40 minutes. We can have as many points as we like between the start point and the end point.
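As a minimal sketch of this step, assuming `np.linspace` is used to generate the evenly spaced points (the article only names NumPy, not the specific call):

```python
import numpy as np

# 1000 evenly spaced time values from 0 to 40 minutes (both endpoints included)
t = np.linspace(0, 40, 1000)

print(t.shape, t[0], t[-1])
```

The third argument controls how many points we get, so a smoother or coarser timeline is a one-number change.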

So the time variable t consists of 1000 points ranging from 0 to 40. Working in kilometers per minute (150 km/h = 2.5 km/min and 180 km/h = 3 km/min), we can now pass the time values into d_r = 2.5t, the distance traveled by the robber, and d_s = 3(t-5), the distance traveled by the sheriff, who sets off five minutes later.
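The two distance functions could be written like this. It is a sketch that mirrors the formulas above; the variable names `d_r` and `d_s` come from the text, and the time array is recreated so the snippet runs on its own:

```python
import numpy as np

t = np.linspace(0, 40, 1000)  # time in minutes

# Robber: 150 km/h = 2.5 km/min, driving from t = 0
d_r = 2.5 * t

# Sheriff: 180 km/h = 3 km/min, starting 5 minutes later
d_s = 3 * (t - 5)
```

Note that `d_s` is negative for t < 5. That is just the straight line extended backwards in time; in reality the sheriff has not moved yet during those first five minutes.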

To plot it, we create a matplotlib plot of distance (km) against time (min).
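One way the plot might be produced is sketched below. The colors, the title, and the output filename `chase.png` are assumptions made for illustration, not details from the original article:

```python
import numpy as np
import matplotlib.pyplot as plt

t = np.linspace(0, 40, 1000)  # time in minutes
d_r = 2.5 * t                 # robber's distance in km
d_s = 3 * (t - 5)             # sheriff's distance in km

fig, ax = plt.subplots()
ax.set_title("Sheriff vs. robber")   # hypothetical title
ax.set_xlabel("time (min)")
ax.set_ylabel("distance (km)")
ax.set_xlim(0, 40)
ax.set_ylim(0, 100)                  # hides the negative part of the sheriff's line
ax.plot(t, d_r, color="green", label="robber")
ax.plot(t, d_s, color="brown", label="sheriff")
ax.legend()

# Hypothetical filename; in a notebook the figure is also shown inline
fig.savefig("chase.png")
```

The point where the two lines cross is the moment the sheriff draws level with the robber.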

Just by looking at the plot, it’s clear that the two lines cross at t = 30 minutes and d = 75 kilometers. The sheriff catches the robber 75 kilometers down the road, 30 minutes after the robber set off, which is 25 minutes after the sheriff started driving.

Alternatively, the problem can be solved algebraically, so let’s do that too to check our result:

We have equation 1: d = 2.5t, and equation 2: d = 3(t-5)

Both equations are equal to d, so:

2.5t = 3(t-5)

2.5t = 3t-15

-0.5t = -15

t = 30 min

So t is 30 minutes. Now that we know the time, we can substitute it into either of the two equations to calculate the distance:

d = 2.5(30)

d = 75 km
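The same result can be checked numerically. As a sketch, we rearrange the two equations so the unknowns d and t sit on the left (d - 2.5t = 0 and d - 3t = -15) and hand the system to NumPy's `np.linalg.solve`. The matrix form and the names `A`, `b`, and `t_catch` are our own choices for illustration:

```python
import numpy as np

# Unknowns ordered as (d, t):
#   d - 2.5 t =   0   (robber)
#   d - 3   t = -15   (sheriff)
A = np.array([[1.0, -2.5],
              [1.0, -3.0]])
b = np.array([0.0, -15.0])

d, t_catch = np.linalg.solve(A, b)
print(d, t_catch)  # approximately 75 km and 30 min
```

This is the same elimination we did by hand, just written as a matrix equation, which is the form that scales to many equations and many unknowns.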

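There would have been no solution if the sheriff’s car had the same speed as the robber’s, because the sheriff would never close the five-minute gap. There would be infinitely many solutions if they had both the same speed and the same starting time, because the two cars would then be side by side for the entire journey.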

For a system of linear equations, there are only three options: one solution, no solution, or infinitely many solutions.

It is impossible for two distinct straight lines to cross more than once.
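To make the three cases concrete, here is a small sketch that classifies a system by comparing the rank of the coefficient matrix with the rank of the augmented matrix. The rank test is a standard technique we are bringing in for illustration; it is not something the article itself uses, and the helper `count_solutions` is a hypothetical name:

```python
import numpy as np

def count_solutions(A, b):
    """Classify the linear system A @ x = b as 'one', 'none', or 'infinite'."""
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float).reshape(-1, 1)
    rank_A = np.linalg.matrix_rank(A)
    rank_aug = np.linalg.matrix_rank(np.hstack([A, b]))
    if rank_A < rank_aug:
        return "none"        # inconsistent: parallel lines that never meet
    if rank_A == A.shape[1]:
        return "one"         # as many independent equations as unknowns
    return "infinite"        # the equations describe the same line

# Unknowns ordered as (d, t); speeds in km/min
original = count_solutions([[1, -2.5], [1, -3.0]], [0, -15])
same_speed = count_solutions([[1, -2.5], [1, -2.5]], [0, -12.5])
same_speed_same_start = count_solutions([[1, -2.5], [1, -2.5]], [0, 0])

print(original, same_speed, same_speed_same_start)  # one none infinite
```

The second system gives the sheriff the robber's speed while keeping the five-minute delay, and the third also removes the delay.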

In a given system of equations:

  • there could be many equations
  • there could be many unknowns in each equation

The example above had two equations and two unknowns. Now consider another example: building a regression model to predict house prices. For a given house we collect a number of features, or variables, to predict its price, such as distance to school, number of bedrooms, and so on. We also need a y-intercept, which gives the model a baseline price shared across all houses:

y = a + b·x1 + c·x2 + … + m·xn

Without a y-intercept, it becomes difficult to fit a good model.

Here we have many unknowns, and we can have an unlimited number of rows: every time we get a house price and the features associated with it, that’s another row in our system of equations.

For any house i in the dataset, yi is its price, and xi,1 to xi,n are its features. We can solve for the parameters a, b, c, …, m.

It’s typical in machine learning problems like this one to have thousands of rows representing different houses and maybe up to a dozen features, and deep learning models typically involve far more of both.
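As a rough sketch of what solving for these parameters can look like, the snippet below fits a, b, and c with least squares using `np.linalg.lstsq`. Everything about the data is made up for illustration: the two features, the "true" parameter values, and the noise-free prices are assumptions, not real housing data, and least squares is one common way to solve such a system rather than necessarily the method this series has in mind.

```python
import numpy as np

rng = np.random.default_rng(0)
n_houses = 100

# Synthetic features (hypothetical, not real data)
dist_school = rng.uniform(0.5, 10.0, size=n_houses)   # km
bedrooms = rng.integers(1, 6, size=n_houses)          # 1 to 5 bedrooms

# Made-up "true" parameters: intercept a, slopes b and c
a_true, b_true, c_true = 200.0, -8.0, 45.0
price = a_true + b_true * dist_school + c_true * bedrooms

# One row per house; the column of ones carries the y-intercept a
X = np.column_stack([np.ones(n_houses), dist_school, bedrooms])

# Least-squares solution of X @ params = price
params, *_ = np.linalg.lstsq(X, price, rcond=None)
a, b, c = params
print(a, b, c)
```

Because the synthetic prices contain no noise, the fit recovers the made-up parameters almost exactly. With real data there are usually far more rows than unknowns and no exact solution, so least squares returns the best-fitting parameters instead.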
