Let’s make one thing clear. There are *lots* of programming languages, tools and frameworks out there in the world of web development. In most cases, there’s not a single best way to solve a problem. However, as a beginner, you shouldn’t attempt to learn every new thing for the sole sake of learning, or just because some people are hyped about it. On the contrary, you should aim to acquire the bare minimum skill set so that you can start building stuff as soon as possible.

For instance, you’re going to have to learn JavaScript in order to write browser (front-end) code no matter what. And guess what, there’s something called Node.js which lets you write server-side code using JavaScript as well. Now I’m not arguing here whether Node.js is the best option for developing your back-end or not. However, it’s certainly convenient to be able to build an entire web application while learning just one programming language. A deal too good to pass up, don’t you think?

You can (and should) always learn about other technology trends once you master the basics and have a solid understanding of how the web works.

I’m going to divide the steps in this roadmap into several chapters which are sorted below from basic to advanced:

I’m also going to label each item in a category as *fundamental*, *optional* or *reference*. I strongly advise against advancing to a category before completing all the *fundamental* steps in a previous category. You can postpone *optional* steps if you want, or skip them altogether if you’re feeling too lazy. *References* are not meant to be gone through from start to end. You’ll find them useful in the future when you forget some detail and want to quickly look it up, or when you need further information about a specific topic in the corresponding category.

By the way, I didn’t mention this above but if you don’t know how to use Git, stop whatever you’re doing and learn it ASAP. Don’t worry, it’s not going to take long. Actually this 30 min YouTube video is more than enough to get you started. Just make sure that you know how to use the following 7 simple commands for starters:

Lastly I just want to say that if you haven’t used Visual Studio Code yet, you should definitely give it a try. I think it’s an amazing code editor, especially for JavaScript.

Alright let’s begin…

Learning HTML & CSS is definitely where you want to start as a beginner. Once you have a solid knowledge, these two will let you build structured, elegant and responsive websites with static content such as a blog or a portfolio page. You’ll also use this knowledge in the following chapters where you’ll deal with browser-side JavaScript code or a front-end framework such as React.

- **HTML Crash Course:** HTML for absolute beginners.
- **CSS Crash Course:** CSS for absolute beginners.
- **Responsive Design Tutorial:** Responsiveness isn’t just a nice feature anymore in modern web design, it’s the default.
- **CSS Grid Tutorial:** CSS Grid lets you easily arrange HTML elements in two dimensions on a web page.

- **CSS Flexbox Tutorial:** This is a useful yet optional item because it doesn’t really provide any extra functionality if you already know CSS Grid.
- **SASS Basics:** SASS makes it simple to deal with CSS in complex websites. You’ll most likely find it useful in the future, but it’s not mandatory at this point.
- **Social Network Theme With SASS:** This video clearly explains how to design a realistic website from scratch using SASS.

- **HTML** and **CSS:** Don’t try to memorize everything. You can always look it up.
- **Netlify** and **Heroku:** Come back here when you want to deploy your app.

At this point, I have to also mention some CSS frameworks such as Bootstrap and Materialize. Most people will advise you to learn one because these can really speed up your development process. IMHO this approach isn’t going to help you fully grasp the basics of web design. I think you should stay away from them until you’re comfortable with styling websites with your bare hands.

Being the only programming language that can possibly be used on the front-end side, JavaScript is an essential part of web development. As I mentioned earlier, you’re going to be able to develop back-end code with it too, once you learn Node.js as well.

- **The Modern JavaScript Tutorial:** This is by far the most important item in this list, maybe even in this whole roadmap. It’s going to take some time, but you should absolutely complete at least the first two parts. Since JavaScript is going to be your primary tool as a web developer, you won’t regret investing your time in learning it properly. The good news is, learning JavaScript is fun and you can build cool stuff with it as you learn.
- **What the heck is the event loop anyway?:** Knowing how JavaScript works behind the scenes will give you a deeper understanding of the code you write.
- **async & await in JavaScript:** Promises and `async/await` might get confusing if you’re not familiar with asynchronous programming. Check this video out after reading the corresponding section in The Modern JavaScript Tutorial in order to see some real action.

- **Practical JavaScript:** I’m listing this one here because The Modern JavaScript Tutorial covers everything in it and then some. You can consider it as an optional preliminary step for this chapter.
- **Modular Javascript:** This one is a bit outdated in terms of JavaScript language features, but it’s still a good tutorial for learning basic design patterns.
- **RegEx Tutorial 1** and/or **RegEx Tutorial 2:** While not strictly a JavaScript thing, knowing a little bit of RegEx can make a big difference in your efficiency as a developer.

On a side note, it’s not necessary to learn jQuery unless you have to read and/or modify existing code that uses it. Vanilla JavaScript is equally powerful today except for a few use cases.

- **The Modern JavaScript Tutorial:** Let’s face it, you won’t remember everything in this tutorial after reading it once.
- **Chrome DevTools:** This is an excellent reference if you use Chrome developer tools. Besides JavaScript, you can find lots of useful information in it regarding HTML & CSS too.

Node.js is an asynchronous event-driven JavaScript runtime built on Chrome’s V8 JavaScript engine, designed for building scalable network applications. You’ll have acquired the minimum skill set to build an entire dynamic web application after completing this chapter.

- **A Beginner’s Guide to NPM:** This is a quick guide on how to install and get started with Node.js and the Node Package Manager (NPM).
- **Node.js Crash Course:** A good introductory video to learn Node.js basics.
- **Express Crash Course:** In this video, you’re going to learn about a very popular Node.js framework called Express, which is widely used for developing web applications.
- **Building a RESTful API with Node.js:** An amazing YouTube playlist which walks you through the process of building a practical REST API using Node.js and Express. You’ll also learn how to interact with a NoSQL database called MongoDB along the way.

- **Node.js Presentation:** A presentation of Node.js by its original developer Ryan Dahl. Check it out if you want to learn what kinds of problems Node.js has been designed to solve and how it handles I/O compared to some other back-end frameworks.

- **HTTP Reference:** Lots of useful information including HTTP request methods, response codes, headers and so on.

You should’ve learned a little bit about MongoDB in the last chapter if you completed all the fundamentals. Now it’s time to learn some SQL. I think MySQL is a good place to start learning about relational databases.

One thing I would recommend: don’t just go with the flow and automatically opt for NoSQL just because everyone is using MongoDB with Node.js these days. This decision requires rational thinking and comparing the pros and cons of each option. There’s a good chance that a relational database is going to fit your needs better for your first real application as a beginner.

- **MySQL Crash Course:** A very good intro to SQL.
- **Using MySQL with Node.js:** Learn how to make use of SQL queries as part of a REST API built with Node.

- **SQL Quick Reference** and **SQL Keywords Reference:** Because ain’t nobody got time for memorizing all that.

Lastly, object-relational mapping (ORM) tools such as Sequelize can be tempting to use at first, but I say stay away from them before fully grasping SQL. Have a look at this if you want to know why.

So far you’ve been building your front-end using plain JavaScript. And that’s perfectly fine if you want to become a back-end developer. In that case, please proceed to the next chapter in an orderly fashion.

If you’re still here, we’ve got a little bit of additional work to do. Almost all front-end or full-stack development job openings these days require some knowledge about a front-end framework, particularly either React, Angular or Vue.

At the end of the day, choosing between them really boils down to personal preference unless you want to apply to a specific job. I opted for React after some research, and therefore that’s what I recommend you to learn as well. You might also want to check out some statistics if you’re feeling skeptical about it.

- **Intro to React:** Learn React by building your own tic-tac-toe game!
- **Create React App:** Learn how to quickly initialize a React project in a Node.js environment.
- **React Crash Course:** Come on now, finish off all the React on your plate.

- **React & Redux Tutorial:** This is optional because it might feel a bit repetitive after completing the fundamentals. You still might want to watch this series if you want to learn about Redux.
- **React Hooks:** React Hooks is a relatively new and cool feature, although I don’t think it’s something you absolutely have to learn at this point.
- **Adding SASS to React:** This will only make sense if you haven’t skipped SASS in the first chapter.

Congratulations! You’re now officially a full-stack developer. Go ahead and pat yourself on the back. You must feel like how Neo felt just after he learned Kung Fu. But you still have to complete this chapter if you don’t want to get your ass kicked by Morpheus.

To some of you, Docker might not appear as an absolutely necessary thing to learn. While that’s *not incorrect,* it’s an investment you’re not going to regret.

- **Docker Overview:** Just so you know what you’re getting yourself into.
- **Get Started Part 1** and **Part 2:** Learn how to set up your Docker environment, build an image and run it as a container.
- **Dockerfile Tutorial by Example:** Learn why and how to create a Dockerfile.
- **Getting Started with Docker Compose:** Because single-container apps are for losers.
- **Docker Compose Tutorial:** Keep calm and Docker Compose.

- **Data Management in Docker:** Learn how to persist data in the host machine even after the container stops.
- **Environment Variables in Docker Compose:** Learn how to deal with environment variables in Docker Compose.
- **Networking in Docker Compose:** Learn how to specify custom networks for your services.

- **Dockerfile Reference:** In case you forget how to create a Dockerfile.
- **Dockerfile Best Practices:** In case you forget how you *should* create a Dockerfile.
- **Docker Compose File Reference (v3):** In case you forget how to create a Docker Compose file.
- **Docker Compose Command Line Reference:** Usage information for the `docker-compose` subcommands.

That’s it, I hope you enjoyed following this roadmap. I’m sure there are many more useful resources out there that I’m not aware of. Feel free to leave a comment below and share anything that may be helpful to other readers. Also don’t forget to subscribe if you want to get updates on my future articles.

Here’s what you need in order to follow along:

- You have to create a *monthly budget* template from the spreadsheet template gallery. Check out my sample sheet to see what it looks like.
- You also need a Trello account in order to create cards and webhooks.

The project is open source on GitHub. Feel free to contribute by providing pull requests, issues, ideas or suggestions.

Also make sure to subscribe if you want to get updates on my future articles.

If you don’t have one already, you can go ahead and create a *monthly budget* template from the spreadsheet template gallery. You can also check out the sample budget sheet that I’ve created to see what it looks like. It’s made up of two pages (sheets):

- **Transactions** page lets you insert expense & income entries.
- **Summary** page lets you keep track of your budget.

The main purpose of this tool is to let you insert entries into the Transactions page from the CLI, saving you the trouble of opening the actual spreadsheet in a browser.

A spreadsheet URL looks like this:

```
https://docs.google.com/spreadsheets/d/<SPREADSHEET_ID>/edit#gid=<SHEET_ID>
```

Take note of this URL, or just the `SPREADSHEET_ID`, after you’ve created a spreadsheet because you’re going to need it later. For example, the ID of my sample budget spreadsheet is `186YX-RyEyz5OcTsoI4QwIyJJMuZVc5MNTYIrF62ztiI`.
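Since the ID always sits between `/d/` and the next slash, you can pull it out of a pasted URL with a small regex. Here’s a minimal sketch (the helper name is mine, not part of the app):

```python
import re

def spreadsheet_id(url):
    # hypothetical helper: extract the <SPREADSHEET_ID> part of a full URL
    match = re.search(r"/spreadsheets/d/([A-Za-z0-9_-]+)", url)
    if match is None:
        raise ValueError("not a Google Sheets URL: " + url)
    return match.group(1)
```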

Then you simply have to follow the first 2 steps of this guide, which involve

- creating a console project to enable the Google Sheets API,
- downloading a `credentials.json` file for authorization,
- and installing the Google Client Library:

```shell
pip3 install --upgrade google-api-python-client oauth2client
```

The next step is to generate an authorization token to access your spreadsheets. The script below will open up a browser and request permission from your Google account to generate a `token.json` file from `credentials.json`:

```python
from oauth2client import file, client, tools
```
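The snippet above is truncated; the full flow follows the standard oauth2client quickstart pattern. Here’s a sketch of what `createtoken.py` might look like (the scope URL and the function name are my assumptions, based on the Sheets quickstart):

```python
def create_token(credentials="credentials.json", token="token.json"):
    # import kept inside the function so the sketch doesn't require
    # oauth2client just to be loaded
    from oauth2client import file, client, tools

    SCOPES = "https://www.googleapis.com/auth/spreadsheets"
    store = file.Storage(token)  # token.json will be written here
    creds = store.get()
    if not creds or creds.invalid:
        # opens a browser asking for permission, then saves the token
        flow = client.flow_from_clientsecrets(credentials, SCOPES)
        creds = tools.run_flow(flow, store)
    return creds
```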

This token needs to be created only once, so it’s a good idea to do it as part of the installation procedure of the app. Let’s create an installation script and run `createtoken.py` as the first step:

```shell
python3 createtoken.py
```

Before inserting a transaction entry, our app needs to read `token.json` and authorize. So let’s create the main script and add this authorization step:

```python
#!/usr/bin/env python3
```

Our app will have 4 commands:

- Select Spreadsheet by ID
- Select Spreadsheet by URL
- Append Expense
- Append Income

And here’s how each command is going to be executed by the user:

```shell
# select spreadsheet by ID
```

The `Date`, `Amount`, `Description` and `Category` parameters are the values to be inserted in the Transactions page at

- columns `B` to `E` for an expense entry, and
- columns `G` to `J` for an income entry.
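In other words, each command maps to a fixed block of four columns, so the A1-style range for a given row can be built by a tiny helper (hypothetical, for illustration):

```python
def entry_range(command, row):
    # expense entries live in columns B-E, income entries in G-J
    start, end = ("B", "E") if command == "expense" else ("G", "J")
    return "Transactions!{0}{2}:{1}{2}".format(start, end, row)
```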

In order to recognize these commands and parameters, some argument parsing has to be done in `budget.py`:

```python
#!/usr/bin/env python3
```

Notice that `SPREADSHEET_ID` is written to a file named `spreadsheet.id` whenever one of the `id` or `url` commands is executed. This file is then read while processing the `expense` and `income` commands in order to access the selected spreadsheet.
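The persistence itself is just a couple of file operations; something along these lines (function names are mine):

```python
ID_FILE = "spreadsheet.id"  # in the real app this lives under ~/.config/budget-cli/

def save_spreadsheet_id(sid, path=ID_FILE):
    # remember the selected spreadsheet for later commands
    with open(path, "w") as f:
        f.write(sid)

def load_spreadsheet_id(path=ID_FILE):
    # read back the previously selected spreadsheet ID
    with open(path) as f:
        return f.read().strip()
```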

First of all, the row and column indices of the last entry have to be determined in order to append a new one. To do that, we read rows 5 to 40 from column `C` or `H` (depending on the command) and check the number of existing entries. Here 5 is the minimum row index at which a transaction can be inserted, and 40 is the index of the last row in the Transactions page. (You should set this to the total number of rows in your sheet.) Then we store the row index at which a new entry of the current transaction type should be inserted in a variable called `rowIdx`:

```python
rangeName = 'Transactions!C5:C40' if command == 'expense' else 'Transactions!H5:H40'
```
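The row arithmetic boils down to counting the values that came back for that range. A sketch under the same assumptions (rows 5–40; the function name is mine):

```python
def next_row(existing_values, first_row=5, last_row=40):
    # existing_values: the cell values returned for C5:C40 (or H5:H40);
    # the API omits trailing empty cells, so the length is the entry count
    if len(existing_values) >= last_row - first_row + 1:
        raise RuntimeError("Transactions page is full")
    return first_row + len(existing_values)
```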

Finally, we update the corresponding cells with the date, amount, description & category parameters stored in the `entry` variable:

```python
startCol = "B" if command == 'expense' else "G"
```

Now we have `budget.py` ready in our project folder. However, it has to be executable from any directory via the CLI, so we need to make sure that it’s in a directory referenced by the `PATH` environment variable, such as `/usr/bin/`.

On the other hand, the `token.json` and `spreadsheet.id` files do not have to be in `PATH`. They should be located somewhere owned by the user, such as `~/.config/budget-cli/`, so that they can be accessed without `sudo` permission.

We’re going to copy `budget.py` and `token.json` in `install.sh`, and `spreadsheet.id` will be created inside `~/.config/budget-cli/` automatically when the `budget id` or `budget url` command is executed for the first time:

```shell
# create token from credentials
```

Notice that I’m renaming `budget.py` to `budget` while copying it, because I want to use the app like

```shell
budget <command> <params>
```

as opposed to

```shell
budget.py <command> <params>
```

Note that this wouldn’t be possible without the first line in `budget.py`, which is:

```python
#!/usr/bin/env python3
```

By the way, let’s not forget to create an uninstallation script to clean up:

```shell
sudo rm -f /usr/bin/budget
```

Last but not least, we have to slightly modify the file I/O lines in `budget.py`, taking into account the global file locations. You can find the polished and up-to-date version in the GitHub repository. The latest version also has cool new features like logging the transaction history:

That’s it! Just run `./install.sh` from the project folder and you should be able to use the app. I hope you enjoyed this little walkthrough. If you’re still here, you should subscribe to get updates on my future articles. Also feel free to leave a comment below and let me know what you think!

We’ll be using the following Python 3 libraries to make our job easier:

- Beautiful Soup 4,
- Requests to access the HTML content,
- LXML as the HTML parser,
- and Multiprocessing to speed things up.

`multiprocessing` comes with Python 3 by default, but you may need to install the others manually using a package manager such as pip:

```shell
pip3 install beautifulsoup4
```

Before we begin, I want to point out that we’ll be scraping the old Reddit, not the new one. That’s because the new site loads more posts automatically when you scroll down:

The problem is that it’s not possible to simulate this scroll-down action using a simple tool like Requests. We’d need to use something like Selenium for that kind of thing. As a workaround, we’re going to use the old site which is easier to crawl using the links located on the navigation panel:

Let’s start by making our program accept some arguments that will allow us to customize our search. Here are some useful parameters:

- keyword to search
- subreddit restriction (optional)
- date restriction (optional)

Let’s say we want to search for the keyword “web scraping”. In this case, the URL we want to go to is:

```
https://old.reddit.com/search?q=%22web+scraping%22
```

If we want to limit our search to a particular subreddit such as “r/Python”, then our URL becomes:

```
https://old.reddit.com/r/Python/search?q=%22web+scraping%22&restrict_sr=on
```

Finally, the URL is going to look like one of the following if we want to search for the posts submitted in the last year:

```
https://old.reddit.com/search?q=%22web+scraping%22&t=year
https://old.reddit.com/r/Python/search?q=%22web+scraping%22&restrict_sr=on&t=year
```
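Putting the three cases together, the URL construction might look like this (a sketch; the argument parsing itself is omitted and the function name is mine):

```python
from urllib.parse import quote_plus

def build_search_url(keyword, subreddit=None, date=None):
    base = "https://old.reddit.com"
    if subreddit:
        base += "/r/" + subreddit
    # %22 wraps the keyword in quotes for an exact-phrase search
    url = base + "/search?q=%22" + quote_plus(keyword) + "%22"
    if subreddit:
        url += "&restrict_sr=on"
    if date:
        url += "&t=" + date
    return url
```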

The following is the initial version of our program that builds and prints the appropriate URL according to the program arguments:

```python
import argparse
```

Now we can run our program as follows:

```shell
python3 scraper.py --keyword="dave weckl" --subreddit="drums" --date="month"
```

If you take a look at the page source, you’ll notice that all the post results are stored in `<div>`s with a `search-result-link` class. Also note that unless it’s the last page, there will be an `<a>` tag with a `rel` attribute equal to `nofollow next`. That’s how we’ll know when to stop advancing to the next page.

Therefore, using the URL we built from the program arguments, we can collect the post sections from all pages with a simple function that we’ll call `getSearchResults`. Here’s the second version of our program:

```python
from bs4 import BeautifulSoup
```

Now that we have a bunch of posts in the form of a `bs4.element.Tag` array, we can extract useful information by parsing each element of this array further. We can extract information such as:

Information | Source
---|---
date | `datetime` attribute of the `<time>` tag
title | `<a>` tag with `search-title` class
score | `<span>` tag with `search-score` class
author | `<a>` tag with `author` class
subreddit | `<a>` tag with `search-subreddit-link` class
URL | `href` attribute of the `<a>` tag with `search-comments` class
# of comments | text field of the `<a>` tag with `search-comments` class

We’re also going to create a container object to store the extracted data and save it as a JSON file (`product.json`). We’ll load this file at the beginning of our program, since it may contain data from previous keyword searches. When we’re done scraping the current keyword, we’ll append the new content to the existing data. Here’s the third version of our program:

```python
from datetime import datetime
```

Now we can search for different keywords by running our program multiple times. The extracted data will be appended to the `product.json` file after each execution.

So far we’ve been able to scrape information from the post results easily, since this information is available in a given results page. But we might also want to scrape comment information, which cannot be accessed from the results page. We must instead parse the comment page of each individual post using the URL that we previously extracted in our `parsePosts` function.

If you take a close look at the HTML source of a comment page such as this one, you’ll see that the comments are located inside a `<div>` with a `sitetable nestedlisting` class. Each comment inside this `<div>` is stored in another `<div>` with a `data-type` attribute equal to `comment`. From there, we can obtain some useful information such as:

Information | Source
---|---
# of replies | `data-replies` attribute
author | `<a>` tag with `author` class inside the `<p>` tag with `tagline` class
date | `datetime` attribute in the `<time>` tag inside the `<p>` tag with `tagline` class
comment ID | `name` attribute in the `<a>` tag inside the `<p>` tag with `parent` class
parent ID | `<a>` tag with the `data-event-action` attribute equal to `parent`
text | text field of the `<div>` tag with `md` class
score | text field of the `<span>` tag with `score unvoted` class

Let’s create a new function called `parseComments` and call it from our `parsePosts` function so that we can get the comment data along with the post data:

```python
def parseComments(commentsUrl):
```

Our program is functionally complete at this point. However, it runs a little slowly because all the work is performed serially by a single process. We can improve the performance by handling the posts with multiple processes, using the `Process` and `Manager` objects from the `multiprocessing` library.

The first thing we need to do is rename the `parsePosts` function and make it handle only a single post. To do that, we’re simply going to remove the `for` statement. We also need to change the function parameters a little bit: instead of passing our original product object, we’ll pass a list object to which each process appends its results.

```python
def parsePost(post, results):
```

`results` is actually a `multiprocessing.managers.ListProxy` object that we can use to accumulate the output generated by all processes. We’ll later convert it to a regular list and save it in our product. Our main script will now look as follows:

```python
if __name__ == '__main__':
```
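The pattern is easier to see with a toy payload. In this sketch each process “parses” one item and appends to the shared `Manager` list; squaring a number stands in for the real `parsePost` work:

```python
from multiprocessing import Manager, Process

def parse_post(post, results):
    results.append(post * post)  # stand-in for the real parsing work

def run_all(posts):
    with Manager() as manager:
        results = manager.list()  # a ListProxy shared across processes
        procs = [Process(target=parse_post, args=(p, results)) for p in posts]
        for p in procs:
            p.start()
        for p in procs:
            p.join()
        # convert the proxy to a regular list before the manager shuts down;
        # sorted because the processes finish in arbitrary order
        return sorted(results)

if __name__ == "__main__":
    print(run_all(range(4)))
```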

This simple technique alone greatly speeds up the program. For instance, when I perform a search involving 163 posts on my machine, the serial version takes 150 seconds to execute, corresponding to approximately 1 post per second. The parallel version, on the other hand, takes only 15 seconds (~10 posts per second), which is 10x faster.

You can check out the complete source code on GitHub. Also make sure to subscribe to get updates on my future articles and feel free to leave a comment below and let me know what you think!

I started developing the first version of this application a few years ago while learning Java. After a while, I started to learn Eclipse RCP and thought it would be a good idea to build a GUI for my chess game on this platform. Then recently I took an AI course and learned about minimax and alpha-beta pruning algorithms. So I decided to rewrite the application and plug these algorithms in so that I could improve the AI, while recording the whole development process. It took a few months and most of my spare time, and finally it’s complete. I hope you like it!

You might like this tutorial if you

- are a beginner in Java and want to learn more about the language (or object-oriented programming in general)
- want to learn how to build a simple Eclipse RCP application
- are interested in watching the whole process of building a chess game from scratch
- want to see the minimax & alpha-beta pruning algorithms in action
- are **NOT** looking for a detailed tutorial and discussion focused solely on AI or the alpha-beta pruning algorithm

There are a total of 30 videos in the playlist, which is organized as follows:

Video # | Topic
---|---
1 | Introduction
2 to 11 | Coding the building blocks (board, square, piece etc.)
12 to 16 | Evaluating the legal moves for different chess pieces
17 to 23 | Building the GUI, UX and game mechanics
27 to 30 | Implementing the AI (minimax, alpha-beta pruning & iterative deepening)

I apologize for the poor sound quality in the first several videos, especially the one after the intro. Unfortunately, I had to use my laptop’s built-in microphone because I didn’t have one. Thankfully the sound gets better after a while because I managed to get my hands on a decent microphone later on.

Feel free to check out the GitHub repo if you are interested in contributing. I’m aware that there are lots of missing features such as:

- En passant
- Pawn promotion
- Castling
- Check & checkmate (I know…)
- Potential optimizations regarding the AI
- UX improvements

Make sure to subscribe if you want to get updates on my future articles. Also feel free to leave a comment below and let me know what you think!

First of all, let me briefly explain the idea behind one-vs-all classification. Say we have a classification problem and there are $N$ distinct classes. In this case, we’ll have to train a multi-class classifier instead of a binary one.

One-vs-all classification is a method which involves training $N$ distinct binary classifiers, each designed for recognizing a particular class. Then those $N$ classifiers are collectively used for multi-class classification as demonstrated below:

We already know from the previous post how to train a binary classifier using logistic regression. So the only thing we have to do now really is to train $N$ binary classifiers instead of just one. And that’s pretty much it.
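Once the $N$ classifiers are trained, the multi-class prediction step is just “ask every classifier and keep the most confident answer”. In matrix form it might look like this (a sketch with made-up weights; names are mine):

```python
import numpy as np

def one_vs_all_predict(classifiers, X):
    # classifiers: (N, n_features) array, one row of weights per class
    # X:           (m, n_features) array of examples
    scores = X @ classifiers.T    # (m, N) confidence scores
    return scores.argmax(axis=1)  # index of the most confident class per row
```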

We’re going to use this one-vs-all approach to solve a multi-class classification problem from the machine learning course taught by Andrew Ng. The goal in this problem is to identify digits from 0 to 9 by looking at 20x20 pixel drawings.

Here the number of classes $N$ is equal to 10, which is the number of different digits. We’re going to treat each pixel as an individual feature, which adds up to 400 features per image. Here are some examples from our training sample of 5000 images:

The training data is stored in a file called `digits.mat`. The reason it’s a **.mat** file is that this problem is originally a Matlab assignment. No big deal, since it’s pretty easy to import a `.mat` file using the `loadmat` function from `scipy.io`:

```python
import numpy as np
```

Let me point out two things here:

- We’re using the `squeeze` function on the `y` array in order to explicitly make it one dimensional. We’re doing this because `y` is stored as a 2D matrix in the `.mat` file although it’s actually a 1D array.
- We’re replacing the label 10 with 0. This label actually stands for the digit 0, but it was converted to 10 because of array indexing issues in Matlab.

Remember the `sigmoid`, `cost` and `cost_gradient` functions:

```python
def sigmoid(z):
    return 1 / (1 + np.exp(-z))
```

The final thing we have to do before starting to train our multi-class classifier is to add an initial column of ones to our feature matrix to take into account the intercept term:

```python
X = np.ones(shape=(x.shape[0], x.shape[1] + 1))
```

Now we’re ready to train our classifiers. Let’s create an array to store the model parameters $\theta$ for each classifier. Note that we need 10 sets of model parameters, each consisting of 401 parameters including the intercept term:

```python
classifiers = np.zeros(shape=(numLabels, numFeatures + 1))
```

Then we’re going to train 10 binary classifiers, one targeted for each digit, inside a `for` loop:

```python
for c in range(0, numLabels):
```

Here we create a `label` vector in each iteration. We set the values in `label` to 1 where the corresponding values in `y` are equal to the current digit, and to 0 everywhere else.

We can evaluate the probability estimations of our optimized model for each class as follows:

```python
classProbabilities = sigmoid(X @ classifiers.transpose())
```

This will give us a matrix of 5000 rows and 10 columns, where the columns correspond to the estimated class (digit) probabilities for all 5000 images.

However, we may need the final predictions of the optimized classifier instead of numerical probability estimations. We can find out our model’s predictions by simply selecting the label with the highest probability in each row:

```python
predictions = classProbabilities.argmax(axis=1)
```

Now we have our model’s predictions as a vector with 5000 elements labeled from 0 to 9.

Finally, we can compute our model’s training accuracy by computing the percentage of successful predictions:

```python
print("Training accuracy:", str(100 * np.mean(predictions == y)) + "%")
```

`Training accuracy: 94.54%`

An accuracy of 94.5% isn’t bad at all considering we have 10 classes and a very large number of features. Still, we could do even better if we decided to use a nonlinear model such as a neural network.

If you’re still here, you should subscribe to get updates on my future articles. Also feel free to leave a comment below and let me know what you think!

We’ll be looking at another assignment from the machine learning course taught by Andrew Ng. Our objective in this problem is to estimate an applicant’s probability of admission into a university based on his/her results on two exams. Our dataset contains some historical data from previous applicants, which we’ll use as a training sample. Let’s read the dataset file and take a look at the first few examples:

```python
import numpy as np
```

```
[[ 34.62365962  78.02469282   0.        ]
 [ 30.28671077  43.89499752   0.        ]
 [ 35.84740877  72.90219803   0.        ]
 [ 60.18259939  86.3085521    1.        ]
 [ 79.03273605  75.34437644   1.        ]]
```

The first two columns correspond to the exam scores and the third column indicates whether the applicant has been admitted to the university. We can visualize this data using a scatter plot:

```python
fig, ax = plt.subplots()
```

Logistic regression is a linear model, which means that the **decision boundary** has to be a straight line. This can be achieved with a simple hypothesis function in the following form:

$h_\theta(x) = g(\theta^Tx)$

where $g$ is the **sigmoid function** which is defined as:

$g(z) = \dfrac{1}{1 + e^{-z}}$

Here’s the Python version of the sigmoid function:

```python
def sigmoid(z):
    return 1 / (1 + np.exp(-z))
```

The numeric output of the hypothesis function $h_\theta(x)$ corresponds to the model’s confidence in labeling the input:

- If the output is $0.5$, both classes are equally probable as far as the classifier is concerned.
- If the output is $1$, the classifier is 100% confident about class 1.
- If the output is $0$, the classifier is 100% confident about class 0.

In other words, the classifier labels the input based on whether $\theta^Tx$ is positive or negative. Of course this is based on the assumption that the threshold is selected as $0.5$.
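That observation turns directly into a small prediction function. A minimal sketch (assuming NumPy and an already-fitted `theta`; the function name is mine):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict(theta, X, threshold=0.5):
    # label 1 exactly when theta^T x >= 0, i.e. when sigmoid(theta^T x) >= 0.5
    return (sigmoid(X @ theta) >= threshold).astype(int)
```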

In the training stage, we’ll try to minimize the cost function below:

$J(\theta) = \dfrac{1}{m}\sum\limits_{i=1}^{m}[-y^{(i)}\log(h_\theta(x^{(i)})) - (1 - y^{(i)})\log(1 - h_\theta(x^{(i)}))]$

Notice that this cost function penalizes the hypothesis according to its probability estimation error, as demonstrated below:

We can use the following Python function to compute this cost in our script:

```python
def cost(theta, X, y):
```
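The body of the function is truncated in this export; here’s a minimal sketch of what it computes, a direct transcription of the cost formula above:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Cross-entropy cost for logistic regression, averaged over all m examples.
def cost(theta, X, y):
    h = sigmoid(X @ theta)
    return np.mean(-y * np.log(h) - (1 - y) * np.log(1 - h))
```

With all-zero parameters the hypothesis outputs $0.5$ for every example, so the cost starts out at $\log 2 \approx 0.693$ regardless of the labels.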

We’re going to use the `fmin_cg` function from `scipy.optimize` in order to minimize this cost function. `fmin_cg` also needs the gradient of the cost, which is defined as:

$\dfrac{\partial{J(\theta)}}{\partial{\theta_j}} = \dfrac{1}{m}\sum\limits_{i=1}^{m}(h_\theta(x^{(i)}) - y^{(i)})x^{(i)}_j$

where $m$ is the number of training examples.

And the Python version is:

```python
def cost_gradient(theta, X, y):
    return (X.T @ (sigmoid(X @ theta) - y)) / len(y)
```

If you’re not familiar with `fmin_cg`, it might be a good idea to check out my previous post and/or the official docs before proceeding to the next section.

The final thing we need to do before training our model is to add an additional first column to $x$, in order to account for the intercept term $\theta_0$:

```python
X = np.ones(shape=(x.shape[0], x.shape[1] + 1))
X[:, 1:] = x
```

Finally we can train our logistic regression model:

```python
initial_theta = np.zeros(X.shape[1])  # set initial model parameters to zero
```

```
Optimization terminated successfully.
         Current function value: 0.203498
         Iterations: 51
         Function evaluations: 122
         Gradient evaluations: 122
```
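For completeness, here’s a sketch of how such a training call can be wired up with `fmin_cg`, run on a tiny made-up sample since the exam data and the surrounding code are truncated in this export:

```python
import numpy as np
from scipy.optimize import fmin_cg

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def cost(theta, X, y):
    h = sigmoid(X @ theta)
    return np.mean(-y * np.log(h) - (1 - y) * np.log(1 - h))

def cost_gradient(theta, X, y):
    return (X.T @ (sigmoid(X @ theta) - y)) / len(y)

# A tiny made-up sample in place of the exam data, purely for illustration.
X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 4.0], [1.0, 5.0]])
y = np.array([0.0, 1.0, 0.0, 1.0])

initial_theta = np.zeros(X.shape[1])  # set initial model parameters to zero
theta = fmin_cg(cost, initial_theta, fprime=cost_gradient, args=(X, y))
```

`fmin_cg` minimizes `cost` starting from `initial_theta`, calling `cost_gradient` for the search direction, and returns the optimized parameter vector.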

Now let’s plot the decision boundary of our optimized model:

```python
x_axis = np.array([min(X[:, 1]) - 2, max(X[:, 1]) + 2])
y_axis = -(theta[0] + theta[1] * x_axis) / theta[2]  # the line where theta.T @ x = 0
```

Looks pretty good. Let’s also measure the training accuracy:

```python
predictions = np.zeros(len(y))
```

`Training Accuracy = 89.0%`
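Since the accuracy snippet above is truncated, here’s a self-contained sketch of the idea: threshold the sigmoid outputs at $0.5$ and count how many predictions match the labels. The parameters and data below are made up for illustration:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Made-up trained parameters and sample; in the post these would be the
# optimized theta and the training data X, y.
theta = np.array([-3.0, 1.0])
X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 4.0], [1.0, 5.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])

predictions = (sigmoid(X @ theta) >= 0.5).astype(float)  # threshold at 0.5
accuracy = np.mean(predictions == y) * 100
print('Training Accuracy = {}%'.format(accuracy))
```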

Not bad at all. We would probably need something a little more flexible than logistic regression if we wanted to come up with a more accurate classifier.

Finally, here’s how we can predict the probability of admission for a student with arbitrary exam scores:

```python
probability = sigmoid(np.array([1, 45, 85]) @ theta)
print('For a student with scores 45 and 85, we predict an admission probability of', probability)
```

`For a student with scores 45 and 85, we predict an admission probability of 0.776195474168`


In a nutshell, learning curves show how the training and validation errors change with respect to the number of training examples used while training a machine learning model.

If a model is balanced, both errors converge to small values as the training sample size increases.

- If a model has **high bias**, it ends up **underfitting** the data. As a result, both errors fail to decrease no matter how many examples there are in the training set.
- If a model has **high variance**, it ends up **overfitting** the training data. In that case, increasing the training sample size decreases the training error, but it fails to decrease the validation error.

The figure below demonstrates each of those cases:

After this incredibly brief introduction, let me introduce you to today’s problem where we’ll get to see learning curves in action. It’s another problem from Andrew Ng’s machine learning course, in which the objective is to predict the amount of water flowing out of a dam, given the change of water level in a reservoir.

The dataset file we’re about to read contains historical records on the change in water level and the amount of water flowing out of the dam. The reason it’s a `.mat` file is that this problem is originally a MATLAB assignment. Fortunately, it’s pretty easy to load `.mat` files in Python using the `loadmat` function from `scipy.io`:

```python
import numpy as np
from scipy.io import loadmat
```

The dataset is divided into three samples:

- The **training sample** consists of `x_train` and `y_train`.
- The **validation sample** consists of `x_val` and `y_val`.
- The **test sample** consists of `x_test` and `y_test`.

Notice that we have to explicitly convert the target variables (`y_train`, `y_val` and `y_test`) into one-dimensional arrays, because `loadmat` returns everything stored in the `.mat` file as two-dimensional matrices.
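The loading code itself is truncated in this export; here’s a self-contained sketch of the `loadmat` workflow. It writes a stand-in `.mat` file on the fly, since the assignment’s file isn’t bundled with this post, and the key names below are made up:

```python
import numpy as np
from scipy.io import loadmat, savemat

# Write a stand-in .mat file so the snippet runs on its own; the original
# post loads the assignment's file instead (the key names here are made up).
savemat('demo.mat', {'X': np.array([[-15.94], [-29.15]]),
                     'y': np.array([[2.13], [1.17]])})

data = loadmat('demo.mat')
x_train = data['X']            # features stay as an (m, 1) matrix
y_train = data['y'].flatten()  # flatten the (m, 1) target into a 1-D array
```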

Let’s plot the training sample to see what it looks like:

```python
fig, ax = plt.subplots()
```

Alright, it’s time to come up with a strategy. First of all, it’s clear that there’s a nonlinear relationship between $x$ and $y$. Normally we would rule out any linear model because of that. However, we are going to begin by training a linear regression model so that we can see what the learning curves of a model with high bias look like.

Then we’ll train a polynomial regression model which is going to be much more flexible than linear regression. This will let us see the learning curves of a model with high variance.

Finally, we’ll add **regularization** to the existing polynomial regression model and see what a balanced model’s learning curves look like.

I’ve already shown you in the previous post how to train a linear regression model using gradient descent. Before proceeding any further, I strongly encourage you to take a look at it if you don’t have at least a basic understanding of linear regression.

Here I’ll show you an easier way to train a linear regression model, using an optimization function called `fmin_cg` from `scipy.optimize`.

`fmin_cg` needs a function that returns the cost and another one that returns the gradient of the cost for a given hypothesis. We have to pass those to `fmin_cg` as arguments:

- We can completely reuse the `cost` function, because it’s independent of the optimization method that we use.
- From the `gradient_descent` function, we can borrow the part where the gradient of the cost function is evaluated.

So here’s (almost) all we need in order to train a linear regression model:

```python
def cost(theta, X, y):
```
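The block above is cut off after the first line. Here’s a sketch of the two helpers it describes, the squared-error cost and its gradient, in the two-function form `fmin_cg` expects; the gradient function’s name is an assumption:

```python
import numpy as np

# Squared-error cost for linear regression.
def cost(theta, X, y):
    return np.sum((X @ theta - y) ** 2) / (2 * len(y))

# Its gradient with respect to theta, vectorized over all features.
def cost_gradient(theta, X, y):
    return (X.T @ (X @ theta - y)) / len(y)
```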

If you look at our `cost` function, there we evaluate the matrix product of the feature matrix $X$ and the vector of model parameters $\theta$. Remember, this is only possible if the matrix dimensions match. Therefore we also need a tiny utility function that inserts an additional first column of all ones into a raw feature matrix such as `x_train`:

```python
def insert_ones(x):
    return np.hstack((np.ones((x.shape[0], 1)), x))
```

Now let’s train a linear regression model and plot the linear fit on top of the training sample:

```python
X_train = insert_ones(x_train)
```

The above plot clearly shows that linear regression is not suitable for this task. Let’s also look at its learning curves and see if we can draw the same conclusion.

While plotting learning curves, we’re going to start with $2$ training examples and increase them one by one. In each iteration, we’ll train a model and evaluate the training error on the existing training sample, and the validation error on the whole validation sample:

```python
def learning_curves(X_train, y_train, X_val, y_val):
```
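The function body is truncated above; here’s a self-contained sketch of the recipe just described. A least-squares solve (`np.linalg.lstsq`) stands in for the post’s `train_linear_regression` helper so the snippet runs on its own:

```python
import numpy as np

# Stand-in trainer so the sketch is self-contained; the post uses its own
# train_linear_regression based on fmin_cg instead.
def train(X, y):
    return np.linalg.lstsq(X, y, rcond=None)[0]

def cost(theta, X, y):
    return np.sum((X @ theta - y) ** 2) / (2 * len(y))

def learning_curves(X_train, y_train, X_val, y_val):
    train_errors, val_errors = [], []
    for m in range(2, len(X_train) + 1):
        theta = train(X_train[:m], y_train[:m])       # train on the first m examples
        train_errors.append(cost(theta, X_train[:m], y_train[:m]))
        val_errors.append(cost(theta, X_val, y_val))  # validate on the whole sample
    return train_errors, val_errors
```

For perfectly linear data both error curves stay at zero; for real data they behave as described in the introduction.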

In order to use this function, we have to extend `x_val` with a column of ones, just like we did with `x_train`:

```python
X_val = insert_ones(x_val)
```

As expected, we were unable to sufficiently decrease either the training or the validation error.

Now it’s time to introduce some nonlinearity with polynomial regression.

In order to train a polynomial regression model, the existing feature(s) have to be mapped to artificially generated polynomial features. Then the rest is pretty much the same drill.

In our case we only have a single feature $x_1$, the change in water level. Therefore we can simply compute the first several powers of $x_1$ to artificially obtain new polynomial features. Let’s create a simple function for this:

```python
def poly_features(x, degree):
    return np.column_stack([x ** k for k in range(1, degree + 1)])
```

Now let’s generate new feature matrices for training, validation and test samples with 8 polynomial features in each:

```python
x_train_poly = poly_features(x_train, 8)
```

Ok, we have our polynomial features, but we also have a tiny little problem. If you take a closer look at one of the new matrices, you’ll see that the polynomial features are very imbalanced at the moment. For instance, let’s look at the first few rows of the `x_train_poly` matrix:

```python
print(x_train_poly[:4, :])
```

```
[[ -1.59367581e+01   2.53980260e+02  -4.04762197e+03   6.45059724e+04
   -1.02801608e+06   1.63832436e+07  -2.61095791e+08   4.16102047e+09]
 [ -2.91529792e+01   8.49896197e+02  -2.47770062e+04   7.22323546e+05
   -2.10578833e+07   6.13900035e+08  -1.78970150e+10   5.21751305e+11]
 [  3.61895486e+01   1.30968343e+03   4.73968522e+04   1.71527069e+06
    6.20748719e+07   2.24646160e+09   8.12984311e+10   2.94215353e+12]
 [  3.74921873e+01   1.40566411e+03   5.27014222e+04   1.97589159e+06
    7.40804977e+07   2.77743990e+09   1.04132297e+11   3.90414759e+12]]
```

As the polynomial degree increases, the values in the corresponding columns exponentially grow to the point where they differ by orders of magnitude.

The thing is, the cost function will generally converge much more slowly when the features are imbalanced like this. So we need to make sure that our features are on a similar scale before we begin to train our model. We’re going to do this in two steps:

- Subtract the mean value of each column from itself and make the new mean $0$.
- Divide the values in each column by their standard deviation and make the new standard deviation $1$.

It’s important that we use the mean and standard deviation values from the training sample while normalizing the validation and test samples.

```python
train_means = x_train_poly.mean(axis=0)
```
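The normalization code is truncated above; here’s a self-contained sketch of the two steps, with a tiny made-up matrix in place of `x_train_poly`:

```python
import numpy as np

# Made-up feature matrix, purely for illustration.
x_train_poly = np.array([[1.0, 10.0], [2.0, 400.0], [3.0, 250.0]])

train_means = x_train_poly.mean(axis=0)
train_stdevs = x_train_poly.std(axis=0)

# Subtract the column means, then divide by the column standard deviations.
x_train_norm = (x_train_poly - train_means) / train_stdevs
# x_val_poly and x_test_poly would be normalized with the SAME means/stdevs,
# as the text above points out.
```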

Finally we can train our polynomial regression model by using our `train_linear_regression` function and plot the polynomial fit. Note that when the polynomial features are simply treated as independent features, training a polynomial regression model is no different from training a multivariate linear regression model:

```python
def plot_fit(min_x, max_x, means, stdevs, theta, degree):
```

What do you think, seems pretty accurate right? Let’s take a look at the learning curves.

```python
plt.title("Learning Curves for Polynomial Regression", fontsize=16)
```

Now that’s overfitting written all over it. Even though the training error is very low, the validation error miserably fails to converge.

It appears that we need something in between in terms of flexibility. Although we can’t make linear regression more flexible, we can decrease the flexibility of polynomial regression using regularization.

Regularization lets us come up with simpler hypothesis functions that are less prone to overfitting. This is achieved by penalizing large $\theta$ values during the training stage.

Here’s the regularized cost function:

$J(\theta) = \dfrac{1}{2m}\Big(\sum\limits_{i=1}^{m}(h_\theta(x^{(i)}) - y^{(i)})^2\Big) + \dfrac{\lambda}{2m}\Big(\sum\limits_{j=1}^{n}\theta_j^2\Big)$

And its gradient becomes:

$\dfrac{\partial J(\theta)}{\partial \theta_0} = \dfrac{1}{m}\sum\limits_{i=1}^{m}(h_\theta(x^{(i)}) - y^{(i)})x_j^{(i)} \qquad \qquad \qquad \text{for } j = 0$

$\dfrac{\partial J(\theta)}{\partial \theta_j} = \Big(\dfrac{1}{m}\sum\limits_{i=1}^{m}(h_\theta(x^{(i)}) - y^{(i)})x_j^{(i)}\Big) + \dfrac{\lambda}{m}\theta_j \qquad \text{for } j \geq 1$

Notice that we are not penalizing the intercept term $\theta_0.$ That’s because it doesn’t have anything to do with the model’s flexibility.

Of course we’ll need to reflect these changes in the corresponding Python implementations by introducing a regularization parameter `lamb`:

```python
def cost(theta, X, y, lamb=0):
```
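The regularized implementations are truncated above; here’s a sketch that follows the regularized cost and gradient formulas, with the intercept term excluded from the penalty as discussed:

```python
import numpy as np

# lamb is the regularization parameter (named to avoid Python's `lambda`
# keyword). theta[0] is the intercept and is left unpenalized.
def cost(theta, X, y, lamb=0):
    m = len(y)
    squared_error = np.sum((X @ theta - y) ** 2) / (2 * m)
    penalty = lamb * np.sum(theta[1:] ** 2) / (2 * m)
    return squared_error + penalty

def cost_gradient(theta, X, y, lamb=0):
    m = len(y)
    grad = (X.T @ (X @ theta - y)) / m
    grad[1:] += (lamb / m) * theta[1:]   # regularize all but the intercept
    return grad
```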

We also have to slightly modify `train_linear_regression` and `learning_curves` so that they pass the regularization parameter `lamb` along:

```python
def train_linear_regression(X, y, lamb=0):
```

Alright we’re now ready to train a regularized polynomial regression model. Let’s set $\lambda = 1$ and plot our polynomial hypothesis on top of the training sample:

```python
theta = train_linear_regression(X_train_poly, y_train, 1)
```

It is clear that this hypothesis is much less flexible than the unregularized one. Let’s plot the learning curves and observe its bias-variance tradeoff:

```python
plt.title("Learning Curves for Regularized Polynomial Regression", fontsize=16)
```

This is apparently the best model we’ve come up with so far.

Although setting $\lambda = 1$ has significantly improved the unregularized model, we can do even better by optimizing $\lambda$ as well. Here’s how we’re going to do it:

- Select a set of $\lambda$ values to try out.
- Train a model for each $\lambda$ in the set.
- Find the $\lambda$ value that yields the minimum validation error.

```python
lambda_values = [0, 0.001, 0.003, 0.01, 0.03, 0.1, 0.3, 1, 3, 10]
```
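The selection loop itself is truncated above. The three steps can be sketched as follows; a closed-form ridge-style solve stands in for the post’s `fmin_cg`-based `train_linear_regression`, and the tiny sample is made up, so the snippet runs on its own:

```python
import numpy as np

def train_linear_regression(X, y, lamb=0):
    # Closed-form regularized solve standing in for the post's trainer;
    # the intercept column is left unpenalized.
    penalty = lamb * np.eye(X.shape[1])
    penalty[0, 0] = 0
    return np.linalg.solve(X.T @ X + penalty, X.T @ y)

def cost(theta, X, y):
    return np.sum((X @ theta - y) ** 2) / (2 * len(y))

lambda_values = [0, 0.001, 0.003, 0.01, 0.03, 0.1, 0.3, 1, 3, 10]

# Made-up training/validation samples, purely for illustration.
X_train = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y_train = np.array([0.1, 1.1, 1.9, 3.2])
X_val, y_val = X_train, y_train

# Train one model per lambda and keep the one with the lowest validation error.
val_errors = [cost(train_linear_regression(X_train, y_train, l), X_val, y_val)
              for l in lambda_values]
best_lambda = lambda_values[int(np.argmin(val_errors))]
```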

Looks like we’ve achieved the lowest validation error where $\lambda = 3$.

It’s good practice to evaluate an optimized model’s accuracy on a separate test sample other than the training and validation samples. So let’s train our models once again and compare test errors:

```python
X_test = insert_ones(x_test)
```

```
Test Error = 32.5057492449 | Linear Regression
Test Error = 17.2624144407 | Polynomial Regression
Test Error = 3.85988782246 | Regularized Polynomial Regression (at lambda = 3)
```


We’ll look at a regression problem from a very popular machine learning course taught by Andrew Ng. Our objective in this problem will be to train a model that accurately predicts the profits of a food truck.

The first column in our dataset file contains city populations and the second column contains food truck profits in each city, both in $10,000$s. Here are the first few training examples:

```
6.1101,17.592
```

We’re going to use this dataset as a training sample to build our model. Let’s begin by loading it:

```python
import numpy as np
import matplotlib.pyplot as plt
```

Both $x$ and $y$ are one dimensional arrays, because we have one **feature** (population) and one **target variable** (profit) in this problem. Therefore we can conveniently visualize our dataset using a scatter plot:

```python
fig, ax = plt.subplots()
```

Now we need to come up with a **straight line** which accurately represents the relationship between population and profit. This is called the **hypothesis function** and it’s formulated as:

$h_\theta(x) = \theta^Tx = \theta_0 + \theta_1x_1 + \theta_2x_2 + \dots + \theta_nx_n$

where $x$ corresponds to the feature matrix and $\theta$ corresponds to the vector of **model parameters.**

Since we have a single feature $x_1,$ we’ll only have two model parameters $\theta_0$ and $\theta_1$ in our hypothesis function:

$h_\theta(x) = \theta_0 + \theta_1x_1$

As you may have noticed, the number of model parameters is equal to the number of features plus $1$. That’s because each feature is weighted by a parameter to control its impact on the hypothesis $h_\theta(x)$. There is also an independent parameter $\theta_0$ called the **intercept term,** which defines the point where the hypothesis function intercepts the $y$-axis as demonstrated below:

The predictions of a hypothesis function can easily be evaluated in Python by computing the matrix product of $x$ and $\theta.$ At the moment we have our $x$ and $y$ vectors but we don’t have our model parameters yet. So let’s create those as well and initialize them with zeros:

```python
theta = np.zeros(2)
```

Also, we have to make sure that the matrix dimensions of $x$ and $\theta$ are compatible for the matrix product. Currently $x$ has $1$ column but $\theta$ has $2$ rows. The dimensions don’t match because of the additional intercept term $\theta_0.$

We can solve this issue by prepending a column of all ones to $x$. This is essentially equivalent to creating a new feature $x_0 = 1.$ This extra column won’t affect the hypothesis whatsoever, because $\theta_0$ is going to be multiplied by $1$ in the matrix product.

Let’s create a new variable $X$ to store the extended $x$ matrix:

```python
X = np.ones(shape=(len(x), 2))
X[:, 1] = x
```

Finally, we can compute the predictions of our hypothesis as follows:

```python
predictions = X @ theta
```

Of course, the predictions are currently all zeros because we haven’t trained our model yet.

The objective in training a linear regression model is to minimize a **cost function**, which measures the difference between actual $y$ values in the training sample and predictions made by the hypothesis function $h_\theta(x)$.

Such a cost function can be formulated as:

$J(\theta) = \dfrac{1}{2m}\sum\limits_{i=1}^{m}(h_\theta(x^{(i)}) - y^{(i)})^2$

where $m$ is the number of training examples.

Here’s its Python version:

```python
def cost(theta, X, y):
    return np.sum((X @ theta - y) ** 2) / (2 * len(y))
```

Now let’s take a look at the cost of our initial untrained model:

```python
print('The initial cost is:', cost(theta, X, y))
```

`The initial cost is: 32.0727338775`

Since our hypothesis is based on the model parameters $\theta$, we must somehow adjust them to minimize our cost function $J(\theta)$. This is where the **gradient descent** algorithm comes into play. It’s an optimization algorithm which can be used in minimizing differentiable functions. Luckily our cost function $J(\theta)$ happens to be a differentiable one.

So here’s how the gradient descent algorithm works in a nutshell:

In each iteration, it takes a small step in the opposite gradient direction of $J(\theta)$. This makes the model parameters $\theta$ gradually come closer to the optimal values. This process is repeated until eventually the minimum cost is achieved.

More formally, gradient descent performs the following update in each iteration:

$\theta_j := \theta_j - \alpha\frac{1}{m}\sum\limits_{i=1}^{m}(h_\theta(x^{(i)}) - y^{(i)})x^{(i)}_j$

The $\alpha$ term here is called the **learning rate**. It allows us to control the size of the step taken to update $\theta$ in each iteration. Choosing too large a learning rate may prevent us from converging to the minimum cost, whereas choosing too small a learning rate may significantly slow down the algorithm.

Here’s a generic implementation of the gradient descent algorithm:

```python
def gradient_descent(X, y, alpha, num_iters):
```
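The body is truncated in this export; here’s a sketch of the recipe just described, batch gradient descent for linear regression with a fixed learning rate:

```python
import numpy as np

def gradient_descent(X, y, alpha, num_iters):
    theta = np.zeros(X.shape[1])
    m = len(y)
    for _ in range(num_iters):
        gradient = (X.T @ (X @ theta - y)) / m
        theta = theta - alpha * gradient   # step against the gradient
    return theta
```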

Now let’s use this function to train our model and plot the hypothesis function:

```python
theta = gradient_descent(X, y, 0.02, 600)  # run GD for 600 iterations with learning rate = 0.02
```

Our linear fit looks pretty good, right? The algorithm must have successfully optimized our model.

Well, to be honest, it’s been fairly easy to visualize the hypothesis because there’s only one feature in this problem.

But what if we had multiple features? Then it wouldn’t be possible to simply plot the hypothesis to see whether the algorithm has worked as intended or not.

Fortunately, there’s a simple way to debug the gradient descent algorithm irrespective of the number of features:

- Modify the gradient descent function to make it record the cost at the end of each iteration.
- Plot the cost history after the gradient descent has finished.
- Pat yourself on the back if you see that the cost has monotonically decreased over time.

Here’s the modified version of our gradient descent function:

```python
def gradient_descent(X, y, alpha, num_iters):
```
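The modified body is also truncated; here’s a sketch that follows the debugging recipe above: record the cost at the end of every iteration and return the history alongside $\theta$:

```python
import numpy as np

def cost(theta, X, y):
    return np.sum((X @ theta - y) ** 2) / (2 * len(y))

# Same algorithm as before, but it now also records the cost after every
# iteration so the cost history can be plotted afterwards.
def gradient_descent(X, y, alpha, num_iters):
    theta = np.zeros(X.shape[1])
    m = len(y)
    cost_history = np.zeros(num_iters)
    for i in range(num_iters):
        theta = theta - alpha * (X.T @ (X @ theta - y)) / m
        cost_history[i] = cost(theta, X, y)
    return theta, cost_history
```

Note that the function now returns a tuple, which is why the training call later on unpacks it as `theta, _ = gradient_descent(...)`.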

Now let’s try learning rates $0.01$, $0.015$, $0.02$ and plot the cost history for each one:

```python
plt.figure()
```

It appears that the gradient descent algorithm worked correctly for these particular learning rates. Notice that it takes more iterations to minimize the cost as the learning rate decreases.

Now let’s try a larger learning rate and see what happens:

```python
learning_rate = 0.025
```

Doesn’t look good… That’s what happens when the learning rate is too large. Even though the gradient descent algorithm takes steps in the correct direction, the steps are so huge that it overshoots the target, and the cost diverges from the minimum value instead of converging to it.

Right now we can safely set the learning rate to $0.02$, because it allows us to minimize the cost and it requires relatively fewer iterations to converge.

Now that we’ve learned how to train our model, we can finally predict the food truck profit for a particular city:

```python
theta, _ = gradient_descent(X, y, 0.02, 600)  # train the model
print('For population = 70,000, we predict a profit of $', np.array([1, 7.0]) @ theta * 10000)
```

`For population = 70,000, we predict a profit of $ 45905.6621788`