Algorithms are precisely cataloged “recipes” for solving specific problems that we encounter in computer science. Because so much of computer science is an abstraction of real world problems, a good knowledge of algorithms will help you apply them in many different domains. See Kodeclik’s blog post about algorithms in general.
1. Sorting Algorithms
Sorting is a very common task in computer science. Whether we are trying to organize students by their grades, or rank cities by their distance, or arrange images by their magnitude of a certain color ingredient, sorting is imperative to all of these tasks. A good understanding of sorting hence helps in many fields and applications.
Kodeclik’s blog has run a series of posts about famous sorting algorithms. Below is a short summary of them.
Bubble sort: A very simple sorting algorithm that is easy to understand, easy to implement, easy to explain, but not very efficient. As the name indicates bubble sort works by swapping adjacent elements so the largest element bubbles to the end of the array (hence the name), followed by the second largest element, and so on.
Insertion sort: Another easy-to-undestand sorting algorithm that mimics the way we sort cards in our hand. Insertion sort scans from left to right and every time it encounters a mis-positioned element, it positions it at its intended location, and continues the scan.
Merge sort: Among the faster sorting algorithms, merge sort belongs to the family of “divide-and-conquer” algorithms, i.e., algorithms that break down the problem into smaller sub-problems, solve each of them separately, and then merge the results. Mergesort takes the easy way out in the “divide” part and does more work in the “conquer” part.
Quicksort: A state-of-the-art sorting algorithms. Like mergesort it is also a divide-and-conquer algorithm. But unlike mergesort it does more work in the “divide” part so the “conquer” step is trivial or straightforward.
A solid understanding of all these above four algorithms is critical to success as a computer science practitioner.
2. Celebrity Spotting
The second algorithm we will cover is celebrity spotting. Let us first define a celebrity. A celebrity is someone who is known by everybody but who doesn’t know any of them individually.
Assume you are in a party and you are trying to determine if there is a celebrity among the group. Suppose you can go and ask anybody, say, X: ‘do you know Y?’ (where X and Y are people in the party). How will you efficiently determine if there is a celebrity in the party? (Remove yourself from consideration.)
Note that there can be at most 1 celebrity in a group. There cannot be 2 celebrities, for instance. Why not? Let us suppose, for the sake of argument, that both X and Y are celebrities. If X is truly a celebrity, X doesn’t know anybody. In particular X shouldn’t know Y, therefore Y cannot be a celebrity. Similarly, you can argue in the other direction.
Further note that it is possible that there could be no celebrities in a group. So there can be either one celebrity or zero celebrities among a group.
So how do we find if a celebrity exists and who this person is? One possibility is to go to each person and ask them about every other person. If there are 6 people, each person will be asked about 5 other people. (Note that you will have to ask this question in both directions.) You will need to ask 30 questions which seems excessive. If you have n people you will be asking O(n^2) questions. Is there a smarter way to do it?
It turns out that there is a more efficient algorithm for celebrity spotting. Pick any two random people, say Bridget and Tom. Ask Bridget: “Do you know Tom?”. If she says yes, Bridget cannot be the celebrity. If she says no, Tom cannot be the celebrity. So either way, you will have eliminated one person from consideration. Retain the other person and do the same question by adding a new person, and repeat.
At the end of this process, in a group of “n” people, you will have eliminated n-1 people. Now you have 1 person for consideration as a potential celebrity. This person may or may not be a celebrity. We need to verify that he or she is indeed a celebrity. This can be done by asking everybody else if they know this person (and confirming that they do) and also asking the potential celebrity if they know everybody else (and confirming that they do not).
This eliminate-and-verify approach takes O(n) time in contrast to the all-pairs approach which takes O(n^2).
Celebrity spotting is not just a Hollywood problem. The same type of problem happens in web link analysis, e.g., finding webpages that are linked to by others but which themselves do not link to many pages. Google’s Pagerank algorithm uses similar concepts to rank webpages.
3. Euclid’s Algorithm
Euclid’s algorithm is one of the oldest algorithms and is used to find the greatest common divisor (GCD) or highest common factor (HCF) of two numbers. Let us suppose we wish to find the GCD of 75 and 120. One way to do it is to factorize both numbers to their prime factors.
Then we can identify the largest common factors (3 and 5) to find the GCD as 15. This is not as efficient as it sounds and when the numbers get larger the GCD becomes difficult to compute.
Euclid’s algorithm is more efficient and works as follows. Take the smaller number and keep subtracting it from the larger number till you cannot subtract it more. Thus we subtract 75 (once) from 120 to obtain 45. Replace the larger number with this number.
So the problem of computing GCD(75, 120) becomes GCD(75, 45). Note that we have reduced the original GCD problem with a new GCD problem involving smaller numbers.
We repeat this procedure again: 75-45 is 30 and we cannot subtract 45 further. So the problem becomes GCD(30, 45).
Repeating again, 45-30=15. We get GCD(30,15).
Once again, we get: 30-15=15. We get GCD(15, 15). This gets reduced to GCD(0,15). When one of the numbers reduces to 0, the other number is the greatest common divisor, i.e., 15.
Here is a Python implementation of Euclid’s algorithm:
4. Finding Shortest Paths
The shortest path problem is the task of finding the shortest path from a given vertex or location to another vertex or location. For instance, Google Maps is able to find shortest paths for you from any city to any other city and is internally executing a shortest path algorithm.
Finding shortest paths is not an easy problem and depends on a lot of factors. You are given a graph or network where the nodes denote cities or locations and the edges denote roads. The edges are often labeled with a number denoting distance or cost. The goal of the shortest path algorithm is to find paths from a start vertex to an end vertex such that the sum of edge weights is the smallest possible, e.g., shortest distance traveled or lowest tolls paid.
The shortest path problem has many variants. For instance the edges can be undirected (meaning they can be traversed in both directions) or they could be directed (meaning they can be traversed only in the direction highlighted). Imagine roads being either two-way or one-way and you will see why this distinction is important.
Furthermore, the edges can have only positive weights (denoting distance), or you can consider a scenario where edges can have positive or negative weights. Negative weights might not make sense in a travel map but they occur in other domains. For instance, if the network denotes people transacting money, negative edges might denote somebody being owed money.
The problem with negative edges is that if there is a cycle of negative edges, then you can traverse such a cycle indefinitely to lower your total cost indefinitely, so technically there is no shortest path.
Two popular algorithms for finding shortest paths are Dijkstra’s algorithm and the Bellman-Ford algorithm. Both algorithms are suitable when you have a single starting point (the source) and wish to find shortest paths to every other point or destination in the graph. Dijkstra’s algorithm cannot handle negative cycles but the Bellman-Ford algorithm can (meaning it will detect if such a cycle exists and exit the algorithm pointing out the existence of such a cycle).
5. Binary Search
The final algorithm we will consider is the binary search algorithm. Assume you have an array of numbers that is already sorted (this is important). Suppose we wish to find the location of a specific number.
For instance, consider the array [1,2,...,100], which contains 100 elements. In general the numbers will not be consecutive but let us assume that for ease of presentation.Suppose we wish to look for the number 67.
One way to do it is to go through the array sequentially but we can be smarter because we know that the array is already sorted.
Imagine how you will leaf through a book. Suppose I told you to goto page 500. You will not comb through the pages one by one. You will open up a page somewhere approximately in the middle and see what the page number is. If the page number is say, 300, you know that the page you are looking for is between the random page and the end of the book. Similarly, if the page number is say, 700, the page you are looking for is between the start page and the random page. This is the essence of binary search. At each step you approximately divide up the search space in half (hence the name binary).
Here is the code for binary search in Python.
This produces the output:
So you have learnt five different algorithms in this blogpost and they form the basis of many diverse applications in computer science. Write to us about how (and where) you find them useful! Checkout also our blogpost on sorting algorithms, like insertion sort, bubble sort, merge sort, and quicksort.
Kodeclik is an online coding academy for kids and teens to learn real world programming. Kids are introduced to coding in a fun and exciting way and are challeged to higher levels with engaging, high quality content.