Proč se učit datové struktury a algoritmy?

V tomto článku se naučíme, proč by se každý programátor měl pomocí příkladů naučit datové struktury a algoritmy.

Tento článek je určen těm, kteří se právě začali učit algoritmy a přemýšleli o tom, jak účinné bude posílení jejich kariérních / programovacích dovedností. Je také pro ty, kteří se diví, proč velké společnosti jako Google, Facebook a Amazon najímají programátory, kteří jsou mimořádně dobří v optimalizaci algoritmů.

Co jsou to algoritmy?

Neformálně není algoritmus nic jiného než zmínka o krocích k vyřešení problému. Jsou v podstatě řešením.

Například algoritmus k řešení problému faktoriálů může vypadat asi takto:

Problém: Najděte faktoriál čísla n

 Inicializace faktu = 1 Pro každou hodnotu v v rozsahu 1 až n: Vynásobte fakt faktem v fakt obsahuje faktoriál n

Zde je algoritmus napsán v angličtině. Pokud by byl napsán v programovacím jazyce, místo toho bychom jej nazvali kódováním . Tady je kód pro nalezení faktoriálu čísla v C ++.

 int factorial(int n) ( int fact = 1; for (int v = 1; v <= n; v++) ( fact = fact * v; ) return fact; )

Programování je vše o datových strukturách a algoritmech. K uchovávání dat se používají datové struktury, zatímco k řešení problému s těmito daty se používají algoritmy.

Datové struktury a algoritmy (DSA) podrobně procházejí řešením standardních problémů a poskytují vám přehled o tom, jak efektivní je jejich použití. Také vás naučí vědu o hodnocení účinnosti algoritmu. To vám umožní vybrat si to nejlepší z různých možností.

Používání datových struktur a algoritmů k tomu, aby byl váš kód škálovatelný

Čas je drahocenný.

Předpokládejme, že Alice a Bob se pokouší vyřešit jednoduchý problém s nalezením součtu prvních 10 ¹¹ přirozených čísel. Zatímco Bob psal algoritmus, Alice jej implementovala, což dokazuje, že je stejně jednoduché jako kritika Donalda Trumpa.

Algoritmus (Bob)

 Inicializujte součet = 0 pro každé přirozené číslo n v rozsahu 1 až 1011 (včetně): přidejte n k součtu součet je vaše odpověď

Kód (Alice)

 int findSum() ( int sum = 0; for (int v = 1; v <= 100000000000; v++) ( sum += v; ) return sum; )

Alice a Bob pociťují ze sebe euforii, že by mohli postavit něco svého vlastního téměř za okamžik. Pojďme se vplížit do jejich pracovního prostoru a poslouchat jejich konverzaci.

 Alice: Pojďme spustit tento kód a zjistit součet. Bob: Spustil jsem tento kód před několika minutami, ale stále nezobrazuje výstup. Co je s tím špatně?

Jejda! Něco se pokazilo! Počítač je nejvíce deterministický stroj. Návrat zpět a pokus o opětovné spuštění nepomůže. Pojďme tedy analyzovat, co se děje s tímto jednoduchým kódem.

Dva z nejcennějších zdrojů pro počítačový program jsou čas a paměť .

Doba potřebná pro spuštění kódu počítačem je:

 Čas pro spuštění kódu = počet instrukcí * čas pro provedení každé instrukce

Počet pokynů závisí na použitém kódu a čas potřebný k provedení každého kódu závisí na vašem počítači a kompilátoru.

V tomto případě je celkový počet provedených instrukcí (řekněme x) , což jex = 1 + (10¹¹ + 1) + (10¹¹) + 1x = 2 * 10¹¹ + 3

Předpokládejme, že počítač může provést pokyny za jednu sekundu (může se lišit v závislosti na konfiguraci stroje). Doba potřebná ke spuštění nad kódem jey = 10⁸

 Doba potřebná ke spuštění kódu = x / y (více než 16 minut)

Je možné optimalizovat algoritmus tak, aby Alice a Bob nemuseli čekat 16 minut při každém spuštění tohoto kódu?

Jsem si jistý, že jste již uhodli správnou metodu. Součet prvních N přirozených čísel je dán vzorcem:

 Součet = N * (N + 1) / 2

Jeho převod do kódu bude vypadat asi takto:

 int součet (int N) (návrat N * (N + 1) / 2;)

This code executes in just one instruction and gets the task done no matter what the value is. Let it be greater than the total number of atoms in the universe. It will find the result in no time.

The time taken to solve the problem, in this case, is 1/y (which is 10 nanoseconds). By the way, the fusion reaction of a hydrogen bomb takes 40-50 ns, which means your program will complete successfully even if someone throws a hydrogen bomb on your computer at the same time you ran your code. :)

Note: Computers take a few instructions (not 1) to compute multiplication and division. I have said 1 just for the sake of simplicity.

More on Scalability

Scalability is scale plus ability, which means the quality of an algorithm/system to handle the problem of larger size.

Consider the problem of setting up a classroom of 50 students. One of the simplest solutions is to book a room, get a blackboard, a few chalks, and the problem is solved.

But what if the size of the problem increases? What if the number of students increased to 200?

The solution still holds but it needs more resources. In this case, you will probably need a much larger room (probably a theater), a projector screen and a digital pen.

What if the number of students increased to 1000?

The solution fails or uses a lot of resources when the size of the problem increases. This means, your solution wasn't scalable.

What is a scalable solution then?

Consider a site like Khanacademy, millions of students can see videos, read answers at the same time and no more resources are required. So, the solution can solve the problems of larger size under resource crunch.

If you see our first solution to find the sum of first N natural numbers, it wasn't scalable. It's because it required linear growth in time with the linear growth in the size of the problem. Such algorithms are also known as linearly scalable algorithms.

Our second solution was very scalable and didn't require the use of any more time to solve a problem of larger size. These are known as constant-time algorithms.

Memory is expensive

Memory is not always available in abundance. While dealing with code/system which requires you to store or produce a lot of data, it is critical for your algorithm to save the usage of memory wherever possible. For example: While storing data about people, you can save memory by storing only their age not the date of birth. You can always calculate it on the fly using their age and current date.

Examples of an Algorithm's Efficiency

Here are some examples of what learning algorithms and data structures enable you to do:

Example 1: Age Group Problem

Problems like finding the people of a certain age group can easily be solved with a little modified version of the binary search algorithm (assuming that the data is sorted).

The naive algorithm which goes through all the persons one by one, and checks if it falls in the given age group is linearly scalable. Whereas, binary search claims itself to be a logarithmically scalable algorithm. This means that if the size of the problem is squared, the time taken to solve it is only doubled.

Suppose, it takes 1 second to find all the people at a certain age for a group of 1000. Then for a group of 1 million people,

the binary search algorithm will take only 2 seconds to solve the problem
the naive algorithm might take 1 million seconds, which is around 12 days

The same binary search algorithm is used to find the square root of a number.

Example 2: Rubik's Cube Problem

Imagine you are writing a program to find the solution of a Rubik's cube.

This cute looking puzzle has annoyingly 43,252,003,274,489,856,000 positions, and these are just positions! Imagine the number of paths one can take to reach the wrong positions.

Fortunately, the way to solve this problem can be represented by the graph data structure. There is a graph algorithm known as Dijkstra's algorithm which allows you to solve this problem in linear time. Yes, you heard it right. It means that it allows you to reach the solved position in a minimum number of states.

Example 3: DNA Problem

DNA is a molecule that carries genetic information. They are made up of smaller units which are represented by Roman characters A, C, T, and G.

Imagine yourself working in the field of bioinformatics. You are assigned the work of finding out the occurrence of a particular pattern in a DNA strand.

It is a famous problem in computer science academia. And, the simplest algorithm takes the time proportional to

 (number of character in DNA strand) * (number of characters in pattern)

A typical DNA strand has millions of such units. Eh! worry not. KMP algorithm can get this done in time which is proportional to

 (number of character in DNA strand) + (number of characters in pattern)

The * operator replaced by + makes a lot of change.

Considering that the pattern was of 100 characters, your algorithm is now 100 times faster. If your pattern was of 1000 characters, the KMP algorithm would be almost 1000 times faster. That is, if you were able to find the occurrence of pattern in 1 second, it will now take you just 1 ms. We can also put this in another way. Instead of matching 1 strand, you can match 1000 strands of similar length at the same time.

And there are infinite such stories…

Final Words

Generally, software development involves learning new technologies on a daily basis. You get to learn most of these technologies while using them in one of your projects. However, it is not the case with algorithms.

Pokud neznáte dobře algoritmy, nebudete schopni určit, zda můžete optimalizovat kód, který právě píšete. Očekává se od nich, že je předem znáte a použijete je, kdykoli je to možné a kritické.

Konkrétně jsme hovořili o škálovatelnosti algoritmů. Softwarový systém se skládá z mnoha takových algoritmů. Optimalizace kteréhokoli z nich vede k lepšímu systému.

Je však důležité si uvědomit, že to není jediný způsob, jak zajistit škálovatelnost systému. Například technika známá jako distribuované výpočty umožňuje nezávislým částem programu běžet na více strojích společně, čímž je ještě více škálovatelná.