Hi,
Anyone suggest/Explain code optimization, time complexity of program. I came across Big O(n), Big O(...). can anyone explain what these means and how to find these for programs.
--
Thanks in advance
Hi,
Anyone suggest/Explain code optimization, time complexity of program. I came across Big O(n), Big O(...). can anyone explain what these means and how to find these for programs.
--
Thanks in advance
Code optimization and time-complexity are separate issues (at least, IMO).
Algorithmic complexity
Time-complexity just means to analyse what is the relationship between the size of the data and the amount of time it will take to perform a given operation (algorithm) on that data. Typically, N usually represents the size of the data, like the number of elements in an array, number of nodes in a graph, number of words in a dictionary, or number of entries in a database, and so on. Of course, sometimes there are more than one such number to represent the size of the data, in which case other letters are used. Basically, the Big O just represents the fact that we don't care about the actual time that it takes (e.g., how many seconds to perform the algorithm on 1000 elements), but rather only care about the biggest functional relationship between time and data-size. So, if you see O(1), which we call "constant-time", it really means that the time taken to perform the algorithm is the same (constant) regardless of the amount of data. If you see O(N), which we call "linear-time", it means that the time taken to perform the algorithm grows as linear function of the size of the data (i.e., twice the size equal twice the time). And so on, you'll see things like O(N^2) (quadratic-time) or O(N^3) for more complex algorithms. The reason why you won't see things like O(N^2 + N + 1) is because there is usually an assumption that N is large, in which case, only the largest term dominates and the others can be discarded.
Also, in some other algorithms, especially "dynamic programming" algorithms, you'll see O(logN) which means usually indicates a "divide-and-conquer" approach, this time-complexity is called log-time, and the log is a base-2 logarithm (but it actually doesn't matter what the base is). The easiest algorithm to consider in this case is the binary-search algorithm which searches for a value in a sorted array by checking the middle element and searching into whichever half-range contains the element, and doing so recursively until the size of the range is 1. In this case, at every step, you split the search-range in half, so, how many times do you need to split a number in half to get to 1? It takes log(N) splits to reach 1, and that is why the algorithm has a time-complexity of O(logN), because it is going to perform logN steps before completing.
The way you figure out the time-complexity of an algorithm is essentially by considering the core step that is being done and trying to figure out how many times (in relation to how many elements in the data) that step will be performed. For example, if you initialize an array of N elements to zero, then the core step is setting an individual element to zero, and that core step will have to be done N times, and thus, the algorithm is O(N). Finding the average of a list of numbers would be the same, the core step is adding an element to the average-value, and that will be done N times, thus O(N). And so on. There are formal methods and stuff to calculate complexities in a more rigorous fashion, but most of the time it is easier to simply look at the algorithm (usually, the number of nested loops tells you a lot).
Another similar aspect to algorithm analysis is space-complexity (or memory-complexity), which is a similar thing but it relates the amount of additional memory storage required by the algorithm as a function of the size of the data. These will also be expressed with the Big O notation.
Finally, many algorithms to not have a fixed amount of steps, they are adaptive or dependent on the data. For example, some sorting algorithms will be very fast if the array is already sorted to begin with (usually in O(N) time, essentially just checking that the array is sorted), or if it is nearly sorted. In other words, sometimes the time that the algorithm will take to complete depends on the data it is given as input. If that is the case, we usually talk about the worst-case, average-case and best-case time-complexities. Basically, this just means what their names imply. The average-case and worst-case are the two that matter the most in general. In a sorting problem, for example, the best-case will usually occur if the array is already sorted, the worst-case might be when the array is sorted in reverse order, and the average-case might be just any random sequence of numbers. Of course, these cases depend on the application of the algorithm (the kind of data it receives).
And then, there are probabilistic algorithms (i.e., non-deterministic), but that is a whole other ball-park, with different methods to analyse their performance (or their rate of convergence to a solution).
Code optimization
Code optimization usually refers to the actual optimization of the implementation of a given algorithm or data structure. This is quite different from algorithmic complexity. In other words, while algorithmic complexity deals with the functional relationship between execution-time and data-size, as in O(N) (twice the data, twice the time), you could say that code optimization deals with the actual execution-time, which would be represented by the "constant-term" (or constant-factor) that multiplies the Big O function. For example, we know that an algorithm in O(N) means that execution-time = C * N
where C is some unknown constant. Code optimization deals mostly with trying to make that C constant-factor as small as possible.
This is highly empirical, meaning that you just test and tweak, and test and tweak, and so on, until you have the performance that you wanted (or that you can't improve further). There are tools like "profilers" that can be used for this purpose, mostly to identify which parts of the code takes the most time to execute when testing it, and then you can focus on improving that particular part. Because this requires that you test your program, you can't (and shouldn't) really do code optimization until you have a working version of your code. Once you have a working version, you can test the performance, identify problems, and optimize.
Code optimization has several layers and sub-areas. Mostly, people talk of nano-optimization, micro-optimization, and macro-optimization. Nano-optimization usually deals with instruction-by-instruction optimizations, usually when a particular function gets called millions of times per second, it can be worth it to squeeze every single clock-cycle out of it. Micro-optimization is far more common, it usually deals with trying to choose the correct data-structures and looping mechanisms, eliminating conditionals and temporaries, and things like that that generally improve the performance, without needing to actually examine the machine code and optimize each instruction. While macro-optimization is mostly a software-engineering problem of trying to structure the overall software such that you minimize expensive operations such as transferring data over the network or just avoiding having to do too much copying of data around the software, and so on. In some sense, you could say that nano-optimization is at the nano-second scale (about the time for a few instructions), micro-optimization is at the micro-second scale (about the time for an algorithm or look-up operation) and macro-optimization is at the second scale (about the scale of the noticeable time differences between a poor software design and a good one).
There is another area of code optimization which deals with memory models as well. The main term used is "cache optimization" which refers to minimizing the amount of cache-misses that occur when running a given algorithm. But memory optimization is a lot more than that, especially when it comes to concurrent and distributed/parallel computing models. This is quite a bit more complicated than other code optimization problems.
Complexity usually measured in terms of how the amount of a resource (time or space) required to run a program increases as the size of the program input increases.
Most common measure of complexity is the amount of time the program takes to run. If the algorithm of a program takes longer than the polynomial time(explained later), it is said to be intractable. If the algorithm takes lesser or within the polynomial time, it is said to be tractable.
O(1) = constant
O(log a n) = logarithmic
O(n) = linear
O(n log a n) = “n log n”
O(n^2) = quadratic
O(n^3) = cubic
O(n^r) = polynomial
O(a^n) = exponential
O(n!) = factorial
Anything that is larger than exponential is said to be exponential time algorithm or simply intractable. So, from constant to polynomial time algorithm can be classified as tractable while after polynomial can be classified as intractable. As the program goes down from constant to factorial, it has a reduced efficiency. I will provide an example of Big O notation.
(1) for(i = 0; i < n; i++)
{
(2) for(k = 0; k < n; k++)
{
(3) A[i,k] = 0;
}
}
Line (3) takes O(1) time
Line (2) takes O(n) time, since it executed according to the value of n. If the value of n is 5, the inner loop goes 5 times.
Line (1) takes O(n^2) time. If the the value of n is 2, the outer loop executes once and the inner loop executes twice. After the outer loop finishes the execution, it will have 4 execution counters. (4 times).
Good way to see this would be assigning a counter variable in the inner loop and printing the variable outside the loop.
Well, this is the summary of what i can explain. For more better understanding, look at the Hanoi Tower problem and Travelling Salesman problem.
Thank you friends.