## Fixing the problem with large powers

Equipped with all the toolboxes of asymptotic analysis, we will start optimizing our algorithm. However, since we have already seen that our program does not work properly for even moderately large values of power, let's first fix that. There are two ways of fixing this; one is to actually give the amount of space it requires to store all the intermediate products, and the other is to do a trick to limit all the intermediate steps to be within the range of values that the `long`

datatype can support. We will use binomial theorem to do this part.

As a reminder, binomial theorem says *(x+y) ^{n}*

*= x*

^{n}*+*

^{n}*C*

_{1}*x*

^{n-1}*y +*

^{n}*C*

_{2}*x*

^{n-2}*y*

^{2}*+*

^{n}*C*

_{3}*x*

^{n-3}*y*

^{3}*+*

^{n}*C*

_{4}*x*

^{n-4}*y*

^{4}*+ …*

^{n}*C*

_{n-1}*x*

^{1}*y*

^{n-1}*+ y*for positive integral values of

^{n}*n*. The important point here is that all the coefficients are integers. Suppose,

*r*is the remainder when we divide a by

*b*. This makes

*a = kb + r*true for some positive integer

*k*. This means

*r = a-kb*, and

*r*

^{n}*= (a-kb)*.

^{n}If we expand this using binomial theorem, we have *r ^{n}*

*= a*

^{n}*-*

^{n}*C*

_{1}*a*

^{n-1}*.kb +*

^{n}*C*

_{2}*a*

^{n-2}*.(kb)*

^{2}*-*

^{n}*C*

_{3}*a*

^{n-3}*.(kb)*

^{3}*+*

^{n}*C*

_{4}*a*

^{n-4}*.(kb)*

^{4}*+ …*

^{n}*C*

_{n-1}*a*

^{1}*.(kb)*

^{n-1}*± (kb)*.

^{n}Note that apart from the first term, all other terms have *b* as a factor. Which means that we can write *r ^{n}*

*= a*

^{n}*+ bM*for some integer

*M*. If we divide both sides by

*b*now and take the remainder, we have

*r*

^{n}*% b = a*

^{n}*% b*, where

*%*is the Java operator for finding the remainder.

The idea now would be to take the remainder by the divisor every time we raise the power. This way, we will never have to store more than the range of the remainder:

public static long computeRemainderCorrected(long base, long power, long divisor){ long baseRaisedToPower = 1; for(long i=1;i<=power;i++){ baseRaisedToPower *= base; baseRaisedToPower %= divisor; } return baseRaisedToPower; }

This program obviously does not change the time complexity of the program; it just fixes the problem with large powers. The program also maintains a constant space complexity.

### Improving time complexity

The current running time complexity is *O(2 ^{x}*

*)*, where

*x*is the size of the input as we have already computed. Can we do better than this? Let's see.

What we need to compute is *(base ^{power}*

*) % divisor*. This is, of course, the same as

*(base*

^{2})^{power/2}*% divisor*. If we have an even

*power*, we have reduced the number of operations by half. If we can keep doing this, we can raise the

*power*of

*base*by

*2*in just

^{n}*n*steps, which means our loop only has to run

*lg(power)*times, and hence, the complexity is

*O(lg(2*

^{x}*)) = O(x)*, where

*x*is the number of bits to store

*power*. This is a substantial reduction in the number of steps to compute the value for large powers.

However, there is a catch. What happens if the *power* is not divisible by *2*? Well, then we can write *(base ^{power}*

*)% divisor = (base ((base*

^{power-1}*))%divisor = (base ((base*

^{2})^{power-1}*)%divisor*, and

*power-1*is, of course, even and the computation can proceed. We will write up this code in a program. The idea is to start from the most significant bit and move towards less and less significant bits. If a bit with

*1*has

*n*bits after it, it represents multiplying the result by the base and then squaring

*n*times after this bit. We accumulate this squaring by squaring for the subsequent steps. If we find a zero, we keep squaring for the sake of accumulating squaring for the earlier bits:

public static long computeRemainderUsingEBS(long base, long power, long divisor){ long baseRaisedToPower = 1; long powerBitsReversed = 0; int numBits=0;

First reverse the bits of our `power`

so that it is easier to access them from the least important side, which is more easily accessible. We also count the number of bits for later use:

while(power>0){ powerBitsReversed <<= 1; powerBitsReversed += power & 1; power >>>= 1; numBits++; }

Now we extract one bit at a time. Since we have already reversed the order of bit, the first one we get is the most significant one. Just to get an intuition on the order, the first bit we collect will eventually be squared the maximum number of times and hence will act like the most significant bit:

while (numBits-->0){ if(powerBitsReversed%2==1){ baseRaisedToPower *= baseRaisedToPower * base; }else{ baseRaisedToPower *= baseRaisedToPower; } baseRaisedToPower %= divisor; powerBitsReversed>>>=1; } return baseRaisedToPower; }

We test the performance of the algorithm; we compare the time taken for the same computation with the earlier and final algorithms with the following code:

public static void main(String [] args){ System.out.println(computeRemainderUsingEBS(13, 10_000_000, 7)); long startTime = System.currentTimeMillis(); for(int i=0;i<1000;i++){ computeRemainderCorrected(13, 10_000_000, 7); } long endTime = System.currentTimeMillis(); System.out.println(endTime - startTime); startTime = System.currentTimeMillis(); for(int i=0;i<1000;i++){ computeRemainderUsingEBS(13, 10_000_000, 7); } endTime = System.currentTimeMillis(); System.out.println(endTime - startTime); }

The first algorithm takes 130,190 milliseconds to complete all 1,000 times execution on my computer and the second one takes just 2 milliseconds to do the same. This clearly shows the tremendous gain in performance for a large power like 10 million. The algorithm for squaring the term repeatedly to achieve exponentiation like we did is called... well, exponentiation by squaring. This example should be able to motivate you to study algorithms for the sheer obvious advantage it can give in improving the performance of computer programs.