1,105,288 Community Members

Assembly

Member Avatar
alanso
Newbie Poster
22 posts since Mar 2012
Reputation Points: 0 [?]
Q&As Helped to Solve: 0 [?]
Skill Endorsements: 0 [?]
 
0
 

Guys..My task is to convert for loop c++ to asm....and the speed shoud be fast....i managed to convert to asm the loop part but when i run and enter 2000 it shows 33 second is it possible to be lower the 5 second....

#include <iostream>
#include <ctime>

using namespace std;

void main ()
{
    time_t start, end;
    unsigned short a, b, c, y, count=0;
    float diff;

    cout << "Enter y : ";
    cin >> y;
    cout << "Calculation start..." << endl;
    start = clock();

     //this is loop part which need to change to increase the speed

    _asm{

            mov dx,y    // Set value of y to dx
            mov ax,0    // AX as a, start from 0
    set1:   mov bx,0    // BX as b, start from 0 / reset to 0
    set2:   mov cx,0    // CX as c, start from 0 / reset to 0

    again:  push ax     // Store AX value to Stack
            add ax,ax   // This 3 line to calculate the
            add ax,bx   // value of 2a + b - c
            sub ax,cx   // 

    again2: cmp ax,dx   // Compare the value (2a + b - c) with y (DX)
            jne skip    // IF not equal then jump to skip
            inc count   // Else IF equal then count++; 

    skip:   cmp cx,dx   // Comparing c (CX) with y (DX)
            jge up      // IF c >= y, jump to up
            inc cx      // Else c++;
            pop ax      // Restore the 'a' (AX) value form stack 
                        // (because AX value changed when calculate the value 2a+b-c)
            jmp again   // Jump to again to recount and recheck

    up:     cmp bx,dx   // Comparing b (BX) with y (DX)
            jge up2     // IF b >= y, jump to up2
            inc bx      // Else b++;
            pop ax      // Restore 'a' value.
            jmp set2    // Jump to set2 which will reset c value to 0

    up2:    pop ax      // Restore 'a' value
            cmp ax,dx   // Compare a (AX) with y (DX)
            jge stop    // IF a >= y, jump to stop
            inc ax      // Else a++;
            jmp set1    // Jump to set1 which will reset b value to 0

stop:
    }


    //Do not change the code below
    end = clock();
    diff = (float)(end - start)/CLOCKS_PER_SEC; 
    cout << "Calculation complete." << endl;
    cout << "Time used is " << diff << " second" << endl;
    cout << "There are " << count << " combination to produce " << y << endl;

    system ("pause");

}
Member Avatar
Banfa
Master Poster
798 posts since Mar 2010
Reputation Points: 582 [?]
Q&As Helped to Solve: 132 [?]
Skill Endorsements: 12 [?]
Featured
 
1
 

You have used more loops than is necassary. The thing is once you have chosen a value of a and b you can calculate the value of c that solves the equation 2a + b - c = y. You do not need a loop that checks every value of c.

In fact since you just want a count of solutions you don't even need to calculate the value of c all you need to check is that the value would be in the right range 0 <= c <= y.

Written in c++ it looks something like

for(a=0; a<=y; a++)
{
    for(b=0; b<=y; b++)
    {
        if ((((2 * a) + b - y) <= y) && (((2 * a) + b) >= y))
        {
            ++count;
        }
    }
}

This takes about 0.019s to run or if you switch on the optomiser 0.006s for an input of 2000. (NOTE with 3 loops the C++ solution was taking around 33s to complete or 20s under optomisation).

Interestingly the calculate c and then check it's in the right range solution

for(a=0; a<=y; a++)
{
    for(b=0; b<=y; b++)
    {
        c = (2 * a) + b - y;

        if (0 <=c && c <=y)
        {
            ++count;
        }
    }
}

takes 0.039s with no optomisation (slower) about 0.004s with optomisation (faster) which kind of suggests the optomiser finds it easier to optomise simpler code (which makes sense).

Member Avatar
alanso
Newbie Poster
22 posts since Mar 2012
Reputation Points: 0 [?]
Q&As Helped to Solve: 0 [?]
Skill Endorsements: 0 [?]
 
0
 

owh ok thanks alot but i want it in assembly....

Member Avatar
rubberman
Senior Poster
3,984 posts since Mar 2010
Reputation Points: 513 [?]
Q&As Helped to Solve: 500 [?]
Skill Endorsements: 87 [?]
 
1
 

The algorithm won't change - only the language. Consider Banfa's post as pseudo-code and convert that to assembler... :-)

Member Avatar
Banfa
Master Poster
798 posts since Mar 2010
Reputation Points: 582 [?]
Q&As Helped to Solve: 132 [?]
Skill Endorsements: 12 [?]
Featured
 
1
 

Alternatively consider that your assembly routine takes 33s, my original C++ port of your assembly routine without optomisation also takes 33s but with optomisation takes 20s and accept that the C++ compiler and optomiser does a better job than you of producing assembly/machine code and just use the C++ implementation with optomisation switched on.

Member Avatar
alanso
Newbie Poster
22 posts since Mar 2012
Reputation Points: 0 [?]
Q&As Helped to Solve: 0 [?]
Skill Endorsements: 0 [?]
 
-1
 

owh ok thanks for comments....is there any online converter available C++ to assembly...i need to convert this to assembly now...

for (a=0; a<=y; a++) {
    for (b=0; b<=y/2; b++) {
       for (c=0; c<=y/3; c++) {
           if ((a+2*b+3*c) == y)
count++;
       }
   }
 }
Member Avatar
Banfa
Master Poster
798 posts since Mar 2010
Reputation Points: 582 [?]
Q&As Helped to Solve: 132 [?]
Skill Endorsements: 12 [?]
Featured
 
0
 

Firstly I still recon you have 1 loop too many in that code. Again after the second loop you can calulate c and then check it is in the correct range. A bit of maths and an if statement is always generally going to be quicker than a loop.

To answer your question many C++ compilers can actually output the assembler for the code the are compiling, for example by using the -S switch with gcc the compiler outputs assembler to <filename>.s

Member Avatar
alanso
Newbie Poster
22 posts since Mar 2012
Reputation Points: 0 [?]
Q&As Helped to Solve: 0 [?]
Skill Endorsements: 0 [?]
 
0
 

owh ok thanks anyway is this correct according to c++ code...and when i input 2000 i still got 30sec....i want to get atleast 5 or lower is it possible?

 _asm{ 

            mov dx,y    // Set value of y to dx 
            mov ax,0    // AX as a, start from 0 
    set1:   mov bx,0    // BX as b, start from 0 / reset to 0 
    set2:   mov cx,0    // CX as c, start from 0 / reset to 0 

    again:  push ax     // Store AX value to Stack 
            add ax,ax   // This 3 line to calculate the 
            add ax,bx   // value of 2a + b - c 
            add ax,cx   //  

    again2: cmp ax,dx   // Compare the value (2a + b - c) with y (DX) 
            jne skip    // IF not equal then jump to skip 
            inc count   // Else IF equal then count++;  

    skip:   cmp cx,dx   // Comparing c (CX) with y (DX) 
            jge up      // IF c >= y, jump to up 
            inc cx      // Else c++; 
            pop ax      // Restore the 'a' (AX) value form stack  
                        // (because AX value changed when calculate the value 2a+b-c) 
            jmp again   // Jump to again to recount and recheck 

    up:     cmp bx,dx   // Comparing b (BX) with y (DX) 
            jge up2     // IF b >= y, jump to up2 
            inc bx      // Else b++; 
            pop ax      // Restore 'a' value. 
            jmp set2    // Jump to set2 which will reset c value to 0 

    up2:    pop ax      // Restore 'a' value 
            cmp ax,dx   // Compare a (AX) with y (DX) 
            jge stop    // IF a >= y, jump to stop 
            inc ax      // Else a++; 
            jmp set1    // Jump to set1 which will reset b value to 0 

    stop:

stop:
    }
Member Avatar
Banfa
Master Poster
798 posts since Mar 2010
Reputation Points: 582 [?]
Q&As Helped to Solve: 132 [?]
Skill Endorsements: 12 [?]
Featured
 
1
 

That is because you still have 3 loops, get rid of the third loop as I showed you. The problem with your implementation is that because of the nested loops it has a complexity of N^3. By removing a loop you reduce the complexity to N^2 which makes a huge difference to execution time.

It is slow because of the number of nested loops you use.

If I just compile the C++ code I get the following runtimes

                      Compiler      Execution
Code                  Optomised       Time
Your Original Code      No            7.237s
Your Original Code      Yes           1.144s
Rewritten With 2 Loops  No            0.018s
Rewritten With 2 Loops  Yes           0.002s

You ought to be able to get at least as good as the unoptomised compilation of the C++ code.

Also the code in assembly at lines 9-10 does not implement either the equation of you C++ code or the equation in the comments.

You
This article has been dead for over three months: Start a new discussion instead
Post:
Start New Discussion
View similar articles that have also been tagged: