Hi,

I wrote this implementation of Sieve of Eratosthenes, is there any way I could eek some more performance out of it. As in opitmising data types, calculations etc.

#include <iostream> // cout
#include <math.h> // sqrt
#include <time.h> // clock/CLOCKS_PER_SEC
#include <stdlib.h> // malloc/free

using namespace std;

int main(int argc, const char * argv[]) {
    size_t n = 1000000000;
    int i;
    int j;
    int k;
    int s;
    int c;
    int sr;
    int * a = (int*)malloc(n * sizeof(int));
    clock_t t = clock();

    for(i = 0; i < n; i++) {
        a[i] = 1;
    }

    sr = sqrt(n);
    for(i = 2; i <= sr; i++) {
        if(a[i]) {
            s = i * i;
            for(k = 0, j = 0; j <= n; j = s + (k * i), k++) {
                a[j] = 0;
            }
        }
    }

    t = clock() - t;

    c = 0;
    for(i = 2; i < n; i++) {
        if(a[i]) {
            c++;
        }
    }

    cout << fixed << "Calculated " << c << " prime numbers up to " << n << " in " << t << " clocks (" << ((float)t) / CLOCKS_PER_SEC << " seconds).\n";

    free(a);

    return 0;
}

Currently, I get the following performance:
"Calculated 78498 prime numbers up to 1000000 in 7634 clocks (0.007634 seconds)."
"Calculated 664579 prime numbers up to 10000000 in 130416 clocks (0.130416 seconds)."
"Calculated 50847534 prime numbers up to 1000000000 in 19063075 clocks (19.063076 seconds)."

Thanks in advance.

BTW, those times are with clang++ main.cpp -O3, although not sure if -O2 would even make a difference (never tested).

Machine:
2013 15" MacBook Pro w/:

i7 2.4GHz 4-core.
8GB 1600MHz RAM.

And a quick question on the side, would a processor with a higher clock (i.e. P4) be any faster than the i7? Or are there other factors like instruction-set efficiency and processor efficiency? Never quite been sure on that one.

Again, thanks.

Back in the 1980's I wrote a seive using an array of pre-computed values from 1 to 10,000. Then using a recursive algorithm I could look up primes with almost linear time, and this was on an old Compaq 8088 with 256K of ram... I used doubles so I could determine if a number was prime up to 15+ digits. I later used that code for some work I did on Goedel numbers, strings represented as the products of prime numbers - very relevant work to breaking public key encryption. I stopped because I didn't have the time to build an arbitrary math module that would have been required to exceed the capability of double precision math - Boost didn't exist back then!

Anyway, I'd post the code here if I had time to dig the floppy it lives on somewhere in my storage space... And I'm not sure I still have a functional floppy drive. :-)

Edited 1 Year Ago by rubberman

I'm still rocking a floppy at work but it is a USB external floppy. The machine I have also has a serial port and that was hard to come by. When will computer manufactures realize that stuff built in the 80's-90's still works great today?

The only way to know for sure is to profile the code and identify the hot spots. When trying to improve performace, it is very, very, very, very common for a hunch or guess to be just plain wrong.

Bang out the performance profiling tools (on linux, I cannot recommend using Brendan Gregg's flamegraphs highly enough; http://www.brendangregg.com/ ), build a release version with debug symbols in, and profile your code. Also, take a look at your algorithm and see if there's anything that can be done in parallel; if you've got more than one core, you might as well use them.

Edited 1 Year Ago by Moschops

This article has been dead for over six months. Start a new discussion instead.