A few timing facts, for the curious. These were all run on the same PC, which was otherwise not busy.

The "A" algorithm, was the one that used the Sieve of Eratosthenes, but not the best one from CSE (it was getting 4.65 on the IDE and 4.0 on the terminal). It was "almost as good as CSE's version", but it was much simpler to test with.

Note: A "terminal" is sometimes called a Command Window, in Windows.

This included an aux1 (auxiliary) array (sometimes called Big[]), that held LOTS of prime numbers (a few hundred to a few thousand), to try and get a feel for the effect on the timing. Data shows the aux1 array was helpful, but not very much. At the largest number of primes we could use for the challenge, it could save 0.3 seconds, only (on my system).

The "B1" algorithm used no aux1 or big array of primes, at all. The k queries were sorted by Quicksort, and printed as they were generated, inside the loop that finds the primes - one number per call to printf().

The "B2" version of the algorithm is similar to the B1 version except the answers are not printed right away. They're stored in an array, and then printed out with multiple numbers being printed, in a single call to printf(). I'm experimenting with printing 125 numbers per call to printf(). Also, the query (kth) numbers are not sorted.

I believe the sorting would have the challenge test fail because the replies would be in the wrong (sorted), order. I could correct it, but it simply isn't fast enough.

I believe we know what the answer is, and we just have to find the best way to use it. My next test should be the best, using the B2 version.

With the "A" Algorithm:
   With Big[]or aux1 array, common IDE times are    5.40 seconds
   with one number per call to printf().
   With Big or aux1 array, printing 128 numbers per 
   call to printf(),(in the terminal) times are     2.61 seconds
       Inside the IDE: . . . . . . . . . . . . . .  3.48 seconds

With the "B" Algorithm:
   Has no big or aux1 array at all. Each number is 
   printed with one call to printf(). in the IDE    
   The kth numbers are all sorted first.            5.38 seconds

   Same as directly above, but in the terminal
   instead of in the IDE.                           4.58 seconds                   

   Same as above, EXCEPT, 1.the print calls are not   Should be good! ;)
   made one at a time. Many numbers are printed
   with each call to printf(), and 2. The kth 
   query numbers, are not sorted.

I don't expect to use the aux1[] array again. More hassle than it's worth, imo.
File size is currently (with the B algorithm), below 5KB, so no problem with our size.

I'll be doing the B2 test, this evening, after I finish modifying the current code.

Have you tried the "print many numbers with one printf() call", idea, in your program yet? If so, how did it perform?

The testing info on the B algorithm, second version is: 3.12 seconds in the terminal. This is the version intended to have the "B" algorithm, but it didn't quite work that way! ;)

Since the sorting idea went sour (a great idea, but it gave the results out of their original order, which I'm sure is what the test challenge requires), there was no way to directly get the data, organized as efficiently as I'd have liked. You either had to transfer the data (an extra step I did not want to have to make), or print through an index (which is just a touch slower and it adds up). I chose the latter.

Still using the very large printf() call (now at 250 numbers being printed per one call to printf() ).

Size of the file is 6.8 KB now, including some few comments that could be removed, and the few lines of code to communicate with the testing PC, that I have added.

So size is no problem.

So the overall champ -- and still king of the Challenge, is the A algorithm, with the smaller aux1 array, and using huge print calls. It's also a pain to get just right. Overall, I prefer the B algorithm. It's much easier to get working right.

I am not clearly getting you adak what you are trying to do with the sorting approach, but your quickshort is really fast,

Please look on to the Segmented seive algo it will make task approachable and make seive algo much more fast.
For accessing element there is nothing much problem, problem is in making seive algo fast..
so, divide and run the sieve algorithm into segments.
means first 100000, then next 100000
so on for a range only,
I am thinking and trying this.
please any one of you look into it.(Adak, smeagle, Diwakar)

I am not clearly getting you adak what you are trying to do with the sorting approach, but your quickshort is really fast,

Please look on to the Segmented seive algo it will make task approachable and make seive algo much more fast.
For accessing element there is nothing much problem, problem is in making seive algo fast..
so, divide and run the sieve algorithm into segments.
means first 100000, then next 100000
so on for a range only,
I am thinking and trying this.
please any one of you look into it.(Adak, smeagle, Diwakar)

That isn't Quickshort - Quickshort is only about 6 lines of code, in the whole thing! (But it's not this fast). This is a full Quicksort, and it is fast, BUT you can't use an unsigned int for any index. It has to be a regular int.

Anyway, yes! The sorting idea is out the window for this program ==> Forget it, it's Toast! ;)

I did look into both Segmented Sieve of Eratosthenes, and also Wheeled versions. The problem is, they were no faster in the tests I ran against an optimized Sieve of Era., without the segments or wheel aspects to them.

I didn't have time to look at several different examples of each, so there may be faster Segmented or Wheeled versions of the Sieve of Era. out there. I was more interested in getting a good basic version that was easy to understand fully, and then tear it apart and rework it different ways, for testing ideas out - and optimizing those ideas that reduced the run time.

The key to making a fast program is to always reduce any loops, as low as possible. In the Sieve of Eratosthenes, I see these loops:

1) Input from the testing PC. These are a queries number, and the actual kth numbers we have to print out the corresponding prime numbers for, later.

We can't cut down or reduce this loop, in any way. That's clear.

2) The markup of the Sieve array. I've tried a few faster ways to do this, but some have caused errors. I do this in three steps now:

First: I do multiples of 2 only, from i=4 to i< SieveSize with i+=2.
Second: I do all the other multiples, from i=3 to i< SieveSize with i+=2.

The reason is, if two is included, i can only be incremented in the outer loop, by 1.

Third: multiples of i. No division is used. Addition where possible instead of multiplication. j is my variable for this, and it's range is i+i, to j< SieveSize. Especially here, I've tried some speedups (cutting down on the top of the loop - replacing SieveSize here, with something less than SieveSize). And that's where either it makes no difference in run time (because the amount I cut down was rather small), or it produced errors.

Inside the markup outer loop, there is one test for !sieve and if so, a counter increments, and there is one assignment made into the primes array (primes[counter]=i.

That is the entire loop, in a nutshell. Maybe 4 lines of code or so.

3) The last main loop, is the print loop, and that's the one where printing multiple numbers with each call to printf(), helps out. You should try printing a hundred or more numbers at a time. Make the number you print per loop, a factor of 50,000. This one idea cut more than a full second off my run-times.

Tip: instead of using this print statement:

printf("%d\n%d\n%d\n%d\n...etc.", primes,primes[i+1],primes[i+2],primes[i+3]...etc);

Try this:

printf("%d\n%d\n%d\n%d\n...etc.", primes[++i],primes[++i],primes[++i],primes[++i]...etc);

I leave the 0 element in primes[] blank, so this works well. Easy to just copy and paste, since there are no individual numbers (1,2,3,4, etc.) inside the index, to have to deal with.

This is the idea.
[ Note:
My current program prints using a more complicated index, which would make no sense to post, but here it is: primes is shortened to just p to save space, and queue array is shortened to just q:

p[q[++i]], p[q[++i]], etc.

It won't make sense unless you are used to working through an indirect index. The p[] array has the prime numbers, but doesn't know what order to print them out in. It gets that info from the q[] array, using it as it's counter.]

Imo, that's fast looping, and you can't make it any faster without going bit level, or assembly on it, but that is from testing on MY PC, and your system may be similar, but have a very different result, due to a different hardware design. My times are 2.9 to 3.0 seconds, using an external timer (always a tad slower that way).

@CSE: If you have a Segmented or Wheeled version you want compared, I'll be glad to run it on my testing PC, so we can see what it can do. Post up the url to it, or the code.

This is the best "B" Algorithm speed that I can manage right now. It takes 2.6 seconds on my PC, and a lot of that speed up is due to the very large printf() calls - each one is printing out 500 numbers at a time.

I'm going to go back to the "A" Algorithm now, and see if I can tweak a bit more, using what I've learned from working on the B algorithm programs.

This is a test in the command window, with an external timer (ptime.exe), with data being redirected using stdin.

@Smeagel, you've been quiet lately, any status report?

CAUTION: I found out the gcc C compiler does NOT like the increment of i in the printf() statment. So this:

printf("%d\n%d\n",primes[[B]++i[/B]], primes[[B]++i[/B]])

exprression, should be avoided.

I'm trying out the rather tired primes[i+1],primes[i+2],primes[i+3]
expression in a printf() statement, to see how that works.

It worked fine for me on Pelles C, but they're testing on the gcc compiler, and testing the programs, by running them on a battery powered abacus, I swear! ;)

Yes, I submitted a B(2) algorithm program for testing - "time limit exceeded". It had run in 2.6 seconds, repeatedly. Now it can't run in under 10 seconds??

C'mon! Hamster running low on food? Timex Sinclair PC getting overheated?? Grrr!

Interesting - I tried to run my program twice on Sphere Online Judge's website - both times it exceeded the time limit of 10 seconds.

Then I tried it twice on IDE One. Both times they reported it had a run-time error.

Which leaves me wondering what's going on with this? The program runs fine time after time, and it's run time is between 2.6 and 4.2 seconds, depending on the version of it.

So I went into the forum and read up on the testing hardware - They're PIII Xenon's!!

My compiler is modern (just updated this year), and naturally I have no idea of how to make a program compile correctly for PIII's! They're using gcc's compiler, which should be OK for the PIII, but I've never used the gcc compiler.

I'm not confident I can meet this challenge on a PIII based system.

Not much new to report. I tweaked my program a bit. It runs in 2.62 seconds, but that number depends on the number of rows in the command window. A larger window takes longer to scroll up all those 50K numbers. This version was printing out 250 numbers with each call to printf(). Indeed, the entire program code has very few lines that AREN'T part of the print loop.

With a very small window (just a couple of rows), run times decrease to less than 1 second. Kind of fun to watch. ;)

Unfortunately, the testing rigs PIII cpu was not impressed when it ran it, so it declared "Time limit exceeded". Such a buzz killer, that testing rig. :(

I have one more speed up idea in mind. (I keep saying that, one more, just one more). This is it:

the kth numbers come in, and go into the query array - call it q[]. The prime numbers, as they are found by using the Sieve, go into the primes array, call it p[].

When I have all the primes, now I have to just print them out, but it has to be done in the order requested by the queries (that was the shortcoming of the sort idea, mentioned earlier in this thread).

To print with the right order, I have been using:
printf("%d\n%d\n",p[q-1],p[q[i+1]-1]); and so on.

Now, I'm going to change this printing to a more direct p1 type printing. p1 will receive it's values like so: p = p[q-1], in the loop that finds all the primes. That will slow down the loop, but hopefully the speed up in the printing, will more than make up for it.

After that, it's all about fast printing to stdout, which I've never worked with. Who puts out reams of data like our 50,000 primes, to stdout? It scrolls off the screen to fast to even read, let alone use.

The algorithm and code to generate the primes is a screamer, and I don't know of anything faster.

Low level i/o to the screen, (maybe binary, maybe just unix style i/o) is the next step after this one above. I've been checking out some info on this, but I'm not quite ready to declare printf() toast. Not quite yet.

I see where Diwakar has made several test submissions. Any new idea's you're trying, Diwakar?

The idea in the last post I made, showed no improvement in the run time.

Next idea was to assign the top several prime numbers, directly into the primes[] array. Due to the source file size limitation for this Challenge, it was of no benefit, either.

Ho Hum - another fine idea, shot down! ;)

I'm stuck at 2.66 seconds (on my PC), which is too slow on the testing PC (max time is 10 seconds). (With a "normal" command window size with space for 24 vertical rows. Less rows, means shorter run time.)

Next idea is clearly a way to speed up the printing to the console. I believe I have finally exhausted every other idea, for the moment.

I see where Diwakar has made several test submissions. Any new idea's you're trying, Diwakar?

No, I just couldn't believe it was showing "time limit exceeded " for the code I submitted.But yes,I tweaked it a little bit,didn't help much.I replaced the multiplications with bit shifts, did a little work to reduce the redundancy during marking off composites and printed the primes at a tab distance rather than one at each line.

void sive(int size) {
    int i,j,k,l;
    char *sieve = calloc(size, 1);
    for (i=2,k=1,l=2; i*i <= size; i=(k<<1)+1,k++,l=i<<1) {
               if (!sieve[i]) {
                       for(j = i*i; j < size; j+=l) {sieve[j]=1; }
    for (i=2,j=0; i<size; i++) {
        if (!sieve[i]) { sieve[j++]=i; }


But I'm not out of ideas yet.You may have seen guys mentioning Segmented sieve in the forum there.I haven't understood the concept thoroughly, though I have read about it couple of times but you know all those mathematical symbols drive me nuts.I remember someone wrote on one of the other forums that you first produce a list of primes within a small range and use those primes intelligently to produce other larger primes.Hmm, what does that mean?
And also about the Sieve of Eratosthenes, We're now using a whole integer (of the 'sieve' array) to represent a bit that marks composite, instead can we use just a single bit to represent that.
Once, I was very fascinated by the Wheel Factorization, later I came to know that the numbers produced by the method are not all prime numbers.May be we could take advice from other forum members, may be they have some new ideas to share!

Good to hear about your quest, Diwakar. ;)

I've been using this for my Sieve program:

for(i=3;i<86026253;i+=2){  //was < SieveSize
         ++pcnt; p[pcnt]=i; 
       for(j = i+i;j<86026253;j+=i)s[j] = 1;   //was < SieveSize

pcnt is the counter for the primes, as they are found. p[pcnt] is where the value of the prime is stored. That loop is as optimized as I can make it. I have an array of the Top[] prime numbers, that cuts down on some of the sieve looping, but it has minimal to no impact because of the small size of the file limit. If I minus one from the VALUE, in the i < VALUE, the sieve will be incorrect. That's true for both the top and the inner nested for loop VALUE, as well.

I'm using the multiple number prints with every call to printf() (using fprintf(stderr, or stdout, helps).

Just for fun, I submitted one version where it printed \r, instead of \n at the end of every number (Carriage return, but no line feed). It ran in less than 2 seconds on my PC. Failed - you guessed it - Time limit exceeded!! Holy Moly!

I held onto the idea that computing all the primes in the range, because it happened so fast, was the way to go. I'm now quite satisfied that is NOT the way to go, because I've optimized everything in the place, the pots and pans - the dirty dishes, -- you name it. LOL.

Right now, I'm looking into a faster way to print out the answers, but I don't have a lot of hope for that.

Early on, Smeagel had the idea of just calculating some of the primes, and then only calculating the primes that we needed to give the answers. We stopped looking into that when we saw how fast all the primes could be calculated - little dreaming that the testing rig would have a PIII cpu!

I have read Avinash's thread in the SPOJ forum, but there wasn't any info of any help. Bits can be used, but I'm not experienced with them beyond a very brief exposure years ago. Bit shifting is fine if you need it, but beyond that, I'd hate to get into bit work.

Thanks for checking in and keep in touch. New idea's have to be looked into, since the current one's are not enough to meet the challenge.

We all have done a lots of research in solving the question, I think we should try this question before solving the above question. I am concentrating in the question PRIME1 , if this question is solved then TDKPRIME will become easy to solve.
So please try the question in the link first.

With a nudge from Diwakar, I'm trying to implement a "poor man's" segmented Sieve of Eratosthenes. Since I did most of my beer drinking at home, not in a frat, that promises to bring us all to new heights of hilarity/despair and achievement. ;)

The implementations I've seen for it, as well as the descriptions, get my brain on overload. Anyway, the SORT is back in play, and this time it's the little QuickShort to the rescue:

void quickShort(int A[], int l, int u) {
  int i, m;
  if(l >= u) return;
  m = l;
  for(i = l+1; i<=u; i++) {
    if(A[i] < A[l])
      swap(A,++m, i);
  swap(A, l, m);
  quickShort(A, l, m-1);
  quickShort(A, m+1, u);
//With some help from swap():

void swap(int A[], int l, int m) {     
  int temp;
  temp = A[l];
  A[l] = A[m];
  A[m] = temp;

If it's not fast enough, I'll replace it, but it saves a lot of room.

I won't be sorting the queries, I'll be sorting an index to the queries, so the queries will be kept in their original order, but I'll be able to look at them for processing, in sorted order.

Hopefully, the program can find the biggest gaps quickly, and set the sieve loop finder so it can "jump" ahead, when it reaches these larger gaps. I divided 86 Million by 50,000, and it's only 1,720. So we may have a lot of gaps, but not a lot of really big gaps.

If we can store 400 "marker" positions, in the program, we'll only have a marker about every 40K primes. Any segment that we can jump over will be a big boost, but I'm not sure we'll have a lot of segments without queries.

I can code this up, (after a fashion), but getting it to be fast enough, and small enough, that will be a challenge.

So I'm thinking about how to design the next part of the "poor man's segmented Sieve of Eratosthenes" program, when I notice that every single darn one of the programs I see, all have a DIFFERENT top and bottom for loop, to mark the sieve.

OK, some programs are designed to print out every prime less than say, a million - and we're not doing that. They can use the square root rule for primes being <= the square root of the top number (a million in this example), and we can't use that method, since we need any prime requested, up to 86 Million. That is, THE PRIME, can be up to 86 million, that is NOT a top number to stop calculating the primes at.

Anyway, so I start printing up my Sieve array, after every time through the loop. Just what IS actually being marked?

And dammit! I've got a TON of useless EVEN numbers being marked as composite, even though the top loop deals only with odd numbers!! The bottom (nested) for loop, was happily wasting time marking out useless numbers!! FCOL!!

I'm still checking the total accuracy of the changes I made to remove that waste, but in preliminary testing:

1) it's 0.5 seconds faster (from 2.66 to 2.1 seconds), which is a healthy percent increase, and

2) it's accurate.

I'll post a screen shot of what I'm talking about after I get the accuracy confirmed.

I don't know, and don't believe it's possible, to remove every repetition in the Sieve marking loops, because when 5 is marked, it will mark out 35 for instance. Now when 7 is marked, 35 will be marked out again. You can stop the second marking, but only by wasting more time than it takes to actually mark 35 with a one, for a second time.

So CHECK THOSE SIEVE LOOPS, and be sure that:

1) in the top loop, you are only working with odd numbers, never even.


2) in the nested lower loop, you are marking out ONLY odd numbers, never even

Naturally, in your output of primes, you will ONLY consider odd numbers for primes, because with this scheme, the even numbers will look like primes, also. Don't look at the even numbers, after 2!

It checked out as accurate! ;)

Here's the screen shot of the sieve being marked, with an upper limit of 65, so it can be studied.

The number on the far left, is the number whose multiples are being marked as composite numbers. The composite numbers being marked are stretched out to the right hand side, on the row. 63 wrapped around, it's being marked as 3 X 21.

You can see there are NO even numbers now being marked - and that's a nice efficiency gain. You can also see the several numbers being marked twice - like the 35 I mentioned in the last post. (Since it is a multiple of 5 and a multiple of 7 also).

Check your sieve marking loops, and see how your program handles it. I thought since my loops made sense, were quick, and accurate, that they were optimal. They certainly were not!

It's still too slow to pass the SPOJ TDKPRIME test, but it's the fastest version I've had, so far. (2.1 seconds on my PC).

Hey adak I just noticed you found the solution for this problem, I am still worried about this question but don't know how to convert sieve algo to segmented sieve..
Please help me how u did this problem..:-/

Hey adak I just noticed you found the solution for this problem, I am still worried about this question but don't know how to convert sieve algo to segmented sieve..
Please help me how u did this problem..:-/

We have good news and bad news:

Good news:
Passed the challenge in good time (7.9 seconds).

Bad news:
The majority of the code was all from someone else, I just submitted it for him, and had to use my name, to do that.

The successful code did NOT perform at all well on my PC, taking 15 seconds - and this is one fast overclocked PC, not a slow test rig ==> an 850MHz server! (Just a reminder that my own code does this test in 2.6 seconds, on my PC, yet fails to finish in 10 seconds on the challenge testing PC.)

The testing rig uses the Linux compiler, and I use a Windows compiler (Pelles), and their compiler does a better job, with this code - and the effect is hugely magnified because of the repetitive nature of the program.

The program itself (the successful program), builds up an array of the answers first, and then uses fwrite() to output all the answers, in one single call.

When no one posted back into this thread for several weeks, I lost interest in the challenge. I don't want to post the actual code because it would make the challenge, useless.

I have a paper that describes the "wheel" (what you call the segmented sieve I believe). If you're interested, I'll dig into it.

When no one posted back into this thread for several weeks, I lost interest in the challenge

Sorry Adak..I was busy with my exams so couldn't research in this field .
I studied segmented sieve from this site, but still having lots of doubts.
If you go through this site then please make me understand these lines

35         mset(segment,0);
36	    for(i=0; sq(primes[i])<=b; i++)
37	    {
38	        j = primes[i] * ( (a+primes[i]-1) / primes[i] );
39	        if(j%2==0) j += primes[i];
40	        for(k=primes[i]<<1; j<=b; j+=k)
41	            if(j!=primes[i])
42	                setC(segment, (j-a));
43	    }
44	    for(i=0; i<=b-a; i+=2)
45	        if(!chkC(segment, i))
46	            cnt++;

I saw this site much earlier in the thread here, but thought "oh, this is much more optimizations that we need for an SPOJ problem", and kept on going.

I'll revisit it and see what I can gleen from what he's doing. He's doing more than "the wheel", there.

Hmmm what I suppose to get from that site is, after applying that thing we can solve this problem easily, but I am stucking in that part only :(

My sieve code is strung out over a 8 projects by now - just trying too many options, too fast. I'm working on getting it back to something sane.

Be patient, it will happen - at least enough to pass that SPOJ challenge - can't promise anything beyond that.

And *Happy New Year*!

I was able today to fix the compiler problem I had since day #1, with printing out the answers quickly - like the testing rig does it. My compiler uses a very small default buffer for stdout (the monitor in this case). It just had to be made larger to perform printing at a good speed.

YEAH!! ;)

#include <time.h>
#include <vector>
#include <iostream>

private: System::Void button1_Click_1(System::Object^  sender, System::EventArgs^  e) { 
    using namespace std;
    UInt64 cloc = clock();
    long m = 1;
    long k = long::Parse(textBox1->Text)-2; // The textbox you entered for the nth prime.
    std::vector<long> p(1,2);
    for( long n = 0; n <= k; n++ ) {
        m += 2;
        for( long l = 1; l <= n; l++ ) {
            if (m % p[l] == 0) {
                m += 2;
    textBox2->Text = p[k+1].ToString(); // The textbox for the result.
    MessageBox::Show("It took me " + (clock() - cloc).ToString() + " milliseconds to find your prime.");

Finally, it is a shortened and stable version of my original code.

int k = "the nth prime you like" -1;
int l,m=1,n,o;
int*p;p=new int [k+1],p[0]=2;


"the result" = p[k]

In fact the exact expression of my original code is this below. Even it is a shorter and slightly faster way,

int k = "the nth prime you like" -1;
int l,m=1,n,o;
int*p;p=new int [k+1],p[0]=2;


"the result" = p[k]

however it is unstable because of memory initialization issues and I couldn't overcome it. After 1000000th query it is giving division-by-zero error. Sometimes o gets 0 value even there is nothing such as 0 in array p. I can prove it because it runs so well around 1000000th queries. And m%0 causes the error. That's why I had to secure o many times in the first code. If someone can handle it, will be my welcome.
But the first code runs perfectly for 10000000th query around 40 seconds on an ordinary pc.

But it's NOT a contest for the shortest code - the length of the code is not the problem to be solved. Just keep it reasonable on the size.

The only problem really, is the speed. In my case, it wasn't possible because my compiler has a small default console buffer - so just getting the data, without any computations, was putting me outside the time limit.

Finally I learned 1) that the console buffer was indeed smaller, and had to be changed, and 2) how to change it.

After that it was not a problem, using the Sieve of Eratosthenes algorithm.

I don't believe you should be posting the answers (did it pass the test:
in a public forum however. Let people figure it out. If you want to discuss it, pass the test for this problem, and you can then go into the SPOJ forum, and discuss this problem, as much as you want.

I have come up with this solution, but as time constrate it take long time for 100000th solution.

int isPrime(int num)
    int i;
            return 0;
    return 1;
int main(){
    int num=2,max,i=1;

    printf("%d th prime number is %d\n",max,num);
    return 0;

thanks for the solution...bcz even me also not able get the solution for nth prime number