Hello All,

Please, advice some as quick as posible algorithm to sort file containing about 500 000 integers. Roughly 30% of them have value 1, 20% are 2, 10% 3......0,1% are 1 000

Thanks much,
C

Recommended Answers

All 7 Replies

Well there is std::sort , which will work on a std::vector of integers.

If you know what you're doing, it would take about 10 minutes to write - is that quick enough?

Or how about using your command line sort program sort -k1n -o result.txt input.txt

Gotta love the SWAP macro too...

#define SWAP(a, b) __typeof__(a) temp; temp = a; a = b; b = temp

> Gotta love the SWAP macro too...
Or better yet, the STL swap function from <algorithm>. ;)

That macro invites death without warning.

If you want to swap something, #include <algorithm> and use std::swap().

@cecyl
Generic algorithms will work just fine, but if you know something about the probable distribution of your data then you can reduce sort time considerably.

If you know that you have a relatively small set of values, then you can reduce your sort time to constant or near-constant time by simply counting how many times each value appears, then writing them back out in order.

Good luck.

[edit] Man, get up to visit the bathroom and someone smart answers first...

Thank you all for advices. I have to sort it much, much more quicker, are 10 seconds possible?. Can it be quicker using some low-level code like assembler,....?

ooops, I have understood one advice wrong, I thougt it should take 10 minutes to sort, ...but it takes 10 min to write the code (not for me).
Going to bathroom is good idea as well :*

> I have to sort it much, much more quicker, are 10 seconds possible?.
with 10 seconds at your disposal, you can sort it any which way you like.
the fastest way would perhaps be to use std::partition to move the most frequently occurring values (which are also the smallest values) to the front of the sequence. sort the remaining 40% or so of the sequence using a std::sort .
it doesn't really matter if you are sorting a file; i/o would dominate.

here are some measurements of a sample run on a sequence of 500000 integers (roughly 30% of them have value 1, 20% are 2, 10% 3, remainder are [3,1000] ):
a. std::sort on the sequence: 0.0390625 seconds
b. std::partition for 1, 2 and 3, std::sort for the rest: 0.0234375 seconds
c. same as b, but read from a file sort and write to another file: 0.335938 seconds

>dmesg | grep CPU:
CPU: Intel(R) Pentium(R) 4 CPU 2.80GHz (2800.10-MHz 686-class CPU)

perhaps, Salem's suggestion of using a command line sort program is all that you need.

Be a part of the DaniWeb community

We're a friendly, industry-focused community of developers, IT pros, digital marketers, and technology enthusiasts meeting, networking, learning, and sharing knowledge.