i understand the 1/1000 chance then theres another 1/1000 chance of getting the song again next time, meaning 1/1000 X 1/1000 chance of getting the same song twice in a row using the shuffle.
That's an analysis of getting a SPECIFIC song twice in a row in two sequential playings. It's like asking "What are the odds of getting song number 17 twice in a row straight away when I press play?". You don't care about a SPECIFIC song, and you don't care about WHEN the repeat happens.
Probability is often counter-intuitive. Here's a take on the problem (and probability IS tricky - I stand to be corrected...):
Given a first song X, what is the probability that it is not the same as any previous song? 1, since there are no previous songs.
Given that first song, what is the probability that the next song will not be the same? 999/1000.
What is the probability that the next song will not be either of the first two? 998/1000.
What is the probability that the next song will not be any of the first three? 997/1000.
So the probability that the first four songs are all different is:
1 * 999/1000 * 998/1000 * 997/1000 ≈ 0.994
The question becomes how far do you have to go before the product of these probabilities is equal to or less than 0.5?
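Here's a rough Python sketch of that running product, in case anyone wants to check the numbers. The function names and the `library_size` and `threshold` parameters are my own, and it assumes every play picks uniformly at random from all 1000 songs, independently of what came before.

```python
def p_no_repeat(plays, library_size=1000):
    """Probability that `plays` uniformly random picks are all different."""
    p = 1.0
    for already_heard in range(plays):
        # The next song must avoid every distinct song heard so far.
        p *= (library_size - already_heard) / library_size
    return p


def plays_until_repeat_likely(library_size=1000, threshold=0.5):
    """Smallest number of plays for which P(no repeat) <= threshold."""
    p = 1.0
    plays = 0
    while p > threshold:
        p *= (library_size - plays) / library_size
        plays += 1
    return plays, p


if __name__ == "__main__":
    print(round(p_no_repeat(4), 3))     # 0.994, matching the product above
    print(plays_until_repeat_likely())  # plays comes out as 38, p a bit under 0.5
```

So under those assumptions, by the 38th song a repeat is more likely than not.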
Just as an interesting value, the chance of playing 1000 songs and not getting ANY repeats is about 4 x 10^(-433), which is pretty small :) And of course, if you go for 1001 songs with no repeats, the final value to be multiplied by is 0/1000, so the probability of playing 1001 songs with no repeats is exactly zero, which is what we intuitively expect.
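If you want to check that tiny number yourself, note that it is too small for an ordinary floating-point value (it underflows to zero), so one option is to add up base-10 logarithms instead of multiplying. This is only a sketch, and `log10_p_no_repeat` is a name I made up for it.

```python
import math


def log10_p_no_repeat(plays, library_size=1000):
    """Base-10 logarithm of the probability that `plays` random picks are all different."""
    if plays > library_size:
        # More plays than songs: a repeat is certain, so the probability is 0.
        return float("-inf")
    return sum(
        math.log10((library_size - already_heard) / library_size)
        for already_heard in range(plays)
    )


if __name__ == "__main__":
    exponent = log10_p_no_repeat(1000)
    print(exponent)  # roughly -432.4, i.e. about 4 x 10^(-433)
    print(log10_p_no_repeat(1001))  # -inf, i.e. probability exactly zero
```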
An alternative way of doing this would be to actually simulate the situation a very large number of times. Get a decent random number generator, and start churning out values between 1 and 1000. Each time you get a number that you already had, make a note of how many numbers you had to generate and start again. Once you've done it enough times (a million runs, say), you'll have a pretty accurate picture of how long it typically takes to hit the first repeat.
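A minimal sketch of that simulation in Python, using the standard `random` module as the "decent random number generator". The function names, the default of 100,000 trials, and the fixed seed are just my choices for illustration.

```python
import random


def plays_until_first_repeat(library_size, rng):
    """Draw songs uniformly at random; return how many draws it took to hit a repeat."""
    seen = set()
    while True:
        song = rng.randint(1, library_size)  # inclusive on both ends
        if song in seen:
            return len(seen) + 1  # count the repeating draw as well
        seen.add(song)


def simulate(trials=100_000, library_size=1000, seed=0):
    """Return (average draws until first repeat, fraction of runs repeating within 38 draws)."""
    rng = random.Random(seed)  # seeded so a run can be reproduced
    counts = [plays_until_first_repeat(library_size, rng) for _ in range(trials)]
    average = sum(counts) / trials
    within_38 = sum(1 for c in counts if c <= 38) / trials
    return average, within_38


if __name__ == "__main__":
    average, within_38 = simulate()
    print(average, within_38)
```

One thing to watch: the average number of plays until the first repeat comes out around 40, which is a little higher than the "more likely than not" point of 38 plays, because the average and the 50% mark are different questions. The fraction of runs that repeat within 38 plays should land just over one half.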
As an aside, I understand that in the early days the random play algorithm on Apple's music players was truly random. People complained that it didn't seem random, and Apple changed the algorithm to a more deterministic one that people felt was more random.