How can I sample data from a database, stratified by date?

I have a table, a, that is partitioned by date. I need to sample the table so that the sample contains more of the latest records and fewer of the older records, then load the result into a pandas DataFrame.

First we'll have to define those terms. From Wikipedia: "Assume that we need to estimate average number of votes for each candidate in an election. Assume that country has 3 towns: Town A has 1 million factory workers, Town B has 2 million office workers and Town C has 3 million retirees. We can choose to get a random sample of size 60 over entire population but there is some chance that the random sample turns out to be not well balanced across these towns and hence is biased causing a significant error in estimation. Instead if we choose to take a random sample of 10, 20 and 30 from Town A, B and C respectively then we can produce a smaller error in estimation for the same total size of sample. (Source: Wikipedia)

Here, each town is a Strata."
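To make the arithmetic in that quote concrete, here is a minimal sketch of proportional allocation: each stratum gets a share of the sample equal to its share of the population. The function name `proportional_allocation` is my own and purely illustrative.

```python
def proportional_allocation(stratum_sizes, total_sample):
    """Split total_sample across strata in proportion to each stratum's size.

    Note: rounding means the parts may not always add up to total_sample
    exactly; that does not happen with the numbers used here.
    """
    population = sum(stratum_sizes.values())
    return {
        name: round(total_sample * size / population)
        for name, size in stratum_sizes.items()
    }


# The three towns from the quoted example.
towns = {"A": 1_000_000, "B": 2_000_000, "C": 3_000_000}
allocation = proportional_allocation(towns, 60)
print(allocation)  # {'A': 10, 'B': 20, 'C': 30}
```

For the question asked here, the strata would be dates rather than towns, and the shares would be skewed toward recent dates instead of following partition size.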

In your case, each date is a stratum. You need to decide what counts as a random date, and you run the risk that a randomly chosen date has no matching records. You will likely have to define how many samples you want, pick a random date, and keep picking until you have collected the number of samples you defined as the sample size. This won't be a single line of code; it is something to think through and write down as pseudocode before you start coding. As such, I can't begin to write code that would tackle your exact problem, but I can see why you are stuck.

Go back to the beginning and think through how you would do this manually. Once you have those steps clear in your mind, write them down as pseudocode. At that point you can start writing the real code.
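As a rough illustration of that loop only, not a solution for your actual table, here is a sketch in plain Python. Everything in it is an assumption on my part: the rows live in an in-memory dict keyed by date (standing in for your date partitions), the name `sample_by_date` is made up, and "more latest records" is modelled by halving a date's weight every `half_life_days` going back in time.

```python
import random
from datetime import date, timedelta


def sample_by_date(rows_by_date, start, end, sample_size,
                   half_life_days=30, seed=None, max_attempts=100_000):
    """Draw rows one at a time, choosing recent dates more often than old ones."""
    rng = random.Random(seed)
    days = [start + timedelta(days=i) for i in range((end - start).days + 1)]
    # Assumed weighting: a date's weight halves for every half_life_days of age.
    weights = [0.5 ** ((end - d).days / half_life_days) for d in days]
    # Copy the row lists so that drawing without replacement leaves the input intact.
    remaining = {d: list(rows) for d, rows in rows_by_date.items()}
    sample = []
    attempts = 0
    # max_attempts guards against looping forever when there are too few rows.
    while len(sample) < sample_size and attempts < max_attempts:
        attempts += 1
        chosen = rng.choices(days, weights=weights, k=1)[0]
        candidates = remaining.get(chosen)
        if not candidates:
            continue  # no (remaining) rows for this date: pick another date
        sample.append(candidates.pop(rng.randrange(len(candidates))))
    return sample


# Toy data: 5 rows per day for 90 days, with every 7th day left empty.
start, end = date(2024, 1, 1), date(2024, 3, 30)
rows_by_date = {}
next_id = 0
for offset in range((end - start).days + 1):
    if offset % 7 == 0:
        continue  # simulate a date with no matching records
    day = start + timedelta(days=offset)
    rows_by_date[day] = []
    for _ in range(5):
        rows_by_date[day].append({"id": next_id, "date": day})
        next_id += 1

sample = sample_by_date(rows_by_date, start, end, sample_size=50, seed=1)
print(len(sample))
```

The result is a list of dicts, which can be passed straight to `pandas.DataFrame(sample)`. Against a real database you would replace the dict lookup with a query on the chosen date's partition, and the empty-date case is where that query returns no rows.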
