You would start by identifying the operating system you are going to write it for.
You then learn that operating system's (and possibly its window manager's) API for graphics. For example, if you were going to target Windows, you would learn the Win32 API, perhaps by starting here: http://www.winprog.org/tutorial/start.html (that tutorial has been around for a while, so I wouldn't be surprised if it were getting out of date, although the Win32 API itself has been around for a long time and I expect it isn't going anywhere soon). Then you might work your way through Petzold's "Programming Windows", fifth edition (or the sixth edition, if you decide not to go with the Win32 API and instead go with the Windows 8 style). Other operating systems (and window managers) would obviously dictate different learning resources.
Once you have mastered enough of your chosen operating system's API that you can do everything you want your toolkit to do (for example, creating windows and dialog boxes, accepting mouse and keyboard input, and displaying menus and images), you would design your toolkit. Since the purpose of a GUI toolkit is to let people create GUIs without having to learn the operating system API, you would come up with a relatively simple set of objects/functions and some kind of event handling model.
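As a rough illustration of what "a relatively simple set of objects/functions, and some kind of event handling model" might look like, here is a minimal sketch. The `Widget` and `EventLoop` classes, their `on`/`dispatch`/`post`/`run_pending` methods, and the event names are all hypothetical; in a real toolkit the platform layer would fill the queue from the operating system's input events.

```python
from collections import defaultdict, deque
from typing import Callable, Deque, Dict, List, Tuple

Handler = Callable[[dict], None]


class Widget:
    """Hypothetical toolkit widget: users attach handlers to named events."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._handlers: Dict[str, List[Handler]] = defaultdict(list)

    def on(self, event: str, handler: Handler) -> None:
        # Register a callback for an event such as "click" or "key"
        self._handlers[event].append(handler)

    def dispatch(self, event: str, data: dict) -> int:
        # Run every handler registered for this event; return how many ran
        handlers = self._handlers.get(event, [])
        for handler in handlers:
            handler(data)
        return len(handlers)


class EventLoop:
    """Hypothetical queue that the platform layer fills with input events."""

    def __init__(self) -> None:
        self._queue: Deque[Tuple[Widget, str, dict]] = deque()

    def post(self, widget: Widget, event: str, data: dict) -> None:
        self._queue.append((widget, event, data))

    def run_pending(self) -> int:
        # Deliver queued events in arrival order; return how many were processed
        processed = 0
        while self._queue:
            widget, event, data = self._queue.popleft()
            widget.dispatch(event, data)
            processed += 1
        return processed


if __name__ == "__main__":
    button = Widget("ok_button")
    button.on("click", lambda data: print("clicked at", data["x"], data["y"]))
    loop = EventLoop()
    loop.post(button, "click", {"x": 10, "y": 20})
    loop.run_pending()  # prints: clicked at 10 20
```

The toolkit user only ever sees `Widget.on`; everything about where events come from stays inside the toolkit.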
You would then implement your design. In essence, you're giving the toolkit user a relatively simple set of objects and functions to use, while your toolkit takes care of all the complicated interaction with the operating system, using the operating system API directly.
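To make "taking care of all the complicated interaction with the operating system" a little more concrete, the sketch below shows one small piece of a hypothetical Windows backend: turning the raw `(msg, wParam, lParam)` values that a Win32 window procedure receives into simple toolkit events. The three message constants are the real values from the Win32 headers; the `translate` function, the event names, and the dictionary payloads are assumptions made up for illustration.

```python
from typing import Optional, Tuple

# Real Win32 message identifiers (values from WinUser.h)
WM_CLOSE = 0x0010
WM_KEYDOWN = 0x0100
WM_LBUTTONDOWN = 0x0201


def _signed_16(value: int) -> int:
    # Interpret the low 16 bits as a signed short, as GET_X_LPARAM does
    value &= 0xFFFF
    return value - 0x10000 if value >= 0x8000 else value


def translate(msg: int, wparam: int, lparam: int) -> Optional[Tuple[str, dict]]:
    """Turn a raw Win32 message into a hypothetical toolkit event.

    Returns None for messages the toolkit does not expose to its users.
    """
    if msg == WM_LBUTTONDOWN:
        # Client-area coordinates are packed into lParam: x low word, y high word
        return "click", {"x": _signed_16(lparam), "y": _signed_16(lparam >> 16)}
    if msg == WM_KEYDOWN:
        # wParam carries the virtual-key code
        return "key", {"key_code": wparam}
    if msg == WM_CLOSE:
        return "close", {}
    return None
```

For example, `translate(WM_LBUTTONDOWN, 0, (20 << 16) | 10)` yields `("click", {"x": 10, "y": 20})`, so the toolkit user never has to unpack `lParam` themselves.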
If you then decide that you want your toolkit to be cross-platform, you would pick another operating system, learn its API, and then reimplement your toolkit so that the same toolkit objects/functions cause the same things to happen on screen as on the first operating system you supported. Obviously, the behind-the-scenes interaction with the operating system would be completely different.
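One common way to structure the cross-platform version is to keep the public toolkit objects identical and hide each operating system behind a backend interface. In this sketch, `Backend`, `Win32Backend`, `X11Backend`, and `default_backend` are hypothetical names, and the "native calls" are only recorded as strings; a real backend would call the OS API functions named in the comments.

```python
import sys
from abc import ABC, abstractmethod
from typing import List, Optional


class Backend(ABC):
    """Hypothetical per-OS layer; each subclass talks to one native API."""

    def __init__(self) -> None:
        self.native_calls: List[str] = []  # stands in for real OS API calls

    @abstractmethod
    def create_window(self, title: str) -> int: ...


class Win32Backend(Backend):
    def create_window(self, title: str) -> int:
        # A real backend would call RegisterClassEx and CreateWindowEx here
        self.native_calls.append(f"CreateWindowEx({title!r})")
        return len(self.native_calls)


class X11Backend(Backend):
    def create_window(self, title: str) -> int:
        # A real backend would call XCreateWindow and XStoreName here
        self.native_calls.append(f"XCreateWindow + XStoreName({title!r})")
        return len(self.native_calls)


def default_backend() -> Backend:
    # Pick the implementation for the platform we are running on
    return Win32Backend() if sys.platform == "win32" else X11Backend()


class Window:
    """The public toolkit object: the same code on every platform."""

    def __init__(self, title: str, backend: Optional[Backend] = None) -> None:
        self.title = title
        self.backend = backend if backend is not None else default_backend()
        self.handle = self.backend.create_window(title)


if __name__ == "__main__":
    for backend in (Win32Backend(), X11Backend()):
        window = Window("Hello", backend)
        print(window.title, backend.native_calls)
```

The toolkit user writes `Window("Hello")` on every platform; only the backend subclass has to be rewritten for each new operating system.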
GUI toolkits are VERY complex. Moschops has identified many of the issues you face. My advice is to study other GUI toolkits, such as Qt and wxWidgets (formerly wxWindows), for some perspective on the issues involved. Also, how many years do you have to spend on this project? :-)
Ah! A personal education project? Been there, done that! Post your work here and I'll at least be happy to comment. Remember, every operating system differs, depending upon its low-level GUI tool set: Linux/Unix == X Window System/Xorg, Windows == whatever, QNX == Neutrino and/or Xorg. If you want to go lower in the stack, you will also be writing a lot of assembler code to control the specific video devices used. I.e., don't try to write the entire stack; work with what is provided as a foundation.
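import tkinter
root = tkinter.Tk()
buttons = []
# If a specific button is pressed, output "YES" (n=num remembers which button it was)
for num in range(10):
    buttons.append(tkinter.Button(root, text=str(num), command=lambda n=num: print("YES", n)))
    buttons[-1].pack()
root.mainloop()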
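A common stumbling block with callbacks created in a loop, as in the button example above, is that Python closures look up loop variables when they are called, not when they are created. This standalone snippet (no tkinter needed) shows the difference; binding the value as a default argument is one usual workaround, and `functools.partial` is another.

```python
from functools import partial

# Each lambda looks up `num` when it is called, after the loop has finished
late = [lambda: num for num in range(3)]
# A default argument captures the value of `num` at creation time
early = [lambda n=num: n for num in range(3)]
# functools.partial also freezes the argument when the callback is built
frozen = [partial(lambda n: n, num) for num in range(3)]

print([f() for f in late])    # [2, 2, 2]
print([f() for f in early])   # [0, 1, 2]
print([f() for f in frozen])  # [0, 1, 2]
```

Either fixed form can be passed as a button's `command` so that each button reports its own number.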