To clarify (a primer on dealing with biological sequence data):
A FASTA file is a plain-text file containing one or more nucleotide or protein sequences, each preceded by a header line that begins with ">". These files are very popular with computational biologists and are (I believe) the most popular way of formatting biological sequence data for BLAST and phylogenetic (e.g., PHYLIP) analysis. The following is a sample FASTA file containing the mRNA sequence of a developmental gene, Pitx2, from mouse.
>gi|109948276|ref|NM_001042504.1| Mus musculus paired-like homeodomain transcription factor 2 (Pitx2), transcript variant 1, mRNA
GGAGAGAGAGTGCGAGACCGAGAGAGAAAGCCGGAGAGCAGCAGACAGAAACTGCCGGCGCCCGCTAGCT
TTAGCAGCCCCCCGCGTGGACCCTCTCGGAACTTGGCACCCTCAAGATCCCCGCAGTTCCACCCAGACCC
GCTCCACGGCGCTGGCTGTGCAGCCCGAGCCTCGGCCGCCTGGCAGTCACCCTGGGAAGCGGTGGGACGG
GGAGACAGCCGTTCTCTCTCCGGTAGCCGATAACCGGGAATGGAGACCAATTGTCGCAAACTAGTGTCGG
CCTGCGTGCAATTAGAGAAAGATAAGGGCCAGCAAGGAAAGAATGAGGATGTGGGCGCCGAGGACCCGTC
CAAGAAGAAGCGGCAACGCCGGCAGAGGACTCATTTCACTAGCCAGCAGCTGCAGGAGCTGGAAGCCACT
TTCCAGAGAAACCGCTACCCAGACATGTCCACTCGCGAAGAAATCGCCGTGTGGACCAACCTTACGGAAG
CCCGAGTCCGGGTTTGGTTCAAGAATCGCCGGGCCAAATGGAGAAAGCGGGAACGCAACCAGCAGGCCGA
GCTGTGCAAGAATGGCTTTGGGCCGCAGTTCAACGGGCTCATGCAGCCCTACGATGACATGTACCCCGGC
TATTCGTACAACAATTGGGCTGCCAAGGGCCTCACGTCAGCGTCTCTGTCCACCAAGAGCTTCCCCTTCT
TCAACTCCATGAACGTCAATCCCCTGTCCTCTCAGAGTATGTTTTCCCCGCCCAACTCCATCTCATCTAT
GAGTATGTCGTCCAGCATGGTGCCCTCCGCGGTGACCGGCGTCCCGGGCTCCAGCCTCAATAGCCTGAAT
AACTTGAACAACCTGAGCAGCCCGTCGCTGAATTCCGCGGTGCCCACGCCCGCCTGTCCTTACGCGCCGC
CGACTCCTCCGTACGTTTATAGGGACACATGTAACTCGAGCCTGGCCAGCCTGAGACTGAAAGCAAAGCA
GCACTCCAGCTTCGGCTACGCCAGCGTGCAGAACCCGGCCTCCAACCTGAGTGCTTGCCAGTATGCAGTC
GACCGGCCGGTGTGAACCGCGCCCAGGGCGCGGGGATCCGAGGACTGTCGGAGTGGGCAACTCTGCCCCA
GAAAGACTGAGAATTGTGCTAGAAGGTCGTGCGCACTATGGGAAGGAAGAGGGGGGAAAAAAGATCAGAG
GAAAAGAAACCACTGAATTCAAAGAGAGAGCGCCTTTGATTTCAAAGGAATGTCCCCAAGTGTCTACGTC
TTTCGCTAAGAGTATTCCCAACAGTTGGAGGACGCGTACGCCCACAAATGTTTGACTGGATATGACATTT
TAACATTACTATAAGCTTGTTATTTTTTAAGTTTAGCATTGTTAACATTAAAATGACTGAAAGGATGTAT
ATATATCGAAATGTCAAATTAATTTTATAAAAGCAGTTGTTAGTACTATCACGACAGTGTTTTTAAAGGC
TAGGCTTTAAAATAAAGCATGTTATACAGAATCAGTTAGGATTTTTCGCTTGCGAGCAAAGGAATGTATA
TACTAAATGCCACACTGTATGTTTCTAACATATTATTATTATAAAAATGTGTGAATATAAGTTTTAGAGT
AGTTTCTCTGGTGGATGCCTTGTTTCTGAAACTGCTATGTACGACCCATCCTGTGTATAACATTTCGTAC
GATATTATTGTTTTACTTTTCAGCAAATATGAAAAAAAATGTGTTTTATTTCTTGGGAGTAAAATATACT
GCATACAAA
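For anyone who wants to read a file like the one above from a script, here is a minimal Python sketch. The function name read_fasta is my own, and it assumes the simple layout shown above (a ">" header line followed by sequence lines); in practice Biopython's SeqIO module is the more robust choice.

```python
import io
from typing import Iterator, TextIO, Tuple


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from an open FASTA file.

    The header is returned without the leading ">", and the wrapped
    sequence lines are joined into one string.
    """
    header = None
    chunks = []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)
    # Emit the last record in the file.
    if header is not None:
        yield header, "".join(chunks)


# Usage: the header and the first two sequence lines of the sample above.
sample = io.StringIO(
    ">gi|109948276|ref|NM_001042504.1| Mus musculus paired-like homeodomain "
    "transcription factor 2 (Pitx2), transcript variant 1, mRNA\n"
    "GGAGAGAGAGTGCGAGACCGAGAGAGAAAGCCGGAGAGCAGCAGACAGAAACTGCCGGCGCCCGCTAGCT\n"
    "TTAGCAGCCCCCCGCGTGGACCCTCTCGGAACTTGGCACCCTCAAGATCCCCGCAGTTCCACCCAGACCC\n"
)
for header, sequence in read_fasta(sample):
    print(header)
    print(len(sequence), "bases")
```

With a real file you would pass open("somefile.fasta") instead of the StringIO object; the file name here is only a placeholder.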
If I understand msaenz correctly, they want to build a local database of these records (or at least the ones relevant to their own research) so that entries can be pulled out as dictionaries, with the sequence as one key-value pair and the elements of the header as other key-value pairs. As any relational database needs a primary key, msaenz has (I think) chosen the accession number (here a RefSeq ID, which is why you see "ref" immediately before it), which for this file is "NM_001042504.1". My recommendation is to use the NCBI ID instead (the GI number, i.e. the field right after "gi|" in the header), which for this file is "109948276". NCBI archives their own data using this field as the primary key, and as far as I know these IDs are unique, so if msaenz draws FASTA entries solely from NCBI and uses the NCBI ID as the primary key, there should be no key collisions when indexing.
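To make that concrete, here is a rough Python sketch of the idea, using the standard sqlite3 module with the NCBI ID (the number after "gi|") as the primary key. The table layout and the names parse_header, open_db, add_record and get_record are all my own invention, not anything msaenz posted, and the header parsing assumes the "gi|...|ref|...|" layout of the sample above.

```python
import sqlite3


def parse_header(header: str) -> dict:
    """Split a 'gi|<id>|<source>|<accession>| description' header into fields."""
    header = header.lstrip(">")
    ids, _, description = header.partition(" ")
    fields = ids.split("|")
    if len(fields) < 4 or fields[0] != "gi":
        raise ValueError("header does not start with gi|<id>|<source>|<accession>|")
    return {
        "gi": int(fields[1]),
        "source": fields[2],
        "accession": fields[3],
        "description": description.strip(),
    }


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the database; the GI number is the primary key."""
    conn = sqlite3.connect(path)
    conn.row_factory = sqlite3.Row  # rows can then be turned into dictionaries
    conn.execute(
        "CREATE TABLE IF NOT EXISTS records ("
        "gi INTEGER PRIMARY KEY, source TEXT, accession TEXT, "
        "description TEXT, sequence TEXT)"
    )
    return conn


def add_record(conn: sqlite3.Connection, header: str, sequence: str) -> None:
    """Insert one record; a repeated GI raises sqlite3.IntegrityError."""
    rec = parse_header(header)
    conn.execute(
        "INSERT INTO records (gi, source, accession, description, sequence) "
        "VALUES (?, ?, ?, ?, ?)",
        (rec["gi"], rec["source"], rec["accession"], rec["description"], sequence),
    )
    conn.commit()


def get_record(conn: sqlite3.Connection, gi: int):
    """Return the record for this GI as a dictionary, or None if absent."""
    row = conn.execute("SELECT * FROM records WHERE gi = ?", (gi,)).fetchone()
    return dict(row) if row is not None else None


# Usage with the sample header (sequence shortened to its first line).
HEADER = (
    ">gi|109948276|ref|NM_001042504.1| Mus musculus paired-like homeodomain "
    "transcription factor 2 (Pitx2), transcript variant 1, mRNA"
)
conn = open_db()
add_record(
    conn,
    HEADER,
    "GGAGAGAGAGTGCGAGACCGAGAGAGAAAGCCGGAGAGCAGCAGACAGAAACTGCCGGCGCCCGCTAGCT",
)
record = get_record(conn, 109948276)
print(record["accession"], record["description"])
```

The primary-key constraint is what catches collisions: inserting a second record with the same ID fails instead of silently overwriting the first. If the accession is preferred as the key after all, the same layout works with the two columns swapped.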
The NCBI database can be found on the NCBI website; nucleotide sequences can be searched for by choosing "Nucleotide" from the drop-down menu at the left and entering some search parameters.