I would like to ask if anyone knows how I can create a tokenizer for a .txt file in C++.
I find it difficult because the file contains not only words but also numbers and <p id> tags.
I have attached the file that needs to be tokenized.
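
To make the question concrete, here is a minimal sketch of one possible approach, not a tested solution for the attached file. It assumes that a tag is everything from a '<' up to the next '>', that a word is a run of letters, and that a number is a run of digits; anything else that is not whitespace becomes a one-character token. The names TokenKind, Token, tokenize and tokenize_file are made up for illustration, and the real file may need different rules (for example decimals or hyphenated words).

```cpp
#include <cctype>
#include <cstddef>
#include <fstream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical token categories, chosen for illustration only.
enum class TokenKind { Word, Number, Tag, Punct };

struct Token {
    TokenKind kind;
    std::string text;
};

// The <cctype> functions expect a value representable as unsigned char.
static bool is_alpha(char c) { return std::isalpha(static_cast<unsigned char>(c)) != 0; }
static bool is_digit(char c) { return std::isdigit(static_cast<unsigned char>(c)) != 0; }
static bool is_space(char c) { return std::isspace(static_cast<unsigned char>(c)) != 0; }

// Splits the input into words, numbers, tags and single punctuation characters.
std::vector<Token> tokenize(const std::string& in) {
    std::vector<Token> out;
    std::size_t i = 0;
    while (i < in.size()) {
        const char c = in[i];
        if (is_space(c)) { ++i; continue; }  // whitespace only separates tokens
        const std::size_t start = i;
        if (c == '<') {
            // Assumption: a tag runs from '<' to the next '>', e.g. "<p id=1>".
            const std::size_t close = in.find('>', i);
            if (close != std::string::npos) {
                out.push_back({TokenKind::Tag, in.substr(i, close - i + 1)});
                i = close + 1;
                continue;
            }
            // No closing '>': fall through and treat '<' as punctuation.
        }
        if (is_alpha(c)) {
            while (i < in.size() && is_alpha(in[i])) ++i;
            out.push_back({TokenKind::Word, in.substr(start, i - start)});
        } else if (is_digit(c)) {
            while (i < in.size() && is_digit(in[i])) ++i;
            out.push_back({TokenKind::Number, in.substr(start, i - start)});
        } else {
            out.push_back({TokenKind::Punct, std::string(1, c)});
            ++i;
        }
    }
    return out;
}

// Reads the whole file into memory and tokenizes it.
// If the file cannot be opened, the result is an empty vector.
std::vector<Token> tokenize_file(const std::string& path) {
    std::ifstream file(path);
    std::stringstream buffer;
    buffer << file.rdbuf();
    return tokenize(buffer.str());
}
```

Calling tokenize_file with the path of the text file would return the tokens in order, and each token's kind says whether it is a word, a number, a tag or punctuation. Scanning character by character keeps the rules for each token type in one place, so they are easy to change once the exact format of the file is known.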
Could anyone help me?
Thanks a lot.