help me asp.....

In this assignment you have to write functions (in python) that applies various preprocessing
techniques which are popular in Natural Language Processing (NLP). For this purpose, you
are supplied with a Text Dataset (“corona_data\test_small.tsv”) and you have to process
that dataset using the following techniques:

  1. Tokenization
  2. Text Lowercase
  3. Remove HTML tags
  4. Convert number words to numeric form.
  5. Remove numbers
  6. Remove punctuation
  7. Remove extra whitespaces
  8. Convert accented characters to ASCII characters
  9. Expand contractions
  10. Remove special characters
  11. Remove default stopwords
  12. Stemming
  13. Lemmatization
  14. Part of Speech (POS) Tagging
  15. Named Entity Recognition
    You will find the details as well as the codes to do all of the above preprocessing in the
    Important Links section under LAB – 7 in the Google Classroom.
    Important:
    Please follow these rules while you do the preprocessing:
  16. Use separate functions (modular programming) while you do each of the above
    preprocessing.
  17. Only preprocess the “Example” field of the dataset. Do NOT process the Labels.Page 2
  18. For each of the preprocessing steps do the following two things –
    a. Apply it independently on the dataset and write the output as a text file (name
    the text file as – <name_of_the_preprocessing>_out.txt)
    E.g., tokenization_out.txt … text_lowercase_out.txt
    b. Apply it sequentially from the output of previous preprocessing(s), and finally
    save the complete output (as “preprocessed_out.txt”) when you are done
    with subsequent preprocessings steps (1 to 11).
    c. For outputting the TEXT file - You should write your own function that can take
    a list of strings (data) and write them into a .txt file under a directory named
    output automatically.
  19. Write proper Comments inside the code. Failing to do so will reduce your grades. Also
    copying others comments/code might give you a negative mark.
  20. Make sure you produce the output text files inside a folder named “output” which
    should be inside the same folder of your code. Rename the project folder having
    your code(s) and output files as your StudentID_Section and make it a .zip before you
    upload it as the solution to Assignment in Google Classroom. In case you use the
    Google CoLab, your code should automatically create the output folder in the
    corresponding google drive following the aforementioned folder structure.

Recommended Answers

All 5 Replies

The assignment looks to be what is given after many classes on the subject matter as well as drawing from your prerequisite coursework.

If you need clarification on the assignment that would be from the instructor as I didn't give out the assignment.

As you know supplying for the answer rarely happens. But you can start a discussion on something specific such as what is Python etc. But again you must show some evidence you are working on the subject.

commented: he told us not to disturb him. +0

I do not quite understand the task to which. I was given .

Lists are given:

a = [1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89];

b = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13].

You need to return a list that consists of the elements that are common to the two lists. Who can tell?

You posted 20 items there. What do you want to discuss?

Time to pare down and get one thing done at a time. Sometimes you get to say, build a house. You can't do everything at one time. Example: You create the foundation and home before you put on the roof.

You have to learn the idioms.

You need to return a list that consists of the elements that are common to the two lists. Who can tell?

list(set(a) & set(b))
Be a part of the DaniWeb community

We're a friendly, industry-focused community of developers, IT pros, digital marketers, and technology enthusiasts meeting, networking, learning, and sharing knowledge.