Hello,
I hope someone can take the time to help with this.
Here is exactly what I have to do:
- Build the corpus.
- Prepare our project structure.
- Write a Perl script that:
      1. Browses the corpus.
      2. Cleans the files and makes the substitutions required by SYGMART.
      3. Calls SYGMART and saves the result.
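The three script steps above can be sketched as follows. This is a minimal Python sketch, not the Perl the assignment asks for, and it assumes the layout from the project structure in this post, with `root` being the Article directory (`<root>/<theme>/Art/*.txt` as input, `<root>/<theme>/clean/` as output). The `SUBSTITUTIONS` table is hypothetical, because the post does not say which substitutions SYGMART needs. The SYGMART call itself is left out, since its command line is not given here.

```python
import re
from pathlib import Path

# Hypothetical substitution table: the real SYGMART requirements are not
# given in the post, so these pairs are placeholders to adapt.
SUBSTITUTIONS = [
    ("\u2019", "'"),   # typographic apostrophe -> ASCII apostrophe
    ("\u00ab", '"'),   # French opening guillemet -> double quote
    ("\u00bb", '"'),   # French closing guillemet -> double quote
]


def browse_corpus(root):
    """Step 1: yield (theme, path) for every .txt file in <root>/<theme>/Art/."""
    for path in sorted(Path(root).glob("*/Art/*.txt")):
        yield path.parent.parent.name, path


def clean_text(text):
    """Step 2: apply the substitutions, then collapse runs of whitespace."""
    for old, new in SUBSTITUTIONS:
        text = text.replace(old, new)
    return re.sub(r"\s+", " ", text).strip()


def clean_corpus(root):
    """Write a cleaned copy of each Art/ file into the sibling clean/ dir."""
    written = []
    for theme, path in browse_corpus(root):
        out_dir = path.parent.parent / "clean"
        out_dir.mkdir(exist_ok=True)
        out_path = out_dir / path.name
        out_path.write_text(clean_text(path.read_text(encoding="utf-8")),
                            encoding="utf-8")
        written.append((theme, out_path))
    # Step 3 (calling SYGMART on each cleaned file) would go here.
    return written
```

Returning the list of written files makes it easy to hand exactly those paths to the SYGMART step once its invocation is known.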
The purpose of this project is to implement and evaluate a document classification method programmed in Perl.
**First step: formation of the corpus**
In the first step, a corpus should be built. We propose building a corpus covering five distinct themes (for example: politics, cooking, etc.). This corpus will be normalized (removal of HTML tags, etc.). To do this, you will find ten texts written in French or English for each of these five themes.
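Normalization such as removing HTML tags can be done with the `html.parser` module from Python's standard library. This is a minimal sketch, assuming each downloaded text is an HTML page already read into a string; the function name `strip_html` is illustrative, and skipping the contents of `<script>` and `<style>` is a choice of this sketch rather than part of the assignment.

```python
import re
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect text nodes, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__(convert_charrefs=True)  # decodes &eacute; and the like
        self.parts = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)


def strip_html(html):
    """Return the visible text of an HTML page with whitespace normalized."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return re.sub(r"\s+", " ", " ".join(parser.parts)).strip()
```

Joining text nodes with a space keeps paragraphs from running together, but it can split a word that an inline tag cuts in two (for example `<b>mar</b>ché`).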
**Second step: implementation of a classification algorithm**
The next task will be to implement a classification algorithm. Many learning approaches can be used for text classification:
o K nearest neighbors
o Decision trees
o Naïve Bayes
o Neural networks
o Support vector machines
In this project, we propose to use the well-known K nearest neighbors (KNN) method, as seen in class.
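A minimal KNN sketch in Python (the assignment itself targets Perl): each document becomes a bag-of-words term-frequency vector, vectors are compared by cosine similarity, and a new text receives the majority label among its k most similar training texts. The tokenizer (lower-cased whitespace split), the cosine measure and the default `k=3` are assumptions of this sketch; the version seen in class may use other weights (such as TF-IDF) or another distance.

```python
import math
from collections import Counter


def tf_vector(text):
    """Bag-of-words term-frequency vector (lower-cased whitespace tokens)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse Counter vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def knn_classify(text, training, k=3):
    """Label `text` by majority vote among its k most similar training docs.

    `training` is a list of (theme, text) pairs.
    """
    query = tf_vector(text)
    # Sort training documents from most to least similar to the query.
    scored = sorted(((cosine(query, tf_vector(doc)), theme)
                     for theme, doc in training), reverse=True)
    votes = Counter(theme for _, theme in scored[:k])
    return votes.most_common(1)[0][0]
```

`Counter.most_common` lists equal counts in the order they were first encountered, and the neighbors are counted from most to least similar, so a tied vote goes to the class of the nearer neighbor.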
**Third step: taking account of linguistic information**
The goal here is to run the classification on your texts in different forms:
o Raw texts
o Lemmatized texts
o Lemmatized texts with parsing
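To see how lemmatization changes what the classifier sees, here is a small Python sketch comparing raw and lemmatized term-frequency vectors. The `LEMMAS` table is a hypothetical hand-written stand-in; in a real run the lemmas would come from a tagger's output (such as the files in the tag directory). The third variant, lemmatized texts with parsing, is not covered here.

```python
from collections import Counter

# Hypothetical toy lemma table, standing in for a real tagger's output.
LEMMAS = {
    "elections": "election",
    "élections": "élection",
    "votes": "vote",
    "voted": "vote",
    "candidates": "candidate",
}


def raw_vector(text):
    """Term frequencies over the raw word forms."""
    return Counter(text.lower().split())


def lemma_vector(text, lemmas=LEMMAS):
    """Term frequencies after mapping each word form to its lemma."""
    return Counter(lemmas.get(w, w) for w in text.lower().split())
```

Merging inflected forms shrinks the vocabulary and lets two documents match on "election" even when one says "elections", which tends to raise the similarity scores KNN relies on.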
**The project structure**, as I see it, is the following:
ROOT
|____Article/
|    |____Donquichote/
|    |    |____Art/     .txt files
|    |    |____clean/   cleaned .txt files
|    |    |____tag/     tagged files in .txt format
|    |    |____vect/    .txt files
|    |____ParisElection/
|    |    |____Art/     .txt files
|    |    |____clean/   cleaned .txt files
|    |    |____tag/     tagged files in .txt format
|    |    |____vect/    .txt files
|    |____SarkozyCarla/
|    |    |____Art/     .txt files
|    |    |____clean/   cleaned .txt files
|    |    |____tag/     tagged files in .txt format
|    |    |____vect/    .txt files
|    |____SkiGrange/
|    |    |____Art/     .txt files
|    |    |____clean/   cleaned .txt files
|    |    |____tag/     tagged files in .txt format
|    |    |____vect/    .txt files
|    |____Tf1DaylimotionYoutube/
|         |____Art/     .txt files
|         |____clean/   cleaned .txt files
|         |____tag/     tagged files in .txt format
|         |____vect/    .txt files
|____Binary/            executable files
|____Data/
     |____...
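The directory skeleton above can be created by a script instead of by hand. This is a minimal Python sketch, assuming the five theme names from the tree and treating Tf1DaylimotionYoutube as a fifth theme alongside the others, each with the same Art/clean/tag/vect sub-directories; the function name `make_skeleton` is illustrative.

```python
from pathlib import Path

# Theme and stage names taken from the project tree in this post.
THEMES = ["Donquichote", "ParisElection", "SarkozyCarla", "SkiGrange",
          "Tf1DaylimotionYoutube"]
STAGES = ["Art", "clean", "tag", "vect"]


def make_skeleton(root):
    """Create ROOT/Article/<theme>/<stage>, plus ROOT/Binary and ROOT/Data."""
    root = Path(root)
    for theme in THEMES:
        for stage in STAGES:
            # exist_ok=True makes the script safe to re-run.
            (root / "Article" / theme / stage).mkdir(parents=True, exist_ok=True)
    for extra in ("Binary", "Data"):
        (root / extra).mkdir(parents=True, exist_ok=True)
    return root
```

Keeping the theme and stage names in two lists means adding a sixth theme or another processing stage is a one-line change.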

We can't help until you tell us what your problem is. You have clearly described what you need to do. What have you done so far? What problems have you encountered? Are you clueless about Perl? If so, there is abundant documentation, and there are many tutorials on the Internet for you to study. We don't do your homework for you, but we will help you resolve the problems you encounter that are beyond your ability to solve.

Well, Mister rubberman, I didn't ask you to do the homework for me. I just said that I need some help. I did try: I have written two methods, but so far they don't work; there are some mistakes in them.
You didn't ask whether I had tried or not.
