Translator using Hashtable

Question

milan2011 0 Light Poster

13 Years Ago

Hi,
I am trying to use a hash table to sort the two string words from an input file ,display words and again put them back to an output file.

so if this is my input in input.txt:

thank Merci
yes oui
hello bonjour

the sorted output in output.txt will be:

hello bonjour
thank Merci
yes oui

I was wondering if you can help me start it.Thanks!

algorithm c++

3 Contributors
20 Replies
681 Views
2 Days Discussion Span
Latest Post 13 Years Ago Latest Post by m4ster_r0shi

mike_2000_17 2,669 21st Century Viking

13 Years Ago

>>"to use a hash table to sort"

I'm not an expert on hashing, but this statement seems to make little sense to me. Hash tables are rarely used to sort anything (which requires a hash function that respects the ordering of the keys, which typically makes it very difficult to also make the hash function respect the distribution of keys (especially word-keys)). Overall, there will be so many collisions (or too many buckets) that it won't be worth it. Use a std::map or std::set instead. Since you want to map English words to French words, I suggest you use std::map<std::string, std::string>. Like so, for example:

#include <map>
#include <fstream>
#include <string>

int main() {
  std::map< std::string, std::string > dictionary;
  std::ifstream inFile("input.txt");
  std::string temp;
  while((inFile >> temp) && 
        (inFile >> dictionary[temp]))
    ;
  //.. then output as you wish, 'dictionary' now holds a sorted list of word-pairs.
  return 0;
};

m4ster_r0shi 142 Posting Whiz in Training

13 Years Ago

There are a couple of things that need to be fixed.

(1) This is a problem -> p = (Translator*)malloc(sizeof(Translator)); because malloc doesn't call constructors. This means that the underlying
string objects are not initialized properly (perhaps some internal pointers
are not set?). Therefore, the program crashes when you try to use them below.

The solution to this is to simply use new instead of malloc -> p = new Translator; (2) You also have an out-of-bounds problem in your hash function. How do you
know that your string's size will always be 99? It should look more like this:

int hashing (string s)
{
    int total = 0;

    for (int j = 0; j < s.size(); j++)
        total = total + (int)s[j];

    return total % 99;
}

Since you've made it this far by yourself, I'll give you a simple implementation for
searchDef, because there is another important problem that I want to point out.

Here's your code slightly modified and with searchDef added:

#include <iostream>
#include <string>
#include <cstdlib>

using namespace std;

struct Translator
{
    string english;
    string french;
};

Translator * dictionary[99] = { 0 };

void mainMenu();
int hash(string s);
void addDef();
void searchDef();
Translator * createNode(string a, string b);
void quit ();

int main()
{
    while (true) mainMenu();
}

void mainMenu()
{
    int choice;

    cout
        << "Welcome to English-French translator\n"
        << "What do You want to do ?\n"
        << "1. add Definition to Dictionary\n"
        << "2. Search for Definition\n"
        << "3. Quit\n"
        << "Enter the number you want :  ";

    cin >> choice;

    switch(choice)
    {
         case 1: addDef();        break;
         case 2: searchDef();     break;
         case 3: quit();          break;
        default: cout << "wrong choice";
    }
}

Translator * createNode (string a, string b)
{
    Translator * new_node = new Translator;

    if (new_node != NULL)
    {
        new_node->english = a;
        new_node->french  = b;
    }
    else
    {
        cout << "Memory FULL";
    }

    return new_node;
}

void addDef ()
{
    string a;
    string b;

    int hashVal;

    cout << "Enter Word in English : ";
    cin  >> a;

    cout << "Enter the transelation in French : ";
    cin  >> b;

    Translator * new_node = createNode(a,b);

    hashVal = hash(a);

    dictionary[hashVal] = new_node;
}

void searchDef()
{
    string a;
    string b;

    int hashVal;

    cout << "Enter Word in English : ";
    cin  >> a;

    hashVal = hash(a);

    if (dictionary[hashVal] && dictionary[hashVal]->english == a)
        cout << dictionary[hashVal]->french << endl;
    else
        cout << "couldn't find word..." << endl;
}

int hash(string s)
{
    int total = 0;

    for (int j = 0; j < s.size(); j++)
        total = total + (int)s[j];

    return total % 99;
}

void quit() { exit(0); }

Try entering hi -> salut and blue -> bleue . Then, try to find salut and bleue using hi and blue . It works fine, right?

Now, try entering dog -> chien and god -> dieu (in that order). Then, try to find chien using dog . It doesn't work, right?

(3) The problem here is that you do nothing to take care of hash collisions. dog and god have that
same hash value. As a result, entering god -> dieu overwrites dog -> chien and vice versa.

A simple solution to this is to implement your hashmap as an array of std::maps. Here's an
example of how you can do that -> http://www.cplusplus.com/forum/beginner/31394/#msg170354

[EDIT] (Note that I forgot to #include <cstdlib> in the above example...) [/EDIT]

P.S. Using the above approach for implementing you hashmap will also take care of your memory leaks :P

Edited 13 Years Ago by m4ster_r0shi because: n/a

m4ster_r0shi 142 Posting Whiz in Training

13 Years Ago

Oh, are you using visual studio? Try replacing hash with hashing in the code above.

Or... What did you mean? Which code doesn't work? The modified version of your code? Or the example in the link?

Edited 13 Years Ago by m4ster_r0shi because: n/a

m4ster_r0shi 142 Posting Whiz in Training

13 Years Ago

Replace all occurrences of hash with hashing . Not just the declaration, but the calls and the definition too! :P

EDIT: Oh, ok, you figured it out.

Edited 13 Years Ago by m4ster_r0shi because: n/a

m4ster_r0shi 142 Posting Whiz in Training

13 Years Ago

Sure, a loop would be fine. You can use a for loop to scan the dictionary array. In each iteration, if the
current pointer is NULL, do nothing. Else, print what you want (the english word, the french word, or both).

Reply to this topic

Be a part of the DaniWeb community

We're a friendly, industry-focused community of developers, IT pros, digital marketers, and technology enthusiasts meeting, networking, learning, and sharing knowledge.

milan2011 0 Light Poster · Answer 1 · 2011-08-01T13:49:56+00:00

what you said makes sense but I dont want to use <map> and want the time efficiency be O(1). So i wrote a code but the program breaks down when i add the french word.I would appreciate it if you help me to to fix the addDef and maybe some suggestions for search Def.thanks:)

#include <cstddef>
#include <cassert>
#include <iostream>
#include <string>
using namespace std;

using std::cout;
using std::cin;
struct Translator {
       string english;
       string french;
       };
       Translator *p;
       Translator *q[99];

void mainMenu();
int hashing (string s);
void  addDef();
void searchDef();
void createNode(string a, string b);
int quit ();
     
int main ()
{
    mainMenu ();
}

void mainMenu ()
{
    int choice;
	
	cout<<"Welcome to English-French translator"<<endl;
	cout<<"What do You want to do ?"<<endl;
	cout<<"1. add Definition to Dictionary"<<endl;
	cout<<"2. Search for Definition"<<endl;
	cout<<"3. Quit"<<endl;
	cout<<"Enter the number you want :  ";
	cin>>choice;
	
	if( choice == 1)
	{   

		addDef();

      
	}
	else if (choice == 2)
	{
		searchDef();
	}
	else if ( choice == 3 )
	{
		quit ();
	}
	else
	{
		cout<<"Input Invalid !"<<endl<<endl;
		mainMenu();
	}
}

void createNode (string a, string b)
{
     p = (Translator*)malloc(sizeof(Translator));
     if (p!=NULL)
     {
        p->english = a;
        p->french  = b;
     }
     else
     {
         cout<<"Memory FULL";
     }
}
     
void addDef ()
{
     string a;
     string b;
     int hashVal;
     
            cout<<"Enter Word in English :"<<endl;
            cin>>a;
            cout<<"Enter the transelation in French :"<<endl;
            cin>>b;
            createNode(a,b);
            hashVal = hashing (p->english);
            q[hashVal] = p;
}

int hashing (string s)
{
    int total = 0;
    int z = 0;
    char cname [99];
    string temp = s;
    
    for ( int i = 0; i <= 99; i++)
    {
    cname[i] =  temp.at(i);
    }
    
    for (int j = 0; j <= 99; j++)
    {
        total = total + (int)cname[j];
    }
    
    z = total % 99;
    return z;
}

int quit ()
{
    return 0;
}

void searchDef(){}

milan2011 0 Light Poster · Answer 2 · 2011-08-02T07:03:46+00:00

Thanks alot! the code helped alot but it still gives me an error.
error C2872: 'hash' : ambiguous symbol
could you please help me find whats the problem

thanks again :)

milan2011 0 Light Poster · Answer 3 · 2011-08-02T07:20:10+00:00

yes,now it says:

error LNK2019: unresolved external symbol "int __cdecl hashing(class std::basic_string<char,struct std::char_traits<char>,class std::allocator<char> >)" (?hashing@@YAHV?$basic_string@DU?$char_traits@D@std@@V?$allocator@D@2@@std@@@Z) referenced in function "void __cdecl addDef(void)" (?addDef@@YAXXZ)

:(

milan2011 0 Light Poster · Answer 4 · 2011-08-02T07:23:59+00:00

oh it worked!! awesome! thanks! now i am going to try to read words from an input file.

milan2011 0 Light Poster · Answer 5 · 2011-08-02T07:30:32+00:00

Also i want to display all the words.what would be the best way to do this? maybe using a loop?

milan2011 0 Light Poster · Answer 6 · 2011-08-02T07:43:25+00:00

is there any example that i can read how to display because i can't find any example in my book.sorry i asked alot of questions!

m4ster_r0shi 142 Posting Whiz in Training · Answer 7 · 2011-08-02T07:52:30+00:00

What do you have trouble with in particular? Using for loops? This
will help then -> http://www.cplusplus.com/doc/tutorial/control/#for

Here's some pseudocode:

for ( each element in dictionary )
{
    if ( current_element != 0 )
    {
        print "english word = "
        print current_element->english
        
        print ", "
        
        print "french word = "
        print current_element->french
        
        print newline
    }
}

milan2011 0 Light Poster · Answer 8 · 2011-08-02T07:57:02+00:00

my problem is that i am not sure if the words will be sorted cause i feel like there will be too many collusion.

milan2011 0 Light Poster · Answer 9 · 2011-08-02T08:04:06+00:00

also not sure which elements should be in

for ( each element in dictionary )

.

m4ster_r0shi 142 Posting Whiz in Training · Answer 10 · 2011-08-02T08:17:49+00:00

Of course the words won't be sorted. If you want them to be, use std::map instead,
as mike suggested. std::map's lookup time complexity isn't that bad (it's O(logn)).

Alternatively, you can stick to your hashmap, but then you'd have to copy your words to another container and
sort them there whenever you want to print them, which would not be very wise if you want to print them often.

Another interesting option is to make your hashmap an array of 26 std::maps and your hash function work like
this -> If the word's first letter is 'a' put it in dictionary[0], else, if the word's first letter is 'b', put it in dictionary[1], etc...

This way,

(1) dictionary will be a sorted container for every i,

(2) for each i < j, any word in dictionary will be
less (lexicographically) than any word in dictionary[j].

milan2011 0 Light Poster · Answer 11 · 2011-08-02T08:27:35+00:00

ok so could i still use my hashmap to search for words but display them using std::map?

milan2011 0 Light Poster · Answer 12 · 2011-08-02T08:48:53+00:00

I am a little confused now,so i first sort them and then put them in hashtable? i am not sure how i can use std::maps :( never used it before! could you please provide an example ? thanks! :)

m4ster_r0shi 142 Posting Whiz in Training · Answer 13 · 2011-08-02T09:05:00+00:00

I am a little confused now,so i first sort them and then put them in hashtable?

Mmm... Not really. When you put them in the hashtable the order is lost.

i am not sure how i can use std::maps never used it before! could you please provide an example ?

There are plenty on the net. Try googling "std::map example".

If you want to stick to your hashmap being an array of pointers, it's probably better that you use a set to sort your data
when you want to print it. Though, note that the hash collision problem is still there. You have to address it somehow.

#include <iostream>
#include <string>
#include <set>

using namespace std;

struct is_less
{
    bool operator()(const string * pstr1, const string * pstr2)
    {
        if (pstr1 == 0) return false; // nulls should
        if (pstr2 == 0) return true;  // go at the end

        return *pstr1 < *pstr2;
    }
};

int main()
{
    string * pstr_arr[10] =
    {
        0,
        new string("world"),
        0,
        0,
        new string("hello,"),
        new string("violet"),
        0,
        new string("red,"),
        0,
        0
    };

    typedef set<const string *, is_less> SetType;

    SetType sorted(pstr_arr, pstr_arr + 10);

    SetType::const_iterator cur_it = sorted.begin();
    SetType::const_iterator end_it = sorted.end();

    for (; cur_it != end_it && *cur_it != 0; ++cur_it)
        cout << **cur_it << " ";

    return 0;
}

mike_2000_17 2,669 21st Century Viking Team Colleague Featured Poster · Answer 14 · 2011-08-02T09:27:56+00:00

The recent C++ standard library provides both a std::map class and a std::unordered_map class. The former implements a binary search tree which gives you a list that is sorted by key values, with O(logN) lookup time. The latter implements a hashtable which does not give you a sorted list, but O(1) lookup time (or almost O(1), because of collisions). Both are used almost the same way.

With std::map:

#include <map>
#include <iostream>
#include <string>

using namespace std;

int main() {
  string english_word, french_word;
  map<string,string> dictionary;
  cout << "Enter English word (or 'q' to quit): ";
  cin >> english_word;
  while(english_word != "q") {
    cout << "Enter French word: ";
    cin >> french_word;
    dictionary[english_word] = french_word;
    cout << "Enter English word (or 'q' to quit): ";
    cin >> english_word;
  };
  cout << "Dictionary is: " << endl;
  for(map<string,string>::const_iterator it = dictionary.begin();
      it != dictionary.end(); ++it)
    cout << it->first << " " << it->second << endl;
  cout << "Enter a query English word (or 'q' to quit): ";
  cin >> english_word;
  while(english_word != "q") {
    cout << "The French translation of '" << english_word
         << "' is '" << dictionary[english_word] << "'." << endl;
    cout << "Enter a query English word (or 'q' to quit): ";
    cin >> english_word;
  };
  return 0;
};

With std::unordered_map:

#include <unordered_map>
#include <iostream>
#include <string>

using namespace std;

int main() {
  string english_word, french_word;
  unordered_map<string,string> dictionary;
  cout << "Enter English word (or 'q' to quit): ";
  cin >> english_word;
  while(english_word != "q") {
    cout << "Enter French word: ";
    cin >> french_word;
    dictionary[english_word] = french_word;
    cout << "Enter English word (or 'q' to quit): ";
    cin >> english_word;
  };
  cout << "Dictionary is: " << endl;
  for(unordered_map<string,string>::const_iterator it = dictionary.begin();
      it != dictionary.end(); ++it)
    cout << it->first << " " << it->second << endl;
  cout << "Enter a query English word (or 'q' to quit): ";
  cin >> english_word;
  while(english_word != "q") {
    cout << "The French translation of '" << english_word
         << "' is '" << dictionary[english_word] << "'." << endl;
    cout << "Enter a query English word (or 'q' to quit): ";
    cin >> english_word;
  };
  return 0;
};

In other words, map and unordered_map are the same as far as usage is concerned.

If you need both at the same time, you can use the fact that the elements of std::map are invariant (the tree nodes remain valid as long as they are not destroyed). This means you can build an unordered_map of map elements. But, frankly, it is wasteful, you should decide whether you need a sorted list or if you need constant-time lookup, because you usually don't need both, and having both usually entails a compromise (e.g. bad hash function with lots of collisions (so, losing O(1) lookup time), or keeping two maps (ordered and not) which wastes memory).

To do both:

#include <map>
#include <unordered_map>
#include <iostream>
#include <string>

using namespace std;

int main() {
  string english_word, french_word;
  map<string,string> sorted_dictionary;
  unordered_map<string,string> lookup_dictionary;
  cout << "Enter English word (or 'q' to quit): ";
  cin >> english_word;
  while(english_word != "q") {
    cout << "Enter French word: ";
    cin >> french_word;
    sorted_dictionary[english_word] = french_word;
    lookup_dictionary[english_word] = french_word;
    cout << "Enter English word (or 'q' to quit): ";
    cin >> english_word;
  };
  cout << "Dictionary is: " << endl;
  for(map<string,string>::const_iterator it = sorted_dictionary.begin();
      it != sorted_dictionary.end(); ++it)
    cout << it->first << " " << it->second << endl;
  cout << "Enter a query English word (or 'q' to quit): ";
  cin >> english_word;
  while(english_word != "q") {
    cout << "The French translation of '" << english_word
         << "' is '" << lookup_dictionary[english_word] << "'." << endl;
    cout << "Enter a query English word (or 'q' to quit): ";
    cin >> english_word;
  };
  return 0;
};

m4ster_r0shi 142 Posting Whiz in Training · Answer 15 · 2011-08-02T10:41:23+00:00

And if you want to do both with one container, you can always use boost::multi_index :P

#include <boost/multi_index_container.hpp>
#include <boost/multi_index/hashed_index.hpp>
#include <boost/multi_index/ordered_index.hpp>
#include <boost/multi_index/identity.hpp>
#include <boost/multi_index/member.hpp>

#include <functional>
#include <utility>
#include <iostream>
#include <string>

using namespace boost::multi_index;

class Dictionary
{
    typedef std::pair<std::string, std::string> Pair;

    struct ordered {};
    struct hashed  {};

    typedef multi_index_container
    <
        Pair,
        indexed_by
        <
            ordered_unique<tag<ordered>, identity<Pair> >,
            hashed_unique <tag<hashed>,  member<Pair, std::string, &Pair::first> >
        >
    > Container;

    typedef Container::index<ordered>::type data_ordered;
    typedef Container::index<hashed >::type data_hashed;

public:

    void Insert(const std::string & english, const std::string & french)
    {
        data.get<ordered>().insert(Pair(english, french));
    }

    void PrintAll()
    {
        data_ordered::const_iterator cur_it = data.get<ordered>().begin();
        data_ordered::const_iterator end_it = data.get<ordered>().end();

        for (; cur_it != end_it; ++cur_it)
        {
            std::cout
                << cur_it->first  << " -> "
                << cur_it->second << std::endl;
        }
    }

    std::string Find(const std::string & english)
    {
        data_hashed::const_iterator it;

        it = data.get<hashed>().find(english, boost::hash<std::string>(), std::equal_to<std::string>());

        if (it != data.get<hashed>().end())
            return english + " -> " + it->second;
        else
            return english + " not found...";
    }

private:

    Container data;
};

int main()
{
    Dictionary dictionary;

    dictionary.Insert("hi",   "salut");
    dictionary.Insert("dog",  "chien");
    dictionary.Insert("god",  "dieu" );
    dictionary.Insert("blue", "bleue");

    dictionary.PrintAll();

    std::cout << std::endl;

    std::cout << dictionary.Find("hi")    << std::endl;
    std::cout << dictionary.Find("hello") << std::endl;
};

Output:

blue -> bleue
dog -> chien
god -> dieu
hi -> salut

hi -> salut
hello not found...