Hi,
can anyone tell me why i can't clean up the memory,
and maybe a small solution.

I'm allocing memory with strdup for each key,
but I cant figure out how to erase the key with free.

Before anyone starts talking about mixing c++ and c,
I know it's generally a bad idea but std::strings are to slow.

#include<iostream>

#include<cstring> //for cstring functions
#include<map> //associative array
#include<cstdlib> 


struct cmp_str {
  bool operator()(const char *a,const  char *b) {
    return std::strcmp(a, b) < 0;
  }
};

void printMap(std::map<const char*,int,cmp_str> &asso){
  for(std::map<const char*,int>::const_iterator it = asso.begin(); it != asso.end(); ++it){
    printf("%s:\t %d\n",it->first,it->second);
  }
}



std::map<const char*,int,cmp_str> build(){
  std::map<const char*, int,cmp_str> asso;
  
  const char *text = "does is work";
  char *tmp=NULL;
  tmp = strtok(strdup(text)," \t");
  
  while(tmp!=NULL){
    asso[strdup(tmp)] = 1;
    tmp=strtok(NULL," \t");
  }
  return asso;
}

void cleanup(std::map<const char*,int,cmp_str> &asso){
  std::map<const char*,int,cmp_str>::iterator it;
  asso.erase(asso.begin(),asso.end());
}


int main(int argc, char** argv){
  std::map <const char*, int,cmp_str> asso = build();
  printMap(asso);
  printf("should return 1: %d\n",asso["is"]);
  printf("should return 0: %d\n",asso["0"]);
  cleanup(asso);
  return 0;
}

Erasing the map elements will remove the containers but it won't 'free' the memory you allocated for the objects contained within them.

Yes I know,
this is what I write in my original post.

How do I free the memor, that I've allocated.

For anyone that cares,
this seems to do the trick.
But it is important that the item decleration is used directly at the assignmen operator

[TEX=c++]
my_map::value_type item = *it;
data.erase(it);
free (item.first);
[/TEX]

It's so easy:

typedef std::map<const char*,int,cmp_str> Map;
void cleanup(Map& m)
{
  for (Map::iterator i = m.begin(); i != m.end(); i++)
        free(const_cast<char*>(i->first));
  m.clear(); 
}

Be careful: this map must have keys allocated by C-functions strdup or malloc.

Cheers arkM,

I have no idea what happened here...

But the const_cast is not really necessary, is it?

here... where?

const_cast is necessary because iterator->first yields a pointer that is not suited for the function free ( map<const char*... ).

If you don't understand my code:

We must deallocate map key memory chunks obtained from strdup(s). The for loop iterates all mapped elements and deallocates these memory chunks. Now we can clear the map - that's all.

What's a problem?

Thanks for your response.

Is the const_cast necessary because i definded my map as a

typedef std::map <const char*,int> aMap;

instead of

typedef std::map <char*,int> aMap;

Having a key as const in a map is to prefere right?

thanks again

>Is the const_cast necessary because i definded my map as a ...
Yes.
>Having a key as const in a map is to prefere right?
It does not matter in your case.
I don't undestand why you are so disturbing on const_cast ;)

>Before anyone starts talking about mixing c++ and c,
>I know it's generally a bad idea but std::strings are to slow.
It's not generally a bad idea, but I'm curious why you think std::strings are too slow. Have you written your code using them, profiled it, and determined that the biggest bottleneck is inside the string class? Or are you making a conclusion based on subjective observation and/or hearsay?

I'm parsing 900 gig of textfiles
Using std::string as opposed to char* directly from getline,
takes around 40-60% longer.

If you see to the fastest code:

1. Avoid using expensive malloc/strdup for every new key string value. Place new C-strings to preallocated buffers chain (or even into std::string with a proper capacity) including terminating zero byte then use a pointer to the string as map key (no troubles with key deallocations)

2. Allocate large buffer fo process file (use std::filebuf and putsetbuf member to assign large preallocated buffer) or use C i/o library functions directly. As usually, default file stream buffers are too small for bulk file processing.

3. std::map (intrusive) container is a good thing but optimized AVL-tree implementations as usually are much more faster (that's proved ;))...

std::string is not a bottleneck, see points above (acceleration factor ~3-5 or more)...

Cheers arkM,

I will for sure look into this.

The avl is not part of the stl right?
So I have to make this one up.


As it is, the lookup map, takes aprox 3-4 gig memory,
I have 8 gig, but this is really the limit.
35million keys, each with a int[5] as value.

So i really can't just malloc huge amounts.
Or maybe I dont understand you...

The files I read are flatfiles, same amount of columns on each line,
so you are saying I should just buffer huge huge amounts.
And then manually tokenize by newlines, and then item seperate it?

Thanks for your advice,

AVL-trees: there are some good freeware implementations in C and C++ (search Goodle). Think: std::map with char* key is not too slow.
Memory: 35 millions keys * ~12-30 bytes (AVLT node size) = ~0.4-1 Gb - that's all.
String buffer: you allocates much more memory for std::map; 35 millions of malloc/new - too expensive if you see the fastest code.
Buffering: it's not so expensive (~64 Kb is enough)
Tokenizing: try and compare line parsing with operator >> and a simple scanner code (I'm sure that the last one is much more faster).

Of course, the right way to go:
- make prototype with STL (may be it has good characteristics)
- profile application
- step by step replace parts mentioned above
- test application, compare characteristics

This article has been dead for over six months. Start a new discussion instead.