Hello there, I need help working out how to count the number of files per extension found using the Boost filesystem library.

I get each extension like this:

extension(iter->path())

I now need to know how to store the type of extension and then the next time it comes around increase its occurrence by 1.

Output should look like this:

Ext :             #  :              file size
-------------------------------------------------------
.cpp  :             5  :              124,255
.exe  :            10  :          729,358,928
.ppt  :             4  :               15,626
.obj  :             9  :          974,257,151
.txt  :            49  :               36,291
.vbs  :             8  :                1,387

I was thinking of using a map, but I'm not sure how maps really work, or whether they could even accomplish this task.

Could anyone help me out?


simple example

#include <map>
#include <string>
#include <iostream>
using namespace std;


int main()
{
    std::map <std::string, int> m;
    m["cpp"]++;
    m["exe"]++;
    m["exe"]++;
    m["exe"]++;
    m["exe"]++;
    m["ppt"]++;
    m["ppt"]++;
    m["ppt"]++;

    std::map <std::string, int>::iterator it;
    for(it = m.begin(); it != m.end(); it++)
        cout << it->first << " = " << it->second << "\n";

}

So for m[".cpp"]++, it's checking whether the extension is .cpp and then increasing its occurrence by 1?

So it should be something like m[extension(iter->path())]++ ?
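Roughly, yes. Here is a minimal sketch of what `m[key]++` does, assuming each extension arrives as a plain std::string. `count_extensions` is a made-up helper name for illustration; in the real program the strings would come from extension(iter->path()).

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical helper: counts how often each extension string occurs.
// In the real program each string would come from extension(iter->path()).
std::map<std::string, int> count_extensions(const std::vector<std::string>& exts)
{
    std::map<std::string, int> counts;
    for (std::size_t i = 0; i < exts.size(); ++i)
    {
        // operator[] looks the key up. If the key is missing, it is inserted
        // first with a value of 0. Either way, ++ then adds 1 to that entry.
        counts[exts[i]]++;
    }
    return counts;
}
```

So there is no separate "check" step to write: the lookup and the first-time insert both happen inside operator[].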

OK, so I got the map into my program and it adds up the files and all. But now I need to change it up and use a map of maps.

map<string, map<int, int>> myMap;

So the problem is, I don't know how to fill it with data.
I've seen that you could use myMap.insert<pair<some stuff...
but I don't know how, or whether you can use that on maps of maps.

Any suggestions?

Also, can someone show an example of how to sort the data by the string and by file size (the 2nd int) while using maps?

You don't have to sort maps -- a map keeps its entries in sorted order by key as they are inserted.

Yes, but I need to display the entries ordered either by extension or by file size, alphabetically or numerically (ascending or descending).

I don't know if maps can be resorted -- never tried it. You could give it a go using std::sort().

Well, I don't think sort will work. My prof said that the short answer to sorting maps is that you can't, but there is some kind of way around it, and I got lost in what he was saying... I'm currently going over lists to see if they are better to use.
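One common workaround (possibly what the prof meant, though that is a guess) is to leave the map alone, copy its entries into a vector, and sort the copy. A minimal sketch, where `ExtCount`, `by_count_desc` and `sorted_by_count` are names invented for this example:

```cpp
#include <algorithm>
#include <map>
#include <string>
#include <utility>
#include <vector>

typedef std::pair<std::string, int> ExtCount; // (extension, count)

// Orders entries by count, largest first.
bool by_count_desc(const ExtCount& a, const ExtCount& b)
{
    return a.second > b.second;
}

// Hypothetical helper: copies the map's entries into a vector and sorts the copy.
// The map itself stays ordered by key.
std::vector<ExtCount> sorted_by_count(const std::map<std::string, int>& m)
{
    std::vector<ExtCount> v(m.begin(), m.end());
    std::sort(v.begin(), v.end(), by_count_desc);
    return v;
}
```

The map stays the fast lookup structure while counting, and the vector is only built once at the end for display.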

You could create a structure and then have a vector of structures -- I know std::sort works with vectors because I've done it several times. You have to write a function that takes two arguments, which are references to two elements of the vector, and that function returns true if the first is less than the second. The function can compare whatever it wants as long as it returns either true or false.

The problem with this approach is that it isn't as efficient as a map. With a vector, the program has to search the vector to see if an entry already exists, and if it does, increment the counter value in the structure. If it doesn't exist, then it must add an entry to the vector.

The advantage of the vector solution is that the structure can contain anything you want. There is no limit to what information it can contain.

So you're saying to have 2 vectors that represent the number of files and the extension type?

If so, I don't think that will work: if you sort the extensions, the 2nd vector with the counts will not know where its match has been moved to.

That aside, I should be concentrating on what carries the majority of the marks. This sorting stuff only makes up about 20% of the project, so if I can get the rest to work I can still get an 80.

All I need to know to continue, until I figure out how to sort it, is how to insert data into this map:

map <string, map <int, int> > myMap;

I tried to use something like this, but it won't work:

myMap.insert(make_pair(".cpp", make_pair(352,25901)));

I don't know why you created a map of maps. A map of pairs would have been the right data structure.

int main()
{
    std::map<std::string, std::pair<int,int> > M;
    M[".cpp"].first=5;
    M[".cpp"].second=124255;

}

As the Dragon said, you cannot actually re-sort a map: it is always ordered by its key.
If you had a bijection between the key and the value (in simple terms, no two keys share the same value), the answer would indeed be simple:
you would create another map with the file size as the key and the extension as the value. But that cannot be done here, because two extensions can have the same file size, and the primary condition of a map is that each key is unique.

A possible solution is std::multimap:
unlike map, a multimap allows duplicate keys. Create a multimap with the file size as the key and the extension as the value, insert the elements from the map into it, and they will come out sorted by size.
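A minimal sketch of the multimap idea, assuming the map-of-pairs layout from above with the size stored in `.second`. The typedef names and `index_by_size` are invented for this example.

```cpp
#include <map>
#include <string>
#include <utility>

typedef std::map<std::string, std::pair<int, int> > ExtTable; // ext -> (count, size)
typedef std::multimap<int, std::string> SizeIndex;            // size -> ext

// Hypothetical helper: builds a second container ordered by file size.
// A multimap accepts duplicate keys, so two extensions with the same
// total size both survive.
SizeIndex index_by_size(const ExtTable& table)
{
    SizeIndex bySize;
    for (ExtTable::const_iterator it = table.begin(); it != table.end(); ++it)
    {
        bySize.insert(std::make_pair(it->second.second, it->first));
    }
    return bySize;
}
```

Iterating the result from begin() to end() gives ascending size; rbegin() to rend() gives descending.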

Efficiency is definitely a concern. If I were you and knew that I would need to sort these things, I would not have opted for a map in the first place.
I would use a struct and create a vector of that struct, then call std::sort(), where I can easily specify the sorting criterion in the third parameter.

@ siddhant3s

OK, well, I'm not too sure what's going on in your code there.

int main()
{
      std::map<std::string, std::pair<int,int> > M;
      M[".cpp"].first=5;
      M[".cpp"].second=124255;
}

So would that make data like this:

map<".cpp", pair<5, 124255> >;

Also, if that's how it works, how do I print out the data?

>>So you're saying to have 2 vectors that represent the number of files and the extension type?

I said no such thing. I said a vector of structures. Each structure contains the file type and count, as well as whatever else you want it to contain. You can then sort that vector any way you want: by file type, by count, or by some other field in the structure.

struct ftype
{
    std::string file_type;
    int count;
};

vector<ftype> lists;

Oh, I see... so that wouldn't cause the data to become mismatched while sorting?

OK, so I made the struct:

struct fileInfo
{
	string fileType;
	int count;
	int size;
};
vector<fileInfo> lists;

and then tried to insert some data into it, but what I tried gave me 2 errors:

lists.fileType.push_back ( extension ( iter->path() ) );

the errors are:

1:error C2039: 'fileType' : is not a member of 'std::vector<_Ty>'
2:error C2228: left of '.push_back' must have class/struct/union

You coded it wrong. Fill in a fileInfo object first, then push the whole object onto the vector:

// create a struct fileInfo object
fileInfo info;
// populate the fields
info.fileType = ".CPP";
info.count = 0;
info.size = 0;
// add it to the list
lists.push_back(info);

I think I have it coded right now, but how would I display the data?

To display the data:

vector<fileInfo>::iterator it;
for(it = lists.begin(); it != lists.end(); it++)
{
   cout << it->fileType << " = " << it->count << "\n";
}

Now to sort it. I did not compile or test this, so use at your own risk. Google std::sort and you will find lots more help.

// Return whether first element is greater than the second (descending order)
bool UDgreater ( const fileInfo& elem1, const fileInfo& elem2 )
{
   return elem1.fileType  > elem2.fileType;
}

int main()
{
....
    sort(lists.begin(), lists.end(),  UDgreater );
}

Here is a complete example that I compiled and tested

#include <iostream>
#include <string>
#include <algorithm>
#include <vector>
using namespace std;


struct fileInfo
{
	string fileType;
	int count;
	int size;
};
// Return whether the first element is less than the second.
// Despite the function's name, using < here gives an ascending sort.
bool UDgreater ( const fileInfo& elem1, const fileInfo& elem2 )
{
   return elem1.fileType  < elem2.fileType;
}

int main()
{
    vector<fileInfo> lists;
    // create a struct fileInfo object
    fileInfo info;
    // populate the fields
    info.fileType = ".CPP";
    info.count = 0;
    info.size = 0;
    lists.push_back(info);
    info.fileType = ".AAA";
    info.count = 1;
    info.size = 1;
    // add it to the list
    lists.push_back(info);
    sort(lists.begin(), lists.end(),  UDgreater );
    vector<fileInfo>::iterator it;
    for(it = lists.begin(); it != lists.end(); it++)
    {
        cout << it->fileType << " = " << it->count << "\n";
    }
}

OK, thank you for your help so far. But that sort function doesn't seem to work right: it didn't sort my extensions, and when I tried to sort by file size, some numbers were out of place (0's in the middle of the data).

Also, I need to know how I should accumulate the data, so that each extension appears only once, its file count equals the number of files found, and the file sizes are added up too.

If you run the program I posted, you will see that it sorts the file types in ascending order.

That sort function sorts the strings. If you want to sort on size then you will have to change the sort function. In the sort function use < operator for ascending sort, or > operator for descending sort. You can have as many sort functions as you want to sort the vector on different fields.
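As a sketch of "one sort function per field", here are two comparators on the size member plus a small wrapper. The names `bySizeAscending`, `bySizeDescending` and `sortBySize` are invented for illustration.

```cpp
#include <algorithm>
#include <string>
#include <vector>

struct fileInfo
{
    std::string fileType;
    int count;
    int size;
};

// Smallest size first.
bool bySizeAscending(const fileInfo& a, const fileInfo& b)
{
    return a.size < b.size;
}

// Largest size first.
bool bySizeDescending(const fileInfo& a, const fileInfo& b)
{
    return a.size > b.size;
}

// Hypothetical wrapper: picks the comparator based on a flag.
void sortBySize(std::vector<fileInfo>& lists, bool descending)
{
    if (descending)
        std::sort(lists.begin(), lists.end(), bySizeDescending);
    else
        std::sort(lists.begin(), lists.end(), bySizeAscending);
}
```

Because each element carries its extension, count and size together, sorting on one field cannot separate them.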

Just in case you didn't see this before you posted:

>>Also, I need to know how I should accumulate the data, so that each extension appears only once, its file count equals the number of files found, and the file sizes are added up too.

For each file you process, you have to search the vector to see if its extension is already in there. A loop similar to the one I posted for printing the data will do it. If the extension isn't in the vector, then add it. If it is found in the vector, then increment the counter variable.
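That search-then-add step could look something like this. It is only a sketch: `addFile` is a made-up name, and the extension and size are passed in as plain values rather than taken from the directory iterator.

```cpp
#include <cstddef>
#include <string>
#include <vector>

struct fileInfo
{
    std::string fileType;
    int count;
    int size;
};

// Hypothetical helper: records one file with the given extension and size.
void addFile(std::vector<fileInfo>& lists, const std::string& ext, int size)
{
    for (std::size_t i = 0; i < lists.size(); ++i)
    {
        if (lists[i].fileType == ext)
        {
            // Extension already known: update its totals and stop searching.
            lists[i].count++;
            lists[i].size += size;
            return;
        }
    }
    // Loop finished without a match: this is a new extension.
    fileInfo info;
    info.fileType = ext;
    info.count = 1;
    info.size = size;
    lists.push_back(info);
}
```

Returning as soon as a match is found means no "found" flag is needed, so there is nothing to reset between files.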

OK, this is what I have so far:

for(vector<fileInfo>::iterator it = lists.begin(); it != lists.end(); it++)
{
    //search for a known extension
    if (it->fileType == iter->path())
    {
        foundExtension = true;
        //go to count and increment by one
        //go to file size increment by file_size(iter->path())
    }
    else
    {
        foundExtension = false;
    }
}

if(foundExtension == false)
{
    info.fileType = extension(iter->path());
    info.count = 1;
    info.size = file_size(iter->path());
    lists.push_back (info);
}

Now, how would I actually add up the file size and extension count?
How do I know it's incrementing the right values?

>>how would i actually add up the extension count?
On line 6, simply increment the count variable: it->count++; What do you mean by "file size"? The total size of all files with that extension? That could be very time consuming to calculate, because you would have to search every file on your hard drive for those extensions. If that is not what you mean -- and I hope it isn't -- then I have no idea how to increment the file size. If you already have a filename, you can use stat() to get the size of the file (fstat() does the same for a file that is already open). There are other ways too, but IMO that is the simplest to use.

Oh, for the file size, I'm getting it from Boost filesystem. This program goes through a path given by the user, finds all the files in it, and adds up their count and file size per extension.

So say you have 10 .cpp files of 100 KB each, that makes 1,000 KB, and the output would look like this:

.cpp : 10 : 1,000

This can also be used to search your whole hard drive, and yes, that would take a few hours lol.

EDIT: I added in it->count++; and it didn't work.

I don't know a thing about boost. So you're on your own for that one. You may have to use a 64-bit integer for total file size because they could get quite huge.
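To illustrate the 64-bit point: a 32-bit int tops out at 2,147,483,647, and the .exe and .obj totals in the sample output already add up to about 1.7 billion. A minimal sketch using std::uint64_t from <cstdint> (C++11); `totalSize` is a made-up helper. As far as I know Boost's file_size() returns an unsigned type wide enough for this, but check its documentation rather than taking my word for it.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: adds up file sizes in a 64-bit unsigned total,
// so the sum does not wrap around once it passes roughly 2 GB.
std::uint64_t totalSize(const std::vector<std::uint64_t>& sizes)
{
    std::uint64_t total = 0;
    for (std::size_t i = 0; i < sizes.size(); ++i)
    {
        total += sizes[i];
    }
    return total;
}
```

In the struct, that would mean declaring the size member with a 64-bit type instead of int.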

>>and it didn't work
What exactly does that mean?

Well, with it->count++; and the way I have it set up, it should be checking for duplicates, right? But it doesn't even make it into the if statement ( if (it->fileType == iter->path()) ).

EDIT: Never mind, the if should have been == extension(iter->path())
EDIT2: OK, I've now noticed that I think it makes a new vector if it enters a new directory... hmmm
EDIT3: OK, I see why there is more than one vector: when a new directory is found, it runs the method again, creating a new vector. Can I put the vector outside of the method somehow?

void show_files ( const path & directory, bool recurse_into_subdirs )
{
	bool foundExtension = false;
	vector<fileInfo> lists;
	fileInfo info;
	if ( exists( directory ) )
	{
		directory_iterator end ;
		for ( directory_iterator iter( directory ) ; iter != end ; ++iter )
		{
			if ( is_directory( *iter ) )
			{
				if ( recurse_into_subdirs ) show_files( *iter, recurse_into_subdirs ) ;
			}
			else 
			{
				for(vector<fileInfo>::iterator it = lists.begin(); it != lists.end(); it++)
				{
					//search for a known extension
					if (it->fileType == extension(iter->path()))
					{
						foundExtension = true;
						//go to count and increment by one
						it->count++;
						//go to file size increment by file_size(iter->path())
						it->size += file_size(iter->path());
					}
					else
					{
						foundExtension = false;
					}
				}

				if(foundExtension == false)
				{
					info.fileType = extension(iter->path());
					info.count = 1;
					info.size = file_size(iter->path());
					lists.push_back (info);
				}
			}
		}
	}

	sort(lists.begin(), lists.end(),  UDgreater);

	vector<fileInfo>::iterator it;
	for(it = lists.begin(); it != lists.end(); it++)
    {
		cout << it->fileType << " = " << it->count << " : " << it->size << "\n";
    }
}
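One way to share a single vector across the recursion is to create it in the caller and pass it in by reference. Two other things in the code above look like they would cause duplicate rows: the else branch sets foundExtension back to false whenever a later element does not match, and the flag is never reset before the next file. The sketch below avoids both by returning as soon as a match is found. It is not the Boost version: `Node` is a made-up stand-in for a directory entry so the example runs on its own, and `addFile` and `collect` are invented names.

```cpp
#include <cstddef>
#include <string>
#include <vector>

struct fileInfo
{
    std::string fileType;
    int count;
    int size;
};

// Hypothetical stand-in for a directory entry (not a Boost type).
// A node with isDir == true holds its entries in children.
struct Node
{
    std::string ext;
    int size;
    bool isDir;
    std::vector<Node> children;
};

// Find-or-add: bump the existing entry for ext, or append a new one.
void addFile(std::vector<fileInfo>& lists, const std::string& ext, int size)
{
    for (std::size_t i = 0; i < lists.size(); ++i)
    {
        if (lists[i].fileType == ext)
        {
            lists[i].count++;
            lists[i].size += size;
            return;
        }
    }
    fileInfo info;
    info.fileType = ext;
    info.count = 1;
    info.size = size;
    lists.push_back(info);
}

// The vector belongs to the caller and is passed by reference, so every
// recursive call updates the same totals instead of starting a new vector.
void collect(const Node& dir, bool recurse, std::vector<fileInfo>& lists)
{
    for (std::size_t i = 0; i < dir.children.size(); ++i)
    {
        const Node& entry = dir.children[i];
        if (entry.isDir)
        {
            if (recurse) collect(entry, recurse, lists);
        }
        else
        {
            addFile(lists, entry.ext, entry.size);
        }
    }
}
```

With this layout the sort and the printing loop would move out of the recursive function and run once in the caller, after the top-level call returns.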