I have created a program that reads data from a file and stores it into a vector. The data looks something like the following:

48.698635 -0.000000 0.000019 1.000000 -0.004386 0.002192 // Step 1
48.583797 0.114838 0.100000 1.099997 -0.005061 0.001945 //Step 2
48.457909 0.125888 0.100000 1.199994 -0.005854 0.001883 //Step 3 ect...
48.331438 0.126471 0.100000 1.299990 -0.006675 0.001648
48.151781 0.179657 0.100001 1.399987 -0.007441 0.001298
47.977470 0.174311 0.099999 1.499982 -0.008155 0.000716
47.874269 0.103201 0.100001 1.599979 -0.008924 0.000386
47.769111 0.105158 0.100000 1.699975 -0.009758 0.000013
47.632978 0.136132 0.099999 1.799969 -0.010630 -0.000518
47.503279 0.129699 0.100000 1.899964 -0.011510 -0.000964
47.386929 0.116350 0.100000 1.999959 -0.012305 -0.001482
47.386846 0.000083 0.000041 2.000000 -0.012305 -0.001483

Each line is considered a step. The first line is step 1 the second line is step 2 and so on. As you can see each step contains 6 parameters. the first parameter of each step is the energy. For example the energy of step 1 is 48.698635. I created a class: class Step which is 6 floating point entries. This allowed me to push back each step into a vector. I wrote a program that stores this data into a vector where each line is a vector entry. Here is the code that I have so far.

#include "stdafx.h"
#include <iostream>
#include <iomanip>
#include <fstream>
#include <istream>
#include <string>
#include <vector>
using namespace std;


//Create class that contains the six parameters of each line of the data
class Step
{
public:
	float m_energy;
	float m_energy_loss;
	float m_step_length;
	float m_xpos;
	float m_ypos;
	float m_zpos;

	void print()
	{
		cout << m_energy << " " << m_energy_loss << " " << m_step_length << " " << m_xpos << " " << m_ypos << " " << m_zpos << endl;
	}
};

//Creates class that contains a vector which is a collection of mutiple steps
class Track
{
public:
	vector<Step> m_track;

	void print() //member function that prints the contents of the vector m_track
	{
		for(int c = 0; c < m_track.size(); c++)
			m_track.at(c).print();
	}
};


int main() 
{
    string line;
	Step current_step;
	Track current_track;

	ifstream myfile("data");    

	// Check if file is found. 
	if (!myfile) 
	{
		cout << "Unable to open file! " << endl;
		system("pause");
		exit(1); 
	}

	//loads data into vector m_track
	while (myfile.good())
	{
		getline(myfile, line);
		current_step = Step();
		sscanf(line.c_str(),"%f %f %f %f %f %f", &(current_step.m_energy), &(current_step.m_energy_loss), &(current_step.m_step_length), &(current_step.m_xpos), &(current_step.m_ypos), &(current_step.m_zpos));
		current_track.m_track.push_back(current_step);

	}

	//prints contents of vector
	current_track.print();

	system ("pause");

	return 0;
}

My goal now is to create a program that will allow the user to enter a max energy. The program will then preform a binary search of all the energies of each step. The program will then output all steps which have an energy less then the input. For example, using the data shown above, if the user input 48.0 the program would output:

47.977470 0.174311 0.099999 1.499982 -0.008155 0.000716
47.874269 0.103201 0.100001 1.599979 -0.008924 0.000386
47.769111 0.105158 0.100000 1.699975 -0.009758 0.000013
47.632978 0.136132 0.099999 1.799969 -0.010630 -0.000518
47.503279 0.129699 0.100000 1.899964 -0.011510 -0.000964
47.386929 0.116350 0.100000 1.999959 -0.012305 -0.001482
47.386846 0.000083 0.000041 2.000000 -0.012305 -0.001483

which is all the steps that have an energy less than the input value. Note that the program I am trying to write MUST use a binary search to search for the energy. The data I have contains hundreds of thousands of steps and other searches may be too slow for what I am trying to do. Any suggestions would be greatly appreciated. Thanks.

Recommended Answers

All 8 Replies

Binary searches only work on sorted data. So your first job is to sort the vector by energy. That should be fairly trivial if you use std::sort() and write your own comparison function. After that you could use any binary search algorithm on the vector, treating the vector as a simple array of classes. I'm assuming the vector could contain duplicates, so once the first item is found your program will have to search forward sequentially until it finds the first class whose energy is larger than the user input. Assuming the vector is sorted in ascending order you just simply print all the items from that point to the beginning of the vector.

Sorting and binary search will take nlogn + logn time. Instead you can simply run through the list and spit out anyone that satisfies the requirement-- this takes linear time.

Is using a vector a requirement? You could use a (multi)set and its upper_bound member function.
If the data on the file is already sorted, you can use the global upper_bound function with a vector.

Useful links:

http://www.cplusplus.com/reference/stl/set/
http://www.cplusplus.com/reference/stl/set/upper_bound/

http://www.cplusplus.com/reference/stl/multiset/
http://www.cplusplus.com/reference/stl/multiset/upper_bound/

http://cplusplus.com/reference/algorithm/upper_bound/

Sorting and binary search will take nlogn + logn time. Instead you can simply run through the list and spit out anyone that satisfies the requirement-- this takes linear time.

Agree if the search is done only one time. But when multiple searches are needed, the data is only sorted once (the first time), and with a vector of thousands of rows the binary search would quickly become much more efficient than linear search.

Agree if the search is done only one time. But when multiple searches are needed, the data is only sorted once (the first time), and with a vector of thousands of rows the binary search would quickly become much more efficient than linear search.

Agreed, but look at it another way, supposed he wants to add more elements. Then the list would have to be sorted each time.

And the problem with that is what again??? binary search will always be faster than linear search. See this link

And the problem with that is what again??? binary search will always be faster than linear search. See this link

Yes I know, but sorting cost nlogn using std::sort. So if a new element gets added, then its gonna cost him nlogn again before he could use binary search again. This is true for each elements he wants to add one by one

Yes I know, but sorting cost nlogn using std::sort. So if a new element gets added, then its gonna cost him nlogn again before he could use binary search again. This is true for each elements he wants to add one by one

As stated, the problem consists of read once - query possibly many. In that setup, the upfront cost of the sort is dwarfed by the payout on the searches.

Be a part of the DaniWeb community

We're a friendly, industry-focused community of developers, IT pros, digital marketers, and technology enthusiasts meeting, networking, learning, and sharing knowledge.