After I have read in data arranged in two columns, separated by whitespace and comprising M = 1001 rows (where each row may be considered an ordered pair of points x and y), how do I use linear regression to find the equation of the line that best fits this data: y = mx + b?
Thanks :)

Er, is this a C++ problem, or just a general maths problem? There are two ways to do it in C++:

  1. Write your own functions to do it
  2. Use an existing library

I would recommend option 2. I've used the GNU Scientific Library (GSL) in the past. It's really a C library, but it is quite comprehensive and contains functions for various kinds of linear and non-linear least-squares fitting.

Option 1 isn't too hard for a simple y = a + b*x model, but it gets trickier past that. So if you're going to have to fit different models in the future, get into using a library now rather than writing your own, unless your aim is specifically to learn more about fitting algorithms.
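For reference, the simple model above has a closed-form least-squares solution. The block below is only a sketch of the standard textbook formulas, written with the a (intercept) and b (slope) of y = a + b*x, so b here plays the role of m in the question's y = mx + b and a plays the role of its b. M is the number of rows (1001 in the question), and all sums run over i = 1..M.

```latex
\begin{aligned}
b &= \frac{M \sum x_i y_i - \sum x_i \sum y_i}{M \sum x_i^2 - \left(\sum x_i\right)^2} \\
a &= \frac{\sum y_i - b \sum x_i}{M}
\end{aligned}
```

So only four running sums are needed (x, y, x*y and x*x), and the denominator is zero only when every x value is the same.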

Thanks for the help. This is what I have so far, but I do not think I am on the right track because my program does not even compile. Please help me.

#include<iostream.h>
#include<fstream.h>

main () {
    float p[2002];
    ifstream fin ("Prog5-Data 00.dat");
    float sumxy = 0;
    float sumx = 0;
    for (int x = 0; x<=1001; x++) {
         fin >> p[x] >> p[y] >>;
         sum = sum + p[x]*p[y]; 
     }
   
     for (int i = 0; i < 1001; i+2) {
         fin >> p[i];
         sumx = sumx + p[i]; 
     }
     float sumy = 0;
     for (int z = 1; z <=1001; z++)
          fin>> p[z];
          sumy = sum + p[z]; 
     }
     
     float sumxx = 0;
     for (int lcv = 0; lcv < 1001; lcv +2) {
          fin >> p[lcv];
          sumxx = sumx + p[lcv]*p[lcv];
      }    
}

You have a bunch of issues here:

  • Use the standard headers <iostream> and <fstream>, not the pre-standard <iostream.h> and <fstream.h>, and declare main as int main().
  • Don't store your points in a fixed-size local array. Use a dynamically allocated one instead, something like:
    double *points = new double[numberOfPoints];
    
    // Set the points to something
    for( int i = 0; i < numberOfPoints; ++i )
        points[i] = i;
    
    // delete the array when you're done (this is important)
    delete[] points;
  • For this kind of application a vector is probably easier and safer:
    #include <vector>
    
    int main()
    {
        std::vector< double > points;
        
        for( int i = 0; i < numberOfPoints; ++i )
            points.push_back(i);
            
        // No need to delete anything (safer!)
        
        return 0;
    }
  • Use a separate array/vector for the x and y coordinates (it's clearer):
    std::vector< double > x_points;
    std::vector< double > y_points;
    
    for( int i = 0; i < numberOfPoints; ++i ){
        double x, y;
        fin >> x >> y;
        x_points.push_back(x);
        y_points.push_back(y);
    }
  • Don't use a counted for loop to read from the file (I did it above because I hadn't made this point yet). It's better to keep reading until the input runs out, something like:
    #include <cstddef>
    #include <iostream>
    #include <fstream>
    #include <vector>
    
    int main()
    {
        std::ifstream inFile("filename.txt");
        if ( !inFile.is_open() ) return 1;
        
        std::vector< double > x_points;
        std::vector< double > y_points;
        
        // Test the read itself, so a failed read is never stored
        double x, y;
        while( inFile >> x >> y ){
            x_points.push_back(x);
            y_points.push_back(y);
        }
        
        for( std::size_t i = 0; i < x_points.size(); ++i )
            std::cout << x_points[i] << " " << y_points[i] << std::endl;
            
        return 0;
    }

See how you get on with those points :o)
