Hi,
I am very new to Java. My project is in data mining, and I have to implement k-means clustering.
The task is as follows:
1. Read a CSV file and store the attributes in matrix format.
2. Cluster the matrix data using the Euclidean distance measure.
3. The user must supply the centroid values.
4. The mean should be recalculated for each cluster to maintain accuracy.
5. The output should be the clusters (groups).
Thanks in advance.
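
For reference, step 2 of the task comes down to one distance function and a "closest centroid" lookup. The following is only a minimal sketch: the names `EuclideanSketch`, `distance` and `nearest` are made up for illustration and do not come from any library, and it assumes every point has the same number of attributes.

```java
final class EuclideanSketch {
    /** Euclidean distance between two points with the same number of attributes. */
    static double distance(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("points must have the same dimension");
        }
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }

    /** Index of the centroid that is closest to the given point. */
    static int nearest(double[] point, double[][] centroids) {
        int best = 0;
        double bestDistance = Double.POSITIVE_INFINITY;
        for (int c = 0; c < centroids.length; c++) {
            double d = distance(point, centroids[c]);
            if (d < bestDistance) {
                bestDistance = d;
                best = c;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Hypothetical centroids, as if typed in by the user.
        double[][] centroids = { {2.0, 55.0}, {8.0, 18.0} };
        System.out.println(nearest(new double[] {1.1, 60.0}, centroids));
    }
}
```

Because it loops over the attributes, the same code works for 2 columns or for the columns chosen out of 86.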

Hi,
I am very new to Java. My project is in data mining, and I have to implement k-means clustering.
The task is as follows:
1. Read a CSV file and store the attributes in matrix format (6000 rows and 86 columns). From that, I have to choose certain columns (attributes) for clustering.
2. Cluster the matrix data by Euclidean distance (minimum distance) from the user-given centroids.
3. The user must supply the centroid values.
4. The mean should be recalculated for each cluster to maintain accuracy.
5. The output should be the clusters (groups).
Please help me code this in Java. Are there any Java packages that implement this?
Thanks in advance.
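
The five steps above can be sketched roughly as follows. This is not a reference implementation, only one plain way to write the usual k-means loop: assign every row to its nearest centroid, recompute each centroid as the mean of its rows, and stop when no assignment changes. The class name `KMeansSketch`, the method `cluster` and the `maxIterations` cap are assumptions made for this example, and the starting centroids in `main` are invented stand-ins for user input. It assumes `data` is non-empty and that every centroid has as many values as a data row.

```java
import java.util.Arrays;

final class KMeansSketch {
    /**
     * Clusters the rows of data, starting from the given centroids.
     * The centroids array is updated in place; the return value holds
     * the cluster index of each row.
     */
    static int[] cluster(double[][] data, double[][] centroids, int maxIterations) {
        int n = data.length;
        int k = centroids.length;
        int dims = data[0].length;
        int[] assignment = new int[n];
        Arrays.fill(assignment, -1);

        for (int iter = 0; iter < maxIterations; iter++) {
            boolean changed = false;
            // Assignment step: squared distance gives the same ordering as
            // Euclidean distance, so the square root can be skipped here.
            for (int i = 0; i < n; i++) {
                int best = 0;
                double bestDist = Double.POSITIVE_INFINITY;
                for (int c = 0; c < k; c++) {
                    double sum = 0.0;
                    for (int d = 0; d < dims; d++) {
                        double diff = data[i][d] - centroids[c][d];
                        sum += diff * diff;
                    }
                    if (sum < bestDist) {
                        bestDist = sum;
                        best = c;
                    }
                }
                if (assignment[i] != best) {
                    assignment[i] = best;
                    changed = true;
                }
            }
            if (!changed) {
                break; // converged: nothing moved to another cluster
            }
            // Update step: each centroid becomes the mean of its rows.
            double[][] sums = new double[k][dims];
            int[] counts = new int[k];
            for (int i = 0; i < n; i++) {
                counts[assignment[i]]++;
                for (int d = 0; d < dims; d++) {
                    sums[assignment[i]][d] += data[i][d];
                }
            }
            for (int c = 0; c < k; c++) {
                if (counts[c] == 0) {
                    continue; // an empty cluster keeps its previous centroid
                }
                for (int d = 0; d < dims; d++) {
                    centroids[c][d] = sums[c][d] / counts[c];
                }
            }
        }
        return assignment;
    }

    public static void main(String[] args) {
        double[][] data = { {1.1, 60}, {8.2, 20}, {4.5, 40}, {7.6, 15}, {2.0, 55}, {3.9, 39} };
        // Hypothetical user-supplied starting centroids.
        double[][] centroids = { {2.0, 55.0}, {8.0, 18.0}, {4.0, 40.0} };
        System.out.println(Arrays.toString(cluster(data, centroids, 100)));
        System.out.println(Arrays.deepToString(centroids));
    }
}
```

Leaving an empty cluster's centroid unchanged is a design choice of this sketch; other implementations re-seed it instead.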

I am very naive to java.

So buy / borrow / steal a book on Java, read it (there is also the "Starting Java" sticky, which should interest you), and get up to speed on it. I think your class instructor / firm would have given you this assignment only after they were **sure** that you could handle it. However, if you gave them a false impression of your skills, then I do not know .....

do i have any packages in java to implement this

Weka is a data mining tool written in Java, so you might want to check out how it works. However, I have never used it, so I cannot tell you more about it.

please help to code in java

Help, yes, but do not expect us to do the entire work for you. First show us what you have tried. Reading data from a CSV file is simple enough, so show us what you have achieved there.
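
As a starting point for the CSV step, here is a rough sketch that reads numeric rows and keeps only chosen columns. The class `CsvColumnsSketch` and its `read` method are invented for this example. It assumes Java 7 or later (for try-with-resources), a plain comma-separated file with no header line and no quoted fields, and that every non-blank row contains all of the selected columns.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;

final class CsvColumnsSketch {
    /**
     * Reads comma-separated numeric rows from source and returns a matrix
     * that contains only the requested (zero-based) columns, in that order.
     */
    static double[][] read(Reader source, int[] columns) throws IOException {
        List<double[]> rows = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            while ((line = in.readLine()) != null) {
                if (line.trim().isEmpty()) {
                    continue; // skip blank or whitespace-only lines
                }
                String[] fields = line.split(",");
                double[] row = new double[columns.length];
                for (int i = 0; i < columns.length; i++) {
                    row[i] = Double.parseDouble(fields[columns[i]].trim());
                }
                rows.add(row);
            }
        }
        // The list grows with the file, so the row count is not hard-coded.
        return rows.toArray(new double[0][]);
    }
}
```

It could be called as `CsvColumnsSketch.read(new FileReader("sam.csv"), new int[] {0, 1})`, with the column indices being whichever attributes are chosen for clustering.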

//main class 
import java.io.*;
import java.util.*;
import java.lang.*;


public class main 
{
   public static void main (String args[]) throws IOException 
     {

       Centroid cent = new Centroid();
       int ClustNumber;
       System.out.println(" Enter the number of clusters");
       Scanner input = new Scanner(System.in);
       ClustNumber=input.nextInt();
       String [][] numbers = new String [8][2];
       double Cordx[] =new double[8];
       double Cordy[] =new double[2];
       File file = new File("sam.csv");
       BufferedReader bufRdr  = new BufferedReader(new FileReader(file));
       String line = null;
       int row = 0;
       int col = 0;

       //read each line of text file
       while((line = bufRdr.readLine()) != null && row< 8 )
       {
          StringTokenizer st = new StringTokenizer(line,",");
          while (st.hasMoreTokens())
          {
             //get next token and store it in the array
             numbers[row][col] = st.nextToken();
             col++;
          }
          col = 0;
          row++;
       }


       for(row=0;row < 8;row++)
       {
          for(col=0; col<2;col++)
          {
             System.out.print(" " + numbers[row][col]);
          }
          //System.out.println(" ");
       }

       for (row=0; row<8;row ++)
       {
          for(col=0;col <2;col++)
          {
             Cordx[row]=Double.parseDouble(numbers[row][col]);
             Cordy[col]=Double.parseDouble(numbers[row][col]);
          }
       }


       //cent.Grouping(Cordx,Cordy,ClustNumber);

     }
}
//centroid class
import java.io.*;
import java.util.*;
import java.lang.*;
import java.text.*;

public class Centroid {

   public void Grouping(double[] Cordx, double[] Cordy, int clustNumber) {
      int clusterNumber = clustNumber;
      double[] ClustCordX = new double[clustNumber];
      double[] ClustCordY = new double[clustNumber];
      this.getMeansetCentroid(Cordx, Cordy,  clustNumber);
      DecimalFormat dec = new DecimalFormat("0.00");
      for(int i = 0;i<Cordx.length;i++) {
         String result1 = dec.format(Cordx[i]);
         String result2 = dec.format(Cordy[i]);
         System.out.println("\n Cords are ( " + result1 + " , " + result2 + ")");
      }

      for(int i = 0; i<clustNumber;i++) {
         ClustCordX[i] = Cordx[i];
         ClustCordY[i] = Cordy[i];
      }


      this.groupCordtoCluster(Cordx,Cordy,ClustCordX,ClustCordY);
   }

   public void groupCordtoCluster(double[] Cordx, double[] Cordy, double[] ClustCordX, double[] ClustCordY) {
      double temp ;
      int size = Cordx.length;
      int clustsize = ClustCordX.length;
      int clusterComparison = clustsize;
      int[] grouping = new int[size - clustsize];
      double[] ClustgroupX = new double[size - clustsize];
      double[] ClustgroupY = new double[size - clustsize];
      int tempint = -1;


      for(int i = clusterComparison; i < size;i++) {
          temp = 0;
         for(int j = 0;j<clustsize;j++) {
            if (j == 0)
               tempint++;
            if(temp == 0) {
               temp = Math.sqrt(Math.pow((Cordx[i]-ClustCordX[j]),2) + Math.pow((Cordy[i]-ClustCordY[j]),2));
               grouping[tempint] = j;
               ClustgroupX[tempint] = Cordx[i];
               ClustgroupY[tempint] = Cordy[i];
            }
            else if (temp > Math.sqrt(Math.pow((Cordx[i]-ClustCordX[j]),2) + Math.pow((Cordy[i]-ClustCordY[j]),2))) {        
               temp = Math.sqrt(Math.pow((Cordx[i]-ClustCordX[j]),2) + Math.pow((Cordy[i]-ClustCordY[j]),2));
               grouping[tempint] = j; 
               ClustgroupX[tempint] = Cordx[i];
               ClustgroupY[tempint] = Cordy[i];
            }
         }


      }
      DecimalFormat dec = new DecimalFormat("0.00");
      String result1, result2, result3, result4;
      for(int i = 0; i<grouping.length;i++) {
         System.out.println("------------------------");
         System.out.println("Clusters for group " + grouping[i]);
         result1 = dec.format(Cordx[grouping[i]]);
         result2 = dec.format(Cordy[grouping[i]]);
         result3 = dec.format(ClustgroupX[i]);
         result4 = dec.format(ClustgroupY[i]);
         System.out.println("Cordinates are (" + result1 + " , " + result2 + ")");
         System.out.println("------------------------");
         System.out.println("Clusters for group " + grouping[i]);
         System.out.println("Cordinates are (" + result3 + " , " + result4 + ")");

      }
   }

   public void getMeansetCentroid(double[] Cordx, double[] Cordy, int ClustNumber) {
      double xCord, yCord;
      double MAX=0,distance, tempd1, tempd2;
      double[] Distances = new double[Cordx.length];
      int reference, i, j, temp1, temp2, point, length;
      reference = i = j = temp1 = temp2 = point =0;
      int[] centroids;


      for(j = 1; j < Cordx.length;j++) {
         Distances[j-1] = Math.sqrt(Math.pow((Cordx[j]-Cordx[0]),2) + Math.pow((Cordy[j]-Cordy[0]),2));
      }
      for(i=0;i<Cordx.length-1;i++) {
         for(j=0;j<Cordx.length-1-i;j++) {
            if(Distances[j+1] < Distances[j]) {
               distance = Distances[j];
               tempd1 = Cordx[j];
               tempd2 = Cordy[j];
               Distances[j] = Distances[j+1];
               Cordx[j] = Cordx[j+1];
               Cordy[j] = Cordy[j+1];
               Distances[j+1] = distance;
               Cordx[j+1] = tempd1;
               Cordy[j+1] = tempd2;
            }
         }
      }

      point = Cordx.length;
      do {
         if(Cordx.length % ClustNumber != 0)
            point--;
      }while(point % ClustNumber != 0);
      length = point/ClustNumber;
      for(i=0;i<Cordx.length;i=length+i) {
         if((i+length-1) > point)
           break; 
         tempd1 = Cordx[i];
         tempd2 = Cordy[i];
         Cordx[i] = Cordx[i+length-1];
         Cordy[i] = Cordy[i+length-1];
         Cordx[i+length-1] = tempd1;
         Cordy[i+length-1] = tempd2;
      }

   }


}

And my sample data file is:

 1.1,60
 8.2,20
4.2,35
1.5,21
7.6,15
2.0,55
3.9,39

The error is

C:\Program Files\Java\jdk1.6.0_03\bin>javac main.java

C:\Program Files\Java\jdk1.6.0_03\bin>java main
 Enter the number of clusters
4
  1.1 60
  8.2 20
 4.2 35
 1.5 21
 7.6 15
 2.0 55
 3.9 39
    null
Exception in thread "main" java.lang.NumberFormatException: empty String
        at sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:994)
        at java.lang.Double.parseDouble(Double.java:510)
        at main.main(main.java:53)

C:\Program Files\Java\jdk1.6.0_03\bin>java main
 Enter the number of clusters


9
  1.1 60
  8.2 20
 4.2 35
 1.5 21
 7.6 15
 2.0 55
 3.9 39
    null
Exception in thread "main" java.lang.NumberFormatException: empty String
        at sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:994)
        at java.lang.Double.parseDouble(Double.java:510)
        at main.main(main.java:53)

C:\Program Files\Java\jdk1.6.0_03\bin>javac main.java

C:\Program Files\Java\jdk1.6.0_03\bin>java main
 Enter the number of clusters
4
  1.1 60
  8.2 20
 4.2 35
 1.5 21
 7.6 15
 2.0 55
 3.9 39
    null
Exception in thread "main" java.lang.NumberFormatException: empty String
        at sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:994)
        at java.lang.Double.parseDouble(Double.java:510)
        at main.main(main.java:53)

C:\Program Files\Java\jdk1.6.0_03\bin>java main
 Enter the number of clusters
0
  1.1 60
  8.2 20
 4.2 35
 1.5 21
 7.6 15
 2.0 55
 3.9 39
    null
Exception in thread "main" java.lang.NumberFormatException: empty String
        at sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:994)
        at java.lang.Double.parseDouble(Double.java:510)
        at main.main(main.java:53)

C:\Program Files\Java\jdk1.6.0_03\bin>javac main.java

C:\Program Files\Java\jdk1.6.0_03\bin>java main
 Enter the number of clusters
4
  1.1 60  8.2 20 4.2 35 1.5 21 7.6 15 2.0 55 3.9 39    nullException in thread "main" java.lang.NumberFormatException: empty String
        at sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:994)
        at java.lang.Double.parseDouble(Double.java:510)
        at main.main(main.java:53)

C:\Program Files\Java\jdk1.6.0_03\bin>

But what will happen if it has to read a [6000][86] matrix?
Please help me. Thank you for your responses.
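
The `    null` row in the output above and the "empty String" message suggest that the loop reads an eighth line that contains only whitespace: the sample file has 7 data rows, but the arrays are sized for 8. `Double.parseDouble` then fails on the blank token. A hedged sketch of one way to guard against this is below; `RowParserSketch` and `parseRow` are hypothetical names for this example, not existing API. On the size question, 6000 * 86 values stored as `double` come to about 4 MB, so holding the whole matrix in memory should not be a problem in itself.

```java
final class RowParserSketch {
    /**
     * Parses one CSV line into numbers.
     * Returns null for a blank line so the caller can skip it, and fails
     * with a descriptive message when the column count is not as expected.
     */
    static double[] parseRow(String line, int expectedColumns) {
        if (line == null || line.trim().isEmpty()) {
            return null;
        }
        String[] fields = line.split(",");
        if (fields.length != expectedColumns) {
            throw new IllegalArgumentException("expected " + expectedColumns
                    + " columns but found " + fields.length + " in: " + line);
        }
        double[] values = new double[expectedColumns];
        for (int i = 0; i < expectedColumns; i++) {
            values[i] = Double.parseDouble(fields[i].trim());
        }
        return values;
    }

    public static void main(String[] args) {
        // A whitespace-only line is skipped instead of being parsed.
        System.out.println(parseRow("   ", 2) == null);
        System.out.println(parseRow(" 1.1,60", 2).length);
    }
}
```

Counting only the rows for which `parseRow` returns a non-null result also removes the need to hard-code the number of rows.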


Put your code inside code tags, like this:-

[code=java] // Your code here.

[/code]

The rest like the output you are getting, post it as :-

[code]

//Output on compiling / running the program.

[/code]

import java.io.*;
import java.util.*;
import java.lang.*;


public class main1 
{
   public static void main (String args[]) throws IOException 
     {

       Centroid cent = new Centroid();
       int ClustNumber;
       System.out.println(" Enter the number of clusters");
       Scanner input = new Scanner(System.in);
       ClustNumber=input.nextInt();
       String [][] numbers = new String [6][2];
       double Cordx[] =new double[6];
       double Cordy[] =new double[6];
       File file = new File("sam.csv");
       BufferedReader bufRdr  = new BufferedReader(new FileReader(file));
       String line = null;
       int row = 0;
       int col = 0;

       //read each line of text file
       while((line = bufRdr.readLine()) != null && row< 6 )
       {
          StringTokenizer st = new StringTokenizer(line,",");
          while (st.hasMoreTokens())
          {
             //get next token and store it in the array
             numbers[row][col] = st.nextToken();
             col++;
          }
          col = 0;
          row++;
       }


       for(row=0;row < 6;row++)
       {
          for(col=0; col<2;col++)
          {
             System.out.print(" " + numbers[row][col]);
          }
          System.out.println(" ");
       }
       for(row=0;row<6;row++)
       {

          Cordx[row]=Double.parseDouble(numbers[row][0]);
          Cordy[row]=Double.parseDouble(numbers[row][1]);
          //System.out.print(" " + Cordx[row]);
       }

       for(row=0;row<6;row++)
       {
          System.out.print(" " + Cordx[row]);
          //System.out.print("\n " + Cordy[row]);
       }
       System.out.print(" \n");
       for(row=0;row<6;row++)
       {

          System.out.print(" " + Cordy[row]);
       }
       cent.Grouping(Cordx,Cordy,ClustNumber);

     }
}


//centroid

import java.io.*;
import java.util.*;
import java.lang.*;
import java.text.*;

public class Centroid
{

  // finds the mean of the dataset for grouping
  public void Grouping(double[] Cordx, double[] Cordy, int clustNumber)
   {
      int clusterNumber = clustNumber;
      double[] ClustCordX = new double[clustNumber];
      double[] ClustCordY = new double[clustNumber];
      this.getMeansetCentroid(Cordx, Cordy,  clustNumber);
      DecimalFormat dec = new DecimalFormat("0.00");
      for(int i = 0;i<Cordx.length;i++)
      {
         String result1 = dec.format(Cordx[i]);
         String result2 = dec.format(Cordy[i]);
         System.out.println("\n Cords are ( " + result1 + " , " + result2 + ")");
      }
    //setting random datasets as centroids
      for(int i = 0; i<clustNumber;i++) 
      {
         ClustCordX[i] = Cordx[i];
         ClustCordY[i] = Cordy[i];
      }


      this.groupCordtoCluster(Cordx,Cordy,ClustCordX,ClustCordY);
   }

   public void groupCordtoCluster(double[] Cordx, double[] Cordy, double[] ClustCordX, double[] ClustCordY) 
    {
      double temp ;
      int size = Cordx.length;
      int clustsize = ClustCordX.length;
      int clusterComparison = clustsize;
      int[] grouping = new int[size - clustsize];
      double[] ClustgroupX = new double[size - clustsize];
      double[] ClustgroupY = new double[size - clustsize];
      int tempint = -1;

    //grouping the dataset to respective clusters by comparing the distance
      for(int i = clusterComparison; i < size;i++) 
         {
          temp = 0;
            for(int j = 0;j<clustsize;j++)
             {
            if (j == 0)
               tempint++;
            if(temp == 0) 
              {
               temp = Math.sqrt(Math.pow((Cordx[i]-ClustCordX[j]),2) + Math.pow((Cordy[i]-ClustCordY[j]),2));
               grouping[tempint] = j;
               ClustgroupX[tempint] = Cordx[i];
               ClustgroupY[tempint] = Cordy[i];
               }
            else if (temp > Math.sqrt(Math.pow((Cordx[i]-ClustCordX[j]),2) + Math.pow((Cordy[i]-ClustCordY[j]),2)))
               {        
               temp = Math.sqrt(Math.pow((Cordx[i]-ClustCordX[j]),2) + Math.pow((Cordy[i]-ClustCordY[j]),2));
               grouping[tempint] = j; 
               ClustgroupX[tempint] = Cordx[i];
               ClustgroupY[tempint] = Cordy[i];
               }
             }

          }
      DecimalFormat dec = new DecimalFormat("0.00");
      String result1, result2, result3, result4;
      for(int i = 0; i<grouping.length;i++)
        {
        //for(int i = 1; i< clustNumber;i++) {
         System.out.println("------------------------");
         System.out.println("Clusters for group " + grouping[i]);
         result1 = dec.format(Cordx[grouping[i]]);
         result2 = dec.format(Cordy[grouping[i]]);
         result3 = dec.format(ClustgroupX[i]);
         result4 = dec.format(ClustgroupY[i]);
         System.out.println("Cordinates are (" + result1 + " , " + result2 + ")");
         System.out.println("------------------------");
         System.out.println("Clusters for group " + grouping[i]);
         System.out.println("Cordinates are (" + result3 + " , " + result4 + ")");

          }
   }

   public void getMeansetCentroid(double[] Cordx, double[] Cordy, int ClustNumber)
      {
      double xCord, yCord;
      double MAX=0,distance, tempd1, tempd2;
      double[] Distances = new double[Cordx.length];
      int reference, i, j, temp1, temp2, point, length;
      reference = i = j = temp1 = temp2 = point =0;
      int[] centroids;


      for(j = 1; j < Cordx.length;j++) 
      {
         Distances[j-1] = Math.sqrt(Math.pow((Cordx[j]-Cordx[0]),2) + Math.pow((Cordy[j]-Cordy[0]),2));
      }
      for(i=0;i<Cordx.length-1;i++) 
      {
         for(j=0;j<Cordx.length-1-i;j++)
          {
            if(Distances[j+1] < Distances[j]) 
             {
               distance = Distances[j];
               tempd1 = Cordx[j];
               tempd2 = Cordy[j];
               Distances[j] = Distances[j+1];
               Cordx[j] = Cordx[j+1];
               Cordy[j] = Cordy[j+1];
               Distances[j+1] = distance;
               Cordx[j+1] = tempd1;
               Cordy[j+1] = tempd2;
             }
          }
       }
      //recalculation of centroids
      point = Cordx.length;
      do 
       {
         if(Cordx.length % ClustNumber != 0)
            point--;
       }while(point % ClustNumber != 0);
      length = point/ClustNumber;
      for(i=0;i<Cordx.length;i=length+i) 
        {
         if((i+length-1) > point)
           break; 
         tempd1 = Cordx[i];
         tempd2 = Cordy[i];
         Cordx[i] = Cordx[i+length-1];
         Cordy[i] = Cordy[i+length-1];
         Cordx[i+length-1] = tempd1;
         Cordy[i+length-1] = tempd2;
        }

   }


}

//output

C:\Program Files\Java\jdk1.6.0_03\bin>javac main1.java

C:\Program Files\Java\jdk1.6.0_03\bin>java main1
 Enter the number of clusters
3
 1.1 60
 8.2 20
 4.5 40
 7.6 15
 2.0 55
 3.9 39
 1.1 8.2 4.5 7.6 2.0 3.9
 60.0 20.0 40.0 15.0 55.0 39.0
 Cords are ( 7.60 , 15.00)

 Cords are ( 3.90 , 39.00)

 Cords are ( 2.00 , 55.00)

 Cords are ( 8.20 , 20.00)

 Cords are ( 4.50 , 40.00)

 Cords are ( 1.10 , 60.00)
------------------------
Clusters for group 0
Cordinates are (7.60 , 15.00)
------------------------
Clusters for group 0
Cordinates are (8.20 , 20.00)
------------------------
Clusters for group 1
Cordinates are (3.90 , 39.00)
------------------------
Clusters for group 1
Cordinates are (4.50 , 40.00)
------------------------
Clusters for group 2
Cordinates are (2.00 , 55.00)
------------------------
Clusters for group 2
Cordinates are (1.10 , 60.00)

C:\Program Files\Java\jdk1.6.0_03\bin>

Please help me read data of size 6000*86 and generate descriptions of the clusters. My code works, but my instructor wants me to enhance it, and I don't know how.
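
If "descriptions" means a per-cluster summary, one possible reading is the size of each cluster and the mean of each attribute in it. The sketch below assumes that reading; `ClusterSummarySketch` and `describe` are names invented here, and `assignment[i]` is assumed to hold the cluster index (0 to k-1) of row `i`, however that was computed.

```java
import java.util.Locale;

final class ClusterSummarySketch {
    /** Builds one summary line per cluster: its size and per-attribute mean. */
    static String[] describe(double[][] data, int[] assignment, int k) {
        int dims = data[0].length;
        double[][] sums = new double[k][dims];
        int[] counts = new int[k];
        for (int i = 0; i < data.length; i++) {
            int c = assignment[i];
            counts[c]++;
            for (int d = 0; d < dims; d++) {
                sums[c][d] += data[i][d];
            }
        }
        String[] lines = new String[k];
        for (int c = 0; c < k; c++) {
            StringBuilder sb = new StringBuilder();
            sb.append("cluster ").append(c)
              .append(": size=").append(counts[c])
              .append(", mean=[");
            for (int d = 0; d < dims; d++) {
                if (d > 0) {
                    sb.append(", ");
                }
                // An empty cluster has no mean, so print a placeholder.
                sb.append(counts[c] == 0
                        ? "n/a"
                        : String.format(Locale.ROOT, "%.2f", sums[c][d] / counts[c]));
            }
            lines[c] = sb.append("]").toString();
        }
        return lines;
    }

    public static void main(String[] args) {
        double[][] data = { {1.1, 60}, {8.2, 20}, {4.5, 40}, {7.6, 15}, {2.0, 55}, {3.9, 39} };
        int[] assignment = {2, 0, 1, 0, 2, 1}; // grouping as in the run above
        for (String line : describe(data, assignment, 3)) {
            System.out.println(line);
        }
    }
}
```

Since nothing in it depends on the number of columns, the same summary works for the 86-column data.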


1) You're posting to a thread that is two months old; start your own thread.
2) You have not explained your problem, nor have you shown what you've done to reach the point where you got an error. (If all you did was copy and paste someone else's code, you're not going to get help.)

I have developed data mining software that offers more than 45 data mining algorithms in Java, including K-Means.

It is open-source and the K-Means implementation is efficient. It is just a few files so it is easy to understand.

Also, note that there is a graphical user interface for launching K-Means and the other algorithms, and an example of how to use it on the website.

You can check this out on the project page: SPMF data mining software.

best,

Philippe

