Hi,
I am very new to Java. My project is in data mining, where I have to implement k-means clustering.
The task is as follows:
1. Read a CSV file and store the attributes in a matrix.
2. Cluster the matrix data using the Euclidean distance measure.
3. The user must supply the centroid values.
4. The means should be recalculated for the clusters to maintain accuracy.
5. The output should be the clusters (groups).
Thanks in advance.
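Step 2 compares rows using the Euclidean distance. Here is a minimal sketch that works for rows of any length (for example, only the selected attribute columns). It assumes each row is held in a `double[]`. The class and method names are hypothetical, not from any library:

```java
/** Hypothetical helper: Euclidean distance between two attribute vectors. */
class EuclideanDistance {

    /** Returns sqrt(sum over i of (a[i] - b[i])^2); both arrays must have the same length. */
    static double euclidean(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vectors differ in length");
        }
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d; // cheaper than Math.pow(d, 2)
        }
        return Math.sqrt(sum);
    }

    public static void main(String[] args) {
        System.out.println(euclidean(new double[] {0, 0}, new double[] {3, 4})); // prints 5.0
    }
}
```

Math.sqrt is monotonic, so comparing squared distances picks the same nearest centroid. If speed matters, the square root can be skipped when all you need is the closest centroid.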
masijade
I don't know why you're telling us what you need to do, but I wish you luck.
thijo
Hi,
I am very new to Java. My project is in data mining, where I have to implement k-means clustering.
The task is as follows:
1. Read a CSV file and store the attributes in a matrix (6000 rows and 86 columns). From that matrix I have to choose certain columns (attributes) for clustering.
2. Cluster the matrix data using the Euclidean distance measure (minimum distance) from the user-given centroids.
3. The user must supply the centroid values.
4. The means should be recalculated for the clusters to maintain accuracy.
5. The output should be the clusters (groups).
Please help me code this in Java. Are there any Java packages I can use to implement this?
Thanks in advance.
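Steps 2 and 3 amount to giving each row to the nearest of the user-supplied centroids. Here is a minimal sketch. It assumes the selected attributes are already in a `double[][]` (one row per record) and that the centroids the user typed are in another `double[][]`. The names `AssignStep` and `assign` are hypothetical:

```java
import java.util.Arrays;

/** Hypothetical sketch: assign each data row to its nearest user-given centroid. */
class AssignStep {

    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    /** Returns labels where labels[r] is the index of the centroid closest to data[r]. */
    static int[] assign(double[][] data, double[][] centroids) {
        int[] labels = new int[data.length];
        for (int r = 0; r < data.length; r++) {
            // Start at infinity rather than using 0 as a "not set" marker, since 0 is a valid distance
            double best = Double.POSITIVE_INFINITY;
            for (int c = 0; c < centroids.length; c++) {
                double d = squaredDistance(data[r], centroids[c]);
                if (d < best) {
                    best = d;
                    labels[r] = c;
                }
            }
        }
        return labels;
    }

    public static void main(String[] args) {
        double[][] data = {{1.1, 60}, {8.2, 20}, {4.2, 35}, {1.5, 21}, {7.6, 15}, {2.0, 55}, {3.9, 39}};
        double[][] centroids = {{1.1, 60}, {8.2, 20}}; // as if the user had entered these two
        System.out.println(Arrays.toString(assign(data, centroids))); // prints [0, 1, 1, 1, 1, 0, 1]
    }
}
```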
stephen84s
I am very naive to java.
So buy, borrow, or steal a book on Java and read it to get up to speed (the "Starting Java" sticky should also interest you). I assume your instructor or firm gave you this assignment only after they were **sure** you could handle it. If you gave them a false impression of your skills, though, I don't know what to tell you.
do i have any packages in java to implement this
Weka is a data mining tool written in Java, so you might want to check out how it works. I have never used it myself, though, so I can't tell you much about it.
please help to code in java
Help, yes, but do not expect us to do the entire job for you. First show us what you have tried. Reading data from a CSV file is simple enough, so show us what you have achieved there.
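You don't need to know the row count in advance to read a CSV file into a matrix. Here is a minimal sketch. It assumes Java 7 or later (for try-with-resources and `<>`) and a comma-separated file of plain numbers with no header row and no quoted fields. Blank lines are skipped. The class `CsvMatrix` and its `read` method are hypothetical names:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: read a numeric CSV into a double[][] of whatever size it turns out to be. */
class CsvMatrix {

    /** Reads every non-blank line as one row of doubles. */
    static double[][] read(Reader source) throws IOException {
        List<double[]> rows = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            while ((line = in.readLine()) != null) {
                if (line.trim().isEmpty()) {
                    continue; // skip blank lines; parsing them would throw NumberFormatException
                }
                String[] fields = line.split(",");
                double[] row = new double[fields.length];
                for (int c = 0; c < fields.length; c++) {
                    row[c] = Double.parseDouble(fields[c].trim());
                }
                rows.add(row);
            }
        }
        return rows.toArray(new double[0][]);
    }

    public static void main(String[] args) throws IOException {
        // For the real file this would be read(new java.io.FileReader("sam.csv"))
        double[][] m = read(new StringReader("1.1,60\n8.2,20\n\n4.2,35\n"));
        System.out.println(m.length + " rows x " + m[0].length + " columns"); // prints "3 rows x 2 columns"
    }
}
```

The same method handles a 7 x 2 file or a 6000 x 86 file without changing any array sizes.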
thijo
//main class
import java.io.*;
import java.util.*;
import java.lang.*;

public class main
{
    public static void main (String args[]) throws IOException
    {
        Centroid cent = new Centroid();
        int ClustNumber;
        System.out.println(" Enter the number of clusters");
        Scanner input = new Scanner(System.in);
        ClustNumber = input.nextInt();
        String [][] numbers = new String [8][2];
        double Cordx[] = new double[8];
        double Cordy[] = new double[2];
        File file = new File("sam.csv");
        BufferedReader bufRdr = new BufferedReader(new FileReader(file));
        String line = null;
        int row = 0;
        int col = 0;
        //read each line of text file
        while((line = bufRdr.readLine()) != null && row < 8)
        {
            StringTokenizer st = new StringTokenizer(line, ",");
            while (st.hasMoreTokens())
            {
                //get next token and store it in the array
                numbers[row][col] = st.nextToken();
                col++;
            }
            col = 0;
            row++;
        }
        for(row = 0; row < 8; row++)
        {
            for(col = 0; col < 2; col++)
            {
                System.out.print(" " + numbers[row][col]);
            }
            //System.out.println(" ");
        }
        for (row = 0; row < 8; row++)
        {
            for(col = 0; col < 2; col++)
            {
                Cordx[row] = Double.parseDouble(numbers[row][col]);
                Cordy[col] = Double.parseDouble(numbers[row][col]);
            }
        }
        //cent.Grouping(Cordx,Cordy,ClustNumber);
    }
}
//centroid class
import java.io.*;
import java.util.*;
import java.lang.*;
import java.text.*;

public class Centroid {
    public void Grouping(double[] Cordx, double[] Cordy, int clustNumber) {
        int clusterNumber = clustNumber;
        double[] ClustCordX = new double[clustNumber];
        double[] ClustCordY = new double[clustNumber];
        this.getMeansetCentroid(Cordx, Cordy, clustNumber);
        DecimalFormat dec = new DecimalFormat("0.00");
        for(int i = 0; i < Cordx.length; i++) {
            String result1 = dec.format(Cordx[i]);
            String result2 = dec.format(Cordy[i]);
            System.out.println("\n Cords are ( " + result1 + " , " + result2 + ")");
        }
        for(int i = 0; i < clustNumber; i++) {
            ClustCordX[i] = Cordx[i];
            ClustCordY[i] = Cordy[i];
        }
        this.groupCordtoCluster(Cordx, Cordy, ClustCordX, ClustCordY);
    }

    public void groupCordtoCluster(double[] Cordx, double[] Cordy, double[] ClustCordX, double[] ClustCordY) {
        double temp;
        int size = Cordx.length;
        int clustsize = ClustCordX.length;
        int clusterComparison = clustsize;
        int[] grouping = new int[size - clustsize];
        double[] ClustgroupX = new double[size - clustsize];
        double[] ClustgroupY = new double[size - clustsize];
        int tempint = -1;
        for(int i = clusterComparison; i < size; i++) {
            temp = 0;
            for(int j = 0; j < clustsize; j++) {
                if (j == 0)
                    tempint++;
                if(temp == 0) {
                    temp = Math.sqrt(Math.pow((Cordx[i]-ClustCordX[j]),2) + Math.pow((Cordy[i]-ClustCordY[j]),2));
                    grouping[tempint] = j;
                    ClustgroupX[tempint] = Cordx[i];
                    ClustgroupY[tempint] = Cordy[i];
                }
                else if (temp > Math.sqrt(Math.pow((Cordx[i]-ClustCordX[j]),2) + Math.pow((Cordy[i]-ClustCordY[j]),2))) {
                    temp = Math.sqrt(Math.pow((Cordx[i]-ClustCordX[j]),2) + Math.pow((Cordy[i]-ClustCordY[j]),2));
                    grouping[tempint] = j;
                    ClustgroupX[tempint] = Cordx[i];
                    ClustgroupY[tempint] = Cordy[i];
                }
            }
        }
        DecimalFormat dec = new DecimalFormat("0.00");
        String result1, result2, result3, result4;
        for(int i = 0; i < grouping.length; i++) {
            System.out.println("------------------------");
            System.out.println("Clusters for group " + grouping[i]);
            result1 = dec.format(Cordx[grouping[i]]);
            result2 = dec.format(Cordy[grouping[i]]);
            result3 = dec.format(ClustgroupX[i]);
            result4 = dec.format(ClustgroupY[i]);
            System.out.println("Cordinates are (" + result1 + " , " + result2 + ")");
            System.out.println("------------------------");
            System.out.println("Clusters for group " + grouping[i]);
            System.out.println("Cordinates are (" + result3 + " , " + result4 + ")");
        }
    }

    public void getMeansetCentroid(double[] Cordx, double[] Cordy, int ClustNumber) {
        double xCord, yCord;
        double MAX = 0, distance, tempd1, tempd2;
        double[] Distances = new double[Cordx.length];
        int reference, i, j, temp1, temp2, point, length;
        reference = i = j = temp1 = temp2 = point = 0;
        int[] centroids;
        for(j = 1; j < Cordx.length; j++) {
            Distances[j-1] = Math.sqrt(Math.pow((Cordx[j]-Cordx[0]),2) + Math.pow((Cordy[j]-Cordy[0]),2));
        }
        for(i = 0; i < Cordx.length-1; i++) {
            for(j = 0; j < Cordx.length-1-i; j++) {
                if(Distances[j+1] < Distances[j]) {
                    distance = Distances[j];
                    tempd1 = Cordx[j];
                    tempd2 = Cordy[j];
                    Distances[j] = Distances[j+1];
                    Cordx[j] = Cordx[j+1];
                    Cordy[j] = Cordy[j+1];
                    Distances[j+1] = distance;
                    Cordx[j+1] = tempd1;
                    Cordy[j+1] = tempd2;
                }
            }
        }
        point = Cordx.length;
        do {
            if(Cordx.length % ClustNumber != 0)
                point--;
        } while(point % ClustNumber != 0);
        length = point/ClustNumber;
        for(i = 0; i < Cordx.length; i = length+i) {
            if((i+length-1) > point)
                break;
            tempd1 = Cordx[i];
            tempd2 = Cordy[i];
            Cordx[i] = Cordx[i+length-1];
            Cordy[i] = Cordy[i+length-1];
            Cordx[i+length-1] = tempd1;
            Cordy[i+length-1] = tempd2;
        }
    }
}
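Step 4 asks for the means to be recalculated. As far as I can tell, the getMeansetCentroid method above only reorders and swaps coordinates. It never averages the members of a cluster. Recalculating a centroid means taking the per-attribute mean of the rows currently assigned to it. Here is a minimal sketch. It assumes `labels[r]` holds the cluster index of row `r`. The names `UpdateStep` and `recomputeMeans` are hypothetical:

```java
import java.util.Arrays;

/** Hypothetical sketch of the k-means update step: each centroid becomes the mean of its members. */
class UpdateStep {

    /**
     * Returns k new centroids; centroid c is the mean of every row whose label is c.
     * A cluster that ended up empty keeps its previous centroid.
     */
    static double[][] recomputeMeans(double[][] data, int[] labels, double[][] oldCentroids) {
        int k = oldCentroids.length;
        int dims = oldCentroids[0].length;
        double[][] sums = new double[k][dims];
        int[] counts = new int[k];
        for (int r = 0; r < data.length; r++) {
            int c = labels[r];
            counts[c]++;
            for (int d = 0; d < dims; d++) {
                sums[c][d] += data[r][d];
            }
        }
        double[][] means = new double[k][];
        for (int c = 0; c < k; c++) {
            if (counts[c] == 0) {
                means[c] = oldCentroids[c].clone(); // avoid dividing by zero
            } else {
                means[c] = new double[dims];
                for (int d = 0; d < dims; d++) {
                    means[c][d] = sums[c][d] / counts[c];
                }
            }
        }
        return means;
    }

    public static void main(String[] args) {
        double[][] data = {{1.0, 10.0}, {3.0, 20.0}, {8.0, 0.0}};
        int[] labels = {0, 0, 1};
        double[][] old = {{0.0, 0.0}, {9.0, 9.0}};
        System.out.println(Arrays.deepToString(recomputeMeans(data, labels, old)));
        // prints [[2.0, 15.0], [8.0, 0.0]]
    }
}
```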
And my sample data file (sam.csv) is:
1.1,60
8.2,20
4.2,35
1.5,21
7.6,15
2.0,55
3.9,39
The error is:
C:\Program Files\Java\jdk1.6.0_03\bin>javac main.java
C:\Program Files\Java\jdk1.6.0_03\bin>java main
Enter the number of clusters
4
1.1 60
8.2 20
4.2 35
1.5 21
7.6 15
2.0 55
3.9 39
null
Exception in thread "main" java.lang.NumberFormatException: empty String
at sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:994)
at java.lang.Double.parseDouble(Double.java:510)
at main.main(main.java:53)
C:\Program Files\Java\jdk1.6.0_03\bin>java main
Enter the number of clusters
9
1.1 60
8.2 20
4.2 35
1.5 21
7.6 15
2.0 55
3.9 39
null
Exception in thread "main" java.lang.NumberFormatException: empty String
at sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:994)
at java.lang.Double.parseDouble(Double.java:510)
at main.main(main.java:53)
C:\Program Files\Java\jdk1.6.0_03\bin>java main
Enter the number of clusters
0
1.1 60
8.2 20
4.2 35
1.5 21
7.6 15
2.0 55
3.9 39
null
Exception in thread "main" java.lang.NumberFormatException: empty String
at sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:994)
at java.lang.Double.parseDouble(Double.java:510)
at main.main(main.java:53)
C:\Program Files\Java\jdk1.6.0_03\bin>javac main.java
C:\Program Files\Java\jdk1.6.0_03\bin>java main
Enter the number of clusters
4
1.1 60 8.2 20 4.2 35 1.5 21 7.6 15 2.0 55 3.9 39 nullException in thread "main" java.lang.NumberFormatException: empty String
at sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:994)
at java.lang.Double.parseDouble(Double.java:510)
at main.main(main.java:53)
C:\Program Files\Java\jdk1.6.0_03\bin>
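The stack trace points at Double.parseDouble. In the posted main, the arrays are sized for 8 rows, but sam.csv has 7 data lines, so the parse loop also reaches whatever the reader stored for an 8th line. parseDouble trims its argument before parsing, so a field that is empty or whitespace-only fails with "empty String". A field that was never filled (still null) would throw NullPointerException instead. Judging from the single "null" in the printout, the 8th line probably held one whitespace-only field, but that is an inference. Separately, `Cordy[col]` only has two slots, and `Cordx[row]` is overwritten by the second column. Here is a minimal sketch of a parse helper that names the offending line. The class `SafeParse` and method `parseField` are hypothetical:

```java
/** Hypothetical helper: parse one CSV field and report where a bad value came from. */
class SafeParse {

    static double parseField(String field, int lineNo, int colNo) {
        if (field == null || field.trim().isEmpty()) {
            throw new IllegalArgumentException(
                    "missing value at line " + lineNo + ", column " + colNo);
        }
        return Double.parseDouble(field.trim());
    }

    public static void main(String[] args) {
        try {
            Double.parseDouble(" "); // whitespace is trimmed away, leaving ""
        } catch (NumberFormatException e) {
            System.out.println(e.getMessage()); // prints "empty String"
        }
        try {
            parseField(" ", 8, 1);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // prints "missing value at line 8, column 1"
        }
    }
}
```

Sizing the arrays from the number of lines actually read, and skipping blank lines, removes the need to hard-code 8 (or 6000) anywhere.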
But what will happen if it has to read a [6000][86] matrix?
Please help me. Thank you for your responses.
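A 6000 x 86 matrix of doubles holds 516,000 values. At 8 bytes each, that is roughly 4 MB plus some per-row array overhead, so it fits comfortably in memory. What has to change is the hard-coded sizes and the fixed pair of Cordx/Cordy arrays. The task says only certain columns are used for clustering, so one option is to copy the chosen columns into a smaller matrix and let the clustering code work on rows of any length. Here is a minimal sketch. The column indices stand in for whatever the user picks, and `ColumnSelect` and `selectColumns` are hypothetical names:

```java
import java.util.Arrays;

/** Hypothetical sketch: keep only the attribute columns chosen for clustering. */
class ColumnSelect {

    /** Copies the given 0-based columns out of a wider matrix, in the order given. */
    static double[][] selectColumns(double[][] data, int[] columns) {
        double[][] out = new double[data.length][columns.length];
        for (int r = 0; r < data.length; r++) {
            for (int j = 0; j < columns.length; j++) {
                out[r][j] = data[r][columns[j]];
            }
        }
        return out;
    }

    public static void main(String[] args) {
        double[][] wide = {
            {1, 2, 3, 4},
            {5, 6, 7, 8}
        };
        // e.g. the user chose attributes 0 and 3
        System.out.println(Arrays.deepToString(selectColumns(wide, new int[] {0, 3})));
        // prints [[1.0, 4.0], [5.0, 8.0]]
    }
}
```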
stephen84s
Put your code inside code tags, like this:
[code=java] // Your code here.
[/code]
Post everything else, such as the output you are getting, like this:
[code]
//Output on compiling / running the program.
[/code]
thijo
import java.io.*;
import java.util.*;
import java.lang.*;

public class main1
{
    public static void main (String args[]) throws IOException
    {
        Centroid cent = new Centroid();
        int ClustNumber;
        System.out.println(" Enter the number of clusters");
        Scanner input = new Scanner(System.in);
        ClustNumber = input.nextInt();
        String [][] numbers = new String [6][2];
        double Cordx[] = new double[6];
        double Cordy[] = new double[6];
        File file = new File("sam.csv");
        BufferedReader bufRdr = new BufferedReader(new FileReader(file));
        String line = null;
        int row = 0;
        int col = 0;
        //read each line of text file
        while((line = bufRdr.readLine()) != null && row < 6)
        {
            StringTokenizer st = new StringTokenizer(line, ",");
            while (st.hasMoreTokens())
            {
                //get next token and store it in the array
                numbers[row][col] = st.nextToken();
                col++;
            }
            col = 0;
            row++;
        }
        for(row = 0; row < 6; row++)
        {
            for(col = 0; col < 2; col++)
            {
                System.out.print(" " + numbers[row][col]);
            }
            System.out.println(" ");
        }
        for(row = 0; row < 6; row++)
        {
            Cordx[row] = Double.parseDouble(numbers[row][0]);
            Cordy[row] = Double.parseDouble(numbers[row][1]);
            //System.out.print(" " + Cordx[row]);
        }
        for(row = 0; row < 6; row++)
        {
            System.out.print(" " + Cordx[row]);
            //System.out.print("\n " + Cordy[row]);
        }
        System.out.print(" \n");
        for(row = 0; row < 6; row++)
        {
            System.out.print(" " + Cordy[row]);
        }
        cent.Grouping(Cordx, Cordy, ClustNumber);
    }
}
//centroid
import java.io.*;
import java.util.*;
import java.lang.*;
import java.text.*;

public class Centroid
{
    // finds the mean of the dataset for grouping
    public void Grouping(double[] Cordx, double[] Cordy, int clustNumber)
    {
        int clusterNumber = clustNumber;
        double[] ClustCordX = new double[clustNumber];
        double[] ClustCordY = new double[clustNumber];
        this.getMeansetCentroid(Cordx, Cordy, clustNumber);
        DecimalFormat dec = new DecimalFormat("0.00");
        for(int i = 0; i < Cordx.length; i++)
        {
            String result1 = dec.format(Cordx[i]);
            String result2 = dec.format(Cordy[i]);
            System.out.println("\n Cords are ( " + result1 + " , " + result2 + ")");
        }
        //setting random datasets as centroids
        for(int i = 0; i < clustNumber; i++)
        {
            ClustCordX[i] = Cordx[i];
            ClustCordY[i] = Cordy[i];
        }
        this.groupCordtoCluster(Cordx, Cordy, ClustCordX, ClustCordY);
    }

    public void groupCordtoCluster(double[] Cordx, double[] Cordy, double[] ClustCordX, double[] ClustCordY)
    {
        double temp;
        int size = Cordx.length;
        int clustsize = ClustCordX.length;
        int clusterComparison = clustsize;
        int[] grouping = new int[size - clustsize];
        double[] ClustgroupX = new double[size - clustsize];
        double[] ClustgroupY = new double[size - clustsize];
        int tempint = -1;
        //grouping the dataset to respective clusters by comparing the distance
        for(int i = clusterComparison; i < size; i++)
        {
            temp = 0;
            for(int j = 0; j < clustsize; j++)
            {
                if (j == 0)
                    tempint++;
                if(temp == 0)
                {
                    temp = Math.sqrt(Math.pow((Cordx[i]-ClustCordX[j]),2) + Math.pow((Cordy[i]-ClustCordY[j]),2));
                    grouping[tempint] = j;
                    ClustgroupX[tempint] = Cordx[i];
                    ClustgroupY[tempint] = Cordy[i];
                }
                else if (temp > Math.sqrt(Math.pow((Cordx[i]-ClustCordX[j]),2) + Math.pow((Cordy[i]-ClustCordY[j]),2)))
                {
                    temp = Math.sqrt(Math.pow((Cordx[i]-ClustCordX[j]),2) + Math.pow((Cordy[i]-ClustCordY[j]),2));
                    grouping[tempint] = j;
                    ClustgroupX[tempint] = Cordx[i];
                    ClustgroupY[tempint] = Cordy[i];
                }
            }
        }
        DecimalFormat dec = new DecimalFormat("0.00");
        String result1, result2, result3, result4;
        for(int i = 0; i < grouping.length; i++)
        {
            //for(int i = 1; i< clustNumber;i++) {
            System.out.println("------------------------");
            System.out.println("Clusters for group " + grouping[i]);
            result1 = dec.format(Cordx[grouping[i]]);
            result2 = dec.format(Cordy[grouping[i]]);
            result3 = dec.format(ClustgroupX[i]);
            result4 = dec.format(ClustgroupY[i]);
            System.out.println("Cordinates are (" + result1 + " , " + result2 + ")");
            System.out.println("------------------------");
            System.out.println("Clusters for group " + grouping[i]);
            System.out.println("Cordinates are (" + result3 + " , " + result4 + ")");
        }
    }

    public void getMeansetCentroid(double[] Cordx, double[] Cordy, int ClustNumber)
    {
        double xCord, yCord;
        double MAX = 0, distance, tempd1, tempd2;
        double[] Distances = new double[Cordx.length];
        int reference, i, j, temp1, temp2, point, length;
        reference = i = j = temp1 = temp2 = point = 0;
        int[] centroids;
        for(j = 1; j < Cordx.length; j++)
        {
            Distances[j-1] = Math.sqrt(Math.pow((Cordx[j]-Cordx[0]),2) + Math.pow((Cordy[j]-Cordy[0]),2));
        }
        for(i = 0; i < Cordx.length-1; i++)
        {
            for(j = 0; j < Cordx.length-1-i; j++)
            {
                if(Distances[j+1] < Distances[j])
                {
                    distance = Distances[j];
                    tempd1 = Cordx[j];
                    tempd2 = Cordy[j];
                    Distances[j] = Distances[j+1];
                    Cordx[j] = Cordx[j+1];
                    Cordy[j] = Cordy[j+1];
                    Distances[j+1] = distance;
                    Cordx[j+1] = tempd1;
                    Cordy[j+1] = tempd2;
                }
            }
        }
        //recalculation of centroids
        point = Cordx.length;
        do
        {
            if(Cordx.length % ClustNumber != 0)
                point--;
        } while(point % ClustNumber != 0);
        length = point/ClustNumber;
        for(i = 0; i < Cordx.length; i = length+i)
        {
            if((i+length-1) > point)
                break;
            tempd1 = Cordx[i];
            tempd2 = Cordy[i];
            Cordx[i] = Cordx[i+length-1];
            Cordy[i] = Cordy[i+length-1];
            Cordx[i+length-1] = tempd1;
            Cordy[i+length-1] = tempd2;
        }
    }
}
//output
C:\Program Files\Java\jdk1.6.0_03\bin>javac main1.java
C:\Program Files\Java\jdk1.6.0_03\bin>java main1
Enter the number of clusters
3
1.1 60
8.2 20
4.5 40
7.6 15
2.0 55
3.9 39
1.1 8.2 4.5 7.6 2.0 3.9
60.0 20.0 40.0 15.0 55.0 39.0
Cords are ( 7.60 , 15.00)
Cords are ( 3.90 , 39.00)
Cords are ( 2.00 , 55.00)
Cords are ( 8.20 , 20.00)
Cords are ( 4.50 , 40.00)
Cords are ( 1.10 , 60.00)
------------------------
Clusters for group 0
Cordinates are (7.60 , 15.00)
------------------------
Clusters for group 0
Cordinates are (8.20 , 20.00)
------------------------
Clusters for group 1
Cordinates are (3.90 , 39.00)
------------------------
Clusters for group 1
Cordinates are (4.50 , 40.00)
------------------------
Clusters for group 2
Cordinates are (2.00 , 55.00)
------------------------
Clusters for group 2
Cordinates are (1.10 , 60.00)
C:\Program Files\Java\jdk1.6.0_03\bin>
Please help me read data of size 6000 x 86 and generate descriptions of the clusters. Although my code works, my instructor wants me to enhance it, but I don't know how.
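One possible enhancement is to make the program actually iterate, as standard (Lloyd-style) k-means does:
1. Assign every row to its nearest centroid.
2. Recompute each centroid as the mean of its members.
3. Repeat until no row changes cluster.

Each cluster can then be described by its size and mean. Here is a minimal sketch. It assumes the user-supplied starting centroids are passed in and that an iteration cap is acceptable. The class name, method names and the cap of 100 are hypothetical choices:

```java
import java.util.Arrays;

/** Hypothetical sketch of iterative k-means with user-given starting centroids. */
class KMeansSketch {

    static double sq(double[] a, double[] b) {
        double s = 0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            s += d * d;
        }
        return s;
    }

    /**
     * Runs k-means starting from the given centroids, which are updated in place.
     * Stops when no row changes cluster or after maxIterations passes; returns each row's cluster.
     */
    static int[] cluster(double[][] data, double[][] centroids, int maxIterations) {
        int k = centroids.length, dims = centroids[0].length;
        int[] labels = new int[data.length];
        Arrays.fill(labels, -1); // so the first pass always counts as a change
        for (int iter = 0; iter < maxIterations; iter++) {
            boolean changed = false;
            // Assignment step: nearest centroid by squared Euclidean distance (ties go to the lower index)
            for (int r = 0; r < data.length; r++) {
                int best = 0;
                double bestDist = sq(data[r], centroids[0]);
                for (int c = 1; c < k; c++) {
                    double d = sq(data[r], centroids[c]);
                    if (d < bestDist) { bestDist = d; best = c; }
                }
                if (best != labels[r]) { labels[r] = best; changed = true; }
            }
            if (!changed) break; // converged
            // Update step: each centroid becomes the mean of its current members
            double[][] sums = new double[k][dims];
            int[] counts = new int[k];
            for (int r = 0; r < data.length; r++) {
                counts[labels[r]]++;
                for (int d = 0; d < dims; d++) sums[labels[r]][d] += data[r][d];
            }
            for (int c = 0; c < k; c++) {
                if (counts[c] == 0) continue; // an empty cluster keeps its old centroid
                for (int d = 0; d < dims; d++) centroids[c][d] = sums[c][d] / counts[c];
            }
        }
        return labels;
    }

    public static void main(String[] args) {
        double[][] data = {{1.1, 60}, {8.2, 20}, {4.5, 40}, {7.6, 15}, {2.0, 55}, {3.9, 39}};
        double[][] centroids = {{1.1, 60}, {4.5, 40}, {8.2, 20}}; // starting points as if typed by the user
        int[] labels = cluster(data, centroids, 100);
        // A simple "description" per cluster: how many points it has and where its mean lies
        for (int c = 0; c < centroids.length; c++) {
            int size = 0;
            for (int label : labels) if (label == c) size++;
            System.out.println("Cluster " + c + ": " + size + " points, mean " + Arrays.toString(centroids[c]));
        }
    }
}
```

The centroids are modified in place to keep the sketch short. Copy the array first if the user's original values are needed afterwards.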
thijo
Why don't you people help?
victoriageorge
I have compiled the source code main1.java, and it gives an error at the DecimalFormat object. I need your help.
rapture
1) You're posting to a thread that is two months old; start your own thread.
2) You have not explained your problem, nor have you shown what you did to reach the point where you got an error. (If all you did was copy and paste someone else's code, you're not going to get help.)
victoriageorge
I need k-means clustering for face recognition, which is part of my research, so I need sample code in any language (C, C++, Java or VB).
philfv
Hi
philfv
I have written data mining software that offers more than 45 data mining algorithms in Java, including K-Means.
It is open-source, and the K-Means implementation is efficient. It is just a few files, so it is easy to understand.
Also, note that there is a graphical user interface for launching K-Means and the other algorithms, and an example of how to use it on the website.
You can check this out on the project page: SPMF data mining software.
best,
Philippe