In my program to implement huffman algorithm.I have created the huffman codes and stored the ascii values and corresponding codes in a map.While creating the encoded file I followed this approach:I firstly converted the ascii codes in to their corresponding integer values and wrote them in a file.Now ,while decoding I read each byte from that file and converted it into its binary equivalent and searched for their ascii values in the map.This approach is creating the following problem.Since,01,001,1 all have integer equivalent as 1,so when I convert the codes into integer it no longer remains unique.I have ran out of ideas. And I faled to store the actual huffman tree.So I could not follow the decoding scheme where I can traverse the tree a file the codes Help!:?:

Recommended Answers

All 6 Replies

here is the problem i am facing now.I have a file consist of a b c d.codes a=100,b=101,c=110,d=111 and space=0.Now,the encoded file is 100010101100111.so ,I take 8 bytes(of 0 and 1) from the encoded file convert it into integer and store it in another file(say auxfile.txt) so, auxfile.txt contains 138 and 103.Now while decoding I first convert 138 and 103 into its bainary equivalent.But to my horror ,while reading the first byte(from auxfile.txt) I get 63 and not 138!The next byte is coming properly as 103.Please help
Here is the code where I am converting strings of 0 ,1 into numbers.

 size=fin1.available();  
     while(count<=size){  
     bytes=fin1.read();  
     count++;  
      if(bytes!=-1)  
     {  
     if(tm.containsKey(bytes))    
     {  str1+=tm.get(bytes);  
        while((str1.length()<8) && (count<=size))  
        {  
            bytes=fin1.read();  
            count++;  

                if(bytes!=-1){  
            str2=tm.get(bytes);  
            str1 +=str2;}  
        }  
        if(str1.length()>8)  
        {str2=str1.substring(0,8);  
            int x=Integer.parseInt(str2,2);  
        fout2.write(x);  
        str1=str1.substring(8);  

        }  

        else if(str1.length()==8)  

            {     

                int x=Integer.parseInt(str1,2);  
                    fout2.write(x);  
                str1="";  

            }  
            }  
     }  

     }  
    if((str1.length()<8))  
    {  
        int x=Integer.parseInt(str1,2);  
    fout2.write(x);   
    }  
fin1.close();fout2.close();

10001010 1100111 isn't 8 bytes, no matter how you count them.
I don't have time now to study your code with all those String manipulations, but the normal way to do bit-wize manipulations of the contents of an int is via boolean operations, eg

// get the first (high-order) byte from an int...
(myInt>> 24) & 0xff; // shifts 3 bytes left and masks out the first 3 byte
// put a byte into the first (high-order) byte of an int... 
myInt | (myByte << 24);

s

@JamesCherrill I did not say 10001010 1100111 is 8 bytes.I said i am taking the first 8 bytes form it i.e. 10001010 and converting into an integer using base 2(so,it comes as 138).Then,I am taking 1100111 and doing the same it gives me 103.My objective is to convert these strings of 0 and 1's ,convert them into integer and store them in a file.Then read each byte(i.e numbers) from the file convert it into its binary equivalent

1 byte = 8 bits. I think you are taking the first 8 bits, not the first 8 bytes. Hence the confusion. Nevermind.
I'm still a little confused as to where Strings come into this. You are mapping a sequence of bits to an ASCII character. Assuming the sequence of bits is <=32 bits long you can work with each code in an int, and use java chars for the ascii characters.
If you are using a single bit 0 as the code for ' ' then no other code can begin with a 0 bit, so packing sequences from the ints to a single fully-compressed sequence is just a question of copying only from the first 1 bit.
Finally, for the fully-compressed form, have a look at the BitSet class. It holds a sequence of bits, as long as you want, and is serialisable so you can read/write it to/from a file with a simple Object stream

Actually my huffman codes are string type.Say,code for a=101,hence code for a takes 3 bytes(not 3 bits).So,while encoding the data using huffman algorithm my encoded file size is getting larger than the actual file size.Hence the problem.So, my idea was to read 8 bytes for the strings of 0,1 and convert it into number.
Anyway I have solved the problem.

Java Strings are Unicode - 16 bits/2 bytes per character.

Be a part of the DaniWeb community

We're a friendly, industry-focused community of developers, IT pros, digital marketers, and technology enthusiasts meeting, networking, learning, and sharing knowledge.