Hello, my prof gave us a problem from the ACM programming contest, we are supposed to write a program that takes a compressed file as input and generates a reproduction of the original uncompressed file as output,

the compression scheme requires making a list of the words in the uncompressed file. When a non-alphabetic character is encountered in the uncompressed file, it is copied directly into the compressed file. When a word is encountered in the uncompressed file, it is copied directly into the compressed file only if this is the first occurrence of the word. In that case, the word is put at the front of the list. If it is not the first occurrence, the word is not copied to the compressed file. Instead, its position in the list is copied into the compressed file and the word is moved to the front of the list. The numbering of list positions begins at 1,

like the sample input would be :


Dear Sally,
Please, please do it--1 would 4
Mary very, 1 much. And 4 6
8 everything in 5's power to make
14 pay off for you.

-- Thank 2 18 18--
0

sample output would be

dear Sally,
Please, please do it--it would please
Mary very, very much. And Mary would
do everything in Mary's power to make
it pay off for you.

-- Thank you very much--


so yeah, i dont even know where to start :sad:

Recommended Answers

All 27 Replies

Member Avatar for iamthwee

You need some kinda structure to determine which words have been repeated and such.

Has your professor given you any hints cos the scope is massive.

nope, no tips no nothing.. and yeah, scope is going to be huge, any help would be greatly appriciated!

Hello, my prof gave us a problem from the ACM programming contest, we are supposed to write a program that takes a compressed file as input and generates a reproduction of the original uncompressed file as output,

the compression scheme requires making a list of the words in the uncompressed file. When a non-alphabetic character is encountered in the uncompressed file, it is copied directly into the compressed file. When a word is encountered in the uncompressed file, it is copied directly into the compressed file only if this is the first occurrence of the word. In that case, the word is put at the front of the list. If it is not the first occurrence, the word is not copied to the compressed file. Instead, its position in the list is copied into the compressed file and the word is moved to the front of the list. The numbering of list positions begins at 1,

like the sample input would be :


Dear Sally,
Please, please do it--1 would 4
Mary very, 1 much. And 4 6
8 everything in 5's power to make
14 pay off for you.

-- Thank 2 18 18--
0

sample output would be

dear Sally,
Please, please do it--it would please
Mary very, very much. And Mary would
do everything in Mary's power to make
it pay off for you.

-- Thank you very much--


so yeah, i dont even know where to start :sad:

I'm assuming you just want to decompress the already compressed file. I don't know how the compressed file is formatted but here is an idea that might help:

Make a list of elements that are translated and read from the compressed file.

Taking each element at a time, place the corresponding word to its rightful position - which you can determine by considering each word or whitespace in the compressed file as a position. You might want to write a function that can scan the file distinguishing each word or whitespace as a position and place the file read position where it needs to go relative to the beginning of the file, and then insert a copy of the word.

If you do the above way, you'll need to search through the list finding positions in a successive order and forget which word that position corresponds to. In otherwords if you've got a list of words and their positions in a file in a list, search through the whole list for the position numbers rather than each words position numbers.

Diagram:

ucompressed file:
Daniweb is a really cool online forum and is a really cool place to meet knowledgable people. Daniweb is number 1.

compressed file:

Daniweb:18 is:9:19 a:10 really:11 cool:12 online forum and place to meet knowledgable people. number 1.

Round 1:

Daniweb:18 is:19 a:10 really:11 cool:12 online forum and "is" place to meet knowledgable people. number 1.

Just an idea but I hope it'll perhaps give you some other ideas.

Good luck, LamaBot

hmm
well i was thinking that i make a character array, and its input would be the sentence, and then make an if statement, so it reads and goes through the array, and if one of the characters is an integer, then it would go back in the array that number of times, like lets say we have a sentence "Daniweb is the best and is very helpful" as an uncompressed file,
and our compressed file would be "Daniweb is the best and 4 very useful".. so therefore it would read the 4, and go back 4 words, and read the 4th word, and then replace the number with that word.

its a bit confusing but some good old built in string manipulation functions should help

hmm
well i was thinking that i make a character array, and its input would be the sentence, and then make an if statement, so it reads and goes through the array, and if one of the characters is an integer, then it would go back in the array that number of times, like lets say we have a sentence "Daniweb is the best and is very helpful" as an uncompressed file,
and our compressed file would be "Daniweb is the best and 4 very useful".. so therefore it would read the 4, and go back 4 words, and read the 4th word, and then replace the number with that word.

its a bit confusing but some good old built in string manipulation functions should help

Well both methods would have the same effect but the programming would be different. Might take a bit more overhead because it'd have to check number of words back then jump back to that position inserting the word there. My way simply inserts sucessively. But you know the formatting defined by the given compression algorithm and your method is very good, kudo's. I hope I helped, and if I did, I'd be more than willing to help you further. :)

Good luck, Lamabot

Thanks alot Lazaro, i'll give it a shot and keep u guys posted, meanwhile anyone who has a method or knows how to do it please help!!! lol, again any help would be greatly appriciated

ok guys, im in a bit of pickle here, im still working on the program, its just that, ok its supposed to read the input from a file called input.txt, so the entire paragraph that i showed you in the sample input is in the file input.txt, now im trying to test it first before i start the program, so i used the code :

#include <stdio.h>
int main ()
{
char input[300];
gets (input);
printf ("%s", input);
}

im merely trying to print the input to see how its going to print, now its only printing Dear Sally, thats it, then it stops, it doesnt print everything, so it compiles correctly, then im trying to run it by using the input from the file input.txt.. this is how it looks like when i try to run it.

luna:~>cc assign2.c
luna:~>a.out < input.txt
Dear Sally,luna:~>

any ideas on how i can read from the file input.txt and then print the entire thing correctly???

http://www.phim.unibe.ch/comp_doc/c_manual/C/FUNCTIONS/gets.html

gets does NOT check the size of the buffer and overflow on the stack can occour. Because of this, you should use fgets in preferance.

So... use fgets.

Regarding your question: the link explains it, too:

gets continues to read characters until NEWLINE or EOF is seen.

Basically you need to create a loop in which fgets keeps reading from the file until EOF is reached. Something like this:

char line[255];
while(fgets(line, sizeof line, yourFile) != NULL) {
    // do whatever you want with 'line'
}

ok, my file name is input.txt, i cant just put it in fgets, it will create an error, like, this is not right :

int main ()
{
char line[300];
   
while (fgets (line, sizeof line, input ) != NULL) {         
  
printf ("%s", input);
}
}

the only reason im using the file input.txt is because my input is a paragraph, so i dont have to keep on testing and rewriting the paragraph, so im only trying to say

a.out < input.txt

, so the program would read its input from that file, problem is, with this program above, its only printing the first line of it if i use gets, is there something else other than fgets i can use??

You are getting an error because you are trying to print the "input stream" and not the "character array". In your use of fgets ( ) , "input" is the stream from which you are reading, line by line.

char buffer [BUFSIZ] = { '\0' } ;

while ( fgets (buffer, BUFSIZ, file_stream) != NULL )
{
    // process the line keeping in mind that it
    // has a newline at its end
   printf ("%s", buffer) ;
}

ok, im getting everything, and i know how to scan the whole thing now, but how do you print a paragraph in C? are there any built in functions for that or anything?

Member Avatar for iamthwee

ok, im getting everything, and i know how to scan the whole thing now, but how do you print a paragraph in C? are there any built in functions for that or anything?

Of course not. Define a paragraph?

hey, i made some good progress, but i still need your help lol, this is what i got so far

#include <stdio.h>
#include <ctype.h>
#include <string.h>
 
#define WMAX 100
#define WLEN 50
 
int main(){
 
char c, word[WMAX][WLEN];
int clet=0, cwor=0, num;
int length = strlen (*word);
int k;
 
c = getchar(); 
 
while (1){
 
if (c == '0'){
break;
}
 
if (isalpha (c)){ //if the character read is alphabetical
word[WMAX][WLEN] = c;
c = getchar (); //read another character
clet++; // add it to the word
}
else{
word [cwor][clet] = '\0'; //otherwise if its a
space
c = getchar (); //then read another character
cwor++;
break;
} 
} 
for (k = 0; k <= length; k++){
printf ("%s", word[k]);
}
}

ok, this is what i have so far, except its not printing right, this is what happens when i run it:

luna:~>cc assig2.c
luna:~>a.out < input.txt
luna:~>

nothing happens, i know its something wrong with my printf statement but what! lol its driving me nuts, also i have not done the number part yet. When it sees a number, its supposed to read it, then go back that number of words, take the word, and replace the number with it. Any help is greatly appriciated!!!

NOTE: input.txt contains the following:

Dear Sally,
Please, please do it--1 would 4
Mary very, 1 much. And 4 6
8 everything in 5's power to make
14 pay off for you.
-- Thank 2 18 18--
0

>word[WMAX][WLEN] = c;
Do you see what you're doing wrong here? Everytime you loop, you assign c to the same character -- the last one.

oh.. well what am i supposed to replace it with then?

>oh.. well what am i supposed to replace it with then?
Since it looks like you're using cwor and clet as your indices, use them for copying the letter.

yeah, i did, heres the code segment:

c = getchar();
                 
        while (c != '0'){
                
                 
                        if (isalpha (c)){
                                word[cwor][clet] = c;
                                c = getchar ();
                                clet++;
                                }     
                                 
                        else{ 
                                word [clet][cwor] = NULL;
                                c = getchar ();
                                cwor++;
                                break;
                                }
        }
         for (k = 0; k <= length; k++){
                printf ("%s\n", word[k]);
                }

ok, now its printing only the first word, i dont get why it does that, because i used a for loop for the print to print the input character by character.

The reason that it's breaking after the first word is that in your else {} block you happen to have a break statement, which effectively breaks out of the loop and jumps to the print statement.

The reason that it's breaking after the first word is that in your else {} block you happen to have a break statement, which effectively breaks out of the loop and jumps to the print statement.

hmm.. no i tried taking out the break already, it didnt work. The only reason i have the break statement in there is so that it wouldnt plunge into an infinite loop lol

>The only reason i have the break statement in there is so that it wouldnt plunge into an infinite loop lol
Duh. That's what the first break statement is for (the one that checks for a '0'). Each time you encounter a space, however, you should keep going. Otherwise you're only going to read in a single word.

well yeah but when i remove that break statement, nothing happens when i make an input, it just stays blank.

You need to reset clet everytime you start a new word. (Could you also post the updated code, as well as exactly what happens when you type something in?)

[edit]

Additionally, this line is totally screwed:

int length = strlen (*word);

length will likely only equal 0 because you haven't put anything into word . Declare cwor before the loop, and use this to determine the number of iterations the print loop should run (note that this value is 1 less than the number due to arrays starting at element 0).

i dont get it, do you mean i have to put clet = 0; everytime??
anyway heres the updated code:

#include <stdio.h>
#include <ctype.h>
#include <string.h>   
 
#define WMAX 100
#define WLEN 50
 
int main(){
 
char c, word[WMAX][WLEN];
int clet=0, cwor=0, num;
int length = strlen (*word);
int k;
 
 
c = getchar();
 
        while (1){
 
                if (c == '0'){
                break;
                }
                        if (isalpha (c)){
                                word[cwor][clet] = c;
                                c = getchar ();
                                clet++;
                                }
                        else{
                                word [clet][cwor] = '\0';
                                c = getchar ();
                                clet++;
                                }     
                      }
         for (k = 0; k <= length; k++){
                printf ("%s\n", word[k]);
                }
 
        return 1;
  }

Notice this one doesnt have a break statement under clet++; in the else statement.

Heres what happens when i run it:

luna:~>cc assig2.c
luna:~>a.out
Daniweb is the best

nothing happens, "Daniweb is the best" is the input, and nothing happened, its just blank, its supposed to read it and print it, but it just stays blank,


[edit]: im only using strlen so i can print, thats all im using it for.

>do you mean i have to put clet = 0; everytime??
No. Add it where you start a new word.

#include <stdio.h>
#include <ctype.h>
#include <string.h>

#define WMAX 100
#define WLEN 50

int main(){
    
    char c, word[WMAX][WLEN];
    int clet=0, cwor=0, num;
    int length = strlen (*word); // read my previous post; this is going to be 0
    int k;
    
    
    c = getchar();
    
    while (1){
        
        if (c == '0'){
            break;
        }
        if (isalpha (c)){
            word[cwor][clet] = c;
            c = getchar ();
            clet++;
        }
        else{
            word [clet][cwor] = '\0';
            c = getchar ();
            clet++; // what? increment cwor, not clet!
            // you should be resetting clet here
        }
    }
    // replace 'length' with 'cwor'
    for (k = 0; k <= length; k++){
        printf ("%s\n", word[k]);
    }
    
    return 1;
}

As regards for not stopping, I don't know what you mean. I just tried the code with the modifications, and nowhere did it stop if I used a 0 at the end of the string.

ohhhhh, i see, heh, sorry, been working all day, anyways, yeah now it works, only the printing is very strange..

#include <stdio.h>
#include <ctype.h>
#include <string.h>
#define WMAX 100                
#define WLEN 50                 
int main(){
        
char c, word[WMAX][WLEN];
int clet=0, cwor=0, num;
int k;
                 
                        
c = getchar();
                                
        while (1){
                                
                if (c == '0'){        
                break;           
                }
                                
                        if (isalpha (c)){
                                word[cwor][clet] = c;
                                c = getchar ();
                                clet++;
                                      
                                } 
                
                        else{
                                word [clet][cwor] = '\0';
                                c = getchar ();
                                cwor++;
                                clet=0;
                     }
        for (k = 0; k<= cwor;k++){    
                printf ("%s\n", word[k]);
                }
          }                     
        return 1;
  }

Heres the output:

luna:~>cc assig2.c
luna:~>a.out
Dear Sally 0
D
De
Dea
Dear
Dear
Dear
S
Dear
Sa
Dear
Sal
Dear
Sall
Dear
Sally
Dear
Sally

Nope, Nevermind, got it to work, Thanks a bunch for the help!

ok, now im trying to find any numbers in the input so that when the program reads the number, it will go back that number of words,

example:
Daniweb is the best, and 4 very helpful.

Now the program will read the 4, and go back 4 words, it will read the word "is" and replace the 4 with it, so the output would be:

Daniweb is the best, and is very helpful.

This is what i have so far in my code:

#include <stdio.h>
#include <ctype.h>
#include <string.h>
#define WMAX 100
#define WLEN 50
 
int main(){
char c, word[WMAX][WLEN];
int clet=0, cwor=0, num;
int k;
 
 
c = getchar();
 
        while (1){
 
                if (c == '0'){
                break;
                }
 
                        if (isalpha (c)){
                                word[cwor][clet] = c;
                                c = getchar ();
                                clet++;
 
                                }
 
                        else{
                                word [clet][cwor] = '\0';
                                c = getchar ();
                                cwor++;
                                clet=0;
                                }
        while (1){
                        if (isdigit (c)){
                                num = (num*10) + (c-48);
                                printf ("%s", word[cwor - num]);
                                c = getchar ();
                                cwor++;
                                }
                        else{
                                word [clet][cwor] = '\0';
                                c = getchar ();
                                cwor++;
                                break;
                        }
 
                }
        }
 
                for (k = 0; k <= cwor; k++){
                printf ("%s ", word);
                }
 
return 1;
}

and when i run it,this is what i get:

luna:~>cc assig2.c
luna:~>a.out
Daniweb is the best and 4 very helpful 0

again its like what happened before, even if i do input 0, it will still go blank

Be a part of the DaniWeb community

We're a friendly, industry-focused community of developers, IT pros, digital marketers, and technology enthusiasts meeting, networking, learning, and sharing knowledge.