File parsing in 'C'

reuben12
#1
Jul 12th, 2004
I have a bcp output file from Sybase that I need to parse using 'C' and write to a new file.

A sample line from the bcp output file is below:

9890000501:74667:0:6::2:0000:0:6:0:5:0:0:0:0:0:0:0:::9890000501:1:1:0::::::::0:0:3::::::::::\202^B:\202^B^D:0:0
:1:0:0:0:0:0:0:0:0:0::::0:0:0:::::::::0::0:0:0:0:

The file has ":" as the delimiter.
I need to create a new file with a single-line string, something like this:

CUST:NAME=9890000501,HT=74667&0&6,LT=PT-0&PQ-5;

There will be thousands of subscribers. I will also need to get data from different bcp output files to fill in this string.

The 'C' program needs to be fast.

I'm a novice with 'C'.

The only way I know is to read char by char, check for ":", and then store the value in a char array.
I would have as many variables as there are fields in the string!!

What would be a fast way to do this?

Is there any function that would, in one shot, give me the values separated by a delimiter of my choice?

Which is the best way to build up the final string?

Can I create the fixed parts of the string and then just fill in the values that need to be taken from the bcp output file?

Please HELP!!!

Chainsaw
#2
Jul 12th, 2004
Check out 'strtok()'; it splits a string at one or more delimiter characters, in this case your ':'.

The strings you parse could be referenced by an array of string pointers:

  const char* foundThings[30]; // or however big
  int n = 0;

  // the first call passes the string; later calls pass NULL to continue
  char* token = strtok( inputString, ":" );
  while (token != NULL && n < 30)
  {
      foundThings[n] = token;
      n++;
      token = strtok( NULL, ":" );
  }
now all parsed strings are in the foundThings[] array.
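
As a minimal sketch of the strtok() approach above, the helper below splits one line in place. The name split_strtok is hypothetical, not something from the thread. One caveat worth knowing for this data: strtok() treats a run of consecutive delimiters as a single delimiter, so the empty fields in the bcp line ("::") are skipped rather than returned as empty strings, and field positions shift as a result.

```c
#include <string.h>

/* Split `line` in place on ':' using strtok() and store up to `max`
   token pointers in out[]. Returns the number of tokens stored.
   split_strtok is a hypothetical helper name used for illustration. */
int split_strtok(char *line, char *out[], int max)
{
    int n = 0;
    char *tok = strtok(line, ":");   /* first call: pass the string */

    while (tok != NULL && n < max) {
        out[n++] = tok;
        tok = strtok(NULL, ":");     /* later calls: pass NULL */
    }
    return n;
}
```

Called on a writable copy of "9890000501:74667:0:6::2", this stores five tokens, not six, because the empty field between "6" and "2" is dropped. If empty fields must keep their position, a hand-written splitter is the safer choice.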

reuben12
#3
Jul 12th, 2004
WOW! That's neat.
Thank you!!

A couple of things!!

1) I read the man page for strtok. It says that it returns a (char *) pointer.

Suppose I have a fixed line in which I want to fill in values from the foundThings array.

For example,

CUST:NAME=<value from array>,TP=<next value from array>&<next value from array>;

These values would be of variable length.

In Perl I would write it in a single line with the '.' operator.

Is there a similar way here, or should I store the fixed strings in a char array and do 'strcat' with each value?

The output file would have thousands of lines like the one above.

Example:

char S1[100]="CUST:NAME=";
strcat(S1,foundThings[0]);
strcat(S1,",TP=");
strcat(S1,foundThings[1]);

and so on.

Is there a faster way to do this, since efficiency is very important (the reason for choosing 'C' over Perl)?

Dave Sinkula
#4
Jul 12th, 2004
You may want to try fgets/sscanf. Maybe something like this.
  #include <stdio.h>

  int main(void)
  {
      const char filename[] = "file.txt";
      FILE *file = fopen(filename, "r");
      if ( file )
      {
          char line [ BUFSIZ ];
          while ( fgets(line, sizeof line, file) )
          {
              char substr[32], *ptr = line;
              int n;
              fputs(line, stdout);
              while ( *ptr )
              {
                  if ( sscanf(ptr, "%31[^:]%n", substr, &n) == 1 )
                  {
                      ptr += n; /* step past the field just read */
                      puts(substr);
                  }
                  else
                  {
                      puts("---empty field---");
                  }
                  if ( *ptr )
                  {
                      ++ptr; /* skip the ':' but never the terminating null */
                  }
              }
          }
          fclose(file);
      }
      else
      {
          perror(filename);
      }
      return 0;
  }
Then, if this is too slow, I'd look into other faster techniques.
"One of the methods used by statists to destroy capitalism consists in establishing controls that tie a given industry hand and foot, making it unable to solve its problems, then declaring that freedom has failed and stronger controls are necessary." --Ayn Rand

Chainsaw
#5
Jul 12th, 2004
Well, generally the fastest way to do it is to loop through the string in a while loop, not relying on standard string routines. Some string library functions may be implemented in assembler on some platforms, but printf/scanf/strtok and the like generally aren't.

The non-standard-library parser could be something like this:
  // parse the source line into tokens[], returns the number of tokens found.
  // note that some tokens may be empty, and the source line is modified in place.
  int ParseLine( char* theSourceLine, const char* tokens[], int maxTokens )
  {
      int currentToken = 0;

      if (maxTokens < 1) return 0;
      tokens[currentToken] = theSourceLine;

      while (*theSourceLine)
      {
          if (*theSourceLine == ':')
          {
              *theSourceLine = 0; // null terminate this string
              currentToken++;
              if (currentToken >= maxTokens) return currentToken; // reached the limit; maybe return -1?
              tokens[currentToken] = theSourceLine + 1; // next token starts after the ':'
          }
          theSourceLine++;
      }
      return currentToken + 1; // count includes the last token
  }
And then, if it were me, I'd have an array of the static strings to build up the final string:
  static const char* finalStringConstants[MAX_FINAL_CONSTANTS] =
  {
      "CUST:NAME=",
      "HT=",
      "LT=",
      /* etc. */
  };

  char outputLine[500]; // make it big enough
  int i;

  outputLine[0] = 0; // null terminate it to start
  for (i = 0; i < MAX_FINAL_CONSTANTS; i++)
  {
      strcat(outputLine, finalStringConstants[i]);
      strcat(outputLine, tokens[i] );
  }

If that's not fast enough, you could store the token lengths when you are parsing them and use memcpy() rather than strcat() in the final loop.
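
The memcpy() idea can be sketched roughly as follows. Each strcat() call has to walk the destination from the start to find its end, so the cost grows as the line gets longer; keeping a running write offset avoids that rescan. The helper name append_n and its parameters are assumptions made up for this illustration.

```c
#include <stddef.h>
#include <string.h>

/* Append `len` bytes of `src` at offset *pos in `dst` (capacity `cap`),
   keeping the result null-terminated. Returns 0 on success, or -1 if
   the piece plus the terminator would not fit.
   append_n is a hypothetical helper name. */
int append_n(char *dst, size_t cap, size_t *pos, const char *src, size_t len)
{
    if (*pos >= cap || len >= cap - *pos)
        return -1;                  /* need room for len bytes plus '\0' */
    memcpy(dst + *pos, src, len);
    *pos += len;
    dst[*pos] = '\0';
    return 0;
}
```

The caller starts with a position of 0 and passes lengths it already knows, for example the token lengths recorded while parsing, or sizeof on a string literal minus one for the fixed parts. Whether this is measurably faster than strcat() for lines this short is something to time rather than assume.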

Dave Sinkula
#6
Jul 12th, 2004
My highlighting in red:
Rob Pike, a leading expert on applying the C programming language, offers the following "rules" in Notes on Programming in C as programming maxims (but they can be easily viewed as points of a Unix philosophy):
  • Rule 1. You can't tell where a program is going to spend its time. Bottlenecks occur in surprising places, so don't try to second guess and put in a speed hack until you've proven that's where the bottleneck is.
  • Rule 2. Measure. Don't tune for speed until you've measured, and even then don't unless one part of the code overwhelms the rest.
  • Rule 3. Fancy algorithms are slow when n is small, and n is usually small. Fancy algorithms have big constants. Until you know that n is frequently going to be big, don't get fancy. (Even if n does get big, use Rule 2 first.)
  • Rule 4. Fancy algorithms are buggier than simple ones, and they're much harder to implement. Use simple algorithms as well as simple data structures.
  • Rule 5. Data dominates. If you've chosen the right data structures and organized things well, the algorithms will almost always be self-evident. Data structures, not algorithms, are central to programming.
  • Rule 6. There is no Rule 6.
Pike's rules 1 and 2 restate Donald Knuth's famous maxim, "Premature optimization is the root of all evil." Ken Thompson rephrased Pike's rule 4 as "When in doubt, use brute force." Rule 5 was previously stated by Fred Brooks in The Mythical Man-Month.
http://en.wikipedia.org/wiki/Unix_philosophy

reuben12
#7
Jul 13th, 2004
Thanks for the solutions!!

Can we use sprintf() to format the string at the end instead of strcat()?

I can use multiple "%s" to pick from the static string and the parsed output strings
and write to a buffer in one shot.

This buffer can be written to the file.

Is this the best way to go??

Dave Sinkula
#8
Jul 13th, 2004
Do you mean something like this?
  #include <stdio.h>

  int main(void)
  {
      const char filename[] = "file.txt";
      FILE *file = fopen(filename, "r");
      if ( file )
      {
          char line [ BUFSIZ ];
          while ( fgets(line, sizeof line, file) )
          {
              char name[16], ht[8], output[128];
              int a,b,c,d;
              /* each %*d reads a number and discards it */
              if ( sscanf(line, "%15[^:]:%7[^:]:%d:%d::%*d:%*d:%*d:%*d:%d:%d",
                          name, ht, &a, &b, &c, &d) == 6 )
              {
                  snprintf(output, sizeof output,
                           "CUST:NAME=%s,HT=%s&%d&%d,LT=PT-%d&PQ-%d;",
                           name, ht, a, b, c, d);
                  puts(output);
              }
          }
          fclose(file);
      }
      else
      {
          perror(filename);
      }
      return 0;
  }

  /* my output
  CUST:NAME=9890000501,HT=74667&0&6,LT=PT-0&PQ-5;
  */
Sure -- but the fields in each record would need to be all filled or all empty in the same places. And the format string can get a bit unwieldy -- unless you are only looking for the first several fields out of it.
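
The caveat about empty fields can be seen in a small sketch. A %[^:] conversion has to match at least one character, so sscanf() stops converting at the first empty field and returns the count it reached. The function name count_fields3 is hypothetical.

```c
#include <stdio.h>

/* Try to read three ':'-separated text fields from `line` and return
   how many were converted. Scanning stops at the first empty field,
   because %[^:] cannot match zero characters.
   count_fields3 is a hypothetical helper name. */
int count_fields3(const char *line)
{
    char a[8], b[8], c[8];

    return sscanf(line, "%7[^:]:%7[^:]:%7[^:]", a, b, c);
}
```

Given "x:y:z" this returns 3, but given "x::z" it returns 1, which is why a single fixed format string only works when every record has its empty fields in the same places.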

reuben12
#9
Jul 14th, 2004
I guess sscanf won't work since I have variable-length fields delimited by ":"

%31[^:]

Will it expect a minimum of 31 chars? I don't understand it.

Can I use fprintf() to write straight to the file?

Since the fields are variable length, I will have to clear the buffer for each record, and memset would be expensive!!

I believe it won't write "\0" after each string, right?

Using fprintf(), can I use "\n" so that the succeeding fields are written to the next line?

I want the lines in my file to be separated by a carriage return and a line feed.
Is that the same as "\n"?

Dave Sinkula
#10
Jul 14th, 2004
Originally Posted by reuben12
I guess sscanf won't work since I have variable-length fields delimited by ":"

%31[^:]

Will it expect a minimum of 31 chars? I don't understand it.
No, it would prevent overflowing a 32-char buffer by writing a maximum of 31 characters plus the null.
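
A small sketch of that behaviour, assuming a 32-byte destination as in the earlier example. The width in %31[^:] is an upper bound, so short fields are read whole and long ones are cut off at 31 characters. The function name first_field is made up for this illustration.

```c
#include <stdio.h>

/* Copy the first ':'-delimited field of `src` into `dst`, which must
   hold at least 32 bytes. Returns the number of characters consumed,
   or 0 if the field is empty. At most 31 characters are stored,
   followed by the terminating null.
   first_field is a hypothetical helper name. */
int first_field(const char *src, char *dst)
{
    int n = 0;

    if (sscanf(src, "%31[^:]%n", dst, &n) != 1)
        return 0;
    return n;
}
```

For "74667:0:6" this stores "74667" and returns 5. For a field of 40 characters it stores the first 31 and returns 31, leaving the rest unread.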

Originally Posted by reuben12
Can I use fprintf() to write straight to the file?
Yes.

Originally Posted by reuben12
Since the fields are variable length, I will have to clear the buffer for each record, and memset would be expensive!!
Why? Overwriting the strings makes clearing the buffer(s) irrelevant.


Originally Posted by reuben12
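I believe it won't write "\0" after each string, right?
Wrong. sscanf() and snprintf() each write a terminating '\0' after every string they store in a buffer, which is why the buffer never needs clearing. (fprintf() does not write that '\0' to the file.)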
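
To illustrate why no memset() is needed, here is a minimal sketch that reuses one buffer for successive records. The function name format_record and the shortened output format are assumptions for the example. snprintf() writes a terminating null after whatever it produces (when the size is nonzero), so stale bytes left over from a longer previous record are never part of the new string.

```c
#include <stdio.h>

/* Format one record into `buf` (capacity `size`) and return what
   snprintf() returns. The terminating null written on every call
   makes clearing the buffer beforehand unnecessary.
   format_record is a hypothetical helper name. */
int format_record(char *buf, size_t size, const char *name, const char *ht)
{
    return snprintf(buf, size, "CUST:NAME=%s,HT=%s;", name, ht);
}
```

Formatting a long record and then a short one into the same buffer leaves only the short record visible as a string.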

Originally Posted by reuben12
Using fprintf(), can I use "\n" so that the succeeding fields are written to the next line?
Yes.

Originally Posted by reuben12
I want the lines in my file to be separated by a carriage return and a line feed.
Is that the same as "\n"?
It is if that is how your system translates a newline in text mode.
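
If the output must contain CR LF regardless of platform, one option is to open the file in binary mode and write the two characters explicitly, so the C library performs no newline translation. This is a sketch under that assumption; the function name write_crlf_line is hypothetical.

```c
#include <stdio.h>

/* Write `record` followed by an explicit CR LF pair. The stream is
   expected to be open in binary mode (for example "wb") so that the
   library does not translate '\n' on the way out.
   Returns 0 on success, -1 on a write error.
   write_crlf_line is a hypothetical helper name. */
int write_crlf_line(FILE *fp, const char *record)
{
    if (fputs(record, fp) == EOF)
        return -1;
    return fputs("\r\n", fp) == EOF ? -1 : 0;
}
```

On a system whose text mode already turns "\n" into CR LF, writing "\r\n" to a text-mode stream would produce CR CR LF, which is the reason for the binary-mode assumption.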
©2003 - 2009 DaniWeb® LLC