I have a bcp output file from Sybase that I need to parse using C and write to a new file.

A sample line from the bcp output file is below:

9890000501:74667:0:6::2:0000:0:6:0:5:0:0:0:0:0:0:0:::9890000501:1:1:0::::::::0:0:3::::::::::\202^B:\202^B^D:0:0:1:0:0:0:0:0:0:0:0:0::::0:0:0:::::::::0::0:0:0:0:

The file uses ":" as the delimiter.
I need to create a new file where each record becomes a single-line string, something like this:

CUST:NAME=9890000501,HT=74667&0&6,LT=PT-0&PQ-5;

There will be thousands of subscribers. I will also need to pull data from different bcp output files to fill in this string.

The 'c' program needs to be fast.

I'm a novice with 'c'.

The only way I know is to read char by char, check for ":", and then store the value into a char array.
I would have as many variables as there are fields in the string!!

What would be the fast way to do this?

Is there any function that would in one shot give me the values separated by a delimiter of my choice?

Which is the best way to build up the final string?

Can i create the fixed parts of the string and then just fill up values that need to be taken from the bcp output file??

please HELP!!!

Check out 'strtok()'; it will parse a string, stopping at any of one or more delimiter characters; in this case your ':'.

The strings you parse could be referenced by an array of string pointers:

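char* foundThings[30]; // or however big
int n = 0;

// inputString must be a modifiable char array; strtok() writes nulls into it.
// The first call gets the string; later calls get NULL to continue in it.
char* token = strtok( inputString, ":" );
while (token != NULL && n < 30)
{
    foundThings[n] = token;
    n++;
    token = strtok( NULL, ":" );
}

Now all parsed strings are in the foundThings[] array, and n is how many were found.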
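
One caveat worth seeing in code: strtok() treats a run of delimiters as a single delimiter, so empty fields such as "::" disappear from the result. Here is a minimal sketch of the loop wrapped in a function; the name split_strtok and the maxTokens parameter are my own, made up for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: split `line` on ':' using strtok().
 * Stores up to maxTokens pointers in tokens[] and returns how many were found.
 * strtok() writes '\0' into `line`, and it skips empty fields ("::"). */
int split_strtok(char *line, char *tokens[], int maxTokens)
{
    int n = 0;
    char *tok = strtok(line, ":");      /* first call: pass the string */

    while (tok != NULL && n < maxTokens)
    {
        tokens[n++] = tok;
        tok = strtok(NULL, ":");        /* later calls: pass NULL */
    }
    return n;
}
```

Called on the start of the sample line, "9890000501:74667:0:6::2", this finds five tokens rather than six, because the empty field between "6" and "2" is skipped. If field positions matter, that has to be taken into account.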

WOW! thats neat.
Thank you!!

A couple of things!!

1) I read the man page for strtok. It says that it returns a (char *) pointer.

Suppose i have a fixed line in which i want to fill up values from foundthing array.

for example ,

CUST:NAME=<value from array>,TP=<next value from array>&<next value from array>;

These values would be of variable length.

In perl i would write it in a single line with the '.' operator.

Is there a similar way here, or should I store the fixed strings in a char array and do 'strcat' with each value?

The output file would have thousands of lines as above.

Example,

char S1[100]="CUST:NAME=";
strcat(S1,foundThings[0]);
strcat(S1,",TP=");
strcat(S1,foundThings[1]);

and so on.

Is there a faster way to do this work since efficiency is very important(reason to choose 'c' over perl)
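
For what it is worth, repeated strcat() calls do work, but each call rescans the destination from the start to find its end. A minimal sketch of one alternative follows, assuming a made-up helper append() that returns a pointer to the new end of the string so the next append continues from there. The caller is assumed to have made the buffer big enough; nothing here checks for overflow.

```c
#include <string.h>

/* Hypothetical helper: copy src to dst and return a pointer to the
 * terminating '\0' just written, so the next call can continue from there.
 * Assumes the caller has made the destination buffer large enough. */
char *append(char *dst, const char *src)
{
    size_t len = strlen(src);
    memcpy(dst, src, len + 1);   /* +1 copies the terminating '\0' too */
    return dst + len;
}

/* Build "CUST:NAME=<name>,TP=<tp>" into out. */
void build_line(char *out, const char *name, const char *tp)
{
    char *p = out;
    p = append(p, "CUST:NAME=");
    p = append(p, name);
    p = append(p, ",TP=");
    append(p, tp);
}
```

With name "9890000501" and tp "74667", out ends up holding "CUST:NAME=9890000501,TP=74667". Whether this is measurably faster than strcat() for lines this short is something to measure rather than assume.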

You may want to try fgets/sscanf. Maybe something like this.

#include <stdio.h>
int main(void)
{
   const char filename[] = "file.txt";
   FILE *file = fopen(filename, "r");
   if ( file )
   {
	  char line [ BUFSIZ ];
	  while ( fgets(line, sizeof line, file) )
	  {
		 char substr[32], *ptr = line;
		 int n;
		 fputs(line, stdout);
		 while ( *ptr )
		 {
			if ( sscanf(ptr, "%31[^:]%n", substr, &n) == 1 )
			{
			   ptr += n;
			   puts(substr);
			}
			else
			{
			   puts("---empty field---");
			}
			if ( *ptr ) ++ptr; /* step over the ':' but never past the terminating null */
		 }
	  }
   }
   else
   {
	  perror(filename);
   }
   return 0;
}

Then, if this is too slow, I'd look into other faster techniques.

Well, generally the fastest way to do it is to loop through the string in a while loop, not relying on standard string routines. Some string library functions may be implemented in assembler on some platforms, but printf/scanf/strtok and the like generally aren't.

the non-standard-library parser could be something like this:

// parse the source line into tokens[], returns the number of tokens found.
// note that some tokens may be empty, and that each ':' in theSourceLine
// is overwritten with a null terminator.
int ParseLine( char* theSourceLine, const char* tokens[], int maxTokens )
{
    int currentToken = 0;

    if (maxTokens <= 0) return 0;
    tokens[currentToken] = theSourceLine;

    while (*theSourceLine)
    {
        if (*theSourceLine == ':')
        {
            *theSourceLine = 0;  // null terminate this string
            currentToken++;
            if (currentToken >= maxTokens) return currentToken;  // reached the limit; maybe return -1?
            tokens[currentToken] = theSourceLine + 1;  // next token starts after the ':'
        }
        theSourceLine++;
    }
    return currentToken + 1;  // count the last token, which has no trailing ':'
}

And then, if it were me, I'd have an array of the static strings to build up the final string:

static const char* finalStringConstants[MAX_FINAL_CONSTANTS] =
{
    "CUST:NAME=",
    "HT=",
    "LT=",
    <etc>
};

char outputLine[500];  // make it big enough
outputLine[0] = 0;      // null terminate it to start
for (i = 0; i < MAX_FINAL_CONSTANTS; i++)
{
    strcat(outputLine, finalStringConstants[i]);
    strcat(outputLine, tokens[i] );
}

If that's not fast enough, you could store the token lengths when you are parsing them and use memcpy() rather than strcat() in the final loop.
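
A rough sketch of the memcpy() idea, assuming the parser has recorded each token's length along with its pointer. The Token struct and the build_output function are hypothetical names made up for this illustration, not part of the parser in this thread.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical token record: a pointer into the line plus its length,
 * both assumed to be filled in by the parser. */
typedef struct
{
    const char *text;
    size_t      len;
} Token;

/* Copy prefix[i] followed by tokens[i], for each i, into out using
 * memcpy() with known lengths instead of strcat(). Returns the length
 * of the result, or 0 if out (of size outSize) is too small. */
size_t build_output(char *out, size_t outSize,
                    const char *const prefix[], const Token tokens[],
                    size_t count)
{
    size_t used = 0;
    size_t i;

    if (outSize == 0)
        return 0;

    for (i = 0; i < count; i++)
    {
        size_t plen = strlen(prefix[i]);
        if (used + plen + tokens[i].len + 1 > outSize)
            return 0;                       /* would overflow */
        memcpy(out + used, prefix[i], plen);
        used += plen;
        memcpy(out + used, tokens[i].text, tokens[i].len);
        used += tokens[i].len;
    }
    out[used] = '\0';
    return used;
}
```

Because the write position is tracked in used, nothing is rescanned, and the returned length can be handed straight to fwrite().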

My highlighting in red:

Rob Pike, a leading expert on applying the C programming language, offers the following "rules" in Notes on Programming in C as programming maxims (but they can be easily viewed as points of a Unix philosophy):

  • Rule 1. You can't tell where a program is going to spend its time. Bottlenecks occur in surprising places, so don't try to second guess and put in a speed hack until you've proven that's where the bottleneck is.
  • Rule 2. Measure. Don't tune for speed until you've measured, and even then don't unless one part of the code overwhelms the rest.
  • Rule 3. Fancy algorithms are slow when n is small, and n is usually small. Fancy algorithms have big constants. Until you know that n is frequently going to be big, don't get fancy. (Even if n does get big, use Rule 2 first.)
  • Rule 4. Fancy algorithms are buggier than simple ones, and they're much harder to implement. Use simple algorithms as well as simple data structures.
  • Rule 5. Data dominates. If you've chosen the right data structures and organized things well, the algorithms will almost always be self-evident. Data structures, not algorithms, are central to programming.
  • Rule 6. There is no Rule 6.

Pike's rules 1 and 2 restate Donald Knuth's famous maxim, "Premature optimization is the root of all evil." Ken Thompson rephrased Pike's rule 4 as "When in doubt, use brute force." Rule 5 was previously stated by Fred Brooks in The Mythical Man-Month.

http://en.wikipedia.org/wiki/Unix_philosophy

Thanks for the solutions!!

Can we use sprintf() to format the string at the end instead of strcat()?

I can use multiple "%s" to pick from the static string and the parsed output strings
and write to a buffer in one shot.

This buffer can be written to the file.

Is this the best way to go??

Do you mean something like this?

#include <stdio.h>

int main(void)
{
   const char filename[] = "file.txt";
   FILE *file = fopen(filename, "r");
   if ( file )
   {
	  char line [ BUFSIZ ];
	  while ( fgets(line, sizeof line, file) )
	  {
		 char name[16], ht[8], output[128];
		 int  a,b,c,d;
		 if ( sscanf(line, "%15[^:]:%7[^:]:%d:%d::%*d:%*d:%*d:%*d:%d:%d",
					 name, ht, &a, &b, &c, &d) == 6 )
		 {
			snprintf(output, sizeof output,
					 "CUST:NAME=%s,HT=%s&%d&%d,LT=PT-%d&PQ-%d",
					 name, ht, a, b, c, d);
			puts(output);
		 }
	  }
   }
   else
   {
	  perror(filename);
   }
   return 0;
}

/* my output
CUST:NAME=9890000501,HT=74667&0&6,LT=PT-0&PQ-5
*/

Sure -- but the fields in each record would need to be all filled or all empty in the same places. And the format string can get a bit unwieldy -- unless you are only looking for the first several fields out of it.

I guess sscanf wont work since i have fields with variable length delimited by ":"

%31[^:]

will it expect minimum 31 chars..i dont understand it :(

Can i use fprintf() to write straight to the file??

Since it is variable length fields i will have to clear the buffer for each record and memset would be expensive!!

I believe it wont write "\0" after each string ..right?

Using fprintf(), can i use "\n" for the succeeding fields to be written to the next line?

I want the lines in my file to be separated by a carriage return and a line feed.
Is it the same as "\n"?????

>I guess sscanf wont work since i have fields with variable length delimited by ":"
>%31[^:]
>will it expect minimum 31 chars..i dont understand it :(

No, it would prevent overflowing a 32-char buffer by writing a maximum of 31 characters plus the null.

>Can i use fprintf() to write straight to the file??

Yes.

>Since it is variable length fields i will have to clear the buffer for each record and memset would be expensive!!

Why? Overwriting a string makes clearing the buffer(s) irrelevant.
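
To illustrate: the string functions write a terminating '\0' right after whatever they store, so a shorter value stored over a longer one reads back correctly without any clearing. A small sketch; the helper name store_field is made up for the example.

```c
#include <stdio.h>

/* Hypothetical helper: store a value into a reused buffer.
 * snprintf() always writes a terminating '\0' (when size > 0),
 * so leftover bytes from a previous, longer value are never seen. */
void store_field(char *buf, size_t size, const char *value)
{
    snprintf(buf, size, "%s", value);
}
```

Storing "9890000501" and then "42" into the same buffer leaves it reading as "42". The one case to watch is when nothing gets stored at all, for example a failed sscanf() conversion on an empty field; then the old contents stay, and setting buf[0] = '\0' is all the "clearing" that is needed.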

>I believe it wont write "\0" after each string ..right?

Wrong.

>Using fprintf(), can i use "\n" for the succeeding fields to be written to the next line?

Yes.

>I want the lines in my file to be separated by a carriage return and a line feed.
>Is it the same as "\n"?????

It is if that is how your system translates a newline in text mode.

Thanks.

Then i can use sscanf() instead of strtok()

If i use fprintf() and i use multiple "%s" arguments to create the string(buffer) to write to the file,
Will the file have a "\0" after each string or just the contents of the string will be present??

The program is to run on a SUN netra server.
I dont know how it interprets a "\n"

The ASCII text file needs to have each line separated by a carriage return and a line feed..

While opening a file using vi or other Unix editor how can we make sure that the END of line has a carriage return and a line feed???

>If i use fprintf() and i use multiple "%s" arguments to create the string(buffer) to write to the file,
>Will the file have a "\0" after each string or just the contents of the string will be present??

Just the contents of the string will be present.
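
A short sketch that makes this visible, assuming a made-up function write_record and a made-up record layout: fprintf() returns the number of characters it wrote, and that count covers only the literal text and the string contents, not any '\0' terminators.

```c
#include <stdio.h>

/* Hypothetical helper: write one record using several %s conversions.
 * fprintf() emits only the characters of each string; the '\0'
 * terminators are not written. Returns the number of characters
 * written, or a negative value on error. */
int write_record(FILE *out, const char *name, const char *ht)
{
    return fprintf(out, "CUST:NAME=%s,HT=%s;\n", name, ht);
}
```

For name "9890000501" and ht "74667" the return value is 31: 10 + 10 + 4 + 5 characters, plus the ';' and the newline.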

>The program is to run on a SUN netra server.
>I dont know how it interprets a "\n"

In Unix, a newline is just the '\n'.

>The ASCII text file needs to have each line separated by a carriage return and a line feed..

This strikes me as somewhat odd since you mention Unix.

>While opening a file using vi or other Unix editor how can we make sure that the END of line has a carriage return and a line feed???

Use "\r\n" if that is what you need.

Actually the file generated on solaris is to be sent to a PC.

I tried a "\015\n" and it works fine :)

Thanks!!!

>Actually the file generated on solaris is to be sent to a PC.
>I tried a "\015\n" and it works fine :)
>Thanks!!!

Why not use '\r' instead of '\015'? While they may both resolve into the same character for ASCII, the '\r' version is also portable to other systems. And I would think it is easier to understand the meaning of '\r' as opposed to '\015'.
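
A small sketch of writing the CR LF pair explicitly. It assumes the stream does no newline translation of its own, which is always true on Unix; on a system that translates in text mode, the file would need to be opened in binary mode ("wb"). The helper name write_line_crlf is made up.

```c
#include <stdio.h>

/* Hypothetical helper: write text followed by carriage return + line feed.
 * '\r' is the portable spelling of carriage return; '\015' is its
 * ASCII code written in octal.
 * Returns 0 on success, -1 on a write error. */
int write_line_crlf(FILE *out, const char *text)
{
    if (fputs(text, out) == EOF)
        return -1;
    if (fputs("\r\n", out) == EOF)
        return -1;
    return 0;
}
```

The same effect can be had by ending an fprintf() format string with "\r\n", as in the examples elsewhere in this thread.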

yea..i should do that..Thanks..

Do you know whether there will be a difference in performance between using

fread/fwrite functions and fgets/fprintf??

The file to be read and written is a text file.

No, I don't really know of any performance issues.

If it runs too slow, profile the code to find bottlenecks. If the file I/O is too slow, you'd likely need to use something platform-specific instead of the standard functions anyway.
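
As a rough first measurement before reaching for a real profiler, the standard clock() function can time a piece of code. This is only a sketch: time_it, CountJob and count_colons are hypothetical names, and the workload is a stand-in for whatever parsing routine is being compared.

```c
#include <time.h>

/* Hypothetical helper: run work(arg) `repeats` times and return the
 * CPU time used, in seconds, as measured by the standard clock(). */
double time_it(void (*work)(void *), void *arg, int repeats)
{
    clock_t start = clock();
    int i;

    for (i = 0; i < repeats; i++)
        work(arg);

    return (double)(clock() - start) / CLOCKS_PER_SEC;
}

/* Example workload: count the ':' delimiters in a line. */
struct CountJob
{
    const char *line;
    int         colons;
};

void count_colons(void *arg)
{
    struct CountJob *job = arg;
    const char *p;

    job->colons = 0;
    for (p = job->line; *p; p++)
        if (*p == ':')
            job->colons++;
}
```

Timing two candidate routines over the same real input file, with enough repeats to get a non-trivial reading, says more than guessing which library call is faster.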

I have a file with lines as below

 fname:lname:1::2:3:1234:123

Note the space in the first position of the line, an empty field denoted by "::",
and that the final field has no delimiter at the end, just the end of line.

Using scanf how can i read these individual fields into char array.

can you please tell me what format specifier to use.

Earlier it was mentioned "%31[^:]%n"
Can somebody please explain this format???

31 is the maximum number of bytes for the field.
Whats [^:]
whats %n..shouldnt i use %s???

If possible,also please point to a online tut or man that explains the format specifiers .

>Earlier it was mentioned "%31[^:]%n"

Can somebody please explain this format???

>Whats [^:]

The %31[^:] specifier means to read up to 31 non-colon characters into a string.

[edit]Check this out, too[/edit]

>whats %n..shouldnt i use %s???

No. The %[ specifier works like a string. The %n is a completely different specifier: it stores into an int the number of characters consumed from the input so far (here, the length of the field just read).
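
Here is a minimal sketch of the two specifiers working together, using a made-up helper named read_field. It assumes the destination buffer holds at least 32 bytes, to match the 31 in the format.

```c
#include <stdio.h>

/* Hypothetical helper: read one ':'-delimited field from s into out
 * (a buffer of at least 32 bytes).
 * Returns the number of characters consumed, or 0 if the field is empty
 * (s starts with ':' or is at the end of the string). */
int read_field(const char *s, char *out)
{
    int n = 0;

    /* %31[^:]  at most 31 characters that are not ':' (plus the '\0')
     * %n       store how many input characters were consumed so far */
    if (sscanf(s, "%31[^:]%n", out, &n) == 1)
        return n;

    out[0] = '\0';      /* nothing was stored, so mark the field empty */
    return 0;
}
```

On "74667:0:6" it stores "74667" and returns 5, so the caller knows the ':' is 5 characters further on. On ":2" the conversion fails, nothing is stored by sscanf(), and the helper reports an empty field.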

>If possible,also please point to a online tut or man that explains the format specifiers .

Here is a man page.


>I have a file with lines as below
>fname:lname:1::2:3:1234:123

Try something like this.

I dont need the %n

It works great.Thanks!!

How do i remove the initial space from the first string.

Is there a way i can do it in sscanf itself?

If not, is there a function in c like `basename`

>How do i remove the initial space from the first string.

If you know that it will always be there, you can simply begin the format string with %*c to read and discard it.
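
If the leading blank may or may not be there, an alternative is a space at the start of the format string, which skips any amount of whitespace, including none. A sketch with a made-up helper first_field, again assuming a destination of at least 32 bytes:

```c
#include <stdio.h>

/* Hypothetical helper: read the first ':'-delimited field of line into out
 * (at least 32 bytes), ignoring any leading whitespace.
 * Returns 1 on success, 0 if no field could be read. */
int first_field(const char *line, char *out)
{
    /* The leading space in the format skips zero or more whitespace
     * characters; %[ on its own does not skip whitespace. */
    return sscanf(line, " %31[^:]", out) == 1;
}
```

Both " fname:lname" and "fname:lname" give "fname" this way, whereas %*c would eat the 'f' in the second case.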

sscanf(buf,"%[^:]:%[^:]:%[^:]:%[^:]:%[^:]:%[^:]:%[^:]:%[^:]:%[^:]:%[^:]:%[^:]:%[^:]",name,....


When it encounters a "::" in the buffer it stops.

Cant i make it assign NULL if a field is empty or is strtok() the way to go??

>When it encounters a "::" in the buffer it stops.

Try something like this.

Empty fields are handled in the code from this link. And that is part of the reason why we needed the %n.

In that example, they overwrite the same char array.

I need to store each field extracted and then do some processing and later write it to a file.

Then loop again and pick up the next set of fields and do processing and write to a file.

The problem is subsequent loops if any field had no value extracted it takes the previous row's value.(last time stored ).

I cant memset each field before pass through each row.(too expensive)

I gave a char *ptr[13]...and assign &ptr to the sscanf()
i is the loop variable.

I figured ptr will point to each field in the buffer itself but its giving garbled output.

I think i need to take care of NULL values properly.

Let me try it out again :)

Tried strtok() but it doesnt handle empty fields :(

>I gave a char *ptr[13]...and assign &ptr to the sscanf()
>i is the loop variable.

Has this array of pointers been properly allocated memory? And then the field actually copied (via strcpy or similar) into the memory to which the pointer points?

Is it possible to post a minimal (100 lines or less) test code and several lines of the sample input?

[edit]Oops. I forgot about the sample you already posted.

char buf[MAX_LINE_LEN+1];   /* input buffer.          */
char buf1[MAX_LINE_LEN+1];   /* input buffer.          */
char *ptr=buf;
char *subs[13];
while (fgets(buf, MAX_LINE_LEN+1, fpsub))
    {
     ptr=buf;
     for(i=0;i<12;i++)
     {
       if(sscanf(ptr,"%[^:]%n", &subs[i], &n) == 1 )
           {
              ptr += n + 1;
           }
           else
           {
              ++ptr;
           }

     }
    sprintf(buf1,"lname=%s,fname=%s,TP=TR-%s&AT-%s&AC-%s,nb=%s;\r\n",
                   &subs[0],&subs[1],&subs[2],&subs[3],&subs[4],&subs[5]);
    fwrite(buf1,strlen(buf1),1,fpexp);

sample Input

123456:789564::12:23:23:12:12::
3245:12345:32:23:re:1234:1234:432::


i just have an array of pointers..
Wont sscanf() make them point to my input buffer??

If i had a char array for each field and input them to sscanf()
i need to clean the individual buffers for each row!!

>i just have an array of pointers..

That's why you get garbage...

>Wont sscanf() make them point to my input buffer??

No. sscanf copies from the source buffer into the string you specify (which is apparently just a dangling pointer).
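
If what you want really is a set of pointers into the input buffer itself, with no copying, the line can be split in place instead of going through sscanf(). A sketch, assuming a made-up helper named split_in_place: it overwrites each ':' in the buffer with '\0', and unlike strtok() it keeps empty fields.

```c
#include <string.h>

/* Hypothetical helper: split line in place on ':'.
 * Each ':' is replaced by '\0' and fields[i] points at the start of
 * field i inside line. Empty fields become empty strings.
 * A trailing newline, if present, is removed first.
 * Returns the number of fields stored (at most maxFields). */
int split_in_place(char *line, char *fields[], int maxFields)
{
    int n = 0;
    char *p = line;

    line[strcspn(line, "\r\n")] = '\0';     /* strip end-of-line */

    while (n < maxFields)
    {
        char *colon = strchr(p, ':');
        fields[n++] = p;
        if (colon == NULL)
            break;                          /* last field */
        *colon = '\0';
        p = colon + 1;
    }
    return n;
}
```

The pointers stay valid only until the buffer is overwritten by the next fgets(), so each row has to be processed and written out before the next one is read.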

>If i had a char array for each field and input them to sscanf()
>i need to clean the individual buffers for each row!!

Why do you think so?

char subs[500];
char buf[MAX_LINE_LEN+1];   /* input buffer.          */
char buf1[MAX_LINE_LEN+1];   /* input buffer.          */
char *ptr=buf;


/* copy source file to target file, line by line. */
    while (fgets(buf, MAX_LINE_LEN+1, fpsub))
    {
     ptr=buf;

     for(i=0;i<441;i+=40)
     {
       if(sscanf(ptr,"%[^:]%n",&subs[i], &n) == 1 )
           {
              ptr += n + 1;
           }
           else
           {
              ++ptr;
              strcpy(&subs[i],"NULL");
           }
     }

    sprintf(buf1,"lname=%s,fname=%s,TP=AT-%s&AW-%s&AT-%s,nb=%s;\r\n",
                   &subs[0],&subs[40],&subs[80],&subs[120],&subs[160],&subs[200]);
    fwrite(buf1,strlen(buf1),1,fpexp);

This code works but is there a better way to do this job.


Note that in the else part of parsing that i strcpy a string "NULL".

Otherwise it takes the previous rows value...

I think sscanf() when it sees an empty field it doesnt copy anything...
Not even a NULL.

In the data processing i can check for the string"NULL" and proceed.

Still this big buffer seems to be a clumsy way of doing this....

Is there a better way???

I think i can do a


char lname[12];
char fname[15];
char start[5];
char *ptr={&lname,&fname,&start......}

I guess i should have the minimum array as '5' so that i can copy NULL..
let me try it...

How about something along this line.

/* file.txt
123456:789564::12:23:23:12:12::
3245:12345:32:23:re:1234:1234:432::
*/
#include <stdio.h>
int main(void)
{
   const char filename[] = "file.txt";
   FILE *file = fopen(filename, "r");
   if ( file )
   {
	  char line [ BUFSIZ ];
	  size_t i;
	  while ( fgets(line, sizeof line, file) )
	  {
		 int n;
		 char substr[6][10], *ptr = line;
		 for ( i = 0; i < sizeof substr / sizeof *substr; ++i )
		 {
			if ( sscanf(ptr, "%9[^:]%n", substr[i], &n) == 1 )
			{
			   ptr += n;
			}
			else
			{
			   substr[i][0] = '\0';
			}
			if ( *ptr ) ++ptr; /* step over the ':' but never past the terminating null */
		 }
		 printf("lname=%s,fname=%s,TP=TR-%s&AT-%s&AC-%s,nb=%s;\n",
				 substr[0],substr[1],substr[2],substr[3],substr[4],substr[5]);
	  }
   }
   else
   {
	  perror(filename);
   }
   return 0;
}

/* my output
lname=123456,fname=789564,TP=TR-&AT-12&AC-23,nb=23;
lname=3245,fname=12345,TP=TR-32&AT-23&AC-re,nb=1234;
*/

Note that I had to change the "\r\n" to see the output.
