I am using Turbo C.
I have a 2 MB text file, "tryy.txt". What I am trying to do is scrape the URLs from that file and save them to another file.

After running the program, I get the message "Null Pointer Assignment" on the output screen. When I open result.txt manually, I find that the scraping worked, but only some 10-15 URLs were scraped instead of all of them.
Here is my program:

#include<stdio.h>
#include<conio.h>
void main()
{
    char c,d;
    char e='\n';
    FILE *fp,*fs;
    fp=fopen("tryy.txt","r");
    while((c=getc(fp))!=EOF)
    {
        if(c=='<')
        {
            fseek(fp,1,SEEK_CUR);
            d=getc(fp);
            if(d=='o')
            {
                fseek(fp,2,1);
                fs=fopen("result.txt","a");
                while((c=getc(fp))!='<')
                    putc(c,fs);
                fseek(fp,4,1);
                putc(e,fs);
            }
        }
    }
    fclose(fs);
    fclose(fp);
    getch();
}

And here is a sample of the URLs I am scraping:
<loc>http://www.gstatic.com/s2/sitemaps/sitemap-00000000.txt</loc>
So, while scanning the file I check for an occurrence of '<', then I move the file pointer by 1 position. Then I store the next character in the variable d; if it is equal to 'o', I move the file pointer again and write the contents of the file until I reach the next '<'.
What should I do to get rid of the null pointer assignment and scrape all the URLs successfully?

Edited by hszforu: n/a

Last Post by WaltP
#include<conio.h>  // You don't need this old non-standard header
void main()        // main is an INT function -- always has been
getch();           // What's wrong with the completely standard getchar()?
}

So, while scanning the file I check for an occurrence of '<', then I move the file pointer by 1 position.

Why? Skipping 1 character using fseek() is so much more confusing than reading a character.

Then I store the next character in the variable d; if it is equal to 'o', I move the file pointer again ...

And again, why?
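
To illustrate the point about reading instead of seeking, here is a minimal sketch that consumes the characters after a '<' with getc() only. The helper name match_tag() is made up for this example and is not from the posted program. It also keeps the result of getc() in an int, so EOF can be told apart from real characters, which a plain char cannot do reliably. As a side note, the C standard only guarantees fseek() on a text stream for an offset of zero or a value previously returned by ftell(), so small relative seeks like the ones in the question are not portable anyway.

```c
#include <stdio.h>

/* Hypothetical helper: read characters from fp for as long as they
   match tag. Returns 1 if the whole tag was matched, 0 otherwise.
   On a mismatch the offending character is pushed back with ungetc()
   so the caller can still read it. */
int match_tag(FILE *fp, const char *tag)
{
    int c;  /* int, not char, so EOF can be detected */

    while (*tag != '\0') {
        c = getc(fp);
        if (c == EOF)
            return 0;
        if (c != (unsigned char)*tag) {
            ungetc(c, fp);
            return 0;
        }
        tag++;
    }
    return 1;
}
```

After reading a '<', a call such as match_tag(fp, "loc>") leaves the stream positioned at the first character of the URL, with no fseek() involved.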

Is there something wrong with
1) reading an entire line
2) looking for <loc>
3) if found, look for </loc>
4) write everything between
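
The four steps above can be sketched roughly as follows. This is only an outline under some assumptions: each <loc>...</loc> pair sits on a single line, there is at most one pair per line, and no line is longer than the 1024-byte buffer. The function names extract_loc() and scrape_urls() are invented for the example.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: copy the text between <loc> and </loc> in line
   into out (at most outsize - 1 characters, always NUL-terminated).
   Returns 1 if both tags were found, 0 otherwise. */
int extract_loc(const char *line, char *out, size_t outsize)
{
    const char *open_tag = "<loc>";
    const char *close_tag = "</loc>";
    const char *start;
    const char *end;
    size_t len;

    if (outsize == 0)
        return 0;
    start = strstr(line, open_tag);        /* step 2: look for <loc> */
    if (start == NULL)
        return 0;
    start += strlen(open_tag);
    end = strstr(start, close_tag);        /* step 3: look for </loc> */
    if (end == NULL)
        return 0;
    len = (size_t)(end - start);
    if (len >= outsize)
        len = outsize - 1;                 /* truncate over-long URLs */
    memcpy(out, start, len);
    out[len] = '\0';
    return 1;
}

/* Read in line by line and write every URL found to out, one per line.
   Returns the number of URLs written. */
long scrape_urls(FILE *in, FILE *out)
{
    char line[1024];
    char url[1024];
    long count = 0;

    while (fgets(line, sizeof line, in) != NULL) {   /* step 1 */
        if (extract_loc(line, url, sizeof url)) {
            fprintf(out, "%s\n", url);               /* step 4 */
            count++;
        }
    }
    return count;
}
```

A caller would open tryy.txt for reading and result.txt for writing once each, check both pointers against NULL, call scrape_urls(), and close each file once. The posted program instead reopens result.txt for every match without closing it, so it probably runs out of file handles after a handful of URLs; fopen() then returns NULL, and writing through that pointer is a plausible source of the "Null Pointer Assignment" message.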
