How to extract all URL links from a web page address

Hello everyone, I hope all is well. I want to extract all the links found at a URL address into a list.
For example:
I have a URL: www.geynstuff.com
Now I want to get all the URL links on this website's page into a ListBox,
but without using the WebBrowser control.

I hope someone can help me solve this problem.
Thanks

All 4 Replies

Open a socket on port 80 and send a standard HTTP header.
I'm sorry, I don't really code VB and responded from the main page, but the concept is the same, so I'll give some equivalent C++ code.

This is the constructor of a class object:

urlsock(char * URL, char * PATH) : inpacket(1024) {
        buf = new BYTE[MAX_LEN]; // send buffer
        rec = new BYTE[MAX_LEN]; // recv buffer
        memset(&hints,0,sizeof hints); // empty the uninitialized set
        WSADATA wsadata;
        if( WSAStartup(MAKEWORD(2,2),&wsadata) != 0 ) return; // failure

        hints.ai_family = AF_UNSPEC; // IPv6 or IPv4 is fine
        hints.ai_socktype = SOCK_STREAM; // connected server (protocol defaults to TCP)

        if( getaddrinfo( URL, "80", &hints, &res ) != 0 ) return; // failure

        for(r=res;r;r=r->ai_next) {
            if( (sockfd = socket(r->ai_family, r->ai_socktype, r->ai_protocol)) == -1 ) // -1 error
                continue; // next res
            if( connect(sockfd, r->ai_addr, r->ai_addrlen) == -1 ) { // -1 error
                closesocket(sockfd);
                continue; // next res
            }
            break; // we connected!
        }
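        bool connected = (r != NULL); // remember the outcome before freeing the list
        freeaddrinfo(res); // done with result list (freed on failure too)
        if(!connected) return; // no socket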

        sprintf((char*)buf,
            "GET %s HTTP/1.1\r\n"
            "Host: %s\r\n"
            "Connection: close\r\n"
            "Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5\r\n"
            "Accept-Language: en-us,en;q=0.5\r\n"
            "Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7\r\n"
            "User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:15.0) Gecko/20120427 Firefox/15.0a1\r\n"
            "Referer: http://www.google.com/\r\n\r\n",
            PATH,URL);
        send(sockfd,(char*)buf,strlen((char*)buf),0);
        while(handle_recv());   // don't do anything, it's handled
        page_result.process(&inpacket);
    }

There's a lot in there that doesn't matter to you, but if you look at the sprintf() call, you can see that I send a GET request for the given path on the given host.
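Stripped of the optional headers, the request can be sketched as below. This is only a minimal illustration: build_get_request is a hypothetical helper name, not part of the class above, and it assumes a plain HTTP (not HTTPS) server.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: build a minimal HTTP/1.1 GET request for `path` on `host`.
// Only the request line and the Host header are required by HTTP/1.1;
// "Connection: close" asks the server to close the socket after the response.
std::string build_get_request(const std::string& host, const std::string& path) {
    assert(!host.empty());  // HTTP/1.1 requires a Host header

    std::string request;
    // An empty path is sent as "/" (the site root).
    request += "GET " + (path.empty() ? std::string("/") : path) + " HTTP/1.1\r\n";
    request += "Host: " + host + "\r\n";
    request += "Connection: close\r\n";
    request += "\r\n";  // a blank line terminates the header block
    return request;
}
```

The returned string is what would be passed to send(). The same bytes can be sent from any language, VB6 included.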

Once you call recv(), you'll get the response: the HTTP headers first, then the webpage itself as "plain text" (this is a simplification of the truth).
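As a rough sketch of that first step, the received text can be split at the first blank line, which separates the headers from the page. split_http_response is a hypothetical name, and the sketch assumes the whole response is already in one string and that the body is not chunked or compressed.

```cpp
#include <string>
#include <utility>

// Hypothetical helper: split a raw HTTP response into {headers, body}.
// The header block ends at the first empty line, i.e. at "\r\n\r\n".
// If no separator is found, everything is returned as headers and the body is empty.
std::pair<std::string, std::string> split_http_response(const std::string& raw) {
    const std::string separator = "\r\n\r\n";
    const std::string::size_type pos = raw.find(separator);
    if (pos == std::string::npos)
        return std::make_pair(raw, std::string());
    return std::make_pair(raw.substr(0, pos),
                          raw.substr(pos + separator.size()));
}
```

The second element of the pair is the HTML you would then search for links.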

You then have to parse the text and search for tokens. In your case specifically, you're searching for anchor tags, that is <a href="..."> and </a>.
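The token search could look roughly like this. It is a naive scanner, not a real HTML parser: extract_links is a hypothetical name, and it assumes the link target sits in a quoted href attribute inside an anchor tag.

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <vector>

// Hypothetical helper: collect the quoted href value of every anchor tag in `html`.
// Unquoted attribute values and '>' inside attribute values are not handled.
std::vector<std::string> extract_links(const std::string& html) {
    // Lower-case copy so that <A HREF=...> matches too; offsets stay identical.
    std::string lower(html);
    std::transform(lower.begin(), lower.end(), lower.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });

    std::vector<std::string> links;
    std::string::size_type pos = 0;
    while ((pos = lower.find("<a", pos)) != std::string::npos) {
        pos += 2;
        // Require whitespace after "<a" so tags such as <abbr> are skipped.
        if (pos >= lower.size() || !std::isspace(static_cast<unsigned char>(lower[pos])))
            continue;
        const std::string::size_type tag_end = lower.find('>', pos);
        if (tag_end == std::string::npos) break;  // unterminated tag
        const std::string::size_type href = lower.find("href", pos);
        if (href == std::string::npos || href > tag_end) { pos = tag_end; continue; }

        // Skip "href", optional spaces, '=', optional spaces.
        std::string::size_type i = href + 4;
        while (i < tag_end && std::isspace(static_cast<unsigned char>(lower[i]))) ++i;
        if (i >= tag_end || lower[i] != '=') { pos = tag_end; continue; }
        ++i;
        while (i < tag_end && std::isspace(static_cast<unsigned char>(lower[i]))) ++i;
        if (i >= tag_end) { pos = tag_end; continue; }

        // Copy the text between the opening quote and its matching closing quote.
        const char quote = html[i];
        if (quote != '"' && quote != '\'') { pos = tag_end; continue; }
        const std::string::size_type close = html.find(quote, i + 1);
        if (close == std::string::npos) break;
        links.push_back(html.substr(i + 1, close - i - 1));
        pos = close + 1;
    }
    return links;
}
```

Each string in the returned vector would become one entry in the list box. Relative links such as /b.html are returned as written and still need to be combined with the site address.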

I'm sure you know how to handle strings, I don't mean to lecture you.

Thanks for the reply,
but I don't know the C++ language.
We need VB6 code.
We already use a method to extract a website's URL links, but with that method we must browse to the URL and open the website page, and only then can we extract all the links into a ListBox.
That is not helpful. We want to extract all of a website's URL links without browsing the website.

We already gave an example of what we want to do.

The language you use does not change HTTP packet syntax.
I have no idea how successful (or unsuccessful) you have been in reaching your objective.

If you have a specific question, ask it and include relevant code samples.

How about using the Microsoft HTML Object Library and Microsoft WinHTTP Services?

You can find both in VB6 under Project > References.

Use WinHTTP Services to GET the document's source (the HTML source).
Then use the HTML Object Library to parse it by passing it to an HTML Document object.

Like this:

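Dim HTTPRequest As New WinHttp.WinHttpRequest
HTTPRequest.Open "GET", "http://www.geynstuff.com/", False ' False = synchronous request
HTTPRequest.Send
Dim doc As MSHTML.HTMLDocument
Set doc = New HTMLDocument
doc.body.innerHTML = HTTPRequest.ResponseText ' load the downloaded source into the document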

Then, to get all the anchor elements, you can do it like this:

Dim iE_col As MSHTML.IHTMLElementCollection
Set iE_col = doc.getElementsByTagName("A")
...

Just find out the rest yourself, good luck.
