So, i've been redesigning the program I've been making to make reusable functions, rather than copy/pasting almost identical code to several different functions (which is what I would have ended up doing).

These upcoming sections of code are functions designed to, on a button press, navigate to a URL, scrape the HTML to find a value. The value being the last page number of an element which stores messages, 10 per page, so 25 pages is equal to between 241-250 messages. Once the program knows how many pages there are, it will then proceed to 'tick' all the checkboxes on one page, delete them, and then repeat, for as many pages as there were when it began. To facilitate this I will be using a 'for' loop, although maybe in this case a 'while' loop may be better, input on this would be appreciated.

Here is the code, preceded by the name of the file each part is in. Please take note of the commented sections, I have included pertinent information along with the code. And note that some of this code is borrowed from an online tutorial that I have heavily modified, and don't fully understand, so please be nice.


// CSnipRobotDlg dialog
class CSnipRobotDlg : public CDialogEx
// Construction
    CSnipRobotDlg(CWnd* pParent = NULL);    // standard constructor
    void* GetElemFromID_Name(LPWSTR szID, IID nTypeIID);
    int GetLastPageNumber(LPWSTR sUrl/*, something?*/);

    // Rest of file.


// Returns an integer holding a value of the number of the last page.
int CSnipRobotDlg::GetLastPageNumber(LPWSTR sUrl/*, something?*/)
    //----- Navigate to URL supplied in argument, check readystate of ActiveX control.
    m_ctlBrowser.Navigate(sUrl, 0,0,0,0);

        while (m_ctlBrowser.get_ReadyState() != READYSTATE_COMPLETE) {
        DelayMs( 100 );
    //----- Page is now ready to process, capture the HTML.
    IHTMLDocument2* pDoc= (IHTMLDocument2*)m_ctlBrowser.get_Document();
    IHTMLElement* pElemBody= NULL; // Why define as NULL @ decleration? In the next line it's re-defined...
    pDoc->get_body(&pElemBody); // PElemBody is now holding the HTML.

    //----- I'm really not sure why this is here, apart from making sBody = pElemBody, via scenic route.
    CComBSTR bstrHTML;
    CString sBody= bstrHTML;  // Easier to work with as a CString. Why is it easier to work with CString?
    //MessageBox(sBody);  // Debug, shows the string as a message.

    //----- Zero-in on the data of interest.
    int nOffset= sBody.Find(L"Snip");
    CString sTmp= sBody.Mid(nOffset); // Remove the preceding HTML.
    //MessageBox(sTmp);  // Debug, shows the string as a message.

    //----- Isolate last page number, save it to CString variable.
    int nStart= sTmp.Find(L"Snip")+Snip;
    int nEnd=   sTmp.Find(L"Snip")-Snip;
    CString sLastPageNumber= sTmp.Mid(nStart, nEnd-nStart);
    //MessageBox(sLastPageNumber);  // Debug, shows the string as a message.

    //----- Convert string holding the value we need into an integer, for future mathematical/logical use.
    int iConvertedLastPageNumber = _wtoi(sLastPageNumber.GetBuffer());
    // I would really like a way of showing this int in/as a message, to prove it converted successfully.

void CSnipRobotDlg::OnBnClickedDeleteMessages()
    int pWhatIsTheNamingConventionHere= GetLastPageNumber(L"http://Snip"/*, something?*/);
    // Some more code to get the last page number here and to be useful.

    int iFoo; //Ignore this line, imagine iFoo == iTheIntegerImGettingFrom_GetLastPageNumber_

    for (int iTheIntegerImGettingFrom_GetLastPageNumber_ = 6; iFoo >= 0; iFoo--) // Should execute 6 times?
        while (m_ctlBrowser.get_ReadyState() != READYSTATE_COMPLETE) 
        // Mark the boxes, cannot mark 1-6 due to re-used ID's (I'll get you yet *shakefist*)
        void* pElement= GetElemFromID_Name(L"n7", IID_IHTMLInputElement);
        IHTMLInputElement* pCheckBoxSeven= (IHTMLInputElement*)pElement;

        pElement= GetElemFromID_Name(L"n8", IID_IHTMLInputElement);
        IHTMLInputElement* pCheckBoxEight= (IHTMLInputElement*)pElement;

        pElement= GetElemFromID_Name(L"n9", IID_IHTMLInputElement);
        IHTMLInputElement* pCheckBoxNine= (IHTMLInputElement*)pElement;

        pElement= GetElemFromID_Name(L"n10", IID_IHTMLInputElement);
        IHTMLInputElement* pCheckBoxTen= (IHTMLInputElement*)pElement;
        //------------------------------ now click the "Delete" button 
        pElement= GetElemFromID_Name(L"del", IID_IHTMLElement);
        IHTMLElement* pDelete= (IHTMLElement*)pElement;

        pDelete->click(); // GO!
    }   // It worked, once. It ticks all the boxes I've told it to (I'll get to ticking all 10 later).

The loop section of 'OnBnClickedDeleteMessages' will also probably end up as an autonomous and reusable fuction, leaving the only unique part of 'OnBnClickedDeleteMessages' the arguments it supplies to those functions. This seems like a logical design to me (a beginner).

            pElement= GetElemFromID_Name(L"n10", IID_IHTMLInputElement);
            IHTMLInputElement* pCheckBoxTen= (IHTMLInputElement*)pElement;

This is an example of getting a variable from another function, so I would have liked to be able to 'get it' myself but I can't understand whether or not I'm making a mistake with the arguments.

Some answers (not to all the questions though)...


Line 12: You should always try and initialise variables to a known value. In this case, after get_body has returned, you should check pElementBody != NULL.

Line 22: You should check that "Snip" is actually found in the string.

Line 34: You could just print it to the terminal:

std::cout << iConvertedLastPageNumber << std::endl;

Line 45: How does this get it's value?

Lines 54 - 68: You can use stringstreams for this:

for ( size_t i = 7; i < 11; ++i )
    std::wstringstream ssID;
    ssID << L"n" << i;
    void* pElement= GetElemFromID_Name( ssID.str().c_str(), IID_IHTMLInputElement);
    IHTMLInputElement* pCheckBoxSeven= (IHTMLInputElement*)pElement;

This will do all the cases from 7 - 10

Thanks ravenous

So, after line 12 I know that the variable should either be Null OR the HTML I'm interested in. So that means I can write some logic like:

if variable != to Null > continue, if variable = to Null > throw error with useful output. I'll write up an error handling function to link it to.

Seems like a good way to handle important variable in your code thanks for making that apparent to me.

Line 22: I know for a fact that my screen scraping is successful, i've got to the point where I have removed the 'debugging' MessagebBox lines from the code. Up until now I've been checking the value of my strings after every operation.

Line 34: I will try that immediately, thankyou, I really had no idea how a console would work alongside my MFC application.
Edit: Not working, error - 'Namespace std has no member cout'. If I remove the std:: part it just says 'cout is undefined'

Line 45: iFoo is == iTheIntegerImGettingFrom_GetLastPageNumber_.
It's just there so my for loop fit on one line and both of those placeholder variables are there in lieu of the variable being returned from GetLastPageNumber()

Line 54-68: Yea, I knew these lines of code were very unwieldy and were definately earmarked for a rewrite, thanks so much for giving me a way of doing it. I was just siting and scribbling on my notepad about how I'd go about making that into a useful function. It will probably end up having another function within that called GetAmountOfMessagesOnPage(or similar), so I dont try to tick 10 boxes when there are only 8. As they say, if you fail to plan, you plan to fail.

Thanks again, Alochai

// Returns an integer holding the value of the amount of pages.//
int CRobotDlg::GetLastPageNumber(LPWSTR sUrl)
    //----- Navigate to URL supplied in argument, check readystate of ActiveX control.
    m_ctlBrowser.Navigate(sUrl, 0,0,0,0);
    while (m_ctlBrowser.get_ReadyState() != READYSTATE_COMPLETE)

    //----- Page is now ready to process, capture the HTML.
    IHTMLDocument2* pDoc = (IHTMLDocument2*)m_ctlBrowser.get_Document();

    //----- Test for errors in internet coonection using pDoc.
    if (pDoc = NULL)
        // Would this be overkill? I'd need to declare pDoc differently, I think.
    IHTMLElement* pElemBody = NULL;
    pDoc -> get_body (&pElemBody); // pElemBody is now holding the HTML.

    //----- Test for errors in handling the variable, pElemBody
    if (pElemBody = NULL)
        // Finish error handling.

    //----- I'm not exactly sure what this part is doing.
    CComBSTR bstrHTML;
    pElemBody -> get_innerHTML(&bstrHTML);
    CString sBody = bstrHTML;  // Easier to work with CString in MFC.

    // Any errors from this point or beyond would be conversion errors...

    //----- Zero-in on the data of interest.
    int nOffset = sBody.Find(L"last href");
    CString sTmp = sBody.Mid(nOffset); // Remove the preceding HTML.

    //----- Isolate last page number, save it to CString variable.
    int nStart = sTmp.Find(L"last href")+50;
    int nEnd = sTmp.Find(L"<IMG")-2;
    CString sLastPageNumber = sTmp.Mid(nStart, nEnd-nStart);
    MessageBox(sLastPageNumber);  // Debug, shows the string as a message.

    //----- Convert string holding the value we need into an integer.
    int iConvertedLastPageNumber = _wtoi(sLastPageNumber.GetBuffer());


This is the final version of this function, I believe. All I need is to handle errors, is it best to logically check for errors after every operation? Or just ones that have a larger risk of failing, like getting documents from the internet? There is one section of this code I dont really understand, I can read it and see what it's doing, but I don't understand why it has to be done. Can anyone enlighten me?

There are three possible error types in this function that I could see possibly happening; No internet connection, failure to retreive the document, and conversion errors.

I think I can research and write errorhandling techniques, but is this a reasonable amount of detail? This level of detail in every function greatly increases workload, I'm not sure whether it would significantly affect anything else.