Hi guys, I'm trying to extract the text "Lady Gaga Fame Monster" from the HTML below using substr and find, but I haven't been able to retrieve it.

<div class="album-name"><strong>Album</strong> > Lady Gaga Fame Monster</div>

I tried to extract the whole string first, but cout << line_found only prints up to <strong>Album</strong>; the space that follows seems to stop it from going any further.

When I try cout << extract_line, I see no spaces in the extracted HTML.

I followed the tutorial at http://www.cplusplus.com/reference/string/string/substr/ closely. The tutorial example works, even with spaces, but my code stops extracting once it hits a space. Please help, it would be really appreciated. I've spent two days on this without finding a solution. Thanks.

Here's the source code:

#include "parser.h"
#include <stdlib.h>
#include <iostream>
#include <fstream>
#include <string>
#include <cstring>

using namespace std;

int main() {

    string line_found, extract_line, result, finalResult="";
    int firstPosition, secondPosition, input, location;

    ifstream sourceFile ("cd1.htm"); // extracts from sourcefile

    while(!sourceFile.eof())
    {
        sourceFile >> extract_line;
        location = extract_line.find("album-name");
       // cout << extract_line;

       if (location >=0)
       {       
            line_found = extract_line.substr(location);
            cout << line_found << endl;
            firstPosition= line_found.find_first_of(">");

            result = line_found.substr(firstPosition);

       }
    }    
    return 0;
}

All 4 Replies

I hope you know that there are libraries already made for this. If you are doing this just for practice, then go ahead.

To get the whole line rather than a single whitespace-delimited word, which is all that operator >> reads, use getline(cin, stringVariable). This puts the whole line into stringVariable; when reading from a file, pass the file stream instead of cin. Work from there.
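To see why the original loop stopped at the space, here is a minimal sketch comparing the two kinds of read. The helper names readToken and readLine are made up for illustration, and an in-memory std::istringstream stands in for the file.

```cpp
#include <sstream>
#include <string>

// Reads one whitespace-delimited token, the same way `stream >> s` does.
// (Hypothetical helper, for illustration only.)
std::string readToken(const std::string& text) {
    std::istringstream in(text);
    std::string token;
    in >> token;              // stops at the first space, tab, or newline
    return token;
}

// Reads everything up to the first newline, the way std::getline does.
// (Hypothetical helper, for illustration only.)
std::string readLine(const std::string& text) {
    std::istringstream in(text);
    std::string line;
    std::getline(in, line);   // spaces are kept; only '\n' ends the read
    return line;
}
```

With the sample HTML line as input, readToken returns only <div, while readLine returns the whole line including the album title.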

I didn't get it to work. Sorry, I'm quite weak at programming. This is for an assignment: I'm trying to extract a string from an HTML page, so I need a pattern-matching function that finds some unique pattern on the page to locate it. Once the pattern is found, the string that follows is selectively altered, and the final result is stored in a string and displayed as output.

The existing source code lets me extract strings without spaces, but it doesn't work on strings with spaces.

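Instead of while(!sourceFile.eof()), do this: while(getline(sourceFile, extract_line)){ }, and drop the sourceFile >> extract_line statement inside the loop, since getline already fills extract_line.

Now extract_line should contain the whole line, spaces included.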

Thank you so much! It works.

I've tried creating a class with a member function called matchPattern(), but I keep getting a segmentation fault. When I use cin and cout in it, the result is displayed before the segmentation fault appears. Any idea? Thanks.

parser.h

#ifndef PARSER_H
#define	PARSER_H

#include <stdlib.h>
#include <iostream>
#include <fstream>
#include <string.h>
#include <cstring>

using namespace std;

class Parser
{
public:
    Parser();
    string matchPattern();
    
private:
};

#ifdef	__cplusplus

#endif

#endif	/* PARSER_H */

main.cpp

#include "parser.h"
#include <stdlib.h>
#include <iostream>
#include <fstream>
#include <string>
#include <cstring>

using namespace std;

int main() {
    
    Parser P;
    P.matchPattern();

    return 0;
}

Parser implementation file

#include "parser.h"
#include <stdlib.h>
#include <iostream>
#include <fstream>
#include <string>
#include <cstring>

using namespace std;

Parser::Parser()
{

}

string Parser::matchPattern()
{

}
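
One possible cause, assuming the posted body is representative of the real function: matchPattern() is declared to return a string, but no return statement is reached. Flowing off the end of a non-void function is undefined behavior in C++, and with a std::string return type it often shows up as a crash right after the function's output has been printed. A minimal sketch of a version that always returns a value follows; the line parameter and the matching logic are assumptions for illustration, not the original code.

```cpp
#include <string>

class Parser {
public:
    // Returns the part of `line` starting at "album-name",
    // or an empty string when the pattern is absent.
    // (The parameter is an assumption; the posted version takes none.)
    std::string matchPattern(const std::string& line) const;
};

std::string Parser::matchPattern(const std::string& line) const {
    std::string::size_type location = line.find("album-name");
    if (location == std::string::npos) {
        return "";                    // every path returns a string
    }
    return line.substr(location);
}
```

Compiling with warnings enabled (for example -Wall on GCC or Clang) should flag a non-void function that can reach its end without returning.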