First time posting, this seems like a wonderful community.

Okay, so I'm new to C++ (I've mostly worked with Java) and I'm trying to write a program that will read in a plain text file (a short paragraph), take each word in the file, and place it into a data structure (I'm thinking of using a map or multimap).

My big concern at this point is which ifstream function I should use to read in each separate word. I'm guessing I'd pass whitespace and periods as delimiters, but how do I use both delimiters at the same time? I'm not too concerned about storing the data yet, as I can't make any headway on that until I figure out a proper way to read in each word.

Any help is greatly appreciated :)

All 9 Replies

It's very easy to read just whitespace-separated words:

#include <string>
#include <fstream>
using namespace std;

int main()
{
    ifstream in("filename");
    string word;
    while( in >> word )
    {
          // do something with this word
    }
}

>> I'm guessing using whitespace and periods as delimiters as the parameters,
>> but how do I use both delimiters at the same time.
as dragon has shown, using whitespace alone as the delimiter is very easy; this is the default in c++.
to also use period, ;, :, ? etc. as delimiters to separate words, you could read one line of text from the file at a time and parse that line into words.
reading a text file line by line is easy enough; just use std::getline.
to break each line into words, you could parse it yourself or (much easier) use a library, e.g. boost.tokenizer http://www.boost.org/libs/tokenizer/introduc.htm

#include <iostream>
#include <boost/tokenizer.hpp>
#include <string>
#include <fstream>

void process_line( const std::string& line )
{
  boost::tokenizer<> tokenizer( line ) ;
  typedef boost::tokenizer<>::iterator iterator ;
  for( iterator iter = tokenizer.begin() ;
           iter != tokenizer.end(); ++iter )
         std::cout << *iter << '\n' ;
}

int main()
{
  std::ifstream file( "whatever.txt" ) ;
  std::string line ;
  while( std::getline( file, line ) ) process_line(line) ; 
}
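
If Boost is not available, the "parse it yourself" route can be sketched with the standard library alone. This is only an illustration under assumptions: the name split_words and the default delimiter set are made up here, and the idea is simply to alternate std::string::find_first_not_of and find_first_of over each line.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: splits `line` into words, treating every
// character in `delims` as a separator. Empty words are skipped.
std::vector<std::string> split_words( const std::string& line,
                                      const std::string& delims = " \t\r\n.,;:?!" )
{
    std::vector<std::string> words;
    std::string::size_type start = line.find_first_not_of( delims );
    while( start != std::string::npos )
    {
        // end is npos when the word runs to the end of the line;
        // substr() then simply takes the rest of the string.
        std::string::size_type end = line.find_first_of( delims, start );
        words.push_back( line.substr( start, end - start ) );
        start = line.find_first_not_of( delims, end );
    }
    return words;
}
```

It could replace the body of process_line: call split_words( line ) and loop over the returned vector.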

The code above separates the file by '\n', then by the delimiter.

Is it possible to separate the file by the delimiter only?

Eg:

A
B@
C@D

Separate by @
=>
"A\nB"
"\nC"
"D"

Thank you.

Yes, getline has an optional third parameter that is the delimiter. This should work, but I haven't tested it: getline(fin, line, '@');
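
A minimal sketch of that suggestion, using an in-memory stream so it is easy to try. The helper name split_on_char is hypothetical; the only real API involved is the three-argument std::getline.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: splits `text` at every occurrence of `delim`.
// Newlines are ordinary characters here and stay inside the pieces.
std::vector<std::string> split_on_char( const std::string& text, char delim )
{
    std::vector<std::string> pieces;
    std::istringstream in( text );
    std::string piece;
    while( std::getline( in, piece, delim ) )
        pieces.push_back( piece );
    return pieces;
}
```

For the text "A\nB@\nC@D" and the delimiter '@' this gives the three pieces asked for above: "A\nB", "\nC" and "D". The same loop works unchanged on an ifstream.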

Thank you, Dragon for the quick reply.

Sorry I didn't describe my problem clearly.
Actually, I am hoping to use a string delimiter instead of a char.

Like this:

@
@A
@B
@
@C
DE

Separate by "@\n@"
=>
"A\nB\n"
"C\nDE"

:)

I used to write Perl, which can use $/="@\n@" to do so, but I can't find an equivalent in C++.

I'm still not sure what you want. Is "@\n@" the line terminator instead of just '\n', so that '\n' all by itself is not a line terminator? The only way I can think of at the moment to do that is to read the whole file as a binary file into a character buffer and then parse it yourself.
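
One way the "read everything, then parse it yourself" idea might look, as a sketch rather than a tested solution. The name split_on_string is hypothetical, the separator is assumed to be non-empty, and the whole input is assumed to fit in memory.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: reads everything from `in`, then cuts the text
// at each occurrence of `sep`. An empty `sep` returns the text whole.
std::vector<std::string> split_on_string( std::istream& in, const std::string& sep )
{
    std::ostringstream buffer;
    buffer << in.rdbuf();                      // slurp the whole stream
    const std::string text = buffer.str();

    std::vector<std::string> records;
    if( sep.empty() ) { records.push_back( text ); return records; }

    std::string::size_type start = 0, pos;
    while( ( pos = text.find( sep, start ) ) != std::string::npos )
    {
        records.push_back( text.substr( start, pos - start ) );
        start = pos + sep.size();
    }
    records.push_back( text.substr( start ) ); // whatever follows the last separator
    return records;
}
```

Opening the file with std::ios::binary keeps the '\n' characters exactly as stored, which matters on platforms that translate line endings.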

Someone once mentioned that perl scripts can be called from c++, but I don't know how useful that would be to you.

You can call Perl scripts from C/C++, but it's not particularly useful if you plan on distributing your code to other machines: you'll either need the full Perl installation (plus any obscure modules you might use) installed on (or networked to) every target machine, or you'll need a Perl Development Kit ($$) to build super-freaking-huge EXE files. Either option is kind of gross.

If you really want Perl's powerful regex functionality, use the Regex++ engine for C++. It's pretty awesome.

http://www.ddj.com/184404797
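
For comparison, here is a rough sketch of regex-based splitting. Note the swap: it uses std::regex from the C++11 standard library, not Regex++ itself, so it only applies if a C++11 compiler is available. The helper name regex_split is hypothetical.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: splits `text` wherever `pattern` matches.
// Uses std::regex (C++11), not the Regex++ library from the article.
std::vector<std::string> regex_split( const std::string& text,
                                      const std::string& pattern )
{
    const std::regex sep( pattern );
    std::vector<std::string> pieces;
    // Submatch index -1 selects the text between the matches.
    std::sregex_token_iterator it( text.begin(), text.end(), sep, -1 ), end;
    for( ; it != end; ++it )
        pieces.push_back( it->str() );
    return pieces;
}
```

A multi-character separator such as "@\n@" can be passed as the pattern directly, and a character class such as "[ .,]+" covers the whitespace-plus-punctuation case from the original question.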




You know that you can getline() with any character as delimiter, so why not use that to test for a potential separator sequence?

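#include <iostream>
#include <string>
#include <vector>
using namespace std;

vector<string> read_lines( istream& ins, const string& separator )
  {
  vector<string> lines;
  string line, chunk;
  if (separator.empty()) return lines;

  // getline() stops at the separator's last character. Each stop is a
  // potential separator, confirmed by checking the text just before it.
  char   last = separator[ separator.length() - 1 ];
  string head = separator.substr( 0, separator.length() - 1 );

  while (getline( ins, chunk, last ))
    {
    line += chunk;
    if (ins.eof()) break;  // ran out of input before finding 'last'

    if (line.length() >= head.length()
    &&  line.compare( line.length() - head.length(), head.length(), head ) == 0)
      {
      // we've found a separator. split lines
      line.erase( line.length() - head.length() );
      lines.push_back( line );
      line.clear();
      }
    else
      // oops! not a separator! put the character back
      line += last;
    }

  if (!line.empty()) lines.push_back( line );
  return lines;
  }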

Using it then is elementary:

#include <fstream>

...

ifstream file( "fooey.txt" );
if (!file) complain();

vector<string> lines = read_lines( file, "@\n@" );

file.close();

for (unsigned i = 0; i < lines.size(); i++)
  cout << i << ":" << lines[ i ] << endl;
cout << "done.\n";

Hope this helps.

This may not be the best way to separate each word, but here's how I managed it.
You could read all the text out of the file first and then manually retrieve each word.

#include <iostream>
#include <string>
#include <vector>
using namespace std;

// Returns the characters text[beg] .. text[end - 1] as a string.
string SubStr(const char *text, size_t beg, size_t end) {
	return string(text + beg, end - beg);
}

// True if ch is one of the characters in txt.
inline bool TextContains(const char *txt, char ch) {
	while (*txt) if (*txt++ == ch) return true;
	return false;
}

// Counts the words in text: every gap-to-word transition starts one.
unsigned int WordCount(const char *text, const char *gaps) {
	unsigned int wc = 0;
	bool inWord = false;
	for (; *text; text++) {
		bool isGap = TextContains(gaps, *text);
		if (!isGap && !inWord) wc++;
		inWord = !isGap;
	}
	return wc;
}

// Collects every run of non-gap characters in text as one word.
vector<string> GetAllWords(const char *text, const char *gaps) {
	vector<string> words;
	words.reserve(WordCount(text, gaps));
	size_t i = 0;
	while (text[i]) {
		while (text[i] && TextContains(gaps, text[i])) i++;   // skip the gap
		size_t sc = i;
		while (text[i] && !TextContains(gaps, text[i])) i++;  // scan the word
		if (i > sc) words.push_back(SubStr(text, sc, i));
	}
	return words;
}

int main() {
	char str[] = "zero:;one#@two([three ==four";
	char gaps[] = " :;#@([=";
	vector<string> words = GetAllWords(str, gaps);
	for (size_t i = 0; i < words.size(); i++) {
		cout << words[i] << '\n';
	}
	cin.ignore();
	return 0;
}