Easiest way to find out # of lines in a text?

Question

conan19870619 0 Newbie Poster

15 Years Ago

What is the easiest way to find out the number of lines in a txt file?
There seems to be a function to ignore the rest of the line... will that help?

c++

All 13 Replies

Duoas 1,025 Postaholic

15 Years Ago

Use the STL

#include <algorithm>
#include <fstream>
#include <iostream>
#include <iterator>
using namespace std;

int main( int argc, char** argv )
  {
  if (argc < 2)
    {
    cout << "usage:\n  countlines FILENAME\n";
    return 0;
    }

  ifstream file( argv[ 1 ] );
  if (!file)
    {
    cerr << "Could not open file \"" << argv[ 1 ] << "\"\n";
    return 1;
    }

  file >> noskipws;
  cout << count(
            istream_iterator <char> ( file ),
            istream_iterator <char> (),
            '\n'
            )
          +1  // the last line is not terminated by NL
       << endl;

  return 0;
  }
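
A variation on the program above, offered as a sketch rather than a correction: std::istreambuf_iterator reads raw characters straight from the stream buffer, so the noskipws step is not needed. The helper name count_newlines is made up for illustration, and whether you add 1 to its result depends on how you define a line.

```cpp
#include <algorithm>
#include <cstddef>
#include <istream>
#include <iterator>
#include <sstream>   // only for trying the helper on in-memory text

// Hypothetical helper: counts '\n' characters in whatever is left in the stream.
// istreambuf_iterator does unformatted reads, so whitespace is never skipped.
std::size_t count_newlines(std::istream& in)
{
    return static_cast<std::size_t>(
        std::count(std::istreambuf_iterator<char>(in),
                   std::istreambuf_iterator<char>(),
                   '\n'));
}
```

Given a std::istringstream holding "a\nb\nc", count_newlines returns 2.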

Hope this helps.

Ancient Dragon commented: I like it -- although slightly obscure. +35

vijayan121 1,152 Posting Virtuoso

15 Years Ago

> cat my_file.txt | grep ^ | wc -l

Ancient Dragon 5,243 Achieved Level 70

15 Years Ago

> cat my_file.txt | grep ^ | wc -l

That doesn't satisfy the C++ requirement. If the OP is writing shell scripts that would be perfect, but this isn't shell script programming.

Radical Edward 301 Posting Pro

15 Years Ago

> +1 // the last line is not terminated by NL

But if the last line is terminated by NL, the count will be wrong. Sometimes it's better to be boring and conventional instead of clever and daring. ;)

#include <iostream>
#include <sstream>
#include <string>

namespace EdRules {
  using namespace std;

  int CountLines(istream& is)
  {
    int lines = 0;
    string s;

    while (getline(is, s))
      ++lines;

    return lines;
  }
}

int main()
{
  using namespace std;

  istringstream file1("line 1\nline 2\nline 3\nline 4");
  istringstream file2("line 1\nline 2\nline 3\nline 4\n");

  cout << EdRules::CountLines(file1) << '\n'
    << EdRules::CountLines(file2) << '\n';
}

Duoas 1,025 Postaholic Featured Poster · Answer 1 · 2008-08-16T16:58:50+00:00

???

If there are N lines of text in a file, there are N-1 newlines. The code I posted returns the same number as vijayan's shell script.

Edit: And it doesn't play with dynamic memory for every line of text.

mitrmkar 1,056 Posting Virtuoso · Answer 2 · 2008-08-16T17:29:05+00:00

If the file happens to be empty, then Ed really rules here ;)

iamthwee · Answer 3 · 2008-08-16T17:35:14+00:00

Define line:

Is the below:

fdsa
\n
\n
fdsf

4 lines or just two?

iamthwee · Answer 4 · 2008-08-16T18:03:47+00:00

How about the below, which counts only non-empty lines? But then you might also want to consider whether it should ignore a line full of spaces as well. Again, it all depends on your definition.

#include <iostream>
#include <fstream>
#include <string>
 
using namespace std;

class Foo
{
public:
  void bar()
  {
    ifstream file ( "testing.txt" );
    string line; 
    int lineCounter = 0;
    while ( getline ( file, line, '\n' ) )
    {
      if ( line.length() > 0 )
      {
        lineCounter++;
      }
    }
    cout << lineCounter << endl;
  }
};

int main()
{
  Foo test;

  test.bar();

  cin.get();
  return 0;
}
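
If a line full of spaces should not count either, one possible variation is sketched below. It assumes C++11 (for the lambda), and the names has_visible_text, count_nonblank_lines and count_nonblank_lines_in are invented for this example.

```cpp
#include <algorithm>
#include <cctype>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: true if the line has at least one non-whitespace character.
bool has_visible_text(const std::string& line)
{
    return std::any_of(line.begin(), line.end(),
                       [](unsigned char c) { return !std::isspace(c); });
}

// Counts only the lines that contain something other than whitespace.
int count_nonblank_lines(std::istream& in)
{
    int count = 0;
    std::string line;
    while (std::getline(in, line))
        if (has_visible_text(line))
            ++count;
    return count;
}

// Convenience wrapper for trying the counter on in-memory text.
int count_nonblank_lines_in(const std::string& text)
{
    std::istringstream in(text);
    return count_nonblank_lines(in);
}
```

With this definition, the text "fdsa\n\n   \t\nfdsf\n" has 2 lines: the empty line and the whitespace-only line are both skipped.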

Radical Edward 301 Posting Pro · Answer 5 · 2008-08-16T18:54:30+00:00

> If there are N lines of text in a file, there are N-1 newlines.

Unless the last line has a newline before EOF, then there are N newlines. Counting newlines isn't a good solution to the problem because of that very inconsistency, and you can't rely on the final newline not being there because sometimes the file has to conform to a standard format. For example, C++ requires every source file to end with a newline character.

> The code I posted returns the same number as vijayan's shell script.

Then both solutions are wrong if you want them to be general. :( If Edward has a well defined C++ source file with 5 lines and your program says that there are 6 lines, there's no doubt that something is amiss. It's especially damaging if you want to use the count to manage memory and file access.

> And it doesn't play with dynamic memory for every line of text.

So you use less memory to get the wrong answer in a common case.

Duoas 1,025 Postaholic Featured Poster · Answer 6 · 2008-08-16T21:13:16+00:00

I am really dismayed at the confusion here on how a plain-text file is organized. There is only one rule: A newline sequence SEPARATES lines of text. (Typically, the file may also be understood to contain only newlines, horizontal tabs, and printable characters, but this is not always true; the file may contain binary data and/or other control characters specific to some application or device, such as a printer or an application's configuration data.)

In the Unix world, the newline sequence is just an ASCII LF.
In Windows, it is ASCII CR followed by ASCII LF.
In classic Mac OS, it is just an ASCII CR.
(On some old IBM dinosaurs, EBCDIC gives you a choice. :) )
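
For code that has to cope with all three ASCII conventions at once, one possible approach (a sketch, not a standard library facility) is to scan the raw bytes and treat CR LF, a lone LF, and a lone CR each as a single newline sequence. The name count_newline_sequences is hypothetical, and the text is assumed to come from a stream opened in binary mode so that no newline translation has already happened.

```cpp
#include <cstddef>
#include <string>

// Counts newline sequences in `text`, treating "\r\n", "\n" and a lone "\r"
// each as one line break.
std::size_t count_newline_sequences(const std::string& text)
{
    std::size_t breaks = 0;
    for (std::size_t i = 0; i < text.size(); ++i) {
        if (text[i] == '\r') {
            ++breaks;
            if (i + 1 < text.size() && text[i + 1] == '\n')
                ++i;   // CR LF is one sequence, not two
        } else if (text[i] == '\n') {
            ++breaks;
        }
    }
    return breaks;
}
```

Under these assumptions "a\r\nb\r\n" contains 2 newline sequences, and so does "a\rb\rc".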

In all cases, the presence of a newline sequence is to separate lines of text. The Unix terminology is telling: it is called a new line --meaning that it introduces a new line of text. Likewise in other old texts it is called a line break --again a separator.

While it is common (and convenient) to consider it a line terminator, it is not. It is a SEPARATOR.

So, let us list the possibilities:

1. The file does not exist: it has zero lines of text.
2. The file exists, but has zero length: it has one line of text. Granted, an empty line of text, but one nevertheless. (This by the definition of a plain text file --a file containing plain text. Since there are no restrictions on the length of the text, the existence of the file denotes the existence of text.)
3. The file exists, and contains characters other than the newline sequence: it again has one line of text, as there is no newline sequence to mark any boundary between lines of text.
4. The file exists, and contains a single newline sequence as the final character(s) before EOF: the file contains two lines of text. The first line is everything before the newline sequence. The second line is everything after the newline sequence. Either line may be empty.
5. The file exists, and contains N newline sequences. Remembering that every newline must be both preceded by and succeeded by a single line of text (by definition of a separator), the file must contain exactly N+1 lines of text.
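
For a file that exists, the cases listed above all reduce to "newlines + 1". Here is a minimal sketch of that rule, assuming the file's contents are already in a std::string and that the newline sequence is a single '\n'. The name lines_as_separated is made up for illustration, and the nonexistent-file case has to be handled by the caller.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Separator view: N newline characters always separate N + 1 lines,
// so even an empty string counts as one (empty) line.
std::size_t lines_as_separated(const std::string& text)
{
    return static_cast<std::size_t>(
        std::count(text.begin(), text.end(), '\n')) + 1;
}
```

Under this definition "" and "abc" are both 1 line, "abc\n" is 2 lines, and "a\nb\nc" is 3.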

And now to answer some postings:
iamthwee
> 4 lines or just 2?
Neither. It has three lines of text: "fdsa", "", and "fdsf", each separated by a newline.

mitrmkar
See point #2 above.

Ed
This is the first time I've ever seen your super-genius amiss.

Counting newlines is exactly the correct solution, for two reasons:
One, because there is no other way.
Two, because it is correct.

Text-file correctness is not defined by any one programming language's specification. By the way, this is what the C++ standard actually says:

2.1/1.2 "If a source file that is not empty does not end in a new-line character, or ends in a new-line character immediately preceded by a backslash character, the behavior is undefined."

It does not say that C++ files must end in newlines, but it says that failure to do so may cause undefined behavior.

Why is this the case? Because the ISO committee understands the structure of plain text files: they want the source files to end with a blank line of text. The reason is fairly straight-forward:
Suppose you have a list of includes:

#include "fooey.h"
#include "barnicle.h"
using namespace feathers;

Now imagine that neither "fooey.h" nor "barnicle.h" has a blank line at the end of its text, and that your compiler's preprocessor is too stupid (or properly instructed, maybe?) to insert a newline character after inserting the files' texts into your source. You'd get things like: }#include "barnicle.h" and #endifusing namespace feathers; ...which are definitely errors.

The ISO committee understands that inserting a file is an implicit concatenation of the first line with everything on the line before the #include directive, and of the last line with everything after the #include directive.

> So you use less memory to get the wrong answer in a common case.
No, I use less memory and a faster algorithm to get the correct answer in every case.

This is, and has always been, THE definition of a plain text file. I didn't make this up, and I am quite frankly surprised that I felt obliged to go through all this.

Addendum
The line count problem is a direct result of this confusion, and the Wikipedia article, if you care to look at it, makes specific mention of this confusion. While I am a fan of Wikipedia, the article does have one incorrect statement: "The general convention on most systems is ... to treat [it] as a line terminator." It is a convention in a lot of (poorly written -IMHO) software to treat newlines as line-terminators only, and some older systems had an EOL character instead of NL (and again IMHO the reason such dinosaurs are extinct is because EOL is far less convenient than NL and introduces subtle bugs on incorrectly-formatted data).

Remember, the most general solution is to count every line, and not just those preceding a newline. If there is an application-specific reason to treat things differently then that is a specialization (or non-general case).

Hope this helps clear up some confusion.

Radical Edward 301 Posting Pro · Answer 7 · 2008-08-17T00:51:54+00:00

> While it is common (and convenient) to consider it a line terminator, it is not. It is a SEPARATOR.

If it's common to consider new-line a line terminator, programmers have to keep that in mind when writing code. That's all there is to the matter: it's used often, so programmers have to support it or successfully justify to clients why it isn't supported. Logical purity is a noble goal, but rarely has a place in the real world.

> This is the first time I've ever seen your super-genius amiss.

Edward isn't wrong just because you don't agree. ;) In this case it seems we're both right, depending on the interpretation of what a new-line is.

> It does not say that C++ files must end in newlines, but it says that failure to do so may cause undefined behavior.

If you don't want undefined behavior, source files must end in new-lines. It goes without saying that nobody wants undefined behavior, so it's safe to say that source files must end in new-lines. :D

Duoas 1,025 Postaholic Featured Poster · Answer 8 · 2008-08-17T07:39:26+00:00

Geez, Ed, are you trying to miss the point now? Or just pushing my buttons?

People can decide that a newline is only a terminator for their applications all they want, that doesn't make functional software. A person can believe the sky is orange, but that doesn't change the fact that it is also blue. You must consider both options when writing code.

If you only consider a line to be text terminating with a newline, you've missed half the possibilities in your code, and sooner or later someone is going to pass a file that has just one line (no newline) or no blank line at the end of the file (no newline at EOF).

(It is only common because it is convenient. Not the other way around. Console input is line-buffered --making it convenient input, and writing text files is convenient if you just stick a newline after every line of text you write.)

However, if you consider all possible lines of text, your code will work flawlessly with any text file, no matter how the newline was considered (or not considered) when writing it.

The only difference it will ever make is that a line-counting algorithm will tell you that you have one more line than you think you do. As I already noted, this difference is purely point-of-view. In this instance we are indeed both right. I am not wrong to count lines as (newlines+1).

To summarize sound software strategy: Read files counting newlines as separators. Write files using newlines as terminators. And don't multiply blank lines at the EOF. That's bullet-proof.
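
One way to read that strategy, sketched under some assumptions: the newline is a single '\n', and "don't multiply blank lines" is taken to mean that an empty final line is not written back as an extra terminator. The names split_lines, write_lines and round_trip are hypothetical.

```cpp
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Read: treat '\n' strictly as a separator, so N newlines yield N + 1 lines.
std::vector<std::string> split_lines(const std::string& text)
{
    std::vector<std::string> lines;
    std::string::size_type start = 0;
    for (;;) {
        std::string::size_type pos = text.find('\n', start);
        if (pos == std::string::npos) {
            lines.push_back(text.substr(start));   // final line, possibly empty
            break;
        }
        lines.push_back(text.substr(start, pos - start));
        start = pos + 1;
    }
    return lines;
}

// Write: terminate every line with '\n', but drop one empty final line,
// because the previous line's terminator already represents it.
void write_lines(std::ostream& out, const std::vector<std::string>& lines)
{
    std::size_t n = lines.size();
    if (n > 0 && lines[n - 1].empty())
        --n;
    for (std::size_t i = 0; i < n; ++i)
        out << lines[i] << '\n';
}

// Convenience wrapper for trying the pair out on in-memory text.
std::string round_trip(const std::string& text)
{
    std::ostringstream out;
    write_lines(out, split_lines(text));
    return out.str();
}
```

With these definitions both "a\nb" and "a\nb\n" come back from round_trip as "a\nb\n", so rewriting a file repeatedly does not keep adding blank lines at the end.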

Watch how you draw conclusions too. Not wanting undefined behavior is not a causal predicate to "source files must end in newlines". It only indicates that "If you don't want undefined behavior then you must terminate source files with newlines." I know it feels like splitting hairs, but there is a significant difference.

To be clear though, I agree with you in that I think the standard should just say "source files must end in newlines", since it says nothing about how the preprocessor should handle the final line.

Personally, I very much like the STL streams classes, because they handle this very issue very nicely. For example, your method of counting lines returns a different number depending on whether the last line is empty --not whether or not it ends in a newline. getline() returns everything up to the newline or EOF, which means that if the last character before EOF is newline, it fails on the final (blank) line --essentially ignoring it. Bullet-proof.
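
A small sketch of the getline() behavior described here, using in-memory streams so it is easy to try. The names count_getline_calls and count_in are invented for the example.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Counts how many times getline() succeeds before the stream fails.
int count_getline_calls(std::istream& in)
{
    int count = 0;
    std::string line;
    while (std::getline(in, line))
        ++count;
    return count;
}

// Convenience wrapper: run the counter over a string.
int count_in(const std::string& text)
{
    std::istringstream in(text);
    return count_getline_calls(in);
}
```

count_in gives 2 for both "line 1\nline 2" and "line 1\nline 2\n", 3 for "line 1\nline 2\n\n", and 0 for an empty string: a final line is counted whenever getline() extracts something for it, even just a newline, and an empty remainder after the last newline is not counted.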

(To all others reading, I realize that I'm preaching to the choir with Ed, but I'm long-winded to be clear about the entire consideration for your benefit. That and because I'm naturally long-winded :$ . Just don't get me started on EBCDIC... :scared: )

Funny how we like to pick-apart stupid stuff, no? :)

Radical Edward 301 Posting Pro · Answer 9 · 2008-08-17T17:48:27+00:00

> Geez, Ed, are you trying to miss the point now? Or just pushing my buttons?

I was trying to avoid pushing your buttons by finding a compromise. But if you can't compromise, Ed doesn't feel strongly enough about the issue to make an enemy out of you.

So you're completely right and Edward is completely wrong, if that's what you want to hear. Ed concedes all points to you so we can get past this and move on to more important matters.
