Hi everybody.

My first post: I'm trying to learn some Python.

From the command line I run python file.py, where file.py contains:

import sys

lines = 0
words = 0
chars = 0

for line in sys.stdin:
	lines = lines +1
	words = words +1
	chars = chars + 1

print lines, words, chars

I type some text on stdin (the keyboard), but then how do I end the input so that the script can process the text I typed? So far I have to quit the terminal...
I hope I've described the problem clearly. Thanks for any help.

Even if you iterate over the lines of a file (a strange use of sys.stdin), your code does not make much sense: it adds 1 to every counter for each line, so it gives you the number of lines but not the number of words or characters.

Thanks, you are right.

I corrected the two counting lines like this, so that they measure each line instead of just adding 1, and it works:

words = words + len(line.split())
chars = chars + len(line)

But the stdin problem still remains: how do I end the input?
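Putting the two corrected lines back into the loop, here is a minimal sketch of the whole counter (Python 3 syntax, unlike the Python 2 prints elsewhere in this thread; the name count_stream is just an illustrative choice). It takes any file-like object, so it works the same for sys.stdin, an open file, or an in-memory string. When typing at the keyboard, the for loop over sys.stdin ends when you send end-of-file: Ctrl-D on Linux or macOS, Ctrl-Z followed by Enter on Windows. There is no need to close the terminal.

```python
import io

def count_stream(stream):
    """Count lines, words and characters in a file-like object."""
    lines = words = chars = 0
    for line in stream:              # a file object yields one line per iteration
        lines += 1
        words += len(line.split())   # split() with no argument splits on runs of whitespace
        chars += len(line)           # len(line) includes the trailing newline
    return lines, words, chars

# Demo with an in-memory "file"; for keyboard input pass sys.stdin instead.
sample = io.StringIO("one two\nthree\n")
print(count_stream(sample))  # (2, 3, 14)
```

To read what you type, call count_stream(sys.stdin) (after import sys) and finish with the end-of-file key; with python file.py < some_text.txt the shell feeds a file into stdin instead.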

Hi gawain_,

Why are you reading input using stdin? Why not use one of Python's input functions? If you're set on using stdin, you could try this:

import sys
inp = sys.stdin.readline()
while inp.strip() != "":      # stop at a blank line or at end-of-file
    print inp
    # Do your calculations right here
    inp = sys.stdin.readline()

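Stolen from here. Note that a for loop over sys.stdin also works: sys.stdin is a file object, so iterating over it yields one line at a time until end-of-file. The difference is that the readline loop above also stops at the first blank line.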

Hope this helps!

I am not sure where you got that stdin notion from. Python has its own functions for input and file handling:

# write a test text file
jingle = """Dashing through the snow
In a one horse open sleigh
Over the fields we go
Laughing all the way
Bells on bob tail ring
Making spirits bright
What fun it is to ride and sing
A sleighing song tonight
"""

write_text = open("JingleBell.txt", "w")
write_text.write(jingle)
write_text.close()

# use the text file you have just created or
# pick a filename you have in your working directory
filename = "JingleBell.txt"

# bring the whole text in as a string
read_text = open(filename, "r")
text = read_text.read()
read_text.close()

# assume each line ends with a '\n'
num_lines = text.count('\n')
# assume words are separated by whitespace
num_words = len(text.split(None))
# count all characters, including newlines
num_char = len(text)

# show results
print "Number of lines", num_lines
print "Number of words", num_words
print "Number of characters", num_char
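One caveat about text.count('\n') in the script above: if the last line has no trailing newline, it is not counted. A small sketch (Python 3 syntax; count_lines is a hypothetical helper name) that uses str.splitlines() instead. Be aware that in Python 3, str.splitlines() also treats \r, \v and the form feed \f as line boundaries, which matters if the text contains page breaks.

```python
def count_lines(text):
    """Count lines, including a last line that lacks a trailing newline."""
    return len(text.splitlines())

with_newline = "Dashing through the snow\nIn a one horse open sleigh\n"
without_newline = "Dashing through the snow\nIn a one horse open sleigh"

print(with_newline.count("\n"), count_lines(with_newline))        # 2 2
print(without_newline.count("\n"), count_lines(without_newline))  # 1 2
```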

Thanks a lot.
Your answers have been quite useful. I'm not set on reading from stdin; I'm just learning Python.
Eventually I'd rather process files.
Just a couple of questions:

Does inp.strip() != "" mean that the newline \n is stripped off while the line is not equal (!=) to a blank space ""?

Is split(None) the same as split(' ')?

And if I wanted to find the end of a page (so that I can build an analytical index), which character should I look for? \f, \ff, \form feed? I've searched for it but couldn't figure it out.

Thanks again for the help.

Hi gawain_,

To answer your first question, you have it backwards. The condition controlling the while loop, inp.strip() != "", is a boolean expression: it evaluates to either True or False. Python first evaluates inp.strip(), which removes leading and trailing whitespace, including the newline at the end of inp. It then compares the result with the empty string literal "" (not a blank space). If they are not equal (!=), the whole expression is True and the while loop does another iteration, reading a new inp. If they are equal, the expression is False, the loop stops, and the program continues just after the loop body.
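A quick sketch of what that condition sees, in Python 3 syntax. read_until_blank is a hypothetical helper that wraps the readline loop posted earlier in this thread so it can be fed an in-memory file; it collects the stripped lines instead of printing them.

```python
import io

# What strip() returns for a few typical readline() results
print(repr("hello\n".strip()))  # 'hello'
print(repr("   \n".strip()))    # ''  (a blank line strips down to the empty string)
print(repr("".strip()))         # ''  (readline() returns '' at end-of-file)

def read_until_blank(stream):
    """Collect lines until the first blank line or end-of-file."""
    collected = []
    inp = stream.readline()
    while inp.strip() != "":    # True as long as the line has visible text
        collected.append(inp.strip())
        inp = stream.readline()
    return collected

print(read_until_blank(io.StringIO("first\nsecond\n\nnever read\n")))
# ['first', 'second']
```

So the loop ends either at a blank line or when the input runs out, because both strip down to "".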

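As for your second question: split(None) is the same as plain split(), since None is split()'s default argument. It is not the same as split(' '), though. With no argument (or None), split() breaks on any run of whitespace (spaces, tabs, newlines) and drops empty strings, while split(' ') breaks at every single space and keeps the empty strings between consecutive spaces.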
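A short comparison on a made-up line with a double space, a tab and a trailing newline:

```python
text = "Bells  on\tbob tail\n"

print(text.split())      # ['Bells', 'on', 'bob', 'tail']
print(text.split(None))  # same as split(): ['Bells', 'on', 'bob', 'tail']
print(text.split(" "))   # ['Bells', '', 'on\tbob', 'tail\n']
```

For counting words, split() / split(None) is the one you want.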

I'm not sure I understand what you're asking for the third question. What is a page? A paging system page? Like - an OS page? Or a page in some kind of formatted document?

Thanks for the answers, G-Do.

I'm not sure I understand what you're asking for the third question. What is a page? A paging system page? Like - an OS page? Or a page in some kind of formatted document?

By page I mean a page of a formatted document: for instance, a PDF converted to a .txt file keeps its division into pages.
I'd like to get to the point of being able to build an index like this:
word_1: 1, 3, 4, 6, 9
word_2: 4, 77, 190
...
where the numbers are the pages in which the word is written.
That's it
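A minimal sketch of such an index in Python 3, assuming the pages in the text file are separated by a form feed character (pdftotext writes one between pages unless you pass -nopgbrk). The function name build_index, the page numbering from 1 and the crude punctuation stripping are illustrative choices, not a standard recipe.

```python
def build_index(text, page_break="\f"):
    """Map each word to the sorted list of page numbers (from 1) it appears on.

    Assumes pages are separated by page_break, a form feed character by default.
    """
    index = {}
    for page_number, page in enumerate(text.split(page_break), start=1):
        for word in page.split():
            # crude normalisation: strip surrounding punctuation, ignore case
            word = word.strip(".,;:!?\"'()").lower()
            if word:
                index.setdefault(word, set()).add(page_number)
    return {word: sorted(pages) for word, pages in index.items()}

sample = "Bells on bob tail ring\fMaking spirits bright\fbells again"
index = build_index(sample)
for word in sorted(index):
    print("%s: %s" % (word, ",".join(str(p) for p in index[word])))
```

With a real file you would read the whole text first (as in the JingleBell example) and pass it to build_index; a trailing form feed just produces an empty last page, which adds no words.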

Thanks again.

Thanks, G-Do. Our instructor told us to always use split(None) rather than just split(). I imagine it is just a matter of style.

Hi gawain_,

Unfortunately I can't tell you how to do what you're asking without seeing the files you're working on. Python isn't magic - it can't figure out where page breaks are supposed to go unless there is some symbol or token in the file which gives them away. So, you need to google the file format to figure out what token denotes page breaks, then do something like this:

# Read all the text of the file into a single string
f = open("file.dat", "r")
text = f.read()
f.close()

# Use the split function to split text into pages
some_token = "INSERT ACTUAL PAGE BREAK SYMBOL HERE"
pages = text.split(some_token)

# Now, you have a list of page strings - print the first page
print pages[0]

You can also read PDF files using Python directly via a special module from ReportLab, which can be found here. This might actually be easier if you are just dealing with straight PDFs.

Hope this helps!

Thanks, G-Do.

Python isn't magic - it can't figure out where page breaks are supposed to go unless there is some symbol or token in the file which gives them away. So, you need to google the file format to figure out what token denotes page breaks,

I usually convert PDF files from the shell like this: pdftotext -layout my_file.pdf
This turns the file into a .txt file, keeping the original layout (as much as it can) and the page breaks as well. The page break token is FF (form feed, hexadecimal 0C).

Now I tried your script with \FF and \F, like this:
pages = text.split("\FF")
but with both \F and \FF it prints the first word of the file.
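One likely culprit: in Python string literals the form feed is written "\f" (lowercase) or "\x0c". "\F" and "\FF" are not escape sequences, so Python keeps them as a literal backslash followed by F, and split() searches for that text instead of the page break. A small Python 3 sketch with made-up page text:

```python
# "\F" is not a recognised escape, so it stays as two characters: a backslash
# and a capital F (newer Pythons also warn). Written here as "\\F" explicitly.
print(len("\\F"))       # 2

# The form feed character (hex 0C) is "\f" or, equivalently, "\x0c".
print("\f" == "\x0c")   # True

text = "Page one text\fPage two text\f"
pages = text.split("\f")
print(pages[0])         # Page one text
print(len(pages))       # 3 (the trailing form feed leaves an empty last item)
```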

The PDF module sounds interesting: sooner or later I'll try it, but as you can see, these are my first steps in a programming language, apart from some HTML. (I'm doing this on my own, just for fun. I've started browsing the forum's posts, and they're really helpful.)
