Hi, I'm new to Python and have been given the task of reading a user-supplied text file that is tab-delimited and contains four columns on each line: Authors, Year, Title and Journal.

So far I am only able to open the file, and I don't know how to begin parsing the data.

The recommended way of sorting the data is to use the following three lists (which I set as):

authorsList = []
journalsList = []
papersList = []

In papersList, each paper's entry holds its title, the year it was published, the index of each of its authors and the index of its journal; in this way the name of each journal and author is stored in only one place.
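The layout described above can be sketched roughly as follows. This is only a minimal illustration, assuming Python 3 and using nothing beyond str.split and basic list methods (no csv or re). The helper names index_of and add_record are made up for the example, and the two sample lines are typed in by hand rather than read from a file.

```python
authorsList = []
journalsList = []
papersList = []

def index_of(name, names):
    """Return the position of name in names, appending it first if it is new."""
    if name not in names:
        names.append(name)
    return names.index(name)

def add_record(line):
    """Split one tab-delimited line and store it in the three lists."""
    authors, year, title, journal = line.rstrip("\n").split("\t")
    author_indexes = []
    for author in authors.split(";"):
        author_indexes.append(index_of(author, authorsList))
    journal_index = index_of(journal, journalsList)
    # one entry per paper: [title, year, author indexes, journal index]
    papersList.append([title, year, author_indexes, journal_index])

add_record("Accot;Zhai\t2001\tScale effects in steering law tasks\tProc. ACM CHI\n")
add_record("Acredolo\t1977\tDevelopmental Changes in the Ability to Coordinate "
           "Perspectives of a Large-Scale Space\tDevelopmental Psychology\n")

print(authorsList)
print(papersList[0])
```

Because every name goes through index_of, an author or journal that appears in several papers is stored once and referred to by its position.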

What I have learned so far in Python: basic I/O, loops and conditions, defining functions and a little exception handling. I've been searching Google, but many of the answers to this same question use the csv module and regular expressions, which I tried to learn myself but couldn't understand the code that was suggested. Is there a way to do it without the csv and re modules?

I was thinking of doing something like this:

for line in openfile:
   a, b, c, d = line.split("\t")
   authorsList.append(a)
   papersList.append(b, c)
   journalsList.append(d)

but I don't think that is right at all.
Any suggestions or tips?
Thanks for your time and consideration.

Any sample of input/expected output?

I'm not quite sure exactly what you mean :D sorry
Well, the text file contains data that looks like this:

AUTHOR(S) YEAR TITLE JOURNAL/CONFERENCE
Accot;Zhai 2001 Scale effects in steering law tasks Proc. ACM CHI
Acredolo 1977 Developmental Changes in the Ability to Coordinate Perspectives of a Large-Scale Space Developmental Psychology
Aginsky;Harris;Rensink;Beusmans 1997 Two strategies for learning a route in a driving simulator Journal of Environmental Psychology

The purpose of the whole program is for a user to supply a text file and then search for papers by author or by journal, using the following suggested prompt:

Search an author (A = ***) or journal/conference (J = ***), where *** is any string [Q = quit]:

and then the output of the search will look like this:

Acredolo. (1977). Developmental Changes in Large-Scale Space. Developmental Psychology.

Allen & Ondracek. (1995). Age-sensitive cognitive abilities related to children's acquisition of spatial knowledge. Developmental Psychology.

Aginsky, Harris & Beusmans. (1997). Two strategies for learning a route. Journal of Environmental Psychology.

where multiple author names are separated by "&" or ","
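The output format shown above could be produced along these lines. This is a rough sketch, assuming Python 3; join_authors and format_paper are hypothetical names chosen for the example, and the authors are assumed to arrive as a list of already-split names.

```python
def join_authors(names):
    """Join names as "A", "A & B" or "A, B & C"."""
    if len(names) <= 1:
        return "".join(names)
    # commas between all but the last name, "&" before the last one
    return ", ".join(names[:-1]) + " & " + names[-1]

def format_paper(authors, year, title, journal):
    """Build one output line in the form: Authors. (Year). Title. Journal."""
    return "%s. (%s). %s. %s." % (join_authors(authors), year, title, journal)

line = format_paper(["Aginsky", "Harris", "Beusmans"], "1997",
                    "Two strategies for learning a route",
                    "Journal of Environmental Psychology")
print(line)
```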

Hope that answers your question :D
Thanks for replying!
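For the search prompt described in this post, one possible way to interpret what the user types is sketched below, assuming Python 3. The names parse_command and matches are invented for the example, and treating the search string as a case-insensitive substring is an assumption, since the assignment text only says that *** is any string.

```python
def parse_command(text):
    """Return (kind, term); kind is "A", "J", "Q", or None if not understood."""
    text = text.strip()
    if text.upper() == "Q":
        return ("Q", "")
    if "=" in text:
        # split only on the first "=" so the search term may contain "="
        kind, term = text.split("=", 1)
        kind = kind.strip().upper()
        if kind in ("A", "J"):
            return (kind, term.strip())
    return (None, "")

def matches(term, value):
    """Case-insensitive substring test (an assumed matching rule)."""
    return term.lower() in value.lower()

print(parse_command("A = Zhai"))
print(matches("psych", "Developmental Psychology"))
```

The main loop would then keep asking for input until parse_command returns "Q", and use matches against each author or journal name.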

While waiting for replies, I have been working on the program and this is what I have so far:

while True:
   inputfile = raw_input("Input filename: ")
   try:
      openfile = open(inputfile, "r")
      break
   except IOError:
      # open() raises IOError when the file is missing or unreadable
      print "Invalid file.  Please enter a correct file."

tempAuthorsList = []
journalsList = []
authorsList = []
papersList = []

for line in openfile:
   # strip the trailing newline so it does not end up in the journal name
   authors, year, title, journal = line.rstrip("\n").split("\t")
   papersList.append(year)
   journalsList.append(journal)
   tempAuthorsList.append(authors)

for line in tempAuthorsList:
   # the authors of one paper are separated by ";"
   names = line.split(";")
   authorsList.append(names)

def authorNameView(terms):
   # if any name already contains a comma, separate the names with "&" instead
   for term in terms:
      if "," in term:
         sepchar = " & "
         break
   else:
      sepchar = ", "

   n = len(terms)
   if n == 0:
      return ""
   elif n == 1:
      return terms[0]
   # the last name is always joined with "&", e.g. "A & B" or "A, B & C"
   return "%s & %s" % (sepchar.join(terms[:-1]), terms[-1])

openfile.close()

print "Search an author (A = ***) or journal/conference (J = ***), where *** is any string [Q = quit]:"

I was able to split the data on tabs, but I don't know how to append both the title and the year to papersList, since I got an error when I tried this:

papersList.append(year, title)

The error said that append() takes exactly one argument.

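Maybe you need one of these:

papersList.append([year, title])   # one entry per paper: a [year, title] list
papersList.append((year, title))   # the same, but stored as a tuple
papersList.extend([year, title])   # adds year and title as two separate entries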
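To see how these calls differ, here is a small sketch, assuming Python 3 and using throwaway list names. It also shows that append() returns None, which is why calls to it cannot be chained one after another.

```python
papers = []
papers.append(["2001", "Scale effects in steering law tasks"])
# append added ONE item, which is itself a two-element list
print(len(papers))

flat = []
flat.extend(["2001", "Scale effects in steering law tasks"])
# extend added TWO separate items
print(len(flat))

# append changes the list in place and returns None
returned = [].append("x")
print(returned)
```

For one record per paper, the append form with a list or tuple keeps the year and title together, while extend flattens them into the same list as every other paper's values.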

Thanks! That fixed the error!

Can you resend the input in code tags, or attach an input file via "Additional Options -> Attach files"?

a3-example-data.txt

a3-example-error1.txt

The first file is a correct file. The second one is incomplete, and I am also supposed to handle errors for incomplete files like it.

Here is a small error check for the basic format of the file:

def notfourrecords(a):
    # a valid record has 4 tab-separated fields, i.e. exactly 3 tabs
    return a.count('\t') != 3

notfound = True
while notfound:
   inputfile = raw_input("Input filename: ")
   try:
      openfile = open(inputfile, "r")
      # collect every line that does not have exactly four fields
      errors = [i for i in openfile if notfourrecords(i)]
      openfile.close()
      if errors:
         print "Incorrect record format in records: \n"
         for i in errors: print i
         raise ValueError
      # the file is valid: reopen it so it can be read from the start
      openfile = open(inputfile, "r")
      notfound = False
   except (IOError, ValueError):
      print "Invalid file.  Please enter a correct file."

Sample run with the incomplete file:

Input filename: a3-example-error1.txt
Incorrect record format in records: 

Acredolo	Developmental Changes in the Ability to Coordinate Perspectives of a Large-Scale Space	Developmental Psychology

Invalid file.  Please enter a correct file.
Input filename:
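If it helps to tell the user where the problem is, the same check can also report line numbers. The following is a sketch only, assuming Python 3; find_bad_records is a hypothetical helper name, and the sample list stands in for the lines of a file (an open file object could be passed instead, since iterating over it yields lines).

```python
def find_bad_records(lines):
    """Return (line number, line) pairs for lines without exactly four fields."""
    bad = []
    for number, line in enumerate(lines, 1):
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 4:
            bad.append((number, line.rstrip("\n")))
    return bad

good_line = "Accot;Zhai\t2001\tScale effects in steering law tasks\tProc. ACM CHI\n"
# the year column is missing here, so only three fields remain
bad_line = ("Acredolo\tDevelopmental Changes in the Ability to Coordinate "
            "Perspectives of a Large-Scale Space\tDevelopmental Psychology\n")
sample = [good_line, bad_line]

for number, line in find_bad_records(sample):
    print("line %d: %s" % (number, line))
```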

Have you still got this code?

Cahram.

Do you still have this code?
