
I am trying to let the user of this script choose which HTML tag to extract from a document, and then print all lines between the opening and closing tags.

For example, given:

<html>
<p>;lasdjf;lsdakjf</p>

If the user's raw input is <p>, I want to print ';lasdjf;lsdakjf'.

Right now it doesn't print anything, but it doesn't give me any errors either. Could someone please explain what I am doing wrong?

#!/usr/bin/python
# Filename: Meta_grab.py

import re, os, glob

l = 1

tags = {

"<a>" :	"</a>",
"<abbr>" :	"</abbr>",
"<acronym>" :	"</acronym>",
"<address>" :	"</address>",
"<applet>" :	"</applet>",
"<area>" :	"</area>",
"<b>" :	"</b>",
"<base>" :	"</base>",
"<bdo>" :	"</bdo>",
"<big>" :	"</big>",
"<blockquote>" :	"</blockquote>",
"<body>" :	"</body>",
"<br>" :	"</br>",
"<button>" :	"</button>",
"<caption>" :	"</caption>",
"<cite>" :	"</cite>",
"<code>" :	"</code>",
"<col>" :	"</col>",
"<colgroup>" :	"</colgroup>",
"<dd>" :	"</dd>",
"<del>" :	"</del>",
"<dfn>" :	"</dfn>",
"<div>" :	"</div>",
"<dl>" :	"</dl>",
"<DOCTYPE>" :	"</DOCTYPE>",
"<dt>" :	"</dt>",
"<em>" :	"</em>",
"<fieldset>" :	"</fieldset>",
"<form>" :	"</form>",
"<frame>" :	"</frame>",
"<frameset>" :	"</frameset>",
"<h1>" :	"</h1>",
"<h2>" :	"</h2>",
"<h3>" :	"</h3>",
"<h4>" :	"</h4>",
"<h5>" :	"</h5>",
"<h6>" :	"</h6>",
"<head>" :	"</head>",
"<hr>" :	"</hr>",
"<html>" :	"</html>",
"<i>" :	"</i>",
"<iframe>" :	"</iframe>",
"<img>" :	"</img>",
"<input>" :	"</input>",
"<ins>" :	"</ins>",
"<kbd>" :	"</kbd>",
"<label>" :	"</label>",
"<legend>" :	"</legend>",
"<li>" :	"</li>",
"<link>" :	"</link>",
"<map>" :	"</map>",
"<meta>" :	"</meta>",
"<noframes>" :	"</noframes>",
"<noscript>" :	"</noscript>",
"<object>" :	"</object>",
"<ol>" :	"</ol>",
"<optgroup>" :	"</optgroup>",
"<option>" :	"</option>",
"<p>" :	"</p>",
"<param>" :	"</param>",
"<pre>" :	"</pre>",
"<q>" :	"</q>",
"<samp>" :	"</samp>",
"<script>" :	"</script>",
"<select>" :	"</select>",
"<small>" :	"</small>",
"<tr>" :	"</tr>",
"<tt>" :	"</tt>",
"<ul>" :	"</ul>",
"<var>" :	"</var>",
"<accesskey>" :	"</accesskey>",
"<class>" :	"</class>",
"<dir>" :	"</dir>",
"<id>" :	"</id>",
"<lang>" :	"</lang>",
"<style>" :	"</style>",
"<tabindex>" :	"</tabindex>",
"<title>" :	"</title>",
"<onblur>" :	"</onblur>",
"<onchange>" :	"</onchange>",
"<onclick>" :	"</onclick>",
"<ondblclick>" :	"</ondblclick>",
"<onfocus>" :	"</onfocus>",
"<onkeydown>" :	"</onkeydown>",
"<onkeypress>" :	"</onkeypress>",
"<onkeyup>" :	"</onkeyup>",
"<onload>" :	"</onload>",
"<onmousedown>" :	"</onmousedown>",
"<onmousemove>" :	"</onmousemove>",
"<onmouseout>" :	"</onmouseout>",
"<onmouseover>" :	"</onmouseover>",
"<onmouseup>" :	"</onmouseup>",
"<onreset>" :	"</onreset>",
"<onselect>" :	"</onselect>",
"<onsubmit>" :	"</onsubmit>",
"<onunload>" :	"</onunload>",

}

while l == 1:

   get = raw_input("Type the html tag to get [Example: <p>] ")

   for file in glob.glob('*.txt'):

      docs = open(file, 'r')
      lines = docs.readlines()
      
      for lines in docs: 
         if lines.startswith('%s' % get) and endswith("%s" % tags['%s' % get]):
            print line
         else:
            pass
   
   docs.close()
Discussion span: 8 years. Last post by jlm699.

At the very least, your lines probably end with \n, so any endswith() check will fail unless you account for that.
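A quick check of that, using the example line from the question as it would come back from readlines() (trailing newline included):

```python
# A line as readlines() returns it: the trailing newline is kept.
line = "<p>;lasdjf;lsdakjf</p>\n"

# The last characters are "</p>\n", not "</p>", so this check fails.
ends_raw = line.endswith("</p>")

# Removing the trailing newline first makes the check succeed.
ends_stripped = line.rstrip("\n").endswith("</p>")

print(ends_raw)       # False
print(ends_stripped)  # True
```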


Try stripping leading and trailing whitespace: call lines.strip() before endswith() (and before startswith() too):

if lines.strip().startswith('%s' % get) and lines.strip().endswith("%s" % tags['%s' % get]):
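The same idea wrapped in a small function, as a sketch; `line_has_tags` is a hypothetical helper name, and it strips the line once instead of twice:

```python
def line_has_tags(line, open_tag, close_tag):
    """Return True if the line, ignoring surrounding whitespace,
    starts with open_tag and ends with close_tag."""
    stripped = line.strip()  # drop indentation and the trailing '\n'
    return stripped.startswith(open_tag) and stripped.endswith(close_tag)


print(line_has_tags("  <p>;lasdjf;lsdakjf</p>\n", "<p>", "</p>"))  # True
print(line_has_tags("<html>\n", "<p>", "</p>"))                     # False
```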

> ...what I am doing wrong.

   docs = open(file, 'r')
   lines = docs.readlines()

   for lines in docs:
      if lines.startswith('%s' % get) and endswith("%s" % tags['%s' % get]):
         print line
      else:
         pass

docs.close()

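You read the contents of docs into lines with readlines(), which reads the file all the way to the end, and then you try to iterate over docs itself (reusing the name lines as the loop variable). The file is already exhausted, so the loop body never runs: nothing is printed, and the bugs inside the loop (the bare endswith(...) call and print line, where line is never defined) never get a chance to raise an error.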
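A minimal way to see this for yourself, using io.StringIO as an in-memory stand-in for the opened .txt file (an assumption for the demo; it supports readlines() and iteration the same way a real file object does):

```python
import io

# In-memory stand-in for open(file, 'r').
docs = io.StringIO(u"<html>\n<p>;lasdjf;lsdakjf</p>\n")

lines = docs.readlines()   # consumes everything; docs is now at end of file
remaining = list(docs)     # iterating the file again yields nothing

print(len(lines))      # 2
print(len(remaining))  # 0
```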
The correct way:

docs = open(file, 'r')
lines = docs.readlines()   # reads the whole file; docs is now at its end

for line in lines:         # loop over the list you already read
    # Do whatever here
    pass
docs.close()
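A slightly more idiomatic variant of the same loop, sketched with a `with` block so the file is closed even if something in the loop raises, and iterating the file directly instead of calling readlines() first. The function name `matching_lines` and the strip-based check are illustrative, not part of the original answer:

```python
def matching_lines(path, open_tag, close_tag):
    """Return the stripped lines of `path` that start with open_tag and end with close_tag."""
    found = []
    with open(path) as docs:      # file is closed automatically when the block exits
        for line in docs:         # iterate the file once; no readlines() needed
            stripped = line.strip()
            if stripped.startswith(open_tag) and stripped.endswith(close_tag):
                found.append(stripped)
    return found
```

For example, `for line in matching_lines('page.txt', '<p>', '</p>'): print(line)`, where 'page.txt' is a placeholder file name.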

Also, get rid of that ridiculous dictionary at the beginning. What if somebody uses a tag that your dictionary doesn't contain?

Do something like this:

>>> usr_inp = raw_input('Enter an HTML tag (eg. <p>): ')
Enter an HTML tag (eg. <p>): <foobar>
>>> open_tag = usr_inp
>>> close_tag = usr_inp[:1] + '/' + usr_inp[1:]
>>> print open_tag,close_tag
<foobar> </foobar>
>>>
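Putting the pieces together, here is one possible sketch of what the original post asks for: derive the closing tag from the user's input with the same slicing trick, then return the text between each pair of tags, even when it spans several lines. It swaps in a regular expression (`re.findall` with `re.DOTALL`) instead of line-by-line startswith/endswith checks, and the function name `text_between` is hypothetical. It assumes plain tags without attributes (`<p class="x">` will not match) and no nesting of the same tag; for real-world HTML, a proper parser such as the standard library's html.parser is the safer choice.

```python
import re


def text_between(text, open_tag):
    """Return the contents of every open_tag ... close_tag pair in text.

    Assumes open_tag looks like '<p>' (no attributes) and that the same
    tag is not nested inside itself.
    """
    close_tag = open_tag[:1] + "/" + open_tag[1:]   # '<p>' -> '</p>'
    # Non-greedy (.*?) stops at the first closing tag; DOTALL lets '.' match '\n'.
    pattern = re.escape(open_tag) + r"(.*?)" + re.escape(close_tag)
    return re.findall(pattern, text, re.DOTALL)


doc = "<html>\n<p>;lasdjf;lsdakjf</p>\n<p>second\nparagraph</p>\n</html>\n"
for chunk in text_between(doc, "<p>"):
    print(chunk)
```

To run it over the question's *.txt files, read each file's whole contents with open(name).read() and pass that string in, rather than working one line at a time.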