Gribouillis 1,391 Programming Explorer Team Colleague
Your mission is to
write a program to
open any text file
    containing a tree like this one
and output
    a python list
    having the same structure as this tree
    for example
        this is a nested node
        but this one
            is more deeply nested
    You can add an option
        to accept different encodings
        for the file
    et voilà!

The output list must be the following

['',
 [u'Your mission is to'],
 [u'write a program to'],
 [u'open any text file', [u'containing a tree like this one']],
 [u'and output',
  [u'a python list'],
  [u'having the same structure as this tree'],
  [u'for example',
   [u'this is a nested node'],
   [u'but this one', [u'is more deeply nested']]],
  [u'You can add an option',
   [u'to accept different encodings'],
   [u'for the file']],
  [u'et voil\xe0!']]]

Now you wrote your first text parser.
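
For readers who want a starting point, here is a minimal sketch of one possible approach, not the only solution. It assumes Python 3, indentation made of spaces only, and text that has already been decoded (for example by opening the file with the desired encoding). The name parse_tree is made up for this example.

```python
def parse_tree(text):
    """Turn indented text into nested lists of the form [label, child, child, ...]."""
    root = ['']
    # Stack of (indent, node) pairs describing the currently open branch.
    stack = [(-1, root)]
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        indent = len(line) - len(line.lstrip(' '))
        node = [line.strip()]
        # Close every node which is not an ancestor of this line.
        while stack[-1][0] >= indent:
            stack.pop()
        stack[-1][1].append(node)
        stack.append((indent, node))
    return root

sample = "open any text file\n    containing a tree\nand output\n    a python list\n        nested"
print(parse_tree(sample))
```

The stack keeps only the ancestors of the current line, so each line is attached to the nearest previous line with a smaller indentation.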

Gribouillis 1,391 Programming Explorer Team Colleague

A known solution is

import codecs
file = codecs.open(path, encoding='iso8859-1')

see if it works for you.
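
A small self-contained check of the same idea, as a sketch only: it assumes Python 3 and uses io.open (which takes the same encoding argument) on a temporary file created just for the demonstration. The file name sample.txt is arbitrary.

```python
import io
import os
import tempfile

# Write a few bytes encoded in iso8859-1 to a temporary file.
path = os.path.join(tempfile.mkdtemp(), "sample.txt")
with open(path, "wb") as ofh:
    ofh.write(u"et voil\xe0!".encode("iso8859-1"))

# Read them back as text by naming the encoding explicitly.
with io.open(path, encoding="iso8859-1") as ifh:
    text = ifh.read()

print(repr(text))
```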

Gribouillis 1,391 Programming Explorer Team Colleague

This python 2 code snippet uses unicode chess symbols and box-drawing symbols to display a chessboard in a terminal. With a terminal profile having a monospace font of size 20 or more, it is quite usable for playing chess.

maxvanden commented: good +0
Gribouillis 1,391 Programming Explorer Team Colleague

Yes, your statements are correct. Here is how you could implement the cpp static field behavior in python using an accessor function instead of an attribute

from __future__ import print_function
from functools import partial

# preliminary definitions

_default = object()

def _helper_static_cpp(container, value = _default):
    if value is _default:
        return container[0]
    else:
        container[0] = value

def static_cpp(initializer = None):
    return partial(_helper_static_cpp, [initializer])

# class example imitating a cpp static field with an accessor function

class A(object):
    istat = static_cpp(111)

if __name__ == '__main__':
    a = A()
    print(A.istat(), a.istat())
    A.istat(112)
    print(A.istat(), a.istat())
    a.istat(113)
    print(A.istat(), a.istat())
    b = A()
    b.istat(114)
    print(A.istat(), a.istat(), b.istat())

""" my output -->
111 111
112 112
113 113
114 114 114
"""
HiHe commented: nice help +5
Gribouillis 1,391 Programming Explorer Team Colleague

Probably

for row in c:
    print (row)
Gribouillis 1,391 Programming Explorer Team Colleague

This code snippet provides methods to convert between various ieee754 floating point number formats, for example double precision to single precision. The format is given by a pair (w, p) giving the size in bits of the exponent part and the significand part in the ieee754 representation of a number (these are (11, 52) for a standard 8 byte float and (8, 23) for a single precision float, but the snippet allows exotic sizes like (3, 4) or (9, 47)). Conversion to and from python floats is provided, given that your architecture's python floats have 8 bytes. Python >= 2.7 is required.
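
To make the (w, p) description concrete, here is a small sketch which is not the snippet itself: it only uses the standard struct module to split an 8 byte float into its (11, 52) fields and to round a float to single precision. It assumes Python 3, and the function names double_fields and to_single are made up for this example.

```python
import struct

def double_fields(x):
    """Return (sign, biased exponent, significand bits) of an 8 byte float."""
    (bits,) = struct.unpack('>Q', struct.pack('>d', x))
    sign = bits >> 63
    exponent = (bits >> 52) & 0x7FF        # w = 11 bits
    significand = bits & ((1 << 52) - 1)   # p = 52 bits
    return sign, exponent, significand

def to_single(x):
    """Round a python float to the nearest single precision value."""
    return struct.unpack('>f', struct.pack('>f', x))[0]

print(double_fields(1.0))
print(double_fields(-2.5))
print(to_single(0.1))
```

Values too large for single precision make struct.pack raise OverflowError, so this shortcut does not cover everything a general converter has to handle.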

Gribouillis 1,391 Programming Explorer Team Colleague

I have many single floats, but how can I make them into a list of floats?

It's fairly easy. At the top of your program, you add thelist = list(). Then when you want to add a score to the list, you write thelist.append(score). Finally, add print(thelist) at the end.
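
Put together, it could look like the following sketch. The three scores are made-up values standing in for whatever your program computes.

```python
thelist = list()                       # created once, at the top of the program

for raw in ("12.5", "18.0", "7.25"):   # made-up scores, for illustration only
    score = float(raw)
    thelist.append(score)              # add each single float to the list

print(thelist)
```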

vegaseat commented: should solve it +14
HiHe commented: helpful +4
Gribouillis 1,391 Programming Explorer Team Colleague

I wrote a nice class to convert between various ieee754 formats

#!/usr/bin/env python
# -*-coding: utf8-*-
# Title: anyfloat.py
# Author: Gribouillis for the python forum at www.daniweb.com
# Created: 2012-05-02 06:46:42.708131 (isoformat date)
# License: Public Domain
# Use this code freely.

from collections import namedtuple
from math import isnan
import struct
import sys

if sys.version_info < (2, 7):
    raise ImportError("Module anyfloat requires python 2.7 or newer.")


class anyfloat(namedtuple("anyfloat", "sign log2 mantissa")):
    """A class storing real numbers independently from the ieee754 format.

    This class stores a real number as a triple of integers (sign, log2, mantissa)
    where sign is always 0 or 1 and:
        a) mantissa == -2 is used to represent NaN values. In this case, sign == 0 and log2 == 0.
           There is only one NaN value in this representation.
        b) mantissa == -1 is used to represent +Infinity if sign == 0 and -Infinity if sign == 1.
           In this case, log2 == 0
        c) mantissa == 0 is used to represent 0.0 if sign == 0 and -0.0 if sign == 1. In this
           case, log2 == 0
        d) mantissa > 0 is used to represent any other real number with a finite number of binary
           digits. The real number x corresponding to the anyfloat instance is mathematically
                x = +/- pow(2, log2) * y
           where y is the number in [1, 2[ whose binary digits are the binary digits of the mantissa.
           For example the real number corresponding to anyfloat(1, 5, 39) …
Gribouillis 1,391 Programming Explorer Team Colleague

Take care that you have strings, not integer values, so '9' would be maximum at column 1 and '10000' would be minimum at column 2.

Oh yes it is true. Here is the corrected version

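import itertools as itt
from operator import itemgetter

def extremes(records):
    # records must already be grouped by key (column 0);
    # columns 1 and 2 are compared as integers, not as strings
    for key, group in itt.groupby(records, itemgetter(0)):
        M, m = (int(x) for x in next(group)[1:3])
        for record in group:
            M, m = max(M, int(record[1])), min(m, int(record[2]))
        yield (key, str(M), str(m))

print list(extremes(valuearray))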
weblover commented: thank you +2
Gribouillis 1,391 Programming Explorer Team Colleague

This is a typical use case for the __new__() method:

class BalancedTernary(long):
    def __new__(cls, n):
        instance = long.__new__(cls,
            balanced_ternary_value(n) if isinstance(n, str) else n)
        return instance

    def __repr__(self):
        return make_balanced(self)

    __str__ = __repr__

Your code seems to work now.

TrustyTony commented: Thanks for __new__ teaching! +12
Gribouillis 1,391 Programming Explorer Team Colleague

You are making uncontrolled use of recursion: Get_num() calls Get_num(), etc. Each function must have a precise task to do, which can be described in a short documentation string. Look at the following pseudo code for example

# THIS IS PSEUDO CODE, NOT REAL CODE

def main_job():
    """Repeatedly ask a number, compute and display its factorial"""
    indefinitely:   # in python this is 'while True:'
        handle_one_number()
        if not want_another_one():
            display quit message
            return

def handle_one_number():
    """get a number from user, compute and display its factorial"""
    num = get_number()
    f = compute_factorial(num)
    display_result(num, f)

def get_number():
    """get a non-negative integer from user"""
    indefinitely:
        answer = raw_input("please enter a number")
        try:
            num = int(answer)
        except ValueError:
            display error message
        else:
            if 0 > num:
                display error message
                continue
            else:
                return num

Also notice that function names are traditionally not capitalized in python (it's not compulsory, however; read PEP 8 for a coding style guide)
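
As an illustration, here is how two of the functions from the pseudo code could look as real code. This is only a sketch assuming Python 3 (where raw_input() became input()). The read and show parameters are additions of this example, so that the function can be tried with scripted answers instead of a real user.

```python
def get_number(read=input, show=print):
    """Get a non-negative integer from the user."""
    while True:
        answer = read("please enter a number: ")
        try:
            num = int(answer)
        except ValueError:
            show("this is not an integer")
            continue
        if num < 0:
            show("the number must not be negative")
            continue
        return num

def compute_factorial(num):
    """Return the factorial of num, computed with a plain loop."""
    result = 1
    for k in range(2, num + 1):
        result *= k
    return result

# Scripted answers stand in for a real user here.
answers = iter(["abc", "-3", "5"])
num = get_number(read=lambda prompt: next(answers), show=lambda msg: None)
print(num, compute_factorial(num))
```

Notice that no function calls itself: the loop in get_number() does the retrying.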

Gribouillis 1,391 Programming Explorer Team Colleague

You can use sub() with a function as the replacement argument

import re
from functools import partial

repl_dict = {'cat': 'Garfield', 'dog': 'Oddie' }

def helper(dic, match):
    word = match.group(0)
    return dic.get(word, word)

word_re = re.compile(r'\b[a-zA-Z]+\b')
text = "dog ate the catfood and went to cat's bed to see dog dreams on caterpillars"

print word_re.sub(partial(helper, repl_dict), text)

""" my output -->
Oddie ate the catfood and went to Garfield's bed to see Oddie dreams on caterpillars
"""
TrustyTony commented: Elegant partial+function +12
Gribouillis 1,391 Programming Explorer Team Colleague

This new code generates the index at 13 MB/s with a few assumptions. It should handle the 4GB in a little more than 5 minutes. It uses this code snippet http://www.daniweb.com/software-development/python/code/418239/1783422#post1783422

#!/usr/bin/env python
# -*-coding: utf8-*-
# Title: dups2.py
# Author: Gribouillis

"""Generate the index file with regexes and chunked input and output

    This code does not parse xml, but it assumes that:
    * records are delimited by <tag> and </tag> items, and that these items
        are only used with this meaning in the file.
    * within <tag> and </tag> sections, record's identity is delimited by <id> and </id>
        tags containing an integer value, and these items are only used with this meaning in the file.
    
    The code contains a few assert statements to check these assumptions.
"""

import re
from writechunks import MB, ChunkedOutputFile

class State:
    BASE = 0
    TAG = 1
    ID = 2
    TAGEND = 3

expected_state = {
    '<tag>': State.BASE,
    '<id>': State.TAG,
    '</id>': State.ID,
    '</tag>': State.TAGEND,
}

def next_state(state):
    return (state + 1) % 4

tag = re.compile("</?(?:tag|id)>")

def main2(input_filename, input_chunk, ofh):
    with open(input_filename) as ifh:
        state = State.BASE
        offset = 0
        last_end = 0
        id_saved = ''
        tail = ''
        while True:
            s = ifh.read(input_chunk)
            if s:
                if tail:
                    s = tail + s
            else:
                ofh.write("%d\teof\n" % (offset + len(tail)))
                return
            size = len(s)
            for match in tag.finditer(s):
                t = match.group(0)
                assert expected_state[t] == state
                last_end = match.end()
                if state == State.TAG:
                    begin_id = last_end
                elif state == State.ID:
                    id = id_saved …
Gribouillis 1,391 Programming Explorer Team Colleague

When you call character.useItem(item) you want the item to update the character's statistics using an item-specific dynamic rule. One way to do this is to define an action function for the item instead of a dictionary of statistics

class Character(object):
    def __init__(self, name):
        self.name = name
    def useItem(self, item):
        item.action(item, self)
        
class Item(object):
    def __init__(self, name, action):
        self.name = name
        self.action = action

def action_hello(item, character):
    # implement whatever action the item should do here
    print("hello %s!" % character.name)

item_hello = Item("hello", action_hello)
charles = Character("Charles")

charles.useItem(item_hello)

"""my output -->
hello Charles!
"""

Other ways would be to define specific item classes with an action() method.
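
A sketch of that second way, assuming Python 3. The health statistic and the Potion class are hypothetical, they are only there to have something to update.

```python
class Character(object):
    def __init__(self, name, health=10):
        self.name = name
        self.health = health   # hypothetical statistic, for illustration

    def useItem(self, item):
        item.action(self)

class Item(object):
    def __init__(self, name):
        self.name = name

    def action(self, character):
        # each specific item class implements its own rule
        raise NotImplementedError

class Potion(Item):
    def __init__(self, name, amount):
        Item.__init__(self, name)
        self.amount = amount

    def action(self, character):
        character.health += self.amount

charles = Character("Charles")
charles.useItem(Potion("small potion", 5))
print(charles.health)
```

The function based design is lighter when there are many one-off items, while subclasses are convenient when items need their own data, like amount here.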

TrustyTony commented: Unfortunately now no points, likely. Bug is eating them ;) +12
Gribouillis 1,391 Programming Explorer Team Colleague

When working with large data files, it may be desirable to output a sequence of bytes by large chunks. This snippet defines a file adapter class to handle this transparently. Writing bytes to a ChunkedOutputFile will automatically write to the underlying file object in fixed length chunks.

TrustyTony commented: looks useful +13
Gribouillis 1,391 Programming Explorer Team Colleague

Hm ''.join(the_list).split()

Gribouillis 1,391 Programming Explorer Team Colleague

It is nice, but I think echo() should take a single argument and return a single argument like this

def echo(arg):
    print(arg)
    return arg

Having a unary function allows use with imap() for example

from itertools import imap
s = sum(imap(echo, (x * x for x in range(5))))
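
For what it's worth, itertools.imap() does not exist in Python 3, where the built-in map() is already lazy. Assuming Python 3, the same idea can be sketched as:

```python
def echo(arg):
    print(arg)
    return arg

# map() is lazy in Python 3, so each square is printed as it is summed
s = sum(map(echo, (x * x for x in range(5))))
print(s)
```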
Gribouillis 1,391 Programming Explorer Team Colleague

I think there are a lot of misconceptions about classes in your approach.

First, a more accurate name for class Person would be MinimalDataThatThisProgramNeedsToStoreAboutAPerson. It means that the purpose of a class Person is not to mimic a person from the real world but to provide some space in memory to handle some data relative to this person. For example this program stores the person's name but not the person's parents' names or date of birth because it does not need them. The argument that 'it is more natural' to do things one way or another should not be pushed too far. The important question is 'what does this program need'.

Second, classes are a way of factorizing data, because all the instances can access the class members. It means that data common to all people are good candidates to be stored in the class object.

Third, storing instances in a container is common, but it is very different from storing the number of instances. The reason is that stored instances don't disappear unless they are removed from the container or the container is destroyed. Instances often have a short life. For example an instance may be created as a local variable in a function and it disappears when the function returns. Such temporary instances should not be stored. If your program only needs the number of instances created, don't store instances because you think that 'it is more natural'.
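
To illustrate the second and third points, here is a minimal sketch (assuming Python 3) where the class object stores a counter shared by all instances, and no instance is kept anywhere. The function make_temporary is made up for the example.

```python
class Person(object):
    count = 0   # stored once in the class object, shared by all instances

    def __init__(self, name):
        self.name = name
        Person.count += 1   # count creations without keeping the instances

def make_temporary():
    Person("temporary")   # this instance disappears when the function returns

Person("Alice")
make_temporary()
print(Person.count)
```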

For your first question you can check the …

HiHe commented: very helpful +5
Gribouillis 1,391 Programming Explorer Team Colleague

You should be able to install xlwt with the pip installer. For this type

sudo pip install xlwt

in a terminal (with a working internet connection).

If you don't have pip, then you must install pip first. For this type

sudo easy_install pip

If you don't have easy_install, then you must install setuptools first. (I'm assuming you're using python 2, for python 3 you can get easy_install by installing distribute)

ozzyx123 commented: thank you so much +0
Gribouillis 1,391 Programming Explorer Team Colleague

Hm, try this

outstr = "\n".join("\t".join(str(i*j) for j in range(1, 13)) for i in range(1, 13))
print outstr
vegaseat commented: wow +15
Gribouillis 1,391 Programming Explorer Team Colleague

I'll try researching the terms used before continuing this conversation. Thanks again!

Here is a picture. The line can be anywhere in the plane.

Gribouillis 1,391 Programming Explorer Team Colleague

When at least one of c and d is not zero, the equation has an infinite number of solutions. In fact the solutions of x * c + y * d = f are

x = (f * c - z * d)/(c**2 + d**2)
y = (f * d + z * c)/(c**2 + d**2)

Here z can be any real number.
When c = d = 0, the equation has no solution if f is not zero, otherwise every pair (x, y) is a solution.
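
A quick numerical check of these formulas, as a sketch in Python 3. The values of c, d, f and z are arbitrary and the function name solution is made up.

```python
def solution(c, d, f, z):
    """Return the solution (x, y) of x*c + y*d = f attached to the parameter z."""
    n = c ** 2 + d ** 2   # not zero because c and d are not both zero
    return (f * c - z * d) / n, (f * d + z * c) / n

c, d, f = 3.0, 4.0, 10.0   # arbitrary test values
for z in (-2.0, 0.0, 7.5):
    x, y = solution(c, d, f, z)
    print(z, x, y, x * c + y * d)   # the last column should be close to f
```

The terms containing z cancel in x * c + y * d, which is why every value of z gives a solution.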

Gribouillis 1,391 Programming Explorer Team Colleague

This code is not working on my version of Python (3.2). It just restarts the shell and does not do anything else.

If you are using this in Idle, it won't work because the python process running in the shell is different from Idle gui's process. This will only restart the process running in the shell, not Idle itself.

Gribouillis 1,391 Programming Explorer Team Colleague

Replace line 18 with lpre.append(subject.name) . You can also replace line 19 with return ", ".join(lpre) .

Gribouillis 1,391 Programming Explorer Team Colleague

It's a design issue. Here is my son's school report

FRANCAIS        Mrs xxx     17,0 , 13,0 , 16,0 , 19,0 , 10,5 , 18,0
LATIN           Mrs xxx     5,0
ANGLAIS LV1     Miss xxx    12,5 , 18,0
ESPAGNOL LV1    Mrs xxx     12,5 , 8,0 , 12,0
HIST/GEO/ED.CIV Miss xxx    15,0 , 13,5
MATHEMATIQUES   Mrs xxx     16,0 , 17,5 , 20,0 , 17,5
PHYSIQUE        Miss xxx    10,0 , 13,5
SC VIE ET TERRE Mrs xxx     18,5 , 15,0

As you can see there is a list of subjects and for each subject, a list of marks. These lists of marks belong to both my son and the subject. So I suggest the following class design:

class Student:
    # members
    name
    report

class Subject:
    # members
    name
    teacher_name

"report" can be a list of pairs (subject, list of marks), or a dictionary subject --> list of marks.

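A minimal sketch of this design with the dictionary variant, assuming Python 3. The add_mark() method and the placeholder names are additions of this example, not part of the design above.

```python
class Subject(object):
    def __init__(self, name, teacher_name):
        self.name = name
        self.teacher_name = teacher_name

class Student(object):
    def __init__(self, name):
        self.name = name
        self.report = {}   # dictionary subject --> list of marks

    def add_mark(self, subject, mark):
        # hypothetical helper: create the list of marks on first use
        self.report.setdefault(subject, []).append(mark)

latin = Subject("LATIN", "Mrs xxx")
maths = Subject("MATHEMATIQUES", "Mrs xxx")
son = Student("my son")   # placeholder name
for mark in (16.0, 17.5, 20.0, 17.5):
    son.add_mark(maths, mark)
son.add_mark(latin, 5.0)

for subject, marks in son.report.items():
    print(subject.name, subject.teacher_name, marks)
```

Subject instances are used directly as dictionary keys here, which works because instances are hashable by identity by default.
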
Gribouillis 1,391 Programming Explorer Team Colleague

If any letter in "x" names a global variable, it will use that global variable, else it will just leave the letter unchanged.

Here it is

def gv(s):
    return globals().get(s, s)
Gribouillis 1,391 Programming Explorer Team Colleague
def gv(s):
    return globals()[s]

print gv(x[0])
Gribouillis 1,391 Programming Explorer Team Colleague

Thanks for pointing that out. I still get that same error though.:(
items is still highlighted.
I'm still learning python and know absolutely nothing about debugging. I bet you can help.

Did you save the code to a python file (.py) and run the file with idle? Then what is the complete traceback? What you can do is zip the file and attach the zipped version to a post.

Gribouillis 1,391 Programming Explorer Team Colleague

Use f = file.read() and f = f.split() .

mr_noname commented: Very Helpful! Thanks. +0
Gribouillis 1,391 Programming Explorer Team Colleague

Perhaps use this snippet http://www.daniweb.com/software-development/python/code/257449 instead of os.system()

FALL3N commented: answered my question exactly the way I was looking for it to be answered +3
Gribouillis 1,391 Programming Explorer Team Colleague

It looks easy, replace all the v1, v2, v3 .. variables with arrays v[0], v[1], v[2], something like

import tkinter as tk
from functools import partial

def klik(n):
    button[n].config(image=s[n])

root = tk.Tk()
 
frame1 = tk.Frame(root)
frame1.pack(side=tk.TOP, fill=tk.X)

karirano = tk.PhotoImage(file="kari.GIF")
s = list(tk.PhotoImage(file="%d.GIF" % (i+1)) for i in range(4))
button = list()
for i in range(4):
    button.append(tk.Button(frame1, image=karirano, command=partial(klik, i)))
    button[-1].grid(row=0,column=i)

root.mainloop()
Gribouillis 1,391 Programming Explorer Team Colleague

I would say that the time complexity of the first function is O((j-i) Log(j-i)). The rough argument is this: it is obvious that the number of 'print' only depends on j-i. Call f(z) the number of items printed when prel(x, i, j) is called with an interval length z = j-i. There are z explicit prints in the function, and the function is recursively called 3 times with an interval of length about z/3. It means that f(z) = z + 3 f(z/3) approximately. The mathematical function which satisfies exactly this relation is f(z) = z log(z)/log(3). So this must be roughly the number of prints.
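
The recurrence can be checked numerically. The sketch below (Python 3) does not use the original prel() function, it only evaluates f(z) = z + 3 f(z/3) on powers of 3, assuming f(1) = 0 as a starting value, and compares it with z log(z)/log(3).

```python
from math import log

def f(z):
    """Value given by the recurrence f(z) = z + 3 f(z/3), assuming f(1) = 0."""
    if z <= 1:
        return 0
    return z + 3 * f(z // 3)

for k in range(1, 8):
    z = 3 ** k
    print(z, f(z), z * log(z) / log(3))
```

For z = 3 ** k both columns are equal to k * 3 ** k, up to rounding in the floating point column.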

Gribouillis 1,391 Programming Explorer Team Colleague

Here is a possible solution without regexes

def add_dash(str1, str2):
    src = iter(str2)
    return "".join('-' if x == '-' else next(src) for x in str1)

s1 = "75465-54224-4"
s2 = "245366556346"

print add_dash(s1, s2)

"""my output -->
24536-65563-4
"""

Notice that the 6 at the end of s2 was lost because s1 has 11 digits and s2 has 12 digits, which contradicts your statement that all numbers have the same length.

Gribouillis 1,391 Programming Explorer Team Colleague

Did you try

next(os.walk(directory))

In the doc, it seems that os.walk only works with strings (unicode), so that you shouldn't have encoding/decoding issues.

Gribouillis 1,391 Programming Explorer Team Colleague

Thank you Sir

When you have a useful piece of code and no issue, write a code snippet instead of a regular thread. You can choose this in the 'title' section when you start a new thread.

Gribouillis 1,391 Programming Explorer Team Colleague

These two functions compute the orders of the lowest bit and the highest bit set in the binary representation of an integer. I expect them to handle very large integer values safely.
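
The snippet itself is not reproduced here. As an illustration only, one way to compute these two orders relies on int.bit_length() (Python >= 2.7 or >= 3.1). The function names below are made up and n is assumed to be positive.

```python
def lowest_bit(n):
    """Order of the lowest bit set in n (n > 0)."""
    return (n & -n).bit_length() - 1   # n & -n isolates the lowest set bit

def highest_bit(n):
    """Order of the highest bit set in n (n > 0)."""
    return n.bit_length() - 1

n = (1 << 1000) + (1 << 37)   # a very large integer
print(lowest_bit(n), highest_bit(n))
```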

Gribouillis 1,391 Programming Explorer Team Colleague

This snippet defines a decorator @mixedmethod similar to @classmethod, which allows the method to access the calling instance when it exists. The decorated functions have two implicit arguments self, cls, the former having a None value when there is no instance in the call.
Using python's descriptor protocol, the implementation of @mixedmethod needs no more than 5 lines of code!
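
For readers who don't know the descriptor protocol, here is a sketch of how such a decorator can be written. It is a reconstruction from the description above, not necessarily the code of the snippet, and the Example class is made up.

```python
from functools import partial

class mixedmethod(object):
    def __init__(self, func):
        self.func = func

    def __get__(self, instance, cls):
        # instance is None when the method is looked up on the class
        return partial(self.func, instance, cls)

class Example(object):
    @mixedmethod
    def who(self, cls):
        return (self, cls)

e = Example()
print(Example.who())   # self is None here
print(e.who())         # self is the instance e here
```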

Gribouillis 1,391 Programming Explorer Team Colleague

You must not insert or remove items in a list while iterating on this list. In your case, you could simply use

for x in a:
    print x
a[:] = ()

If you want to delete only some items, use a pattern like

a = range(10)
print a

def keep_me(item):
   print item
   return item % 2

a[:] = (x for x in a if keep_me(x))
print a
Gribouillis 1,391 Programming Explorer Team Colleague

It's a really strange idea to mix wxpython and tkinter in the same program. You should probably use a wx.FileDialog. See the FileDialog example in this page http://wiki.wxpython.org/Getting%20Started#Dialogs

Gribouillis 1,391 Programming Explorer Team Colleague

The result should include sublists of length k which do not contain first element and lists of first element followed by all sublists from other elements of length k - 1, when length of argument is more than k.

I don't agree with Tony, I think it's better to find all the lists with k-1 elements in the n-1 first items, and for each of these lists, add all the possible last elements.
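
Here is a sketch of what I mean, in Python 3. It works on index tuples so that "all the possible last elements" are simply the indexes coming after the last one already chosen. The names sublists and helper are made up for the example.

```python
def sublists(items, k):
    """Return all the sublists of k elements of items, keeping the order."""
    def helper(n, k):
        # yield increasing tuples of k indexes taken in range(n)
        if k == 0:
            yield ()
            return
        # lists with k-1 elements in the n-1 first items...
        for prefix in helper(n - 1, k - 1):
            start = prefix[-1] + 1 if prefix else 0
            # ...each one completed by every possible last element
            for last in range(start, n):
                yield prefix + (last,)
    return [[items[i] for i in idx] for idx in helper(len(items), k)]

print(sublists("abcd", 2))
```

In real code, itertools.combinations() does the same job.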

Gribouillis 1,391 Programming Explorer Team Colleague

ok so now i have:

def is_prime(n):
    if n < 3:
       return False
    i = 2
    while n > i:
       if n % i != 0:
          i += 1
       else:
          return False
    return True

it seems it works fine for me, now :), maybe you could explain why it's inefficient?

First, 2 is a prime number.
It's inefficient because you are testing too many divisors. It would suffice to test all numbers up to n ** 0.5 instead of n. There are other techniques to test fewer numbers. Efficient primality tests are a jungle, see wikipedia to start with. Also there are very fast probabilistic primality tests, which establish primality with a high probability but not with certainty.
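
As a sketch of the first improvement (trial division up to the square root, skipping even divisors), assuming Python 3:

```python
def is_prime(n):
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2        # 2 is the only even prime
    i = 3
    while i * i <= n:        # same as i <= n ** 0.5, without floats
        if n % i == 0:
            return False
        i += 2               # only odd divisors remain to be tested
    return True

print([n for n in range(30) if is_prime(n)])
```

This is still plain trial division, only with far fewer divisors. It is not one of the fast probabilistic tests.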

Gribouillis 1,391 Programming Explorer Team Colleague

It should be

def is_prime(n):
    if n < 2:
        return False  # 0 and 1 are not prime
    i = 2
    while n > i:
        if n % i != 0:
            i += 1
        else:
            return False
    return True

but this code is inefficient. Search this forum for similar functions.

Gribouillis 1,391 Programming Explorer Team Colleague

It's very simple. First, python finds modules installed in some site-packages directories, for example /usr/lib/python2.7/site-packages, or /usr/lib64/python2.7/site-packages if you have a 64-bit python, or your per-user site-packages directory ~/.local/lib/python2.7/site-packages (you can create it yourself if it does not exist). If you want python to find modules in other directories, for example ~/foo/bar and ~/baz/taz, you can add the following line to your ~/.bashrc

export PYTHONPATH="$HOME/foo/bar:$HOME/baz/taz:$HOME/.local/lib/python2.7/site-packages:/usr/local/lib/python2.7/site-packages"

You can add as many directories as you need. Then restart a bash shell and import module sc if sc.py is in one of these directories.

@valorien: also read this as an alternative to the shebang line: http://www.daniweb.com/software-development/python/code/241988

valorien commented: very informative +1
Gribouillis 1,391 Programming Explorer Team Colleague

Wel...if you ignore all of that...what do you think about the gzip error. I don't understand it at all...

The gzip error may happen because your site sometimes sends gzipped data and sometimes uncompressed data. I suggest a function which recognizes compressed data

from urllib2 import urlopen
from gzip import GzipFile
from cStringIO import StringIO

def download(url):
    s = urlopen(url).read()
    if s[:2] == '\x1f\x8b': # assume it's gzipped data
        with GzipFile(mode='rb', fileobj=StringIO(s)) as ifh:
            s = ifh.read()
    return s

s = download('http://www.locationary.com/place/en/US/Virginia/Richmond-page28/?ACTION_TOKEN=NumericAction')
print s
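
The same detection can be tried without any network access. The sketch below assumes Python 3, where the downloaded data are bytes and gzip.decompress() is available. The name maybe_gunzip is made up.

```python
import gzip

def maybe_gunzip(data):
    """Return data uncompressed if it starts with the gzip magic bytes."""
    if data[:2] == b'\x1f\x8b':   # assume it's gzipped data
        return gzip.decompress(data)
    return data

plain = b"<html>hello</html>"
print(maybe_gunzip(gzip.compress(plain)) == plain)   # compressed input
print(maybe_gunzip(plain) == plain)                  # uncompressed input
```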
jacob501 commented: Very helpful!! +1
Gribouillis 1,391 Programming Explorer Team Colleague

Oh. Okay. I ran it a few times to check and it worked! Thanks! Now I know what a BOM is too!

You can also uncompress it without using a temporary file like this

from urllib2 import urlopen
from gzip import GzipFile
from cStringIO import StringIO
fobj = urlopen('http://www.locationary.com/place/en/US/North_Carolina/Raleigh/Noodles_%26_Company-p1022884996.jsp')
fobj = StringIO(fobj.read())
ifh = GzipFile(mode='rb', fileobj=fobj)
data = ifh.read()
Gribouillis 1,391 Programming Explorer Team Colleague

Sorry. I'm kind of new to all this prgramming stuff. What is a BOM and how will it help?

A BOM (byte order mark) is a short sequence of bytes at the very start of a file, used to detect its encoding (see wikipedia). In our case, the first two bytes are \x1f\x8b, and google tells me that this marks files compressed with gzip (strictly speaking this is the gzip magic number rather than a BOM). Indeed my linux system detects a compressed file and it is able to uncompress it with gunzip. Python can do this too with module gzip. Here we go:

>>> from urllib import urlretrieve
>>> urlretrieve('http://www.locationary.com/place/en/US/North_Carolina/Raleigh/Noodles_%26_Company-p1022884996.jsp', 'myfile')
>>> import gzip
>>> data = gzip.open('myfile', 'rb').read()

!!!

Gribouillis 1,391 Programming Explorer Team Colleague

Oh well...thats what my code looks like already. Daniweb just changed it a little...putting it on one line doesn't change anything for me...I still get the weird result ("&lsaquo (DOT))

Or do you mean that that link worked for you and you got the HTML from it?

I mean did you replace the %26 in the url by & ?

Gribouillis 1,391 Programming Explorer Team Colleague

I get a better result with

page = urllib2.urlopen('http://www.locationary.com/place/en/US/North_Carolina/Raleigh/Noodles_&_Company-p1022884996.jsp').read()

(I replaced %26 with &)

Gribouillis 1,391 Programming Explorer Team Colleague

Or even

with open('Blueprints.txt','a') as file:
    file.write(
        "world.setBlockWithNotify(i {X}, j {Y}, k {Z}, Block.{B}.BlockID);"
        .format(X=X, Y=Y, Z=Z, B=B))
Gribouillis 1,391 Programming Explorer Team Colleague

Add print statements to see what it does

neededwordlist= ['a','p','p','l','e']
rawlist = ['a','l','p','p','e']

need, raw = list(neededwordlist), list(rawlist) # make copies
for i in range(min(len(need), len(raw))):
    if need[i] == raw[i]:
        need[i] = raw[i] = '!'
    print "i = ", i
    print need
    print raw
wrongposition = set(need) & set(raw)
print wrongposition
wrongposition.discard('!')
print wrongposition

""" my output -->
i =  0
['!', 'p', 'p', 'l', 'e']
['!', 'l', 'p', 'p', 'e']
i =  1
['!', 'p', 'p', 'l', 'e']
['!', 'l', 'p', 'p', 'e']
i =  2
['!', 'p', '!', 'l', 'e']
['!', 'l', '!', 'p', 'e']
i =  3
['!', 'p', '!', 'l', 'e']
['!', 'l', '!', 'p', 'e']
i =  4
['!', 'p', '!', 'l', '!']
['!', 'l', '!', 'p', '!']
set(['!', 'p', 'l'])
set(['p', 'l'])
"""

There are other ways to do it, for python gurus, like

A, B = (set(x) for x in zip(*((a,b) for (a,b) in zip(neededwordlist, rawlist) if a != b)))
wrongposition = A & B