poeticinsanity 2 Light Poster

I am trying to encode text extracted with a module called Beautiful Soup, and I need some direction on solving the problem. The characters map to <undefined>, meaning they are not defined in the charmap of the codec being used.

The error I get is: UnicodeEncodeError: 'charmap' codec can't encode characters in position 0-34: character maps to <undefined>

The sequence being encoded is: u'\u0411\u044a\u043b\u0433\u0430\u0440\u0441\u043a\u0438 \u043f\u0440\u0435\u0432\u043e\u0434 \u043d\u0430 \u0440\u0430\u0437\u0433\u043b\u0435\u0436\u0434\u0430\u0447\u0430 \u041c\u043e\u0437\u0438\u043b\u043b\u0430.'

The text should be: Български превод на разглеждача Мозилла.
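As a minimal sketch of what this error usually means, the snippet below encodes the same string with a few codecs. It assumes the codec in use is a Windows code page such as cp1252 (the post does not say which one); those code pages are implemented as character maps and have no Cyrillic letters, which would produce this 'charmap' message. The helper name `encode_or_error` is made up for illustration.

```python
# The string from the post, written with escapes so the source stays ASCII.
text = ('\u0411\u044a\u043b\u0433\u0430\u0440\u0441\u043a\u0438 '
        '\u043f\u0440\u0435\u0432\u043e\u0434 \u043d\u0430 '
        '\u0440\u0430\u0437\u0433\u043b\u0435\u0436\u0434\u0430\u0447\u0430 '
        '\u041c\u043e\u0437\u0438\u043b\u043b\u0430.')


def encode_or_error(s, codec):
    """Hypothetical helper: return (encoded bytes, None) or (None, error message)."""
    try:
        return s.encode(codec), None
    except UnicodeEncodeError as exc:
        return None, str(exc)


# cp1252 (assumed here) has no Cyrillic letters, so a strict encode fails.
data, error = encode_or_error(text, 'cp1252')
print(error)

# UTF-8 can represent every Unicode character, so this succeeds.
utf8_bytes, utf8_error = encode_or_error(text, 'utf-8')

# cp1251 is a single-byte code page that does contain Cyrillic.
cp1251_bytes, cp1251_error = encode_or_error(text, 'cp1251')

# errors='replace' avoids the exception but loses the text:
# every unencodable character becomes '?'.
lossy = text.encode('cp1252', errors='replace')
```

If the target (console, file, or database column) can accept UTF-8, encoding to UTF-8 is usually the least lossy choice; `errors='replace'` only hides the problem.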

The code being used is pasted below. It may not make much sense without an understanding of BeautifulSoup, but the encoding error above is where I need help:

import re
from BeautifulSoup import BeautifulSoup  # BeautifulSoup 3

# Parses the long name for a project from the index page.
def parse_project_longname(html):
    # Capture only the text between the <strong> tags.
    match = re.search('>Name: <strong>(.+?)</strong>', html)
    if match is None:
        return None
    # Convert HTML entities to Unicode characters.
    soup = BeautifulSoup(match.group(1), convertEntities=BeautifulSoup.HTML_ENTITIES)
    return soup.contents[0]

def test():
    utils=FLOSSmoleutils('dbInfoTest.txt')
    select='SELECT project_name, indexhtml FROM sv_project_indexes WHERE datasource_id=2'
    utils.cursor.execute(select,)
    results=utils.cursor.fetchall()
    for result in results:
        name=result[0]
        html=result[1]
        print("Name: "+name)
        longname=SavannahParsers.parse_project_longname(html)
        print(longname)
test()
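If the exception is raised by the print calls in test() rather than by the parser (an assumption, since the traceback is not shown), one possible workaround is to make the value safe for the output encoding before printing it. The sketch below is written for Python 3; `to_printable` is a made-up helper name, and cp1252 is again only an assumed console encoding.

```python
import sys


def to_printable(value, encoding=None):
    """Hypothetical helper: return text that `encoding` can represent.

    Characters the codec cannot encode are turned into backslash
    escapes instead of raising UnicodeEncodeError.
    """
    # Fall back to the stream's encoding, then to UTF-8, if none is given.
    encoding = encoding or getattr(sys.stdout, 'encoding', None) or 'utf-8'
    text = str(value)
    return text.encode(encoding, errors='backslashreplace').decode(encoding)


longname = '\u0411\u044a\u043b\u0433\u0430\u0440\u0441\u043a\u0438'

# With a cp1252 console (assumed), the Cyrillic letters come out as escapes.
print(to_printable(longname, 'cp1252'))
```

This keeps the script running and shows which characters were affected, but it does not display the Cyrillic text itself; for that the output stream needs an encoding that contains those characters, such as UTF-8.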

Any help or direction would be appreciated. Thank you.
