I have files that contain lists of lists. Each file has about 20.000 lists that each list member is a list of itself with 15 values. Think of it as a database table. I know we can use a binary search to find a member of a list but can I use a binary search to find matching list members in both files based on the say, second and third members of the lists and if so how?
Thanks
file1
[0][1][xx][yy][4][5]
[0][1][2][3][4][5]

file1
[0][1][2][3][4][5]
[0][1][xx][yy][4][5]

Recommended Answers

All 2 Replies

Binary search applies to ordered data. If you want to perform a binary search based on the second member, you must first sort your data according to the second member and write the result to another file.
Perhaps you should describe your problem more completely: what do you expect as input and output ? and give an exemple of your files.
Another possibility is to use a database (sqlite comes with python) and search with sql requests. After all, filtering data is the main purpose of databases.

Here is a basic example of the use of Sqlite3

# test the sqlite3 module, write and read a database file
# note that Python25 and higher versions have sqlite3 builtin
# sqlite3.connect(database, timeout=5.0, isolation_level=None,
#   detect_types=0, factory=100)
# timeout=5.0 --> allows multiple access for 5 seconds (for servers)
# isolation_level=None -->  autocommit mode
# detect_types=0 --> native types TEXT, INTEGER, FLOAT, BLOB and NULL
# factory=100 --> statement cache to avoid SQL parsing overhead
# tested with Python26 and Python31
# Source:  Sneekula

import sqlite3

# create/connect to a permanent file database
con = sqlite3.connect("my_db.db3")
# for temporary testing you can use memory only
#con = sqlite3.connect(":memory:")

# establish the cursor, needed to execute the connected db
cur = con.cursor()

# create/execute a table:
# (optionally used capital letters to show commands)
cur.execute('CREATE TABLE IF NOT EXISTS clients \
    (id INT PRIMARY KEY, \
    firstname CHAR(60), \
    lastname CHAR(60))')

# insert several lines at once using a
# list of (id, firstname, lastname) tuples
# use try/except or the existing db will complain about
# the non-unique id since it is already in the db
try:
    clients = [
    (107, "Ella", "Fitzgerald"),
    (108, "Louis", "Armstrong"),
    (109, "Miles", "Davis")
    ]
    cur.executemany("INSERT INTO clients (id, firstname, lastname) \
        VALUES (?, ?, ?)", clients )
except:
    pass

# add another client
# again, use try/except or the existing db will complain about
# the non-unique id if it is already in the db
try:
    new_client = (110, "Benny", "Goodman")
    cur.execute("INSERT INTO clients (id, firstname, lastname) \
        VALUES (?, ?, ?)", new_client)
except:
    pass

# important if you make changes to the database
# commits current data to the db file (data is persistant now)
con.commit()

# now test it
# get data row by row
print("Show data row by row:")
# also tell it to sort/order data by lastname
cur.execute('SELECT id, firstname, lastname FROM clients \
    ORDER BY lastname')
for row in cur:
    print(row)

print('-'*40)

# select just one data item from each row ...
cur.execute('SELECT firstname FROM clients')
print(cur.fetchall())

print('-'*40)

# or ...
cur.execute('SELECT firstname FROM clients')
for row in cur:
    print(row[0])

print('-'*40)

# select a specific data row ...
cur.execute('SELECT * FROM clients WHERE lastname="Davis"')
print(cur.fetchall())

# finally ...
con.close()

"""
my output -->
Show data row by row:
(108, u'Louis', u'Armstrong')
(109, u'Miles', u'Davis')
(107, u'Ella', u'Fitzgerald')
(110, u'Benny', u'Goodman')
----------------------------------------
[(u'Ella',), (u'Louis',), (u'Miles',), (u'Benny',)]
----------------------------------------
Ella
Louis
Miles
Benny
----------------------------------------
[(109, u'Miles', u'Davis')]
"""
Be a part of the DaniWeb community

We're a friendly, industry-focused community of developers, IT pros, digital marketers, and technology enthusiasts meeting, networking, learning, and sharing knowledge.