Merge Sorted Iterables

Gribouillis 0 Tallied Votes 584 Views Share

This snippet defines a function to merge sorted iterables containing similar items into a single iterable. Items are supposed to be sorted in ascending order. Only one item per iterable is stored at a time.

# python 2 or 3
"""
Author: Written by Gribouillis for the python forum at www.daniweb.com
Date: November 15, 2010
License: public domain
"""

def merged(*iterables):
    """Merge a sequence of sorted iterables into a single iterable.
    contract:
        * each iterable argument is sorted in ascending order.
        * the returned iterable is sorted in ascending order
    """
    import sys
    if sys.version_info < (3,):
        import Queue as queue
    else:
        import queue
    iterables = [iter(it) for it in iterables]
    heap = queue.PriorityQueue(len(iterables))
    for n, it in enumerate(iterables):
        try:
            heap.put((next(it), n), block = False)
        except StopIteration:
            pass
    try:
        while True:
            item, n = heap.get(block = False)
            heap.task_done()
            yield item
            try:
                heap.put((next(iterables[n]), n))
            except StopIteration:
                pass
    except queue.Empty:
        raise StopIteration
    
if __name__ == "__main__":
    from random import random
    from pprint import pprint
    L = [ sorted(random() for i in range(n)) for n in (10, 5, 15) ]
    pprint(L)
    pprint(list(merged(*L)))
TrustyTony 888 pyMod Team Colleague Featured Poster

This could be maybe transformed to one item lookahead of stream (like ungetch is C), which I have thought sometimes to be usefull.

arkoenig 340 Practically a Master Poster

I don't understand -- what aspect of the program's behavior are you proposing to change, and in what way?

TrustyTony 888 pyMod Team Colleague Featured Poster

I don't understand -- what aspect of the program's behavior are you proposing to change, and in what way?

I do not propose to change this program, I see possibility to use it as 'template code' for one feature I have been missing from python, doing one item lookahead for generators.

Now Googling little there seems to be some alternatives explained before for that:
http://code.activestate.com/recipes/528943/

for example.

Also http://mail.python.org/pipermail/python-ideas/2007-July/000951.html

def insert_iter(it):
     """
         An insertable iterator.
         it.send(value) -> puts value in the iterator.
     """
     it = iter(it)
     buff = []
     while 1:
         if buff:
             nv = yield buff.pop(0)
         else:
             nv = yield it.next()
         while nv:
             buff.append(nv)
             nv = yield nv

it = insert_iter('python')
print (it.send(it.next()),    # send it back, so it is avialable again.
        it.send(it.next()),
        it.next(),
        it.send(it.next()),
        it.next(),
        it.next())

"""PRINTS:
('p', 'p', 'p', 'y', 'y', 't')"""
Gribouillis 1,391 Programming Explorer Team Colleague

Here is an improved version: I realized that using a double while loop would avoid entering a try ... except block too often. With this version, at most 2n+1 'try' blocks are entered during execution, where n is the number of iterables.

# python 2 or 3

def merged(*iterables):
    """Merge a sequence of sorted iterables into a single iterable.
    contract:
        * each iterable argument is sorted in ascending order.
        * the returned iterable is sorted in ascending order
    """
    import sys
    if sys.version_info < (3,):
        import Queue as queue
    else:
        import queue
    iterables = [iter(it) for it in iterables]
    heap = queue.PriorityQueue(len(iterables))
    put, get, task_done = heap.put, heap.get, heap.task_done

    for n, it in enumerate(iterables):
        try:
            put((next(it), n), block = False)
        except StopIteration:
            pass
    try:
        while True:
            try:
                while True:
                    item, n = get(block = False)
                    task_done()
                    yield item
                    put((next(iterables[n]), n), block = False)
            except StopIteration:
                pass
    except queue.Empty:
        pass
    
if __name__ == "__main__":
    from random import random
    from pprint import pprint
    L = [ sorted(random() for i in range(n)) for n in (10, 5, 15) ]
    pprint(L)
    pprint(list(merged(*L)))
Gribouillis 1,391 Programming Explorer Team Colleague

OOOPs, since python 2.6 there is a better implementation in the standard module heapq

Help on function merge in heapq:

heapq.merge = merge(*iterables)
    Merge multiple sorted inputs into a single sorted output.
    
    Similar to sorted(itertools.chain(*iterables)) but returns a generator,
    does not pull the data into memory all at once, and assumes that each of
    the input streams is already sorted (smallest to largest).
    
    >>> list(merge([1,3,5,7], [0,2,4,8], [5,10,15,20], [], [25]))
    [0, 1, 2, 3, 4, 5, 5, 7, 8, 10, 15, 20, 25]

Better use this one !

Be a part of the DaniWeb community

We're a friendly, industry-focused community of developers, IT pros, digital marketers, and technology enthusiasts meeting, networking, learning, and sharing knowledge.