| tags: [ programming Python ] categories: [Development ]
Random Words, Sets and Instance Identity
Recently I’ve been testing database-related operations, and for that I need many small lists of random words. I also ran into some cases where I wanted to remove all duplicate items from a list, which led me to dig into how Python’s set class works with instances of custom classes.
Building a list of random words for testing
I’m still in love with Python comprehensions. In this case I wanted to generate a list of random words.
In [1]:
import string
import random
In [204]:
## Produce a word list generator where:
## wordlen is an integer describing the length of each word
## listlen is the number of words that will eventually be generated
## charlist is a sequence of characters that will be chosen from to generate the wordlist.
##
def wordlist(wordlen=10, listlen=10, charlist=string.ascii_uppercase):
    for _ in range(listlen):
        yield "".join([random.choice(charlist) for _ in range(wordlen)])
Demonstrate a wordlist
In [205]:
list(wordlist())
Out[205]:
['PNVEEHHJKD',
'ZSQEIPZVQA',
'ZLPMMMKCDA',
'GBJLDDZCZG',
'JAABASOGKQ',
'RNVGMYFJBG',
'PASFFOCRKH',
'CGZYSCCVJS',
'ELIOIQXYMF',
'RQFSAMXKWE']
Demonstrate a word list with a different set of characters.
In [207]:
list(wordlist(charlist=string.ascii_letters))
Out[207]:
['jiEYIVSwtM',
'aLSxzKSOph',
'FLykQrpaIQ',
'eTVzQGxYUC',
'UCcTakLbgZ',
'BmqUfdyKAw',
'yUFqkAAsjf',
'SbivFeUqGp',
'PocykjUfKq',
'uYqukzslRa']
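For database tests it can be handy to get the same words on every run. The following is only a sketch of one way to do that: `seeded_wordlist` and its `seed` parameter are hypothetical names, not part of the code above. It draws from a private `random.Random` instance and uses `choices` to pick all characters of a word in one call.

```python
import random
import string

def seeded_wordlist(wordlen=10, listlen=10, charlist=string.ascii_uppercase, seed=None):
    """Hypothetical variant of wordlist() that uses its own Random instance."""
    rng = random.Random(seed)  # private generator; the global random state is untouched
    for _ in range(listlen):
        # choices() draws wordlen characters with replacement
        yield "".join(rng.choices(charlist, k=wordlen))

# The same seed yields the same words, which keeps test fixtures stable.
first = list(seeded_wordlist(wordlen=4, listlen=3, seed=42))
second = list(seeded_wordlist(wordlen=4, listlen=3, seed=42))
print(first == second)
```

With `seed=None` the generator is seeded from system entropy, so it behaves like the unseeded version.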
Removing duplicates
First, set up a word list with some unique words to start with.
In [208]:
ulist = list(wordlist(wordlen=4, listlen=3))
ulist
Out[208]:
['JGWH', 'YOAP', 'TMBN']
Repeatedly calling choice on ulist will produce duplicates. Note that ulist must be a list because choice picks an element by index, and sets do not support indexing.
In [209]:
duplist = [random.choice(ulist) for _ in range(10)]
duplist
Out[209]:
['JGWH',
'TMBN',
'JGWH',
'JGWH',
'TMBN',
'TMBN',
'YOAP',
'TMBN',
'TMBN',
'YOAP']
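The remark about indexing can be checked directly. As a small sketch (the variable names here are only for illustration), passing a set to `random.choice` raises a `TypeError`, while converting the set to a list or sorted list first works fine.

```python
import random

words = {"JGWH", "YOAP", "TMBN"}

# choice() indexes into its argument, which a set does not support.
try:
    random.choice(words)
    raised = False
except TypeError:
    raised = True
print(raised)

# Converting to a sequence first makes the set usable with choice().
picked = random.choice(sorted(words))
print(picked in words)
```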
Demonstrate using the set class to show removal of duplicates.
In [212]:
set(duplist)
Out[212]:
{'JGWH', 'TMBN', 'YOAP'}
In [215]:
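sorted(set(duplist)) == sorted(ulist)
Out[215]:
True
Note: By definition sets are not ordered, so you can’t rely on any particular order when printing a set. That is why both sides are sorted before comparing. sorted() is used here rather than list.sort() because sort() works in place and returns None, so comparing the results of two sort() calls would always give True.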
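If the original order of the words matters, a set is not the only option. One possible alternative, shown here as a sketch, relies on the fact that dictionaries keep insertion order (Python 3.7 and later), so `dict.fromkeys` removes duplicates while keeping the first occurrence of each word in place.

```python
duplist = ['JGWH', 'TMBN', 'JGWH', 'JGWH', 'TMBN',
           'TMBN', 'YOAP', 'TMBN', 'TMBN', 'YOAP']

# Each word becomes a dict key; repeated keys are ignored, order is preserved.
unique_in_order = list(dict.fromkeys(duplist))
print(unique_in_order)  # ['JGWH', 'TMBN', 'YOAP']
```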
Sets on instances of a defined class
I have some cases where I would like to be able to strip duplicate objects out of a list. I can use the set class approach to filter out duplicate objects. However, in order to do so, I have to implement the __eq__ and __hash__ methods in the class. This is necessary to define unicity in the class.
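The two methods really do need to be defined together. In Python 3, a class that defines `__eq__` but not `__hash__` has its `__hash__` set to `None`, which makes its instances unusable in a set. The sketch below uses a hypothetical `EqOnlyItem` class purely to show that behaviour.

```python
class EqOnlyItem:
    """Hypothetical class that defines equality but no hash."""
    def __init__(self, key):
        self.key = key

    def __eq__(self, other):
        if not isinstance(other, EqOnlyItem):
            return NotImplemented
        return self.key == other.key

# Defining __eq__ alone removes the inherited __hash__.
print(EqOnlyItem.__hash__ is None)

try:
    {EqOnlyItem('THAT')}
    unhashable = False
except TypeError:
    unhashable = True
print(unhashable)
```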
A counter example: class without unicity
As an example, here is an item class that does not implement __eq__ and __hash__.
In [185]:
class item:
    def __init__(self, key, data=None):
        self.key = key
        self.data = data
    def __repr__(self):
        return 'item(key={},data={})'.format(self.key, self.data)
This definition of item has neither __eq__ nor __hash__. The goal is for two item objects with the same key to be considered the same, so create two of them:
In [186]:
a = item('THAT')
b = item('THAT',data=4)
In [190]:
print(a.__hash__())
print(b.__hash__())
278991750
-9223372036575784062
Two different hash values mean that a set will treat these two objects as distinct.
In [191]:
a == b
Out[191]:
False
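The False result comes from the defaults every class inherits from object: equality means "is the very same instance", and the default hash is derived from the instance's identity too. A minimal sketch, using a hypothetical `PlainItem` class that mirrors the one above, shows what that means for a set.

```python
class PlainItem:
    """Hypothetical stand-in for the class above, without __eq__ or __hash__."""
    def __init__(self, key, data=None):
        self.key = key
        self.data = data

a = PlainItem('THAT')
b = PlainItem('THAT', data=4)

print(a == b)       # False: different instances, even with the same key
print(a == a)       # True: an instance is always equal to itself
print(len({a, b}))  # 2: the set keeps both, so nothing is deduplicated
```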
Class with unicity definition
Now redefine item to include __eq__ and __hash__, both based on the key attribute.
In [216]:
class item:
    def __init__(self, key, data=None):
        self.key = key
        self.data = data
    def __repr__(self):
        return 'item(key={},data={})'.format(self.key, self.data)
    #In my case, equal keys is sufficient to consider two objects to be equal
    def __eq__(self, other):
        #Comparing against something that is not an item is left to Python
        if not isinstance(other, item):
            return NotImplemented
        return self.key == other.key
    #Use the key to produce the hash for this instance.
    def __hash__(self):
        return hash(self.key)
Now these two objects are considered to be the same.
In [217]:
a = item('THAT')
b = item('THAT',data=4)
In [218]:
print(a.__hash__())
print(b.__hash__())
-3822845408751240381
-3822845408751240381
Identical hashes are a precondition for a set to treat the two objects as one. The final decision comes from __eq__:
In [198]:
a == b
Out[198]:
True
Now a and b are considered equal even though they may have different data.
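It is worth stressing that the hash only narrows the search and `__eq__` has the final say. As an illustration, the hypothetical `SameHash` class below returns a constant hash on purpose; objects with different keys still stay separate in a set because they do not compare equal.

```python
class SameHash:
    """Hypothetical class whose instances all share one hash value."""
    def __init__(self, key):
        self.key = key

    def __eq__(self, other):
        if not isinstance(other, SameHash):
            return NotImplemented
        return self.key == other.key

    def __hash__(self):
        return 1  # deliberately constant, for demonstration only

x = SameHash('THIS')
y = SameHash('THAT')

print(hash(x) == hash(y))  # True: the hashes collide
print(x == y)              # False: the keys differ
print(len({x, y}))         # 2: both objects are kept
```

A constant hash is legal but slow for large sets, which is why hashing the key, as done above, is the more usual choice.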
Sets of items using unicity by key
Now a set of item objects reduces to a collection of items with distinct keys. Note that which instance the set keeps should be treated as arbitrary. You shouldn’t rely on which of the three items with the key ‘this’ ends up in the set.
In [199]:
{item("this",data=3),item("this",data=44),item("this",data=220),item("that")}
Out[199]:
{item(key=that,data=None), item(key=this,data=3)}
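When it does matter which duplicate survives, one option is to skip the set and build a dictionary keyed on `key` instead. The sketch below is an assumption about how that could look, not something from the notebook above; `Item` is a hypothetical stand-in class, and it needs neither `__eq__` nor `__hash__` because only the keys are hashed.

```python
class Item:
    """Hypothetical stand-in with the same attributes as item."""
    def __init__(self, key, data=None):
        self.key = key
        self.data = data

items = [Item("this", data=3), Item("this", data=44),
         Item("this", data=220), Item("that")]

# Keep the first item seen for each key.
first_by_key = {}
for it in items:
    first_by_key.setdefault(it.key, it)

# Keep the last item seen for each key: later entries overwrite earlier ones.
last_by_key = {it.key: it for it in items}

print([it.data for it in first_by_key.values()])  # [3, None]
print([it.data for it in last_by_key.values()])   # [220, None]
```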