When writing tests, there’s one thing we’re now sure to do because it’s caught us out so many times before: use unicode everywhere, particularly in Python.
Often, people will just go to Wikipedia and paste bits of unicode chosen at random into sample values, but that doesn’t always make for good readability when the tests you forgot to comment break two months later and you have to revisit them.
I’ve found that a simple way to generate Unicode test values that still make sense is to use an upside-down text generator, or some other l33t text transformer that produces Unicode. If you feed it text describing whatever the sample value is supposed to represent, the result is still pretty legible at a glance, and you’ll hopefully flag up those pesky UnicodeDecodeError exceptions quicker.
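If you’d rather not depend on a website, something like the following rough sketch does the same job in a few lines. It assumes Python 3, and the names FLIP_MAP, flip and SAMPLE_USERNAME are made up for illustration; the mapping is a partial, hand-picked set of look-alike characters rather than any standard table.

```python
# Hypothetical, hand-picked mapping from lowercase ASCII letters to
# upside-down look-alikes. Letters with no good look-alike map to themselves.
FLIP_MAP = {
    "a": "ɐ", "b": "q", "c": "ɔ", "d": "p", "e": "ǝ", "f": "ɟ", "g": "ƃ",
    "h": "ɥ", "i": "ᴉ", "j": "ɾ", "k": "ʞ", "l": "l", "m": "ɯ", "n": "u",
    "o": "o", "p": "d", "q": "b", "r": "ɹ", "s": "s", "t": "ʇ", "u": "n",
    "v": "ʌ", "w": "ʍ", "x": "x", "y": "ʎ", "z": "z",
}


def flip(text):
    """Lowercase the text, reverse it, and swap each letter for its look-alike."""
    # Characters missing from the map (digits, spaces, punctuation) pass through.
    return "".join(FLIP_MAP.get(ch, ch) for ch in reversed(text.lower()))


# A sample value that still says what it represents when read upside down.
SAMPLE_USERNAME = flip("username")

if __name__ == "__main__":
    print(SAMPLE_USERNAME)
```

Because the value contains non-ASCII characters, any code path that quietly assumes ASCII should fall over in the test rather than in production, and the failing value still tells you which field it was meant to be.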
Wikipedia has a handy list of different text transformations, along with websites that will perform them for you.