Yesterday was the first day I ever had to seriously optimize some code I had written. The code in question? A point cloud implementation.
When I started work on said implementation, I coded the library in pure Python, and tried it out on a 50,000-point volume. It took 15 seconds just to initialize and move the volume - completely unacceptable. I had a lot of optimization work to do.
The first technique I tried was just a faster algorithm. I switched from using a SQLite database to flat dictionaries, which gave a speed boost of around 2 seconds. Next I tried binding expensive non-local variables to local scope - an extra second or so. Now I was out of ideas (most of the performance-intensive code was just simple for loops) - I decided to re-write the code in Cython.
Simply compiling my library took the performance up to about 7 seconds; adding static typing in strategic spots took the performance even farther - to 2 seconds. A few Cython-specific optimizations further (binding class member variables to local scope, for one), and my library could process a 50,000-point volume in about 0.35 seconds.
Unfortunately, that's still way to slow for my needs - I need 0.1 seconds or less (preferably 0.05 or less) per 50,000 points, so I still have a ways to go. What I'll probably do is re-code some of the library in flat C++, and then access that from Cython. I'll try to keep my blog updated on whether or not I'm successful in this venture. :)