Make it faster

As is often the case, it would be desirable to make this code faster. There is not much to be gained, though: even for the small test cases, around 30%-50% of the run time is already spent in LAPACK routines, so the performance of the code is to a large extent a direct measure of the underlying LAPACK implementation. Still, some benefit can be obtained for the smaller system sizes. Here are some results obtained on my laptop using the (awful...) reference LAPACK implementation:

                old                    new

    Ising:      3.586 +- 0.10 s        3.2333 +- 0.0078 s
    Hub:        339.5 s                340.9 s
    Kondo:      38.18 +- 0.13 s        34.377 +- 0.093 s
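
To put the timings listed above into perspective, here is a minimal sketch that turns them into speedup factors (old time divided by new time). It is not part of the code itself. The dictionary layout and the helper name `speedup` are made up for this illustration, and the error on the ratio assumes that the old and new uncertainties are independent, which the measurements above do not state. No uncertainty was quoted for Hub, so none is computed there.

```python
import math

# Timings in seconds, copied from the results above: (old, old_err, new, new_err).
# None marks a run for which no uncertainty was quoted.
timings = {
    "Ising": (3.586, 0.10, 3.2333, 0.0078),
    "Hub": (339.5, None, 340.9, None),
    "Kondo": (38.18, 0.13, 34.377, 0.093),
}


def speedup(old, old_err, new, new_err):
    """Return (old / new, error on that ratio or None).

    The error is propagated in quadrature, which assumes the two
    uncertainties are independent (an assumption of this sketch).
    """
    ratio = old / new
    if old_err is None or new_err is None:
        return ratio, None
    relative_err = math.hypot(old_err / old, new_err / new)
    return ratio, ratio * relative_err


results = {name: speedup(*values) for name, values in timings.items()}

for name, (ratio, err) in results.items():
    err_text = f" +- {err:.3f}" if err is not None else ""
    print(f"{name:6s} speedup = {ratio:.3f}{err_text}")
```

Read this way, Ising and Kondo both come out at a speedup of about 1.11, while Hub is essentially unchanged.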