ALF issues
https://git.physik.uni-wuerzburg.de/ALF/ALF/-/issues (updated 2021-12-08T13:33:11Z)

Issue #208: rework hop_mod and wrapgrup and wrapgrdo
https://git.physik.uni-wuerzburg.de/ALF/ALF/-/issues/208 (updated 2021-12-08T13:33:11Z; author and assignee: Florian Goth)

Issue #175: Include MSCBDECOMP + higher order checkerboard
https://git.physik.uni-wuerzburg.de/ALF/ALF/-/issues/175 (updated 2021-01-06T01:03:41Z; author and assignee: Florian Goth)
- [x] include MSCBDECOMP
- [x] make an object for it
- [x] Higher order checkerboard

Issue #174: More Use cases for the ContainerElementBase Object Hierarchy
https://git.physik.uni-wuerzburg.de/ALF/ALF/-/issues/174 (updated 2020-12-03T15:16:37Z; author and assignee: Florian Goth)
Now with the ContainerElementBase Object Hierarchy in place and a dynamic container present that is able to take up
these objects in some "dynamic" way we can think about extensions.
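
To make the idea of a polymorphic hierarchy plus a dynamic container concrete, here is a minimal Python sketch. Only the name ContainerElementBase comes from this issue; DiagonalElement, BondElement, DynamicContainer and the apply method are hypothetical and do not mirror ALF's Fortran API. Each element stands for one factor of exp(-Δτ T) acting on a vector, and the container applies whatever mix of elements it holds without knowing their concrete types.

```python
import math
from abc import ABC, abstractmethod


class ContainerElementBase(ABC):
    """Interface every element shares (the method name is hypothetical)."""

    @abstractmethod
    def apply(self, vec):
        """Return a new vector equal to this factor applied to vec."""


class DiagonalElement(ContainerElementBase):
    """Hypothetical element: exp(-dtau * diag(eps)), e.g. on-site energies."""

    def __init__(self, dtau, eps):
        self.factors = [math.exp(-dtau * e) for e in eps]

    def apply(self, vec):
        return [f * v for f, v in zip(self.factors, vec)]


class BondElement(ContainerElementBase):
    """Hypothetical element: exp(dtau * t * (|i><j| + |j><i|)) on a single bond,
    i.e. one checkerboard factor of exp(-dtau * T) for hopping -t on bond (i, j)."""

    def __init__(self, dtau, t, i, j):
        self.i, self.j = i, j
        self.c = math.cosh(dtau * t)
        self.s = math.sinh(dtau * t)

    def apply(self, vec):
        out = list(vec)
        vi, vj = vec[self.i], vec[self.j]
        out[self.i] = self.c * vi + self.s * vj
        out[self.j] = self.s * vi + self.c * vj
        return out


class DynamicContainer:
    """Hypothetical container holding any mix of ContainerElementBase objects."""

    def __init__(self):
        self.elements = []

    def add(self, element):
        if not isinstance(element, ContainerElementBase):
            raise TypeError("expected a ContainerElementBase")
        self.elements.append(element)

    def apply(self, vec):
        # The first element added acts first on the vector.
        for element in self.elements:
            vec = element.apply(vec)
        return vec


container = DynamicContainer()
container.add(BondElement(dtau=0.1, t=1.0, i=0, j=1))
container.add(DiagonalElement(dtau=0.1, eps=[0.5, -0.5, 0.0]))
result = container.apply([1.0, 0.0, 0.0])
print(result)
```

In this picture, adding one of the techniques listed below would mean adding one more subclass while the container code stays untouched.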
## Approximate techniques
- [ ] Minimum Split Checkerboard + Higher Order Checkerboard #175
- [ ] Lanczos
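
The gain from a higher-order (symmetric) checkerboard can be checked numerically. The following pure-Python sketch is not ALF code and not the MSCBDECOMP algorithm; it only illustrates the splitting orders. It splits the hopping matrix of an open 4-site chain (t = 1) into two families of non-overlapping bonds, A and B, and compares exp(ΔτA)exp(ΔτB) and exp(ΔτA/2)exp(ΔτB)exp(ΔτA/2) with the exact exp(Δτ(A+B)). Halving Δτ should reduce the per-step error by roughly a factor of 4 for the plain product (error O(Δτ²)) and roughly 8 for the symmetric one (error O(Δτ³)).

```python
def matmul(a, b):
    """Dense product of two matrices stored as lists of rows."""
    inner, cols = len(b), len(b[0])
    return [[sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)] for row in a]


def axpy(a, b, alpha=1.0):
    """Element-wise a + alpha * b."""
    return [[x + alpha * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def scale(a, s):
    return [[s * x for x in row] for row in a]


def expm(a, terms=30):
    """exp(a) from a truncated Taylor series; adequate for the small norms used here."""
    n = len(a)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        term = scale(matmul(term, a), 1.0 / k)  # term = a^k / k!
        result = axpy(result, term)
    return result


def frobenius(a):
    return sum(x * x for row in a for x in row) ** 0.5


def bond_matrix(n, bonds):
    """Hopping matrix (t = 1) containing only the listed bonds."""
    m = [[0.0] * n for _ in range(n)]
    for i, j in bonds:
        m[i][j] = m[j][i] = 1.0
    return m


# Open 4-site chain. Bonds inside each family share no site, so each family's
# exponential factorizes exactly into independent 2x2 bond exponentials.
N = 4
A = bond_matrix(N, [(0, 1), (2, 3)])
B = bond_matrix(N, [(1, 2)])
H = axpy(A, B)


def splitting_errors(dtau):
    """Frobenius-norm error of the first- and second-order splittings of exp(dtau*H)."""
    exact = expm(scale(H, dtau))
    first = matmul(expm(scale(A, dtau)), expm(scale(B, dtau)))
    half_a = expm(scale(A, dtau / 2))
    second = matmul(matmul(half_a, expm(scale(B, dtau))), half_a)
    return frobenius(axpy(first, exact, -1.0)), frobenius(axpy(second, exact, -1.0))


e1, e2 = splitting_errors(0.05)
f1, f2 = splitting_errors(0.025)
print(f"first order : error ratio on halving dtau = {e1 / f1:.2f}")
print(f"second order: error ratio on halving dtau = {e2 / f2:.2f}")
```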
## Exact techniques
- [ ] Block techniques #176
- [ ] FFT techniques
## Use cases in other places
- [ ] Op_V is not that much different. Get rid of the various types?

Issue #169: Flo's playground with Op_Ts
https://git.physik.uni-wuerzburg.de/ALF/ALF/-/issues/169 (updated 2020-11-30T16:15:54Z; author and assignee: Florian Goth)
A playground for ideas.

Issue #129: higher order integration / multivalued Ising spins
https://git.physik.uni-wuerzburg.de/ALF/ALF/-/issues/129 (updated 2020-09-17T15:35:49Z; author and assignee: Florian Goth)
Everybody wants higher-order integration, so we have to implement it.
This could be useful if somebody wants to explore how ALF behaves as we go from the current spins
to continuous fields.
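
For context on what "higher order" buys: the four-valued discrete Hubbard-Stratonovich field commonly used for perfect-square interactions (for example in the ALF documentation, up to normalization conventions) writes exp(x²), with x² proportional to Δτ, as (1/4) Σ_{l=±1,±2} γ(l) exp(η(l) x), with γ(±1) = 1 + √6/3, γ(±2) = 1 - √6/3, η(±1) = ±√(2(3 - √6)), η(±2) = ±√(2(3 + √6)). This is exactly four-point Gauss-Hermite quadrature of the Gaussian integral behind the transformation, so it is exact through x⁶ and its error is O(x⁸) = O(Δτ⁴), whereas a two-valued field only reaches O(x⁴). More field values mean more quadrature points and, in the limit, the continuous field. The Python sketch below checks these orders numerically; that the issue's "current spins" are this four-valued field is our assumption.

```python
import math

# Four-valued field: weights gamma(level) and field values eta(level), level = 1, 2 (sign = ±).
SQRT6 = math.sqrt(6.0)
GAMMA = {1: 1 + SQRT6 / 3, 2: 1 - SQRT6 / 3}
ETA = {1: math.sqrt(2 * (3 - SQRT6)), 2: math.sqrt(2 * (3 + SQRT6))}


def four_valued(x):
    """(1/4) * sum over l = ±1, ±2 of gamma(l) * exp(eta(l) * x)."""
    total = 0.0
    for level in (1, 2):
        for sign in (1, -1):
            total += GAMMA[level] * math.exp(sign * ETA[level] * x)
    return total / 4


def two_valued(x):
    """(1/2) * sum over s = ±1 of exp(s * sqrt(2) * x), i.e. cosh(sqrt(2) * x)."""
    return math.cosh(math.sqrt(2.0) * x)


def errors(x):
    """Deviation of both discrete fields from the exact exp(x**2)."""
    exact = math.exp(x * x)
    return two_valued(x) - exact, four_valued(x) - exact


for x in (0.2, 0.1):
    err2, err4 = errors(x)
    print(f"x = {x}: two-valued error {err2:.3e}, four-valued error {err4:.3e}")

# Halving x should shrink the errors by roughly 2**4 = 16 and 2**8 = 256.
r2 = errors(0.2)[0] / errors(0.1)[0]
r4 = errors(0.2)[1] / errors(0.1)[1]
print(f"ratios: two-valued {r2:.1f}, four-valued {r4:.1f}")
```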

Issue #113: Cholesky vs maxent
https://git.physik.uni-wuerzburg.de/ALF/ALF/-/issues/113 (updated 2022-05-30T13:12:54Z; author and assignee: Florian Goth)
You have to touch xqmc1 and Vhelp.

Issue #91: Test the compact WY decomposition in real life
https://git.physik.uni-wuerzburg.de/ALF/ALF/-/issues/91 (updated 2018-10-01T15:55:30Z; author and assignee: Florian Goth)
Make a branch from Johannes' branch and introduce xGEQRT into udv_state for the logscale case, where permutations are disabled anyway.

Issue #90: PLASMAAlf
https://git.physik.uni-wuerzburg.de/ALF/ALF/-/issues/90 (updated 2020-09-04T12:08:18Z; author and assignee: Florian Goth)
Plasma,
https://bitbucket.org/icl/plasma
seems to have a superior QR decomposition that is highly parallelizable.
Let's see where this leads.

Issue #62: clALF
https://git.physik.uni-wuerzburg.de/ALF/ALF/-/issues/62 (updated 2017-06-22T17:05:01Z; author and assignee: Florian Goth)
So we'd like to use ALF on certain accelerators. The simplest solution would be to use AMD's ACML:
http://developer.amd.com/tools-and-sdks/archive/compute/amd-core-math-library-acml/acml-downloads-resources/
This library provides a full LAPACK and BLAS implementation with a working Fortran interface that can automatically use
external accelerators via OpenCL. Sadly, AMD only provides two profiles, for two of its GPUs from around 2014, and I could not get it to work satisfactorily.
The next idea is to use clMAGMA from the MAGMA initiative:
http://icl.cs.utk.edu/magma/
The most recent version has a working Fortran interface (I suppose), supports sparse vector operations, and is maintained by largely the same people as reference LAPACK. But it essentially only supports CUDA.
There is magmaMIC, which can utilize Xeon Phis, and there is clMAGMA, which uses OpenCL as its backend.
Sadly, when the authors tried to add a Fortran interface, they found that it would involve some work. So this is also out...
There is ViennaCL:
http://viennacl.sourceforge.net/
But it is C++ only, although it looks very powerful, especially for sparse operations.
Since ALF spends most of its time in low-level BLAS3 routines (ZHEMMs in my branch, for the Hubbard model),
we can get away with simply plugging in a library that emulates the BLAS interface.
To my knowledge there is no library that provides a full Fortran interface.
If we write our own wrappers, there are two contenders:
clBLAS: https://github.com/clMathLibraries/clBLAS
This was part of AMD's ACML, is now open source, and seems to be somewhat maintained.
The second is CLBlast, recently released by TomTom:
https://arxiv.org/abs/1705.05249 , https://cnugteren.github.io/clblast/clblast.html
It is very new and, coming from outside HPC, puts a lot of effort into portability. It also has a Netlib-style BLAS interface that can almost be linked against Fortran.
So for now I will check whether clBLAS works and whether I can offload the ZHEMM calls...
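
Since the offloaded ZHEMM calls have to reproduce the BLAS semantics exactly, a small host-side reference is handy for checking results. A minimal Python sketch of what ZHEMM computes for side='L': C := alpha*A*B + beta*C, where A is Hermitian, only the triangle selected by uplo is referenced, and the imaginary parts of its diagonal are assumed to be zero. zhemm_ref is a hypothetical helper, not part of clBLAS, CLBlast or ALF; side='R' and the leading-dimension arguments of the real interface are left out.

```python
def zhemm_ref(side, uplo, alpha, a, b, beta, c):
    """Reference C := alpha*A*B + beta*C for BLAS ZHEMM with side='L'.

    A is n x n Hermitian; only the triangle selected by uplo ('U' or 'L') is read,
    and imaginary parts on its diagonal are ignored, as BLAS specifies.
    B and C are n x m. Returns a new matrix instead of overwriting C.
    """
    if side != "L":
        raise NotImplementedError("this sketch only covers side='L'")
    if uplo not in ("U", "L"):
        raise ValueError("uplo must be 'U' or 'L'")
    n, m = len(a), len(b[0])

    def a_elem(i, k):
        # Rebuild A[i][k] from the stored triangle only.
        if i == k:
            return complex(a[i][i].real, 0.0)
        stored = (i < k) if uplo == "U" else (i > k)
        return complex(a[i][k]) if stored else complex(a[k][i]).conjugate()

    return [
        [alpha * sum(a_elem(i, k) * b[k][j] for k in range(n)) + beta * c[i][j]
         for j in range(m)]
        for i in range(n)
    ]


# The strictly lower triangle holds garbage on purpose: with uplo='U' it must be ignored.
A = [[2.0 + 0j, 1.0 - 1.0j],
     [999.0 + 0j, 3.0 + 0j]]
B = [[1.0 + 0j, 0j], [0j, 1.0 + 0j]]  # identity, so the result is the full Hermitian A
C = [[0j, 0j], [0j, 0j]]
print(zhemm_ref("L", "U", 1.0, A, B, 0.0, C))
```

Comparing an offloaded result element-wise against such a reference (with a tolerance) is one way to back up a claim like "gives correct results" on small test matrices.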
First experiences with clBLAS:
Adding the ZHEMM call is now finished. This works and gives correct results.
So far I could only test execution on a CPU (i7-2600). The multiplication is automatically
parallelized but oversubscribes my CPU with ~8 threads. That would be OK, but the runtime is 5 times longer than plain single-threaded execution...
Some numbers:

| CPU | Lattice | master | clALF (clBLAS) | CLBlast |
|---|---|---|---|---|
| Core i7-920 | 8x8 | 13 s | 97 s (up to 4 threads) | 97 s (~1.5 threads effectively used) |
| Core i7-920 | 12x12 | 136 s | 415 s (up to 4 threads) | 171 s (~2.5 threads) |
| Core i7-920 | 16x16 | 776 s (single thread) | | 545 s (~4 threads, used well) |
| Core i7-2600 | 20x20 | 357 s | 880 s | |

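The runtimes above translate into the following slowdowns relative to master (a value below 1 is a speedup). The short Python snippet only redoes this arithmetic on the reported numbers, with the variants labelled as in the original notes.

```python
# Runtimes in seconds, copied from the measurements above.
runs = [
    # (CPU, lattice, variant, master runtime, variant runtime)
    ("i7 920", "8x8", "clalf", 13, 97),
    ("i7 920", "8x8", "CLBlast", 13, 97),
    ("i7 920", "12x12", "clalf", 136, 415),
    ("i7 920", "12x12", "CLBlast", 136, 171),
    ("i7 920", "16x16", "CLBlast", 776, 545),
    ("i7 2600", "20x20", "clBLAS", 357, 880),
]

for cpu, lattice, variant, master, variant_time in runs:
    ratio = variant_time / master  # > 1 means slower than master
    print(f"{cpu:8s} {lattice:6s} {variant:8s} {ratio:5.2f}x master runtime")
```

Only the 16x16 CLBlast run comes out faster than master.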
For now I conclude that clBLAS is an AMD-GPU-only solution.
Using CLBlast's built-in auto-tuner for my CPU did not change the numbers much.

Issue #57: Predict Global Moves
https://git.physik.uni-wuerzburg.de/ALF/ALF/-/issues/57 (updated 2017-06-18T13:38:40Z; author and assignee: Florian Goth)
This branch serves to test strategies to predict various global moves.
It seems that in the end it boils down to finding an approximate model for the transition probability T(s, s').
Given the current state s', we can either try to invert the equation for s, or just throw random configurations until we
find one where T is sufficiently large.
To interpolate the function T and throw random configurations s against it, we can use a feed-forward network.
If we want to invert the function for s instead, we can use a deep belief network or any other (and hopefully simpler)
generative network.
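
One way to make "throw random configurations against an approximate T" exact is two-stage, or delayed-acceptance, Metropolis. This is a swapped-in standard technique, not something this issue commits to: a cheap surrogate screens each random global proposal, and only proposals that pass are tested against the exact weight with a correction ratio, which preserves detailed balance. In the Python sketch below, the toy Ising-ring weights and their couplings are hypothetical placeholders for the expensive fermionic weight and for a trained feed-forward network; check_detailed_balance verifies the balance condition on all 2⁴ states.

```python
import itertools
import math
import random

COUPLING_TRUE, FIELD_TRUE = 0.4, 0.1  # hypothetical "exact" model
COUPLING_SURROGATE = 0.3              # hypothetical, deliberately inexact surrogate


def ring_bonds(s):
    return sum(s[i] * s[(i + 1) % len(s)] for i in range(len(s)))


def true_log_weight(s):
    """Stand-in for the expensive exact log-weight of configuration s."""
    return COUPLING_TRUE * ring_bonds(s) + FIELD_TRUE * sum(s)


def surrogate_log_weight(s):
    """Stand-in for a cheap learned approximation (e.g. a feed-forward network)."""
    return COUPLING_SURROGATE * ring_bonds(s)


def stage_one(s, t):
    """Screening acceptance that uses only the surrogate."""
    return min(1.0, math.exp(surrogate_log_weight(t) - surrogate_log_weight(s)))


def stage_two(s, t):
    """Correction acceptance: exact weight ratio divided by surrogate ratio."""
    delta = (true_log_weight(t) - true_log_weight(s)) - (
        surrogate_log_weight(t) - surrogate_log_weight(s))
    return min(1.0, math.exp(delta))


def global_move(s, rng):
    """Propose a completely random configuration (a symmetric proposal)."""
    t = tuple(rng.choice((-1, 1)) for _ in s)
    if rng.random() >= stage_one(s, t):
        return s  # cheap rejection: the exact weight is never evaluated
    if rng.random() < stage_two(s, t):
        return t
    return s


def check_detailed_balance(n=4):
    """Check pi(s) P(s->t) == pi(t) P(t->s), up to rounding, for every pair s != t."""
    states = list(itertools.product((-1, 1), repeat=n))
    q = 1.0 / len(states)  # probability of proposing any given configuration
    for s, t in itertools.permutations(states, 2):
        forward = math.exp(true_log_weight(s)) * q * stage_one(s, t) * stage_two(s, t)
        backward = math.exp(true_log_weight(t)) * q * stage_one(t, s) * stage_two(t, s)
        if not math.isclose(forward, backward, rel_tol=1e-9):
            return False
    return True


rng = random.Random(0)
state = (1, 1, 1, 1)
for _ in range(1000):
    state = global_move(state, rng)
print(state, check_detailed_balance())
```

The payoff would come from how often stage one rejects: every cheap rejection saves one evaluation of the exact weight, which for a global move is presumably the expensive part.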