# ALF issues
https://git.physik.uni-wuerzburg.de/ALF/ALF/-/issues

# Issue #208: rework hop_mod and wrapgrup and wrapgrdo
https://git.physik.uni-wuerzburg.de/ALF/ALF/-/issues/208 (updated 2021-12-08; Florian Goth)

# Issue #195: Automatic computation of Hopping_Matrix_Type%Multiplicity
https://git.physik.uni-wuerzburg.de/ALF/ALF/-/issues/195 (updated 2021-11-22; Francesco Parisen Toldin)

The field Hopping_Matrix_Type%Multiplicity contains the number of times a given orbital appears in the list of bonds. It could therefore be deduced from Hopping_Matrix_Type%List, which contains the bonds emanating from a unit cell.
It seems a good idea to have it computed automatically, instead of relying on the user to enter it, in particular if the user defines a new set of hoppings with a checkerboard decomposition. This will ensure consistency and avoid bugs that are potentially difficult to detect.
The algorithm to compute it is straightforward (just a loop over Hopping_Matrix_Type%List).
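In Python-like form the loop could look as follows (the (from, to) bond-pair layout is an assumption for illustration, not the actual Hopping_Matrix_Type%List layout):

```python
from collections import Counter

def compute_multiplicity(bond_list, n_orbitals):
    """Count how often each orbital appears as an endpoint in the bond list.

    bond_list is assumed (hypothetically) to be a sequence of
    (orbital_from, orbital_to) pairs, mirroring Hopping_Matrix_Type%List.
    """
    counts = Counter()
    for orb_from, orb_to in bond_list:
        counts[orb_from] += 1
        counts[orb_to] += 1
    return [counts[n] for n in range(n_orbitals)]

# Example: a two-orbital unit cell with three bonds.
bonds = [(0, 1), (0, 1), (1, 1)]
print(compute_multiplicity(bonds, 2))  # [2, 4]
```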
Concerning the design, I see some possibilities:
1. Define a procedure like this
```
Subroutine Set_Multiplicity(this)
  Type(Hopping_Matrix_Type), intent(INOUT) :: this
End Subroutine Set_Multiplicity
```
Pros: simplicity
Cons: again, we rely on the user to call it. And we have to document it properly.
2. Considering that, as far as I can see, Hopping_Matrix_Type%Multiplicity is only used in Predefined_Hoppings_set_OPT, we could remove the field Hopping_Matrix_Type%Multiplicity altogether and define a subroutine that computes it, something like
```
Subroutine Compute_Multiplicity(hop_matrix, multiplicity)
  Type(Hopping_Matrix_Type), intent(IN) :: hop_matrix
  Integer, intent(OUT) :: multiplicity(:)
End Subroutine Compute_Multiplicity
```
Pros: we completely remove the possibility of errors in Multiplicity. No need to document it.
Cons: at the moment Multiplicity is only used in one subroutine, presumably called only once. Should we use it in more routines, we may lose efficiency by recomputing it every time.
3. Let the object compute Multiplicity the first time it is needed, and cache the result. This is obtained by giving Hopping_Matrix_Type a method like this
```
Function Multiplicity(this, n)
  Implicit none
  Class(Hopping_Matrix_Type), intent(INOUT) :: this
  Integer, intent(IN) :: n
  Integer :: Multiplicity
  If (.not. this%is_cache_valid) then
     call this%Set_Multiplicity()
     this%is_cache_valid = .true.
  end if
  Multiplicity = this%m_Multiplicity(n)
End Function Multiplicity
```
Here, is_cache_valid is a boolean that signals whether the multiplicity has already been computed, and m_Multiplicity is where the data are stored. Both should, I think, be private components of the type.
Pros: This can be regarded as an "internal" method that we do not necessarily need to document. Good efficiency. We hide the actual implementation of Multiplicity without changing the syntax: Multiplicity(n) still gives the multiplicity of the n-th orbital.
Cons: Perhaps slightly more complex in the logic.
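The compute-on-first-use logic of option 3, sketched in Python for illustration (names mirror the Fortran sketch above; this is not ALF code):

```python
class HoppingMatrix:
    """Minimal sketch of option 3: lazy computation with a cache."""

    def __init__(self, bonds, n_orbitals):
        self.bonds = bonds
        self.n_orbitals = n_orbitals
        self._multiplicity = None   # the private cache (m_Multiplicity)
        self.n_computes = 0         # only to demonstrate single evaluation

    def multiplicity(self, n):
        # Compute on first use, then serve every later call from the cache.
        if self._multiplicity is None:
            self.n_computes += 1
            counts = [0] * self.n_orbitals
            for a, b in self.bonds:
                counts[a] += 1
                counts[b] += 1
            self._multiplicity = counts
        return self._multiplicity[n]

hop = HoppingMatrix([(0, 1), (1, 1)], 2)
print(hop.multiplicity(1), hop.multiplicity(1))  # 3 3
print(hop.n_computes)  # 1 -- the loop ran only once
```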
I personally favor solution 3, but I am not entirely sure: does Fortran allow for this type of C++-like design?

# Issue #181: Reduce memory consumption on a node by sharing data between MPI tasks
https://git.physik.uni-wuerzburg.de/ALF/ALF/-/issues/181 (updated 2021-05-31; Johannes Hofmann)

Some memory objects are independent of the rank of the MPI task within a tempering group that shares all parameters. These can be shared between different MPI jobs on the same node and thus reduce the memory footprint of ALF.

# Issue #177: FFT for application of Op_T matrices
https://git.physik.uni-wuerzburg.de/ALF/ALF/-/issues/177 (updated 2021-03-24; Florian Goth, Fakher F. Assaad)

Fakher wants to use an FFT to optimize for circulant matrices.

# Issue #176: Block Diagonal Op_T
https://git.physik.uni-wuerzburg.de/ALF/ALF/-/issues/176 (updated 2020-11-30; Florian Goth, Fakher F. Assaad)

Fakher wants to automatically decompose block-diagonal matrices.

# Issue #175: Include MSCBDECOMP + higher order checkerboard
https://git.physik.uni-wuerzburg.de/ALF/ALF/-/issues/175 (updated 2021-01-06; Florian Goth)

- [x] include MSCBDECOMP
- [x] make an object for it
- [x] Higher order checkerboard
# Issue #174: More Use cases for the ContainerElementBase Object Hierarchy
https://git.physik.uni-wuerzburg.de/ALF/ALF/-/issues/174 (updated 2020-12-03; Florian Goth)
Now, with the ContainerElementBase Object Hierarchy in place and a dynamic container present that is able to take up these objects in some "dynamic" way, we can think about extensions.
## Approximate techniques
- [ ] Minimum Split Checkerboard + Higher Order Checkerboard #175
- [ ] Lanczos
## Exact techniques
- [ ] Block techniques #176
- [ ] FFT techniques
## Use cases in other places
- [ ] Op_V is not that much different. Get rid of the various types?

# Issue #167: Why is UDV%D a full complex number?
https://git.physik.uni-wuerzburg.de/ALF/ALF/-/issues/167 (updated 2020-10-26; Florian Goth)
@fassaad, @Hofmann:
UDV%D is still a full complex number. We calculate D from the return values of norm-related functions (Abs), which give positive real numbers. To my knowledge we never store anything complex in this array anyway, so we might want to optimize that away.

# Issue #114: block diagonal interaction terms
https://git.physik.uni-wuerzburg.de/ALF/ALF/-/issues/114 (updated 2019-09-29; Johannes Hofmann, Emilie Huffman)
This branch aims to impleme...Some models have a block diagonal structure in the interaction terms, similar to the already exploited N_fl structure. However, the non-interacting part may not be block diagonal such that N_fl cannot be used.
This branch aims to implement this structure!
I would propose to redefine the operator type to contain a list of the "dense" blocks of the operators, i.e. P(:) --> P(:,b), O(:,:) --> O(:,:,b), and so on. Here b is the index of the block and should appear as the last index for memory efficiency, as with n_fl. The rest of the code should then be updated accordingly.
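Why the block index should come last: Fortran arrays are column-major, so with b as the last index each block O(:,:,b) is one contiguous slab of memory, which is what dense linear-algebra kernels want. A quick numpy check with Fortran ordering (illustrative only):

```python
import numpy as np

# A 4x4 operator with 3 dense blocks, stored column-major (Fortran order),
# with the block index b as the last dimension, as proposed for O(:,:,b).
O = np.zeros((4, 4, 3), order='F')

# Each block O[:, :, b] is then a contiguous piece of memory.
block = O[:, :, 1]
print(block.flags['F_CONTIGUOUS'])  # True

# With the block index first instead, a block slice is strided, not contiguous.
O_bad = np.zeros((3, 4, 4), order='F')
print(O_bad[1, :, :].flags['F_CONTIGUOUS'])  # False
```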
It might also be convenient to use the number of blocks as an optional argument when the operator is allocated, ideally in a way that existing models without that structure can stay as they are. This way, the more experienced user can make use of this option, whereas a "new" user doesn't have to care about it in the beginning.

# Issue #113: Cholesky vs maxent
https://git.physik.uni-wuerzburg.de/ALF/ALF/-/issues/113 (updated 2022-05-30; Florian Goth)

You have to touch xqmc1 and Vhelp.

# Issue #103: Suggestion: increment N_auto in powers of two to speed up autocorrelation analysis
https://git.physik.uni-wuerzburg.de/ALF/ALF/-/issues/103 (updated 2019-05-21; Martin Hohenadler)

The documentation points out that the autocorrelation analysis is expensive. However, there is very little use in analyzing and including every bin length between 1 and 500 (in the example of Fig. 1).
It is common in the literature to increase N_auto by a factor of two instead.

# Issue #94: automatically generate zslgemm and zslhemm with arbitrary N
https://git.physik.uni-wuerzburg.de/ALF/ALF/-/issues/94 (updated 2019-02-07; Jonas Schwab)

Create Libraries/Modules/Mat_subroutines.f90 with a script during compilation, with arbitrarily high N.

# Issue #91: Test the compact WY decomposition in real life
https://git.physik.uni-wuerzburg.de/ALF/ALF/-/issues/91 (updated 2018-10-01; Florian Goth)

Make a branch from Johannes' branch and introduce xGEQRT into udv_state in the case of the logscale, where permutations are disabled anyway.

# Issue #90: PLASMAAlf
https://git.physik.uni-wuerzburg.de/ALF/ALF/-/issues/90 (updated 2020-09-04; Florian Goth)

Plasma,
https://bitbucket.org/icl/plasma
seems to have a superior QR decomposition that is highly parallelizable.
Let's see where this leads.
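For context, the QR-based UDV factorizations such a library would accelerate can be sketched in numpy as follows (illustrative only, not ALF's actual stabilization code; note that D comes out real and positive, cf. issue #167):

```python
import numpy as np

def udv_decompose(M):
    """Illustrative QR-based UDV factorization: M = U @ diag(D) @ V.

    U is unitary, D holds positive real scales, and V is the remaining
    well-conditioned factor. This is a sketch of the general technique,
    not the routine used in udv_state.
    """
    U, R = np.linalg.qr(M)
    D = np.abs(np.diag(R))       # positive real scales
    V = R / D[:, None]           # divide each row of R by its scale
    return U, D, V

rng = np.random.default_rng(0)
M = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
U, D, V = udv_decompose(M)
print(np.allclose(U @ np.diag(D) @ V, M))  # True
```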
# Issue #82: Introduce Type 3 for continuous field operators
https://git.physik.uni-wuerzburg.de/ALF/ALF/-/issues/82 (updated 2019-09-25; Johannes Hofmann)

Here we will first upgrade the integer nsigma storage of the field (Ising and auxiliary) to double, such that it can also accommodate a continuous-field implementation. This will also require an update of the operator module to handle the fields accordingly.
Second, the implementation of updates of the continuous fields using HMC has to follow.

# Issue #75: Use a logscale for the QMC intrinsic scales
https://git.physik.uni-wuerzburg.de/ALF/ALF/-/issues/75 (updated 2017-11-01; Johannes Hofmann)

I ran into double overflow and underflow issues which severely limited the range of beta*V for my model. This problem gets solved in this issue as long as beta*V (or beta*bandwidth for weakly interacting Hubbard models) is smaller than ~10^300.

# Issue #74: Optimization for hopping operators
https://git.physik.uni-wuerzburg.de/ALF/ALF/-/issues/74 (updated 2017-09-12; Johannes Hofmann)

It might be useful to specialize hopping operators, as many of us simulate Ising fields coupled to bond hopping, or hopping operators squared in Kondo-type couplings.

# Issue #73: introduce generic function to multiply a small matrix onto a larger one
https://git.physik.uni-wuerzburg.de/ALF/ALF/-/issues/73 (updated 2017-10-30; Johannes Hofmann)

During a discussion with Florian, we saw some potential benefit in introducing the following function:
```
ZSLMM(char side, char op, int N, int M, complex A(:,:), int P(:), complex B(:,:))
```
This function should multiply the (small) matrix op(A) [A; A^T; A^C] onto the subblock specified by the indices in P(:) of the large matrix B, from the left (side='L') or right (side='R').
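A sketch of the intended semantics in numpy (illustrative only; argument handling and conventions of the eventual Fortran routine may differ):

```python
import numpy as np

def zslmm(side, op, A, P, B):
    """Multiply the small matrix op(A) onto the rows/columns of B listed in P.

    side: 'L' applies op(A) from the left to the sub-rows B[P, :],
          'R' applies it from the right to the sub-columns B[:, P].
    op:   'N' -> A, 'T' -> A transposed, 'C' -> A conjugate-transposed.
    B is updated in place, as the Fortran routine presumably would do.
    """
    opA = {'N': A, 'T': A.T, 'C': A.conj().T}[op]
    if side == 'L':
        B[P, :] = opA @ B[P, :]
    else:
        B[:, P] = B[:, P] @ opA
    return B

A = np.array([[0.0, 1.0], [1.0, 0.0]])   # small 2x2 swap matrix
B = np.arange(9.0).reshape(3, 3)
zslmm('L', 'N', A, [0, 2], B)            # rows 0 and 2 of B are now swapped
print(B)
```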
This definition is general enough to treat (almost) every operator-matrix multiplication and also provides a well-defined area for optimizations/specializations.

# Issue #65: Filter Branch 59
https://git.physik.uni-wuerzburg.de/ALF/ALF/-/issues/65 (updated 2017-10-19; Florian Goth)
Branch 59 contains stuff that works and stuff that doesn't work.
The simple optimizations for the Hubbard model are among the good things; the fiddling with the QR-related routines is among the things that don't make things go faster. We want to keep branch 59 for reference.

# Issue #62: clALF
https://git.physik.uni-wuerzburg.de/ALF/ALF/-/issues/62 (updated 2017-06-22; Florian Goth)
So we'd like to use ALF on certain accelerators. The simplest solution would be to use AMD's ACML:
http://developer.amd.com/tools-and-sdks/archive/compute/amd-core-math-library-acml/acml-downloads-resources/
This library provides a full LAPACK and BLAS implementation with a working Fortran interface that can automatically use external accelerators via OpenCL. Sadly, AMD has only two profiles, for two of their GPUs from around 2014, and I could not get it to work satisfactorily.
The next idea is to use clMAGMA from the MAGMA initiative:
http://icl.cs.utk.edu/magma/
The most recent version has a working Fortran interface (I suppose), support for sparse vector operations, and is maintained by a similar set of people as the reference LAPACK. But it essentially only supports CUDA.
There is magmaMIC, which can utilize Xeon Phis, and there's clMAGMA, which uses OpenCL as a backend.
Sadly, when the authors tried to add a Fortran interface, they found out that there would be some work involved. So this is also out.
There is ViennaCL:
http://viennacl.sourceforge.net/
This is C++ only, but it looks very powerful, especially for sparse operations.
Since ALF spends its time mostly in low-level BLAS3 routines (ZHEMMs in my branch on the Hubbard model), we can get away with just trying to plug in a library that emulates the BLAS interface.
To my knowledge there is no library that provides a full Fortran interface.
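To make concrete what such a drop-in library has to reproduce: ZHEMM computes C = alpha*A*B + beta*C with A Hermitian. Its reference semantics in numpy (a correctness yardstick only, not an offloading implementation):

```python
import numpy as np

def zhemm_reference(alpha, A, B, beta, C, side='L'):
    """Reference semantics of BLAS ZHEMM, with A Hermitian:
    side='L': C := alpha*A@B + beta*C
    side='R': C := alpha*B@A + beta*C
    Any OpenCL BLAS we plug in has to reproduce exactly this result.
    """
    A = (A + A.conj().T) / 2          # assume/enforce a Hermitian A
    if side == 'L':
        return alpha * (A @ B) + beta * C
    return alpha * (B @ A) + beta * C

rng = np.random.default_rng(1)
n = 3
A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
A = A + A.conj().T                    # make A Hermitian
B = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
C = np.zeros((n, n), dtype=complex)
C = zhemm_reference(1.0, A, B, 0.0, C)
print(np.allclose(C, A @ B))  # True
```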
If we go on to write our own wrappers there are two contenders:
clBLAS: https://github.com/clMathLibraries/clBLAS
This was a part of AMD's ACML and is now open-sourced and seems to be somewhat maintained.
and TomTom recently released CLBlast:
https://arxiv.org/abs/1705.05249 , https://cnugteren.github.io/clblast/clblast.html
It is very new, and coming from outside of HPC it puts a lot of effort into ensuring portability; it also has a Netlib.org LAPACK interface that can almost be linked against from Fortran.
So for now I will try to see whether clBLAS works and whether I can offload ZHEMM calls...
First experiences with clBLAS:
Adding the ZHEMM call is now finished. This works and gives correct results.
For now I could only test execution on a CPU (i7-2600). The multiplication is automatically parallelized but oversubscribes my CPU with ~8 threads. This would be OK, but the runtime is 5 times longer than plain single-thread execution...
Some numbers:

| Machine | Lattice | master | clBLAS | CLBlast |
|---------|---------|--------|--------|---------|
| core-i7 920 | 8x8 | 13s | 97s (up to 4 threads) | 97s (~1.5 threads effectively used) |
| core-i7 920 | 12x12 | 136s | 415s (up to 4 threads) | 171s (~2.5 threads) |
| core-i7 920 | 16x16 | 776s (single thread) | n/a | 545s (~4 threads used well) |
| core-i7 2600 | 20x20 | 357s | 880s | n/a |
For now I conclude that clBLAS is an AMD-GPU-only solution.
The numbers didn't change much when using CLBlast's built-in auto-tuner for my CPU.