SPIVET installation and tests

Attempt to install SPIVET on CentOS 7 via Anaconda

Dependencies

Create a python 2 env

conda create -n py27_new python=2.7 ipykernel
conda activate py27_new
conda install -c conda-forge backports.functools_lru_cache
python -m ipykernel install --user --name py27_new

python packages

Matplotlib may have conflict with package (maybe vtk), so install that first:
conda install -c conda-forge matplotlib
Then general packages:

conda install -c anaconda numpy scipy netcdf4 pillow
conda install -c conda-forge vtk

Since the PIL package required by SPIVET is not maintained any more, nor can it be installed via pip or conda, we install pillow instead.
pillow is a fork of PIL, it does not support import Image since v1.0 (now v7.2), so we need to manually modify all import Image to
from PIL import Image
This is similar for modules like ImageFilter, ImageOps.
We also need to add the correct path in setup.py for the pkgs:

include_dirs = [
    ibpath,
    ibpath + '/..',# for dependencies like netcdf.h
    lbpath + '/numpy/core/include',
    lbpath + '/numpy/numarray',
    'lib/pivlib/exodusII',
    #'/System/Library/Frameworks/Accelerate.framework/Versions/Current/Frameworks/vecLib.framework/Headers',
    '/usr/include' # for lapack/blas
]

LAPACK and BLAS

SPIVET use LAPACK and BLAS via clapack.h, cblas.h, and it seems to be from Accelerate.framework (vecLib) under Mac by default, since it includes:

typedef __CLPK_integer lpk_int;

which does not exist in lapack or atlas package under CentOS.
On the other hand, MKL uses mkl_lapack.h, mkl_blas.h, and more variables/types are not in common.

Attempt to use Atlas

Atlas is more optimized than lapack. In its clapack.h it defined ATL_INT instead of __CLPK_integer.
These are the file structures of atlas and lapack, they can co-exist:
https://centos.pkgs.org/7/centos-x86_64/lapack-devel-3.4.2-8.el7.x86_64.rpm.html
https://centos.pkgs.org/7/centos-x86_64/lapack-3.4.2-8.el7.x86_64.rpm.html
https://centos.pkgs.org/7/centos-x86_64/atlas-devel-3.10.1-12.el7.x86_64.rpm.html
https://centos.pkgs.org/7/centos-x86_64/atlas-3.10.1-12.el7.x86_64.rpm.html

Install Atlas:

yum install -y atlas atlas-devel

replace all __CLPK_integer to ATL_INT in SPIVET.
Now under root dir of SPIVET:

python setup.py build
python setup.py install

And then SPIVET can be installed under this python env of anaconda.
Test:

cd tests/
python run_tests.py

We can pass some tests, but fail some important ones:

Ran 149 tests in 23.684s

FAILED (failures=1, errors=37)
======================================================================
EXODUSII Tests Run: 123, Failed 0

use lapacke for lapack instead of atlas

Install lapacke:

yum install -y lapack lapack-devel

lapacke uses lapacke/lapacke.h under /usr/include, lapack_int instead of __CLPK_integer.
Do the changes accordingly, the tests give identical results:

Ran 149 tests in 23.709s

FAILED (failures=1, errors=37)
======================================================================
EXODUSII Tests Run: 123, Failed 0

Use MKL instead of Atlas

Change:
setup.py:

include_dirs = [
    ibpath,
    ibpath + '/..',
    lbpath + '/numpy/core/include',
    lbpath + '/numpy/numarray',
    'lib/pivlib/exodusII',
    #'/System/Library/Frameworks/Accelerate.framework/Versions/Current/Frameworks/vecLib.framework/Headers',
    '/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl/include'
]

In lib/spivet/pivlib/pivlibc.c:

#include <mkl_lapack.h>
#include <mkl_cblas.h>


//#include <lapacke/lapacke.h>
//#include <cblas.h>

typedef MKL_INT lpk_int;
//typedef __CLPK_integer lpk_int;

In lib/spivet/flolib/floftlec.c:

#include <mkl_lapack.h>

//#include <clapack.h>
//#include <lapacke/lapacke.h>

typedef MKL_INT lpk_int;
//typedef lapack_int lpk_int;
//typedef __CLPK_integer lpk_int;

build, install,test:

Ran 149 tests in 23.353s

FAILED (failures=1, errors=37)
======================================================================
EXODUSII Tests Run: 123, Failed 0

It is probably because the deprecated spivet code it self.

Attempt to solve errors in the code one by one

Most common type:

File "/home/xbao/.conda/envs/py27_new/lib/python2.7/site-packages/spivet/pivlib/pivdata.py", line 1116, in __setitem__
    if ( self.m_eshape == None ):
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

Fix: change == to is, != to is not
https://stackoverflow.com/questions/23086383/how-to-test-nonetype-in-python/23086405

related lines:

  • lib/spivet/pivlib/pivdata.py: 797,1116,1317
  • lib/spivet/pivlib/pivir.py: 114, 140, 226, 498
  • lib/spivet/pivlib/pivsim.py: 510
  • A lot in lib/spivet/: pivlib/pivof.py tlclib/tlcutil.py steputil.py steps.py pivlib/pivutil.py pivlib/pivsim.py pivlib/pivdata.py pivlib/pivpickle.py flolib/flotrace.py pivlib/pivpg.py pivlib/pivpgcal.py flolib/flotrace.py flolib/floftle.py pivlib/pivpost.py pivlib/pivir.py tlclib/tlctf.py(== None, != None replace using %s)
    Now errors reduce to 4!

Second type:

    po.GetOutput().Update()
AttributeError: 'vtkCommonDataModelPython.vtkPolyData' object has no attribute 'Update'

  File "/home/xbao/.conda/envs/py27_new/lib/python2.7/site-packages/spivet/pivlib/pivsim.py", line 840, in dump2vtk
    dw.SetInput(rpd)
AttributeError: 'vtkIOLegacyPython.vtkPolyDataWriter' object has no attribute 'SetInput'

  File "/home/xbao/.conda/envs/py27_new/lib/python2.7/site-packages/spivet/pivlib/pivsim.py", line 504, in getVTKRep
    apd.AddInput(pd)
AttributeError: 'vtkFiltersCorePython.vtkAppendPolyData' object has no attribute 'AddInput'

This is due to incompatible updates of VTK6:
https://vtk.org/Wiki/VTK/VTK_6_Migration/Replacement_of_SetInput#Replacement_of_SetInput.28.29_with_SetInputData.28.29_and_SetInputConnection.28.29
https://stackoverflow.com/questions/47776121/vtk-changes-for-getimage-and-update

Fix:

  • lib/spivet/pivlib/pivsim.py:
    313,322,471,480,587,596: po.GetOutput().Update()-->po.Update()
    349,500,510,1441,1446: dw.SetInputData(rpd) ---> dw.SetInputData(rpd)
    353,354,504,505,600: apd.AddInput ---> apd.AddInputData

More Errors & Warnings

Warning: will check in the end

testCalibration (pivpg_test.test_pivpg) ... /home/xbao/.conda/envs/py27_new/lib/python2.7/site-packages/scipy/optimize/minpack.py:447: RuntimeWarning: Number of calls to function has reached maxfev = 1000.
  warnings.warn(errors[info][0], RuntimeWarning)
ok

Error:

testOctreeVTKINode (pivsim_test.testSimOctree) ... I am here: test-output/octree
ERROR
testoctreeVTKLNode (pivsim_test.testSimOctree) ... ERROR
testOutput (pivsim_test.testTraceRectangle) ... ERROR
testOutput (pivsim_test.testTraceCylinder) ... ERROR
testImagesMatch (pivsim_test.testTraceBitmapRectangle) ... Segmentation fault (core dumped)

The summary report is not available due to segmentation fault.
Comment out line 1124 in pivsim_test.py

suite.addTest( unittest.makeSuite( testTraceBitmapRectangle  ) )

rerun the tests

Currently passed tests:

    suite.addTest( unittest.makeSuite( testSimObjectSetup        ) )
    suite.addTest( unittest.makeSuite( testSimObjectTransforms   ) )
    suite.addTest( unittest.makeSuite( testSimRay                ) )
    suite.addTest( unittest.makeSuite( testSimCylindricalSurface ) )
    suite.addTest( unittest.makeSuite( testSimRectPlanarSurface  ) )
    suite.addTest( unittest.makeSuite( testSimCircPlanarSurface  ) )
    suite.addTest( unittest.makeSuite( testSimLeafNode           ) )
    //suite.addTest( unittest.makeSuite( testSimOctree             ) )
    suite.addTest( unittest.makeSuite( testSimRefractiveObject   ) )
    suite.addTest( unittest.makeSuite( testSimCamera             ) )
    suite.addTest( unittest.makeSuite( testSimLight              ) )
    suite.addTest( unittest.makeSuite( testSimEnv                ) )
    //suite.addTest( unittest.makeSuite( testTraceRectangle        ) )
    //suite.addTest( unittest.makeSuite( testTraceCylinder         ) )
    //suite.addTest( unittest.makeSuite( testTraceBitmapRectangle  ) )
    //suite.addTest( unittest.makeSuite( testTraceSurfaceRender    ) )

Currently failed tests:

    suite.addTest( unittest.makeSuite( testSimOctree             ) )
    suite.addTest( unittest.makeSuite( testTraceRectangle        ) )
    suite.addTest( unittest.makeSuite( testTraceCylinder         ) )
    suite.addTest( unittest.makeSuite( testTraceBitmapRectangle  ) )
    suite.addTest( unittest.makeSuite( testTraceSurfaceRender    ) )

They are all related to vtk (dump2vtk).

======================================================================
ERROR: testOctreeVTKINode (pivsim_test.testSimOctree)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/xbao/SPIVET-modified/tests/pivsim_test.py", line 583, in testOctreeVTKINode
    d = abs(tpts -kpts) < self.eps
ValueError: operands could not be broadcast together with shapes (0,3) (600,3) 

======================================================================
ERROR: testoctreeVTKLNode (pivsim_test.testSimOctree)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/xbao/SPIVET-modified/tests/pivsim_test.py", line 595, in testoctreeVTKLNode
    d = abs(tpts -kpts) < self.eps
ValueError: operands could not be broadcast together with shapes (0,3) (1890,3) 

======================================================================
ERROR: testOutput (pivsim_test.testTraceRectangle)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/xbao/SPIVET-modified/tests/pivsim_test.py", line 925, in testOutput
    d = abs(tpts -kpts) < self.eps
ValueError: operands could not be broadcast together with shapes (0,3) (48,3) 

======================================================================
ERROR: testOutput (pivsim_test.testTraceCylinder)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/xbao/SPIVET-modified/tests/pivsim_test.py", line 976, in testOutput
    d = abs(tpts -kpts) < self.eps
ValueError: operands could not be broadcast together with shapes (15,3) (12,3) 

======================================================================
FAIL: testImagesMatch (pivsim_test.testTraceSurfaceRender)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/xbao/SPIVET-modified/tests/pivsim_test.py", line 1105, in testImagesMatch
    self.assertEqual(tm.hexdigest(),km.hexdigest())
AssertionError: '57cfefd1c6ac988f11faf064e7e3939f' != '784a96839ef2d92fefce8cea9f807733'



testImagesMatch (pivsim_test.testTraceBitmapRectangle) ... Segmentation fault (core dumped)

The segmentation fault is due to python path in the C extension lib/pivlib/pivsimc.c somehow does not include the path we need.
After line 1602, add:

PyRun_SimpleString("import sys;sys.path.append('/home/xbao/.conda/envs/py27_new/lib/python2.7/site-packages/spivet/pivlib')");
  mod     = PyImport_ImportModule("pivutil");
  if (mod==NULL){
        printf("%s\n","mod null!");
        PyErr_Print();
        fflush(stdout);}

PyErr_Print will report no module named pivutil if we do not add that path.
We will replace /home/xbao/.conda/envs/py27_new/lib/python2.7/site-packages/ to
ibpath = distutils.sysconfig.get_python_inc() later
After this correction the error turns into FAIL:

FAIL: testImagesMatch (pivsim_test.testTraceBitmapRectangle)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/xbao/SPIVET-mkl/tests/pivsim_test.py", line 1047, in testImagesMatch
    self.assertEqual(tm.hexdigest(),km.hexdigest())
AssertionError: '9b874624537e026d7dc1394e283ad3eb' != '62e3ea406f87a374587082f0cfc5b02f'

The rest of the errors continue to show up even if we modified the code to use the updated api of vtk6 (and later version). We do not know the intermediate output, and the desperate debugging part can be really time-consuming. It is therefore a good idea to try older version of vtk instead.

Attempt to install vtk 5.10

clone the conda env py27_piv and uninstall vtk7 from conda source.

conda create --clone py27_new --name py27_bak
conda uninstall vtk

get vtk5.10 from gitlab
https://gitlab.kitware.com/vtk/vtk/-/archive/v5.10.1/vtk-v5.10.1.zip
8.3.1 is too new, will report lots of error like

CMake Error at CMake/vtkCompilerExtras.cmake:40 (if):   if given arguments:      "gcc (GCC) 8.3.1 20190311 (Red Hat 8.3.1-3)    Copyright (C) 2018 Free Software Foundation, Inc.    This is free software" " see the source for copying conditions.  There is   NO    warranty" " not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR   PURPOSE." "VERSION_GREATER" "4.2.0" "AND" "BUILD_SHARED_LIBS" "AND"   "HAVE_GCC_VISIBILITY" "AND" "VTK_USE_GCC_VISIBILITY" "AND" "NOT" "MINGW"   "AND" "NOT" "CYGWIN"    Unknown arguments specified Call Stack (most recent call first):   CMakeLists.txt:73 (INCLUDE)

switch to gcc 4.8.5

module load gcc/8.3.1
module unload gcc/8.3.1

cmake(or edit CMakeList.txt):
https://stackoverflow.com/questions/28761702/getting-error-glintptr-has-not-been-declared-when-building-vtk-on-linux
https://vtk.org/Wiki/VTK/Configure_and_Build
https://unix.stackexchange.com/questions/306682/how-to-install-vtk-with-python-wrapper-on-red-hat-enterprise-linux-rhel
https://cmake.org/cmake/help/v3.0/module/FindPythonLibs.html
https://groups.google.com/forum/#!msg/vmtk-users/3jrKrW7qWZA/ejWDwuotOGAJ
http://vtk.1045678.n5.nabble.com/Compiling-VTK-Python-from-source-td5746733.html

cmake -DBUILD_SHARED_LIBS=ON -DBUILD_TESTING=ON -DCMAKE_BUILD_TYPE=Release -DVTK_WRAP_PYTHON=ON -DCMAKE_C_FLAGS=-DGLX_GLXEXT_LEGACY -DCMAKE_CXX_FLAGS=-DGLX_GLXEXT_LEGACY -DCMAKE_INSTALL_PREFIX=/home/app/vtk5 -Wno-dev \
-DPYTHON_INCLUDE_DIR=/home/xbao/.conda/envs/py27_bak/include/python2.7 \
-DPYTHON_LIBRARY=/home/xbao/.conda/envs/py27_bak/lib/libpython2.7.so \
../vtk-v5.10.1

Then

make -j 56
make install

some error will show up related to __cxa_throw_bad_array_new_length, we need gcc 4.9 or newer :
https://stackoverflow.com/questions/29230777/program-linking-fails-when-using-custom-built-gcc

As root, install gcc 4.9.2 with gcc 4.8.5 [due-to-c11-errors, we cannot use newer gcc ] (https://stackoverflow.com/questions/41204632/unable-to-build-gcc-due-to-c11-errors)

https://gist.github.com/craigminihan/b23c06afd9073ec32e0c

sudo yum install libmpc-devel mpfr-devel gmp-devel
#in /usr/syssoft/gcc4.9
curl ftp://ftp.mirrorservice.org/sites/sourceware.org/pub/gcc/releases/gcc-4.9.2/gcc-4.9.2.tar.bz2 -O
tar xvfj gcc-4.9.2.tar.bz2
cd gcc-4.9.2
md build
../configure  --enable-languages=c,c++,fortran --prefix=/usr/local/gcc4.9.2 --disable-multilib
make -j 56
make install

Write a module file like what we did for gcc 8.3.1 in /etc/modulefiles/gcc/4.9.2

#%Module 1.0
#
#  gcc4.9.2 module built from source with system python3:
#
unsetenv COMP_WORDBREAKS;
prepend-path PATH {/usr/local/gcc4.9.2/bin};
prepend-path MANPATH {/usr/local/gcc4.9.2/share/man};
#append-path PERL5LIB {/opt/rh/devtoolset-8/root//usr/lib64/perl5/vendor_perl};
#append-path PERL5LIB {/opt/rh/devtoolset-8/root/usr/lib/perl5};
#append-path PERL5LIB {/opt/rh/devtoolset-8/root//usr/share/perl5/vendor_perl};
prepend-path LD_LIBRARY_PATH {/usr/local/gcc4.9.2/lib};
prepend-path LD_LIBRARY_PATH {/usr/local/gcc4.9.2/lib64};
#prepend-path LD_LIBRARY_PATH {/opt/rh/devtoolset-8/root/usr/lib/dyninst};
#prepend-path LD_LIBRARY_PATH {/opt/rh/devtoolset-8/root/usr/lib64/dyninst};
prepend-path LD_LIBRARY_PATH {/usr/local/gcc4.9.2/lib};
prepend-path LD_LIBRARY_PATH {/usr/local/gcc4.9.2/lib64};
#prepend-path PKG_CONFIG_PATH {/usr/local/gcc4.9.2/lib64/pkgconfig};
prepend-path INFOPATH {/usr/local/gcc4.9.2/share/info};
#prepend-path PYTHONPATH {/opt/rh/devtoolset-8/root/usr/lib/python2.7/site-packages};
#prepend-path PYTHONPATH {/opt/rh/devtoolset-8/root/usr/lib64/python2.7/site-packages};

Test:

$ gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/local/gcc4.9.2/libexec/gcc/x86_64-unknown-linux-gnu/4.9.2/lto-wrapper
Target: x86_64-unknown-linux-gnu
Configured with: ../configure --enable-languages=c,c++,fortran --prefix=/usr/local/gcc4.9.2 --disable-multilib
Thread model: posix
gcc version 4.9.2 (GCC)

When cmake and make the same ABI error appears.(ABI1.3.9 is required, not present in libstdc++.so.6.0.19 )
Failed attempt
export CXXFLAGS='-D_GLIBCXX_USE_CXX11_ABI=0'
ln -s libstdc++.so.6.0.26 libstdc++.so.6 for /usr/lib64 or gcc4.9.2/lib64

Now install gcc5.5.0 similar to 4.9.2 before.
In its /usr/local/gcc5.5.0/lib64:

strings libstdc++.so.6|grep CXXABI
CXXABI_1.3
CXXABI_1.3.1
CXXABI_1.3.2
CXXABI_1.3.3
CXXABI_1.3.4
CXXABI_1.3.5
CXXABI_1.3.6
CXXABI_1.3.7
CXXABI_1.3.8
CXXABI_1.3.9
CXXABI_TM_1
CXXABI_FLOAT128
CXXABI_1.3
CXXABI_1.3.2
CXXABI_1.3.6
CXXABI_FLOAT128
CXXABI_1.3.9
CXXABI_1.3.1
CXXABI_1.3.5
CXXABI_1.3.8
CXXABI_1.3.4
CXXABI_TM_1
CXXABI_1.3.7
CXXABI_1.3.3

Now
Now in vtk-build dir, under py27_bak:
(Note that CMAKE_INSTALL_PREFIX should contain bin/python, not a path like /home/app/vtk5. You may need to add vtk path to path manually if the prefix path is not correct. http://blog.sina.com.cn/s/blog_62746020010125e6.html)

module load gcc/5.5.0
cmake -DBUILD_SHARED_LIBS=ON -DBUILD_TESTING=ON -DCMAKE_BUILD_TYPE=Release -DVTK_WRAP_PYTHON=ON \
-DCMAKE_C_FLAGS=-DGLX_GLXEXT_LEGACY -DCMAKE_CXX_FLAGS=-DGLX_GLXEXT_LEGACY \
-DCMAKE_INSTALL_PREFIX=/home/xbao/.conda/envs/py27_bak/ -Wno-dev \
-DPYTHON_INCLUDE_DIR=/home/xbao/.conda/envs/py27_bak/include/python2.7 \
-DPYTHON_LIBRARY=/home/xbao/.conda/envs/py27_bak/lib/libpython2.7.so \
../vtk-v5.10.1
make -j 112
make install

And it works!

#The last-line-output of make
[100%] Built target vtkpython_pyc

The cmake version is 3.14.6.
(Use sudo alternatives --config cmake to switch!)
Test in python

$python
Python 2.7.15 | packaged by conda-forge | (default, Mar  5 2020, 14:56:06)
[GCC 7.3.0] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import vtk
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/xbao/.conda/envs/py27_bak/lib/python2.7/site-packages/VTK-5.10.1-py2.7.egg/vtk/__init__.py", line 41, in <module>
    from vtkCommonPython import *
ImportError: libvtkCommonPythonD.so.5.10: cannot open shared object file: No such file or directory

We need to set LD_LIBRARY_PATH to solve this issue. We want vtk5 just for this conda env, follow this link, we can find the env dir, and

cd /home/xbao/.conda/envs/py27_bak/
touch ./etc/conda/activate.d/env_vars.sh
touch ./etc/conda/deactivate.d/env_vars.sh

for activate:

#!/bin/bash
export OLD_LD_LIBRARY_PATH=${LD_LIBRARY_PATH}
export LD_LIBRARY_PATH=/home/xbao/.conda/envs/py27_bak/lib/vtk-5.10/:${LD_LIBRARY_PATH}

for deactivate:

#!/bin/bash
export LD_LIBRARY_PATH=${OLD_LD_LIBRARY_PATH}
unset OLD_LD_LIBRARY_PATH

Open a new terminal and enter py_bak env, and verify the vtk can be imported.

it seems that does not work jupyter notebook at least for python2
https://github.com/Anaconda-Platform/nb_conda_kernels/issues/54
https://stackoverflow.com/questions/37890898/how-to-set-env-variable-in-jupyter-notebook
A temporary fix is to add them in jupyter cells

import os
os.environ['LD_LIBRARY_PATH']=....

or
https://github.com/jupyter/notebook/issues/3704

Or the best way: modify the kernel file:

jupyter kernelspec list
cd /home/xbao/.local/share/jupyter/kernels/py27_bak
vi kernel.json

Example:
{
"display_name": "py27_bak",
"language": "python",
"argv": [
"/home/xbao/.conda/envs/py27_bak/bin/python",
"-m",
"ipykernel_launcher",
"-f",
"{connection_file}"
],
"env": {"LD_LIBRARY_PATH":"/home/xbao/.conda/envs/py27_bak/lib/vtk-5.10/:${LD_LIBRARY_PATH}"}
}

restart the kernel and import vtk now works!

Now use the original pivsim.py (except in line 510 != to is not), build, install spivet and rerun the pivsim tests, we can pass more:

testOctreeVTKINode (pivsim_test.testSimOctree) ... I am here: test-output/octree
ok
testoctreeVTKLNode (pivsim_test.testSimOctree) ... ok
testCreate (pivsim_test.testSimRefractiveObject) ... ok
testSetOrientation (pivsim_test.testSimRefractiveObject) ... ok
testSetPosition (pivsim_test.testSimRefractiveObject) ... ok
testCenterRayHeading (pivsim_test.testSimCamera) ... ok
testLeftRayHeading (pivsim_test.testSimCamera) ... ok
testRayCount (pivsim_test.testSimCamera) ... ok
testRightRayHeading (pivsim_test.testSimCamera) ... ok
testSource (pivsim_test.testSimCamera) ... ok
testIRayHeading (pivsim_test.testSimLight) ... ok
testIRaySource (pivsim_test.testSimLight) ... ok
testLocalIRayHeading (pivsim_test.testSimLight) ... ok
testLocalIRaySource (pivsim_test.testSimLight) ... ok
testExitingRefraction1 (pivsim_test.testSimEnv) ... ok
testExitingRefraction2 (pivsim_test.testSimEnv) ... ok
testNormalIncidence (pivsim_test.testSimEnv) ... ok
testRefraction1 (pivsim_test.testSimEnv) ... ok
testRefraction2 (pivsim_test.testSimEnv) ... ok
testTotalInternalReflection (pivsim_test.testSimEnv) ... ok
testOutput (pivsim_test.testTraceRectangle) ... ok
testOutput (pivsim_test.testTraceCylinder) ... ERROR
testImagesMatch (pivsim_test.testTraceBitmapRectangle) ... FAIL
testImagesMatch (pivsim_test.testTraceSurfaceRender) ... FAIL

======================================================================
ERROR: testOutput (pivsim_test.testTraceCylinder)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/xbao/SPIVET-mkl/tests/pivsim_test.py", line 976, in testOutput
    d = abs(tpts -kpts) < self.eps
ValueError: operands could not be broadcast together with shapes (15,3) (12,3)

======================================================================
FAIL: testImagesMatch (pivsim_test.testTraceBitmapRectangle)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/xbao/SPIVET-mkl/tests/pivsim_test.py", line 1047, in testImagesMatch
    self.assertEqual(tm.hexdigest(),km.hexdigest())
AssertionError: '9b874624537e026d7dc1394e283ad3eb' != '62e3ea406f87a374587082f0cfc5b02f'

======================================================================
FAIL: testImagesMatch (pivsim_test.testTraceSurfaceRender)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/xbao/SPIVET-mkl/tests/pivsim_test.py", line 1105, in testImagesMatch
    self.assertEqual(tm.hexdigest(),km.hexdigest())
AssertionError: '57cfefd1c6ac988f11faf064e7e3939f' != '784a96839ef2d92fefce8cea9f807733'

----------------------------------------------------------------------
Ran 58 tests in 1.844s

FAILED (failures=2, errors=1)

Try use vtk5.2 instead, failed to compile due to too many errors.(under different cmake, gcc, mpicc/gcc, etc.)

Go back to tests

Now go back to tests/test-output, actually the output figures for testImagesMatch (surface-render.png surface-img-render.png) look exactly the same as the benchmark results. We can verify this using
https://stackoverflow.com/questions/5132749/diff-an-image-using-imagemagick

sudo yum install -y imagemagick
compare surface-img-render.png ../data/surface-img-render-known.png -compose src diff.png
compare surface-render.png ../data/surface-render-known.png -compose src diff2.png

The two diff.png are transparent.
We can read the images as binary files using other tools:
cmp
vim&xxd

cmp -l surface-render.png ../data/surface-render-known.png
 37  53 313
 44 143 142
 60 314  14
 61 104   0
 62 254   0
 63 201   0
 64 304 377
 65 202 377
 66 121 142
 67   3  42
 68 107 326
 69  15 100
 70  34 142
 71  65   1
 72 160   0
 73 324   0
 74 300   0
 75 121 377
 76   3 377
 77 207 242
 78 212 272
 81  50   0
 82 376   0
 84 301 377
 85  13 377
 86 233 242
 87 347 272
 88 113 201
 93 111 377
 94 105 377
 95 116 242
 96 104 272
 97 256 201
 98 102   0
 99 140   0
100 202   0
cmp: EOF on surface-render.png
vim -b surface-render.png
#in vim
:%!xxd
#you will see
0000000: 8950 4e47 0d0a 1a0a 0000 000d 4948 4452  .PNG........IHDR
0000010: 0000 0050 0000 0014 0800 0000 003f b3e5  ...P.........?..
0000020: 0b00 0000 2b49 4441 5478 9c63 6420 1230  ....+IDATx.cd .0
0000030: b240 0033 0b3a 0389 c5c2 c2cc 44ac 81c4  .@.3.:......D...
0000040: 8251 0347 0d1c 3570 d4c0 5103 878a 8100  .Q.G..5p..Q.....
0000050: 28fe 00c1 0b9b e74b 0000 0000 4945 4e44  (......K....IEND
0000060: ae42 6082                                .B`.
vim -b ../data/surface-render-known.png
#in vim
:%!xxd
#you will see
0000000: 8950 4e47 0d0a 1a0a 0000 000d 4948 4452  .PNG........IHDR
0000010: 0000 0050 0000 0014 0800 0000 003f b3e5  ...P.........?..
0000020: 0b00 0000 cb49 4441 5478 9c62 6420 1230  .....IDATx.bd .0
0000030: b240 0033 0b3a 0389 c5c2 c20c 0000 00ff  .@.3.:..........
0000040: ff62 22d6 4062 0100 0000 ffff a2ba 8100  .b".@b..........
0000050: 0000 00ff ffa2 ba81 0000 0000 ffff a2ba  ................
0000060: 8100 0000 00ff ffa2 ba81 0000 0000 ffff  ................
0000070: a2ba 8100 0000 00ff ffa2 ba81 0000 0000  ................
0000080: ffff a2ba 8100 0000 00ff ffa2 ba81 0000  ................
0000090: 0000 ffff a2ba 8100 0000 00ff ffa2 ba81  ................
00000a0: 0000 0000 ffff a2ba 8100 0000 00ff ffa2  ................
00000b0: ba81 0000 0000 ffff a2ba 8100 0000 00ff  ................
00000c0: ffa2 ba81 0000 0000 ffff a2ba 8100 0000  ................
00000d0: 00ff ffa2 ba81 0000 0000 ffff a2ba 8100  ................
00000e0: 0000 00ff ffa2 ba81 0000 0000 ffff 0300  ................
00000f0: 28fe 00c1 10ff 986d 0000 0000 4945 4e44  (......m....IEND
0000100: ae42 6082                                .B`.

However, it is convenient to convert output and known to another format and compare:

convert surface-render.png surface-render.jpg
convert ../data/surface-render-known.png surface-render-known.jpg
cmp surface-render-known.jpg surface-render.jpg
convert surface-img-render.png surface-img-render.jpg
convert ../data/surface-img-render-known.png surface-img-render-known.jpg
cmp surface-img-render.jpg surface-img-render-known.jpg

No output, means they are exactly the same in terms of data. We have reason to believe this is purely because we replaced PIL with pillow.

The last error

ERROR: testOutput (pivsim_test.testTraceCylinder)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/xbao/SPIVET-mkl/tests/pivsim_test.py", line 976, in testOutput
    d = abs(tpts -kpts) < self.eps
ValueError: operands could not be broadcast together with shapes (15,3) (12,3)

However, we can pass the lnode test in testTraceCylinder, means the cylinder has been correctly created.
It is hard to debug using these numbers in vtk file only, we need to understand the physical image.
Load the lnode and ray vtk files to paraview:

image.png

Use Wireframe for lnode and surface for ray, we can see that the rays are not refracted at the second time:
image.png

Set breakpoints and we can locate this issue.
Line 960 in tests/pivsim_test.py, when we call se.image(), spivet starts to trace the rays and form an image at the camera.
This relates to line 848 in lib/spivet/pivlib/pivsim.py, the method image of class SimEnv. We then notice line 872 pivsimc.SimEnv_image(self,rays,chnli,maxrcnt), this calls the function SimEnv_image in the C extension lib/spivet/pivlib/pivsimc.c.
Line 2495 in pivsimc.c finds the intersection between the ray and the object of leaf node, i.e. the cylinder in this case. SimRefractiveObject_intersect at line 2068 find the intersection for all the objects in the scene, and also determine the next intersection using the nearest distance t between the last ray source point and intersections for surfaces of all objects. Line 2101 will calculate the function that finds intersections for a particular type of surface, and we care about SimCylindricalSurf_intersect at line 1118 for cylindrical surface. It turns out for the second refraction,

ptc=0,lsource=-0.00,-2.12,2.12,lhead=0.00,0.98,-0.18,lray->points=-0.00,-2.12,2.12
phv=0.98,-0.18,phs=1.000000
i=0,ptv[i]=8.88178e-16,phv[0]=9.84211e-01
ic=-2.121320,plv[0]=-2.121320,lhead[1]=0.984211,t=0.000000
i=1,ptv[i]=4.92660e+00,phv[0]=9.84211e-01
ic=2.727493,plv[0]=-2.121320,lhead[1]=0.984211,t=4.926600
this 3d t=0.00000
this 3d t=4.92660

so the zero distance t leads to the issue. We can either add

    if (t < PIVSIMC_DEPS) {
      continue;
    }

at line 1189, or better to modify line 1179-1181 from:

for ( i = 0; i < 2; i++ ) {
    if ( ptv[i] < 0. )
      continue;

to

  for ( i = 0; i < 2; i++ ) {
    if ( ptv[i] < PIVSIMC_DEPS )
      continue;

Now rebuild and install, run the test:

SimCylindricalSurf_intersect!!!!!!!!!!!!!!!!!!!!!!!!!!!!
ptc=0,lsource=-0.00,-2.12,2.12,lhead=0.00,0.98,-0.18,lray->points=-0.00,-2.12,2.12
phv=0.98,-0.18,phs=1.000000
i=1,ptv[i]=4.92660e+00,phv[0]=9.84211e-01
ic=2.727493,plv[0]=-2.121320,lhead[1]=0.984211,t=4.926600
this 3d t=4.92660
......
testOutput (pivsim_test.testTraceCylinder) ... STARTING: SimEnv.image
 | CAM 0
 | Initializing rays.
 | Tracing ...
 | EXITING: SimEnv.image
ok
image.png

Comparison:


image.png

Success!


To get intermediate carriage (frames for loop_plane_worker):

In steps.py:(version saved as crg_steps.py)

  • in loop_plane.excute:skip all the lines after tpd = carriage['pivdata']
  • in _loop_epoch_worker: comment the redirect to stdout/stderr, save plane carriage as gcrg, return gcrg in the end
  • in 'loop_epoch': add temporary carriage external port self.m_crg in __init__, unpack erslt[2] to m_crg in execute

To rotate existing calibration upside down:

At the end of dewarpimg:

simgchnls[cam][frm][c] = timg
#Xiyuan:rotate cal
if (self.m_config.has_key('rotate_cal')):
                                if (self.m_config['rotate_cal']):
                                        simgchnls[cam][frm][c] = rot90(timg,2)

To make mkflatplotmm-anim.py work:

add from flotrace import mptassmblr in flolib/__init__.py

Problems during generating calibration

pivlib/pivpgcal.py

Float cannot be array index in python 2.7

Need to force related variable as int in both pivlib/pivpgcal.py and pivlib/pivutil.py

Sometimes cross correlation method finds no intersection

(rv empty)
add EMPTY rv case

also added some debug images


Speed up SPIVET

parallel in ofcomp failed due to GIL(only 1 CPU is runing)

failed version saved as fail_parallel_pivof.py

parallel epoch, like the NetWorkSpace

We don't want to use NWS because it was last updated in 2007.
Basically multiprocessing is good enough. Both 'apply_async' +Pool and 'Process' are implemented. The call_back feature somehow will fail silently.
Expected processing time for 43 planes x 56 epoch: <2 hours
Till this point, no external module is required.

using openpiv-python-gpu to replace the PIV part of ofcomp.

Specifically, SPIVET use hybrid PIV/PTV method in ofcomp to get first-order result. This result will be smoothed, filtered (FFT) and dewarped, then back to ofcomp, iterate until converge, i.e. refine the result, to overcome the degradation due to TLCs (lack of valid tracers, disappearing and emerging of tracers).
It turns out particle tracking velocimetry is not that useful, especially when no reflective lines are present near the center. We can then just use PIV+refinement.
For input with 1280x960 pixels, CPU PIV in SPIVET needs 9~35 seconds. The WiDIM method Cython version of OpenPIV needs just slightly less time, but GPU version needs <0.4s in general. To implement this to SPIVET, we need to align the mesh, consider w/o overlap, and the minimum window size.
The ofcomp_gpu has been added to pivof.py. Input images are cropped(less info than spivet ofcomp), but results are usually better.

Current best settings for WiDIM:

trust_1st_iter = 0,validation_iter = 3
min_window_size = 32, overlap_ratio = 0.5, coarse_factor = 1, nb_iter_max = 2 is comparable to
min_window_size = 16, overlap_ratio = 0, coarse_factor = 2, nb_iter_max = 3, and the latter is slightly slower.

Weird memory error

The GPU WiDIM works fine in jupyter or terminal, but not inside ofcomp_gpu. frame_b cannot be loaded correctly while being sliced into interrogation windows, in . The wrokaround is:

  • pass the numpy array to the correlation class, instaed of gpuarray
  • use frame_a as carriage, i.e. transfer data from frame_b to the address of frame_a.
    In `class CorrelationFunction.init':
#Have to avoid sending frameb to GPU directly, otherwise the index is wrong in spivet refinement process
        frame_a=frame_a.astype(np.float32)
        d_frame_a = gpuarray.to_gpu(frame_a.astype(np.float32))
        frame_b=frame_b.astype(np.float32)
        frame_a[:]=frame_b
        d_frame_b=gpuarray.to_gpu(frame_a.astype(np.float32))

Failed Validation

The 3rd validation failed for EFRMLST-E21-F0_1 of PLUME-3_51V-B25_2-11032008. Added logic to gpu_process.pyx to skip last failed validation.

multiprocess on top of CPU+GPU

Unfortunately, cuda does not support multiprocessing initiated by fork, the spawn method required is available after python 3.4. Before migrating to python 3, we have to manually start multiple python scripts, or use subprocess.

Details

I first add wrapper and conditions in steps.py and precipe.py, to determine which scheme will be used. Conditions are:

  • In conf_loop_epoch:
    1. multiprocess (MP in steps.py). Don't use parallel as keywords so we can leave the NetWorkSpace part untouched.
    2. GPU. If True for both GPU and MP, multiple subprocesses will be initiated.
  • In pivdict:
    gpu: If True, SPIVET will try to use OpenPIV-GPU.

So there are several possibilities:

  • no MP: call the original epoch worker.
    • if 'gpu': ofcomp_gpu will be tried instead of ofcomp.
  • MP & no GPU: The CPU parallel version using ofcomp will be used. gpu should be changed to False, unless in future python 3 version.
  • MP & GPU: Multiple subprocesses using ofcomp_gpu instead of ofcomp will be started. If gpu is False, ofcomp will be adopted, which is not recommended and not necessary.

To implement the GPU subprocess, an external python script epoch_worker.py is needed in the same folder of precipe.py, which will send the folder path to SPIVET. pickle is used to save input parameters for each subprocess(stepsin, and crg%i, %i is cnt, the epoch number), epoch_worker.py will load this files later.

The server has a RTX 2080Ti with 11GB memory. Each subprocess uses 215~1200MB memory(usualy 389 MB) for umich data(1280px X 960px after calibration). If we run too many gpu processes at the same time, they may either fail at the start or in the middle:


image.png

image.png
did not match C++ signature:
    set_dst_device(pycuda::memcpy_2d {lvalue}, unsigned long long)
nMany
    cufftCheckStatus(status)
  File "/home/xbao/.conda/envs/gpupiv1.0/lib/python2.7/site-packages/skcuda/cufft.py", line 117, in cufftCheckStatus
    raise e
cufftAllocFailed
r
= self.allocator(self.size * self.dtype.itemsize)
MemoryError: cuMemAlloc failed: out of memory


/home/xbao/.conda/envs/gpupiv1.0/lib/python2.7/site-packages/skcuda/cublas.py:284: UserWarning: creating CUBLAS context to get version number
  warnings.warn('creating CUBLAS context to get version number')
Traceback (most recent call last):
  File "/home/xbao/LAB_test/PLUME-3_51V-B25_2-11032008/GPU/epoch_worker.py", line 99, in _loop_epoch_worker_gpu
    step.execute()
  File "/home/xbao/.conda/envs/gpupiv1.0/lib/python2.7/site-packages/spivet/steps.py", line 2338, in execute
    step.execute()
  File "/home/xbao/.conda/envs/gpupiv1.0/lib/python2.7/site-packages/spivet/steps.py", line 1508, in execute
    pivdict)
  File "/home/xbao/.conda/envs/gpupiv1.0/lib/python2.7/site-packages/spivet/pivlib/pivof.py", line 210, in ofcomp_gpu
    validation_iter = 3)
  File "openpiv/src/gpu_process.pyx", line 1113, in openpiv.gpu_process.WiDIM
  File "/home/xbao/.conda/envs/gpupiv1.0/lib/python2.7/site-packages/pycuda/gpuarray.py", line 313, in copy
    _memcpy_discontig(new, self)
  File "/home/xbao/.conda/envs/gpupiv1.0/lib/python2.7/site-packages/pycuda/gpuarray.py", line 1333, in _memcpy_discontig
    copy.set_dst_device(dst.gpudata)
ArgumentError: Python argument types in
    Memcpy2D.set_dst_device(Memcpy2D, NoneType)
did not match C++ signature:
    set_dst_device(pycuda::memcpy_2d {lvalue}, unsigned long long)

To avoid these errors, we need to reduce the number of concurrent subprocess. They need to take turns to start and enter memory, and keep enough empty memory for the peak memory usage. Here are time log for 2 parameter sets:

wait_time = int(cnt/26.0)*600+cnt
                time.sleep(wait_time)

                interval = 10
                mem_need = 1000#Actually 389MB to 1200MB
35 process in GPU, slower down all because of memory issue
53 epoch x 43 planes(18,25refine 3iter):10087s
-------------------------------------------------------------------
wait_time = int(cnt/20.0)*1000+cnt
                time.sleep(wait_time)

                interval = 10
                mem_need = 4000#Actually 389MB to 1200MB
~20 process in GPU, faster!
53 epoch x 43 planes(18,25refine 3iter):5774s
-------------------------------------------------------------------
Sequence, non-parallel:(ideal speed)
(500+280)s*53=41340
If slowed down due to 1 CPU per process(25% speed for each PIV)->165000s (28.5xspeed)
A sole gpu piv process usually takes 0.35s, but here much slower (~1.6s)
-------------------------------------------------------------------
The pure CPU version took 7390 s (with 56 physical cores), 5956 `ofcomp`
  • To run the program smoothly, we need to add "insurance", since OpenPIV-GPU may still fail when enough memory space is available. We thus try gpupiv first, then OpenPIV-Cython(to get consistent result). But the Cython version still fails sometimes because of data issue, so we finally fall back to ofcomp is necessary. With the optimized settings(wait_time = int(cnt/20.0)*1000+cnt), only 5% PIV used ofcomp in the test.
`ofcomp_gpu`:6126 (openpiv.gpu_process.WiDIM)
`ofcomp`:326
`ofcomp_Cython`:1360 (openpiv.process.WiDIM)
  • Further tests show that about 1800s is required for the first 20 epochs (1 batch) using MP+GPU. So 53 epochs->3 batches-> at least 5400s is needed

The possibly best practice (for umich data), run time 5541s:

try:
                wait_time = int(cnt/20.0)*600+cnt
                time.sleep(wait_time)

                interval = 10
                mem_need = 3000#Actually 389MB to 1200MB
                #wait until gpu memory is available
                while True:

                        wait_gpu(interval, mem_need)
                        seed(cnt)
                        for i in range(cnt):
                                #wait_time = 10*random()
                                wait_time =2* cnt
                                wait_gpu(wait_time, 3500)
                        print "start cnt:",cnt
                        try:
                                flag = _loop_epoch_worker_gpu(crg, steps, cnt,  obpath)
                                print "epoch finished, flag:",str(flag)
                                break
                        except:
                                print "except here!"
                                continue
                print flag

keep 2~3G gpu memory empty:

image.png

ofcomp_gpu:6204 Cython:40 ofcomp:22 (FAILED OPENPIV).

The key is try to stagger the subprocesses. A time lag exists between the start of subprocess and ofcomp_gpu.

56208 s in total was used for refinement. Divided by the parallel speed up ratio 20, basically half of the total time was doing refinement, or thin plate -spline interpolation.

A remaining problem

Keyboard Interruption cannot work on the subprocess. They need to be killed manually.

Plan:implement optical flow to refine PIV +image warping

speed up image warping

OPENBLAS setting

It turns out functions like np.linalg.inv uses multithreading via OPENBLAS. In my case, it uses all 112 threads and is only 50% of the speed with one thread.

To turn that off, add export OPENBLAS_NUM_THREADS=1 to .bashrc permanently.

最后编辑于
?著作权归作者所有,转载或内容合作请联系作者
  • 序言:七十年代末,一起剥皮案震惊了整个滨河市,随后出现的几起案子,更是在滨河造成了极大的恐慌,老刑警刘岩,带你破解...
    沈念sama阅读 214,100评论 6 493
  • 序言:滨河连续发生了三起死亡事件,死亡现场离奇诡异,居然都是意外死亡,警方通过查阅死者的电脑和手机,发现死者居然都...
    沈念sama阅读 91,308评论 3 388
  • 文/潘晓璐 我一进店门,熙熙楼的掌柜王于贵愁眉苦脸地迎上来,“玉大人,你说我怎么就摊上这事。” “怎么了?”我有些...
    开封第一讲书人阅读 159,718评论 0 349
  • 文/不坏的土叔 我叫张陵,是天一观的道长。 经常有香客问我,道长,这世上最难降的妖魔是什么? 我笑而不...
    开封第一讲书人阅读 57,275评论 1 287
  • 正文 为了忘掉前任,我火速办了婚礼,结果婚礼上,老公的妹妹穿的比我还像新娘。我一直安慰自己,他们只是感情好,可当我...
    茶点故事阅读 66,376评论 6 386
  • 文/花漫 我一把揭开白布。 她就那样静静地躺着,像睡着了一般。 火红的嫁衣衬着肌肤如雪。 梳的纹丝不乱的头发上,一...
    开封第一讲书人阅读 50,454评论 1 292
  • 那天,我揣着相机与录音,去河边找鬼。 笑死,一个胖子当着我的面吹牛,可吹牛的内容都是我干的。 我是一名探鬼主播,决...
    沈念sama阅读 39,464评论 3 412
  • 文/苍兰香墨 我猛地睁开眼,长吁一口气:“原来是场噩梦啊……” “哼!你这毒妇竟也来了?” 一声冷哼从身侧响起,我...
    开封第一讲书人阅读 38,248评论 0 269
  • 序言:老挝万荣一对情侣失踪,失踪者是张志新(化名)和其女友刘颖,没想到半个月后,有当地人在树林里发现了一具尸体,经...
    沈念sama阅读 44,686评论 1 306
  • 正文 独居荒郊野岭守林人离奇死亡,尸身上长有42处带血的脓包…… 初始之章·张勋 以下内容为张勋视角 年9月15日...
    茶点故事阅读 36,974评论 2 328
  • 正文 我和宋清朗相恋三年,在试婚纱的时候发现自己被绿了。 大学时的朋友给我发了我未婚夫和他白月光在一起吃饭的照片。...
    茶点故事阅读 39,150评论 1 342
  • 序言:一个原本活蹦乱跳的男人离奇死亡,死状恐怖,灵堂内的尸体忽然破棺而出,到底是诈尸还是另有隐情,我是刑警宁泽,带...
    沈念sama阅读 34,817评论 4 337
  • 正文 年R本政府宣布,位于F岛的核电站,受9级特大地震影响,放射性物质发生泄漏。R本人自食恶果不足惜,却给世界环境...
    茶点故事阅读 40,484评论 3 322
  • 文/蒙蒙 一、第九天 我趴在偏房一处隐蔽的房顶上张望。 院中可真热闹,春花似锦、人声如沸。这庄子的主人今日做“春日...
    开封第一讲书人阅读 31,140评论 0 21
  • 文/苍兰香墨 我抬头看了看天上的太阳。三九已至,却和暖如春,着一层夹袄步出监牢的瞬间,已是汗流浃背。 一阵脚步声响...
    开封第一讲书人阅读 32,374评论 1 267
  • 我被黑心中介骗来泰国打工, 没想到刚下飞机就差点儿被人妖公主榨干…… 1. 我叫王不留,地道东北人。 一个月前我还...
    沈念sama阅读 47,012评论 2 365
  • 正文 我出身青楼,却偏偏与公主长得像,于是被迫代替她去往敌国和亲。 传闻我的和亲对象是个残疾皇子,可洞房花烛夜当晚...
    茶点故事阅读 44,041评论 2 351