All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
- Added `dpctl.program.SyclKernel.max_sub_group_size` property #1208.
- Removed `dpctl.select_host_device`, `dpctl.has_host_device`, `dpctl.SyclDevice.is_host`, and `dpctl.SyclDevice.has_aspect_host` since support for host device has been removed in DPC++ 2023 and from SYCL 2020 spec #1208.
- Implemented `dpctl.tensor.linspace` function from array-API #875.
- Implemented `dpctl.tensor.eye` function from array-API #896.
- Implemented `dpctl.tensor.tril` and `dpctl.tensor.triu` functions from array-API #910.
- Added data type objects to `dpctl.tensor` namespace, `finfo`, `iinfo`, `can_cast`, and `result_type` functions #913.
- Implemented `dpctl.tensor.meshgrid` creation function from array-API #920.
- Implemented convenience class to represent output of `dpctl.tensor.usm_ndarray.flags` property #921.
- Added new device attributes and kernel's device-specific attributes #894.
- Added `dpctl.utils.onetrace_enabled` context manager for targeted trace collection #903.
- Added support for `stream` keyword in `__dlpack__` method, enabling support for sending `usm_ndarray` using mpi4py #906.
- `dpctl.tensor.asarray` can now transition data between incompatible devices #951.
- Introduced `"syclinterface/dpctl_sycl_types_casters.hpp"` header file with declaration of conversion routines between SYCL type pointers and SyclInterface library opaque pointers #960.
- Added C-API to `dpctl.program.SyclKernel` and `dpctl.program.SyclProgram`. Added type casters for new types to "dpctl4pybind11" and added an example demonstrating its use #970.
- Introduced "dpctl/sycl.pxd" Cython declaration file to streamline use of SYCL functions from Cython, and added an example demonstrating its use #981.
- Added experimental support for sharing data allocated on sub-devices via dlpack #984.
- Added `dpctl.SyclDevice.sub_group_sizes` property to retrieve the sub-group sizes supported by the device #985.
- Improved queue compatibility testing in `dpctl.tensor`'s implementation module #900.
- Added automatic measurement of array-API conformance test suite in CI #901.
- Improved performance of array metadata transfer from host to device #912.
- Used `os.add_dll_directory` on Windows to ensure that `DPCTLSyclInterface` library can be found #918.
- Refactored `dpctl.tensor`'s implementation module #941 to streamline adding new functionality. Streamlined `dpctl::tensor::usm_ndarray` class implementation.
- Added debugging messaging in case when `DPCTLDynamicLib::getSymbol` encounters errors #956.
- Updated code base according to changes in DPC++ compiler #952, #957, #958.
- Changed `dpctl` to use pybind11 2.10.1 #967.
- Extended `dpctl.tensor.full` to accept 0d and higher dimensional arrays for fill-value parameter #982 and #995.
- Improved SyclDevice constructor error message #893.
- Fixed issue gh-890 about `dpctl.tensor.reshape` function #915.
- Fixed unexpected `UnboundLocalError` exception in #922.
- Fixed bugs in `dpctl.tensor.arange` in #945.
- Fixed issue with type inferencing in `dpctl.tensor.asarray` in #949.
- Added missing docstrings for `dpctl.SyclDevice` properties #964.
- Implemented and deployed dedicated kernels for copying with casting #781, used in `__setitem__`, implementation of `asarray`, `dpctl.tensor.copy` functions.
- Implemented dedicated copying kernel for `dpctl.tensor.reshape` function #810, added support for `copy` keyword #807.
- Implemented dedicated kernel to copy with casting from `numpy.ndarray` into `dpctl.tensor.usm_ndarray` #817.
- Implemented `dpctl.tensor.permute_dims` function from array-API #787.
- Implemented `dpctl.tensor.expand_dims` function from array-API #788.
- Implemented `dpctl.tensor.squeeze` function from array-API #790.
- Implemented `dpctl.tensor.broadcast_to` function from array-API #791.
- Implemented `dpctl.tensor.broadcast_arrays` function from array-API #798.
- Implemented `dpctl.tensor.flip` function from array-API #801.
- Implemented `dpctl.tensor.usm_ndarray.mT` property per array-API #805.
- Implemented `dpctl.tensor.roll` function from array-API #809.
- Implemented `dpctl.tensor.arange` function from array-API #814.
- Implemented `dpctl.tensor.zeros` function from array-API #816.
- Implemented `dpctl.tensor.ones`, `dpctl.tensor.full`, `dpctl.tensor.empty_like`, `dpctl.tensor.zeros_like`, `dpctl.tensor.ones_like`, `dpctl.tensor.full_like` functions from array-API #822.
- Implemented `DPCTLQueue_Memset` function in SyclInterface library #812, and exposed it for `dpctl.memory.MemoryUSM*` classes #815.
- Implemented `dpctl.utils.get_coerced_usm_type` to deduce the USM type of the output array from types of input arrays in compute-follows-data execution model #797.
- Added `dpctl.SyclDevice.profiling_timer_resolution` property #825.
- Added `dpctl.SyclDevice.platform` and `dpctl.SyclPlatform.default_context` properties #827.
- Provided pybind11 example for functions working on `dpctl.tensor.usm_ndarray` container applying oneMKL functions #780, #793, #819. The example was expanded to demonstrate implementing iterative linear solvers (Chebyshev solver, and Conjugate-Gradient solver) by asynchronously submitting individual SYCL kernels from Python #821, #833, #838.
- Wrote manual page about working with `dpctl.SyclQueue` #829.
- Added cmake scripts to dpctl package layout and a way to query the location #853.
- Implemented `dpctl.tensor.concat` function from array-API #867.
- Implemented `dpctl.tensor.stack` function from array-API #872.
- Enhanced coverage collection for SyclInterface library by also collecting it during pytest run and combining traces with those collected during C-test run #818. This change also makes it unnecessary to rebuild the SyclInterface library when building the C-test executable.
- Exported `keep_args_alive` utility in `dpctl4pybind11.hpp` header #820. The utility uses `sycl::handler::host_task` to keep given Python arguments alive until each `sycl::event` from the given vector of events is complete. The host task is scheduled on the SYCL queue provided as the first argument.
- Changed the size of struct underlying `dpctl.SyclEvent` to avoid storing Python object previously used to keep kernel arguments scheduled with `dpctl.SyclQueue.submit` #823.
- Fixed docstring for `dpctl.SyclTimer` #824.
- Changed type of exceptions raised on failure to create `dpctl.SyclDevice` from `ValueError` to `dpctl.SyclDeviceCreationError` #826.
- Improved performance of pybind11 type casters #837.
- Changed implementation of `dpctl.SyclProgram` from using deprecated `sycl::program` to `sycl::kernel_bundle` #845.
- Removed deprecated device aspects, added new supported aspects #844.
- Updated vendored `dlpack.h` to version 0.7 #847.
- Fixed `dpctl.lsplatform()` to work correctly when used from within Jupyter notebook #800.
- Fixed script to drive debug build #835 and fixed code to compile in debug mode #836.
- Fixed filter selector string produced in outputs of `dpctl.lsplatform(verbosity=2)` and `dpctl.SyclDevice.print_device_info` #866.
- Fixed issue with slicing reported in gh-870 in #871.
- Properties added to MemoryUSM* objects. #647
- Added `dpctl.tensor.asarray` #646
- Implemented DLPack support for usm_ndarray #682
- Exported `dpctl.tensor.Device` class #708 #718
- Added testing of examples in CI #722
- Added user manuals to dpctl documentation #712 #773
- Folder dpctl-capi/ renamed to libsyclinterface/ in sources and documentation. #666 #768
- Added workflow to publish rendered documentation on PRs #673 #753 #726
- Synchronization functions and USM allocation functions release GIL #736 #766
- `dpctl.SyclEvent` destructor is made non-blocking #751
- Fixed issue in code of `dpctl.tensor.usm_ndarray.T` #653
- Fixed issue with `dpctl.tensor.reshape`'s effect on contiguity flags of usm_ndarray #695
- Fixed handling of empty list by `dpctl.tensor.asarray` #694
- Fixed type inference with array of empty arrays in `dpctl.tensor.asarray` #697
- Fixed issue gh-698 with `dpctl.tensor.asarray` #709
- Fixed performance of item assignment from numpy array #724
- `DPCTLDeviceMgr_GetNumDevices` should not operate on rejected devices #737
- Fixed issue gh-729 for `dpctl.tensor.reshape` applied to 0-element usm_ndarray #756
- Fixed issue gh-728 with `dpctl.tensor.astype` #757
- Fixed type in memory overlapping test #770
- Fixed issue with operator.pos for `dpctl.tensor.usm_ndarray` #783
- Only call `PyThread_Ensure` from host_task if the main-thread interpreter is initialized and not finalizing #776 #778 #721
Full Changelog: https://github.com/IntelPython/dpctl/compare/0.11.4...0.12.0
- Fix tests for nested context factories expecting for integration environment by @PokhodenkoSA in IntelPython#705
- Set the last byte in allocated char array to zero [cherry picked from #650] #699
- Extending `dpctl.device_context` with nested contexts #678
- Fixed issue #649 about incorrect behavior of `.T` method on sliced arrays #653
- Replaced uses of clang compiler with icx executable #665
- Use Python 3.9 in public CI #599
- Add a new C API utility function (`DPCTLDeviceMgr_GetDeviceInfoStr`) to return the device info as a C string object #620
- New Github workflow to build dpctl with nightly Intel llvm/sycl + drivers #621
- Always raise SubDeviceCreationError even when sub-device counts are zero #622
- Updated OpenCL interoperability code to fix build with Intel llvm/sycl bundle #625
- Enabled use of default platform context extension in SYCL compilers that implement this extension #627
- Implemented `dpctl.utils.get_execution_queue(queue_seq)` utility to help implement the "compute-follows data" convention for offload target #632 #631
- Replaced `host_device` device type with `host` in tests #616
- Rework the logic in `dpctl.memory`'s `copy_from_device` method to work correctly with `host` device #618
- Use `dpctl.device_type.host` instead of `dpctl.device_type.host_device` #626
- Reinstate deprecated `sycl::program` and that was conditionally removed from open source DPC++ toolchain #633
- Use `LoadLibraryExA` instead of `LoadLibraryA` to mitigate a possible DLL injection issue when we load the Level Zero DLL on Windows #636
- Github coverage workflow is changed to use oneAPI 2021.3 instead of latest to work around broken profiling instrumentation in DPC++ 2021.4 #614
- Update build dependencies for NumPy #641
- Use "readelf" on SYCL's `pi_level_zero` library to find out and use the exact name of `ze_loader.so` in SyclInterface library #617
- Removed use of DPC++ features deprecated in 2021.4 and open source Intel llvm/sycl compiler #603
- Suppress errant CMake log #610
- Fixes to compile dpctl using Intel llvm/sycl compiler #603
- Fix for the hang is to avoid passing `nullptr` argument to `sycl::queue::prefetch` #612
- Fixed the logic to return device count #623
- Enabled building of C extensions with dpctl by including header defining `bool` type for C compilers #604
- Added methods bool, float, int, index, and complex to usm_ndarray #578
- Added data-API required special methods to usm_ndarray class, as well as to_numpy/from_numpy, astype, reshape functions #586
- Added methods to query dpctl.SyclDevice for size of global/local memory #589
- Added tests for constructors with invalid capsules #577
- Improved test coverage of `dpctl.SyclQueue` implementation #574
- Added a test to exercise API exported function (get_event_ref). #570
- Expanded tests in test_sycl_context to improve coverage #571
- Tweaks to test_sycl_event to improve coverage #567
- Improved coverage of dpctl.init file and other service functions #563
- Added test for repr and test for default argument to constructor #565
- Added some tests to involve capsule #564
- Added workflow for Public CI on Windows #534
- DPCTLQueue_Memcpy, _Prefetch, _Memadvise become asynchronous #557
- Added device aspect selector, `dpctl.select_device_with_aspects` #558
- Added test based on example from #583
- Parametrized tests for executing OpenCL kernels compiled from source in types of arguments #581
- Temporarily disabled self-hosted CI jobs runner #559
- Changed static method `SyclQueue._create_from_context_and_device` #579
- Transitioned all Python API to use pytest over unittest, improved coverage in dpctl/memory #575
- Changed `dpctl.SyclEvent.profiling_info_submit` from method to a property #573
- Simplified arg parsing in SyclDevice constructor #572
- Used tag with alignment attribute set in README #562
- Moved sycl timer into dpctl.SyclTimer #555
- Used clang-format off, clang-format on to avoid include reordering in pybind11 example #588
- Implemented a workaround for running conda-build using Klocwork #566
- Separated pipelines for Linux and Windows #582
- Fixed inconsistency in `__sycl_usm_array_interface__` of `usm_ndarray` instance #584
- Fixed memory leak: Capsule deleters now free resources for renamed capsules too #568
- Fixed version test to allow for semantic versioning #569
- Improved coverage of _types.pxi #556
- Fixed `UnboundLocalError` when default queue could not be created #554
- Improvements to logic for working with custom DPC++ toolchain #481
- Add SyclContext unit test cases #488
- Consolidate configurations of tools that support PEP 518 into pyproject.toml #486
- Added C-API hash function, used them in Python interface #491
- Add missing extra checks to ensure unwrapped pointer is not Null
- Add error messages to L0 program creation routine
- Improve test coverage for dpctl_sycl_queue_interface #492
- Use pytest.warns in test_lsplatform3 #495
- Added test class to test DRef=nullptr case #496
- Extend parameterized test in test_sycl_queue_interface #497
- Use Memcpy, memadvise in tests
- Expanded types tests by TestQueueSubmitRange
- Added a test that retrieves DPCPP-compiled kernels and submits them via DPCTLQueue_SubmitRange #499, DPCTLEvent_GetCommandExecutionStatus #516, DPCTLEvent_GetWaitList #510 functions
- Propagate compile flags #512
- Add conda package CI pipeline on GitHub Actions #515
- Run tests on GPU #518
- Add 3 wrapper func for event::get_profiling_info #519
- Changes to build_backend.py to enable sycl-compiler-prefix on Windows
- dtype keyword of usm_ndarray now supports np.double and other types #526
- Implemented DPCTLQueue_SubmitBarrier, DPCTLQueue_SubmitBarrierForEvents, SyclQueue.submit_barrier #524
- Added C-API DPCTLQueue_HasEnableProfiling
- Added Python API SyclQueue.has_enable_profiling
- Use public for data owning class definitions
- Queue has enable profiling #531
- Use public for data owning class definitions #533
- Added logic to verify that all bits of property integer were recognized and used #494
- Added support for some properties/methods of underlying device
- A test for properties, method of q mirroring that of device
- Conda build scripts should build wheels in the same setup invocation as install #538
- Added install_requires keyword to setup call
- Added requirements.txt files in dpctl/ and in dpctl/docs #540
- Improved C-API for dpctl Cython classes, added example of using them in Pybind11 extension. #550
- dpctl.SyclEvent acquired ability to get command status and get profiling information. #553
- Moved DPCTLSyclInterface library from MANIFEST.in #482
- Refactored tests
- Use dpcpp compiler package for Linux #514
- Update conda-package.yml
- Static methods _init_helper made into functions and removed from PXD files #532
- Remove imports from future #485
- Fix sub devices #479
- Fix addressof_ref function in `SyclContext` #488
- Follow `DPCTLDevice_CreateFromSelector` which passes the check #487
- Fix a typo in the pytest configuration #490
- Fixed dbg_build.sh script for Linux to use L0
- Reuse IntelSycl_LIBRARY_DIR variable in cmake
- CXX, dpcpp used on Windows too
- Update conda-recipe/bld.bat
- Change to SyclQueue.repr to reflect properties #531
- Static methods `_init_helper` made into functions and removed from PXD files #532
- Fixed typo in pip installation instruction #536
- Fixed dpctl_config.h, added dpctl_service.h, .cpp #539
- Fixed `__sycl_usm_array_interface__` output for 0d arrays #547
- Implemented support for constructing MemoryUSM* from object with sycl_usm_array_interface when array-info is not contiguous #400
- Print the backend as part of SyclDevice.print_device_info function #409
- Added dpctl/tensor/_usmarray submodule #427
- Added arg checking to functions in dpctl_sycl_usm_interface.cpp #430
- A static method of _Memory to create from external allocation #430
- Added usm_ndarray accessors #435
- Added Device class representing Data-API notion of device #440
- Added free Python function as_usm_memory(obj) #443 and associated unit tests #449
- Dependency for numpy 1.17 #445
- Add a flag to make doxygen HTML generation optional #450
- Added a feature to get the filter string for a device from Python using the new dpctl.SyclDevice.get_filter_string method. Also added the corresponding DPCTLDeviceMgr_GetPositionInDevices(DRef, device_mask) C API function #453
- New options to setup.py to specify which dpcpp compiler to use, if L0 program creation is to be supported, and to generate code coverage #426
- Github action to check Python code quality #422
- Github action to auto-publish Sphinx docs for master #446
- Github action to generate coverage report and publish to coveralls.io #459
- Rename dpctl.dptensor to dpctl.tensor #407
- Changed repr for Memory objects #442
- Used dpctl.SyclQueue instead of manager and get current queue in tests for SyclProgram #448
- Issue #189 dpctl.memory.MemoryUSMShared(np.int64(16)) should work #392
- Use size_t instead of Py_ssize_t to fit device USM pointer #405
- Various code quality issues identified by flake8 (#417, #419, #420, #422)
- Fixed issues in slicing and array construction #441
- Fixed an issue #447 where dpctl.get_devices does not return devices in the same order as sycl::device::get_devices #451
- L0 program creation support on Windows #319
- Removing public keyword to get_current_queue Cython declaration #437
- Complete support for `sycl::ONEAPI::filter_selector` in dpctl. , and `sycl::platform` #298 creation using opaque pointers.
- A `DPCTLDeviceMgr` module in C API that caches a default context for root devices #277.
- `DPCTLSyclBackendType` and `DPCTLSyclDeviceType` have a new member `ALL` #287.
- C API now provides helper functions to convert between dpctl and SYCL enum values #296.
- Macros to help create opaque vector classes for opaque SYCL types #297.
- , `SyclContext` #334, `SyclPlatform` (#336, #298), `SyclQueue` #323 have constructors that recognize filter selectors and closely follow DPC++ interface.
- Add API to get a `PyCapsule` from `SyclQueue`, `SyclContext` instances #350.
- Added `get_queue_ref_from_ptr_and_syclobj(ptr, syclobj)` that creates `DPCTLSyclQueueRef` from a USM pointer and Python object `syclobj` from `__sycl_usm_array_interface__` #380.
- Support for SYCL sub-devices, including sub-device creation, queue, and context creation using sub-devices #343.
- `SyclDevice.parent_device` property to indicate if an instance has a parent device #366.
- Several new getter functions for device info descriptors to device interface (#300, #335, #318, #315, #308).
- Support for SYCL device aspects #307.
- Properties for every `sycl::device` info and aspect that we support in `SyclDevice` #324.
- Support handling async errors inside `SyclQueue` instances #346.
- `get_backend`, `get_platform`, `get_device_type` to Python `SyclDevice` class #300
- A `_sycl_device_factory.pyx` module providing `SyclDevice` constructors using standard `sycl::device_selector` classes (previously in `_sycl_device.pyx`) and a new `get_devices` #277 function to enumerate all devices.
- `_sycl_device_factory.pyx` implements `get_num_devices` and `has_*_device(s)` functions #320.
- Enable Python coverage in CI for Linux #369.
- Use `public` keyword in `_sycl_*.pxd` to generate header files allowing non-Cython centric native extensions to work with dpctl's Python objects #218.
- Documentation improvements #341.
- Rename dpCtl to dpctl in all comments, license headers, and docs. #342
- `dpctl.memory.MemoryUSM*` constructors now use `dpctl.SyclQueue()` instead of `dpctl.get_current_queue()` when the `queue` keyword argument is `None` (default) #382.
- `dpctl.set_default_queue` has been renamed to `dpctl.set_global_queue()` #323.
- Changed `dpctl.dump` to `dpctl.lsplatform` #336.
- Various `SyclDevice` methods related to querying `sycl::info::device` were converted to properties #324.
- Various C API function names were changed.
- Possible crashes when a SYCL platform is not available #349.
- Fix tests which fail if GPU is not available (only CPU is available) #359.
- Fix breaking C API tests #358.
- Bandit warning about "subprocess.check_call(shell=True)" for Windows #306.
- Removed `get_num_platforms`, `has_cpu_queues`, `has_gpu_queues`, `get_num_queues`, `has_sycl_platforms` #320.
- Do not use POP_FRONT in FindDPCPP.cmake so that we can use a cmake version older than 3.15.
- Documentation improvements.
- Cmake improvements and Coverage for C API, Cython and Python.
- Added support for Level Zero devices and queues.
- Added support for SYCL standard device_selector classes.
- SyclDevice instances can now be constructed using filter selector strings.
- Code of conduct.
- Building wheels.
- Queue manager improvements.
- Adding `__array_function__` so that Numpy calls with dparrays work.
- Using clang-format for C/C++ code formatting.
- Using pytest for running tests.
- Add python and cython file coverage.
- Using Bandit for finding common security issues in Python code.
- Add instructions about file headers formats.
- Changed compiler name usage from clang++ to dpcpp.
- Reformat backend.pxd to be closer to black style.
- Remove `cython` from `install_requires`. This allows using `dpCtl` in `numba` extensions.
- Incorrect import in example.
- Consistency of file headers.
- Klocwork issues.
- `_Memory.get_pointer_type` static method which returns kind of USM pointer.
- Utility functions to transform string to device type and back.
- New `dpctl.dptensor.numpy_usm_shared` module containing USM array. USM array extends NumPy ndarray.
- A lot of new examples. Including examples of building Cython extensions with DPC++ compiler that interoperate with dpCtl.
- Mechanism for registering a callback function to look and see if the object supports USM.
- setup.py builds C++ backend for develop and install commands.
- Building wheels.
- Use DPC++ runtime from package `dpcpp_cpp_rt`.
- All usage of `DPPL` in C-API functions was changed to `DPCTL`, e.g., `DPPLQueueMgr_GetCurrentQueue` to `DPCTLQueueMgr_GetCurrentQueue`.
- The C-API directory is now called `dpctl-capi` instead of `backends`.
- Refactoring the `dpctl-capi` functions to prepare for changes to add Level Zero program creation.
- `SyclProgram` and `SyclKernel` classes were moved out of `dpctl` into the `dpctl.program` sub-module.
- Klocwork static code analysis warnings.
- Device descriptors "max_compute_units", "max_work_item_dimensions", "max_work_item_sizes", "max_work_group_size", "max_num_sub_groups" and "aspects" for int64 atomics inside dpctl C API and inside the dpctl.SyclDevice class.
- MemoryUSM* classes moved to `dpctl.memory` module, added support for aligned allocation, added support for `prefetch` and `mem_advise` (synchronous) methods, implemented `copy_to_host`, `copy_from_host` and `copy_from_device` methods, pickling support, and zero-copy interoperability with Python objects which implement `__sycl_usm_array_interface__` protocol.
- Helper scripts to generate API documentation for both C API and Python.
- Compiler warnings when building libDPPLSyclInterface and the Cython extensions.
- The Legacy OpenCL interface.
- How the initial active queue is populated inside DPPLQueueMgr.
- dpctl.SyclQueueManager only reports the number of non-host platform.
- dpctl.SyclQueueManager now raises an exception if DPCTL C API returns a nullptr instead of a valid Sycl queue.
- Several crashes in cases where an OpenCL or Level Zero platform is not available.
- Fix failing platform test case. #116
- Properly skip tests when no OpenCL devices are available.
- Add skip tests to test_sycl_usm.py
- Fix Gtests configuration.
- A crash on Windows due to a Level Zero driver problem. Each device was getting enumerated twice. To handle the issue, we added a temporary fix to use only the first device for each device type and backend #118.
- Changelog was added for dpctl.
- Windows build was fixed.
- Add a helper function to all Python SyclXXX classes to get the address of the base C API pointer as a long.
- Rename PyDPPL to dpCtl in comments (function name renaming to come later)
- Fix bugs highlighted by tools.
- Various code clean ups.
- Dump functions were enhanced to print back-end information.
- dpctl gained support for uint_8 and unsigned long data types.
- oneAPI Beta 10 tool chain support was added.
- dpctl is now aware of DPC++ Sycl PI back-ends. The functionality is now exposed via the context interface.
- C API's queue manager was refactored to require back-end.
- dpctl's device_context now requires back-end, device-type, and device-id to be provided in a string format, e.g. opencl:gpu:0.
- Fixed some important bugs found by static analysis.
- Add dpctl.get_current_device_type().
- Set _cpu_device and _gpu_device to None by default.
- Add get include and include headers.
- DPPL shared objects are installed into dpctl.
- Refactor unit tests.
- Adds C and Cython API for portions of Sycl queue, device, context interfaces.
- Implementing USM memory management.
- Refactored API to expose a minimal sycl::queue interface.
- Modify cpu_queues, gpu_queues and active_queues to functions.
- Change static vectors to static pointers to vectors. This disables destructor calls, which would otherwise happen in an undefined order.
- Rename package PyDPPL to dpCtl.
- Use dpcpp.exe on Windows instead of dpcpp-cl.exe deleted in oneAPI beta08.
- Correct use of ERRORLEVEL in conda scripts for Windows.
- Fix using dppl.has_sycl_platforms() and dppl.has_gpu_queues() functions in skipIf
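
One of the earlier entries above notes that device_context takes the back-end, device-type, and device-id as a single string such as opencl:gpu:0. As a rough illustration of that three-part format only, here is a minimal sketch in plain Python. It does not use or reproduce dpctl's own parsing; the names `DeviceSpec` and `parse_device_string` are hypothetical, and no check is made against any real list of back-ends or device types.

```python
from typing import NamedTuple


class DeviceSpec(NamedTuple):
    """Hypothetical container for the three parts of a device string."""
    backend: str
    device_type: str
    device_id: int


def parse_device_string(spec: str) -> DeviceSpec:
    """Split 'backend:device_type:device_id' into its components.

    Illustration only: names are lower-cased but not validated against
    any real list of back-ends or device types.
    """
    parts = spec.split(":")
    if len(parts) != 3:
        raise ValueError(
            f"expected 'backend:device_type:device_id', got {spec!r}"
        )
    backend, device_type, device_id = parts
    if not device_id.isdecimal():
        raise ValueError(
            f"device id must be a non-negative integer, got {device_id!r}"
        )
    return DeviceSpec(backend.lower(), device_type.lower(), int(device_id))


# Example with the string quoted in the changelog entry.
print(parse_device_string("opencl:gpu:0"))
```

Keeping the parsed result in a named tuple makes each field addressable by name, which is convenient when only the device type or the numeric id is needed.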