Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for timestamps #70

Merged
merged 18 commits into from
Jun 5, 2024
Merged
Show file tree
Hide file tree
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions .github/workflows/osx.yml
Original file line number Diff line number Diff line change
Expand Up @@ -19,6 +19,7 @@ jobs:
matrix:
os:
- 13
- 14
config:
- { name: Debug }
- { name: Release }
Expand Down
30 changes: 29 additions & 1 deletion CMakeLists.txt
Original file line number Diff line number Diff line change
Expand Up @@ -43,6 +43,22 @@ message(STATUS "Building sparrow v${${PROJECT_NAME}_VERSION}")

OPTION(BUILD_TESTS "sparrow test suite" OFF)

include(CheckCXXSymbolExists)

if(cxx_std_20 IN_LIST CMAKE_CXX_COMPILE_FEATURES)
set(header version)
else()
set(header ciso646)
endif()

check_cxx_symbol_exists(_LIBCPP_VERSION ${header} LIBCPP)
if(LIBCPP)
message(STATUS "Using libc++")
# Allow the use of not visible yet availabile features, such
# as some formatter for new types.
add_compile_definitions(_LIBCPP_DISABLE_AVAILABILITY)
endif()

# Linter options
# =============

Expand All @@ -55,6 +71,16 @@ endif()
# Dependencies
# ============

set(SPARROW_INTERFACE_DEPENDENCIES "" CACHE STRING "List of dependencies to be linked to the sparrow target")

# libc++ does not provide all the elements of the C++20 standard library,
# so we need to use `HowardHinnant/date`.
# See: https://libcxx.llvm.org/Status/Cxx20.html#note-p0355
if (LIBCPP)
find_package(date CONFIG REQUIRED)
list(APPEND SPARROW_INTERFACE_DEPENDENCIES date::date)
endif()

# Build
# =====

Expand All @@ -65,7 +91,7 @@ set(SPARROW_HEADERS
${SPARROW_INCLUDE_DIR}/sparrow/buffer.hpp
${SPARROW_INCLUDE_DIR}/sparrow/buffer_view.hpp
${SPARROW_INCLUDE_DIR}/sparrow/config.hpp
${SPARROW_INCLUDE_DIR}/sparrow/contracts.hpp
${SPARROW_INCLUDE_DIR}/sparrow/contracts.hpp
${SPARROW_INCLUDE_DIR}/sparrow/data_traits.hpp
${SPARROW_INCLUDE_DIR}/sparrow/data_type.hpp
${SPARROW_INCLUDE_DIR}/sparrow/dynamic_bitset.hpp
Expand All @@ -86,6 +112,8 @@ target_include_directories(sparrow INTERFACE
$<BUILD_INTERFACE:${SPARROW_INCLUDE_DIR}>
$<INSTALL_INTERFACE:include>)

target_link_libraries(sparrow INTERFACE ${SPARROW_INTERFACE_DEPENDENCIES})

# We do not use non-standard C++
set_target_properties(sparrow PROPERTIES CMAKE_CXX_EXTENSIONS OFF)
target_compile_features(sparrow INTERFACE cxx_std_20)
Expand Down
2 changes: 2 additions & 0 deletions environment-dev.yml
Original file line number Diff line number Diff line change
Expand Up @@ -5,5 +5,7 @@ dependencies:
# Build
- cmake
- ninja
# Alternative of `std::chrono` introduced in C++20.
- howardhinnant_date
jjerphan marked this conversation as resolved.
Show resolved Hide resolved
# Tests
- doctest
8 changes: 7 additions & 1 deletion include/sparrow/data_traits.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -125,6 +125,12 @@ namespace sparrow
using default_layout = variable_size_binary_layout<value_type, std::span<byte_t>, const std::span<byte_t>>; // FIXME: this is incorrect, change when we have the right types
};

template <>
struct arrow_traits<timestamp> : common_native_types_traits<timestamp>
{
static constexpr data_type type_id = data_type::TIMESTAMP;
};

namespace predicate
{

Expand All @@ -147,4 +153,4 @@ namespace sparrow
} constexpr has_arrow_traits;
}

}
}
25 changes: 21 additions & 4 deletions include/sparrow/data_type.hpp
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
// Copyright 2024 Man Group Operations Limited

Check notice on line 1 in include/sparrow/data_type.hpp

View workflow job for this annotation

GitHub Actions / build

Run clang-format on include/sparrow/data_type.hpp

File include/sparrow/data_type.hpp does not conform to Custom style guidelines. (lines 17, 18, 19, 20, 22, 24)
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
Expand All @@ -14,6 +14,15 @@

#pragma once

#if __cpp_lib_chrono <= 201907L
// P0355R7 (Extending chrono to Calendars and Time Zones) has not been entirely implemented in libc++ yet.
// See: https://libcxx.llvm.org/Status/Cxx20.html#note-p0355
// For now, we use HowardHinnant/date as a replacement if we are compiling with libc++.
// TODO: remove this once libc++ has full support for P0355R7.
# include <chrono>
#else
# include <date/tz.h>
#endif
#include <climits>
#include <cstdint>
#include <optional>
Expand Down Expand Up @@ -45,6 +54,12 @@
using float64_t = std::float64_t;
#endif

#if __cpp_lib_chrono <= 201907L
using timestamp = std::chrono::time_point<std::chrono::system_clock, std::chrono::nanoseconds>;
#else
using timestamp = date::zoned_time<std::chrono::nanoseconds>;
#endif

// We need to be sure the current target platform is setup to support correctly these types.
static_assert(sizeof(float16_t) == 2);
static_assert(sizeof(float32_t) == 4);
Expand Down Expand Up @@ -82,14 +97,15 @@
BINARY,
// Fixed-size binary. Each value occupies the same number of bytes
FIXED_SIZE_BINARY,
// Number of nanoseconds since the UNIX epoch with an optional timezone.
// See: https://arrow.apache.org/docs/python/timestamps.html#timestamps
TIMESTAMP,
};

/// C++ types value representation types matching Arrow types.
// NOTE: this needs to be in sync-order with `data_type`
using all_base_types_t = mpl::typelist<
std::nullopt_t // REVIEW: not sure about if we need to have this one? for representing NA? is this
// the right type?
,
std::nullopt_t,
bool,
std::uint8_t,
std::int8_t,
Expand All @@ -103,7 +119,8 @@
float32_t,
float64_t,
std::string,
std::vector<byte_t>
std::vector<byte_t>,
sparrow::timestamp
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there is one type missing before that one, see compare with the order of the enum, BINARY matches std::vector<byte_t> but we dont have something for FIXED_SIZE_BINARY. Not sure which type is missing though. @JohanMabille any idea what it would match?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think FIXED_SIZE_BINARY is meant to be used for a fixed size binary type which is not a primitive type, in which case we could use a struct as a type to test FIXED_SIZE_BINARY.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I took the values from Arrow implementation, not sure which type this value should match.

Copy link
Collaborator Author

@jjerphan jjerphan May 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, must we work on supporting DATE32 and DATE64 which come before TIMESTAMP?

Edit: I guess we could support TIMESTAMP first if we were to have explicit integer identifiers for those types as introduced in arrow-cpp with apache/arrow#37149 as proposed by 2f961ef

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The support for DATE32 and DATE64 can be added later. Explicit integer identifiers would be required later for backward compatibility, but it doe snot matter for now, nor the enum "stability".

// TODO: add missing fundamental types here
>;

Expand Down
Loading