WIP: Build with opentelemetry support #1048

Draft: wants to merge 16 commits into base: main

Conversation

h-vetinari
Member

No description provided.

@conda-forge-webservices

Hi! This is the friendly automated conda-forge-linting service.

I just wanted to let you know that I linted all conda-recipes in your PR (recipe) and found it was in an excellent condition.

@h-vetinari
Member Author

Looks like either opentelemetry or arrow needs an #include <winsock2.h> or #include <ws2tcpip.h> somewhere.

@xhochy
Member

xhochy commented May 8, 2023

@h-vetinari You might get better error messages if you deactivate the unity builds. The error message triggers a gut feeling of "oh, maybe there is a macro defined twice".
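As a rough illustration of the "macro defined twice" hunch: a unity build concatenates several source files into one translation unit, so a macro left defined by one file is still visible in the next. The sketch below simulates two source files in a single block. The file names, the `BUFFER_SIZE` macro, and both functions are hypothetical and are not taken from Arrow or OpenTelemetry; this is not a claim about the actual cause of the failure here.

```cpp
// Simulated unity build: the contents of two hypothetical source files
// end up in the same translation unit, one after the other.

// ---- contents of a.cc (hypothetical) ----
#define BUFFER_SIZE 16
int SizeA() { return BUFFER_SIZE; }
// Cleaning up the macro keeps it from leaking into the next file.
// Without this #undef, the #define below would redefine BUFFER_SIZE
// with a different value, which compilers diagnose.
#undef BUFFER_SIZE

// ---- contents of b.cc (hypothetical) ----
#define BUFFER_SIZE 32
int SizeB() { return BUFFER_SIZE; }
#undef BUFFER_SIZE
```

Compiled as separate files, the two definitions never meet; merged into one unit, they only coexist because each file undefines its macro.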

@h-vetinari
Member Author

> You might get better error messages if you deactivate the unity builds.

Interestingly, this gets much further now (I didn't know about that knob). It now fails with missing symbols while trying to create arrow_acero:

[322/458] Linking CXX shared library release\arrow_acero.dll
FAILED: release/arrow_acero.dll release/arrow_acero.lib 
cmd.exe /C "cd . && D:\bld\apache-arrow_1683618300204\_build_env\Library\bin\cmake.exe -E vs_link_dll --intdir=src\arrow\acero\CMakeFiles\arrow_acero_shared.dir --rc=C:\PROGRA~2\WI3CF2~1\10\bin\100226~1.0\x64\rc.exe --mt=C:\PROGRA~2\WI3CF2~1\10\bin\100226~1.0\x64\mt.exe --manifests  -- C:\PROGRA~1\MICROS~2\2022\ENTERP~1\VC\Tools\MSVC\1429~1.301\bin\HostX64\x64\link.exe /nologo @CMakeFiles\arrow_acero_shared.rsp  /out:release\arrow_acero.dll /implib:release\arrow_acero.lib /pdb:release\arrow_acero.pdb /dll /version:1200.0 /machine:x64  /NODEFAULTLIB:LIBCMT /INCREMENTAL:NO  && cd ."
LINK: command "C:\PROGRA~1\MICROS~2\2022\ENTERP~1\VC\Tools\MSVC\1429~1.301\bin\HostX64\x64\link.exe /nologo @CMakeFiles\arrow_acero_shared.rsp /out:release\arrow_acero.dll /implib:release\arrow_acero.lib /pdb:release\arrow_acero.pdb /dll /version:1200.0 /machine:x64 /NODEFAULTLIB:LIBCMT /INCREMENTAL:NO /MANIFEST /MANIFESTFILE:release\arrow_acero.dll.manifest" failed (exit code 1120) with the following output:
   Creating library release\arrow_acero.lib and object release\arrow_acero.exp
project_node.cc.obj : error LNK2001: unresolved external symbol "class opentelemetry::v1::nostd::shared_ptr<class opentelemetry::v1::trace::Span> & __cdecl arrow::internal::tracing::UnwrapSpan(class arrow::util::tracing::SpanDetails *)" (?UnwrapSpan@tracing@internal@arrow@@YAAEAV?$shared_ptr@VSpan@trace@v1@opentelemetry@@@nostd@v1@opentelemetry@@PEAVSpanDetails@1util@3@@Z)
swiss_join.cc.obj : error LNK2001: unresolved external symbol "class opentelemetry::v1::nostd::shared_ptr<class opentelemetry::v1::trace::Span> & __cdecl arrow::internal::tracing::UnwrapSpan(class arrow::util::tracing::SpanDetails *)" (?UnwrapSpan@tracing@internal@arrow@@YAAEAV?$shared_ptr@VSpan@trace@v1@opentelemetry@@@nostd@v1@opentelemetry@@PEAVSpanDetails@1util@3@@Z)
aggregate_node.cc.obj : error LNK2001: unresolved external symbol "class opentelemetry::v1::nostd::shared_ptr<class opentelemetry::v1::trace::Span> & __cdecl arrow::internal::tracing::UnwrapSpan(class arrow::util::tracing::SpanDetails *)" (?UnwrapSpan@tracing@internal@arrow@@YAAEAV?$shared_ptr@VSpan@trace@v1@opentelemetry@@@nostd@v1@opentelemetry@@PEAVSpanDetails@1util@3@@Z)
exec_plan.cc.obj : error LNK2001: unresolved external symbol "class opentelemetry::v1::nostd::shared_ptr<class opentelemetry::v1::trace::Span> & __cdecl arrow::internal::tracing::UnwrapSpan(class arrow::util::tracing::SpanDetails *)" (?UnwrapSpan@tracing@internal@arrow@@YAAEAV?$shared_ptr@VSpan@trace@v1@opentelemetry@@@nostd@v1@opentelemetry@@PEAVSpanDetails@1util@3@@Z)
filter_node.cc.obj : error LNK2001: unresolved external symbol "class opentelemetry::v1::nostd::shared_ptr<class opentelemetry::v1::trace::Span> & __cdecl arrow::internal::tracing::UnwrapSpan(class arrow::util::tracing::SpanDetails *)" (?UnwrapSpan@tracing@internal@arrow@@YAAEAV?$shared_ptr@VSpan@trace@v1@opentelemetry@@@nostd@v1@opentelemetry@@PEAVSpanDetails@1util@3@@Z)
hash_join.cc.obj : error LNK2001: unresolved external symbol "class opentelemetry::v1::nostd::shared_ptr<class opentelemetry::v1::trace::Span> & __cdecl arrow::internal::tracing::UnwrapSpan(class arrow::util::tracing::SpanDetails *)" (?UnwrapSpan@tracing@internal@arrow@@YAAEAV?$shared_ptr@VSpan@trace@v1@opentelemetry@@@nostd@v1@opentelemetry@@PEAVSpanDetails@1util@3@@Z)
project_node.cc.obj : error LNK2001: unresolved external symbol "class opentelemetry::v1::nostd::shared_ptr<class opentelemetry::v1::trace::Span> & __cdecl arrow::internal::tracing::RewrapSpan(class arrow::util::tracing::SpanDetails *,class opentelemetry::v1::nostd::shared_ptr<class opentelemetry::v1::trace::Span>)" (?RewrapSpan@tracing@internal@arrow@@YAAEAV?$shared_ptr@VSpan@trace@v1@opentelemetry@@@nostd@v1@opentelemetry@@PEAVSpanDetails@1util@3@V4567@@Z)
source_node.cc.obj : error LNK2001: unresolved external symbol "class opentelemetry::v1::nostd::shared_ptr<class opentelemetry::v1::trace::Span> & __cdecl arrow::internal::tracing::RewrapSpan(class arrow::util::tracing::SpanDetails *,class opentelemetry::v1::nostd::shared_ptr<class opentelemetry::v1::trace::Span>)" (?RewrapSpan@tracing@internal@arrow@@YAAEAV?$shared_ptr@VSpan@trace@v1@opentelemetry@@@nostd@v1@opentelemetry@@PEAVSpanDetails@1util@3@V4567@@Z)
swiss_join.cc.obj : error LNK2001: unresolved external symbol "class opentelemetry::v1::nostd::shared_ptr<class opentelemetry::v1::trace::Span> & __cdecl arrow::internal::tracing::RewrapSpan(class arrow::util::tracing::SpanDetails *,class opentelemetry::v1::nostd::shared_ptr<class opentelemetry::v1::trace::Span>)" (?RewrapSpan@tracing@internal@arrow@@YAAEAV?$shared_ptr@VSpan@trace@v1@opentelemetry@@@nostd@v1@opentelemetry@@PEAVSpanDetails@1util@3@V4567@@Z)
util.cc.obj : error LNK2001: unresolved external symbol "class opentelemetry::v1::nostd::shared_ptr<class opentelemetry::v1::trace::Span> & __cdecl arrow::internal::tracing::RewrapSpan(class arrow::util::tracing::SpanDetails *,class opentelemetry::v1::nostd::shared_ptr<class opentelemetry::v1::trace::Span>)" (?RewrapSpan@tracing@internal@arrow@@YAAEAV?$shared_ptr@VSpan@trace@v1@opentelemetry@@@nostd@v1@opentelemetry@@PEAVSpanDetails@1util@3@V4567@@Z)
aggregate_node.cc.obj : error LNK2001: unresolved external symbol "class opentelemetry::v1::nostd::shared_ptr<class opentelemetry::v1::trace::Span> & __cdecl arrow::internal::tracing::RewrapSpan(class arrow::util::tracing::SpanDetails *,class opentelemetry::v1::nostd::shared_ptr<class opentelemetry::v1::trace::Span>)" (?RewrapSpan@tracing@internal@arrow@@YAAEAV?$shared_ptr@VSpan@trace@v1@opentelemetry@@@nostd@v1@opentelemetry@@PEAVSpanDetails@1util@3@V4567@@Z)
exec_plan.cc.obj : error LNK2001: unresolved external symbol "class opentelemetry::v1::nostd::shared_ptr<class opentelemetry::v1::trace::Span> & __cdecl arrow::internal::tracing::RewrapSpan(class arrow::util::tracing::SpanDetails *,class opentelemetry::v1::nostd::shared_ptr<class opentelemetry::v1::trace::Span>)" (?RewrapSpan@tracing@internal@arrow@@YAAEAV?$shared_ptr@VSpan@trace@v1@opentelemetry@@@nostd@v1@opentelemetry@@PEAVSpanDetails@1util@3@V4567@@Z)
filter_node.cc.obj : error LNK2001: unresolved external symbol "class opentelemetry::v1::nostd::shared_ptr<class opentelemetry::v1::trace::Span> & __cdecl arrow::internal::tracing::RewrapSpan(class arrow::util::tracing::SpanDetails *,class opentelemetry::v1::nostd::shared_ptr<class opentelemetry::v1::trace::Span>)" (?RewrapSpan@tracing@internal@arrow@@YAAEAV?$shared_ptr@VSpan@trace@v1@opentelemetry@@@nostd@v1@opentelemetry@@PEAVSpanDetails@1util@3@V4567@@Z)
hash_join.cc.obj : error LNK2001: unresolved external symbol "class opentelemetry::v1::nostd::shared_ptr<class opentelemetry::v1::trace::Span> & __cdecl arrow::internal::tracing::RewrapSpan(class arrow::util::tracing::SpanDetails *,class opentelemetry::v1::nostd::shared_ptr<class opentelemetry::v1::trace::Span>)" (?RewrapSpan@tracing@internal@arrow@@YAAEAV?$shared_ptr@VSpan@trace@v1@opentelemetry@@@nostd@v1@opentelemetry@@PEAVSpanDetails@1util@3@V4567@@Z)

release\arrow_acero.dll : fatal error LNK1120: 2 unresolved externals

The __cdecl arrow::internal::tracing::UnwrapSpan(class arrow::util::tracing::SpanDetails *) makes it look like this comes from arrow somehow (I'm guessing there are no shared windows builds with OTEL in upstream CI). Do we even need tracing in conda-forge? Isn't that usually for sanitizer instrumentation?

@xhochy
Member

xhochy commented May 12, 2023

The UnwrapSpan function (and likely RewrapSpan, the other unresolved symbol) is probably missing an ARROW_EXPORT declaration: https://github.com/apache/arrow/blob/cdefbb8f4b4183b29fbcdb014af1f6fc0030475c/cpp/src/arrow/util/tracing_internal.h#LL125C1-L125C1

Although it is internal, it is used across library boundaries and thus needs to be exported. This should be addressed upstream.
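For readers unfamiliar with the pattern, here is a minimal sketch of what such an export annotation usually looks like. `DEMO_EXPORT`, `DEMO_BUILDING_DLL`, and the `demo::internal` names are hypothetical stand-ins; this is not Arrow's actual `ARROW_EXPORT` definition, only the common shape of such macros.

```cpp
// Normally set by the build system while compiling the library itself
// (hypothetical name; defined here so the sketch compiles on its own).
#define DEMO_BUILDING_DLL

// Hypothetical export macro following the usual pattern.
#if defined(_WIN32)
#  if defined(DEMO_BUILDING_DLL)
#    define DEMO_EXPORT __declspec(dllexport)  // library build: publish the symbol
#  else
#    define DEMO_EXPORT __declspec(dllimport)  // consumer build: import the symbol
#  endif
#else
#  define DEMO_EXPORT __attribute__((visibility("default")))
#endif

namespace demo {
namespace internal {

struct SpanDetails {
  int id;
};

// Even though this lives in an "internal" namespace, another DLL that calls
// it needs the symbol to be exported. On Windows, a function without the
// annotation is not exported from the DLL by default, and a consumer that
// references it fails to link with an unresolved external symbol.
DEMO_EXPORT int& UnwrapId(SpanDetails* details);

}  // namespace internal
}  // namespace demo

int& demo::internal::UnwrapId(SpanDetails* details) { return details->id; }
```

On Linux and macOS, default symbol visibility often hides this kind of omission, which may be why it only surfaces in a shared Windows build.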

@h-vetinari
Member Author

Good analysis @xhochy. The build now succeeded, but we run into something else when trying to load the library:

import: 'pyarrow'
[libprotobuf ERROR D:\bld\libprotobuf-split_1670986007159\work\src\google\protobuf\descriptor_database.cc:642] File already exists in database: opentelemetry/proto/common/v1/common.proto
[libprotobuf FATAL D:\bld\libprotobuf-split_1670986007159\work\src\google\protobuf\descriptor.cc:1986] CHECK failed: GeneratedDatabase()->Add(encoded_file_descriptor, size): 
Traceback (most recent call last):
  File "D:\bld\apache-arrow_1683937204901\test_tmp\run_test.py", line 2, in <module>
    import pyarrow
  File "D:\bld\apache-arrow_1683937204901\_test_env\lib\site-packages\pyarrow\__init__.py", line 65, in <module>
    import pyarrow.lib as _lib
ImportError: DLL load failed while importing lib: A dynamic link library (DLL) initialization routine failed.

I haven't seen this before, but it seems there's some double-loading(?) of proto information into a protobuf-internal database?

If so, it sounds like arrow shouldn't be re-loading the otel proto definitions?

CC @lidavidm

@h-vetinari
Member Author

> If so, it sounds like arrow shouldn't be re-loading the otel proto definitions?

To top things off, it looks like there's a third potential source for this (at least in conda-forge), in addition to opentelemetry & arrow: https://github.com/conda-forge/proto-opentelemetry-proto-feedstock

The opentelemetry lib includes this as a build dependency(?), which seems wrong somehow? Shouldn't it be a host dependency?

Also, if there's a need to match the proto-opentelemetry-proto version here, I think this should become a run-export of libopentelemetry-cpp, rather than duplicating the version ("0.19") here and then trying to keep it in sync?

@xhochy
Member

xhochy commented May 15, 2023

> File already exists in database: opentelemetry/proto/common/v1/common.proto
> 2023-05-15T06:23:18.8525852Z [libprotobuf FATAL

This happens if two libraries load the same protobuf definition in a different, conflicting version. We have the same problem with all the onnx-* packages. The workaround for this is to build all of them (except at most one) with static protobuf so that each library gets its own protobuf (global) namespace.
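The mechanism can be pictured with a toy model. The sketch below is not protobuf's real implementation: `ToyDescriptorDatabase`, `InitLibraryA`, and `InitLibraryB` are hypothetical names, and the registry simply rejects any second registration under the same file name, mirroring the "File already exists in database" message. Whether the real library tolerates identical duplicates is outside what this sketch claims.

```cpp
#include <map>
#include <string>

// Toy stand-in for a process-wide registry of generated .proto files.
class ToyDescriptorDatabase {
 public:
  // Returns false if a file with this name has already been registered.
  bool Add(const std::string& file_name, const std::string& encoded) {
    return files_.emplace(file_name, encoded).second;
  }

 private:
  std::map<std::string, std::string> files_;
};

// Two hypothetical libraries that each carry generated code for the same
// .proto file and register it when they are loaded.
bool InitLibraryA(ToyDescriptorDatabase& db) {
  return db.Add("opentelemetry/proto/common/v1/common.proto", "definition-from-A");
}

bool InitLibraryB(ToyDescriptorDatabase& db) {
  return db.Add("opentelemetry/proto/common/v1/common.proto", "definition-from-B");
}
```

Passing the same database to both functions models two libraries sharing one dynamically linked protobuf: the second registration is rejected. Giving each function its own database models the static-linking workaround, where every library has a private registry and both registrations succeed.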

@h-vetinari
Member Author

Thanks for the input!

> This happens if two libraries load the same protobuf definition in a different, conflicting version.

Can we not ensure that they get the same protobuf definitions? There's even a specific feedstock for those.

@h-vetinari
Member Author

BTW: The recent commits also showed that the unity build was directly implicated in the Windows failures. Switching it back on led to failure again. Should I raise an issue for this upstream?

@lidavidm
Contributor

> The opentelemetry lib includes this as a build dependency(?), which seems wrong somehow? Shouldn't it be a host dependency?

The Protobuf definitions are only needed to generate code, not at runtime. But the version needs to be matched, so if there's a better way...

> This happens if two libraries load the same protobuf definition in a different, conflicting version. We have the same problem with all the onnx-* packages. The workaround for this is to build all of them (except at most one) with static protobuf so that each library gets its own protobuf (global) namespace.

Hmm, something seems wrong. Arrow itself shouldn't be loading the OpenTelemetry Protobuf generated code. Either we actually built a bundled copy of OpenTelemetry (doesn't seem so) or we still statically linked OpenTelemetry somehow? (Shouldn't be possible?) Or possibly multiple OpenTelemetry libraries included the Protobuf generated code?

> BTW: The recent commits also showed that the unity build was directly implicated in the Windows failures. Switching it back on led to failure again. Should I raise an issue for this upstream?

Yes, please do.
