Skip to content

Conversation

CJCombrink
Copy link
Contributor

Trying to get cross tests to work:

@Jens-G I am trying to investigate a possible issue with the UUID type between implementations.
These are some changes I had to make to get me going. If you perhaps have some time to look through and maybe comment or consider why the CI build does not need these changes.

I am following the steps as per Raw Commands for Building with Docker

docker build --build-arg uid=$(id -u) --build-arg gid=$(id -g) -t thrift-jammy build/docker/ubuntu-jammy
docker run -v $(pwd):/thrift/src -it thrift-jammy /bin/bash
build/docker/scripts/cross-test.sh

My current observation is that the Java and Swift implementation of UUID is not correct as per the spec: Universal unique identifier encoding

My current test includes running one server and running another client.
What I observe: I check the UUID being sent by the client and the UUID as reported by the server. In all cases the expectation is that what is sent is received and then returned. This however is not the case.

For all of these I am running the C++ server: /thrift/src/test/cpp/TestServer --port=37471

Language Client Sent Server Received cmd
C++ 00112233-4455-6677-8899-aabbccddeeff 00112233-4455-6677-8899-aabbccddeeff /thrift/src/test/cpp/TestClient --port=37471
Java 00112233-4455-6677-8899-aabbccddeeff 8899aabb-ccdd-eeff-0011-223344556677 /thrift/src/lib/java/build/runclient --port=37471
go 00112233-4455-6677-8899-aabbccddeeff 00112233-4455-6677-8899-aabbccddeeff /thrift/src/test/go/bin/testclient --port=37471
swift 00112233-4455-6677-8899-aabbccddeeff 8899aabb-ccdd-eeff-0011-223344556677 thrift/src/test/swift/CrossTests/.build/x86_64-unknown-linux-gnu/debug/TestClient --port=37471
.NET 00112233-4455-6677-8899-aabbccddeeff 00112233-4455-6677-8899-aabbccddeeff /thrift/src/test/netstd/Client/bin/Release/net9.0/Client
  • 'Client Sent' is the string that is hard coded in the client, and printed on the console when the client is run
  • 'Server Received' is what the server interpreted and printed to the console

That about covers the php support except for Haxe and Delphi as per the LANGUAGES.md (I am having trouble building Delphi as seen in the cross-test.sh script.

From the above at least the C++, go and .NET implementations are compatible (and I would argue correct)

NB: This is not about what the client sends and receives: The client that sends the UUID receives the same UUID which means the server re-transmits as-is. The issue is about what one langauge sends (encode) and what another language receives (decode).

Where to from here?

  1. First I would like someone else to look into this and confirm or deny my hypothesis
  2. If there is a mismatch, irrespective of which implementation is correct, this is a problem and should be addressed.
    • I don't know if this can be changed, it probably depends on the how strict is the spec and how wide is adoption
    • I know of projects using the C++ and go side already (with interop)
  3. I need UUIDs to work for C++, Java, JS (no support yet), python (no support yet)
  • Did you create an Apache Jira ticket? (Request account here, not required for trivial changes)
    • No, I first want engagement on this before making a ticket
  • If a ticket exists: Does your pull request title follow the pattern "THRIFT-NNNN: describe my issue"?
  • Did you squash your changes to a single commit? (not required, but preferred)
  • Did you do your best to avoid breaking changes? If one was needed, did you label the Jira ticket with "Breaking-Change"?
  • If your change does not involve any code, include [skip ci] anywhere in the commit message to free up build resources.

- Not interested in these now and looked more involved to solve
- d: complains about some libss missing linker object
- dart: complains about an sdk version
- lisp: some compile error
- Configure and build issues
- Ask to run "go mod tidy"  without these
- Replace with proper os function
- UUID support was added
- Need to see it to compare to Java and C++
- And print what is sent
- To assist in debugging and writing up the findings
@EnigmaTriton
Copy link
Contributor

Hello @CJCombrink,
I don’t know if you were using the same language on both client and server side (I’m guessing so) but in this case, there is a way to determine which end is not translating correctly to/from the network encoding: perform a network capture to analyze with Wireshark.

Your assumption that C++, Go, and .NET show the correct implementation is right and 00112233-4455-6677-8899-aabbccddeeff is indeed how Wireshark should display the data on the network capture.

Hope it helps.

@fishy
Copy link
Member

fishy commented May 21, 2025

what you observed sounds like the conversion between unsigned and signed int8 went wrong. I know java does not have unsigned ints and probably had something implemented wrong because of that, not sure if swift has a similar issue.

@Jens-G
Copy link
Member

Jens-G commented May 21, 2025

From the above at least the C++, go and .NET implementations are compatible (and I would argue correct)

The NET impl was the original reference impl I made. Both Delphi and Haxe were then tested against that reference (and each other of course). It's always a good idea to have an implementation ready to for testing that is known to work so maybe someone missed that ... ticket?

I am having trouble building Delphi as seen in the cross-test.sh script.

"It does not work" is not an adequate problem description. Because "it works for me". 😁. If you need help with this, we can continue that topic at the user or dev mailng list.

@CJCombrink
Copy link
Contributor Author

@EnigmaTriton

same language on both client and server side (I’m guessing so)

This was my starting point but since I have to integrate my server (c++) with a Java client I tested and found the issue. Same language client <-> server all seems fine, probably because implementation bias where the same mistake is made both sides presents as working code.

Wireshark should display the data

I have checked with Wireshark and this is what I see for the C++ Server and the .Net Client:
image

And for C++ server and Java Client:
image

The .Net client sends the data as expected
The Java client sends data not as expected: The bit-shifting here is breaking my brain bit it does appear to match what I see on the wire...

@EnigmaTriton
Copy link
Contributor

EnigmaTriton commented May 22, 2025

@CJCombrink
For the first one, you should read what is visible on frame 130 rather than 127 (data is sometimes transferred in several packets depending on the sender options, the last one will show the reassembly and properly display the data as Thrift data instead of raw bytes).

Anyway, the second capture confirm that the data on the network is in the wrong order (while the raw data is correct in the first capture).
With this, you should be able to confirm that your fix is working as expected.

Edit: I might have misread the image, maybe you should right-click on the packet and select "Decode as…" then select Thrift for the reassembly to work properly (if the client is sending unbuffered in frames 1xx to 128 and the server is sending the reply buffered in frame 130).

@CJCombrink
Copy link
Contributor Author

The question is what is the path forward?

  • I have shown that the Java and Swift implementations are not according to the specification
  • The .Net, go and C++ implementations are according to the specification
  • If the client and server is the same language, it works as expected
  • If the client or server is Java and interop is needed to any other language it will not work as expected

@fishy
Copy link
Member

fishy commented May 28, 2025

@CJCombrink path forward is likely to file bugs on jira about java's and swift's implementations of uuid are not according to the spec, and/or fix them via PRs.

@CJCombrink
Copy link
Contributor Author

fix them via PRs

This should be fairly easy to do but that would be a definite backwards compatibility break for users of the UUID field and I am not sure how thrift users would appreciate that. If the adoption of UUID is low it should not be an issue.

@fishy
Copy link
Member

fishy commented May 29, 2025

since there's already a divergence, there's no way to fix this without any braking changes.

and it's certainly better to break the ones not according to spec, vs. breaking things that's according to the spec.

@Jens-G
Copy link
Member

Jens-G commented Jun 2, 2025

would be a definite backwards compatibility break for users of the UUID field a

You don't want to break an incompatible implementation for compatibility reasons? 🤣

@CJCombrink
Copy link
Contributor Author

Is there a way to use the existing cross tests to test this compatibility?
I am trying to work through the existing test to see what it is testing but hoping someone might be able to tell me "oh if you do this and this it might work"

My current idea would be that each client must send the known pattern 00112233-4455-6677-8899-aabbccddeeff (and log it) and each server must then also log what is received. The test should then check that both the server and client should have logged that pattern for the test to pass. Is this something that might work?

@fishy
Copy link
Member

fishy commented Jun 3, 2025

We run a small subset of languages on the CI, the ones implemented uuid use ThriftTest.thrift while the ones didn't use the files under v0.16 which has the uuid fields removed.

but almost all languages, even the ones we don't currently run ci, have crossrunner setup there, and you can just use crossrunner locally to run the crosstests between them.

@CJCombrink
Copy link
Contributor Author

but almost all languages, even the ones we don't currently run ci, have crossrunner setup there, and you can just use crossrunner locally to run the crosstests between them.

Yes I am aware of this, my observation is that these tests does run compatibility tests, but not correctness tests.
I am lead to believe this since this incorrectness is not failing the tests

For example, when running the cross test with java server and cpp client (locally, on this branch)

> python3 test/test.py --skip-known-failures --server java --client cpp

Then looking at the following test result:

Server Client Protocol Transport Result Expected
java cpp binary buffered-ip success(0) yes

And in the logs for "Server": (java-cpp_binary_buffered-ip_server.log)

...
TServerEventHandler.processContext - connection #1 is ready to process next request
testUuid(8899aabb-ccdd-eeff-0011-223344556677)
...

The logs for "Client": (java-cpp_binary_buffered-ip_client.log)

testUuid(00112233-4455-6677-8899-aabbccddeeff) = (00112233-4455-6677-8899-aabbccddeeff)

Ultimately the test should fail, or an alternative test or method should be introduced to try and catch this.

I will see if I can't start the process to submit a fix for the Java

- Tries to run the debug build but it appears that the script builds release
@fishy
Copy link
Member

fishy commented Jun 4, 2025

yea I think you are right. I checked go's cross test server and client implementation, the server just response the same uuid it received back to the client, and the client just checks that the uuid it received back is the same one it sent out, so this would not fail the test if the server is using the wrong order. I would assume other languages' test client and test server are implemented similarly.

we would need to check the value on the server against a pre-defined value to catch this bug.

@Jens-G
Copy link
Member

Jens-G commented Jun 4, 2025

we would need to check the value on the server against a pre-defined value to catch this bug.

We could have three small files somewhere under /test that are known to have the right content, binary, json and compact respectively. Let me prepare sth ...

@Jens-G
Copy link
Member

Jens-G commented Jun 5, 2025

three small files somewhere under /test that are

Actually I think that road leads to no good. I hardcoded the testcases now and made it as simple as possible (even though the commit size may indicate otherwise)

@CJCombrink commits are 3e6be73 and a420a24

@CJCombrink
Copy link
Contributor Author

@Jens-G
Thanks I will take a look to do something similar in the C++ side once I can get #3167 merged

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants