@mattlord
Member

@mattlord mattlord commented Dec 16, 2025

Description

This PR makes two meaningful quality-of-life improvements to VDiff:

  1. Don't intentionally time out a VDiff query on the source and target at the vreplication copy phase duration timeout, since the workflow is not in the copy phase when a VDiff runs. The timeout unnecessarily forces N VDiff resume cycles when diffing a table takes longer than the timeout (default is 1h). This is potentially more problematic when the table has a complex PK (e.g. N binary columns) and the diff is done across different database versions, because the results can be ordered differently on each side and errant mismatches are then reported.
  2. When reporting samples of mismatched rows, print binary column values in hex so that the user can copy and paste the value as-is to examine the row in question further. We previously printed binary columns as Unicode strings, so the value would likely contain one or more invalid Unicode characters (rendered as a question mark inside a diamond) and you would not have the data needed to easily identify the row by its primary key column values.
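To illustrate the second point, here is a minimal sketch of the idea, not the code in `report.go`. The helper name `formatBinaryValue` and the `0x...` output style are assumptions made for this example; the actual report output format may differ.

```go
package main

import (
	"encoding/hex"
	"fmt"
	"strings"
)

// formatBinaryValue is a hypothetical helper (not part of Vitess).
// It renders raw bytes as a hex literal that can be pasted into a
// MySQL WHERE clause, instead of interpreting the bytes as text.
func formatBinaryValue(raw []byte) string {
	if len(raw) == 0 {
		// An empty value has no hex digits, so fall back to an empty string literal.
		return "''"
	}
	return "0x" + strings.ToUpper(hex.EncodeToString(raw))
}

func main() {
	// A binary primary key value containing bytes that are not valid UTF-8.
	pk := []byte{0x00, 0xff, 0x10, 0x9a}

	// Printing the bytes as a string yields replacement characters in most terminals.
	fmt.Printf("as string: %s\n", string(pk))

	// Printing as hex keeps every byte recoverable.
	fmt.Printf("as hex:    %s\n", formatBinaryValue(pk))
}
```

With a value rendered this way, a row could be looked up with something like `where id = 0x00FF109A`, which is not possible once the bytes have been replaced by invalid-character glyphs.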

Using the local examples, you can verify that the query hint is no longer added:

git fetch --all && git checkout vdiff_improvements
make build

cd examples/local

./101_initial_cluster.sh && ./201_customer_tablets.sh && ./202_move_tables.sh

vtctldclient --server=localhost:15999 vdiff create --target-keyspace customer --workflow commerce2customer

❯ grep "Streaming rows for query" /opt/vtdataroot/tmp/* | grep rowstreamer
/opt/vtdataroot/tmp/vttablet.INFO:I1217 01:46:29.875357   10853 rowstreamer.go:353] Streaming rows for query: select order_id, customer_id, sku, price from corder force index (`PRIMARY`) order by order_id
/opt/vtdataroot/tmp/vttablet.INFO:I1217 01:46:29.929999   10853 rowstreamer.go:353] Streaming rows for query: select customer_id, email from customer force index (`PRIMARY`) order by customer_id
/opt/vtdataroot/tmp/vttablet.pslord.matt.log.INFO.20251217-014611.9138:I1217 01:46:28.644546    9138 rowstreamer.go:353] Streaming rows for query: select /*+ MAX_EXECUTION_TIME(3600000) */ order_id, customer_id, sku, price from corder force index (`PRIMARY`) order by order_id
/opt/vtdataroot/tmp/vttablet.pslord.matt.log.INFO.20251217-014611.9138:I1217 01:46:29.661692    9138 rowstreamer.go:353] Streaming rows for query: select /*+ MAX_EXECUTION_TIME(3600000) */ customer_id, email from customer force index (`PRIMARY`) order by customer_id
/opt/vtdataroot/tmp/vttablet.pslord.matt.log.INFO.20251217-014611.9166:I1217 01:46:29.868963    9166 rowstreamer.go:353] Streaming rows for query: select order_id, customer_id, sku, price from corder force index (`PRIMARY`) order by order_id
/opt/vtdataroot/tmp/vttablet.pslord.matt.log.INFO.20251217-014611.9166:I1217 01:46:29.925814    9166 rowstreamer.go:353] Streaming rows for query: select customer_id, email from customer force index (`PRIMARY`) order by customer_id
/opt/vtdataroot/tmp/vttablet.pslord.matt.log.INFO.20251217-014626.10853:I1217 01:46:29.875357   10853 rowstreamer.go:353] Streaming rows for query: select order_id, customer_id, sku, price from corder force index (`PRIMARY`) order by order_id
/opt/vtdataroot/tmp/vttablet.pslord.matt.log.INFO.20251217-014626.10853:I1217 01:46:29.929999   10853 rowstreamer.go:353] Streaming rows for query: select customer_id, email from customer force index (`PRIMARY`) order by customer_id

The log messages that still DO have the MAX_EXECUTION_TIME query hint are the ones for the copy phase of the workflow that we are diffing (as they should).

Versus on main:

❯ grep "Streaming rows for query" /opt/vtdataroot/tmp/* | grep rowstreamer
/opt/vtdataroot/tmp/vttablet.INFO:I1217 01:35:53.199903   94401 rowstreamer.go:350] Streaming rows for query: select /*+ MAX_EXECUTION_TIME(3600000) */ order_id, customer_id, sku, price from corder force index (`PRIMARY`) order by order_id
/opt/vtdataroot/tmp/vttablet.INFO:I1217 01:35:53.261233   94401 rowstreamer.go:350] Streaming rows for query: select /*+ MAX_EXECUTION_TIME(3600000) */ customer_id, email from customer force index (`PRIMARY`) order by customer_id
/opt/vtdataroot/tmp/vttablet.pslord.matt.log.INFO.20251217-013533.92675:I1217 01:35:51.930562   92675 rowstreamer.go:350] Streaming rows for query: select /*+ MAX_EXECUTION_TIME(3600000) */ order_id, customer_id, sku, price from corder force index (`PRIMARY`) order by order_id
/opt/vtdataroot/tmp/vttablet.pslord.matt.log.INFO.20251217-013533.92675:I1217 01:35:52.951606   92675 rowstreamer.go:350] Streaming rows for query: select /*+ MAX_EXECUTION_TIME(3600000) */ customer_id, email from customer force index (`PRIMARY`) order by customer_id
/opt/vtdataroot/tmp/vttablet.pslord.matt.log.INFO.20251217-013533.92735:I1217 01:35:53.191411   92735 rowstreamer.go:350] Streaming rows for query: select /*+ MAX_EXECUTION_TIME(3600000) */ order_id, customer_id, sku, price from corder force index (`PRIMARY`) order by order_id
/opt/vtdataroot/tmp/vttablet.pslord.matt.log.INFO.20251217-013533.92735:I1217 01:35:53.256384   92735 rowstreamer.go:350] Streaming rows for query: select /*+ MAX_EXECUTION_TIME(3600000) */ customer_id, email from customer force index (`PRIMARY`) order by customer_id
/opt/vtdataroot/tmp/vttablet.pslord.matt.log.INFO.20251217-013548.94401:I1217 01:35:53.199903   94401 rowstreamer.go:350] Streaming rows for query: select /*+ MAX_EXECUTION_TIME(3600000) */ order_id, customer_id, sku, price from corder force index (`PRIMARY`) order by order_id
/opt/vtdataroot/tmp/vttablet.pslord.matt.log.INFO.20251217-013548.94401:I1217 01:35:53.261233   94401 rowstreamer.go:350] Streaming rows for query: select /*+ MAX_EXECUTION_TIME(3600000) */ customer_id, email from customer force index (`PRIMARY`) order by customer_id

Related Issue(s)

Checklist

  • "Backport to:" labels have been added if this change should be back-ported to release branches
  • If this change is to be back-ported to previous releases, a justification is included in the PR description
  • Tests were added or are not required
  • Did the new or modified tests pass consistently locally and on CI?
  • Documentation was added or is not required

@vitess-bot
Contributor

vitess-bot bot commented Dec 16, 2025

Review Checklist

Hello reviewers! 👋 Please follow this checklist when reviewing this Pull Request.

General

  • Ensure that the Pull Request has a descriptive title.
  • Ensure there is a link to an issue (except for internal cleanup and flaky test fixes), new features should have an RFC that documents use cases and test cases.

Tests

  • Bug fixes should have at least one unit or end-to-end test, enhancement and new features should have a sufficient number of tests.

Documentation

  • Apply the release notes (needs details) label if users need to know about this change.
  • New features should be documented.
  • There should be some code comments as to why things are implemented the way they are.
  • There should be a comment at the top of each new or modified test to explain what the test does.

New flags

  • Is this flag really necessary?
  • Flag names must be clear and intuitive, use dashes (-), and have a clear help text.

If a workflow is added or modified:

  • Each item in Jobs should be named in order to mark it as required.
  • If the workflow needs to be marked as required, the maintainer team must be notified.

Backward compatibility

  • Protobuf changes should be wire-compatible.
  • Changes to _vt tables and RPCs need to be backward compatible.
  • RPC changes should be compatible with vitess-operator
  • If a flag is removed, then it should also be removed from vitess-operator and arewefastyet, if used there.
  • vtctl command output order should be stable and awk-able.

@vitess-bot vitess-bot bot added NeedsBackportReason If backport labels have been applied to a PR, a justification is required NeedsDescriptionUpdate The description is not clear or comprehensive enough, and needs work NeedsIssue A linked issue is missing for this Pull Request NeedsWebsiteDocsUpdate What it says labels Dec 16, 2025
@github-actions github-actions bot added this to the v24.0.0 milestone Dec 16, 2025
@mattlord mattlord added Type: Enhancement Logical improvement (somewhere between a bug and feature) Component: VReplication Component: VDiff and removed NeedsBackportReason If backport labels have been applied to a PR, a justification is required NeedsIssue A linked issue is missing for this Pull Request labels Dec 16, 2025
@mattlord mattlord force-pushed the vdiff_improvements branch 3 times, most recently from 7aefe71 to 8d1d085 Compare December 17, 2025 00:28
@mattlord mattlord removed the NeedsDescriptionUpdate The description is not clear or comprehensive enough, and needs work label Dec 17, 2025
@codecov

codecov bot commented Dec 17, 2025

Codecov Report

❌ Patch coverage is 71.42857% with 8 lines in your changes missing coverage. Please review.
✅ Project coverage is 69.88%. Comparing base (749fe54) to head (2f7e3d9).
⚠️ Report is 12 commits behind head on main.

Files with missing lines Patch % Lines
go/vt/vttablet/tabletmanager/vdiff/report.go 60.00% 8 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff            @@
##             main   #19044    +/-   ##
========================================
  Coverage   69.88%   69.88%            
========================================
  Files        1610     1610            
  Lines      215431   215679   +248     
========================================
+ Hits       150551   150732   +181     
- Misses      64880    64947    +67     

☔ View full report in Codecov by Sentry.

Signed-off-by: Matt Lord <[email protected]>
@promptless
Contributor

promptless bot commented Dec 17, 2025

📝 Documentation updates detected!

New suggestion: Add changelog entry for VDiff improvements

@mattlord mattlord removed the request for review from systay December 17, 2025 02:58
@mattlord mattlord removed the request for review from harshit-gangal December 17, 2025 02:58
@mattlord mattlord requested review from mhamza15 and removed request for frouioui, rohit-nayak-ps and shlomi-noach December 17, 2025 22:02
Collaborator

@nickvanw nickvanw left a comment

This looks good to me other than a test that I think needs updating. Other questions are around changes that look unrelated.

 		return false
 	}
-	return route.RoutingParameters.Opcode.IsSingleShard()
+	return route.Opcode.IsSingleShard()
Collaborator

Is this intended to be part of the change? If so, I'm not sure how it relates to VDiff.

Member Author

This was the linter complaining about it after I merged in origin/main.
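For context on why the two forms behave the same, here is a small sketch. It assumes `RoutingParameters` is embedded in the route type, which is what would make `route.RoutingParameters.Opcode` and `route.Opcode` select the same field; the types below are simplified stand-ins written for this example, not the Vitess definitions.

```go
package main

import "fmt"

// Opcode is a simplified stand-in for the real opcode type.
type Opcode int

const (
	Unsharded Opcode = iota
	Scatter
)

// IsSingleShard mirrors the method name from the diff; the logic is illustrative.
func (o Opcode) IsSingleShard() bool { return o == Unsharded }

// RoutingParameters is a stand-in holding only the field used here.
type RoutingParameters struct {
	Opcode Opcode
}

// Route embeds *RoutingParameters, so its fields are promoted onto Route.
type Route struct {
	*RoutingParameters
}

func main() {
	route := &Route{RoutingParameters: &RoutingParameters{Opcode: Unsharded}}

	// The explicit selector and the promoted selector read the same field.
	explicit := route.RoutingParameters.Opcode.IsSingleShard()
	promoted := route.Opcode.IsSingleShard()
	fmt.Println(explicit == promoted)
}
```

Linters commonly flag the longer selector as redundant when the shorter promoted form is unambiguous.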

@mattlord mattlord merged commit e48c2d5 into main Dec 18, 2025
106 of 109 checks passed
@mattlord mattlord deleted the vdiff_improvements branch December 18, 2025 18:31

Labels

Component: VDiff Component: VReplication Type: Enhancement Logical improvement (somewhere between a bug and feature)

Development

Successfully merging this pull request may close these issues.

  • Bug Report: VDiff Sample Rows For Binary Columns Are not Usable
  • Feature Request: Don't add MAX_EXECUTION_TIME query hint to VDiff queries

4 participants