@010Soham
Contributor

What does this change do?

  • RestCatalog now passes its AuthManager into FileIO so downstream components can reuse a live token.
  • S3V4RestSigner now calls the AuthManager’s auth_header() when no static token is provided, ensuring the signer gets a fresh bearer token.
  • Added a unit test to verify the signer pulls the Authorization header from an AuthManager.
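Here is a minimal sketch of the lookup order described above. It assumes a hypothetical `resolve_signer_headers` helper, a `FakeAuthManager` test double, and the literal `"token"` property key (the value of `TOKEN` in this PR). Only the `auth_header()` method name and the "static token first, otherwise ask the AuthManager" ordering come from the description, so this is not the actual `S3V4RestSigner` code.

```python
from typing import Dict, Optional, Protocol


class SupportsAuthHeader(Protocol):
    """Anything exposing auth_header(), like the AuthManager discussed in this PR."""

    def auth_header(self) -> Optional[str]: ...


def resolve_signer_headers(
    properties: Dict[str, str], auth_manager: Optional[SupportsAuthHeader]
) -> Dict[str, str]:
    """Hypothetical helper: build the headers sent to the remote signer."""
    auth_header: Optional[str] = None
    if token := properties.get("token"):
        # A static token from the FileIO properties is used first, as in the PR description.
        auth_header = f"Bearer {token}"
    elif auth_manager is not None:
        # Otherwise ask the auth manager, which may return a refreshed token on each call.
        auth_header = auth_manager.auth_header()
    return {"Authorization": auth_header} if auth_header else {}


class FakeAuthManager:
    """Test double that hands out a new token on every call."""

    def __init__(self) -> None:
        self.calls = 0

    def auth_header(self) -> Optional[str]:
        self.calls += 1
        return f"Bearer token-{self.calls}"


manager = FakeAuthManager()
print(resolve_signer_headers({}, manager))  # {'Authorization': 'Bearer token-1'}
print(resolve_signer_headers({}, manager))  # {'Authorization': 'Bearer token-2'}
print(resolve_signer_headers({"token": "static"}, manager))  # {'Authorization': 'Bearer static'}
```

Because the auth manager is called on every request rather than once at construction time, the signer picks up a refreshed token without FileIO having to store one.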

Why is this needed?

  • After the AuthManager refactor, the signer no longer received a token, so remote-signing requests failed with HTTP 401 for REST catalog users (e.g., Lakekeeper/MinIO). This change restores token propagation and refresh.

How was this tested?

  • make lint
  • make test
  • uv run python -m pytest tests/io/test_fsspec.py -k auth_manager -v

Closes #2544

@Fokko
Contributor

Fokko commented Dec 22, 2025

Thanks a lot @010Soham for picking this up, a lot of folks are eagerly waiting for this 👍

@c-thiel Are you able to check whether this fixes the issue with the server-signed URLs?

Contributor

@kevinjqliu left a comment

LGTM! Thanks for the fix. I commented on the original issue (#2544) to see if anyone can verify the solution.

I'll also try to reproduce the issue and test this fix locally.

Comment on lines 127 to +128
```diff
 if token := self.properties.get(TOKEN):
-    signer_headers = {"Authorization": f"Bearer {token}"}
+    auth_header = f"Bearer {token}"
```
Contributor

nit: maybe we can get rid of accessing TOKEN through properties here and just standardize on using the auth manager.
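One way to standardize on the auth manager, sketched under assumptions: a hypothetical `StaticTokenAuthManager` wraps a configured token so that the signer only ever calls `auth_header()`. The class name and the `SupportsAuthHeader` protocol are illustrative only, not existing pyiceberg APIs.

```python
from typing import Dict, Optional, Protocol


class SupportsAuthHeader(Protocol):
    def auth_header(self) -> Optional[str]: ...


class StaticTokenAuthManager:
    """Hypothetical auth manager that always returns the same configured bearer token."""

    def __init__(self, token: str) -> None:
        self._token = token

    def auth_header(self) -> Optional[str]:
        return f"Bearer {self._token}"


def signer_headers(auth_manager: SupportsAuthHeader) -> Dict[str, str]:
    # Single code path: the signer consults only the auth manager, never TOKEN in properties.
    header = auth_manager.auth_header()
    return {"Authorization": header} if header else {}


print(signer_headers(StaticTokenAuthManager("abc")))  # {'Authorization': 'Bearer abc'}
```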

Collaborator

@sungwy commented Dec 24, 2025

This may be out of scope for this bug-fix PR, but it looks like we're tightly coupling FileIO to AuthManager and authentication tokens, which are RestCatalog concepts, even though FileIO should be catalog-type agnostic.

It might be worth revisiting this design in more detail later, so that we don't introduce fallback logic driven by configuration properties rather than by a clear separation of concerns.
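A sketch of what a catalog-agnostic boundary could look like, assuming FileIO accepted a plain zero-argument header provider instead of an AuthManager. `RemoteSigner`, `HeaderProvider`, `headers_from_auth_manager`, and `DemoAuthManager` are hypothetical names; the point is that only the catalog-side adapter knows about `auth_header()`.

```python
from typing import Callable, Dict, Optional, Protocol

# A FileIO-side signer only needs "something that returns headers".
HeaderProvider = Callable[[], Dict[str, str]]


class SupportsAuthHeader(Protocol):
    def auth_header(self) -> Optional[str]: ...


class RemoteSigner:
    """Hypothetical signer that knows nothing about catalogs, tokens, or auth managers."""

    def __init__(self, header_provider: Optional[HeaderProvider] = None) -> None:
        self._header_provider = header_provider

    def request_headers(self) -> Dict[str, str]:
        headers: Dict[str, str] = {}
        if self._header_provider is not None:
            # Evaluated on every request, so a refreshed credential is picked up automatically.
            headers.update(self._header_provider())
        return headers


def headers_from_auth_manager(auth_manager: SupportsAuthHeader) -> HeaderProvider:
    """Hypothetical catalog-side adapter from an auth_header()-style object to a header provider."""

    def provide() -> Dict[str, str]:
        header = auth_manager.auth_header()
        return {"Authorization": header} if header else {}

    return provide


class DemoAuthManager:
    def auth_header(self) -> Optional[str]:
        return "Bearer demo"


signer = RemoteSigner(headers_from_auth_manager(DemoAuthManager()))
print(signer.request_headers())  # {'Authorization': 'Bearer demo'}
```

With this split, a non-REST catalog could pass any other provider (or none) without FileIO changing.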

Contributor

Yeah, great point. I think it would be better to split the "rest signer" out of FileIO. There's already a good example in the REST catalog:

```python
# Excerpt from RestCatalog; Any, Session, get_first_property_value and the
# AWS_*/SIGV4_*/BOTOCORE_SESSION constants come from the module's imports.
def _init_sigv4(self, session: Session) -> None:
    from urllib import parse

    import boto3
    from botocore.auth import SigV4Auth
    from botocore.awsrequest import AWSRequest
    from requests import PreparedRequest
    from requests.adapters import HTTPAdapter

    class SigV4Adapter(HTTPAdapter):
        def __init__(self, **properties: str):
            super().__init__()
            self._properties = properties
            self._boto_session = boto3.Session(
                region_name=get_first_property_value(self._properties, AWS_REGION),
                botocore_session=self._properties.get(BOTOCORE_SESSION),
                aws_access_key_id=get_first_property_value(self._properties, AWS_ACCESS_KEY_ID),
                aws_secret_access_key=get_first_property_value(self._properties, AWS_SECRET_ACCESS_KEY),
                aws_session_token=get_first_property_value(self._properties, AWS_SESSION_TOKEN),
            )

        def add_headers(self, request: PreparedRequest, **kwargs: Any) -> None:  # pylint: disable=W0613
            credentials = self._boto_session.get_credentials().get_frozen_credentials()
            region = self._properties.get(SIGV4_REGION, self._boto_session.region_name)
            service = self._properties.get(SIGV4_SERVICE, "execute-api")

            url = str(request.url).split("?")[0]
            query = str(parse.urlsplit(request.url).query)
            params = dict(parse.parse_qsl(query))

            # remove the connection header as it will be updated after signing
            del request.headers["connection"]

            aws_request = AWSRequest(
                method=request.method, url=url, params=params, data=request.body, headers=dict(request.headers)
            )

            SigV4Auth(credentials, service, region).add_auth(aws_request)
            original_header = request.headers
            signed_headers = aws_request.headers
            relocated_headers = {}

            # relocate headers if there is a conflict with signed headers
            for header, value in original_header.items():
                if header in signed_headers and signed_headers[header] != value:
                    relocated_headers[f"Original-{header}"] = value

            request.headers.update(relocated_headers)
            request.headers.update(signed_headers)

    session.mount(self.uri, SigV4Adapter(**self.properties))
```
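The header-relocation step at the end of `add_headers` is easy to miss, so here is a small stdlib-only sketch of the same control flow. It uses plain, case-sensitive dicts instead of the case-insensitive header containers from requests and botocore, and `merge_signed_headers` is a hypothetical name. The effect is that a header the SigV4 signer overwrites, such as an existing `Authorization` bearer header, survives under `Original-<name>`.

```python
from typing import Dict


def merge_signed_headers(original: Dict[str, str], signed: Dict[str, str]) -> Dict[str, str]:
    """Mirror the relocation step of SigV4Adapter.add_headers using plain dicts."""
    merged = dict(original)
    relocated: Dict[str, str] = {}
    for header, value in original.items():
        # A header the signer rewrote keeps its old value under an Original- prefix.
        if header in signed and signed[header] != value:
            relocated[f"Original-{header}"] = value
    merged.update(relocated)
    # Signed values then overwrite the originals, like request.headers.update(signed_headers).
    merged.update(signed)
    return merged


original = {"Authorization": "Bearer abc", "Content-Type": "application/json"}
signed = {"Authorization": "AWS4-HMAC-SHA256 ...", "X-Amz-Date": "20251222T000000Z"}
print(merge_signed_headers(original, signed))
```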

It might also be easier to just pass the requests Session from the REST catalog to the signer, so we don't need to recreate the auth header directly.

but again, we can refactor this after the bug fix :)

Collaborator

agreed - just leaving a comment so we don't forget 🙂

Contributor

in case we forget, #2862

Comment on lines +130 to +132
```python
header = getattr(auth_manager, "auth_header", None)
if callable(header):
    auth_header = header()
```
Contributor

Suggested change
```diff
-header = getattr(auth_manager, "auth_header", None)
-if callable(header):
-    auth_header = header()
+auth_header = auth_manager.auth_header()
```

Could we just call the function directly? That will fail loudly if the auth_header function does not exist.
I think the current solution fails silently, i.e. it won't add any auth header if the auth_header function does not exist.
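The difference between the two patterns can be shown in a few lines. `ManagerWithoutHeader` is a hypothetical, misconfigured auth manager, and the two helper functions simply isolate the current `getattr` lookup and the suggested direct call.

```python
from typing import Any, Optional


class ManagerWithoutHeader:
    """Hypothetical, misconfigured auth manager that lacks auth_header()."""


def header_via_getattr(auth_manager: Any) -> Optional[str]:
    # Pattern currently in the PR: a missing method quietly yields no header.
    header = getattr(auth_manager, "auth_header", None)
    return header() if callable(header) else None


def header_direct(auth_manager: Any) -> Optional[str]:
    # Suggested pattern: a missing method raises AttributeError right away.
    return auth_manager.auth_header()


print(header_via_getattr(ManagerWithoutHeader()))  # None
try:
    header_direct(ManagerWithoutHeader())
except AttributeError as exc:
    print(f"fails loudly: {exc}")
```

With the `getattr` version, the misconfiguration only shows up later as a 401 from the signer, which is the symptom this PR is fixing.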

```python
_ENV_CONFIG = Config()

TOKEN = "token"
AUTH_MANAGER = "auth.manager"
```
Contributor

nit: could we move this to pyiceberg/catalog/rest/auth.py, alongside the other auth manager code?

@kevinjqliu
Contributor

I was able to verify this locally using the amazing repro script from @martyngigg.
See #2544 (comment)


```diff
+auth_header: str | None = None
 if token := self.properties.get(TOKEN):
-    signer_headers = {"Authorization": f"Bearer {token}"}
```
Contributor

I think this was partially mentioned already, but should the auth manager take precedence over the token when both are set?
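For reference, the alternative ordering raised here could look like the following sketch, where the auth manager is consulted first and the static `token` property is only a fallback. `resolve_auth_header` and `FixedAuthManager` are hypothetical, and which ordering is right is still an open question in this thread.

```python
from typing import Dict, Optional, Protocol


class SupportsAuthHeader(Protocol):
    def auth_header(self) -> Optional[str]: ...


def resolve_auth_header(
    properties: Dict[str, str], auth_manager: Optional[SupportsAuthHeader]
) -> Optional[str]:
    """Hypothetical ordering: prefer the auth manager, fall back to a static token."""
    if auth_manager is not None and (header := auth_manager.auth_header()):
        # The auth manager can refresh credentials, so it is asked first.
        return header
    if token := properties.get("token"):
        # A static token may expire, so it is only a fallback here.
        return f"Bearer {token}"
    return None


class FixedAuthManager:
    def __init__(self, header: Optional[str]) -> None:
        self._header = header

    def auth_header(self) -> Optional[str]:
        return self._header


print(resolve_auth_header({"token": "stale"}, FixedAuthManager("Bearer fresh")))  # Bearer fresh
print(resolve_auth_header({"token": "stale"}, FixedAuthManager(None)))  # Bearer stale
```

One consequence of this ordering is that an auth manager returning `None` still lets a configured token through.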

Development

Successfully merging this pull request may close these issues.

FileIO properties missing TOKEN after AuthManager refactor

5 participants