Skip to content

Conversation

@uriyyo
Copy link
Member

@uriyyo uriyyo commented Nov 15, 2020

Literal equality no longer depends on the order of arguments.

Fix issue related to typing.Literal caching by adding typed parameter to typing._tp_cache function.

Add deduplication of typing.Literal arguments.

https://bugs.python.org/issue42345

Automerge-Triggered-By: GH:gvanrossum

Copy link
Member

@Fidget-Spinner Fidget-Spinner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello, thank you for your PR to cpython! This looks very good to me, just one thing about de-duplicating literals pointed out in the review below.

Copy link
Member

@Fidget-Spinner Fidget-Spinner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still think that Literal should de-duplicate its __args__ similar to Union, although I now see the set approach won't work. Maybe you can call set on the _value_and_type_iter function, then extract out the parameters again, something like:

def Literal:
    ...
    parameters = [p for p, _ in set(_value_and_type_iter(parameters))]
    return _LiteralGenericAlias(self, parameters)

This probably needs more discussion, and Guido/Ivan's decision on what the expected behavior should be. Maybe we can ask on the bpo issue.

@uriyyo
Copy link
Member Author

uriyyo commented Nov 15, 2020

@Fidget-Spinner There is a problem with Literal caching.

Code like this will fail because of type caching:

>>> Literal[0] == Literal[False]
False

Code below will show the root cause of this problem:

>>> (Literal[0], Literal[False])
(Litera[0], Literal[0])

As a solution, we can disable caching for a Literal.

What do you think about that?

@Fidget-Spinner
Copy link
Member

Fidget-Spinner commented Nov 15, 2020

Code like this will fail because of type caching:

One possible workaround for that is to pass typed=True for lru_cache in tp_cache, then it will differentiate 0 and False.

The only problem is I realized that set will still fail for passing in dict Literal.

A possible workaround is to just loop through them and check the other args one by one (not particularly efficient but I don't expect Literals to be too long, and with cache it should offset some of it), something like what _remove_dups_flatten for Union does, but instead you can't use set due to non-hashable args. Something like:

# more code here to keep track and store the unique elements
...
for index, i_elem in enumerate(parameters):
    is_duplicate = False
    for j_elem in parameters[index:]:
        if i_elem == j_elem:
            is_duplicate = True
            break
    ...
    # more code for logic to handle the unique elements

I think before we progress further, we should wait for a core dev to respond on the bpo. They might think just having comparisons work is sufficient, and that de-duplicating args isn't needed. So let's save our effort for now 😉 .

Edit:
I just remembered that _tp_cache will just default to non-cache behavior for dict literals, so it's fine to just leave it mostly untouched.

@uriyyo
Copy link
Member Author

uriyyo commented Nov 15, 2020

@Fidget-Spinner I forget about lru_cache typed parameter, thanks for the reminder

I totally agree with you, let wait for a decision from Guido and Ivan regarding Literal args de-duplicating.

As for me, we can skip args de-duplication because Literal equals method will convert everything to set so duplicates won't be a problem.

@gvanrossum
Copy link
Member

Please change the title and news text. We don’t use the style that describes the new state, we describe the action of the PR as a change to the behavior. For examples, just look at the commit log.

@uriyyo uriyyo changed the title bpo-42345: Literal equality no longer depends on order of arguments bpo-42345: Fix typing.Literal equals method to ignore the order of arguments Nov 15, 2020
@uriyyo
Copy link
Member Author

uriyyo commented Nov 15, 2020

I have updated the title and news.

Sorry for that, I didn't notice this convention. My bad.

Copy link
Member

@gvanrossum gvanrossum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LG, except for one nit.

You might mention in the description (not the title) and the news file that you also fixed an issue where incorrectly Literal[0] == Literal[False] -- good catch, that!

@uriyyo
Copy link
Member Author

uriyyo commented Nov 16, 2020

@gvanrossum I have seen your message regarding Literal args deduplication, so I have implemented it in the scope of this PR.

Copy link
Member

@Fidget-Spinner Fidget-Spinner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Getting very close :). Thank you for your patience.

Copy link
Member

@Fidget-Spinner Fidget-Spinner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, and sorry for the weird tangents I went off on in some places!

@uriyyo
Copy link
Member Author

uriyyo commented Nov 16, 2020

@Fidget-Spinner thanks for the review and all comments 🙂

What will be the next step regarding this PR?

@Fidget-Spinner
Copy link
Member

Fidget-Spinner commented Nov 16, 2020

@Fidget-Spinner thanks for the review and all comments 🙂

What will be the next step regarding this PR?

@uriyyo A core dev like Guido/Ivan will take a look, once they're satisfied and approve it, it will be merged into 3.10. And probably backported to 3.9 and 3.8 too (done automatically via bots). This may take some time depending on how busy the core devs are.

I'm looking forward to see your first commit to cpython :).

@gvanrossum
Copy link
Member

I will look at it some time this week.

@Dominik1123
Copy link
Contributor

Should the docs of typing.get_args be updated to mention Literal as well (similar to #23254)?

@bedevere-bot
Copy link

GH-23335 is a backport of this pull request to the 3.9 branch.

@uriyyo
Copy link
Member Author

uriyyo commented Nov 17, 2020

@gvanrossum Looks like I did it🙂

@Dominik1123
Copy link
Contributor

Should the docs of typing.get_args be updated to mention Literal as well (similar to #23254)?

Sure, but I would make a minimal change, e.g. changing "Union" in that paragraph to "Union or Literal".

@uriyyo What about updating the docs of typing.get_args?

@Fidget-Spinner
Copy link
Member

Fidget-Spinner commented Nov 18, 2020

Should the docs of typing.get_args be updated to mention Literal as well (similar to #23254)?

Sure, but I would make a minimal change, e.g. changing "Union" in that paragraph to "Union or Literal".

@uriyyo What about updating the docs of typing.get_args?

@Dominik1123, not sure if this is a bug, but hash() of Literal is dependent on args order in the current code, so the type caching problem won't happen. Currently, it's still only Union affected.

>>> hash(Literal[1, 2, 3])
3136223724409791019
>>> hash(Literal[3, 2, 1])
4438910888190981000

@uriyyo
Copy link
Member Author

uriyyo commented Nov 18, 2020

@Fidget-Spinner Looks like we should not update docs because Literal can't be parametrized with generics so this problem will not exist for Literal. Please correct me if I miss smth.

@gvanrossum
Copy link
Member

But Literal can occur inside other types, e.g. Tuple[str, str, Literal[0, 1, 2]].

@Fidget-Spinner
Copy link
Member

Fidget-Spinner commented Nov 18, 2020

@uriyyo yeah, sorry if I wasn't clear in my original statement - I was in support of not updating the docs. Although my reasoning is slightly different. It's not about generics, the behavior Dominik mentioned happens when Union's nested in anything. The real reason why we don't need a docs update is because the __hash__ of Literal in your patch is order-dependent.

Edit:
Eg:

# Union with __args__ not preserving order when nested due to type caching.
>>> Tuple[Union[int, str]].__args__
(typing.Union[int, str],)
>>> Tuple[Union[str, int]].__args__
(typing.Union[int, str],)

# Literal doesn't have that issue
>>> Tuple[Literal[1, 2]].__args__
(typing.Literal[1, 2],)
>>> Tuple[Literal[2, 1]].__args__
(typing.Literal[2, 1],)

@uriyyo
Copy link
Member Author

uriyyo commented Nov 18, 2020

@Fidget-Spinner Thanks for the explanation it's clear to me now.

@Fidget-Spinner
Copy link
Member

Fidget-Spinner commented Nov 18, 2020

@uriyyo Awesome! I'm glad I was able to help :).

Oops, forgot to say this: Congrats to your first commit to CPython!

@Dominik1123
Copy link
Contributor

Should the docs of typing.get_args be updated to mention Literal as well (similar to #23254)?

Sure, but I would make a minimal change, e.g. changing "Union" in that paragraph to "Union or Literal".

@uriyyo What about updating the docs of typing.get_args?

@Dominik1123, not sure if this is a bug, but hash() of Literal is dependent on args order in the current code, so the type caching problem won't happen. Currently, it's still only Union affected.

>>> hash(Literal[1, 2, 3])
3136223724409791019
>>> hash(Literal[3, 2, 1])
4438910888190981000

@Fidget-Spinner @uriyyo I see, I didn't notice before. But isn't this very unusual in terms of behavior? So we can have two objects that compare equal but their hashes are different. AFAIK nowhere else in Python this is the case. The closest that comes to it in terms "being equal, but not having the same hash" is set and frozenset but in that case at least set will raise. Now for Literal it's yet another level, since the hashes are actually computed, and it's even the same type.

To quote the Python data model on __hash__:

The only required property is that objects which compare equal have the same hash value;

So according to that it's a bug and the hash should not be order dependent.

@uriyyo
Copy link
Member Author

uriyyo commented Nov 18, 2020

@Dominik1123 Agree with you, looks like it's a bug related to Literal hash implementation, basically, we should change implementation from:

    def __hash__(self):
        return hash(tuple(_value_and_type_iter(self.__args__)))

to:

    def __hash__(self):
        return hash(frozentset(_value_and_type_iter(self.__args__)))

@Fidget-Spinner should we create issue regarding this bug?

@Dominik1123
Copy link
Contributor

@Dominik1123 Agree with you, looks like it's a bug related to Literal hash implementation, basically, we should change implementation from:

    def __hash__(self):
        return hash(tuple(_value_and_type_iter(self.__args__)))

to:

    def __hash__(self):
        return hash(frozentset(_value_and_type_iter(self.__args__)))

I'm not sure if frozenset is too restrictive in terms of the values it can work with. Right now PEP 586 disallows literals of mutable types, but only at type check time. The PEP also contains the following comment:

[...] the actual implementation of typing.Literal will not perform any checks at runtime. [...] This is partly to help us preserve flexibility in case we want to expand the scope of what Literal can be used for in the future [...]

Using frozenset would take away on that flexibility regarding literals of mutable types. Not sure if that's relevant though. I think we should wait for @gvanrossum to comment on this.

@gvanrossum
Copy link
Member

Whoa, this is a bug. It not allowed to have two objects that compare equal but have a different hash. So since Literal[1, 2] == Literal[2, 1], those two should have the same hash. I'm sorry I didn't catch this!

Using set() or frozenset() here is totally fine, we already use those in __eq__. I'd use set().

@uriyyo
Copy link
Member Author

uriyyo commented Nov 18, 2020

@gvanrossum Should we create a separate issue or just open a new PR that will point to the same issue as this PR?

@gvanrossum
Copy link
Member

gvanrossum commented Nov 18, 2020 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants