-
Notifications
You must be signed in to change notification settings - Fork 875
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Pylibcudf polars date from str #16306
base: branch-24.08
Are you sure you want to change the base?
Pylibcudf polars date from str #16306
Conversation
@@ -811,6 +812,9 @@ def do_evaluate( | |||
else prefix.obj, | |||
) | |||
) | |||
elif self.name == pl_expr.StringFunction.Strptime: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@wence- I'm a little unsure if what's being passed from rust here is correct. The docs list many options for this api but self.options
is empty here. the rust code only passes the enum value out, is this right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we didn't send that through in pola-rs/polars#16221.
Please check if pola-rs/polars#17702 works for you
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
seems to be working, with that PR I get this:
(Pdb++) self.options
('%Y-%m-%d', True, True, True)
Polars seems to enforce some kind of handling for ambiguous datetimes through this API, we can either error, raise, or replace them but all seem to rely on identifying if they exist and where. This is problematic for us short term in the very general case because we'd likely have to write some careful logic and tests to do this properly. Are there any conditions we can think of that preclude the possibility of this occurring, like maybe requiring that the format specify UTC for now? |
Paging @mroeschke who has done a load of this in cudf-classic for the pandas compatibility. If we think the answer is "it's too complicated", let's just not support unconditionally and write up properly as an issue. |
Both As you alluded to, a |
Thanks @mroeschke , please correct me if I am wrong, but I think this means we should at least be able to proceed with the case of |
Internally within cudf we can assume this. From polars I'm not sure if this can be assumed, especially if it's directly from user input. e.g.: df = pl.DataFrame({"a": ["2024-03-10 02:30:00 US/Pacific"]})
df.select(pl.col("a").str.to_datetime(format="%Y-%m-%d %H:%M:%S %Z")) should raise as |
This PR adds datetime/timestamp parsing from string columns in pylibcudf and cudf-polars.
Closes #16174
WIP