-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
requests.timestamp_day cannot be converted to default timezone #4
Comments
Can this be accomplished via an existing embulk filter plugin? I'm not sure what plugins anyone is using, but I've got a couple I've used for importing some pretty messy data. I'm about 90% sure I've got one timestamp field that's partially parsed from character data. I'll take a whack at it. I'm not sure that's really a viable option for everyone, but I can document it as an option at least -- assuming it actually works like I remember. |
I've played around with this a little bit and I'm not sure I really understand what the problem is. My current config for the
This gets me the timestamp in my local time. I haven't cross-checked vs. daylight savings and I do have some other plugins installed. I don't think anything is mutating the workflow, but I haven't tried to run a vanilla instance either. |
The problem only exists in the |
I have some time today, I'll play with this one, https://github.com/sonots/embulk-filter-timestamp_format |
That's actually one of the plugins I have (for a different problem). It's
worked pretty well for handle the edge cases I've encountered.
…On Fri, Dec 13, 2019 at 2:22 PM Robert Carroll ***@***.***> wrote:
I have some time today, I'll play with this one,
https://github.com/sonots/embulk-filter-timestamp_format
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
<#4?email_source=notifications&email_token=AAGE35CEEWVYEMSSMK3EFATQYPVGRA5CNFSM4JK5WEHKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEG3EX7Y#issuecomment-565595135>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAGE35H4INTU6MR4Q7E7LVTQYPVGRANCNFSM4JK5WEHA>
.
|
Here's a better representation of the problem, for detail, not for explanation. Importing a single days request files, 1 sequence from cd:cli fetch commandUsing
|
timestamp_day | first | last | total |
---|---|---|---|
2019-12-09 | 2019-12-09 16:00:00.0090000 | 2019-12-10 15:59:59.9070000 | 3467752 |
Resulting in 3,467,752 rows returned for 12/9
Using CONVERT(date, timestamp)
SELECT
CONVERT(date, timestamp)
, MIN(timestamp) AS [first]
, MAX(timestamp) AS [last]
, COUNT(*) total
FROM requests
GROUP BY CONVERT(date, timestamp)
ORDER BY [last] DESC
While timestamp_day indicates 12/9, with UTC to America/Los_Angeles conversion, many of those rows are for 12/10.
CONVERT(date, timestamp) | first | last | total |
---|---|---|---|
2019-12-10 | 2019-12-10 00:00:00.2560000 | 2019-12-10 15:59:59.9070000 | 2674451 |
2019-12-09 | 2019-12-09 16:00:00.0090000 | 2019-12-09 23:59:59.8540000 | 793301 |
3467752 |
Resulting in 2,674,451 rows returned for 12/9, but this is still not complete.
Driving it home, importing 2 sequences of requests*.gz
CONVERT(date, timestamp) | first | last | total |
---|---|---|---|
2019-12-10 | 2019-12-10 00:00:00.2560000 | 2019-12-10 15:59:59.9070000 | 2674451 |
2019-12-09 | 2019-12-09 00:00:00.0150000 | 2019-12-09 23:59:59.8540000 | 3377973 |
2019-12-08 | 2019-12-08 16:00:00.0140000 | 2019-12-08 23:59:59.9990000 | 613354 |
With a complete dataset, there are actually 3,377,973 rows returned for 12/9
This field can't be converted to a local default timezone from UTC without a timestamp.
The field is stored in VARCHAR which makes some queries using this field problematic.
You can store the field using Embulk Output SQL type 'date', but there will be no conversion.
Best solution would be to load the table to staging and convert that value in production locally...
or just use timestamp and convert to create a new value from timestamp.
Example: #4 (comment)
The text was updated successfully, but these errors were encountered: