implement an MVT winner deployer #546
@livlab @TylerFisher -- @dannydb and I have some questions about this! We understand the issue is that we haven't re-deployed when there's a clear winner for an MVT. Can you tell us a bit about why that is? What do you think will close the feedback loop? There are a lot of tricky details in implementing something like this, and we want to home in on the most important part of the problem.
The trickiest part of this is picking a winner in time, since we don't get a full readout on event stats until the next day. Basically, in a simple A/B test, we need to figure out whether the control or hypothesis scenario was more successful within a reasonable confidence interval. For that, we need the raw number of possible conversions and the raw number of conversions in each scenario. We could sit and look at the live events tracker and have some way of tabulating on our end, but that seems pretty difficult to me.

As for having a fab command, we would need a way of injecting code from a flag of sorts. Every test is a little different -- it shows and hides different divs, logs different analytics events, might trigger some sort of animation -- so it has to be really general to handle all of those cases.
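For reference, the confidence check described here boils down to a standard two-proportion z-test. Here's a minimal sketch, assuming scipy is available; the function name and the example numbers are made up for illustration:

```python
import math
from scipy.stats import norm

def ab_confidence(control_trials, control_conversions, test_trials, test_conversions):
    """
    Return the confidence (0-1) that control and hypothesis convert at
    different rates, using a two-sided, two-proportion z-test.
    """
    p_control = control_conversions / float(control_trials)
    p_test = test_conversions / float(test_trials)

    # Pooled conversion rate under the null hypothesis that both
    # scenarios perform identically.
    pooled = (control_conversions + test_conversions) / float(control_trials + test_trials)
    se = math.sqrt(pooled * (1 - pooled) * (1.0 / control_trials + 1.0 / test_trials))

    z = (p_test - p_control) / se

    # Two-tailed p-value via the normal survival function; confidence = 1 - p.
    return 1 - 2 * norm.sf(abs(z))

# e.g. 5,000 views per scenario, 150 vs. 210 conversions -> ~99.9% confidence
print(ab_confidence(5000, 150, 5000, 210))
```

Note this only needs the four raw counts per scenario, which is exactly why sampled numbers won't do.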
Can we get the full event numbers from Google Analytics via the API in a timely fashion? My inclination is to focus on closing the information loop and leaving the redeploy itself as a human decision. There's a further issue about the fab command: memorializing the winner. If we redeploy a winning variant, we need a record of which one won and when.
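If the Core Reporting API (v3) route works at all, the response does at least flag sampling, so a script can refuse to decide from sampled data. A rough sketch using google-api-python-client; the profile ID, event category and labels, and auth setup (an already-authorized `http` object) are all placeholders:

```python
from apiclient.discovery import build  # google-api-python-client

def fetch_event_totals(http, profile_id):
    """
    Pull per-label event totals for one day and bail out if GA reports
    that the numbers are sampled.
    """
    service = build('analytics', 'v3', http=http)

    result = service.data().ga().get(
        ids='ga:%s' % profile_id,
        start_date='2015-04-06',
        end_date='2015-04-06',
        metrics='ga:totalEvents',
        dimensions='ga:eventLabel',
        filters='ga:eventCategory==mvt',   # placeholder category
        samplingLevel='HIGHER_PRECISION',  # ask GA for its least-sampled data
    ).execute()

    if result.get('containsSampledData'):
        raise ValueError('GA returned sampled data; do not pick a winner from it.')

    # Rows come back as [eventLabel, totalEvents] string pairs.
    return dict((label, int(total)) for label, total in result.get('rows', []))
```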
Hm. The trick is getting full numbers, not a sample, from the API. I think there might be a way, but I don't know if they'll get you full numbers in time. I think maybe an argument in the fab command isn't the right pattern. It's probably a flag in app_config that you set once and then gets committed. |
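To make that concrete, here's one minimal shape the flag could take; none of these names exist in the template today, they're just a sketch of the pattern:

```python
# app_config.py (sketch; these settings are hypothetical)
MVT_TEST_NAME = 'headline-test'
MVT_WINNER = None  # set to 'control' or 'hypothesis' once a winner is called,
                   # then commit and redeploy

# wherever the page context gets built
import random
import app_config

def mvt_variant():
    """Serve the declared winner to everyone, or keep randomizing until then."""
    if app_config.MVT_WINNER:
        return app_config.MVT_WINNER
    return random.choice(['control', 'hypothesis'])
```

Because the flag lives in app_config and gets committed, the repo itself memorializes which variant won and when.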
Based on our discussion with the Insights team, if we moved to our own GA instance then, given our volume of traffic, our numbers would be reported in full, not sampled. We lose a lot of other things with that, obviously (the relationship with all other NPR reporting). I don't know of any other way to get the full dataset via the API; all answers so far point to no. (We definitely should not make test decisions based on a sample. I think everyone is already in agreement on that, but writing it down for the record!)

If we can figure out the above issue, then we also need to build in the math we usually do to calculate the confidence level. We do this manually today, so if it's done programmatically it would need to happen (maybe on carebot?) as a monitoring option: only once confidence passes 95% should a decision be made about promoting a winner to production (regardless of whether that's automatic or requires a human decision).

After that, there's also everything Tyler said about how the test is coded.
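Run as a carebot-style monitor, that check could be as small as composing the two sketches above: poll the totals, compute the confidence, and only surface a recommendation once it clears 95%. Again, every name here (including the event-label keys) is hypothetical:

```python
CONFIDENCE_THRESHOLD = 0.95

def check_for_winner(http, profile_id):
    """One scheduled pass: fetch totals, test confidence, nudge a human."""
    totals = fetch_event_totals(http, profile_id)  # from the GA sketch above

    confidence = ab_confidence(  # from the z-test sketch above
        totals['control-view'], totals['control-conversion'],
        totals['hypothesis-view'], totals['hypothesis-conversion'],
    )

    if confidence >= CONFIDENCE_THRESHOLD:
        # A recommendation, not an auto-deploy: a human still decides
        # whether to set MVT_WINNER and redeploy.
        print('MVT has a winner candidate at %.1f%% confidence' % (confidence * 100))
```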
Can we track in two GA instances? One that's ours, for better data, and one that's the main NPR instance, so we keep everything else?
You can. Note: with Universal Analytics (analytics.js) you can have multiple trackers per page, but if you are still on Classic (ga.js), only one per page. If I recall correctly from our conversation with Dan, they were still on Classic Analytics with plans to move to Universal. Do you remember this, Tyler? So, just something to check if you want to implement redundant tracking.
That's correct, they have been planning to move to universal, but it hasn't happened yet, so we're stuck with a single tracking code for the moment. 👎 |
Well, not sure how GA accounts would be procured here, but what if "our" new instance was started in Universal? A single ga.js classic can co-exist with a new analytics.js universal. Possible, you think?
That could work, although I'm a little leery of the overhead of double-implementing every event... |
On the plus side, we could wire this into analytics.js, which would make it less programming work. It's just the network overhead we'd have to worry about, especially on single-pipe mobile devices like the iPhone.
It's definitely worth testing the overhead to see if it's an issue.
Would love to be able to make a choice about a test while a story is still hot -- not just wait for the next project.
Sort of like this: [image]
It seems like, on a hot piece, we could hit our confidence level pretty quickly.