Modify reducing reservation function to reflect rho effect in weight-based phase. #6

Open: 1 commit to be merged into base: master

@bspark8 (Collaborator) commented Jul 15, 2016

We modified the procedure that reduces reservation tags so that it reflects the rho effect in a distributed environment during the weight-based phase.
This modification also reduces IOPS fluctuation across the whole distributed system.

Thank you.

@ivancich (Member)

Is this change related to the delayed tag calculation? It seems to be working to address the same performance issue.

@bspark8 (Collaborator, Author) commented Jul 26, 2016

Sorry for the late reply. And yes, you are right.

First, this change is intended to reduce the reservation tag by its rho-multiplied increment in a distributed environment during the weight-based phase.
And as you said, this change addresses the same performance fluctuation issue as before, without Delayed Tag Calculation, under the default dmc_sim_100th configuration.
However, we found another case, which concerns the relationship between Delayed Tag Calculation and Reflect Rho Effect.
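
As a rough illustration of the first point, here is a minimal sketch of the idea, not the dmclock code itself. It assumes the dmClock-style rule that a reservation tag advances by rho / reservation per request, and it interprets the change described above as subtracting that same rho-scaled step (instead of a single 1 / reservation step) from queued requests when a request is dispatched in the weight-based phase. `QueuedRequest`, `next_reservation_tag`, the `rho` parameter, and the numeric values are hypothetical names and numbers chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class QueuedRequest:
    """Hypothetical stand-in for a request still waiting in a client's queue."""
    reservation_tag: float


def next_reservation_tag(prev_tag, now, rho, reservation):
    """dmClock-style tag assignment: advance by rho / reservation, never behind `now`."""
    return max(prev_tag + rho / reservation, now)


def reduce_reservation_tags(queue, reservation, rho=1):
    """Pull back the tags of queued requests after a weight-based dispatch.

    rho=1 subtracts a single 1 / reservation step (the single-server adjustment).
    rho>1 subtracts the same rho-scaled step the tag was advanced by, which is
    how this sketch reads the change described in this pull request.
    """
    step = rho / reservation
    for request in queue:
        request.reservation_tag -= step


# Illustrative values only: a reservation of 4 IOPS and rho of 2.
reservation, rho = 4.0, 2
first = next_reservation_tag(prev_tag=10.0, now=0.0, rho=rho, reservation=reservation)
second = next_reservation_tag(prev_tag=first, now=0.0, rho=rho, reservation=reservation)

single_step = [QueuedRequest(first), QueuedRequest(second)]
rho_scaled = [QueuedRequest(first), QueuedRequest(second)]
reduce_reservation_tags(single_step, reservation)          # subtracts 1 / 4
reduce_reservation_tags(rho_scaled, reservation, rho=rho)  # subtracts 2 / 4
```

With the single step, each weight-based dispatch takes back only part of the rho-scaled advance, so the queued tags stay further ahead than with the rho-scaled step.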

If we use a higher server_iops than the default dmc_sim_100th configuration (see the example configuration below), we obtain the following combination table.

[Image: table_1]

The result means that the current default, the DTC-only option, is the better choice among the options above in a distributed environment.

In addition, the Rho- and Delta-based distribution method from the dmClock paper causes some issues.
For example, the simulator currently shows a higher reservation phase count than the theoretical count, because the Rho value that each server receives is smaller than expected.
In other words, with four servers in the simulator, a client's requests are distributed across the four servers, and we expect the overall Rho value received by each server to be 4. The current value is not 4 but about 2.494.
We are currently looking into a workaround for this issue with the Rho- and Delta-based distribution method, and we will let you know if we can define or find one.
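
To show why a smaller Rho raises the reservation phase count, here is a small arithmetic sketch. It assumes the dmClock rule that consecutive reservation tags at a server are spaced rho / reservation apart, so a backlogged client can be served in the reservation phase at up to reservation / rho requests per second on each server. The function names are hypothetical, and pairing the 2.494 figure with a reservation of 20 on four servers is only an illustration, not a reproduction of the simulator run.

```python
def reservation_rate_per_server(reservation, rho):
    """Upper bound on reservation-phase requests per second at one server
    when consecutive reservation tags are spaced rho / reservation apart."""
    return reservation / rho


def total_reservation_rate(reservation, rho, num_servers):
    """Sum of the per-server bounds across all servers the client uses."""
    return num_servers * reservation_rate_per_server(reservation, rho)


# Illustrative values: a reservation of 20 IOPS spread over four servers.
expected = total_reservation_rate(reservation=20.0, rho=4.0, num_servers=4)
observed = total_reservation_rate(reservation=20.0, rho=2.494, num_servers=4)
print(f"rho=4.000 -> {expected:.2f} reservation-phase IOPS in total")
print(f"rho=2.494 -> {observed:.2f} reservation-phase IOPS in total")
```

With rho equal to the number of servers, the per-server rates add up to exactly the client's reservation. With a smaller rho, the sum exceeds the reservation, which matches a reservation phase count higher than the theoretical one.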

Thus, I couldn't decide whether this Pull Request is a missing piece for the case where DO_NOT_DELAY_TAG_CALC is enabled.
If you don’t think so, I will close this Pull Request.

Thanks.

[global]
server_groups = 1
client_groups = 2
server_random_selection = false
server_soft_limit = false

[client.0]
client_count = 99
client_wait = 0
client_total_ops = 1500
client_server_select_range = 10
client_iops_goal = 120
client_outstanding_ops = 100
client_reservation = 20.0
client_limit = 80.0
client_weight = 1.0

[client.1]
client_count = 1
client_wait = 10
client_total_ops = 1500
client_server_select_range = 10
client_iops_goal = 120
client_outstanding_ops = 100
client_reservation = 20.0
client_limit = 80.0
client_weight = 1.0

[server.0]
server_count = 100
server_iops = 200
server_threads = 1

@ivancich (Member) commented Aug 3, 2016

Thank you for that detailed explanation. I'm very interested in your understanding of the situation and want to make sure I'm fully understanding the points you are making. So I hope you won't mind my asking additional questions.

Does "RRE" refer to the new code in this pull request? And if so, does this code therefore increase the standard deviation of # of IOPS? And if so, is that good or bad from your perspective?

I'm also trying to understand another aspect of the argument. You say, "For example, current running simulator shows higher reservation phase count than theoretical reservation phase count because current each server’s received Rho value is smaller than expected value." But it seems like the core effect of this pull request is to reduce the reservation tags even further in the reduce_reservation_tags function. So would this change in code increase the reservation phase count?

Thank you!

@TaewoongKim (Collaborator)

In my opinion, bspark found something that was missing in how the resv tag value is reduced. But that was only part of the problem. So even though he fixed the incorrect reduction, the result was worse in some cases because dmclock has another problem as well.

The other problem is the low rho value. A low rho value causes more IOs to be serviced by reservation, because the reservation tag increases more slowly than expected. In the paper and the code, the rho value counts only IOs completed in the reservation phase. But the counting interval spans all IOs (not just the IOs that will be serviced in the reservation phase). I think this is not a fair ratio calculation, and it makes the rho value low. I think the rho counting method needs some changes, or rho needs to be removed and delta used instead.
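
The counting described above can be sketched with a toy client-side tracker. This is a simplified model under stated assumptions, not the dmclock service tracker: requests go round-robin to the servers, every request completes before the next one is sent, and the `served_in_reservation` callback is a hypothetical way to decide which completions count as reservation phase. Delta counts every completion since the client's previous request to the same server, while rho counts only the reservation-phase ones over that same interval.

```python
def simulate(num_servers, num_requests, served_in_reservation):
    """Return the (server, rho, delta) triples a client would attach to its requests."""
    completed_total = 0  # all completions seen by the client so far
    completed_resv = 0   # completions that were served in the reservation phase
    # Counter snapshots taken when the client last sent to each server.
    last_seen = [(0, 0)] * num_servers
    sent = []
    for k in range(num_requests):
        server = k % num_servers  # round-robin distribution
        prev_total, prev_resv = last_seen[server]
        delta = completed_total - prev_total
        rho = completed_resv - prev_resv
        sent.append((server, rho, delta))
        last_seen[server] = (completed_total, completed_resv)
        # Simplification: the request completes before the next one is sent.
        completed_total += 1
        if served_in_reservation(k):
            completed_resv += 1
    return sent


# Skip the first round, where no earlier request to the server exists yet.
all_resv = simulate(4, 12, lambda k: True)[4:]
half_resv = simulate(4, 12, lambda k: k % 2 == 0)[4:]
print("all reservation:  ", all_resv[0])
print("half weight-based:", half_resv[0])
```

In this model rho only reaches the number of servers when every request is served in the reservation phase. As soon as some requests are served in the weight-based phase, rho drops below delta even though the interval it is measured over is unchanged.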

bspark's code (RRE) reduces the reservation tag, but if the wrong rho calculation is fixed, that reduction will be compensated for and I believe the result will be better. First of all, we need to find an example configuration that can prove that the current rho counting method is wrong. If I find one, I will let you know.
(By the way, bspark is on vacation until this weekend.)

@ivancich (Member) commented Aug 4, 2016

Thank you for clarifying your understanding. Your argument that the rho values are too small makes sense. I'll hold off on merging this for the time being since it does exacerbate the problem. But if we can fix rho values, this code will likely be of value.

@bspark8 force-pushed the wip_reduce_resevation_tag branch from 659942d to d366392 on February 21, 2017, 06:11

@ivancich (Member)

@bspark8 Where do we stand on this PR now that we've merged the PR with the borrowing tracker (#39)? Is this PR still of value? Will it need re-analysis?

@bspark8 (Collaborator, Author) commented Oct 10, 2017

Sorry for the late reply. Last week was a holiday.
As you said, the current PR may need to be reanalyzed now that the borrowing tracker (#39) has been merged.
Apart from that, to see the bad effect of incorrectly reducing reservation tags, we think we need to assume a special situation:
for example, I/O that goes into the weight-based phase and then suddenly switches to the reservation phase.
It would be great if we could share the results of the re-analysis soon.
