orchestration of compositions is too slow
Closed, Resolved | Public | BUG REPORT

Description

Steps to replicate the issue (include links if applicable):

What happens?:

*At the time of writing, the only test to pass is https://www.wikifunctions.org/view/en/Z13493, which has an Orchestration Duration of 9034 ms.
*All the others fail because they exceed 10000 ms, including very simple cases like: https://www.wikifunctions.org/view/en/Z13490

What should have happened instead?:

*Can we orchestrate faster than this for a composition of a few very simple tests on some very simple strings?
*This issue is not specific to this composition or this test. I'll try to add some other simple-but-failing examples to the thread later.
*PS sorry I'm not across the details of what orchestration has to do, so apologies if I'm asking for something hard.

Details

Title: Do not resolve or validate languages on Z17s.
Reference: repos/abstract-wiki/wikifunctions/function-orchestrator!149
Author: apine
Source Branch: apine-dont-resolve-language
Dest Branch: main

Event Timeline

Here's some others which seem excessively slow.

DVrandecic changed the task status from Open to In Progress. Mar 21 2024, 4:46 PM
DVrandecic triaged this task as High priority.
DVrandecic moved this task from To triage to In Progress on the Abstract Wikipedia team board.

I think one reason these are slow is that we're resolving all the labels (Z11, Z12, etc.) inside of the argument internals. We never use this information in the orchestrator, so we can just stop resolving those labels. I tested with your not(not(input)) example; with this change, we save 7 calls to the wikilambda_fetch API. There's a lot of other stuff causing slowdown, but this was a really obvious source of latency. Let's see if the performance is better once that MR is merged and deployed.
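To illustrate the idea for readers who don't know the orchestrator, here is a toy sketch in Python. It is not the orchestrator's code: the store, the reference IDs (REF_...), the object layout, and the names resolve, CountingFetcher, and count_fetches are all made up for illustration. It only shows how skipping a label subtree during recursive resolution removes one fetch per referenced language.

```python
from typing import Any, FrozenSet

# Hypothetical stand-in for the wiki; every ID here is invented.
FAKE_STORE = {
    "REF_LANG_EN": {"kind": "language", "code": "en"},
    "REF_LANG_FR": {"kind": "language", "code": "fr"},
    "REF_TYPE_BOOL": {"kind": "type", "name": "Boolean"},
}


class CountingFetcher:
    """Pretends to be a remote fetch API and counts how often it is called."""

    def __init__(self, store: dict) -> None:
        self.store = store
        self.calls = 0

    def fetch(self, ref: str) -> Any:
        self.calls += 1
        return self.store[ref]


def resolve(node: Any, fetcher: CountingFetcher,
            skip_keys: FrozenSet[str] = frozenset()) -> Any:
    """Recursively replace known references with fetched objects.

    Values stored under a key in skip_keys are copied through unresolved.
    """
    if isinstance(node, str) and node in fetcher.store:
        return fetcher.fetch(node)
    if isinstance(node, list):
        return [resolve(item, fetcher, skip_keys) for item in node]
    if isinstance(node, dict):
        return {
            key: value if key in skip_keys else resolve(value, fetcher, skip_keys)
            for key, value in node.items()
        }
    return node


# A made-up argument declaration: a type, a key, and a multilingual label.
ARGUMENT = {
    "type": "REF_TYPE_BOOL",
    "key": "input",
    "label": [
        {"language": "REF_LANG_EN", "text": "input"},
        {"language": "REF_LANG_FR", "text": "entrée"},
    ],
}


def count_fetches(skip_keys: FrozenSet[str]) -> int:
    """Resolve ARGUMENT with a fresh fetcher and report the number of fetches."""
    fetcher = CountingFetcher(FAKE_STORE)
    resolve(ARGUMENT, fetcher, skip_keys)
    return fetcher.calls
```

In this toy model, count_fetches(frozenset()) fetches the type and both label languages, while count_fetches(frozenset({"label"})) fetches only the type. The real saving depends on how many labels and languages the actual arguments carry.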

The immediate (possible) fix is now merged. I have moved this task to "To Deploy."

Because the task description is open-ended, I don't know what the completion criteria should be: system performance is something we'll probably always be tweaking. The current fix should address one source of slowness. I have added another performance-optimization task, as well.

Thanks for your work, it sounds like a good improvement. I look forward to testing it out.

Change #1017060 had a related patch set uploaded (by Jforrester; author: Jforrester):

[operations/deployment-charts@master] wikifunctions: Upgrade orchestrator from 2024-03-05-140533 to 2024-04-04-132719

https://gerrit.wikimedia.org/r/1017060

Change #1017060 merged by jenkins-bot:

[operations/deployment-charts@master] wikifunctions: Upgrade orchestrator from 2024-03-05-140533 to 2024-04-04-132719

https://gerrit.wikimedia.org/r/1017060

I see that not(not()) now takes about 6000 ms, which is definitely better, but still not what I would hope for. I also note that currently all the tests on https://www.wikifunctions.org/view/en/Z13499 time out. But these are pretty arbitrary standards/expectations, so it's hard to say if there are still vast improvements possible.

Jdforrester-WMF subscribed.

Let's tackle this in more specific tasks.