Remove models with poor evaluation metrics from the published datasets repo
Closed, Resolved · Public

Description

The aswiki, ganwiki, and krcwiki models had poor evaluation metrics, as shown in T343374. While we work to improve their performance in T309263, they should not be deployed and must be removed from the published datasets repo: https://analytics.wikimedia.org/published/datasets/one-off/research-mwaddlink/

@MGerlach and I had a look at the GrowthExperiments config that shows deployed add-a-link models and found:

  1. 'aswiki' => true in both wgGENewcomerTasksLinkRecommendationsEnabled and wgGELinkRecommendationsFrontendEnabled
  2. 'ganwiki' => false in wgGENewcomerTasksLinkRecommendationsEnabled
  3. 'krcwiki' => false in wgGELinkRecommendationsFrontendEnabled

Our understanding is that ganwiki and krcwiki have not yet been enabled/deployed. We are going to remove them from the published datasets repo.

Event Timeline

Restricted Application added a subscriber: Aklapper.

Change 950168 had a related patch set uploaded (by Sergio Gimeno; author: Sergio Gimeno):

[operations/mediawiki-config@master] GrowthExperiments: turn off AddLink in aswiki

https://gerrit.wikimedia.org/r/950168

@kevinbazira aswiki users have already interacted with the task, so we would want to inform the community of the temporary turn-off before going ahead. I will ping you back here once the dataset can be safely removed.

Thank you for the follow-up, @Sgs. We are going to go ahead and remove ganwiki and krcwiki as we wait for aswiki.

ganwiki and krcwiki datasets have been removed from the published datasets repo.

Change 950168 merged by jenkins-bot:

[operations/mediawiki-config@master] GrowthExperiments: turn off AddLink in aswiki

https://gerrit.wikimedia.org/r/950168

Mentioned in SAL (#wikimedia-operations) [2023-08-22T07:09:47Z] <sgimeno@deploy1002> Started scap: Backport for [[gerrit:950168|GrowthExperiments: turn off AddLink in aswiki (T344319)]]

Mentioned in SAL (#wikimedia-operations) [2023-08-22T07:11:22Z] <sgimeno@deploy1002> sgimeno: Backport for [[gerrit:950168|GrowthExperiments: turn off AddLink in aswiki (T344319)]] synced to the testservers mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)

Mentioned in SAL (#wikimedia-operations) [2023-08-22T07:21:28Z] <sgimeno@deploy1002> Finished scap: Backport for [[gerrit:950168|GrowthExperiments: turn off AddLink in aswiki (T344319)]] (duration: 11m 41s)

@kevinbazira AddLink has been completely turned off in aswiki. You can proceed with removing the datasets.

I noticed ganwiki and krcwiki disappeared from https://analytics.wikimedia.org/published/datasets/one-off/research-mwaddlink/, but not from wikis.txt. I filed this issue as T344686 and T344711.
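
For illustration, a mismatch like this one could be detected with a small check. This is only a sketch: it assumes the datasets root holds one directory per wiki and that wikis.txt lists one wiki ID per line, and the function name `list_stale_wiki_entries` is hypothetical, not part of any existing tooling.

```shell
#!/usr/bin/env bash
# Sketch only. Assumed layout (not confirmed by this task):
#   <root>/<wiki_id>/   one dataset directory per wiki
#   <root>/wikis.txt    one wiki ID per line

# Print every wiki ID listed in wikis.txt that has no dataset directory.
list_stale_wiki_entries() {
    local root="$1"
    local wiki_id
    # The "|| [ -n ... ]" part also handles a last line without a newline.
    while IFS= read -r wiki_id || [ -n "$wiki_id" ]; do
        # Skip blank lines.
        if [ -z "$wiki_id" ]; then
            continue
        fi
        if [ ! -d "$root/$wiki_id" ]; then
            printf '%s\n' "$wiki_id"
        fi
    done < "$root/wikis.txt"
}
```

Calling `list_stale_wiki_entries /srv/published/datasets/one-off/research-mwaddlink` on the host would print the IDs that are still listed but no longer have datasets, if the assumed layout holds.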

Mentioned in SAL (#wikimedia-operations) [2023-08-22T14:59:45Z] <kevinbazira> tools.stashbot stat1008: Remove aswiki from /srv/published/datasets/one-off/research-mwaddlink/wikis.txt (T344319)

Mentioned in SAL (#wikimedia-operations) [2023-08-22T15:03:58Z] <kevinbazira> stat1008: Remove aswiki from the published datasets repo /srv/published/datasets/one-off/research-mwaddlink (T344319)

Thank you @Sgs and @Urbanecm_WMF. aswiki has been removed from the published datasets repo and wikis.txt.

kevinbazira closed this task as Resolved. (Edited Aug 31 2023, 3:18 PM)
kevinbazira claimed this task.

To prevent mishaps like T344319#9109329 in the future, we have automated the unpublishing process of add-a-link datasets using this script: https://github.com/wikimedia/research-mwaddlink/blob/main/unpublish-datasets.sh
Moving forward, to unpublish a given wiki's datasets, one can run the following command:

WIKI_ID=<WIKI_ID> ./unpublish-datasets.sh
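
The two manual steps recorded in this task (removing the wiki's datasets and removing its entry from wikis.txt) can be sketched as a single function. This is an illustrative sketch and not the contents of unpublish-datasets.sh: the function name `unpublish_wiki`, the one-directory-per-wiki layout, and the one-ID-per-line format of wikis.txt are all assumptions.

```shell
#!/usr/bin/env bash
# Illustrative sketch, NOT the actual unpublish-datasets.sh.
# Assumed layout: <root>/<wiki_id>/ per wiki, <root>/wikis.txt with one ID per line.

unpublish_wiki() {
    local root="$1" wiki_id="$2" tmp_file
    if [ -z "$root" ] || [ -z "$wiki_id" ]; then
        echo "usage: unpublish_wiki <datasets_root> <wiki_id>" >&2
        return 2
    fi
    # Reject IDs containing anything but lowercase letters, digits, or
    # underscores, so a value like "../x" can never reach rm.
    case "$wiki_id" in
        *[!a-z0-9_]*) echo "invalid wiki id: $wiki_id" >&2; return 2 ;;
    esac
    # Step 1: remove the wiki's dataset directory, if it exists.
    if [ -d "$root/$wiki_id" ]; then
        rm -rf -- "${root:?}/${wiki_id:?}"
    fi
    # Step 2: drop the exact-match line from wikis.txt.
    if [ -f "$root/wikis.txt" ]; then
        tmp_file="$(mktemp)" || return 1
        # grep exits 1 when no lines remain; only other failures are errors.
        grep -vxF -- "$wiki_id" "$root/wikis.txt" > "$tmp_file" \
            || [ $? -eq 1 ] || { rm -f "$tmp_file"; return 1; }
        # Write back in place so the file keeps its owner and permissions.
        cat "$tmp_file" > "$root/wikis.txt"
        rm -f "$tmp_file"
    fi
}
```

Doing both steps in one call is what avoids the situation above where the datasets were gone but wikis.txt still listed the wiki.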