Deploy "add a link" to 18th round of wikis (en.wp and de.wp)
Open, High | Public | 1 Estimated Story Points

Description


https://wikitech.wikimedia.org/wiki/Add_Link#Enabling_on_a_new_wiki


English Wikipedia's specificities

English Wikipedia strictly enforces https://en.wikipedia.org/wiki/Wikipedia:Manual_of_Style/Linking#What_generally_should_not_be_linked.
Folly Mox gave us a very nice summary of how things are done at that wiki.

A script exists to track and remove what is considered overlinking. Users get a button in their toolbar that will automatically remove common terms. Also, they can connect that script to AWB.

This script could be used to improve the model.
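One way the idea above might be applied is as a post-filter on the model's suggestions: drop any suggested link whose anchor text is a term the community routinely unlinks. The sketch below is only an illustration under that assumption. The function name, the suggestion format (dicts with an "anchor" key) and the sample terms are all hypothetical; they are not taken from the enwiki script or from the add-a-link code.

```python
from typing import Dict, Iterable, List


def filter_overlinked(
    suggestions: Iterable[Dict[str, str]],
    common_terms: Iterable[str],
) -> List[Dict[str, str]]:
    """Drop suggestions whose anchor text matches a commonly overlinked term.

    Matching is case-insensitive and on the whole anchor text only.
    Both the input format and the term list are hypothetical.
    """
    blocked = {term.casefold() for term in common_terms}
    return [s for s in suggestions if s["anchor"].casefold() not in blocked]


# Hypothetical sample data, for illustration only.
sample_suggestions = [
    {"anchor": "France", "target": "France"},
    {"anchor": "stratovolcano", "target": "Stratovolcano"},
]
sample_terms = ["france", "English language"]

kept = filter_overlinked(sample_suggestions, sample_terms)
```

Whether such a list is better used as a filter at suggestion time or as extra training signal for the model is left open here.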

Details

Event Timeline

The training pipelines for the two biggest wikis ran for a very long time and got stuck a couple of times, but they have finally completed and generated models for the 18th round.

Model evaluation has been completed and below are the backtesting results:

Wiki     Precision@0.5   Recall@0.5
dewiki   0.79            0.48
enwiki   0.81            0.45

All languages have passed the evaluation and will be deployed.
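For readers unfamiliar with the metrics, here is a minimal sketch of how precision and recall at a threshold can be computed, assuming "@0.5" refers to a probability cut-off of 0.5 on the model's link score. The function and the toy data are illustrative only; they are not the team's evaluation code or dataset.

```python
from typing import Sequence, Tuple


def precision_recall_at(
    scores: Sequence[float],
    labels: Sequence[bool],
    threshold: float = 0.5,
) -> Tuple[float, float]:
    """Precision and recall when only candidates scoring >= threshold are suggested."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    predicted = [score >= threshold for score in scores]
    true_pos = sum(1 for p, y in zip(predicted, labels) if p and y)
    false_pos = sum(1 for p, y in zip(predicted, labels) if p and not y)
    false_neg = sum(1 for p, y in zip(predicted, labels) if not p and y)
    # Guard against empty denominators (nothing suggested / no true links).
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    return precision, recall


# Toy data (hypothetical): model score per candidate link, and whether the
# link actually exists in the held-out article text.
toy_scores = [0.9, 0.8, 0.6, 0.4, 0.3, 0.2]
toy_labels = [True, True, False, True, False, True]

toy_precision, toy_recall = precision_recall_at(toy_scores, toy_labels)
```

Read this way, the reported numbers would mean that roughly four in five suggested links are correct, while a little under half of the true links are found.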

kevinbazira added a subscriber: kostajh.

@kostajh, we published datasets for all models that passed the evaluation in this round.

elukey moved this task from In Progress to Watching on the Machine-Learning-Team board.
elukey added a subscriber: kevinbazira.
Trizek-WMF renamed this task from Deploy "add a link" to 18th round of wikis to Deploy "add a link" to 18th round of wikis (en.wp and de.wp). Oct 5 2023, 8:26 AM
Trizek-WMF updated the task description.

@kevinbazira, I just learned that English Wikipedia has a script to track and remove what is considered overlinking. This script could be used to improve the model so that it fits the community's common practices. It would help the deployment at this wiki a lot. More details at:

@Trizek-WMF, thank you so much for sharing this script that helps to curb overlinking. I am looping in @MGerlach: since he will work on improving add-a-link model performance, this might interest him.

We will work on this task at the beginning of 2024.

We will work on this task at the beginning of 2024.

I thought we were aiming to enable these wikis before the end of 2023. Is there a particular reason to do it in early 2024?

We need to do proper community engagement first, which is not doable at the moment as I'm working on T346108: [EPIC] IP Masking: StructuredDiscussions (Flow)/LiquidThreads Community discussion.

Trizek-WMF moved this task from Triaged to Up Next on the Growth-Team board.
KStoller-WMF updated the task description.
KStoller-WMF updated the task description.
KStoller-WMF set the point value for this task to 1.
Trizek-WMF set Due Date to Tue, May 21, 4:00 PM.
Trizek-WMF raised the priority of this task from Medium to High. Tue, May 14, 2:12 PM

We aren't ready to run the script to populate the suggestions. Given our backlog, we are moving this task to our next sprint.

Trizek-WMF changed Due Date from Tue, May 21, 4:00 PM to Tue, Jun 4, 4:00 PM. Fri, May 17, 5:23 PM
Trizek-WMF updated the task description.
Trizek-WMF moved this task from Inbox to Up Next on the Growth-Team board.

Change #1033889 had a related patch set uploaded (by Urbanecm; author: Urbanecm):

[operations/mediawiki-config@master] [Growth] enwiki: Enable AddLink backend

https://gerrit.wikimedia.org/r/1033889

As far as I can see, this task asks for a "stealth" (or "dark mode") deployment of add a link to enwiki/dewiki, so that the quality of the recommendations can be reviewed. This means we will be maintaining a task pool of recommendations, without necessarily showing those recommendations to any users.

I reviewed this task today, in order to determine whether this is possible without any code changes. As of now, we have the following two variables in GrowthExperiments:

  • GENewcomerTasksLinkRecommendationsEnabled: which turns on the backend (and ensures the task pool is ready for the users to use),
  • GELinkRecommendationsFrontendEnabled: which makes the task available to users (assuming it is also enabled via CommunityConfiguration, of course).

If we turn on the first one, but keep the second one turned off, then the task pool should get populated (and maintained), but the task will not be visible to the end user. This is what we normally do as part of our preparation of a deployment, to ensure the task pool is ready when the first users visit their homepage after the task gets enabled.
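The interaction between the two flags can be sketched as a small truth table. This is an illustrative model only, not the GrowthExperiments implementation: the class, the function names and the community_enabled field (a stand-in for the CommunityConfiguration toggle) are hypothetical, and the assumption that visibility also requires the backend flag is mine.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AddLinkConfig:
    backend_enabled: bool    # models GENewcomerTasksLinkRecommendationsEnabled
    frontend_enabled: bool   # models GELinkRecommendationsFrontendEnabled
    community_enabled: bool  # hypothetical stand-in for the CommunityConfiguration toggle


def task_pool_maintained(cfg: AddLinkConfig) -> bool:
    """The pool of recommendations is populated whenever the backend flag is on."""
    return cfg.backend_enabled


def task_visible_to_users(cfg: AddLinkConfig) -> bool:
    """Users see the task only if every switch is on (assumed, see lead-in)."""
    return cfg.backend_enabled and cfg.frontend_enabled and cfg.community_enabled


# "Stealth" state: backend on, frontend off on the server side.
stealth = AddLinkConfig(backend_enabled=True, frontend_enabled=False, community_enabled=True)
# Full deployment: all three switches on.
live = AddLinkConfig(backend_enabled=True, frontend_enabled=True, community_enabled=True)
```

In the stealth state the pool is kept fresh, but no setting in CommunityConfiguration alone can make the task visible, because the server-side frontend flag still gates it.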

It turns out that on dewiki, this is already the case. The backend of Add Link is enabled there (and the task pool gets refreshed periodically as well), but in Community configuration, it appears as "Disabled in site configuration", which appears to be what this task asks for. Assuming we want enwiki to be in the same state as dewiki (task pool maintained, but unused), then that should be easy to do. If we do that, then the Growth team would need to be involved for showing Add Link to users (to set the frontend flag on), but the involvement would be minimal (~30 mins for an engineer).

If the goal is to ensure admins can enable Add Link at those two wikis at any time (w/o any further involvement from our team), then things get a little bit more tricky. When dewiki asked for Add Link to be turned off on their wiki, we originally did that via CommunityConfiguration, but for some reason, users were still getting it (and saving tasks). Related conversation about this is at T288420 and T294712.

Since this was nearly 3 years ago, I'm not sure what exactly happened off the top of my head. If we do want to make it possible for admins to enable AddLink via CommunityConfiguration at any time, we would need to figure that out (or at least, figure out whether it is going to be a problem again). Theoretically, whatever happened back then might not be a problem at this point, as it happened when the A/B testing code for structured AddLink was in place (see removal patch from Nov 2021).

Back then, enablement of AddLink depended not only on site configuration, but also on the user in question (due to the A/B testing). Since we no longer have the A/B testing code in place, availability of structured AddLink should depend only on site configuration (whether server-side or in CommunityConfiguration) at this point. If the bug that affected us in 2021 was in the A/B testing code for Add Link, then the bug should no longer appear. But, that is something we would need to figure out (and test).

Summary: As of now, it is very easy to enable the backend (as requested here), but leave the frontend switch off on the server side. This will mean later deployment of Add Link would be much easier (as the pool would be ready already), but it wouldn't be possible to enable Add Link via CommunityConfiguration (we would still need to be involved). If we want admins to be able to self-serve this, we would need to do the investigation I described above, and release only after the investigation is done. Because we need to enable the backend either way, I'll do that part, and leave the rest for later.

@Trizek-WMF @KStoller-WMF Would you mind clarifying what would be the intended end result here?

Change #1033889 merged by jenkins-bot:

[operations/mediawiki-config@master] [Growth] enwiki: Enable AddLink backend

https://gerrit.wikimedia.org/r/1033889

Mentioned in SAL (#wikimedia-operations) [2024-05-20T08:02:45Z] <urbanecm@deploy1002> Started scap: Backport for [[gerrit:1033889|[Growth] enwiki: Enable AddLink backend (T308144)]]

Mentioned in SAL (#wikimedia-operations) [2024-05-20T08:05:14Z] <urbanecm@deploy1002> urbanecm: Backport for [[gerrit:1033889|[Growth] enwiki: Enable AddLink backend (T308144)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)

Mentioned in SAL (#wikimedia-operations) [2024-05-20T08:19:52Z] <urbanecm@deploy1002> Finished scap: Backport for [[gerrit:1033889|[Growth] enwiki: Enable AddLink backend (T308144)]] (duration: 17m 07s)

Summary: As of now, it is very easy to enable the backend (as requested here), but leave the frontend switch off on the server side. This will mean later deployment of Add Link would be much easier (as the pool would be ready already), but it wouldn't be possible to enable Add Link via CommunityConfiguration (we would still need to be involved). If we want admins to be able to self-serve this, we would need to do the investigation I described above, and release only after the investigation is done. I started filling the task pool, and I'll hold off on any further action until this is clarified.

@Trizek-WMF @KStoller-WMF Would you mind clarifying what would be the intended course of action here?

Thanks for the detailed explanation!

Ideally, we want communities to be able to easily disable and enable this task independently, with no involvement from WMF.
However, given that we are nearing the CommunityConfiguration Extension release, I don't think it makes sense to investigate the underlying issue in Special:EditGrowthConfig now. Instead, we should wait until the new extension is released and then take the time to follow up.

I've attempted to summarize the issue and next steps with the enwiki community here.

Wikimedia Deutschland is handling the communication with dewiki, so I'll also reach out to our contacts there with a summary of the current state of this release.