Define translatable namespaces for CX
Closed, Resolved (Public)

Description

There are valid and important use cases for translating articles outside the Main namespace, for example T198178. However, opening up translation for all namespaces does not make sense either: namespaces such as Talk, Template, and File are not translatable in Content Translation.

So we need to define a way for Content Translation to allow the selection of namespaces which contain supported content while preventing the selection of others with special content that does not work with Content Translation. Ideally we want the solution to require as little maintenance as possible.

Some possible solutions to explore:

  • Some way to detect the kind of content present in a certain namespace (is there information on the namespace configurations, metadata, Wikidata, etc. that makes this possible?).
  • Configuration list in CX of blocked namespaces (leaning on the side of allowing translation for all contents unless we have a good reason not to)
  • Configuration list in CX of allowed namespaces (leaning on the side of allowing translation only for contents that are verified to work with the tool)

Event Timeline

Pginer-WMF moved this task from Needs Triage to Enhancements on the ContentTranslation board.

We usually don't try to define what makes sense with arguably arbitrary restrictions.

It need not be considered an arbitrary restriction: if a certain content structure is not supported or not tested by CX, we need not allow translation of it. IMO, in general the supported namespaces should be as inclusive as we can support.

I think the natural solution is to allow Mainspace and Wikipedia-space. Policies, guidelines and essays are often useful for translation. They may have little impact on article translation metrics, but can have a large impact on the Wikipedias. This would not be arbitrary because it is supported by community and anyone likely to translate a Wikipedia-space page is not going to do this by accident. As I see it the call to block template, module, talk, etc. is mainly to prevent accidental translations that are useless or where the tool is not useful, not to restrict vandalism, because that is just as likely to occur through translations of Mainspace articles.

For an example of a useful translation of a Wikipedia-space page, see the en-wiki to sv-wiki Guideline on Reliable sources in Medicine, partly translated into Swedish with the tool:

Another use case is the translation of template documentation, but since the tool is not at all adapted to translating:

There is little benefit from allowing it at this time, and allowing these would require allowing all */doc filenames under Template-space for translation, which is more likely to result in faulty translations.

This comment by a translator seems to refer to the limitation captured in the current ticket. In particular it seems the user is trying to use a page under their user namespace as source for the translation.

I think translating Templates using CX should also be considered, as there are Wikipedias that fork templates from enwiki.
Translating modules and MediaWiki: pages, however, can be disallowed, as their syntax is too hard for newcomers to understand.

Similarly to @Nikerabbit, instead of allowing particular namespaces, I'd go for excluding namespaces that don't make sense and allowing everything else. Not just because avoiding arbitrary restrictions is usually the right thing to do, but also because some wikis add extra namespaces, and they usually should be translatable: Draft and Portal spaces exist in several wikis; the Txiki space on the Basque Wikipedia is practically the same as the article space, and similar spaces may be created in other languages; etc.

These should probably be excluded:

  • Special, if it's relevant at all
  • All talk and Flow namespaces: Spontaneous discussions don't need translation. However, don't exclude the Project namespace (a.k.a. Wikipedia) - discussions are often held there, but people sometimes want to translate policy pages, essays, etc.
  • File: It often makes sense to make a copy of a fair-use file and to translate the descriptions, but they are usually short and technical, so we shouldn't make an effort to support the otherwise special File space.
  • MediaWiki: It's already a kind of a translation interface.
  • Template, Module, Gadget: They should be translatable, but not like this :)

Perhaps Wikidata adds special technical namespaces that shouldn't be translatable, but CX is unlikely to be installed on a Wikibase wiki any time soon.

The Category space is also somewhat special, but if it's possible to translate it, I suggest not to limit it. I remember at least one request for translating a Category page.

All other spaces should be translatable: User, Project (Wikipedia), Draft, Portal, etc.

Similarly to @Nikerabbit, instead of allowing particular namespaces, I'd go for excluding namespaces that don't make sense and allowing everything else.

I tried to capture the different options discussed in the ticket description, to explore when the time comes. I think it makes more sense to add exclusions of namespaces, rather than the other way around. It would also be great if there were a way to identify the type of namespace without having to list them manually, but I don't know if there is any metadata that "regular wikitext" namespaces share.

When exposing other namespaces, we may want the suggestions shown when searching to still give preference to the main namespace, unless the prefix of another namespace is explicitly written in the search query. That is, typing "Signature" will show results only for the main namespace. Typing "Wikipedia:Signature" will show the results from the Wikipedia namespace.

Change 508790 had a related patch set uploaded (by Santhosh; owner: Santhosh):
[mediawiki/extensions/ContentTranslation@master] Allow searching for titles in any namespace, if query has it

https://gerrit.wikimedia.org/r/508790

https://gerrit.wikimedia.org/r/508790 allows search in all namespaces. Note that this patch did not implement any exclusion or inclusion checks.

@Petar.petkovic commented in this patch:

Presence of ':' in article title does not guarantee it is in non-main namespace. Take en:Terminator_2:_Judgment_Day for example.

I think this is acceptable. There is not much to gain from adding a lot of code to validate the prefix when the search itself is prefix-based (for example, see the search results for "Terminator:").

Implementing a validator that uses a namespace cache and fetches namespaces from the source wiki is the way to be strict about these values, but that strictness is not worth it, IMO, just to handle a small set of titles containing ":". The context here is a source selector, and the user can only select a title if it exists in the source wiki.

Basically, we just remove the namespace constraint when the value the user typed contains a ":".
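
The rule described above can be sketched as follows. This is a minimal illustration, not the code from the patch: the function name `build_title_search_params` is hypothetical, and it assumes the search goes through the MediaWiki `prefixsearch` query module, where `psnamespace` limits the namespaces searched and `*` stands for all of them.

```python
def build_title_search_params(query: str, limit: int = 10) -> dict:
    """Build query parameters for a title prefix search (hypothetical helper).

    Assumption: the request targets MediaWiki's list=prefixsearch module.
    """
    params = {
        "action": "query",
        "list": "prefixsearch",
        "pssearch": query,
        "pslimit": limit,
        "format": "json",
    }
    if ":" in query:
        # A colon may or may not introduce a real namespace prefix
        # (compare "Wikipedia:Signature" with "Terminator 2: Judgment Day"),
        # so no validation is attempted: the namespace constraint is dropped.
        params["psnamespace"] = "*"
    else:
        # No colon: keep the search restricted to the main namespace (ID 0).
        params["psnamespace"] = "0"
    return params
```

Because the search is prefix-based, a query such as "Terminator:" still matches main-namespace titles that start with that text, which is why skipping the namespace validation costs little.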

@Pginer-WMF confirm.

Change 509018 had a related patch set uploaded (by Santhosh; owner: Santhosh):
[mediawiki/extensions/ContentTranslation@master] Define excluded namespaces for Content translation

https://gerrit.wikimedia.org/r/509018

@Pginer-WMF confirm.

Ok so we are restricting the search to the main namespace when there is no ":" and searching everywhere where there is a ":". That sounds good, but I'd like to confirm this would lead to the following behaviour:

  • Typing "Signature" will show among the results the Signature article but not the "Wikipedia:Signatures" documentation page.
  • Typing "Wikipedia:Signature" will show among the results "Wikipedia:Signatures" documentation page.
  • Typing "Terminator:" will show among the results "Terminator: Dark Fate" article, and potentially also pages under a hypothetical "Terminator" namespace.

This sounds good to me. Are the examples above correct?

Ok so we are restricting the search to the main namespace when there is no ":" and searching everywhere where there is a ":". That sounds good, but I'd like to confirm this would lead to the following behaviour:

  • Typing "Signature" will show among the results the Signature article but not the "Wikipedia:Signatures" documentation page.

Yes

  • Typing "Wikipedia:Signature" will show among the results "Wikipedia:Signatures" documentation page.

Yes

  • Typing "Terminator:" will show among the results "Terminator: Dark Fate" article, and potentially also pages under a hypothetical "Terminator" namespace.

Yes

Awesome, then I think the provided support is enough for the user needs.

Change 508790 merged by jenkins-bot:
[mediawiki/extensions/ContentTranslation@master] Allow searching for titles in any namespace, if query has it

https://gerrit.wikimedia.org/r/508790

Change 509018 merged by jenkins-bot:
[mediawiki/extensions/ContentTranslation@master] Define excluded namespaces for Content translation

https://gerrit.wikimedia.org/r/509018

Notes for QA:

  • All talk namespaces are excluded and cannot be found by searching in "New translation" dialog, including Flow boards.
  • Other namespaces excluded from translating:
    • File
    • Gadget
    • Gadget definition
    • MediaWiki
    • Module
    • Template
    • Translations
  • Extra namespaces that wikis can define are not excluded, but their talk pages are.
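
As a rough illustration of the rules in these notes, the check could look like the sketch below. It is not the extension's actual code: the names `EXCLUDED_NAMESPACE_IDS`, `is_talk_namespace` and `is_translatable_namespace` are hypothetical, and the sketch assumes the usual MediaWiki numbering (talk namespaces have odd IDs; File = 6, MediaWiki = 8, Template = 10, Module = 828, Translations = 1198, Gadget = 2300, Gadget definition = 2302). Treating virtual namespaces (negative IDs) as not translatable is also an assumption, and Flow's own namespaces are not modelled.

```python
# Assumed namespace IDs: File, MediaWiki, Template, Module,
# Translations, Gadget, Gadget definition.
EXCLUDED_NAMESPACE_IDS = frozenset({6, 8, 10, 828, 1198, 2300, 2302})


def is_talk_namespace(ns_id: int) -> bool:
    """Talk namespaces are the odd-numbered ones above the main namespace."""
    return ns_id > 0 and ns_id % 2 == 1


def is_translatable_namespace(ns_id: int) -> bool:
    """Return True if pages in this namespace may be offered for translation."""
    if ns_id < 0:
        # Assumption: virtual namespaces (Special, Media) are never translatable.
        return False
    if is_talk_namespace(ns_id):
        # Covers talk pages of extra, wiki-defined namespaces as well.
        return False
    return ns_id not in EXCLUDED_NAMESPACE_IDS
```

With this approach, a namespace that a wiki adds later (for example an even ID such as 3000) is translatable by default, while its talk counterpart (3001) is excluded without any configuration change.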

Different namespaces appear in the search results and the translation page loads them.
I noticed that when translating a page from another namespace, the target destination is still the main namespace. The expectation may be to use the equivalent namespace, although that may also bring some complications. Based on observations, we'll decide whether a follow-up ticket is worth it.