Wikipedia:Overcategorization

Categorization is a useful tool for grouping articles, both for ease of navigation and for correlating similar information. However, not every verifiable fact (or the intersection of two or more such facts) in an article requires an associated category. For lengthy articles, this could potentially result in hundreds of categories, most of which aren't particularly relevant. This may also make it more difficult to find any particular category for a specific article. Such overcategorization is also known as "category clutter".

To address these concerns, this page lists types of categories that should generally be avoided. Based on existing guidelines and precedent at Wikipedia:Categories for discussion, such categories, if created, are likely to be deleted.

Non-defining characteristics

See also: Wikipedia:Categorization of people § Categorize by defining characteristics and Wikipedia:Defining

One of the central goals of the categorization system is to categorize articles by their defining characteristics:

A central concept used in categorizing articles is that of the defining characteristics of a subject of the article. A defining characteristic is one that reliable sources commonly and consistently define[1] the subject as having—such as nationality or notable profession (in the case of people), type of location or region (in the case of places), etc.

Categorization by non-defining characteristics should be avoided. It is sometimes difficult to know whether or not a particular characteristic is "defining" for any given topic, and there is no one definition that can apply to all situations. However, the following suggestions or rules-of-thumb may be helpful:

  • a defining characteristic is one that reliable, secondary sources commonly and consistently define, in prose, the subject as having. For example: "Subject is an adjective noun ..." or "Subject, an adjective noun, ...". If such examples are common, each of adjective and noun may be deemed to be "defining" for subject.
  • if the characteristic would not be appropriate to mention in the lead portion of an article, it is probably not defining;
  • if the characteristic falls within any of the forms of overcategorization mentioned on this page, it is probably not defining.

Often, users can become confused between the standards of notability, verifiability, and "definingness". Notability is the test that is used to determine whether a topic should have its own article. This test, combined with the test of verifiability, is used to determine whether particular information should be included in an article about a topic. Definingness is the test that is used to determine whether a category should be created for a particular attribute of a topic. In general, it is much easier to verifiably demonstrate that a particular characteristic is notable than to prove that it is a defining characteristic of the topic. In cases where a particular attribute about a topic is verifiable and notable but not defining, or where doubt exists, creation of a list article is often the preferred alternative.

In disputed cases, the categories for discussion process may be used to determine whether a particular characteristic is defining or not. For example, there is consensus that places should not be categorized as established in the year of the earliest surviving historical record of the place.

Small with no potential for growth

Examples: The Beatles' wives, Husbands of Elizabeth Taylor, Catalan-speaking countries

Avoid categories that, by their very definition, will never have more than a few members, unless such categories are part of a large overall accepted sub-categorization scheme, such as subdividing songs in Category:Songs by artist or flags in Category:Flags by country.

Note also that this criterion does not preclude all small categories; a category which does have realistic potential for growth, such as a category for holders of a notable political office, may be kept even if only a small number of its articles actually exist at the present time. Also, subcategories of Category:Works by creator may be created even if they include only one page.

Narrow intersection

Example: Pre-1933 two-digit Virginia state highways

If an article is in "category A" and "category B", it does not follow that a "category A and B" has to be created for this article. Such intersections tend to be very narrow, and clutter up the page's category list. Even worse, an article in categories A, B and C might be put in four such categories "A and B", "B and C", "A and C" as well as "A, B and C", which clearly isn't helpful.
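
The growth described above can be illustrated with a short sketch. It is only an illustration of the counting argument, not a description of any Wikipedia tool; the function name `intersection_categories` and the joined category titles are hypothetical.

```python
from itertools import combinations


def intersection_categories(categories):
    """Return every possible intersection of two or more of the given categories."""
    result = []
    # Intersections need at least two parents, so start at size 2.
    for size in range(2, len(categories) + 1):
        for combo in combinations(categories, size):
            result.append(" and ".join(combo))
    return result


# Three parent categories already allow four intersection categories.
print(intersection_categories(["A", "B", "C"]))

# For n parent categories the count is 2**n - n - 1, so ten parents allow 1013.
print(len(intersection_categories(list("ABCDEFGHIJ"))))
```

The count roughly doubles with each additional parent category, which is why intersections are best reserved for cases where they clearly aid navigation.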

In general, intersection categories should only be created when both parent categories are very large and similar intersections can be made for related categories.

Mostly overlapping categories

Examples: 1971 National League All-Stars, 1852 religious leaders

If two or more categories have a large overlap (e.g. because many athletes participate in multiple all-star games, and religious leadership does not radically change from year to year), it is generally better to merge the subjects to a single category, and create lists to detail the multiple instances.

Arbitrary inclusion criteria

Examples: School districts at the top 7% in Pennsylvania on Pennsylvania standardized tests, Locations with per capita incomes over $30,000, Category:100th episodes

There is no particular reason for choosing "7%", "$30,000", or the 100th episode as cutoff points in these cases. Likewise, a school district with 3,800 students is not meaningfully different from one with 4,100 students. A better way of representing this kind of information is to put it in an article such as "List of school districts in (region) by size". Note that Wikipedia allows a table to be made sortable by any column.

Categorization by year, decade, century, or other well-defined time period (such as historical era), as a means of subdividing a large category, is an exception to this. When you create a categorization by time period, you should state the inclusion criteria clearly at the top of the category (e.g. "This category is for politicians who were active in the 19th century" is not the same as "This category is for politicians who were born in the 19th century").

Miscellaneous categories

Examples: People of the Moravian Church miscellaneous, Brass bands of other countries, Uncategorised songs

Do not categorize articles into "miscellaneous", "other", "not otherwise specified" or "remainder" categories. It is not necessary to completely empty every parent category into its subcategories. If there are some articles that don't fit appropriately into any of the standard subcategories, leave the articles in the parent category. The articles categorized together as "other" or "miscellaneous" generally will have little in common and therefore should not be categorized together in a dedicated "miscellaneous" category.

Eponymous categories for people

See also: Wikipedia:Eponymous categorization, Wikiproject:BLP categorization

Examples: Tim Halperin, Jena Irene, Clement Meadmore

Eponymous categories named after people should not be created unless enough directly related articles or subcategories exist. Individual works by a person should not be included directly in an eponymous category but should instead be in a (sub)category such as Category:Novels by Agatha Christie. As with all categories, a choice has to be made whether it is a "people" category (containing only biographical articles) or not (containing no biography beyond the main article), so that people categories are kept separate. In practice, most notable people lack enough directly related articles or subcategories to populate an eponymous category effectively, but Category:Barack Obama, Category:John Maynard Keynes and Category:Albert Einstein are some exceptions. Fans of celebrities should be cautious to avoid adding clutter to eponymous categories.

People associated with

Examples: People associated with John McCain, People associated with Pope Pius XI, People associated with Madonna, People associated with the hippie movement

The problem with vaguely named categories such as these is determining what degree or nature of "association" is necessary to qualify for inclusion in the category. The inclusion criteria for these "associated with X" categories are usually left unstated, which fails WP:OC#SUBJECTIVE; but applying some threshold of association may fail WP:OC#ARBITRARY. While this is most commonly used for people, the same thing applies to other things "associated with" someone or something, such as films associated with Generation X, places associated with The Beatles, or hospitals and medical institutions associated with the 2019–20 coronavirus pandemic.

However, it may be appropriate to have categories whose title clearly conveys a specific and defined relationship to another person, such as Category:Obama family or Category:Obama administration personnel.

Unrelated subjects with shared names

Examples: Ice-named rappers, Churches named for St. Dunstan, Fictional Misters

Avoid categorizing by a subject's name when it is a non-defining characteristic of the subject, or by characteristics of the name rather than the subject itself.

For example, a category for unrelated people who happen to be named "Jackson" is not useful. However, a category may be useful if the people, objects, or places are directly related—for example, a category grouping subarticles directly related to a specific Jackson family, such as Category:Jackson family (show business).

When confronted with subjects that share a name, a disambiguation page might be a possible solution.

Intersection by location

Examples: Roman Catholic bishops of Ohio, Quarterbacks from Louisiana, Male models from Dallas, Texas

Geographical boundaries may be useful for dividing subjects into regions that are directly related to the subjects' characteristics (for example, Roman Catholic Bishops of the Diocese of Columbus, Ohio or New Orleans Saints quarterbacks).

In general, avoid subcategorizing subjects by geographical boundary if that boundary does not have any relevant bearing on the subjects' other characteristics. For example, quarterbacks' careers are not defined by the specific state that they once lived in (unless they played for a team within that state).

However, location may be used as a way to split a large category into subcategories. For example, Category:American writers by state.

Trivial characteristics or intersection

Examples: Celebrity Gamers, Red haired kings, Bald People, Famous redheads, Deaths by age, Mirrors in fiction

Avoid categorizing topics by characteristics that are unrelated or wholly peripheral to the topic's notability.

For biographical articles, it is usual to categorize by such aspects as their career, origins, and major accomplishments. In contrast, someone's tastes in food, their favorite holiday destination, or the number of tattoos they have would be considered trivial. Such things may be interesting information to include in an article, but not useful for categorization. If something could be easily left out of a biography, it is likely that it is a trivial characteristic.

Note that this form of overcategorization also applies to grouping people by trivial circumstances of their deaths, such as categorizing people by the age at which they died or the place of death or by whether they still had unreleased or unpublished work at the time of their death. Even though such categories may be interesting to some people, they aren't particularly encyclopedic.

Subjective inclusion criteria

Examples: Obese people, Cult actors, Mysterious musicians, Outstanding Canadians, Wars France lost, Racist people

Adjectives which imply a subjective, vague, or inherently non-neutral inclusion criterion should not be used in naming/defining a category. Examples include subjective descriptions (famous, notable, great), any reference to relative size (large, small, tall, short), relative distance (near, far), or character trait (beautiful, evil, friendly, greedy, honest, intelligent, old, popular, ugly, young).

Non-notable intersections by ethnicity, religion, or sexual orientation

Examples: Jewish mathematicians, LGBT murderers, Sportspeople by religion

Dedicated group-subject subcategories, such as Category:LGBT writers or Category:African-American musicians, should only be created where that combination is itself recognized as a distinct and unique cultural topic in its own right. If a substantial and encyclopedic head article (not just a list) cannot be written for such a category, then the category should not be created. Please note that this does not mean that the head article must already exist before a category may be created, but that it must at least be reasonable to create one.

Likewise, people should only be categorized by ethnicity or religion if this has significant bearing on their career. For instance, in sports, a Roman Catholic athlete is not treated differently from a Lutheran or Methodist. Similarly, in criminology, a person's actions are more important than their race or sexual orientation. While "LGBT literature" is a specific genre and useful categorization, "LGBT quantum physics" is not.

Opinion about a question or issue

Examples: Cat lovers, Iraq liberation opposition, Star Trek fans

Avoid categorizing people by their personal opinions, even if a reliable source can be found for the opinions. This includes supporters or critics of an issue, personal preferences (such as liking or disliking green beans), and opinions or allegations about the person by other people (e.g. "alleged criminals"). Please note, however, the distinction between holding an opinion and being an activist, the latter of which may be a defining characteristic (see Category:Activists).

Potential candidates and nominees

Example: Potential 2008 Republican U.S. Presidential Candidates (deleted in November 2006)

Wikipedia is not a crystal ball. A candidate not yet nominated for public office, the possible next CEO of a certain corporation, a potential member of a sports team, an actor on the "short list" to play a role, or an award nominee (just to name a few examples) should not be grouped by category. Lists may sometimes be appropriate for such groupings, especially after the passage of the events to which they relate.

Award recipients

Examples: Category:MTV Movie Award winners, Category:Honorary citizens of Berlin, Category:People who have received honorary degrees from Harvard University

A category of award recipients should exist only if receiving the award is a defining characteristic for the large majority of its notable recipients. A recipient of an award should be added to a category of award recipients only if receiving the award is a defining characteristic of the recipient.

Per Wikipedia:Categories, lists, and navigation templates, the existence of lists and categories is determined by separate criteria. So regardless of whether a category is created, a list of the recipients may be created if the list meets the notability criteria. If both a category and a list are viable on the same topic, such a list may make a suitable main article for the category, indicated with the {{Cat main}} template.[2]

Published list

Example: Rolling Stone's 500 Greatest Albums

Magazines and books regularly publish lists of the "top 10" (or some other number) in any particular field. Such lists tend to be subjective and may be somewhat arbitrary. Some particularly well-known and unique lists such as the Billboard charts may constitute exceptions, although creating categories for them may risk violating the publisher's copyright or trademark.

Venues by event

Examples: WrestleMania venues, Republican National Convention venues, Democratic National Convention venues

There is no encyclopedic value in categorizing locations by the events or event types that have been held there, such as arenas that have hosted specific sports events or concerts, convention centers that have hosted specific conventions or meetings, or cities featured in specific television shows that film at multiple locations.

Likewise, avoid categorizing events by their hosting locations. Many notable locations (e.g. Madison Square Garden) have hosted so many sports events and conventions over time that categories listing all such events would not be readable.

However, categories that indicate how a specific facility is regularly used in a specific and notable way for some or all of the year (such as Category:National Basketball Association venues) may sometimes be appropriate.

See also #Performers by series or performance venue.

Performers by performance

Avoid categorizing performers by their performances. Examples of "performers" include (but are not limited to) actors/actresses (including pornographic actors), comedians, dancers, models, orators, singers, etc.

This includes categorizing a production by performers' performances. For example, just as we shouldn't categorize a performer by action or appearance, we shouldn't categorize a production by a performer's action or appearance in that production.

Performers by action or appearance

Examples: Actresses who have appeared veiled, Anal porn actress, Musicians who play left-handed, Saxophonists who are capable of circular breathing

Avoid categorizing performers by some action they may have performed (such as a "pirouette", a "runway walk", a "spit take", a "sword fight", "anal sex", etc.); some method of performance (such as while standing on their head, left-handed, etc.); or how they may have chosen to appear (such as bald, veiled, etc.).

Performers by role or composition

  • Performers who have portrayed <character name>
  • Performers who have portrayed <a type of character>
  • Performers who have performed <a specific work>

Examples: American dramatic actors, Actors that portrayed heroes or villains, Jim Steinman artists, Actresses who portrayed Lois Lane, Actors who have played serial killers, Actors who have played gay characters, Actors who played HIV-positive characters, Actors who have played the President of the United States, and Actors who have played Doctor Who.

Avoid categories which categorize performers by their portrayal of a role. This includes portraying a specific character (such as Darth Vader, or Hamlet). This also includes voicing animated characters (such as Donald Duck), or doing "impressions"; portraying a "type" of character (such as wealthy, poor, religious, homeless, gay, female, politician, Scottish, dead, etc.); or performing a specific work (such as Amazing Grace, "Waltz of the swans" from Swan Lake, "To be or not to be" from Hamlet (the play), "Why did the chicken cross the road?" (a joke), etc.).

Similarly, avoid categorizing artists based on producers, film directors or other artists they have worked with (such as "George Martin musicians" or "Steven Spielberg actors"). Performers are defined by their body of work, not by the people they have associated with professionally. For example, Tom Hanks is distinguished by his performances as an actor, not by the fact that he has appeared in Steven Spielberg's films.

Performers by series or performance venue

  • Performers who have performed at <location>
  • Performers who have performed on <production>

Examples: Artists who played Coachella, Saturday Night Live musical guests, Ozzfest performers, Celebrity Poker Showdown players, Entertainers who performed for troops during the Vietnam War, and Actors by series

Avoid categorizing performers by an appearance at an event or other performance venue. This also includes categorization by performance—even for permanent or recurring roles—in any specific radio, television, film, or theatrical production (such as The Jack Benny Program, M*A*S*H, Star Wars, or Phantom of the Opera).

Note also that performers should not be categorized into a general category which groups topics about a particular performance venue or production (e.g. Category:Star Trek), when the specific performance category would be deleted (e.g. Category:Star Trek script writers).

See also #Venues by event.

Role or composition by performer

  • <Characters> who have been portrayed by a specific performer
  • <Types of characters> which have been portrayed by a specific performer
  • <Works> which have been performed by a specific performer

Examples: Fictional characters by actor, Characters portrayed by Johnny Depp, Characters Portrayed by Leslie Nielsen, Fictional characters portrayed by Peter Dinklage, Fictional characters portrayed by Christopher Lee, Films by star, Films starring Jim Carrey

Avoid categorizing characters or specific works by the performers who have portrayed them or appeared in them. A typical film or television series has many actors in various roles, so categorizing by actor results in needless clutter. Similarly, some roles, particularly animated ones like Donald Duck and historical/mythological figures like Zeus, have been performed by multiple actors, and being performed by a particular actor is seldom a defining trait for such roles.

Notes

  1. ^ in prose, as opposed to a tabular or list form
  2. ^ Per this RfC

See also