Create a tool to display and filter data from Schema:ExternalLinksChange
Closed, ResolvedPublic


Once implemented, Schema:ExternalLinksChange will provide a large dataset on editor linking and referencing behaviour. This can be pulled together into a tool for use by the Wikipedia community as well as The Wikipedia Library.

The community would primarily find this useful for processes such as anti-spam efforts. A full log of link additions is of considerable use when tracking down sockpuppet accounts spamming links: it leads straight to the spamming user, and it also shows link additions that have already been reverted (Special:LinkSearch only shows existing links).

As part of the access donation program with TWL we currently send partners quarterly metrics updates detailing the total number of external links to their resources on Wikipedia. The aim is to show how many links have been added as a result of the TWL partnership and thus demonstrate its success (or failure!). However, this process is imperfect in many ways. The primary drawback is that we rely on Special:LinkSearch, which shows the total number of links as added by all users, and only on a single language wiki. We therefore have to infer whether a set of resources is worth continuing. This can be relatively easy if the partner had very few links present to begin with (we can assume the vast majority of any growth is because of TWL), but for larger partners the TWL additions are dwarfed by the overall noise of standard non-TWL user citations. This schema will help us narrow down which partnerships are successful, where links are being added to partner resources, and what proportion of editors are actually making use of their access on Wikipedia, among plenty of other interesting data points and trends.

These two cases share requirements for most of the same data and processes, though there will be some additional requirements for each, primarily in how the data is presented and filtered.
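
As a rough illustration of the shared filtering both use cases need, here is a minimal Python sketch. It assumes each logged link change can be reduced to a user, a wiki, a URL, and an add/remove action. The `LinkChange` record, its field names, the helper functions, and the sample data are all hypothetical and are not the actual fields of Schema:ExternalLinksChange.

```python
from collections import Counter
from dataclasses import dataclass
from urllib.parse import urlsplit


@dataclass(frozen=True)
class LinkChange:
    """One hypothetical link-change event; field names are assumptions."""
    user: str
    wiki: str
    url: str
    action: str  # "add" or "remove"


def host_matches(url: str, domain: str) -> bool:
    """True if the URL's host is the domain itself or a subdomain of it."""
    host = (urlsplit(url).hostname or "").lower()
    domain = domain.lower()
    return host == domain or host.endswith("." + domain)


def filter_changes(events, domain, action=None, users=None):
    """Keep events linking to `domain`, optionally limited by action and user set."""
    return [
        e for e in events
        if host_matches(e.url, domain)
        and (action is None or e.action == action)
        and (users is None or e.user in users)
    ]


def additions_by_user(events, domain):
    """Count link additions to `domain` per user, including ones later removed."""
    return Counter(e.user for e in filter_changes(events, domain, action="add"))


def twl_share(events, domain, twl_users):
    """Fraction of additions to `domain` made by users in `twl_users` (0.0 if none)."""
    added = filter_changes(events, domain, action="add")
    if not added:
        return 0.0
    return sum(e.user in twl_users for e in added) / len(added)


# Made-up sample events spanning several wikis.
SAMPLE_EVENTS = [
    LinkChange("Alice", "enwiki", "https://www.example.org/article/1", "add"),
    LinkChange("Bob", "enwiki", "https://example.org/article/2", "add"),
    LinkChange("Bob", "dewiki", "https://example.org/article/2", "remove"),
    LinkChange("Carol", "frwiki", "https://other.example.com/x", "add"),
    LinkChange("Alice", "dewiki", "http://example.org/article/3", "add"),
]
```

With the sample data, `additions_by_user(SAMPLE_EVENTS, "example.org")` counts two additions for Alice and one for Bob across wikis, which is the kind of per-user view an anti-spam check would start from. `twl_share(SAMPLE_EVENTS, "example.org", {"Alice"})` gives the proportion of additions made by a given set of users, which is the figure the partner reports are missing today.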

Event Timeline

Sadads set Security to None.
Sadads added a subscriber: edsu.
Sadads removed Husky as the assignee of this task. Aug 19 2015, 3:26 PM
Sadads updated the task description.

@Halfak created a schema that would improve on Special:LinkSearch so that we can keep track of external link changes in Event Logging, which would make the visualization and development of this functionality easier.

IMPORTANT: This is a message posted to all tasks under "Need Discussion" at Possible-Tech-Projects. Wikimedia has been accepted as a mentor organization for GSoC '16. If you want to propose this task as a featured project idea, we need a clear plan with community support, and two mentors willing to support it.

@Milimetric, @Halfak, @Sadads, @Abit this looks like a well-scoped project and you're listed as mentors for it. Can we feature this for the current round of GSoC '16/Outreachy-12? Note the requirements of a project for GSoC '16/Outreachy-12:
The task should not take more than 2-3 weeks for a senior developer and should have two mentors, one primary and one co-mentor confirmed.
Let us know! :)

Hm... I think some of the reporting on this data could totally be done in 2-3 weeks, yeah. But that's up to @Sadads to define what's useful. I can mentor a GSoC participant through the actual coding.

@Sumit & @Milimetric I would be happy to have this as part of GSoC. There are also a few tweaks that need to be completed by @Legoktm to T115119 to make sure that this is collecting data properly.

@Sumit what is the timeline for when GSoC participants can work on their projects?

That's great news! I'll be moving this to featured for GSoC/Outreachy in that case. Please edit the task description accordingly if you feel anything needs to be modified/added/removed in light of GSoC/Outreachy.

Please see GSoC timeline. Students work for a period of approximately four months from community bonding to coding, which will begin on 22nd April if this project goes ahead. The application period starts on 14th March.

@Milimetric @Sadads Also it'd be great if we could have some microtasks on which students can work to make them strong candidates for the project.


I am interested in the project “Expand Link Metric capability for reporting to Wikipedia Library partners”.

I am new to this project. Could you please guide me on how to contribute? Which issue would be the most appropriate to start with?


@Znbiz this issue is waiting on T115119. Once that's done, @Sadads should define what the first reports should look like. Then I can walk you through how to create those reports. I can make tasks for you, but the problem is just not completely defined right now.

@Milimetric this project is featured for GSoC and the application deadline for proposals is 25th March. Some microtasks (maybe not completely related to the project) to start with would greatly help new students and the evaluation of proposals.

@Sumit, @Milimetric, which microtask do you recommend? I want to participate in GSoC.

@Znbiz as a starter, go through to get yourself familiar with MediaWiki in general. Then you can try to fix some easy bugs from in parallel with the project discussions that follow.

@Sumit and @Znbiz: This task entirely revolves around having data that @Sadads & team finds useful. I'll try to ping him separately.

@Znbiz and @Sumit -- the main need is a series of reports using Schema:ExternalLinksChange to reflect the needs of . Our metrics coordinator @Samwalton9 is going to create some Phabricator requests for report types to help you define those to be more useful. In general, we are trying to create as many useful visualizations and other reports as can be used by both publishers and other community organizers who need to know what is being linked to on Wikipedia.

I've created T130987, T131035, and T130988 which cover the three primary reports we're looking for. Let me know what you think of the workload for these tasks; I can definitely come up with more if that won't take long, or if it's too much then the list is in order of priority :)

@Sumit & @Milimetric I see that GSOC students are supposed to be starting in a couple of days - is this going ahead as a project?

Let me know if I can be of any help.

@Samwalton9 this project did not receive proposals during this round of GSoC. We're saving it for the next round of Outreachy in September. Thanks for the help!

@Milimetric @Astinson you were mentioned as mentors of this task for GSoC '16. Are you still ready to mentor this one for the upcoming Outreachy-13 (Dec 6 to March 6, 2017)?
Also, @Samwalton9 you volunteered to help with this, do you want to step up as a mentor for Outreachy-13?

@Sumit There's probably no need for this to be part of Outreachy now. Once T115119 is fixed, @MusikAnimal has offered to look at building the front end we need.

ggellerman removed a subscriber: Halfak.
ggellerman added a subscriber: Halfak.
ggellerman subscribed.

removing Research and Data backlog. @DarTar is still subscribed

Samwalton9-WMF renamed this task from Expand Link Metric capability for reporting to Wikipedia Library partners to Create a tool to display and filter data from Schema:ExternalLinksChange. Dec 3 2016, 5:58 PM
Samwalton9-WMF updated the task description.
Samwalton9-WMF raised the priority of this task from Medium to High. Dec 3 2016, 7:53 PM
Samwalton9-WMF moved this task from Backlog to TWL Metrics on the The-Wikipedia-Library board.