2018-01-05

Up2date software versions for Wikidata

I'm a supporter of Wikidata and free software, so naturally I care about Wikidata's items about free software. There are at least 17,000 of them, and their quality varies widely (as always on wikis). Many of them were created because a corresponding Wikipedia article existed, while others came from imports, for example from Gentoo's Portage.

One area where Wikidata could really shine is software version numbers. When a new version of Firefox is released, the version number traditionally has to be updated in each of the 120 language versions of Wikipedia that have an article about Firefox. With Wikidata this is no longer necessary: update the version number once on Wikidata, and all Wikipedias can show the newest version instantly. Sadly, this is not yet reality. Many Wikipedia communities are still skeptical about Wikidata, so version numbers are often edited on the local Wikipedias instead of being taken from Wikidata.

A key step toward improving this situation is improving the data quality of Wikidata. So far, too many of Wikidata's items about free software have outdated version numbers. That is not surprising: if the items are not used in Wikipedia, Wikipedians don't update them.

There are a few promising ways to improve this. Github-wiki-bot by Konstin is one: it imports version numbers from GitHub, but that only works for some projects. For a short while now I've been working on another approach: checking version numbers against those in the repository database of Arch Linux. I chose Arch because it ships very fresh versions of software, so most of the time the versions in the Arch repos are the newest versions available.
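The post does not say how the scripts read the Arch repository database. One plausible route, sketched here under that assumption, is pacman's sync database files (for example `extra.db`): tar archives holding one `desc` text file per package, with sections such as `%NAME%`, `%VERSION%` and `%URL%`. The function names are hypothetical, and the sketch assumes the archive uses a compression Python's `tarfile` can open (gzip, bzip2 or xz).

```python
import tarfile


def parse_desc(text: str) -> dict[str, list[str]]:
    """Parse one pacman 'desc' entry into {FIELD: [values, ...]}."""
    fields: dict[str, list[str]] = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if len(line) > 2 and line.startswith("%") and line.endswith("%"):
            # A section header such as %NAME% starts a new field.
            current = line.strip("%")
            fields[current] = []
        elif not line:
            # A blank line ends the current section.
            current = None
        elif current is not None:
            # Sections like %DEPENDS% can hold several values.
            fields[current].append(line)
    return fields


def read_sync_db(path: str) -> dict[str, dict[str, str]]:
    """Map package name -> {'version': ..., 'url': ...} (hypothetical helper)."""
    packages: dict[str, dict[str, str]] = {}
    # Default mode "r" lets tarfile detect gzip/bzip2/xz compression itself.
    with tarfile.open(path) as db:
        for member in db.getmembers():
            if not member.name.endswith("/desc"):
                continue
            handle = db.extractfile(member)
            if handle is None:
                continue
            fields = parse_desc(handle.read().decode("utf-8"))
            name = fields.get("NAME", [""])[0]
            packages[name] = {
                "version": fields.get("VERSION", [""])[0],
                "url": fields.get("URL", [""])[0],
            }
    return packages
```

The values in the returned dictionary are exactly what the matching and version comparison steps need: the upstream website and the packaged version.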

To check the versions, I need the Arch Package identifier (P3454) to be present in Wikidata, so I first wrote a script to help me add it. For every software item that runs on Linux, I check whether the Arch repositories contain a package with the same name and website. This way I was able to add the Arch Package identifier to roughly 600 items.
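A minimal sketch of the name-and-website match, assuming the Wikidata items have already been fetched as `(qid, label, website)` tuples and the Arch packages as a `{pkgname: url}` dictionary (both shapes are hypothetical, as are the helper names). URLs are normalized so that `http` vs `https`, a `www.` prefix or a trailing slash don't block a match. The label-to-package-name rule (lowercase, spaces to hyphens) is also an assumption, not necessarily what the original script does.

```python
from urllib.parse import urlsplit


def normalize_url(url: str) -> str:
    """Reduce a URL to host + path, ignoring scheme, 'www.' and a trailing slash."""
    parts = urlsplit(url.strip().lower())
    host = parts.netloc
    if host.startswith("www."):
        host = host[len("www."):]
    return host + parts.path.rstrip("/")


def match_packages(items, arch_packages):
    """Return (qid, pkgname) pairs whose name AND website agree.

    items: iterable of (qid, label, website) tuples (hypothetical shape)
    arch_packages: dict mapping pkgname -> upstream URL (hypothetical shape)
    """
    matches = []
    for qid, label, website in items:
        # Guess the Arch package name from the Wikidata label.
        name = label.strip().lower().replace(" ", "-")
        url = arch_packages.get(name)
        # Require the website to match too, to avoid false positives
        # between unrelated projects that share a name.
        if url is not None and normalize_url(url) == normalize_url(website):
            matches.append((qid, name))
    return matches
```

Requiring both the name and the website to agree trades recall for precision: a package named differently from its Wikidata label (say, a label with a vendor prefix) is missed, but two unrelated projects sharing a name are not linked by mistake.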

Then I wrote a second script that compares the version numbers of those items with those in the Arch repositories. If the version available for Arch is newer than the newest version recorded in Wikidata, the item is listed on a web page, sorted by how large the difference between the version numbers is. You can find this list here; it's updated a few times per day.
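The comparison and the "size of the difference" ranking could look roughly like this. It is an assumption that the difference is measured by the first version component that differs (major before minor before third and fourth), with the numeric gap in that component as a tie-breaker; the helper names are hypothetical. Arch versions have the form `[epoch:]pkgver-pkgrel`, so the epoch and pkgrel are stripped before comparing, and only the leading dotted numeric part of each version is used.

```python
import re
from itertools import zip_longest


def strip_arch(version: str) -> str:
    """Drop the optional Arch epoch ('N:') and the pkgrel suffix ('-N')."""
    version = version.split(":", 1)[-1]
    return version.rsplit("-", 1)[0]


def parse_version(version: str) -> tuple[int, ...]:
    """Keep only the leading dotted numeric part: '1.2.3rc1' -> (1, 2, 3)."""
    match = re.match(r"\d+(?:\.\d+)*", version.strip())
    return tuple(int(x) for x in match.group(0).split(".")) if match else ()


def version_gap(wikidata_versions, arch_version):
    """Return (position, delta) of the first component where Arch is ahead,
    or None if Arch is not newer than the newest Wikidata version."""
    newest = max((parse_version(v) for v in wikidata_versions), default=())
    arch = parse_version(strip_arch(arch_version))
    # Pad the shorter version with zeros so '2.3' and '2.3.0' compare equal.
    for pos, (a, w) in enumerate(zip_longest(arch, newest, fillvalue=0)):
        if a != w:
            return (pos, a - w) if a > w else None
    return None


def rank_outdated(rows):
    """rows: iterable of (qid, wikidata_versions, arch_version).
    Return outdated rows with the largest difference first."""
    outdated = []
    for qid, wd_versions, arch_version in rows:
        gap = version_gap(wd_versions, arch_version)
        if gap is not None:
            outdated.append((qid, gap))
    # Earlier position = bigger difference; within a position, larger delta first.
    outdated.sort(key=lambda row: (row[1][0], -row[1][1]))
    return outdated
```

With this ordering, items that are behind in the major version come first, followed by those behind in the minor version, and so on down to the third and fourth components.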

The list contained a few items with badly outdated versions (some three years old and two major versions behind!). Over the last few days I updated several hundred items from this list by hand, starting with those whose major version number was out of date and then all those whose minor version number was out of date. The list now only contains items that are outdated in the third or fourth component of the version number; all first and second components are up to date. I hope I can keep the version numbers at least this current.

This is not finished, of course. Of the 17,000 items about free software, fewer than 700 have a reference to the corresponding Arch package, while the Arch repos contain 10,000 packages. So even without knowing exactly how large the overlap between Wikidata and the Arch repos is, it is certainly much larger than what we currently have!

Your help is needed!



Up2date software versions for Wikidata by Michael F. Schönitzer is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.