Nudin's Blog
Blog by Michael F. Schönitzer (michael at schoenitzer dot de), https://www.schoenitzer.de/blog/, updated 2022-11-27T14:37+01:00
Shorter Hacks 21: Git push previous commit2022/Shorter Hacks 21 Git push previous commit2022-11-272022-11-27T14:37:16+01:00
<div class="editable">
<div id="Shorter Hacks 21: Git push previous commit"><h1 id="Shorter Hacks 21: Git push previous commit" class="header">Shorter Hacks 21: Git push previous commit</h1></div>
<p>
You can not only push the current state of a branch to its remote branch, you
can push any commit to any remote branch. This way you can, for example, push
all your commits except the latest with:
</p>
<pre>
git push remotename @~:branchname
</pre>
<p>
I regularly use this when my current commit is not finished yet, but I want to push
the already finished work done in previous commits. How does this work? The
first argument to git push is the remote to which you want to push code (usually
<code>origin</code>), the second specifies which commit to push to which remote branch. The
<code>@</code> is short for <code>HEAD</code> – the current commit, and the <code>~</code> selects its parent
commit. You can use <code>@~~</code> or <code>@~2</code> if you want to skip the last two commits and
so on. After the colon, you specify the name of the branch on the remote that you want to push to.
</p>
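<p>
As a rough sketch of how the refspec above is put together, the following Python helper builds the <code>@~N:branch</code> string and prints the matching command. The function name <code>build_refspec</code> and the remote and branch names are made up for illustration; only the refspec syntax itself comes from git.
</p>

```python
def build_refspec(skip: int, branch: str) -> str:
    """Return a refspec that pushes HEAD minus `skip` commits to `branch`."""
    if skip < 0:
        raise ValueError("skip must not be negative")
    # "@" is short for HEAD; "~N" walks N first-parent steps back from it.
    return f"@~{skip}:{branch}"


# Hypothetical remote ("origin") and branch ("main"), for illustration only.
print("git push origin " + build_refspec(1, "main"))
```

<p>
The helper only assembles the string; it does not call git, so nothing is pushed by running it.
</p>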
<p>
<br><br>
Shorter Hacks 20: IPython history2022/Shorter Hacks 20 IPython history2022-11-202022-11-20T15:38:31+01:00
<div class="editable">
<div id="Shorter Hacks 20: IPython history"><h1 id="Shorter Hacks 20: IPython history" class="header">Shorter Hacks 20: IPython history</h1></div>
<p>
In IPython you can print all commands of the current session (in a
copy-pasteable form) with <code>%hist</code>. If you want to display the history of the
previous session, you can do that with <code>%hist ~1/</code>. To search the
full history for a keyword, use <code>%hist -g foobar</code>.
</p>
<p>
<br><br>
Shorter Hacks 19: Terminate hanging ssh sessions2022/Shorter Hacks 19 Terminate hanging ssh sessions2022-10-222022-10-22T13:03:33+02:00
<div class="editable">
<div id="Shorter Hacks 19: Terminate hanging ssh sessions"><h1 id="Shorter Hacks 19: Terminate hanging ssh sessions" class="header">Shorter Hacks 19: Terminate hanging ssh sessions</h1></div>
<p>
When the network connection breaks while you work in an ssh session or the ssh
server terminates – maybe due to a reboot – the ssh session freezes and stays
frozen for a long time. There is an ssh keybinding to stop such a frozen
session: First press <code>return</code>, then <code>~</code> and finally <code>.</code> – this will immediately
terminate the connection and exit the client.
</p>
<p>
<br><br>
Shorter Hacks 18: Filter lines in less2022/Shorter Hacks 18 Filter lines in less2022-10-162022-10-16T00:54:10+02:00
<div class="editable">
<div id="Shorter Hacks: Filter lines in less"><h1 id="Shorter Hacks: Filter lines in less" class="header">Shorter Hacks: Filter lines in less</h1></div>
<p>
The pager <code>less</code> has an option to filter displayed lines by regex pattern.
Basically a built-in grep. Press <code>&</code> and then type your filter and only
matching lines will be shown. A <code>!</code> or <code>Ctrl+N</code> at the start of the pattern
will turn it into an inverse filter.
</p>
<p>
<br><br>
Shorter Hacks 17: SSH Purge host from known_hosts file2022/Shorter Hacks 17 SSH Purge host from known hosts2022-09-282022-09-28T00:02:27+02:00
<div class="editable">
<div id="SSH Purge host from known_hosts file"><h1 id="SSH Purge host from known_hosts file" class="header">SSH Purge host from known_hosts file</h1></div>
<p>
When the host key of a computer you want to ssh to has changed for a valid
reason, ssh will block attempts to connect to it to avoid man-in-the-middle
attacks. Deleting the key from the <code>known_hosts</code> file with an editor can be
annoying, especially if the file is "hashed". Therefore, <code>ssh-keygen</code> offers a
feature to do that:
</p>
<pre>
ssh-keygen -R hostname
</pre>
<p>
<br><br>
Shorter Hacks 16: IPython Autoreload2022/Shorter Hacks 16 IPython Autoreload2022-09-182022-09-18T13:56:44+02:00
<div class="editable">
<div id="Shorter Hacks 16: IPython Autoreload"><h1 id="Shorter Hacks 16: IPython Autoreload" class="header">Shorter Hacks 16: IPython Autoreload</h1></div>
<p>
When developing some python code and testing it in IPython, I love the
<code>autoreload</code> feature of IPython. When enabled it will reload imported modules
automatically. So you will always use the newest version of your code. It even
patches modifications on class methods into existing class instances. To enable
it first load it with <code>%load_ext autoreload</code>, then enable it with <code>%autoreload 2</code>
– See the <a href="https://ipython.readthedocs.io/en/stable/config/extensions/autoreload.html">documentation</a> for explanations of the options and how to autoreload
only selected imports. Here is a code snippet to demonstrate its power:
</p>
<pre python>
In [1]: %load_ext autoreload
In [2]: %autoreload 2
In [3]: from test import Foo
In [4]: foo = Foo()
In [5]: foo.bar()
1
# Edit the code
In [6]: foo.bar()
2
</pre>
<p>
You can put the following into your ipython_config¹ to load and enable the
autoreload extension automatically on startup (I only use the first line):
</p>
<pre python>
c.InteractiveShellApp.extensions.append("autoreload")
c.InteractiveShellApp.exec_lines = ["%autoreload 2"]
</pre>
<p>
<br>
<br>
</p>
<p>
1: Either
</p>
<ul>
<li>
<code>~/.ipython/profile_default/ipython_config.py</code> or
<li>
<code>~/.config/ipython/profile_default/ipython_config.py</code>
</ul>
<p>
Shorter Hacks 15: Dig output format2022/Shorter Hacks 15 Dig output format2022-09-092022-09-11T14:24:50+02:00
<div class="editable">
<div id="Dig output format"><h2 id="Dig output format" class="header">Dig output format</h2></div>
<p>
You probably use <code>dig</code> to query DNS records. The default output of dig is
rather verbose, but you can configure it. Try the following for a succinct
output:
</p>
<pre>
dig +nostats +nocomments +nocmd +noquestion +identify +multiline @1.1.1.1 schoenitzer.de
</pre>
<p>
You can set this as default output by adding the options to the file <code>~/.digrc</code>.
</p>
<p>
Shorter Hacks 14: Get external IP behind NAT2022/Shorter Hacks 14 Get external IP2022-04-082022-09-09T11:27:27+02:00
<div class="editable">
<div id="Shorter Hacks 14: Get external IP"><h1 id="Shorter Hacks 14: Get external IP" class="header">Shorter Hacks 14: Get external IP</h1></div>
<p>
You are behind a NAT and need your "outer" IP address? You can use curl with
one of several handy websites, for example <code>curl icanhazip.com</code> or <code>curl ifconfig.me</code>.
If you want to self-host such a service with nginx, it's as simple as:
</p>
<pre>
location / {
    default_type text/plain;
    return 200 "$remote_addr\n";
}
</pre>
<p>
Shorter Hacks 13: Git exclude files2022/Shorter Hacks 13 git exclude2022-04-022022-04-02T15:22:45+02:00
<div class="editable">
<div id="Shorter Hacks 13: git exclude files"><h1 id="Shorter Hacks 13: git exclude files" class="header">Shorter Hacks 13: git exclude files</h1></div>
<p>
If you want to ignore a file without putting it into the public <code>.gitignore</code> you
can list it in <code>.git/info/exclude</code> – it works the same way but is local to your
computer and not version controlled.
</p>
<p>
Shorter Hacks 12: Open editor from within less2022/Shorter Hacks 12 open editor from less2022-03-292022-03-29T03:24:02+02:00
<div class="editable">
<div id="Shorter Hacks 12: open editor from within less"><h1 id="Shorter Hacks 12: open editor from within less" class="header">Shorter Hacks 12: open editor from within less</h1></div>
<p>
Opened a file in less, but then realised you have to edit it? The key <code>v</code> (as in
<code>vim</code>?) opens up your editor, as defined in <code>EDITOR</code> or <code>VISUAL</code>.
</p>
<p>
Shorter Hacks 11: journalctl list boots2022/Shorter Hacks 11 journalctl list boots2022-03-142022-03-29T03:24:56+02:00
<div class="editable">
<div id="Shorter Hacks 11: journalctl list boots"><h1 id="Shorter Hacks 11: journalctl list boots" class="header">Shorter Hacks 11: journalctl list boots</h1></div>
<p>
<code>journalctl --list-boots</code> lists your last boots with start and shutdown date, as
far as the journal dates back. You can use <code>journalctl -b num</code> to view the logs
of previous boots. Besides that, I sometimes use it to check "at what time did I
go to bed yesterday?"
</p>
<p>
Shorter Hacks 10: disown2022/Shorter Hacks 10 disown2022-03-202022-03-20T14:17:03+01:00
<div class="editable">
<div id="Shorter Hacks 10: disown"><h1 id="Shorter Hacks 10: disown" class="header">Shorter Hacks 10: disown</h1></div>
<p>
Some long running command is still active in your ssh session and you need to
leave; but you don't want to stop it and therefore regret that you didn't execute it in
screen/tmux or with nohup? You can still use disown. Stop the process with <code>Ctrl+Z</code>,
resume it in the background with <code>bg</code> and run <code>disown</code> so that you can end your
ssh session without killing it.
</p>
<p>
Shorter Hacks 9: git autostash2022/Shorter Hacks 9 git autostash2022-03-142022-03-14T13:59:38+01:00
<div class="editable">
<div id="Shorter Hacks 9: Git autostash"><h1 id="Shorter Hacks 9: Git autostash" class="header">Shorter Hacks 9: Git autostash</h1></div>
<p>
You want to rebase, merge or pull but have uncommitted changes? Use the option
<code>--autostash</code>. Example: <code>git pull --autostash</code> is equivalent to
<code>git stash; git pull; git stash pop</code>.
</p>
<p>
Shorter Hacks 8: limit strace to syscalls with files2022/Shorter Hacks 8 stace file2022-02-282022-03-14T13:55:56+01:00
<div class="editable">
<div id="Shorter Hacks 8: limit strace to syscalls with files"><h1 id="Shorter Hacks 8: limit strace to syscalls with files" class="header">Shorter Hacks 8: limit strace to syscalls with files</h1></div>
<p>
I often use strace to see which files a program uses and, often even more
interesting, which (non-existing) files it fails to open. With the flag <code>-y</code> strace will
decode file descriptors, so you see the path of the files, and with
<code>-e %file</code> strace will only print syscalls that operate on a file:
<code>open</code>, <code>stat</code>, <code>chmod</code>, <code>unlink</code>, … Please note that this does not include syscalls
operating on a file descriptor like read or write – you can add them with
<code>-e %file,read,write</code>, but keep in mind that there are multiple syscalls capable of
reading from or writing to file descriptors.
</p>
<p>
Shorter Hacks 7: better search in man pages2022/Shorter Hacks 7 better search in man pages2022-02-062022-02-06T17:56:10+01:00
<div class="editable">
<div id="Shorter Hacks 7: better search in man pages"><h1 id="Shorter Hacks 7: better search in man pages" class="header">Shorter Hacks 7: better search in man pages</h1></div>
<p>
I love the <code>man</code> page system on Linux/Unix and frequently look something up.
But one thing drove me crazy: Since the description of flags is written in a
new line below the flag, most of the time you have to scroll up again after
every search to see the flag you've been searching for. If you don't understand
what I mean, search for "symbolic link" in the man page of <code>grep</code> to find the
flag <code>-R</code>. Since you're most likely reading man pages with <code>less</code> as pager,
there is a way to make this much more convenient: Less has the option <code>-j</code> that
will place the matches of a search not on the first line but on the n-th line of the screen.
Either set <code>MANPAGER='less -j2'</code> (or a higher number), or add the option to the
environment variable <code>LESS</code> to enable it everywhere.
</p>
<p>
Shorter Hacks 6: pipe stderr2022/Shorter Hacks 6 pipe stderr2022-01-292022-01-29T15:46:17+01:00
<div class="editable">
<div id="Shorter Hacks 6: pipe stderr"><h1 id="Shorter Hacks 6: pipe stderr" class="header">Shorter Hacks 6: pipe stderr</h1></div>
<p>
Normal pipes in bash/zsh only redirect stdout to the target process. If you,
for example, want to grep the output of strace, you need to also redirect stderr
to the target. You can do this by <code>strace ls |& grep open</code>. On older versions
of bash you need the less handy <code>strace ls 2>&1 | grep open</code>.
</p>
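<p>
The same idea, merging stderr into stdout before filtering, can be sketched in Python. This is only an illustration: the child process below is a stand-in that prints one line to each stream, not strace, and <code>stderr=subprocess.STDOUT</code> plays the role of <code>2>&1</code>.
</p>

```python
import subprocess
import sys

# A stand-in child process that writes one line to stdout and one to stderr.
child = [sys.executable, "-c",
         "import sys; print('to stdout'); print('to stderr', file=sys.stderr)"]

# stderr=subprocess.STDOUT sends both streams into the same pipe.
result = subprocess.run(child, stdout=subprocess.PIPE,
                        stderr=subprocess.STDOUT, text=True)

# Filter the merged output, much like grep would on the receiving end.
matches = [line for line in result.stdout.splitlines() if "stderr" in line]
print(matches)
```

<p>
Without the <code>stderr=subprocess.STDOUT</code> argument the stderr line would bypass the pipe and never reach the filter.
</p>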
<p>
Shorter Hacks 5: git word diff2022/Shorter Hacks 5 git word-diff2022-01-232022-01-23T15:45:49+01:00
<div class="editable">
<div id="Shorter Hacks 5: git word diff"><h1 id="Shorter Hacks 5: git word diff" class="header">Shorter Hacks 5: git word diff</h1></div>
<p>
If a diff is hard to read because the changes are small, try
<code>git diff --word-diff</code>. While the default diffing algorithm works line-wise, this will
show you changes within a line.
</p>
<p>
Shorter Hacks 4: strace failed-only2021/Shorter Hacks 4 strace failed-only2022-01-142022-01-14T22:39:16+01:00
<div class="editable">
<div id="Shorter Hacks 4: Strace only failed syscalls"><h1 id="Shorter Hacks 4: Strace only failed syscalls" class="header">Shorter Hacks 4: Strace only failed syscalls</h1></div>
<p>
Want to see why some program is failing? Strace often has the information, but
the output is too much to read. You can use the option <code>-Z</code> to limit output to
failed syscalls only. <code>-z</code> on the other hand limits to successful syscalls.
</p>
<p>
Shorter Hacks 3: grep ps2021/Shorter Hacks 3 grep ps2022-01-022022-01-02T14:16:26+01:00
<div class="editable">
<div id="Shorter Hacks 3: grep ps"><h1 id="Shorter Hacks 3: grep ps" class="header">Shorter Hacks 3: grep ps</h1></div>
<p>
I was always annoyed that if you grep the output of <code>ps</code> you will always also
get the process of grep itself. Then I found this trick: <code>ps aux | grep [h]top</code>
</p>
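<p>
Why does this work? The bracket expression <code>[h]</code> matches just the letter h, so the pattern still finds <code>htop</code>, but grep's own command line contains the literal text <code>[h]top</code>, which the pattern does not match. A small sketch with Python's <code>re</code> module, which treats this simple bracket expression the same way (the two command lines are simplified examples):
</p>

```python
import re

pattern = re.compile(r"[h]top")  # same bracket expression as in the grep call

# Two simplified command lines as they might appear in the ps output.
htop_line = "htop"
grep_line = "grep [h]top"

found_htop = pattern.search(htop_line) is not None  # the real process matches
found_grep = pattern.search(grep_line) is not None  # grep's own line does not
print(found_htop, found_grep)
```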
<p>
Shorter Hacks 2: get local ip with hostname2021/Shorter Hacks 2 get local ip with hostname2021-12-242022-03-20T14:23:58+01:00
<div class="editable">
<div id="Shorter Hacks 2: get ip using hostname"><h1 id="Shorter Hacks 2: get ip using hostname" class="header">Shorter Hacks 2: get ip using hostname</h1></div>
<p>
To get the host's IP address I used to use <code>ip addr</code> – but the output is noisy
and the address hard to spot. Use <code>hostname -i</code> instead, to get exactly what you
need.
</p>
<p>
Shorter Hacks 1: git dash2021/Shorter Hacks 1 Git dash2021-12-192021-12-19T01:39:16+01:00
<div class="editable">
<div id="Shorter Hacks 1: git dash"><h2 id="Shorter Hacks 1: git dash" class="header">Shorter Hacks 1: git dash</h2></div>
<p>
You probably know <code>cd -</code> to move back into the previous directory. Git offers
the same handy shortcut to refer to the last checked-out branch. So,
you can conveniently switch between two branches by <code>git checkout -</code>, or merge the
previous branch by <code>git merge -</code>, etc.
</p>
<p>
Arch Packages You Might Want to Install2021/Arch Packages2021-11-022021-12-08T03:00:42+01:00
<div class="editable">
<div id="Arch Packages You Might Want to Install"><h1 id="Arch Packages You Might Want to Install" class="header">Arch Packages You Might Want to Install</h1></div>
<p>
After installing a fresh Arch, the system is pretty bare and you might spend a long
time thinking about which packages you might need. But even after installing
all packages you came up with, the installation is likely followed by a long period
where you regularly discover missing tools.
To help speed things up, here is an opinionated list of Arch packages that
most people might want to install.
It's not meant as a list to blindly copy-paste, but rather as a checklist. For
some packages that won't be relevant for everyone, I added some comments.
</p>
<pre bash>
# Hardware related and drivers
alsa-firmware
alsa-utils
bluez-utils
pulseaudio
pulseaudio-bluetooth
linux-firmware
v4l-utils
wireless_tools
wpa_supplicant
networkmanager
intel-ucode # If you use an Intel processor
sof-firmware # Firmware for many sound cards
linux-zen # Alternative kernel
# Admin tools
awk
bash-completion
base-devel
curl
dnsutils
git
gnupg
inetutils
iproute2
iptables
iw
iwd
jq
less
lsof
man
net-tools
nmap
openssh
strace
sudo
unzip
usbutils
whois
# More Advanced Debugging Tools
bcc
bcc-tools
bpf
bpftrace
tcpdump
wireshark
# Basic terminal applications
bc
neomutt # enhanced version of the mutt mail client
neovim # enhanced successor of vim
python-neovim
screen
tmux
wget
# Convenience terminal tools
bat
dust
fd
lynx
ncdu
ripgrep
zoxide
wl-clipboard # Command-line copy/paste utilities for Wayland
xclip # Command-line copy/paste utilities for X11
xsel # Command-line copy/paste utilities for X11
# Replace by your favorite terminal emulator
# If you haven't yet, give kitty a try.
kitty
konsole
# X and/or wayland + a display environment + display manager
# Replace by your favorite choice
wayland
xorg-xwayland
xorg
sddm
plasma-wayland-session
# Common Desktop Applications
chromium
firefox
firefox-i18n-de # Replace by your language
thunderbird
thunderbird-i18n-de # Replace by your language
keepassxc
vlc
meld
mpv
youtube-dl
slack
hunspell-de # Replace by your language
languagetool # More advanced spell and grammar checking
# If you use a Yubikey
libfido2
pcsclite
hopenpgp-tools
yubico-pam
yubikey-manager
yubikey-manager-qt
yubikey-personalization
</pre>
<p>
Audio volume in Anki2021/Audio volume in Anki2021-01-032021-01-03T22:05:48+01:00
<div class="editable">
<div id="Audio volume in Anki"><h1 id="Audio volume in Anki" class="header">Audio volume in Anki</h1></div>
<p>
TL;DR: Install mpv and put <code>volume=80</code> into <code>~/.local/share/Anki2/mpv.conf</code> to
set the audio volume of Anki to 80%.
</p>
<p>
Audio volume handling is pretty high on my list of "annoying things I can't
believe are still an issue". Whenever I listen to music or watch a video there's
a high chance I have to turn up the volume to hear anything. Not much
later some other application plays audio with a much higher volume resulting in
me worrying whether I just woke up the neighbors. Sadly a lot of applications
don't even have a volume setting…
</p>
<p>
One of those applications is <a href="https://apps.ankiweb.net/">Anki</a>. Since I use it a lot I decided to fix this
and a few other things that bug me by writing plugins for them. Since I had
searched the internet several times for existing ways to set the volume, I
was surprised when, while writing a plugin for it, I found that there is an easy
– but undocumented – way to set the playback volume for Anki. At least if you
use Linux and have <a href="https://mpv.io/">mpv</a> installed.
Mpv is one of a handful of multimedia players that Anki can use for playback.
Anki will run an mpv instance as a daemon in the background and control it via
IPC. Mpv is started with the option <code>--config-dir=&lt;ankidir&gt;</code>; this way mpv will
ignore your default configuration file and search in the Anki directory instead.
So you can create an mpv configuration file in this directory and in it, among other
things, set the volume or enable audio filters. I set the volume to a level
compatible with my other applications. If that's not enough – for example
because the audio files in your Anki decks have different volumes – you could
enable a dynamic range compression filter.
</p>
<p>
I still consider writing a simple Anki plugin to set the volume since I suspect
more people could use that. For now, I am satisfied with the existing solution
and can next try to find a solution to set the volume for websites like
Duolingo that also miss a volume setting…
</p>
<p>
PS: I also considered writing a PulseAudio module to fix this problem in
general, but the documentation is sparse and it seemed like a bigger project.
</p>
<p>
A better button for my Tomu2020/A better button for my Tomu2020-10-172020-10-17T23:22:45+02:00
<div class="editable">
<div id="A better button for my Tomu"><h1 id="A better button for my Tomu" class="header">A better button for my Tomu</h1></div>
<p>
A while ago I ordered a <a href="https://tomu.im/">Tomu</a>. More specifically an "original" Tomu. And not only one
but 60… but not all of them for myself. ;)
Until recently I used my Tomus only sparingly. The major issue that prevented me
so far was that the "button" that has to be pressed at every authorisation was
pretty hard to touch. The origin of that problem is that the buttons are just
two tiny conductive areas on the board and the case has a fixture right next to
them. Often touching it was no issue at all but sometimes it didn't work for the
first five times – and even though rare, this was too annoying.
</p>
<p>
Recently I decided to fix this. I soldered two simple pins on the button, so
now it was enough to touch these two pins, which can even be done blindly. This
worked great but there was a new annoyance: when touched carelessly, the pins
pricked. So I used the soldering iron one more time to add two tiny balls
of solder on their tips. I'm pretty pleased with the result.
</p>
<p>
Here is what the result looks like (not pretty but functional):
</p>
<p>
<img src="https://schoenitzer.de/blog/2020/tomu_soldered_web.jpg" />
</p>
<p>
Arbitary virtual memory usage with numpy2020/Arbitary virtual memory usage with numpy2020-09-252020-09-25T01:48:46+02:00
<div class="editable">
<div id="Arbitrary virtual memory usage with numpy"><h1 id="Arbitrary virtual memory usage with numpy" class="header">Arbitrary virtual memory usage with numpy</h1></div>
<p>
When recently hacking on htop, there was a bug about large memory sizes being
displayed wrong. But to test for it you need a process that uses 98GiB
of memory. Luckily any type of memory counts – so virtual memory is enough. How to get
a process to have a fixed, arbitrarily large amount of virtual memory? Sure, I could
write a few lines of C to do it, but with numpy present on my system anyway it's
way easier:
</p>
<pre sh>
$ echo 1 | sudo tee /proc/sys/vm/overcommit_memory
$ python
>>> import numpy
>>> x=numpy.empty([1024**3//8, 98])
</pre>
<p>
First we tell the kernel to always accept mallocs, even if the size is way
over the available memory (1 = unlimited overcommitment). Then in Numpy we create an
empty matrix with the right size to use the desired amount of space. Since the
floats are 8 bytes large, we use 1024^3/8 as one dimension and can then set the
number of GiB as the second dimension.
</p>
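<p>
The size calculation can be checked with plain Python arithmetic, without allocating anything. The variable names are only for illustration; the 8 bytes per element assume numpy's default <code>float64</code> dtype.
</p>

```python
GIB = 1024**3        # bytes in one GiB
FLOAT64_BYTES = 8    # assumed element size: numpy.empty defaults to float64

rows = GIB // FLOAT64_BYTES   # number of elements that fill exactly one GiB
cols = 98                     # desired size in GiB

total_bytes = rows * cols * FLOAT64_BYTES
print(total_bytes // GIB)     # size of the matrix in GiB
```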
<p>
The advantage over a statically compiled malloc in a C program is that you can
change the size on the fly for free: just overwrite x with a new empty array of the
new desired size…
</p>
<p>
When you're finished, restore the default setting for overcommitment, in which the
kernel will use some heuristics to determine if it should accept a memory
allocation:
</p>
<pre sh>
$ echo 0 | sudo tee /proc/sys/vm/overcommit_memory
</pre>
<p>
Happy hacking.
</p>
<p>
Javascript: Country Code to Flag2020/Country Code to Flag2020-07-112020-07-11T15:42:26+02:00
<div class="editable">
<div id="Javascript: Country Code to Flag"><h1 id="Javascript: Country Code to Flag" class="header">Javascript: Country Code to Flag</h1></div>
<p>
Here's a fun little Javascript function:
</p>
<pre>
function getflag(langcode) {
    var first = langcode.charCodeAt(0) + 127397;
    var second = langcode.charCodeAt(1) + 127397;
    var flag=`&#${first};&#${second};`;
    return flag;
}
getflag("DJ") // 🇩🇯
getflag("DE") // 🇩🇪
getflag("SE") // 🇸🇪
</pre>
<p>
How does this work? Instead of adding the flags of every country to the Unicode
standard, Unicode defines 26 special characters 🇦 to 🇿 that can be combined
according to ISO 3166 to form a flag. So combining 🇵 with 🇪 will result in the
flag of Peru (PE): 🇵🇪.
While Unicode otherwise often uses a <em>zero width joiner</em> to merge characters
(concatenating the five characters 👩 Woman, <em>Zero Width Joiner</em>, 👩 Woman,
<em>Zero Width Joiner</em> and 👧 Girl results in 👩‍👩‍👧), in this case it's
enough to write the two characters next to each other.
Since the 26 special characters 🇦 to 🇿 are ordered the same way as the
"normal" ASCII letters A to Z, adding a constant offset is enough.
Using the above function will generate the HTML entities for the characters. If you
add those to the HTML document, you will get the country's flag.
</p>
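<p>
For comparison, here is a sketch of the same offset trick in Python that returns the characters directly instead of HTML entities. The name <code>get_flag</code> and the input check are additions for illustration; the offset 127397 is the one used above, which is the code point of 🇦 (U+1F1E6) minus that of "A".
</p>

```python
OFFSET = 0x1F1E6 - ord("A")  # regional indicator A minus ASCII "A" = 127397


def get_flag(country_code: str) -> str:
    """Map a two-letter ISO 3166 code such as "DE" to its flag sequence."""
    code = country_code.upper()
    if len(code) != 2 or not all("A" <= letter <= "Z" for letter in code):
        raise ValueError("expected two ASCII letters")
    return "".join(chr(ord(letter) + OFFSET) for letter in code)


# Print the code points rather than the glyphs, so this works in any terminal.
print([hex(ord(ch)) for ch in get_flag("PE")])
```

<p>
Whether the result is drawn as a flag or as two boxed letters depends on the font and platform displaying it.
</p>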
<p>
So Much Nothing2020/So Much Nothing2020-06-202020-06-20T20:27:08+02:00
<div class="editable">
<div id="So Much Nothing"><h1 id="So Much Nothing" class="header">So Much Nothing</h1></div>
<p>
If you know about sparse files, this is of no relevance for you, but might
still amuse you. If you don't know about sparse files, I recommend changing
that because they are not only fun, but also very useful — for example when
handling filesystem images.
</p>
<pre>
$ dd if=/dev/null of=hugeemptyfile bs=4M count=0 seek=2000G
$ ls -lh hugeemptyfile
-rw-r--r-- 1 michi users 7,9E 11. Mai 01:45 hugeemptyfile
</pre>
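<p>
A sparse file can also be created from Python. The following is a minimal sketch, assuming a filesystem that supports sparse files; the 1 GiB size and the file name are arbitrary example values. It compares the apparent size with the space actually allocated.
</p>

```python
import os
import tempfile

size = 1024**3  # apparent size: 1 GiB (arbitrary example value)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sparsefile")
    with open(path, "wb") as f:
        f.truncate(size)  # extend the file without writing any data

    info = os.stat(path)
    apparent = info.st_size
    # st_blocks counts allocated 512-byte units; it is not available everywhere.
    allocated = getattr(info, "st_blocks", 0) * 512

print(apparent, allocated)
```

<p>
On a filesystem with sparse file support the second number should be far smaller than the first, since no data blocks were written.
</p>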
</div>
<br/>
<a rel="license" href="http://creativecommons.org/licenses/by-sa/4.0/"><img alt="Creative Commons License" style="border-width:0" src="https://licensebuttons.net/l/by-sa/4.0/80x15.png" /></a>
<span xmlns:dct="http://purl.org/dc/terms/" property="dct:title">So Much Nothing</span> by <span xmlns:cc="http://creativecommons.org/ns#" property="cc:attributionName">Michael F. Schönitzer</span> is licensed under a <a rel="license" href="http://creativecommons.org/licenses/by-sa/4.0/">Creative Commons Attribution-ShareAlike 4.0 International License</a>.
Convert images to PDF with ImageMagic2019/Convert images to PDF with ImageMagic2019-06-032019-06-03T21:39:09+02:00
<div class="editable">
<div id="Convert images to PDF with ImageMagic"><h1 id="Convert images to PDF with ImageMagic">Convert images to PDF with ImageMagick</h1></div>
<p>
I often have to join multiple images into one PDF file. My way to do this was
always to convert the images to PDF files using ImageMagick and then concatenate the PDF files
with pdftk.
</p>
<p>
Lately this failed with the following error:
</p>
<pre>
convert: attempt to perform an operation not allowed by the security policy `PDF' @ error/constitute.c/IsCoderAuthorized/408.
</pre>
<p>
The reason is a change in the default policies that forbids converting files to
PDF. The motivation for this is that ImageMagick uses Ghostscript as its PDF
back end and Ghostscript had several security issues.
</p>
<p>
You can list the policy files and the policies they set with <code>convert -list policy</code>.
You could open the policy file and disable the line that states
</p>
<pre>
<policy domain="coder" rights="none" pattern="{PS,PS2,PS3,EPS,PDF,XPS}" />
</pre>
<p>
to be able to write PDF files again. Be aware this would get overwritten on your
next ImageMagick update. Better create a policy file in your home directory. On
Linux this is most likely <code>${HOME}/.config/ImageMagick/policy.xml</code><sup><a href="Convert images to PDF with ImageMagic.html#1">1</a></sup>. In that file you
can enable PDF reading and writing support with a simple policy. The policy file
should look like this:
</p>
<pre>
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE policymap [
<!ELEMENT policymap (policy)+>
<!ATTLIST policymap xmlns CDATA #FIXED ''>
<!ELEMENT policy EMPTY>
<!ATTLIST policy xmlns CDATA #FIXED '' domain NMTOKEN #REQUIRED
name NMTOKEN #IMPLIED pattern CDATA #IMPLIED rights NMTOKEN #IMPLIED
stealth NMTOKEN #IMPLIED value CDATA #IMPLIED>
]>
<policymap>
<policy domain="coder" rights="read | write" pattern="PDF" />
</policymap>
</pre>
<p>
Now you can again create your PDF files the good old way:
</p>
<pre>
$ convert foo.jpg foo.pdf
$ convert bar.png bar.pdf
$ pdftk foo.pdf bar.pdf cat output foobar.pdf
</pre>
<p>
Of course you should think about the security implications for your system.
</p>
<hr />
<ul>
<li>
<span id="Convert images to PDF with ImageMagic-1"></span><strong id="1">1</strong>: If that's not working for you check out <code>convert -debug configure logo: null:</code>
to find in which folders ImageMagick searches for config files on your system.
</ul>
<p>
Asking Gender in Surveys – doing it right2019/Gender in Surveys2019-01-082019-06-03T21:34:14+02:00
<div class="editable">
<div id="Asking Gender in Surveys — doing it right"><h1 id="Asking Gender in Surveys — doing it right" class="header">Asking Gender in Surveys — doing it right</h1></div>
<p>
When taking surveys or registering for events I often get annoyed by the
“gender-question”. It's pretty easy to do it properly but people still come up
with surprisingly many ways of doing it wrong. So here I want to show you
how not to do it and how to do it.
</p>
<p>
Let's start with the worst form that you can probably find:
<form>
<fieldset>
<legend>Gender: *</legend>
<input type="radio" name="gender" value="male" checked> Male<br>
<input type="radio" name="gender" value="female"> Female<br>
</fieldset>
</form>
</p>
<p>
This form has only two options and is mandatory (as indicated by the <code>*</code>).
That's awful and no one in the 21st century should do this. If you have never heard
about non-binaries or intersexuality you should look those terms up now. With a
question like this, you will force people to either stop filling out your
survey or deny their gender identity. Making the question optional is just
slightly better. Having one gender preselected (of course it's always the male
one) is so ridiculously stupid, that you can only wonder why anyone would ever
think that's a good idea.
</p>
<p>
A somewhat better – but still bad – widespread option is the following:
</p>
<p>
<form>
<fieldset>
<legend>Gender: *</legend>
<input type="radio" name="gender" value="male"> Male<br>
<input type="radio" name="gender" value="female"> Female<br>
<input type="radio" name="gender" value="other"> Other / won't state
</fieldset>
</form>
</p>
<p>
This allows people who identify as neither male nor female, as well as people
valuing their privacy, to choose an alternative option. Why may this still
discriminate against people? Intersexuals and (other) people who identify as non-binary
have to fight a lot for gaining attention in our society. A big part of our
society is still not aware of their existence or underestimate the number of
people not fitting into the binary categories of male/female. Even worse there
are people who actively deny their existence or their right to identify
themselves as neither male nor female. This option throws them in one box with
people who fear about their privacy or don't want to specify their gender for
any other reason. This way you “hide” other genders in the statistics and make
it easier for people to forget or deny their existence. Having no separate
option for non-binaries will also send a signal to people that you don't care
about their gender if it's not male nor female. It signals that you will have a
look at the differences between what males and females answered in the other
questions but don't care about the answers from people of other genders, since
you either don't consider them real, or important or widespread enough in the
first place.
For that reason I strongly discourage joining these two completely different
things in one answer.
</p>
<p>
Another mistake that I saw multiple times already is the following:
</p>
<p>
<form>
<fieldset>
<legend>Gender: *</legend>
<input type="radio" name="gender" value="male"> Male<br>
<input type="radio" name="gender" value="female"> Female<br>
<input type="radio" name="gender" value="male"> Transmale<br>
<input type="radio" name="gender" value="female"> Transfemale<br>
<input type="radio" name="gender" value="other"> Other<br>
<input type="radio" name="gender" value="wontstate"> Prefer not to say
</fieldset>
</form>
</p>
<p>
This version is more inclusive for non-binaries and offers more options, but in
its phrasing there's a hidden message that will hurt trans* people:
By listing <code>Transmale</code> as well as <code>Male</code> you are implying that trans males are
not “real” males and analogous for trans females. If you need to know if a person is
trans or not you should add a separate question asking the person whether they
identify as trans<sup><a href="Gender in Surveys.html#1">1</a></sup>.
</p>
<p>
After these three bad examples here is finally an example of how to properly
ask for gender in a survey or registration form:
</p>
<p>
<form>
<fieldset>
<legend>What is your Gender? *</legend>
<input type="radio" name="gender" value="male"> Male<br>
<input type="radio" name="gender" value="female"> Female<br>
<input type="radio" name="gender" value="other"> Other:
<input type="text" name="gender_other" placeholder="(optional)"><br>
<input type="radio" name="gender" value="wontstate"> Prefer not to say
</fieldset>
</form>
</p>
<p>
In this version everyone can state their gender as precisely as they want and no
one is forced to specify more than they want. I also changed the question to
actually be a real question, since I think that's nicer in general. That
might be a matter of taste, though, as is the exact framing of the question and
options.
</p>
<p>
When publishing the results of your survey you should think twice about whether
someone can be deanonymised by their gender or whether someone's gender can be figured
out from the published results. But that is a topic for another blog post…
</p>
<p>
Some people might be tempted to not ask the gender at all in a survey, arguing
“you can't ask wrong if you don't ask at all” — please don't do this!
Without asking the gender you will never have a chance to find out if…
</p>
<ul>
<li>
your survey has a bias towards one gender
<li>
there are significant differences in answers between participants of
different genders that might hint at the existence of discrimination (imagine,
for example, the male attendees in your survey being statistically more
satisfied than other genders)
<li>
there is a conflict of interests between different genders (different genders
preferring different options)
<li>
and many more…
</ul>
<p>
Gender is an important demographic key. If you do a survey, a registration form
or something similar, ask for the gender — but do it the proper way! No one will
be mad at you for not using their favorite terms, but please avoid the
mentioned mistakes. Let me summarize the important points:
</p>
<ul>
<li>
Add (at least one) “third option” for people neither male nor female
<li>
Add a <span id="Asking Gender in Surveys — doing it right-separate"></span><strong id="separate">separate</strong> option for people who do not want to specify their gender
<li>
Add an <span id="Asking Gender in Surveys — doing it right-optional"></span><strong id="optional">optional</strong> free text field for other genders
<li>
Do not select anything by default
<li>
Do not ask for <code>sex</code> but for <code>gender</code><sup><a href="Gender in Surveys.html#2">2</a></sup>
</ul>
<p>
It's not that hard — is it?
</p>
<hr />
<ul>
<li>
<span id="Asking Gender in Surveys — doing it right-1"></span><strong id="1">1</strong>: You could also use the options <code>cis female</code> and <code>trans female</code>, etc. But
then the title of the question would not be correct anymore, since the gender of
a trans male and the gender of a cis male are both male. Furthermore,
survey experts in general strongly recommend splitting up questions instead
of packing two different aspects into one.
<li>
<span id="Asking Gender in Surveys — doing it right-2"></span><strong id="2">2</strong>: Unless you really need to know the biological sex, e.g. if you are a
doctor.
</ul>
<hr />
<p>
How to read a vimscript stacktrace2018/How to read a vimscript stacktrace2018-08-312018-08-31T23:05:20+02:00
<div class="editable">
<div id="Reading vimscript stacktraces"><h2 id="Reading vimscript stacktraces">Reading vimscript stacktraces</h2></div>
<p>
When you get an error in Vimscript, Vim will show you a
stacktrace, as you might know it from other languages. But reading them
is not trivial and I haven't found any documentation of it so far. When you see
one for the first time, you might misinterpret it and look for the error in
the wrong location.
</p>
<p>
Let's look at a stacktrace from vimwiki:
</p>
<pre>
Error detected while processing function vimwiki#base#follow_link[58]..vimwiki#base#open_link[30]..vimwiki#base#edit_file:
line 21:
E325: ATTENTION
Error detected while processing function vimwiki#base#follow_link[58]..vimwiki#base#open_link:
line 30:
E171: Missing :endif
Error detected while processing function vimwiki#base#follow_link:
line 58:
E171: Missing :endif
</pre>
<p>
The most relevant information is in the first three lines:
</p>
<p>
The first line tells us that the error occurred in the function
<code>vimwiki#base#edit_file</code>, which was called by the function
<code>vimwiki#base#open_link</code>, which was called by the function
<code>vimwiki#base#follow_link</code>. From the names of the functions we learn in which
file we will find them: <code>autoload/vimwiki/base.vim</code>.
</p>
<p>
The second line of the stacktrace tells us the line number where the bug
occurred: line 21. But, here's the catch: all line numbers are relative to the
function. So the bug occurs 21 lines below the definition of the function
<code>vimwiki#base#edit_file</code>. The numbers in the square brackets are also relative
line numbers, of where in the functions the next function was called.
</p>
<p>
The third line tells us the error that occurred. In this case the error
is called <code>ATTENTION</code> and has the error code <code>E325</code>. You can look it up with
<code>:help E325</code> or <code>:help ATTENTION</code>.
</p>
<p>
The rest of the lines show how the error propagates through the callers. They are
seldom useful (at least to me).
</p>
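<p>
The reading rules above can be sketched in a few lines of Python. This is only
an illustration, not something Vim ships: the helper names <code>parse_call_chain</code>
and <code>autoload_path</code> are made up for this example, and the sketch assumes the
first line has exactly the form shown in the stacktrace above.
</p>

```python
import re

PREFIX = "Error detected while processing function "
# One frame looks like "vimwiki#base#follow_link[58]"; the innermost frame has no "[n]".
FRAME_RE = re.compile(r"^([^\[\]]+)(?:\[(\d+)\])?$")


def parse_call_chain(first_line):
    """Return [(function_name, relative_call_line_or_None), ...], outermost caller first."""
    if not first_line.startswith(PREFIX):
        raise ValueError("not a Vim function error line")
    chain = first_line[len(PREFIX):].rstrip(":")
    frames = []
    for part in chain.split(".."):
        match = FRAME_RE.match(part)
        if match is None:
            raise ValueError("unexpected frame: %r" % part)
        name, line = match.groups()
        frames.append((name, int(line) if line else None))
    return frames


def autoload_path(function_name):
    """Map an autoload function name such as a#b#func to autoload/a/b.vim."""
    parts = function_name.split("#")
    if len(parts) < 2:
        raise ValueError("not an autoload function name")
    return "autoload/" + "/".join(parts[:-1]) + ".vim"
```

<p>
The last element of the returned list is the function in which the error
occurred. The line number from the second line of the stacktrace is relative to
that function's definition.
</p>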
<p>
Introducing Hewa2018/Introducing Hewa2018-08-012018-08-01T17:48:01+02:00
<div class="editable">
<div id="Introducing Hewa"><h2 id="Introducing Hewa">Introducing Hewa</h2></div>
<p>
While reading, typos and spelling mistakes in texts can be annoying. I myself
make a lot of typos and spelling mistakes, but that doesn't mean that they
annoy me any less when I stumble over one while reading. I personally already
use Vim's built-in spell checker as well as <a href="https://languagetool.org/">LanguageTool</a> to check my texts,
but that doesn't find everything. Asking someone to proofread every text improves
the situation greatly but some mistakes will still remain. And more
importantly: you might not have someone willing to proofread everything you
write. I thought it would be nice to have an easy way to report small
mistakes. Others use the comment section for that, but my blog
has no comments (on purpose) and most people won't write an email to
submit a typo they found while reading.
</p>
<p>
For this reason I wrote <code>hewa</code>. It adds an edit button to every entry of my
blog that allows you to edit the page in the browser and send me your
suggestion. The edit mode opens instantly, without loading any sort of editor.
Simply change the text, click save and I'll receive the patch.
</p>
<p>
Today I'm activating the first version of hewa on my blog. It's still not very
mature and might not run with proprietary browsers like IE. It's written in
JavaScript — without jQuery or any other libraries — and Python with Flask. I
will of course release the source code under a free license. But I still need
to clean up the code before I can do that.
</p>
<p>
This is also an experiment: I'm not sure if people will use it or if
I'll end up receiving more spam and vandalism than actual corrections. Spam and
vandalism are also the reason why I don't apply the changes to the text directly but
rather save the diffs. I have a tool that allows me to look at those patches and
apply or drop them.
</p>
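<p>
To give an idea of what “saving the diff” can look like, here is a minimal
sketch using Python's standard <code>difflib</code> module. It is not hewa's actual
code (that is not released yet); the function name <code>make_patch</code> and the
file name <code>entry.html</code> are assumptions made up for this example.
</p>

```python
import difflib


def make_patch(original, edited, name="entry.html"):
    """Return a unified diff between the published text and a reader's edit."""
    diff = difflib.unified_diff(
        original.splitlines(keepends=True),
        edited.splitlines(keepends=True),
        fromfile="a/" + name,
        tofile="b/" + name,
    )
    # An empty string means the reader changed nothing.
    return "".join(diff)
```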
<p>
Linux Raw Sockets2018/Linux Raw Sockets2018-03-182018-04-27T15:09:54+02:00
<div class="editable">
<div id="Linux Raw Sockets"><h2 id="Linux Raw Sockets">Linux Raw Sockets</h2></div>
<p>
Recently I did a userspace implementation of the
<a href="https://tools.ietf.org/html/rfc7401">Host Identity Protocol (HIPv2, RFC 7401)</a> with the upcoming
<a href="https://tools.ietf.org/html/draft-ietf-hip-dex-06">Diet Exchange (HIP DEX, IETF draft 6)</a>. Doing so, I've learnt a lot about raw
socket programming under Linux and here I want to share a few things with you.
</p>
<p>
So, I assume you have already worked with network sockets before – if not, don't
fear, it's not that hard and there are plenty of nice introductions out there. I
can for example recommend Beej's Guide to Network Programming. For this
article I'll start with a normal UDP/TCP based socket and work my way down the
layers. So we open a traditional socket by:
</p>
<pre>
sockfd = socket(AF_INET, SOCK_DGRAM, 0);
</pre>
<p>
This will open a UDP based datagram socket via IPv4. The first argument of
<code>socket()</code> specifies the <code>domain</code> of your socket; in our case that's the Internet
Protocol. Sometimes you will see <code>AF…</code> here and sometimes <code>PF…</code>; this doesn't
matter, they are the same. While PF stands for protocol family, AF is short for
address family. Historically it was thought that in the future there might be
multiple protocol families sharing the same address family – but this never
happened. So the correct way would be to use <code>PF_INET</code> in the socket call and
<code>AF_INET</code> in your <code>struct sockaddr_in</code>, but most people nowadays use the
address family everywhere. With the second argument <code>type</code> we specify if we
want to use a connection-based protocol like TCP (<code>SOCK_STREAM</code>) or a protocol
without connections like UDP (<code>SOCK_DGRAM</code>). The third argument <code>protocol</code>
specifies which protocol we actually want to use – we could set UDP or TCP here
(<code>IPPROTO_UDP</code>, <code>IPPROTO_TCP</code>) but setting 0 works too: this sets the
protocol to the default protocol for the combination of the domain and type
field – for <code>AF_INET</code> and <code>SOCK_DGRAM</code> the default is UDP and for <code>SOCK_STREAM</code>
it's TCP. You might also see <code>IPPROTO_IP</code> as the protocol, which is simply defined
as 0. But the above variant seems to be the most common one.
</p>
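<p>
If you want to check the default protocol yourself, the following sketch asks
the kernel which protocol it picked when <code>protocol</code> is 0. It uses Python
instead of C to keep it short, and it assumes Linux, since the
<code>SO_PROTOCOL</code> socket option is Linux-specific. The function name
<code>resolved_protocol</code> is made up for this example.
</p>

```python
import socket


def resolved_protocol(sock_type):
    """Open an IPv4 socket with protocol 0 and return the protocol the kernel chose."""
    with socket.socket(socket.AF_INET, sock_type, 0) as sock:
        # SO_PROTOCOL is Linux-specific; Python exposes it since version 3.6.
        return sock.getsockopt(socket.SOL_SOCKET, socket.SO_PROTOCOL)


if __name__ == "__main__":
    print(resolved_protocol(socket.SOCK_DGRAM) == socket.IPPROTO_UDP)
    print(resolved_protocol(socket.SOCK_STREAM) == socket.IPPROTO_TCP)
```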
<p>
But hey, it's 2018 – why the heck should we limit ourselves to IPv4?
Luckily it's easy enough to support IPv6: just replace <code>AF_INET</code> by <code>AF_INET6</code> and
it will work with both IPv4 and IPv6! So don't you dare to ever use <code>AF_INET</code>
anymore without a good excuse.
By the way: if you want IPv6 only you can set the socket option <code>IPV6_V6ONLY</code>.
</p>
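<p>
A minimal sketch of switching between dual-stack and IPv6-only, again in
Python for brevity. The helper name <code>open_udp6</code> is an assumption of this
example, and the socket is left unbound, so you would still have to
<code>bind()</code> it before receiving anything.
</p>

```python
import socket


def open_udp6(v6only):
    """Return an unbound AF_INET6 UDP socket with IPV6_V6ONLY switched on or off."""
    sock = socket.socket(socket.AF_INET6, socket.SOCK_DGRAM)
    # 0: also handle IPv4 (as IPv4-mapped addresses), 1: IPv6 only.
    sock.setsockopt(socket.IPPROTO_IPV6, socket.IPV6_V6ONLY, int(v6only))
    return sock
```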
<p>
But we don't want to talk about ordinary TCP/UDP sockets here! So let's dig down
into the mysterious world of raw sockets.
</p>
<p>
The first thing I want to note is: you'll need superuser rights to create a
raw socket, or more precisely the <code>CAP_NET_RAW</code> <a href="https://github.com/Nudin/Capabilities">capability</a>; otherwise you'll get
the error “Operation not permitted” (EPERM).
</p>
<pre>
sockfd = socket(AF_INET, SOCK_RAW, IPPROTO_UDP);
sockfd = socket(AF_INET6, SOCK_RAW, IPPROTO_UDP);
</pre>
<p>
The first kind of raw socket we look at is what you get by setting <code>type</code> to
<code>SOCK_RAW</code> while still setting <code>protocol</code> to TCP or UDP. You will still only receive
the type of packet specified (here UDP), but this time you will not only
receive the data but also the layer 4 (TCP/UDP) header, and you're also
responsible for setting the layer 4 header yourself.
</p>
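<p>
The permission requirement is easy to see in practice. The following Python
sketch tries to open such a raw socket and reports a missing capability
instead of crashing; the function name <code>open_raw_udp</code> is made up for this
example.
</p>

```python
import socket


def open_raw_udp(family=socket.AF_INET):
    """Try to open a raw socket for UDP; return None if we lack the permission."""
    try:
        return socket.socket(family, socket.SOCK_RAW, socket.IPPROTO_UDP)
    except PermissionError:
        # EPERM: the process is neither root nor does it hold CAP_NET_RAW.
        return None
```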
<p>
Contrary to above, the choice of <code>domain</code> matters a lot here. First of all,
<code>AF_INET6</code> will only receive IPv6 here and not both! Second, what you get when you
read from the socket differs: if you read from the first variant with <code>AF_INET</code>
you will get the IPv4 header, the UDP/TCP header and the data; in the second
variant your read will instead return only the UDP/TCP header and data but
not the IPv6 header!
</p>
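<p>
In code, the difference shows up when you take apart what you read from the
socket. A sketch for the UDP case, with made-up function names: for
<code>AF_INET</code> you first have to skip the IPv4 header, whose length is stored in
the lower four bits of its first byte, while for <code>AF_INET6</code> the buffer starts
directly with the UDP header.
</p>

```python
import struct


def split_ipv4_udp(packet):
    """Split bytes read from socket(AF_INET, SOCK_RAW, IPPROTO_UDP)."""
    ihl = (packet[0] & 0x0F) * 4  # IPv4 header length in bytes
    ip_header, rest = packet[:ihl], packet[ihl:]
    # UDP header: source port, destination port, length, checksum
    udp_fields = struct.unpack("!HHHH", rest[:8])
    return ip_header, udp_fields, rest[8:]


def split_ipv6_udp(packet):
    """Same for AF_INET6: the read starts directly at the UDP header."""
    udp_fields = struct.unpack("!HHHH", packet[:8])
    return udp_fields, packet[8:]
```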
<p>
The third important difference between <code>AF_INET</code> and <code>AF_INET6</code> for raw sockets
is the endianness: unlike IPv4 raw sockets, all data sent via IPv6 raw sockets
must be in the network byte order and all data received via raw sockets will be
in the network byte order.
</p>
<p>
If you want to send something through the socket, your packet has to include
the layer 4 header but not the IP header. (Note: this is unspecified in POSIX,
but I focus on Linux here.) But what if we want to change something in the
IP header? For IPv4 there are two options: you can set the desired
field(s) via calls to <code>setsockopt</code> or, if you want to build the full header on your
own, you can use the socket option <code>IP_HDRINCL</code> to tell the kernel that you will
construct the header and write both header and payload to the socket:
</p>
<pre>
sockfd = socket(AF_INET, SOCK_RAW, IPPROTO_UDP);
int on = 1;
setsockopt(sockfd, IPPROTO_IP, IP_HDRINCL, &on, sizeof(on));
</pre>
<p>
Even if you use this you won't have to deal with the source address and packet ID –
the kernel will fill them in for you if you leave them zero. The IP checksum
and the total length field will be set by the kernel whether you want it or
not.
</p>
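<p>
As a sketch of what “constructing the header yourself” means, here is a Python
version that packs a minimal 20-byte IPv4 header and leaves the fields
mentioned above at zero. The function names are made up for this example,
<code>send_with_own_header</code> needs <code>CAP_NET_RAW</code>, and the whole thing
relies on the Linux behaviour described above.
</p>

```python
import socket
import struct


def build_ipv4_header(dst, protocol, ttl=64, src="0.0.0.0"):
    """Pack a 20-byte IPv4 header (no options) for use with IP_HDRINCL."""
    version_ihl = (4 << 4) | 5  # IPv4, header length 5 * 4 = 20 bytes
    return struct.pack(
        "!BBHHHBBH4s4s",
        version_ihl,
        0,                      # DSCP/ECN
        0,                      # total length: set by the kernel
        0,                      # packet ID: filled in by the kernel when zero
        0,                      # flags and fragment offset
        ttl,
        protocol,
        0,                      # header checksum: set by the kernel
        socket.inet_aton(src),  # source address: filled in when zero
        socket.inet_aton(dst),
    )


def send_with_own_header(dst, udp_datagram):
    """Send a packet whose IPv4 header we built ourselves (needs CAP_NET_RAW).

    udp_datagram must already start with a complete UDP header.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_RAW, socket.IPPROTO_UDP) as sock:
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_HDRINCL, 1)
        header = build_ipv4_header(dst, socket.IPPROTO_UDP)
        return sock.sendto(header + udp_datagram, (dst, 0))
```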
<p>
What's important here: IPv6 doesn't have <code>IP_HDRINCL</code> or a direct equivalent,
as per RFC 3542 section 3. You can, however, also set various parameters via
<code>setsockopt</code>. Alternatively the IPv6 advanced socket API employs another
framework called “ancillary data”. For outgoing packets one can set the
majority of the fields in the header as well as supported extension headers via
ancillary data, and for received packets the majority of the fields and extension
headers can be read with the same framework. A description of ancillary data
is out of the scope of this article, but the basic idea is: you specify which
values you want to set via a call to <code>setsockopt</code>, then you write the values for
the header fields and the actual data into a <code>struct msghdr</code> and send this via
<code>sendmsg()</code>.
</p>
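<p>
To make the idea a bit more concrete, here is a small Python sketch that sets
the hop limit of a single outgoing packet via ancillary data. Python's
<code>sendmsg()</code> builds the <code>struct msghdr</code> for you; the two function names
are made up for this example, and only the hop limit is shown, not the other
fields and extension headers.
</p>

```python
import socket
import struct


def hop_limit_ancdata(hop_limit):
    """Build the ancillary data item that sets the hop limit of one packet."""
    # RFC 3542: level IPPROTO_IPV6, type IPV6_HOPLIMIT, the value is a C int.
    return [(socket.IPPROTO_IPV6, socket.IPV6_HOPLIMIT, struct.pack("@i", hop_limit))]


def send_with_hop_limit(sock, payload, address, hop_limit):
    """Send payload on an AF_INET6 socket with a per-packet hop limit."""
    return sock.sendmsg([payload], hop_limit_ancdata(hop_limit), 0, address)
```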
<p>
If you want to send data with a transport protocol which has no user interface
you can set the <code>protocol</code> field to raw too:
</p>
<pre>
sockfd = socket(AF_INET, SOCK_RAW, IPPROTO_RAW);
</pre>
<p>
This will automatically set <code>IP_HDRINCL</code> and allow you to send your data with
arbitrary layer 4 protocols. The most common use: sending ICMP packets. Receiving
data is, however, not possible with this type of socket!
</p>
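<p>
Since ICMP is the typical use case, here is a Python sketch of how an ICMP echo
request with a valid checksum can be built. The function names are made up for
this example. Keep in mind that on an <code>IPPROTO_RAW</code> socket you still have
to put an IPv4 header in front of this message yourself, because
<code>IP_HDRINCL</code> is implied.
</p>

```python
import struct


def internet_checksum(data):
    """RFC 1071 checksum: one's complement of the one's complement sum."""
    if len(data) % 2:
        data += b"\x00"  # pad to a whole number of 16-bit words
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold the carry back in
    return ~total & 0xFFFF


def icmp_echo_request(identifier, sequence, payload=b""):
    """Build an ICMP echo request (type 8, code 0) including its checksum."""
    header = struct.pack("!BBHHH", 8, 0, 0, identifier, sequence)
    checksum = internet_checksum(header + payload)
    return struct.pack("!BBHHH", 8, 0, checksum, identifier, sequence) + payload
```

<p>
A receiver verifies the message by running the same checksum over the whole
message, which yields 0 for an intact packet.
</p>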
<p>
So far we got full control over layer 4 and partial control over layer 3. It's
time to step down one further level into the dungeon.
</p>
<pre>
sockfd = socket(AF_PACKET, SOCK_DGRAM, htons(ETHERTYPE_IPV6));
</pre>
<p>
This is called a packet socket. It allows you to receive and send raw
packets at the device driver level (layer 2). In the above version we used the
protocol to specify that we only want to receive IPv6 packets. We can drop this
requirement to receive all packets no matter if it's IPv4, IPv6 or something
else:
</p>
<pre>
sockfd = socket(AF_PACKET, SOCK_DGRAM, htons(ETH_P_ALL));
</pre>
<p>
By default, a packet socket will receive all packets matching the protocol.
You can use bind() to bind the packet socket to an interface.
</p>
<p>
Setting the type field to <code>SOCK_DGRAM</code> results in cooked mode: when reading
from the socket you will read the packet without the MAC header, but you can get the
MAC addresses comfortably by using <code>recvfrom()</code>, and likewise you can use
<code>sendto()</code> to specify the destination via the <code>sockaddr_ll</code> struct.
Alternatively we can set type to <code>SOCK_RAW</code>:
</p>
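<p>
Before we look at that variant, here is a short Python sketch of the cooked
mode. In Python the <code>sockaddr_ll</code> struct shows up as a tuple. The function
names and the output format are made up for this example,
<code>open_cooked_socket</code> needs <code>CAP_NET_RAW</code>, and <code>ETH_P_ALL</code> is
defined by hand because older Python versions do not export it.
</p>

```python
import socket

ETH_P_ALL = 0x0003  # value from <linux/if_ether.h>


def open_cooked_socket(interface=None):
    """Open a cooked packet socket, optionally bound to a single interface."""
    sock = socket.socket(socket.AF_PACKET, socket.SOCK_DGRAM, socket.htons(ETH_P_ALL))
    if interface is not None:
        # Python converts the protocol in the address tuple to network byte order.
        sock.bind((interface, ETH_P_ALL))
    return sock


def describe_sender(address):
    """Format the sockaddr_ll tuple that recvfrom() returns for AF_PACKET."""
    ifname, proto, pkttype, hatype, hwaddr = address
    mac = ":".join("%02x" % byte for byte in hwaddr)
    return "%s proto=0x%04x from %s" % (ifname, proto, mac)
```

<p>
With sufficient rights you would use it roughly like
<code>data, address = sock.recvfrom(65535)</code> and then pass <code>address</code> to
<code>describe_sender()</code>.
</p>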
<pre>
sockfd = socket(AF_PACKET, SOCK_RAW, htons(ETH_P_ALL));
</pre>
<p>
This is the lowest we can get: this way Ethernet frames are passed from the
device driver to your application without any changes, including the full layer 2
header. Likewise, when writing to the socket the user-supplied buffer has to
contain all the headers of layers 2 to 4.
</p>
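<p>
The layer 2 header you get (and have to supply) in this mode is the plain
14-byte Ethernet header. A sketch of how to take it apart and put it together
in Python, with made-up function names and without support for VLAN tags:
</p>

```python
import struct


def parse_ethernet_header(frame):
    """Split a frame read from an AF_PACKET/SOCK_RAW socket."""
    # 6 bytes destination MAC, 6 bytes source MAC, 2 bytes EtherType
    dst, src, ethertype = struct.unpack("!6s6sH", frame[:14])
    return dst, src, ethertype, frame[14:]


def build_ethernet_frame(dst, src, ethertype, payload):
    """Prepend an Ethernet header; payload must contain the layer 3 and 4 headers."""
    return struct.pack("!6s6sH", dst, src, ethertype) + payload
```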
<p>
This is the deepest we can go in userspace – at this point we have full control
of the complete ethernet frame. I hope you enjoyed our journey into the rabbit
hole.
</p>
<hr />
<p>
Sources and further readings:
</p>
<ul>
<li>
<a href="http://beej.us/guide/bgnet/">Beej's Guide to Network Programming</a>
<li>
socket(7)
<li>
raw(7)
<li>
packet(7)
<li>
sendto(2), recvfrom(2)
<li>
<em>UNIX Network Programming, Volume 1</em> by W. Richard Stevens
<li>
<em>IPv6 Core Protocols Implementation</em> by Qing Li, Tatuya Jinmei and Keiichi Shima
<li>
<em>IPv6 Socket API Extensions: Programmer's Guide</em> by Qing Li, Tatuya Jinmei and Keiichi Shima
<li>
<a href="https://elixir.bootlin.com/linux/latest/source/kernel">Linux Kernel source code</a>
</ul>
<p>
Awesome way to debug python2018/Awesome way to debug python2018-02-082019-01-10T00:42:41+01:00
<div class="editable">
<div id="Awesome way to debug python"><h2 id="Awesome way to debug python">Awesome way to debug python</h2></div>
<p>
This is so awesome. Probably the most foolproof method to debug Python after
printf-debugging:
</p>
<pre>
import code
code.interact(local=dict(globals(), **locals()))
</pre>
<p>
You can add this at any position in your code you like. Whenever the second
line is executed, the Python read-eval-print loop (REPL) is started and you
can interact with your program live: you can check the values of your variables
and run arbitrary code. After you exit the shell, your code continues to run
normally.
</p>
<p>
But it gets even better. You can also use this if you don't know where in the
code the problem is occurring; you can even use it preventively, before you know
about any bugs (but make sure to remove it before shipping to customers). Add this to
the top of your code:
</p>
<pre>
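import signal
import code

def debug_handler(signum, frame):
    # Open a REPL with access to this module's global variables.
    code.interact(local=globals())

# Run debug_handler whenever the process receives SIGUSR1.
signal.signal(signal.SIGUSR1, debug_handler)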
</pre>
<p>
This will register a signal handler for the signal SIGUSR1. Your program will
run completely normally, but whenever you run into a bug and want to debug, simply
send the corresponding signal to the process:
</p>
<pre>
$ kill -SIGUSR1 &lt;pid&gt;
</pre>
<p>
This will open a REPL and you can look at any global variables and run code —
you can even modify the content of variables or rewrite whole functions! So
you could try out a potential bugfix live without restarting the program.
</p>
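<p>
A possible variation, not what I use above: the handler receives the stack
frame that was running when the signal arrived, so you can also hand the
interrupted function's variables to the REPL. The name
<code>install_debug_handler</code> and its <code>interact</code> parameter are made up for
this sketch. Because the namespace is a copy, you can inspect everything, but
assignments in the REPL will not reach the running program.
</p>

```python
import code
import signal


def install_debug_handler(interact=code.interact, signum=signal.SIGUSR1):
    """Register a handler that opens a REPL on the code that was interrupted."""

    def debug_handler(received_signum, frame):
        # `frame` is the stack frame that was executing when the signal arrived.
        namespace = dict(frame.f_globals)
        namespace.update(frame.f_locals)  # locals shadow globals of the same name
        interact(local=namespace)

    # signal.signal() may only be called from the main thread.
    signal.signal(signum, debug_handler)
```

<p>
Call <code>install_debug_handler()</code> once at startup and send the signal as shown
above.
</p>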
<p>
Actually you can also edit variables and functions the first way (without a
signal), but then you have to decide whether you want access to only the
global or only the local scope. Merging the globals and locals into a new
dict creates a copy, so assignments made in the REPL won't change the originals anymore. If
you really need to access and modify both local and global variables you could use a
little trick:
</p>
<pre>
code.interact(local={'gvar': globals(), 'lvar': locals()})
</pre>
<p>
But you then have to access the variables and functions by <code>gvar['varname']</code>,
which is much less easy and elegant… In that case you might want to switch to something
more advanced like pdb anyway.
</p>
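<p>
The effect of the copy can be shown without an interactive session. The sketch
below uses two plain dicts as stand-ins for <code>globals()</code> and <code>locals()</code>
and feeds two lines to <code>code.InteractiveConsole</code>, the class that
<code>code.interact()</code> uses internally. The variable names are made up for this
example.
</p>

```python
import code

module_globals = {"counter": 1, "items": [1, 2]}  # stands in for globals()
function_locals = {"x": 10}                       # stands in for locals()

# This is the kind of merged dict the first snippet passes to code.interact().
merged = dict(module_globals, **function_locals)
console = code.InteractiveConsole(merged)

console.push("counter = 99")     # rebinding a name only changes the merged copy
console.push("items.append(3)")  # mutating a shared object is visible everywhere
```

<p>
Afterwards <code>module_globals["counter"]</code> is unchanged, while the list in
<code>module_globals["items"]</code> has grown: the copy is shallow, so only
assignments are lost.
</p>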
<p>
There are two additional tricks. First: you can use the parameters <code>banner=</code> and
<code>exitmsg=</code> to override Python's default message when starting the REPL and to
show something when exiting the REPL. Second: if you have IPython installed, it has
the same functionality available, so use the following instead to get a
colorful, more interactive shell:
</p>
<pre>
import IPython
IPython.embed()
</pre>
<p>
The only downside of IPython: I haven't found a way to change variables.
</p>
<p>
Happy Hacking!
</p>
<p>
Up2date softwareversions for Wikidata2018/Up2date softwareversions for Wikidata2018-01-052018-07-29T01:49:04+02:00
<div class="editable">
<div id="Up2date software versions for Wikidata"><h2 id="Up2date software versions for Wikidata">Up2date software versions for Wikidata</h2></div>
<p>
I'm a supporter of <a href="https://wikidata.org/">Wikidata</a> and free software. So
naturally I care about Wikidata's items about free software. There are at
least <a href="http://tinyurl.com/y92895lh">17,000</a> of them and their quality varies (as always on wikis)
a lot. Many of them exist because of corresponding
Wikipedia articles, but others were created by imports from, for example,
Gentoo's Portage.
</p>
<p>
One aspect where Wikidata could really shine is version numbers of software.
If a new version of Firefox is released, traditionally the version number has
to be updated in all 120 language versions of Wikipedia which have an article
about Firefox. With Wikidata this is not necessary anymore – update the version
number once on Wikidata and all Wikipedias can show the newest version number
instantly. Sadly this is still not reality. A lot of Wikipedia communities are
still skeptical about Wikidata and so version numbers are still often edited on
local Wikipedias instead of being fetched from Wikidata.
</p>
<p>
One key aspect to improve this situation is to improve the data quality of
Wikidata. Up till now too many of Wikidata's items about free software have
outdated version numbers, which is not surprising: if they are not used
in Wikipedia they don't get updated by Wikipedians.
</p>
<p>
There are a few promising ways to improve this. <a href="https://www.wikidata.org/wiki/User:Github-wiki-bot">Github-wiki-bot</a> by Konstin is
one – it imports version numbers from GitHub. But that only works for some
projects. For a short while now I've been working on another way to improve
this: checking version numbers against those in the repository database of Arch
Linux. I used Arch since it contains very fresh versions of software – so most
of the time the versions in the Arch repos are the newest versions available.
</p>
<p>
To check the versions, I need the Arch Package identifier (<a href="https://www.wikidata.org/wiki/Property:P3454">P3454</a>) to be present
in Wikidata – so I first wrote a script to help me add those. For every
piece of software that runs on Linux, I check whether the Arch repository contains a package with
the same name <span id="Up2date software versions for Wikidata-and"></span><strong id="and">and</strong> website. With that I could add the Arch Package identifier
to roughly 600 packages.
</p>
<p>
Then I wrote a second script that checks the version numbers of those items
against those in the Arch repositories. If the version available for Arch is
newer than the newest version we have in Wikidata, I print it to a website,
sorted by the size of the difference in the version numbers. You can find this
<a href="https://tools.wmflabs.org/wdvaliditycheck/softwareversions.html">list here</a> – it's updated a few times per day.
</p>
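<p>
The comparison itself can be sketched like this. This is not the code of my
script, only a simplified illustration with made-up function names: it looks
only at the numeric components of a version string and ranks items by the first
component in which the Arch version is newer, so outdated major versions come
first.
</p>

```python
import re


def version_tuple(version):
    """Turn '57.0.4' into (57, 0, 4); everything that is not a digit separates parts."""
    return tuple(int(part) for part in re.findall(r"\d+", version))


def first_outdated_component(known, available):
    """Return the 0-based index of the first component where `available` is newer.

    Returns None if `known` is equal to or newer than `available`.
    """
    a, b = version_tuple(known), version_tuple(available)
    length = max(len(a), len(b))
    a += (0,) * (length - len(a))  # pad so that 1.2 and 1.2.0 compare as equal
    b += (0,) * (length - len(b))
    for index, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return index if x < y else None
    return None


def most_outdated_first(items):
    """Sort (name, wikidata_version, arch_version) triples; drop up-to-date ones."""
    ranked = []
    for name, known, available in items:
        index = first_outdated_component(known, available)
        if index is not None:
            ranked.append((index, name))
    return [name for index, name in sorted(ranked)]
```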
<p>
The list contained a few items with terribly outdated versions (like 3 years
old and two major versions behind!). Over the last few days I
updated several hundred items from this list by hand, starting with those where
the major version number was out of date and then all where the minor version
number was out of date. The list now only contains items outdated in the third
or fourth version number – all first and second version numbers are up to date.
And I hope that I can keep the version numbers at least this much up to date.
</p>
<p>
This is not finished, of course. Of the 17,000 items about free software
fewer than 700 have a reference to the corresponding Arch package! The Arch
repos contain 10,000 packages. So even without knowing exactly how large the
overlap between Wikidata and the Arch repos is – it's for sure much bigger than
what we currently have!
</p>
<p>
Your help is needed!
</p>
<p>
Vim revelation of the day: wildmode2018/Vim revelation of the day: wildmode2018-01-032018-01-04T23:33:40+01:00
<div class="editable">
<div id="Vim revelation of the day: wildmode"><h2 id="Vim revelation of the day: wildmode">Vim revelation of the day: wildmode</h2></div>
<p>
How often have I been annoyed by Vim's command and file autocompletion! I was
sure I had looked into this in the past – was that just imagination or was I
searching for the wrong terms? Now I finally stumbled over the right options:
</p>
<pre>
set wildmenu
set wildmode=longest:full
set wildignore+=*.a,*.o,*.hi
set wildignore+=*.pdf,*.gz,*.aux,*.out,*.nav,*.snm,*.vrb
</pre>
<p>
Wildmenu shows a nice list of the available completions. With wildmode you
can configure what Vim should do if multiple commands match. And most importantly,
wildignore can make it ignore files that you won't ever want to open (for
example binary files) by file extension.
</p>
<p>