Planet Linux Australia

Planet Linux Australia - http://planet.linux.org.au

Colin Charles: Speaking in June 2016

Tue, 2016-05-31 19:02

I have a few upcoming speaking engagements in June 2016:

  • Nerdear.la – June 9-10 2016 – Buenos Aires, Argentina – never been to this event but MariaDB Corporation are sponsors and I’m quite excited to be back in Buenos Aires. I’m going to talk about the MySQL ecosystem in 2016.
  • SouthEast LinuxFest – June 10-12 2016 – Charlotte, NC, USA – I have a few talks here, a bit bummed that I’m going to be missing the speaker dinner, but I expect this to be another great year. Learn about MariaDB Server/MySQL Security Essentials, the MySQL ecosystem in 2016, and about distributions from the view of a package.
  • NYC MySQL Meetup – June 27 2016 – New York, USA – I’m going to give a talk on lessons you can learn from other people’s database failures. I did this at rootconf.in, and it was well received so I’m quite excited to give this again.

So if you’re in Buenos Aires, Charlotte or New York, I’m looking forward to seeing you to talk all things databases and open source.

Binh Nguyen: Is Western Leadership Required? More Social Systems, and More

Tue, 2016-05-31 06:42
Western Leadership stuff:From time to time you hear things about Western leadership being 'required'. Some of it sounds borderline authoritarian/dictatorial at times. I wanted to take a look further at this:- examples of such quotes include "Instead, America should write the rules. America should call the shots. Other countries should play by the rules that America and our partners set, and not

Maxim Zakharov: plsh2

Tue, 2016-05-31 01:04

PL/sh is a nice extension to PostgreSQL that allows you to write stored procedures in an interpreted language, e.g. bash, python, perl, php, etc.

I found it useful, though it has a major drawback: the amount of data you can pass via the arguments of such procedures may hit command line limitations, i.e. no more than 254 spaces and no more than 2MB (or even less).

So I have made a change so that the value of the first argument is passed via stdin to the script implementing the stored procedure, while the rest of the arguments are passed as $1, $2, $3, etc. This change makes it possible to overcome the above mentioned limitations when a big amount of data is passed via one parameter.

Here is a tiny example of the new functionality which I have added to the test suite:

CREATE FUNCTION perl_concat2(text, text) RETURNS text LANGUAGE plsh2 AS '
#!/usr/bin/perl
print while (<STDIN>);
print $ARGV[0];
';

SELECT perl_concat2('pe', 'rl');
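If everything is set up correctly, that SELECT should return the string perl: the script echoes the first argument, which now arrives on stdin, and then prints the second argument, which is available as $ARGV[0].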

You may get the modified PL/sh from my repository on GitHub: github.com/Maxime2/plsh. It has been implemented as a new procedural language, plsh2, so you do not need to change anything in procedures/functions already created using plsh (and you can continue to use it as before).

Chris Smart: How to find out which process is listening on a port

Mon, 2016-05-30 17:03

Say that you notice UDP port 323 is open (perhaps via netstat -lun) and you’ve no idea what that is!

With lsof it’s easy to find out which process is guilty:


[15:27 chris ~]$ sudo lsof -i :323
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
chronyd 1044 chrony 1u IPv4 19197 0t0 UDP localhost:323
chronyd 1044 chrony 2u IPv6 19198 0t0 UDP localhost:323

In this case, it’s chrony, the modern time keeping daemon.

David Rowe: Codec 2 Masking Model Part 5

Mon, 2016-05-30 15:04

In the last post in this series I was getting close to a fully quantised 700 bit/s codec. However as I pushed through I discovered a bug in the post-filter. I was accidentally cheating and using some of the encoder information in the decoder. When I corrected the bug the quality dropped significantly. I’ve hit these sorts of bugs before – the simulation code is complex and it’s easy to “declare victory” prematurely.

So I have abandoned the AbyS approach for now. Oh well, that’s “research and disappointment” for you. Plenty of new ideas though….

For the last few months I have been working on another solution that vector quantises a “fixed rate” version of the spectrum. The masking functions are still used to smooth the spectrum before sampling at the fixed rate. Much like we low pass filter time domain samples before sampling, the masking functions reduce the “bandwidth” and hence sample “rate” we need to represent the spectrum. Here is a block diagram of the current “700C” candidate codec:

The bit allocation is pitch (Wo) 6 bits, 1 bit for voicing, 16 bits for the amplitude VQ, 4 bits for energy and 1 bit spare. All updated every 40ms. The new work is in the “Decimate in Frequency” block, expanded here:
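(As a quick sanity check, that adds up to 6 + 1 + 16 + 4 + 1 = 28 bits every 40 ms, i.e. 28/0.04 = 700 bit/s.)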

As the pitch of the speech varies, the number of harmonics used to represent the speech, L, varies. The goal is to take a vector of L amplitude samples, vector quantise it, and send it over a channel. To vector quantise them we need fixed length vectors. So a Discrete Fourier Transform (DFT) is used to resample the L amplitude samples to fixed length vectors of 2k=20 (I have chosen k=10).

BTW a DFT is the generic form of a Fast Fourier Transform (FFT). A FFT is a computationally efficient (fast) way of computing a DFT.
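As a rough illustration of the decimate in frequency idea, here is a Python sketch of how I read the description above. The actual Octave code also deals with the masking smoothing, the cyclic suffix and scaling, and the packing of DFT coefficients below is my own assumption:

import numpy as np

def decimate_in_frequency(Am, k=10):
    # Am: the L harmonic amplitude samples, already smoothed by the masking function
    L = len(Am)
    # DFT of the amplitude envelope, keeping only the k lowest "frequency" coefficients
    coeffs = np.fft.fft(Am)[:k] / L
    # pack real and imaginary parts into a fixed 2k = 20 element vector for the VQ;
    # the DC term (overall frame energy) is removed and quantised separately
    return np.concatenate([coeffs.real, coeffs.imag])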

The steps are similar to sampling a time domain signal. The bandwidth of the signal is limited by using the masking function to smooth the variations in the amplitude envelope. The use of masking functions means the smoothing matches the response of the ear, and no perceptually important information is lost.

I’ve recently been playing with OFDM modems, so I used a “cyclic suffix” to further smooth the DFT coefficients. DFTs like cyclic signals. If you have a DFT of an 8kHz signal, the sample at 3900Hz is “close” to the sample at 0 Hz. If there is a step jump in amplitude you get a lot of high frequency information in the DFT coefficients, which is harder to quantise. So I throw away the last 500Hz of the speech signal (3500-4000 Hz), and replace it with a curve that ensures a smooth match between 3500 Hz and 0 Hz.
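In Python rather than Octave, the shape of the cyclic suffix trick is something like the sketch below. The straight-line ramp is my own simplification; the real code may use a smoother curve and works on the masked amplitude envelope:

import numpy as np

def add_cyclic_suffix(env, fs=8000, cut_hz=3500):
    # env: samples of the amplitude envelope spanning 0 .. fs/2 Hz
    n = len(env)
    cut = int(n * cut_hz / (fs / 2.0))
    # throw away everything above cut_hz and replace it with a ramp back to the
    # 0 Hz value, so the envelope "wraps around" smoothly for the DFT
    suffix = np.linspace(env[cut - 1], env[0], n - cut)
    return np.concatenate([env[:cut], suffix])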

Yeah, I don’t know how I dream this stuff up either …… do I use the Force? Too much red wine or espresso? Experience? A life misspent on computers? Subconscious innovation? Plagiarism?

In the past I’ve tried to resample and VQ the spectrum of sinusoidal codecs a few times, without much success. Jean Marc also suggested something similar a few posts back. Anyhoo, getting somewhere this time around.

Here are some plots that show the algorithm in action for a frame of female speech:

Here are the amplitude samples (red crosses). The blue line has the cyclic suffix, note how it meets the first amplitude sample near 0Hz.

This figure shows the difference in the DFT coefficients with (blue) and without (green) the cyclic suffix:

Here is the cumulative energy of DFT coefficients, note that with the cyclic suffix (blue) low frequency energy dominates:

This figure shows a typical 2k=20 length vector that we vector quantise. Note it has zero mean – we extract the DC coefficient and separately quantise this as the frame energy.

Samples

Sample        1300      700C Candidate
hts1a         Listen    Listen
hts2a         Listen    Listen
forig         Listen    Listen
ve9qrp_10s    Listen    Listen
mmt1          Listen    Listen
vk5qi         Listen    Listen
cq_ref        Listen    Listen

Through a couple of years of on-air operation we have established that the 1300 bit/s codec (as used in FreeDV 1600 with 300 bit/s of FEC) has acceptable speech quality for HF. So the goal of this work is similar quality at 700 bit/s.

For some samples above (e.g. hts1a and mmt1a), 1300 is superior to the current 700C candidate. For others (e.g. hts2a and vk5qi) 700 sounds a little better. So I think I’m in the ball park.

There’s a bit of clipping at the start of cq_ref, and some level variations between the two modes on some samples. The 700C candidate has a few problems with unvoiced sounds, e.g. the intake of breath on ve9qrp_10, and the “ch” sound at the start of chicken in hts2a. Not sure why.

The cq_ref_1300 sample is a bit poor as the LPC technique used for spectral amplitudes falls over when the spectral dynamic range is high. In this sample the LF has much higher energy than the HF, i.e. a strong “Low Pass Filter” effect or spectral slope.

Next step is some refactoring – the Octave code is an untidy mess of 6 months of dead ends and false starts. A mirror of real world R&D I guess. Creating something new is not a tidy process. At least in my head. So many aspects of this algorithm that I could explore but I’d rather get this on the air and see if we really have something here. Would love to have some help with a port from Octave to C. Contact me if you’d like to work in this area.

Michael Still: Oryx and Crake

Fri, 2016-05-27 21:00



ISBN: 9780385721677
LibraryThing
I bought this book ages ago, on the recommendation of a friend (I don't remember who), but I only just got around to reading it. It's a hard book to read in places -- it's not hopeful, or particularly fun, and it's confronting in places -- especially the plot that revolves around child exploitation. There's very little to like about the future society that Atwood posits here, but perhaps that's the point.

Despite not being a happy fun story, the book made me think about things like genetic engineering in a way I didn't before and I think that's what Atwood was seeking to achieve. So I'd have to describe the book as a success.

Tags for this post: book margaret_atwood apocalypse genetic_engineering
Related posts: The Exterminator's Want Ad; Cyteen: The Vindication; East of the Sun, West of the Moon; The White Dragon; Runner; Cyteen: The Betrayal

Francois Marier: Using OpenVPN on iOS and OSX

Fri, 2016-05-27 16:03

I have written instructions on how to connect to your own OpenVPN server using Network Manager as well as Android.

Here is how to do it on iOS and OSX assuming you have followed my instructions for the server setup.

Generate new keys

From the easy-rsa directory you created while generating the server keys, create a new keypair for your phone:

./build-key iphone # "iphone" as Name, no password

and for your laptop:

./build-key osx # "osx" as Name, no password

Using OpenVPN Connect on iOS

The app you need to install from the App Store is OpenVPN Connect.

Once it's installed, connect your phone to your computer and transfer the following files using iTunes:

  • ca.crt
  • iphone.crt
  • iphone.key
  • iphone.ovpn
  • ta.key

You should then be able to select it after launching the app. See the official FAQ if you run into any problems.

iphone.ovpn is a configuration file that you need to supply since the OpenVPN Connect app doesn't have a configuration interface. You can use this script to generate it or write it from scratch using this template.

On Linux, if you are using Network Manager 1.2 or later, you can also create a configuration file with the following command:

nmcli connection export hafnarfjordur > iphone.ovpn

though that didn't quite work in my experience.

Here is the config I successfully used to connect to my server:

client
remote hafnarfjordur.fmarier.org 1194
ca ca.crt
cert iphone.crt
key iphone.key
cipher AES-256-CBC
auth SHA384
comp-lzo yes
proto udp
tls-remote server
remote-cert-tls server
ns-cert-type server
tls-auth ta.key 1

Using Viscosity on Mac OSX

One of the possible OpenVPN clients you can use on OSX is Viscosity.

Here are the settings you'll need to change when setting up a new VPN connection:

  • General
    • Remote server: hafnarfjordur.fmarier.org
  • Authentication
    • Type: SSL/TLS client
    • CA: ca.crt
    • Cert: osx.crt
    • Key: osx.key
    • Tls-Auth: ta.key
    • direction: 1
  • Options
    • peer certificate: require server nsCertType
    • compression: turn LZO on
  • Networking
    • send all traffic on VPN
  • Advanced
    • add the following extra OpenVPN configuration commands:

      cipher AES-256-CBC
      auth SHA384

Matthew Oliver: Adventures in Container Sharding – SQLite performance problem and the pivot point.

Fri, 2016-05-27 15:03

Hey world, it’s been a while; turns out I’m not much of a blogger. But I know how useful it is for me to do write-ups occasionally so I can actually find them later.. having said that, in my last post I mentioned I was an OpenStack developer.. and it’s still true. I spend my time hacking and working on OpenStack Swift, the awesome open source object storage cluster.

One thing I’ve been trying to tackle recently is container sharding in Swift. I won’t go into full details as there is a relatively recent Swift spec, and I’ve also given a high level talk on it at LCA in Geelong.

The tl;dr being, Swift accounts and containers (or the metadata layer of Swift) are SQLite databases that get treated like objects themselves and replicated throughout the cluster, which works amazingly well. Until you add millions and millions of objects to a container. And what I’m talking about here is container level object metadata, not the objects themselves. When this happens, SQLite, being a file, starts to have latency and locking issues, as one would expect. The solution is to shard these container databases throughout the cluster, which is what I’ve been working on.

At the last OpenStack summit in Austin, the awesome people at SwiftStack, whom I work quite closely with in the community, gave me a container database they generated that has 700,000,000 objects in it (metadata again). This SQLite file is about 105G, so not small. Plugging this into the small cluster I have for testing my sharding implementation has been interesting, to say the least.

When sharding a container down, we have a simple idea: split it in half. That is to say, find someplace in the object table to pivot on. We can then keep pivoting, giving us a list of ranges (which can be treated as a binary tree). The problem is finding the pivot point. In all my testing up until now I had what I thought was the perfect and simple way:

SELECT name
FROM object
WHERE deleted=0 ORDER BY name LIMIT 1 OFFSET (
SELECT object_count / 2
FROM policy_stat);

This did amazingly well in all my tests.. but I obviously never got big enough. This simple SQL statement would do plenty well if sharding in Swift was turned on from day dot. But the plan is, once sharding is solved in this POC, to add it to Swift as a beta which can be turned ‘on’ on a container by container basis when you want; after it graduates from beta it remains a switch, and only once we are confident in its ability would it be on permanently. In the latter case containers would never get big enough to worry about.. However, in the earlier stages a user would only turn it on when the container is _very_ slow.

Using the pivot SQL statement on the large container I was given ground to a halt. I’m sure it would have come back eventually, but I got tired of waiting after what seemed like ages.. there has to be a better way.

Turns out the OFFSET clause in SQLite, even when hitting an index, still does a scan to find the offset. This is slow when you get to a very large table size. Under the hood an index is stored as a double-linked list, which I’m sure has optimisations, but anyway I was struggling to think of a good way to find a good enough middle value that didn’t involve some table scanning. You can see from the SQL statement that we know how many objects we have in the container, but the problem is that because Swift is eventually consistent we need to temporarily store objects that have been deleted. So randomly picking an index doesn’t help, and it won’t necessarily be in name order.

So on really large containers OFFSET needs to be thrown out the window. It turns out the sharding implementation can deal with shrinking the number of shards, merging smaller ranges together, not just growing/splitting. This means we don’t actually need to be exact, and we don’t actually need to split on an existing object, just on a name that would be somewhere in the middle; so long as it cuts down the large container then it’s good enough. So what can we do?

Turns out there is an optimisation in SQLite: because an index is a double-linked list ordered by its key, it’s really quick if all we want to do is go to the first or last element. So that’s what I’ve done:

SELECT min(name) as name FROM object WHERE deleted = 0;
SELECT max(name) as name FROM object WHERE deleted = 0;

These two statements are blindingly fast due to the fact that we already have a compound index on name and deleted (for cleaning up). Note however they have to be run as two separate commands; combine the two into one and you lose the optimisation and you’ll have to scan all elements. Having the min and max name is a good start, and even when dealing with already sharded containers, they are just smaller ranges so this still works. The question is: now what?

In a perfect world we have an even distribution of objects between the min and max names, so we just need to find a middle name between the two to pivot on. Turns out even in a not evenly distributed container we will still be shrinking it, even if in the worst case only by a few objects. But these will be cleaned up later (merged into a neighbour range by the implementation). And so long as the container gets smaller, eventually it’ll shrink small enough to be usable.

Next step is finding the middle value, to do this I just wrote some python:

from itertools import izip_longest
import sys

lower = unicode(sys.argv[1])
upper = unicode(sys.argv[2])

def middle_str(str1, str2):
    result = []
    for l, u in izip_longest(map(ord, str1), map(ord, str2), fillvalue=0):
        result.append((l + u) // 2)
    return u''.join(map(unichr, result))

if __name__ == "__main__":
    print(middle_str(lower, upper))

What does it do? Calling middle_str(min, max) will grab the unicode versions of the strings, turn them into their integer values, find the middle and turn them back into a word, after matching the prefix that is. So:

$ python middle_str.py 'aaaaaaaaa' 'zzzzzzzzz'
mmmmmmmmm

$ python middle_str.py 'aaaaaaaaa' 'aazzzzzzz'
aammmmmmm

$ python middle_str.py 'DFasjiojsaoi' 'ZZsdkmfi084f'
OPjkjkjiQLQg

I am now plugging this into my implementation, and let's tackle this large container again.

Binh Nguyen: More Social and Economic Systems, Music Stuff, and More

Fri, 2016-05-27 00:07
- modern economics just gibberish. St Louis Federal Reserve Database. Unemployment data distorted. One hour at soup kitchen (even volunteering) is 'employed'? International rule not Western centric. This rule (and others) developed in last 20 years. US still in sustained slump? Shadow stats. Neo-liberalism assumes that private sector will allocate more effectively? Using margin to control and

Joshua Hesketh: Third party testing with Turbo-Hipster

Mon, 2016-05-23 13:05

Why is this hipster voting on my code?!

Soon you are going to see a new robot barista leaving comments on Nova code reviews. He is obsessed with espresso, that band you haven’t heard of yet, and easing the life of OpenStack operators.

Doing a large OpenStack deployment has always been hard when it came to database migrations. Running a migration requires downtime, and when you have giant datasets that downtime could be hours. To help catch these issues Turbo-Hipster (http://josh.people.rcbops.com/2013/09/building-a-zuul-worker/) will now run your patchset’s migrations against copies of real databases. This will give you valuable feedback on the success of the patch, and how long it might take to migrate.

Depending on the results, Turbo-Hipster will add a review to your patchset that looks something like this:

What should I do if Turbo-Hipster fails?

That depends on why it has failed. Here are some scenarios and steps you can take for different errors:

FAILURE – Did not find the end of a migration after a start

  • If you look at the log you should find that a migration began but never finished. Hopefully there’ll be a traceback for you to follow through to get some hints about why it failed.

WARNING – Migration %s took too long

  • In this case your migration took a long time to run against one of our test datasets. You should reconsider what operations your migration is performing and see if there are any optimisations you can make, or if each step is really necessary. If there is no way to speed up your migration you can email us at rcbau@rcbops.com for an exception.

FAILURE – Final schema version does not match expectation

  • Somewhere along the line the migrations stopped and did not reach the expected version. The datasets start at previous releases and have to upgrade all the way through. If you see this, inspect the log for tracebacks or other hints about the failure.

FAILURE – Could not setup seed database. FAILURE – Could not find seed database.

  • These two are internal errors. If you see either of these, contact us at rcbau@rcbops.com to let us know so we can fix and rerun the tests for you.

FAILURE – Could not import required module.

  • This error probably shouldn’t happen as Jenkins should catch it in the unit tests before Turbo-Hipster launches. If you see this, please contact us at rcbau@rcbops.com and let us know.

If you receive an error that you think is a false positive, leave a comment on the review with the sole contents of recheck migrations.

If you see any false positives or have any questions or problems please contact us on rcbau@rcbops.com

Joshua Hesketh: git.openstack.org adventures

Mon, 2016-05-23 13:05

Over the past few months I started to notice occasional issues when cloning repositories (particularly nova) from git.openstack.org.

It would fail with something like

git clone -vvv git://git.openstack.org/openstack/nova .
fatal: The remote end hung up unexpectedly
fatal: early EOF
fatal: index-pack failed

The problem would occur sporadically during our 3rd party CI runs causing them to fail. Initially these went somewhat ignored as rechecks on the jobs would succeed and the world would be shiny again. However, as they became more prominent the issue needed to be addressed.

When a patch merges in gerrit it is replicated out to 5 different cgit backends (git0[1-5].openstack.org). These are then balanced by two HAProxy frontends which are on a simple DNS round-robin.

            +-------------------+
            | git.openstack.org |
            |   (DNS Lookup)    |
            +---+-----------+---+
                | A records |
       +--------+           +---------+
       |                              |
+------v-----------------+  +---------v--------------+
| git-fe01.openstack.org |  | git-fe02.openstack.org |
|   (HAProxy frontend)   |  |   (HAProxy frontend)   |
+-----------+------------+  +------------+-----------+
            |                            |
            +-----------+   +------------+
                        |   |
         +--------------v---v---------------+
         |        (source algorithm)        |
         |    git01.openstack.org           |
         |    git02.openstack.org           |
         |    git03.openstack.org           |
         |    git04.openstack.org           |
         |    git05.openstack.org           |
         |        (HAProxy backends)        |
         +----------------------------------+

Reproducing the problem was difficult. At first I was unable to reproduce locally, or even on an isolated turbo-hipster run. Since the problem appeared to be specific to our 3rd party tests (little evidence of it in 1st party runs) I started by adding extra debugging output to git.

We were originally cloning repositories via the git:// protocol. The debugging information was unfortunately limited and provided no useful diagnosis. Switching to https allowed for more CURL output (when using GIT_CURL_VERBOSE=1 and GIT_TRACE=1) but this in itself just created noise. It actually took me a few days to remember that the servers are running arbitrary code anyway (a side effect of testing) and therefore cloning from the potentially insecure http protocol didn’t introduce any further risk.

Over http we got a little more information, but still nothing that was conclusive at this point:

git clone -vvv http://git.openstack.org/openstack/nova .
error: RPC failed; result=18, HTTP code = 200
fatal: The remote end hung up unexpectedly
fatal: protocol error: bad pack header

After a bit it became more apparent that the problems would occur mostly during high (patch) traffic times. That is, when a lot of tests need to be queued. This led me to think that either the network turbo-hipster was on was flaky when doing multiple git clones in parallel, or the git servers were flaky. The lack of similar upstream failures led me to initially think it was the former. In order to reproduce I decided to use Ansible to do multiple clones of repositories and see if that would uncover the problem. If needed I would have then extended this to orchestrating other parts of turbo-hipster in case the problem was a symptom of something else.

Firstly I needed to clone from a bunch of different servers at once to simulate the network failures more closely (rather than doing multiple clones on the one machine or from the one IP in containers, for example). To simplify this I decided to learn some Ansible to launch a bunch of nodes on Rackspace (instead of doing it by hand).

Using the pyrax module I put together a crude playbook to launch a bunch of servers. There is likely much neater and better ways of doing this, but it suited my needs. The playbook takes care of placing appropriate sshkeys so I could continue to use them later.

---
- name: Create VMs
  hosts: localhost
  vars:
    ssh_known_hosts_command: "ssh-keyscan -H -T 10"
    ssh_known_hosts_file: "/root/.ssh/known_hosts"
  tasks:
    - name: Provision a set of instances
      local_action:
        module: rax
        name: "josh-testing-ansible"
        flavor: "4"
        image: "Ubuntu 12.04 LTS (Precise Pangolin) (PVHVM)"
        region: "DFW"
        count: "15"
        group: "raxhosts"
        wait: yes
      register: raxcreate
    - name: Add the instances we created (by public IP) to the group 'raxhosts'
      local_action:
        module: add_host
        hostname: "{{ item.name }}"
        ansible_ssh_host: "{{ item.rax_accessipv4 }}"
        ansible_ssh_pass: "{{ item.rax_adminpass }}"
        groupname: raxhosts
      with_items: raxcreate.success
      when: raxcreate.action == 'create'
    - name: Sleep to give time for the instances to start ssh  # there is almost certainly a better way of doing this
      pause: seconds=30
    - name: Scan the host key
      shell: "{{ ssh_known_hosts_command }} {{ item.rax_accessipv4 }} >> {{ ssh_known_hosts_file }}"
      with_items: raxcreate.success
      when: raxcreate.action == 'create'

- name: Set up sshkeys
  hosts: raxhosts
  tasks:
    - name: Push root's pubkey
      authorized_key: user=root key="{{ lookup('file', '/root/.ssh/id_rsa.pub') }}"

From here I can use Ansible to work on those servers using the rax inventory. This allows me to address any nodes within my tenant and then log into them with the seeded sshkey.

The next step of course was to run tests. Firstly I just wanted to reproduce the issue, so in order to do that it would crudely set up an environment where it can simply clone nova multiple times.

---
- name: Prepare servers for git testing
  hosts: josh-testing-ansible*
  serial: "100%"
  tasks:
    - name: Install git
      apt: name=git state=present update_cache=yes
    - name: remove nova if it is already cloned
      shell: 'rm -rf nova'

- name: Clone nova and monitor tcpdump
  hosts: josh-testing-ansible*
  serial: "100%"
  tasks:
    - name: Clone nova
      shell: "git clone http://git.openstack.org/openstack/nova"

By default Ansible runs with 5 forked processes, meaning that Ansible would work on 5 servers at a time. We want to exercise git heavily (in the same way turbo-hipster does) so we use the --forks param to run the clone on all the servers at once. The plan was to keep launching servers until the error reared its head from the load.

To my surprise this happened with very few nodes (less than 15, but I left that as my minimum testing). To confirm I also ran the tests after launching further nodes to see it fail at 50 and 100 concurrent clones. It turned out that the more I cloned the higher the failure rate percentage was.

Now that I had the problem reproducing, it was time to do some debugging. I modified the playbook to capture tcpdump information during the clone. Initially git was cloning over IPv6 so I turned that off on the nodes to force IPv4 (just in case it was a v6 issue, but the problem did present itself on both networks). I also locked git.openstack.org to one IP rather than randomly hitting both front ends.

---
- name: Prepare servers for git testing
  hosts: josh-testing-ansible*
  serial: "100%"
  tasks:
    - name: Install git
      apt: name=git state=present update_cache=yes
    - name: remove nova if it is already cloned
      shell: 'rm -rf nova'

- name: Clone nova and monitor tcpdump
  hosts: josh-testing-ansible*
  serial: "100%"
  vars:
    cap_file: tcpdump_{{ ansible_hostname }}_{{ ansible_date_time['epoch'] }}.cap
  tasks:
    - name: Disable ipv6 1/3
      sysctl: name="net.ipv6.conf.all.disable_ipv6" value=1 sysctl_set=yes
    - name: Disable ipv6 2/3
      sysctl: name="net.ipv6.conf.default.disable_ipv6" value=1 sysctl_set=yes
    - name: Disable ipv6 3/3
      sysctl: name="net.ipv6.conf.lo.disable_ipv6" value=1 sysctl_set=yes
    - name: Restart networking
      service: name=networking state=restarted
    - name: Lock git.o.o to one host
      lineinfile: dest=/etc/hosts line='23.253.252.15 git.openstack.org' state=present
    - name: start tcpdump
      command: "/usr/sbin/tcpdump -i eth0 -nnvvS -w /tmp/{{ cap_file }}"
      async: 6000000
      poll: 0
    - name: Clone nova
      shell: "git clone http://git.openstack.org/openstack/nova"
      #shell: "git clone http://github.com/openstack/nova"
      ignore_errors: yes
    - name: kill tcpdump
      command: "/usr/bin/pkill tcpdump"
    - name: compress capture file
      command: "gzip {{ cap_file }} chdir=/tmp"
    - name: grab captured file
      fetch: src=/tmp/{{ cap_file }}.gz dest=/var/www/ flat=yes

This gave us a bunch of compressed capture files that I was then able to seek the help of my colleagues to debug (a particular thanks to Angus Lees). The results from an early run can be seen here: http://119.9.51.216/old/run1/

Gus determined that the problem was due to a RST packet coming from the source at roughly 60 seconds. This indicated it was likely we were hitting a timeout at the server or a firewall during the git-upload-pack of the clone.

The solution turned out to be rather straight forward. The git-upload-pack had simply grown too large and would time out depending on the load on the servers. There were timeouts in Apache as well as in the HAProxy config for both frontend and backend responsiveness. The relevant patches can be found at https://review.openstack.org/#/c/192490/ and https://review.openstack.org/#/c/192649/

While upping the timeout avoids the problem, certain projects are clearly pushing the infrastructure to its limits. As such a few changes were made by the infrastructure team (in particular James Blair) to improve git.openstack.org’s responsiveness.

Firstly, git.openstack.org is now a higher performance (30GB) instance. This is a large step up from the (8GB) instances that were previously used as the frontends. Moving to one frontend additionally meant the HAProxy algorithm could be changed to leastconn to help balance connections better (https://review.openstack.org/#/c/193838/).

+--------------------+
| git.openstack.org  |
| (HAProxy frontend) |
+----------+---------+
           |
+----------v-------------------------+
|       (leastconn algorithm)        |
|   git01.openstack.org              |
|   git02.openstack.org              |
|   git03.openstack.org              |
|   git04.openstack.org              |
|   git05.openstack.org              |
|       (HAProxy backends)           |
+------------------------------------+

All that was left was to see if things had improved. I re-ran the test across 15, 30 and then 45 servers. These were all able to clone nova reliably where they had previously been failing. I then upped it to 100 servers, where the cloning began to fail again.

Post-fix logs for those interested:
http://119.9.51.216/run15/
http://119.9.51.216/run30/
http://119.9.51.216/run45/
http://119.9.51.216/run100/
http://119.9.51.216/run15per100/

At this point, however, I’m basically performing a Distributed Denial of Service attack against git. As such, while the servers aren’t immune to a DDoS the problem appears to be fixed.

Joshua Hesketh: New Blog

Mon, 2016-05-23 13:05

Welcome to my new blog.

You can find my old one here: http://josh.opentechnologysolutions.com/blog/joshua-hesketh

I intend on back-porting those posts into this one in due course. For now though I’m going to start posting about my adventures in openstack!

Joshua Hesketh: Introducing turbo-hipster for testing nova db migrations

Mon, 2016-05-23 13:05

Zuul is the continuous integration utility used by OpenStack to gate patchsets against tests. It takes care of communicating with gerrit (the code review system) and the test workers – usually Jenkins. You can read more about how the systems tie together on the OpenStack Project Infrastructure page.

The nice thing is that zuul doesn’t require you to use Jenkins. Anybody can provide a worker to zuul using the gearman protocol (which is a simple job server). Enter turbo-hipster*.

“Turbo-hipster is a CI worker with pluggable tasks initially designed to test OpenStack’s database migrations against copies of real databases.”

This will hopefully catch scenarios where changes to the database schema may not work due to outliers in real datasets and also help find where a migration may take an unreasonable amount of time against a large database.

In zuul's layout configuration we are able to specify which jobs should be run against which projects in which pipelines. For example, for nova we want to run tests when a patchset is created, but we don't (necessarily) need to run tests against it once it is merged, etc. So in zuul we specify a new gate (aka job) to test nova against real databases.

turbo-hipster then listens for jobs created on that gate using the gearman protocol. Once it receives a patchset from zuul it creates a virtual environment and tests the upgrades. It then compiles and sends back the results.
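For a feel of the mechanics, the bare bones of such a gearman worker might look something like this. This is only a sketch based on my reading of the post: the gear library's Worker API (addServer, registerFunction, getJob) is used from memory, the server name is hypothetical, and the real turbo-hipster plumbing is far more involved. The function name is taken from the example report below.

import gear

# register with the gearman server that zuul talks to (hypothetical hostname)
worker = gear.Worker('turbo-hipster')
worker.addServer('zuul.example.org')
worker.registerFunction('build:gate-real-db-upgrade_nova_mysql')

while True:
    job = worker.getJob()        # blocks until zuul hands us a patchset to test
    # ... create a virtualenv, run the migrations against the real datasets ...
    job.sendWorkComplete(b'{"result": "SUCCESS"}')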

At the moment turbo-hipster is still under heavy development but I hope to have it reporting results back to gerrit patchsets soon as part of zuul's report summary. For the moment I have a separate zuul instance running to test new nova patches and email the results back to me. Here is an example result report:

Build succeeded.
- http://thw01.rcbops.com/logviewer/?q=/results/47/47162/9/check/gate-real-db-upgrade_nova_mysql/c4bc35c/index.html : SUCCESS in 13m 31s

*The name was randomly generated and does not necessarily contain meaning.

Joshua Hesketh: LinuxCon Europe

Mon, 2016-05-23 13:05

After travelling very close to literally the other side of the world[0] I’m in Edinburgh for LinuxCon EU recovering from jetlag and getting ready to attend. I’m very much looking forward to my first LinuxCon, meeting new people and learning lots :-).

If you’re around and would like to catch up drop me a comment here. Otherwise I’ll see you at the conference!

[0] http://goo.gl/maps/JeJO2

Joshua Hesketh: OpenStack infrastructure swift logs and performance

Mon, 2016-05-23 13:05

Turns out I’m not very good at blogging very often. However I thought I would put what I’ve been working on for the last few days here out of interest.

For a while the OpenStack Infrastructure team have wanted to move away from storing logs on disk to something more cloudy – namely, swift. I’ve been working on this on and off for a while and we’re nearly there.

For the last few weeks the openstack-infra/project-config repository has been uploading its CI test logs to swift as well as storing them on disk. This has given us the opportunity to compare the last few weeks of data and see what kind of effects we can expect as we move assets into an object storage.

  • I should add a disclaimer/warning, before you read, that my methods here will likely make statisticians cringe horribly. For the moment though I’m just getting an indication for how things compare.
The set up

Fetching files from an object storage is nothing particularly new or special (CDNs have been doing it for ages). However, for our usage we want to serve logs with os-loganalyze, giving the opportunity to hyperlink to timestamp anchors or filter by log severity.

First though we need to get the logs into swift somehow. This is done by having the job upload its own logs. Rather than using (or writing) a Jenkins publisher we use a bash script to grab the job’s own console log (pulled from the Jenkins web ui) and then upload it to swift using credentials supplied to the job as environment variables (see my zuul-swift contributions).

This does, however, mean part of the logs are missing. For example the fetching and upload processes write to Jenkins’ console log but because it has already been fetched these entries are missing. Therefore this wants to be the very last thing you do in a job. I did see somebody do something similar where they keep the download process running in a fork so that they can fetch the full log but we’ll look at that another time.

When a request comes into logs.openstack.org, a request is handled like so:

  1. apache vhost matches the server
  2. if the request ends in .txt.gz, console.html or console.html.gz rewrite the url to prepend /htmlify/
  3. if the requested filename is a file or folder on disk, serve it up with apache as per normal
  4. otherwise rewrite the requested file to prepend /htmlify/ anyway

os-loganalyze is set up as a WSGIScriptAlias at /htmlify/. This means all files that aren’t on disk are sent to os-loganalyze (or if the file is on disk but matches a file we want to mark up it is also sent to os-loganalyze). os-loganalyze then does the following:

  1. Checks the requested file path is legitimate (or throws a 400 error)
  2. Checks if the file is on disk
  3. Checks if the file is stored in swift
  4. If the file is found markup (such as anchors) are optionally added and the request is served
    1. When serving from swift the file is fetched via the swiftclient by os-loganalyze in chunks and streamed to the user on the fly (see the sketch after this list). Obviously fetching from swift will have larger network consequences.
  5. If no file is found, 404 is returned
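As an aside to step 4 above, fetching an object in chunks with python-swiftclient looks roughly like this. This is only a sketch: the container and object names are hypothetical, process() is a placeholder for the markup/streaming step, and the exact keyword arguments may differ between swiftclient versions.

from swiftclient import client as swift_client

conn = swift_client.Connection(authurl='https://keystone.example.org/v2.0',
                               user='logs', key='secret', auth_version='2')

# resp_chunk_size makes get_object return an iterator rather than the whole body,
# so the file can be marked up and streamed to the user on the fly
headers, body = conn.get_object('log_container', 'console.html',
                                resp_chunk_size=64 * 1024)
for chunk in body:
    process(chunk)  # hypothetical markup/streaming step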

If the file exists both on disk and in swift then step #2 can be skipped by passing ?source=swift as a parameter (thus only attempting to serve from swift). In our case the files exist both on disk and in swift since we want to compare the performance so this feature is necessary.

So now that we have the logs uploaded into swift and stored on disk we can get into some more interesting comparisons.

Testing performance process

My first attempt at this was simply to fetch the files from disk and then from swift and compare the results. A crude little python script did this for me: http://paste.openstack.org/show/122630/

The script fetches a copy of the log from disk and then from swift (both through os-loganalyze and therefore marked-up) and times the results. It does this in two scenarios:

  1. Repeatably fetching the same file over again (to get a good average)
  2. Fetching a list of recent logs from gerrit (using the gerrit api) and timing those
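The real script is the paste linked above, but the shape of it is roughly this (a sketch only; the base URL and log path here are hypothetical):

import time
import requests

BASE = "http://logs.openstack.org"      # hypothetical base URL
LOG_PATH = "/some/job/console.html"     # hypothetical log path

def timed_fetch(source=None):
    # source=None hits the default path (disk first), source="swift" forces swift
    params = {"source": source} if source else {}
    start = time.time()
    resp = requests.get(BASE + LOG_PATH, params=params, stream=True)
    first_byte = time.time()
    size = sum(len(chunk) for chunk in resp.iter_content(chunk_size=64 * 1024))
    end = time.time()
    return first_byte - start, end - first_byte, size

for source in (None, "swift"):
    response_time, transfer_time, size = timed_fetch(source)
    print(source or "disk", response_time, transfer_time, size)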

I then ran this in two environments.

  1. On my local network the other side of the world to the logserver
  2. On 5 parallel servers in the same DC as the logserver

Running on my home computer likely introduced a lot of errors due to my limited bandwidth, noisy network and large network latency. To help eliminate these errors I also tested it on 5 performance servers in the Rackspace cloud next to the log server itself. In this case I used ansible to orchestrate the test nodes thus running the benchmarks in parallel. I did this since in real world use there will often be many parallel requests at once affecting performance.

The following metrics are measured for both disk and swift:

  1. request sent – time taken to send the http request from my test computer
  2. response – time taken for a response from the server to arrive at the test computer
  3. transfer – time taken to transfer the file
  4. size – filesize of the requested file

The total time can be found by adding the first 3 metrics together.

 

Results

Home computer, sequential requests of one file

 

The complementary colours are the same metric and the darker line represents swift’s performance (over the lighter disk performance line). The vertical lines over the plots are the error bars while the fetched filesize is the column graph down the bottom. Note that the transfer and file size metrics use the right axis for scale while the rest use the left.

As you would expect the requests for both disk and swift files are more or less comparable. We see a more noticeable difference on the responses though, with swift being slower. This is because disk is checked first, and if the file isn’t found on disk then a connection is made to swift to check there. Clearly this is going to be slower.

The transfer times are erratic and varied. We can’t draw much from these, so let’s keep analyzing deeper.

The total time from request to transfer can be seen by adding the times together. I didn’t do this as when requesting files of different sizes (in the next scenario) there is nothing worth comparing (as the file sizes are different). Arguably we could compare them anyway as the log sizes for identical jobs are similar but I didn’t think it was interesting.

The file sizes are there for interest sake but as expected they never change in this case.

You might notice that the end of the graph is much noisier. That is because I’ve applied some rudimentary data filtering.

Standard Deviation:
  request sent (ms): disk 54.89516183, swift 43.71917948
  response (ms): disk 56.74750291, swift 194.7547117
  transfer (ms): disk 849.8545127, swift 838.9172066
  size (KB): disk 7.121600095, swift 7.311125275
Mean:
  request sent (ms): disk 283.9594368, swift 282.5074598
  response (ms): disk 373.7328851, swift 531.8043908
  transfer (ms): disk 5091.536092, swift 5122.686897
  size (KB): disk 1219.804598, swift 1220.735632

 

I know it’s argued as poor practice to remove outliers using twice the standard deviation, but I did it anyway to see how it would look. I only did one pass at this even though I calculated new standard deviations.

 

Standard Deviation:
  request sent (ms): disk 13.88664039, swift 14.84054789
  response (ms): disk 44.0860569, swift 115.5299781
  transfer (ms): disk 541.3912899, swift 515.4364601
  size (KB): disk 7.038111654, swift 6.98399691
Mean:
  request sent (ms): disk 274.9291111, swift 276.2813889
  response (ms): disk 364.6289583, swift 503.9393472
  transfer (ms): disk 5008.439028, swift 5013.627083
  size (KB): disk 1220.013889, swift 1220.888889

 

I then moved the outliers to the end of the results list instead of removing them completely and used the newly calculated standard deviation (ie without the outliers) as the error margin.

Then to get a better indication of what are average times I plotted the histograms of each of these metrics.

Here we can see a similar request time.
 

Here it is quite clear that swift is slower at actually responding.
 

Interestingly both disk and swift sources have a similar total transfer time. This is perhaps an indication of my network limitation in downloading the files.

 

Home computer, sequential requests of recent logs

Next from my home computer I fetched a bunch of files in sequence from recent job runs.

 

 

Again I calculated the standard deviation and average to move the outliers to the end and get smaller error margins.

First pass
  Standard Deviation:
    request sent (ms): disk 54.89516183, swift 43.71917948
    response (ms): disk 194.7547117, swift 56.74750291
    transfer (ms): disk 849.8545127, swift 838.9172066
    size (KB): disk 7.121600095, swift 7.311125275
  Mean:
    request sent (ms): disk 283.9594368, swift 282.5074598
    response (ms): disk 531.8043908, swift 373.7328851
    transfer (ms): disk 5091.536092, swift 5122.686897
    size (KB): disk 1219.804598, swift 1220.735632
Second pass without outliers
  Standard Deviation:
    request sent (ms): disk 13.88664039, swift 14.84054789
    response (ms): disk 115.5299781, swift 44.0860569
    transfer (ms): disk 541.3912899, swift 515.4364601
    size (KB): disk 7.038111654, swift 6.98399691
  Mean:
    request sent (ms): disk 274.9291111, swift 276.2813889
    response (ms): disk 503.9393472, swift 364.6289583
    transfer (ms): disk 5008.439028, swift 5013.627083
    size (KB): disk 1220.013889, swift 1220.888889

 

What we are probably seeing here with the large number of slower requests is network congestion in my house. Since the script requests disk, swift, disk, swift, disk.. and so on this evens it out causing a latency in both sources as seen.
 

Swift is very much slower here.

 

Although comparable in transfer times. Again this is likely due to my network limitation.
 

The size histograms don’t really add much here.
 

Rackspace Cloud, parallel requests of same log

Now to reduce latency and other network effects I tested fetching the same log over again in 5 parallel streams. Granted, it may have been interesting to see a machine close to the log server do a bunch of sequential requests for the one file (with little other noise) but I didn’t do it at the time unfortunately. Also we need to keep in mind that others may be accessing the log server, and therefore any request, in both my testing and normal use, is going to have competing load.
 

I collected a much larger amount of data here making it harder to visualise through all the noise and error margins etc. (Sadly I couldn’t find a way of linking to a larger google spreadsheet graph). The histograms below give a much better picture of what is going on. However out of interest I created a rolling average graph. This graph won’t mean much in reality but hopefully will show which is faster on average (disk or swift).
 

You can see now that we’re closer to the server that swift is noticeably slower. This is confirmed by the averages:

 

First pass
  Standard Deviation:
    request sent (ms): disk 32.42528982, swift 9.749368282
    response (ms): disk 245.3197219, swift 781.8807534
    transfer (ms): disk 1082.253253, swift 2737.059103
    size (KB): disk 0, swift 0
  Mean:
    request sent (ms): disk 4.87337544, swift 4.05191168
    response (ms): disk 39.51898688, swift 245.0792916
    transfer (ms): disk 1553.098063, swift 4167.07851
    size (KB): disk 1226, swift 1232
Second pass without outliers
  Standard Deviation:
    request sent (ms): disk 1.375875503, swift 0.8390193564
    response (ms): disk 28.38377158, swift 191.4744331
    transfer (ms): disk 878.6703183, swift 2132.654898
    size (KB): disk 0, swift 0
  Mean:
    request sent (ms): disk 3.487575109, swift 3.418433003
    response (ms): disk 7.550682037, swift 96.65978872
    transfer (ms): disk 1389.405618, swift 3660.501404
    size (KB): disk 1226, swift 1232

 

Even once outliers are removed we’re still seeing a large latency from swift’s response.

The standard deviation in the requests now have gotten very small. We’ve clearly made a difference moving closer to the logserver.

 

Very nice and close.
 

Here we can see that for roughly half the requests the response time was the same for swift as for the disk. It’s the other half of the requests bringing things down.
 

The transfer for swift is consistently slower.

 

Rackspace Cloud, parallel requests of recent logs

Finally I ran just over a thousand requests in 5 parallel streams from computers near the logserver for recent logs.

 

Again the graph is too crowded to see what is happening so I took a rolling average.

 

 

First pass
  Standard Deviation:
    request sent (ms): disk 0.7227904332, swift 0.8900549012
    response (ms): disk 434.8600827, swift 909.095546
    transfer (ms): disk 1913.9587, swift 2132.992773
    size (KB): disk 6.341238774, swift 7.659678352
  Mean:
    request sent (ms): disk 3.515711867, swift 3.56191383
    response (ms): disk 145.5941102, swift 189.947818
    transfer (ms): disk 2427.776165, swift 2875.289455
    size (KB): disk 1219.940039, swift 1221.384913
Second pass without outliers
  Standard Deviation:
    request sent (ms): disk 0.4798803247, swift 0.4966553679
    response (ms): disk 109.6540634, swift 171.1102999
    transfer (ms): disk 1348.939342, swift 1440.2851
    size (KB): disk 6.137625464, swift 7.565931993
  Mean:
    request sent (ms): disk 3.379718381, swift 3.405770445
    response (ms): disk 70.31323922, swift 86.16522485
    transfer (ms): disk 2016.900047, swift 2426.312363
    size (KB): disk 1220.318912, swift 1221.881335

 

The averages here are much more reasonable than when we continually tried to request the same file. Perhaps we're hitting limitations with swift's serving abilities.

 

I’m not sure why we have a sinc function here. A network expert may be able to tell you more. As far as I know this isn’t important to our analysis, other than the fact that both disk and swift match.
 

Here we can now see swift keeping a lot closer to disk results than when we only requested the one file in parallel. Swift is still, unsurprisingly, slower overall.
 

Swift still loses out on transfers but again does a much better job of keeping up.
 

Error sources

I haven’t accounted for any of the following swift intricacies (in terms of caches etc) for:

  • Fetching random objects
  • Fetching the same object over and over
  • Fetching in parallel multiple different objects
  • Fetching the same object in parallel

I also haven’t done anything to account for things like file system caching, network profiling, noisy neighbours etc etc.

os-loganalyze tries to keep authenticated with swift, however

  • This can timeout (causes delays while reconnecting, possibly accounting for some spikes?)
  • This isn’t thread safe (are we hitting those edge cases?)

We could possibly explore getting longer authentication tokens or having os-loganalyze pull from an unauthenticated CDN to add the markup and then serve. I haven’t explored those here though.

os-loganalyze also handles all of the requests not just from my testing but also from anybody looking at OpenStack CI logs. In addition to this it also needs to deflate the gzip stream if required. As such there is potentially a large unknown (to me) load on the log server.

In other words, there are plenty of sources of errors. However I just wanted to get a feel for the general responsiveness compared to fetching from disk. Both sources had noise in their results so it should be expected in the real world when downloading logs that it’ll never be consistent.

Conclusions

As you would expect the request times are pretty much the same for both disk and swift (as mentioned earlier) especially when sitting next to the log server.

The response times vary but looking at the averages and the histograms these are rarely large. Even in the case where requesting the same file over and over in parallel caused responses to go slow these were only in the magnitude of 100ms.

The response time is the important one as it indicates how soon a download will start for the user. The total time to stream the contents of the whole log is seemingly less important if the user is able to start reading the file.

One thing that wasn’t tested was streaming of different file sizes. All of the files were roughly the same size (being logs of the same job). For example, what if the asset was a few gigabytes in size, would swift have any significant differences there? In general swift was slower to stream the file but only by a few hundred milliseconds for a megabyte. It’s hard to say (without further testing) if this would be noticeable on large files where there are many other factors contributing to the variance.

Whether or not these latencies are an issue is relative to how the user is using/consuming the logs. For example, if they are just looking at the logs in their web browser on occasion they probably aren’t going to notice a large difference. However if the logs are being fetched and scraped by a bot then it may see a decrease in performance.

Overall I’ll leave deciding on whether or not these latencies are acceptable as an exercise for the reader.

Michael Still: Potato Point

Mon, 2016-05-23 13:00
I went to Potato Point with the Scouts for a weekend wide game. Very nice location, apart from the ticks!

                                       


Tags for this post: blog pictures 20160523 photo coast scouts bushwalk
Related posts: Exploring the Jagungal; Scout activity: orienteering at Mount Stranger


Richard Jones: PyCon Australia 2016: Registration Opens!

Mon, 2016-05-23 09:04

We are delighted to announce that online registration is now open for PyCon Australia 2016. The seventh PyCon Australia, being held in Melbourne, Victoria from August 12th – 16th at the Melbourne Convention and Exhibition Centre, will draw hundreds of Python developers, enthusiasts and students from Australasia and afar.

Starting today, early bird offers are up for grabs. To take advantage of these discounted ticket rates, be among the first 90 to register. Early bird registration starts from $60 for full-time students, $190 for enthusiasts and $495 for professionals. Offers this good won’t last long, so register right away.

We strongly encourage attendees to organise their accommodation as early as possible, as demand for cheaper rooms is very strong during the AFL season.

PyCon Australia has endeavoured to keep tickets as affordable as possible. Financial assistance is also available: for information about eligibility, head to our financial assistance page and apply. We are able to make such offers thanks to our Sponsors and Contributors.

To begin the registration process, and find out more about each level of ticket, visit our registration information page.

Important Dates to Help You Plan

  • 22 May: Registration opens - ‘Early bird’ prices for the first 90 tickets
  • 17 June: Last day to apply for financial assistance
  • 26 June: Last day to purchase conference dinner tickets
  • 9 July: Last day to order conference t-shirts
  • 12 August: PyCon Australia 2016 begins!

About PyCon Australia

PyCon Australia is the national conference for the Python programming community. The seventh PyCon Australia will be held on August 12-16 2016 in Melbourne, bringing together professional, student and enthusiast developers with a love for programming in Python. PyCon Australia informs the country’s developers with presentations by experts and core developers of Python, as well as the libraries and frameworks that they rely on.

To find out more about PyCon Australia 2016, visit our website at pycon-au.org, follow us at @pyconau or e-mail us at contact@pycon-au.org.

PyCon Australia is presented by Linux Australia (www.linux.org.au) and acknowledges the support of our Platinum Sponsors, DevDemand.co and IRESS; and our Gold sponsors, Google Australia and Optiver. For full details of our sponsors, see our website.

Danielle Madeley: Django and PostgreSQL composite types

Sun, 2016-05-22 23:02

PostgreSQL has this nifty feature called composite types that you can use to create your own types from the built-in PostgreSQL types. It’s a bit like hstore, only structured, which makes it great for structured data that you might reuse multiple times in a model, like addresses.

Unfortunately to date, they were pretty much a pain to use in Django. There were some older implementations for versions of Django before 1.7, but they tended to do things like create surprise new objects in the namespace, not be migrateable, and require connection to the DB at any time (i.e. during your build).

Anyway, after reading a bunch of their implementations and then the Django source code I wrote django-postgres-composite-types.

Install with:

pip install django-postgres-composite-types

Then you can define a composite type declaratively:

from django.db import models
from postgres_composite_type import CompositeType


class Address(CompositeType):
    """An address."""

    address_1 = models.CharField(max_length=255)
    address_2 = models.CharField(max_length=255)
    suburb = models.CharField(max_length=50)
    state = models.CharField(max_length=50)
    postcode = models.CharField(max_length=10)
    country = models.CharField(max_length=50)

    class Meta:
        db_type = 'x_address'  # Required

And use it in a model:

class Person(models.Model):
    """A person."""

    address = Address.Field()

The field should provide all of the things you need, including formfield etc and you can even inherit this field to extend it in your own way:

class AddressField(Address.Field):
    def __init__(self, in_australia=True, **kwargs):
        self.in_australia = in_australia
        super().__init__(**kwargs)
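For completeness, here is roughly what using the field might look like. This is a sketch under my own assumptions: the attribute names follow the declaration above, but the exact constructor and lookup behaviour come from django-postgres-composite-types and may differ slightly.

# hypothetical usage, assuming Address instances can be built from keyword arguments
home = Address(address_1='10 Example St', address_2='', suburb='Fitzroy',
               state='VIC', postcode='3065', country='Australia')
person = Person.objects.create(address=home)
print(person.address.suburb)  # -> 'Fitzroy'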

Finally to set up the DB there is a migration operation that will create the type that you can add:

import address

from django.db import migrations


class Migration(migrations.Migration):
    operations = [
        # Registers the type
        address.Address.Operation(),
        migrations.AddField(
            model_name='person',
            name='address',
            field=address.Address.Field(blank=True, null=True),
        ),
    ]

It’s not smart enough to add it itself (can you do that?). Nor would it be smart enough to write the operations to alter a type. That would be a pretty cool trick. But it’s useful functionality all the same, especially when the alternative is creating lots of 1:1 models that are hard to work with and hard to garbage collect.

It’s still pretty early days, so the APIs are subject to change. PRs accepted of course.

Maxim Zakharov: Restoring gitstats

Sat, 2016-05-21 23:04

The gitstats tool stopped working on our project after an upgrade to Ubuntu 16.04. Finally I have got time to have a look. There were two issues with it:

  1. we do not need to use process wait, as process communicate waits until process termination, and the last process in the pipeline does not finish until all processes before it in the pipeline terminate; plus process wait may deadlock on pipes with huge output, see the notice at https://docs.python.org/2/library/subprocess.html (a sketch of the safer pattern follows after this list)
  2. On Ubuntu 16.04 grep has started to give "Binary file (standard input) matches" notice into the pipe which breaks parsing.
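For reference, the safer pipeline pattern from the subprocess documentation looks something like this. It's a sketch rather than the actual gitstats code, and the git/grep commands are only an example:

import subprocess

# build the pipeline: git log --numstat | grep -v '^$'
git_log = subprocess.Popen(['git', 'log', '--numstat'], stdout=subprocess.PIPE)
grep = subprocess.Popen(['grep', '-v', '^$'], stdin=git_log.stdout,
                        stdout=subprocess.PIPE)
git_log.stdout.close()           # let git receive SIGPIPE if grep exits early
output, _ = grep.communicate()   # drains the pipe and waits; no explicit wait() needed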

I have made a pull request which fixes this issue: https://github.com/hoxu/gitstats/pull/65
Also you can clone fixed version from my account: https://github.com/Maxime2/gitstats

Glen Turner: Heatsink for RPi3

Fri, 2016-05-20 18:36

I ordered a passive heatsink for system-on-chip of the Raspberry Pi 3 model B. Since it fits well I'll share the details:

Order
  • Fischer Elektronik ICK S 14 X 14 X 10 heatsink (Element 14 catalogue 1850054, AUD3.70).

  • Fischer Elektronik WLFT 404 23X23 thermally conductive foil, adhesive (Element 14 catalogue 1211707, AUD2.42 ).

Install

To install you need these parts: two lint-free isopropyl alcohol swabs; and these tools: a sharp craft knife and an anti-static wrist strap.

Prepare the heatsink: Swab the base of the heatsink. Wait for it to dry. Remove the firm clear plastic from the thermal foil, taking care not to get fingerprints in the centre of the exposed sticky side. Put the foil on the bench, sticky side up. Plonk the heatsink base onto the sticky side, rolling slightly to avoid air bubbles and then pressing hard. Trim around the edges of the heatsink with the craft knife.

Prepare the Raspberry Pi 3 system-on-chip: Unplug everything from the RPi3, turn off the power, wait a bit, plug the USB power lead back in but don't reapply power (this gives us a ground reference). If the RPi3 is in a case, just remove the lid. Attach the wrist strap and clamp it to the ethernet port surround or some other convenient ground. Swab the largest of the chips on the board, ensuring no lint remains.

Attach heat sink: Remove the plastic protection from the thermal foil, exposing the other sticky side. Do not touch the sticky side. With care place the heatsink squarely and snugly on the chip. Press down firmly with a finger of the grounded hand for a few seconds. Don't press too hard: we're just ensuring the glue binds.

Is it worth it?

This little passive heatsink won't stop the RPi3 from throttling under sustained full load, despite this being one of the more effective passive heatsinks on the market. You'll need a fan blowing air across the heatsink to prevent that happening, and you might well need a heatsink on the RAM too.

But the days of CPUs being able to run at full rate continuously are numbered. Throttling the CPU performance under load is common in phones and tablets, and is not rare in laptops.

What the heatsink allows is for a delay to the moment of throttling. So a peaky load can have more chance of not causing throttling. Since we're only talking AUD7.12 in parts a passive heatsink is worth it if you are going to use the RPi3 for serious purposes.

Of course the heatsink is also a more effective radiator. When running cpuburn-a53 the CPU core temperature stabilises at 80C with a CPU clock of 700MHz (out of 1200MHz). It's plain that 80C is the target core temperature for this version of the RPi3's firmware. That's some 400MHz higher than without the heatsink. But if your task needs sustained raw CPU performance then you are much better off with even the cheapest of desktops, let alone a server.