Planet Linux Australia

Planet Linux Australia - http://planet.linux.org.au

Joshua Hesketh: New Blog

Mon, 2016-05-23 13:05

Welcome to my new blog.

You can find my old one here: http://josh.opentechnologysolutions.com/blog/joshua-hesketh

I intend to back-port those posts into this one in due course. For now, though, I’m going to start posting about my adventures in OpenStack!

Joshua Hesketh: Introducing turbo-hipster for testing nova db migrations

Mon, 2016-05-23 13:05

Zuul is the continuous integration utility used by OpenStack to gate patchsets against tests. It takes care of communicating with gerrit (the code review system) and the test workers – usually Jenkins. You can read more about how the systems tie together on the OpenStack Project Infrastructure page.

The nice thing is that zuul doesn’t require you to use Jenkins. Anybody can provide a worker to zuul using the gearman protocol (gearman is a simple job server). Enter turbo-hipster*.

“Turbo-hipster is a CI worker with pluggable tasks initially designed to test OpenStack’s database migrations against copies of real databases.”

This will hopefully catch scenarios where changes to the database schema may not work due to outliers in real datasets and also help find where a migration may take an unreasonable amount of time against a large database.

In zuul’s layout configuration we are able to specify which jobs should be run against which projects in which pipelines. For example, for nova we want to run tests when a patchset is created, but we don’t (necessarily) need to run tests against it once it is merged etc. So in zuul we specify a new gate (aka job) to test nova against real databases.

turbo-hipster then listens for jobs created on that gate using the gearman protocol. Once it receives a patchset from zuul it creates a virtual environment and tests the upgrades. It then compiles and sends back the results.
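
Conceptually, the handler for such a job looks something like the sketch below. This is not turbo-hipster’s actual code; the git URL and migration command are placeholders, just to illustrate the create-venv, run, time and report flow.

import json
import subprocess
import sys
import tempfile
import time

def run_migration_job(patchset_ref):
    """Sketch only: build a venv, run the migration steps, time them."""
    workdir = tempfile.mkdtemp(prefix='th-')
    venv = workdir + '/venv'

    # An isolated virtual environment for this job.
    subprocess.check_call([sys.executable, '-m', 'venv', venv])

    # Placeholder steps: fetch the patchset, then run the migrations
    # against a copy of a real database, recording how long each takes.
    steps = [
        ['git', 'fetch', 'https://review.openstack.org/openstack/nova', patchset_ref],
        [venv + '/bin/nova-manage', 'db', 'sync'],
    ]
    results = {}
    for step in steps:
        start = time.time()
        proc = subprocess.run(step, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
        results[' '.join(step)] = {'rc': proc.returncode,
                                   'seconds': round(time.time() - start, 2)}

    # Compile the results into something to hand back to zuul via gearman.
    return json.dumps(results, indent=2)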

At the moment turbo-hipster is still under heavy development but I hope to have it reporting results back to gerrit patchsets soon as part of zuul’s report summary. For the moment I have a separate zuul instance running to test new nova patches and email the results back to me. Here is an example result report:

Build succeeded. - http://thw01.rcbops.com/logviewer/?q=/results/47/47162/9/check/gate-real-db-upgrade_nova_mysql/c4bc35c/index.html : SUCCESS in 13m 31s

*The name was randomly generated and does not necessarily contain meaning.

Joshua Hesketh: LinuxCon Europe

Mon, 2016-05-23 13:05

After travelling very close to literally the other side of the world[0] I’m in Edinburgh for LinuxCon EU recovering from jetlag and getting ready to attend. I’m very much looking forward to my first LinuxCon, meeting new people and learning lots :-).

If you’re around and would like to catch up drop me a comment here. Otherwise I’ll see you at the conference!

[0] http://goo.gl/maps/JeJO2

Joshua Hesketh: OpenStack infrastructure swift logs and performance

Mon, 2016-05-23 13:05

Turns out I’m not very good at blogging very often. However I thought I would put what I’ve been working on for the last few days here out of interest.

For a while the OpenStack Infrastructure team have wanted to move away from storing logs on disk to something more cloudy – namely, swift. I’ve been working on this on and off for a while and we’re nearly there.

For the last few weeks the openstack-infra/project-config repository has been uploading its CI test logs to swift as well as storing them on disk. This has given us the opportunity to compare the last few weeks of data and see what kind of effects we can expect as we move assets into object storage.

  • I should add a disclaimer/warning, before you read, that my methods here will likely make statisticians cringe horribly. For the moment though I’m just getting an indication for how things compare.
The set up

Fetching files from object storage is nothing particularly new or special (CDNs have been doing it for ages). However, for our usage we want to serve logs with os-loganalyze, giving the opportunity to hyperlink to timestamp anchors or filter by log severity.

First though we need to get the logs into swift somehow. This is done by having the job upload its own logs. Rather than using (or writing) a Jenkins publisher we use a bash script to grab the job’s own console log (pulled from the Jenkins web ui) and then upload it to swift using credentials supplied to the job as environment variables (see my zuul-swift contributions).

This does, however, mean part of the logs are missing. For example the fetching and upload processes write to Jenkins’ console log but because it has already been fetched these entries are missing. Therefore this wants to be the very last thing you do in a job. I did see somebody do something similar where they keep the download process running in a fork so that they can fetch the full log but we’ll look at that another time.

When a request comes into logs.openstack.org, it is handled like so:

  1. apache vhost matches the server
  2. if the request ends in .txt.gz, console.html or console.html.gz rewrite the url to prepend /htmlify/
  3. if the requested filename is a file or folder on disk, serve it up with apache as per normal
  4. otherwise rewrite the requested file to prepend /htmlify/ anyway

os-loganalyze is set up as a WSGIScriptAlias at /htmlify/. This means all files that aren’t on disk are sent to os-loganalyze (or, if the file is on disk but matches a file we want to mark up, it is also sent to os-loganalyze). os-loganalyze then does the following:

  1. Checks the requested file path is legitimate (or throws a 400 error)
  2. Checks if the file is on disk
  3. Checks if the file is stored in swift
  4. If the file is found, markup (such as anchors) is optionally added and the request is served
    1. When serving from swift the file is fetched via the swiftclient by os-loganalyze in chunks and streamed to the user on the fly. Obviously fetching from swift will have larger network consequences.
  5. If no file is found, 404 is returned

If the file exists both on disk and in swift then step #2 can be skipped by passing ?source=swift as a parameter (thus only attempting to serve from swift). In our case the files exist both on disk and in swift since we want to compare the performance so this feature is necessary.
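
As a rough sketch (not os-loganalyze’s actual code, and with the swift lookup stubbed out), the serving logic boils down to something like:

import os

def fetch_from_swift(path):
    # Stand-in for streaming an object out of swift via python-swiftclient.
    return None

def serve_log(path, source=None, logs_root='/srv/static/logs'):
    # 1. Check the requested path is legitimate (400 otherwise).
    if '..' in path or not path.strip('/'):
        raise ValueError('400: bad request')

    # 2. Check disk first, unless ?source=swift asked us to skip it.
    on_disk = os.path.join(logs_root, path.lstrip('/'))
    if source != 'swift' and os.path.isfile(on_disk):
        with open(on_disk, 'rb') as f:
            return f.read()  # markup such as anchors would be added here

    # 3./4. Fall back to swift, marking up and streaming if found.
    data = fetch_from_swift(path)
    if data is not None:
        return data

    # 5. Nothing found anywhere.
    raise LookupError('404: not found')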

So now that we have the logs uploaded into swift and stored on disk we can get into some more interesting comparisons.

Testing performance process

My first attempt at this was simply to fetch the files from disk and then from swift and compare the results. A crude little python script did this for me: http://paste.openstack.org/show/122630/

The script fetches a copy of the log from disk and then from swift (both through os-loganalyze and therefore marked-up) and times the results (a rough sketch of the idea follows the list below). It does this in two scenarios:

  1. Repeatedly fetching the same file over again (to get a good average)
  2. Fetching a list of recent logs from gerrit (using the gerrit api) and timing those
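
The paste above has the real script, but as a rough illustration of the idea (assuming the requests library and a made-up log URL), the response and transfer times can be measured like this:

import time
import requests

URL = 'http://logs.openstack.org/some/job/console.html'  # hypothetical path

def time_fetch(url, source=None):
    params = {'source': source} if source else {}
    start = time.perf_counter()
    resp = requests.get(url, params=params, stream=True)
    response_time = time.perf_counter() - start  # headers received

    size = 0
    for chunk in resp.iter_content(chunk_size=65536):
        size += len(chunk)
    transfer_time = time.perf_counter() - start - response_time
    return response_time, transfer_time, size

# Repeatedly fetch the same file from disk and then from swift.
for source in (None, 'swift'):
    samples = [time_fetch(URL, source) for _ in range(10)]
    label = source or 'disk'
    print(label, sum(s[0] for s in samples) / len(samples))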

I then ran this in two environments.

  1. On my local network the other side of the world to the logserver
  2. On 5 parallel servers in the same DC as the logserver

Running on my home computer likely introduced a lot of errors due to my limited bandwidth, noisy network and large network latency. To help eliminate these errors I also tested it on 5 performance servers in the Rackspace cloud next to the log server itself. In this case I used ansible to orchestrate the test nodes thus running the benchmarks in parallel. I did this since in real world use there will often be many parallel requests at once affecting performance.

The following metrics are measured for both disk and swift:

  1. request sent – time taken to send the http request from my test computer
  2. response – time taken for a response from the server to arrive at the test computer
  3. transfer – time taken to transfer the file
  4. size – filesize of the requested file

The total time can be found by adding the first 3 metrics together.

 

Results

Home computer, sequential requests of one file

 

The complementary colours are the same metric and the darker line represents swift’s performance (over the lighter disk performance line). The vertical lines over the plots are the error bars while the fetched filesize is the column graph down the bottom. Note that the transfer and file size metrics use the right axis for scale while the rest use the left.

As you would expect the requests for both disk and swift files are more or less comparable. We see a more noticeable difference on the responses though, with swift being slower. This is because disk is checked first, and if the file isn’t found on disk then a connection is sent to swift to check there. Clearly this is going to be slower.

The transfer times are erratic and varied. We can’t draw much from these, so let’s keep analyzing deeper.

The total time from request to transfer can be seen by adding the times together. I didn’t do this as when requesting files of different sizes (in the next scenario) there is nothing worth comparing (as the file sizes are different). Arguably we could compare them anyway as the log sizes for identical jobs are similar but I didn’t think it was interesting.

The file sizes are there for interest sake but as expected they never change in this case.

You might notice that the end of the graph is much noisier. That is because I’ve applied some rudimentary data filtering.

                    Std Dev (disk)   Std Dev (swift)   Mean (disk)    Mean (swift)
request sent (ms)   54.89516183      43.71917948       283.9594368    282.5074598
response (ms)       56.74750291      194.7547117       373.7328851    531.8043908
transfer (ms)       849.8545127      838.9172066       5091.536092    5122.686897
size (KB)           7.121600095      7.311125275       1219.804598    1220.735632

 

I know it’s argued as poor practice to remove outliers using twice the standard deviation, but I did it anyway to see how it would look. I only did one pass at this even though I calculated new standard deviations.
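
For what it’s worth, the filtering amounts to something like this sketch (standard library only, not the spreadsheet formulas actually used):

from statistics import mean, stdev

def split_outliers(samples):
    """Separate values more than two standard deviations from the mean."""
    m, sd = mean(samples), stdev(samples)
    kept = [s for s in samples if abs(s - m) <= 2 * sd]
    outliers = [s for s in samples if abs(s - m) > 2 * sd]
    # One pass only: the mean/stdev of `kept` are recalculated and reported,
    # but not used to filter a second time.
    return kept, outliers, mean(kept), stdev(kept)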

 

                    Std Dev (disk)   Std Dev (swift)   Mean (disk)    Mean (swift)
request sent (ms)   13.88664039      14.84054789       274.9291111    276.2813889
response (ms)       44.0860569       115.5299781       364.6289583    503.9393472
transfer (ms)       541.3912899      515.4364601       5008.439028    5013.627083
size (KB)           7.038111654      6.98399691        1220.013889    1220.888889

 

I then moved the outliers to the end of the results list instead of removing them completely and used the newly calculated standard deviation (ie without the outliers) as the error margin.

Then to get a better indication of what the average times are I plotted the histograms of each of these metrics.

Here we can see a similar request time.
 

Here it is quite clear that swift is slower at actually responding.
 

Interestingly both disk and swift sources have a similar total transfer time. This is perhaps an indication of my network limitation in downloading the files.

 

Home computer, sequential requests of recent logs

Next from my home computer I fetched a bunch of files in sequence from recent job runs.

 

 

Again I calculated the standard deviation and average to move the outliers to the end and get smaller error margins.

First pass
                    Std Dev (disk)   Std Dev (swift)   Mean (disk)    Mean (swift)
request sent (ms)   54.89516183      43.71917948       283.9594368    282.5074598
response (ms)       194.7547117      56.74750291       531.8043908    373.7328851
transfer (ms)       849.8545127      838.9172066       5091.536092    5122.686897
size (KB)           7.121600095      7.311125275       1219.804598    1220.735632

Second pass without outliers
                    Std Dev (disk)   Std Dev (swift)   Mean (disk)    Mean (swift)
request sent (ms)   13.88664039      14.84054789       274.9291111    276.2813889
response (ms)       115.5299781      44.0860569        503.9393472    364.6289583
transfer (ms)       541.3912899      515.4364601       5008.439028    5013.627083
size (KB)           7.038111654      6.98399691        1220.013889    1220.888889

 

What we are probably seeing here with the large number of slower requests is network congestion in my house. Since the script requests disk, swift, disk, swift and so on, this evens it out, causing a latency in both sources as seen.
 

Swift is very much slower here.

 

Although comparable in transfer times. Again this is likely due to my network limitation.
 

The size histograms don’t really add much here.
 

Rackspace Cloud, parallel requests of same log

Now to reduce latency and other network effects I tested fetching the same log over again in 5 parallel streams. Granted, it may have been interesting to see a machine close to the log server do a bunch of sequential requests for the one file (with little other noise) but I didn’t do it at the time unfortunately. Also we need to keep in mind that others may be accessing the log server and therefore any request, in both my testing and normal use, is going to have competing load.
 

I collected a much larger amount of data here making it harder to visualise through all the noise and error margins etc. (Sadly I couldn’t find a way of linking to a larger google spreadsheet graph). The histograms below give a much better picture of what is going on. However out of interest I created a rolling average graph. This graph won’t mean much in reality but hopefully will show which is faster on average (disk or swift).
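
(By rolling average I just mean something like the following, with an arbitrarily chosen window size:)

from collections import deque

def rolling_average(samples, window=50):
    """Yield the mean of the last `window` samples as each new one arrives."""
    recent = deque(maxlen=window)
    for s in samples:
        recent.append(s)
        yield sum(recent) / len(recent)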
 

You can see now that, closer to the server, swift is noticeably slower. This is confirmed by the averages:

 

First pass
                    Std Dev (disk)   Std Dev (swift)   Mean (disk)    Mean (swift)
request sent (ms)   32.42528982      9.749368282       4.87337544     4.05191168
response (ms)       245.3197219      781.8807534       39.51898688    245.0792916
transfer (ms)       1082.253253      2737.059103       1553.098063    4167.07851
size (KB)           0                0                 1226           1232

Second pass without outliers
                    Std Dev (disk)   Std Dev (swift)   Mean (disk)    Mean (swift)
request sent (ms)   1.375875503      0.8390193564      3.487575109    3.418433003
response (ms)       28.38377158      191.4744331       7.550682037    96.65978872
transfer (ms)       878.6703183      2132.654898       1389.405618    3660.501404
size (KB)           0                0                 1226           1232

 

Even once outliers are removed we’re still seeing a large latency from swift’s response.

The standard deviation in the requests has now gotten very small. We’ve clearly made a difference moving closer to the logserver.

 

Very nice and close.
 

Here we can see that for roughly half the requests the response time was the same for swift as for the disk. It’s the other half of the requests bringing things down.
 

The transfer for swift is consistently slower.

 

Rackspace Cloud, parallel requests of recent logs

Finally I ran just over a thousand requests in 5 parallel streams from computers near the logserver for recent logs.

 

Again the graph is too crowded to see what is happening so I took a rolling average.

 

 

First pass
                    Std Dev (disk)   Std Dev (swift)   Mean (disk)    Mean (swift)
request sent (ms)   0.7227904332     0.8900549012      3.515711867    3.56191383
response (ms)       434.8600827      909.095546        145.5941102    189.947818
transfer (ms)       1913.9587        2132.992773       2427.776165    2875.289455
size (KB)           6.341238774      7.659678352       1219.940039    1221.384913

Second pass without outliers
                    Std Dev (disk)   Std Dev (swift)   Mean (disk)    Mean (swift)
request sent (ms)   0.4798803247     0.4966553679      3.379718381    3.405770445
response (ms)       109.6540634      171.1102999       70.31323922    86.16522485
transfer (ms)       1348.939342      1440.2851         2016.900047    2426.312363
size (KB)           6.137625464      7.565931993       1220.318912    1221.881335

 

The averages here are much more reasonable than when we continually tried to request the same file. Perhaps we’re hitting limitations with swift’s serving abilities.

 

I’m not sure why we have a sinc function here. A network expert may be able to tell you more. As far as I know this isn’t important to our analysis other than the fact that both disk and swift match.
 

Here we can now see swift keeping a lot closer to disk results than when we only requested the one file in parallel. Swift is still, unsurprisingly, slower overall.
 

Swift still loses out on transfers but again does a much better job of keeping up.
 

Error sources

I haven’t accounted for any of the following swift intricacies (in terms of caches etc):

  • Fetching random objects
  • Fetching the same object over and over
  • Fetching in parallel multiple different objects
  • Fetching the same object in parallel

I also haven’t done anything to account for things like file system caching, network profiling, noisy neighbours etc etc.

os-loganalyze tries to stay authenticated with swift, however:

  • This can timeout (causes delays while reconnecting, possibly accounting for some spikes?)
  • This isn’t thread safe (are we hitting those edge cases?)

We could possibly explore getting longer authentication tokens or having os-loganalyze pull from an unauthenticated CDN to add the markup and then serve. I haven’t explored those here though.

os-loganalyze also handles all of the requests not just from my testing but also from anybody looking at OpenStack CI logs. In addition to this it also needs to deflate the gzip stream if required. As such there is potentially a large unknown (to me) load on the log server.

In other words, there are plenty of sources of errors. However I just wanted to get a feel for the general responsiveness compared to fetching from disk. Both sources had noise in their results so it should be expected in the real world when downloading logs that it’ll never be consistent.

Conclusions

As you would expect the request times are pretty much the same for both disk and swift (as mentioned earlier) especially when sitting next to the log server.

The response times vary but looking at the averages and the histograms these are rarely large. Even in the case where requesting the same file over and over in parallel caused responses to slow down, these were only in the order of 100ms.

The response time is the important one as it indicates how soon a download will start for the user. The total time to stream the contents of the whole log is seemingly less important if the user is able to start reading the file.

One thing that wasn’t tested was streaming of different file sizes. All of the files were roughly the same size (being logs of the same job). For example, what if the asset was a few gigabytes in size, would swift have any significant differences there? In general swift was slower to stream the file but only by a few hundred milliseconds for a megabyte. It’s hard to say (without further testing) if this would be noticeable on large files where there are many other factors contributing to the variance.

Whether or not these latencies are an issue is relative to how the user is using/consuming the logs. For example, if they are just looking at the logs in their web browser on occasion they probably aren’t going to notice a large difference. However if the logs are being fetched and scraped by a bot then it may see a decrease in performance.

Overall I’ll leave deciding on whether or not these latencies are acceptable as an exercise for the reader.

Michael Still: Potato Point

Mon, 2016-05-23 13:00
I went to Potato Point with the Scouts for a weekend wide game. Very nice location, apart from the ticks!

Tags for this post: blog pictures 20160523 photo coast scouts bushwalk
Related posts: Exploring the Jagungal; Scout activity: orienteering at Mount Stranger

Richard Jones: PyCon Australia 2016: Registration Opens!

Mon, 2016-05-23 09:04

We are delighted to announce that online registration is now open for PyCon Australia 2016. The seventh PyCon Australia, being held in Melbourne, Victoria from August 12th to 16th at the Melbourne Convention and Exhibition Centre, will draw hundreds of Python developers, enthusiasts and students from Australasia and afar.

Starting today, early bird offers are up for grabs. To take advantage of these discounted ticket rates, be among the first 90 to register. Early bird registration starts from $60 for full-time students, $190 for enthusiasts and $495 for professionals. Offers this good won’t last long, so register right away.

We strongly encourage attendees to organise their accommodation as early as possible, as demand for cheaper rooms is very strong during the AFL season.

PyCon Australia has endeavoured to keep tickets as affordable as possible. Financial assistance is also available: for information about eligibility, head to our financial assistance page and apply. We are able to make such offers thanks to our Sponsors and Contributors.

To begin the registration process, and find out more about each level of ticket, visit our registration information page.

Important Dates to Help You Plan

  • 22 May: Registration opens - ‘Early bird’ prices for the first 90 tickets
  • 17 June: Last day to apply for financial assistance
  • 26 June: Last day to purchase conference dinner tickets
  • 9 July: Last day to order conference t-shirts
  • 12 August: PyCon Australia 2016 begins!

About PyCon Australia

PyCon Australia is the national conference for the Python programming community. The seventh PyCon Australia will be held on August 12-16 2016 in Melbourne, bringing together professional, student and enthusiast developers with a love for programming in Python. PyCon Australia informs the country’s developers with presentations by experts and core developers of Python, as well as the libraries and frameworks that they rely on.

To find out more about PyCon Australia 2016, visit our website at pycon-au.org, follow us at @pyconau or e-mail us at contact@pycon-au.org.

PyCon Australia is presented by Linux Australia (www.linux.org.au) and acknowledges the support of our Platinum Sponsors, DevDemand.co and IRESS; and our Gold sponsors, Google Australia and Optiver. For full details of our sponsors, see our website.

Danielle Madeley: Django and PostgreSQL composite types

Sun, 2016-05-22 23:02

PostgreSQL has this nifty feature called composite types that you can use to create your own types from the built-in PostgreSQL types. It’s a bit like hstore, only structured, which makes it great for structured data that you might reuse multiple times in a model, like addresses.

Unfortunately to date, they were pretty much a pain to use in Django. There were some older implementations for versions of Django before 1.7, but they tended to do things like create surprise new objects in the namespace, not be migratable, and require a connection to the DB at any time (i.e. during your build).

Anyway, after reading a bunch of their implementations and then the Django source code I wrote django-postgres-composite-types.

Install with:

pip install django-postgres-composite-types

Then you can define a composite type declaratively:

from django.db import models
from postgres_composite_type import CompositeType


class Address(CompositeType):
    """An address."""

    address_1 = models.CharField(max_length=255)
    address_2 = models.CharField(max_length=255)

    suburb = models.CharField(max_length=50)
    state = models.CharField(max_length=50)
    postcode = models.CharField(max_length=10)

    country = models.CharField(max_length=50)

    class Meta:
        db_type = 'x_address'  # Required

And use it in a model:

class Person(models.Model):
    """A person."""

    address = Address.Field()
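
Usage then looks roughly like this. This is a hypothetical sketch; the exact way to construct a composite value may differ, so check the project’s README:

home = Address(
    address_1='12 Example St',
    address_2='',
    suburb='Fitzroy',
    state='VIC',
    postcode='3065',
    country='Australia',
)
person = Person.objects.create(address=home)
assert person.address.postcode == '3065'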

The field should provide all of the things you need, including formfield etc and you can even inherit this field to extend it in your own way:

class AddressField(Address.Field):

    def __init__(self, in_australia=True, **kwargs):
        self.in_australia = in_australia

        super().__init__(**kwargs)

Finally to set up the DB there is a migration operation that will create the type that you can add:

import address

from django.db import migrations


class Migration(migrations.Migration):

    operations = [
        # Registers the type
        address.Address.Operation(),
        migrations.AddField(
            model_name='person',
            name='address',
            field=address.Address.Field(blank=True, null=True),
        ),
    ]

It’s not smart enough to add it itself (can you do that?). Nor would it be smart enough to write the operations to alter a type. That would be a pretty cool trick. But it’s useful functionality all the same, especially when the alternative is creating lots of 1:1 models that are hard to work with and hard to garbage collect.

It’s still pretty early days, so the APIs are subject to change. PRs accepted of course.

Maxim Zakharov: Restoring gitstats

Sat, 2016-05-21 23:04

The gitstats tool stopped working on our project after the upgrade to Ubuntu 16.04. Finally I have got time to have a look. There were two issues with it:

  1. We do not need to call wait() on each process, as communicate() already waits until process termination and the last process in the pipeline does not finish until all processes before it in the pipeline terminate; in addition, wait() may deadlock on pipes with huge output, see the notice at https://docs.python.org/2/library/subprocess.html (a short sketch follows this list).
  2. On Ubuntu 16.04, grep has started to write a "Binary file (standard input) matches" notice into the pipe, which breaks parsing.
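
A minimal illustration of the first point (not the actual gitstats code): with a pipeline built from subprocess.Popen objects, calling communicate() on the last process is all that is needed, and per-process wait() on large pipe output risks deadlock.

import subprocess

# Hypothetical two-stage pipeline: list commit hashes, count them with grep.
p1 = subprocess.Popen(['git', 'log', '--pretty=%H'], stdout=subprocess.PIPE)
p2 = subprocess.Popen(['grep', '-c', '.'], stdin=p1.stdout, stdout=subprocess.PIPE)
p1.stdout.close()  # let p1 receive SIGPIPE if p2 exits early

# communicate() reads all output and waits for p2; p2 cannot finish until
# p1 has finished writing, so no extra wait() calls are needed.
out, _ = p2.communicate()
print(out.decode().strip())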

I have made a pull request which fixes this issue: https://github.com/hoxu/gitstats/pull/65
Also you can clone fixed version from my account: https://github.com/Maxime2/gitstats

Glen Turner: Heatsink for RPi3

Fri, 2016-05-20 18:36

I ordered a passive heatsink for the system-on-chip of the Raspberry Pi 3 model B. Since it fits well I'll share the details:

Order
  • Fischer Elektronik ICK S 14 X 14 X 10 heatsink (Element 14 catalogue 1850054, AUD3.70).

  • Fischer Elektronik WLFT 404 23X23 thermally conductive foil, adhesive (Element 14 catalogue 1211707, AUD2.42).

Install

To install you need these parts: two lint-free isopropyl alcohol swabs; and these tools: a sharp craft knife and an anti-static wrist strap.

Prepare the heatsink: Swab the base of the heatsink. Wait for it to dry. Remove the firm clear plastic from the thermal foil, taking care not to get fingerprints in the centre of the exposed sticky side. Put the foil on the bench, sticky side up. Plonk the heatsink base onto the sticky side, rolling slightly to avoid air bubbles and then pressing hard. Trim around the edges of the heatsink with the craft knife.

Prepare the Raspberry Pi 3 system-on-chip: Unplug everything from the RPi3, turn off the power, wait a bit, plug the USB power lead back in but don't reapply power (this gives us a ground reference). If the RPi3 is in a case, just remove the lid. Attach the wrist strap and clamp to the ethernet port surround or some other convenient ground. Swab the largest of the chips on the board, ensuring no lint remains.

Attach heat sink: Remove the plastic protection from the thermal foil, exposing the other sticky side. Do not touch the sticky side. With care place the heatsink squarely and snugly on the chip. Press down firmly with a finger of your grounded hand for a few seconds. Don't press too hard: we're just ensuring the glue binds.

Is it worth it?

This little passive heatsink won't stop the RPi3 from throttling under sustained full load, despite this being one of the more effective passive heatsinks on the market. You'll need a fan blowing air across the heatsink to prevent that happening, and you might well need a heatsink on the RAM too.

But the days of CPUs being able to run at full rate continuously are numbered. Throttling the CPU performance under load is common in phones and tablets, and is not rare in laptops.

What the heatsink allows is for a delay to the moment of throttling. So a peaky load can have more chance of not causing throttling. Since we're only talking AUD7.12 in parts a passive heatsink is worth it if you are going to use the RPi3 for serious purposes.

Of course the heatsink is also a more effective radiator. When running cpuburn-a53 the CPU core temperature stabilises at 80C with a CPU clock of 700MHz (out of 1200MHz). It's plain that 80C is the target core temperature for this version of the RPi3's firmware. That's some 400MHz higher than without the heatsink. But if your task needs sustained raw CPU performance then you are much better off with even the cheapest of desktops, let alone a server.

Steven Hanley: [mtb/events] UTA100 - The big dance through the blue mountains again

Fri, 2016-05-20 17:00
Back at Ultra Trail Australia running through the Blue Mountains wilderness

I am still fascinated by seeing how I can improve in this event; after running in pairs twice and now solo twice I signed up to come back this year, still seeing how much time I can lop off my lap of the course. Though I continually claim I am not a runner, with my mountain biking and adventure racing background, I have been getting out on foot a lot since I got into doing this event. With 12 hours as the arbitrary time around this course before I may admit I am a runner, I was coming back to see how close to this goal I would get.

My first year solo in 2014 I was positive I would finish, just not sure how fast; thinking on the day I might take around 15 hours, I managed 13:44, which at the time had me happy and a little surprised. In 2015 I had a few things interrupt my lead-up and not everything felt great, so though I hoped to go under 13 hours I was not sure; managing 13:15 was not what I wanted but I got around the loop again anyway.

In 2016 I continued to not have a training program and simply work toward goals by judging effort in my head and race schedule leading up to the event. However most running science seems to suggest the more you can run without getting injured the better. So on January 1st 2016 I kicked off a running streak to see how long it would last. I managed to run every day in 2016 until Wednesday before UTA100, so 132 days in a row with a minimum distance of 5km. This included the days before and after efforts such as the razorback ultra in Victoria and the Six Foot Track marathon in the Blue Mountains.

I never really managed to get much speed work into my prep again this year, however I had definitely upped my volume, doing between 70 and 125km every week of the year, most of it on trails with some good altitude gain at times. I also remained uninjured and able to run every day, which was great; even with the odd fall or problem I could work around and keep moving through, I was feeling good before the event. Due to my tendency to waste time at the checkpoints on course I also had my sister here to support me this year, so I would be able to run into CP 3, 4 and 5, grab new bottles, have food shoved at me and head on out.

All was looking fairly good and I was sure I could go under 13 hours this year; the question remained how far under I could get. Then Wednesday night before the race I got home feeling awful and shivering and needed to crawl into bed early and get some sleep. Waking up Thursday I felt worse if possible and was worried it was all over: I had gotten sick and nothing would help. I left work at 2pm that day and headed home to sleep the rest of the day. Fortunately by the time I woke on Friday morning I no longer felt so awful, and actually felt I may be able to run the next day. I had stopped my running streak on Wednesday; there was no real need to continue it, and feeling so bad for two days it definitely had to stop.

I arrived Friday afternoon, spent money with Graham and Hanny in their Find Your Feet store for some stuff I needed and headed to the briefing. The welcome to country from David King was once again a highlight of the runners' briefing; it is a fantastic part of the race every year and really heartfelt, genuine and funny. I met my sister Jane at our accommodation and discussed the race day and estimated times while eating dinner. Fortunately I finally felt ready to run again by the time I went to sleep Friday night. I had a few runs the week before with what I call Happy Legs, where you feel awesome running and light and happy on your feet. Though I hoped for that on Saturday, I knew I just had to get out on the track and keep moving well.

I was in wave 1 and starting at 6:20am, had a chat with my mate Tom Reeve on the start line and then we got moving. Taking it easy on the 5km bitumen loop I had a chat with Phil Whitten, who was worried that after stomach issues caused him problems in Six Foot he may have issues today too (in the end he did, alas); still it was nice to be moving and cruising along the out and back before the steps. In wave 1 it was nice and open and even the descent down Furber steps was pretty open. I ran through toward the Golden Stairs feeling OK, never awesome but not like it was going to be a horrible day out.

I got onto the fire road out Narrow Neck and realised I was a few beats higher than I probably should be HR-wise, however I decided to stay with it and ensure I did not push too hard on the hill climbs along here. With the start out and back slightly extended this year it was good to pass through CP1 in the same time as last year, so on course for slightly faster, however I would not have a proper idea of time and how I was going until Dunphys camp. On the climb from Cedar Gap I noticed some people around me seemed to be pushing harder than I thought they should, however that had nothing to do with me so I kept moving and hoped I survived. On the descent down to the camp I had my left adductor cramp a bit, which seems to happen here every year, so I have to manage it and keep going.

At Dunphys CP I had a chat to Myf happy to actually see her or Matt this year (I missed seeing them here last year) and got moving aware I would need to take it easy on iron pot to keep the cramps at bay. I got onto Iron Pot and loved being able to say thanks to David King and his colleagues welcoming us to country with Didgeridoo and clap sticks up there, the short out and back made it easier this year and then I took it really easy on the loose ski slope sort of descent down due to cramps being close to the surface. Continued taking it easy chatting with other runners as we went back past the outgoing track on our right and then we dropped down to the bottom of the valley to start heading up Megalong Rd.

Looking at my watch I was probably behind time to do sub 12 hours already at this point, but would have a much better idea once I got to the Six Foot CP in a little while. I took it easy climbing the road at a strong power walk and then managed a comfortable 4 to 5 minute pace along the road into the CP. I got out of CP3 just before the 5 hour mark; this was confirming I was unlikely to go under 12 hours, as I expected I needed to be gone from here in 4h40m to manage sub 12 knowing how I was feeling. I grabbed some risotto and baked potatoes with salt from Jane to see if I could eat these for some variety rather than sweet crap while climbing to Katoomba. On the way into the CP I passed Etienne, who had an injury, so I asked her to see if he needed help when he got in (though that made it harder for her to get to me in time at Katoomba; fortunately Etienne had his parents there to help him out when he had to withdraw there).

Trying to eat the solid food was difficult and slowing me down, so I gave up by the time I hit the single track just before the stairs. I had a chat with a blonde woman (it may have been Daniela Burton); it was her first 100 so I told her not to get discouraged by how long the next leg (CP4 to CP5) takes and to keep focusing on moving forward. I also had a chat with Ben Grimshaw a few times on the way up Nellies, as I was passed by him while trying to eat solid food and then caught him again on the stairs once I started pushing up there reasonably fast once more. We cruised through the single track at the top passing a few runners and got into CP4 pretty much together.

I had to refill my water bladder here as well as get two new bottles; still, with Jane's help I got out of here fast and left by 6 hours 30 minutes on the race clock, though behind Ben now as he was quicker in the CP. Now I was happy to hit my race goal of feeling pretty good at Katoomba and still being keen to run, which is always the way I think you need to feel at this point as the next leg is the crux of the race: the half marathon of stairs is really a tough mental and physical barrier to get through.

I headed along to echo point through some crowds on the walk way near the cliff edge and it was nice to have some of the tourists cheering us on, a few other runners were near by and we got through nicely. On the descent down the giant stair case I seemed to pass a few people pretty comfortably and then on to Dardanelle's pass and it was nice running through there for a while. Of course getting down to Leura forest we got to see some 50km runners coming the other way (a few asked me where I was going worried they had made a wrong turn, when I said I was a 100km runner they realised all was cool told me well done and kept going).

I caught Ben again on the way up the stairs from Leura forest and we were near each other a bit for a while then, however I seemed to pull ahead on stairs a bit, so over the next while I got away from him (he caught me later in the race anyway though). Last year I had a diabetic low blood sugar incident in this leg, somewhere just before the Wentworth Falls lookout carpark I think. So I was paying more attention through the day to constant calorie intake, with lots of Clif shot blocks and GU gels. I kept moving well enough through this whole leg so that turned out well. I said hi to Graham (Hammond), who was cheering runners on at the Fairmont resort water station, and ran on for a few more stairs.

Running in to CP 5 on king tableland road I still felt alright and managed to eat another three cubes of shot block there. I had run out of plain water (bladder) again so had not had a salt tablet for a little while. This year I had decided to run with more salt consumption and had bought hammer enduralyte salt tablets, I was downing 1 or 2 of them every time I ate all day which I think may have helped, though I still had cramps around Dunphys that happens every year and I knew I had run a bit hard early anyway (hoping to hit splits needed for sub 12). However even though it was a hot day and many people struggled more in the heat than other years I seemed to deal with it well. However I had discovered I struggled to down the tablets with electrolyte drink from my bottles (high 5 tablets, usually berry flavour) so I needed plain water from the camelback for them.

I got more food from Jane at CP5, re-lubed myself a bit, refilled the bladder and got moving. I also grabbed a second head torch; though I was carrying one already I liked the beam pattern more on the one I grabbed here, though with full water, bottles and the extra torch I felt pretty heavy running out of CP 5. Still, just 3 hours to go now I expected. I got out of there at 9h25m on the race clock which was good, thus if I could have a good run through here I might be able to get in under 12h20m (a 2h50m run would be nice for this leg at this point). I got moving on the approach to the Kedumba descent, joking with a few others around me that it was time to smash the quads and say goodbye to them as they were no longer needed after this really (only one short sort of descent to Leura creek). I was asked if we needed quads on the stairs; my response was they were a glute fest and allowed use of arms due to the railing, so who needs quads after Kedumba.

However as I got on to the descent and passed under the overhang I noticed my legs were a bit off and I could not open up well, I thought about it and realised I was probably low on sugar and needed to eat, eating at this sort of downhill pace was a bit hard (especially as some food was making me feel like throwing up (gels)). I thought I would try to hang on until the bottom as I could walk up out of Jamisons creek eating. However I needed to slow to a walk just after passing the Mt Solitary turn off and down a gel. Then a few minutes later trying to run still did not work so I had to stop and walk and eat for a while again rather than descending at full speed. Doing all of that I was passed by a few people (I think the woman who came 5th, the guy I joked about not needing Quads with and a few others).

Oh well I should have eaten more while stopped at the CP or on the flat at the top, oops, lost time (in the results comparing with people I ran similar splits all day to I may have lost as much as 15 minutes here with this issue). Once I got onto the climb out of Jamisons creek I ate some more and focused on holding a reasonably strong hike, the people who passed me were long gone and I could not motivate myself to push hard to see if I would catch them or not. I was passing a number of 50km runners by this point (I think the sweep must have been at CP5 when I went through). They were fun to cheer on and chat with as I caught and passed them, getting down to Leura creek was good as it was still day light and I could get moving up there to the last aid and onto the finish before I thought about lights.

Ben caught me again here, saying he had really pushed hard on the Kedumba descent, and he was looking good so sat a little ahead of me up to the aid station. I refilled my bottles and kept going, chatting with 50km runners as I passed them. I got to the poo farm a bit quicker than I expected (going on feeling as I was not looking at my watch much), however it was good to finally be up on Federal Pass not long after that and this is where I decided to focus on moving fast. The last two years I crawled along here and I think I lost a lot of time; I know last year I had mentally given up so was crawling, and the year before I think I was just a bit stuffed by then. This time I focused on running whenever it was not a steep up and on getting over to the stairs as quickly as possible.

It was still fun cheering on the 50km runners and chatting with them as I passed. I even saw some women in awesome pink outfits I had seen here a few weeks earlier while training, so it was good to cheer them on; when I asked them about it they said it was them and they recognised me ("it's pinky" they exclaimed) as I passed. I got to the base of the stairs at 12:14 so knew I had to work hard to finish in under 12:30, but it was time to get that done if possible. On the climb up the stairs it felt like I was getting stuck behind 50km runners on many of the narrow sections, however it probably did not slow the pace by much (on one occasion a race doctor was walking up the stairs with a runner just to help them get to the finish). I managed to get across the finish line in 12:29:51 (57th overall), which was a good result all things considered.

Thanks go to Jane for coming up from Sydney and supporting me all day; Tom, Al and AROC for keeping the fun happening for all the runners; Dave and co for some excellent course markings; all the other AROC people and volunteers; and David, Julie, Alex and others for company on lots of the training the last few months. I have a few ideas for what I need to work on next to go faster on this course, however I am thinking I may have a year off UTA100 to go do something else. The Hubert race in South Australia at the start of May looks like it could be awesome (running in the Wilpena Pound area through the Flinders Ranges) and it will probably be good to develop my base and speed a bit more over time before my next attempt to see if I can become a runner (crack 12 hours on this course).

UTA100 really is the pinnacle of trail running in Australia with the level of competition, the fun quality of the course, the vibe on course and the welcome to country, the event history and everything else, so I highly recommend it to anyone keen to challenge themselves. Even if, so far this year, the event that has really grabbed my attention the most is probably the Razorback Ultra, it is a very different day out to UTA100, so it is all good fun to get outdoors and enjoy the Australian wilderness.

Gary Pendergast: Introducing: Linkify for Chrome

Thu, 2016-05-19 23:03

In WordPress 4.2, a fun little feature was quietly snuck into Core, and I’m always delighted to see people’s reactions when they discover it.

Thank you kind WordPress Devs for introducing the paste-a-link-on-highlighted-text feature. It's already saved me half an hour this week!

— Meagan Hanes (@mhanes) May 10, 2016

I love being able to highlight text in @WordPress, paste a URL, and a link appearing. Thanks to @ellaiseulde for leading recent changes!

— Morgan Estes (@morganestes) April 29, 2016

But there’s still a problem – WordPress is only ~26% of the internet, so how can you get the same feature on the other 74%? Well, that problem has now been rectified. Introducing Linkify for Chrome:



Linkify

Automatically transform pasted URLs into links.

chrome.google.com

Thank you to Davide for creating Linkify’s excellent icon!

Linkify is a Chrome extension to automatically turn a pasted URL into a link, just like you’re used to in WordPress. It also supports Trac and Markdown-style links, so you can paste links on your favourite bug trackers, too.

Speaking of bug trackers, if there are any other link formats you’d like to see, post a ticket over on the Linkify GitHub repo!

Oh, and speaking of Chrome extensions, you might be like me, and find the word “emojis” to be extraordinarily awkward. If so, I have another little extension, just for you.

Stewart Smith: Fuzzing Firmware – afl-fuzz + skiboot

Thu, 2016-05-19 09:00

In what is likely to be a series on how firmware makes some normal tools harder to use, first I’m going to look at american fuzzy lop – a tool for fuzz testing; if you’re not using it then you most certainly have bugs it’ll find for you.

I first got interested in afl-fuzz during Erik de Castro Lopo’s excellent linux.conf.au 2016 talk in Geelong earlier this year: “Fuzz all the things!“. In a previous life, the Random Query Generator managed to find a heck of a lot of bugs in MySQL (and Drizzle). For randgen info, see Philip Stoev’s talk on it from way back in 2009, a recent (2014) blog post on how Tokutek uses it and some notes on how it was being used at Oracle from 2013. Basically, the randgen was a specialized fuzzer that (given a grammar) would randomly generate SQL queries, and then (if the server didn’t crash), compare the result to some other database server (e.g. your previous version).

The afl-fuzz fuzzer takes a different approach – it’s a much more generic fuzzer rather than a targeted tool. Also, while tools such as the random query generator are extremely powerful and find specialized bugs, they’re hard to get started with. A huge benefit of afl-fuzz is that it’s really, really simple to get started with.

Basically, if you have a binary that takes input on stdin or as a (relatively small) file, afl-fuzz will just work and find bugs for you – read the Quick Start Guide and you’ll be finding bugs in no time!

For firmware of course, we’re a little different than a simple command line program as, well, we aren’t one! Luckily though, we have unit tests. These are just standard binaries that include a bunch of firmware code and get run in user space as part of “make check”. Also, just like unit tests for any project, people do send me patches that break tests (which I reject).

Some of these tests act on data we get from a place – maybe reading other parts of firmware off PNOR or interacting with data structures we get from other bits of firmware. For testing this code, it can be relatively easy (for the test) to read these off disk.

For skiboot, there’s a data structure we get from the service processor on FSP machines called HDAT. Basically, it’s just like the device tree, but different. Because yet another binary format is always a good idea (yes, that is laced with a heavy dose of sarcasm). One of the steps in early boot is to parse the HDAT data structure and convert it to a device tree. Luckily, we structured our code so that creating a unit test that can run in userspace was relatively easy, we just needed to dump this data structure out from a running machine. You can see the test case here. Basically, hdat_to_dt is a binary that reads the HDAT structure out of a pair of files and prints out a device tree. One of the regression tests we have is that we always produce the same output from the same input.

So… throwing that into AFL yielded a couple of pretty simple bugs, especially around aborting out on invalid data (it’s better to exit the process with failure rather than hit an assert). Nothing too interesting here on my simple input file, but it does mean that our parsing code exits “gracefully” on invalid data.

Another utility we have is actually a userspace utility for accessing the gard records in the flash. A GARD record is a record of a piece of hardware that has been deconfigured due to a fault (or a suspected fault). Usually this utility operates on PNOR flash through /dev/mtd – but really what it’s doing is talking to the libflash library, that we also use inside skiboot (and on OpenBMC) to read/write from flash directly, via /dev/mtd or just from a file. The good news? I haven’t been able to crash this utility yet!

So I modified the pflash utility to read from a file to attempt to fuzz the partition reading code we have for the partitioning format that’s on PNOR. So far, no crashes – although to even get it going I did have to fix a bug in the file handling code in pflash, so that’s already a win!

But crashing bugs aren’t the only type of bugs – afl-fuzz has exposed several cases where we act on uninitialized data. How? Well, we run some test cases under valgrind! This is the joy of user space unit tests for firmware – valgrind becomes a tool that you can run! Unfortunately, these bugs have been sitting in my “todo” pile (which is, of course, incredibly long).

Where to next? Fuzzing the firmware calls themselves would be nice – although that’s going to require a targeted tool that knows about what to pass each of the calls. Another round of afl-fuzz running would also be good, I’ve fixed a bunch of the simple things and having a better set of starting input files would be great (and likely expose more bugs).

Binh Nguyen: More PSYOPS, Social Systems, and More

Tue, 2016-05-17 22:18
- I think that most people would agree that the best social systems revolve around the idea that we have fair and just laws. If the size of the security apparatus exceeds a certain point (which seems to be happening in a lot of places) are we certain that we have the correct laws and societal laws in place? If they can't convince through standard argumentation then the policy is probably not

Colin Charles: London roadshow wrap-up, see you in Paris next week

Sun, 2016-05-15 01:01

Just a few days ago, I presented at the MariaDB Roadshow in London, and I had a lot of fun. While I had canned slides, I did know the topic intimately well, so it was good to get further in-depth. In addition, we had these MasterMind sessions, basically the place to get one-on-one time with Anders, Luisa or me; I noticed that pretty much everyone said they were buying services afterwards (which more or less must mean the event was rather successful from that standpoint!).

In addition to that, I was happy to see that from attendee feedback, I did have the highest averages – thank you!

So here’s to repeating this in Paris next week — Gestion des données pour les applications vitales – MariaDB Roadshow Paris. I look forward to seeing you there, and I know we are repeating the MasterMind sessions. To fine-tune it, try to bring as much information as you possibly can so our time can be extremely effective.

Tim Serong: The Politics of Resentment

Sat, 2016-05-14 23:04

I’ve been reading The Archdruid Report regularly for a long time now, because unlike me, John Michael Greer posts every week and always writes something interesting. Given that we’ve got a federal election coming up in Australia and that I’ve mentioned one of JMG’s notions on the current state of politics to several people over the last few months, I thought I’d provide a TL;DR here:

If you want, you can split people in the US into four classes, based on how they get most of their income:

  1. The investment class (income derived from returns on investment)
  2. The salary class (who receive a monthly salary)
  3. The wage class (who receive an hourly wage)
  4. The welfare class (who receive welfare payments)

According to JMG, over the last fifty years or so, three of these classes of people have remained roughly where they are; the investment class still receives returns on investment (modulo a recession or two), the salary class still draws a reasonable salary, and life still sucks for people on welfare. But the wage class, to be blunt, has been systematically fucked over this time period. There’s a lot of people there, and it’s this disenfranchised group who sees someone outside the political establishment status quo (Trump) as someone they can get behind. Whether or not Trump is elected in the US, there’s still going to be a whole lot of people out there pissed off with the current state of things, and it’s going to be really interesting to see how this plays out.

You should probably go read the full post, because I doubt I’ve done it justice here, but I don’t think it’s unreasonable to imagine the same (or a similar) thesis might be valid for Australia, so my question is: what, if anything, does this mean for our 2016 federal election?

Russell Coker: Xen CPU Use per Domain again

Sat, 2016-05-14 21:03

8 years ago I wrote a script to summarise Xen CPU use per domain [1]. Since then changes to Xen required changes to the script. I have new versions for Debian/Wheezy (Xen 4.1) and Debian/Jessie (Xen 4.4).

Here’s a new script for Debian/Wheezy:

#!/usr/bin/perl
use strict;

open(LIST, "xm list --long|") or die "Can't get list";

my $name = "Dom0";
my $uptime = 0.0;
my $cpu_time = 0.0;
my $total_percent = 0.0;
my $cur_time = time();

open(UPTIME, "</proc/uptime") or die "Can't open /proc/uptime";
my @arr = split(/ /, <UPTIME>);
$uptime = $arr[0];
close(UPTIME);

my %all_cpu;

while(<LIST>)
{
  chomp;
  if($_ =~ /^\)/)
  {
    my $cpu = $cpu_time / $uptime * 100.0;
    if($name =~ /Domain-0/)
    {
      printf("%s uses %.2f%% of one CPU\n", $name, $cpu);
    }
    else
    {
      $all_cpu{$name} = $cpu;
    }
    $total_percent += $cpu;
    next;
  }
  $_ =~ s/\).*$//;
  if($_ =~ /start_time /)
  {
    $_ =~ s/^.*start_time //;
    $uptime = $cur_time - $_;
    next;
  }
  if($_ =~ /cpu_time /)
  {
    $_ =~ s/^.*cpu_time //;
    $cpu_time = $_;
    next;
  }
  if($_ =~ /\(name /)
  {
    $_ =~ s/^.*name //;
    $name = $_;
    next;
  }
}
close(LIST);

sub hashValueDescendingNum {
  $all_cpu{$b} <=> $all_cpu{$a};
}

my $key;

foreach $key (sort hashValueDescendingNum (keys(%all_cpu)))
{
  printf("%s uses %.2f%% of one CPU\n", $key, $all_cpu{$key});
}

printf("Overall CPU use approximates %.1f%% of one CPU\n", $total_percent);

Here’s the script for Debian/Jessie:

#!/usr/bin/perl

use strict;

open(UPTIME, "xl uptime|") or die "Can't get uptime";
open(LIST, "xl list|") or die "Can't get list";

my %all_uptimes;

while(<UPTIME>)
{
  chomp $_;

  next if($_ =~ /^Name/);
  $_ =~ s/ +/ /g;

  my @split1 = split(/ /, $_);
  my $dom = $split1[0];
  my $uptime = 0;
  my $time_ind = 2;
  if($split1[3] eq "days,")
  {
    $uptime = $split1[2] * 24 * 3600;
    $time_ind = 4;
  }
  my @split2 = split(/:/, $split1[$time_ind]);
  $uptime += $split2[0] * 3600 + $split2[1] * 60 + $split2[2];
  $all_uptimes{$dom} = $uptime;
}
close(UPTIME);

my $total_percent = 0;

while(<LIST>)
{
  chomp $_;

  my $dom = $_;
  $dom =~ s/ .*$//;

  if ( $_ =~ /(\d+)\.[0-9]$/ )
  {
    my $percent = $1 / $all_uptimes{$dom} * 100.0;
    $total_percent += $percent;
    printf("%s uses %.2f%% of one CPU\n", $dom, $percent);
  }
  else
  {
    next;
  }
}

printf("Overall CPU use approximates  %.1f%% of one CPU\n", $total_percent);

Related posts:

  1. Xen CPU use per Domain The command “xm list” displays the number of seconds of...
  2. Running a Shell in a Daemon Domain allow unconfined_t logrotate_t:process transition; allow logrotate_t { shell_exec_t bin_t }:file...
  3. Securely Killing Processes Joey Hess wrote on Debian-devel about the problem of init...

Chris Smart: Signal Return Orientated Programming attacks

Fri, 2016-05-13 23:07

When a process is interrupted, the kernel suspends it and stores its state in a sigframe which is placed on the stack. The kernel then calls the appropriate signal handler code and after a sigreturn system call, reads the sigframe off the stack, restores state and resumes the process. However, by crafting a fake sigframe, we can trick the kernel into executing something else.

My friend Rashmica, an intern at OzLabs, has written an interesting blog post about this for some work she’s doing with the POWER architecture in Linux.

sthbrx - a POWER technical blog: SROP Mitigation

Fri, 2016-05-13 22:22
What is SROP?

Sigreturn Oriented Programming - a general technique that can be used as an exploit, or as a backdoor to exploit another vulnerability.

Okay, but what is it?

Yeah... Let me take you through some relevant background info, where I skimp on the details and give you the general picture.

In Linux, software interrupts are called signals. More about signals here! Generally a signal will convey some information from the kernel and so most signals will have a specific signal handler (some code that deals with the signal) set up.

Signals are asynchronous - ie they can be sent to a process/program at any time. When a signal arrives for a process, the kernel suspends the process. The kernel then saves the 'context' of the process - all the general purpose registers (GPRs), the stack pointer, the next-instruction pointer etc - into a structure called a 'sigframe'. The sigframe is stored on the stack, and then the kernel runs the signal handler. At the very end of the signal handler, it calls a special system call called 'sigreturn' - indicating to the kernel that the signal has been dealt with. The kernel then grabs the sigframe from the stack, restores the process's context and resumes the execution of the process.
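
To make that flow a little more concrete, here is a minimal userspace sketch (not from the original post) that installs a handler with sigaction(); when the handler returns, the signal trampoline issues the sigreturn system call that makes the kernel read the sigframe back off the stack and resume the program:

#include <signal.h>
#include <unistd.h>

/* Runs after the kernel has pushed a sigframe onto the stack for us */
static void handler(int sig)
{
    const char msg[] = "caught SIGUSR1\n";
    /* write() is async-signal-safe, unlike printf() */
    write(STDOUT_FILENO, msg, sizeof(msg) - 1);
    /* Returning from here ends up in the sigreturn syscall, which makes the
       kernel restore the saved context and carry on in main() */
}

int main(void)
{
    struct sigaction sa;
    sa.sa_handler = handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    sigaction(SIGUSR1, &sa, NULL);

    raise(SIGUSR1); /* deliver the signal to ourselves */
    return 0;
}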

This is the rough mental picture you should have:

Okay... but you still haven't explained what SROP is..?

Well, if you insist...

The above process was designed so that the kernel does not need to keep track of what signals it has delivered. The kernel assumes that the sigframe it takes off the stack was legitimately put there by the kernel because of a signal. This is where we can trick the kernel!

If we can construct a fake sigframe, put it on the stack, and call sigreturn, the kernel will assume that the sigframe is one it put there before and will load the contents of the fake context into the CPU's registers and 'resume' execution from where the fake sigframe tells it to. And that is what SROP is!

Well that sounds cool, show me!

Firstly we have to set up a (valid) sigframe:

By valid sigframe, I mean a sigframe that the kernel will not reject. Luckily most architectures only examine a few parts of the sigframe to determine the validity of it. Unluckily, you will have to dive into the source code to find out which parts of the sigframe you need to set up for your architecture. Have a look in the function which deals with the syscall sigreturn (probably something like sys_sigreturn()).

For a real time signal on a little endian powerpc 64bit machine, the sigframe looks something like this:

struct rt_sigframe {
    struct ucontext uc;
    unsigned long _unused[2];
    unsigned int tramp[TRAMP_SIZE];
    struct siginfo __user *pinfo;
    void __user *puc;
    struct siginfo info;
    unsigned long user_cookie;
    /* New 64 bit little-endian ABI allows redzone of 512 bytes below sp */
    char abigap[USER_REDZONE_SIZE];
} __attribute__ ((aligned (16)));

The most important part of the sigframe is the context or ucontext as this contains all the register values that will be written into the CPU's registers when the kernel loads in the sigframe. To minimise potential issues we can copy valid values from the current GPRs into our fake ucontext:

register unsigned long r1 asm("r1");
register unsigned long r13 asm("r13");
struct ucontext ctx = { 0 };

/* We need a system thread id so copy the one from this process */
ctx.uc_mcontext.gp_regs[PT_R13] = r13;

/* Set the context's stack pointer to where the current stack pointer is pointing */
ctx.uc_mcontext.gp_regs[PT_R1] = r1;

We also need to tell the kernel where to resume execution from. As this is just a test to see if we can successfully get the kernel to resume execution from a fake sigframe we will just point it to a function that prints out some text.

/* Set the next instruction pointer (NIP) to the code that we want executed */
ctx.uc_mcontext.gp_regs[PT_NIP] = (unsigned long) test_function;

For some reason sys_rt_sigreturn() on little endian powerpc 64bit checks the endianness bit of the ucontext's MSR register, so we need to set that:

/* Set MSR bit if LE */
ctx.uc_mcontext.gp_regs[PT_MSR] = 0x01;

Fun fact: not doing this or setting it to 0 results in the CPU switching from little endian to big endian! For a powerpc machine sys_rt_sigreturn() only examines ucontext, so we do not need to set up a full sigframe.

Secondly we have to put it on the stack:

/* Set current stack pointer to our fake context */
r1 = (unsigned long) &ctx;

Thirdly, we call sigreturn:

/* Syscall - NR_rt_sigreturn */
asm("li 0, 172\n");
asm("sc\n");

When the kernel receives the sigreturn call, it looks at the userspace stack pointer for the ucontext and loads this in. As we have put valid values in the ucontext, the kernel assumes that this is a valid sigframe that it set up earlier, loads the contents of the ucontext into the CPU's registers and resumes execution of the process from the address we pointed the NIP to.

Obviously, you need something worth executing at this address, but sadly that next part is not in my job description. This is a nice gateway into the kernel though and would pair nicely with another kernel vulnerability. If you are interested in some more in depth examples, have a read of this paper.

So how can we mitigate this?

Well, I'm glad you asked. We need some way of distinguishing between sigframes that were put there legitimately by the kernel and 'fake' sigframes. The current idea that is being thrown around is cookies, and you can see the x86 discussion here.

The proposed solution is to give every sighand struct a randomly generated value. When the kernel constructs a sigframe for a process, it stores a 'cookie' with the sigframe. The cookie is a hash of the cookie's location and the random value stored in the sighand struct for the process. When the kernel receives a sigreturn, it hashes the location where the cookie should be with the randomly generated number in the sighand struct - if this matches the cookie, the cookie is zeroed, the sigframe is considered valid and the kernel restores the context. If the cookies do not match, the sigframe is not restored.
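
As a rough illustration of the idea, here is a toy userspace sketch (not the proposed kernel patch; the hash and all of the names are made up) of the store-and-check logic:

#include <stdint.h>
#include <stdio.h>

/* Toy model of the cookie scheme - 'secret' stands in for the random
   value stored in the sighand struct. */
static uint64_t cookie_for(uint64_t secret, const void *cookie_location)
{
    /* Mix the cookie's own address with the per-sighand secret (toy hash) */
    uint64_t x = secret ^ (uint64_t)(uintptr_t)cookie_location;
    x ^= x >> 33;
    x *= 0xff51afd7ed558ccdULL;
    x ^= x >> 33;
    return x;
}

int main(void)
{
    uint64_t secret = 0xdeadbeefcafef00dULL; /* per-sighand random value */
    uint64_t cookie_slot;                    /* where the cookie would sit in the sigframe */

    /* Signal delivery: store a cookie alongside the sigframe */
    cookie_slot = cookie_for(secret, &cookie_slot);

    /* sigreturn: recompute and compare before restoring the context */
    if (cookie_slot == cookie_for(secret, &cookie_slot)) {
        cookie_slot = 0; /* zero it so the same sigframe cannot be replayed */
        puts("cookie matches - restore the context");
    } else {
        puts("cookie mismatch - reject the (possibly fake) sigframe");
    }
    return 0;
}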

Potential issues:

  • Multithreading: Originally the random number was suggested to be stored in the task struct. However, this would break multi-threaded applications as every thread has its own task struct. As the sighand struct is shared by threads, this should not adversely affect multithreaded applications.
  • Cookie location: At first I put the cookie on top of the sigframe. However some code in userspace assumed that all the space between the signal handler and the sigframe was essentially up for grabs and would zero the cookie before I could read the cookie value. Putting the cookie below the sigframe was also a no-go due to the ABI-gap (a gap below the stack pointer that signal code cannot touch) being a part of the sigframe. Putting the cookie inside the sigframe, just above the ABI gap has been fine with all the tests I have run so far!
  • Movement of sigframe: If you move the sigframe on the stack, the cookie value will no longer be valid... I don't think that this is something that you should be doing, and have not yet come across a scenario that does this.

For a more in-depth explanation of SROP, click here.

sthbrx - a POWER technical blog: Tell Me About Petitboot

Fri, 2016-05-13 15:23

A Google search for 'Petitboot' brings up results from a number of places, some describing its use on POWER servers, others talking about how to use it on the PS3, in varying levels of detail. I tend to get a lot of general questions about Petitboot and its behaviour, and have had a few requests for a broad "Welcome to Petitboot" blog, suggesting that existing docs deal with more specific topics... or that people just aren't reading them :)

So today we're going to take a bit of a crash course in the what, why, and possibly how of Petitboot. I won't delve too much into technical details, and this will be focussed on Petitboot in POWER land since that's where I spend most of my time. Here we go!

What

Aside from a whole lot of firmware and kernel logs flying past, the first thing you'll see when booting a POWER server (in OPAL mode at least) is Petitboot's main menu:

Petitboot is the first interact-able component a user will see. The word 'BIOS' gets thrown around a lot when discussing this area, but that is wrong, and the people using that word are wrong.

When the OPAL firmware layer Skiboot has finished its own set up, it loads a certain binary (stored on the BMC) into memory and jumps into it. This could hypothetically be anything, but for any POWER server right now it is 'Skiroot'. Skiroot is a full Linux kernel and userspace, which runs Petitboot. People often say Petitboot when they mean Skiroot - technically Petitboot is the server and UI processes that happen to run within Skiroot, and Skiroot is the full kernel and rootfs package. This is more obvious when you look at the op-build project - Petitboot is a package built as part of the kernel and rootfs created by Buildroot.

Petitboot is made of two major parts - the UI processes (one for each available console), and the 'discover' server. The discover server updates the UI processes, manages and scans available disks and network devices, and performs the actual booting of host operating systems. The UI, running in ncurses, displays these options, allows the user to edit boot options and system configuration, and tells the server which boot option to kexec.

Why

The 'why' delves into some of the major architectural differences between a POWER machine and your average x86 machine which, as always, could spread over several blog posts and/or a textbook.

POWER processors don't boot themselves; instead the attached Baseboard Management Controller (BMC) does a lot of low-level poking that gets the primary processor into a state where it is ready to execute instructions. PowerVM systems would then jump directly into the PHYP hypervisor - any subsequent OS, be it AIX or Linux, would then run as a 'partition' under this hypervisor.

What we all really want though is to run Linux directly on the hardware, which meant a new boot process would have to be thought up while still maintaining compatibility with PowerVM so systems could be booted in either mode. Thus became OPAL, and its implementation Skiboot. Skipping over so much detail, the system ends up booting into Skiboot which acts as our firmware layer. Skiboot isn't interactive and doesn't really care about things like disks, so it loads another binary into memory and executes it - Skiroot!

Skiroot exists as an alternative to writing a whole new bootloader just for POWER in OPAL mode, or going through the effort to port an existing bootloader to understand the specifics of POWER. Why do all that when Linux already exists and already knows how to handle disks, network interfaces, and a thousand other things? Not to mention that when Linux gains support for fancy new devices so do we, and adding new features of our own is as simple as writing your average Linux program.

Skiroot itself (not including Skiboot) is roughly comparable to UEFI, or at least much more so than legacy BIOS implementations. But whereas UEFI tends to be a monolithic blob of fairly platform-specific code (in practice), Skiroot is simply a small Linux environment that anyone could put together with Buildroot.

A much better insight into the development and motivation behind Skiroot and Petitboot is available in Jeremy's LCA2013 talk.

Back to Petitboot

Petitboot is the part of the 'bootloader' that did need to be written, because users probably wouldn't be too thrilled if they had to manually mount disks and kexec their kernels every time they booted the machine. The Petitboot server process mounts any available disk devices and scans them for available operating systems. That's not to say that it scans the entire disk, because otherwise you could be waiting for quite some time, but rather it looks in a list of common locations for bootloader configuration files. This is handy because it means the operating system doesn't need to have any knowledge of Petitboot - it just uses its usual install scripts and Petitboot reads them to know what is available. At the same time Petitboot makes PXE requests on configured network interfaces so we can netboot, and allows these various sources to be given relative priorities for auto-boot, plus a number of other ways to specially configure booting behaviour.

A particularly neat feature of existing in a Linux environment is the ability to easily recover from boot problems; whereas on another system you might need to use a Live CD to fix a misconfiguration or recover a broken filesystem, in Skiroot you can just drop to the shell and fix the issue right there.

In summary, Petitboot/Skiroot is a small but capable Linux environment that every OPAL POWER machine boots into, gathering up all the various local and remote boot possibilities, and presenting them to you in a state-of-the-art ncurses interface. Petitboot updates all the time, and if you come across a feature that you think Petitboot is missing, patches are very welcome at petitboot@lists.ozlabs.org (or hassle me on IRC)!
